Pandas DataFrame.iterrows

The DataFrame.iterrows method in pandas iterates over the rows of a DataFrame, yielding each row as an (index, Series) pair. It is a convenient way to perform row-wise operations, particularly on small DataFrames.

Syntax

The syntax for DataFrame.iterrows is:

DataFrame.iterrows()

Here, DataFrame refers to the pandas DataFrame over which the iteration is performed. The method takes no arguments.

Returns

An iterator that yields tuples of the form (index, Series), where:

index: The index label of the row (a tuple if the DataFrame has a MultiIndex).
Series: A pandas Series containing the data for the row, labeled by the column names.
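
To make the return value concrete, the following minimal sketch pulls the first pair from the iterator and inspects it. The column names Units and Price are made up for illustration. Because each row comes back as a single Series, the integer column is upcast to float within the row:

```python
import pandas as pd

# Hypothetical all-numeric data: one integer column, one float column
df = pd.DataFrame({'Units': [1, 2], 'Price': [1.5, 2.5]})

# Take the first (index, Series) pair from the iterator
index, row = next(df.iterrows())

print(index)               # 0
print(type(row).__name__)  # Series
print(row.name)            # 0 (the row's index label)
print(row.dtype)           # float64
print(row['Units'])        # 1.0 (the integer was upcast to float)
```

If the original column dtypes matter inside the loop, this upcasting is one reason to consider itertuples instead.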

Examples

Iterating Over Rows in a DataFrame

Use iterrows to iterate through each row of a DataFrame and access its data.

Python Program

import pandas as pd

# Create a DataFrame
data = {
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Age': [25, 30, 35],
    'Salary': [70000.5, 80000.0, 90000.0]
}
df = pd.DataFrame(data)

# Iterate over rows
print("Iterating over rows:")
for index, row in df.iterrows():
    print(f"Index: {index}, Name: {row['Name']}, Age: {row['Age']}, Salary: {row['Salary']}")

Output

Iterating over rows:
Index: 0, Name: Arjun, Age: 25, Salary: 70000.5
Index: 1, Name: Ram, Age: 30, Salary: 80000.0
Index: 2, Name: Priya, Age: 35, Salary: 90000.0

Modifying Rows Using `iterrows`

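You can use iterrows to read and process each row, for example to build a message from its values. Do not rely on it to modify data: the row you receive is typically a copy rather than a view, so assigning to it generally has no effect on the original DataFrame.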

Python Program

import pandas as pd

# Create a DataFrame
data = {
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Age': [25, 30, 35],
    'Salary': [70000.5, 80000.0, 90000.0]
}
df = pd.DataFrame(data)

# Print a custom message for each row
print("Adding custom messages dynamically:")
for index, row in df.iterrows():
    print(f"{row['Name']} earns {row['Salary']} at the age of {row['Age']}.")

Output

Adding custom messages dynamically:
Arjun earns 70000.5 at the age of 25.
Ram earns 80000.0 at the age of 30.
Priya earns 90000.0 at the age of 35.
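
The caveat about changes not reaching the original DataFrame can be checked directly. The sketch below is illustrative rather than a recommended pattern: it first assigns to the row inside the loop and shows that the DataFrame is unchanged, then collects the new values in a list (new_ages, a name chosen here) and assigns them to the column after the loop has finished.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Age': [25, 30, 35],
})

# Assigning to the row only changes the temporary Series
for index, row in df.iterrows():
    row['Age'] = row['Age'] + 1

print(df['Age'].tolist())  # [25, 30, 35]

# Collect the new values, then write them back after the loop
new_ages = []
for index, row in df.iterrows():
    new_ages.append(row['Age'] + 1)

df['Age'] = new_ages
print(df['Age'].tolist())  # [26, 31, 36]
```

Writing back only after the loop avoids modifying the DataFrame while it is being iterated over.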

Using `iterrows` to Create a New Column

Generate new column values based on row-wise operations using iterrows.

Python Program

import pandas as pd

# Create a DataFrame
data = {
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Age': [25, 30, 35],
    'Salary': [70000.5, 80000.0, 90000.0]
}
df = pd.DataFrame(data)

# Add a new column based on row data
discounted_salaries = []
for index, row in df.iterrows():
    discounted_salary = row['Salary'] * 0.9  # Apply a 10% discount
    discounted_salaries.append(discounted_salary)

df['Discounted Salary'] = discounted_salaries
print("DataFrame with Discounted Salaries:")
print(df)

Output

DataFrame with Discounted Salaries:
    Name  Age   Salary  Discounted Salary
0  Arjun   25  70000.5           63000.45
1    Ram   30  80000.0           72000.00
2  Priya   35  90000.0           81000.00

Performance Consideration

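iterrows is slow on large DataFrames because it constructs a new Series object for every row. It also does not preserve dtypes across a row, since all of a row's values are packed into a single Series. For better performance, prefer vectorized operations; when you do need an explicit loop, itertuples is generally faster and keeps each value's original type.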

Python Program

import pandas as pd

# Create a DataFrame
data = {
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Age': [25, 30, 35],
    'Salary': [70000.5, 80000.0, 90000.0]
}
df = pd.DataFrame(data)

# Use itertuples for better performance
print("Using itertuples for better performance:")
for row in df.itertuples():
    print(f"Name: {row.Name}, Age: {row.Age}, Salary: {row.Salary}")

Output

Using itertuples for better performance:
Name: Arjun, Age: 25, Salary: 70000.5
Name: Ram, Age: 30, Salary: 80000.0
Name: Priya, Age: 35, Salary: 90000.0
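
The other alternative, a vectorized operation, removes the Python-level loop entirely. As a minimal sketch, the discounted salary column from the earlier example can be computed with a single column-wise multiplication:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Age': [25, 30, 35],
    'Salary': [70000.5, 80000.0, 90000.0],
})

# Multiply the whole column at once instead of looping over rows
df['Discounted Salary'] = df['Salary'] * 0.9

print(df)
```

This produces the same column as the iterrows loop, with the arithmetic applied to the whole Salary column at once.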

Summary

In this tutorial, we explored the DataFrame.iterrows method in pandas. Key takeaways include:

Using iterrows to iterate over rows in a DataFrame as (index, Series) pairs.
Processing rows one at a time (assigning to a row generally won't change the original DataFrame).
Considering performance implications for large DataFrames.

While iterrows is useful for row-wise operations, for larger datasets, vectorized operations or itertuples are often more efficient.