Pandas DataFrame.where
The DataFrame.where method in pandas is used to replace values in a DataFrame based on a condition. It is particularly useful for applying conditional operations where elements that do not satisfy a condition are replaced with a specified value.
Syntax
The syntax for DataFrame.where is:
DataFrame.where(cond, other=nan, *, inplace=False, axis=None, level=None)
Here, DataFrame refers to the pandas DataFrame on which the method is being applied.
Parameters
| Parameter | Description |
|---|---|
| cond | The condition to evaluate: a boolean Series or DataFrame, an array-like, or a callable that returns one. Where the condition is True, the original value is kept; where it is False, the value is replaced by other. |
| other | Replacement for elements that do not satisfy the condition: a scalar, a Series or DataFrame, or a callable. Defaults to NaN. |
| inplace | If True, performs the operation in place, modifying the DataFrame directly and returning None. Defaults to False. |
| axis | Alignment axis, used when other has to be aligned with the DataFrame (for example, when other is a Series). Defaults to None. |
| level | Alignment level, used when aligning on a multi-level index. Defaults to None. |
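Besides a boolean mask and a scalar, both cond and other can be callables: each one receives the DataFrame and returns the mask or the replacement values. The following is a minimal sketch of that form; the two-column frame and the variable names are made up for illustration.

```python
import pandas as pd

# Hypothetical numeric frame used only for this illustration
df = pd.DataFrame({"Age": [25, 30, 35], "Salary": [70000, 80000, 90000]})

# cond as a callable: it receives the DataFrame and returns a boolean mask.
# Only Age 25 fails the test, so only that cell becomes NaN.
kept = df.where(lambda d: d >= 30)
print(kept)

# other as a callable: it receives the DataFrame and returns the replacements.
# Rows with Age < 30 take their values from the doubled frame.
doubled = df.where(df["Age"] >= 30, other=lambda d: d * 2)
print(doubled)
```

Callables are mostly convenient in method chains, where the intermediate DataFrame has no name to refer to.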
Returns
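A DataFrame of the same shape as the original, with the values that do not satisfy the condition replaced. If inplace=True, the method returns None and modifies the original DataFrame instead.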
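The return behavior can be checked with a short sketch. It assumes a small one-column DataFrame invented for this purpose and compares the default call with an in-place call.

```python
import pandas as pd

# Small hypothetical frame used only for illustration
df = pd.DataFrame({"Age": [25, 30, 35]})

# Default call: a new DataFrame comes back and df itself is unchanged
result = df.where(df["Age"] >= 30)
print(result is df)          # False
print(df["Age"].tolist())    # [25, 30, 35]
print(result["Age"].dtype)   # float64, since NaN cannot be stored in an integer column

# In-place call: df is modified and the method returns None
returned = df.where(df["Age"] >= 30, other=0, inplace=True)
print(returned)              # None
print(df["Age"].tolist())    # [0, 30, 35]
```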
Examples
Basic Usage of where
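Use where with a boolean Series as the condition to replace every value in the rows that do not satisfy it with NaN. Because NaN is a floating-point value, the integer columns Age and Salary are converted to float in the result.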
Python Program
import pandas as pd
# Create a DataFrame
data = {
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Age': [25, 30, 35],
    'Salary': [70000, 80000, 90000]
}
df = pd.DataFrame(data)
# Keep rows where Age >= 30; every value in the other rows becomes NaN
print("DataFrame where Age < 30 is replaced with NaN:")
result = df.where(df['Age'] >= 30)
print(result)
Output
DataFrame where Age < 30 is replaced with NaN:
Name Age Salary
0 NaN NaN NaN
1 Ram 30.0 80000.0
2 Priya 35.0 90000.0
Using the other Parameter
Specify a custom value to replace elements that do not satisfy the condition.
Python Program
import pandas as pd
# Create a DataFrame
data = {
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Age': [25, 30, 35],
    'Salary': [70000, 80000, 90000]
}
df = pd.DataFrame(data)
# Replace values where Age < 30 with 0
print("DataFrame where Age < 30 is replaced with 0:")
result = df.where(df['Age'] >= 30, other=0)
print(result)
Output
DataFrame where Age < 30 is replaced with 0:
Name Age Salary
0 0 0 0
1 Ram 30 80000
2 Priya 35 90000
Using inplace=True
Pass inplace=True to modify the DataFrame directly. In this mode the method returns None, so the result should not be assigned to a variable.
Python Program
import pandas as pd
# Create a DataFrame
data = {
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Age': [25, 30, 35],
    'Salary': [70000, 80000, 90000]
}
df = pd.DataFrame(data)
# Replace values where Age < 30 with NaN in place
print("Modifying DataFrame in place where Age < 30:")
df.where(df['Age'] >= 30, inplace=True)
print(df)
Output
Modifying DataFrame in place where Age < 30:
Name Age Salary
0 NaN NaN NaN
1 Ram 30.0 80000.0
2 Priya 35.0 90000.0
Applying Conditions Across All Columns
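Pass a boolean DataFrame as the condition to test every element individually across multiple columns. The comparison must be valid for each column's data type: comparing a string column such as Name with a number raises a TypeError.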
Python Program
import pandas as pd
# Create a DataFrame
data = {
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Age': [25, 30, 35],
    'Salary': [70000, 80000, 90000]
}
df = pd.DataFrame(data)
# Select the numeric columns; comparing the string column 'Name' with a number raises a TypeError
numeric = df[['Age', 'Salary']]
# Replace all values less than 80000 with NaN
print("Numeric columns where all values < 80000 are replaced with NaN:")
result = numeric.where(numeric >= 80000)
print(result)
Output
Numeric columns where all values < 80000 are replaced with NaN:
Age Salary
0 NaN NaN
1 NaN 80000.0
2 NaN 90000.0
Summary
In this tutorial, we explored the DataFrame.where method in pandas. Key takeaways include:
- Using where to replace values in a DataFrame based on conditions.
- Specifying a custom replacement value with other.
- Modifying DataFrames in place using inplace=True.
- Applying conditions across all columns.
The DataFrame.where method is a powerful tool for conditional operations and is especially useful in data preprocessing and analysis.
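As one illustration of the preprocessing use mentioned above, the sketch below replaces invalid readings with the mean of the valid values in the same column. The sensor data, the column names, and the convention that negative values are invalid are all assumptions made for this example. Because other is a Series indexed by column name, axis=1 tells where to align it with the columns.

```python
import pandas as pd

# Hypothetical sensor readings; negative values stand for invalid measurements
readings = pd.DataFrame({
    "temp": [21.5, -1.0, 23.0],
    "humidity": [40.0, 42.0, -1.0],
})

# Boolean DataFrame marking the valid cells
valid = readings >= 0

# Per-column mean of the valid readings only (mean skips NaN by default)
col_means = readings.where(valid).mean()

# Replace invalid cells with their column's mean; axis=1 aligns the Series with the columns
cleaned = readings.where(valid, other=col_means, axis=1)
print(cleaned)
```

Computing the means from the masked frame keeps the invalid placeholder values from distorting the replacement.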