Pandas DataFrame.mask
The DataFrame.mask method in pandas replaces values in a DataFrame where a specified condition is True and leaves all other values unchanged. It is the inverse of DataFrame.where, which replaces values where the condition is False.
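The relationship between mask and where can be checked with a short sketch. The DataFrame and its column names a and b are illustrative assumptions, not part of this tutorial's sample data.

```python
import pandas as pd

# Illustrative data (not from the tutorial)
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
cond = df > 2

masked = df.mask(cond)    # replace entries where cond is True
kept = df.where(~cond)    # keep entries where ~cond is True, replace the rest

print(masked.equals(kept))  # True
```

Both calls produce the same result, so the choice between them is mostly about which condition reads more naturally.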
Syntax
The syntax for DataFrame.mask is:
DataFrame.mask(cond, other=<no_default>, *, inplace=False, axis=None, level=None)
Here, DataFrame refers to the pandas DataFrame on which the operation is applied.
Parameters
| Parameter | Description |
|---|---|
| cond | A boolean Series or DataFrame, an array-like, or a callable that returns one. Where True, values in the DataFrame will be replaced. |
| other | Optional. A scalar, Series, DataFrame, or callable that supplies the replacement values. If omitted, replaced entries become NaN. |
| inplace | If True, modifies the original DataFrame. If False, returns a new DataFrame. Defaults to False. |
| axis | Alignment axis, if needed (0 for the row index, 1 for the columns), for example when other is a Series. Defaults to None. |
| level | Alignment level, if needed, when working with a MultiIndex. Defaults to None. |
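Both cond and other also accept callables, which pandas calls with the DataFrame itself. The following is a minimal sketch of that form; the data and the lambda functions are illustrative assumptions.

```python
import pandas as pd

# Illustrative data (not from the tutorial)
df = pd.DataFrame({"a": [1, -2, 3], "b": [-4, 5, -6]})

# cond callable: receives the DataFrame and returns a boolean DataFrame
# other callable: receives the DataFrame and returns the replacement values
result = df.mask(lambda d: d < 0, other=lambda d: -d)
print(result)
```

Here every negative entry is replaced by its negation, so the result holds the absolute values of the original data.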
Returns
A DataFrame with updated values where the condition is True. If inplace=True, returns None and modifies the original DataFrame.
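The difference in return value can be sketched as follows. The single float column x is an illustrative assumption, chosen so that NaN fits without a dtype change.

```python
import pandas as pd

# Illustrative float column (not from the tutorial)
df = pd.DataFrame({"x": [1.0, 2.0, 3.0]})

new_df = df.mask(df > 2)             # returns a new DataFrame; df is unchanged
ret = df.mask(df > 2, inplace=True)  # modifies df itself and returns None

print(ret is None)        # True
print(new_df.equals(df))  # True, df now holds the masked values too
```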
Examples
Replacing Values Based on a Condition
Replace the values in rows where Salary exceeds a threshold with NaN. A boolean Series condition is aligned on the row index and applied to every column, so whole rows are masked.
Python Program
import pandas as pd
# Create a DataFrame
data = {
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Age': [25, 30, 35],
    'Salary': [70000, 80000, 90000]
}
df = pd.DataFrame(data)
# Mask the rows where 'Salary' is greater than 75000 (all columns become NaN)
print("DataFrame after masking rows with Salary greater than 75000:")
df_masked = df.mask(df['Salary'] > 75000)
print(df_masked)
Output
DataFrame after masking rows with Salary greater than 75000:
    Name   Age   Salary
0  Arjun  25.0  70000.0
1    NaN   NaN      NaN
2    NaN   NaN      NaN
Age and Salary become float columns because integer columns cannot hold NaN.
Replacing Values with a Specific Value
Replace the values in rows that meet a condition with a specific value by passing other.
Python Program
import pandas as pd
# Create a DataFrame
data = {
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Age': [25, 30, 35],
    'Salary': [70000, 80000, 90000]
}
df = pd.DataFrame(data)
# Replace every value in rows where 'Salary' is greater than 75000 with 0
print("DataFrame after masking rows with Salary greater than 75000 with 0:")
df_masked = df.mask(df['Salary'] > 75000, other=0)
print(df_masked)
Output
DataFrame after masking rows with Salary greater than 75000 with 0:
    Name  Age  Salary
0  Arjun   25   70000
1      0    0       0
2      0    0       0
Because 0 fits in an integer column, Age and Salary keep their integer dtype.
Masking with inplace=True
Modify the original DataFrame by using inplace=True. In this case the method returns None.
Python Program
import pandas as pd
# Create a DataFrame
data = {
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Age': [25, 30, 35],
    'Salary': [70000, 80000, 90000]
}
# Cast the numeric columns to float so they can hold NaN without an in-place dtype change
df = pd.DataFrame(data).astype({'Age': 'float64', 'Salary': 'float64'})
# Mask the rows where 'Age' is greater than 30 with NaN, in place
print("DataFrame after masking rows with Age greater than 30:")
df.mask(df['Age'] > 30, inplace=True)
print(df)
Output
DataFrame after masking rows with Age greater than 30:
    Name   Age   Salary
0  Arjun  25.0  70000.0
1    Ram  30.0  80000.0
2    NaN   NaN      NaN
Masking Along a Specific Axis
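The axis parameter controls how other is aligned when it is a Series: axis=0 aligns it with the row index, axis=1 with the column labels. This example supplies one replacement value per column.
Python Program
import pandas as pd
# Create a DataFrame
data = {
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Age': [25, 30, 35],
    'Salary': [70000, 80000, 90000]
}
df = pd.DataFrame(data)
# Use the numeric columns only; comparing the string column 'Name' with a number raises TypeError
numeric = df[['Age', 'Salary']]
# One upper limit per column, matched to the column labels
caps = pd.Series({'Age': 30, 'Salary': 75000})
# Replace values above each column's limit with that limit (axis=1 aligns caps with the columns)
print("DataFrame after masking along columns:")
df_masked = numeric.mask(numeric > caps, other=caps, axis=1)
print(df_masked)
Output
DataFrame after masking along columns:
   Age  Salary
0   25   70000
1   30   75000
2   30   75000
Summary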
In this tutorial, we explored the DataFrame.mask method in pandas. Key takeaways include:
- Replacing values in a DataFrame where a condition is True.
- Replacing with NaN by default or with a specific value using other.
- Using inplace=True to modify the original DataFrame directly.
- Aligning a Series passed as other with the rows or columns using axis.
The DataFrame.mask method is a powerful tool for conditionally updating data in pandas DataFrames.
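One related pattern is masking a single column. A boolean Series passed to DataFrame.mask applies to whole rows, so restricting the replacement to one column needs a different approach. The following is a minimal sketch, assuming the same sample data as in the examples; it uses Series.mask on the column together with DataFrame.assign, and is one possible approach rather than the only one.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Arjun", "Ram", "Priya"],
    "Age": [25, 30, 35],
    "Salary": [70000, 80000, 90000],
})

# Series.mask changes only the selected column;
# assign returns a new DataFrame with that column replaced
df_masked = df.assign(Salary=df["Salary"].mask(df["Salary"] > 75000))
print(df_masked)
```

Name and Age are left untouched, and only Salary is upcast to float to hold NaN.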