Pandas DataFrame.mask



The DataFrame.mask method in pandas replaces the values of a DataFrame wherever a specified condition is True and leaves all other values unchanged. Unless a replacement is given, the masked values become NaN.
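DataFrame.mask is the counterpart of DataFrame.where: mask replaces values where the condition is True, while where keeps them. The short sketch below illustrates that relationship on a small, made-up numeric DataFrame; the column names and the threshold are only illustrative.

```python
import pandas as pd

# Illustrative data, not taken from any real dataset
df = pd.DataFrame({"Age": [25, 30, 35], "Salary": [70000, 80000, 90000]})
cond = df > 75000

masked = df.mask(cond)    # replaces values where cond is True
kept = df.where(~cond)    # keeps values where the inverted cond is True

# Both calls are expected to produce the same DataFrame
print(masked.equals(kept))
```

Choosing between the two is mostly a matter of which way the condition reads more naturally.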


Syntax

The syntax for DataFrame.mask is:

DataFrame.mask(cond, other=<no_default>, *, inplace=False, axis=None, level=None)

Here, DataFrame refers to the pandas DataFrame on which the operation is applied.


Parameters

Parameter   Description
cond        A boolean Series, DataFrame, array-like, or callable. Where it is True, values in the DataFrame are replaced.
other       Optional. The replacement used where cond is True: a scalar, Series, DataFrame, or callable. Defaults to NaN.
inplace     If True, modifies the original DataFrame. If False, returns a new DataFrame. Defaults to False.
axis        The axis to align on when cond or other needs alignment (0 for the index, 1 for the columns). Defaults to None.
level       The MultiIndex level to align on, if needed. Defaults to None.
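Both cond and other may also be callables that receive the DataFrame and return the condition or the replacement values. The following is a minimal sketch of that form; the DataFrame and the lambda bodies are illustrative assumptions.

```python
import pandas as pd

# Illustrative data
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# cond as a callable: it receives the DataFrame and returns a boolean DataFrame.
# other as a callable: it also receives the DataFrame and returns the replacements.
result = df.mask(lambda frame: frame > 3, other=lambda frame: frame * 10)
print(result)
```

Here the values above 3 are replaced with ten times their original value, and the rest are left as they are.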

Returns

A DataFrame with updated values where the condition is True. If inplace=True, returns None and modifies the original DataFrame.
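The difference between the two return modes can be sketched as follows. The data is illustrative, and float values are used on purpose so that NaN fits into the column without a dtype change.

```python
import pandas as pd

# Float data is used so that NaN fits without changing the column dtype
df = pd.DataFrame({"score": [10.0, 20.0, 30.0]})
cond = df["score"] > 15

# Default (inplace=False): a new DataFrame comes back and df is unchanged
result = df.mask(cond)

# inplace=True: the call returns None and the target itself is modified
target = df.copy()
returned = target.mask(cond, inplace=True)

print(result)
print(df)
print(returned)
print(target)
```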


Examples

Replacing Values Based on a Condition

Mask every row in which Salary exceeds a threshold. A boolean Series passed as cond is applied to all columns, so each matching row is replaced with NaN in full.

Python Program

import pandas as pd

# Create a DataFrame
data = {
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Age': [25, 30, 35],
    'Salary': [70000, 80000, 90000]
}
df = pd.DataFrame(data)

# Replace every value in rows where Salary is greater than 75000 with NaN
print("DataFrame after masking rows with Salary greater than 75000:")
df_masked = df.mask(df['Salary'] > 75000)
print(df_masked)

Output

DataFrame after masking rows with Salary greater than 75000:
    Name   Age   Salary
0  Arjun  25.0  70000.0
1    NaN   NaN      NaN
2    NaN   NaN      NaN
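A Series condition passed to DataFrame.mask affects every column of the matching rows. If only the Salary column should change, one option (a sketch, not the only approach) is to call Series.mask on that column and assign the result to a copy. The name salary_masked is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Arjun", "Ram", "Priya"],
    "Age": [25, 30, 35],
    "Salary": [70000, 80000, 90000],
})

# Work on a copy so the original DataFrame is left untouched
salary_masked = df.copy()

# Series.mask only touches the Salary column; Name and Age keep their values
salary_masked["Salary"] = df["Salary"].mask(df["Salary"] > 75000)
print(salary_masked)
```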

Replacing Values with a Specific Value

Replace values that meet a condition with a specific value.

Python Program

import pandas as pd

# Create a DataFrame
data = {
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Age': [25, 30, 35],
    'Salary': [70000, 80000, 90000]
}
df = pd.DataFrame(data)

# Replace every value in rows where Salary is greater than 75000 with 0
print("DataFrame after masking rows with Salary greater than 75000 with 0:")
df_masked = df.mask(df['Salary'] > 75000, other=0)
print(df_masked)

Output

DataFrame after masking rows with Salary greater than 75000 with 0:
    Name  Age  Salary
0  Arjun   25   70000
1      0    0       0
2      0    0       0

Masking with inplace=True

Modify the original DataFrame by using inplace=True.

Python Program

import pandas as pd

# Create a DataFrame
data = {
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Age': [25, 30, 35],
    'Salary': [70000, 80000, 90000]
}
df = pd.DataFrame(data)

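# Replace every value in rows where 'Age' is greater than 30 with NaN, in place.
# Note: NaN forces the integer columns to float. Recent pandas versions may warn
# about (or reject) such an in-place upcast; df = df.mask(...) avoids that.
print("DataFrame after masking rows with 'Age' greater than 30:")
df.mask(df['Age'] > 30, inplace=True)
print(df)

Output

DataFrame after masking rows with 'Age' greater than 30:
    Name   Age   Salary
0  Arjun  25.0  70000.0
1    Ram  30.0  80000.0
2    NaN   NaN      NaN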

Masking Along a Specific Axis

The axis parameter sets the axis used for alignment when cond or other has to be aligned with the DataFrame (0 for the index, 1 for the columns).

Python Program

import pandas as pd

# Create a DataFrame
data = {
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Age': [25, 30, 35],
    'Salary': [70000, 80000, 90000]
}
df = pd.DataFrame(data)

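# Comparing the whole DataFrame with a number fails on the string column 'Name',
# so select the numeric columns first
numbers = df[['Age', 'Salary']]

# cond already has the same shape as numbers, so axis=1 does not change the result here
print("DataFrame after masking along columns:")
df_masked = numbers.mask(numbers > 75000, axis=1)
print(df_masked)

Output

DataFrame after masking along columns:
   Age   Salary
0   25  70000.0
1   30      NaN
2   35      NaN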
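The axis parameter matters mainly when other is a Series that has to be lined up with the DataFrame. The sketch below shows both directions under that assumption; row_fill and col_fill are hypothetical replacement values chosen only for illustration.

```python
import pandas as pd

numbers = pd.DataFrame({"Age": [25, 30, 35], "Salary": [70000, 80000, 90000]})
cond = numbers > 75000

# Hypothetical replacement values, one per row label (0, 1, 2)
row_fill = pd.Series([-1, -2, -3], index=numbers.index)
# axis=0 aligns row_fill with the index, so each row gets its own replacement
by_row = numbers.mask(cond, row_fill, axis=0)

# Hypothetical replacement values, one per column name
col_fill = pd.Series({"Age": 0, "Salary": 75000})
# axis=1 aligns col_fill with the columns, so each column gets its own replacement
by_col = numbers.mask(cond, col_fill, axis=1)

print(by_row)
print(by_col)
```

With axis=0 the masked Salary values take the replacement for their row; with axis=1 they take the replacement for the Salary column.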

Summary

In this tutorial, we explored the DataFrame.mask method in pandas. Key takeaways include:

  • Replacing values in a DataFrame where a condition is True.
  • Replacing with NaN by default or with a specific value using other.
  • Using inplace=True to modify the original DataFrame directly.
  • Using axis to choose the alignment axis when cond or other needs to be aligned.

The DataFrame.mask method is a powerful tool for conditionally updating data in pandas DataFrames.

