Pandas DataFrame.where



The DataFrame.where method in pandas keeps the values for which a condition is True and replaces every other value. By default the replacement is NaN, but any specified value can be used instead.
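As a rough illustration of this keep-or-replace behaviour, the sketch below compares DataFrame.where with numpy.where on a small made-up DataFrame (the column names A and B and their values are assumptions for the example). The pandas documentation describes df.where(m, other) as roughly equivalent to np.where(m, df, other).

```python
import numpy as np
import pandas as pd

# Hypothetical data, chosen only for illustration
df = pd.DataFrame({'A': [1, -2, 3], 'B': [-4, 5, -6]})

# Keep positive values, replace everything else with 0
kept = df.where(df > 0, 0)

# numpy.where takes the condition first, then the "keep" and "replace" values
same = np.where(df > 0, df, 0)

print(kept)
print((kept.to_numpy() == same).all())
```

Note the argument order: in pandas the condition belongs to the method call on the data being kept, whereas numpy.where receives the condition as its first argument and returns a plain array.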


Syntax

The syntax for DataFrame.where is:

DataFrame.where(cond, other=nan, *, inplace=False, axis=None, level=None)

Here, DataFrame refers to the pandas DataFrame on which the method is being applied.


Parameters

Parameter   Description
cond        Boolean DataFrame, Series, array-like, or callable. Where cond is True, the original value is kept; where it is False, the value is replaced by other.
other       Scalar, Series, DataFrame, or callable that supplies the replacement for entries where cond is False. Defaults to NaN.
inplace     If True, modifies the DataFrame directly and returns None. Defaults to False.
axis        Axis along which to align cond or other when alignment is needed, for example when other is a Series. Defaults to None.
level       Index level to use for alignment when the index is a MultiIndex. Defaults to None.
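Two of these parameters are easier to see in action. The following is a minimal sketch on made-up data (the DataFrame, the lambdas, and the fallback Series are all assumptions for illustration): it passes callables for cond and other, and then uses axis=0 so that a Series given as other is lined up with the row index.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# cond and other as callables: each one receives the DataFrame itself
scaled = df.where(lambda d: d % 2 == 0, lambda d: d * 10)

# other as a Series: axis=0 lines its labels up with the row index,
# so a failing value in row i is replaced by fallback[i]
fallback = pd.Series([100, 200, 300])
filled = df.where(df > 2, fallback, axis=0)

print(scaled)
print(filled)
```

In scaled, even values are kept and odd values are multiplied by 10. In filled, only the first two entries of column A fail the condition, so they take the values 100 and 200 from fallback.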

Returns

A new DataFrame of the same shape as the original, with values replaced wherever the condition is False. When inplace=True, the method returns None.


Examples

Basic Usage of where

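Use where to replace values that do not satisfy a condition with NaN. When the condition is a boolean Series, as in this example, it is applied row by row: every column in a row that fails the condition becomes NaN. Because NaN is a floating-point value, the integer columns are converted to float.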

Python Program

import pandas as pd

# Create a DataFrame
data = {
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Age': [25, 30, 35],
    'Salary': [70000, 80000, 90000]
}
df = pd.DataFrame(data)

# Replace values where Age < 30
print("DataFrame where Age < 30 is replaced with NaN:")
result = df.where(df['Age'] >= 30)
print(result)

Output

DataFrame where Age < 30 is replaced with NaN:
    Name   Age   Salary
0    NaN   NaN      NaN
1    Ram  30.0  80000.0
2  Priya  35.0  90000.0
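The example above blanks out entire rows. If only one column should be affected, one option is to call where on that column (a Series) and assign the result back. The sketch below reuses the tutorial's data; the variable name result and the choice of Salary as the target column are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Age': [25, 30, 35],
    'Salary': [70000, 80000, 90000]
})

# Blank out only Salary for rows where Age < 30; other columns are untouched
result = df.copy()
result['Salary'] = df['Salary'].where(df['Age'] >= 30)
print(result)
```

Here Name and Age keep all of their original values, and only the Salary of the first row becomes NaN.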

Using the other Parameter

Specify a custom value to replace elements that do not satisfy the condition.

Python Program

import pandas as pd

# Create a DataFrame
data = {
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Age': [25, 30, 35],
    'Salary': [70000, 80000, 90000]
}
df = pd.DataFrame(data)

# Replace values where Age < 30 with 0
print("DataFrame where Age < 30 is replaced with 0:")
result = df.where(df['Age'] >= 30, other=0)
print(result)

Output

DataFrame where Age < 30 is replaced with 0:
    Name  Age  Salary
0      0    0       0
1    Ram   30   80000
2  Priya   35   90000
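The replacement does not have to be a single scalar. As a sketch, other can be a DataFrame with the same index and columns, in which case each failing entry is replaced by the value at the same position. The data and the names reported, estimated, and combined below are hypothetical.

```python
import pandas as pd

# Hypothetical readings; -1 marks a missing measurement
reported = pd.DataFrame({'x': [5, -1, 7], 'y': [-1, 2, 3]})

# Hypothetical fallback values with the same index and columns
estimated = pd.DataFrame({'x': [4, 6, 8], 'y': [1, 2, 4]})

# Keep reported values that are valid, otherwise take the estimate
combined = reported.where(reported >= 0, other=estimated)
print(combined)
```

Only the two -1 entries are replaced, by 6 in column x and by 1 in column y.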

Using inplace=True

Pass inplace=True to modify the DataFrame directly instead of returning a new one. In this mode the method returns None.

Python Program

import pandas as pd

# Create a DataFrame
data = {
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Age': [25, 30, 35],
    'Salary': [70000, 80000, 90000]
}
df = pd.DataFrame(data)

# Replace values where Age < 30 with NaN in place
print("Modifying DataFrame in place where Age < 30:")
df.where(df['Age'] >= 30, inplace=True)
print(df)

Output

Modifying DataFrame in place where Age < 30:
    Name   Age   Salary
0    NaN   NaN      NaN
1    Ram  30.0  80000.0
2  Priya  35.0  90000.0
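A common slip is to assign the result of an in-place call to a variable. The sketch below, on a numeric-only version of the tutorial's data (an assumption made to keep the example small), shows that the in-place call returns None while the default form returns a new DataFrame and leaves its input unchanged.

```python
import pandas as pd

df = pd.DataFrame({'Age': [25, 30, 35], 'Salary': [70000, 80000, 90000]})

# inplace=True changes df itself, and the call returns None
returned = df.where(df['Age'] >= 30, other=0, inplace=True)
print(returned)
print(df)

# The default form returns a new DataFrame and leaves its input alone
original = pd.DataFrame({'Age': [25, 30, 35]})
replaced = original.where(original['Age'] >= 30, other=0)
print(original['Age'].tolist())
print(replaced['Age'].tolist())
```

Writing df = df.where(...) without inplace=True achieves the same end result and keeps the call usable in method chains.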

Applying Conditions Across All Columns

Pass a boolean DataFrame as the condition to test every element individually. A comparison such as df >= 80000 raises a TypeError on the string column Name, so this example selects the numeric columns first.

Python Program

import pandas as pd

# Create a DataFrame
data = {
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Age': [25, 30, 35],
    'Salary': [70000, 80000, 90000]
}
df = pd.DataFrame(data)

# Select the numeric columns; comparing the string column Name with a number would fail
numeric = df[['Age', 'Salary']]

# Replace all values less than 80000 with NaN
print("Numeric values < 80000 are replaced with NaN:")
result = numeric.where(numeric >= 80000)
print(result)

Output

Numeric values < 80000 are replaced with NaN:
   Age   Salary
0  NaN      NaN
1  NaN  80000.0
2  NaN  90000.0
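When a DataFrame mixes text and numeric columns, one possible approach is to apply the element-wise condition to the numeric columns only and leave the rest as they are. The sketch below uses select_dtypes to find those columns; the names numeric_cols and result are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Age': [25, 30, 35],
    'Salary': [70000, 80000, 90000]
})

# Names of the numeric columns (here Age and Salary)
numeric_cols = df.select_dtypes(include='number').columns.tolist()

# Apply the threshold to the numeric columns only; Name is left as it is
result = df.copy()
result[numeric_cols] = df[numeric_cols].where(df[numeric_cols] >= 80000)
print(result)
```

The Name column is preserved in full, every Age value becomes NaN because none reaches 80000, and Salary keeps 80000 and 90000.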

Summary

In this tutorial, we explored the DataFrame.where method in pandas. Key takeaways include:

  • Using where to replace values in a DataFrame based on conditions.
  • Specifying a custom replacement value with other.
  • Modifying DataFrames in place using inplace=True.
  • Applying an element-wise condition across multiple columns.

The DataFrame.where method is a powerful tool for conditional operations and is especially useful in data preprocessing and analysis.

