Pandas DataFrame.fillna: Fill Missing Values in a DataFrame
The DataFrame.fillna method in pandas fills missing (NA/NaN) values in a DataFrame, either with a value you specify or by propagating neighboring valid values.
Syntax
The syntax for DataFrame.fillna is:
DataFrame.fillna(value=None, *, method=None, axis=None, inplace=False, limit=None, downcast=None)
Here, DataFrame refers to the pandas DataFrame whose missing values are to be filled.
Parameters
Parameter | Description |
---|---|
value | Scalar, dict, Series, or DataFrame – Value used to replace NaN values. A dict or Series maps column names to the fill value for each column; columns it does not list are left unfilled. |
method | String, default None – Fill method: ffill (forward fill) or bfill (backward fill). Deprecated since pandas 2.1 in favor of DataFrame.ffill and DataFrame.bfill. |
axis | {0 or ‘index’, 1 or ‘columns’} – Axis along which missing values are filled. |
inplace | Boolean, default False – If True, modifies the DataFrame in place. |
limit | Integer, default None – With method, the maximum number of consecutive NaN values to fill; without method, the maximum number of NaN values filled along the axis. |
downcast | Dictionary, default None – Attempts to convert filled values to a smaller dtype where possible. Deprecated since pandas 2.2. |
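The examples in this tutorial all fill down the rows, so here is a minimal sketch of what axis=1 does instead. It uses DataFrame.ffill(axis=1) rather than fillna(method='ffill', axis=1), on the assumption that you are running a recent pandas release where method is deprecated. The sample data is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, None, 4],
    'B': [None, 2, 3, None]
})

# axis=1 fills along each row: a NaN takes the value from the column to its left.
# The NaN in column A (row 2) has nothing to its left, so it stays NaN.
df_row_fill = df.ffill(axis=1)
print(df_row_fill)
```

Column B ends up fully filled because each of its gaps has a valid value in column A on the same row.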
Returns
A DataFrame with missing values filled, or None if inplace=True.
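As a rough illustration of the difference between the default call and inplace=True, the sketch below counts the remaining NaN values before and after each call. It deliberately does not use the return value of the in-place call, and the variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, None, 4],
    'B': [None, 2, 3, None]
})

# Default behaviour: a new DataFrame is returned and df is left untouched
filled = df.fillna(0)
print(df.isna().sum().sum())      # NaN count in the original
print(filled.isna().sum().sum())  # NaN count in the returned copy

# inplace=True: df itself is modified
df.fillna(0, inplace=True)
print(df.isna().sum().sum())      # NaN count in the original after the in-place fill
```

Assigning the result of the default call to a new name, as most examples here do, is usually easier to follow than in-place modification.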
Examples
1. Filling Missing Values with a Scalar Value
This example demonstrates how to fill all NaN values with a specific scalar value.
Python Program
import pandas as pd
# Create a DataFrame with NaN values
df = pd.DataFrame({
'A': [1, 2, None, 4],
'B': [None, 2, 3, None]
})
# Fill NaN values with 0
df_filled = df.fillna(0)
print(df_filled)
Output
     A    B
0  1.0  0.0
1  2.0  2.0
2  0.0  3.0
3  4.0  0.0
2. Filling Missing Values with a Dictionary
You can fill NaN values with different values for each column using a dictionary.
Python Program
import pandas as pd
# Create a DataFrame with NaN values
df = pd.DataFrame({
'A': [1, 2, None, 4],
'B': [None, 2, 3, None]
})
# Fill NaN values with different values for each column
df_filled_dict = df.fillna({'A': 100, 'B': 200})
print(df_filled_dict)
Output
       A      B
0    1.0  200.0
1    2.0    2.0
2  100.0    3.0
3    4.0  200.0
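The value parameter also accepts a Series, which works like the dictionary above: its index labels are matched against column names. A common use, sketched here as one possibility rather than a recommendation, is filling each column with its own mean. The name fill_values is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, None, 4],
    'B': [None, 2, 3, None]
})

# df.mean() skips NaN by default and returns a Series indexed by column name
fill_values = df.mean()

# Each column's NaN values are replaced by that column's mean
df_filled_mean = df.fillna(fill_values)
print(df_filled_mean)
```

Here column A is filled with the mean of 1, 2 and 4, and column B with the mean of 2 and 3.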
3. Forward Filling (Propagating Last Valid Observation)
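With method='ffill', the last valid value in each column propagates forward into the NaN values that follow it. A leading NaN, such as the first value in column B, has no earlier value to copy and stays NaN.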
Python Program
import pandas as pd
# Create a DataFrame with NaN values
df = pd.DataFrame({
'A': [1, 2, None, 4],
'B': [None, 2, 3, None]
})
# Forward fill missing values
# Note: method= is deprecated since pandas 2.1; df.ffill() is the equivalent call
df_ffill = df.fillna(method='ffill')
print(df_ffill)
Output
     A    B
0  1.0  NaN
1  2.0  2.0
2  2.0  3.0
3  4.0  3.0
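On recent pandas releases, where the method argument of fillna is deprecated, the same forward fill can be written with DataFrame.ffill. A minimal sketch using the same sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, None, 4],
    'B': [None, 2, 3, None]
})

# Equivalent to fillna(method='ffill'): copy the last valid value downward
df_ffill = df.ffill()
print(df_ffill)
```

The result matches the output above, including the leading NaN in column B.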
4. Backward Filling (Using Next Valid Observation)
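With method='bfill', each NaN is filled with the next valid value in its column. A trailing NaN, such as the last value in column B, has no later value to copy and stays NaN.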
Python Program
import pandas as pd
# Create a DataFrame with NaN values
df = pd.DataFrame({
'A': [1, 2, None, 4],
'B': [None, 2, 3, None]
})
# Backward fill missing values
# Note: method= is deprecated since pandas 2.1; df.bfill() is the equivalent call
df_bfill = df.fillna(method='bfill')
print(df_bfill)
Output
     A    B
0  1.0  2.0
1  2.0  2.0
2  4.0  3.0
3  4.0  NaN
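A fill in one direction always leaves a gap at one end of a column when that end is NaN. One way to close both ends, sketched here with DataFrame.ffill and DataFrame.bfill (the non-deprecated counterparts of the method argument), is to chain the two. Whether copying values in both directions is appropriate depends on your data.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, None, 4],
    'B': [None, 2, 3, None]
})

# Backward fill alone: the trailing NaN in column B remains
df_bfill = df.bfill()
print(df_bfill)

# Forward fill first, then backward fill whatever is still missing at the top
df_both = df.ffill().bfill()
print(df_both)
```

In df_both, column B becomes 2.0, 2.0, 3.0, 3.0: the trailing gap is filled from above and the leading gap from below.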
5. Limiting the Number of Values Filled
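You can set a limit on the number of consecutive NaN values to fill. In this DataFrame no gap is longer than one value, so limit=1 produces the same result as the unlimited forward fill in example 3.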
Python Program
import pandas as pd
# Create a DataFrame with NaN values
df = pd.DataFrame({
'A': [1, 2, None, 4],
'B': [None, 2, 3, None]
})
# Forward fill with a limit of 1
df_limited = df.fillna(method='ffill', limit=1)
print(df_limited)
Output
     A    B
0  1.0  NaN
1  2.0  2.0
2  2.0  3.0
3  4.0  3.0
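The effect of limit only becomes visible when a gap is longer than the limit. The sketch below uses a different, purely illustrative DataFrame with three consecutive NaN values, and DataFrame.ffill in place of the deprecated method argument.

```python
import pandas as pd

# A single column with a gap of three consecutive NaN values
df = pd.DataFrame({'A': [1, None, None, None, 5]})

# With limit=1, only the first NaN of the gap is filled
df_limited = df.ffill(limit=1)
print(df_limited)    # A becomes 1.0, 1.0, NaN, NaN, 5.0

# Without a limit, the whole gap is filled
df_unlimited = df.ffill()
print(df_unlimited)  # A becomes 1.0, 1.0, 1.0, 1.0, 5.0
```

A limit can help avoid carrying a stale value across a long run of missing data.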
Summary
In this tutorial, we explored the DataFrame.fillna method in pandas. Key takeaways include:
- Using fillna to replace NaN values with a scalar, dictionary, or another DataFrame.
- Forward filling (ffill) propagates the last valid value.
- Backward filling (bfill) uses the next valid value to fill gaps.
- The limit parameter restricts the number of values filled.
- Using inplace=True modifies the original DataFrame instead of returning a new one.