Pandas DataFrame.combine_first: Update Missing Values with Another DataFrame


Pandas DataFrame.combine_first

The DataFrame.combine_first method in pandas fills missing (NaN) values in a DataFrame with the values found at the same row and column labels in another DataFrame. It returns a new DataFrame and leaves both inputs unchanged, which makes it useful for patching gaps in one dataset with data from a second source.
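As a rough illustration of the behavior described above, the sketch below compares combine_first with a manual fill built from DataFrame.where. The names left, right, filled and manual are made up for this example, and the equivalence is only claimed for this simple case, where both DataFrames share the same labels and dtype.

```python
import pandas as pd

# Two hypothetical DataFrames with identical row and column labels
left = pd.DataFrame({"A": [1.0, None, 3.0]})
right = pd.DataFrame({"A": [10.0, 20.0, 30.0]})

# Fill the gaps in left with the matching values from right
filled = left.combine_first(right)

# Manual version: keep left where it is not null, otherwise take right
manual = left.where(left.notna(), right)

# For this same-label case the two results are identical
same = filled.equals(manual)
print(filled)
print(same)
```

Only the missing entry in row 1 is taken from right; the values already present in left are kept.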


Syntax

The syntax for DataFrame.combine_first is:

DataFrame.combine_first(other)

Here, DataFrame is the calling DataFrame, whose non-missing values are kept, and other is the DataFrame that supplies values wherever the calling DataFrame has none.


Parameters

Parameter    Description
other        The other DataFrame used to fill missing values in the original DataFrame. The columns in other that are not present in the original DataFrame will be added to the result.

Returns

A new DataFrame in which missing values are filled with the non-missing values from other at matching row and column labels. The result contains the union of the row and column labels of both DataFrames, so columns that exist only in other are added to the result.
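The label-union behavior is easiest to see when the two DataFrames do not share the same index. The following is a minimal sketch with made-up data: df1 and df2 overlap only at row label 1, and row label 0 has no value in either DataFrame.

```python
import pandas as pd

# df1 has rows 0 and 1, both missing; df2 has rows 1 and 2
df1 = pd.DataFrame({"A": [None, None]}, index=[0, 1], dtype="float64")
df2 = pd.DataFrame({"A": [10.0, 20.0]}, index=[1, 2])

result = df1.combine_first(df2)
print(result)

# The result carries the row labels of both inputs
print(list(result.index))
```

Row 1 is filled from df2, row 2 is carried over from df2, and row 0 stays NaN because neither DataFrame has a value for it.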


Examples

Basic Usage of combine_first

This example demonstrates how to use combine_first to fill missing values in one DataFrame with values from another DataFrame.

Python Program

import pandas as pd

# Create two DataFrames with missing values
df1 = pd.DataFrame({'A': [1, None, 3], 'B': [None, 5, 6]})
df2 = pd.DataFrame({'A': [10, 20, None], 'B': [40, None, 60]})

# Use combine_first to fill missing values in df1 with values from df2
result = df1.combine_first(df2)
print(result)

Output

      A     B
0   1.0  40.0
1  20.0   5.0
2   3.0  60.0

Adding New Columns with combine_first

This example shows how combine_first adds columns that exist only in the other DataFrame to the result. Column B has no counterpart in df2, so its missing value in row 0 stays NaN.

Python Program

import pandas as pd

# Create two DataFrames
df1 = pd.DataFrame({'A': [1, None, 3], 'B': [None, 5, 6]})
df2 = pd.DataFrame({'A': [10, 20, None], 'C': [7, 8, 9]})

# Use combine_first to fill missing values and add new columns
result = df1.combine_first(df2)
print(result)

Output

      A    B    C
0   1.0  NaN  7.0
1  20.0  5.0  8.0
2   3.0  6.0  9.0

Summary

In this tutorial, we explored the DataFrame.combine_first method in pandas. Key takeaways include:

  • Using combine_first to fill missing values in a DataFrame with values from another DataFrame.
  • Adding new columns to the result if they are present only in the other DataFrame.
  • Understanding how combine_first prioritizes non-missing values from the calling DataFrame and uses the other DataFrame only to fill gaps.
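
One way to check the precedence is to swap the caller and the argument. In the sketch below (the names caller and other_df are hypothetical), the calling DataFrame's non-missing values are kept and the argument only fills gaps, so the two call orders give different results.

```python
import pandas as pd

# Both DataFrames have a value in row 0; only other_df has one in row 1
caller = pd.DataFrame({"A": [1.0, None]})
other_df = pd.DataFrame({"A": [100.0, 200.0]})

# caller wins in row 0; row 1 is filled from other_df
caller_first = caller.combine_first(other_df)

# other_df has no gaps, so nothing is taken from caller
other_first = other_df.combine_first(caller)

print(caller_first)
print(other_first)
```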
