Pandas DataFrame.combine



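The DataFrame.combine method in pandas combines two DataFrames column by column using a function you supply. pandas first aligns the two DataFrames on their index and column labels, then calls the function once for each column, passing the matching column (a Series) from each DataFrame. Whatever the function returns becomes that column of the result, so a function that does element-wise arithmetic on the two Series produces an element-wise combination of the DataFrames.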
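To check what the combining function actually receives, the minimal sketch below records the type and name of both arguments on every call. The DataFrames and the helper name add_and_record are made up for illustration, and a reasonably recent pandas release is assumed.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [10, 20], 'B': [30, 40]})

calls = []  # one entry per call pandas makes to the combining function

def add_and_record(s1, s2):
    # Hypothetical helper: note the argument types and column names, then add
    calls.append((type(s1).__name__, s1.name, type(s2).__name__, s2.name))
    return s1 + s2

result = df1.combine(df2, add_and_record)

print(calls)
# [('Series', 'A', 'Series', 'A'), ('Series', 'B', 'Series', 'B')]
print(result)
```

The function runs twice, once per column, and each time it receives two Series that share that column's name.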


Syntax

The syntax for DataFrame.combine is:

DataFrame.combine(other, func, fill_value=None, overwrite=True)

Here, DataFrame refers to the pandas DataFrame being combined.


Parameters

Parameter    Description
other        The DataFrame to combine with the calling DataFrame.
func         A function that takes two Series and returns a Series or a scalar. After aligning the two DataFrames, pandas calls it once per column, passing that column from the calling DataFrame and from other.
fill_value   A scalar used to replace NaN values in both DataFrames before each column is passed to func. Defaults to None, meaning no filling.
overwrite    If True, columns of the calling DataFrame that do not exist in other are overwritten with NaN (they are passed to func alongside an all-NaN column). If False, those columns are kept unchanged. Defaults to True.

Returns

A DataFrame built from the values func returns for each column. Its index and columns are the union of the index and columns of the two input DataFrames.
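Because combine aligns both DataFrames before calling func, labels present in only one of them still appear in the result. A minimal sketch with made-up data, where df2 has an extra column and a partly different index:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2]}, index=[0, 1])
df2 = pd.DataFrame({'A': [10, 20], 'B': [30, 40]}, index=[1, 2])

# Index labels 0, 1, 2 and columns A, B all appear in the result
result = df1.combine(df2, lambda s1, s2: s1 + s2)
print(result)
```

Only row 1 of column A has a value on both sides (2 + 10 = 12). Every other cell pairs a number with a NaN introduced by alignment, so its sum is NaN.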


Examples

Basic Combination of Two DataFrames

This example combines two DataFrames with a simple addition function. For each column, combine passes the matching columns of df1 and df2 to add_values, which returns their sum. Because + on two Series adds them element by element, each cell of the result is the sum of the corresponding cells.

Python Program

import pandas as pd

# Create two DataFrames
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [10, 20, 30], 'B': [40, 50, 60]})

# Define a function that adds two columns (Series) element by element
def add_values(x, y):
    return x + y

# Combine the DataFrames using the add_values function
result = df1.combine(df2, add_values)
print(result)

Output

    A   B
0  11  44
1  22  55
2  33  66
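Since func receives entire columns, it does not have to work element by element. The sketch below follows a pattern similar to the one in the pandas documentation (take_smaller is just an illustrative name) and keeps whichever column has the smaller total:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [0, 0], 'B': [4, 4]})
df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3]})

def take_smaller(s1, s2):
    # s1 and s2 are whole columns, so their sums can be compared
    return s1 if s1.sum() < s2.sum() else s2

result = df1.combine(df2, take_smaller)
print(result)
```

Column A comes from df1 (total 0 versus 2) and column B comes from df2 (total 6 versus 8).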

Combining DataFrames with Fill Value

This example shows how to handle missing values (NaN) during combination. The fill_value parameter replaces NaN values in both DataFrames with the given value (here 0) before add_values is called, so each missing entry in df1 is treated as 0. For example, row 2 of column A becomes 0 + 30 = 30.

Python Program

import pandas as pd

# Create two DataFrames with missing values
df1 = pd.DataFrame({'A': [1, 2, None], 'B': [4, None, 6]})
df2 = pd.DataFrame({'A': [10, 20, 30], 'B': [40, 50, 60]})

# Define a function that adds two columns (Series) element by element
def add_values(x, y):
    return x + y

# Combine the DataFrames using the add_values function and a fill value of 0
result = df1.combine(df2, add_values, fill_value=0)
print(result)

Output

      A     B
0  11.0  44.0
1  22.0  50.0
2  30.0  66.0
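fill_value is applied to both DataFrames, so a cell that is missing on both sides also takes part in the calculation. A small sketch with made-up data compares the two cases:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1.0, None]})
df2 = pd.DataFrame({'A': [10.0, None]})

def add_values(x, y):
    return x + y

# Without fill_value, NaN + NaN stays NaN
without_fill = df1.combine(df2, add_values)

# With fill_value=0, both NaNs become 0 first, so the cell becomes 0 + 0
with_fill = df1.combine(df2, add_values, fill_value=0)

print(without_fill)
print(with_fill)
```

In with_fill, row 1 of column A is 0.0 rather than NaN.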

Combining DataFrames without Overwrite

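This example sets overwrite to False. The overwrite parameter only matters for columns that exist in df1 but not in df2; because both DataFrames here have the same columns (A and B), the result is the same as with the default overwrite=True. The NaN values in the output come from add_values itself: no fill_value is given, and adding NaN to a number yields NaN.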

Python Program

import pandas as pd

# Create two DataFrames with missing values
df1 = pd.DataFrame({'A': [1, 2, None], 'B': [4, None, 6]})
df2 = pd.DataFrame({'A': [10, 20, 30], 'B': [40, 50, 60]})

# Define a function that adds two columns (Series) element by element
def add_values(x, y):
    return x + y

# Combine the DataFrames using the add_values function with overwrite set to False
result = df1.combine(df2, add_values, overwrite=False)
print(result)

Output

      A     B
0  11.0  44.0
1  22.0   NaN
2   NaN  66.0
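In the example above, overwrite=False changes nothing because df1 and df2 have the same columns. The parameter only has an effect when the calling DataFrame has a column that other lacks. A minimal sketch with made-up data, where column C exists only in df1:

```python
import pandas as pd

# Column 'C' exists only in df1
df1 = pd.DataFrame({'A': [1, 2], 'C': [7, 8]})
df2 = pd.DataFrame({'A': [10, 20]})

def add_values(x, y):
    return x + y

# Default overwrite=True: C is paired with an all-NaN column, so the sum is NaN
with_overwrite = df1.combine(df2, add_values)

# overwrite=False: C is copied from df1 unchanged
without_overwrite = df1.combine(df2, add_values, overwrite=False)

print(with_overwrite)
print(without_overwrite)
```

Column A is 11 and 22 in both results; only column C differs.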

Summary

In this tutorial, we explored the DataFrame.combine method in pandas. Key takeaways include:

  • Using combine to merge two DataFrames column by column with a custom function.
  • Handling missing values with the fill_value parameter.
  • Using the overwrite parameter to control whether columns that exist only in the calling DataFrame are kept or replaced with NaN.
