Pandas DataFrame.corrwith: Compute Correlation with Another DataFrame or Series


Pandas DataFrame.corrwith

The DataFrame.corrwith method in pandas computes the pairwise correlation between the rows or columns of a DataFrame and those of another DataFrame or a Series. The two objects are aligned by label along both axes before the correlations are computed, so only values with matching labels are compared.


Syntax

The syntax for DataFrame.corrwith is:

DataFrame.corrwith(other, axis=0, drop=False, method='pearson', numeric_only=False)

Here, DataFrame refers to the pandas DataFrame on which the correlation computation is performed.


Parameters

Parameter     Description
other         DataFrame or Series with which to compute the correlation.
axis          Axis along which to compute the correlation. Defaults to 0.
                • 0 or 'index': Compute one correlation per column.
                • 1 or 'columns': Compute one correlation per row.
drop          If True, labels that are not present in both objects are dropped from the result. If False, they are kept with a value of NaN. Defaults to False.
method        Method of correlation. Options include:
                • 'pearson': Standard correlation coefficient (default).
                • 'kendall': Kendall Tau correlation coefficient.
                • 'spearman': Spearman rank correlation coefficient.
                • callable: A function that takes two 1-D ndarrays and returns a float.
numeric_only  If True, only float, int, or boolean data is included. Defaults to False.
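
The effect of drop is easiest to see when the two objects share only some of their labels. The following is a minimal sketch with made-up data (the names df1, df2, kept, and dropped are illustrative): labels found in only one object come back as NaN unless drop=True is passed.

```python
import pandas as pd

# Two DataFrames that share only column 'A'
df1 = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [2, 4, 6, 9]})
df2 = pd.DataFrame({'A': [4, 3, 2, 1], 'C': [1, 0, 1, 0]})

# drop=False (default): the unmatched labels 'B' and 'C' appear as NaN
kept = df1.corrwith(df2)
print(kept)

# drop=True: only the shared label 'A' remains in the result
dropped = df1.corrwith(df2, drop=True)
print(dropped)
```

Keeping the default is useful for spotting mismatched labels; drop=True gives a cleaner result when the mismatch is expected.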

Returns

A Series containing the pairwise correlation values. Its index holds the column labels when axis=0 and the row labels when axis=1.


Examples

Computing Correlation Between DataFrame Columns

This example computes the correlation between each column of a DataFrame and a Series. The Series is matched to the DataFrame rows by index label.

Python Program

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [10, 20, 30, 40],
    'C': [100, 200, 300, 400]
})

# Create a Series
s = pd.Series([1, 2, 3, 4])

# Compute correlation with the Series
result = df.corrwith(s)
print(result)

Output

A    1.0
B    1.0
C    1.0
dtype: float64
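
Because values are paired by index label rather than by position, a Series with a different index order can give a different result than its raw values suggest. The sketch below uses made-up data (df, s, by_label, and by_position are illustrative names) to contrast the two pairings.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4]})

# By position the values run 4, 3, 2, 1, but they are labelled 3, 2, 1, 0
s = pd.Series([4, 3, 2, 1], index=[3, 2, 1, 0])

# Pairing is by label: label 0 pairs 1 with 1, label 1 pairs 2 with 2, and so on
by_label = df.corrwith(s)
print(by_label)

# Resetting the index makes the pairing positional instead
by_position = df.corrwith(s.reset_index(drop=True))
print(by_position)
```

If positional pairing is what you intend, reset or reassign the index before calling corrwith.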

Computing Correlation Between Two DataFrames

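This example computes the correlation between corresponding rows of two DataFrames by passing axis=1. Each row here holds only two values, so every coefficient can only be 1.0 or -1.0.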

Python Program

import pandas as pd

# Create two DataFrames
df1 = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})
df2 = pd.DataFrame({
    'A': [7, 8, 9],
    'B': [10, 11, 12]
})

# Compute row-wise correlation
df_corr = df1.corrwith(df2, axis=1)
print(df_corr)

Output

0    1.0
1    1.0
2    1.0
dtype: float64
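
With more than two columns, the row-wise coefficients are no longer limited to 1.0 or -1.0. The sketch below uses made-up data (all variable names are illustrative) and also checks that axis=1 gives the same values as transposing both DataFrames and correlating column-wise.

```python
import pandas as pd

# Three columns, so each row contributes three paired values
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 7], 'C': [9, 6, 8]})
df2 = pd.DataFrame({'A': [7, 8, 9], 'B': [10, 12, 11], 'C': [1, 5, 20]})

# One correlation per row label
row_wise = df1.corrwith(df2, axis=1)
print(row_wise)

# Same computation expressed as a column-wise correlation of the transposes
via_transpose = df1.T.corrwith(df2.T)
print(via_transpose)
```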

Handling Missing Values in DataFrames

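This example demonstrates how DataFrame.corrwith handles missing values. Rows in which either value is missing are excluded pair by pair. Column A keeps two complete pairs and yields 1.0; column B keeps only one complete pair, which is too few to compute a correlation, so its result is NaN.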

Python Program

import pandas as pd

# Create two DataFrames with missing values
df1 = pd.DataFrame({
    'A': [1, 2, None],
    'B': [4, None, 6]
})
df2 = pd.DataFrame({
    'A': [7, 8, 9],
    'B': [10, 11, None]
})

# Compute correlation, excluding NaN values
df_corr = df1.corrwith(df2)
print(df_corr)

Output

A    1.0
B    NaN
dtype: float64
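
The pairwise exclusion can be checked by hand. The following sketch, using made-up data and illustrative names (pairs, manual), drops incomplete pairs explicitly and compares the result with what corrwith returns for the default Pearson method.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'A': [1.0, 2.0, np.nan, 4.0, 5.0]})
df2 = pd.DataFrame({'A': [2.0, np.nan, 6.0, 9.0, 10.0]})

# corrwith skips any row where either side is NaN
auto = df1.corrwith(df2)

# Manual version: keep only the complete pairs, then correlate them
pairs = pd.concat([df1['A'], df2['A']], axis=1, keys=['left', 'right']).dropna()
manual = np.corrcoef(pairs['left'], pairs['right'])[0, 1]

print(auto['A'], manual)
```

Here three of the five rows are complete, so both calculations use the same three pairs.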

Specifying Correlation Method

This example demonstrates how to compute correlation using the Spearman method.

Python Program

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [4, 3, 2, 1]
})

# Create another DataFrame
df2 = pd.DataFrame({
    'A': [4, 3, 2, 1],
    'B': [1, 2, 3, 4]
})

# Compute correlation with Spearman method
result = df.corrwith(df2, method='spearman')
print(result)

Output

A   -1.0
B   -1.0
dtype: float64
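
The method parameter also accepts a callable that takes two 1-D NumPy arrays and returns a float. As a minimal sketch, the hypothetical helper pearson_by_hand below (not part of pandas) recomputes the Pearson coefficient with NumPy so that its result can be compared with the built-in 'pearson' option.

```python
import numpy as np
import pandas as pd

def pearson_by_hand(a, b):
    """Hypothetical helper: Pearson correlation of two 1-D NumPy arrays."""
    return float(np.corrcoef(a, b)[0, 1])

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [4, 3, 2, 1]})
df2 = pd.DataFrame({'A': [4, 3, 2, 1], 'B': [1, 2, 4, 3]})

# Pass the function itself instead of a method name
custom = df.corrwith(df2, method=pearson_by_hand)
builtin = df.corrwith(df2, method='pearson')

print(custom)
print(builtin)
```

A callable is mainly useful for measures pandas does not provide by name; any function with the same two-array signature can be substituted.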

Summary

In this tutorial, we explored the DataFrame.corrwith method in pandas. Key takeaways include:

  • Using corrwith to compute correlation between a DataFrame and another DataFrame or Series.
  • Choosing correlation methods such as Pearson, Kendall, or Spearman.
  • Handling missing values during correlation computation.
  • Specifying the axis along which to compute correlations.
