Pandas DataFrame.corrwith: Compute Correlation with Another DataFrame or Series
Pandas DataFrame.corrwith
The DataFrame.corrwith method in pandas computes the pairwise correlation between the rows or columns of a DataFrame and those of another DataFrame or a Series. The two objects are aligned on their labels before the correlations are computed, so values are matched by label rather than by position.
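As a minimal sketch of label alignment (the data below is made up for illustration), the Series holds the same values as column A but lists its index labels in reverse order. Because values are paired by label, column A should still correlate perfectly with it:

```python
import pandas as pd

# Hypothetical data: s has the values of column 'A', with labels in reverse order
df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [40, 30, 20, 10]})
s = pd.Series([4, 3, 2, 1], index=[3, 2, 1, 0])

# Values are paired by index label (0 with 0, 1 with 1, ...), not by position
aligned = df.corrwith(s)
print(aligned)
```

If the pairing were positional, the sign of both correlations would flip.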
Syntax
The syntax for DataFrame.corrwith is:
DataFrame.corrwith(other, axis=0, drop=False, method='pearson', numeric_only=False)
Here, DataFrame refers to the pandas DataFrame on which the correlation computation is performed.
Parameters
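| Parameter | Description |
|---|---|
| other | DataFrame or Series with which to compute the correlation. |
| axis | Axis along which to compute the correlation: 0 or 'index' correlates matching columns, 1 or 'columns' correlates matching rows. Defaults to 0. |
| drop | If True, excludes labels that are not common to both input objects. Defaults to False. |
| method | Specifies the correlation method. Options include 'pearson' (the default), 'kendall', 'spearman', or a callable that takes two 1-D arrays and returns a float. |
| numeric_only | If True, only considers numerical data. Defaults to False. |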
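To illustrate the drop parameter, here is a small sketch with two hypothetical DataFrames that share only column A. With the default drop=False, labels found in only one input are expected to appear in the result as NaN; with drop=True they are left out:

```python
import pandas as pd

# Hypothetical data: only column 'A' exists in both DataFrames
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [3, 2, 1], 'C': [7, 8, 9]})

# Default (drop=False): unmatched labels 'B' and 'C' are kept with NaN values
keep_all = df1.corrwith(df2)

# drop=True: only the label common to both inputs remains
only_shared = df1.corrwith(df2, drop=True)

print(keep_all)
print(only_shared)
```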
Returns
A Series containing the pairwise correlation values. The index corresponds to the columns or rows of the DataFrame, depending on the specified axis.
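The following sketch, using made-up data with row labels x, y, and z, shows how the index of the returned Series follows the axis argument:

```python
import pandas as pd

# Hypothetical data with explicit row labels
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [2, 4, 7], 'C': [3, 5, 9]},
                   index=['x', 'y', 'z'])
df2 = pd.DataFrame({'A': [2, 4, 6], 'B': [1, 3, 2], 'C': [5, 6, 9]},
                   index=['x', 'y', 'z'])

by_column = df1.corrwith(df2)        # axis=0: one value per column label
by_row = df1.corrwith(df2, axis=1)   # axis=1: one value per row label

print(type(by_column).__name__, list(by_column.index))
print(type(by_row).__name__, list(by_row.index))
```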
Examples
Computing Correlation Between DataFrame Columns
This example demonstrates how to compute correlation between columns of a DataFrame and a Series.
Python Program
import pandas as pd
# Create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [10, 20, 30, 40],
    'C': [100, 200, 300, 400]
})
# Create a Series
s = pd.Series([1, 2, 3, 4])
# Compute correlation with the Series
result = df.corrwith(s)
print(result)
Output
A    1.0
B    1.0
C    1.0
dtype: float64
Computing Correlation Between Two DataFrames
This example demonstrates how to compute correlation between rows of two DataFrames.
Python Program
import pandas as pd
# Create two DataFrames
df1 = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})
df2 = pd.DataFrame({
    'A': [7, 8, 9],
    'B': [10, 11, 12]
})
# Compute row-wise correlation
df_corr = df1.corrwith(df2, axis=1)
print(df_corr)
Output
0    1.0
1    1.0
2    1.0
dtype: float64
Handling Missing Values in DataFrames
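This example demonstrates how DataFrame.corrwith handles missing values. For each column, only the rows where both DataFrames have a value are used. Column B has just one such row, which is too few to compute a correlation, so its result is NaN.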
Python Program
import pandas as pd
# Create two DataFrames with missing values
df1 = pd.DataFrame({
    'A': [1, 2, None],
    'B': [4, None, 6]
})
df2 = pd.DataFrame({
    'A': [7, 8, 9],
    'B': [10, 11, None]
})
# Compute correlation, excluding NaN values
df_corr = df1.corrwith(df2)
print(df_corr)
Output
A    1.0
B    NaN
dtype: float64
Specifying Correlation Method
This example demonstrates how to compute correlation using the Spearman method.
Python Program
import pandas as pd
# Create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [4, 3, 2, 1]
})
# Create another DataFrame
df2 = pd.DataFrame({
    'A': [4, 3, 2, 1],
    'B': [1, 2, 3, 4]
})
# Compute correlation with Spearman method
result = df.corrwith(df2, method='spearman')
print(result)
Output
A   -1.0
B   -1.0
dtype: float64
Summary
In this tutorial, we explored the DataFrame.corrwith method in pandas. Key takeaways include:
- Using corrwith to compute correlation between a DataFrame and another DataFrame or Series.
- Choosing correlation methods such as Pearson, Kendall, or Spearman.
- Handling missing values during correlation computation.
- Specifying the axis along which to compute correlations.