Pandas DataFrame.corrwith: Compute Correlation with Another DataFrame or Series
Pandas DataFrame.corrwith
The DataFrame.corrwith method in pandas computes the pairwise correlation between the rows or columns of a DataFrame and those of another DataFrame or Series. The two objects are aligned on their labels first, which makes the method useful for comparing two datasets column by column or row by row.
Syntax
The syntax for DataFrame.corrwith is:
DataFrame.corrwith(other, axis=0, drop=False, method='pearson', numeric_only=False)
Here, DataFrame refers to the pandas DataFrame on which the correlation computation is performed.
Parameters
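Parameter | Description |
---|---|
other | DataFrame or Series with which to compute the correlation. |
axis | Axis along which to compute the correlation: 0 or 'index' computes one correlation per column, 1 or 'columns' computes one correlation per row. Defaults to 0. |
drop | If True, excludes labels that are not common to both input objects from the result. If False, such labels are kept with a NaN correlation. Defaults to False. |
method | Specifies the method for correlation. Options include 'pearson' (the default), 'kendall', 'spearman', or a callable that takes two 1-D arrays and returns a float. |
numeric_only | If True, only considers numerical data (float, int, or boolean). Defaults to False. |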
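As a minimal sketch of how the drop parameter changes the result, the two small frames below (made up for illustration) share only column A. The NaN placeholders shown for unmatched labels are what recent pandas releases return; treat that as an assumption on older versions.

```python
import pandas as pd

# Illustrative data: only column 'A' exists in both frames
left = pd.DataFrame({'A': [1, 2, 3], 'B': [3, 1, 2]})
right = pd.DataFrame({'A': [2, 4, 6], 'C': [9, 8, 7]})

# drop=False (default): unmatched labels 'B' and 'C' are kept as NaN
kept = left.corrwith(right)
print(kept)

# drop=True: only labels present in both frames remain
dropped = left.corrwith(right, drop=True)
print(dropped)
```

Passing drop=True is convenient when the result feeds into further arithmetic and NaN rows for unmatched labels would only get in the way.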
Returns
A Series containing the pairwise correlation values. Its index holds the column labels of the DataFrame when axis=0 and the row labels when axis=1.
Examples
Computing Correlation Between DataFrame Columns
This example demonstrates how to compute correlation between columns of a DataFrame and a Series.
Python Program
import pandas as pd
# Create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [10, 20, 30, 40],
    'C': [100, 200, 300, 400]
})
# Create a Series
s = pd.Series([1, 2, 3, 4])
# Compute correlation with the Series
result = df.corrwith(s)
print(result)
Output
A 1.0
B 1.0
C 1.0
dtype: float64
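The Series in the example above shares the DataFrame's default integer index, so position and label coincide. The sketch below, using made-up string labels, suggests what happens when they do not: values are paired by index label rather than by position.

```python
import pandas as pd

# Illustrative data with string row labels
df = pd.DataFrame(
    {'A': [1, 2, 3, 4], 'B': [40, 30, 20, 10]},
    index=['w', 'x', 'y', 'z'],
)

# Same labels as df, but listed in reverse order
s = pd.Series([4, 3, 2, 1], index=['z', 'y', 'x', 'w'])

# Values are paired by label: 'w' with 'w', 'x' with 'x', and so on
result = df.corrwith(s)
print(result)
```

After alignment the Series reads 1, 2, 3, 4 in the DataFrame's row order, so column A correlates positively and column B negatively, the opposite of what a purely positional pairing would give.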
Computing Correlation Between Two DataFrames
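This example demonstrates how to compute correlation between corresponding rows of two DataFrames by passing axis=1. Because each row here holds only two values, every defined correlation is necessarily +1 or -1.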
Python Program
import pandas as pd
# Create two DataFrames
df1 = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})
df2 = pd.DataFrame({
    'A': [7, 8, 9],
    'B': [10, 11, 12]
})
# Compute row-wise correlation
df_corr = df1.corrwith(df2, axis=1)
print(df_corr)
Output
0 1.0
1 1.0
2 1.0
dtype: float64
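Row-wise correlation becomes more informative once each row has at least three values. The following sketch uses two hypothetical three-column frames in which the first pair of rows moves together and the second pair moves in opposite directions.

```python
import pandas as pd

# Illustrative data: three columns, so each row contributes three points
wide1 = pd.DataFrame({'A': [1, 1], 'B': [2, 2], 'C': [3, 3]})
wide2 = pd.DataFrame({'A': [2, 3], 'B': [4, 2], 'C': [6, 1]})

# Row 0: (1, 2, 3) vs (2, 4, 6), both increasing
# Row 1: (1, 2, 3) vs (3, 2, 1), one increasing and one decreasing
row_corr = wide1.corrwith(wide2, axis=1)
print(row_corr)
```

The result is indexed by the row labels 0 and 1, with a positive correlation for the first row and a negative one for the second.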
Handling Missing Values in DataFrames
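This example demonstrates how DataFrame.corrwith handles missing values during correlation computation. A pair of values is skipped whenever either side is missing. Column A keeps two complete pairs, while column B keeps only one, which is too few to define a correlation, so the result for B is NaN.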
Python Program
import pandas as pd
# Create two DataFrames with missing values
df1 = pd.DataFrame({
    'A': [1, 2, None],
    'B': [4, None, 6]
})
df2 = pd.DataFrame({
    'A': [7, 8, 9],
    'B': [10, 11, None]
})
# Compute correlation, excluding NaN values
df_corr = df1.corrwith(df2)
print(df_corr)
Output
A 1.0
B NaN
dtype: float64
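One way to see why column B yields NaN is to count how many positions hold a value in both frames. This is a small diagnostic sketch built on the same data, not something corrwith reports itself; the name complete_pairs is chosen here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, None], 'B': [4, None, 6]})
df2 = pd.DataFrame({'A': [7, 8, 9], 'B': [10, 11, None]})

# A pair is usable only when both frames hold a value at that position
complete_pairs = (df1.notna() & df2.notna()).sum()
print(complete_pairs)  # A has 2 complete pairs, B has 1
```

A correlation needs at least two complete pairs, so a count below 2 for a column signals that its result will be NaN.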
Specifying Correlation Method
This example demonstrates how to compute correlation using the Spearman method.
Python Program
import pandas as pd
# Create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [4, 3, 2, 1]
})
# Create another DataFrame
df2 = pd.DataFrame({
    'A': [4, 3, 2, 1],
    'B': [1, 2, 3, 4]
})
# Compute correlation with Spearman method
result = df.corrwith(df2, method='spearman')
print(result)
Output
A -1.0
B -1.0
dtype: float64
Summary
In this tutorial, we explored the DataFrame.corrwith method in pandas. Key takeaways include:
- Using corrwith to compute correlation between a DataFrame and another DataFrame or Series.
- Choosing correlation methods such as Pearson, Kendall, or Spearman.
- Handling missing values during correlation computation.
- Specifying the axis along which to compute correlations.