Pandas DataFrame.corr: Correlation Between DataFrame Columns
The DataFrame.corr method in pandas computes the pairwise correlation of columns, excluding NaN values. This is useful for analyzing relationships between numerical variables in a DataFrame.
Syntax
The syntax for DataFrame.corr is:
DataFrame.corr(method='pearson', min_periods=1, numeric_only=False)
Here, DataFrame refers to the pandas DataFrame on which the correlation is computed.
Parameters
Parameter | Description |
---|---|
method | Specifies the correlation method. Options include 'pearson' (the default), 'kendall', 'spearman', or a callable that takes two 1-D arrays and returns a float. |
min_periods | Minimum number of observations required per pair of columns to compute the correlation. Defaults to 1. Applies to the Pearson and Spearman methods. |
numeric_only | If True, only considers numerical columns in the DataFrame. Defaults to False. |
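The numeric_only parameter matters when a DataFrame mixes text and numeric columns. The sketch below uses made-up column names (name, height, weight) and assumes pandas 1.5 or later, where corr accepts numeric_only. It is meant only to illustrate the parameter.

```python
import pandas as pd

# Hypothetical data: one text column and two numeric columns
df = pd.DataFrame({
    'name': ['a', 'b', 'c', 'd'],
    'height': [150, 160, 170, 180],
    'weight': [50, 58, 69, 80],
})

# Restrict the computation to numeric columns; 'name' is left out
corr_numeric = df.corr(numeric_only=True)
print(corr_numeric.columns.tolist())
```

The result should contain only height and weight. With the default numeric_only=False, pandas 2.0 and later are expected to raise an error on the text column instead of skipping it.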
Returns
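A DataFrame containing the pairwise correlation values between columns, with one row and one column per input column. If the input DataFrame has no columns, the result is an empty DataFrame. A pair of columns with fewer than min_periods shared non-NaN observations yields NaN for that pair.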
Examples
Computing Correlation Between DataFrame Columns
This example demonstrates how to compute the correlation between columns in a DataFrame.
Python Program
import pandas as pd
# Create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [5, 6, 7, 8, 9],
    'C': [10, 20, 30, 40, 50]
})
# Compute correlation
df_corr = df.corr()
print(df_corr)
Output
     A    B    C
A  1.0  1.0  1.0
B  1.0  1.0  1.0
C  1.0  1.0  1.0
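The default method, Pearson, divides the covariance of two columns by the product of their standard deviations. As a rough cross-check, the sketch below computes that ratio by hand for two hypothetical columns (the values are invented so that the result is not exactly 1) and compares it with what DataFrame.corr reports.

```python
import numpy as np
import pandas as pd

# Hypothetical values: B roughly follows A, but not perfectly
df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [2, 1, 4, 3, 5],
})

# Deviations of each column from its mean
a = df['A'] - df['A'].mean()
b = df['B'] - df['B'].mean()

# Pearson's r: sum of products over the root of the product of squared sums
r_manual = (a * b).sum() / np.sqrt((a ** 2).sum() * (b ** 2).sum())

# The same pair as reported by DataFrame.corr
r_pandas = df.corr().loc['A', 'B']
print(r_manual, r_pandas)
```

Both values should come out at about 0.8 for this data.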
Computing Correlation with Different Methods
This example demonstrates using the method parameter to compute the Spearman correlation.
Python Program
import pandas as pd
# Create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [5, 6, 7, 8, 9],
    'C': [50, 40, 30, 20, 10]
})
# Compute Spearman correlation
df_corr = df.corr(method='spearman')
print(df_corr)
Output
     A    B    C
A  1.0  1.0 -1.0
B  1.0  1.0 -1.0
C -1.0 -1.0  1.0
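On perfectly linear data like the example above, Pearson and Spearman give the same result, so the choice of method is hard to see. Spearman works on ranks, which means it reports 1.0 for any strictly increasing relationship, linear or not. The sketch below uses hypothetical columns x and y = x ** 3 to show the difference.

```python
import pandas as pd

# Hypothetical data: y grows monotonically with x, but not linearly
df = pd.DataFrame({
    'x': [1, 2, 3, 4, 5],
    'y': [1, 8, 27, 64, 125],  # y = x ** 3
})

# Pearson measures linear association
pearson = df.corr(method='pearson').loc['x', 'y']

# Spearman measures monotonic association via ranks
spearman = df.corr(method='spearman').loc['x', 'y']

print(pearson, spearman)
```

Here Spearman should be 1.0, while Pearson should fall somewhat below 1 because the relationship is curved.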
Handling Missing Values in a DataFrame
This example demonstrates how DataFrame.corr excludes NaN values when computing correlations.
Python Program
import pandas as pd
# Create a DataFrame with missing values
df = pd.DataFrame({
    'A': [1, 2, 3, None, 5],
    'B': [5, 6, 7, 8, None],
    'C': [10, 20, 30, 40, 50]
})
# Compute correlation
df_corr = df.corr()
print(df_corr)
Output
          A    B    C
A  1.000000  1.0  1.0
B  1.000000  1.0  1.0
C  1.000000  1.0  1.0
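NaN values are excluded pair by pair, so different entries in the matrix can be based on different numbers of rows. One way to inspect this, sketched below, is to count the rows where both columns of a pair are present. The names valid and pair_counts are illustrative helpers, not part of the pandas API.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, None, 5],
    'B': [5, 6, 7, 8, None],
    'C': [10, 20, 30, 40, 50]
})

# 1 where a value is present, 0 where it is NaN
valid = df.notna().astype(int)

# Entry (i, j) counts the rows where columns i and j are both present
pair_counts = valid.T.dot(valid)
print(pair_counts)

# Dropping incomplete rows for one pair reproduces that pair's entry
ab_only = df[['A', 'B']].dropna().corr().loc['A', 'B']
print(ab_only, df.corr().loc['A', 'B'])
```

For this data, the A and B pair is based on 3 rows, while the A and C pair is based on 4.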
Computing Correlation with Minimum Periods
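This example demonstrates how to use the min_periods parameter to specify the minimum number of observations required to compute a correlation. Every pair of columns here shares at least three non-NaN observations, so with min_periods=3 the result matches the previous example.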
Python Program
import pandas as pd
# Create a DataFrame with missing values
df = pd.DataFrame({
    'A': [1, 2, 3, None, 5],
    'B': [5, 6, 7, 8, None],
    'C': [10, 20, 30, 40, 50]
})
# Compute correlation with min_periods=3
df_corr = df.corr(min_periods=3)
print(df_corr)
Output
          A    B    C
A  1.000000  1.0  1.0
B  1.000000  1.0  1.0
C  1.000000  1.0  1.0
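To see min_periods actually change a result, the threshold has to exceed the number of shared observations for some pair. In the data above, columns A and B have only three rows where both values are present. The sketch below raises the threshold to 4 (a value chosen purely for illustration).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, None, 5],
    'B': [5, 6, 7, 8, None],
    'C': [10, 20, 30, 40, 50]
})

# Require at least 4 shared observations per pair of columns
strict = df.corr(min_periods=4)
print(strict)

# A and B share only 3 observations, so their entry is NaN
print(np.isnan(strict.loc['A', 'B']))
```

The A and B entries should now be NaN, while the pairs involving C, which still have four shared observations, keep their values.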
Summary
In this tutorial, we explored the DataFrame.corr method in pandas. Key takeaways include:
- Using corr to compute pairwise correlations between numerical columns.
- Choosing correlation methods such as Pearson, Kendall, or Spearman.
- Handling missing values during correlation computation.
- Using the min_periods parameter to specify observation requirements.