Pandas DataFrame.corr: Correlation Between DataFrame Columns
Pandas DataFrame.corr
The DataFrame.corr method in pandas computes pairwise correlation of columns, excluding NaN values. This is useful for analyzing relationships between numerical variables in a DataFrame.
Syntax
The syntax for DataFrame.corr is:
DataFrame.corr(method='pearson', min_periods=1, numeric_only=False)
Here, DataFrame refers to the pandas DataFrame on which the correlation is computed.
Parameters
| Parameter | Description |
|---|---|
| method | Specifies the correlation method. Options include 'pearson' (the default), 'kendall', 'spearman', or a callable that takes two 1-D arrays and returns a float. |
| min_periods | Minimum number of observations required per pair of columns to compute the correlation. Defaults to 1. |
| numeric_only | If True, only considers numerical columns in the DataFrame. Defaults to False (as of pandas 2.0). |
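As a minimal sketch of how the method and numeric_only parameters behave together, the snippet below runs two methods on made-up data that includes a text column. The column names (x, x_cubed, label) and their values are illustrative assumptions, not part of the original examples.

```python
import pandas as pd

# Hypothetical data: x_cubed rises with x, but not linearly
df = pd.DataFrame({
    'x': [1, 2, 3, 4, 5],
    'x_cubed': [1, 8, 27, 64, 125],
    'label': ['a', 'b', 'c', 'd', 'e'],  # non-numeric column
})

# numeric_only=True leaves the text column out of the result
pearson = df.corr(method='pearson', numeric_only=True)
spearman = df.corr(method='spearman', numeric_only=True)

print(pearson)
print(spearman)
```

Pearson measures linear association, so it should come out below 1 for this pair, while Spearman works on ranks and should report 1 for any strictly increasing relationship.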
Returns
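A DataFrame containing the pairwise correlation values between columns, with the column labels used as both the index and the columns. A pair with fewer than min_periods shared non-NaN observations gets NaN, so a column that contains only NaN values produces NaN entries rather than being dropped. If the input DataFrame is empty, the result is an empty DataFrame.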
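The shape of the returned object can be checked with a short sketch like the one below. The height, weight, and age columns are invented for illustration only.

```python
import pandas as pd

# Hypothetical measurements, used only to inspect the result's structure
df = pd.DataFrame({
    'height': [150, 160, 170, 180],
    'weight': [52, 58, 69, 80],
    'age': [31, 25, 40, 28],
})

result = df.corr()

# One row and one column per input column
print(result.shape)

# A single coefficient can be looked up by its two column labels
print(result.loc['height', 'weight'])
```

Because the matrix is symmetric, result.loc['height', 'weight'] and result.loc['weight', 'height'] refer to the same coefficient.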
Examples
Computing Correlation Between DataFrame Columns
This example demonstrates how to compute the correlation between columns in a DataFrame.
Python Program
import pandas as pd
# Create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [5, 6, 7, 8, 9],
    'C': [10, 20, 30, 40, 50]
})
# Compute correlation
df_corr = df.corr()
print(df_corr)
Output
     A    B    C
A  1.0  1.0  1.0
B  1.0  1.0  1.0
C  1.0  1.0  1.0
Computing Correlation with Different Methods
This example demonstrates using the method parameter to compute Spearman correlation.
Python Program
import pandas as pd
# Create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [5, 6, 7, 8, 9],
    'C': [50, 40, 30, 20, 10]
})
# Compute Spearman correlation
df_corr = df.corr(method='spearman')
print(df_corr)
Output
     A    B    C
A  1.0  1.0 -1.0
B  1.0  1.0 -1.0
C -1.0 -1.0  1.0
Handling Missing Values in a DataFrame
This example demonstrates how DataFrame.corr excludes NaN values when computing correlations.
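To make the exclusion rule concrete, here is a rough sketch that reproduces one coefficient by hand. It assumes the exclusion is pairwise, meaning a row is skipped for a given pair only when one of that pair's two values is missing. The values in column C are altered from the example data (an assumption made so the coefficients are not all exactly 1).

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1.0, 2.0, 3.0, None, 5.0],
    'B': [5.0, 6.0, 7.0, 8.0, None],
    'C': [10.0, 25.0, 30.0, 40.0, 45.0],  # illustrative values
})

# What corr() reports for the pair (A, C)
pairwise = df.corr().loc['A', 'C']

# Manual equivalent: keep only rows where both A and C are present
pair = df[['A', 'C']].dropna()
manual = pair['A'].corr(pair['C'])

# Dropping every row that has any NaN first leaves fewer rows to work with
listwise = df.dropna().corr().loc['A', 'C']

print(len(pair), len(df.dropna()))
print(pairwise, manual, listwise)
```

The first two coefficients should agree, while the third is computed from fewer rows and generally differs, which is why calling dropna() on the whole DataFrame before corr() is not equivalent.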
Python Program
import pandas as pd
# Create a DataFrame with missing values
df = pd.DataFrame({
    'A': [1, 2, 3, None, 5],
    'B': [5, 6, 7, 8, None],
    'C': [10, 20, 30, 40, 50]
})
# Compute correlation
df_corr = df.corr()
print(df_corr)
Output
          A    B    C
A  1.000000  1.0  1.0
B  1.000000  1.0  1.0
C  1.000000  1.0  1.0
Computing Correlation with Minimum Periods
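This example demonstrates how to use the min_periods parameter to specify the minimum number of observations required for each pair of columns. Here every pair shares at least three non-missing observations, so min_periods=3 leaves the result unchanged from the previous example.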
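As a rough illustration of when the threshold starts to matter, the sketch below counts the shared non-missing rows for each pair and then raises min_periods to 4. The helper names valid, shared, and strict are introduced here for illustration and are not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, None, 5],
    'B': [5, 6, 7, 8, None],
    'C': [10, 20, 30, 40, 50],
})

# Count the rows in which both columns of each pair are present
valid = df.notna().astype(int)
shared = valid.T.dot(valid)
print(shared)

# Pairs with fewer than 4 shared observations become NaN
strict = df.corr(min_periods=4)
print(strict)
```

Columns A and B share only three complete rows, so their entry should turn into NaN under min_periods=4, while the other pairs keep their coefficients.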
Python Program
import pandas as pd
# Create a DataFrame with missing values
df = pd.DataFrame({
    'A': [1, 2, 3, None, 5],
    'B': [5, 6, 7, 8, None],
    'C': [10, 20, 30, 40, 50]
})
# Compute correlation with min_periods=3
df_corr = df.corr(min_periods=3)
print(df_corr)
Output
          A    B    C
A  1.000000  1.0  1.0
B  1.000000  1.0  1.0
C  1.000000  1.0  1.0
Summary
In this tutorial, we explored the DataFrame.corr method in pandas. Key takeaways include:
- Using corr to compute pairwise correlations between numerical columns.
- Choosing correlation methods such as Pearson, Kendall, or Spearman.
- Handling missing values during correlation computation.
- Using the min_periods parameter to specify observation requirements.