Pandas DataFrame.corr: Correlation Between DataFrame Columns


Pandas DataFrame.corr

The DataFrame.corr method in pandas computes the pairwise correlation between columns, excluding NaN values on a pair-by-pair basis. This is useful for measuring how strongly numerical variables in a DataFrame move together.


Syntax

The syntax for DataFrame.corr is:

DataFrame.corr(method='pearson', min_periods=1, numeric_only=False)

Here, DataFrame refers to the pandas DataFrame on which the correlation computation is performed.


Parameters

Parameter       Description
method          Specifies the method for correlation. Options include:
                  • 'pearson': Standard correlation coefficient (default).
                  • 'kendall': Kendall Tau correlation coefficient.
                  • 'spearman': Spearman rank correlation coefficient.
                  • callable: A function that takes two 1-D arrays and returns a float.
min_periods     Minimum number of observations required per pair of columns to compute the correlation. Defaults to 1. Currently only available for Pearson and Spearman correlation.
numeric_only    If True, only considers numerical (float, int, or boolean) columns in the DataFrame. Defaults to False.
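
The following is a minimal sketch of two of these parameters in use, not a definitive recipe. The column name 'label' and the helper pearson_via_numpy are made up for illustration. It assumes pandas 1.5 or later (where numeric_only is available) and relies on the fact that method also accepts a callable taking two 1-D arrays.

```python
import numpy as np
import pandas as pd

# Illustrative data: two numeric columns and one text column
df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [2, 4, 5, 4, 5],
    'label': ['x', 'y', 'x', 'y', 'x'],
})

# numeric_only=True leaves the text column out of the result
numeric_corr = df.corr(numeric_only=True)
print(numeric_corr.columns.tolist())

# Hypothetical helper: a callable that reproduces Pearson via NumPy
def pearson_via_numpy(x, y):
    return np.corrcoef(x, y)[0, 1]

callable_corr = df[['A', 'B']].corr(method=pearson_via_numpy)
print(callable_corr)
```

With a callable, pandas fills the diagonal with 1 and mirrors the result across it, so the function is only responsible for the off-diagonal pairs.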

Returns

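A DataFrame containing the pairwise correlation values between columns. Its rows and columns are both labeled with the column names of the input, so a DataFrame with n numeric columns produces an n × n result. If the input DataFrame is empty, the result is an empty DataFrame. If a pair of columns has fewer than min_periods overlapping non-NaN observations (for example, when a column contains only NaN values), the corresponding entry is NaN.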
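
To make the shape of the return value concrete, here is a small sketch with made-up data. It checks that the result is square and symmetric with ones on the diagonal, and that columns holding only NaN values yield NaN entries. The column names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [2, 1, 4, 3, 5],
    'C': [5, 3, 4, 1, 2],
})

corr = df.corr()
print(corr.shape)  # one row and one column per input column

# The matrix is symmetric and each column correlates perfectly with itself
is_symmetric = np.allclose(corr.values, corr.values.T)
diag_is_one = np.allclose(np.diag(corr.values), 1.0)

# A single coefficient can be read by label
ab = corr.loc['A', 'B']
print(ab)

# Columns containing only NaN produce NaN entries, not an empty result
all_nan = pd.DataFrame({'X': [np.nan, np.nan], 'Y': [np.nan, np.nan]}).corr()
print(all_nan)
```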


Examples

Computing Correlation Between DataFrame Columns

This example demonstrates how to compute the correlation between columns in a DataFrame.

Python Program

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [5, 6, 7, 8, 9],
    'C': [10, 20, 30, 40, 50]
})

# Compute correlation
df_corr = df.corr()
print(df_corr)

Output

     A    B    C
A  1.0  1.0  1.0
B  1.0  1.0  1.0
C  1.0  1.0  1.0

Computing Correlation with Different Methods

This example demonstrates using the method parameter to compute Spearman correlation.

Python Program

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [5, 6, 7, 8, 9],
    'C': [50, 40, 30, 20, 10]
})

# Compute Spearman correlation
df_corr = df.corr(method='spearman')
print(df_corr)

Output

     A    B    C
A  1.0  1.0 -1.0
B  1.0  1.0 -1.0
C -1.0 -1.0  1.0
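
Because the columns above are perfectly linear, Pearson and Spearman give the same values. The sketch below uses made-up data with a monotonic but non-linear relationship (y = x cubed) to show where the two methods differ. It also illustrates that Spearman correlation matches Pearson correlation computed on ranks.

```python
import numpy as np
import pandas as pd

# y grows monotonically with x, but not linearly
df = pd.DataFrame({'x': [1, 2, 3, 4, 5]})
df['y'] = df['x'] ** 3

pearson = df.corr(method='pearson').loc['x', 'y']
spearman = df.corr(method='spearman').loc['x', 'y']

# Spearman is Pearson applied to the ranked values
rank_pearson = df.rank().corr(method='pearson').loc['x', 'y']

print(pearson)       # below 1 because the relationship is not linear
print(spearman)      # the ordering of x and y agrees perfectly
print(rank_pearson)
```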

Handling Missing Values in a DataFrame

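This example demonstrates how DataFrame.corr excludes NaN values when computing correlations. The exclusion is pairwise: for each pair of columns, only the rows where both columns have a value are used, so different pairs can be based on different numbers of rows.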

Python Program

import pandas as pd

# Create a DataFrame with missing values
df = pd.DataFrame({
    'A': [1, 2, 3, None, 5],
    'B': [5, 6, 7, 8, None],
    'C': [10, 20, 30, 40, 50]
})

# Compute correlation
df_corr = df.corr()
print(df_corr)

Output

          A    B    C
A  1.000000  1.0  1.0
B  1.000000  1.0  1.0
C  1.000000  1.0  1.0
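
As a rough check of this pairwise behavior, the sketch below counts how many rows each pair of columns shares and recomputes the A and B coefficient by hand after dropping incomplete rows. The variable names (valid, counts, pair, manual_ab) are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, None, 5],
    'B': [5, 6, 7, 8, None],
    'C': [10, 20, 30, 40, 50]
})

# Number of rows where both columns of each pair are present
valid = df.notna().astype(int)
counts = valid.T @ valid
print(counts)

# Recompute A vs. B using only rows where both values exist
pair = df[['A', 'B']].dropna()
manual_ab = np.corrcoef(pair['A'], pair['B'])[0, 1]

corr = df.corr()
print(manual_ab, corr.loc['A', 'B'])
```

Here A and B share three rows, while A and C and B and C each share four.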

Computing Correlation with Minimum Periods

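This example demonstrates how to use the min_periods parameter to specify the minimum number of overlapping observations required for each pair of columns. Every pair in this DataFrame shares at least three non-NaN rows, so min_periods=3 leaves the result unchanged; a pair with fewer shared rows than min_periods would be reported as NaN.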

Python Program

import pandas as pd

# Create a DataFrame with missing values
df = pd.DataFrame({
    'A': [1, 2, 3, None, 5],
    'B': [5, 6, 7, 8, None],
    'C': [10, 20, 30, 40, 50]
})

# Compute correlation with min_periods=3
df_corr = df.corr(min_periods=3)
print(df_corr)

Output

          A    B    C
A  1.000000  1.0  1.0
B  1.000000  1.0  1.0
C  1.000000  1.0  1.0
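
To see min_periods change the result, the threshold can be raised above the number of rows a pair shares. The sketch below reuses the same data: columns A and B share only three complete rows, so min_periods=4 should turn that entry into NaN while leaving the other pairs intact.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, None, 5],
    'B': [5, 6, 7, 8, None],
    'C': [10, 20, 30, 40, 50]
})

# Every pair has at least 3 shared observations
corr3 = df.corr(min_periods=3)

# A and B share only 3 rows, so this pair no longer qualifies
corr4 = df.corr(min_periods=4)
print(corr4)
```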

Summary

In this tutorial, we explored the DataFrame.corr method in pandas. Key takeaways include:

  • Using corr to compute pairwise correlations between numerical columns.
  • Choosing correlation methods such as Pearson, Kendall, or Spearman.
  • Handling missing values during correlation computation.
  • Using the min_periods parameter to specify observation requirements.
