Pandas DataFrame.cov: Covariance Between DataFrame Columns


Pandas DataFrame.cov

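The DataFrame.cov method in pandas computes the pairwise covariance of the columns of a DataFrame, excluding NaN values. Covariance measures how two columns vary together: a positive value means they tend to increase together, and a negative value means one tends to decrease as the other increases. Its magnitude depends on the units of the columns, so it indicates the direction of a linear relationship rather than its strength.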
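
To make the definition concrete, the following minimal sketch computes the sample covariance of two columns by hand and compares it with the value returned by DataFrame.cov. The column names x and y and their values are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration
df = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [2, 4, 5, 4, 5]})

# Sample covariance by hand: sum of products of deviations, divided by N - 1
n = len(df)
dx = df["x"] - df["x"].mean()
dy = df["y"] - df["y"].mean()
manual_cov = (dx * dy).sum() / (n - 1)

# The same value taken from the matrix returned by DataFrame.cov
pandas_cov = df.cov().loc["x", "y"]

print(manual_cov, pandas_cov)
```

Both values should agree up to floating-point rounding, since the default ddof=1 corresponds to the N - 1 divisor used in the manual calculation.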


Syntax

The syntax for DataFrame.cov is:

DataFrame.cov(min_periods=None, ddof=1, numeric_only=False)

Here, DataFrame refers to the pandas DataFrame on which the covariance computation is performed.


Parameters

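Parameter       Description
min_periods     Minimum number of observations required per pair of columns to produce a valid result. Defaults to None.
ddof            Delta degrees of freedom. The divisor used in the calculation is N - ddof, where N is the number of observations. Defaults to 1. According to the pandas documentation, this argument applies only when the DataFrame contains no NaN values.
numeric_only    If True, only numeric (float, int, or boolean) columns are included. Defaults to False.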

Returns

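A DataFrame containing the pairwise covariance values, with the original column names as both its row and column labels. The matrix is symmetric, and each diagonal entry is the variance of the corresponding column. If the input DataFrame has no columns, the result is an empty DataFrame. A pair of columns with too few shared non-NaN observations (for example, a column that contains only NaN values) yields NaN for that entry.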
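
As a quick check of the shape and structure of the result, the sketch below inspects the labels, symmetry, and diagonal of the covariance matrix. It reuses the sample values from the examples in this tutorial; the variable names cov and diagonal are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5],
    "B": [5, 6, 7, 8, 9],
    "C": [10, 20, 30, 40, 50],
})
cov = df.cov()

# Row and column labels of the result are the original column names
print(list(cov.index), list(cov.columns))

# The matrix is symmetric: cov(A, C) matches cov(C, A)
print(abs(cov.loc["A", "C"] - cov.loc["C", "A"]) < 1e-12)

# Each diagonal entry is the variance of that column (same default ddof=1)
diagonal = pd.Series([cov.loc[c, c] for c in cov.columns], index=cov.columns)
print((diagonal - df.var()).abs().max() < 1e-12)
```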


Examples

Computing Covariance Between DataFrame Columns

This example demonstrates how to compute the covariance between columns in a DataFrame.

Python Program

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [5, 6, 7, 8, 9],
    'C': [10, 20, 30, 40, 50]
})

# Compute covariance
df_cov = df.cov()
print(df_cov)

Output

      A     B      C
A   2.5   2.5   25.0
B   2.5   2.5   25.0
C  25.0  25.0  250.0

Using Minimum Periods for Covariance

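This example demonstrates how to use the min_periods parameter to specify the minimum number of observations required for covariance computation. Each entry is computed from the rows in which both columns are non-NaN. Here every pair of columns shares at least three such rows, so min_periods=3 leaves every entry defined.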

Python Program

import pandas as pd

# Create a DataFrame with missing values
df = pd.DataFrame({
    'A': [1, 2, None, 4, 5],
    'B': [5, None, 7, 8, 9],
    'C': [10, 20, 30, 40, None]
})

# Compute covariance with min_periods=3
df_cov = df.cov(min_periods=3)
print(df_cov)

Output

           A          B           C
A   3.333333   4.333333   23.333333
B   4.333333   2.916667   23.333333
C  23.333333  23.333333  166.666667
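
To see min_periods actually exclude results, the sketch below raises the threshold to 4 on the same data. Each column has four non-NaN values, but every pair of distinct columns shares only three complete rows, so the off-diagonal entries should become NaN while the diagonal stays defined. The helper names valid, pair_counts, and strict_cov are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, None, 4, 5],
    "B": [5, None, 7, 8, 9],
    "C": [10, 20, 30, 40, None],
})

# Count the rows in which both columns of each pair are present
valid = df.notna().astype(int)
pair_counts = valid.T.dot(valid)
print(pair_counts)

# Require at least 4 shared observations per pair of columns
strict_cov = df.cov(min_periods=4)
print(strict_cov)
```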

Computing Covariance with Degrees of Freedom

This example shows how to use the ddof parameter to adjust the degrees of freedom during covariance computation.

Python Program

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [5, 6, 7, 8, 9],
    'C': [10, 20, 30, 40, 50]
})

# Compute covariance with ddof=0
df_cov = df.cov(ddof=0)
print(df_cov)

Output

      A     B      C
A   2.0   2.0   20.0
B   2.0   2.0   20.0
C  20.0  20.0  200.0
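
The effect of ddof is only a change of divisor, which the following sketch verifies in two ways: by rescaling the default (ddof=1) result by (N - 1) / N, and by cross-checking against numpy.cov. The variable names are introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5],
    "B": [5, 6, 7, 8, 9],
    "C": [10, 20, 30, 40, 50],
})
n = len(df)

sample_cov = df.cov()            # divisor N - 1 (ddof=1, the default)
population_cov = df.cov(ddof=0)  # divisor N

# Rescaling the sample covariance by (N - 1) / N gives the population covariance
rescaled = sample_cov * (n - 1) / n
print(np.allclose(rescaled, population_cov))

# Cross-check against NumPy; rowvar=False treats columns as variables
numpy_cov = np.cov(df.to_numpy(), rowvar=False, ddof=0)
print(np.allclose(numpy_cov, population_cov.to_numpy()))
```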

Handling Non-Numerical Columns in a DataFrame

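This example demonstrates how DataFrame.cov handles non-numerical columns when the numeric_only parameter is set to True. The string column D is dropped before the computation. With the default numeric_only=False in pandas 2.0 and later, a string column such as D causes cov to raise an error instead.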

Python Program

import pandas as pd

# Create a DataFrame with non-numerical columns
df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [5, 6, 7, 8, 9],
    'C': [10, 20, 30, 40, 50],
    'D': ['x', 'y', 'z', 'w', 'v']
})

# Compute covariance considering only numerical columns
df_cov = df.cov(numeric_only=True)
print(df_cov)

Output

      A     B      C
A   2.5   2.5   25.0
B   2.5   2.5   25.0
C  25.0  25.0  250.0
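
An alternative to numeric_only=True is to select the numeric columns explicitly before calling cov. The sketch below, which uses a shortened version of the DataFrame above, compares the two approaches; select_dtypes is a standard DataFrame method, and the variable names are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5],
    "B": [5, 6, 7, 8, 9],
    "D": ["x", "y", "z", "w", "v"],
})

# Keep only numeric columns explicitly, then compute the covariance
numeric_df = df.select_dtypes(include="number")
explicit_cov = numeric_df.cov()

# Let cov() do the filtering instead
filtered_cov = df.cov(numeric_only=True)

# Both approaches should produce the same 2 x 2 matrix for columns A and B
print(explicit_cov.equals(filtered_cov))
```

Selecting columns explicitly can be useful when the same numeric subset is reused for several computations.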

Summary

In this tutorial, we explored the DataFrame.cov method in pandas. Key takeaways include:

  • Using cov to compute pairwise covariance between numerical columns.
  • Handling missing values with the min_periods parameter.
  • Adjusting degrees of freedom using the ddof parameter.
  • Excluding non-numerical columns with the numeric_only parameter.
