Pandas DataFrame.cov: Covariance Between DataFrame Columns
The DataFrame.cov method in pandas computes the pairwise covariance between the columns of a DataFrame, excluding NaN values. Covariance measures how two columns vary together: a positive value means they tend to increase together, a negative value means one tends to decrease as the other increases, and the magnitude depends on the units of both columns.
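To make the definition concrete, here is a minimal sketch that computes the sample covariance by hand (the sum of the products of each column's deviations from its mean, divided by N - 1) and compares it with the corresponding entry of df.cov(). The column names x and y and their values are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, chosen only for illustration
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0], "y": [2.0, 4.0, 5.0, 9.0]})

# Sample covariance by hand: sum of products of deviations, divided by N - 1
dx = df["x"] - df["x"].mean()
dy = df["y"] - df["y"].mean()
manual_cov = (dx * dy).sum() / (len(df) - 1)

# The off-diagonal entry of df.cov() should agree with the manual value
pandas_cov = df.cov().loc["x", "y"]
print(manual_cov, pandas_cov)
print(np.isclose(manual_cov, pandas_cov))
```

The division by N - 1 corresponds to the default ddof=1 described in the Parameters section.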
Syntax
The syntax for DataFrame.cov is:
DataFrame.cov(min_periods=None, ddof=1, numeric_only=False)
Here, DataFrame refers to the pandas DataFrame on which the covariance is computed.
Parameters
| Parameter | Description |
|---|---|
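| min_periods | Minimum number of observations required per pair of columns to produce a valid result. Pairs with fewer shared non-NaN observations yield NaN. Defaults to None. |
| ddof | Delta degrees of freedom. The divisor used in the calculation is N - ddof, where N is the number of observations. According to the pandas documentation, this argument applies only when the DataFrame contains no NaN values. Defaults to 1. |
| numeric_only | If True, only numerical columns in the DataFrame are considered. Defaults to False. |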
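The effect of min_periods is easiest to see when a pair of columns shares fewer complete rows than the threshold. The sketch below uses hypothetical columns p and q that have three non-NaN values each but only two rows in common.

```python
import pandas as pd

# Hypothetical data: p and q each have 3 values but share only 2 complete rows
df = pd.DataFrame({
    "p": [1.0, 2.0, None, 4.0],
    "q": [2.0, None, 6.0, 8.0],
})

loose = df.cov(min_periods=2)   # the 2 shared rows meet the threshold
strict = df.cov(min_periods=3)  # the 2 shared rows fall short of the threshold

print(loose)
print(strict)
```

With min_periods=3, the p/q entries are NaN, while the diagonal entries are still computed because each column on its own has three observations.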
Returns
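A DataFrame containing the pairwise covariance values between columns, with the column names as both its index and its columns. If the input DataFrame is empty, the result is an empty DataFrame. A pair of columns with no overlapping non-NaN observations yields NaN for that entry, so a DataFrame whose columns contain only NaN values produces a result filled with NaN rather than an empty one.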
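Two properties of the returned matrix are worth checking in practice: the diagonal holds each column's variance, and the matrix is symmetric. The following is a small sketch under the assumption of hypothetical columns a and b with no missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({"a": [1.0, 3.0, 2.0, 5.0], "b": [4.0, 1.0, 3.0, 2.0]})
cov = df.cov()

# The result is labeled by the column names on both axes
print(cov.index.tolist(), cov.columns.tolist())

# Diagonal entries equal each column's sample variance (both use ddof=1 by default)
diag_matches_var = np.allclose(np.diag(cov.to_numpy()), df.var().to_numpy())

# The matrix is symmetric: cov(a, b) equals cov(b, a)
is_symmetric = np.allclose(cov.to_numpy(), cov.to_numpy().T)

print(diag_matches_var, is_symmetric)
```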
Examples
Computing Covariance Between DataFrame Columns
This example demonstrates how to compute the covariance between columns in a DataFrame.
Python Program
import pandas as pd
# Create a DataFrame
df = pd.DataFrame({
'A': [1, 2, 3, 4, 5],
'B': [5, 6, 7, 8, 9],
'C': [10, 20, 30, 40, 50]
})
# Compute covariance
df_cov = df.cov()
print(df_cov)
Output
      A     B      C
A   2.5   2.5   25.0
B   2.5   2.5   25.0
C  25.0  25.0  250.0
Using Minimum Periods for Covariance
This example demonstrates how to use the min_periods parameter to specify the minimum number of observations required for covariance computation.
Python Program
import pandas as pd
# Create a DataFrame with missing values
df = pd.DataFrame({
'A': [1, 2, None, 4, 5],
'B': [5, None, 7, 8, 9],
'C': [10, 20, 30, 40, None]
})
# Compute covariance with min_periods=3
df_cov = df.cov(min_periods=3)
print(df_cov)
Output
           A          B           C
A   3.333333   4.333333   23.333333
B   4.333333   2.916667   23.333333
C  23.333333  23.333333  166.666667
Computing Covariance with Degrees of Freedom
This example shows how to use the ddof parameter to adjust the degrees of freedom during covariance computation.
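For context, ddof sets the divisor N - ddof, so the ddof=0 result is the ddof=1 result scaled by (N - 1) / N. Here is a minimal sketch of that relationship, using hypothetical columns u and v without missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration (no NaN values)
df = pd.DataFrame({"u": [2.0, 4.0, 4.0, 6.0], "v": [1.0, 3.0, 2.0, 6.0]})
n = len(df)

sample_cov = df.cov(ddof=1)      # divides by n - 1
population_cov = df.cov(ddof=0)  # divides by n

# Rescaling the ddof=1 result by (n - 1) / n should reproduce the ddof=0 result
rescaled = sample_cov * (n - 1) / n
print(np.allclose(rescaled, population_cov))
```

Use ddof=1 when the rows are a sample from a larger population and ddof=0 when they are the whole population.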
Python Program
import pandas as pd
# Create a DataFrame
df = pd.DataFrame({
'A': [1, 2, 3, 4, 5],
'B': [5, 6, 7, 8, 9],
'C': [10, 20, 30, 40, 50]
})
# Compute covariance with ddof=0
df_cov = df.cov(ddof=0)
print(df_cov)
Output
      A     B      C
A   2.0   2.0   20.0
B   2.0   2.0   20.0
C  20.0  20.0  200.0
Handling Non-Numerical Columns in a DataFrame
This example demonstrates how DataFrame.cov handles non-numerical columns when the numeric_only parameter is set to True.
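As a point of comparison, selecting the numerical columns explicitly before calling cov should give the same matrix as passing numeric_only=True. The sketch below assumes a hypothetical frame with two float columns and one string column. It uses select_dtypes(include="number"), which matches numeric_only=True here only because the frame has no boolean columns.

```python
import pandas as pd

# Hypothetical frame mixing numeric and string columns
df = pd.DataFrame({
    "height": [1.5, 1.7, 1.6, 1.8],
    "weight": [50.0, 65.0, 58.0, 72.0],
    "label": ["a", "b", "c", "d"],
})

# Let cov drop the non-numerical column
via_flag = df.cov(numeric_only=True)

# Alternative: select the numeric columns explicitly, then compute covariance
via_select = df.select_dtypes(include="number").cov()

print(via_flag.columns.tolist())
print(via_flag.equals(via_select))
```

In pandas 2.0 and later, calling cov on such a frame with the default numeric_only=False is expected to raise an error instead of silently dropping the string column, so one of these two approaches is needed.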
Python Program
import pandas as pd
# Create a DataFrame with non-numerical columns
df = pd.DataFrame({
'A': [1, 2, 3, 4, 5],
'B': [5, 6, 7, 8, 9],
'C': [10, 20, 30, 40, 50],
'D': ['x', 'y', 'z', 'w', 'v']
})
# Compute covariance considering only numerical columns
df_cov = df.cov(numeric_only=True)
print(df_cov)
Output
      A     B      C
A   2.5   2.5   25.0
B   2.5   2.5   25.0
C  25.0  25.0  250.0
Summary
In this tutorial, we explored the DataFrame.cov method in pandas. Key takeaways include:
- Using cov to compute pairwise covariance between numerical columns.
- Handling missing values with the min_periods parameter.
- Adjusting degrees of freedom using the ddof parameter.
- Excluding non-numerical columns with the numeric_only parameter.