Pandas DataFrame.cov: Covariance Between DataFrame Columns
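The DataFrame.cov method in pandas calculates the pairwise covariance between the columns of a DataFrame, excluding NaN values. Covariance measures how two columns vary together: a positive value means they tend to increase together, and a negative value means one tends to decrease as the other increases. Because its magnitude depends on the units of the columns, covariance indicates the direction of a linear relationship rather than its strength.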
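To make the definition concrete, the following sketch computes the sample covariance of two columns by hand and compares it with the value reported by DataFrame.cov. The column names x and y and their values are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration
df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [2.0, 4.0, 5.0, 9.0],
})

# Sample covariance by hand: sum of the products of the deviations
# from each column mean, divided by n - 1
dx = df["x"] - df["x"].mean()
dy = df["y"] - df["y"].mean()
manual_cov = (dx * dy).sum() / (len(df) - 1)

# The same quantity read from the matrix returned by DataFrame.cov
pandas_cov = df.cov().loc["x", "y"]

print(manual_cov, pandas_cov)
```

Both values should agree up to floating-point rounding, since the default ddof=1 corresponds to dividing by n - 1.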
Syntax
The syntax for DataFrame.cov is:
DataFrame.cov(min_periods=None, ddof=1, numeric_only=False)
Here, DataFrame refers to the pandas DataFrame on which the covariance computation is performed.
Parameters
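Parameter | Description |
---|---|
min_periods | Minimum number of observations required per pair of columns to produce a valid result; pairs with fewer observations yield NaN. Defaults to None. |
ddof | Delta degrees of freedom. The divisor used in the calculation is N - ddof, where N is the number of observations. According to the pandas documentation, it applies only when the DataFrame contains no NaN values. Defaults to 1. |
numeric_only | If True, only float, int, and boolean columns are included. Defaults to False. |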
Returns
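A DataFrame containing the pairwise covariance values between columns, with the column names used as both its index and its columns. A pair of columns with too few shared non-NaN observations (for example, a column that contains only NaN values) produces NaN entries rather than being dropped. If the input DataFrame is empty, the result is an empty DataFrame.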
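As a rough illustration of the shape of the result, the sketch below checks three properties of the returned matrix: its labels, its symmetry, and its diagonal. The variable names are assumptions made for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5],
    "B": [5, 6, 7, 8, 9],
    "C": [10, 20, 30, 40, 50],
})

cov_matrix = df.cov()

# Rows and columns are both labelled by the original column names
labels_match = list(cov_matrix.index) == list(cov_matrix.columns) == list(df.columns)

# The matrix is symmetric: cov(A, C) equals cov(C, A)
is_symmetric = np.allclose(cov_matrix.to_numpy(), cov_matrix.to_numpy().T)

# Each diagonal entry is the variance of the corresponding column
diag_is_variance = all(
    abs(cov_matrix.loc[col, col] - df[col].var()) < 1e-12 for col in df.columns
)

print(labels_match, is_symmetric, diag_is_variance)
```

Reading a single value with cov_matrix.loc["A", "C"] is usually more convenient than indexing by position.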
Examples
Computing Covariance Between DataFrame Columns
This example demonstrates how to compute the covariance between columns in a DataFrame.
Python Program
import pandas as pd
# Create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [5, 6, 7, 8, 9],
    'C': [10, 20, 30, 40, 50]
})
# Compute covariance
df_cov = df.cov()
print(df_cov)
Output
      A     B      C
A   2.5   2.5   25.0
B   2.5   2.5   25.0
C  25.0  25.0  250.0
Using Minimum Periods for Covariance
This example demonstrates how to use the min_periods parameter to specify the minimum number of observations required for covariance computation.
Python Program
import pandas as pd
# Create a DataFrame with missing values
df = pd.DataFrame({
    'A': [1, 2, None, 4, 5],
    'B': [5, None, 7, 8, 9],
    'C': [10, 20, 30, 40, None]
})
# Compute covariance with min_periods=3
df_cov = df.cov(min_periods=3)
print(df_cov)
Output
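           A          B           C
A   3.333333   4.333333   23.333333
B   4.333333   2.916667   23.333333
C  23.333333  23.333333  166.666667
Each entry is computed only from the rows in which both columns are present. For example, A and B overlap in three rows (indices 0, 3, and 4), which just meets the min_periods=3 threshold, so a value is reported for that pair.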
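The effect of min_periods is easier to see when the threshold is not met. The sketch below reuses the same data but raises the threshold to 4. Every pair of distinct columns shares only three complete rows, so those entries are expected to become NaN, while each diagonal entry survives because every column has four non-missing values of its own. The names overlap_ab and strict_cov are assumptions made for this example.

```python
import math

import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, None, 4, 5],
    "B": [5, None, 7, 8, 9],
    "C": [10, 20, 30, 40, None],
})

# Count the rows in which both A and B are present
overlap_ab = len(df[["A", "B"]].dropna())

# Require one more observation than any pair of distinct columns shares
strict_cov = df.cov(min_periods=4)

print(overlap_ab)
print(math.isnan(strict_cov.loc["A", "B"]))
print(strict_cov)
```

Checking the overlap with dropna on the two columns of interest is a quick way to see why a particular entry came back as NaN.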
Computing Covariance with Degrees of Freedom
This example shows how to use the ddof parameter to adjust the degrees of freedom during covariance computation.
Python Program
import pandas as pd
# Create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [5, 6, 7, 8, 9],
    'C': [10, 20, 30, 40, 50]
})
# Compute covariance with ddof=0
df_cov = df.cov(ddof=0)
print(df_cov)
Output
      A     B      C
A   2.0   2.0   20.0
B   2.0   2.0   20.0
C  20.0  20.0  200.0
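Since ddof only changes the divisor from n - 1 to n - ddof, the ddof=0 result can be recovered by rescaling the default result. The following sketch checks that relationship. It assumes a DataFrame without missing values, so that every pair of columns uses the same n.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5],
    "B": [5, 6, 7, 8, 9],
    "C": [10, 20, 30, 40, 50],
})

n = len(df)

sample_cov = df.cov()            # default ddof=1, divides by n - 1
population_cov = df.cov(ddof=0)  # divides by n

# Rescale the sample covariance to the population divisor;
# valid here because df has no missing values
rescaled = sample_cov * (n - 1) / n

print(np.allclose(population_cov.to_numpy(), rescaled.to_numpy()))
```

In general, ddof=1 is the usual choice when the rows are a sample from a larger population, and ddof=0 when the rows are the entire population.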
Handling Non-Numerical Columns in a DataFrame
This example demonstrates how DataFrame.cov handles non-numerical columns when the numeric_only parameter is set to True.
Python Program
import pandas as pd
# Create a DataFrame with non-numerical columns
df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [5, 6, 7, 8, 9],
    'C': [10, 20, 30, 40, 50],
    'D': ['x', 'y', 'z', 'w', 'v']
})
# Compute covariance considering only numerical columns
df_cov = df.cov(numeric_only=True)
print(df_cov)
Output
      A     B      C
A   2.5   2.5   25.0
B   2.5   2.5   25.0
C  25.0  25.0  250.0
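An alternative to numeric_only=True is to select the numerical columns explicitly before calling cov. The sketch below compares the two approaches and also tries the default call on the mixed DataFrame. The behavior of that default call depends on the pandas version: recent versions are expected to raise on the string column, and the exception types caught here are an assumption, not a guarantee.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5],
    "B": [5, 6, 7, 8, 9],
    "C": [10, 20, 30, 40, 50],
    "D": ["x", "y", "z", "w", "v"],
})

# Option 1: let cov drop the non-numerical column
cov_flag = df.cov(numeric_only=True)

# Option 2: select the numerical columns first, then call cov with defaults
cov_selected = df.select_dtypes(include="number").cov()

print(cov_flag.equals(cov_selected))

# Default call on the mixed DataFrame; assumed to fail on recent pandas
# versions because the strings in column D cannot be converted to float
try:
    df.cov()
except (TypeError, ValueError) as exc:
    print(f"cov() without numeric_only failed: {type(exc).__name__}")
```

Selecting columns explicitly makes it obvious which columns enter the computation, which can help when a column that looks numerical is actually stored as text.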
Summary
In this tutorial, we explored the DataFrame.cov method in pandas. Key takeaways include:
- Using cov to compute pairwise covariance between numerical columns.
- Requiring a minimum number of shared non-NaN observations per column pair with the min_periods parameter.
- Adjusting degrees of freedom using the ddof parameter.
- Excluding non-numerical columns with the numeric_only parameter.