Pandas DataFrame.describe: Generate Descriptive Statistics
The DataFrame.describe method in pandas generates descriptive statistics for a DataFrame. For numeric columns these are count, mean, standard deviation, min, max, and percentiles. For object columns they are count, unique, top (the most common value), and freq (how often that value occurs).
Syntax
The syntax for DataFrame.describe is:
DataFrame.describe(percentiles=None, include=None, exclude=None)
Here, DataFrame refers to the pandas DataFrame on which the descriptive statistics are generated.
Parameters
Parameter | Description |
---|---|
percentiles | A list of numbers between 0 and 1 that specifies the percentiles to include. Defaults to [.25, .5, .75] if None. |
include | Specifies the data types to include. Use 'all' to include every column, or pass a list of data types (e.g., ["float", "int"]). Defaults to None (only numeric columns). |
exclude | Specifies the data types to exclude. Defaults to None (nothing is excluded). Cannot be combined with include='all', and must not overlap with the types listed in include. |
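
As a rough illustration of how include and exclude interact, the sketch below uses a small made-up DataFrame (the column names and values are assumptions for this example only). It combines the two parameters with non-overlapping types, then shows that pairing exclude with include='all' is rejected.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one integer, one float, and one text column.
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [1.5, 2.5, 3.5],
    'C': ['x', 'y', 'x'],
})

# Keep numeric columns, but drop the float64 ones: only 'A' remains.
ints_only = df.describe(include=[np.number], exclude=['float64'])
print(list(ints_only.columns))

# Combining exclude with include='all' raises a ValueError.
try:
    df.describe(include='all', exclude=[np.number])
    raised = False
except ValueError:
    raised = True
print(raised)
```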
Returns
A DataFrame containing the descriptive statistics, with one row per statistic and one column per described column. For numeric data, the rows are count, mean, standard deviation, min, max, and percentiles. For object data, they are count, unique, top, and freq.
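
Because the result is an ordinary DataFrame, individual statistics can be read with the usual label-based indexing. The following is a minimal sketch; the variable names are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
})

stats = df.describe()

# The result is itself a DataFrame: statistics are the row labels,
# and the described columns are the column labels.
is_dataframe = isinstance(stats, pd.DataFrame)

# Look up single statistics by row label and column label.
mean_a = float(stats.loc['mean', 'A'])
max_b = float(stats.loc['max', 'B'])

print(is_dataframe)
print(mean_a, max_b)
```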
Examples
Descriptive Statistics of a DataFrame
This example demonstrates how to use describe to generate descriptive statistics for the numerical columns in a DataFrame. The text column C is skipped by default.
Python Program
import pandas as pd
# Create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': ['a', 'b', 'a', 'b', 'a']
})
# Generate descriptive statistics
result = df.describe()
print(result)
Output
       A     B
count  5.0   5.0
mean   3.0   30.0
std    1.58  15.81
min    1.0   10.0
25%    2.0   20.0
50%    3.0   30.0
75%    4.0   40.0
max    5.0   50.0
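
The std row is the sample standard deviation (dividing by n - 1 rather than n), which is why column A shows 1.58 and not about 1.41. A small sketch that checks this by hand for column A, with helper names chosen only for this example:

```python
import math
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5]})

described_std = float(df.describe().loc['std', 'A'])

# Sample standard deviation computed by hand (divide by n - 1).
values = [1, 2, 3, 4, 5]
mean = sum(values) / len(values)
sample_std = math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))

print(round(described_std, 2), round(sample_std, 2))
```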
Descriptive Statistics Including All Data Types
This example demonstrates how to include all data types in the descriptive statistics using include='all'.
Python Program
import pandas as pd
# Create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': ['a', 'b', 'a', 'b', 'a']
})
# Generate descriptive statistics for all data types
result = df.describe(include='all')
print(result)
Output
        A     B      C
count   5.0   5.0    5
unique  NaN   NaN    2
top     NaN   NaN    a
freq    NaN   NaN    3
mean    3.0   30.0   NaN
std     1.58  15.81  NaN
min     1.0   10.0   NaN
25%     2.0   20.0   NaN
50%     3.0   30.0   NaN
75%     4.0   40.0   NaN
max     5.0   50.0   NaN
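
The NaN entries mark statistics that do not apply to a column: mean is undefined for the text column, and top is undefined for the numeric ones. As a minimal sketch, the missing cells can be detected with pd.isna, and dropping them from one column leaves only the rows that apply to it.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': ['a', 'b', 'a', 'b', 'a'],
})

stats = df.describe(include='all')

# 'mean' does not apply to the text column; 'top' does not apply to numbers.
mean_c_missing = bool(pd.isna(stats.loc['mean', 'C']))
top_a_missing = bool(pd.isna(stats.loc['top', 'A']))
print(mean_c_missing, top_a_missing)

# Dropping the NaN cells keeps only the statistics relevant to column C.
c_stats = stats['C'].dropna()
print(list(c_stats.index))
```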
Custom Percentiles for Descriptive Statistics
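This example demonstrates how to use the percentiles parameter to include custom percentiles in the output. The output also contains a 50% row because the pandas version used here adds the median automatically; this behaviour may differ between pandas versions.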
Python Program
import pandas as pd
# Create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50]
})
# Generate descriptive statistics with custom percentiles
result = df.describe(percentiles=[.1, .9])
print(result)
Output
       A     B
count  5.0   5.0
mean   3.0   30.0
std    1.58  15.81
min    1.0   10.0
10%    1.4   14.0
50%    3.0   30.0
90%    4.6   46.0
max    5.0   50.0
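
The percentile rows are interpolated between data points, which is how values such as 1.4 arise from integer data. One way to cross-check them, sketched here, is to compare against DataFrame.quantile at the same fractions.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
})

stats = df.describe(percentiles=[.1, .9])

# Each percentile row should agree with DataFrame.quantile at the same fraction.
q10 = df.quantile(0.1)
q90 = df.quantile(0.9)

print(stats.loc['10%', 'A'], q10['A'])
print(stats.loc['90%', 'B'], q90['B'])
```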
Summary
In this tutorial, we explored the DataFrame.describe method in pandas. Key takeaways include:
- Using describe to compute descriptive statistics for numerical columns by default.
- Including all data types with the include='all' argument.
- Customizing the output by specifying percentiles with the percentiles parameter.
- Excluding specific data types using the exclude parameter.
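
The exclude parameter is not covered by the examples above, so here is a minimal sketch of it. It reuses the tutorial's DataFrame and excludes every numeric column with np.number, which leaves only the text column C.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': ['a', 'b', 'a', 'b', 'a'],
})

# Exclude every numeric column, leaving only the text column 'C'.
result = df.describe(exclude=[np.number])
print(result)
```

Since only a non-numeric column remains, the result has the count, unique, top, and freq rows rather than the numeric statistics.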