Pandas DataFrame.describe: Generate Descriptive Statistics
The DataFrame.describe method in pandas generates descriptive statistics that summarize each column of a DataFrame. For numeric columns it reports the count, mean, standard deviation, minimum, maximum, and percentiles. For object (for example, string) columns it reports the count, the number of unique values, the most frequent value (top), and how often that value occurs (freq).
Syntax
The syntax for DataFrame.describe is:
```python
DataFrame.describe(percentiles=None, include=None, exclude=None)
```

Here, DataFrame refers to the pandas DataFrame whose columns are summarized.
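Because every parameter defaults to None, calling describe() with no arguments should produce the same result as spelling out the defaults. The sketch below checks this on a small, made-up DataFrame; the names df, implicit, and explicit are illustrative only.

```python
import pandas as pd

# Illustrative data: two numeric columns
df = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": [10, 20, 30, 40, 50]})

# No arguments: pandas falls back to its defaults
implicit = df.describe()

# The same defaults written out explicitly
explicit = df.describe(percentiles=[0.25, 0.5, 0.75], include=None, exclude=None)

# DataFrame.equals compares values, dtypes, and row/column labels
print(implicit.equals(explicit))  # True
print(implicit.index.tolist())
```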
Parameters
| Parameter | Description |
|---|---|
| percentiles | A list of numbers between 0 and 1 giving the percentiles to include. If None, defaults to [.25, .5, .75]. |
| include | The data types to include. Use 'all' to summarize every column, or pass a list of dtypes (e.g., ['number'] or ['object']). If None (the default), only numeric columns are summarized when the DataFrame has any. |
| exclude | A list of data types to omit from the result. Defaults to None (nothing is excluded). It can be combined with a list passed to include, but not with include='all'. |
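The sketch below contrasts include and exclude on a small, made-up DataFrame. The column dtypes are set explicitly (int64, float64, object) so the selections are predictable; the variable names are illustrative only.

```python
import pandas as pd

# Illustrative data with one integer, one float, and one object column
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5],
    "B": [10.0, 20.0, 30.0, 40.0, 50.0],
    "C": ["a", "b", "a", "b", "a"],
}).astype({"A": "int64", "B": "float64", "C": "object"})

# include: keep only object columns, giving count/unique/top/freq rows
objects_only = df.describe(include=["object"])
print(objects_only)

# exclude: drop object columns, giving a numeric summary of A and B
no_objects = df.describe(exclude=["object"])
print(no_objects)

# A list-valued include can be combined with exclude:
# "all numeric columns except int64" leaves only B
floats_only = df.describe(include=["number"], exclude=["int64"])
print(floats_only.columns.tolist())

# exclude is rejected when include='all'
try:
    df.describe(include="all", exclude=["object"])
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)
```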
Returns
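A DataFrame of summary statistics, with one row per statistic and one column per summarized input column. For numeric columns the rows are count, mean, std (standard deviation), min, the requested percentiles, and max. For object columns they are count, unique, top (the most frequent value), and freq (how often the top value occurs).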
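Since the return value is an ordinary DataFrame, individual statistics can be read with .loc, and the percentile rows should agree with DataFrame.quantile, which uses linear interpolation by default. A minimal sketch on made-up data (the names stats and q25 are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": [10, 20, 30, 40, 50]})
stats = df.describe()

# Rows are statistic names, columns are the original column names
print(stats.loc["mean", "B"])   # mean of column B
print(stats.loc["std", "A"])    # sample standard deviation (ddof=1)

# The 25% row should match DataFrame.quantile at 0.25
q25 = df.quantile(0.25)
print((stats.loc["25%"] == q25).all())

# Transposing gives one row per original column
print(stats.T[["mean", "min", "max"]])
```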
Examples
Descriptive Statistics of a DataFrame
This example calls describe() with no arguments. Column C holds strings, so only the numeric columns A and B are summarized.
Python Program
```python
import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': ['a', 'b', 'a', 'b', 'a']
})

# Generate descriptive statistics
result = df.describe()
print(result)
```

Output
```
              A          B
count  5.000000   5.000000
mean   3.000000  30.000000
std    1.581139  15.811388
min    1.000000  10.000000
25%    2.000000  20.000000
50%    3.000000  30.000000
75%    4.000000  40.000000
max    5.000000  50.000000
```

Descriptive Statistics Including All Data Types
This example passes include='all' so that every column, including the string column C, is summarized. Statistics that do not apply to a column's type are shown as NaN.
Python Program
```python
import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': ['a', 'b', 'a', 'b', 'a']
})

# Generate descriptive statistics for all data types
result = df.describe(include='all')
print(result)
```

Output
```
               A          B    C
count   5.000000   5.000000    5
unique       NaN        NaN    2
top          NaN        NaN    a
freq         NaN        NaN    3
mean    3.000000  30.000000  NaN
std     1.581139  15.811388  NaN
min     1.000000  10.000000  NaN
25%     2.000000  20.000000  NaN
50%     3.000000  30.000000  NaN
75%     4.000000  40.000000  NaN
max     5.000000  50.000000  NaN
```

Custom Percentiles for Descriptive Statistics
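This example uses the percentiles parameter to request the 10th and 90th percentiles instead of the quartiles. As the output shows, the 50% row (the median) is still included even though only .1 and .9 were requested.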
Python Program
```python
import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50]
})

# Generate descriptive statistics with custom percentiles
result = df.describe(percentiles=[.1, .9])
print(result)
```

Output
```
              A          B
count  5.000000   5.000000
mean   3.000000  30.000000
std    1.581139  15.811388
min    1.000000  10.000000
10%    1.400000  14.000000
50%    3.000000  30.000000
90%    4.600000  46.000000
max    5.000000  50.000000
```

Summary
In this tutorial, we explored the DataFrame.describe method in pandas. Key takeaways include:
- Using describe to compute descriptive statistics for numerical columns by default.
- Including all data types with the include='all' parameter.
- Customizing the output by specifying percentiles with the percentiles parameter.
- Excluding specific data types using the exclude parameter.