Pandas DataFrame.groupby: Group Data in a DataFrame
The DataFrame.groupby method in pandas is used to group data in a DataFrame based on one or more columns or criteria. It allows you to split the data into groups, apply functions to each group, and combine the results.
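To make the split, apply, and combine steps concrete, here is a minimal sketch that performs them by hand and then compares the result with a single groupby call. The sample data and the variable names (pieces, sums, manual) are chosen for illustration only; this is not how pandas implements grouping internally.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Values': [10, 20, 30, 40, 50]
})

# Split: collect the rows that belong to each distinct 'Category' value
pieces = {key: df[df['Category'] == key] for key in df['Category'].unique()}

# Apply: compute one statistic for each piece
sums = {key: int(piece['Values'].sum()) for key, piece in pieces.items()}

# Combine: assemble the per-group results into a single Series
manual = pd.Series(sums, name='Values').sort_index()

# groupby performs the same three steps in one call
grouped = df.groupby('Category')['Values'].sum()

print(manual.to_dict())
print(grouped.to_dict())
```

Both print statements should show the same per-category totals, which is the point of the comparison: groupby is a compact way to express the three manual steps.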
Syntax
The syntax for DataFrame.groupby is:
DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, observed=False, dropna=True)
Here, DataFrame refers to the pandas DataFrame being grouped.
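The by argument accepts more than column names. As a rough sketch, the example below groups by a function and by a dictionary, both of which are applied to the index labels of the DataFrame. The fruit labels and the label_to_group mapping are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {'Values': [10, 20, 30, 40, 50]},
    index=['apple', 'avocado', 'banana', 'blueberry', 'cherry']
)

# A function passed as `by` is called on each index label;
# here rows are grouped by the first letter of the label.
by_function = df.groupby(lambda label: label[0]).sum()

# A dictionary passed as `by` maps index labels to group names
# (hypothetical mapping, for illustration only).
label_to_group = {
    'apple': 'green', 'avocado': 'green',
    'banana': 'yellow', 'blueberry': 'blue', 'cherry': 'red'
}
by_mapping = df.groupby(label_to_group).sum()

print(by_function)
print(by_mapping)
```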
Parameters
| Parameter | Description |
|---|---|
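| by | Specifies the criteria for grouping. This can be a column name, a list of column names, a function, or a dictionary. |
| axis | Specifies the axis along which the grouping is performed. Use 0 or 'index' to group rows, and 1 or 'columns' to group columns. Defaults to 0. Deprecated since pandas 2.1; to group columns, transpose the DataFrame and group its rows instead. |
| level | Specifies the level(s) of a MultiIndex to group by. Used when grouping by index levels. |
| as_index | If True, the group labels become the index of the aggregated result. If False, they are returned as regular columns. Defaults to True. |
| sort | If True, the group keys are sorted. If False, groups appear in order of first occurrence. Defaults to True. |
| group_keys | If True, group keys are added to the index of the result when calling apply, to identify which group each piece came from. Defaults to True. |
| observed | Applies only to categorical groupers. If True, only observed category values are used. If False, all categories are shown, including unobserved ones. Defaults to False in the signature above; pandas 2.1 deprecated that default and announced a change to True in a future version. |
| dropna | If True, rows whose group key is missing (NaN) are dropped. If False, missing keys form their own group. Defaults to True. |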
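The effect of as_index, sort, and dropna is easiest to see side by side. The following is a small sketch on made-up data that includes one missing group key; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['B', 'A', None, 'B', 'A'],
    'Values': [10, 20, 30, 40, 50]
})

# Defaults: keys become the index, keys are sorted,
# and the row with a missing key is dropped
default = df.groupby('Category').sum()

# as_index=False keeps 'Category' as a regular column
flat = df.groupby('Category', as_index=False).sum()

# sort=False keeps groups in order of first appearance ('B' before 'A')
unsorted = df.groupby('Category', sort=False).sum()

# dropna=False keeps the missing key as its own group
with_missing = df.groupby('Category', dropna=False).sum()

print(default)
print(flat)
print(unsorted)
print(with_missing)
```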
Returns
A DataFrameGroupBy object, which can be used to apply aggregation, transformation, or filtering operations to the grouped data.
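As a brief sketch of what the returned object supports, the example below iterates over the groups and then runs one aggregation, one transformation, and one filter. The data and variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Values': [10, 20, 30, 40, 50]
})

grouped = df.groupby('Category')

# The object describes the grouping; results are produced when a method is called
print(type(grouped).__name__)

# Iterating yields (key, sub-DataFrame) pairs
for key, group in grouped:
    print(key, len(group))

# Aggregation: one value per group
totals = grouped['Values'].sum()

# Transformation: same length as the original, aligned to its index
group_sums = grouped['Values'].transform('sum')

# Filtering: keeps or drops whole groups (here, groups with more than 2 rows)
large = grouped.filter(lambda g: len(g) > 2)

print(totals)
print(group_sums)
print(large)
```

Aggregation shrinks the data to one row per group, transformation keeps the original shape, and filtering returns a subset of the original rows.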
Examples
Grouping a DataFrame by a Single Column
This example demonstrates how to use groupby to group a DataFrame by a single column and calculate the mean of each group.
Python Program
import pandas as pd
# Create a DataFrame
df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Values': [10, 20, 30, 40, 50]
})
# Group by the 'Category' column and calculate the mean
result = df.groupby('Category').mean()
print(result)
Output
          Values
Category
A           30.0
B           30.0
Grouping a DataFrame by Multiple Columns
This example shows how to use groupby to group a DataFrame by multiple columns and calculate the sum of each group.
Python Program
import pandas as pd
# Create a DataFrame
df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Subcategory': ['X', 'X', 'Y', 'Y', 'X'],
    'Values': [10, 20, 30, 40, 50]
})
# Group by 'Category' and 'Subcategory' columns and calculate the sum
result = df.groupby(['Category', 'Subcategory']).sum()
print(result)
Output
                      Values
Category Subcategory
A        X                60
         Y                30
B        X                20
         Y                40
Grouping a DataFrame and Applying Multiple Aggregations
This example demonstrates how to use groupby to group a DataFrame and apply multiple aggregation functions (e.g., sum and mean) to each group.
Python Program
import pandas as pd
# Create a DataFrame
df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Values': [10, 20, 30, 40, 50]
})
# Group by the 'Category' column and apply multiple aggregations
result = df.groupby('Category').agg(['sum', 'mean'])
print(result)
Output
         Values
            sum  mean
Category
A            90  30.0
B            60  30.0
Grouping a DataFrame and Filtering Groups
This example shows how to use groupby to keep only the groups that satisfy a condition, here groups whose Values sum is greater than 60. Group A sums to 90 and is kept; group B sums to 60 and is dropped.
Python Program
import pandas as pd
# Create a DataFrame
df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Values': [10, 20, 30, 40, 50]
})
# Group by the 'Category' column and keep groups whose sum exceeds 60
result = df.groupby('Category').filter(lambda x: x['Values'].sum() > 60)
print(result)
Output
  Category  Values
0        A      10
2        A      30
4        A      50
Summary
In this tutorial, we explored the DataFrame.groupby method in pandas. Key takeaways include:
- Using groupby to group data in a DataFrame by one or more columns.
- Applying aggregation functions (e.g., sum, mean) to grouped data.
- Filtering groups based on specific conditions.
- Understanding the parameters of groupby, such as by, axis, and dropna.