Pandas DataFrame.groupby: Group Data in a DataFrame

Pandas DataFrame.groupby

The DataFrame.groupby method in pandas is used to group data in a DataFrame based on one or more columns or criteria. It allows you to split the data into groups, apply functions to each group, and combine the results.
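The split, apply, and combine steps described above can be sketched by hand to show what groupby does in a single call. This is only an illustration under assumed sample data; the names `pieces`, `means`, and `manual` are made up for this sketch and are not part of pandas.

```python
import pandas as pd

# Sample data assumed for illustration
df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Values': [10, 20, 30, 40, 50]
})

# Split: collect the rows that belong to each distinct key
pieces = {key: df[df['Category'] == key] for key in df['Category'].unique()}

# Apply: compute one statistic per piece
means = {key: piece['Values'].mean() for key, piece in pieces.items()}

# Combine: assemble the per-group results into a single Series
manual = pd.Series(means, name='Values').sort_index()

# The same three steps expressed as one groupby call
grouped = df.groupby('Category')['Values'].mean()

print(manual)
print(grouped)
```

Both printed Series should hold the same per-group means; groupby simply performs the three steps without the intermediate dictionaries.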

Syntax

The syntax for DataFrame.groupby is:

DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, observed=False, dropna=True)

Here, DataFrame refers to the pandas DataFrame being grouped.

Parameters

Parameter	Description
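`by`	Specifies the criteria for grouping. This can be a column label, a list of column labels, a function (called on each index value), or a dictionary or Series (whose values assign each index label to a group).
`axis`	Specifies the axis along which the grouping is performed. Use `0` or `'index'` to split the rows into groups, and `1` or `'columns'` to split the columns into groups. Defaults to `0`. This parameter has been deprecated since pandas 2.1.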
`level`	Specifies the level(s) of a MultiIndex to group by. Used when grouping by index levels.
`as_index`	If `True`, the group keys become the index of the aggregated result. If `False`, they are returned as regular columns. Defaults to `True`.
`sort`	If `True`, the group keys are sorted. Defaults to `True`.
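`group_keys`	Applies when calling `apply` on the grouped object. If `True`, the group keys are added to the index of the result to identify which group each piece came from. Defaults to `True`.
`observed`	Applies only when a grouper is categorical. If `True`, only category values that actually occur in the data appear in the result; if `False`, every category appears, including unobserved ones. Defaults to `False` in this signature; that default has been deprecated since pandas 2.1 in favor of `True`.
`dropna`	If `True`, rows whose group key is missing (NA) are dropped. If `False`, missing keys form their own group. Defaults to `True`.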
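To make a few of these parameters concrete, here is a minimal sketch of how `as_index`, `sort`, and `dropna` change the result. The sample data, including the missing key, is assumed for illustration.

```python
import numpy as np
import pandas as pd

# Sample data with one missing group key (assumed for illustration)
df = pd.DataFrame({
    'Category': ['B', 'A', np.nan, 'B', 'A'],
    'Values': [10, 20, 30, 40, 50]
})

# as_index=False keeps 'Category' as a regular column instead of the index
flat = df.groupby('Category', as_index=False).sum()

# sort=False keeps groups in order of first appearance ('B' before 'A')
unsorted = df.groupby('Category', sort=False).sum()

# dropna=False keeps the rows with a missing key as a group of their own
with_nan = df.groupby('Category', dropna=False).sum()

print(flat)
print(unsorted)
print(with_nan)
```

With the defaults, the row whose key is missing is silently left out of the result, which is worth checking for when group totals do not add up to the overall total.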

Returns

A DataFrameGroupBy object, which can be used to apply aggregation, transformation, or filtering operations to the grouped data. No result is computed until one of these operations is called on it.
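The returned object can be inspected before any aggregation is applied. The following sketch, using assumed sample data, shows a few ways to look inside it.

```python
import pandas as pd

# Sample data assumed for illustration
df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Values': [10, 20, 30, 40, 50]
})

grouped = df.groupby('Category')

# The result is a grouping object, not a DataFrame
print(type(grouped).__name__)

# Number of groups, and the row labels that belong to each group
print(grouped.ngroups)
print(grouped.groups)

# Pull out a single group as an ordinary DataFrame
group_a = grouped.get_group('A')
print(group_a)

# Iterate over (key, sub-DataFrame) pairs
sizes = {key: len(piece) for key, piece in grouped}
print(sizes)
```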

Examples

Grouping a DataFrame by a Single Column

This example demonstrates how to use groupby to group a DataFrame by a single column and calculate the mean of each group.

Python Program

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Values': [10, 20, 30, 40, 50]
})

# Group by the 'Category' column and calculate the mean
result = df.groupby('Category').mean()
print(result)

Output

          Values
Category        
A           30.0
B           30.0
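When a DataFrame also contains non-numeric columns, it is often useful to select the column to aggregate before calling mean. The sketch below assumes an extra text column named `Label`, which is hypothetical and added only for illustration.

```python
import pandas as pd

# 'Label' is a hypothetical text column added for illustration
df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Label': ['p', 'q', 'r', 's', 't'],
    'Values': [10, 20, 30, 40, 50]
})

# Selecting one column with single brackets returns a Series
series_result = df.groupby('Category')['Values'].mean()

# Selecting with a list of columns returns a DataFrame
frame_result = df.groupby('Category')[['Values']].mean()

# numeric_only=True skips columns that cannot be averaged
numeric_result = df.groupby('Category').mean(numeric_only=True)

print(series_result)
print(frame_result)
print(numeric_result)
```

In recent pandas versions, calling mean on the grouped object without selecting a column or passing `numeric_only=True` raises an error here, because `Label` contains text.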

Grouping a DataFrame by Multiple Columns

This example shows how to use groupby to group a DataFrame by multiple columns and calculate the sum of each group.

Python Program

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Subcategory': ['X', 'X', 'Y', 'Y', 'X'],
    'Values': [10, 20, 30, 40, 50]
})

# Group by 'Category' and 'Subcategory' columns and calculate the sum
result = df.groupby(['Category', 'Subcategory']).sum()
print(result)

Output

                      Values
Category Subcategory        
A        X                60
         Y                30
B        X                20
         Y                40
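Grouping by several columns produces a result indexed by a MultiIndex. As a brief sketch, the result can be flattened back into columns, pivoted into a wide table, or queried by a key tuple. The variable names `flat`, `wide`, and `a_x_total` are chosen for this illustration.

```python
import pandas as pd

# Same sample data as in the example above
df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Subcategory': ['X', 'X', 'Y', 'Y', 'X'],
    'Values': [10, 20, 30, 40, 50]
})

result = df.groupby(['Category', 'Subcategory']).sum()

# Turn the index levels back into ordinary columns
flat = result.reset_index()

# Move 'Subcategory' from the row index to the columns
wide = result['Values'].unstack('Subcategory')

# Look up a single group's value with a tuple of keys
a_x_total = result.loc[('A', 'X'), 'Values']

print(flat)
print(wide)
print(a_x_total)
```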

Grouping a DataFrame and Applying Multiple Aggregations

This example demonstrates how to use groupby to group a DataFrame and apply multiple aggregation functions (e.g., sum and mean) to each group.

Python Program

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Values': [10, 20, 30, 40, 50]
})

# Group by the 'Category' column and apply multiple aggregations
result = df.groupby('Category').agg(['sum', 'mean'])
print(result)

Output

          Values       
             sum  mean
Category              
A             90  30.0
B             60  30.0
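Passing a list of functions to agg yields two-level column labels. If flat, descriptive column names are preferred, named aggregation is one alternative. The output names `total` and `average` below are chosen for this sketch.

```python
import pandas as pd

# Same sample data as in the example above
df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Values': [10, 20, 30, 40, 50]
})

# Named aggregation: output_name=(input_column, function)
named = df.groupby('Category').agg(
    total=('Values', 'sum'),
    average=('Values', 'mean'),
)

# A dictionary selects which functions to run on which column
per_column = df.groupby('Category').agg({'Values': ['min', 'max']})

print(named)
print(per_column)
```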

Grouping a DataFrame and Filtering Groups

This example shows how to use groupby with filter to keep only the groups that satisfy a condition, here groups whose Values sum is greater than 60. Group A sums to 90 and group B sums to 60, so only the rows of group A are kept.

Python Program

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Values': [10, 20, 30, 40, 50]
})

# Group by the 'Category' column and keep groups whose sum exceeds 60
result = df.groupby('Category').filter(lambda x: x['Values'].sum() > 60)
print(result)

Output

  Category  Values
0        A      10
2        A      30
4        A      50
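A filter on a group statistic can also be written with transform, which broadcasts each group's statistic back onto its rows so that it can be used as a boolean mask. This is a sketch of an equivalent approach; the threshold of 60 is an assumption for this example.

```python
import pandas as pd

# Sample data assumed for illustration
df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Values': [10, 20, 30, 40, 50]
})

# Each row receives the sum of its own group (A: 90, B: 60)
group_sum = df.groupby('Category')['Values'].transform('sum')

# Keep the rows whose group sum exceeds the threshold
mask_result = df[group_sum > 60]

# The same selection expressed with filter
filter_result = df.groupby('Category').filter(lambda g: g['Values'].sum() > 60)

print(mask_result)
print(filter_result)
```

The transform version avoids calling a Python function once per group, which may matter when there are many groups.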

Summary

In this tutorial, we explored the DataFrame.groupby method in pandas. Key takeaways include:

Using groupby to group data in a DataFrame by one or more columns.
Applying aggregation functions (e.g., sum, mean) to grouped data.
Filtering groups based on specific conditions.
Understanding the parameters of groupby, such as by, axis, and dropna.