Pandas DataFrame.select_dtypes
The DataFrame.select_dtypes method in pandas selects columns of a DataFrame based on their data types. You filter columns by specifying which data types to include or exclude.
Syntax
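The syntax for DataFrame.select_dtypes is:
DataFrame.select_dtypes(include=None, exclude=None)
Here, DataFrame refers to the pandas DataFrame whose columns are being filtered. The method returns a new DataFrame containing only the matching columns. At least one of include or exclude must be supplied; otherwise a ValueError is raised.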
Parameters
Parameter | Description |
---|---|
include | Scalar or list-like data types to include. Can be a single data type (e.g., 'int') or a list (e.g., ['int', 'float']). |
exclude | Scalar or list-like data types to exclude. Can be a single data type or a list of data types. |
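The two parameters can also be used together. The following is a minimal sketch, using a small made-up DataFrame (the column names count, ratio, and flag are assumptions for illustration only), that keeps numeric columns while dropping floats, and shows what happens when neither parameter is given.

```python
import pandas as pd

# Hypothetical DataFrame used only for this illustration
df = pd.DataFrame({
    'count': [1, 2, 3],            # integer column
    'ratio': [0.1, 0.2, 0.3],      # float column
    'flag': [True, False, True],   # boolean column
})

# Combine include and exclude: keep numeric columns, but drop floats
ints_only = df.select_dtypes(include='number', exclude='float')
print(list(ints_only.columns))

# Calling the method with neither argument raises a ValueError
try:
    df.select_dtypes()
    raised = False
except ValueError:
    raised = True
print("ValueError raised:", raised)
```

Note that the boolean column is not treated as numeric here, so only the integer column remains.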
Examples
Selecting Numeric Columns
To select numeric columns in a DataFrame, use the include parameter with a data type such as 'number'.
Python Program
import pandas as pd
# Create a DataFrame
data = {
    'Name': ['Arjun', 'Ram', 'Suresh'],
    'Age': [25, 30, 35],
    'Salary': [70000.5, 80000.0, 90000.0],
    'JoiningDate': pd.to_datetime(['2022-01-01', '2021-05-12', '2020-08-15'])
}
df = pd.DataFrame(data)
# Select numeric columns
numeric_columns = df.select_dtypes(include=['number'])
print("Numeric Columns:")
print(numeric_columns)
Output
Numeric Columns:
   Age   Salary
0   25  70000.5
1   30  80000.0
2   35  90000.0
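A common follow-up is to collect the names of the numeric columns and reuse them elsewhere. The sketch below is one possible way to do this with the same sample data; the variable names numeric_names and means are chosen here for illustration.

```python
import pandas as pd

# Same sample data as in the example above (without the date column)
df = pd.DataFrame({
    'Name': ['Arjun', 'Ram', 'Suresh'],
    'Age': [25, 30, 35],
    'Salary': [70000.5, 80000.0, 90000.0],
})

# Get just the names of the numeric columns as a plain Python list
numeric_names = df.select_dtypes(include='number').columns.tolist()
print(numeric_names)

# Use the names to compute a per-column mean on the original DataFrame
means = df[numeric_names].mean()
print(means)
```

Working with the list of names, rather than the filtered DataFrame, is handy when the same columns need to be selected again later, for example after adding new rows.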
Excluding Object Columns
To exclude object (string) columns, use the exclude parameter with 'object'.
Python Program
import pandas as pd
# Create a DataFrame
data = {
    'Name': ['Arjun', 'Ram', 'Suresh'],
    'Age': [25, 30, 35],
    'Salary': [70000.5, 80000.0, 90000.0],
    'JoiningDate': pd.to_datetime(['2022-01-01', '2021-05-12', '2020-08-15'])
}
df = pd.DataFrame(data)
# Exclude object columns
non_object_columns = df.select_dtypes(exclude=['object'])
print("Non-Object Columns:")
print(non_object_columns)
Output
Non-Object Columns:
   Age   Salary JoiningDate
0   25  70000.5  2022-01-01
1   30  80000.0  2021-05-12
2   35  90000.0  2020-08-15
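The exclude parameter also accepts a list. As a rough sketch with the same sample data, the program below drops both numeric and datetime columns and keeps whatever is left. It also prints df.dtypes first, since the dtype reported for text columns can depend on the pandas version and settings, so it is worth checking before choosing what to exclude.

```python
import pandas as pd

# Same sample data as in the example above
df = pd.DataFrame({
    'Name': ['Arjun', 'Ram', 'Suresh'],
    'Age': [25, 30, 35],
    'Salary': [70000.5, 80000.0, 90000.0],
    'JoiningDate': pd.to_datetime(['2022-01-01', '2021-05-12', '2020-08-15']),
})

# Inspect the dtype of every column before filtering
print(df.dtypes)

# Exclude several data types at once: numeric and datetime columns
remaining = df.select_dtypes(exclude=['number', 'datetime'])
print(list(remaining.columns))
```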
Selecting Multiple Data Types
To select columns with multiple data types, pass a list to the include parameter.
Python Program
import pandas as pd
# Create a DataFrame
data = {
    'Name': ['Arjun', 'Ram', 'Suresh'],
    'Age': [25, 30, 35],
    'Salary': [70000.5, 80000.0, 90000.0],
    'JoiningDate': pd.to_datetime(['2022-01-01', '2021-05-12', '2020-08-15'])
}
df = pd.DataFrame(data)
# Select columns with numeric and datetime data types
selected_columns = df.select_dtypes(include=['number', 'datetime'])
print("Selected Columns (Numeric and Datetime):")
print(selected_columns)
Output
Selected Columns (Numeric and Datetime):
   Age   Salary JoiningDate
0   25  70000.5  2022-01-01
1   30  80000.0  2021-05-12
2   35  90000.0  2020-08-15
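The same list form works for other data types, such as 'bool' and 'category'. The following sketch uses a made-up DataFrame (the column names city, active, and score are assumptions for illustration) to select categorical and boolean columns together.

```python
import pandas as pd

# Hypothetical DataFrame with a categorical, a boolean, and a float column
df = pd.DataFrame({
    'city': pd.Categorical(['Pune', 'Delhi', 'Pune']),
    'active': [True, False, True],
    'score': [1.5, 2.5, 3.5],
})

# Select categorical and boolean columns in one call
picked = df.select_dtypes(include=['category', 'bool'])
print(list(picked.columns))
```

The selected columns keep the order they have in the original DataFrame, regardless of the order of the types in the list.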
Summary
In this tutorial, we explored the DataFrame.select_dtypes method in pandas. Key takeaways include:
- Using include to filter columns by data type
- Using exclude to exclude specific data types
- Selecting multiple data types at once
The DataFrame.select_dtypes method is an efficient way to filter DataFrame columns based on data types for further analysis.