Pandas DataFrame.select_dtypes


The DataFrame.select_dtypes method in pandas returns a subset of a DataFrame's columns based on their data types. You specify which data types to include, which to exclude, or both.


Syntax

The syntax for DataFrame.select_dtypes is:

DataFrame.select_dtypes(include=None, exclude=None)

Here, DataFrame refers to the pandas DataFrame whose columns are being filtered.
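As a minimal sketch of the call shape (the column names and variable names below are illustrative assumptions, not part of this tutorial's examples): the method returns a new DataFrame holding the matching columns and leaves the original unchanged, and pandas raises a ValueError when neither include nor exclude is given.

```python
import pandas as pd

# Illustrative data; the column names are assumptions for this sketch
df = pd.DataFrame({'label': ['a', 'b'], 'count': [1, 2]})

# select_dtypes returns a new DataFrame; df itself is not modified
numeric_only = df.select_dtypes(include='number')
print(list(numeric_only.columns))
print(list(df.columns))

# Calling with neither include nor exclude raises ValueError
try:
    df.select_dtypes()
    raised = False
except ValueError:
    raised = True
print(raised)
```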


Parameters

Parameter   Description
include     Scalar or list-like data types to include. Can be a single data type (e.g., 'int') or a list (e.g., ['int', 'float']).
exclude     Scalar or list-like data types to exclude. Can be a single data type or a list of data types.
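The following sketch illustrates the forms these parameters accept, assuming a small DataFrame similar to the one used in the examples on this page. A scalar string, a one-element list, and a NumPy type such as np.number should all select the same columns, and include and exclude can be passed together as long as they do not overlap. The variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Small illustrative DataFrame (an assumption for this sketch)
df = pd.DataFrame({
    'Name': ['Arjun', 'Ram', 'Suresh'],
    'Age': [25, 30, 35],
    'Salary': [70000.5, 80000.0, 90000.0],
})

# Three equivalent ways to ask for numeric columns
by_scalar = df.select_dtypes(include='number')
by_list = df.select_dtypes(include=['number'])
by_numpy = df.select_dtypes(include=np.number)
print(list(by_scalar.columns), list(by_list.columns), list(by_numpy.columns))

# include and exclude combined: numeric columns, minus float64 ones
ints_only = df.select_dtypes(include='number', exclude='float64')
print(list(ints_only.columns))
```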

Examples

Selecting Numeric Columns

To select numeric columns in a DataFrame, pass 'number' to the include parameter. It matches both integer and floating-point columns.

Python Program

import pandas as pd

# Create a DataFrame
data = {
    'Name': ['Arjun', 'Ram', 'Suresh'],
    'Age': [25, 30, 35],
    'Salary': [70000.5, 80000.0, 90000.0],
    'JoiningDate': pd.to_datetime(['2022-01-01', '2021-05-12', '2020-08-15'])
}
df = pd.DataFrame(data)

# Select numeric columns
numeric_columns = df.select_dtypes(include=['number'])
print("Numeric Columns:")
print(numeric_columns)

Output

Numeric Columns:
   Age   Salary
0   25  70000.5
1   30  80000.0
2   35  90000.0

Excluding Object Columns

To exclude object columns (the dtype pandas has traditionally used to store strings), use the exclude parameter with 'object'.

Python Program

import pandas as pd

# Create a DataFrame
data = {
    'Name': ['Arjun', 'Ram', 'Suresh'],
    'Age': [25, 30, 35],
    'Salary': [70000.5, 80000.0, 90000.0],
    'JoiningDate': pd.to_datetime(['2022-01-01', '2021-05-12', '2020-08-15'])
}
df = pd.DataFrame(data)

# Exclude object columns
non_object_columns = df.select_dtypes(exclude=['object'])
print("Non-Object Columns:")
print(non_object_columns)

Output

Non-Object Columns:
   Age   Salary JoiningDate
0   25  70000.5 2022-01-01
1   30  80000.0 2021-05-12
2   35  90000.0 2020-08-15
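Which columns count as 'object' depends on the dtypes pandas inferred when the DataFrame was built, and the dtype chosen for string columns can differ between pandas versions. A quick way to check before selecting is to inspect df.dtypes. This is a sketch using the same data as above; the dtype_names variable is a hypothetical helper name, and no output is shown because the string column's dtype may vary.

```python
import pandas as pd

data = {
    'Name': ['Arjun', 'Ram', 'Suresh'],
    'Age': [25, 30, 35],
    'Salary': [70000.5, 80000.0, 90000.0],
    'JoiningDate': pd.to_datetime(['2022-01-01', '2021-05-12', '2020-08-15'])
}
df = pd.DataFrame(data)

# Map each column name to the string name of its dtype
dtype_names = df.dtypes.astype(str).to_dict()
for column, dtype_name in dtype_names.items():
    print(column, dtype_name)
```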

Selecting Multiple Data Types

To select columns with multiple data types, pass a list to the include parameter.

Python Program

import pandas as pd

# Create a DataFrame
data = {
    'Name': ['Arjun', 'Ram', 'Suresh'],
    'Age': [25, 30, 35],
    'Salary': [70000.5, 80000.0, 90000.0],
    'JoiningDate': pd.to_datetime(['2022-01-01', '2021-05-12', '2020-08-15'])
}
df = pd.DataFrame(data)

# Select columns with numeric and datetime data types
selected_columns = df.select_dtypes(include=['number', 'datetime'])
print("Selected Columns (Numeric and Datetime):")
print(selected_columns)

Output

Selected Columns (Numeric and Datetime):
   Age   Salary JoiningDate
0   25  70000.5 2022-01-01
1   30  80000.0 2021-05-12
2   35  90000.0 2020-08-15

Summary

In this tutorial, we explored the DataFrame.select_dtypes method in pandas. Key takeaways include:

  • Using include to filter columns by data type
  • Using exclude to drop columns of specific data types
  • Selecting multiple data types at once

The DataFrame.select_dtypes method is a concise way to filter DataFrame columns by data type before further analysis.
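As one possible follow-up step, the selected column names can be collected and reused in later computations. The sketch below assumes the same sample data as the examples above and computes the mean of each numeric column; numeric_cols and means are hypothetical variable names.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Arjun', 'Ram', 'Suresh'],
    'Age': [25, 30, 35],
    'Salary': [70000.5, 80000.0, 90000.0],
})

# Collect the names of the numeric columns
numeric_cols = df.select_dtypes(include='number').columns.tolist()
print(numeric_cols)

# Reuse the names to compute per-column means
means = df[numeric_cols].mean()
print(means)
```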

