Pandas DataFrame.convert_dtypes

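The DataFrame.convert_dtypes method in pandas converts each column of a DataFrame to the best possible dtype that supports pd.NA, the pandas marker for missing values. Integer, floating-point, boolean, and string data are moved to nullable extension dtypes (Int64, Float64, boolean, string), so a column with missing values keeps its logical type instead of falling back to float64 or object. The method returns a new DataFrame and leaves the original unchanged.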

Syntax

The syntax for DataFrame.convert_dtypes is:

DataFrame.convert_dtypes(
    infer_objects=True,
    convert_string=True,
    convert_integer=True,
    convert_boolean=True,
    convert_floating=True,
    dtype_backend='numpy_nullable'
)

Here, DataFrame refers to the pandas DataFrame being converted to optimal dtypes.

Parameters

Parameter	Description
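`infer_objects`	If `True`, first tries to infer a more specific dtype for object columns. Defaults to `True`.
`convert_string`	If `True`, converts object columns that hold text to the `string` dtype (`StringDtype`). Defaults to `True`.
`convert_integer`	If `True`, converts columns to a nullable integer dtype such as `Int64` where possible, including float columns whose values are all whole numbers. Defaults to `True`.
`convert_boolean`	If `True`, converts boolean data, including object columns that contain only `True`, `False`, and missing values, to the nullable `boolean` dtype. Defaults to `True`.
`convert_floating`	If `True`, converts floating-point columns to a nullable floating dtype such as `Float64`. When `convert_integer` is also `True`, whole-number floats become integers instead. Defaults to `True`.
`dtype_backend`	Storage backend for the resulting dtypes: `'numpy_nullable'` (default) or `'pyarrow'` for PyArrow-backed `ArrowDtype` columns. Available from pandas 2.0.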
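
The interplay between convert_integer and convert_floating is easiest to see on a single column. The following is a minimal sketch, assuming pandas 1.2 or later (where convert_floating is available); the column name value and the helper dtype_after are hypothetical names chosen for illustration.

```python
import pandas as pd

# Whole-number floats with one missing value; pandas stores this column as float64.
df = pd.DataFrame({"value": [1.0, 2.0, None]})


def dtype_after(**options):
    """Return the dtype name of 'value' after convert_dtypes(**options)."""
    return str(df.convert_dtypes(**options)["value"].dtype)


# Defaults: integer conversion is preferred for whole-number floats.
default_dtype = dtype_after()
# Integer conversion off: the column falls back to the nullable float dtype.
no_integer_dtype = dtype_after(convert_integer=False)
# Both numeric conversions off: the column keeps its original dtype.
untouched_dtype = dtype_after(convert_integer=False, convert_floating=False)

print(default_dtype, no_integer_dtype, untouched_dtype)
```

Under these assumptions the three calls should report Int64, Float64, and float64 respectively, which shows that each flag only enables a conversion and never forces one.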

Returns

A new DataFrame, a copy of the input, in which each column has its converted dtype. The original DataFrame is not modified.
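
Because the result is a separate object, it has to be assigned to a variable to be used. The short sketch below illustrates this with a single hypothetical Age column; the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"Age": [25, None, 35]})

# convert_dtypes returns a new object; it has no inplace option.
converted = df.convert_dtypes()

original_dtype = str(df["Age"].dtype)          # dtype of the untouched input
converted_dtype = str(converted["Age"].dtype)  # dtype in the returned DataFrame

print("original: ", original_dtype)
print("converted:", converted_dtype)
```

A common pattern is to reassign the result to the same name, as in df = df.convert_dtypes(), when the original dtypes are no longer needed.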

Examples

Converting Object Columns to Optimal Dtypes

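Call convert_dtypes with no arguments to convert every column to the most appropriate nullable dtype. In this example Age and Salary start as float64 and IsEmployed starts as object, because each of them contains a missing value.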

Python Program

import pandas as pd

# Create a DataFrame
data = {
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Age': [25, None, 35],
    'Salary': [70000.5, 80000.0, None],
    'IsEmployed': [True, False, None]
}
df = pd.DataFrame(data)

# Convert dtypes
print("DataFrame with Converted Dtypes:")
df_converted = df.convert_dtypes()
print(df_converted)
print("\nDtypes After Conversion:")
print(df_converted.dtypes)

Output

DataFrame with Converted Dtypes:
    Name   Age   Salary  IsEmployed
0  Arjun    25  70000.5        True
1    Ram  <NA>  80000.0       False
2  Priya    35     <NA>        <NA>

Dtypes After Conversion:
Name          string[python]
Age                    Int64
Salary               Float64
IsEmployed           boolean
dtype: object
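
To see what the nullable dtypes change in practice, the sketch below inspects converted columns. It reuses the Age and IsEmployed data from the example above; the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [25, None, 35],
    "IsEmployed": [True, False, None],
})
converted = df.convert_dtypes()

# Missing entries are stored as pd.NA rather than NaN or None.
missing_age = converted["Age"].iloc[1]
print(missing_age is pd.NA)

# Arithmetic keeps the nullable integer dtype; the missing entry stays missing.
age_next_year = converted["Age"] + 1
print(age_next_year.dtype)

# Reductions skip missing values by default.
employed_count = int(converted["IsEmployed"].sum())
print(employed_count)
```

The practical benefit is that Age remains an integer column even though one value is missing, so later arithmetic does not silently turn it into floats.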

Handling Integer and Floating Columns

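Numeric columns that contain missing values are stored as float64 by default. convert_dtypes turns a whole-number column such as ID into the nullable Int64 dtype and a fractional column such as Score into Float64.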

Python Program

import pandas as pd

# Create a DataFrame
data = {
    'ID': [1, 2, None],
    'Score': [98.5, None, 88.0]
}
df = pd.DataFrame(data)

# Convert dtypes
print("DataFrame with Nullable Integer and Floating Types:")
df_converted = df.convert_dtypes()
print(df_converted)
print("\nDtypes After Conversion:")
print(df_converted.dtypes)

Output

DataFrame with Nullable Integer and Floating Types:
     ID  Score
0     1   98.5
1     2   <NA>
2  <NA>   88.0

Dtypes After Conversion:
ID         Int64
Score    Float64
dtype: object
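
Whether a float column ends up as an integer or a float dtype depends on its values, not on its name or original dtype. The sketch below is a small illustration with two hypothetical columns, whole and fractional, that differ in a single value.

```python
import pandas as pd

df = pd.DataFrame({
    "whole": [1.0, 2.0, None],       # every present value is a whole number
    "fractional": [1.5, 2.0, None],  # at least one value has a fractional part
})

converted = df.convert_dtypes()

# Collect the dtype names so the two columns can be compared side by side.
dtype_names = {column: str(dtype) for column, dtype in converted.dtypes.items()}
print(dtype_names)
```

Here whole should become Int64 while fractional should become Float64, since casting 1.5 to an integer would lose information.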

Using PyArrow Backend

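Pass dtype_backend='pyarrow' to get PyArrow-backed columns (ArrowDtype) instead of the default NumPy-backed nullable dtypes. This option requires pandas 2.0 or later and the pyarrow package. It can reduce memory usage and speed up some operations, particularly on string data.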

Python Program

import pandas as pd

# Create a DataFrame
data = {
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Age': [25, 30, 35],
    'Salary': [70000.5, 80000.0, 90000.0]
}
df = pd.DataFrame(data)

# Convert dtypes with PyArrow backend
print("DataFrame with PyArrow Backend:")
df_arrow = df.convert_dtypes(dtype_backend='pyarrow')
print(df_arrow)
print("\nDtypes with PyArrow Backend:")
print(df_arrow.dtypes)

Output

DataFrame with PyArrow Backend:
    Name  Age   Salary
0  Arjun   25  70000.5
1    Ram   30  80000.0
2  Priya   35  90000.0

Dtypes with PyArrow Backend:
Name      string[pyarrow]
Age        int64[pyarrow]
Salary    double[pyarrow]
dtype: object
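
Since pyarrow is an optional dependency, code that runs in different environments may want to fall back to the default backend when it is missing. The following is one possible sketch, assuming pandas 2.0 or later; the fallback strategy and the variable names are illustrative choices, not something pandas does for you.

```python
import importlib.util

import pandas as pd

df = pd.DataFrame({"Name": ["Arjun", "Ram", "Priya"], "Age": [25, 30, 35]})

# Assumption: pandas >= 2.0, where the dtype_backend parameter exists.
# pyarrow is optional, so check for it before asking for that backend.
has_pyarrow = importlib.util.find_spec("pyarrow") is not None
backend = "pyarrow" if has_pyarrow else "numpy_nullable"

converted = df.convert_dtypes(dtype_backend=backend)
age_dtype = str(converted["Age"].dtype)
print(backend, age_dtype)
```

With pyarrow available the Age column should report int64[pyarrow]; otherwise it should report Int64 from the default backend.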

Controlling Specific Conversions

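Each convert_* parameter can be set to False to skip that conversion. In this example, convert_string=False leaves the Name column as object while the other columns are still converted.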

Python Program

import pandas as pd

# Create a DataFrame
data = {
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Age': [25, None, 35],
    'IsEmployed': [True, False, None]
}
df = pd.DataFrame(data)

# Convert dtypes without converting strings
print("DataFrame with Selected Conversions:")
df_partial = df.convert_dtypes(convert_string=False)
print(df_partial)
print("\nDtypes After Partial Conversion:")
print(df_partial.dtypes)

Output

DataFrame with Selected Conversions:
    Name   Age  IsEmployed
0  Arjun    25        True
1    Ram  <NA>       False
2  Priya    35        <NA>

Dtypes After Partial Conversion:
Name           object
Age             Int64
IsEmployed    boolean
dtype: object
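
The same approach works for the other flags. As a further sketch, the snippet below turns off only convert_boolean on the Age and IsEmployed data used above; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [25, None, 35],
    "IsEmployed": [True, False, None],
})

# Skip only the boolean conversion; every other conversion keeps its default.
partial = df.convert_dtypes(convert_boolean=False)

dtype_names = {column: str(dtype) for column, dtype in partial.dtypes.items()}
print(dtype_names)
```

In this case Age should still become Int64, while IsEmployed should stay object because the only conversion that applies to it has been disabled.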

Summary

In this tutorial, we explored the DataFrame.convert_dtypes method in pandas. Key takeaways include:

Converting columns to nullable dtypes that represent missing values as pd.NA.
Using the PyArrow backend, which can improve performance and memory efficiency.
Controlling specific conversions with parameters like convert_string and convert_boolean.

The DataFrame.convert_dtypes method is a convenient way to move an entire DataFrame to nullable dtypes in a single call.