Pandas DataFrame.convert_dtypes
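The DataFrame.convert_dtypes method in pandas converts the columns of a DataFrame to the best possible dtypes that support pd.NA, such as the nullable Int64, Float64, boolean, and string dtypes (or their PyArrow-backed equivalents). Every column then uses the same missing-value marker, pd.NA, and integer and boolean columns keep their type even when values are missing. The method does not downcast to smaller types (an integer column becomes Int64, not Int8), so it should not be relied on to reduce memory usage.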
Syntax
The syntax for DataFrame.convert_dtypes is:
DataFrame.convert_dtypes(
    infer_objects=True,
    convert_string=True,
    convert_integer=True,
    convert_boolean=True,
    convert_floating=True,
    dtype_backend='numpy_nullable'
)
Here, DataFrame refers to the pandas DataFrame whose columns are being converted.
Parameters
Parameter | Description |
---|---|
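infer_objects | If True, object columns are first inferred to a more specific type before conversion. Defaults to True. |
convert_string | If True, object columns holding strings are converted to the string dtype (StringDtype). Defaults to True. |
convert_integer | If True, columns are converted to a nullable integer dtype (such as Int64) where possible, including float columns whose non-missing values are all whole numbers. Defaults to True. |
convert_boolean | If True, columns holding booleans, including object columns with missing values, are converted to the nullable boolean dtype. Defaults to True. |
convert_floating | If True, float columns are converted to the nullable Float64 dtype. If convert_integer is also True, floats that are all whole numbers become Int64 instead. Defaults to True. |
dtype_backend | The backend for the resulting dtypes: 'numpy_nullable' (default) for NumPy-backed nullable dtypes, or 'pyarrow' for PyArrow-backed ArrowDtype columns. Added in pandas 2.0. |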
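The interaction between convert_integer and convert_floating is easiest to see on a single Series, since Series.convert_dtypes accepts the same parameters. This is a small sketch with made-up values:

```python
import pandas as pd

# Whole-number floats with a gap, and genuinely fractional floats
whole = pd.Series([1.0, 2.0, None])
fractional = pd.Series([1.5, None, 3.0])

# Default: whole-number floats are turned into nullable integers
print(whole.convert_dtypes().dtype)  # Int64

# With convert_integer=False they stay floating, but become nullable Float64
print(whole.convert_dtypes(convert_integer=False).dtype)  # Float64

# Fractional values cannot become integers, so they become Float64
print(fractional.convert_dtypes().dtype)  # Float64

# Disabling convert_floating leaves the fractional column as NumPy float64
print(fractional.convert_dtypes(convert_floating=False).dtype)  # float64
```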
Returns
A new DataFrame with the converted dtypes. The original DataFrame is not modified.
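Because convert_dtypes returns a new DataFrame rather than converting in place, the result must be assigned to keep it. A minimal sketch with an illustrative Age column:

```python
import pandas as pd

# A missing value forces this integer column to float64 on construction
df = pd.DataFrame({"Age": [25, None, 35]})

# convert_dtypes returns a new DataFrame; df itself is left unchanged
converted = df.convert_dtypes()

print(df["Age"].dtype)         # float64
print(converted["Age"].dtype)  # Int64

# To keep the converted dtypes, reassign: df = df.convert_dtypes()
```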
Examples
Converting Object Columns to Optimal Dtypes
Calling convert_dtypes with its default arguments converts each column, including object columns, to the most appropriate nullable dtype.
Python Program
import pandas as pd
# Create a DataFrame
data = {
'Name': ['Arjun', 'Ram', 'Priya'],
'Age': [25, None, 35],
'Salary': [70000.5, 80000.0, None],
'IsEmployed': [True, False, None]
}
df = pd.DataFrame(data)
# Convert dtypes
print("DataFrame with Converted Dtypes:")
df_converted = df.convert_dtypes()
print(df_converted)
print("\nDtypes After Conversion:")
print(df_converted.dtypes)
Output
DataFrame with Converted Dtypes:
Name Age Salary IsEmployed
0 Arjun 25 70000.5 True
1 Ram <NA> 80000.0 False
2 Priya 35 <NA> <NA>
Dtypes After Conversion:
Name string
Age Int64
Salary Float64
IsEmployed boolean
dtype: object
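After conversion, every missing cell holds the same pd.NA sentinel regardless of the column's dtype. The sketch below rebuilds the DataFrame from the example above and inspects the missing cells:

```python
import pandas as pd

data = {
    "Name": ["Arjun", "Ram", "Priya"],
    "Age": [25, None, 35],
    "Salary": [70000.5, 80000.0, None],
    "IsEmployed": [True, False, None],
}
df_converted = pd.DataFrame(data).convert_dtypes()

# Missing cells in Int64 and boolean columns are both pd.NA
age_missing = df_converted.loc[1, "Age"]
employed_missing = df_converted.loc[2, "IsEmployed"]
print(age_missing is pd.NA, employed_missing is pd.NA)  # True True

# isna() still works as usual and counts the gaps per column
print(df_converted.isna().sum().to_dict())
```

Unlike NaN, pd.NA propagates through comparisons (pd.NA == 1 evaluates to pd.NA), so missing values should be tested with isna() rather than ==.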
Handling Integer and Floating Columns
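A column of integers that contains a missing value is stored as float64, because NumPy integer arrays cannot represent missing values. convert_dtypes turns such a column back into the nullable Int64 dtype and converts fractional float columns to Float64.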
Python Program
import pandas as pd
# Create a DataFrame
data = {
'ID': [1, 2, None],
'Score': [98.5, None, 88.0]
}
df = pd.DataFrame(data)
# Convert dtypes
print("DataFrame with Nullable Integer and Floating Types:")
df_converted = df.convert_dtypes()
print(df_converted)
print("\nDtypes After Conversion:")
print(df_converted.dtypes)
Output
DataFrame with Nullable Integer and Floating Types:
ID Score
0 1 98.5
1 2 <NA>
2 <NA> 88.0
Dtypes After Conversion:
ID Int64
Score Float64
dtype: object
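The float64 detour is easy to confirm by checking the dtype before and after the call. A minimal sketch using the same ID values as above:

```python
import pandas as pd

df = pd.DataFrame({"ID": [1, 2, None]})

# NumPy int64 cannot hold a missing value, so pandas falls back to float64
print(df["ID"].dtype)  # float64

# convert_dtypes sees that every non-missing value is a whole number
ids = df.convert_dtypes()["ID"]
print(ids.dtype)  # Int64
print(ids.isna().tolist())  # [False, False, True]

# Arithmetic on the nullable column keeps the integer dtype and propagates <NA>
doubled = ids * 2
print(doubled.dtype)  # Int64
```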
Using PyArrow Backend
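Pass dtype_backend='pyarrow' to get PyArrow-backed columns (ArrowDtype). This requires pandas 2.0 or later and the optional pyarrow package. Depending on the data and workload, PyArrow-backed columns can reduce memory usage and speed up some operations.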
Python Program
import pandas as pd
# Create a DataFrame
data = {
'Name': ['Arjun', 'Ram', 'Priya'],
'Age': [25, 30, 35],
'Salary': [70000.5, 80000.0, 90000.0]
}
df = pd.DataFrame(data)
# Convert dtypes with PyArrow backend
print("DataFrame with PyArrow Backend:")
df_arrow = df.convert_dtypes(dtype_backend='pyarrow')
print(df_arrow)
print("\nDtypes with PyArrow Backend:")
print(df_arrow.dtypes)
Output
DataFrame with PyArrow Backend:
Name Age Salary
0 Arjun 25 70000.5
1 Ram 30 80000.0
2 Priya 35 90000.0
Dtypes with PyArrow Backend:
Name string[pyarrow]
Age int64[pyarrow]
Salary double[pyarrow]
dtype: object
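Because pyarrow is an optional dependency, code that may run without it can check for the package first. In the sketch below, falling back to the default NumPy-nullable backend is an assumption of this example, not something pandas does automatically. The memory comparison is illustrative only, since the numbers depend on the data and library versions:

```python
import importlib.util

import pandas as pd

df = pd.DataFrame({"Age": [25, 30, 35], "Salary": [70000.5, 80000.0, 90000.0]})

# dtype_backend="pyarrow" needs pandas 2.0+ and the optional pyarrow package
if importlib.util.find_spec("pyarrow") is not None:
    df_arrow = df.convert_dtypes(dtype_backend="pyarrow")
else:
    # Assumed fallback for this sketch: stay on the NumPy-nullable backend
    df_arrow = df.convert_dtypes()

print(df_arrow.dtypes)

# Compare total memory footprints before and after conversion
print(df.memory_usage(deep=True).sum(), df_arrow.memory_usage(deep=True).sum())
```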
Controlling Specific Conversions
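Each convert_* parameter can be set to False to skip that conversion, for example convert_string or convert_boolean. The program below skips only the string conversion, so Name keeps the object dtype while Age and IsEmployed are still converted.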
Python Program
import pandas as pd
# Create a DataFrame
data = {
'Name': ['Arjun', 'Ram', 'Priya'],
'Age': [25, None, 35],
'IsEmployed': [True, False, None]
}
df = pd.DataFrame(data)
# Convert dtypes without converting strings
print("DataFrame with Selected Conversions:")
df_partial = df.convert_dtypes(convert_string=False)
print(df_partial)
print("\nDtypes After Partial Conversion:")
print(df_partial.dtypes)
Output
DataFrame with Selected Conversions:
Name Age IsEmployed
0 Arjun 25 True
1 Ram <NA> False
2 Priya 35 <NA>
Dtypes After Partial Conversion:
Name object
Age Int64
IsEmployed boolean
dtype: object
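To see at a glance what a given combination of flags does, a small helper can put the dtypes before and after the call side by side. compare_dtypes is a hypothetical name introduced for this sketch, not part of pandas:

```python
import pandas as pd


def compare_dtypes(df: pd.DataFrame, **kwargs) -> pd.DataFrame:
    """Hypothetical helper: list each column's dtype before and after
    df.convert_dtypes(**kwargs), one row per column."""
    converted = df.convert_dtypes(**kwargs)
    return pd.DataFrame(
        {
            "before": df.dtypes.astype(str),
            "after": converted.dtypes.astype(str),
        }
    )


data = {
    "Name": ["Arjun", "Ram", "Priya"],
    "Age": [25, None, 35],
    "IsEmployed": [True, False, None],
}
df = pd.DataFrame(data)

# Same flags as the program above: skip only the string conversion
report = compare_dtypes(df, convert_string=False)
print(report)
```

Any keyword accepted by convert_dtypes can be passed through, so the same helper works for trying out convert_boolean=False or dtype_backend='pyarrow'.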
Summary
In this tutorial, we explored the DataFrame.convert_dtypes method in pandas. Key takeaways include:
- Converting columns to the best possible nullable dtypes, which use pd.NA for missing values.
- Using the PyArrow backend, which can improve performance and memory efficiency for some workloads.
- Controlling specific conversions with parameters like convert_string and convert_boolean.
The DataFrame.convert_dtypes method is a convenient way to give a DataFrame consistent, missing-value-aware data types.