Pandas DataFrame.to_numpy
The DataFrame.to_numpy method in pandas converts the data in a DataFrame into a NumPy array, so the values can be used directly in NumPy-based numerical computation or analysis.
Syntax
The syntax for DataFrame.to_numpy is:
DataFrame.to_numpy(dtype=None, copy=False, na_value=<no_default>)
Here, DataFrame refers to the pandas DataFrame being converted to a NumPy array.
Parameters
Parameter | Description |
---|---|
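dtype | Data type to force for the returned array. If None, a common data type is inferred from the columns. |
copy | If True, guarantees that the returned array is a copy rather than a view of the DataFrame's data. Defaults to False, in which case a copy may still be made when one is needed (for example, for columns of mixed types). |
na_value | Value to use for missing values (np.nan or other). If not given, the default depends on dtype and on the data types of the DataFrame's columns. |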
Returns
A two-dimensional numpy.ndarray containing the DataFrame's data.
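Because a NumPy array holds a single data type, the returned array uses one type that can represent every column. The following is a minimal sketch of that behavior. The column names and values are made up for illustration and are not part of the examples in this tutorial.

```python
import pandas as pd

# Hypothetical data: one all-numeric DataFrame and one with mixed column types
numeric_df = pd.DataFrame({'a': [1, 2], 'b': [0.5, 1.5]})
mixed_df = pd.DataFrame({'a': [1, 2], 'label': ['x', 'y']})

numeric_array = numeric_df.to_numpy()
mixed_array = mixed_df.to_numpy()

# Integer and float columns share a float type
print(numeric_array.dtype)
# Numbers mixed with strings fall back to the generic object type
print(mixed_array.dtype)
# The result is two-dimensional: (rows, columns)
print(numeric_array.shape)
```

Arrays of object type are flexible but do not get NumPy's fast numeric operations, so it is often worth selecting only the numeric columns before converting.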
Examples
Basic Conversion to NumPy Array
Convert a pandas DataFrame to a NumPy array.
Python Program
import pandas as pd
# Create a DataFrame
data = {
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Age': [25, 30, 35],
    'Salary': [70000.5, 80000.0, 90000.0]
}
df = pd.DataFrame(data)
# Convert the DataFrame to a NumPy array
print("DataFrame as NumPy Array:")
numpy_array = df.to_numpy()
print(numpy_array)
Output
DataFrame as NumPy Array:
[['Arjun' 25 70000.5]
['Ram' 30 80000.0]
['Priya' 35 90000.0]]
Specifying a Data Type
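Use the dtype parameter to specify the data type of the resulting NumPy array. In this example, dtype='object' makes the object data type explicit. Because the DataFrame mixes strings and numbers, the output is the same as in the default conversion.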
Python Program
import pandas as pd
# Create a DataFrame
data = {
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Age': [25, 30, 35],
    'Salary': [70000.5, 80000.0, 90000.0]
}
df = pd.DataFrame(data)
# Convert to a NumPy array with specified data type
print("DataFrame as NumPy Array (dtype=object):")
numpy_array = df.to_numpy(dtype='object')
print(numpy_array)
Output
DataFrame as NumPy Array (dtype=object):
[['Arjun' 25 70000.5]
['Ram' 30 80000.0]
['Priya' 35 90000.0]]
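The dtype parameter has a visible effect when a numeric type is requested. As a rough sketch, assuming only the numeric columns are needed, the columns can be selected first and then converted to float32. The column selection is an addition for illustration, since a column of names cannot be cast to a float type.

```python
import pandas as pd

# Same data as in the example above
data = {
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Age': [25, 30, 35],
    'Salary': [70000.5, 80000.0, 90000.0]
}
df = pd.DataFrame(data)

# Keep only the numeric columns, then force a 32-bit float array
numeric_array = df[['Age', 'Salary']].to_numpy(dtype='float32')

print(numeric_array.dtype)
print(numeric_array)
```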
Handling Missing Values
Use the na_value parameter to replace missing values in the resulting array.
Python Program
import pandas as pd
import numpy as np
# Create a DataFrame with missing values
data = {
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Age': [25, None, 35],
    'Salary': [70000.5, 80000.0, None]
}
df = pd.DataFrame(data)
# Convert to a NumPy array with specified na_value
print("DataFrame as NumPy Array with Missing Values Replaced:")
numpy_array = df.to_numpy(na_value=-1)
print(numpy_array)
Output
DataFrame as NumPy Array with Missing Values Replaced:
[['Arjun' 25.0 70000.5]
['Ram' -1.0 80000.0]
['Priya' 35.0 -1.0]]
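When every column is numeric, the same replacement yields a regular float array instead of an object array. The sketch below assumes a numeric-only variant of the data above, with the Name column left out.

```python
import pandas as pd

# Numeric-only DataFrame; None is stored as NaN, so both columns are float64
df = pd.DataFrame({
    'Age': [25, None, 35],
    'Salary': [70000.5, 80000.0, None]
})

# Replace the missing entries with -1 in the returned array
filled = df.to_numpy(na_value=-1)

print(filled.dtype)
print(filled)
```

A sentinel such as -1 is only a reasonable choice when it cannot be confused with a real value in the data.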
Ensuring a Copy of the Data
Set copy=True to ensure the returned array is a copy of the original data.
Python Program
import pandas as pd
# Create a DataFrame
data = {
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Age': [25, 30, 35],
    'Salary': [70000.5, 80000.0, 90000.0]
}
df = pd.DataFrame(data)
# Convert to a NumPy array with copy=True
print("DataFrame as NumPy Array (copy=True):")
numpy_array = df.to_numpy(copy=True)
print(numpy_array)
Output
DataFrame as NumPy Array (copy=True):
[['Arjun' 25 70000.5]
['Ram' 30 80000.0]
['Priya' 35 90000.0]]
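The printed output alone does not show what copy=True guarantees. One way to see it, sketched here with a numeric-only DataFrame chosen for illustration, is to modify the returned array and check that the DataFrame is unchanged.

```python
import pandas as pd

# Numeric-only DataFrame (illustrative subset of the data above)
df = pd.DataFrame({
    'Age': [25, 30, 35],
    'Salary': [70000.5, 80000.0, 90000.0]
})

# copy=True returns an array that owns its own data
independent = df.to_numpy(copy=True)

# Changing the array does not touch the DataFrame
independent[0, 0] = -1.0

print(independent[0, 0])
print(df.loc[0, 'Age'])
```

Without copy=True, whether the array shares memory with the DataFrame depends on the column types and the pandas version, so code that modifies the array is safer with an explicit copy.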
Summary
In this tutorial, we explored the DataFrame.to_numpy method in pandas. Key takeaways include:
- Using to_numpy to convert a DataFrame to a NumPy array.
- Specifying the data type using the dtype parameter.
- Handling missing values with na_value.
- Ensuring a copy of the data with copy=True.
The DataFrame.to_numpy method is a versatile tool for integrating pandas data with NumPy-based workflows.