Pandas DataFrame.to_numpy


The DataFrame.to_numpy method in pandas converts the data in a DataFrame into a NumPy array, so the values can be passed to NumPy for numerical computation or analysis. Only the values are returned; the row index and column labels are not part of the array.


Syntax

The syntax for DataFrame.to_numpy is:

DataFrame.to_numpy(dtype=None, copy=False, na_value=<no_default>)

Here, DataFrame refers to the pandas DataFrame being converted to a NumPy array.


Parameters

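Parameter   Description
dtype       Data type to force. If None, the data type is inferred from the DataFrame's columns.
copy        If True, guarantees that the returned array is a copy of the DataFrame's data. Defaults to False, which does not guarantee that no copy is made.
na_value    Value to use for missing values (np.nan or other). Defaults to <no_default>, in which case the value depends on dtype and on the dtypes of the DataFrame's columns.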

Returns

A two-dimensional numpy.ndarray containing the DataFrame's data.
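
As a rough illustration of the return value, the sketch below builds a small numeric-only DataFrame (the data is made up for this example) and inspects the array's number of dimensions, shape, and dtype. With one integer column and one float column, the inferred common dtype is expected to be float64.

```python
import pandas as pd

# Hypothetical numeric-only data, used only for illustration
df = pd.DataFrame({
    'Age': [25, 30, 35],
    'Salary': [70000.5, 80000.0, 90000.0]
})

arr = df.to_numpy()

print(arr.ndim)   # number of dimensions of the result
print(arr.shape)  # (number of rows, number of columns)
print(arr.dtype)  # common dtype chosen for the int and float columns
```

The index and column labels are not carried over, so keep the DataFrame around if they are needed later.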


Examples

Basic Conversion to NumPy Array

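Call to_numpy with no arguments to convert a pandas DataFrame to a NumPy array. Because the columns below mix strings, integers, and floats, the resulting array has dtype object.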

Python Program

import pandas as pd

# Create a DataFrame
data = {
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Age': [25, 30, 35],
    'Salary': [70000.5, 80000.0, 90000.0]
}
df = pd.DataFrame(data)

# Convert the DataFrame to a NumPy array
print("DataFrame as NumPy Array:")
numpy_array = df.to_numpy()
print(numpy_array)

Output

DataFrame as NumPy Array:
[['Arjun' 25 70000.5]
 ['Ram' 30 80000.0]
 ['Priya' 35 90000.0]]

Specifying a Data Type

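Use the dtype parameter to specify the data type of the resulting NumPy array. Here dtype='object' is requested explicitly; for this mixed-type DataFrame it matches the dtype pandas would infer, so the output is the same as in the basic example.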

Python Program

import pandas as pd

# Create a DataFrame
data = {
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Age': [25, 30, 35],
    'Salary': [70000.5, 80000.0, 90000.0]
}
df = pd.DataFrame(data)

# Convert to a NumPy array with specified data type
print("DataFrame as NumPy Array (dtype=object):")
numpy_array = df.to_numpy(dtype='object')
print(numpy_array)

Output

DataFrame as NumPy Array (dtype=object):
[['Arjun' 25 70000.5]
 ['Ram' 30 80000.0]
 ['Priya' 35 90000.0]]
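
The dtype parameter has a more visible effect on numeric data. The following sketch reuses the same data as above, selects only the numeric columns, and requests float32 instead of the default inferred type. Selecting the columns first is an assumption of this example, made because the Name column holds strings that cannot be converted to a float type.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Age': [25, 30, 35],
    'Salary': [70000.5, 80000.0, 90000.0]
})

# Keep only the numeric columns, then force a 32-bit float array
arr32 = df[['Age', 'Salary']].to_numpy(dtype='float32')

print(arr32.dtype)
print(arr32)
```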

Handling Missing Values

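Use the na_value parameter to replace missing values in the resulting array. The None entries in the numeric columns below are stored as NaN, which is why the Age column becomes floating point.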

Python Program

import pandas as pd
import numpy as np

# Create a DataFrame with missing values
data = {
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Age': [25, None, 35],
    'Salary': [70000.5, 80000.0, None]
}
df = pd.DataFrame(data)

# Convert to a NumPy array with specified na_value
print("DataFrame as NumPy Array with Missing Values Replaced:")
numpy_array = df.to_numpy(na_value=-1)
print(numpy_array)

Output

DataFrame as NumPy Array with Missing Values Replaced:
[['Arjun' 25.0 70000.5]
 ['Ram' -1 80000.0]
 ['Priya' 35.0 -1]]
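
For numeric-only data the result stays a float array rather than an object array. The sketch below is a minimal variation of the example above without the Name column; the replacement value 0.0 is an arbitrary choice for illustration.

```python
import pandas as pd

# Numeric columns only; None is stored as NaN in float columns
df = pd.DataFrame({
    'Age': [25, None, 35],
    'Salary': [70000.5, 80000.0, None]
})

# Replace every missing value with 0.0 in the returned array
filled = df.to_numpy(na_value=0.0)

print(filled.dtype)
print(filled)
```

Pick a replacement value that cannot be confused with real data in later calculations.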

Ensuring a Copy of the Data

Set copy=True to ensure the returned array is a copy of the original data, so that changes made to the array cannot affect the DataFrame.

Python Program

import pandas as pd

# Create a DataFrame
data = {
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Age': [25, 30, 35],
    'Salary': [70000.5, 80000.0, 90000.0]
}
df = pd.DataFrame(data)

# Convert to a NumPy array with copy=True
print("DataFrame as NumPy Array (copy=True):")
numpy_array = df.to_numpy(copy=True)
print(numpy_array)

Output

DataFrame as NumPy Array (copy=True):
[['Arjun' 25 70000.5]
 ['Ram' 30 80000.0]
 ['Priya' 35 90000.0]]
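
One way to see what copy=True provides is to modify the returned array and then check the DataFrame. This is a minimal sketch on numeric-only data (an assumption made to keep the example short); the value 99.0 is arbitrary.

```python
import pandas as pd

df = pd.DataFrame({
    'Age': [25, 30, 35],
    'Salary': [70000.5, 80000.0, 90000.0]
})

# Request an independent copy of the data
arr = df.to_numpy(copy=True)

# Change a value in the array only
arr[0, 0] = 99.0

print(arr[0, 0])         # value in the array after the change
print(df.loc[0, 'Age'])  # value in the DataFrame, left untouched
```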

Summary

In this tutorial, we explored the DataFrame.to_numpy method in pandas. Key takeaways include:

  • Using to_numpy to convert a DataFrame to a NumPy array.
  • Specifying the data type using the dtype parameter.
  • Handling missing values with na_value.
  • Ensuring a copy of the data with copy=True.

The DataFrame.to_numpy method is a versatile tool for integrating pandas data with NumPy-based workflows.

