Python Pickle - Pandas DataFrame
To pickle a DataFrame in Python, use pickle.dump(), and to unpickle it, use pickle.load().
In this tutorial, we shall learn how to pickle and unpickle a Pandas DataFrame, with the help of example programs.
1. Pickle a DataFrame
In the following example, we will initialize a DataFrame and then pickle it to a file. The steps for pickling are:
- Open a file in write mode and handle the file as binary.
- Call the function pickle.dump(dataframe, file). The object to pickle is the first argument and the file object is the second.
Python Program
import pandas as pd
import pickle
# Initialize the DataFrame
df = pd.DataFrame(
    [['Somu', 68, 84, 78, 96],
     ['Kiku', 74, 56, 88, 85],
     ['Amol', 77, 73, 82, 87],
     ['Lini', 78, 69, 87, 92]],
    columns=['name', 'physics', 'chemistry', 'algebra', 'calculus'])
# Open a file in write-binary mode
picklefile = open('df_marks', 'wb')
# Pickle the DataFrame
pickle.dump(df, picklefile)
# Close file
picklefile.close()
Explanation:
- The DataFrame df is created using Pandas with the provided data.
- The file df_marks is opened in write-binary mode using open('df_marks', 'wb').
- The pickle.dump() function serializes the DataFrame and writes it to the file.
- The file is then closed to ensure the serialized data is saved correctly.
The pickle file df_marks is now created in the current working directory.
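The same steps can also be written with a with block, which closes the file automatically, even if an error occurs while pickling. The following is a minimal sketch of that variant rather than a replacement for the program above. The shortened DataFrame and the file name df_marks_with are assumptions chosen for illustration.

```python
import pickle

import pandas as pd

# A small illustrative DataFrame (a subset of the marks data above)
df = pd.DataFrame(
    [['Somu', 68, 84],
     ['Kiku', 74, 56]],
    columns=['name', 'physics', 'chemistry'])

# The with block closes the file when the block ends,
# so no explicit close() call is needed.
# 'df_marks_with' is a hypothetical file name.
with open('df_marks_with', 'wb') as picklefile:
    pickle.dump(df, picklefile)

# The file object reports that it has been closed
print(picklefile.closed)
```

With this form there is no separate close step to forget, which matters when the code between open and close can raise an exception.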
2. Un-pickle a DataFrame
In the following example, we will read the pickle file and then unpickle it to retrieve the original DataFrame.
The steps for unpickling are:
- Open the file in read mode and handle the file as binary.
- Call the function pickle.load(file) to deserialize the DataFrame.
Python Program
import pandas as pd
import pickle
# Read the pickle file
picklefile = open('df_marks', 'rb')
# Unpickle the DataFrame
df = pickle.load(picklefile)
# Close the file
picklefile.close()
# Print the DataFrame
print(type(df))
print(df)
Explanation:
- The file df_marks is opened in read-binary mode using open('df_marks', 'rb').
- The pickle.load() function deserializes the data and reconstructs the original DataFrame.
- We print the type of the object to verify it is a pandas.DataFrame, and then print the entire DataFrame.
Output
<class 'pandas.core.frame.DataFrame'>
   name  physics  chemistry  algebra  calculus
0  Somu       68         84       78        96
1  Kiku       74         56       88        85
2  Amol       77         73       82        87
3  Lini       78         69       87        92
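Printing the DataFrame shows that it looks right. To confirm programmatically that the unpickled object matches the original, one option is to compare the two with DataFrame.equals(). The sketch below assumes a shortened version of the marks data and a hypothetical file name, df_roundtrip.

```python
import pickle

import pandas as pd

original = pd.DataFrame(
    [['Somu', 68, 84, 78, 96],
     ['Kiku', 74, 56, 88, 85]],
    columns=['name', 'physics', 'chemistry', 'algebra', 'calculus'])

# Pickle the DataFrame ('df_roundtrip' is a hypothetical file name)
with open('df_roundtrip', 'wb') as picklefile:
    pickle.dump(original, picklefile)

# Unpickle it into a new variable
with open('df_roundtrip', 'rb') as picklefile:
    restored = pickle.load(picklefile)

# equals() checks that both DataFrames hold the same elements
# and that corresponding columns have the same dtype
print(restored.equals(original))
```

A check like this is useful in scripts or tests where nobody inspects the printed output by eye.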
3. Pickling and Unpickling with a .pkl File Extension
The file name extension does not affect how pickle writes or reads the data, but a recognizable extension such as .pkl makes it clear that the file holds pickled data. Here's an example where we use the .pkl extension for the pickle file:
Python Program
import pickle
# df is the DataFrame created in the first example
# Save DataFrame with .pkl extension
picklefile = open('df_marks.pkl', 'wb')
pickle.dump(df, picklefile)
picklefile.close()
# Load DataFrame from .pkl file
picklefile = open('df_marks.pkl', 'rb')
df = pickle.load(picklefile)
picklefile.close()
# Print the loaded DataFrame
print(df)
Explanation:
- The DataFrame is pickled into a file named df_marks.pkl, using the .pkl extension for clarity.
- The DataFrame is unpickled from the file df_marks.pkl and then printed to confirm the successful operation.
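As an alternative to opening the file yourself, Pandas provides DataFrame.to_pickle() and pandas.read_pickle(), which take a file path and handle opening and closing the file internally. The sketch below shows how the save and load steps might look with those functions. The shortened DataFrame and the file name df_marks_pandas.pkl are assumptions for illustration.

```python
import pandas as pd

# A small illustrative DataFrame (a subset of the marks data above)
df = pd.DataFrame(
    [['Somu', 68, 84],
     ['Kiku', 74, 56]],
    columns=['name', 'physics', 'chemistry'])

# to_pickle() opens, writes, and closes the file itself.
# 'df_marks_pandas.pkl' is a hypothetical file name.
df.to_pickle('df_marks_pandas.pkl')

# read_pickle() loads the pickled object back from the given path
loaded = pd.read_pickle('df_marks_pandas.pkl')

print(type(loaded))
print(loaded.equals(df))
```

Whether to use these helpers or the pickle module directly is largely a matter of preference. The pickle module works for any picklable Python object, while the Pandas functions keep DataFrame code shorter.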
Summary
In this tutorial, we covered how to serialize and deserialize Pandas DataFrames using Python's pickle module. We demonstrated how to pickle a DataFrame to a file, unpickle it back into a DataFrame, and use a .pkl file extension for clarity, with code examples for each step.