Pandas DataFrame.insert

The DataFrame.insert method in pandas inserts a column into a DataFrame at a specified position. This is useful when column order matters, because plain assignment such as df['new'] = values always adds the new column at the end.
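As a quick illustration of that difference, the sketch below adds one column by assignment and one with insert. The City and ID values are made-up sample data.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Arjun', 'Ram', 'Priya'], 'Age': [25, 30, 35]})

# Plain assignment always appends the new column at the end.
df['City'] = ['Pune', 'Delhi', 'Chennai']
print(list(df.columns))  # ['Name', 'Age', 'City']

# insert() places the new column at the requested position instead.
df.insert(1, 'ID', [101, 102, 103])
print(list(df.columns))  # ['Name', 'ID', 'Age', 'City']
```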

Syntax

The syntax for DataFrame.insert is:

DataFrame.insert(loc, column, value, allow_duplicates=False)

Here, DataFrame refers to the pandas DataFrame where the column is being inserted.

Parameters

Parameter	Description
`loc`	Integer position at which the new column is inserted. Must satisfy `0 <= loc <= len(df.columns)`: `0` puts the column first and `len(df.columns)` puts it last.
`column`	Label of the new column, usually a string (any hashable label is accepted).
`value`	The values for the new column: a scalar (repeated for every row), an array-like such as a list with the same length as the DataFrame, or a Series (matched to rows by index label).
`allow_duplicates`	Defaults to `False`, in which case a ValueError is raised if a column with the same label already exists. Set it to `True` to allow duplicate column labels.
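A minimal sketch of the two ends of the valid loc range, using the same sample data as the examples in this tutorial. The ID and Active columns are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Arjun', 'Ram', 'Priya'], 'Age': [25, 30, 35]})

# loc=0 puts the new column first.
df.insert(0, 'ID', [1, 2, 3])

# loc=len(df.columns) is the largest valid position and appends at the end.
df.insert(len(df.columns), 'Active', [True, False, True])

print(list(df.columns))  # ['ID', 'Name', 'Age', 'Active']
```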

Returns

None: Modifies the DataFrame in place.
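Because insert returns None, writing df = df.insert(...) would replace the DataFrame with None. The sketch below keeps the return value only to show that it is None, and works on a copy so that the original DataFrame stays unchanged.

```python
import pandas as pd

original = pd.DataFrame({'Name': ['Arjun', 'Ram', 'Priya'], 'Age': [25, 30, 35]})

# Work on a copy so the original DataFrame is left unchanged.
df = original.copy()
result = df.insert(1, 'Salary', [70000, 80000, 90000])

print(result)                  # None: insert() does not return a DataFrame
print(list(df.columns))        # ['Name', 'Salary', 'Age']
print(list(original.columns))  # ['Name', 'Age']
```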

Examples

Inserting a New Column at a Specific Position

Insert a new column into a DataFrame at a specified index.

Python Program

import pandas as pd

# Create a DataFrame
data = {
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Age': [25, 30, 35]
}
df = pd.DataFrame(data)

# Insert a new column for Salary at index 1
df.insert(loc=1, column='Salary', value=[70000, 80000, 90000])

print("DataFrame after inserting the 'Salary' column:")
print(df)

Output

DataFrame after inserting the 'Salary' column:
    Name  Salary  Age
0  Arjun   70000   25
1    Ram   80000   30
2  Priya   90000   35
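When value is a list, its length has to match the number of rows. A short sketch of what happens otherwise: pandas raises a ValueError and the DataFrame is left without the new column. The exact wording of the error message can vary between pandas versions, so it is printed rather than checked.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Arjun', 'Ram', 'Priya'], 'Age': [25, 30, 35]})

raised = False
try:
    # Only two values for a three-row DataFrame.
    df.insert(1, 'Salary', [70000, 80000])
except ValueError as e:
    raised = True
    print("Error:", e)

print(list(df.columns))  # ['Name', 'Age']: nothing was inserted
```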

Inserting a Column with Scalar Values

Pass a scalar as value to insert a column where every row has the same value.

Python Program

import pandas as pd

# Create a DataFrame
data = {
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Age': [25, 30, 35]
}
df = pd.DataFrame(data)

# Insert a column for Department with a scalar value
df.insert(loc=2, column='Department', value='Engineering')

print("DataFrame after inserting the 'Department' column:")
print(df)

Output

DataFrame after inserting the 'Department' column:
    Name  Age   Department
0  Arjun   25  Engineering
1    Ram   30  Engineering
2  Priya   35  Engineering
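The scalar is broadcast to every row, and the new column takes its dtype from the scalar. A small sketch with a boolean flag; the Active column is illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Arjun', 'Ram', 'Priya'], 'Age': [25, 30, 35]})

# A single scalar is repeated for every row; its dtype follows the scalar.
df.insert(0, 'Active', True)

print(df['Active'].tolist())  # [True, True, True]
print(df['Active'].dtype)     # bool
```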

Preventing Duplicate Column Names

allow_duplicates defaults to False, so inserting a column whose label already exists raises a ValueError. Passing allow_duplicates=False explicitly makes that intent clear.

Python Program

import pandas as pd

# Create a DataFrame
data = {
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Age': [25, 30, 35]
}
df = pd.DataFrame(data)

# Attempt to insert a duplicate column name
try:
    df.insert(loc=1, column='Age', value=[26, 31, 36], allow_duplicates=False)
except ValueError as e:
    print("Error:", e)

Output

Error: cannot insert Age, already exists
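For the opposite case, a sketch of opting in with allow_duplicates=True. Note that once a label is duplicated, selecting it returns a DataFrame with all matching columns rather than a single Series, which can surprise later code.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Arjun', 'Ram', 'Priya'], 'Age': [25, 30, 35]})

# Explicitly opt in to a duplicate column label.
df.insert(1, 'Age', [26, 31, 36], allow_duplicates=True)

print(list(df.columns))                   # ['Name', 'Age', 'Age']
# Selecting a duplicated label now returns a DataFrame, not a Series.
print(isinstance(df['Age'], pd.DataFrame))  # True
```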

Inserting a Column with a pandas Series

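Insert a new column using a pandas Series. A Series is matched to the DataFrame's rows by index label; here both use the default index 0, 1, 2, so the values line up row by row.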

Python Program

import pandas as pd

# Create a DataFrame
data = {
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Age': [25, 30, 35]
}
df = pd.DataFrame(data)

# Create a Series for Salaries
salaries = pd.Series([70000, 80000, 90000])

# Insert the Series as a new column
df.insert(loc=2, column='Salary', value=salaries)

print("DataFrame after inserting the 'Salary' column as a Series:")
print(df)

Output

DataFrame after inserting the 'Salary' column as a Series:
    Name  Age  Salary
0  Arjun   25   70000
1    Ram   30   80000
2  Priya   35   90000
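To see the label matching in action, the sketch below uses a Series whose index (1, 2, 3) does not match the DataFrame's (0, 1, 2). Row 0 gets NaN and the value labelled 3 is dropped. Passing the underlying array with .to_numpy() ignores the labels and inserts the values by position. The column name 'Salary (by position)' is illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Arjun', 'Ram', 'Priya'], 'Age': [25, 30, 35]})

# This Series is labelled 1, 2, 3 while df uses the default labels 0, 1, 2.
misaligned = pd.Series([70000, 80000, 90000], index=[1, 2, 3])

# A Series is matched to rows by index label, not by position:
# label 0 has no value (NaN) and the value labelled 3 is dropped.
df.insert(2, 'Salary', misaligned)

# Passing the underlying array ignores the labels and inserts by position.
df.insert(3, 'Salary (by position)', misaligned.to_numpy())

print(df)
```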

Summary

In this tutorial, we explored the DataFrame.insert method in pandas. Key takeaways include:

Using loc to specify the position of the new column.
Inserting values as scalars, lists, or pandas Series.
Preventing duplicate column names with allow_duplicates=False.

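DataFrame.insert is the direct way to add a column at a specific position in a pandas DataFrame; plain assignment, by contrast, always appends the new column at the end. Remember that it modifies the DataFrame in place and returns None.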
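In practice you often know the neighbouring column's name rather than its position. One possible approach, sketched here as a hypothetical helper named insert_after (not part of pandas), looks up the position with Index.get_loc and passes it to insert. It assumes the anchor label is unique, since get_loc returns a slice or mask for duplicated labels.

```python
import pandas as pd

def insert_after(df, anchor, column, value, **kwargs):
    """Hypothetical helper: insert `column` right after the column labelled
    `anchor`. Assumes `anchor` is a unique label. Modifies `df` in place and
    returns None, like DataFrame.insert."""
    loc = df.columns.get_loc(anchor) + 1
    df.insert(loc, column, value, **kwargs)

df = pd.DataFrame({
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Age': [25, 30, 35],
    'Salary': [70000, 80000, 90000],
})

# Add a derived column directly after 'Age'.
insert_after(df, 'Age', 'Age in 5 Years', df['Age'] + 5)

print(list(df.columns))  # ['Name', 'Age', 'Age in 5 Years', 'Salary']
```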