How to Concatenate DataFrames in Pandas? Examples


Concatenate DataFrames - pandas.concat()

You can concatenate two or more pandas DataFrames with the pandas.concat() function. It is most often used to stack DataFrames that share the same columns, but it also handles DataFrames whose columns differ.

In this tutorial, we will learn how to concatenate DataFrames with similar and different columns.

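Syntax of pandas.concat() function

The syntax of pandas.concat() is:

pandas.concat(objs, axis=0, join='outer', ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=False, copy=True)

The join_axes parameter found in older releases was removed in pandas 1.0, and sort now defaults to False. Defaults for some parameters, such as copy, differ between pandas versions, so check the documentation for the version you have installed.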
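As a rough illustration of a few of these parameters, the sketch below uses two small made-up DataFrames (left and right are hypothetical names, not part of the examples that follow). It shows ignore_index=True renumbering the rows and keys adding an outer index level that records which input each row came from.

```python
import pandas as pd

# Hypothetical sample data, used only for this illustration.
left = pd.DataFrame({"name": ["Somu", "Kiku"], "physics": [68, 74]})
right = pd.DataFrame({"name": ["Amol"], "physics": [72]})

# axis=0 (the default) stacks rows; ignore_index=True renumbers them.
by_rows = pd.concat([left, right], axis=0, ignore_index=True)
print(by_rows.index.tolist())   # [0, 1, 2]

# keys adds an outer index level that labels the source of each row.
labelled = pd.concat([left, right], keys=["batch_1", "batch_2"])
print(labelled.index.tolist())  # [('batch_1', 0), ('batch_1', 1), ('batch_2', 0)]

# Rows from one input can then be selected by their key.
print(labelled.loc["batch_2"])
```

The labels passed to keys are arbitrary; any values that identify the inputs will do.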

Video Tutorial

https://youtu.be/mqOSurULZw4?si=P9PYQRDyp8jjd0hl

Examples

1. Concatenate DataFrames with similar columns

In this example, we take two DataFrames with the same column names and concatenate them using the concat() function.

Python Program

import pandas as pd
	
df_1 = pd.DataFrame(
	[['Somu', 68, 84, 78, 96],
	['Kiku', 74, 56, 88, 85],
	['Ajit', 77, 73, 82, 87]],
	columns=['name', 'physics', 'chemistry','algebra','calculus'])

df_2 = pd.DataFrame(
	[['Amol', 72, 67, 91, 83],
	['Lini', 78, 69, 87, 92]],
	columns=['name', 'physics', 'chemistry','algebra','calculus'])	

frames = [df_1, df_2]

#concatenate dataframes
df = pd.concat(frames, sort=False)

#print dataframe
print("df_1\n------\n",df_1)
print("\ndf_2\n------\n",df_2)
print("\ndf\n--------\n",df)

Explanation

  1. The program imports the pandas library, which is used for working with structured data.
  2. Two DataFrames, df_1 and df_2, are created using the pd.DataFrame() function:
    • df_1: Contains data for three students with their scores in 'physics', 'chemistry', 'algebra', and 'calculus'.
    • df_2: Contains data for two additional students with similar columns as df_1.
  3. A list named frames is created to hold both df_1 and df_2 for further processing.
  4. The pd.concat() function is used to concatenate the two DataFrames along the rows (axis=0, the default). The argument sort=False keeps the columns in the order in which they first appear instead of sorting them by name; it only matters when the input DataFrames do not already have the same columns in the same order.
  5. The concatenated DataFrame, df, contains all the rows from both DataFrames. Because both inputs have the same columns here, no values are missing and no NaN entries appear.
  6. Finally, the program prints the individual DataFrames df_1 and df_2, followed by the concatenated DataFrame df, to show how the data is combined.
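The effect of the sort argument only becomes visible when the inputs list their columns in a different order. The sketch below assumes two tiny, hypothetical DataFrames (first and second) with the same two columns in opposite order, and compares sort=False with sort=True.

```python
import pandas as pd

# Hypothetical inputs: same columns, listed in a different order.
first = pd.DataFrame([[68, "Somu"]], columns=["physics", "name"])
second = pd.DataFrame([["Amol", 72]], columns=["name", "physics"])

# sort=False keeps columns in order of first appearance.
unsorted = pd.concat([first, second], sort=False)
print(list(unsorted.columns))     # ['physics', 'name']

# sort=True sorts the column labels.
sorted_cols = pd.concat([first, second], sort=True)
print(list(sorted_cols.columns))  # ['name', 'physics']

# Either way, values are matched by column label, not by position.
print(unsorted["name"].tolist())  # ['Somu', 'Amol']
```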

Output

pandas.concat() function example

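The two DataFrames are concatenated, but each row keeps the index label it had in its original DataFrame, so the index of df reads 0, 1, 2, 0, 1 instead of running from 0 to 4. You can renumber the rows with the reset_index() method.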

Python Program

import pandas as pd
	
df_1 = pd.DataFrame(
	[['Somu', 68, 84, 78, 96],
	['Kiku', 74, 56, 88, 85],
	['Ajit', 77, 73, 82, 87]],
	columns=['name', 'physics', 'chemistry','algebra','calculus'])

df_2 = pd.DataFrame(
	[['Amol', 72, 67, 91, 83],
	['Lini', 78, 69, 87, 92]],
	columns=['name', 'physics', 'chemistry','algebra','calculus'])	

frames = [df_1, df_2]

#concatenate dataframes
df = pd.concat(frames)

# reset index
df.reset_index(drop=True, inplace=True)

#print dataframe
print(df)

Explanation

  1. The program imports the pandas library for creating and manipulating tabular data.
  2. Two DataFrames, df_1 and df_2, are created using the pd.DataFrame() function:
    • df_1: Contains three students' names and their scores in 'physics', 'chemistry', 'algebra', and 'calculus'.
    • df_2: Contains two additional students with the same column structure as df_1.
  3. A list named frames is created to hold both DataFrames.
  4. The pd.concat() function is used to concatenate the DataFrames along their rows, combining all records into a single DataFrame named df.
  5. After concatenation, the index labels from the original DataFrames are preserved. To create a new sequential index, the reset_index() method is called with drop=True, which discards the old labels instead of keeping them as a new column, and inplace=True, which modifies df directly rather than returning a new DataFrame.
  6. The final DataFrame, df, contains all rows from df_1 and df_2 with a reset index, making the data more structured and consistent for further analysis.
  7. The program prints the updated DataFrame df, showing all rows with a uniform index.

Output

   name  physics  chemistry  algebra  calculus
0  Somu       68         84       78        96
1  Kiku       74         56       88        85
2  Ajit       77         73       82        87
3  Amol       72         67       91        83
4  Lini       78         69       87        92
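A similar result can usually be reached in one step by passing ignore_index=True to pd.concat(), a parameter that appears in the syntax above. The sketch below uses a trimmed-down version of the sample data (only the 'name' and 'physics' columns, an assumption made to keep it short) and compares the two approaches.

```python
import pandas as pd

# Trimmed-down sample data for illustration.
df_1 = pd.DataFrame(
    [["Somu", 68], ["Kiku", 74], ["Ajit", 77]],
    columns=["name", "physics"])
df_2 = pd.DataFrame(
    [["Amol", 72], ["Lini", 78]],
    columns=["name", "physics"])

# Default behaviour: each row keeps its original index label.
kept = pd.concat([df_1, df_2])
print(kept.index.tolist())        # [0, 1, 2, 0, 1]

# ignore_index=True discards the old labels and numbers rows from 0.
renumbered = pd.concat([df_1, df_2], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3, 4]

# Same result as concatenating first and resetting the index afterwards.
print(renumbered.equals(kept.reset_index(drop=True)))  # True
```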

2. Concatenate two DataFrames with different columns

In the following example, we take two DataFrames. The second DataFrame has a new column, 'geometry', and does not contain one of the columns that the first DataFrame has, 'algebra'.

The pandas.concat() function concatenates the two DataFrames and returns a new DataFrame that contains the columns of both. Wherever a row has no value for a column, the cell is filled with NaN, short for Not a Number.

Python Program

import pandas as pd
	
df_1 = pd.DataFrame(
	[['Somu', 68, 84, 78, 96],
	['Kiku', 74, 56, 88, 85],
	['Ajit', 77, 73, 82, 87]],
	columns=['name', 'physics', 'chemistry','algebra','calculus'])

df_2 = pd.DataFrame(
	[['Amol', 72, 67, 91, 83],
	['Lini', 78, 69, 87, 92]],
	columns=['name', 'physics', 'chemistry','geometry','calculus'])	

frames = [df_1, df_2]

#concatenate dataframes
df = pd.concat(frames, sort=False)

#print dataframe
print("df_1\n------\n",df_1)
print("\ndf_2\n------\n",df_2)
print("\ndf\n--------\n",df)

Explanation

  1. The program imports the pandas library, which is commonly used for handling and analyzing tabular data.
  2. Two DataFrames, df_1 and df_2, are created using the pd.DataFrame() function:
    • df_1: Contains student names and their scores in 'physics', 'chemistry', 'algebra', and 'calculus'.
    • df_2: Contains student names and their scores in 'physics', 'chemistry', 'geometry', and 'calculus'.
  3. The two DataFrames share the 'name', 'physics', 'chemistry', and 'calculus' columns but differ in one column: df_1 has 'algebra' where df_2 has 'geometry'.
  4. A list named frames is created, containing df_1 and df_2 as its elements.
  5. The pd.concat() function is used to concatenate the two DataFrames along their rows. The argument sort=False keeps the columns in order of first appearance, so the columns of df_1 come first, followed by the 'geometry' column that only df_2 has.
  6. In the concatenated DataFrame df:
    • Columns present in both DataFrames ('name', 'physics', 'chemistry', and 'calculus') are combined.
    • Columns unique to either DataFrame ('algebra' or 'geometry') are included, but missing values in these columns for rows from the other DataFrame are filled with NaN (Not a Number).
  7. The individual DataFrames df_1 and df_2, as well as the concatenated DataFrame df, are printed to the console.
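To see where the NaN values described in step 6 end up, the concatenated DataFrame can be inspected with isna(). The sketch below is a minimal illustration on a reduced, hypothetical version of the data (one row per DataFrame, one unshared column each). It also shows a side effect worth knowing: an integer column that receives NaN is stored as float64.

```python
import pandas as pd

# Reduced sample data: each DataFrame has one column the other lacks.
df_1 = pd.DataFrame([["Somu", 78]], columns=["name", "algebra"])
df_2 = pd.DataFrame([["Amol", 91]], columns=["name", "geometry"])

df = pd.concat([df_1, df_2], sort=False, ignore_index=True)

# True marks the cells that were filled with NaN.
missing = df.isna()
print(missing)

# The score columns now hold floats because NaN is a float value.
print(df.dtypes)

# Count of missing values per column.
print(df.isna().sum())
```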

Output

pandas.concat() function example - when columns are different
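If the NaN-filled columns are not wanted, the join parameter from the syntax section offers an alternative: join='inner' keeps only the columns that every input has. The following is a small sketch on reduced, hypothetical data rather than the full example above.

```python
import pandas as pd

# Reduced sample data: 'algebra' and 'geometry' are not shared.
df_1 = pd.DataFrame([["Somu", 68, 78]],
                    columns=["name", "physics", "algebra"])
df_2 = pd.DataFrame([["Amol", 72, 91]],
                    columns=["name", "physics", "geometry"])

# join='outer' (the default) keeps every column and fills gaps with NaN.
everything = pd.concat([df_1, df_2], join="outer", ignore_index=True)
print(list(everything.columns))  # ['name', 'physics', 'algebra', 'geometry']

# join='inner' keeps only the columns common to all inputs.
shared = pd.concat([df_1, df_2], join="inner", ignore_index=True)
print(list(shared.columns))      # ['name', 'physics']
print(shared)
```

Which option fits depends on whether the unshared columns carry information you need; the inner join drops them entirely.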

Summary

In this tutorial of Python Examples, we learned how to concatenate two or more DataFrames into a single DataFrame, with the help of detailed examples.