Pandas – Set Column as Index
By default, pandas creates an integer index (0, 1, 2, ...) for a DataFrame. You can, however, set a specific column of the DataFrame as the index, if required.
To set a column as the index of a DataFrame, use the DataFrame.set_index() method, with the column name passed as the argument.
You can also set up a MultiIndex with multiple columns in the index. In this case, pass a list of the column names required for the index to the set_index() method.
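As a quick check of the default index described above, the following minimal sketch builds a small DataFrame and inspects its index. The two-row data and the variable name default_labels are made up for illustration only.

```python
import pandas as pd

# Hypothetical two-row DataFrame, used only for illustration
df = pd.DataFrame({'rollno': [21, 23], 'name': ['Amol', 'Lini']})

# With no index specified, pandas assigns a RangeIndex: 0, 1, ...
print(df.index)  # RangeIndex(start=0, stop=2, step=1)

# The row labels are plain consecutive integers
default_labels = list(df.index)
print(default_labels)  # [0, 1]
```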
Syntax of set_index()
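The syntax of set_index() to set a column as the index is
    myDataFrame.set_index('column_name')
where myDataFrame is the DataFrame for which you would like to set the column_name column as the index.
To set up a MultiIndex, pass a list of column names instead, using the following syntax.
    myDataFrame.set_index(['column_name_1', 'column_name_2'])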
You can pass as many column names as required.
Note that, by default, the set_index() method does not modify the original DataFrame; it returns a new DataFrame with the column set as the index. Assign the returned DataFrame to a variable to keep the result.
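The note above can be verified with a short sketch. The small DataFrame and the variable name indexed are assumptions chosen for illustration; the point is only that the original DataFrame keeps its columns after the call.

```python
import pandas as pd

# Hypothetical two-row DataFrame, used only for illustration
df = pd.DataFrame(
    [[21, 'Amol'], [23, 'Lini']],
    columns=['rollno', 'name'])

# set_index() returns a new DataFrame; df itself is left untouched
indexed = df.set_index('rollno')

print(list(df.columns))       # ['rollno', 'name']
print(list(indexed.columns))  # ['name']
```

This is why the examples in this tutorial assign the result back, as in df = df.set_index('rollno').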
1. Set column as index in Pandas DataFrame
In this example, we take a DataFrame and set its rollno column as the index.
    import pandas as pd

    # Initialize a dataframe
    df = pd.DataFrame(
        [[21, 'Amol', 72, 67],
         [23, 'Lini', 78, 69],
         [32, 'Kiku', 74, 56],
         [52, 'Ajit', 54, 76]],
        columns=['rollno', 'name', 'physics', 'botony'])

    print('DataFrame with default index\n', df)

    # Set column as index
    df = df.set_index('rollno')

    print('\nDataFrame with column as index\n', df)
The column rollno of the DataFrame is set as index.
Also, compare the output of the original DataFrame with the output of the DataFrame that has rollno as the index. In the original DataFrame, there is a separate index column (the first column) with no column name. In the second DataFrame, an existing column acts as the index, so that column takes the first place.
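One way to confirm that rollno now acts as the index is to inspect the index name and look up a row by its rollno value. The sketch below uses a shortened, two-row version of the example data; the .loc lookup is added here as an illustration and is not part of the original example.

```python
import pandas as pd

# Two rows taken from the example data above
df = pd.DataFrame(
    [[21, 'Amol', 72, 67],
     [23, 'Lini', 78, 69]],
    columns=['rollno', 'name', 'physics', 'botony'])
df = df.set_index('rollno')

# The index carries the name of the former column
print(df.index.name)  # rollno

# Rows can now be selected by rollno value with .loc
row = df.loc[23]
print(row['name'])  # Lini
```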
2. Set multi-index for DataFrame
In this example, we pass multiple column names as a list to the set_index() method to set up a MultiIndex for the Pandas DataFrame.
    import pandas as pd

    # Initialize a dataframe
    df = pd.DataFrame(
        [[21, 'Amol', 72, 67],
         [23, 'Lini', 78, 69],
         [32, 'Kiku', 74, 56],
         [52, 'Ajit', 54, 76]],
        columns=['rollno', 'name', 'physics', 'botony'])

    print('DataFrame with default index\n', df)

    # Set multiple columns as index
    df = df.set_index(['rollno', 'name'])

    print('\nDataFrame with MultiIndex\n', df)
    D:\>python example1.py
    DataFrame with default index
        rollno  name  physics  botony
    0      21  Amol       72      67
    1      23  Lini       78      69
    2      32  Kiku       74      56
    3      52  Ajit       54      76

    DataFrame with MultiIndex
                  physics  botony
    rollno name
    21     Amol       72      67
    23     Lini       78      69
    32     Kiku       74      56
    52     Ajit       54      76
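To see what the MultiIndex looks like programmatically, the following sketch inspects the index of a shortened, two-row version of the same data. The tuple lookup with .loc and the variable name physics_mark are illustrative additions, not part of the original example.

```python
import pandas as pd

# Two rows taken from the example data above
df = pd.DataFrame(
    [[21, 'Amol', 72, 67],
     [23, 'Lini', 78, 69]],
    columns=['rollno', 'name', 'physics', 'botony'])
df = df.set_index(['rollno', 'name'])

# The index is now a MultiIndex with one level per column passed
print(type(df.index).__name__)  # MultiIndex
print(list(df.index.names))     # ['rollno', 'name']

# Each row label is a (rollno, name) tuple
physics_mark = df.loc[(21, 'Amol'), 'physics']
print(physics_mark)  # 72
```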
In this Pandas Tutorial, we learned how to set a specific column of a DataFrame as its index, and how to set up a MultiIndex from multiple columns, using the set_index() method.