Pandas – Set Column as Index
By default, pandas creates an index for every DataFrame. If required, you can set a specific column of the DataFrame as the index instead.
To set a column as the index of a DataFrame, use the DataFrame.set_index() method, with the column name passed as the argument.
You can also set up a MultiIndex with multiple columns in the index. In this case, pass a list of the required column names to the set_index() method.
Syntax of set_index()
The syntax of set_index() to set a column as the index is

myDataFrame.set_index('column_name')

where myDataFrame is the DataFrame for which you would like to set the column_name column as the index.
To set up a MultiIndex, use the following syntax.

myDataFrame.set_index(['column_name_1', 'column_name_2'])

You can pass as many column names as required.
By default, the set_index() method does not modify the original DataFrame; it returns a new DataFrame with the column set as the index.
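The following minimal sketch illustrates this behaviour, assuming set_index() is called with its default arguments. The two-row DataFrame and the variable name indexed are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    [[21, 'Amol'], [23, 'Lini']],
    columns=['rollno', 'name'])

# set_index() returns a new DataFrame; df itself is left untouched
indexed = df.set_index('rollno')

# the original still has 'rollno' as a regular column
print(list(df.columns))

# the returned DataFrame uses 'rollno' as its index
print(indexed.index.name)
print(list(indexed.columns))
```

This is why the examples assign the result back, as in df = df.set_index('rollno').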
Example 1: Set Column as Index in Pandas DataFrame
In this example, we take a DataFrame and set one of its columns as the index.
import pandas as pd

# initialize a DataFrame
df = pd.DataFrame(
    [[21, 'Amol', 72, 67],
     [23, 'Lini', 78, 69],
     [32, 'Kiku', 74, 56],
     [52, 'Ajit', 54, 76]],
    columns=['rollno', 'name', 'physics', 'botony'])

print('DataFrame with default index\n', df)

# set column as index
df = df.set_index('rollno')

print('\nDataFrame with column as index\n', df)
The rollno column of the DataFrame is set as the index.
Also, compare the output of the original DataFrame with the output of the DataFrame that has rollno as its index. The original DataFrame has a separate, unnamed index column (the first column). In the second DataFrame, the existing rollno column acts as the index, so it takes the first place.
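As a rough sketch of what this change means in practice, the snippet below inspects the index after set_index() and selects a row by roll number with .loc. The shortened data and the variable name by_rollno are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [[21, 'Amol', 72, 67],
     [23, 'Lini', 78, 69]],
    columns=['rollno', 'name', 'physics', 'botony'])

by_rollno = df.set_index('rollno')

# 'rollno' is now the name of the index, not a regular column
print(by_rollno.index.name)
print(list(by_rollno.columns))

# rows can be looked up by roll number using .loc
print(by_rollno.loc[23, 'name'])
```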
Example 2: Set MultiIndex for Pandas DataFrame
In this example, we pass multiple column names as a list to the set_index() method to set up a MultiIndex for the Pandas DataFrame.
import pandas as pd

# initialize a DataFrame
df = pd.DataFrame(
    [[21, 'Amol', 72, 67],
     [23, 'Lini', 78, 69],
     [32, 'Kiku', 74, 56],
     [52, 'Ajit', 54, 76]],
    columns=['rollno', 'name', 'physics', 'botony'])

print('DataFrame with default index\n', df)

# set multiple columns as index
df = df.set_index(['rollno', 'name'])

print('\nDataFrame with MultiIndex\n', df)
D:\>python example1.py
DataFrame with default index
    rollno  name  physics  botony
0      21  Amol       72      67
1      23  Lini       78      69
2      32  Kiku       74      56
3      52  Ajit       54      76

DataFrame with MultiIndex
              physics  botony
rollno name
21     Amol       72      67
23     Lini       78      69
32     Kiku       74      56
52     Ajit       54      76
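To see how the resulting MultiIndex can be used, here is a small sketch that lists the index levels and selects a single value with a (rollno, name) tuple. The shortened data and the variable name multi are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame(
    [[21, 'Amol', 72, 67],
     [23, 'Lini', 78, 69]],
    columns=['rollno', 'name', 'physics', 'botony'])

multi = df.set_index(['rollno', 'name'])

# the index now has one level per column passed to set_index()
print(list(multi.index.names))
print(multi.index.nlevels)

# a row is addressed by a tuple with one value per index level
print(multi.loc[(21, 'Amol'), 'physics'])
```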
In this Pandas Tutorial, we learned how to set a specific column of the DataFrame as index.