Pandas - Sort DataFrame by Multiple Columns


Pandas - Sort DataFrame by Multiple Columns

To sort a DataFrame by multiple columns in Python Pandas, you can use pandas.DataFrame.sort_values() method. Call the sort_values() method on the DataFrame object, and pass the required columns as a list of strings to the by named argument.

dataframe.sort_values(by=['column_1', 'column_2'])

Replace the column_1 and column_2 with the column names required, and you may specify more than two columns by adding them to the list.

The sort_values() method does not modify the original DataFrame, but sorts the contents of given DataFrame in ascending order and returns a new sorted DataFrame.

You can sort the DataFrame by specified columns in ascending or descending order. To specify the sorting order of the columns, give ascending parameter with the list of sorting orders respective to the sorting order of columns.

dataframe.sort_values(by=['column_1', 'column_2'], ascending=[True, False])

In this tutorial, we shall go through some example programs, where we shall sort DataFrame by multiple columns in ascending or descending order.


Examples

1. Sort DataFrame by two columns

In this example, we shall take a DataFrame in df_1 with three columns: A, B, and C. We have to sort the DataFrame by the columns A and B.

  1. Given a DataFrame in df_1 with three columns: A, B, and C.
  2. Call sort_values() method on df_1, and pass the list of column names ['A', 'B'] for the by named argument.
  3. The sort_values() method returns a new DataFrame with the contents of original DataFrame sorted by the specified columns in ascending order.
  4. Assign the returned value to a variable, say sorted_df, and print it to output.

Python Program

import pandas as pd

# Take a DataFrame
df_1 = pd.DataFrame({
    'A': [1, 1, 3, 5, 1],
    'B': [2, 4, 5, 1, 3],
    'C': [10, 20, 10, 30, 20]
})

# Sort DataFrame by two columns
sorted_df = df_1.sort_values(by=['A', 'B'])
print(sorted_df)

Output

   A  B   C
0  1  2  10
4  1  3  20
1  1  4  20
2  3  5  10
3  5  1  30

The values of column A is in ascending order. Now, observe the second column B. For the same value of column A, say 1, the values of column B are sorted in ascending order. Let us specify just those two columns and the first three rows.

   A  B
0  1  2
4  1  3
1  1  4

2. Sort DataFrame by multiple columns in respective orders

To sort a DataFrame by multiple columns with sorting order specified for each column, pass ascending argument to the sort_values() method with the list of sorting orders.

In this example, we have to sort the DataFrame in df_1 by columns ['A', 'B'] where column A has to sorted in descending order and the column B has to be sorted in ascending order.

     by               ascending     ← named arguments to sort_values() method
(column names)     (sorting order)
    A  B       →    [False, True]

for column 'A'  →  ascending=False
for column 'B'  →  ascending=True
  1. Given a DataFrame in df_1 with three columns: A, B, and C.
  2. Call sort_values() method on df_1, and pass the list of column names ['A', 'B'] for the by named argument, and the list of sorting orders [False, True] for the ascending named argument.
  3. The sort_values() method returns a new DataFrame with the contents of original DataFrame sorted by the specified columns in specified sorting orders.
  4. Assign the returned value to a variable, say sorted_df, and print it to output.

Python Program

import pandas as pd

# Take a DataFrame
df_1 = pd.DataFrame({
    'A': [1, 1, 3, 5, 1],
    'B': [2, 4, 5, 1, 3],
    'C': [10, 20, 10, 30, 20]
})

# Sort DataFrame by two columns
sorted_df = df_1.sort_values(by=['A', 'B'], ascending=[False, True])
print(sorted_df)

Output

   A  B   C
3  5  1  30
2  3  5  10
0  1  2  10
4  1  3  20
1  1  4  20

Summary

In this Pandas Tutorial, we learned how to sort a DataFrame by multiple columns in specified sorting order, using sort_values() method of the DataFrame instance, with the help of well detailed Python example programs.




Python Libraries