Pandas DataFrame.query
The DataFrame.query method in pandas filters the rows of a DataFrame using a boolean expression written as a string. It offers a concise, readable way to subset data without building boolean masks or indexing expressions by hand.
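Here is a small sketch that makes the comparison concrete. It uses the same sample data as the examples in this tutorial and filters the DataFrame twice: once with query and once with an equivalent hand-written boolean mask. It assumes only that pandas is installed. The equals check confirms that both approaches select the same rows.

```python
import pandas as pd

# Sample data matching the examples in this tutorial
df = pd.DataFrame({
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Age': [25, 30, 35],
    'Salary': [70000, 80000, 90000],
})

# String-expression filtering with query
via_query = df.query('Age > 30 and Salary > 60000')

# The equivalent boolean mask, written out by hand
via_mask = df[(df['Age'] > 30) & (df['Salary'] > 60000)]

# Both approaches select the same rows
print(via_query.equals(via_mask))  # True
```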
Syntax
The syntax for DataFrame.query is:
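DataFrame.query(expr, *, inplace=False, **kwargs)
Here, DataFrame refers to the pandas DataFrame being queried. The * makes inplace a keyword-only argument, so it must be passed by name, for example inplace=True.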
Parameters
| Parameter | Description |
|---|---|
| expr | A string expression to evaluate. Column names can be referenced directly; names that are not valid Python identifiers (for example, names containing spaces) must be wrapped in backticks. |
| inplace | If True, filters the original DataFrame in place and returns None. If False (the default), returns a new DataFrame. |
| **kwargs | Additional keyword arguments passed on to pandas.eval, such as engine and parser. |
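The following sketch illustrates the expr and **kwargs rows of the table. The 'Monthly Bonus' column is hypothetical. It was chosen only because its name contains a space and therefore needs backtick quoting. The engine='python' argument is one of the evaluator options forwarded through **kwargs.

```python
import pandas as pd

# Hypothetical column name containing a space, to show backtick quoting
df = pd.DataFrame({
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Monthly Bonus': [500, 1500, 2500],
})

# Names that are not valid Python identifiers are wrapped in backticks
high_bonus = df.query('`Monthly Bonus` > 1000')

# Extra keyword arguments are forwarded to the evaluator; engine='python'
# evaluates the expression with plain Python instead of numexpr
same_rows = df.query('`Monthly Bonus` > 1000', engine='python')

print(high_bonus)
print(high_bonus.equals(same_rows))  # True
```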
Returns
A new DataFrame containing only the rows for which the expression evaluates to True, or None if inplace=True.
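This minimal sketch shows the return value in action, using the tutorial's sample data. The original DataFrame keeps all of its rows after the query. A query that matches nothing yields an empty DataFrame that still has the original columns.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Age': [25, 30, 35],
    'Salary': [70000, 80000, 90000],
})

# query returns a new DataFrame; df itself is left unchanged
result = df.query('Age > 30')
print(len(result), len(df))  # 1 3

# When no row matches, the result is an empty DataFrame with the same columns
nothing = df.query('Age > 100')
print(nothing.empty)          # True
print(list(nothing.columns))  # ['Name', 'Age', 'Salary']
```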
Examples
Querying Rows Based on a Single Condition
Filter rows where the Age column is greater than 30.
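Conditions are not limited to numeric columns. The short sketch below uses the same sample data as the program that follows. String literals are quoted inside the expression string, and the in operator tests membership against a literal list.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Age': [25, 30, 35],
    'Salary': [70000, 80000, 90000],
})

# String literals are quoted inside the expression string
priya = df.query('Name == "Priya"')

# Membership against a literal list uses the in operator
subset = df.query('Name in ["Arjun", "Priya"]')

print(priya)
print(subset)
```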
Python Program
import pandas as pd
# Create a DataFrame
data = {
'Name': ['Arjun', 'Ram', 'Priya'],
'Age': [25, 30, 35],
'Salary': [70000, 80000, 90000]
}
df = pd.DataFrame(data)
# Query rows where Age > 30
print("Rows where Age > 30:")
filtered_df = df.query('Age > 30')
print(filtered_df)
Output
Rows where Age > 30:
Name Age Salary
2 Priya 35 90000
Querying Rows Based on Multiple Conditions
Filter rows where Age is greater than 25 and Salary is less than 90000.
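Besides the and keyword, query expressions accept other ways of combining conditions. This sketch uses the same sample data to show three of them. First, & with parenthesized comparisons selects the same rows as and. Second, or keeps rows that match either condition. Third, a chained comparison expresses a range in a single condition.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Age': [25, 30, 35],
    'Salary': [70000, 80000, 90000],
})

# Spelled-out keyword, as used in this section's program
with_and = df.query('Age > 25 and Salary < 90000')

# The & operator with parenthesized comparisons selects the same rows
with_amp = df.query('(Age > 25) & (Salary < 90000)')

# "or" keeps a row when either condition holds
either = df.query('Age == 25 or Salary == 90000')

# A chained comparison expresses a range in one condition
in_range = df.query('25 < Age <= 30')

print(with_and.equals(with_amp))  # True
print(list(either['Name']))       # ['Arjun', 'Priya']
print(list(in_range['Name']))     # ['Ram']
```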
Python Program
import pandas as pd
# Create a DataFrame
data = {
'Name': ['Arjun', 'Ram', 'Priya'],
'Age': [25, 30, 35],
'Salary': [70000, 80000, 90000]
}
df = pd.DataFrame(data)
# Query rows with multiple conditions
print("Rows where Age > 25 and Salary < 90000:")
filtered_df = df.query('Age > 25 and Salary < 90000')
print(filtered_df)
Output
Rows where Age > 25 and Salary < 90000:
Name Age Salary
1 Ram 30 80000
Using Variables in Query Expressions
Reference local Python variables in a query by prefixing their names with the @ symbol.
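The @ prefix works with any local Python object, not only a single number. This sketch uses the same sample data. The wanted list and the avg_salary value are illustrative variables. The list drives a membership test, and the average is computed from the data before the query runs.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Age': [25, 30, 35],
    'Salary': [70000, 80000, 90000],
})

# Illustrative local list, referenced with @ for a membership test
wanted = ['Ram', 'Priya']
picked = df.query('Name in @wanted')

# A value computed from the data can also be referenced with @
avg_salary = df['Salary'].mean()  # 80000.0
above_avg = df.query('Salary > @avg_salary')

print(list(picked['Name']))     # ['Ram', 'Priya']
print(list(above_avg['Name']))  # ['Priya']
```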
Python Program
import pandas as pd
# Create a DataFrame
data = {
'Name': ['Arjun', 'Ram', 'Priya'],
'Age': [25, 30, 35],
'Salary': [70000, 80000, 90000]
}
df = pd.DataFrame(data)
# Define a variable
min_age = 30
# Query rows using a variable
print("Rows where Age >= min_age (min_age=30):")
filtered_df = df.query('Age >= @min_age')
print(filtered_df)
Output
Rows where Age >= min_age (min_age=30):
Name Age Salary
1 Ram 30 80000
2 Priya 35 90000
Modifying the Original DataFrame
Use inplace=True to filter the original DataFrame directly.
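This sketch shows what inplace=True changes. It relies on a small helper, make_df, a hypothetical name that rebuilds the sample data. The inplace call itself returns None. Reassigning the result of a regular query produces the same filtered DataFrame.

```python
import pandas as pd

def make_df():
    # Hypothetical helper: a fresh copy of the tutorial's sample data
    return pd.DataFrame({
        'Name': ['Arjun', 'Ram', 'Priya'],
        'Age': [25, 30, 35],
        'Salary': [70000, 80000, 90000],
    })

# With inplace=True, the DataFrame is filtered in place and query returns None
df_inplace = make_df()
returned = df_inplace.query('Salary > 75000', inplace=True)
print(returned)  # None

# Reassigning the returned DataFrame gives the same rows
df_reassigned = make_df()
df_reassigned = df_reassigned.query('Salary > 75000')

print(df_inplace.equals(df_reassigned))  # True
```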
Python Program
import pandas as pd
# Create a DataFrame
data = {
'Name': ['Arjun', 'Ram', 'Priya'],
'Age': [25, 30, 35],
'Salary': [70000, 80000, 90000]
}
df = pd.DataFrame(data)
# Modify the original DataFrame
print("Filtering original DataFrame where Salary > 75000:")
df.query('Salary > 75000', inplace=True)
print(df)
Output
Filtering original DataFrame where Salary > 75000:
Name Age Salary
1 Ram 30 80000
2 Priya 35 90000
Summary
In this tutorial, we explored the DataFrame.query method in pandas. Key takeaways include:
- Using query for intuitive filtering with string expressions.
- Applying single or multiple conditions.
- Incorporating variables with the @ symbol.
- Using inplace=True to modify the original DataFrame.
The DataFrame.query method offers a concise, readable alternative to boolean indexing for filtering rows in pandas DataFrames.