Question
Impact on Performance When Using SQL-Like Padding in Polars
Description:
I have been using SQL-like padding in my Polars DataFrame queries for better readability and easier commenting of conditions. This approach involves using True
(analogous to 1=1
in SQL) to pad the conditional logic.
SQL Example
In SQL, padding with 1=1
makes it easy to add or remove conditions:
SELECT *
FROM employees
WHERE 1=1
AND department = 'Sales'
AND salary > 50000
AND hire_date > '2020-01-01';
SELECT *
FROM employees
WHERE 1=1
AND department = 'Sales'
-- AND salary > 50000
AND hire_date > '2020-01-01';
(>>>> in my opinion <<<<) This makes adding, removing, and replacing operators very easy. Also it simplifies the git diff because adding and removing conditions appear as a single diff.
Polars Example
Here's the Polars equivalent that I have been using:
# Example data
data = {
"name": ["Alice", "Bob", "Charlie", "David"],
"department": ["Sales", "HR", "Sales", "IT"],
"salary": [60000, 45000, 70000, 50000],
"hire_date": ["2021-06-01", "2019-03-15", "2020-08-20", "2018-11-05"]
}
# Create a DataFrame
df = pl.DataFrame(data)
# Filter with padding
filtered_df = df.filter(
(
True
& pl.col("department").eq("Sales")
# & pl.col("salary").gt(50000)
& pl.col("hire_date").gt("2020-01-01")
& True
)
)
print(filtered_df)
Concern
I am concerned about the potential performance impact of this approach in Polars. Does padding the filter conditions with True
significantly affect the performance of query execution in Polars? Also is there a better way to do this ?
Thank you for your assistance.