Question
What is the best idiomatic way to use .filter() in .agg()?
Certain contexts in Polars operate on column expressions rather than on the entire DataFrame, such as agg, group_by, with_columns, and select. Sometimes filter needs to be used inside these contexts. What is the idiomatic way to use filter in this situation?
E.g.:
import polars as pl
from polars import col
confidence_level = 0.95
lookback_days = 200
percentile_lower = col.Excess_Return.quantile(quantile=1-confidence_level)
percentile_upper = col.Excess_Return.quantile(quantile=confidence_level)
rachev_ratio = excess_return.with_row_index().rolling('index', period=str(lookback_days)+'i').agg(
    Date = col.Date.last(),
    Price = col.Price.last(),
    PnL = col.PnL.last(),
    TBill_Return = col.TBill_Return.last(),
    Excess_Return = col.Excess_Return.last(),
    Percentile_Lower = percentile_lower,
    Percentile_Upper = percentile_upper,
    Lower_CVaR = col.Excess_Return.filter(col.Excess_Return <= percentile_lower).mean(),
    Upper_CVaR = col.Excess_Return.filter(col.Excess_Return >= percentile_upper).mean(),
).with_columns(
    Rachev_Ratio = (col.Lower_CVaR / col.Upper_CVaR).abs(),
).with_columns(
    Rachev_Ratio = pl.when(col.Rachev_Ratio > 3).then(3).otherwise(col.Rachev_Ratio)
)[lookback_days:].drop('index')
Specifically the lines:
Lower_CVaR = col.Excess_Return.filter(col.Excess_Return <= percentile_lower).mean(),
Upper_CVaR = col.Excess_Return.filter(col.Excess_Return >= percentile_upper).mean(),
To use filter on col.Excess_Return, the column has to be written twice: first so that .filter can be called, and second in the condition itself. There doesn't seem to be a pl.filter shorthand. df.filter doesn't work in this scenario, and neither does pl.all.filter, pl.all().filter, or pl.col('*').filter.
Does anyone know the idiomatic way to use filter in .agg or in similar contexts? Does the column name really have to be written twice in every filter expression?