Question

Get max date column name on polars

I'm trying to get the column name containing the maximum date value in my Polars DataFrame. I found a similar question that was already answered here.

However, in my case, I have many columns, and adding them manually would be tedious. I would like to use column selectors cs.datetime() and have tried the following:

import polars as pl
from datetime import datetime
import polars.selectors as cs

data = {
    "ID" : [1,2,3],
    "Values_A" : [datetime(1,1,2),datetime(1,1,3),datetime(1,1,4)],
    "Values_B" : [datetime(1,1,4),datetime(1,1,7),datetime(1,1,2)]
}

df = pl.DataFrame(data)

def arg_max_horizontal(*columns: pl.Expr) -> pl.Expr:
    return (
        pl.concat_list(columns)
        .list.arg_max()
        .replace_strict({i: col_name for i, col_name in enumerate(columns)})
    )

(
    df
    .with_columns(
        Largest=arg_max_horizontal(pl.select(cs.datetime()))
    )
)
 3  57  3
1 Jan 1970

Solution

 2

You were one step away and simply needed to select the column names of interest using df.select(cs.datetime()).columns. Then, we can unpack the list in the function call.

Note. I've adapted the type hint of arg_max_horizontal accordingly. Moreover, (thanks to @Cameron Riddell), we can simplify conversion to a string representation using a cast to a suitable pl.Enum.

def arg_max_horizontal(*columns: str) -> pl.Expr:
    return (
        pl.concat_list(columns)
        .list.arg_max()
        .cast(pl.Enum(columns))
    )

(
    df
    .with_columns(
        Largest=arg_max_horizontal(*df.select(cs.datetime()).columns)
    )
)
shape: (3, 4)
┌─────┬─────────────────────┬─────────────────────┬──────────┐
│ ID  ┆ Values_A            ┆ Values_B            ┆ Largest  │
│ --- ┆ ---                 ┆ ---                 ┆ ---      │
│ i64 ┆ datetime[μs]        ┆ datetime[μs]        ┆ str      │
╞═════╪═════════════════════╪═════════════════════╪══════════╡
│ 1   ┆ 0001-01-02 00:00:00 ┆ 0001-01-04 00:00:00 ┆ Values_B │
│ 2   ┆ 0001-01-03 00:00:00 ┆ 0001-01-07 00:00:00 ┆ Values_B │
│ 3   ┆ 0001-01-04 00:00:00 ┆ 0001-01-02 00:00:00 ┆ Values_A │
└─────┴─────────────────────┴─────────────────────┴──────────┘
2024-07-24
Hericks