Question

Cumulative calculation across rows?

Suppose I have a function:

```python
def f(prev, curr):
    return prev * 2 + curr
```

(Just an example, could have been anything)

And a Polars dataframe:

| some_col | other_col |
|----------|-----------|
|    7     |    ...    |
|    3     |           |
|    9     |           |
|    2     |           |

I would like to use f on my dataframe cumulatively, and the output would be:

| some_col | other_col |
|----------|-----------|
|    7     |    ...    |
|    17    |           |
|    43    |           |
|    88    |           |
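Spelled out, the first row is kept as-is and every later row feeds the previous output into f together with the current value:

```latex
\begin{aligned}
y_0 &= 7 \\
y_1 &= f(7, 3) = 2 \cdot 7 + 3 = 17 \\
y_2 &= f(17, 9) = 2 \cdot 17 + 9 = 43 \\
y_3 &= f(43, 2) = 2 \cdot 43 + 2 = 88
\end{aligned}
```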

I understand that, naturally, this type of calculation isn't going to be very efficient since it has to be done one row at a time (at least in the general case).

I can obviously loop over rows. But is there an elegant, idiomatic way to do this in Polars?


Solution


It depends on the exact operation you need to perform.

The example you've given can be expressed in terms of .cum_sum() with additional arithmetic:

```python
import polars as pl

def plus_prev_times_2(col):
    # Weights 2**(n-2), ..., 2**0 applied to rows 1..n-1
    x = 2 ** pl.int_range(pl.len() - 1).reverse()
    # Powers 2**1, ..., 2**(n-1) scaling the first row's contribution
    y = 2 ** pl.int_range(1, pl.len())
    cs = (x * col.slice(1)).cum_sum()
    return cs / x + col.first() * y

df = pl.DataFrame({"some_col": [7, 3, 9, 2]})

df.with_columns(
   pl.col.some_col.first()
     .append(pl.col.some_col.pipe(plus_prev_times_2))
     .alias("plus_prev_times_2")
)
```

```
shape: (4, 2)
┌──────────┬───────────────────┐
│ some_col ┆ plus_prev_times_2 │
│ ---      ┆ ---               │
│ i64      ┆ f64               │
╞══════════╪═══════════════════╡
│ 7        ┆ 7.0               │
│ 3        ┆ 17.0              │
│ 9        ┆ 43.0              │
│ 2        ┆ 88.0              │
└──────────┴───────────────────┘
```
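Why this works: unrolling the recurrence y_k = 2 * y_{k-1} + c_k (with y_0 = c_0) gives a closed form where each earlier value is scaled by a power of 2. With n the column length and i = k - 1, `x` holds 2^(n-1-k) and `y` holds 2^k, so dividing the weighted cumulative sum by `x` recovers the inner sum:

```latex
y_k = 2^k c_0 + \sum_{j=1}^{k} 2^{k-j} c_j
    = 2^k c_0 + \frac{1}{2^{n-1-k}} \sum_{j=1}^{k} 2^{n-1-j} c_j,
    \qquad k = 1, \dots, n-1
```

Because the weights grow exponentially with n, this trick is likely only practical for fairly short columns; on long ones the powers of 2 overflow or lose floating-point precision.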

Vertical fold/scan

In general, I believe what you're asking for is called a "vertical fold/scan".

Polars only offers a horizontal version, `pl.cum_fold`, which folds across the columns of each row:

```python
import polars as pl

# One row whose columns hold the values, so the fold runs left to right
df = pl.DataFrame(dict(a=[7], b=[3], c=[9], d=[2]))

df.with_columns(
   pl.cum_fold(acc=0, function=lambda acc, x: acc * 2 + x, exprs=pl.all())
)
```

```
shape: (1, 5)
┌─────┬─────┬─────┬─────┬──────────────┐
│ a   ┆ b   ┆ c   ┆ d   ┆ cum_fold     │
│ --- ┆ --- ┆ --- ┆ --- ┆ ---          │
│ i64 ┆ i64 ┆ i64 ┆ i64 ┆ struct[4]    │
╞═════╪═════╪═════╪═════╪══════════════╡
│ 7   ┆ 3   ┆ 9   ┆ 2   ┆ {7,17,43,88} │
└─────┴─────┴─────┴─────┴──────────────┘
```

As discussed in the issue, a vertical equivalent would be hugely inefficient.

For an efficient approach, you can write a Polars plugin in Rust.

But using something like numba is probably easier to implement.

There are several existing numba answers on this site.

2024-07-13
jqurious