Question

How to take the average of all previous entries in a group?

I'd like to do the following in python using the polars library:

Input:

df = pl.from_repr("""
┌──────┬────────┐
│ Name ┆ Number │
│ ---  ┆ ---    │
│ str  ┆ i64    │
╞══════╪════════╡
│ Mr.A ┆ 1      │
│ Mr.A ┆ 4      │
│ Mr.A ┆ 5      │
│ Mr.B ┆ 3      │
│ Mr.B ┆ 5      │
│ Mr.B ┆ 6      │
│ Mr.B ┆ 10     │
└──────┴────────┘
""")

Output:

shape: (7, 3)
┌──────┬────────┬──────────┐
│ Name ┆ Number ┆ average  │
│ ---  ┆ ---    ┆ ---      │
│ str  ┆ i64    ┆ f64      │
╞══════╪════════╪══════════╡
│ Mr.A ┆ 1      ┆ 0.0      │
│ Mr.A ┆ 4      ┆ 1.0      │
│ Mr.A ┆ 5      ┆ 2.5      │
│ Mr.B ┆ 3      ┆ 0.0      │
│ Mr.B ┆ 5      ┆ 3.0      │
│ Mr.B ┆ 6      ┆ 4.0      │
│ Mr.B ┆ 10     ┆ 4.666667 │
└──────┴────────┴──────────┘

That is to say:

For every first entry of a person, set the average to zero.
For every subsequent entry, calculate the average based on the previous entries

Example:

Mr. A started off with average=0 and the Number=1.

Then, Mr. A has the Number=4, thus it took the average of the previous entry (1/1 data=1)

Then, Mr. A has the Number=5, thus the previous average was: (1+4) / (2 data) = 5/2 = 2.5

And so on

I've tried the rolling mean function (using a Polars Dataframe, df), however, I'm restricted by rolling_mean's window size (i.e. it calculates only the past 2 entries, plus it averages the current entry as well; I want to average only the previous entries)

Does anyone have an idea? Much appreciated!:

df.group_by("Name").agg(pl.col("Number").rolling_mean(window_size=2))

4 61 4

1 Jan 1970