Question

How do I identify consecutive/contiguous dates in polars

I'm trying to isolate runs of dates using polars. For this, I've been playing with .rolling, .rle, and .rle_id but can't seem to fit them together to make it work.

Given this:

df = pl.DataFrame({
    "date": ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-05", "2024-01-06", "2024-01-07"]
})
df = df.with_columns(pl.col("date").str.to_date())

I would like to have a 2nd column with the id of the run if they're contiguous:

[1, 1, 1, 2, 2, 2]
 4  67  4
1 Jan 1970

Solution

 6

You could take the diff of the successive dates, identify those that are not 1 day and compute a cum_sum:

(pl.DataFrame({'date': ['2024-01-01', '2024-01-02', '2024-01-03',
                        '2024-01-05', '2024-01-06', '2024-01-07']})
  .with_columns(pl.col('date').str.to_date())
  .with_columns(id=pl.col('date').diff().ne(pl.duration(days=1))
                     .fill_null(True).cum_sum())
)

Alternatively, generate a range of the step (with pl.duration), subtract it to the date and use rle_id:

(pl.DataFrame({'date': ['2024-01-01', '2024-01-02', '2024-01-03',
                        '2024-01-05', '2024-01-06', '2024-01-07']})
  .with_columns(pl.col('date').str.to_date())
  .with_columns(id=(pl.col('date')
                   -pl.duration(days=pl.arange(0, pl.col('date').len()))
                   ).rle_id()+1)
)

Output:

┌────────────┬─────┐
│ date       ┆ id  │
│ ---        ┆ --- │
│ date       ┆ u32 │
╞════════════╪═════╡
│ 2024-01-01 ┆ 1   │
│ 2024-01-02 ┆ 1   │
│ 2024-01-03 ┆ 1   │
│ 2024-01-05 ┆ 2   │
│ 2024-01-06 ┆ 2   │
│ 2024-01-07 ┆ 2   │
└────────────┴─────┘
2024-07-15
mozway

Solution

 1

You can do this:

df.with_columns(
                c=(pl.col('date')!=pl.col('date').shift(1).dt.offset_by('1d'))
                #this marks when the previous value is more than a day away from previous
                .fill_null(True)
                # presumably there are no missing dates, this just seeds the first row 
                .cum_sum()
                # this implicitly converts Trues to 1 and False to 0
                )
shape: (6, 2)
┌────────────┬─────┐
│ date       ┆ c   │
│ ---        ┆ --- │
│ date       ┆ u32 │
╞════════════╪═════╡
│ 2024-01-01 ┆ 1   │
│ 2024-01-02 ┆ 1   │
│ 2024-01-03 ┆ 1   │
│ 2024-01-05 ┆ 2   │
│ 2024-01-06 ┆ 2   │
│ 2024-01-07 ┆ 2   │
└────────────┴─────┘



2024-07-15
Dean MacGregor