Question
Row-wise dot product in Polars
I have a dataframe with two columns, values and weights, of list[i64] dtype, and I'd like to perform a row-wise dot product of the two.
import numpy as np
import polars as pl

df = pl.DataFrame({
    'values': [[0], [0, 2], [0, 2, 4], [2, 4, 0], [4, 0, 8]],
    'weights': [[3], [2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3]]
})
One approach that works is to first put values and weights into a struct and then call .map_elements on each row:
df.with_columns(
    pl.struct(['values', 'weights'])
    .map_elements(
        lambda x: np.dot(x['values'], x['weights']), return_dtype=pl.Float64
    ).alias('dot')
)
But as the documentation points out, map_elements is in general much slower than native Polars expressions, so I tried to implement this with native expressions.
I tried the following:
df.with_columns(
    pl.concat_list('values', 'weights').alias('combined'),
    pl.concat_list('values', 'weights').list.eval(pl.element().slice(0, pl.len() // 2)).alias('values1'),
    pl.concat_list('values', 'weights').list.eval(pl.element().slice(pl.len() // 2, pl.len() // 2)).alias('values2'),
    pl.concat_list('values', 'weights').list.eval(
        pl.element().slice(0, pl.len() // 2).dot(pl.element().slice(pl.len() // 2, pl.len() // 2))
    ).list.first().alias('dot'),
    pl.concat_list('values', 'weights').list.eval(
        pl.element().slice(0, pl.len() // 2) + pl.element().slice(pl.len() // 2, pl.len() // 2)
    ).alias('sum'),
)
I was expecting the dot column to be [0, 6, 16, 10, 28], but it turns out to be the following:
shape: (5, 7)
┌───────────┬───────────┬─────────────┬───────────┬───────────┬─────┬────────────┐
│ values    ┆ weights   ┆ combined    ┆ values1   ┆ values2   ┆ dot ┆ sum        │
│ ---       ┆ ---       ┆ ---         ┆ ---       ┆ ---       ┆ --- ┆ ---        │
│ list[i64] ┆ list[i64] ┆ list[i64]   ┆ list[i64] ┆ list[i64] ┆ i64 ┆ list[i64]  │
╞═══════════╪═══════════╪═════════════╪═══════════╪═══════════╪═════╪════════════╡
│ [0]       ┆ [3]       ┆ [0, 3]      ┆ [0]       ┆ [3]       ┆ 0   ┆ [0]        │
│ [0, 2]    ┆ [2, 3]    ┆ [0, 2, … 3] ┆ [0, 2]    ┆ [2, 3]    ┆ 4   ┆ [0, 4]     │
│ [0, 2, 4] ┆ [1, 2, 3] ┆ [0, 2, … 3] ┆ [0, 2, 4] ┆ [1, 2, 3] ┆ 20  ┆ [0, 4, 8]  │
│ [2, 4, 0] ┆ [1, 2, 3] ┆ [2, 4, … 3] ┆ [2, 4, 0] ┆ [1, 2, 3] ┆ 20  ┆ [4, 8, 0]  │
│ [4, 0, 8] ┆ [1, 2, 3] ┆ [4, 0, … 3] ┆ [4, 0, 8] ┆ [1, 2, 3] ┆ 80  ┆ [8, 0, 16] │
└───────────┴───────────┴─────────────┴───────────┴───────────┴─────┴────────────┘
Note that even the sum column isn't what I expect it to be: the first slice seems to be added to itself instead of to the second slice.
Am I doing anything wrong? What's the best way to perform row-wise dot product in Polars?
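For reference, here is a quick plain-Python check (not Polars code, just a sketch to pin down the numbers). It recomputes the dot product I expect, and also the result of combining values with itself, which appears to match the dot and sum columns in the output above. The dot helper is a hypothetical name introduced only for this check.

```python
# Data copied from the DataFrame definition in the question.
values = [[0], [0, 2], [0, 2, 4], [2, 4, 0], [4, 0, 8]]
weights = [[3], [2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3]]


def dot(a, b):
    """Hypothetical helper: plain-Python dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


# What I expect: values[i] . weights[i] for each row.
expected = [dot(v, w) for v, w in zip(values, weights)]

# What the output looks like: values[i] combined with itself.
self_dot = [dot(v, v) for v in values]
self_sum = [[x + x for x in v] for v in values]

print(expected)  # [0, 6, 16, 10, 28]
print(self_dot)  # [0, 4, 20, 20, 80]
print(self_sum)  # [[0], [0, 4], [0, 4, 8], [4, 8, 0], [8, 0, 16]]
```

So in both the dot and the sum case, the second slice seems to end up holding the same elements as the first one.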