Question

How can I efficiently `fill_null` only certain columns of a DataFrame?

For example, let's say I want to apply fill_null(strategy="zero") only to the numeric columns of my DataFrame. My current approach is this:

import polars as pl
import polars.selectors as cs

df = pl.DataFrame(
    [
        pl.Series("id", ["alpha", None, "gamma"]),
        pl.Series("xs", [None, 100, 2]),
    ]
)

final_df = df.select(cs.exclude(cs.numeric()))
final_df = final_df.with_columns(
    df.select(cs.numeric()).fill_null(strategy="zero")
)

print(final_df)
shape: (3, 2)
┌───────┬─────┐
│ id    ┆ xs  │
│ ---   ┆ --- │
│ str   ┆ i64 │
╞═══════╪═════╡
│ alpha ┆ 0   │
│ null  ┆ 100 │
│ gamma ┆ 2   │
└───────┴─────┘

Are there alternative methods, either more idiomatic or more efficient, to achieve what I'd like to do?


Solution


pl.DataFrame.select returns a dataframe that contains only the columns produced by the expressions passed as arguments. In contrast, pl.DataFrame.with_columns keeps all existing columns and adds the new ones (replacing any existing column with the same name).

In particular, this lets you perform the fill without an intermediate dataframe. You can simply use pl.DataFrame.with_columns to fill missing values in the numeric columns only, replacing each of them with its filled version.

df.with_columns(
    cs.numeric().fill_null(strategy="zero")
)
shape: (3, 2)
┌───────┬─────┐
│ id    ┆ xs  │
│ ---   ┆ --- │
│ str   ┆ i64 │
╞═══════╪═════╡
│ alpha ┆ 0   │
│ null  ┆ 100 │
│ gamma ┆ 2   │
└───────┴─────┘
2024-07-22
Hericks