Question
How to add a column with JSON representation of rows in Polars DataFrame?
I want to use Polars to read a CSV input and add a column (e.g. called json_per_row) in which each entry is the JSON representation of the entire row. I also want to select only a subset of the original columns to keep alongside the json_per_row column.
Ideally I don’t want to hardcode the number or names of the input columns, but to illustrate I’ve provided a simple example below:
import json
import polars as pl

# Input: CSV with columns time, var1, var2, ...
s1 = pl.Series("time", [100, 200, 300])
s2 = pl.Series("var1", [1, 2, 3])
s3 = pl.Series("var2", [4, 5, 6])
# I want to add this column with Polars somehow
output_col = pl.Series("json_per_row", [
    json.dumps({"time": 100, "var1": 1, "var2": 4}),
    json.dumps({"time": 200, "var1": 2, "var2": 5}),
    json.dumps({"time": 300, "var1": 3, "var2": 6}),
])
# Desired output: the selected column(s) plus json_per_row
df = pl.DataFrame([s1, output_col])
print(df)
Is there a way to do this with the functions in the Polars library itself? I'd rather not use json.dumps if it isn't needed, since the docs say that bringing in external / user-defined functions can hurt performance. Thanks