In Polars, how can you update several columns simultaneously?

Published 2025-01-05 08:54


Suppose we have a Polars frame something like this

lf = pl.LazyFrame([
    pl.Series("a", ...),
    pl.Series("b", ...),
    pl.Series("c", ...),
    pl.Series("i", ...)
])

and a function something like this

def update(a, b, c, i):
    s = a + b + c + i
    a /= s
    b /= s
    c /= s
    return a, b, c

that depends on the elements of columns a, b, c and also i.

How can we update each row of a frame using the function?

We could use with_columns to update the rows of each column independently, but how can we do it when the new values depend on several columns at once?

Edit

In response to comments from @roman let's tighten up the question.

Use this LazyFrame

lf = pl.LazyFrame(
    [
        pl.Series("a", [1, 2, 3, 4], dtype=pl.Int8),
        pl.Series("b", [5, 6, 7, 8], dtype=pl.Int8),
        pl.Series("c", [9, 0, 1, 2], dtype=pl.Int8),
        pl.Series("i", [3, 4, 5, 6], dtype=pl.Int8),
        pl.Series("o", [7, 8, 9, 0], dtype=pl.Int8),
    ]
)

We want to update columns a, b & c in a way that depends on column i. Column o should be unaffected. We have a function that takes the values a, b, c & i and returns a, b, c & i, where the first three have been updated but i remains the same as the input. After updating, all columns should have the same dtype as before.

The closest we can get is using an update function like this.

def update(args):
    s = args["a"] + args["b"] + args["c"] + args["i"]
    args["a"] /= s
    args["b"] /= s
    args["c"] /= s
    return args.values()

and applying it like this

(
    lf.select(
        pl.struct(pl.col("a", "b", "c", "i"))
        .map_elements(update, return_dtype=pl.List(pl.Float64))
        .list.to_struct(fields=["a", "b", "c", "i"])
        .alias("result"),
    )
    .unnest("result")
    .collect()
)

But this has some problems.

  1. We have lost column o.
  2. Column i has become Float64
  3. It's pretty ugly.

Is there a better way?


Solution


Update

lf.select(
    pl.exclude(["a","b","c"]),
    pl
    .struct(pl.all()).map_elements(update, return_dtype=pl.List(pl.Float64))
    .list.to_struct(fields=["a","b","c"])
    .alias("result")
).unnest("result").collect()
shape: (4, 5)
┌─────┬─────┬──────────┬──────────┬────────┐
│ i   ┆ o   ┆ a        ┆ b        ┆ c      │
│ --- ┆ --- ┆ ---      ┆ ---      ┆ ---    │
│ i8  ┆ i8  ┆ f64      ┆ f64      ┆ f64    │
╞═════╪═════╪══════════╪══════════╪════════╡
│ 3   ┆ 7   ┆ 0.055556 ┆ 0.277778 ┆ 0.5    │
│ 4   ┆ 8   ┆ 0.166667 ┆ 0.5      ┆ 0.0    │
│ 5   ┆ 9   ┆ 0.1875   ┆ 0.4375   ┆ 0.0625 │
│ 6   ┆ 0   ┆ 0.2      ┆ 0.4      ┆ 0.1    │
└─────┴─────┴──────────┴──────────┴────────┘

Or you can change your function to return a dictionary and skip the lists:

def update(args):
    s = args["a"] + args["b"] + args["c"] + args["i"]
    args["a"] /= s
    args["b"] /= s
    args["c"] /= s
    return args

lf.select(
    pl
    .struct(pl.all()).map_elements(update, return_dtype=pl.Struct)
    .alias("result")
).unnest("result").collect()
shape: (4, 5)
┌──────────┬──────────┬────────┬─────┬─────┐
│ a        ┆ b        ┆ c      ┆ i   ┆ o   │
│ ---      ┆ ---      ┆ ---    ┆ --- ┆ --- │
│ f64      ┆ f64      ┆ f64    ┆ i64 ┆ i64 │
╞══════════╪══════════╪════════╪═════╪═════╡
│ 0.055556 ┆ 0.277778 ┆ 0.5    ┆ 3   ┆ 7   │
│ 0.166667 ┆ 0.5      ┆ 0.0    ┆ 4   ┆ 8   │
│ 0.1875   ┆ 0.4375   ┆ 0.0625 ┆ 5   ┆ 9   │
│ 0.2      ┆ 0.4      ┆ 0.1    ┆ 6   ┆ 0   │
└──────────┴──────────┴────────┴─────┴─────┘

Original

In general, I'd advise against using Python functions and would try to stay within pure Polars expressions. In your case it could look, for example, like this:

lf = pl.LazyFrame([
    pl.Series("a", [1,2,3]),
    pl.Series("b", [2,3,4]),
    pl.Series("c", [5,6,7]),
    pl.Series("i", [7,8,7])
])

(
    lf
    .with_columns(pl.exclude("i") / pl.sum_horizontal(pl.all()))
    .collect()
)
shape: (3, 4)
┌──────────┬──────────┬──────────┬─────┐
│ a        ┆ b        ┆ c        ┆ i   │
│ ---      ┆ ---      ┆ ---      ┆ --- │
│ f64      ┆ f64      ┆ f64      ┆ i64 │
╞══════════╪══════════╪══════════╪═════╡
│ 0.066667 ┆ 0.133333 ┆ 0.333333 ┆ 7   │
│ 0.105263 ┆ 0.157895 ┆ 0.315789 ┆ 8   │
│ 0.142857 ┆ 0.190476 ┆ 0.333333 ┆ 7   │
└──────────┴──────────┴──────────┴─────┘

If you really want to pass a Python function, you can do it like this:

def update(row):
    # Unpack the four inputs from the row dict.
    a, b, c, i = [row[x] for x in "abci"]
    s = a + b + c + i
    a /= s
    b /= s
    c /= s
    return a, b, c

lf.select(
    pl.col.i,
    pl
    .struct(pl.all()).map_elements(update, return_dtype=pl.List(pl.Float64))
    .list.to_struct(fields=["a","b","c"])
    .alias("result")
).unnest("result").collect()
shape: (3, 4)
┌─────┬──────────┬──────────┬──────────┐
│ i   ┆ a        ┆ b        ┆ c        │
│ --- ┆ ---      ┆ ---      ┆ ---      │
│ i64 ┆ f64      ┆ f64      ┆ f64      │
╞═════╪══════════╪══════════╪══════════╡
│ 7   ┆ 0.066667 ┆ 0.133333 ┆ 0.333333 │
│ 8   ┆ 0.105263 ┆ 0.157895 ┆ 0.315789 │
│ 7   ┆ 0.142857 ┆ 0.190476 ┆ 0.333333 │
└─────┴──────────┴──────────┴──────────┘



Author: 黑洞官方问答小能手

Link: https://www.pythonheidong.com/blog/article/2046769/4c30d4c90267bf990016/

Source: python黑洞网
