程序员最近都爱上了这个网站  程序员们快来瞅瞅吧!  it98k网:it98k.com

本站消息

站长简介/公众号

  出租广告位,需要合作请联系站长

+关注
已关注

分类  

暂无分类

标签  

暂无标签

日期归档  

暂无数据

How to identify nth row that satisfies condition

发布于2024-10-31 20:31     阅读(177)     评论(0)     点赞(9)     收藏(3)


I have a dataframe with months and sales. If somebody does 250 sales for 3 consecutive months, they get a bonus.

  1. Once the bonus month is identified, the counter resets and starts to count the next 3 consecutive months with 250
  2. if there is less than 250 sales, then the counter resets
Month sales 3 month consistency
Month 1 250
Month 2 250
Month 3 250 1
Month 4 250
Month 5 250
Month 6 250 1
Month 7 0
Month 8 250
Month 9 250
Month 10 250 1
Month 11 250
Month 12 250
Month 13 0
Month 14 250
Month 15 250
Month 16 250 1

I want to populate the 3 month consistency column

Tried using cum count with conditions but it did not work and I could not find such an example anywhere. What I tried was to use rolling window to identify every 3rd consecutive record with 250. But then it does not reset after the each set of 3 consecutive 250s.

df.groupby('id').apply(lambda x: x['gt_250'].rolling(min_periods=1, window=3).sum())
#-- grouped by id because this has to be computed for each salesperson_id

The result I got from it was,

Month sales 3 month consistency
Month 1 250
Month 2 250
Month 3 250 3
Month 4 250 3
Month 5 250 3

解决方案


You could use a custom groupby.cumcount and mod:

N = 3 # number of consecutive months

# identify target months
m = df['sales'].ge(250)
# for groups of consecutive target months
group = (~m).cumsum()

# count the Nth repeat for each group
df['3 month consistency'] = (df[m].groupby(group).cumcount().mod(N).eq(N-1)
                             .map({True: 1})
                            )

NB. if you want a boolean column replace .map({True: 1}) by .reindex(df.index).eq(True).

Output:

       Month  sales  3 month consistency
0    Month 1    250                  NaN
1    Month 2    250                  NaN
2    Month 3    250                  1.0
3    Month 4    250                  NaN
4    Month 5    250                  NaN
5    Month 6    250                  1.0
6    Month 7      0                  NaN
7    Month 8    250                  NaN
8    Month 9    250                  NaN
9   Month 10    250                  1.0
10  Month 11    250                  NaN
11  Month 12    250                  NaN
12  Month 13      0                  NaN
13  Month 14    250                  NaN
14  Month 15    250                  NaN
15  Month 16    250                  1.0

Intermediates:

       Month  sales      m  group  cumcount  mod  3 month consistency
0    Month 1    250   True      0       0.0  0.0                  NaN
1    Month 2    250   True      0       1.0  1.0                  NaN
2    Month 3    250   True      0       2.0  2.0                  1.0
3    Month 4    250   True      0       3.0  0.0                  NaN
4    Month 5    250   True      0       4.0  1.0                  NaN
5    Month 6    250   True      0       5.0  2.0                  1.0
6    Month 7      0  False      1       NaN  NaN                  NaN
7    Month 8    250   True      1       0.0  0.0                  NaN
8    Month 9    250   True      1       1.0  1.0                  NaN
9   Month 10    250   True      1       2.0  2.0                  1.0
10  Month 11    250   True      1       3.0  0.0                  NaN
11  Month 12    250   True      1       4.0  1.0                  NaN
12  Month 13      0  False      2       NaN  NaN                  NaN
13  Month 14    250   True      2       0.0  0.0                  NaN
14  Month 15    250   True      2       1.0  1.0                  NaN
15  Month 16    250   True      2       2.0  2.0                  1.0

Variant (with a boolean output):

N = 3
m = df['sales'].ge(250)
group = m.ne(m.shift()).cumsum()
df['3 month consistency'] = df.groupby(group).cumcount().mod(N).eq(N-1)

Intermediates:

       Month  sales      m  group  cumcount  mod  3 month consistency
0    Month 1    250   True      1         0    0                False
1    Month 2    250   True      1         1    1                False
2    Month 3    250   True      1         2    2                 True
3    Month 4    250   True      1         3    0                False
4    Month 5    250   True      1         4    1                False
5    Month 6    250   True      1         5    2                 True
6    Month 7      0  False      2         0    0                False
7    Month 8    250   True      3         0    0                False
8    Month 9    250   True      3         1    1                False
9   Month 10    250   True      3         2    2                 True
10  Month 11    250   True      3         3    0                False
11  Month 12    250   True      3         4    1                False
12  Month 13      0  False      4         0    0                False
13  Month 14    250   True      5         0    0                False
14  Month 15    250   True      5         1    1                False
15  Month 16    250   True      5         2    2                 True

per ID

N = 3
m = df['sales'].ge(250)
group = m.ne(m.shift()).cumsum()
df['3 month consistency'] = df.groupby(['ID', group]).cumcount().mod(N).eq(N-1)

Example:

   ID     Month  sales  3 month consistency
0   A   Month 1    250                False
1   A   Month 2    250                False
2   A   Month 3    250                 True
3   A   Month 4    250                False
4   A   Month 5    250                False
5   A   Month 6    250                 True
6   A   Month 7      0                False
7   A   Month 8    250                False
8   A   Month 9    250                False
9   A  Month 10    250                 True
10  A  Month 11    250                False
11  A  Month 12    250                False
12  A  Month 13      0                False
13  A  Month 14    250                False
14  A  Month 15    250                False
15  B   Month 1    250                False  # independent from A
16  B   Month 2    250                False
17  B   Month 3    250                 True
18  B   Month 4    250                False
19  B   Month 5    250                False
20  B   Month 6    250                 True
21  B   Month 7      0                False
22  B   Month 8    250                False
23  B   Month 9    250                False
24  B  Month 10    250                 True
25  B  Month 11    250                False
26  B  Month 12    250                False
27  B  Month 13      0                False
28  B  Month 14    250                False
29  B  Month 15    250                False


所属网站分类: 技术文章 > 问答

作者:黑洞官方问答小能手

链接:https://www.pythonheidong.com/blog/article/2040402/3699a72024b0bcccb41a/

来源:python黑洞网

任何形式的转载都请注明出处,如有侵权 一经发现 必将追究其法律责任

9 0
收藏该文
已收藏

评论内容:(最多支持255个字符)