Question
Sum multiple rows from multiple columns in a dataframe for a group
For each group in a groupby, I want to sum certain rows from several columns and output them in a new column, is_m_days
.
- Each Group (a Group has CT/RT and has a Quantity from 1 or 2 or 3 or more rows, randomly mixed up) in 'ATEXT'
- For the Sum, each group has a Row before and after.
DataFrame:
data = {'ATEXT': ['', 'CT', 'RT', '', '', '', '', 'CT', 'CT', 'CT', 'TT', ''],
'BEGUZ_UE': [11.0, 23.0, 33.0, 15.0, 12.75, 19.75, 14.75, 23.0,
24.0, 24.0, 33.0, 15.0],
'subtract': [0.0, 0.0, 0.0, 0.2, np.nan, np.nan, 2.0, np.nan,
np.nan, np.nan, np.nan, 0.0],
'add': [3.92, 0.0, 0.0, 0.0, np.nan, np.nan, 0.0, np.nan, np.nan,
np.nan, np.nan, 3.57],
'UE_more_days': [np.nan, np.nan, 56.0, np.nan, np.nan, np.nan, np.nan,
np.nan, np.nan, np.nan, 104.0, np.nan]}
Result should be:
ATEXT BEGUZ_UE subtract add UE_more_days is_m_days
0 11.00 *0.00* *3.92*
1 CT *23.00* 0.00 0.00
2 RT *33.00* 0.00 0.00 56.0
3 *15.00* 0.20 0.00 *74.92*
4 12.75
5 19.75
6 14.75 *2.00* *0.00*
7 CT *23.00*
8 RT *24.00*
9 CT *24.00*
10 CT *33.00* 104.0
11 *15.00* 0.00 3.57 *117.00*
12
etc
My try was:
m = df['ATEXT'].eq("")
cond = (~m) & m.shift(-1)
df['UE_more_days'] = (df['BEGUZ_UE'].mask(m)
.groupby(m.cumsum()).cumsum()
.where(cond)
)
tmv = (df[['subtract', 'add']]
.shift()
.groupby(m.cumsum())
.transform('max')
.eval('add-subtract')
)
df['is_m_days'] = (df.groupby(m[::-1].cumsum())['BEGUZ_UE']
.transform('sum')
.add(tmv)
.where(cond)
.shift()
)
Is there a better solution?
3 112
3