Question
How to vectorize groupby combination lists of two columns in Pandas Dataframe
I've a dataframe and need to group by two columns from all possible combinations of dataframe columns ['A','B','C','D','E','F','G']
import pandas as pd
d = {'A': [0,1,1,0,0,1,0,0],
'B': [1,1,0,0,0,1,0,1],
'C': [0,0,1,1,0,0,1,0],
'D': [1,1,1,0,0,1,0,0],
'E': [0,0,0,1,0,0,0,1],
'F': [0,0,1,1,0,0,1,0],
'G': [1,0,1,1,0,0,0,0],
'Feature_1':[1,0,1,1,0,1,1,0],
'Feature_2':[0,1,1,0,1,0,0,1]}
df = pd.DataFrame(d)
print(df)
A B C D E F G Feature_1 Feature_2
0 0 1 0 1 0 0 1 1 0
1 1 1 0 1 0 0 0 0 1
2 1 0 1 1 0 1 1 1 1
3 0 0 1 0 1 1 1 1 0
4 0 0 0 0 0 0 0 0 1
5 1 1 0 1 0 0 0 1 0
6 0 0 1 0 0 1 0 1 0
7 0 1 0 0 1 0 0 0 1
Need to do groupby all possible combinations of columns of length 2
is there a way to vectorize this process which takes long time if dataframe columns are relatively large :
The output expected as:
Col_1 Col_2 Col_1_value Col_2_value Feature_1_Sum Feature_2_Sum
0 A B 0 0 2 1
1 A B 0 1 1 1
2 A B 1 0 1 1
3 A B 1 1 1 1
0 A C 0 0 1 2
... ... ... ... ... ... ...
3 E G 1 1 1 0
0 F G 0 0 1 3
1 F G 0 1 1 0
2 F G 1 0 1 0
3 F G 1 1 2 1
77 rows × 6 columns
What I've done through looping :
from itertools import combinations
cols_list = ['A','B','C','D','E','F','G']
i=0
for comb in combinations(cols_list, 2):
i=i+1
comb_list = [comb[0], comb[1]]
try:
del df_gp
except Exception as e:
pass
try:
del df_comb
except Exception as e:
pass
df_gp = (df.groupby(by=comb_list).agg(Feature_1_Sum=('Feature_1','sum'), Feature_2_Sum= ('Feature_2','sum')).reset_index())
df_gp['Col_1'] = comb[0]
df_gp['Col_2'] = comb[1]
df_gp['Col_1_value'] = df_gp[comb[0]]
df_gp['Col_2_value'] = df_gp[comb[1]]
df_comb = pd.DataFrame()
cols = ['Col_1','Col_2','Col_1_value','Col_2_value','Feature_1_Sum','Feature_2_Sum']
df_comb = df_gp[cols]
if i==1:
df_agg = df_comb
else:
df_agg = pd.concat([df_agg, df_comb])
df_agg
4 76
4