Question
Groupby multiple columns and extract top rows based on non-grouped column value
I am trying to solve a problem some what very similar to: https://platform.stratascratch.com/coding/10362-top-monthly-sellers?code_type=2
here is my data frame:
product seller market total_sales
books s1 de 10
books s2 jp 20
books s3 in 30
books s1 de 25
books s5 in 15
books s1 us 12
books s2 uk 10
clothing s1 de 11
clothing s2 de 18
clothing s1 uk 55
clothing s3 in 31
clothing s1 de 14
clothing s2 de 10
clothing s1 de 35
electronics s1 us 18
electronics s1 de 12
electronics s2 in 16
electronics s3 uk 24
electronics s1 us 37
electronics s4 jp 27
electronics s3 uk 26
electronics s1 us 15
Expected output is:
product seller market total_sales
books s1 de 35
books s3 in 30
books s2 jp 20
Clothing s1 de 60
Clothing s1 uk 55
Clothing s3 in 31
electronics s1 us 70
electronics s3 uk 50
electronics s4 jp 27
I want to get top three product category sales in the world.
I was able to aggregate total sales basing on product,seller & group them. After grouping I am able to sort it by total_sales per group. But, I was not able to get only the top-3 total_sale rows per ['product'] group. Cany anyone help on this.
import pandas as pd
df = df.groupby(['product', 'seller','market'], as_index = False).agg({'total_sales':sum})
df = df.sort_values(['product','total_sales'],ascending=False).groupby(['product', 'seller','market',],as_index = False).aggregate(lambda x: ','.join(map(str, x)))
print(df)