Question

Groupby multiple columns and extract top rows based on non-grouped column value

I am trying to solve a problem some what very similar to: https://platform.stratascratch.com/coding/10362-top-monthly-sellers?code_type=2

here is my data frame:

         product  seller  market   total_sales

         books      s1      de     10     
         books      s2      jp     20
         books      s3      in     30
         books      s1      de     25     
         books      s5      in     15
         books      s1      us     12
         books      s2      uk     10
      clothing      s1      de     11
      clothing      s2      de     18
      clothing      s1      uk     55
      clothing      s3      in     31
      clothing      s1      de     14
      clothing      s2      de     10
      clothing      s1      de     35
  electronics      s1       us     18
  electronics      s1       de     12
  electronics      s2       in     16
  electronics      s3       uk     24
  electronics      s1       us     37      
  electronics      s4       jp     27
  electronics      s3       uk     26
  electronics      s1       us     15

Expected output is:

product      seller   market   total_sales

books        s1         de        35
books        s3         in        30
books        s2         jp        20
Clothing     s1         de        60
Clothing     s1         uk        55
Clothing     s3         in        31
electronics  s1         us        70
electronics  s3         uk        50
electronics  s4         jp        27

I want to get top three product category sales in the world.

I was able to aggregate total sales basing on product,seller & group them. After grouping I am able to sort it by total_sales per group. But, I was not able to get only the top-3 total_sale rows per ['product'] group. Cany anyone help on this.

import pandas as pd

df = df.groupby(['product', 'seller','market'], as_index = False).agg({'total_sales':sum})
df = df.sort_values(['product','total_sales'],ascending=False).groupby(['product', 'seller','market',],as_index = False).aggregate(lambda x: ','.join(map(str, x)))
print(df)

3 43 3

1 Jan 1970