Question
How to compare lists in two Pandas dataframes to get the common elements?
I want to compare the lists in columns set_1 and set_2 of df2 with the ins column of df1 to find all common elements.
I've managed it for one row and one column, but I don't know how to compare every row of one DataFrame with every row of the other to get the desired result.
Here is my code comparing set_1 and ins in the first row:
import pandas as pd
# df1: one list of insertions ('ins') per interval
d1 = {'chr': [1, 1], 'start': [64, 1000], 'end': [150, 2000], 'family': ['a', 'b'],
      'ins': [['P1-12', 'P1-22', 'P1-25', 'P1-28', 'P1-90'],
              ['P1-6', 'P1-89', 'P1-92', 'P1-93']]}
df1 = pd.DataFrame.from_dict(data=d1)
# df2: two lists per row, set_1 and set_2
d2 = {'set_1': [['P1-12', 'P1-25', 'P1-28'], ['P1-6', 'P1-89', 'P1-93']],
      'set_2': [['P1-89', 'P1-92', 'P1-93'], ['P1-25', 'P1-28', 'P1-90']]}
df2 = pd.DataFrame.from_dict(data=d2)
# elements of set_1 (df2 row 0, column 0) that also appear in ins (df1 row 0, column 4)
matches = [x for x in df2.iloc[0, 0] if x in df1.iloc[0, 4]]
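One way to extend the single-row comparison above to every pair of rows is a nested loop over both DataFrames. This is a minimal sketch, not a tuned solution: it assumes the order of elements within each set_1/set_2 list should be kept (as in the desired output), and the names `rows` and `result` are illustrative. Converting `ins` to a `set` makes each membership test constant time instead of a scan of the list.

```python
import pandas as pd

df1 = pd.DataFrame({'chr': [1, 1], 'start': [64, 1000], 'end': [150, 2000],
                    'family': ['a', 'b'],
                    'ins': [['P1-12', 'P1-22', 'P1-25', 'P1-28', 'P1-90'],
                            ['P1-6', 'P1-89', 'P1-92', 'P1-93']]})
df2 = pd.DataFrame({'set_1': [['P1-12', 'P1-25', 'P1-28'], ['P1-6', 'P1-89', 'P1-93']],
                    'set_2': [['P1-89', 'P1-92', 'P1-93'], ['P1-25', 'P1-28', 'P1-90']]})

rows = []
# Pair every df1 row with every df2 row (Cartesian product of rows)
for r in df1.itertuples():
    ins = set(r.ins)  # set for fast membership tests
    for idx, set_1, set_2 in df2.itertuples():  # (index, set_1, set_2)
        rows.append({
            'chr': r.chr, 'start': r.start, 'end': r.end, 'family': r.family,
            'df2_index': idx,
            # keep the order of the df2 list, as in the desired output
            'ins_set1': [x for x in set_1 if x in ins],
            'ins_set2': [x for x in set_2 if x in ins],
        })

result = pd.DataFrame(rows)
print(result)
```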
Here is a small sample of my input data (the full df1 has ~13k rows and df2 ~90):
df1:
   chr  start   end family                                  ins
0    1     64   150      a  [P1-12, P1-22, P1-25, P1-28, P1-90]
1    1   1000  2000      b          [P1-6, P1-89, P1-92, P1-93]
df2:
                   set_1                  set_2
0  [P1-12, P1-25, P1-28]  [P1-89, P1-92, P1-93]
1   [P1-6, P1-89, P1-93]  [P1-25, P1-28, P1-90]
The desired output should look like this:
   chr  start   end family  df2_index               ins_set1               ins_set2
0    1     64   150      a          0  [P1-12, P1-25, P1-28]                     []
1    1     64   150      a          1                     []  [P1-25, P1-28, P1-90]
2    1   1000  2000      b          0                     []  [P1-89, P1-92, P1-93]
3    1   1000  2000      b          1   [P1-6, P1-89, P1-93]                     []
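A more pandas-style sketch builds the same table with a cross join, which pairs each df1 row with each df2 row, and then intersects the lists for each pair. It assumes pandas 1.2 or later (where `merge(how='cross')` was added); `common` is a hypothetical helper, not a pandas function. With ~13k × ~90 rows the cross join has roughly 1.2 million rows, which should fit in memory, but the list columns hold Python objects, so this will be slower than purely numeric pandas operations.

```python
import pandas as pd

df1 = pd.DataFrame({'chr': [1, 1], 'start': [64, 1000], 'end': [150, 2000],
                    'family': ['a', 'b'],
                    'ins': [['P1-12', 'P1-22', 'P1-25', 'P1-28', 'P1-90'],
                            ['P1-6', 'P1-89', 'P1-92', 'P1-93']]})
df2 = pd.DataFrame({'set_1': [['P1-12', 'P1-25', 'P1-28'], ['P1-6', 'P1-89', 'P1-93']],
                    'set_2': [['P1-89', 'P1-92', 'P1-93'], ['P1-25', 'P1-28', 'P1-90']]})


def common(items, reference):
    """Hypothetical helper: elements of `items` also in `reference`, in `items` order."""
    ref = set(reference)
    return [x for x in items if x in ref]


# Keep df2's index as a regular column so it survives the merge
right = df2.rename_axis('df2_index').reset_index()

# Cartesian product: every df1 row followed by every df2 row
out = df1.merge(right, how='cross')

# Wrap in a Series so each cell holds one (possibly empty) list
out['ins_set1'] = pd.Series([common(s, i) for s, i in zip(out['set_1'], out['ins'])],
                            index=out.index)
out['ins_set2'] = pd.Series([common(s, i) for s, i in zip(out['set_2'], out['ins'])],
                            index=out.index)

out = out[['chr', 'start', 'end', 'family', 'df2_index', 'ins_set1', 'ins_set2']]
print(out)
```

If only non-empty matches are of interest, filtering `out` afterwards (for example on `out['ins_set1'].str.len() > 0`) is one option; whether that fits depends on how the result is used downstream.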