python - Slice one Pandas DataFrame based on another
I have created the following pandas DataFrame based on lists of ids.
In [8]: df = pd.DataFrame({'groups' : [1,2,3,4], 'id' : ["[1,3]","[2]","[5]","[4,6,7]"]})
In [9]: df
Out[9]:
   groups       id
0       1    [1,3]
1       2      [2]
2       3      [5]
3       4  [4,6,7]
There is another DataFrame, as follows.
In [12]: df2 = pd.DataFrame({'id' : [1,2,3,4,5,6,7], 'path' : ["p1,p2,p3,p4","p1,p2,p1","p1,p5,p5,p7","p1,p2,p3,p3","p1,p2","p1","p2,p3,p4"]})
I need the path values for each group, e.g.
groups  path
1       p1,p2,p3,p4
        p1,p5,p5,p7
2       p1,p2,p1
3       p1,p2
4       p1,p2,p3,p3
        p1
        p2,p3,p4
I'm not sure this is quite the best way to do it, but it worked for me. Incidentally, this works if you create the id variable in the first DataFrame without the "" marks, i.e. as lists, not strings...
import itertools
import pandas as pd

df = pd.DataFrame({'groups' : [1,2,3,4], 'id' : [[1,3],[2],[5],[4,6,7]]})
df2 = pd.DataFrame({'id' : [1,2,3,4,5,6,7], 'path' : ["p1,p2,p3,p4","p1,p2,p1","p1,p5,p5,p7","p1,p2,p3,p3","p1,p2","p1","p2,p3,p4"]})

# one empty list per group, filled in row by row below
paths = [[] for group in df.groups.unique()]
for x in df.index:
    # look up the path for every id in this row and flatten the results
    paths[x].extend(itertools.chain(*[list(df2[df2.id == int(y)]['path']) for y in df.id[x]]))
df['paths'] = pd.Series(paths)
df
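On the strings-versus-lists point: if the id column arrives as strings such as "[1,3]", as in the question, one option is to parse each string into a real list first. This is only a minimal sketch, and it assumes every value is a well-formed Python list literal. Using ast.literal_eval is my own suggestion here, not something the question asked for.

```python
import ast

import pandas as pd

df = pd.DataFrame({'groups': [1, 2, 3, 4],
                   'id': ["[1,3]", "[2]", "[5]", "[4,6,7]"]})

# Parse each string such as "[1,3]" into an actual list such as [1, 3].
# Assumption: every value is a valid Python list literal.
df['id'] = df['id'].apply(ast.literal_eval)

print(df['id'].tolist())
```

After this step the id column holds lists, so the lookup over df.id[x] iterates over ids rather than over the characters of a string.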
There is probably a neater way of doing this, but it is an odd data structure in a way. The loop above gives the following output:
   groups         id                        paths
0       1     [1, 3]   [p1,p2,p3,p4, p1,p5,p5,p7]
1       2        [2]                   [p1,p2,p1]
2       3        [5]                      [p1,p2]
3       4  [4, 6, 7]  [p1,p2,p3,p3, p1, p2,p3,p4]
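One possibly neater route is to flatten the id lists to one row per id, merge in the paths, and then regroup. The sketch below assumes pandas 0.25 or later (for DataFrame.explode) and that the id column already holds lists rather than strings. The names long_df, merged and paths are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({'groups': [1, 2, 3, 4],
                   'id': [[1, 3], [2], [5], [4, 6, 7]]})
df2 = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6, 7],
                    'path': ["p1,p2,p3,p4", "p1,p2,p1", "p1,p5,p5,p7",
                             "p1,p2,p3,p3", "p1,p2", "p1", "p2,p3,p4"]})

# One row per (group, id) pair. explode leaves the column as object dtype,
# so cast it to int to match the dtype of df2['id'] before merging.
long_df = df.explode('id').astype({'id': 'int64'})

# Attach the path for each id, keeping the row order of long_df.
merged = long_df.merge(df2, on='id', how='left')

# Collect the paths of each group back into a single list.
paths = merged.groupby('groups')['path'].agg(list).reset_index()
print(paths)
```

This avoids the explicit loop and the positional indexing into a list of lists, at the cost of building an intermediate long-format frame.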