python - Select rows with duplicate observations in pandas
I'm working on a large dataset, and there are a few duplicates in the index. I'd like to (perhaps visually) check these duplicated rows and decide which one to drop. Is there a way to select the slice of the DataFrame that has duplicated indices (or duplicates in columns)?
Any help is appreciated.
Use the duplicated method of DataFrame (the cols argument of older pandas versions is now called subset):
df.duplicated(subset=[...])
See http://pandas.pydata.org/pandas-docs/stable/generated/pandas.dataframe.duplicated.html
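As a rough illustration of what the boolean mask looks like, here is a minimal sketch. The sample frame and the column names "name" and "value" are made up for the example; they are not from the question.

```python
import pandas as pd

# Hypothetical sample data; the column names are assumptions for illustration.
df = pd.DataFrame(
    {"name": ["a", "b", "a", "c", "b"], "value": [1, 2, 3, 4, 5]}
)

# True for each row whose "name" has already appeared earlier in the frame.
mask = df.duplicated(subset=["name"])

# Boolean indexing keeps only the later repeats (the first occurrences are not marked).
later_repeats = df[mask]
```

Note that with the default settings only the second and later occurrences are flagged, so the first row of each duplicated set is not part of the selection.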
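Edit
To select every occurrence of a duplicated row, including the first one, you can use (keep='last' replaces the take_last=True argument of older pandas versions):
df[df.duplicated(subset=[...]) | df.duplicated(subset=[...], keep='last')]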
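In reasonably recent pandas versions the same selection can probably be written more compactly with keep=False, and the index has its own duplicated method, which covers the "duplicated indices" part of the question. A minimal sketch, assuming a made-up frame with a repeated index label and a repeated "key" value:

```python
import pandas as pd

# Hypothetical sample data: index label "r1" and key "x" each occur twice.
df = pd.DataFrame(
    {"key": ["x", "y", "x", "z"], "value": [10, 20, 30, 40]},
    index=["r1", "r2", "r3", "r1"],
)

# Every row whose "key" occurs more than once (first and later occurrences alike).
dup_by_column = df[df.duplicated(subset=["key"], keep=False)]

# Every row whose index label occurs more than once.
dup_by_index = df[df.index.duplicated(keep=False)]
```

Both results keep the original rows untouched, so you can inspect them side by side before deciding which one to drop.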
Or you can use groupby and filter:
df.groupby([...]).filter(lambda g: g.shape[0] > 1)
Or apply:
df.groupby([...], group_keys=False).apply(lambda g: g if g.shape[0] > 1 else None)
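For the groupby route, here is a small sketch of the filter variant on made-up data (the "key" and "value" column names are assumptions). Sorting the result afterwards is an extra step, not part of the answer above, but it puts the duplicates next to each other for a visual check.

```python
import pandas as pd

# Hypothetical sample data: "x" and "y" each appear twice, "z" once.
df = pd.DataFrame(
    {"key": ["x", "y", "x", "z", "y"], "value": [1, 2, 3, 4, 5]}
)

# Keep only the rows that belong to groups with more than one member.
dups = df.groupby("key").filter(lambda g: len(g) > 1)

# A stable sort by the key places the duplicated rows side by side.
dups_sorted = dups.sort_values("key", kind="mergesort")
```

The filter result keeps the original row order, so the sort is only needed when you want the matching rows grouped together.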