python - Select rows with duplicate observations in pandas
I am working on a large dataset, and there are a few duplicates in the index. I'd like to (perhaps visually) check these duplicated rows and decide which one to drop. Is there a way to select the slice of the DataFrame that has duplicated indices (or duplicates in any columns)?
Any help is appreciated.
Use the duplicated method of DataFrame:
df.duplicated(subset=[...])
(Older pandas releases called this parameter cols.) See http://pandas.pydata.org/pandas-docs/stable/generated/pandas.dataframe.duplicated.html
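As a minimal sketch of what duplicated returns, assuming a small made-up frame (the column names, values, and index labels below are illustrative, not from the question), the boolean mask can be used directly to select rows. Index.duplicated is shown as well, since the question asks about duplicated index labels:

```python
import pandas as pd

# Hypothetical sample data; names and values are for illustration only.
df = pd.DataFrame(
    {"name": ["a", "b", "a", "c", "b"], "value": [1, 2, 1, 3, 9]},
    index=[10, 11, 10, 12, 13],
)

# True for each row whose "name" repeats an earlier row.
# The first occurrence of each value stays False by default.
dup_by_name = df.duplicated(subset=["name"])

# True for each index label that repeats an earlier label.
dup_by_index = df.index.duplicated()

# Boolean masks select the flagged rows.
repeated_rows = df[dup_by_name]
repeated_index_rows = df[dup_by_index]
```

Note that with the default settings only the later occurrences are flagged, so the first copy of each duplicate is not part of the selection.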
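Edit
To select every occurrence of a duplicate, not only the later ones, you can use:
df[df.duplicated(subset=[...]) | df.duplicated(subset=[...], keep="last")]
(In older pandas releases, subset was spelled cols and keep="last" was take_last=True.) Or you can use groupby and filter:
df.groupby([...]).filter(lambda g: g.shape[0] > 1)
or apply:
df.groupby([...], group_keys=False).apply(lambda g: g if g.shape[0] > 1 else None)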
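The sketch below compares these approaches on a small made-up frame (the data and names are assumptions for illustration). It also uses keep=False, which in recent pandas versions marks all members of a duplicated group in one call, and applies the same idea to the index:

```python
import pandas as pd

# Hypothetical sample data; names and values are for illustration only.
df = pd.DataFrame(
    {"name": ["a", "b", "a", "c", "b"], "value": [1, 2, 1, 3, 9]},
    index=[10, 11, 10, 12, 13],
)

# Combine the "first kept" and "last kept" masks to flag every occurrence.
mask = df.duplicated(subset=["name"]) | df.duplicated(subset=["name"], keep="last")
all_dups = df[mask]

# keep=False flags every occurrence in a single call.
all_dups_short = df[df.duplicated(subset=["name"], keep=False)]

# groupby + filter keeps only groups with more than one row.
via_filter = df.groupby("name").filter(lambda g: len(g) > 1)

# The same selection for duplicated index labels.
index_dups = df[df.index.duplicated(keep=False)]
```

Here all three column-based selections should contain the same rows, so the choice is mostly a matter of readability; the mask-based forms avoid the per-group function call that groupby needs.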