python - Select rows with duplicate observations in pandas


I am working on a large dataset, and there are a few duplicates in the index. I'd like to (perhaps visually) check these duplicated rows and then decide which one to drop. Is there a way to select a slice of the dataframe that has duplicated indices (or duplicates in any columns)?

Any help is appreciated.

Use the duplicated method of DataFrame:

df.duplicated(subset=[...])

See http://pandas.pydata.org/pandas-docs/stable/generated/pandas.dataframe.duplicated.html
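The question also asks about duplicated index labels, which DataFrame.duplicated does not look at, since it compares column values only. A minimal sketch of one way to handle that case, assuming a made-up frame with a column named "value" and string labels (both hypothetical), is to build the mask from the index itself:

```python
import pandas as pd

# Hypothetical sample data: the labels "a" and "b" each appear twice in the index.
df = pd.DataFrame(
    {"value": [1, 2, 3, 4, 5]},
    index=["a", "b", "a", "c", "b"],
)

# Index.duplicated(keep=False) flags every occurrence of a repeated label,
# not only the second and later ones.
mask = df.index.duplicated(keep=False)

# Sort with a stable algorithm so rows sharing a label sit next to each other
# and keep their original relative order.
index_dupes = df[mask].sort_index(kind="mergesort")
print(index_dupes)
```

Sorting is optional; it only makes the repeated labels easier to compare by eye before deciding which row to drop.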

Edit

To select every occurrence of a duplicated row, not only the repeats, you can use:

df[df.duplicated(subset=[...]) | df.duplicated(subset=[...], keep='last')]
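The expression above combines two masks: one that flags every occurrence except the first, and one that flags every occurrence except the last. As a rough illustration, assuming a small made-up frame with columns "key" and "value" (hypothetical names), the sketch below builds that union and compares it with keep=False, which pandas versions that accept the keep argument offer as a shorter spelling:

```python
import pandas as pd

# Hypothetical sample data: "x" and "y" are repeated in the "key" column.
df = pd.DataFrame(
    {
        "key": ["x", "y", "x", "z", "y"],
        "value": [10, 20, 30, 40, 50],
    }
)

# Flags all but the first occurrence, OR all but the last occurrence.
first_or_last = df.duplicated(subset=["key"]) | df.duplicated(subset=["key"], keep="last")

# keep=False flags every member of a duplicated group in one call.
all_dupes = df.duplicated(subset=["key"], keep=False)

# Both masks select the same rows.
same_mask = first_or_last.equals(all_dupes)

dupes = df[all_dupes]
print(dupes)
```

Once the rows have been inspected, df.drop_duplicates(subset=["key"], keep="first") (or keep="last") removes the unwanted copies.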

Or, you can use groupby and filter:

df.groupby([...]).filter(lambda g: g.shape[0] > 1)
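For illustration, here is how the groupby and filter approach might look on a small made-up frame. The column names "key" and "value" are assumptions for this sketch, not part of the original answer:

```python
import pandas as pd

# Hypothetical sample data: "x" and "y" are repeated in the "key" column.
df = pd.DataFrame(
    {
        "key": ["x", "y", "x", "z", "y"],
        "value": [10, 20, 30, 40, 50],
    }
)

# filter keeps all rows of each group for which the function returns True,
# so only groups with more than one row survive.
dupes = df.groupby("key").filter(lambda group: len(group) > 1)

# Optional: a stable sort puts the rows of each group side by side.
dupes_sorted = dupes.sort_values("key", kind="mergesort")
print(dupes_sorted)
```

The filtered result keeps the original row order, so the sort is only there to make visual comparison easier.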

Or apply:

df.groupby([...], group_keys=False).apply(lambda g: g if g.shape[0] > 1 else None)
