python - Select rows with duplicate observations in pandas

I am working on a large dataset, and there are a few duplicates in the index. I'd like to (perhaps visually) check these duplicated rows and decide which one to drop. Is there a way to select the slice of the DataFrame that has duplicated indices (or duplicates in any columns)?

Any help is appreciated.

Use the duplicated method of DataFrame:

df.duplicated(subset=[...])

See http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.duplicated.html
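As a minimal sketch of what the method returns, the example below runs duplicated on a small made-up frame. The column names and values are assumptions chosen for illustration; they are not from the question. Note that by default only the later repeats are flagged, not the first occurrence.

```python
import pandas as pd

# Hypothetical sample data; column names are illustrative only.
df = pd.DataFrame(
    {"name": ["a", "b", "a", "c", "b"], "value": [1, 2, 3, 4, 5]}
)

# Boolean Series: True for each row whose "name" already appeared earlier.
mask = df.duplicated(subset=["name"])
print(mask.tolist())

# Rows flagged as repeats (the first "a" and first "b" are not included).
print(df[mask])
```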

Edit

To select every duplicated row, including the first occurrence, you can use:

df[df.duplicated(subset=[...]) | df.duplicated(subset=[...], keep='last')]
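The question also asks about duplicated index labels. A possible sketch for that case follows, assuming a pandas version where Index.duplicated accepts the keep argument; the sample frame is made up. Passing keep=False flags every member of a duplicated group, which should match combining the "all but first" and "all but last" masks.

```python
import pandas as pd

# Hypothetical frame with a repeated index label "x".
df = pd.DataFrame(
    {"value": [10, 20, 30, 40]},
    index=["x", "y", "x", "z"],
)

# keep=False marks every row whose index label occurs more than once.
dup_index_rows = df[df.index.duplicated(keep=False)]
print(dup_index_rows)

# The same mask built from the two one-sided masks.
first_side = df.index.duplicated(keep="first")
last_side = df.index.duplicated(keep="last")
combined = first_side | last_side
```

Once the rows have been inspected, df[~df.index.duplicated(keep="first")] is one way to keep only the first row per label.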

Or, you can use groupby and filter:

df.groupby([...]).filter(lambda g: g.shape[0] > 1)
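A small sketch of the groupby approach on made-up data (the column names are assumptions): filter keeps only the groups with more than one row, and sorting by the key afterwards puts the duplicates next to each other for a visual check.

```python
import pandas as pd

# Hypothetical sample data; column names are illustrative only.
df = pd.DataFrame(
    {"name": ["a", "b", "a", "c", "b"], "value": [1, 2, 3, 4, 5]}
)

# Keep only rows belonging to a "name" group with more than one member.
dups = df.groupby("name").filter(lambda g: len(g) > 1)

# A stable sort places duplicated rows side by side without reordering
# rows within each group.
dups_sorted = dups.sort_values("name", kind="mergesort")
print(dups_sorted)
```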

Or apply:

df.groupby([...], group_keys=False).apply(lambda g: g if g.shape[0] > 1 else None)
