python - Applying a function to a MultiIndex pandas.DataFrame column


I have a MultiIndex pandas DataFrame in which I want to apply a function to one of its columns and assign the result to that same column.

In [1]:
    import numpy as np
    import pandas as pd
    cols = ['One', 'Two', 'Three', 'Four', 'Five']
    df = pd.DataFrame(np.array(list('ABCDEFGHIJKLMNO'), dtype='object').reshape(3,5), index=list('ABC'), columns=cols)
    df.to_hdf('/tmp/test.h5', 'df')
    df = pd.read_hdf('/tmp/test.h5', 'df')
    df
Out[1]:
         One    Two    Three  Four   Five
    A    A      B      C      D      E
    B    F      G      H      I      J
    C    K      L      M      N      O
    3 rows × 5 columns

In [2]:
    df.columns = pd.MultiIndex.from_arrays([list('UUULL'), ['One', 'Two', 'Three', 'Four', 'Five']])
    df['L']['Five'] = df['L']['Five'].apply(lambda x: x.lower())
    df
-c:2: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead

Out[2]:
         U                    L
         One    Two    Three  Four   Five
    A    A      B      C      D      E
    B    F      G      H      I      J
    C    K      L      M      N      O
    3 rows × 5 columns

In [3]:
    df.columns = ['One', 'Two', 'Three', 'Four', 'Five']
    df
Out[3]:
         One    Two    Three  Four   Five
    A    A      B      C      D      E
    B    F      G      H      I      J
    C    K      L      M      N      O
    3 rows × 5 columns

In [4]:
    df['Five'] = df['Five'].apply(lambda x: x.upper())
    df
Out[4]:
         One    Two    Three  Four   Five
    A    A      B      C      D      E
    B    F      G      H      I      J
    C    K      L      M      N      O
    3 rows × 5 columns

As you can see, the function is not applied to the column, I guess because of this warning:

-c:2: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead

What is strange is that this error only happens sometimes, and I haven't been able to understand when it happens and when it does not.

I managed to apply the function by slicing the DataFrame with .loc, as the warning recommended:

In [5]:
    df.columns = pd.MultiIndex.from_arrays([list('UUULL'), ['One', 'Two', 'Three', 'Four', 'Five']])
    df.loc[:,('L','Five')] = df.loc[:,('L','Five')].apply(lambda x: x.lower())
    df

Out[5]:
         U                    L
         One    Two    Three  Four   Five
    A    A      B      C      D      e
    B    F      G      H      I      j
    C    K      L      M      N      o
    3 rows × 5 columns

But I would like to understand why this behavior happens when doing dict-like slicing (e.g. df['L']['Five']) and not when using .loc slicing.

Note: the DataFrame comes from an HDF file that was not multi-indexed; is that perhaps the cause of the strange behavior?

Edit: I'm using pandas v0.13.1 and numpy v1.8.0.

df['L']['Five'] is selecting level 0 with the value 'L' and returning a DataFrame, from which the column 'Five' is then selected, returning the accessed Series.

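The __getitem__ accessor of a DataFrame (the []) will try to do the right thing, and it gives you the correct column. However, this is chained indexing, which the pandas documentation discusses under returning a view versus a copy.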

To access a multi-index, use tuple notation, ('a','b'), together with .loc, which is unambiguous, e.g. df.loc[:,('a','b')]. Furthermore, this allows indexing on multiple axes at the same time (e.g. rows and columns).
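A minimal sketch of the tuple notation, assuming a frame shaped like the one in the question. The HDF round trip is left out because it is not needed to show the indexing, and the names `col_values`, `rows_ab_values` and `after_values` are illustrative only.

```python
import numpy as np
import pandas as pd

# Rebuild a frame shaped like the one in the question (no HDF round trip).
df = pd.DataFrame(
    np.array(list('ABCDEFGHIJKLMNO'), dtype='object').reshape(3, 5),
    index=list('ABC'),
)
df.columns = pd.MultiIndex.from_arrays(
    [list('UUULL'), ['One', 'Two', 'Three', 'Four', 'Five']]
)

# Columns only: the tuple names exactly one column of the two-level index.
col_values = df.loc[:, ('L', 'Five')].tolist()

# Rows and columns in one call: a label slice on rows, a tuple on columns.
rows_ab_values = df.loc['A':'B', ('L', 'Five')].tolist()

# Assigning through the same single .loc call writes into df itself.
df.loc[:, ('L', 'Five')] = df.loc[:, ('L', 'Five')].str.lower()
after_values = df[('L', 'Five')].tolist()
```

The row selector and the column tuple travel together in one indexing call, so pandas always knows which frame the assignment targets.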

So, why does this not work when you combine chained indexing and assignment, e.g. df['L']['Five'] = value?

df['L'] returns a DataFrame that is singly-indexed. Then another Python operation, df_with_L['Five'], selects the Series indexed by 'Five'. I have indicated the intermediate result with a separate variable. Because pandas sees these operations as separate events (e.g. separate calls to __getitem__), it has to treat them as linear operations that happen one after another.

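Contrast this with df.loc[:,('L','Five')], which passes a nested tuple of (:,('L','Five')) to a single call to __getitem__ (or __setitem__ when assigning). This allows pandas to deal with it as a single entity (and, FYI, it is quite a bit faster because it can directly index into the frame).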
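The difference in call structure can be made visible without pandas. The sketch below uses a hypothetical `Recorder` class (not part of pandas) that only logs which special methods Python invokes; indexing it directly stands in for indexing through `.loc`.

```python
class Recorder:
    """Toy stand-in for a frame; it only logs which dunder methods run."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def __getitem__(self, key):
        self.log.append((self.name, '__getitem__', key))
        # Hand back a new object, as a chained lookup on a frame would.
        return Recorder(f'{self.name}[{key!r}]', self.log)

    def __setitem__(self, key, value):
        self.log.append((self.name, '__setitem__', key))


log = []
frame = Recorder('df', log)

# Chained form: __getitem__ on df, then __setitem__ on the intermediate.
frame['L']['Five'] = 'x'
chained_calls = list(log)

log.clear()

# Single-call form: the whole key arrives as one nested tuple.
frame[:, ('L', 'Five')] = 'x'
single_calls = list(log)
```

In the chained form the assignment lands on the intermediate object, and the original object only ever sees a read. In the single-call form the original object receives the assignment together with the full key.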

Why does this matter? Since chained indexing is two calls, it is possible that either call may return a copy of the data because of the way it is sliced. Thus, when setting, you are actually setting a copy and not the original frame. It is impossible for pandas to figure this out because these are two separate Python operations that are not connected.
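A small sketch of what "setting a copy" means in practice. Whether df['L'] by itself returns a copy depends on the pandas version and on how the data is stored, so the example calls .copy() explicitly to make the outcome deterministic; that explicit call is an assumption added for illustration, not something the original code does.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.array(list('ABCDEFGHIJKLMNO'), dtype='object').reshape(3, 5),
    index=list('ABC'),
)
df.columns = pd.MultiIndex.from_arrays(
    [list('UUULL'), ['One', 'Two', 'Three', 'Four', 'Five']]
)

# First step of the chain, forced to be a copy for the demonstration.
sub = df['L'].copy()

# Second step: the assignment modifies the copy only.
sub['Five'] = sub['Five'].str.lower()

copy_values = sub['Five'].tolist()
original_values = df[('L', 'Five')].tolist()
```

The lowercase values exist only in `sub`; `df` still holds the uppercase originals, which is the silent failure the warning tries to flag.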

The SettingWithCopy warning is a 'heuristic' to detect this (meaning it tends to catch most cases but is simply a lightweight check). Figuring this out for real is way more complicated.

The .loc operation is a single Python operation, and thus can select a slice (which still may be a copy), but it allows pandas to assign that slice back into the frame after it is modified, thus setting the values as you would expect.

The reason for having a warning is this. When you slice an array you will sometimes get a view back, which means you can set it with no problem. However, even a single-dtyped array can generate a copy if it is sliced in a particular way. A multi-dtyped DataFrame (meaning it has, say, float and object data) will almost always yield a copy. Whether a view is created depends on the memory layout of the array.
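The view-versus-copy distinction can be observed directly at the NumPy level. This is a sketch on a plain single-dtype array, not on a DataFrame, and it assumes only documented NumPy behavior: basic slicing returns a view, while indexing with a list of integers returns a copy.

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)

basic = arr[:, 1:3]      # basic slicing: a view onto arr's memory
fancy = arr[:, [1, 2]]   # integer-list indexing: a fresh copy

basic_is_view = np.shares_memory(arr, basic)
fancy_is_view = np.shares_memory(arr, fancy)

basic[0, 0] = -1   # writes through to arr[0, 1]
fancy[0, 1] = -2   # changes the copy only; arr[0, 2] is untouched
```

Both selections pick the same two columns, yet only the write through the basic slice reaches `arr`.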

Note: this doesn't have anything to do with the source of the data.

