python - Big data file: read and create a structured file


I have a 20+ GB dataset structured as follows:

    1 3
    1 2
    2 3
    1 4
    2 1
    3 4
    4 2

(Note: the repetition is intentional, and there is no inherent order in either column.)

I want to construct a file in the following format:

    1: 2, 3, 4
    2: 3, 1
    3: 4
    4: 2
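The transformation itself is a group-by on the first column. A minimal in-memory sketch on the sample data follows; `group_pairs` and `format_groups` are hypothetical names, and this version only works when every group fits in RAM, which is exactly what breaks down at 20+ GB. Since the question says there is no inherent order, values are kept in input order, so `1: 3, 2, 4` is treated as equivalent to the `1: 2, 3, 4` shown above.

```python
from collections import defaultdict


def group_pairs(lines):
    """Group the second column by the first column, keeping values in input order."""
    groups = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        key, value = parts
        groups[key].append(value)
    return groups


def format_groups(groups):
    """Render each key as 'key: v1, v2, ...'; keys are sorted as strings."""
    return [f"{key}: {', '.join(values)}" for key, values in sorted(groups.items())]


sample = ["1 3", "1 2", "2 3", "1 4", "2 1", "3 4", "4 2"]
for out_line in format_groups(group_pairs(sample)):
    print(out_line)
```

For the sample this prints `1: 3, 2, 4`, `2: 3, 1`, `3: 4` and `4: 2`, one per line. Note that sorting string keys is lexicographic (`"10"` sorts before `"2"`).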

Here is the problem: I have tried writing scripts in both Python and C++ that load in the file, create long strings, and write the file line by line. It seems, however, that neither language is capable of handling the task at hand. Does anyone have suggestions on how to tackle this problem? Specifically, is there a particular method or program that is optimal for this? Any help or guidance would be greatly appreciated.
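If the scripts fail because the whole file (or all of the long strings) has to be held in memory at once, one single-machine workaround is to partition first and group second. This is a swapped-in technique, not something from the original post. A minimal sketch, assuming each hash partition is small enough to group in RAM; `group_large_file` and `num_partitions` are hypothetical names.

```python
import os
import tempfile
import zlib
from collections import defaultdict


def group_large_file(in_path, out_path, num_partitions=64):
    """Group 'key value' lines by key without loading the whole input at once.

    Pass 1 scatters lines into partition files by a hash of the key, so every
    occurrence of a given key lands in the same partition. Pass 2 loads one
    partition at a time (assumed to fit in RAM) and writes its groups.
    """
    with tempfile.TemporaryDirectory() as tmp:
        paths = [os.path.join(tmp, f"part-{i}.txt") for i in range(num_partitions)]
        parts = [open(p, "w") for p in paths]
        try:
            with open(in_path) as src:
                for line in src:
                    fields = line.split()
                    if len(fields) != 2:
                        continue  # skip blank or malformed lines
                    key, value = fields
                    # crc32 is deterministic, unlike str hash() across runs
                    idx = zlib.crc32(key.encode()) % num_partitions
                    parts[idx].write(f"{key} {value}\n")
        finally:
            for f in parts:
                f.close()

        with open(out_path, "w") as dst:
            for p in paths:
                groups = defaultdict(list)  # only one partition in memory
                with open(p) as part:
                    for line in part:
                        key, value = line.split()
                        groups[key].append(value)
                for key, values in groups.items():
                    dst.write(f"{key}: {', '.join(values)}\n")
```

Output keys come out grouped by partition rather than sorted, which matches the question's "no inherent order". With 64 partitions and evenly spread keys, 20 GB of input works out to roughly 300 MB of raw text per partition, though Python objects need more memory than that, and a single extremely frequent key cannot be split this way.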

You can try using Hadoop; you can run a stand-alone MapReduce program. The mapper outputs the first column as the key and the second column as the value. All outputs with the same key go to one reducer, so the reducer receives a key and the list of values for that key. It can run through that list and output (key, valueString), which is the final output you want. You can start with a simple Hadoop tutorial and write the mapper and reducer as suggested. However, I have not tried scaling to 20 GB of data on a stand-alone Hadoop system, so you may have to try it and see. Hope this helps.
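A minimal sketch of the mapper and reducer described above, written in Python in the style of Hadoop Streaming (an assumption: the answer does not say whether to use Java or Streaming). Streaming passes tab-separated `key\tvalue` lines and feeds the reducer its input sorted by key. `run_locally` is a hypothetical helper that imitates the shuffle with `sorted()` so the logic can be checked without a cluster.

```python
from itertools import groupby


def mapper(lines):
    """Emit 'key<TAB>value' for each input pair (first column is the key)."""
    for line in lines:
        fields = line.split()
        if len(fields) == 2:
            yield f"{fields[0]}\t{fields[1]}"


def reducer(sorted_lines):
    """Collapse consecutive lines with the same key into 'key: v1, v2, ...'.

    Relies on the input being sorted (grouped) by key, which the MapReduce
    shuffle guarantees before the reducer runs.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}: {', '.join(value for _, value in group)}"


def run_locally(lines):
    """Simulate map -> shuffle/sort -> reduce in a single process."""
    shuffled = sorted(mapper(lines), key=lambda line: line.split("\t", 1)[0])
    return list(reducer(shuffled))
```

In a real Streaming job, the mapper and reducer would each be a small script looping over `sys.stdin` and printing what these generators yield. Hadoop does not guarantee the order of values within a key, which is fine here because the question says order does not matter.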

