sql - How do I accumulate arrays into maps?
I have a table t with these columns:
cookie string
keywords array<string>
fqdn string
pixel bigint
I want to write something like
select cookie, ???? from t group by cookie;
to get a table with these columns:
cookie string
keywords map<string,int>
fqdn map<string,int>
pixel array<bigint>
where
cookie is unique (guaranteed by the group by cookie),
keywords counts how many times each keyword appeared in the arrays in the original table t,
fqdn counts how many times each domain appeared in the rows with the given cookie,
pixel counts how many times each pixel appeared in the rows with the given cookie.
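To make the desired result concrete, here is a minimal Python sketch of the aggregation on a few made-up rows. The sample data and the function name `accumulate` are illustrative assumptions, not part of the original table. The pixel column is collected as a sorted list of distinct values, which is one reading of the `array<bigint>` target type.

```python
from collections import Counter, defaultdict

# Hypothetical sample rows of table t: (cookie, keywords, fqdn, pixel)
rows = [
    ("c1", ["hive", "sql"], "a.example.com", 10),
    ("c1", ["hive"], "b.example.com", 10),
    ("c1", ["maps", "sql"], "a.example.com", 20),
    ("c2", ["hive"], "a.example.com", 30),
]

def accumulate(rows):
    """Group rows by cookie and build keyword counts, fqdn counts and a pixel list."""
    keywords = defaultdict(Counter)
    fqdns = defaultdict(Counter)
    pixels = defaultdict(set)
    for cookie, kws, fqdn, pixel in rows:
        keywords[cookie].update(kws)   # one count per keyword occurrence
        fqdns[cookie][fqdn] += 1       # one count per row
        pixels[cookie].add(pixel)      # assumption: keep distinct pixels only
    return {
        cookie: (dict(keywords[cookie]), dict(fqdns[cookie]), sorted(pixels[cookie]))
        for cookie in keywords
    }

result = accumulate(rows)
```

Here `result["c1"]` holds the keyword map, the fqdn map and the pixel list for cookie `c1`, which is the shape of one row in the target table.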
You can use the "vector" UDFs in Brickhouse ( http://github.com/klout/brickhouse ). In Brickhouse, either an array or a map can be considered a "vector". For an array, the array index is considered the dimension, and the numeric value is considered the magnitude in that dimension. For a map, consider the string key as a "dimension" of a vector in a large dimensional space, and the map value as the magnitude. (This suits text-analysis type problems, similar to what it looks like you are doing.)
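The vector interpretation above can be sketched in Python. This only models the idea as described (index or key as the dimension, value as the magnitude) and is not Brickhouse's implementation. The helper names `sum_map_vectors` and `sum_array_vectors` are hypothetical, and zero-padding shorter arrays is an assumption.

```python
from collections import Counter
from itertools import zip_longest

def sum_map_vectors(vectors):
    """Sum map-style vectors: each key is a dimension, each value a magnitude."""
    total = Counter()
    for vec in vectors:
        for key, value in vec.items():
            total[key] += value
    return dict(total)

def sum_array_vectors(vectors):
    """Sum array-style vectors: the array index is the dimension.

    Assumption: shorter arrays are treated as if padded with zeros.
    """
    return [sum(column) for column in zip_longest(*vectors, fillvalue=0)]

map_total = sum_map_vectors([{"hive": 1, "sql": 1}, {"hive": 1}, {"maps": 1}])
array_total = sum_array_vectors([[1, 2, 3], [10, 20]])
```

Summing maps whose values are all 1 yields per-key counts, which is why a vector sum fits the keyword and fqdn columns in the question.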
Something like the following should work:
select cookie,
  union_vector_sum( keyword_map ),
  union_vector_sum( map( fqdn, 1 ) ),
  collect_set( pixel )
from (
  select cookie, fqdn, pixel,
    collect( keyword, 1 ) as keyword_map
  from t
  lateral view explode( keywords ) k as keyword
  group by cookie, fqdn, pixel
) xk
group by cookie;
We should have a new map constructor UDF that takes an array and a single value, so that you don't need the inner explode and collect. I don't think the query produces an additional map-reduce step in this form, however.
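As a sketch of what such a constructor would compute, here it is in Python. The name `map_from_array` is hypothetical and does not refer to an existing Brickhouse UDF. How duplicate keys should be handled is also an assumption: here they collapse into a single entry, as they would in a plain map.

```python
def map_from_array(keys, value):
    """Build a map that assigns the same value to every element of keys.

    Hypothetical helper: duplicate keys collapse into a single entry.
    """
    return {key: value for key in keys}

keyword_map = map_from_array(["hive", "sql", "hive"], 1)
```

With a constructor like this, each row's keywords array could be turned into a vector directly and passed to the sum, instead of being exploded and collected again.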
There are some vector and "bag of words" UDFs in Brickhouse now, but we should add more. Do you have any special requests?