sql - How do I accumulate arrays into maps?


I have a table t with columns:

cookie     string
keywords   array<string>
fqdn       string
pixel      bigint

I want to write something like

select cookie, ???? from t group by cookie;

to get a table with columns

cookie     string
keywords   map<string,int>
fqdn       map<string,int>
pixel      array<bigint>

where

  • cookie is unique (guaranteed by grouping on cookie)
  • keywords counts how many times each keyword appeared in the arrays in the original table t
  • fqdn counts how many times each domain appeared in the rows with the given cookie
  • pixel counts how many times each pixel appeared in the rows with the given cookie
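To make the target shape concrete, here is a minimal Python sketch of the aggregation described above. The sample rows and the name `accumulate` are hypothetical. The sketch also assumes that pixel ends up as the list of distinct pixels per cookie, since the target column type is array<bigint>; if per-pixel counts are wanted instead, a map would be needed there too.

```python
from collections import Counter, defaultdict

# Hypothetical sample rows of t: (cookie, keywords, fqdn, pixel)
rows = [
    ("c1", ["shoes", "red"], "a.example.com", 10),
    ("c1", ["shoes"], "a.example.com", 11),
    ("c1", ["hats"], "b.example.com", 10),
    ("c2", ["red"], "a.example.com", 12),
]

def accumulate(rows):
    """Group rows by cookie and build the per-cookie maps and pixel list."""
    keywords = defaultdict(Counter)
    fqdns = defaultdict(Counter)
    pixels = defaultdict(set)
    for cookie, kws, fqdn, pixel in rows:
        keywords[cookie].update(kws)   # one count per keyword occurrence
        fqdns[cookie][fqdn] += 1       # one count per row
        pixels[cookie].add(pixel)      # assumption: keep distinct pixels only
    return {
        cookie: (dict(keywords[cookie]), dict(fqdns[cookie]), sorted(pixels[cookie]))
        for cookie in keywords
    }

result = accumulate(rows)
```

With these rows, cookie c1 gets the keyword map shoes=2, red=1, hats=1, the fqdn map a.example.com=2, b.example.com=1, and the pixel list 10, 11.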

You can use the "vector" UDFs in Brickhouse ( http://github.com/klout/brickhouse ). In Brickhouse, either an array or a map can be considered a "vector". For an array, the array index is considered the dimension, and the numeric value is considered the magnitude in that dimension. For a map, consider the string key as a "dimension" of a vector in a very large dimensional space, and the map value as the magnitude. (This suits text-analysis type problems, similar to what it looks like you are doing.)
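The vector idea can be sketched in a few lines of Python. This is only a model of the semantics described above, not Brickhouse's implementation; the name `vector_sum` is hypothetical, and padding shorter arrays with zeros is an assumption of the sketch.

```python
def vector_sum(vectors):
    """Sum vectors given all as dicts (key = dimension) or all as lists (index = dimension)."""
    vectors = list(vectors)
    if not vectors:
        return None
    if isinstance(vectors[0], dict):
        # Map case: each key is a dimension, each value its magnitude
        total = {}
        for vec in vectors:
            for dim, magnitude in vec.items():
                total[dim] = total.get(dim, 0) + magnitude
        return total
    # Array case: the index is the dimension; shorter vectors are
    # treated as zero-padded (an assumption of this sketch)
    length = max(len(vec) for vec in vectors)
    total = [0] * length
    for vec in vectors:
        for index, magnitude in enumerate(vec):
            total[index] += magnitude
    return total

keyword_counts = vector_sum([{"shoes": 1, "red": 1}, {"shoes": 1}])
```

Summing one-entry maps such as `{"a.example.com": 1}` across rows in this way is what turns per-row values into per-cookie counts.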

Something like the following should work:

select cookie,
   union_vector_sum( keyword_map ),
   union_vector_sum( map( fqdn, 1 ) ),
   collect_set( pixel )
from (
  select cookie, fqdn, pixel,
         collect( keyword, 1 ) as keyword_map
  from t
  lateral view explode( keywords ) k as keyword
  group by cookie, fqdn, pixel
) xk
group by cookie;

We should probably have a new map constructor UDF which takes an array and a single value, so that you don't need the inner explode and collect. I don't think it would produce an additional map-reduce step in this form, however.

There are several vector and "bag of words" UDFs in Brickhouse now, but we should add more. Do you have any special requests?

