scala - Mock a TSV source with Scalding JobTest -


I'm having a hard time writing a unit test for a Scalding job.

My job expects a file with 3 fields:

  TextLine(args("input"))
    .map('url -> ('fetchedurl, 'date, 'info)) { ...

Naively, I would have expected the fields to be mapped as an N-tuple without needing any further setup. Judging from the test, that is not the case, and some further contract needs to be established:

  JobTest[com.kohls.crawler.miner]
    .arg("input", "inputfile")
    .arg("output", "outputfile")
    .source(TextLine("inputfile"), List(("https://en.wikipedia.org/wiki/test", "mon apr 14 15:08:11 cdt 2014", "extra info")))
    .sink[(String, Date, Array[Byte])](Tsv("outputfile")) { ... }

This fails with cascading.tuple.FieldsResolverException: could not select fields: [{1}:'url'], from: [{2}:'offset', 'line']. I guess I need to declare the TSV fields in some way before feeding them in as the TextLine's input.

Most of the documentation I've found is spotty in this regard. What is the correct way to define this test?

You should use Tsv instead of TextLine. Tsv takes the declared fields as its second input parameter. The job would look like this:

  Tsv(args("input"), ('fetchedurl, 'date, 'info), skipHeader = false/true).read
    .map(...)
    .write(Tsv(args("output"), writeHeader = false/true))

And the job test would look like this:

  JobTest[com.kohls.crawler.miner]
    .arg("input", "inputfile")
    .arg("output", "outputfile")
    .source(Tsv("inputfile"), List(("https://en.wikipedia.org/wiki/test", "mon apr 14 15:08:11 cdt 2014", "extra info")))
    .sink[(String, Date, Array[Byte])](Tsv("outputfile")) { ... }
    .run
    .finish
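
If the job has to keep TextLine for some reason, the error message itself shows what that source provides: only the fields 'offset and 'line. In that case the three columns would have to be split out of 'line by hand, and the test data would need to be (offset, line) pairs rather than 3-tuples. Below is a minimal sketch of the splitting step in plain Scala, with no Scalding dependency. The names TsvLine and parse are hypothetical and not part of Scalding or of the original job.

```scala
// Hypothetical helper (not part of Scalding): splits one raw
// tab-separated line into the three columns the job expects.
object TsvLine {
  def parse(line: String): Option[(String, String, String)] =
    // A limit of -1 keeps trailing empty columns instead of dropping them.
    line.split("\t", -1) match {
      case Array(url, date, info) => Some((url, date, info))
      case _                      => None // wrong number of columns
    }
}
```

A helper like this could be called from a map or flatMap over the 'line field. Using Tsv as described above is usually simpler, because the source does the splitting and field naming itself.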
