scala - Mock a TSV source with Scalding JobTest
I'm having a hard time writing a unit test for a Scalding job.
My job expects a file with 3 fields:
TextLine(args("input"))
  .map('url -> ('fetchedurl, 'date, 'info)) { ...
Naively, I would have expected the fields to be mapped as an n-tuple without any further setup. Judging by the test, that is not the case, and some further contract needs to be established:
JobTest[com.kohls.crawler.miner]
  .arg("input", "inputfile")
  .arg("output", "outputfile")
  .source(TextLine("inputfile"), List(("https://en.wikipedia.org/wiki/test", "mon apr 14 15:08:11 cdt 2014", "extra info")))
  .sink[(String, Date, Array[Byte])](Tsv("outputfile")) { ... }
This fails with cascading.tuple.FieldsResolverException: could not select fields: [{1}:'url'], from: [{2}:'offset', 'line'].
I guess I need to declare the TSV fields in some way before feeding them to the TextLine input.
Most of the documentation I've found is spotty in this regard. What is the correct way to define this test?
You should use Tsv instead of TextLine. Tsv takes the declared fields as its second input parameter. Your job would look like this:
Tsv(args("input"), ('fetchedurl, 'date, 'info), skipHeader = false/true)
  .read
  .map(...)
  .write(Tsv(args("output"), writeHeader = false/true))
And the job test would look like this:
JobTest[com.kohls.crawler.miner]
  .arg("input", "inputfile")
  .arg("output", "outputfile")
  .source(Tsv("inputfile"), List(("https://en.wikipedia.org/wiki/test", "mon apr 14 15:08:11 cdt 2014", "extra info")))
  .sink[(String, Date, Array[Byte])](Tsv("outputfile")) { ... }
  .run
  .finish
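To see why the TextLine version fails, it may help to model what each source hands to the job. The exception message shows that TextLine exposes only the two fields 'offset and 'line, so a field named 'url cannot be selected, whereas a tab-separated source splits each line into the declared fields. The following is only a plain-Scala sketch of that difference with no Scalding dependency; FieldModel, textLineRecord and tsvRecord are hypothetical names made up for illustration and are not part of Scalding.

```scala
object FieldModel {
  // Hypothetical model of a record: field name -> value.
  type Record = Map[String, String]

  // TextLine-style: the whole line is a single field, next to its offset.
  def textLineRecord(offset: Long, line: String): Record =
    Map("offset" -> offset.toString, "line" -> line)

  // Tsv-style: split the line on tabs and bind each column to a declared field.
  def tsvRecord(fields: Seq[String], line: String): Record = {
    val values = line.split("\t", -1).toSeq // -1 keeps trailing empty columns
    require(values.length == fields.length,
      s"expected ${fields.length} columns, got ${values.length}")
    fields.zip(values).toMap
  }

  // Sample line built from the test data in the question.
  val sampleLine: String =
    "https://en.wikipedia.org/wiki/test\tmon apr 14 15:08:11 cdt 2014\textra info"
}
```

In this model, looking up "url" in the TextLine-style record finds nothing, which mirrors the FieldsResolverException, while the Tsv-style record exposes fetchedurl, date and info directly.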