hadoop - Questions about Oozie/Sqoop


I have a few questions:

1. Why is there a MapReduce process when Sqoop loads data from HDFS into MySQL?

For example:

The data is in HDFS in the directory /foo/bar.

To load the data into the MySQL bar table, why is there a MapReduce process?

sqoop export --connect jdbc:mysql://localhost/hduser --table foo -m 1 --export-dir /foo/bar

After entering the above command, a MapReduce process executes.

2. How can I enable/disable keys in MySQL using Sqoop/Oozie?

Since a huge amount of data is getting loaded into MySQL, I need to use enable/disable keys. How can I achieve that?

3. How do I run multiple Oozie jobs in parallel?

4. How do I run Oozie jobs in cron?

You can answer one or more of the questions.

Thank you.

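I'll go through the questions one by one. Feel free to ask more questions in the comments, and I'll elaborate on anything that is unclear to you.

1. Why is there a MapReduce process when Sqoop loads data from HDFS into MySQL?

This is because Sqoop is based on MapReduce. If you consider how files are stored in HDFS, they are split into small chunks, and these chunks are stored across the cluster (some of the chunks might be on the same node). It makes perfect sense to have a MapReduce job whose map tasks read these chunks of data in parallel and write them to MySQL. Note that -m sets the number of map tasks, so with -m 1, as in your command, the export still runs as a MapReduce job but with a single map task.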

2. How can I enable/disable keys in MySQL using Sqoop/Oozie?

I don't know the answer to this one. I feel the question is a little ambiguous. Please try adding more details, and if I find anything I'll follow up on this.

3. How do I run multiple Oozie jobs in parallel?

Each Oozie job is defined by a workflow.xml and a job.properties.

  • If you're talking about manual execution of multiple Oozie workflows (jobs), you can do that by running the command that starts an Oozie job once for each of the jobs you want to run in parallel. Sample command: oozie job -config job.properties -run

  • If you're talking about running multiple actions within an Oozie workflow in parallel, you can have a fork that triggers off multiple actions in parallel and a join at the point where the parallel actions meet upon completion. Example:

    <fork name='samplefork'>
        <path start='sampleaction1'/>
        <path start='sampleaction2'/>
    </fork>

    <!-- sampleaction2 is defined the same way -->
    <action name='sampleaction1'>
        ..
        ..
        ..
        <ok to='joinactions'/>
        <error to='fail'/>
    </action>

    <join name='joinactions' to='seqaction3'/>

4. How do I run Oozie jobs in cron?

If you want to automate the execution of Oozie jobs, I suggest using an Oozie coordinator. With an Oozie coordinator, you can schedule workflows to trigger at a fixed interval (every 10 minutes, 1 hour, 1 day, etc.).
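As a rough illustration of the coordinator approach, here is a minimal sketch of a coordinator.xml that triggers a workflow once an hour. The app name, the start/end dates, and the ${workflowAppUri} property are placeholders I made up for the example, and the schema version may need adjusting for your Oozie release.

```xml
<!-- Minimal sketch: run the workflow at ${workflowAppUri} once every hour.
     The name, dates, and property name are hypothetical placeholders. -->
<coordinator-app name="sample-coord"
                 frequency="${coord:hours(1)}"
                 start="2024-01-01T00:00Z"
                 end="2024-12-31T00:00Z"
                 timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
  <action>
    <workflow>
      <!-- HDFS directory that contains the workflow.xml to be scheduled -->
      <app-path>${workflowAppUri}</app-path>
    </workflow>
  </action>
</coordinator-app>
```

For other intervals the frequency can be written as ${coord:minutes(10)} or ${coord:days(1)}. The coordinator is submitted with the same oozie job -config ... -run command as a workflow, except that the properties file points to the coordinator through oozie.coord.application.path rather than to the workflow directly.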

