Using foreach in R doubles memory usage
I'm using R 2.15 on an Ubuntu distro.
I'm applying a function that assigns keyword IDs to streaming text data from a popular social networking site. To make the process more efficient, my aim is to split the data into two parts and apply the function to each in parallel:
textd <- data.frame(text = c("they", "dont", "think", "it", "be", "like is", "but do"),
                    keywordid = 0)
textd <- split(textd, seq(nrow(textd)) %/% 2 %% 2 == 0)
keywords <- data.frame(kwds = c("be", "do", "is"), keywordid = 1:3)

library(doParallel)
registerDoParallel(2)
library(foreach)

textd <- foreach(j = 1:2) %dopar% {
  t <- textd[[j]]
  for (i in keywords$kwds) {  # for loop to assign keyword IDs
    tmp <- grepl(i, t$text, ignore.case = TRUE)
    cond <- tmp & t$keywordid == 0
    if (length(t$keywordid[cond]) > 0) {
      t$keywordid[cond] <- keywords$keywordid[keywords$kwds == i]
      # if the keyword field is already populated, append a duplicate row
      cond2 <- tmp & t$keywordid != 0
      extra <- t[cond2, ]
      if (length(extra$keywordid) > 0) {
        extra$keywordid <- keywords$keywordid[keywords$kwds == i]
        t <- rbind(t, extra)
      }
    }
  }
  t
}

library(data.table)
textd <- as.data.frame(data.table::rbindlist(textd))
The problem is that, done this way, each core uses as much RAM as the whole dataset, doubling the total RAM used, and it runs out quickly. What am I doing wrong? How can the RAM be split in equal portions between the cores? Thanks for looking.
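To see where the duplication comes from: foreach automatically exports every variable its body references, so each worker receives the entire textd list rather than just its own half. Below is a minimal sketch that makes this visible, using an explicit PSOCK cluster via makeCluster (the forked backend that registerDoParallel(2) selects on Linux shares pages copy-on-write instead, but R's garbage collector writes to those pages, so memory can still end up duplicated):

library(doParallel)
library(foreach)

cl <- makeCluster(2)   # explicit PSOCK workers, so the export step is visible
registerDoParallel(cl)

# foreach sees that the body references `textd` and ships the whole list
# to every worker; each worker reports the size of its full copy
sizes <- foreach(j = 1:2, .combine = c) %dopar% {
  format(object.size(textd), units = "Kb")
}
print(sizes)
stopCluster(cl)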
Try splitting the data within the foreach loop instead, like this:
library(itertools)
registerDoParallel(2)

# textd here is the original, unsplit data frame
textd <- foreach(t = isplitRows(textd, chunks = 2), .combine = rbind) %dopar% {
  for (i in keywords$kwds) {  # for loop to assign keyword IDs
    tmp <- grepl(i, t$text, ignore.case = TRUE)
    cond <- tmp & t$keywordid == 0
    if (length(t$keywordid[cond]) > 0) {
      t$keywordid[cond] <- keywords$keywordid[keywords$kwds == i]
      # if the keyword field is already populated, append a duplicate row
      cond2 <- tmp & t$keywordid != 0
      extra <- t[cond2, ]
      if (length(extra$keywordid) > 0) {
        extra$keywordid <- keywords$keywordid[keywords$kwds == i]
        t <- rbind(t, extra)
      }
    }
  }
  return(t)
}
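The reason this helps: isplitRows from itertools hands each worker only its own piece of textd, so no worker ever holds the whole data frame. If the streaming batches vary in size, one variant worth trying is to bound the rows per piece with chunkSize instead of fixing the piece count (chunkSize is a standard isplitRows argument; the 500-row figure below is an arbitrary assumption):

library(itertools)
library(doParallel)
library(foreach)
registerDoParallel(2)

# same pattern, but cap the rows per piece rather than the piece count;
# peak memory per worker stays flat no matter how large textd grows
textd <- foreach(t = isplitRows(textd, chunkSize = 500),
                 .combine = rbind) %dopar% {
  # ... same keyword-assignment body as above ...
  t
}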