Parsing time for a large XML document in Perl -
I have a large XML file containing around 1.5 million lines. The basic skeleton of the file is
<org> <dept> <emp> ... </emp> <emp> ... </emp> ... </dept> </org>
Each <emp> node can have up to 0.8 million lines.
I need to parse the file and hold the entire data in a hash. I have tried XML::Simple (I'm not allowed to use other modules such as XML::Twig or XML::LibXML).

The problem is that it takes around 5.5 minutes to parse the entire file. I need to bring that down to the order of 30 seconds.
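For reference, a minimal sketch of the XML::Simple approach described above, run here on an inline sample document rather than the real file (field names and the sample data are assumptions). XMLin slurps the whole document into one nested hash, which is why parse time grows badly on a 1.5-million-line file:

```perl
use strict;
use warnings;
use XML::Simple;

# Tiny stand-in for the real 1.5-million-line document
my $xml = '<org><dept><emp><name>a</name></emp><emp><name>b</name></emp></dept></org>';

my $data = XMLin(
    $xml,
    KeyAttr    => [],                # keep <emp> elements as an array, not a keyed hash
    ForceArray => [ 'dept', 'emp' ], # always return lists, even for single elements
);

# XMLin discards the root element, so $data holds the contents of <org>
my $emp_count = scalar @{ $data->{dept}[0]{emp} };
print "parsed $emp_count emp records\n";
```

The KeyAttr and ForceArray options matter here: without them XML::Simple changes its output shape depending on how many elements it finds, which makes downstream code fragile.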
I tried splitting the file into multiple smaller files, each containing one <emp> .. </emp> section; for example, I got around 100 files. Then I used fork to spawn 100 child processes, one to parse each of these files. This reduced the total time to around 1.5 minutes, but I have yet to find a means to communicate the parsed data from the child processes back to the parent process.
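One common way to get data back from forked children is a pipe per child plus Storable serialization: each child freezes its result hash and writes it to the pipe, and the parent thaws and merges the pieces. The sketch below uses placeholder chunk names and a stand-in for the real XML parsing step:

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

my @chunks = ( 'emp_001.xml', 'emp_002.xml' );   # hypothetical split files
my @readers;

for my $chunk (@chunks) {
    pipe( my $r, my $w ) or die "pipe: $!";
    my $pid = fork();
    die "fork: $!" unless defined $pid;
    if ( $pid == 0 ) {                 # child
        close $r;
        my %result = ( file => $chunk );   # stand-in for parsing the chunk
        print {$w} freeze( \%result );     # serialize and send to parent
        close $w;
        exit 0;
    }
    close $w;                          # parent keeps only the read end
    push @readers, $r;
}

my %all;
for my $r (@readers) {
    local $/;                          # slurp the child's entire output
    my $frozen = <$r>;
    my $part   = thaw($frozen);
    $all{ $part->{file} } = $part;     # merge into the parent's hash
    close $r;
}
wait() for @readers;                   # reap the children

print scalar( keys %all ), " chunks merged\n";
```

Note that serializing 100 large hashes and deserializing them in the parent is itself significant work, so the inter-process copy can eat much of the time saved by parallel parsing.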
The obvious solution is to use XML::Twig, which is designed for exactly this purpose. It lets you declare callback subroutines that are called when the prescribed XML elements are closed, and it doesn't try to read the complete structure into memory. Without a proper sample of the data I can't go further, but it looks like you should declare callbacks for the dept and emp elements.
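The callback approach can be sketched as follows (the field names and the way results are keyed are assumptions about the real data). The handler fires each time an <emp> element is closed, and purge() then frees the already-processed part of the tree so the whole document never sits in memory at once:

```perl
use strict;
use warnings;
use XML::Twig;

my %employees;
my $count = 0;

my $twig = XML::Twig->new(
    twig_handlers => {
        emp => sub {
            my ( $t, $emp ) = @_;
            # simplify() turns the element into XML::Simple-style nested data
            $employees{ ++$count } = $emp->simplify;
            $t->purge;               # discard everything parsed so far
        },
    },
);

# Inline sample document standing in for the real file;
# for the real case, use $twig->parsefile('org.xml') instead
$twig->parse('<org><dept><emp><name>a</name></emp><emp><name>b</name></emp></dept></org>');

print "$count emp elements handled\n";
```

Because purge() is called after every <emp>, peak memory stays proportional to one employee record rather than to the whole 1.5-million-line file, which is usually what makes the biggest difference in both time and space.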