Parsing time for a large XML document in Perl -
I have a large XML file containing around 1.5 million lines. The basic skeleton of the file is
<org> <dept> <emp> ... </emp> <emp> ... </emp> ... </dept> </org>
Each <emp> node can have up to 0.8 million lines.
I need to parse the file and hold the entire data in a hash. I have tried XML::Simple (I'm not allowed to use other modules such as XML::Twig or XML::LibXML).

The problem is that it takes around 5.5 minutes to parse the entire file. I need to bring that down to the order of 30 seconds.
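For reference, a minimal sketch of the XML::Simple approach described above, run here on an inline sample document rather than the real file (field names and the sample data are assumptions). XMLin slurps the whole document into one nested hash, which is why parse time grows badly on a 1.5-million-line file:

```perl
use strict;
use warnings;
use XML::Simple;

# Tiny stand-in for the real 1.5-million-line document
my $xml = '<org><dept><emp><name>a</name></emp><emp><name>b</name></emp></dept></org>';

my $data = XMLin(
    $xml,
    KeyAttr    => [],                # keep <emp> elements as an array, not a keyed hash
    ForceArray => [ 'dept', 'emp' ], # always return lists, even for single elements
);

# XMLin discards the root element, so $data holds the contents of <org>
my $emp_count = scalar @{ $data->{dept}[0]{emp} };
print "parsed $emp_count emp records\n";
```

The KeyAttr and ForceArray options matter here: without them XML::Simple changes its output shape depending on how many elements it finds, which makes downstream code fragile.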
I tried splitting the file into multiple smaller files, each containing one <emp> .. </emp> section; for example, I got around 100 files. Then I used fork to spawn 100 child processes, one to parse each of these files. This reduced the total time to around 1.5 minutes, but I have yet to find a means to communicate the parsed data from the child processes back to the parent process.
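One common way to get data back from forked children is a pipe per child plus Storable serialization: each child freezes its result hash and writes it to the pipe, and the parent thaws and merges the pieces. The sketch below uses placeholder chunk names and a stand-in for the real XML parsing step:

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

my @chunks = ( 'emp_001.xml', 'emp_002.xml' );   # hypothetical split files
my @readers;

for my $chunk (@chunks) {
    pipe( my $r, my $w ) or die "pipe: $!";
    my $pid = fork();
    die "fork: $!" unless defined $pid;
    if ( $pid == 0 ) {                 # child
        close $r;
        my %result = ( file => $chunk );   # stand-in for parsing the chunk
        print {$w} freeze( \%result );     # serialize and send to parent
        close $w;
        exit 0;
    }
    close $w;                          # parent keeps only the read end
    push @readers, $r;
}

my %all;
for my $r (@readers) {
    local $/;                          # slurp the child's entire output
    my $frozen = <$r>;
    my $part   = thaw($frozen);
    $all{ $part->{file} } = $part;     # merge into the parent's hash
    close $r;
}
wait() for @readers;                   # reap the children

print scalar( keys %all ), " chunks merged\n";
```

Note that serializing 100 large hashes and deserializing them in the parent is itself significant work, so the inter-process copy can eat much of the time saved by parallel parsing.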
The obvious solution is to use XML::Twig, which is designed for exactly this purpose. It lets you declare callback subroutines that are called when the prescribed XML elements are closed, and it doesn't try to read the complete structure into memory. Without a proper sample of the data I can't go further, but it looks like you should declare callbacks for the dept and emp elements.
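The callback approach can be sketched as follows (the field names and the way results are keyed are assumptions about the real data). The handler fires each time an <emp> element is closed, and purge() then frees the already-processed part of the tree so the whole document never sits in memory at once:

```perl
use strict;
use warnings;
use XML::Twig;

my %employees;
my $count = 0;

my $twig = XML::Twig->new(
    twig_handlers => {
        emp => sub {
            my ( $t, $emp ) = @_;
            # simplify() turns the element into XML::Simple-style nested data
            $employees{ ++$count } = $emp->simplify;
            $t->purge;               # discard everything parsed so far
        },
    },
);

# Inline sample document standing in for the real file;
# for the real case, use $twig->parsefile('org.xml') instead
$twig->parse('<org><dept><emp><name>a</name></emp><emp><name>b</name></emp></dept></org>');

print "$count emp elements handled\n";
```

Because purge() is called after every <emp>, peak memory stays proportional to one employee record rather than to the whole 1.5-million-line file, which is usually what makes the biggest difference in both time and space.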