Java - Regex Pattern, Matcher, split or Pattern.split(): which is the most efficient?


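I am trying to improve the speed of my application by experimenting with the way it extracts information.

I read an HTML page to get a URL and some other information. I use String.contains() and String.split(). I was wondering what the most efficient way to do this is. I looked around a bit and tried a few approaches, but the results were quite similar for me :/

Here is a bit of the code (some parts are only here for testing):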

    Pattern p = Pattern.compile("\" title=\"read ");
    //Pattern p2 = Pattern.compile("online\">");
    //Pattern p3 = Pattern.compile("</a></th>");
    Pattern p4 = Pattern.compile("online\">(.*)</a></th>");

    while ((inputLine = in.readLine()) != null)
    {
        if(inputLine.contains("<table id=\"updates\">"))
        {
            tmp = inputLine.split("<tr><th><a href=\"");
            for(String s : tmp)
            {
                if(s.contains("\" title=\"read "))
                {
                    //url = s.split("\" title=\"read ")[0].replace(" ", "%20");
                    //name = s.split("online\">")[1].split("</a></th>")[0];

                    url = p.split(s)[0].replace(" ", "%20");
                    //name = p3.split(p2.split(s)[1])[0];
                    Matcher matcher = p4.matcher(s);
                    while(matcher.find())
                        name = matcher.group(1);

                    array.add(new Object(name, url));
                }
            }
            break;
        }
    }

As you can see, I tried Pattern, Matcher, split and Pattern.split() here. I know there are also replaceAll and replaceFirst.

In my case, what is the best way?

Thanks a lot.

PS: I read here: http://chrononsystems.com/blog/hidden-evils-of-javas-stringsplit-and-stringr that Pattern.split is better than split(), but I couldn't find a bigger benchmark.

----- update ----

    Pattern p1 = Pattern.compile("\" title=\"read ");
    Pattern p2 = Pattern.compile("online\">(.*?)</a></th>");
    Matcher matcher = p2.matcher("");

    while( (inputLine = in.readLine()) != null)
    {
        if( (tmp = inputLine.split("<tr><th><a href=\"")).length > 1 )
        {
            for(String s : tmp)
            {
                if(s.contains("\" title=\"read "))
                {
                    url = p1.split(s)[0].replace(" ", "%20");
                    if(matcher.reset(s).find())
                        name = matcher.group(1);

                    arrays.add(new Object(name, url));
                }
            }
            break;
        }
    }

Any String function that uses regular expressions (these are matches(s), replaceAll(s,s), replaceFirst(s,s), split(s), and split(s,i)) compiles the regular expression and creates a Matcher object every time it is called, which is inefficient when used in a loop.

If you need to speed things up, the first step is to stop using the String functions and to use Pattern and Matcher directly instead. Here's an answer that demonstrates this.
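As a rough sketch of that first step, the snippet below compiles the delimiter from the question once and reuses it through Pattern.split. The class name `PrecompiledSplit`, the helper `extractUrl`, and the sample fragment are made up for illustration; they are not from the original code.

```java
import java.util.regex.Pattern;

public class PrecompiledSplit {
    // Compiled once; the delimiter string is the one used in the question.
    private static final Pattern TITLE_DELIMITER = Pattern.compile("\" title=\"read ");

    // Hypothetical helper: returns the text before the delimiter,
    // with spaces replaced by %20 as in the question's code.
    static String extractUrl(String fragment) {
        // Pattern.split reuses the compiled pattern, whereas
        // fragment.split("...") would go through the regex setup on each call.
        return TITLE_DELIMITER.split(fragment)[0].replace(" ", "%20");
    }

    public static void main(String[] args) {
        // Made-up sample fragment, shaped like the pieces the question splits out.
        String fragment =
            "http://example.com/my page\" title=\"read My Page online\">My Page</a></th>";
        System.out.println(extractUrl(fragment));
    }
}
```

Whether this makes a measurable difference depends on how often the call runs; it matters mostly inside tight loops over many lines.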

And ideally, you should create only a single Matcher object, as I describe in this answer.
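A minimal sketch of the single-Matcher idea, assuming the same name pattern as in the question's update: the Matcher is created once against an empty string and then pointed at each new input with reset(CharSequence). The class `ReusedMatcher` and the method `extractNames` are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReusedMatcher {
    // Reluctant (.*?) so the capture stops at the first closing tag.
    private static final Pattern NAME = Pattern.compile("online\">(.*?)</a></th>");

    // Hypothetical helper: collects the first captured name from each fragment.
    static List<String> extractNames(List<String> fragments) {
        List<String> names = new ArrayList<>();
        // One Matcher for the whole loop; the empty input is only a placeholder.
        Matcher matcher = NAME.matcher("");
        for (String fragment : fragments) {
            // reset(CharSequence) swaps in the new input and returns the same Matcher.
            if (matcher.reset(fragment).find()) {
                names.add(matcher.group(1));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Made-up sample fragments for illustration.
        List<String> fragments = Arrays.asList(
            "x\" title=\"read Foo online\">Foo</a></th>",
            "no name here",
            "y\" title=\"read Bar online\">Bar</a></th>");
        System.out.println(extractNames(fragments));
    }
}
```

Note that a Matcher is not safe for use by multiple threads at once, so a shared instance like this only fits single-threaded loops; the compiled Pattern itself can be shared freely.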

For more regex information, please check out the FAQ.

