Java - Regex: Pattern, Matcher, split() or Pattern.split(): which is the most efficient?
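I am trying to improve the speed of my application by experimenting with different ways of getting information.
I read an HTML page and extract the URL and other information from it. I use String.contains() and String.split(). I am wondering what the most efficient way to do this is. I looked around a bit and tried several of the options, but the results are quite similar for me :/
Here is a bit of the code (some parts are here only for testing):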
    Pattern p = Pattern.compile("\" title=\"read ");
    //Pattern p2 = Pattern.compile("online\">");
    //Pattern p3 = Pattern.compile("</a></th>");
    Pattern p4 = Pattern.compile("online\">(.*)</a></th>");
    while ((inputLine = in.readLine()) != null) {
        if (inputLine.contains("<table id=\"updates\">")) {
            tmp = inputLine.split("<tr><th><a href=\"");
            for (String s : tmp) {
                if (s.contains("\" title=\"read ")) {
                    //url = s.split("\" title=\"read ")[0].replace(" ", "%20");
                    //name = s.split("online\">")[1].split("</a></th>")[0];
                    url = p.split(s)[0].replace(" ", "%20");
                    //name = p3.split(p2.split(s)[1])[0];
                    Matcher matcher = p4.matcher(s);
                    while (matcher.find())
                        name = matcher.group(1);
                    array.add(new Object(name, url));
                }
            }
            break;
        }
    }
As you can see, I tried Pattern, Matcher, split() and Pattern.split() here. I know there are also replaceAll() and replaceFirst().
In my case, what is the best way?
Thanks a lot.
PS: I read here: http://chrononsystems.com/blog/hidden-evils-of-javas-stringsplit-and-stringr that Pattern.split() is better than split(), but I couldn't find a bigger benchmark.
----- update ----
    Pattern p1 = Pattern.compile("\" title=\"read ");
    Pattern p2 = Pattern.compile("online\">(.*?)</a></th>");
    Matcher matcher = p2.matcher("");
    while ((inputLine = in.readLine()) != null) {
        if ((tmp = inputLine.split("<tr><th><a href=\"")).length > 1) {
            for (String s : tmp) {
                if (s.contains("\" title=\"read ")) {
                    url = p1.split(s)[0].replace(" ", "%20");
                    if (matcher.reset(s).find())
                        name = matcher.group(1);
                    arrays.add(new Object(name, url));
                }
            }
            break;
        }
    }
Any String function that uses regular expressions (that is, matches(s), replaceAll(s,s), replaceFirst(s,s), split(s), and split(s,i)) compiles the regular expression and creates a Matcher object every time, which is inefficient when used in a loop.
If you need to speed things up, the first step is to stop using the String functions and instead use Pattern and Matcher directly. Here's an answer where I demonstrate this.
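As a rough sketch of that first step, the delimiter from the question can be compiled once into a Pattern and reused for every fragment, instead of calling String.split() inside the loop. The class name PrecompiledSplitSketch, the method names urlOf and urlsOf, and the sample input are made up for illustration; only the delimiter and the replace(" ", "%20") step come from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are assumptions, not part of the question's code.
class PrecompiledSplitSketch {
    // Compiled once and reused. Pattern instances are immutable and safe to share.
    private static final Pattern TITLE = Pattern.compile("\" title=\"read ");

    // Returns the text before the delimiter, with spaces replaced by %20 as in the question.
    static String urlOf(String fragment) {
        return TITLE.split(fragment)[0].replace(" ", "%20");
    }

    // Applies urlOf to every fragment without recompiling the regex per iteration.
    static List<String> urlsOf(List<String> fragments) {
        List<String> urls = new ArrayList<>();
        for (String fragment : fragments) {
            urls.add(urlOf(fragment));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Made-up sample fragment in the shape the question's code expects.
        String sample = "/some page\" title=\"read Some Page online\">Some Page</a></th>";
        System.out.println(urlOf(sample)); // prints /some%20page
    }
}
```

Whether this is measurably faster for a given page is something to benchmark; the sketch only removes the repeated compilation.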
And ideally, you should create only a single Matcher object, as I describe in this answer.
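A minimal sketch of the single-Matcher idea, assuming the same "online\">(.*?)</a></th>" pattern as the question's update: the Matcher is created once against an empty string and then pointed at each new input with reset(). The class name ReusedMatcherSketch and the method namesOf are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are assumptions, not part of the question's code.
class ReusedMatcherSketch {
    // Reluctant group so the capture stops at the first closing </a></th>.
    private static final Pattern NAME = Pattern.compile("online\">(.*?)</a></th>");

    // Collects the first captured name from each fragment that contains one.
    static List<String> namesOf(List<String> fragments) {
        List<String> names = new ArrayList<>();
        // One Matcher for the whole loop; reset(...) swaps in the next input.
        Matcher matcher = NAME.matcher("");
        for (String fragment : fragments) {
            if (matcher.reset(fragment).find()) {
                names.add(matcher.group(1));
            }
        }
        return names;
    }
}
```

Unlike Pattern, a Matcher holds mutable state and is not safe to share between threads, so in this sketch it stays a local variable.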
For more regex information, please check out the FAQ.