regex - The regular expression \p{Punct} misses Unicode punctuation in Java


I wrote a little test to demonstrate:

    @Test
    public void missingPunctuationRegex() {
        Pattern punct = Pattern.compile("[\\p{Punct}]");

        // ASCII apostrophe (U+0027)
        Matcher m = punct.matcher("'");
        assertTrue("ascii punctuation", m.find());

        // LEFT SINGLE QUOTATION MARK (U+2018)
        m = punct.matcher("‘");
        assertTrue("unicode punctuation", m.find());
    }

The first assert passes, and the second one fails. You may have to squint to see it, but that is the 'LEFT SINGLE QUOTATION MARK' (U+2018), and it should be covered by punctuation as far as I can tell.

How do I match all punctuation in Java regular expressions?

You can use the UNICODE_CHARACTER_CLASS flag to make \p{Punct} match Unicode punctuation. Without the flag, \p{Punct} is the POSIX class and covers ASCII punctuation only.
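
A minimal sketch of how that could look, assuming Java 7 or later (where Pattern.UNICODE_CHARACTER_CLASS and the embedded (?U) flag exist). The class name PunctDemo and the helper found are made up for illustration and are not from the question. The Unicode category \p{P} is included as an alternative that needs no flag.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
public class PunctDemo {
    // Default behaviour: \p{Punct} is the POSIX class (ASCII punctuation only).
    static final Pattern ASCII_PUNCT = Pattern.compile("\\p{Punct}");

    // With the flag, \p{Punct} follows the Unicode punctuation property.
    static final Pattern UNICODE_PUNCT =
            Pattern.compile("\\p{Punct}", Pattern.UNICODE_CHARACTER_CLASS);

    // The same flag, written inline as (?U).
    static final Pattern EMBEDDED_FLAG = Pattern.compile("(?U)\\p{Punct}");

    // Alternative without any flag: the Unicode general category P.
    static final Pattern CATEGORY_P = Pattern.compile("\\p{P}");

    // Returns true if the pattern matches anywhere in the input.
    static boolean found(Pattern pattern, String input) {
        return pattern.matcher(input).find();
    }

    public static void main(String[] args) {
        String leftQuote = "\u2018"; // LEFT SINGLE QUOTATION MARK

        System.out.println("default:   " + found(ASCII_PUNCT, leftQuote));
        System.out.println("with flag: " + found(UNICODE_PUNCT, leftQuote));
        System.out.println("(?U):      " + found(EMBEDDED_FLAG, leftQuote));
        System.out.println("\\p{P}:     " + found(CATEGORY_P, leftQuote));
    }
}
```

One thing to check before switching: as far as I know, with the flag set \p{Punct} no longer matches ASCII symbols such as $ or +, because Unicode classifies those as symbols rather than punctuation. If you need both, a class like [\p{P}\p{S}] may be closer to what you want.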

