regex - The regular expression \p{Punct} misses Unicode punctuation in Java


I wrote a little test to demonstrate:

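    import static org.junit.Assert.assertTrue;

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    import org.junit.Test;

    @Test
    public void missingPunctuationRegex() {
        Pattern punct = Pattern.compile("[\\p{Punct}]");

        Matcher m = punct.matcher("'");
        assertTrue("ASCII punctuation", m.find());

        // U+2018 LEFT SINGLE QUOTATION MARK
        m = punct.matcher("‘");
        assertTrue("Unicode punctuation", m.find());
    }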

The first assert passes, but the second one fails. You may have to squint to see it, but the second string is a 'LEFT SINGLE QUOTATION MARK' (U+2018), which as far as I can tell should count as punctuation.

How can I match all punctuation in Java regular expressions?
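The failure follows from how `java.util.regex.Pattern` defines the POSIX classes: without extra flags, `\p{Punct}` is restricted to US-ASCII, i.e. exactly the 32 characters ``!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~``. The sketch below (the class name `AsciiPunctDemo` is hypothetical) counts every Basic Multilingual Plane character the default class matches, and confirms that U+2018 is not among them.

```java
import java.util.regex.Pattern;

public class AsciiPunctDemo {
    // Default \p{Punct}: the POSIX class, which Java restricts to US-ASCII.
    static final Pattern PUNCT = Pattern.compile("\\p{Punct}");

    // Counts how many Basic Multilingual Plane chars (0x0000..0xFFFF) the class matches.
    static int countMatches() {
        int count = 0;
        for (int c = 0; c <= 0xFFFF; c++) {
            if (PUNCT.matcher(String.valueOf((char) c)).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countMatches());                    // 32
        System.out.println(PUNCT.matcher("'").matches());      // true
        System.out.println(PUNCT.matcher("\u2018").matches()); // false
    }
}
```

Only the ASCII apostrophe is matched; the typographic quotation mark falls outside the class entirely.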

You can use the `Pattern.UNICODE_CHARACTER_CLASS` flag (available since Java 7) to make \p{Punct} match Unicode punctuation.
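A minimal sketch of the fix, showing three equivalent ways to get Unicode-aware punctuation matching: passing `Pattern.UNICODE_CHARACTER_CLASS`, enabling the same flag inline with the embedded flag expression `(?U)`, or using the Unicode property `\p{IsPunctuation}` directly without any flag. The class name `UnicodePunctDemo` and the helper `allMatch` are hypothetical, used only for illustration.

```java
import java.util.regex.Pattern;

public class UnicodePunctDemo {
    // Flag form: with UNICODE_CHARACTER_CLASS, \p{Punct} follows the Unicode
    // punctuation categories (P*) instead of the 32 US-ASCII characters.
    static final Pattern FLAG = Pattern.compile("\\p{Punct}", Pattern.UNICODE_CHARACTER_CLASS);

    // Same flag, enabled inline with the embedded flag expression (?U).
    static final Pattern INLINE = Pattern.compile("(?U)\\p{Punct}");

    // No flag needed: match the Unicode Punctuation property directly.
    static final Pattern PROPERTY = Pattern.compile("\\p{IsPunctuation}");

    // True only if all three patterns match the whole string s.
    static boolean allMatch(String s) {
        return FLAG.matcher(s).matches()
            && INLINE.matcher(s).matches()
            && PROPERTY.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(allMatch("'"));                // true: APOSTROPHE
        System.out.println(allMatch("\u2018"));           // true: LEFT SINGLE QUOTATION MARK
        System.out.println(allMatch("a"));                // false
        System.out.println(FLAG.matcher("$").matches());  // false: '$' is a currency symbol
    }
}
```

One side effect worth knowing: in Unicode mode \p{Punct} no longer matches ASCII characters such as `$`, `+` or `~`, because Unicode classifies them as symbols (category S) rather than punctuation (category P). If you need both sets, a union such as `[\p{Punct}\p{IsPunctuation}]` compiled without the flag is one option.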
