regex - The regular expression \p{Punct} misses Unicode punctuation in Java
I wrote a little test to demonstrate:
    @Test
    public void missingPunctuationRegex() {
        Pattern punct = Pattern.compile("[\\p{Punct}]");
        // ASCII apostrophe (U+0027)
        Matcher m = punct.matcher("'");
        assertTrue("ascii punctuation", m.find());
        // LEFT SINGLE QUOTATION MARK (U+2018)
        m = punct.matcher("‘");
        assertTrue("unicode punctuation", m.find());
    }
The first assert passes, and the second one fails. You may have to squint to see it, but the second character is the 'LEFT SINGLE QUOTATION MARK' (U+2018), and it should be covered as punctuation as far as I can tell.
How would I match all punctuation in Java regular expressions?
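You can use the UNICODE_CHARACTER_CLASS flag to make \p{Punct} match all Unicode punctuation. Without that flag, \p{Punct} is a POSIX character class that covers only ASCII punctuation.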
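A minimal sketch of how the flag might be applied is shown here. The class name UnicodePunctDemo and the helper containsPunct are hypothetical names chosen for illustration. Pattern.UNICODE_CHARACTER_CLASS and the embedded (?U) form are the standard java.util.regex ways to switch the flag on (Java 7 and later).

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative, not from the question.
public class UnicodePunctDemo {
    // Default behaviour: \p{Punct} is the POSIX class and covers ASCII only.
    static final Pattern ASCII_PUNCT = Pattern.compile("\\p{Punct}");

    // With the flag, \p{Punct} follows the Unicode punctuation property.
    static final Pattern UNICODE_PUNCT =
            Pattern.compile("\\p{Punct}", Pattern.UNICODE_CHARACTER_CLASS);

    // The same flag, enabled inline through the embedded (?U) expression.
    static final Pattern UNICODE_PUNCT_INLINE = Pattern.compile("(?U)\\p{Punct}");

    // Returns true if the input contains at least one match of the pattern.
    static boolean containsPunct(Pattern pattern, String input) {
        return pattern.matcher(input).find();
    }

    public static void main(String[] args) {
        String leftQuote = "\u2018"; // LEFT SINGLE QUOTATION MARK
        System.out.println(containsPunct(ASCII_PUNCT, leftQuote));          // false
        System.out.println(containsPunct(UNICODE_PUNCT, leftQuote));        // true
        System.out.println(containsPunct(UNICODE_PUNCT_INLINE, leftQuote)); // true
    }
}
```

The flag also changes the meaning of the other predefined and POSIX classes in the same pattern (for example \w and \d), so it is worth checking the rest of the expression before enabling it.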