Python - removing relevant hyphens in text
Let's say I have text that looks like this:
a = "i inclin- ed ask simple questions"
I first want to extract the hyphenated words, i.e. first identify whether a hyphen is present in the text, which is easy. I can use re.match("\s*-\s*", a), for instance, to check whether the sentence has hyphens.
1) Next, I want to extract the preceding and following partial words (in this case, "inclin" and "ed").
2) Next, I want to merge them into "inclined" and print all such words.
I am stuck at step 1. Please help.
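A note on the hyphen check mentioned in the question: re.match only tries the pattern at the very start of the string, so it returns None for this sentence. re.search scans the whole string and is probably what is wanted here. A minimal sketch, reusing the question's string and pattern (the variable names at_start and anywhere are just for illustration):

```python
import re

a = "i inclin- ed ask simple questions"

# re.match anchors at position 0, where the text starts with "i", not a hyphen
at_start = re.match(r"\s*-\s*", a)

# re.search looks for the pattern anywhere in the string
anywhere = re.search(r"\s*-\s*", a)

print(at_start)           # None
print(anywhere.span())    # (8, 10), the "- " right after "inclin"
```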
>>> import re
>>> a = "i inclin- ed ask simple questions"
>>> result = re.findall(r'([a-zA-Z]+-)\s+(\w+)', a)
>>> result
[('inclin-', 'ed')]
>>> [first.rstrip('-') + second for first, second in result]
['inclined']
Or, you can make the first group capture the word without the trailing hyphen:
>>> result = re.findall(r'([a-zA-Z]+)-\s+(\w+)', a)
>>> result
[('inclin', 'ed')]
>>> [''.join(item) for item in result]
['inclined']
This also works for multiple matches in a string:
>>> a = "i inclin- ed ask simp- le quest- ions"
>>> result = re.findall(r'([a-zA-Z]+)-\s+(\w+)', a)
>>> [''.join(item) for item in result]
['inclined', 'simple', 'questions']
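If the goal is to repair the text itself rather than only list the merged words, the same pattern can be passed to re.sub with backreferences. This is a sketch, not part of the original answer, and the helper name join_hyphenated is hypothetical:

```python
import re

def join_hyphenated(text):
    """Join fragments written as 'frag- ment' into 'fragment'."""
    # \1 is the part before the hyphen, \2 the part after the whitespace
    return re.sub(r"([a-zA-Z]+)-\s+(\w+)", r"\1\2", text)

a = "i inclin- ed ask simp- le quest- ions"
print(join_hyphenated(a))  # i inclined ask simple questions
```

Because the pattern requires whitespace after the hyphen, ordinary compounds such as "well-known" are left alone. A suspended hyphen such as "pre- and post-processing" would be merged incorrectly, so real text may need a dictionary check on the joined word.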