python - removing relevant hyphens in text


Let's say I have text that looks like this:

a = "i inclin- ed ask simple questions"

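I would first like to extract the hyphenated words, i.e. first identify whether a hyphen is present in the text, which is easy. I use re.search(r"\s*-\s*", a), for instance, to check whether the sentence has hyphens (re.search rather than re.match, because re.match only tests at the very start of the string).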
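As a rough sketch of that presence check: the helper name `has_broken_hyphen` and its pattern are assumptions made for illustration, not part of the original post. It looks for a letter, a hyphen, and then whitespace anywhere in the text, and also shows why a check anchored at the start of the string would miss the hyphen.

```python
import re

def has_broken_hyphen(text):
    """Return True if a letter is followed by a hyphen and whitespace anywhere in text."""
    # re.search scans the whole string; re.match would only try position 0.
    return re.search(r'[a-zA-Z]-\s+', text) is not None

a = "i inclin- ed ask simple questions"

# re.match is anchored at the start of the string, so it finds nothing here.
print(re.match(r'\s*-\s*', a))

# re.search-based helper finds the "n- " inside "inclin- ed".
print(has_broken_hyphen(a))
```

The stricter pattern (letter before the hyphen, whitespace after it) is a design choice: it avoids flagging a free-standing dash such as "a - b".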

1) Next, extract the preceding and following partial words (in this case, extract "inclin" and "ed").

2) Next, merge them into "inclined" and print such words.

I am stuck at step 1. Please help.

    >>> import re
    >>> a = "i inclin- ed ask simple questions"
    >>> result = re.findall(r'([a-zA-Z]+-)\s+(\w+)', a)
    >>> result
    [('inclin-', 'ed')]
    >>> [first.rstrip('-') + second for first, second in result]
    ['inclined']

Or, you can make the first group save the word without the trailing -:

    >>> result = re.findall(r'([a-zA-Z]+)-\s+(\w+)', a)
    >>> result
    [('inclin', 'ed')]
    >>> [''.join(item) for item in result]
    ['inclined']

This will also work for multiple matches in a string:

    >>> a = "i inclin- ed ask simp- le quest- ions"
    >>> result = re.findall(r'([a-zA-Z]+)-\s+(\w+)', a)
    >>> [''.join(item) for item in result]
    ['inclined', 'simple', 'questions']
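If the goal is to repair the text itself rather than just list the merged words, the same pattern can be passed to re.sub. This is a minimal sketch under that assumption; the function name `merge_hyphenated` is hypothetical and does not come from the original answer.

```python
import re

# Same pattern as above: letters, a hyphen, whitespace, then the rest of the word.
BROKEN_WORD = re.compile(r'([a-zA-Z]+)-\s+(\w+)')

def merge_hyphenated(text):
    """Join word halves split as 'inclin- ed' back into 'inclined'."""
    # \1 and \2 are the two captured halves; the hyphen and whitespace are dropped.
    return BROKEN_WORD.sub(r'\1\2', text)

a = "i inclin- ed ask simp- le quest- ions"
print(merge_hyphenated(a))
```

One caveat: a genuine compound that happens to be broken the same way (for example "well- known") would also lose its hyphen, so this is likely only suitable when every such hyphen is a line-break artifact.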
