python - Removing an element, but not the text after it


I have an XML file similar to this:

<root> <a>some <b>bad</b> text <i>that</i> <u>do <i>not</i></u> want keep.</a> </root> 

I want to remove the text in <b> or <u> elements (and their descendants), and print the rest. This is what I tried:

    from __future__ import print_function
    import xml.etree.ElementTree as ET

    tree = ET.parse('a.xml')
    root = tree.getroot()

    # Map every child element to its parent.
    parent_map = {c: p for p in root.iter() for c in p}

    for item in root.findall('.//b'):
        parent_map[item].remove(item)
    for item in root.findall('.//u'):
        parent_map[item].remove(item)
    print(''.join(root.itertext()).strip())

(I used the recipe in this answer to build parent_map.) The problem, of course, is that with remove(item) I'm also removing the text after the element, and the result is:

some that

whereas what I want is:

some  text that  want keep.

Is there a solution?
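For context, ElementTree stores the text that follows an element in that element's tail attribute, which is why remove() drops it together with the element. A minimal sketch illustrating this on a shortened version of the sample document (the variable names are illustrative only):

```python
import xml.etree.ElementTree as ET

# Shortened sample; `a`, `b` and `remaining` are illustrative names.
a = ET.fromstring('<a>some <b>bad</b> text</a>')
b = a.find('b')

print(repr(a.text))  # text before the first child belongs to the parent
print(repr(b.text))  # text inside <b>
print(repr(b.tail))  # text after </b> is stored on <b> itself

a.remove(b)          # removing <b> therefore discards its tail as well
remaining = ''.join(a.itertext())
print(repr(remaining))
```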

If you don't end up using anything better, you can use clear() instead of remove(), keeping the tail of the element:

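    import xml.etree.ElementTree as ET

    data = """<root> <a>some <b>bad</b> text <i>that</i> <u>do <i>not</i></u> want keep.</a> </root>"""

    tree = ET.fromstring(data)
    a = tree.find('a')
    for element in a:
        if element.tag in ('b', 'u'):
            # clear() also resets the tail, so save it and restore it afterwards.
            tail = element.tail
            element.clear()
            element.tail = tail

    # encoding='unicode' makes tostring() return str rather than bytes (Python 3).
    print(ET.tostring(tree, encoding='unicode'))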

This prints (note the empty b and u tags):

<root> <a>some <b /> text <i>that</i> <u /> want keep.</a> </root> 
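Because the emptied b and u elements no longer carry any text of their own, joining itertext() over the result should give the plain text the question asks for. A small sketch of that idea, built on the clear() approach above; the helper name strip_tags_text is hypothetical, and it walks the whole tree rather than only the children of a:

```python
import xml.etree.ElementTree as ET

def strip_tags_text(xml_string, tags=('b', 'u')):
    """Return the document text with the content of `tags` elements dropped."""
    root = ET.fromstring(xml_string)
    # Iterate over a snapshot: clear() changes the tree structure.
    for element in list(root.iter()):
        if element.tag in tags:
            tail = element.tail   # clear() would reset the tail too
            element.clear()
            element.tail = tail
    return ''.join(root.itertext()).strip()

data = ('<root> <a>some <b>bad</b> text <i>that</i> '
        '<u>do <i>not</i></u> want keep.</a> </root>')
text = strip_tags_text(data)
print(text)
```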

Also, here's a solution using xml.dom.minidom:

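    import xml.dom.minidom

    data = """<root> <a>some <b>bad</b> text <i>that</i> <u>do <i>not</i></u> want keep.</a> </root>"""

    dom = xml.dom.minidom.parseString(data)
    a = dom.getElementsByTagName('a')[0]
    # Iterate over a copy: removing nodes from the live childNodes list
    # while looping over it can skip siblings.
    for child in list(a.childNodes):
        # Text nodes have no tagName, hence the getattr() default.
        if getattr(child, 'tagName', '') in ('u', 'b'):
            a.removeChild(child)

    print(dom.toxml())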

prints:

<?xml version="1.0" ?><root> <a>some  text <i>that</i>  want keep.</a> </root> 
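If you would rather stay with ElementTree and drop the elements entirely, one option is to move each removed element's tail onto the previous sibling (or onto the parent's text when there is no previous sibling) before calling remove(). The following is only a sketch of that idea, not part of the answers above; remove_keep_tail is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET

def remove_keep_tail(root, tags):
    """Remove elements whose tag is in `tags`, keeping the text that follows them."""
    # Walk snapshots so that removal does not disturb iteration.
    for parent in list(root.iter()):
        previous = None
        for child in list(parent):
            if child.tag in tags:
                tail = child.tail or ''
                if previous is None:
                    # No earlier sibling: the tail joins the parent's text.
                    parent.text = (parent.text or '') + tail
                else:
                    # Otherwise it joins the previous sibling's tail.
                    previous.tail = (previous.tail or '') + tail
                parent.remove(child)
            else:
                previous = child

data = ('<root> <a>some <b>bad</b> text <i>that</i> '
        '<u>do <i>not</i></u> want keep.</a> </root>')
root = ET.fromstring(data)
remove_keep_tail(root, ('b', 'u'))
result = ''.join(root.itertext()).strip()
print(result)
```

Unlike the clear() variant, this leaves no empty b or u tags behind in the serialized tree.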
