Python - Removing an element, but not the text after it
I have an XML file similar to this:
<root> <a>some <b>bad</b> text <i>that</i> <u>do <i>not</i></u> want keep.</a> </root>
I want to remove all text in <b> or <u> elements (and their descendants), and print the rest. This is what I tried:
    from __future__ import print_function
    import xml.etree.ElementTree as ET

    tree = ET.parse('a.xml')
    root = tree.getroot()

    # Map every child element to its parent (ElementTree keeps no parent links).
    parent_map = {c: p for p in root.iter() for c in p}

    for item in root.findall('.//b'):
        parent_map[item].remove(item)
    for item in root.findall('.//u'):
        parent_map[item].remove(item)

    print(''.join(root.itertext()).strip())
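(I used the recipe in this answer to build parent_map.) The problem, of course, is that remove(item) also removes the text after the element, because ElementTree stores that text in the element's own tail attribute. The result is: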
some
whereas what I want is:
some text that want keep.
Is there a solution?
If you don't end up using anything better, you can use clear() instead of remove(), saving the element's tail first and restoring it afterwards (clear() resets the tail as well):
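    import xml.etree.ElementTree as ET

    data = """<root> <a>some <b>bad</b> text <i>that</i> <u>do <i>not</i></u> want keep.</a> </root>"""

    tree = ET.fromstring(data)
    a = tree.find('a')
    for element in a:
        if element.tag in ('b', 'u'):
            # clear() wipes text, attributes, children and tail,
            # so save the tail and put it back afterwards.
            tail = element.tail
            element.clear()
            element.tail = tail

    # tostring() returns bytes on Python 3; decode it for printing.
    print(ET.tostring(tree).decode())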
This prints the following (note the empty b and u tags):
<root> <a>some <b /> text <i>that</i> <u /> want keep.</a> </root>
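If the empty tags left behind by clear() are unwanted, the element can be removed outright once its tail has been moved onto the preceding sibling, or onto the parent's text when it is the first child. Below is a minimal sketch of that idea using the same sample document. The helper name remove_keep_tail is made up for illustration and is not part of ElementTree.

```python
import xml.etree.ElementTree as ET


def remove_keep_tail(parent, child):
    """Remove child from parent, but keep the text that follows it.

    Hypothetical helper, not an ElementTree API.
    """
    tail = child.tail or ''
    children = list(parent)
    index = children.index(child)
    if index == 0:
        # No previous sibling: the tail joins the parent's leading text.
        parent.text = (parent.text or '') + tail
    else:
        # Otherwise the tail joins the tail of the previous sibling.
        previous = children[index - 1]
        previous.tail = (previous.tail or '') + tail
    parent.remove(child)


data = """<root> <a>some <b>bad</b> text <i>that</i> <u>do <i>not</i></u> want keep.</a> </root>"""
root = ET.fromstring(data)

# Collect (parent, child) pairs first so the tree is not modified while iterating.
targets = [(p, c) for p in root.iter() for c in p if c.tag in ('b', 'u')]
for parent, child in targets:
    remove_keep_tail(parent, child)

result = ''.join(root.itertext()).strip()
print(result)
```

The whitespace around the removed elements is preserved, so the printed text may contain doubled spaces; ' '.join(result.split()) would collapse them if that matters.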
Also, here's a solution using xml.dom.minidom, where the text after an element is a separate text node and therefore survives the removal:
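    import xml.dom.minidom

    data = """<root> <a>some <b>bad</b> text <i>that</i> <u>do <i>not</i></u> want keep.</a> </root>"""

    dom = xml.dom.minidom.parseString(data)
    a = dom.getElementsByTagName('a')[0]

    # Iterate over a copy: removing nodes from the live childNodes list
    # while looping over it would skip the node after each removal.
    for child in list(a.childNodes):
        # Text nodes have no tagName, hence the getattr() default.
        if getattr(child, 'tagName', '') in ('u', 'b'):
            a.removeChild(child)

    print(dom.toxml())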
This prints:
<?xml version="1.0" ?><root> <a>some text <i>that</i> want keep.</a> </root>
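When only the remaining text is needed, as in the question, the tree does not have to be modified at all. The sketch below walks the ElementTree structure, skips the unwanted subtrees, and still emits each skipped element's tail. The function name text_without is hypothetical and chosen only for this example.

```python
import xml.etree.ElementTree as ET


def text_without(element, skip):
    """Yield the text under element, skipping subtrees whose tag is in skip.

    Hypothetical helper, not an ElementTree API.
    """
    if element.text:
        yield element.text
    for child in element:
        if child.tag not in skip:
            for piece in text_without(child, skip):
                yield piece
        # The tail is part of the parent's content, so keep it either way.
        if child.tail:
            yield child.tail


data = """<root> <a>some <b>bad</b> text <i>that</i> <u>do <i>not</i></u> want keep.</a> </root>"""
root = ET.fromstring(data)

remaining = ''.join(text_without(root, ('b', 'u'))).strip()
print(remaining)
```

Because nothing is removed, no parent map is required and the parsed document stays intact for any later processing.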