Python - Clone element with BeautifulSoup
I have to copy a part of one document into another, but I don't want to modify the document I copy from.
If I use .extract(), it removes the element from the tree. If I just append the selected element, as in document2.append(document1.tag), it still removes the element from document1.
As I use real files, I can simply not save document1 after the modification, but is there a way to do this without corrupting a document?
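The behaviour described in the question can be reproduced with a minimal sketch. It assumes bs4 is installed and uses the standard-library html.parser backend; the markup and the someid value are made up for illustration.

```python
from bs4 import BeautifulSoup

# Two small, hypothetical documents standing in for the real files.
document1 = BeautifulSoup('<body><div id="someid">text</div></body>', 'html.parser')
document2 = BeautifulSoup('<body></body>', 'html.parser')

div = document1.find('div', id='someid')

# append() re-parents the tag: it is detached from document1 first,
# so this is a move, not a copy.
document2.body.append(div)

moved_out = document1.find('div', id='someid') is None
moved_in = document2.find('div', id='someid') is div
```

A tag can only have one parent at a time, which is why appending it elsewhere silently removes it from the source tree.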
There is no native clone function in BeautifulSoup in versions before 4.4 (released July 2015); you'd have to create a deep copy yourself, which is tricky because each element maintains links to the rest of the tree.
To clone an element and all of its child elements, you'd have to copy all attributes and reset their parent-child relationships; this has to happen recursively. This is best done by not copying the relationship attributes and instead re-seating each recursively cloned element:
    from bs4 import Tag, NavigableString

    def clone(el):
        # Strings have no children; rebuild them with their own type.
        if isinstance(el, NavigableString):
            return type(el)(el)

        copy = Tag(None, el.builder, el.name, el.namespace, el.nsprefix)
        # work around bug where there is no builder set
        # https://bugs.launchpad.net/beautifulsoup/+bug/1307471
        copy.attrs = dict(el.attrs)
        for attr in ('can_be_empty_element', 'hidden'):
            setattr(copy, attr, getattr(el, attr))
        # append() re-seats each cloned child under the new parent.
        for child in el.contents:
            copy.append(clone(child))
        return copy
This method is somewhat sensitive to the current BeautifulSoup version; I tested it with 4.3, and future versions may add attributes that need to be copied too.
You could also monkeypatch this functionality into BeautifulSoup:
    from bs4 import Tag, NavigableString

    def tag_clone(self):
        copy = type(self)(None, self.builder, self.name, self.namespace, self.nsprefix)
        # work around bug where there is no builder set
        # https://bugs.launchpad.net/beautifulsoup/+bug/1307471
        copy.attrs = dict(self.attrs)
        for attr in ('can_be_empty_element', 'hidden'):
            setattr(copy, attr, getattr(self, attr))
        for child in self.contents:
            copy.append(child.clone())
        return copy

    Tag.clone = tag_clone
    NavigableString.clone = lambda self: type(self)(self)
letting you call .clone() on elements directly:
    document2.body.append(document1.find('div', id='someid').clone())
My feature request to the BeautifulSoup project was accepted and tweaked to use the copy.copy() function; now that BeautifulSoup 4.4 is released, you can use that version (or newer) and do:
    import copy

    document2.body.append(copy.copy(document1.find('div', id='someid')))
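As a quick check of the copy.copy() approach, here is a small self-contained sketch. It assumes BeautifulSoup 4.4 or newer and the standard-library html.parser backend; the sample markup and variable names are illustrative only.

```python
import copy

from bs4 import BeautifulSoup

# Hypothetical source and target documents.
document1 = BeautifulSoup(
    '<body><div id="someid"><p>Hello <b>world</b></p></div></body>',
    'html.parser',
)
document2 = BeautifulSoup('<body></body>', 'html.parser')

original = document1.find('div', id='someid')

# copy.copy() returns a new tag, detached from any tree, whose children
# are copies of the original's children.
duplicate = copy.copy(original)
document2.body.append(duplicate)

still_in_source = document1.find('div', id='someid') is original
copied_to_target = document2.find('div', id='someid') is duplicate
same_markup = str(original) == str(duplicate)
```

Because the duplicate is a separate object, later changes to it in document2 should not show up in document1.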