I am trying to merge multiple XML files using Python, without any external libraries. The XML files contain nested elements.
Sample file 1:
<root> <element1>textA</element1> <elements> <nested1>text now</nested1> </elements> </root>
Sample file 2:
<root> <element2>textB</element2> <elements> <nested1>text after</nested1> <nested2>new text</nested2> </elements> </root>
What I want:
<root> <element1>textA</element1> <element2>textB</element2> <elements> <nested1>text after</nested1> <nested2>new text</nested2> </elements> </root>
What I have tried
The code from this answer:
from xml.etree import ElementTree as et

def combine_xml(files):
    first = None
    for filename in files:
        data = et.parse(filename).getroot()
        if first is None:
            first = data
        else:
            first.extend(data)
    if first is not None:
        return et.tostring(first)
What I get:
<root> <element1>textA</element1> <elements> <nested1>text now</nested1> </elements> <element2>textB</element2> <elements> <nested1>text after</nested1> <nested2>new text</nested2> </elements> </root>
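I hope you can see and understand my problem. I am looking for a proper solution; any guidance would be appreciated.
To summarize the problem: with the current solution, nested elements are not merged.
The code you posted combines all the elements, regardless of whether an element with the same tag already exists. You therefore need to iterate over the elements and check and combine them manually in whatever way suits you, because there is no standard way of merging XML files like this. I can't explain it better than the code does, so here it is, more or less commented: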
from xml.etree import ElementTree as et


class XMLCombiner(object):
    def __init__(self, filenames):
        assert len(filenames) > 0, 'No filenames!'
        # Save all the roots, in order, to be processed later
        self.roots = [et.parse(f).getroot() for f in filenames]

    def combine(self):
        for r in self.roots[1:]:
            # Combine each element with the first one, and update that
            self.combine_element(self.roots[0], r)
        # Return the string representation (bytes on Python 3)
        return et.tostring(self.roots[0])

    def combine_element(self, one, other):
        """
        Recursively update either the text or the children of an element
        in `one` when `other` has an element with the same tag, or add
        the element from `other` if it is not found.
        """
        # Create a mapping from tag name to element, as that's what we are filtering with
        mapping = {el.tag: el for el in one}
        for el in other:
            if len(el) == 0:
                # Not nested
                try:
                    # Update the text
                    mapping[el.tag].text = el.text
                except KeyError:
                    # An element with this name is not in the mapping
                    mapping[el.tag] = el
                    # Add it
                    one.append(el)
            else:
                try:
                    # Recursively process the element, and update it in the same way
                    self.combine_element(mapping[el.tag], el)
                except KeyError:
                    # Not in the mapping
                    mapping[el.tag] = el
                    # Just add it
                    one.append(el)


if __name__ == '__main__':
    r = XMLCombiner(('sample1.xml', 'sample2.xml')).combine()
    # print() calls work on both Python 2 and Python 3
    print('-' * 20)
    print(r)
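
To check the tag-matching merge against the two sample files without writing them to disk, the same idea can be sketched on in-memory strings. This is only a minimal sketch, assuming Python 3; the names `merge_into`, `merge_xml_strings`, `SAMPLE_1` and `SAMPLE_2` are hypothetical and not part of the code above, and the samples are written without whitespace between tags so the output is easy to compare.

```python
from xml.etree import ElementTree as et

# The two sample documents from the question (whitespace between tags removed).
SAMPLE_1 = ("<root><element1>textA</element1>"
            "<elements><nested1>text now</nested1></elements></root>")
SAMPLE_2 = ("<root><element2>textB</element2>"
            "<elements><nested1>text after</nested1>"
            "<nested2>new text</nested2></elements></root>")


def merge_into(target, source):
    """Recursively merge the children of `source` into `target`, matching by tag."""
    by_tag = {child.tag: child for child in target}
    for child in source:
        match = by_tag.get(child.tag)
        if match is None:
            # No element with this tag yet: append it unchanged.
            target.append(child)
            by_tag[child.tag] = child
        elif len(child) == 0:
            # Leaf element: the text from the later document wins.
            match.text = child.text
        else:
            # Nested element: merge its children one level down.
            merge_into(match, child)


def merge_xml_strings(documents):
    """Merge every document into the first one and return it as a str."""
    roots = [et.fromstring(doc) for doc in documents]
    for other in roots[1:]:
        merge_into(roots[0], other)
    # encoding="unicode" makes tostring return str instead of bytes (Python 3).
    return et.tostring(roots[0], encoding="unicode")


merged = merge_xml_strings([SAMPLE_1, SAMPLE_2])
print(merged)
```

Note that elements missing from the first document are appended at the end, so `element2` lands after `elements` rather than before it. The content matches the desired result, but the sibling order differs; if the order matters, the children would need to be sorted or inserted at a chosen position.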