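I have an XML file and an XML schema. I want to validate the file against the schema and check whether it conforms. I'm using Python, but I'm open to any language if Python has no useful library for this.
What is my best option here? I mostly care about how quickly I can get it up and running.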
Definitely go with `lxml`.
Create an `XMLParser` with a predefined schema, load the file with `fromstring()`, and catch any validation errors:
    from lxml import etree

    def validate(xmlparser, xmlfilename):
        """Return True if the file parses and validates against the parser's schema."""
        try:
            with open(xmlfilename, 'r') as f:
                etree.fromstring(f.read(), xmlparser)
            return True
        # A schema-validating parser reports violations as XMLSyntaxError.
        except etree.XMLSyntaxError:
            return False

    # Build the schema object once and attach it to a parser.
    schema_file = 'schema.xsd'
    with open(schema_file, 'r') as f:
        schema_root = etree.XML(f.read())

    schema = etree.XMLSchema(schema_root)
    xmlparser = etree.XMLParser(schema=schema)

    filenames = ['input1.xml', 'input2.xml', 'input3.xml']
    for filename in filenames:
        if validate(xmlparser, filename):
            print("%s validates" % filename)
        else:
            print("%s doesn't validate" % filename)
If the schema file contains an XML declaration with an encoding (e.g. `<?xml version="1.0" encoding="UTF-8"?>`), the code above fails with the following error:
    Traceback (most recent call last):
      File "<input>", line 2, in <module>
        schema_root = etree.XML(f.read())
      File "src/lxml/etree.pyx", line 3192, in lxml.etree.XML
      File "src/lxml/parser.pxi", line 1872, in lxml.etree._parseMemoryDocument
    ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.
One solution is to open the files in binary mode, `open(..., 'rb')`, so that lxml receives bytes and handles the declared encoding itself:
    [...]
    def validate(xmlparser, xmlfilename):
        try:
            with open(xmlfilename, 'rb') as f:
    [...]
    with open(schema_file, 'rb') as f:
    [...]
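As a possible variation on the binary-mode fix, the sketch below hands file paths to `etree.parse()` so that lxml reads the bytes itself, and it uses `XMLSchema.validate()` together with `error_log` to report why a document failed rather than only whether it did. The helper names `load_schema`, `check`, and `demo`, the tiny inline schema, and the sample file names are all assumptions made for illustration; they are not part of the answer above.

```python
import os
import tempfile

from lxml import etree


def load_schema(xsd_path):
    # etree.parse() reads the file itself, so an encoding declaration is fine.
    return etree.XMLSchema(etree.parse(xsd_path))


def check(schema, xml_path):
    """Return (is_valid, messages) for the XML file at xml_path."""
    try:
        doc = etree.parse(xml_path)  # plain parse, no schema attached
    except etree.XMLSyntaxError as exc:
        return False, [str(exc)]  # the file is not well-formed XML
    if schema.validate(doc):
        return True, []
    # error_log holds the problems found by the most recent validate() call.
    return False, ["line %d: %s" % (e.line, e.message) for e in schema.error_log]


# Hypothetical demo data: a schema allowing a single integer element <a>.
XSD = b"""<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="a" type="xs:integer"/>
</xs:schema>"""

SAMPLES = {
    "schema.xsd": XSD,
    "good.xml": b'<?xml version="1.0" encoding="UTF-8"?>\n<a>5</a>',
    "bad.xml": b'<?xml version="1.0" encoding="UTF-8"?>\n<a>not an int</a>',
}


def demo():
    # Write the samples to a temporary directory, then validate them by path.
    with tempfile.TemporaryDirectory() as tmp:
        paths = {}
        for name, data in SAMPLES.items():
            paths[name] = os.path.join(tmp, name)
            with open(paths[name], "wb") as f:
                f.write(data)
        schema = load_schema(paths["schema.xsd"])
        return check(schema, paths["good.xml"]), check(schema, paths["bad.xml"])


good, bad = demo()
print(good)
print(bad)
```

Separating the parse step from the validation step is a design choice here: it distinguishes files that are not well-formed from files that are well-formed but violate the schema, which the parser-with-schema approach reports through the same exception.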