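I'm trying to write a simple algorithm that reads two XML files with exactly the same nodes and structure, but not necessarily the same data inside the child nodes, and not in the same order. How can I build a simple implementation with Microsoft's XML Diff .DLL that creates a third, temporary XML file containing the differences between the first two?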
XML Diff on MSDN:
XML Diff and Patch Tool
XML Diff and Patch GUI Tool
Sample XML for the two different files to compare:
    <?xml version="1.0" encoding="utf-8" ?>
    <Stats Date="2011-01-01">
      <Player Rank="1">
        <Name>Sidney Crosby</Name>
        <Team>PIT</Team>
        <Pos>C</Pos>
        <GP>39</GP>
        <G>32</G>
        <A>33</A>
        <PlusMinus>20</PlusMinus>
        <PIM>29</PIM>
      </Player>
    </Stats>

    <?xml version="1.0" encoding="utf-8" ?>
    <Stats Date="2011-01-10">
      <Player Rank="1">
        <Name>Sidney Crosby</Name>
        <Team>PIT</Team>
        <Pos>C</Pos>
        <GP>42</GP>
        <G>35</G>
        <A>34</A>
        <PlusMinus>22</PlusMinus>
        <PIM>30</PIM>
      </Player>
    </Stats>
Desired result (the difference between the two):
    <?xml version="1.0" encoding="utf-8" ?>
    <Stats Date="2011-01-10">
      <Player Rank="1">
        <Name>Sidney Crosby</Name>
        <Team>PIT</Team>
        <Pos>C</Pos>
        <GP>3</GP>
        <G>3</G>
        <A>1</A>
        <PlusMinus>2</PlusMinus>
        <PIM>1</PIM>
      </Player>
    </Stats>
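Later on I will probably use XSLT to transform the resulting XML "diff" file into a sorted HTML file, but I'm not there yet. All I want for now is a third XML file that shows, for every Player node, the difference of each numeric value, starting from the "GP" child node.
Here is the C# code I have so far: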
    private void CompareXml(string file1, string file2)
    {
        XmlReader reader1 = XmlReader.Create(new StringReader(file1));
        XmlReader reader2 = XmlReader.Create(new StringReader(file2));

        string diffFile = StatsFile.XmlDiffFilename;
        StringBuilder differenceStringBuilder = new StringBuilder();

        FileStream fs = new FileStream(diffFile, FileMode.Create);
        XmlWriter diffGramWriter = XmlWriter.Create(fs);

        XmlDiff xmldiff = new XmlDiff(XmlDiffOptions.IgnoreChildOrder |
                                      XmlDiffOptions.IgnoreNamespaces |
                                      XmlDiffOptions.IgnorePrefixes);
        bool bIdentical = xmldiff.Compare(file1, file2, false, diffGramWriter);

        diffGramWriter.Close();

        // cleaning up after we are done with the xml diff file
        File.Delete(diffFile);
    }
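That's what I have so far, but the results are garbage... Note that for each "Player" node, the first three children do not need to be compared... How can I implement this?
There are two straightforward solutions:
Solution 1.
You can first apply a simple transformation to both documents that deletes the elements that should not be compared. Then compare the two resulting documents exactly as your current code does. Here is the transformation: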
    <xsl:stylesheet version="1.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output omit-xml-declaration="yes" indent="yes"/>
      <xsl:strip-space elements="*"/>

      <xsl:template match="node()|@*">
        <xsl:copy>
          <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
      </xsl:template>

      <xsl:template match="Name|Team|Pos"/>
    </xsl:stylesheet>
When this transformation is applied to the provided XML document:
    <Stats Date="2011-01-01">
      <Player Rank="1">
        <Name>Sidney Crosby</Name>
        <Team>PIT</Team>
        <Pos>C</Pos>
        <GP>39</GP>
        <G>32</G>
        <A>33</A>
        <PlusMinus>20</PlusMinus>
        <PIM>29</PIM>
        <PP>10</PP>
        <SH>1</SH>
        <GW>3</GW>
        <Shots>0</Shots>
        <ShotPctg>154</ShotPctg>
        <TOIPerGame>20.8</TOIPerGame>
        <ShiftsPerGame>21:54</ShiftsPerGame>
        <FOWinPctg>22.6</FOWinPctg>
      </Player>
    </Stats>
the wanted result document is produced:
    <Stats Date="2011-01-01">
      <Player Rank="1">
        <GP>39</GP>
        <G>32</G>
        <A>33</A>
        <PlusMinus>20</PlusMinus>
        <PIM>29</PIM>
        <PP>10</PP>
        <SH>1</SH>
        <GW>3</GW>
        <Shots>0</Shots>
        <ShotPctg>154</ShotPctg>
        <TOIPerGame>20.8</TOIPerGame>
        <ShiftsPerGame>21:54</ShiftsPerGame>
        <FOWinPctg>22.6</FOWinPctg>
      </Player>
    </Stats>
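To run this stripping step from C#, one option is `XslCompiledTransform` from `System.Xml.Xsl`. The sketch below is only an illustration: the class name `StripIdentityFields` and the method `Apply` are made up for this example, and the stylesheet is inlined as a string for brevity rather than loaded from a file.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class StripIdentityFields
{
    // The stylesheet from Solution 1: an identity transform plus an empty
    // template that drops the Name, Team and Pos elements.
    private const string Stylesheet = @"
<xsl:stylesheet version=""1.0"" xmlns:xsl=""http://www.w3.org/1999/XSL/Transform"">
  <xsl:output omit-xml-declaration=""yes"" indent=""yes""/>
  <xsl:strip-space elements=""*""/>
  <xsl:template match=""node()|@*"">
    <xsl:copy><xsl:apply-templates select=""node()|@*""/></xsl:copy>
  </xsl:template>
  <xsl:template match=""Name|Team|Pos""/>
</xsl:stylesheet>";

    // Hypothetical helper: transforms the given XML text and returns the result.
    public static string Apply(string inputXml)
    {
        var xslt = new XslCompiledTransform();
        using (var xsl = XmlReader.Create(new StringReader(Stylesheet)))
        {
            xslt.Load(xsl);
        }

        var output = new StringWriter();
        using (var input = XmlReader.Create(new StringReader(inputXml)))
        // OutputSettings carries the xsl:output options (no declaration, indent).
        using (var writer = XmlWriter.Create(output, xslt.OutputSettings))
        {
            xslt.Transform(input, writer);
        }
        return output.ToString();
    }
}
```

Applying `StripIdentityFields.Apply` to the text of both files before handing them to the diff step keeps `Name`, `Team` and `Pos` out of the comparison.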
Solution 2.
This is a complete XSLT 1.0 solution (purely for convenience, the second XML document is embedded in the transformation code):
    <xsl:stylesheet version="1.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output omit-xml-declaration="yes" indent="yes"/>
      <xsl:strip-space elements="*"/>

      <xsl:variable name="vrtfDoc2">
        <Stats Date="2011-01-01">
          <Player Rank="2">
            <Name>John Smith</Name>
            <Team>NY</Team>
            <Pos>D</Pos>
            <GP>38</GP>
            <G>32</G>
            <A>33</A>
            <PlusMinus>15</PlusMinus>
            <PIM>29</PIM>
            <PP>10</PP>
            <SH>1</SH>
            <GW>4</GW>
            <Shots>0</Shots>
            <ShotPctg>158</ShotPctg>
            <TOIPerGame>20.8</TOIPerGame>
            <ShiftsPerGame>21:54</ShiftsPerGame>
            <FOWinPctg>22.6</FOWinPctg>
          </Player>
        </Stats>
      </xsl:variable>

      <xsl:variable name="vDoc2" select=
       "document('')/*/xsl:variable[@name='vrtfDoc2']/*"/>

      <xsl:template match="node()|@*" name="identity">
        <xsl:param name="pDoc2"/>
        <xsl:copy>
          <xsl:apply-templates select="node()|@*">
            <xsl:with-param name="pDoc2" select="$pDoc2"/>
          </xsl:apply-templates>
        </xsl:copy>
      </xsl:template>

      <xsl:template match="/">
        <xsl:apply-templates select="*">
          <xsl:with-param name="pDoc2" select="$vDoc2"/>
        </xsl:apply-templates>
        -----------------------
        <xsl:apply-templates select="$vDoc2">
          <xsl:with-param name="pDoc2" select="/*"/>
        </xsl:apply-templates>
      </xsl:template>

      <xsl:template match="Player/*">
        <xsl:param name="pDoc2"/>
        <xsl:if test=
         "not(. = $pDoc2/*/*[name()=name(current())])">
          <xsl:call-template name="identity"/>
        </xsl:if>
      </xsl:template>

      <xsl:template match="Name|Team|Pos" priority="20"/>
    </xsl:stylesheet>
When this transformation is applied to the same first document as above, the correct diffgrams are produced:
    <Stats Date="2011-01-01">
      <Player Rank="1">
        <GP>39</GP>
        <PlusMinus>20</PlusMinus>
        <GW>3</GW>
        <ShotPctg>154</ShotPctg>
      </Player>
    </Stats>
    -----------------------
    <Stats xmlns:xsl="http://www.w3.org/1999/XSL/Transform" Date="2011-01-01">
      <Player Rank="2">
        <GP>38</GP>
        <PlusMinus>15</PlusMinus>
        <GW>4</GW>
        <ShotPctg>158</ShotPctg>
      </Player>
    </Stats>
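How it works:
1. The transformation is applied to the first document, with the second document passed as a parameter.
2. This produces an XML document whose only leaf element nodes are those whose values differ from the corresponding leaf element nodes in the second document.
3. The same processing as in 1. above is performed, but this time on the second document, with the first document passed as a parameter.
4. This produces the second diffgram: an XML document whose only leaf element nodes are those whose values differ from the corresponding leaf element nodes in the first document.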
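For readers who prefer to follow the steps above in C#, the same idea can be sketched with LINQ to XML. This is not the code from the answer: `LeafDiff`, `OneWay` and the `Ignored` set are hypothetical names, and the sketch assumes the `Stats/Player` layout used in the samples.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class LeafDiff
{
    // Elements that are never compared (same list as in the stylesheet).
    private static readonly HashSet<string> Ignored =
        new HashSet<string> { "Name", "Team", "Pos" };

    // Returns a copy of 'doc' in which every Player keeps only the children
    // whose value is not equal to any same-named Player child in 'other'.
    // This mirrors the test not(. = $pDoc2/*/*[name()=name(current())]).
    public static XElement OneWay(XElement doc, XElement other)
    {
        var result = new XElement(doc.Name, doc.Attributes());
        foreach (XElement player in doc.Elements("Player"))
        {
            IEnumerable<XElement> kept = player.Elements()
                .Where(e => !Ignored.Contains(e.Name.LocalName))
                .Where(e => !other.Elements("Player")
                                  .Elements(e.Name)
                                  .Any(o => o.Value == e.Value))
                .Select(e => new XElement(e)); // copy, leave the input untouched
            result.Add(new XElement(player.Name, player.Attributes(), kept));
        }
        return result;
    }
}
```

Calling `LeafDiff.OneWay(first, second)` corresponds to steps 1 and 2, and `LeafDiff.OneWay(second, first)` to steps 3 and 4, where `first` and `second` are the two `Stats` root elements.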
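OK, I finally opted for a pure C# solution to compare the two XML files, without using the XML Diff/Patch .dll and without even using XSL transforms. I will need XSL transforms in the next step, to convert the XML to HTML for viewing, but I have come up with an algorithm that uses nothing more than System.Xml and System.Xml.XPath.
Here is my algorithm: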
    private void CompareXml(string file1, string file2)
    {
        // Load the documents
        XmlDocument docXml1 = new XmlDocument();
        docXml1.Load(file1);
        XmlDocument docXml2 = new XmlDocument();
        docXml2.Load(file2);

        // Get a list of all player nodes
        XmlNodeList nodes1 = docXml1.SelectNodes("/Stats/Player");
        XmlNodeList nodes2 = docXml2.SelectNodes("/Stats/Player");

        // Define a single node
        XmlNode node1;
        XmlNode node2;

        // Get the root Xml element
        XmlElement root1 = docXml1.DocumentElement;
        XmlElement root2 = docXml2.DocumentElement;

        // Get a list of all player names
        XmlNodeList nameList1 = root1.GetElementsByTagName("Name");
        XmlNodeList nameList2 = root2.GetElementsByTagName("Name");

        // Get a list of all teams
        XmlNodeList teamList1 = root1.GetElementsByTagName("Team");
        XmlNodeList teamList2 = root2.GetElementsByTagName("Team");

        // Create an XmlWriterSettings object with the correct options.
        XmlWriter writer = null;
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Indent = true;
        settings.IndentChars = (" ");
        settings.OmitXmlDeclaration = false;

        // Create the XmlWriter object and write some content.
        writer = XmlWriter.Create(StatsFile.XmlDiffFilename, settings);
        writer.WriteStartElement("StatsDiff");

        // The compare algorithm
        bool match = false;
        int j = 0;

        try
        {
            // the list has 500 players
            for (int i = 0; i < 500; i++)
            {
                // Restart the scan of the second list for every player; otherwise j
                // stays at 500 after an unmatched player and all later players are skipped
                j = 0;
                match = false;

                while (j < 500 && match == false)
                {
                    // There is a match if the player name and team are the same in both lists
                    if (nameList1.Item(i).InnerText == nameList2.Item(j).InnerText &&
                        teamList1.Item(i).InnerText == teamList2.Item(j).InnerText)
                    {
                        match = true;
                        node1 = nodes1.Item(i);
                        node2 = nodes2.Item(j);

                        // Call to the calculator and Xml writer
                        this.CalculateDifferential(node1, node2, writer);
                    }
                    else
                    {
                        // Also advance when only the name matches, so the loop cannot spin forever
                        j++;
                    }
                }
            }

            // end Xml document
            writer.WriteEndElement();
            writer.Flush();
        }
        finally
        {
            if (writer != null)
                writer.Close();
        }
    }
Resulting XML:
    <?xml version="1.0" encoding="utf-8"?>
    <StatsDiff>
      <Player Rank="1">
        <Name>Sidney Crosby</Name>
        <Team>PIT</Team>
        <Pos>C</Pos>
        <GP>0</GP>
        <G>0</G>
        <A>0</A>
        <Points>0</Points>
        <PlusMinus>0</PlusMinus>
        <PIM>0</PIM>
        <PP>0</PP>
        <SH>0</SH>
        <GW>0</GW>
        <OT>0</OT>
        <Shots>0</Shots>
        <ShotPctg>0</ShotPctg>
        <ShiftsPerGame>0</ShiftsPerGame>
        <FOWinPctg>0</FOWinPctg>
      </Player>
      <Player Rank="2">
        <Name>Steven Stamkos</Name>
        <Team>TBL</Team>
        <Pos>C</Pos>
        <GP>1</GP>
        <G>0</G>
        <A>0</A>
        <Points>0</Points>
        <PlusMinus>0</PlusMinus>
        <PIM>2</PIM>
        <PP>0</PP>
        <SH>0</SH>
        <GW>0</GW>
        <OT>0</OT>
        <Shots>4</Shots>
        <ShotPctg>-0,6000004</ShotPctg>
        <ShiftsPerGame>-0,09999847</ShiftsPerGame>
        <FOWinPctg>0,09999847</FOWinPctg>
      </Player>
      [...]
    </StatsDiff>
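I have left out the implementation of the CalculateDifferential() method; it is rather cryptic, but it is fast and efficient. This way I get the results I wanted without adding any other reference, sticking strictly to the bare minimum, and without having to use XSL...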
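Since `CalculateDifferential()` is not shown, the following is only a guess at what such a method could look like, not the author's implementation. It assumes that the first node is the older snapshot and the second the newer one, that attributes and non-numeric children are copied from the newer node, and that values are parsed and written with the invariant culture using `decimal` (which avoids the rounding noise and decimal commas visible in the output above). The class name `DifferentialSketch` is hypothetical.

```csharp
using System;
using System.Globalization;
using System.Xml;

public static class DifferentialSketch
{
    // Writes one element named like 'newer' whose numeric children hold
    // (newer value - older value); everything else is copied from 'newer'.
    public static void CalculateDifferential(XmlNode older, XmlNode newer, XmlWriter writer)
    {
        writer.WriteStartElement(newer.Name);
        foreach (XmlAttribute attr in newer.Attributes)
        {
            writer.WriteAttributeString(attr.Name, attr.Value);
        }

        foreach (XmlNode child in newer.ChildNodes)
        {
            if (child.NodeType != XmlNodeType.Element)
                continue;

            // Look up the child with the same name in the older snapshot.
            XmlNode counterpart = older.SelectSingleNode(child.Name);

            if (counterpart != null
                && decimal.TryParse(child.InnerText, NumberStyles.Float,
                                    CultureInfo.InvariantCulture, out decimal newValue)
                && decimal.TryParse(counterpart.InnerText, NumberStyles.Float,
                                    CultureInfo.InvariantCulture, out decimal oldValue))
            {
                writer.WriteElementString(child.Name,
                    (newValue - oldValue).ToString(CultureInfo.InvariantCulture));
            }
            else
            {
                // Name, Team, Pos and anything else that is not a plain number.
                writer.WriteElementString(child.Name, child.InnerText);
            }
        }

        writer.WriteEndElement();
    }
}
```

With a signature like this, the call `this.CalculateDifferential(node1, node2, writer)` in the loop above would map to `DifferentialSketch.CalculateDifferential(node1, node2, writer)`.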