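Node.getTextContent() returns the text content of the current node and its descendants.
Is there a way to get the text content of only the current node, without the text of its descendants?
Example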
<paragraph> <link>XML</link> is a <strong>browser based XML editor</strong> editor allows users to edit XML data in an intuitive word processor. </paragraph>
Expected output
paragraph = is a editor allows users to edit XML data in an intuitive word processor.
link = XML
strong = browser based XML editor
I tried the following code:
    String str = "<paragraph>"
            + "<link>XML</link>"
            + " is a "
            + "<strong>browser based XML editor</strong>"
            + "editor allows users to edit XML data in an intuitive word processor."
            + "</paragraph>";
    org.w3c.dom.Document domDoc = null;
    DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder docBuilder;
    try {
        docBuilder = docFactory.newDocumentBuilder();
        ByteArrayInputStream bis = new ByteArrayInputStream(str.getBytes());
        domDoc = docBuilder.parse(bis);
    } catch (ParserConfigurationException e1) {
        e1.printStackTrace();
    } catch (SAXException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
    DocumentTraversal traversal = (DocumentTraversal) domDoc;
    NodeIterator iterator = traversal.createNodeIterator(
            domDoc.getDocumentElement(), NodeFilter.SHOW_ELEMENT, null, true);
    for (Node n = iterator.nextNode(); n != null; n = iterator.nextNode()) {
        String tagname = ((Element) n).getTagName();
        System.out.println(tagname + "=" + ((Element) n).getTextContent());
    }
But it gives this output:
paragraph=XML is a browser based XML editoreditor allows users to edit XML data in an intuitive word processor.
link=XML
strong=browser based XML editor
Note that the paragraph element includes the text of the link and strong tags, which I don't want. Any suggestions?
What you want is to filter the children of your <paragraph> node so that you keep only those whose node type is Node.TEXT_NODE.
Here is an example of a method that returns what you need:
    // Concatenates the text of the node's direct TEXT_NODE children only.
    public static String getFirstLevelTextContent(Node node) {
        NodeList list = node.getChildNodes();
        StringBuilder textContent = new StringBuilder();
        for (int i = 0; i < list.getLength(); ++i) {
            Node child = list.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                textContent.append(child.getTextContent());
            }
        }
        return textContent.toString();
    }
In your example, that means:
    String str = "<paragraph>"
            + "<link>XML</link>"
            + " is a "
            + "<strong>browser based XML editor</strong>"
            + "editor allows users to edit XML data in an intuitive word processor."
            + "</paragraph>";
    Document domDoc = null;
    try {
        DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
        ByteArrayInputStream bis = new ByteArrayInputStream(str.getBytes());
        domDoc = docBuilder.parse(bis);
    } catch (Exception e) {
        e.printStackTrace();
    }
    DocumentTraversal traversal = (DocumentTraversal) domDoc;
    NodeIterator iterator = traversal.createNodeIterator(
            domDoc.getDocumentElement(), NodeFilter.SHOW_ELEMENT, null, true);
    for (Node n = iterator.nextNode(); n != null; n = iterator.nextNode()) {
        String tagname = ((Element) n).getTagName();
        System.out.println(tagname + "=" + getFirstLevelTextContent(n));
    }
Output:
paragraph= is a editor allows users to edit XML data in an intuitive word processor.
link=XML
strong=browser based XML editor
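It iterates over all the children of the node, keeps only the TEXT nodes (so comments, elements, and so on are excluded), and accumulates their text content.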
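To check this behavior in isolation, here is a minimal self-contained sketch of the same idea. The class name FirstLevelTextDemo and the helpers describe and collect are hypothetical names introduced only for illustration; getFirstLevelTextContent follows the method above. The sample input also contains an XML comment, to show that comment nodes are skipped along with child elements.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FirstLevelTextDemo {

    // Concatenates only the direct TEXT_NODE children of the given node.
    static String getFirstLevelTextContent(Node node) {
        StringBuilder text = new StringBuilder();
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                text.append(child.getTextContent());
            }
        }
        return text.toString();
    }

    // Hypothetical helper: parses the XML string and returns one
    // "tag=ownText" entry per element, in document order.
    static List<String> describe(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<String> lines = new ArrayList<>();
            collect(doc.getDocumentElement(), lines);
            return lines;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Depth-first walk over element children only.
    private static void collect(Element element, List<String> lines) {
        lines.add(element.getTagName() + "=" + getFirstLevelTextContent(element));
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i) instanceof Element) {
                collect((Element) children.item(i), lines);
            }
        }
    }

    public static void main(String[] args) {
        String xml = "<paragraph><link>XML</link> is a <!-- note -->"
                + "<strong>browser based XML editor</strong> editor.</paragraph>";
        describe(xml).forEach(System.out::println);
    }
}
```

One caveat worth keeping in mind: CDATA sections have their own node type (Node.CDATA_SECTION_NODE), so a strict TEXT_NODE check skips them unless the parser is configured to merge them into text nodes, for example with DocumentBuilderFactory.setCoalescing(true).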
There is no direct method in Node or Element that returns only the first-level text content.
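As a possible alternative to looping over the children by hand, the standard javax.xml.xpath API can select a node's direct text children with the XPath expression text(). This is a different technique from the loop shown in the answer, offered only as a sketch; the class name XPathFirstLevelText and the helpers ownText and rootOwnText are hypothetical names used for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathFirstLevelText {

    // Selects the direct text children of the node with the relative
    // XPath expression "text()" and concatenates their values.
    static String ownText(Node node) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList texts = (NodeList) xpath.evaluate(
                    "text()", node, XPathConstants.NODESET);
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < texts.getLength(); i++) {
                sb.append(texts.item(i).getNodeValue());
            }
            return sb.toString();
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    // Hypothetical helper: parses the XML string and returns the
    // first-level text of its root element.
    static String rootOwnText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return ownText(doc.getDocumentElement());
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rootOwnText(
                "<paragraph><link>XML</link> is a <strong>browser based XML editor</strong>"
                + " editor.</paragraph>"));
    }
}
```

For a single call the hand-written loop is probably simpler and cheaper; the XPath form may be more convenient when the selection is already expressed as XPath elsewhere in the code.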