xpaf - Open-Source Parsing Framework


Apache
Cross-platform
C/C++

Software Overview

The XPath-based Parsing Framework (XPaF) is a simple, convenient open-source parsing framework
for extracting relations (subject-predicate-object triples) from HTML and XML documents.

Code example (an input HTML document, followed by a parser definition for it):

<table>
  <tr>
    <td class="name">Aaron</td>
    <td class="occ">Engineer</td>
  </tr>
  <tr>
    <td class="name">Jennifer</td>
    <td class="occ">Archeologist</td>
  </tr>
</table>


parser_name: "my_parser"
relation_tmpls {
  subject: "//td[@class='name']"
  predicate: "occupation"
  object: "//td[@class='occ']"

  subject_cardinality: MANY
  object_cardinality: MANY
}
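
To make the idea of subject-predicate-object extraction concrete, here is a minimal Python sketch that applies the same two XPath expressions to the table above and builds triples. It does not use XPaF or its API. The function name `extract_triples` is hypothetical, and pairing subjects with objects by position is an assumption made for illustration; XPaF's actual handling of `subject_cardinality` and `object_cardinality` may differ.

```python
import xml.etree.ElementTree as ET

# The sample document from the example above.
DOC = """\
<table>
  <tr>
    <td class="name">Aaron</td>
    <td class="occ">Engineer</td>
  </tr>
  <tr>
    <td class="name">Jennifer</td>
    <td class="occ">Archeologist</td>
  </tr>
</table>
"""


def extract_triples(doc, subject_xpath, predicate, object_xpath):
    """Return (subject, predicate, object) triples from a well-formed document.

    Hypothetical helper, not part of XPaF. Assumption: the i-th subject
    match is paired with the i-th object match, in document order.
    """
    root = ET.fromstring(doc)
    subjects = [(el.text or "").strip() for el in root.findall(subject_xpath)]
    objects = [(el.text or "").strip() for el in root.findall(object_xpath)]
    return [(s, predicate, o) for s, o in zip(subjects, objects)]


# ElementTree supports only a subset of XPath and expects relative paths,
# so "//td[...]" from the parser definition is written as ".//td[...]".
triples = extract_triples(
    DOC,
    ".//td[@class='name']",
    "occupation",
    ".//td[@class='occ']",
)

for triple in triples:
    print(triple)
```

Under the positional-pairing assumption, this prints `('Aaron', 'occupation', 'Engineer')` and `('Jennifer', 'occupation', 'Archeologist')`. Real-world HTML is often not well-formed XML, so a dedicated HTML parser would be needed in place of `ElementTree` for such input.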