htmlcxx - A C++ parser for HTML and CSS


LGPL
Cross-platform
C/C++

Software overview

htmlcxx is a C++ library that provides an HTML parser and a CSS1 parser. Its
parsing policies attempt to mimic the behavior of Mozilla Firefox, so you
should expect parse trees similar to those Firefox creates. However, it does
not insert elements that are missing from your HTML, so serializing the DOM
tree gives exactly the same output as the original HTML document. Another key
feature is an STL-like tree navigation API, provided by the tree.hh template
library.

Example code:

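  #include <iostream>
  #include <string>
  #include <htmlcxx/html/ParserDom.h>
  ...
  // The snippets below assume: using namespace std; using namespace htmlcxx;

  // Parse some HTML code
  string html = "<html><body>hey</body></html>";
  HTML::ParserDom parser;
  tree<HTML::Node> dom = parser.parseTree(html);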

  //Print whole DOM tree  
  cout << dom << endl;

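  // Dump all links in the tree
  tree<HTML::Node>::iterator it = dom.begin();
  tree<HTML::Node>::iterator end = dom.end();
  for (; it != end; ++it)
  {
    // Match both <a> and <A>; skip text and comment nodes
    if (it->isTag() && (it->tagName() == "a" || it->tagName() == "A"))
    {
        it->parseAttributes();
        // attribute() returns a (found, value) pair
        cout << it->attribute("href").second << endl;
    }
  }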

  //Dump all text of the document  
  it = dom.begin();  
  end = dom.end();  
  for (; it != end; ++it)  
  {  
    if ((!it->isTag()) && (!it->isComment()))  
    {  
        cout << it->text();  
    }  
  }
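
The overview notes that serializing the DOM tree reproduces the original document exactly. The sketch below illustrates that round-trip idea in isolation. It does not use htmlcxx and is not how htmlcxx is implemented: `Piece`, `splitHtml` and `serialize` are hypothetical names invented for this illustration, and the splitter recognizes only `<...>` markup and plain text. The assumption it demonstrates is simply that a parser which stores the raw source characters of every node, and never adds or normalizes anything, can be serialized back to the input byte for byte.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical illustration type; this is not htmlcxx's HTML::Node.
struct Piece {
    bool isTag;       // true for markup such as "<p>" or "</p>"
    std::string raw;  // exact source characters, never normalized
};

// Split the input into markup and text pieces without dropping or
// rewriting a single character.
std::vector<Piece> splitHtml(const std::string& html) {
    std::vector<Piece> pieces;
    std::size_t pos = 0;
    while (pos < html.size()) {
        std::size_t stop;
        bool isTag = (html[pos] == '<');
        if (isTag) {
            // A tag runs up to and including the next '>'.
            // An unterminated tag keeps the rest of the input as-is.
            std::size_t close = html.find('>', pos);
            stop = (close == std::string::npos) ? html.size() : close + 1;
        } else {
            // Text runs up to, but not including, the next '<'.
            std::size_t open = html.find('<', pos);
            stop = (open == std::string::npos) ? html.size() : open;
        }
        pieces.push_back({isTag, html.substr(pos, stop - pos)});
        pos = stop;
    }
    return pieces;
}

// Serialize by concatenating the raw pieces in document order.
std::string serialize(const std::vector<Piece>& pieces) {
    std::string out;
    for (const Piece& p : pieces) {
        out += p.raw;
    }
    return out;
}
```

Calling `serialize(splitHtml(html))` returns a string equal to `html`, even for input with an unclosed `<p>`, because no closing tag is ever synthesized.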