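I'm working with HTML elements that contain child tags, and I want to "ignore" or strip those tags so that only the text remains. Right now, if I call .string on any element that contains a tag, all I get back is None.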
    import bs4

    soup = bs4.BeautifulSoup("""
    <div id="main">
      <p>This is a paragraph.</p>
      <p>This is a paragraph <span class="test">with a tag</span>.</p>
      <p>This is another paragraph.</p>
    </div>
    """)

    main = soup.find(id='main')
    for child in main.children:
        # .string is None for any element that has more than one child node
        print(child.string)
Output:

    This is a paragraph.
    None
    This is another paragraph.
I want the second line to be "This is a paragraph with a tag." How can I do that?
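    for child in soup.find(id='main'):
        # Skip the whitespace strings between the <p> elements
        if isinstance(child, bs4.Tag):
            # .text joins all text nested inside the tag
            print(child.text)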
And you will get:

    This is a paragraph.
    This is a paragraph with a tag.
    This is another paragraph.
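As a small sketch of why this works, the snippet below compares .string with get_text() (the method behind .text) on the same markup. The explicit "html.parser" argument and the use of find_all("p") to skip the whitespace between paragraphs are my own assumptions for the illustration, not part of the code above.

```python
import bs4

html = """
<div id="main">
  <p>This is a paragraph.</p>
  <p>This is a paragraph <span class="test">with a tag</span>.</p>
  <p>This is another paragraph.</p>
</div>
"""

# Naming the parser explicitly is an assumption made for this sketch.
soup = bs4.BeautifulSoup(html, "html.parser")
main = soup.find(id="main")

# find_all("p") returns only the <p> tags, so no isinstance check is needed.
paragraphs = main.find_all("p")

# .string is None as soon as a tag has more than one child node.
strings = [p.string for p in paragraphs]

# get_text() concatenates every text node nested beneath the tag.
texts = [p.get_text() for p in paragraphs]

for text in texts:
    print(text)
```

Here strings holds None for the second paragraph, while texts holds the full sentence, because the span's text is joined with the text around it.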