Extracting text from an HTML file with Python

Using html2text

    >>> import html2text
    >>>
    >>> print(html2text.html2text("<p><strong>Zed's</strong> dead baby, <em>Zed's</em> dead.</p>"))
    **Zed's** dead baby, _Zed's_ dead.

Or with some configuration options:

    >>> import html2text
    >>>
    >>> h = html2text.HTML2Text()
    >>> # Ignore converting links from HTML
    >>> h.ignore_links = True
    >>> print(h.handle("<p>Hello, <a href='http://earth.google.com/'>world</a>!"))
    Hello, world!
    >>> # Don't ignore links anymore, I like links
    >>> h.ignore_links = False
    >>> print(h.handle("<p>Hello, <a href='http://earth.google.com/'>world</a>!"))
    Hello, [world](http://earth.google.com/)!

Using BeautifulSoup

    # Python 3: urlopen lives in urllib.request (urllib.urlopen was Python 2 only)
    from urllib.request import urlopen
    from bs4 import BeautifulSoup

    url = "http://news.bbc.co.uk/2/hi/health/2284783.stm"
    html = urlopen(url).read()
    # name the parser explicitly so the result does not depend on which parsers are installed
    soup = BeautifulSoup(html, "html.parser")

    # kill all script and style elements
    for script in soup(["script", "style"]):
        script.extract()    # rip it out

    # get text
    text = soup.get_text()

    # break into lines and remove leading and trailing space on each
    lines = (line.strip() for line in text.splitlines())
    # break multi-headlines into a line each (split on runs of two spaces, not on every word)
    chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
    # drop blank lines
    text = '\n'.join(chunk for chunk in chunks if chunk)

    print(text)
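
Using only the standard library

When neither html2text nor BeautifulSoup can be installed, roughly the same steps (skip script and style content, strip each line, drop blank lines) can be sketched with the standard-library html.parser module. This is a different technique from the two above, not a drop-in replacement. The names TextExtractor and html_to_text are made up for this illustration, and the sketch assumes the HTML is already available as a decoded string.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping anything inside <script> or <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # greater than 0 while inside a skipped element
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # keep text only when we are not inside <script> or <style>
        if self._skip_depth == 0:
            self._parts.append(data)

    def get_text(self):
        return "".join(self._parts)


def html_to_text(html):
    """Return the visible text of an HTML string, one non-blank line per row."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # strip leading and trailing space on each line, then drop blank lines
    lines = (line.strip() for line in parser.get_text().splitlines())
    return "\n".join(line for line in lines if line)


if __name__ == "__main__":
    print(html_to_text("<p>Hello, <a href='http://earth.google.com/'>world</a>!</p>"))
    # prints: Hello, world!
```

Unlike html2text, this sketch discards all markup, including link targets, and it does not insert line breaks between block elements that are written on the same source line.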