Retrieve links from a web page using Python and BeautifulSoup

小编典典

How can I retrieve the links on a web page and copy their URL addresses using Python?

2022-08-27

Here is a short snippet that uses the SoupStrainer class in BeautifulSoup:

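import httplib2
from bs4 import BeautifulSoup, SoupStrainer

http = httplib2.Http()
# request() returns a (response headers, body bytes) tuple
resp, content = http.request('http://www.nytimes.com')

# SoupStrainer('a') keeps only <a> tags in the parsed tree; name the parser
# explicitly, because parse_only has no effect with html5lib
for link in BeautifulSoup(content, 'html.parser', parse_only=SoupStrainer('a')):
    # skip anchors that have no href (e.g. <a name="top">)
    if link.has_attr('href'):
        print(link['href'])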
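
The question also asks for the URL addresses themselves, and many href values are relative (for example /section/world). A minimal sketch, assuming you want absolute URLs: the hypothetical helper extract_links below takes an HTML string plus the page's base URL, keeps only <a> tags with SoupStrainer, and resolves each href with urllib.parse.urljoin. The HTML is passed in as a string (instead of being downloaded) only so the example runs offline; the sample markup is made up.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup, SoupStrainer


def extract_links(html, base_url):
    """Return the absolute URL of every <a href=...> in html (hypothetical helper)."""
    # Build tree nodes only for <a> tags, as in the snippet above.
    only_anchors = SoupStrainer('a')
    soup = BeautifulSoup(html, 'html.parser', parse_only=only_anchors)
    links = []
    # href=True matches only anchors that actually carry an href attribute.
    for link in soup.find_all('a', href=True):
        # urljoin leaves absolute URLs unchanged and resolves relative ones.
        links.append(urljoin(base_url, link['href']))
    return links


if __name__ == '__main__':
    sample = '''
    <html><body>
      <a href="/section/world">World</a>
      <a href="https://example.org/about">About</a>
      <a name="top">no href</a>
    </body></html>
    '''
    for url in extract_links(sample, 'https://www.nytimes.com/'):
        print(url)
```

To use it on a live page, pass the downloaded body (for example the content returned by httplib2) together with the URL it was fetched from.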

The BeautifulSoup documentation is quite good and covers a number of typical scenarios:

https://www.crummy.com/software/BeautifulSoup/bs4/doc/

Edit: Note that I used the SoupStrainer class because it is more efficient (in both memory and speed) when you know in advance what you want to parse.
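
One way to see what SoupStrainer saves is to count the nodes each tree holds. The sketch below parses the same made-up HTML twice, once in full and once with parse_only=SoupStrainer('a'); count_nodes is a hypothetical helper. It compares tree sizes only, not actual memory use or timing, but it shows that the strained tree is smaller while yielding the same links.

```python
from bs4 import BeautifulSoup, SoupStrainer

# Made-up sample page: two links buried in other markup.
html = '''
<html><head><title>Demo</title></head>
<body>
  <div><p>Some text <a href="/one">one</a></p></div>
  <ul><li><a href="/two">two</a></li><li>plain item</li></ul>
</body></html>
'''


def count_nodes(soup):
    # descendants walks every tag and string stored in the tree
    return sum(1 for _ in soup.descendants)


full = BeautifulSoup(html, 'html.parser')
strained = BeautifulSoup(html, 'html.parser', parse_only=SoupStrainer('a'))

full_links = [a['href'] for a in full.find_all('a', href=True)]
strained_links = [a['href'] for a in strained.find_all('a', href=True)]

print('nodes (full, strained):', count_nodes(full), count_nodes(strained))
print('same links:', full_links == strained_links)
```

The Beautiful Soup documentation notes that parse_only does not work with the html5lib parser, so this sketch relies on html.parser (lxml supports it as well).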

2022-08-27