如果浏览到,https://httpbin.org/headers我希望得到以下JSON响应:
https://httpbin.org/headers
{ "headers": { "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "Accept-Encoding": "gzip, deflate, br", "Accept-Language": "en-US,en;q=0.5", "Connection": "close", "Host": "httpbin.org", "Upgrade-Insecure-Requests": "1", "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:64.0) Gecko/20100101 Firefox/64.0" } }
但是,如果我使用selenium
from selenium import webdriver from selenium.webdriver.firefox.options import Options options = Options() options.headless = True driver = webdriver.Firefox(options=options) url = 'https://httpbin.org/headers' driver.get(url) print(driver.page_source) driver.close()
我懂了
<html platform="linux" class="theme-light" dir="ltr"><head><meta http-equiv="Content-Security-Policy" content="default-src 'none' ; script-src resource:; "><link rel="stylesheet" type="text/css" href="resource://devtools-client-jsonview/css/main.css"><script type="text/javascript" charset="utf-8" async="" data-requirecontext="_" data-requiremodule="viewer-config" src="resource://devtools-client-jsonview/viewer-config.js"></script><script type="text/javascript" charset="utf-8" async="" data-requirecontext="_" data-requiremodule="json-viewer" src="resource://devtools-client-jsonview/json-viewer.js"></script></head><body><div id="content"><div id="json">{ "headers": { "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "Accept-Encoding": "gzip, deflate, br", "Accept-Language": "en-US,en;q=0.5", "Connection": "close", "Host": "httpbin.org", "Upgrade-Insecure-Requests": "1", "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:64.0) Gecko/20100101 Firefox/64.0" } } </div></div><script src="resource://devtools-client-jsonview/lib/require.js" data-main="resource://devtools-client-jsonview/viewer-config.js"></script></body></html>
HTML标记来自哪里?如何从获得HTTP请求的原始JSON响应driver.page_source?
driver.page_source
除了原始JSON响应外,driver.page_source还包含HTML以在浏览器中“漂亮地打印”响应。如果使用Firefox DOM和Style Inspector在浏览器中查看JSON响应的源,您将得到相同的结果。
要获取原始JSON响应,您可以照常浏览HTML元素:
print(driver.find_element_by_xpath("//div[@id='json']").text)