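I'm trying to download a captcha image with Selenium, but the downloaded image is different from the one displayed in the browser. If I download the image again, without changing anything in the browser, I get yet another image.
Any ideas?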
    import urllib.request

    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Firefox()
    driver.get("http://sistemas.cvm.gov.br/?fundosreg")

    # Change frame.
    driver.switch_to.frame("Main")

    # Download image/captcha.
    img = driver.find_element(By.XPATH, ".//*[@id='trRandom3']/td[2]/img")
    src = img.get_attribute('src')
    urllib.request.urlretrieve(src, "captcha.jpeg")
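That happens because the image's src link gives you a new, random captcha image every time it is opened. The urlretrieve call is a separate request, so it receives a different image from the one the browser is showing.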
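To see the effect in isolation, here is a minimal sketch that does not touch the real site. It assumes a stand-in local server (the hypothetical `RandomCaptchaHandler`) that returns fresh random bytes on every GET, which is how a captcha endpoint of this kind presumably behaves. Requesting the same URL twice then yields two different bodies.

```python
import http.server
import os
import threading
import urllib.request


class RandomCaptchaHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical stand-in for a captcha endpoint: new bytes on every GET."""

    def do_GET(self):
        body = os.urandom(16)  # pretend this is a freshly generated image
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_twice():
    """Start the stand-in server, request the same URL twice, return both bodies."""
    server = http.server.HTTPServer(("127.0.0.1", 0), RandomCaptchaHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:%d/captcha" % server.server_address[1]
    # Bypass any proxy configured in the environment for this local request.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        first = opener.open(url).read()
        second = opener.open(url).read()
    finally:
        server.shutdown()
        server.server_close()
    return first, second


first, second = fetch_twice()
print(first != second)  # same URL, different content
```

The browser's request and the urlretrieve request are in the same position as these two fetches: same URL, two independent responses.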
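Instead of downloading the file from the image's src, you can take a screenshot of the page and crop the captcha out of it. You will need to install Pillow (pip install Pillow) and use it as described in this answer: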
    from PIL import Image
    from selenium import webdriver
    from selenium.webdriver.common.by import By


    def get_captcha(driver, element, path):
        """Screenshot the whole page, then crop it down to the given element."""
        location = element.location
        size = element.size

        # Save a screenshot of the entire page (Selenium writes PNG data).
        driver.save_screenshot(path)

        # Open the screenshot with Pillow.
        image = Image.open(path)

        # Crop points. The +140 vertical offset is specific to this page;
        # it presumably accounts for the frame's position on the page.
        left = location['x']
        top = location['y'] + 140
        right = location['x'] + size['width']
        bottom = location['y'] + size['height'] + 140

        image = image.crop((left, top, right, bottom))
        # JPEG has no alpha channel, so convert to RGB before saving.
        image.convert('RGB').save(path, 'jpeg')


    driver = webdriver.Firefox()
    driver.get("http://sistemas.cvm.gov.br/?fundosreg")

    # Change frame.
    driver.switch_to.frame("Main")

    # Capture the captcha image exactly as the browser shows it.
    img = driver.find_element(By.XPATH, ".//*[@id='trRandom3']/td[2]/img")
    get_captcha(driver, img, "captcha.jpeg")
(Note that I changed the code a bit so that it works in your case.)
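The crop arithmetic is the part most likely to need adjusting on another page, so it may help to pull it out on its own. The helper below is a sketch, not part of the original answer: `crop_box` and its `x_offset`, `y_offset`, and `scale` parameters are hypothetical names. The offsets stand in for the position of the frame on the page (140 pixels vertically here), and `scale` is an assumption for displays where the screenshot has more pixels than the page's CSS dimensions.

```python
def crop_box(location, size, x_offset=0, y_offset=0, scale=1.0):
    """Return (left, top, right, bottom) suitable for PIL's Image.crop.

    location and size are dicts shaped like Selenium's element.location
    and element.size. The offsets shift the box (for example, by the
    frame's position on the page); scale multiplies every coordinate.
    """
    left = (location["x"] + x_offset) * scale
    top = (location["y"] + y_offset) * scale
    right = left + size["width"] * scale
    bottom = top + size["height"] * scale
    return tuple(int(round(v)) for v in (left, top, right, bottom))


# Example values only; a real element reports its own location and size.
box = crop_box({"x": 10, "y": 20}, {"width": 100, "height": 40}, y_offset=140)
print(box)  # (10, 160, 110, 200)
```

If the cropped image comes out shifted, the offsets are the first thing to check. Selenium's WebElement also has a screenshot(filename) method that captures a single element directly, which may avoid the manual crop where the driver supports it.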