因此,我试图在WebDriver内的新选项卡上打开网站。我想这样做,因为使用PhantomJS为每个网站打开一个新的WebDriver大约需要3.5秒,所以我想提高速度…
我正在使用多进程python脚本,并且想从每个页面中获取一些元素,因此工作流程如下:
Open Browser Loop throught my array For element in array -> Open website in new tab -> do my business -> close it
但我找不到任何方法来实现这一目标。
这是我正在使用的代码。网站之间永远都需要花时间,我需要它很快。允许使用其他工具,但是我不知道有太多工具可以用来删除通过JavaScript加载的网站内容(在加载事件时触发div时创建的div等)为什么我需要selenium… BeautifulSoup无法用于我的某些页面。
#!/usr/bin/env python import multiprocessing, time, pika, json, traceback, logging, sys, os, itertools, urllib, urllib2, cStringIO, mysql.connector, shutil, hashlib, socket, urllib2, re from selenium import webdriver from selenium.webdriver.common.keys import Keys from PIL import Image from os import listdir from os.path import isfile, join from bs4 import BeautifulSoup from pprint import pprint def getPhantomData(parameters): try: # We create WebDriver browser = webdriver.Firefox() # Navigate to URL browser.get(parameters['target_url']) # Find all links by Selector links = browser.find_elements_by_css_selector(parameters['selector']) result = [] for link in links: # Extract link attribute and append to our list result.append(link.get_attribute(parameters['attribute'])) browser.close() browser.quit() return json.dumps({'data': result}) except Exception, err: browser.close() browser.quit() print err def callback(ch, method, properties, body): parameters = json.loads(body) message = getPhantomData(parameters) if message['data']: ch.basic_ack(delivery_tag=method.delivery_tag) else: ch.basic_reject(delivery_tag=method.delivery_tag, requeue=True) def consume(): credentials = pika.PlainCredentials('invitado', 'invitado') rabbit = pika.ConnectionParameters('localhost',5672,'/',credentials) connection = pika.BlockingConnection(rabbit) channel = connection.channel() # Conectamos al canal channel.queue_declare(queue='com.stuff.images', durable=True) channel.basic_consume(callback,queue='com.stuff.images') print ' [*] Waiting for messages. To exit press CTRL^C' try: channel.start_consuming() except KeyboardInterrupt: pass workers = 5 pool = multiprocessing.Pool(processes=workers) for i in xrange(0, workers): pool.apply_async(consume) try: while True: continue except KeyboardInterrupt: print ' [*] Exiting...' pool.terminate() pool.join()
您可以通过组合键COMMAND+ T或COMMAND+ W(OSX)来实现标签的打开/关闭。在其他操作系统上,您可以使用CONTROL+ T/ CONTROL+ W。
COMMAND
T
W
CONTROL
在硒中,您可以模仿这种行为。您将需要创建一个WebDriver和所需的选项卡。
这是代码。
from selenium import webdriver from selenium.webdriver.common.keys import Keys driver = webdriver.Firefox() driver.get("http://www.google.com/") #open tab driver.find_element_by_tag_name('body').send_keys(Keys.COMMAND + 't') # You can use (Keys.CONTROL + 't') on other OSs # Load a page driver.get('http://stackoverflow.com/') # Make the tests... # close the tab # (Keys.CONTROL + 'w') on other OSs. driver.find_element_by_tag_name('body').send_keys(Keys.COMMAND + 'w') driver.close()