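I have written a script using Python together with Selenium to parse some of the dates available in a table on a webpage. The table sits under the heading NPL Victoria Betting Odds, and its data is inside the element with id tournamentTable. There you can see three dates: 10 Aug 2018, 11 Aug 2018 and 12 Aug 2018. I would like to parse them and arrange them as shown in my expected output below.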
Link to the webpage
This is my attempt so far:
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from bs4 import BeautifulSoup

    link = "find the link above"

    def get_content(driver,url):
        driver.get(url)
        for items in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR,"#tournamentTable tr"))):
            try:
                idate = items.find_element_by_css_selector("th span[class^='datet']").text
            except Exception:
                idate = ""
            try:
                itime = items.find_element_by_css_selector("td.table-time").text
            except Exception:
                itime = ""
            print(f'{idate}--{itime}')

    if __name__ == '__main__':
        driver = webdriver.Chrome()
        wait = WebDriverWait(driver,10)
        try:
            get_content(driver,link)
        finally:
            driver.quit()
My current output is:

    --
    10 Aug 2018--
    --
    --09:30
    --10:15
    11 Aug 2018--
    --
    --05:00
    --05:00
    --09:00
    12 Aug 2018--
    --
    --06:00
    --06:00
My expected output:

    10 Aug 2018--09:30
    10 Aug 2018--10:15
    11 Aug 2018--05:00
    11 Aug 2018--05:00
    11 Aug 2018--09:00
    12 Aug 2018--06:00
    12 Aug 2018--06:00
Try the following code:
    def get_content(driver,url):
        driver.get(url)
        # Count the date header rows (tr.center.nob-border) once the table is present
        dates = len(wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR,"#tournamentTable tr.center.nob-border"))))
        for d in range(dates):
            # Look up the d-th date row again on each pass
            item = driver.find_elements_by_css_selector("#tournamentTable tr.center.nob-border")[d]
            try:
                idate = item.find_element_by_css_selector("th span[class^='datet']").text
            except Exception:
                idate = ""
            # Select the time cells that follow this date row but come before the next one:
            # a cell is kept only if it has no (d + 2)-th date row preceding it
            for time_td in item.find_elements_by_xpath(".//following::td[contains(@class, 'table-time') and not((preceding::tr[@class='center nob-border'])[%d])]" % (d + 2)):
                try:
                    itime = time_td.text
                except Exception:
                    itime = ""
                print(f'{idate}--{itime}')
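As an alternative to the XPath approach, the same result can probably be reached by walking the rows once and carrying the most recently seen date forward to each time cell. The sketch below is only an illustration: `pair_dates_with_times`, `scrape_rows` and `SAMPLE_ROWS` are hypothetical names, not part of the original code. `scrape_rows` assumes Selenium 4, where `find_elements(By.CSS_SELECTOR, ...)` replaces the `find_element_by_css_selector` helpers, and it reuses the selectors from the question without having been checked against the live page.

```python
def pair_dates_with_times(rows):
    """Yield 'date--time' strings, carrying the last seen date forward.

    `rows` is an iterable of (date_text, time_text) pairs, one per table row;
    either element may be an empty string.
    """
    current_date = ""
    for date_text, time_text in rows:
        if date_text:
            current_date = date_text  # date header row: remember its date
        if time_text:
            # match row: emit its time together with the remembered date
            yield f"{current_date}--{time_text}"


def scrape_rows(driver, wait):
    """Yield (date, time) text pairs for each row of #tournamentTable.

    Hypothetical helper, assuming Selenium 4 and the question's selectors.
    """
    # Imported lazily so pair_dates_with_times works without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC

    rows = wait.until(
        EC.presence_of_all_elements_located((By.CSS_SELECTOR, "#tournamentTable tr"))
    )
    for row in rows:
        # find_elements returns an empty list instead of raising when nothing matches.
        dates = row.find_elements(By.CSS_SELECTOR, "th span[class^='datet']")
        times = row.find_elements(By.CSS_SELECTOR, "td.table-time")
        yield (dates[0].text if dates else "", times[0].text if times else "")


# Row texts reconstructed from the current output shown in the question.
SAMPLE_ROWS = [
    ("", ""), ("10 Aug 2018", ""), ("", ""), ("", "09:30"), ("", "10:15"),
    ("11 Aug 2018", ""), ("", ""), ("", "05:00"), ("", "05:00"), ("", "09:00"),
    ("12 Aug 2018", ""), ("", ""), ("", "06:00"), ("", "06:00"),
]

if __name__ == "__main__":
    for line in pair_dates_with_times(SAMPLE_ROWS):
        print(line)
```

With a live driver, `pair_dates_with_times(scrape_rows(driver, wait))` would be called after `driver.get(url)`. Keeping the pairing logic separate from the scraping means it can be tried on plain tuples, as with `SAMPLE_ROWS`, without opening a browser.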