小编典典

Web scraping with Selenium

selenium

I am trying to use Selenium to scrape the list of company names, code, industry, sector, mkt cap, etc. from the table on this website. I am new to this and have written the following code:

import time

from selenium import webdriver

path_to_chromedriver = r'C:\Documents\chromedriver'
browser = webdriver.Chrome(executable_path=path_to_chromedriver)

url = r'http://sgx.com/wps/portal/sgxweb/home/company_disclosure/stockfacts'
browser.get(url)

# crude fixed wait for the page to finish loading
time.sleep(15)
output = browser.page_source
print(output)

However, I only get the following tags, not the data inside them.

            <div class="table-wrapper results-display">
                <table>
                    <thead>
                        <tr></tr>
                    </thead>
                    <tbody></tbody>
                </table>
            </div>
            <div class="pager results-display"></div>

I also tried scraping with BS4 earlier, but that failed as well. Any help is much appreciated.


2020-06-26

1 answer

小编典典

The results are inside an iframe. Switch to it first, then read .page_source:

from selenium.webdriver.common.by import By

iframe = driver.find_element(By.CSS_SELECTOR, "#mainContent iframe")
driver.switch_to.frame(iframe)
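
The reason the switch matters is that `page_source` returns the markup of whichever document the driver is currently focused on, so the outer page shows an empty table until the driver is moved into the iframe. Below is a minimal sketch of a helper that reads a frame's source and then returns to the top-level page. The name `frame_page_source` is hypothetical (it is not part of Selenium), and it assumes `driver` is an already-created WebDriver.

```python
def frame_page_source(driver, frame):
    """Return the page source of `frame`, then switch back to the top-level page.

    `frame` may be anything `driver.switch_to.frame` accepts, for example
    the WebElement of the iframe.
    """
    driver.switch_to.frame(frame)
    try:
        # page_source now reflects the iframe's document, not the outer page
        return driver.page_source
    finally:
        # always restore focus so later lookups hit the outer page again
        driver.switch_to.default_content()
```

Restoring focus in `finally` is a design choice: if reading the source raises, subsequent element lookups still run against the outer page rather than a stale frame.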

I would also add an explicit wait for the table to load:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# `driver` is the WebDriver instance (named `browser` in the question)
wait = WebDriverWait(driver, 10)

# locate and switch to the iframe
iframe = driver.find_element(By.CSS_SELECTOR, "#mainContent iframe")
driver.switch_to.frame(iframe)

# wait for the table to load
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, '.companyName')))

print(driver.page_source)
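
Once the iframe's source is available, the table cells still have to be pulled out of the HTML. The sketch below does that with the standard library's `html.parser`, so it needs no extra packages. `TableRowParser`, `table_rows`, and the sample markup are all hypothetical and for illustration only; the real table's columns and class names may differ.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:   # skip empty rows such as <tr></tr>
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def table_rows(html):
    """Return all non-empty table rows found in `html` as lists of strings."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical markup for illustration only; not the site's real output.
sample = """
<table>
  <thead><tr><th>Company Name</th><th>Code</th></tr></thead>
  <tbody><tr><td class="companyName">Example Co</td><td>X01</td></tr></tbody>
</table>
"""
print(table_rows(sample))
```

In practice the argument would be `driver.page_source` taken after the wait above has succeeded, rather than the sample string.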
2020-06-26