Running multiple spiders in parallel on one website?

小编典典

scrapy

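I want to scrape a website that has two sections, and my script is not as fast as I need it to be.

Is it possible to launch two spiders, one to scrape the first section and the other to scrape the second?

I tried writing two different spider classes and running them with

scrapy crawl firstSpider
scrapy crawl secondSpider

but I don't think that is a smart approach.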

2020-04-09

1 answer

小编典典

I think what you are looking for is something like this:

import scrapy
from scrapy.crawler import CrawlerProcess

class MySpider1(scrapy.Spider):
    # Your first spider definition
    ...

class MySpider2(scrapy.Spider):
    # Your second spider definition
    ...

process = CrawlerProcess()
process.crawl(MySpider1)
process.crawl(MySpider2)
process.start() # the script will block here until all crawling jobs are finished

2020-04-09