Crawling an entire site with Scrapy CrawlSpider

User contribution · 2022-09-02


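The example below defines a CrawlSpider that starts from a CSDN blog, uses a LinkExtractor-based Rule to follow every link matching the article-detail URL pattern, and yields an item carrying each article's title.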
# -*- coding: utf-8 -*-
import scrapy
from scrapy.spiders import CrawlSpider, Rule
# from scrapy.linkextractors.sgml import SgmlLinkExtractor  # removed in newer Scrapy versions
from scrapy.linkextractors import LinkExtractor

from CrawlSpiderTest.items import CrawlspidertestItem


class CsdnarticleSpider(CrawlSpider):
    name = 'csdnArticle'
    allowed_domains = ['blog.csdn.net']
    # The start URL was truncated in the original post; this value is an
    # assumption inferred from the allow pattern below.
    start_urls = ['https://blog.csdn.net/u012150179']

    # Extract every link whose URL matches the article-detail pattern.
    pagelink = LinkExtractor(allow=('/u012150179/article/details',))

    # Follow matched links, parse each matched page with parse_item, and keep
    # following links found on those pages (follow=True).
    rules = [
        Rule(pagelink, callback='parse_item', follow=True),
    ]

    def parse_item(self, response):
        item = CrawlspidertestItem()
        # CSDN renders the article title in an element with class "title-article".
        item['title'] = response.css('.title-article::text').extract_first()
        yield item

    # Do not override parse() in a CrawlSpider: CrawlSpider uses it internally
    # to apply the rules.
    # def parse(self, response):
    #     pass
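The spider imports CrawlspidertestItem from CrawlSpiderTest.items, but the original post does not show that file. A minimal sketch, assuming the spider only fills the title field used above:

import scrapy


class CrawlspidertestItem(scrapy.Item):
    # Title of the crawled CSDN article.
    title = scrapy.Field()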
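With both files in place, the crawl can be started from the project root; the -o flag exports the collected items as a feed (the output file name here is just an example):

scrapy crawl csdnArticle -o titles.json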

