Published 2019-08-20 10:23
Start by creating the project and generating a spider:
scrapy startproject douban
cd douban
scrapy genspider douban_movie -t basic douban.com
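These commands generate the project files.
To scrape data for the top 250 movies, open https://movie.douban.com/top250, which shows the first 25 entries of the ranking. Clicking through to the next page changes the URL to https://movie.douban.com/top250?start=25&filter=. Dropping &filter= leaves https://movie.douban.com/top250?start=25, which still loads and returns the same data. Since each page shows 25 movies, every page can be reached by setting start to a multiple of 25.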
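The paging rule can be checked with a few lines of plain Python before touching the spider. This is a minimal sketch; the names BASE_URL, PAGE_SIZE, TOTAL, and page_urls are chosen here for illustration and are not part of the project.

```python
BASE_URL = "https://movie.douban.com/top250"
PAGE_SIZE = 25   # movies shown per page
TOTAL = 250      # size of the ranking


def page_urls(total=TOTAL, page_size=PAGE_SIZE):
    """Return one URL per page, with start stepping through multiples of page_size."""
    return [f"{BASE_URL}?start={start}" for start in range(0, total, page_size)]


urls = page_urls()
print(len(urls))  # 10
for url in urls:
    print(url)
```

Ten pages cover the whole ranking: the first uses start=0 and the last uses start=225.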
Based on these observations, modify the relevant files as follows.
douban_movie.py:
# -*- coding: utf-8 -*-
import scrapy
from scrapy.http import Request
from douban.items import DoubanItem


class DoubanMovieSpider(scrapy.Spider):
    name = 'douban_movie'
    allowed_domains = []

    def start_requests(self):
        # 250 movies at 25 per page: start = 0, 25, ..., 225
        for i in range(0, 10):
            num = i * 25
            print(num)
            url = "https://movie.douban.com/top250?start=%s" % num
            yield Request(url, callback=self.parse)

    def parse(self, response):
        # One item per page; each field holds the list of values found on that page
        doubans = DoubanItem()
        names = response.xpath("//div[@class='pic']/a/img/@alt").extract()
        scores = response.xpath("//div[@class='star']/span[@class='rating_num']/text()").extract()
        quotes = response.xpath("//p[@class='quote']/span[@class='inq']/text()").extract()

        doubans['movie_name'] = names
        doubans['movie_score'] = scores
        doubans['movie_quote'] = quotes
        yield doubans

The spider scrapes the title, rating, and one-line quote of each movie in the Top 250 ranking, and the results are saved to douban.txt.
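One thing to watch in the spider: the three XPath queries each run over the whole page, so if any entry lacks a quote, the quotes list comes out shorter than the names list and index i no longer refers to the same movie in every list. A common remedy is to select each movie's container first and then query inside it. The sketch below shows that pattern using only the standard library's xml.etree.ElementTree on a hand-written fragment. The markup, the div with class "item" as the per-movie container, and the extract_movies helper are assumptions for illustration, not Douban's actual page.

```python
import xml.etree.ElementTree as ET

# Hand-written, well-formed stand-in for one results page (hypothetical markup).
SAMPLE = """
<ol>
  <li><div class="item">
    <div class="pic"><a><img alt="Movie A" /></a></div>
    <div class="star"><span class="rating_num">9.7</span></div>
    <p class="quote"><span class="inq">First quote.</span></p>
  </div></li>
  <li><div class="item">
    <div class="pic"><a><img alt="Movie B" /></a></div>
    <div class="star"><span class="rating_num">9.1</span></div>
  </div></li>
  <li><div class="item">
    <div class="pic"><a><img alt="Movie C" /></a></div>
    <div class="star"><span class="rating_num">8.9</span></div>
    <p class="quote"><span class="inq">Third quote.</span></p>
  </div></li>
</ol>
"""


def extract_movies(markup):
    """Return one dict per movie, querying relative to each movie's container."""
    root = ET.fromstring(markup.strip())
    movies = []
    for block in root.findall(".//div[@class='item']"):
        img = block.find(".//div[@class='pic']/a/img")
        score = block.find(".//span[@class='rating_num']")
        quote = block.find(".//span[@class='inq']")
        movies.append({
            "movie_name": img.get("alt"),
            "movie_score": score.text,
            # A missing quote becomes an empty string instead of shifting later rows
            "movie_quote": quote.text if quote is not None else "",
        })
    return movies


for movie in extract_movies(SAMPLE):
    print(movie)
```

In a Scrapy spider the same loop would iterate over the selectors returned by response.xpath(...) for the container, call .xpath() with a relative path (starting with ./ or .//) on each one, and yield one item per movie rather than one item per page.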
items.py
import scrapy


class DoubanItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    movie_name = scrapy.Field()
    movie_score = scrapy.Field()
    movie_quote = scrapy.Field()
pipelines.py
class DoubanPipeline(object):
    def process_item(self, item, spider):
        # Each field is a list for one page; this assumes all three lists have the same length
        with open('douban.txt', 'a', encoding='utf-8') as w:
            for i in range(0, len(item['movie_name'])):
                w.write(item['movie_name'][i] + ' rating: ' + item['movie_score'][i] + ' quote: ' + item['movie_quote'][i] + '\n')
        return item
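The pipeline above reopens douban.txt for every item and indexes three lists in parallel. As a sketch of an alternative, the class below uses Scrapy's open_spider and close_spider pipeline hooks to keep one file handle for the whole crawl, and itertools.zip_longest so that a shorter list yields an empty string instead of an IndexError. The class name DoubanFilePipeline and its path parameter are hypothetical; the item shape (one list per field) is the one used in this post.

```python
from itertools import zip_longest


class DoubanFilePipeline:
    """Hypothetical variant of DoubanPipeline that keeps one file handle open."""

    def __init__(self, path="douban.txt"):
        self.path = path
        self.file = None

    def open_spider(self, spider):
        # Called by Scrapy once when the spider starts
        self.file = open(self.path, "a", encoding="utf-8")

    def close_spider(self, spider):
        # Called by Scrapy once when the spider finishes
        if self.file is not None:
            self.file.close()
            self.file = None

    def process_item(self, item, spider):
        rows = zip_longest(
            item["movie_name"],
            item["movie_score"],
            item["movie_quote"],
            fillvalue="",  # pad shorter lists so no lookup can fail
        )
        for name, score, quote in rows:
            self.file.write(f"{name} rating: {score} quote: {quote}\n")
        return item
```

To try it, the ITEM_PIPELINES entry in settings.py would point at this class instead. Note that padding only prevents the crash; it cannot tell which movie a missing value belonged to.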
Finally, in settings.py, turn off robots.txt checking and enable the pipeline:
ROBOTSTXT_OBEY = False
ITEM_PIPELINES = {
    'douban.pipelines.DoubanPipeline': 300,
}
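If the crawl comes back with error responses or empty results, the site may be rejecting Scrapy's default user agent or throttling rapid requests. The fragment below is a sketch of two standard Scrapy settings that are often adjusted in that situation. Neither appears in the original configuration, and both values are placeholders to adapt.

```python
# Optional additions to settings.py (assumptions, not part of the original setup).

# Seconds to wait between consecutive requests to the same site.
DOWNLOAD_DELAY = 1

# Browser-like user agent string; a placeholder value chosen for illustration.
USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)
```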
Start the crawl with scrapy crawl douban_movie; the results are appended to douban.txt.
Author: 胡龙茶
Link: https://www.pythonheidong.com/blog/article/48968/65c4f9df2c895e30ba03/
Source: python黑洞网