
Scrapy: how to get the original start_url

  •   donglongtu · Jun 28, 2017 · 2806 views

    When crawling with Scrapy, redirects or other causes can change the original start_url. How can I get the original start_url back?

    def start_requests(self):
        start_url = 'your_scrapy_start_url'
        yield Request(start_url, self.parse)
        
    def parse(self, response):
        item = YourItem()
        item['start_url'] = ...  # wanted: the start_url of the original request
        yield item
    
    revotu · Jun 28, 2017 · #1
    Summary of common Scrapy crawler problems: http://www.revotu.com/scrapy-reptile-faq.html

    Use the meta argument of Request to pass the information along:

    def start_requests(self):
        start_url = 'your_scrapy_start_url'
        # Attach the original URL to the request so it travels with it.
        yield Request(start_url, self.parse, meta={'start_url': start_url})

    def parse(self, response):
        item = YourItem()
        # Read back the value stored in start_requests.
        item['start_url'] = response.meta['start_url']
        yield item
    knightdf · Jun 29, 2017 · #2
    response.request.url
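
One caveat worth checking on your own Scrapy version: after a redirect handled by RedirectMiddleware, `response.request` may refer to the redirected request rather than the first one, in which case `response.request.url` is the final URL. A minimal sketch that covers both cases reads the `redirect_urls` meta key, which RedirectMiddleware fills with the URLs the request passed through. The helper name `original_start_url` is hypothetical.

```python
def original_start_url(response):
    """Return the first URL of the redirect chain, or the request URL."""
    # RedirectMiddleware records each URL it redirected from, oldest first,
    # under the 'redirect_urls' meta key; the key is absent when no
    # redirect happened.
    redirect_urls = response.meta.get("redirect_urls")
    if redirect_urls:
        return redirect_urls[0]
    return response.request.url
```

Inside `parse`, this could be used as `item['start_url'] = original_start_url(response)`. It only accounts for redirects followed by RedirectMiddleware; URLs changed by other means still need the explicit `meta` approach.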