How to get the original start_url in Scrapy - V2EX

This topic was created 3277 days ago; the information mentioned may have changed since then.

When crawling with Scrapy, redirects or other causes can change the original start_url. How can I get the original start_url back in the callback?

def start_requests(self):
    start_url = 'your_scrapy_start_url'
    yield Request(start_url, self.parse)
    
def parse(self, response):
    item = YourItem()
    item['start_url'] = ...  # the original request's start_url should go here
    yield item

2 replies

1

revotu

Jun 28, 2017

Summary of common Scrapy crawler problems: http://www.revotu.com/scrapy-reptile-faq.html

Use the meta parameter of Request to pass the information along:

def start_requests(self):
    start_url = 'your_scrapy_start_url'
    yield Request(start_url, self.parse, meta={'start_url': start_url})

def parse(self, response):
    item = YourItem()
    item['start_url'] = response.meta['start_url']
    yield item

2

knightdf

Jun 29, 2017

response.request.url
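
For the redirect case specifically, Scrapy's RedirectMiddleware records the URLs a request passed through under the `redirect_urls` key of the request meta, so the first entry there should be the URL that was originally requested. A minimal sketch, assuming the redirect was handled by that middleware; `original_url` is a hypothetical helper name, not part of Scrapy:

```python
def original_url(response_url, meta):
    """Return the first URL in the redirect chain, or response_url if none.

    `meta` is expected to be a dict like response.meta; the
    'redirect_urls' key is only present when a redirect happened.
    """
    redirect_urls = meta.get('redirect_urls')
    if redirect_urls:
        # The list is ordered, so index 0 is the URL requested first.
        return redirect_urls[0]
    # No redirect was recorded, so the response URL is the requested one.
    return response_url
```

Inside parse this could be used as item['start_url'] = original_url(response.url, response.meta). Unlike the meta={'start_url': ...} approach above, it needs no change to start_requests, but it only covers changes caused by redirects.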
