Why can't curl fetch any content?

    bigdude · Jul 16, 2012 · 4337 views
    Could everyone try this and see whether it returns any content: curl "http://brand.tmall.com/azIndexInside.htm?firstLetter=A&prt=1342414752421&prc=5"

    From a first look, it does not seem to be related to the Referer or User-Agent headers.
    7 replies
    #1 · yujnln · Jul 16, 2012
    It works for me.
    >>> print len(content)
    87031
    #2 · bigdude (OP) · Jul 16, 2012
    @yujnln Are you using Python? With urllib2 I keep getting this error:
    urllib2.HTTPError: HTTP Error 302: The HTTP server returned a redirect error that would lead to an infinite loop.
    The last 30x error message was:
    Moved Temporarily
    #3 · yujnln · Jul 16, 2012
    #4 · bigdude (OP) · Jul 16, 2012
    This is driving me crazy...
    >>> a=urllib.urlopen('http://brand.tmall.com/azIndexInside.htm?firstLetter=A&prt=1342414752421&prc=5')
    >>> len(a.read())
    0
    #5 · bigdude (OP) · Jul 16, 2012
    @yujnln It works now. The request has to carry a cookie; without one the server will not let you fetch the page.
    #6 · est · Jul 16, 2012
    #7 · bigdude (OP) · Jul 16, 2012
    @est Got it. Forcing curl to follow the redirect with -L is enough. I don't understand why Taobao uses so many redirects.
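
    The symptoms in this thread (a 302 redirect loop without cookies, real content once cookies are kept while following redirects) can be reproduced locally with a small sketch. It uses Python 3's urllib.request rather than the Python 2 urllib2 seen above. CookieGateHandler, fetch, and run_demo are hypothetical names, and the local server is only an assumed stand-in for what brand.tmall.com appeared to do in 2012, not its actual logic.

```python
import http.server
import threading
import urllib.error
import urllib.request
from http.cookiejar import CookieJar


class CookieGateHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical server: redirects to the same URL until a cookie is sent."""

    def do_GET(self):
        if "seen=1" in self.headers.get("Cookie", ""):
            body = b"real content"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            # No cookie yet: set one and redirect back to the same path.
            self.send_response(302)
            self.send_header("Set-Cookie", "seen=1; Path=/")
            self.send_header("Location", self.path)
            self.send_header("Content-Length", "0")
            self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch(url, use_cookies):
    """Fetch url, optionally keeping cookies across redirects."""
    # An empty ProxyHandler bypasses any system proxy for this local demo.
    handlers = [urllib.request.ProxyHandler({})]
    if use_cookies:
        handlers.append(urllib.request.HTTPCookieProcessor(CookieJar()))
    opener = urllib.request.build_opener(*handlers)
    try:
        with opener.open(url, timeout=5) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        # urllib gives up on a redirect loop and raises HTTPError.
        return "HTTP Error %d" % err.code


def run_demo():
    """Start the local server and fetch once without and once with cookies."""
    server = http.server.HTTPServer(("127.0.0.1", 0), CookieGateHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:%d/page" % server.server_address[1]
    try:
        return fetch(url, use_cookies=False), fetch(url, use_cookies=True)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(run_demo())
```

    Against a real site, the same opener built with HTTPCookieProcessor would be pointed at the target URL instead of the local server. Whether that alone is enough for a given site is an assumption to verify case by case.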