Langjan's recent timeline updates
Langjan

V2EX member #344736, joined on 2018-08-27 19:41:13 +08:00
    Langjan's recent replies
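    Is there no way to pass headers to urllib.request.urlretrieve()?
    Downloading the image (the same as opening the image URL directly) triggers the site's hotlink protection, which then returns the wrong image.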
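`urlretrieve()` has no headers parameter, but it opens URLs through urllib's module-level opener, so one workaround is to install an opener whose `addheaders` carries a Referer. A minimal sketch; `install_headers` is a hypothetical helper name, and the installed opener then applies to every later `urlopen()`/`urlretrieve()` call in the process.

```python
import urllib.request

def install_headers(headers):
    """Install a global opener so urlopen()/urlretrieve() send these headers.

    install_headers is a hypothetical helper; build_opener, addheaders and
    install_opener are standard urllib.request APIs.
    """
    opener = urllib.request.build_opener()
    # Replaces the default [('User-agent', 'Python-urllib/3.x')] list
    opener.addheaders = list(headers.items())
    # urlretrieve() goes through urlopen(), which uses the installed opener
    urllib.request.install_opener(opener)
    return opener

# Usage (needs network access; picurl and path as in the script further down):
# install_headers({
#     'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) ...',
#     'Referer': 'http://www.zhuoku.com/zhuomianbizhi/jing-car/20180425160320(10).htm',
# })
# urllib.request.urlretrieve(picurl, path)
```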

    This approach doesn't work either:
    content = requests.get(picurl, headers=headers).content
    with open('F:\\PyDowns\\zhuoku\\demo.jpg', 'wb') as fp:
        fp.write(content)
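One plausible reason the `requests` version also fails, assuming `headers` is the same dict as in the full script further down: that dict hard-codes `'Host': 'www.zhuoku.com'`, and `requests` sends an explicit Host header as given, so the request to `bizhi.zhuoku.com` would name the wrong virtual host. A sketch of deriving per-image headers instead; `image_headers` is a hypothetical helper, and the Accept value is copied from the browser's image request shown in the post.

```python
def image_headers(base_headers, page_url):
    """Build headers for fetching an image embedded in page_url.

    image_headers is a hypothetical helper, not part of requests.
    """
    # Drop a hard-coded Host: it must name the image's own host, and the
    # HTTP library fills it in from the URL when it is absent.
    headers = {k: v for k, v in base_headers.items() if k.lower() != 'host'}
    # The browser's image request carries the embedding page as Referer
    headers['Referer'] = page_url
    headers['Accept'] = 'image/webp,image/apng,image/*,*/*;q=0.8'
    return headers

# Usage with requests (needs network access):
# content = requests.get(picurl, headers=image_headers(headers, url)).content
```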
    Opening 20180425160320(10), the request headers are:
    GET 20180425160320(10) HTTP/1.1
    Host: com
    Proxy-Connection: keep-alive
    Upgrade-Insecure-Requests: 1
    User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36
    Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
    Accept-Encoding: gzip, deflate
    Accept-Language: en-US,en;q=0.9,zh-CN;q=0.8,zh;q=0.7,zh-TW;q=0.6

    The request headers for the image on that page are:
    GET 2019-Bugatti-Chiron-Sport-10 HTTP/1.1
    Host: com
    Proxy-Connection: keep-alive
    User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36
    Accept: image/webp,image/apng,image/*,*/*;q=0.8
    Referer: 20180425160320(10)
    Accept-Encoding: gzip, deflate
    Accept-Language: en-US,en;q=0.9,zh-CN;q=0.8,zh;q=0.7,zh-TW;q=0.6
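To replay the browser's image request, one option is to paste a header dump like the one above into the script and turn it into a dict. A minimal sketch; `parse_header_dump` is a hypothetical name. It skips the request line and drops `Host` and `Proxy-Connection`, which describe the browser's own (proxied) connection and are better left for the HTTP library to set per request.

```python
def parse_header_dump(raw):
    """Turn 'Name: value' lines copied from browser dev tools into a dict."""
    headers = {}
    for line in raw.strip().splitlines():
        line = line.strip()
        # Skip blank lines, the request line ("GET ... HTTP/1.1"), and
        # anything that is not a "Name: value" pair
        if not line or line.startswith(('GET ', 'POST ')) or ':' not in line:
            continue
        # Split on the first colon only, so values like "http://..." survive
        name, _, value = line.partition(':')
        headers[name.strip()] = value.strip()
    # Connection-level headers: let the HTTP library set these itself
    for hop in ('Host', 'Proxy-Connection'):
        headers.pop(hop, None)
    return headers
```

If the result is used with urllib rather than `requests`, it may be safer to drop `Accept-Encoding` too, since urllib does not decompress gzip responses.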
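    (The forum's reply system stripped the URLs above.)
    Image URL: http://bizhi.zhuoku.com/2018/04/25/Bugatti/2019-Bugatti-Chiron-Sport-10.jpg
    It appears on this page: http://www.zhuoku.com/zhuomianbizhi/jing-car/20180425160320(10).htm
    The .htm page can be opened directly without a Referer, the image displays correctly on it, and right-clicking the image and opening it in a new tab also works.
    (Since it opens fine in the browser, it shouldn't be an IP ban.)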
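These symptoms (the page loads without a Referer, but the image only comes back correct when requested from the page) are consistent with a Referer-based hotlink check on the image host. The site's actual server logic isn't visible, so the following is only a hypothetical illustration of how such a check is commonly written; `referer_allowed` and the allowed-host list are assumptions.

```python
from urllib.parse import urlsplit

def referer_allowed(referer, allowed_hosts):
    """Hypothetical hotlink check: accept only Referers on an allowed host."""
    if not referer:
        # Policy choice: many sites reject an empty Referer, some allow it
        return False
    host = (urlsplit(referer).hostname or '').lower()
    # Match the host itself or any subdomain of it
    return any(host == h or host.endswith('.' + h) for h in allowed_hosts)
```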

    The following code returns the wrong image:
    import urllib.request
    import requests
    import re

    headers = {
        'Host': 'www.zhuoku.com',
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',
        'Referer': 'http://www.zhuoku.com/zhuomianbizhi/jing-car/20180425160320(10).htm',
        'Cookie': 'cck_lasttime=1535598582789; cck_count=0; bdshare_firstime=1535598583191',
    }
    url = 'http://www.zhuoku.com/zhuomianbizhi/jing-car/20180425160320(10).htm'
    # Fetch the wallpaper page with the custom headers above
    req = requests.get(url, headers=headers)
    req.encoding = 'GBK'  # decode the page body as GBK
    html = req.text
    # Full-size image URL, and a file name taken from the thumbnail link
    picurl = re.findall(r'<img id="imageview" src="(.*?)"', html)[0]
    picname = re.findall(r'thumbs/tn_(.*?)"', html)[0]
    path = 'F:\\PyDowns\\zhuoku\\' + picname
    # urlretrieve() sends none of the custom headers above (in particular no Referer)
    urllib.request.urlretrieve(picurl, path)
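A stdlib-only sketch of the whole flow that sends the page URL as the Referer when fetching the image. It reuses the post's two regular expressions; the function names, the 30-second timeout, and the assumption that the image host accepts this Referer are all unverified, and the page markup may have changed.

```python
import re
import urllib.request
from pathlib import Path
from urllib.parse import urljoin

PAGE_URL = 'http://www.zhuoku.com/zhuomianbizhi/jing-car/20180425160320(10).htm'
UA = ('Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 '
      '(KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36')

def find_image(html, page_url):
    """Return (absolute image URL, file name) using the post's two regexes."""
    m_src = re.search(r'<img id="imageview" src="(.*?)"', html)
    m_name = re.search(r'thumbs/tn_(.*?)"', html)
    if not (m_src and m_name):
        raise ValueError('image markup not found; the page layout may differ')
    # urljoin leaves absolute src values unchanged and resolves relative ones
    return urljoin(page_url, m_src.group(1)), m_name.group(1)

def fetch(url, referer=None):
    """GET url with a browser-like User-Agent and, optionally, a Referer."""
    headers = {'User-Agent': UA}
    if referer:
        headers['Referer'] = referer
    req = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read()

def download(page_url, out_dir):
    html = fetch(page_url).decode('gbk', errors='replace')
    pic_url, pic_name = find_image(html, page_url)
    # Assumed hotlink check: present the page as Referer, like a browser
    data = fetch(pic_url, referer=page_url)
    path = Path(out_dir) / pic_name
    path.write_bytes(data)
    return path

# download(PAGE_URL, r'F:\PyDowns\zhuoku')  # needs network access
```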