Python: extracting a value from the Response returned by a POST

<table id="response" border="0" cellpadding="0" cellspacing="0">
<tr><td class="shorturl">http://test/shorturl</td><td class="longurl"><a href="http://test.long.url/" target="_blank">http://test.long.url/</a></td></tr>	</table>

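I read the two posts below and spent a long time on this, but I still can't get it to work. Asking everyone for help.

Thanks in advance, everyone~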

Python

response

post

extract

21 replies • 2016-04-10 22:22:19 +08:00

eoo

Apr 10, 2016 via Android

What's the address you're POSTing to?

haomni

Apr 10, 2016

@eoo Thanks for the reply. Extracting the URL from the result a POST returns shouldn't need the original POST address, should it...

virusdefender

Apr 10, 2016

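# coding=utf-8
import re

html = """
<table id="response" border="0" cellpadding="0" cellspacing="0">
<tr><td class="shorturl">http://test/shorturl</td><td class="longurl"><a href="http://test.long.url/"
"""

# Non-greedy match of everything inside the shorturl cell.
# The raw string keeps \s and \S as regex escapes; print() works on Python 2 and 3.
print(re.compile(r'<td class="shorturl">([\s\S]*?)</td>').findall(html)[0])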

haomni

Apr 10, 2016

@virusdefender Thanks, but this one finds the shorturl.
After I changed it to longurl, the result is:
<a href="http://test.long.url/" target="_blank">http://test.long.url/</a>

Still not what I'm after...

uyhyygyug1234

Apr 10, 2016

uyhyygyug1234

Apr 10, 2016

That works, but it's really too ugly. You should use bs4, pyquery, or something along those lines.

haomni

Apr 10, 2016

@uyhyygyug1234 The result doesn't seem quite right. It may also be because I didn't paste the full Response.

>>> print re.compile('href="(.*)"').findall(req.content)[0]
/screen.css

class="longurl" is unique in the whole Response. What I need now is the link target that follows it.
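
A minimal sketch of why the attempt above returned /screen.css and how a single regex can be anchored to the longurl cell. The `<link>` line in the sample is a made-up stand-in for the rest of the page, since the full Response was not posted. With requests on Python 3, `req.text` (a str) would be the input rather than `req.content` (bytes).

```python
import re

# Sample body. The <link> line is an assumption standing in for the unposted part of the page.
html = '''<link rel="stylesheet" href="/screen.css">
<table id="response" border="0" cellpadding="0" cellspacing="0">
<tr><td class="shorturl">http://test/shorturl</td><td class="longurl"><a href="http://test.long.url/" target="_blank">http://test.long.url/</a></td></tr> </table>'''

# Unanchored: findall scans the whole page, so index 0 is whichever href comes first.
first_href = re.findall(r'href="(.*?)"', html)[0]

# Anchored: only accept an href on the <a> that directly follows the longurl cell's opening tag.
pattern = re.compile(r'<td class="longurl">\s*<a[^>]*?href="([^"]*)"')
match = pattern.search(html)
long_url = match.group(1) if match else None

print(first_href)
print(long_url)
```

This only holds as long as the markup keeps the exact `<td class="longurl">` form; any change in attribute order or quoting would break the pattern, which is the usual argument for a real HTML parser.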

sh4n3

Apr 10, 2016

Just use a CSS selector like .longurl a and you're done.
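
A rough sketch of that selector idea. The reply does not say which library it has in mind, so this assumes BeautifulSoup's `select_one` with the parser that ships with Python.

```python
from bs4 import BeautifulSoup

html = '''<table id="response" border="0" cellpadding="0" cellspacing="0">
<tr><td class="shorturl">http://test/shorturl</td><td class="longurl"><a href="http://test.long.url/" target="_blank">http://test.long.url/</a></td></tr> </table>'''

# html.parser is in the standard library, so no extra parser package is needed.
soup = BeautifulSoup(html, "html.parser")

# ".longurl a" selects an <a> that is a descendant of any element with class "longurl".
link = soup.select_one(".longurl a")
long_url = link["href"] if link is not None else None

print(long_url)
```

The same selector string should also work in PyQuery, along the lines of `doc('.longurl a').attr('href')`.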

ericls

Apr 10, 2016

Just do it with pyquery.

eoo

Apr 10, 2016

@haomni It's easy with PHP

<?php

$str='<table id="response" border="0" cellpadding="0" cellspacing="0">
<tr><td class="shorturl">http://test/shorturl</td><td class="longurl"><a href="http://test.long.url/" target="_blank">http://test.long.url/</a></td></tr> </table>';

$zz='#<td class="longurl"><a href="(.*?)" target="_blank">.*?</a></td>#';

preg_match($zz, $str, $matchs);

print_r($matchs);

haomni

Apr 10, 2016

@uyhyygyug1234
@ericls
I tried PyQuery; maybe I'm not using it correctly
print doc1('class:contains("longurl")')

@eoo I'm not going to switch to PHP at this point; everything else is already written

eoo

Apr 10, 2016

@haomni OK. What are you writing?

seki

Apr 10, 2016

Why not use beautifulsoup or lxml?

seki

Apr 10, 2016

Well, for example, what does your bs4 code that failed to extract it look like?

haomni

Apr 10, 2016

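Thanks, everyone. I don't really know how to use PyQuery.
Building on @uyhyygyug1234's approach, I applied a regex one more time and that did it.

@seki Sigh, I'd like to use it, but I don't know how...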
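
The actual two-step code was not posted. One plausible reading, sketched here as an assumption, is to cut out the longurl cell first and then run the href regex on that fragment only.

```python
import re

html = '''<table id="response" border="0" cellpadding="0" cellspacing="0">
<tr><td class="shorturl">http://test/shorturl</td><td class="longurl"><a href="http://test.long.url/" target="_blank">http://test.long.url/</a></td></tr> </table>'''

# Pass 1: capture everything inside the longurl cell.
cells = re.findall(r'<td class="longurl">([\s\S]*?)</td>', html)

# Pass 2: look for an href inside that fragment only, not in the whole page.
hrefs = re.findall(r'href="([^"]*)"', cells[0]) if cells else []
long_url = hrefs[0] if hrefs else None

print(long_url)
```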

longchisihai

Apr 10, 2016

from bs4 import BeautifulSoup

html = '''<table id="response" border="0" cellpadding="0" cellspacing="0">
<tr><td class="shorturl">http://test/shorturl</td><td class="longurl"><a href="http://test.long.url/" target="_blank">http://test.long.url/</a></td></tr> </table>'''

soup = BeautifulSoup(html, 'lxml')

longurl_tag = soup.find('td', class_='longurl')

print(longurl_tag.contents[0].get('href'))
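
If installing bs4 or lxml is not an option, roughly the same extraction can be sketched with the standard library's `html.parser`. `LongUrlParser` is a hypothetical name made up for this illustration. Unlike `contents[0]`, it does not depend on the `<a>` being the very first child of the cell.

```python
from html.parser import HTMLParser


class LongUrlParser(HTMLParser):
    """Remember the href of the first <a> seen inside <td class="longurl">."""

    def __init__(self):
        super().__init__()
        self.in_longurl_cell = False
        self.href = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # A class attribute may hold several names, so split before comparing.
        if tag == "td" and "longurl" in (attrs.get("class") or "").split():
            self.in_longurl_cell = True
        elif tag == "a" and self.in_longurl_cell and self.href is None:
            self.href = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_longurl_cell = False


html = '''<table id="response" border="0" cellpadding="0" cellspacing="0">
<tr><td class="shorturl">http://test/shorturl</td><td class="longurl"><a href="http://test.long.url/" target="_blank">http://test.long.url/</a></td></tr> </table>'''

parser = LongUrlParser()
parser.feed(html)
print(parser.href)
```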

haomni

Apr 10, 2016

@longchisihai Absolutely perfect, thank you!

haomni

Apr 10, 2016

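The rough shape is there now.
I was at it all night. I'll sleep for a bit and test it when I wake up; posting a screenshot first to calm the nerves.
Thanks again to all the experts for the help. I'll upload the project to GitHub as open source a bit later.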

hzlzh

Apr 10, 2016

Nicely done~

haomni

Apr 10, 2016

@hzlzh It's all built on earlier work by others, and some details still aren't polished.
I'll get in touch with you once it's done~

davidx

Apr 10, 2016

At a time like this, what you need is https://daimaduan.com