Garbled bytes when transcoding a page: how do I strip them out? - V2EX

This topic was created 3614 days ago; the information mentioned may have changed or developed since.

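I ran into this problem while transcoding web pages: among several consecutive pages from the same site, some convert fine and others fail with an error. After careful checking, I found that the pages that fail contain garbled bytes, which show up as boxes on the page.
Take the example below: bytes 0-1 and bytes 4-5 each form one Chinese character, byte 3 is the ASCII letter h, and the problem is byte 2, which raises the error: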
aa = b'\xb8\xad\xa4h\xd0\xc2'
bb = aa.decode('gbk')
print(bb)

UnicodeDecodeError: 'gbk' codec can't decode byte 0xa4 in position 2: illegal multibyte sequence

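Since the system can tell which position is bad, couldn't I just delete the byte at that position when the error occurs?
I don't know whether this idea is workable, or how to get the position the system found into a variable. Any advice is appreciated.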

aa = b'\xb8\xad\xa4h\xd0\xc2'
try:
    bb = aa.decode('gbk')
except UnicodeDecodeError:
    # Stuck here: how do I get the failing position into a variable?
    pass
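
One way to do what the post asks, as a minimal sketch: bind the exception with `as err` and read its `start` and `end` attributes, which hold the byte range the codec rejected. The helper name `decode_dropping_bad_bytes` is hypothetical and not from the thread. It cuts out the offending range and retries until the whole buffer decodes.

```python
def decode_dropping_bad_bytes(data: bytes, encoding: str = "gbk") -> str:
    """Decode data, deleting each byte range the codec reports as invalid."""
    while True:
        try:
            return data.decode(encoding)
        except UnicodeDecodeError as err:
            # err.start / err.end delimit the undecodable bytes (end is exclusive).
            print(f"dropping {data[err.start:err.end]!r} at position {err.start}")
            data = data[:err.start] + data[err.end:]


aa = b'\xb8\xad\xa4h\xd0\xc2'
bb = decode_dropping_bad_bytes(aa)
print(bb)
```

Each retry removes at least one byte, so the loop always terminates. It re-decodes from the beginning every time, which is fine for a single page but slow for input with many bad bytes.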

3 replies • 2016-09-02 12:12:06 +08:00

1

lovedebug

Sep 1, 2016

Shouldn't the correct approach be to decode with the encoding the page declares? - -

2

omg21

OP

Sep 1, 2016

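@lovedebug The encoding declared on the page is gbk, but it contains garbled bytes, and a single bad byte makes the whole page impossible to convert.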

3

DarkFenrir

Sep 2, 2016

Would this work?

aa.decode('gbk', 'ignore')
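
To illustrate the suggestion above, here is a short sketch comparing two of the standard error handlers accepted by `bytes.decode`: `'ignore'` silently drops undecodable bytes, while `'replace'` substitutes U+FFFD so the damage stays visible. The variable names `ignored` and `replaced` are only for illustration.

```python
aa = b'\xb8\xad\xa4h\xd0\xc2'

# 'ignore' skips the byte the codec cannot decode and keeps going.
ignored = aa.decode('gbk', errors='ignore')

# 'replace' inserts U+FFFD (the replacement character) where decoding failed.
replaced = aa.decode('gbk', errors='replace')

print(repr(ignored))
print(repr(replaced))
```

`'replace'` may be the safer choice when you want to find out later which pages were damaged, since `'ignore'` leaves no trace of the dropped byte.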
