The Big Browser: Python UnicodeDecodeError: 'ascii' codec can't decode byte

27 April 2009

Are you using Python 2.x and keep getting the notorious UnicodeDecodeError: 'ascii' codec can't decode byte?

Here is a hack that is especially useful when you need to work with HTML.

Instead of

print(nasty_unicode_string)

use this:

print(nasty_unicode_string.encode('us-ascii','xmlcharrefreplace'))

This code replaces every non-ASCII character with its HTML numeric character reference in plain ASCII, e.g. Sueño becomes Sue&#241;o, which displays properly when rendered in HTML.
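As a rough illustration, the snippet below shows what the 'xmlcharrefreplace' error handler produces, next to the other built-in handlers for comparison. The variable names are mine, not from the original post, and the u'' literal is used so that the sketch should run on Python 2 as well as Python 3.3+.

```python
# -*- coding: utf-8 -*-
# Illustrative sketch; all variable names here are hypothetical.
text = u"Sue\xf1o"  # the word "Sueño"; U+00F1 is the n with tilde

# Non-ASCII characters become decimal numeric character references.
escaped = text.encode("us-ascii", "xmlcharrefreplace")  # bytes: Sue&#241;o

# Other built-in error handlers, for comparison:
replaced = text.encode("us-ascii", "replace")              # bytes: Sue?o
ignored = text.encode("us-ascii", "ignore")                # bytes: Sueo
backslashed = text.encode("us-ascii", "backslashreplace")  # bytes: Sue\xf1o
```

Only 'xmlcharrefreplace' keeps the original character recoverable by a browser; 'replace' and 'ignore' lose it.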

Bingo! No more UnicodeDecodeError!

Warning: In most cases the UnicodeDecodeError is caused by the programmer's misunderstanding of how Python handles Unicode, and it is a signal that the entire approach to string handling should be revised. The hack above is not the proper fix and should be used with care. On the other hand, this approach is fully justified when you need to print something to the console and are not sure whether the terminal supports UTF-8 or another Unicode-friendly encoding.
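One common way to revise the approach is to decode bytes once at the input boundary, work with unicode objects internally, and encode once on output. The sketch below assumes the incoming bytes are UTF-8, and console_safe is a hypothetical helper of my own, not a standard library function.

```python
import sys

raw = b"Sue\xc3\xb1o"  # assumed: UTF-8 bytes read from a file or socket

# Decoding non-ASCII bytes with the ascii codec is what raises the error.
error = None
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    error = exc

# Decode once on input, using the actual encoding of the data.
text = raw.decode("utf-8")


def console_safe(value, encoding=None):
    """Hypothetical helper: encode text for a terminal of uncertain encoding."""
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    # Characters the target encoding cannot represent become backslash escapes.
    return value.encode(encoding, "backslashreplace")
```

With this in place, console_safe(text, "ascii") gives an escaped ASCII form, while a UTF-8 terminal receives the original characters unchanged.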

2 comments:

Anonymous, April 29, 2010 at 8:48 PM
The last time I got this, what I actually wanted was conversion to hex string:

if ord(max(s)) > 127:  # at least one non-ASCII character is present
    new_chars = [hex(ord(ch)) for ch in s]
    s = " ".join(new_chars)

Will McGugan, April 29, 2010 at 10:42 PM
I do print repr(some_unicode_string)