My brain hurts!

27 April 2009

Python UnicodeDecodeError: 'ascii' codec can't decode byte

You use Python 2.x and keep getting the notorious Python UnicodeDecodeError, 'ascii' codec can't decode byte?

Here is a hack which is especially useful in cases when you need to work with HTML.

Instead of

    print nasty_unicode_string

use this:

    print nasty_unicode_string.encode('us-ascii','xmlcharrefreplace')

This replaces every non-ASCII character with its numeric HTML character reference, e.g. Sueño becomes Sue&#241;o, which renders correctly in HTML.

Bingo! No more UnicodeDecodeError!
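The same xmlcharrefreplace error handler still exists in Python 3, where the encode step is explicit and returns bytes. A minimal sketch (Python 3 syntax; the variable names are illustrative):

```python
# The xmlcharrefreplace error handler substitutes a numeric HTML
# character reference for any character the target codec cannot encode.
s = u"Sue\u00f1o"  # "Sueño"
escaped = s.encode("ascii", "xmlcharrefreplace")
print(escaped.decode("ascii"))  # Sue&#241;o
```

Plain 'ascii' is used here; 'us-ascii' is an alias for the same codec.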

Warning: In most cases a UnicodeDecodeError signals that the programmer does not understand how Python 2 handles Unicode — typically a byte string is being implicitly decoded with the ascii codec somewhere — and the entire approach to string handling should be revised. The hack above is not the proper way to fix it, and should be used with care. On the other hand, it is entirely justified when you need to print something to the console and are not sure whether the terminal supports UTF-8 or another Unicode-capable encoding.
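A sketch of what "revising the approach" usually means (shown in Python 3 syntax, with hypothetical UTF-8 input bytes): decode bytes to text at the program's boundaries, work with text internally, and encode only on output, so no implicit ascii decoding ever happens.

```python
# Bytes in, text inside, bytes out.
raw = b"Sue\xc3\xb1o"         # UTF-8 bytes from a file, socket, etc.
text = raw.decode("utf-8")    # explicit decode at the input boundary
# ... all internal processing works on text (unicode) ...
out = text.encode("utf-8")    # explicit encode at the output boundary
```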


  1. The last time I got this, what I actually wanted was conversion to hex string:

    # (renamed "str" to "s" to avoid shadowing the built-in; non-ASCII means > 127)
    if ord(max(s)) > 127:
        new_chars = [hex(ord(ch)) for ch in s]
        s = " ".join(new_chars)

  2. I do print repr(some_unicode_string)