Ian Bicking: the old part of his blog

Re: Do I hate Unicode, or Do I Hate ASCII?

There is no such thing as working Unicode support in any language. :-/ The main problem is that you need interfaces to the real world - and the real world doesn't like Unicode, it only accepts utf-8. But several libraries try to be smart. For example, Django uses the email module to parse multipart POST data - that's a logical choice, as HTTP multipart POST data is actually just MIME attachments. Of course, email produces unicode strings if you pass in something that's declared as utf-8. And that causes problems in the Django code. Or take sqlite3, for example: the pysqlite2 binding always returns unicode strings. Except if you store data in the database directly with the sqlite command - then it stores whatever you pass in, in whatever your local encoding is. It's iso-8859-1 in my case - so I can add stuff to the sqlite database that can't be read by the pysqlite2 library. With Django this required me to write a row factory that removes the unicode string and replaces it with a bytestring in utf-8 encoding, just to make the Django code happy.
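A rough sketch of what such a row factory could look like. It is not the original code: it assumes Python 3 and the standard-library sqlite3 module (the descendant of pysqlite2), where text comes back as str, and `utf8_row_factory` and the `entries` table are made-up names for illustration only.

```python
import sqlite3


def utf8_row_factory(cursor, row):
    # Hypothetical helper: re-encode every text value as UTF-8 bytes and
    # leave all other types (ints, floats, None, blobs) untouched.
    return tuple(
        value.encode("utf-8") if isinstance(value, str) else value
        for value in row
    )


conn = sqlite3.connect(":memory:")
conn.row_factory = utf8_row_factory  # applies to cursors created from now on

conn.execute("CREATE TABLE entries (id INTEGER, title TEXT)")
conn.execute("INSERT INTO entries VALUES (?, ?)", (1, "Grüße"))

row = conn.execute("SELECT id, title FROM entries").fetchone()
print(row)  # (1, b'Gr\xc3\xbc\xc3\x9fe')
```

The factory only sees values the binding has already decoded, so it covers the "code wants bytestrings" half of the problem, not data that was stored in a non-utf-8 encoding to begin with.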

Oh, and your blog doesn't accept Umlauts in the comments - it produces a traceback with UnicodeDecodeError :-)

Comment on Do I hate Unicode, or Do I Hate ASCII?
by hugo

Comments:

Oh, and just before anybody mentions it: both the email module and the pysqlite2 library will happily deliver bytestrings in other situations, so you can't just say "to hell with it" and go all unicode. The email module returns non-utf-8 stuff as bytestrings, and the pysqlite2 library hands a registered converter the raw bytestring rather than the unicode string. For example, if you register (lambda s: s) as the converter, you get bytestrings.
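The converter behaviour can be sketched like this, again assuming Python 3 and the standard-library sqlite3 module rather than pysqlite2 itself. `RAWTEXT` is a made-up declared type and the `notes` table is purely illustrative.

```python
import sqlite3

# "RAWTEXT" is a hypothetical declared type; the identity converter just
# hands back whatever the binding passes in, which is the raw bytestring.
sqlite3.register_converter("RAWTEXT", lambda s: s)

# PARSE_DECLTYPES makes the binding pick converters by declared column type.
conn = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
conn.execute("CREATE TABLE notes (text_col TEXT, raw_col RAWTEXT)")
conn.execute("INSERT INTO notes VALUES (?, ?)", ("äöü", "äöü"))

as_text, as_bytes = conn.execute(
    "SELECT text_col, raw_col FROM notes"
).fetchone()
print(type(as_text), type(as_bytes))
```

The same stored value comes back as a unicode string from the plain TEXT column and as utf-8 bytes from the column with the converter, so code downstream has to cope with both.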

# hugo

Oh, and your blog doesn't accept Umlauts in the comments - it produces a traceback with UnicodeDecodeError :-)

Haha, you took my dare and YOU LOST! I actually knew you would, it's one of those Unicode errors that I'm too cowardly to try to fix. In fact, I'm pretty sure I introduced it when I was trying to fix a related encoding problem. There's a very high likelihood of regressions when you try to fix unicode-related errors.

# Ian Bicking