Ian Bicking: the old part of his blog

Re: Why Python Unicode Sucks

As I mentioned in a comment to the previous post: I call sys.setdefaultencoding with utf8 and it pretty much seems to work. That's no use, though, in shared hosting environments or for code you're distributing for others to use.

I don't think the Python situation is so terrible - I've been tempted to look at Rails and/or Borges and/or Seaside lately; one of the things holding me back (apart from the sheer weirdness of the Squeak environment) is that I get the impression Unicode handling is even flakier in Ruby & Squeak. That's purely from a few minutes' googling in both cases and I may be wrong.

Comment on Why Python Unicode Sucks
by Alan Little


A few more minutes strolling through comp.lang.ruby strengthens my "it's all about the language's inventor's native language" theory - Ruby's BDFL is of course Japanese, and it seems one reason Ruby is behind Python on Unicode support (yes, really - or at least lots of Ruby people appear to think so) is the widely-held Japanese belief that Unicode is a tiny and inadequate character set. And for all I know they may be right, although it's good enough for all the languages I think I'm ever likely to be interested in, so (just like ASCII-only American programmers) I don't care.

# Alan Little

I get the impression that some Japanese people are annoyed that their text is especially space-inefficient in UTF-8. But I don't quite get it -- disk is cheap, RAM doesn't fill up due to native-language text, and gzip encoding should mitigate the bandwidth issues.

# Ian Bicking
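
To put rough numbers on the space complaint above, here is a small Python 3 sketch. The sample sentence and the variable names are assumptions made up for illustration, not anything from the thread, and repeating one sentence flatters gzip compared with real prose, so treat the compressed sizes as indicative only.

```python
import gzip

# Hypothetical sample: one Japanese sentence, repeated so gzip has something to chew on.
sentence = "日本語のテキストは場所を取ります。"
text = sentence * 100

# Every character here takes 3 bytes in UTF-8 and 2 bytes in Shift_JIS.
utf8_bytes = text.encode("utf-8")
sjis_bytes = text.encode("shift_jis")

sizes = {
    "utf-8": len(utf8_bytes),
    "shift_jis": len(sjis_bytes),
    "utf-8 + gzip": len(gzip.compress(utf8_bytes)),
    "shift_jis + gzip": len(gzip.compress(sjis_bytes)),
}

for name, size in sizes.items():
    print(f"{name:>18}: {size} bytes")
```

Uncompressed, the UTF-8 form comes out half again as large as the Shift_JIS one for text like this; how much of that gap survives compression depends on the real content.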

You can get a better sense of the issue starting here: http://en.wikipedia.org/wiki/Han_unification

# anonymous

At least in Ruby my impression is that they only have bytestrings, and a module iconv to convert between encodings. And it's just a wrapper around system libraries, which seems like it's opening up a whole can o' worms of cross-platform compatibility. I'll give credit to the people who handled Python Unicode -- they stepped up and took on a whole lot of work that I expect was both boring and tedious, to give us a very complete and reliable foundation.

It seems strange to me that Ruby isn't better, since Ruby is Japanese and you'd think they'd care even more about encodings. There's even a separate module for Japanese encodings -- Python appears to have better Japanese support than Ruby! It has 13 Japanese encodings built in. I'd tear my hair out if I had to deal with 13 encodings for one language. But at least, bald or not, I'd appreciate the support.

# Ian Bicking
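
The count of 13 can be checked against the Japanese codec names in Python's standard encodings table. The sketch below assumes Python 3 on a stock CPython build that ships the CJK codecs; the sample string and the helper name `round_trips` are made up for illustration. It round-trips a short string through each codec.

```python
import codecs

# Japanese codec names as listed in the standard library's encodings table.
JAPANESE_CODECS = [
    "cp932", "euc_jp", "euc_jis_2004", "euc_jisx0213",
    "iso2022_jp", "iso2022_jp_1", "iso2022_jp_2", "iso2022_jp_2004",
    "iso2022_jp_3", "iso2022_jp_ext",
    "shift_jis", "shift_jis_2004", "shift_jisx0213",
]

sample = "日本語"  # hypothetical test string


def round_trips(text, codec_name):
    """Return True if text survives encode + decode with the given codec."""
    codecs.lookup(codec_name)  # raises LookupError if the codec is missing
    return text.encode(codec_name).decode(codec_name) == text


results = {name: round_trips(sample, name) for name in JAPANESE_CODECS}

for name, ok in results.items():
    print(f"{name:>16}: {'ok' if ok else 'mismatch'}")
```

This only shows that the codecs exist and agree on a trivial string; it says nothing about how they differ on the awkward characters that make having 13 of them necessary.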

It looks to me like the Ruby example is a case of the perfect being the enemy of the good. Matz, being a smart Japanese programmer, is much more keenly aware of Unicode's faults and failings than most people, and is therefore reluctant to treat it as the be-all and end-all of string handling, a la Java or C#. If you google a few of his comp.lang.ruby postings on the subject, it appears he has big ideas, maybe even a prototype, for some kind of better-than-Unicode ultimate international text handling. But it Isn't Ready Yet (will it ever be?) and meanwhile Ruby struggles along with some kind of half-working implementation of UTF-8.

# Alan Little