Ian Bicking: the old part of his blog

Re: Why Python Unicode Sucks

Having dynamic types does NOT mean that you can be sloppy about what your functions expect and can handle. This applies to all types, not just string types. If you're not thinking about which types you are working with, you will suffer from those encode/decode errors. There is no fix for this. Once you clearly delineate which objects you are handling, document which ones you accept, and write your code accordingly, the Python unicode system is just fantastic.

One trick that you might want to use: a variant of Hungarian notation to indicate the expected type, e.g.

uname = name.decode('latin-1')

or

name_u = name.decode('latin-1')

Sometimes when there is a chance for confusion, I even mark the encoded strings:

buffer_utf8 = ...

Maybe you could even have another suffix for when the object you're handling is _either_ of the string types. I used to suffer the same plight, until the day I decided to sit down and really understand how Unicode works, and then I made the decision to always think about which kind of string I'm handling where in my source code. Now I never have any trouble anymore. Dynamic typing means that it is easy for you to make mistakes. Make decisions and add assertions in your code to ensure that you're moving the correct types between functions.
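A minimal sketch of what such assertions could look like, combined with the suffix naming. It is written for Python 3, where bytes plays the role of the encoded string and str the role of the Unicode string; the helper names greeting and write_utf8 are made up for illustration.

```python
# Python 3 sketch: bytes is the encoded string type, str is the Unicode type.
# greeting() and write_utf8() are hypothetical helpers, not part of any library.

def greeting(name_u):
    """Accept Unicode text only."""
    assert isinstance(name_u, str), "expected decoded text, got %r" % type(name_u)
    return "Hello, " + name_u


def write_utf8(buffer_utf8, chunks):
    """Accept UTF-8 encoded bytes only and append them to a list of chunks."""
    assert isinstance(buffer_utf8, bytes), "expected encoded bytes, got %r" % type(buffer_utf8)
    chunks.append(buffer_utf8)


name = b"Andr\xe9"                 # encoded bytes, as read from a latin-1 source
name_u = name.decode("latin-1")   # decode once, at the boundary
chunks = []
write_utf8(greeting(name_u).encode("utf-8"), chunks)  # encode once, on the way out
```

Passing the undecoded name to greeting() trips the assertion immediately, at the call that made the mistake, instead of producing an encode/decode error somewhere far away.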

Also, dealing with Unicode strings is not as efficient as dealing with simple encoded strings (e.g. for raw data), so both data types need to remain. This problem is thus not likely to go away.

To me, all this ranting just says that you've been sloppy about which types you are working with. The problem is not Python; the problem is the habit we all fall into at some point of not looking the problem straight in the face and not spending some time understanding all the details (granted, I suppose that's what you're doing now, but with a lot of blog noise...).

If you just make a habit of deciding, everywhere, all the time, which type of string object you're accepting (str, unicode, or str-or-unicode), your problems will go away.
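For the third case, a function that deliberately accepts either type, one possible approach is to normalize at the top and work with a single type from there on. This is only a sketch in Python 3 terms (bytes for encoded strings, str for Unicode); the to_text name, the _s suffix for "either type", and the latin-1 default are assumptions made for the example.

```python
def to_text(name_s, encoding="latin-1"):
    """Hypothetical helper: accept encoded bytes or Unicode text, return Unicode text.

    The _s suffix marks an argument that may be either string type.
    """
    if isinstance(name_s, bytes):
        return name_s.decode(encoding)   # encoded input: decode it exactly once
    if isinstance(name_s, str):
        return name_s                    # already Unicode: pass through unchanged
    raise TypeError("expected bytes or str, got %s" % type(name_s).__name__)


# Both spellings of the same name end up as the same Unicode value.
same = to_text(b"Andr\xe9") == to_text("Andr\u00e9")
```

Everything after the call to to_text() can then assume Unicode text, so the "either" case stays confined to one line.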

Comment on Why Python Unicode Sucks
by Martin