Ian Bicking: the old part of his blog

Re: String hash vs. Unicode hash

Ian, all this talk about sys.setdefaultencoding() is really not very productive. The function was added to Python as part of a compromise over the choice of a default encoding, and it dates from a time when we considered using the locale to determine that default. It is not supported and was only left in the language to allow experiments (see the site.py module). If you change the default encoding in Python, you're on your own: expect problems with all kinds of things.
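
To see what this looks like from inside an interpreter, here is a minimal sketch. It only queries state and changes nothing. The variable names are mine, and the expectation that the setter is missing is an assumption about a normal startup: on Python 2, site.py removes it from the sys module, and Python 3 does not have it at all.

```python
import sys

# Read-only query of the interpreter's default encoding.
# Typically "ascii" on Python 2 and "utf-8" on Python 3.
default_encoding = sys.getdefaultencoding()
print(default_encoding)

# Assumption: a normal startup. On Python 2, site.py deletes
# sys.setdefaultencoding; Python 3 never defines it.
setter_available = hasattr(sys, "setdefaultencoding")
print(setter_available)
```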

For the historical details, see the python-dev archives from 2000, the year Unicode support was added to the language. A quick overview is included in a talk I gave at the EuroPython conference in 2002: http://www.egenix.com/files/python/EuroPython2002-Python-and-Unicode.pdf

On the subject: we took great care to make sure that a Unicode string containing only ASCII characters gives the same hash value as the equivalent ASCII string. This does not extend to non-ASCII characters, regardless of the default encoding.
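
The ASCII guarantee is what lets the two string types be used interchangeably as dictionary keys. A small sketch of that, with a hypothetical helper name (hashes_agree) of my own; on Python 2 the two literals are different types, while on Python 3 they are the same type, so there the check holds trivially:

```python
def hashes_agree(a, b):
    """Return True unless a == b while hash(a) != hash(b)."""
    return a != b or hash(a) == hash(b)

byte_string = "abc"      # plain string literal (8-bit str on Python 2)
unicode_string = u"abc"  # Unicode literal

# Equal ASCII strings hash alike, so a key stored under one
# form can be looked up with the other.
lookup = {byte_string: "found"}
result = lookup.get(unicode_string)
print(hashes_agree(byte_string, unicode_string), result)
```

Only ASCII data is used here on purpose: for non-ASCII characters no such agreement is promised.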

Comment on String hash vs. Unicode hash
by Marc-Andre Lemburg