But that is the problem. Unicode difficulties always come when you try to mix 8-bit strings with Unicode strings. One solution (and one of the best, in my opinion) is to do all string handling in Unicode.
When you read a string from a file, the first thing you do is convert it to Unicode. When you write a string to a file, you encode it. In between, always use Unicode. If there is some unruly code somewhere, it is better to ask the author of the code to correct it than to add a workaround somewhere. That is the safest solution because fewer hacks are involved.
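To make the "decode on read, encode on write" rule concrete, here is a minimal sketch. It assumes Python 3, where `bytes` plays the role of the 8-bit string and `str` is the Unicode string, and it assumes the file is UTF-8. The helper names `read_text`, `write_text` and `shout` are made up for illustration.

```python
def read_text(path, encoding="utf-8"):
    """Read raw bytes and decode them immediately at the boundary."""
    with open(path, "rb") as f:
        return f.read().decode(encoding)


def write_text(path, text, encoding="utf-8"):
    """Encode only at the last moment, just before the bytes hit the file."""
    with open(path, "wb") as f:
        f.write(text.encode(encoding))


def shout(text):
    """Hypothetical in-between processing: it only ever sees Unicode text."""
    return text.upper()
```

Everything between the two helpers works on Unicode only, so there is no place left where the two string types can mix.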
If there really is some external code you can't change, then consider wrapping that piece of code in a routine that automatically converts on the way in and converts back on the way out, with UTF-8 as the encoding.
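Such a convert/unconvert wrapper could look roughly like this. It is only a sketch, again assuming Python 3 (`str` for Unicode, `bytes` for 8-bit strings). The names `unicode_boundary` and `legacy_strip` are hypothetical; `legacy_strip` merely stands in for external code that accepts and returns bytes only.

```python
import functools


def unicode_boundary(func, encoding="utf-8"):
    """Wrap a bytes-only function so that callers can stay in Unicode."""

    def to_bytes(value):
        # Encode Unicode text; pass every other kind of value through untouched.
        return value.encode(encoding) if isinstance(value, str) else value

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(
            *[to_bytes(a) for a in args],
            **{k: to_bytes(v) for k, v in kwargs.items()}
        )
        # Decode a bytes result on the way back out.
        return result.decode(encoding) if isinstance(result, bytes) else result

    return wrapper


def legacy_strip(data):
    """Hypothetical stand-in for external code that only handles bytes."""
    if not isinstance(data, bytes):
        raise TypeError("legacy_strip expects bytes")
    return data.strip()


safe_strip = unicode_boundary(legacy_strip)
```

Calling `safe_strip` with a Unicode string then returns a Unicode string, and the external code never sees anything but UTF-8 bytes. The sketch only converts top-level arguments and return values; strings nested inside lists or dicts would need extra handling.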
I have a hard time asking a library author to "fix" their library in this way, because when they've fixed it for Unicode-using me, they've simultaneously broken it for everyone else.
# Ian Bicking