Why not just have a non-Unicode string type and a Unicode string type?

I think a better idea would be a string type that is encoding-agnostic (internally it might depend on compilation switches) but conceptually isomorphic to a sequence of numbers in the range [0, 2^32), plus a binary-data type that may carry an encoding "annotation". The string type should then go mostly unused, especially in the standard library. That way the cost of transforming among various encodings is borne by those who care about those encodings, which isn't something that can be said of python3.
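A minimal sketch of what I mean, in python; CodepointString, AnnotatedBytes, and their methods are names I just made up for illustration, not any real library:

    from dataclasses import dataclass
    from typing import Optional, Tuple

    @dataclass(frozen=True)
    class CodepointString:
        """Encoding-agnostic string: conceptually a sequence of ints in [0, 2**32)."""
        codepoints: Tuple[int, ...]

        def __post_init__(self):
            for cp in self.codepoints:
                if not 0 <= cp < 2**32:
                    raise ValueError(f"code point out of range: {cp}")

    @dataclass(frozen=True)
    class AnnotatedBytes:
        """Binary data with an optional, purely advisory encoding annotation."""
        data: bytes
        encoding: Optional[str] = None  # e.g. "utf-8", "latin-1", or None for unknown

        def decode(self, fallback: Optional[str] = None) -> CodepointString:
            """The transformation cost is paid here, only by callers who ask for it."""
            enc = self.encoding or fallback
            if enc is None:
                raise ValueError("no encoding annotation and no fallback given")
            return CodepointString(tuple(ord(c) for c in self.data.decode(enc)))

    blob = AnnotatedBytes(b"caf\xc3\xa9", encoding="utf-8")
    print(blob.decode().codepoints)  # (99, 97, 102, 233)

The point being that AnnotatedBytes can be passed around and stored indefinitely without anyone decoding it, and the annotation is just a hint, not something the runtime enforces everywhere.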

If one were developing a language from scratch, it would also help that there's no pile of legacy code assuming that transformations between strings and binary data, or between different encodings of binary data, are trivial. Obviously python3 couldn't rely on that, but I wonder if we didn't make more trouble for ourselves by setting the expectation that str would be used everywhere and bytes only when absolutely necessary. I suspect the opposite custom, sketched below, would have been less troublesome.
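For illustration, the bytes-first custom would look something like this (parse_header is a made-up helper, not from any real library):

    def parse_header(raw: bytes) -> tuple:
        # Split a b"Name: value" line without ever decoding it.
        name, _, value = raw.partition(b":")
        return name.strip(), value.strip()

    name, value = parse_header(b"Content-Type: text/plain; charset=utf-8")
    # Only the code that actually needs text semantics pays for a decode:
    print(name.decode("ascii"), value.decode("ascii"))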



