
This stems from the earlier Turkish 8-bit character sets like IBM code page 857, which Unicode was designed to be roundtrip-compatible with.

Aside from that, it's unlikely that authors writing both Turkish and non-Turkish words would properly switch their input method or language setting between the two, so they would get mixed up in practice anyway.

There is no escape from knowing (or best-guessing) which language you are performing transformations on. If you can't, just leave the text as-is.
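To make that concrete, here is a rough Python sketch of language-aware case mapping. The function names and the `lang` parameter are made up for illustration, and it assumes precomposed characters only (it ignores "I" followed by a combining dot above), so treat it as a sketch rather than a complete implementation.

```python
from typing import Optional

# Languages that distinguish dotted and dotless i (Turkish, Azerbaijani).
TURKIC = {"tr", "az"}

DOTTED_CAPITAL_I = "\u0130"   # İ
DOTLESS_SMALL_I = "\u0131"    # ı


def upper_for(text: str, lang: Optional[str]) -> str:
    """Uppercase text for a known language; leave it alone if unknown."""
    if lang is None:
        # Language unknown: do not transform at all.
        return text
    if lang in TURKIC:
        # Turkish i uppercases to İ, not I. The default mapping already
        # sends ı to I, so only the dotted letter needs special handling.
        text = text.replace("i", DOTTED_CAPITAL_I)
    return text.upper()


def lower_for(text: str, lang: Optional[str]) -> str:
    """Lowercase text for a known language; leave it alone if unknown."""
    if lang is None:
        return text
    if lang in TURKIC:
        # Turkish İ lowercases to i, and I lowercases to ı.
        text = text.replace(DOTTED_CAPITAL_I, "i").replace("I", DOTLESS_SMALL_I)
    return text.lower()
```

With this, upper_for("istanbul", "tr") and upper_for("istanbul", "en") give different results, and passing None as the language returns the input unchanged.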



