
Windows probably defaults to Latin-1.


The default Windows encoding is UTF-16; a long time ago it was Windows-1252: https://en.wikipedia.org/wiki/Windows-1252


Given the frequency with which Windows-12* mojibake occurs, either there are a number of holdouts still using Windows 98 SE, or there are a good number of paths in Windows that still use the non-Unicode encodings.
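For anyone who hasn't seen this kind of mojibake up close, here is a minimal sketch of how it arises, assuming text written as UTF-8 and read back as Windows-1252. The sample character "ł" is borrowed from further down the thread; the variable names are just for illustration.

```python
# UTF-8 text misread as Windows-1252: every byte becomes its own character.
original = "ł"
utf8_bytes = original.encode("utf-8")      # two bytes: 0xC5 0x82
garbled = utf8_bytes.decode("cp1252")      # the wrong codec, but no error
print(garbled)                             # Å‚

# The damage is reversible here because both bytes map to a cp1252 character.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == original)                # True
```

The round trip only works when every byte has a mapping in the wrong codec; cp1252 leaves a few byte values unassigned, so some inputs fail to decode instead.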


Windows still supports the Windows 98 API, and it's more natural to use from some languages like C++. No change is planned there. The Windows 98 API is also closer to the Unix API, which can incentivize programmers to use the same approach on Windows and Unix.


All Windows needed to do was support setting that API to UTF-8. It's not like it doesn't already support multi-byte encodings. It's not like they even needed to assign an ID for UTF-8 or implement the conversions; those existed already. All they needed to do was allow programs to set their codepage to UTF-8. This finally became possible two years ago. Better late than never, I guess.
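A rough way to check what a given process actually got, assuming Python with ctypes: the sketch below queries the ANSI code page through the Win32 GetACP call and compares it to 65001, the long-standing code page ID for UTF-8. The helper names are made up for illustration, and off Windows the query simply returns None.

```python
import sys

UTF8_CODE_PAGE = 65001  # Windows code page identifier for UTF-8


def active_code_page():
    """Return the process's ANSI code page on Windows, or None elsewhere."""
    if sys.platform != "win32":
        return None
    import ctypes
    return ctypes.windll.kernel32.GetACP()


def uses_utf8(code_page):
    """True if the given code page ID is the UTF-8 one."""
    return code_page == UTF8_CODE_PAGE


cp = active_code_page()
if cp is None:
    print("not on Windows; no ANSI code page to query")
else:
    print(f"active code page: {cp}, UTF-8: {uses_utf8(cp)}")
```

This only reports the setting; it does not change it.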


or CP-1251, in some locations.


There are a good number of them, all depending on locale.

In this case, I'd guess CP-1250, since the 0xb3 byte from the error decodes to "ł", which appears in the name, in that encoding (but not in CP-1251 or CP-1252).

If you want to see how to arrive there: https://news.ycombinator.com/item?id=28939960
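The byte-level check can also be reproduced in a few lines. This is a small sketch, assuming Python's built-in cp1250, cp1251 and cp1252 codecs match the Windows code pages being discussed; the `decoded` dict is just a name picked for the example.

```python
# Decode the single byte 0xB3 under each candidate Windows code page.
raw = bytes([0xB3])
decoded = {}
for codec in ("cp1250", "cp1251", "cp1252"):
    ch = raw.decode(codec)
    decoded[codec] = ch
    print(f"{codec}: {ch!r} (U+{ord(ch):04X})")

# 0xB3 on its own is not valid UTF-8, so a strict UTF-8 decode fails.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print("utf-8:", exc.reason)
```

Only cp1250 yields "ł" (U+0142); cp1251 gives the Cyrillic "і" (U+0456) and cp1252 gives "³" (U+00B3).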



