A bug-fix 12 years in the making: Windows Unicode Support in OCaml 4.06.0 (dra27.uk)
85 points by _pvxk on Nov 4, 2017 | 13 comments



Less than acceptable support for Unicode or it being a second class citizen is the most frequent reason why I ditch some interesting "exotic" languages.


Note that the key word here is "Windows", not "OCaml".

Less than acceptable support for Unicode or it being a second class citizen is the most frequent reason why I ditch some popular operating systems.


> Note that the key word here is "Windows", not "OCaml".

Why is that?


The "technical details" section goes into the why. What I understood is: OCaml tries to support pre-NT-era strings, since that's what the console uses, but NT-era file names use a different encoding, and OCaml internally uses UTF-8 for everything. Since pre-NT Windows didn't handle file paths with Unicode, OCaml didn't either.

I'm not well versed enough in Windows to say whether this was a clever way to solve the problem of console output, or a really old hack that has only now received some care and attention.


I don't think you can write a standard ANSI C program on Windows that opens a file specified on the command line when the file name contains characters not representable in whatever legacy charset Windows is using at the moment. At least that's what the situation was for many years. The article hints at some UTF-8-related changes in Windows 10.

For almost every other system, the obvious code (fopen(argv[1], ...), basically) does the right thing. On Windows you have to enter some crazy non-portable parallel universe where not even the signature of main() is the same.

That's the reason why many programs don't support Unicode on Windows, despite there often being no reason for those programs to care about character encoding at all.
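
A rough illustration of the constraint described above, sketched in Python rather than C so that it runs on any platform. The file names and the choice of code page 1252 are assumptions made up for the example, not details from the article: a name containing characters outside the active legacy code page has no byte representation there, so a narrow-string API has no way to carry it.

```python
# Hypothetical file names, written with escapes so the code points are unambiguous.
latin_name = "r\u00e9sum\u00e9.txt"                   # résumé.txt
mixed_name = "r\u00e9sum\u00e9_\u65e5\u672c\u8a9e.txt"  # résumé_日本語.txt


def representable(text, codec):
    """Return True if every character of text can be encoded in codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# cp1252 (Western European) covers the accented Latin letters
# but has no encoding for the Japanese characters.
print(representable(latin_name, "cp1252"))
print(representable(mixed_name, "cp1252"))

# UTF-8 and UTF-16 can encode every character in either name.
print(representable(mixed_name, "utf-8"))
print(representable(mixed_name, "utf-16-le"))
```

This only models the encoding step; it does not call any Windows API.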


POSIX has zero support for GUI code or proper Unicode, so of course you need platform-specific APIs, even if the target platform is fully POSIX compliant.


Because “Unicode” on Windows is UTF-16, which no one except Windows supports these days.
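
For anyone unfamiliar with the difference, here is a small sketch of how the same text looks in the two encodings. The sample string is an arbitrary choice for illustration; the point is only that UTF-16 works in 16-bit code units (using a surrogate pair for characters outside the Basic Multilingual Plane) while UTF-8 works in 8-bit ones.

```python
# Arbitrary sample: ASCII letters, one accented letter (U+00EF), one emoji (U+1F600).
text = "na\u00efve \U0001F600"

utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")  # little-endian, no byte order mark

utf8_units = len(utf8)         # UTF-8 code units are single bytes
utf16_units = len(utf16) // 2  # UTF-16 code units are two bytes each

print("code points:", len(text))
print("UTF-8 code units:", utf8_units)
print("UTF-16 code units:", utf16_units)

# Converting between the two encodings is lossless for well-formed text.
round_trip = utf16.decode("utf-16-le").encode("utf-8")
print("round trip ok:", round_trip == utf8)
```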


It’s also the native encoding of both JavaScript and Java, two extremely common platforms.


Not very relevant to OCaml though.


And .NET.


The corollary to that is that these "exotic languages" are often such an improvement over the mainstream offerings that such inconveniences have not prevented adoption in many cases.


What do you mean by that? Few languages support UTF-{8,16LE,16BE,32LE,32BE} + normalization out of the box: Rust, C++, Python, Go, they all require a library for that.


Very detailed article.



