This is the modern, post-ASCII computing world; we should no longer be willing to settle for the lowest common denominator of ASCII-only strings.

There's no excuse for actively supported, paid products to have these problems today.




Especially if those products are developed by a company from Russia, where Cyrillic is used. As a Russian myself, I find this situation honestly ridiculous.


Russian companies generally have ASCII-only username policies.


Do you write "if" statements in Cyrillic when you write in <insert Python/Ruby/Java/.NET/whatever>?


No. Keywords are ASCII everywhere (no, APL's symbols are not words). Mixing English keywords with non-English identifiers feels odd.

Algol-68 supported localized sets of keywords; fortunately this language is gone.

You can #define non-ASCII stuff in modern C++. It's your best chance to "localize" a mainstream language.
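For the curious, a minimal sketch of that trick. It assumes a compiler that accepts Cyrillic identifiers written directly in UTF-8 (GCC 10+ and Clang do); the aliases `если` ("if"), `иначе` ("else"), `вернуть` ("return") and the function `знак` ("sign") are made up for illustration, not any standard localization.

```cpp
// Sketch only: Russian aliases for a few C++ keywords via the preprocessor.
// Assumes UTF-8 source and a compiler that accepts Cyrillic identifiers.
// All names below are hypothetical, chosen just for this example.
#include <cassert>

#define если if
#define иначе else
#define вернуть return

// "знак" means "sign": returns -1, 0 or 1 depending on the sign of x.
int знак(int x) {
    если (x < 0) вернуть -1;
    иначе если (x > 0) вернуть 1;
    иначе вернуть 0;
}
```

Everything you don't alias (`int`, standard library names) stays English, which is exactly the mixing that feels odd.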

The same would work for Clojure, but Lisps use a lot of quirky abbreviations like `cdr` or `setq` that translate awkwardly.


It would be very amusing to see "если" (Russian for "if") in an if statement, given how much it looks and sounds like "else" at a brief glance.


There are a few computer languages that have non-English keywords, though! Among them, it looks like there was a version of Algol with Russian keywords, as well as a bunch of others in the list. Scanning it, Logo and BASIC seem to get translated a lot, which makes sense for teaching young learners who haven't learned English yet.

https://en.wikipedia.org/wiki/Non-English-based_programming_...


Why not? Several programming languages in the olden era did get localised.


I thought ArnoldC was just a couple of #defines, but it looks like it isn't.


True. But these actively supported, paid products build upon layers and layers of no-longer-supported, free/open-source products. Good luck fixing them.

I'm not saying this is OK, just explaining why using non-ASCII characters is still asking for trouble in this day and age.


This is on the Windows version.

Windows 2000 is when the OS changed to UTF-16 by default. Before that, Windows NT used UCS-2; IIRC, only the DOS-based Windows versions were Windows-1252 internally, starting from Windows 1.0. So while ł wasn't supported in Windows 1, characters like ñ were. Windows has literally NEVER been an ASCII-based OS.


Sure, but I've used a lot of the Windows system APIs (admittedly, many years ago), and it was a complete hodgepodge of which API took a char and which took a wchar_t. Then they tried to hide the whole thing behind TCHAR, which just made it even harder to keep track of.

Basically, I agree: this shouldn't be a problem, and 7 months is a long time to wait for a basic fix. But there are a lot of footguns hanging around in Windows code with respect to character encodings.

Just looking at the first Google result for "c++ get windows home directory" shows this: https://docs.microsoft.com/en-us/windows/win32/api/userenv/n...

It takes a long pointer to a TCHAR string (LPTSTR), so its behavior depends on the project's Unicode settings at compile time, even today.


> It takes a long pointer to a TCHAR string (LPTSTR), so its behavior depends on the project's Unicode settings at compile time, even today.

The documentation is simply wrong: GetUserProfileDirectoryA, which you linked, always takes an LPSTR (always "ANSI"), while GetUserProfileDirectoryW always takes an LPWSTR (always WTF-16). This is reflected in the function prototype at the top. Only the GetUserProfileDirectory define switches between the two. The define is a compatibility hack and arguably was a mistake, but you can always call the W-suffixed function no matter what the project settings are.
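To make the distinction concrete, here is a portable C++ mimic of that A/W-plus-macro pattern. The names are hypothetical stand-ins, not the real Win32 functions (which also take a token handle and an in/out size pointer); the point is only which parts depend on the UNICODE define.

```cpp
// Hypothetical, portable mimic of the Win32 "A/W function + macro" pattern.
// GetProfileDirA/W, GetProfileDir and TCHAR_ are made-up names standing in
// for GetUserProfileDirectoryA/W and TCHAR; this is NOT the real API.
#include <cassert>
#include <cstring>
#include <cwchar>

// "ANSI" variant: always takes a narrow char buffer.
bool GetProfileDirA(char* buf, std::size_t len) {
    const char* path = "C:\\Users\\demo";
    if (std::strlen(path) + 1 > len) return false;  // buffer too small
    std::strcpy(buf, path);
    return true;
}

// Wide variant: always takes a wchar_t buffer, regardless of project settings.
bool GetProfileDirW(wchar_t* buf, std::size_t len) {
    const wchar_t* path = L"C:\\Users\\demo";
    if (std::wcslen(path) + 1 > len) return false;  // buffer too small
    std::wcscpy(buf, path);
    return true;
}

// Only the unsuffixed name and the TCHAR-like type switch on UNICODE.
#ifdef UNICODE
typedef wchar_t TCHAR_;
#define GetProfileDir GetProfileDirW
#else
typedef char TCHAR_;
#define GetProfileDir GetProfileDirA
#endif
```

Code that calls GetProfileDirW with a wchar_t buffer compiles the same with or without UNICODE; only code written against GetProfileDir and TCHAR_ changes meaning with the build setting.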


> Windows 2000 is when the OS changed to UTF-16 by default.

Paths are UTF-16 + unpaired surrogates, so a Windows path isn't legally representable in UTF-8.
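A short sketch of what "UTF-16 + unpaired surrogates" means in practice. The check below (a hypothetical helper, written against `std::u16string` as an assumption; Win32 code would use `wchar_t`) reports whether a sequence of code units is well-formed UTF-16. A string that fails it, such as one containing a lone 0xD800, has no valid UTF-8 encoding.

```cpp
// Sketch: is a sequence of UTF-16 code units well-formed?
// Function names are illustrative, not from any library.
#include <cassert>
#include <cstddef>
#include <string>

// True if u is a high (leading) surrogate, U+D800..U+DBFF.
bool is_high_surrogate(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
// True if u is a low (trailing) surrogate, U+DC00..U+DFFF.
bool is_low_surrogate(char16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

// Returns true only if every surrogate in s is part of a proper high+low pair.
bool is_well_formed_utf16(const std::u16string& s) {
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (is_high_surrogate(s[i])) {
            // A high surrogate must be immediately followed by a low one.
            if (i + 1 >= s.size() || !is_low_surrogate(s[i + 1])) return false;
            ++i;  // skip the low half of the pair
        } else if (is_low_surrogate(s[i])) {
            return false;  // low surrogate with no preceding high surrogate
        }
    }
    return true;
}
```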


But WTF-16 paths can be converted to WTF-8 just fine. You can even use the same algorithm: pair surrogates only when they actually match, and otherwise treat them as UCS-2 values and encode those normally to "UTF-8".
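A minimal sketch of that conversion, assuming the input arrives as `std::u16string` code units; `wtf16_to_wtf8` and `append_utf8` are illustrative names, not a library API. It combines a surrogate pair only when a high surrogate is immediately followed by a low one, and otherwise encodes the unit as an ordinary UCS-2 value.

```cpp
// Sketch of WTF-16 -> WTF-8 conversion. Names are hypothetical.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Append one code point (up to U+10FFFF) using the standard UTF-8 byte layout.
void append_utf8(std::string& out, std::uint32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

std::string wtf16_to_wtf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t u = in[i];
        bool high = u >= 0xD800 && u <= 0xDBFF;
        if (high && i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Matching pair: combine into one supplementary code point.
            std::uint32_t low = in[i + 1];
            append_utf8(out, 0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00));
            ++i;
        } else {
            // BMP character or unpaired surrogate: encode the unit as-is.
            append_utf8(out, u);
        }
    }
    return out;
}
```

For example, a lone 0xD800 becomes the bytes ED A0 80, which a strict UTF-8 decoder rejects; that is what makes the output WTF-8 rather than UTF-8.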



