The curious case of the disappearing Polish S (2015) (medium.engineering)
159 points by dsr12 on May 21, 2021 | 33 comments



My favorite Polish language-related computer problems:

1. Windows installing both the programmers' and typists' Polish keyboard layouts by default, plus a shortcut to switch between them (I think it was alt+shift+space or even alt+space?). Most people had no idea there were two layouts, so if they accidentally pressed the shortcut and logged out, then the next time they tried to log in Z and Y were swapped, and anyone with Z or Y in their password couldn't log in. This one caused millions of support calls.

2. Corporate computers with both English and Polish layouts installed: if the layout happens to be switched to English and you write an email in Outlook, you will likely start with "cześć" ("hi"), and when you press alt+s for "ś" it sends "cze" as the entire email. "Cze" is a very casual "hi", like opening a business email with "yo".

3. Same as above, but in Eclipse with some version control plugins, where alt+s committed the code. Especially frustrating with CVS/SVN.

4. Python 2 code often breaks if you have Polish letters embedded in the code as string literals. It depends on the system's default code page versus Python's default encoding and other things; for some reason Python 2 defaults to ASCII rather than UTF-8. The solution is to use Python 3 or mess with the default settings.


- Installing KeePass used to silently disable the "ą" key (it registered AltGr+A as a hotkey). "KeePass breaks system of every Polish user immediately after being installed." https://sourceforge.net/p/keepass/discussion/329220/thread/d... A warning was added later.


Surely this is down to one key error (pun intended) - Medium deciding to override a standard browser shortcut 'for the good of their users'. If they needed a manual save function then I might understand it, but they tried to be clever and made things worse in a subtle way.


That's a really well-written blogpost! I was expecting it to be much more surface-level (something to do with character codes - still possibly interesting!) but it had new info for me on several levels (the background of the bug, and the personal history stuff didn't feel too fillery to me).


I’m glad that they actually fixed the issue instead of just settling for a workaround, like copying and pasting Ś from another application.


Discussed at the time:

The curious case of disappearing Polish S - https://news.ycombinator.com/item?id=8986920 - Feb 2015 (117 comments)


An interesting fact for non-Polish-speaking readers: in informal writing we often don't use diacritics at all. It makes typing faster. With the rise of spell checkers the habit is fading, but if you write without diacritics you will still usually be understood.

Second interesting fact: it is very common for software and online apps, especially those not developed in Europe, to ignore diacritics. Not only Polish ones, but also French, German, etc. You get weird characters instead, or cannot type properly at all. I hope the article will shine a light on the issue.


Sure, but sometimes leaving out Polish diacritics makes the whole sentence ambiguous, or at the very least harder to read. I personally despise people doing that.

More to the article's point: there were countless times when I accidentally sent an unfinished email by trying to type "ś".


Czechs sometimes write without diacritics as well, but the resulting ambiguities are funny.

For example, a customer of my e-shop gave his street as "Skretova".

So, is it "Škrétova" (named after a famous Baroque painter) or "Skřetova" (Goblin's street?)

Of course, most Czech cities do not honor goblins in their street plan...


It's surprising how much software (mostly on Windows) doesn't properly handle Unicode in 2021. With something working in Unicode, it's not that big of a deal to both handle letters like Ł or ż and also to run normalization on text strings so that you can (if desired) treat Łódź and Lodz as identical (e.g., for text searching).


Just a note that normalization is not the same as diacritic insensitivity. Normalization is the process by which strings that are semantically equivalent (by some standard), yet have different underlying byte sequences, are transformed to have the same underlying byte sequence. For instance, replacing “e, combining acute accent” with “e with acute accent”.
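To make the distinction concrete, here is a minimal sketch using the standard `String.prototype.normalize`. The variable names are only for illustration.

```javascript
// Two spellings of "é": one precomposed code point vs. "e" + combining accent.
const precomposed = "\u00E9";  // é as a single code point
const decomposed = "e\u0301";  // e followed by U+0301 COMBINING ACUTE ACCENT

// The raw strings differ, even though they render the same.
const rawEqual = precomposed === decomposed;

// After normalizing both to NFC (composed form) they compare equal.
const nfcEqual = precomposed.normalize("NFC") === decomposed.normalize("NFC");

// Normalization alone does not remove the accent: "é" is still not "e".
const accentStripped = precomposed.normalize("NFC") === "e";

console.log({ rawEqual, nfcEqual, accentStripped });
// { rawEqual: false, nfcEqual: true, accentStripped: false }
```

Diacritic-insensitive matching needs an extra step on top of this, which is a separate decision from choosing a normalization form.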


Yes, I should clarify that the process is to normalize to letter+combining accent and then strip out the accents, e.g., in Java:

  Normalizer.normalize(text, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
(I hope I got this right; I adapted it from some code which actually strips out some other bits of text to turn, e.g., F. Janáček into FJanacek.)
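For comparison, a rough JavaScript sketch of the same decompose-then-strip approach. One caveat: letters such as Ł have no canonical decomposition, so NFD leaves them alone and they need an explicit mapping. The helper names and the small `EXTRA_FOLDS` table are assumptions for illustration, not a complete list.

```javascript
// Hypothetical helper: decompose to NFD, then drop combining marks (\p{M}).
function stripDiacritics(text) {
  return text.normalize("NFD").replace(/\p{M}/gu, "");
}

// Letters with a stroke (Ł, Đ, Ø) are not "base letter + combining mark",
// so they survive NFD. This table is illustrative and far from exhaustive.
const EXTRA_FOLDS = { "Ł": "L", "ł": "l", "Đ": "D", "đ": "d", "Ø": "O", "ø": "o" };

function foldForSearch(text) {
  return stripDiacritics(text).replace(/[ŁłĐđØø]/g, (ch) => EXTRA_FOLDS[ch]);
}

console.log(stripDiacritics("Łódź")); // "Łodz" (the Ł is still there)
console.log(foldForSearch("Łódź"));   // "Lodz"
```

So treating Łódź and Lodz as identical takes the extra mapping; the accent-stripping step alone only gets as far as Łodz.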


I wonder how much computers and spell-checkers are slowing the evolution of writing systems.

In the past people could naturally stop using diacritics or make letter substitutions. Over time the language might eliminate their use. That seems less likely now.

Similarly introducing a new letter seems rather difficult in the computer age.


I feel like that's pretty true worldwide for languages with diacritics, at least for all the Latin-script languages I'm familiar with.

It's really no different from people in English shortening "you" to "u" in texting as well, or "lol". Everybody saves keystrokes wherever they can.


It's popular for some reason even in countries where it doesn't save keystrokes. For example, in Croatia we have 5 characters with diacritics (š, đ, č, ć, ž); all of them have dedicated keys on the keyboard, and yet many people habitually don't use them.


In German, ä would never be replaced by a (if all you have is ASCII the proper replacement is ae), except by foreigners who assume that diacritics don't matter.


I've noticed that I automatically lose some respect for a person if I find that they don't use diacritics in their writing. Especially in a professional setting.


1. łaskę mi robisz

2. laske mi robisz

--

1. you are (not really, pejorative) doing me a favor

2. you are doing me a blowjob

...must be one of the better ones.


Typing without diacritics is also popular in Croatia in casual scenarios


    e.metaKey || (e.ctrlKey && !e.altKey)
seems to me like an exceptionally strange choice. Why not an exclusive-or? The thing they want to avoid is a false-positive on both being pressed, so test for that directly.


e.metaKey is the `⌘Command` key on Mac (used for the save shortcut), an entirely different key than the ones involved in the bug.

Also, JavaScript doesn't have a logical xor operator, so trying to do that would potentially reduce readability.


> Also, Javascript doesn't have a logical xor operator, so trying to do that would potentially reduce readability.

I also didn't know about any operator to logically xor two boolean variables (thought about (ab)using JavaScript's implicit type conversion mechanisms: `x ^ y`), and then I learnt that `!=` works fine as a logical xor for booleans. Tada!

I don't know how much readability is reduced by this.


> Javascript doesn't have a logical xor operator

x != y


A common hack, and that's where the "potentially reduce readability" caveat comes in.


Although with the looseness around booleans in JS, I'd imagine that you might need to do something more like `Boolean(x) != Boolean(y)` (comparing each side against `true` is not enough, since e.g. `2 == true` is false).
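A small sketch of that pitfall with operands that are truthy but not real booleans. The `xor` helper is a hypothetical name, not a built-in. `KeyboardEvent` flags such as `ctrlKey` are real booleans, so plain `!=` is fine there.

```javascript
// Hypothetical helper: coerce both operands to real booleans, then compare.
function xor(a, b) {
  return Boolean(a) !== Boolean(b);
}

// With real booleans, plain != already behaves as XOR.
const plain = [true != true, true != false, false != true, false != false];

// With merely truthy values it does not: 2 and true are both truthy,
// but 2 != true is true because true is converted to the number 1.
const looseMismatch = (2 != true);

// Comparing each side against true has the same problem: (2 == true) is false.
const compareToTrue = ((2 == true) != (true == true));

// Coercing first gives the expected answer: both truthy, so XOR is false.
const coerced = xor(2, true);

console.log(plain, looseMismatch, compareToTrue, coerced);
// [ false, true, true, false ] true true false
```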


Wouldn't using XOR trigger a save on Alt + S? Unless I'm missing something with why the meta portion should have the XOR here instead.


The only place XOR could be used is to replace the OR. As you say, replacing the AND NOT with an XOR would make Alt + S trigger a save, which suppresses the character.

Even replacing the OR with XOR is not obviously an improvement, since it is not clear what the correct behavior should be in the edge case where a keyboard event is emitted with both the control and command flags set.
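To see the variants side by side, here is a sketch that runs the quoted check and the two XOR rewrites over plain objects standing in for `KeyboardEvent` modifier flags. It assumes AltGr is reported with both `ctrlKey` and `altKey` set, as happens on Windows. The names are illustrative.

```javascript
// Original check quoted in the thread.
const original = (e) => e.metaKey || (e.ctrlKey && !e.altKey);
// Variant A: the OR replaced by XOR (!== on real booleans).
const orAsXor = (e) => e.metaKey !== (e.ctrlKey && !e.altKey);
// Variant B: the AND NOT replaced by XOR.
const andNotAsXor = (e) => e.metaKey || (e.ctrlKey !== e.altKey);

// Stand-ins for keyboard events; only the modifier flags matter here.
const events = {
  ctrlS:    { metaKey: false, ctrlKey: true,  altKey: false },
  cmdS:     { metaKey: true,  ctrlKey: false, altKey: false },
  altGrS:   { metaKey: false, ctrlKey: true,  altKey: true  }, // assumed: AltGr = Ctrl+Alt
  altS:     { metaKey: false, ctrlKey: false, altKey: true  },
  cmdCtrlS: { metaKey: true,  ctrlKey: true,  altKey: false },
};

for (const [name, e] of Object.entries(events)) {
  console.log(name, original(e), orAsXor(e), andNotAsXor(e));
}
// ctrlS    true  true  true
// cmdS     true  true  true
// altGrS   false false false
// altS     false false true   <- variant B saves on plain Alt+S
// cmdCtrlS true  false true   <- variant A differs only in this edge case
```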


This is a really common failure mode - people forget to explicitly assert that the other modifiers are off when checking for a modifier being on. I had to go through and fix all the ones in matrix-react-sdk (element web) a few years ago: https://github.com/matrix-org/matrix-react-sdk/pull/825/file...
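A minimal sketch of what asserting the other modifiers can look like: a matcher that requires every modifier flag to be in exactly the wanted state. The helper names are hypothetical and this is not the code from the linked pull request.

```javascript
// Hypothetical helper: true only if exactly the wanted modifiers are held.
// Any modifier not listed in `wanted` must be off.
function modifiersMatch(e, wanted) {
  const names = ["ctrlKey", "altKey", "shiftKey", "metaKey"];
  return names.every((name) => Boolean(e[name]) === Boolean(wanted[name]));
}

// Ctrl+S and nothing else; Ctrl+Alt+S (how AltGr+S can appear) is rejected.
function isCtrlS(e) {
  return e.key.toLowerCase() === "s" && modifiersMatch(e, { ctrlKey: true });
}

// Plain objects standing in for KeyboardEvent instances.
const ctrlS = { key: "s", ctrlKey: true, altKey: false, shiftKey: false, metaKey: false };
const ctrlAltS = { key: "s", ctrlKey: true, altKey: true, shiftKey: false, metaKey: false };

console.log(isCtrlS(ctrlS));    // true
console.log(isCtrlS(ctrlAltS)); // false
```

The design choice is to list the modifiers you want and treat everything else as required-off, so a forgotten modifier fails closed instead of firing the shortcut.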


Taking the “5 whys” to a whole new level!


I wonder if it is a similar reason why currently on MS Teams I can't type the letter ń.


Meanwhile Cisco Webex still exhibits this bug


[flagged]


Genuinely curious - in your mind, what would be "addressing it elegantly" in this case?


But it’s got photos of some fine keyboards made by IBM and stuff!



