How efficiently does Morse code encode letters? (johndcook.com)
79 points by CarolineW on Feb 23, 2017 | 27 comments



I never got around to actually learning Morse Code, but I think the English language corpus is probably dramatically different from the messages that are actually sent via Morse Code. My understanding is that abbreviations are commonly used, greatly skewing the letter frequencies: https://en.m.wikipedia.org/wiki/Morse_code_abbreviations


There are also a large number of Q codes [1] that cover practically everything related to communication and also naval, aeronautic and rescue concerns. Today much of this isn't in use, but amateur radio operators use a subset of these routinely; a typical fast Morse code exchange is often composed almost entirely of Q codes and call signs.

[1] https://en.wikipedia.org/wiki/Q_code


I became aware of Q codes through their use in the book Seveneves. It provided a brief glimpse into a subculture that I've never had much contact with, and I ended up in a bit of a Wikipedia crawl.


I was lucky enough as a kid to inherit my grandfather's collection of old sci-fi books. I first became aware of Q codes through the short story "QRM Interplanetary" from 1942 (https://en.wikipedia.org/wiki/Venus_Equilateral).

I find it interesting that Q codes have changed so little over the last century. I have some ham radio friends who will use Q codes when talking face-to-face with each other, and the codes they use are basically the same ones from the original list.


I'm glad you commented. I was sitting here racking my brain, trying to remember what book I'd read that involved Q codes. Then I saw your comment, and it was indeed Seveneves.


Military morse users would predominantly send ciphertext, so the letter distribution would (hopefully) be random.

Professional morse users often sent very condensed messages using a variety of abbreviated formats. For example, here's an excerpt from an NOAA aviation weather forecast:

WA CASCDS WWD CSTL SXNS...SCT025 SCT050. 06Z SCT020 BKN050 TOP 100. OTLK...MVFR CIG. OLYMPICS...BKN-SCT060 TOP 080. ISOL -SHSN. 04Z SKC. 08Z BKN050 OVC150 TOP FL200. OCNL VIS 3-5SM IN SCT -SHSN. OTLK...IFR CIG SHSN.


Some of the early telegraph messages resembled text messaging:

From 1890: The First Text Messages http://sundaymagazine.org/2010/08/from-1890-the-first-text-m...

LOL in the Age of the Telegraph https://scroll.in/article/752746/lol-in-the-age-of-the-teleg...


Wired Love: A Romance of Dots and Dashes (1879) describes a long distance romance conducted "online", from the point of view of a female telegraph operator, where neither party knew what the other looked like or even the sound of their voice. Describes the virtual communities of telegraph operators and the culture surrounding them. It's a wonderful book.

http://www.kristinholt.com/archives/5686 -- a review of the book

Copies can be found at archive.org, Gutenberg, and Amazon.

What's also interesting to me are the telegraph hacks during the American Civil War. Tapping into enemy wires and sending misleading orders, or just listening in. Here's some discussion on it:

http://civilwartalk.com/threads/hacking-into-the-telegraph-s...


I have patents 6418323, 7831208, 6850782 which cover using Morse code on a phone to send and receive text messages.

The advantage is you don't have to look at the phone's display when composing or 'reading' text messages. The phone can be in your pocket and you can send/receive messages without bothering others.

Obviously, it never caught on, but I thought it was a fun idea.


This is actually something I would use -- say, while driving. Emoji would probably throw me off, though.

I learnt Morse at age 12 or so, actually used it regularly on amateur radio for quite a while, but haven't used it for probably about 20 years. I can still "read" it when I hear it, without even thinking about it.


I use a variation of this, with custom "ring tones": vibration patterns. I encoded my common contacts' names as Morse.

dad, mom, sis, matt, work.. etc.


Morse code on an apple watch:

https://www.youtube.com/watch?v=wydT9V39SLo


The article is looking at efficiency in terms of time to transmit a given message. For that you want to assign the shorter codes to the symbols with the highest frequency. Morse did a decent job of that, except that assigning '---' to 'O' seems way off, since '---' is in the top 5 for longest code length, whereas 'O' is one of the 5 most frequent symbols.

I wonder what would change, if anything, if instead we considered things from the receiving side, and put a bound on the acceptable error rate? That '---' for 'O' really stands out when listening to code. The only other letter that has a '---' in it is J ('.---'), which only occurs about 2% as often as 'O'. Maybe 'O' being so distinct and easy to hear, and frequent enough that you will have an 'O' every 10 or so characters, helps keep the listener synchronized?

Early Morse code was sent fully by hand, and so the timing would not be precise. The timing is supposed to be, in units of the length of a dot: 1 for a dot, 3 for a dash, 1 for the space between adjacent dots and dashes within a character, 3 for the space between characters in a word, and 7 for the space between words. A good, experienced operator would hit that timing very accurately, but less experienced operators could be quite a bit off.

Someone whose timing is off might shorten the gap between characters enough to run the dashes at the end of one character into the start of the next. For example, in the word 'awkward', the 'wk' sequence is '.-- -.-'. If the sender did not leave as big a gap as they should between the characters, the trailing '--' from the 'w' and the leading '-' from the 'k' could run together, giving a '---'. Even in that situation I don't think it would sound like an 'O', because with an 'O' you go into it trying to make 3 evenly spaced things, and we are good at that. We might get the spacing off, but we'll get it off the same way uniformly throughout the 'O'. An accidental 'O' from running two things together won't have that uniformity, and so I think it would stand out.

In other words, I'm guessing that the apparently anomalous assignment of a seemingly too long code to 'O' actually serves to make communication more accurate in the presence of inexperienced senders and receivers.
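
The timing rule quoted above (1 unit for a dot, 3 for a dash, 1 between elements, 3 between characters, 7 between words) can be turned into a small duration calculator. This is only a sketch: the names `char_units` and `message_units` are made up for illustration, it handles letters and spaces only, and it does not count a trailing gap after the last word.

```python
# International Morse codes for the 26 letters.
MORSE = {
    "A": ".-",   "B": "-...", "C": "-.-.", "D": "-..",  "E": ".",
    "F": "..-.", "G": "--.",  "H": "....", "I": "..",   "J": ".---",
    "K": "-.-",  "L": ".-..", "M": "--",   "N": "-.",   "O": "---",
    "P": ".--.", "Q": "--.-", "R": ".-.",  "S": "...",  "T": "-",
    "U": "..-",  "V": "...-", "W": ".--",  "X": "-..-", "Y": "-.--",
    "Z": "--..",
}

# Durations in units of one dot length, as described in the comment above.
DOT, DASH = 1, 3
ELEMENT_GAP, CHAR_GAP, WORD_GAP = 1, 3, 7


def char_units(code):
    """Duration of one character: its dots and dashes plus the gaps between them."""
    marks = sum(DOT if e == "." else DASH for e in code)
    return marks + ELEMENT_GAP * (len(code) - 1)


def message_units(text):
    """Duration of a message of letters and spaces, without a trailing gap."""
    total = 0
    for w, word in enumerate(text.upper().split()):
        if w > 0:
            total += WORD_GAP
        for c, letter in enumerate(word):
            if c > 0:
                total += CHAR_GAP
            total += char_units(MORSE[letter])
    return total


if __name__ == "__main__":
    # List letters from longest to shortest duration.
    for letter in sorted(MORSE, key=lambda k: char_units(MORSE[k]), reverse=True):
        print(letter, MORSE[letter], char_units(MORSE[letter]))
    print("PARIS:", message_units("PARIS"))
```

Under this counting, 'O' costs 11 units, level with C, P, X, and Z; only J, Q, and Y are longer at 13. The word PARIS comes to 43 units, or 50 once a trailing word gap is added.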


On the other hand, the letter O in the older American Morse was relatively short because it contained a "long" internal gap: "dit-dit" (as opposed to I being "didit" and the "word" EE being "dit, dit"). If the regular gap is the length of a dot, and the inter-letter gap is the length of three dots, the length of the gap in O was two dots.

International Morse eliminated the "long" internal gap (according to Wikipedia [1], this had an advantage on the first long undersea cables), so O had to be re-encoded. '---' ("dahdahdah") was the only three-element code not already being used for a letter. (It happens to be the number 5 in American Morse.)

When I was a Novice-class ham many years ago, I found that older hams would sometimes send the prosign C (in the sense of "confirm" or "yes") as didit-dit instead of dahdidahdit. I never really understood why; I just went along with it. Turns out didit-dit is C in American Morse.

[1] https://en.wikipedia.org/wiki/American_Morse_code


"Early Morse code was sent fully by hand"

The earliest Morse telegraphy systems were actually built around paper tape. Operators figured out that they could transcribe directly from the sound of the machine, obviating the need for the costly and difficult to deploy equipment. The skill developed from there with equipment evolving to accommodate human operators in real time.

"Someone whose timing is off might shorten the gap between characters enough that it might run dashes from the end of one character and the start of the next together."

Variations between operators exist and specific operators can be identified by their "fist" alone, but the variations are consistent for a given sender so receivers adapt relatively easily. Noise is the bigger problem. Energy from a lightning strike a thousand miles away ricocheting around the ionosphere can wipe out lots of dits and dahs, and receivers either correct incoherent transmissions when they can or request that something be repeated.


Early Morse code was sent fully by hand, and so the timing would not be precise.

Correct. My grandmother was an operator during WWII in Australia, communicating with various places northward such as Papua New Guinea. She said that she could easily recognize individual operators via their timings.


This has actually been proposed as the explanation for the distribution of word lengths seen across natural languages: http://www.pnas.org/content/108/9/3526.long.


It's not just frequency/density but also the value for forward error correction: codes (code points) that can more easily be mistaken for others should have such different semantics that an error is obvious. For example, though e and a are the two most common letters in the English language, perhaps use the shortest code for e and the next shortest for q, so that a misunderstanding causes "went" to become "wqnt" rather than "want". And indeed perhaps e does not deserve the shortest code because of its importance.
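
One rough way to see which letters are "close" in this sense is to ask which letters a code turns into when a single dot or dash is lost. The sketch below assumes a toy error model (exactly one element dropped, nothing added or flipped), and the helper name `one_loss_confusions` is invented for illustration.

```python
# International Morse codes for the 26 letters.
MORSE = {
    "A": ".-",   "B": "-...", "C": "-.-.", "D": "-..",  "E": ".",
    "F": "..-.", "G": "--.",  "H": "....", "I": "..",   "J": ".---",
    "K": "-.-",  "L": ".-..", "M": "--",   "N": "-.",   "O": "---",
    "P": ".--.", "Q": "--.-", "R": ".-.",  "S": "...",  "T": "-",
    "U": "..-",  "V": "...-", "W": ".--",  "X": "-..-", "Y": "-.--",
    "Z": "--..",
}
LETTER_FOR = {code: letter for letter, code in MORSE.items()}


def one_loss_confusions(letter):
    """Letters that result when exactly one element of `letter`'s code is lost."""
    code = MORSE[letter.upper()]
    results = set()
    for i in range(len(code)):
        shorter = code[:i] + code[i + 1:]  # drop the element at position i
        if shorter in LETTER_FOR:          # an empty string maps to no letter
            results.add(LETTER_FOR[shorter])
    return results


if __name__ == "__main__":
    for letter in "AEO":
        print(letter, sorted(one_loss_confusions(letter)))
```

With this table, A can collapse to E or T, which is the same want/went pair mentioned above seen from the other direction, while O can only collapse to M.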


English is already pretty redundant; you only need about one bit per character for most text. You can see this if you compress a block of English text; you'll probably get 12.5% of the original size, which is one bit per eight-bit character.
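
A quick way to try this on your own text is to compress it and divide the compressed size in bits by the original length. This is a minimal sketch using Python's standard `zlib` module; the function name `bits_per_char` is made up here, it measures bits per UTF-8 byte (so it assumes mostly ASCII text), and it expects non-empty input.

```python
import zlib


def bits_per_char(text):
    """Compressed size in bits divided by the original size in bytes."""
    data = text.encode("utf-8")
    compressed = zlib.compress(data, 9)  # level 9 = best compression
    return 8 * len(compressed) / len(data)


if __name__ == "__main__":
    sample = "Replace this string with a long block of ordinary English text."
    print(round(bits_per_char(sample), 2))
```

The result depends heavily on the compressor and on how long the sample is. General-purpose compressors such as zlib usually land well above one bit per character on ordinary English; the figure of roughly one bit is an estimate of the language's entropy rather than something off-the-shelf tools typically reach.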


Yes, but morse is being decoded in realtime by a piece of meat (a trained NN admittedly, but different senders have different "fists" so there's a lot of variation). The problem is that some character errors are ambiguous.


Hwoever thows errs pars fiarle easly

People on both ends tend to adapt quickly. A few brief, known exchanges will typically result in longer ones going fairly well.

And "the meat" does all sorts of stuff. Names get shortened or changed for brevity or style. Gark might be gary, for example.

One group I observed used a lot of first letter strings:

BBQSTSP?

Y

K

Barbecue, same time same place?

Yes

OK


True, but even though Morse is a denser encoding than ASCII, there is often enough redundant information to infer the correct letter from context. A/E seems a really bad case, but I don't think we have to go all the way to A/Q to fix it.


If you liked this article, you should check out a blog post giving a visual introduction to information theory[1] by Chris Olah

[1]: https://colah.github.io/posts/2015-09-Visual-Information/


SOS was meant to be easily recognizable and trivial to generate:

    ... --- ... --- ... --- ... (and so on)


Actually, it is sent as a single run of nine elements with no gaps between the letters, then repeated. Graphically this looks like ...---... ...---... ...---... and so on.

Incidentally, graphical representations of Morse code are weak tea. It has to be heard to be understood.
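
To make the difference between the two renderings concrete, the sketch below counts dot-length units for SOS sent as one unbroken nine-element run versus three separate letters. It assumes the usual timing (dot 1, dash 3, 1 between elements, 3 between letters); the function names are invented for illustration.

```python
def run_units(elements):
    """Units for one unbroken run of dots and dashes, with 1-unit gaps inside it."""
    marks = sum(1 if e == "." else 3 for e in elements)
    return marks + (len(elements) - 1)


def letters_units(codes):
    """Units for separate letters, with a 3-unit gap between consecutive letters."""
    return sum(run_units(c) for c in codes) + 3 * (len(codes) - 1)


sos_single = run_units("...---...")                  # one nine-element signal
sos_letters = letters_units(["...", "---", "..."])   # the letters S, O, S

if __name__ == "__main__":
    print(sos_single, sos_letters)
```

Under these assumptions the single run takes 23 units and the three-letter version takes 27; the four-unit difference is the two letter gaps shrinking from 3 units to 1.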


Note that in addition to the Morse Code that roughly equates to "printable" ASCII, there are prosigns [procedural signals] which are control characters that affect things like transmissions and message formatting.

https://en.wikipedia.org/wiki/Prosigns_for_Morse_code


They also had specialized code books for different industries so that increased compression quite a bit.



