Letter Frequency (simia.net)
18 points by thefilmore on Oct 28, 2022 | 3 comments


My kid went to a bilingual school that taught English via immersion (the English teachers didn't even speak German). In grade 1 they learnt the alphabet both in German class and English, one letter a week.

I was amused that in English they learnt A, then B, and so on in alphabetical order. In German they learnt the letters in order of frequency (according to this chart, "enirst"; I remember "enrst", so I probably missed the "i"). I already knew "etaoin shrdlu", so this amused me more than it should have.


Site seems overloaded. Internet Archive has a capture.

https://web.archive.org/web/20221028111744/http://simia.net/...


The corpus looks very much web-based and not properly cleaned. I conclude that from the digits: in a normal corpus they are pretty rare, much rarer than x and y, which is not what this chart shows. The English list also includes some punctuation and half of the Greek alphabet. I suppose the counting didn't exclude proper names and formulas. So if you want to identify the domain of a Wikipedia page based on 1-grams, this is helpful; otherwise, less so.
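For what it's worth, here is a rough sketch of the kind of cleaning I mean, assuming a plain a-z alphabet for English. The function name `letter_frequencies` and the sample string are made up for illustration; this is not how the linked chart was produced.

```python
from collections import Counter

def letter_frequencies(text, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """Relative 1-gram frequencies over a fixed alphabet.

    Anything outside `alphabet` (digits, punctuation, Greek letters)
    is dropped before counting. The default alphabet is an assumption
    that only suits English.
    """
    counts = Counter(ch for ch in text.lower() if ch in alphabet)
    total = sum(counts.values())
    if total == 0:
        return {}
    # most_common() orders the letters from most to least frequent
    return {ch: n / total for ch, n in counts.most_common()}

# Hypothetical sample containing digits, punctuation and a Greek letter
sample = "In 2022 the α-particle paper got 18 points."
freqs = letter_frequencies(sample)
print(list(freqs)[:6])
```

This still wouldn't exclude proper names or text inside formulas; that would need some markup-aware preprocessing on top.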



