
It's not taking UTF-8 into account, though, so maybe double or triple that.


Encoding schemes like UTF-8 don’t affect compressed size much. What matters is the quantity of information.
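
A quick way to check this is to compress the same text under two encodings and compare. This is only a sketch: the word list is made-up sample data, cp1251 is just one single-byte codepage that happens to cover it, and zlib stands in for whatever compressor is actually in use.

```python
import random
import zlib

# Hypothetical sample data: a tiny Russian vocabulary, sampled pseudo-randomly.
WORDS = ["привет", "мир", "слово", "язык", "данные", "сжатие", "текст", "кодировка"]
rng = random.Random(0)  # fixed seed so the run is repeatable
text = " ".join(rng.choice(WORDS) for _ in range(5000))


def sizes(encoding):
    """Return (raw byte count, zlib-compressed byte count) for the text."""
    raw = text.encode(encoding)
    return len(raw), len(zlib.compress(raw, 9))


utf8_raw, utf8_packed = sizes("utf-8")   # two bytes per Cyrillic letter
cp_raw, cp_packed = sizes("cp1251")      # one byte per Cyrillic letter

raw_ratio = utf8_raw / cp_raw            # how much bigger UTF-8 is before compression
packed_ratio = utf8_packed / cp_packed   # how much bigger it is after compression

print(f"raw:    utf-8={utf8_raw}  cp1251={cp_raw}  ratio={raw_ratio:.2f}")
print(f"packed: utf-8={utf8_packed}  cp1251={cp_packed}  ratio={packed_ratio:.2f}")
```

The raw UTF-8 is close to twice the size, but the gap should shrink a lot after compression, since the extra lead bytes are highly predictable and carry almost no information.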


You could probably do this with minimal overhead by organizing the words by language and assuming a codepage for each set.
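
Roughly what I mean, as a sketch. The candidate codepage list and the sample words are made up for illustration, and "first codepage that can encode the word" is a crude stand-in for real language detection.

```python
# Hypothetical candidate codepages, tried in order. ASCII comes first so
# plain Latin words do not end up in a larger set.
CANDIDATES = ["ascii", "cp1251", "iso8859_7"]


def pick_codepage(word):
    """Return the first candidate codec that can encode the word."""
    for codec in CANDIDATES:
        try:
            word.encode(codec)
            return codec
        except UnicodeEncodeError:
            continue
    return "utf-8"  # fallback for anything no single-byte codepage covers


def group_by_codepage(words):
    """Bucket words by codepage, keeping their original order within a bucket."""
    groups = {}
    for word in words:
        groups.setdefault(pick_codepage(word), []).append(word)
    return groups


words = ["hello", "привет", "λογος", "world", "мир", "日本"]
groups = group_by_codepage(words)

# Each set is stored once with its codepage name; within a set every
# character of a single-byte codepage costs exactly one byte.
encoded = {codec: "\n".join(ws).encode(codec) for codec, ws in groups.items()}

for codec, blob in encoded.items():
    print(codec, len(blob), groups[codec])
```

The only overhead is one codepage tag per set, instead of paying the multi-byte cost on every character.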



