There is a similar, great project here [1] that uses the Hungarian Wikipedia corpus. It's a good workout for non-English text and possibly non-ASCII operations.

The performance of Java there is super impressive. It should port relatively quickly to this file too...

[1] https://github.com/juditacs/wordcount




Great, it would be nice if the OP included Java.



