Looking at your code, you're processing the actual word in a Finnish-specific way. If you'd like to generalise it, you can also just check whether a definition contains `x of y`, where y is a link, and automatically dereference to `y`.
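A rough sketch of that `x of y` dereferencing, in Python. The entry shape (word mapped to a list of gloss/links pairs), the sample data, and the `dereference` name are all assumptions for illustration, not how either project actually stores or resolves entries.

```python
import re

# Hypothetical entry shape: word -> list of (gloss, linked words) pairs.
ENTRIES = {
    "taloissa": [("inessive plural of talo", ["talo"])],
    "talo": [("house", [])],
}

# Matches a gloss that ends in "of <target>", e.g. "inessive plural of talo".
FORM_OF = re.compile(r"\bof (?P<target>[^\s,;.]+)[.;]?$")

def dereference(word, entries, max_depth=5):
    """Follow "x of y" glosses to y, as long as y is also a link in the entry."""
    seen = set()
    for _ in range(max_depth):
        if word in seen:  # guard against entries that point at each other
            break
        seen.add(word)
        target = None
        for gloss, links in entries.get(word, []):
            match = FORM_OF.search(gloss)
            if match and match.group("target") in links:
                target = match.group("target")
                break
        if target is None:  # no form-of gloss: treat this as the base entry
            break
        word = target
    return word
```

Requiring the target to appear among the links is what keeps ordinary glosses that merely end in "of something" from being followed.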
Ha, what an honor to finally be noticed by Nuenki guy. Big fan of the project.
I did consider doing something Wiktionary-centric, and in fact have a Wiktionary JSONL scrape lying around courtesy of https://kaikki.org/ (from the same guy who created SSH!). `tsk` does something similar to your dereferencing when it hits a "go deeper" phrase.
I decided against that approach in favor of the libvoikko spell checker because Finnish lies in this interesting zone of being an agglutinative language with a really, really regularized orthography. People love their neologisms here, and unfortunately most of them aren't catalogued in Wiktionary quite yet. I've found the mechanistic approach covers a lot of those edge cases well.
Take the word junttihenkiseni - the root form is junttihenkinen, but as of 05/06/2025 https://en.wiktionary.org/wiki/junttihenkinen does not actually exist. So `tsk` will have no data for it, but `finstem` with its mechanical approach works just fine. The word was originally coined to refer to Finland's unique spin on rock music in the 1970s and 80s.
On a broader level, if I can avoid hitting the network with small personal projects like this, I do try to. For example, `tsk` comes bundled with every Finnish word that has an English dictionary entry in Wiktionary, in a ~25 MB JSONL embed, and that allows us to build the randomly pruning trie that lets us get instantaneous prefix search across such a large space of things. I have met a lot of people who want to move to Finland from places where Internet is a sparse and valuable commodity, and I think their lives are much improved by having a tool they can just download one time and then use any place their laptop can be powered on.
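To make the offline prefix-search idea concrete, here is a minimal Python sketch of a plain trie built from JSONL lines. It is not `tsk`'s implementation: the `word` and `gloss` field names and the sample data are assumptions, and the random pruning is left out entirely.

```python
import json

# Hypothetical JSONL sample; the real embed and its field names may differ.
SAMPLE_JSONL = """\
{"word": "talo", "gloss": "house"}
{"word": "talvi", "gloss": "winter"}
{"word": "koira", "gloss": "dog"}
"""

class Trie:
    def __init__(self):
        self.children = {}  # next character -> child node
        self.gloss = None   # set only on nodes that end a word

    def insert(self, word, gloss):
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.gloss = gloss

    def with_prefix(self, prefix, limit=10):
        """Return up to `limit` (word, gloss) pairs that start with `prefix`."""
        node = self
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        results = []
        stack = [(prefix, node)]
        while stack and len(results) < limit:
            text, current = stack.pop()
            if current.gloss is not None:
                results.append((text, current.gloss))
            for ch, child in current.children.items():
                stack.append((text + ch, child))
        return results

def build_trie(jsonl_text):
    trie = Trie()
    for line in jsonl_text.splitlines():
        if line.strip():
            entry = json.loads(line)
            trie.insert(entry["word"], entry["gloss"])
    return trie

trie = build_trie(SAMPLE_JSONL)
```

Calling `trie.with_prefix("tal")` walks only the subtree under `tal`, so lookup cost depends on the prefix and the number of results rather than on the size of the whole dictionary, and nothing touches the network.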
And yeah, I used the same data source, just heavily processed. It's a great project!
I like your approach. I wish English used neologisms more; I use them occasionally, and it's just quite fun to create a new, lexically valid word to describe something novel.
The Nuenki dictionary database is a bit over a gigabyte, albeit with ~30 languages in it, so yeah that's definitely a bonus! JSONL probably compresses quite well, too. I added compression to Nuenki's (serialised-struct) dictionary entries a while back and it reduced the size by about 30%, iirc.
Feel free to use it if you like, it's practically free, and open source if you'd like to host it yourself: https://github.com/Alex-Programs/nuenki-dictionary