Hacker News

Why didn't they just download the dumps via https://dumps.wikimedia.org/enwiktionary/ (as explained in https://en.wiktionary.org/wiki/Help:FAQ#Downloading_Wiktiona...)?

Scraping, even via an API, is way less efficient IMHO.
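To illustrate the dump route: the real pages-articles dump is a multi-gigabyte bz2-compressed MediaWiki XML export, so the usual approach is to stream it rather than load it whole. A minimal sketch with only the stdlib, using a tiny in-memory stand-in for the dump (the element names follow the MediaWiki export shape, with the XML namespace omitted for brevity):

```python
import bz2, io
import xml.etree.ElementTree as ET

# Tiny stand-in for an enwiktionary pages-articles dump (the real file has
# the same page/revision/text shape, plus a namespace declaration).
sample_xml = b"""<mediawiki>
  <page>
    <title>subbureau</title>
    <revision><text>==English==</text></revision>
  </page>
</mediawiki>"""
dump = io.BytesIO(bz2.compress(sample_xml))

titles = []
# Stream pages one at a time instead of loading the whole dump into memory.
with bz2.open(dump) as f:
    for event, elem in ET.iterparse(f, events=("end",)):
        if elem.tag == "page":
            titles.append(elem.findtext("title"))
            elem.clear()  # free already-processed pages

print(titles)  # ['subbureau']
```

The same loop scales to the full dump by passing the downloaded file's path to `bz2.open` instead of the in-memory buffer.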




They’re in wikitext, which looks to be considerably less semantic than the crawled data. I’m not sure that’s the reason, but it could be a reason.


I'd say that's not the reason, since the wikitext is pretty semantic. The wiki source of https://en.wiktionary.org/wiki/subbureau#English is:

  ==English==

  ===Etymology===
  {{prefix|en|sub|bureau}}

  ===Noun===
  {{en-noun|s|subbureaux}}

  # A [[district]]-level public security bureau in [[China]].
So as long as one can parse wikitext, it's split up pretty well!
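For the flat snippet above, even stdlib regexes pull out the structure (a real parser such as mwparserfromhell would be more robust for nested templates); a minimal sketch:

```python
import re

# The wikitext for "subbureau" quoted above.
wikitext = """\
==English==

===Etymology===
{{prefix|en|sub|bureau}}

===Noun===
{{en-noun|s|subbureaux}}

# A [[district]]-level public security bureau in [[China]].
"""

# Section headings: ==Title== (heading level = number of '=' signs).
headings = re.findall(r"^(=+)\s*(.*?)\s*\1$", wikitext, re.MULTILINE)

# Templates: {{name|param|...}} (this snippet has no nested templates).
templates = [m.split("|") for m in re.findall(r"\{\{(.*?)\}\}", wikitext)]

print(headings)   # [('==', 'English'), ('===', 'Etymology'), ('===', 'Noun')]
print(templates)  # [['prefix', 'en', 'sub', 'bureau'], ['en-noun', 's', 'subbureaux']]
```

So the etymology (`prefix`) and the headword line (`en-noun`, with the plural `subbureaux`) fall out as labeled fields rather than free text.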



