
From the article: "I landed on the freeDictionary API that uses the Wiktionary as a source."



Why didn't they just download the dumps from https://dumps.wikimedia.org/enwiktionary/ (as explained in https://en.wiktionary.org/wiki/Help:FAQ#Downloading_Wiktiona...)?

Scraping, even via an API, is way less efficient imho.
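
For illustration, a minimal sketch of reading a dump as a stream rather than fetching entries one request at a time. It assumes the dump is a bz2-compressed MediaWiki XML export with <page>, <title> and <text> elements; the helper names (local_name, iter_pages) and the tiny in-memory sample are made up for the example, not taken from the article.

```python
import bz2
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that the export XML puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield (title, wikitext) pairs from a MediaWiki XML export stream."""
    title, text = None, None
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, None
            elem.clear()  # drop the finished page's children to limit memory use


# Hypothetical stand-in for a real dump: one page, compressed in memory.
sample = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>subbureau</title>
    <revision><text>==English==</text></revision>
  </page>
</mediawiki>"""

pages = list(iter_pages(bz2.open(io.BytesIO(bz2.compress(sample)))))
print(pages)
```

With a real download, the in-memory sample would be replaced by something like bz2.open(path_to_dump), where the path is whatever file was fetched from the dumps site.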


They’re in wikitext, which looks to be considerably less semantic than the crawled data. I’m not sure that’s the reason, but it could be a reason.


I'd say that's not the reason, since the wikitext is pretty semantic. The wiki source of https://en.wiktionary.org/wiki/subbureau#English is:

  ==English==

  ===Etymology===
  {{prefix|en|sub|bureau}}

  ===Noun===
  {{en-noun|s|subbureaux}}

  # A [[district]]-level public security bureau in [[China]].
So as long as one can parse wikitext, it's split up pretty well!
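
To make that concrete, here is a rough sketch that pulls headings, templates and definition lines out of the snippet above using only regular expressions. It is not a real wikitext parser: it assumes templates are not nested and that definitions are lines starting with "# ", which holds for this entry but not for Wiktionary in general. The function name parse_entry and the dict layout are invented for the example.

```python
import re

WIKITEXT = """\
==English==

===Etymology===
{{prefix|en|sub|bureau}}

===Noun===
{{en-noun|s|subbureaux}}

# A [[district]]-level public security bureau in [[China]].
"""

HEADING = re.compile(r"^(=+)\s*(.*?)\s*\1\s*$")       # ==Title== / ===Title===
TEMPLATE = re.compile(r"\{\{([^{}]*)\}\}")            # {{name|arg|arg}}, no nesting
LINK = re.compile(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]")  # [[target]] or [[target|label]]


def parse_entry(wikitext):
    """Group templates and definition lines under the heading they follow."""
    sections = []
    current = None
    for line in wikitext.splitlines():
        match = HEADING.match(line)
        if match:
            current = {
                "level": len(match.group(1)),
                "title": match.group(2),
                "templates": [],
                "definitions": [],
            }
            sections.append(current)
        elif current is not None:
            for body in TEMPLATE.findall(line):
                current["templates"].append(body.split("|"))
            if line.startswith("# "):
                # Replace each link with its visible text.
                current["definitions"].append(LINK.sub(r"\1", line[2:]))
    return sections


sections = parse_entry(WIKITEXT)
for section in sections:
    print(section)
```

For anything beyond a toy like this (nested templates, multi-line templates, HTML comments), a dedicated parser such as mwparserfromhell is probably the safer route.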



