> I ended up tokenizing Wikipedia for a blacklist and still missed a lot :(

That sounds like an impressive project in itself :)




Words not on Wikipedia but found in other sources, listed by frequency (perhaps with a date-weighting of the source documents to down-weight older sources), would be an interesting way to find holes in Wikipedia's coverage.
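
For what it's worth, here's a rough sketch of that idea in Python. It assumes you already have a Wikipedia-derived vocabulary set and some dated source documents; the function names, tokenizer, and the exponential half-life weighting are just my own illustration, not anything the commenters specified.

    # Sketch: words appearing in other corpora but not in a Wikipedia-derived
    # vocabulary, ranked by recency-weighted frequency. Corpus format and the
    # exponential date-weighting scheme are assumptions for illustration.
    import re
    from collections import Counter
    from datetime import date

    TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")

    def tokenize(text):
        return TOKEN_RE.findall(text.lower())

    def candidate_words(wiki_vocab, documents, half_life_days=365.0):
        """Score words absent from wiki_vocab; newer documents count more."""
        today = date.today()
        scores = Counter()
        for text, published in documents:
            age_days = (today - published).days
            weight = 0.5 ** (age_days / half_life_days)  # older docs count less
            for word in tokenize(text):
                if word not in wiki_vocab:
                    scores[word] += weight
        return scores.most_common()

    # Toy usage:
    wiki_vocab = set(tokenize("the cat sat on the mat"))
    docs = [("a newly coined slang word shows up here", date(2024, 6, 1)),
            ("an old zine about cats", date(2001, 1, 1))]
    for word, score in candidate_words(wiki_vocab, docs)[:10]:
        print(word, round(score, 3))

In practice you'd presumably build wiki_vocab by tokenizing a Wikipedia dump with the same tokenizer, so both sides of the comparison agree on what counts as a word.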


Someone should make a Wikipedia page of that list. Oh, wait.


I like how you had the information, made a sarcastic comment about it, but didn't share the actual information ... just in case your comment might prove helpful ...


Are you saying the URL of that Wikipedia page is “actual information” that patrickthebold failed to share?

I don’t think that page exists. patrickthebold wasn’t sarcastically mocking people who were too lazy to look it up. He was just making the point that such a list would be self-defeating: as soon as it was uploaded to Wikipedia, those words would become words found on Wikipedia and would have to be removed from the list.
