Hacker News
A community effort to extract structured information from Wikipedia (dbpedia.org)
14 points by durana on March 21, 2009 | 2 comments


I like Wikipedia less and less. It contains lots of errors and copied content (the latter including material from my own site). To be fair, there are people who remove material that breaches copyright, but what often happens then is that the content gets re-phrased (usually not very well).

Why not use Google to look for sources that are further upstream?

Wikipedia is good in some areas (geeky topics especially), but outside those areas I tend to avoid it.


freebase.com also extracts WP, and it can now return its results as RDF.

dbpedia's RDF is somewhat easier to work with, but freebase results contain more sources than just WP, you can edit the data with a nice gui on freebase.com, and freebase spends a lot of effort tracking topics between updates. With dbpedia, I think you just get a snapshot of WP, even if that means your URIs from last week are dead.

dbpedia is open source; the freebase extractor is not.

The #2 use case example from dbpedia (use it to put WP data on your pages) is a big focus of freebase.com, and they have a bunch of tools to make that easy.
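
For anyone curious what "working with dbpedia's RDF" looks like in practice, here is a minimal sketch in Python that queries DBpedia's public SPARQL endpoint (http://dbpedia.org/sparql) for facts about a Wikipedia-derived resource. The endpoint and the Berlin resource URI are DBpedia's own; the variable names and result limit are just illustrative.

    import json
    import urllib.parse
    import urllib.request

    ENDPOINT = "http://dbpedia.org/sparql"  # DBpedia's public SPARQL endpoint

    # Ask for ten property/value pairs attached to the Berlin resource,
    # which DBpedia extracted from the Wikipedia article on Berlin.
    query = """
    SELECT ?property ?value WHERE {
      <http://dbpedia.org/resource/Berlin> ?property ?value .
    } LIMIT 10
    """

    req = urllib.request.Request(
        ENDPOINT + "?" + urllib.parse.urlencode({"query": query}),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(req) as resp:
        results = json.load(resp)

    for binding in results["results"]["bindings"]:
        print(binding["property"]["value"], "->", binding["value"]["value"])

Run as-is it prints ten property/value pairs for Berlin; swapping the resource URI pulls data for any other topic DBpedia has extracted from Wikipedia.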



