
DBpedia is data scraped out of Wikipedia and made available as RDF, queryable via SPARQL (and as JSON). The scraping process is... okay. Sometimes it is great, sometimes it is really crappy.
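(For the curious, here's roughly what querying it looks like: a minimal sketch in Python with the requests library against DBpedia's public endpoint at https://dbpedia.org/sparql. The dbo: prefix is one of the namespaces that endpoint predefines, but whether a given resource actually carries a property like populationTotal depends on how well the extraction went.)

    import requests

    # Ask DBpedia's public SPARQL endpoint for Berlin's population.
    # dbo:populationTotal is one of the properties the extraction
    # process pulls out of the Wikipedia infobox.
    query = """
    SELECT ?population WHERE {
      <http://dbpedia.org/resource/Berlin> dbo:populationTotal ?population .
    }
    """

    resp = requests.get(
        "https://dbpedia.org/sparql",
        params={"query": query, "format": "application/sparql-results+json"},
    )
    for row in resp.json()["results"]["bindings"]:
        print(row["population"]["value"])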

Wikidata is intended to be the database behind Wikipedia. The current infoboxes that show you things like the population of cities should, ideally, at some point be driven directly from Wikidata. Wikidata then becomes a home for disparate data, often imported directly from government and other official sources. US government census data, for example, could be routinely imported into Wikidata, and the Wikipedia infoboxes driven from that.
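(As a sketch of what "driven from Wikidata" could mean mechanically: every item's data is already published as JSON via Special:EntityData, so a template could resolve a figure like population at render time. Q64 is Berlin and P1082 is the population property; the printed value is only an example.)

    import requests

    # Fetch the Wikidata item for Berlin (Q64) and read its population
    # claim (property P1082) from the public per-item JSON dump.
    entity = requests.get(
        "https://www.wikidata.org/wiki/Special:EntityData/Q64.json"
    ).json()["entities"]["Q64"]

    claim = entity["claims"]["P1082"][0]
    print(claim["mainsnak"]["datavalue"]["value"]["amount"])  # e.g. "+3677472"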

It may also at some point lead to the creation of an alternative to the current category system on Wikipedia. Wikipedia currently has policies restricting how finely people can be categorised. For instance, we might have a category called "British physicists", a category called "Jewish physicists" and a category called "LGBT physicists", but because it would be too difficult to maintain, Wikipedia doesn't have a "British Jewish LGBT physicists" category. See http://enwp.org/WP:OVERCAT

What Wikidata means is that we might be able to get rid of the category system, or rather have a category system where the categories are based on Wikidata properties. So instead of having all those categories, you could have a faceted navigation system where you say "Show me all the scientists, now show me all the British scientists, now show me all the physicists, now show me all the women physicists" and so on. And you could facet on any property you choose, not just the ones that Wikipedia's category editors think are important.
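(A minimal sketch of that faceting against Wikidata's SPARQL query service at https://query.wikidata.org. Each triple is one facet; the IDs used here (P31 instance-of, Q5 human, P106 occupation, Q169470 physicist, P21 sex or gender, Q6581072 female, P27 citizenship, Q145 United Kingdom) are the real ones.)

    import requests

    # Each triple below is one "facet": add or drop triples to narrow
    # or widen the result set, with no hand-maintained category needed.
    query = """
    SELECT ?person ?personLabel WHERE {
      ?person wdt:P31  wd:Q5       .  # instance of: human
      ?person wdt:P106 wd:Q169470  .  # occupation: physicist
      ?person wdt:P21  wd:Q6581072 .  # sex or gender: female
      ?person wdt:P27  wd:Q145     .  # citizenship: United Kingdom
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
    }
    LIMIT 20
    """

    resp = requests.get(
        "https://query.wikidata.org/sparql",
        params={"query": query},
        headers={"Accept": "application/sparql-results+json"},
    )
    for row in resp.json()["results"]["bindings"]:
        print(row["personLabel"]["value"])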

This also gets rid of a whole load of politics around categories: there was a big storm a while back when someone decided to split up the "American novelists" category into "American men novelists" and "American women novelists", and some of the latter decided that this was rather a demotion. Eventually, Wikidata may end up powering the replacement for that, enabling readers to find what they want without editors having to make contentious judgment calls like that.

Where Wikidata becomes quite interesting is that, because it ends up being used by Wikipedia, there's some real motivation to get it right and keep it up to date. It's all too easy for projects like Freebase to import once and forget. But if it is used as the basis for a public-facing project like Wikipedia, there's hopefully rather more pressure to get it right. (Obviously, how much you trust that depends rather on how much you trust Wikipedia to get things right.)

It also may end up being the centre node for pointers between databases: because Wikipedia is a reasonably good collection of everything (or rather, everything that a few people at some point decided to write about in a book, which is a reasonably low barrier), it becomes a fairly good central index for pointers to other data sources. Bibliographic and authority-control databases like GND, BNB and VIAF are already being merged into Wikidata, as are pointers to the identifiers used in some specialist scientific databases. There'll be plenty more where that came from.
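(What that central index looks like in practice, sketched with the per-item JSON again: one item carries pointers into several external authority files. P214 is the VIAF identifier property and P227 the GND one, both real properties; Q42 is Douglas Adams, picked as an arbitrary well-linked item.)

    import requests

    # One Wikidata item, several external authority-file identifiers.
    entity = requests.get(
        "https://www.wikidata.org/wiki/Special:EntityData/Q42.json"
    ).json()["entities"]["Q42"]

    for prop, name in [("P214", "VIAF"), ("P227", "GND")]:
        for claim in entity["claims"].get(prop, []):
            print(name, claim["mainsnak"]["datavalue"]["value"])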




Wikidata has a number of structural problems of its own, though, at least when it comes to interacting with the Wikipedia projects it aims to serve.

The model for interlinks connecting different languages assumes a 1:1 correspondence between articles and concepts, but the Wikipedias for each language are structured differently: a given article can document several concepts, and a single concept can be spread across several articles.
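(You can see the constraint directly in the entity JSON: an item holds at most one sitelink per wiki, keyed by site, so "two articles for one concept" or "one article for two concepts" can't be expressed there. A quick illustration, using Q42 as an arbitrary item.)

    import requests

    # sitelinks is a dict keyed by wiki ("enwiki", "dewiki", ...);
    # each wiki gets exactly one title, hence the 1:1 assumption.
    sitelinks = requests.get(
        "https://www.wikidata.org/wiki/Special:EntityData/Q42.json"
    ).json()["entities"]["Q42"]["sitelinks"]

    print(sitelinks["enwiki"]["title"])  # "Douglas Adams"
    print(len(sitelinks))                # one entry per wiki, never two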

Also, I have the impression that the community of Wikidatans is averse to getting their precious data pool muddied with inconsistent, ambiguous and untidy content. That's understandable, but it means there will be friction whenever the larger community tries to capture knowledge in a distributed way, without following a single well-defined standard. Things can get messy fast, and the conversations I've followed at Wikidata suggest that the project maintainers are likely to put up significant opposition to the "quick & dirty" way of doing things that collaborative editing requires.

I see on your Wikidata user page that you have all the needed principles right (pragmatism over theoretical purity, usage over hypothetical cases, design for humans first), but the history of the project doesn't seem to follow them well in the areas where it has gone live and been used by outsiders.


Yeah, generally the Wikidata folk are taking things quite slowly. Start slow and get simple things right. The software is evolving slowly too.

I'm quietly confident that it might all work out, but it is a bit too early to tell.



