This is a big shame. Freebase was by far the most consistent of the open data graphs.
With my cynic hat on, I think they're being forced by a larger strategy. The existing Freebase dumps are far too useful to a would-be Google competitor, and I suspect the Knowledge Graph API will be somewhat more restrictive in what you're allowed to do with it.
The thing is, the more you get into this stuff, the more you envy Facebook's position, where people give them structured data on a plate.
Part of the value was that the dumps were updated weekly.
Say you're using it to get a list of named entities (proper nouns). The purpose is to cluster news stories about a given entity (if you look at Facebook's Trending News, each headline begins with a proper noun followed by a blurb. Not sure if they use Freebase, but it could be a useful input).
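Something like this toy sketch, assuming you already have a set of entity names pulled from a Freebase or Wikidata dump; the headlines and the KNOWN_ENTITIES set below are made up purely for illustration:

```python
from collections import defaultdict

# Hypothetical set of entity names taken from a knowledge-graph dump.
KNOWN_ENTITIES = {"Apple", "NASA", "San Francisco", "Taylor Swift"}

headlines = [
    "Apple unveils new MacBook lineup",
    "NASA delays next launch window",
    "Apple faces antitrust probe in the EU",
    "San Francisco approves new housing plan",
]

def leading_entity(headline):
    """Return the longest known entity the headline starts with, if any."""
    matches = [e for e in KNOWN_ENTITIES if headline.startswith(e)]
    return max(matches, key=len) if matches else None

# Group stories under the entity that leads the headline.
clusters = defaultdict(list)
for h in headlines:
    entity = leading_entity(h)
    if entity:
        clusters[entity].append(h)

for entity, stories in clusters.items():
    print(entity, "->", stories)
```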
The value of Freebase will decline over time as the content becomes out of date.
The Wikidata dumps are also updated weekly; see [1].
Wikidata RDF exports are made every two months or so from those dumps and are available at [2]. I imagine that frequency will pick up. You can generate your own RDF exports using the Wikidata Toolkit [3, 4].
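If you'd rather not run the Java Toolkit, the JSON dump can also be walked directly. A rough Python sketch, not the Toolkit itself, assuming the dump is still the usual one-entity-per-line JSON array and that the path below points at a local copy:

```python
import gzip
import json

DUMP_PATH = "wikidata-all.json.gz"  # hypothetical local path to a dump file

def entities(path):
    """Yield one entity dict per line of the dump, skipping the array brackets."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip().rstrip(",")
            if not line or line in ("[", "]"):
                continue
            yield json.loads(line)

# Emit crude N-Triples for item-valued claims only.
for entity in entities(DUMP_PATH):
    subject = "<http://www.wikidata.org/entity/%s>" % entity["id"]
    for prop, claims in entity.get("claims", {}).items():
        for claim in claims:
            datavalue = claim["mainsnak"].get("datavalue", {})
            if (datavalue.get("type") == "wikibase-entityid"
                    and datavalue["value"].get("entity-type") == "item"):
                target = "Q%d" % datavalue["value"]["numeric-id"]
                print(subject,
                      "<http://www.wikidata.org/prop/direct/%s>" % prop,
                      "<http://www.wikidata.org/entity/%s> ." % target)
```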
I'm sorry if I missed this, but what will happen to the Freebase software? I always found that to be a strong asset, but Wikidata's is less than stellar. They only mention the data and APIs here.
@markbao -- I did get Cayley out the door (http://github.com/google/cayley) which has many many parallels to Freebase's graph database, loading and storing Freebase data (as well as whatever other graph data). I've been a little busy with other bits of life at the moment, but totally open for contributions!
@markbao, I echo your concerns. Behaviour is just as important as data. In many modern systems, a lot of the behaviour is tied up in the user interface.
Has anybody here used DBpedia? If so, how do you see it in relation to Wikidata? Do the projects overlap? Or might they serve complementary purposes, with, say, DBpedia extracting data from Wikidata (rather than from Wikipedia directly)?
DBpedia is data scraped out of Wikipedia and made available as RDF and JSON, with a public SPARQL endpoint. The scraping process is... okay. Sometimes it is great, sometimes it is really crappy.
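For example, you can poke at what that scraping produced through the public endpoint; a small sketch, assuming dbo:City and dbo:populationTotal are still the ontology terms in use:

```python
import json
import urllib.parse
import urllib.request

# Ask DBpedia's public SPARQL endpoint for a few cities and the population
# values extracted from their Wikipedia infoboxes.
query = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?city ?population WHERE {
  ?city a dbo:City ;
        dbo:populationTotal ?population .
}
LIMIT 5
"""

url = "https://dbpedia.org/sparql?" + urllib.parse.urlencode({
    "query": query,
    "format": "application/sparql-results+json",
})

with urllib.request.urlopen(url) as response:
    results = json.load(response)

for row in results["results"]["bindings"]:
    print(row["city"]["value"], row["population"]["value"])
```

The quality caveat shows up quickly: some cities come back with stale or oddly formatted population figures, depending on how cleanly the source infobox parsed.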
Wikidata is intended to be the database behind Wikipedia. The current infoboxes that show you things like the population of cities—ideally, they'll be driven at some point directly from Wikidata. Then Wikidata can be a place for disparate data to be placed, often directly from government and other official data sources. The US government census data would just be routinely imported into Wikidata, and then the Wikipedia infoboxes would be driven from that.
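To make that concrete, here is roughly what reading one of those infobox values straight out of Wikidata looks like via the standard wbgetentities API; a sketch, assuming Q62 is the San Francisco item and P1082 the population property:

```python
import json
import urllib.request

# Fetch the claims on the Wikidata item for San Francisco (Q62).
url = ("https://www.wikidata.org/w/api.php"
       "?action=wbgetentities&ids=Q62&props=claims&format=json")

with urllib.request.urlopen(url) as response:
    data = json.load(response)

# P1082 is the "population" property; take one of its statements.
claims = data["entities"]["Q62"]["claims"]
population = claims["P1082"][0]["mainsnak"]["datavalue"]["value"]["amount"]
print("A population value for San Francisco:", population)
```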
It may also at some point lead to the creation of an alternative to the current category system on Wikipedia. Wikipedia currently has policies around categorisation of people. For instance, we might have a category called "British physicists", and a category called "Jewish physicists" and a category called "LGBT physicists" but because it would be too difficult to maintain, Wikipedia doesn't have a "British Jewish LGBT physicists" category. See http://enwp.org/WP:OVERCAT
What Wikidata means is we might be able to get rid of the category system, or rather have a category system where the categories are based on Wikidata properties. So instead of having all those categories, you could have a faceted navigation system where you say "Show me all the scientists, now show me all the British scientists, now show me all the physicists, now show me all the women physicists" etc. etc. And you could pick any property you choose, not just the ones that Wikipedia category editors think are important.
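A toy sketch of that kind of property-based faceting, with a handful of hand-written records standing in for Wikidata statements (no real property IDs involved):

```python
# Each dict stands in for an entity's statements; each facet narrows the set.
people = [
    {"name": "Ada Lovelace", "occupation": "mathematician", "country": "UK", "gender": "female"},
    {"name": "Paul Dirac", "occupation": "physicist", "country": "UK", "gender": "male"},
    {"name": "Jocelyn Bell Burnell", "occupation": "physicist", "country": "UK", "gender": "female"},
    {"name": "Chien-Shiung Wu", "occupation": "physicist", "country": "US", "gender": "female"},
]

def facet(entities, **filters):
    """Keep only the entities whose properties match every requested facet."""
    return [e for e in entities if all(e.get(k) == v for k, v in filters.items())]

# "Show me all the physicists, now the British ones, now the women..."
physicists = facet(people, occupation="physicist")
british_physicists = facet(physicists, country="UK")
british_women_physicists = facet(british_physicists, gender="female")

print([p["name"] for p in british_women_physicists])
```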
This also gets rid of a whole load of politics around categories: there was a big storm a while back when someone decided to split up the "American novelists" category into "American men novelists" and "American women novelists", and some of the latter decided that this was rather a demotion. Eventually, Wikidata may end up powering the replacement for that, enabling readers to find what they want without editors having to make contentious judgment calls like that.
Where Wikidata becomes quite interesting is that because it ends up being used by Wikipedia, there's some kind of motivation to get it right and keep it up-to-date. It's all too easy for projects like Freebase to import-once-and-forget. But if it is used as the basis for a public-facing project like Wikipedia, there's hopefully some more pressure to get it right. (Obviously, how much you trust that is rather dependent on how much you trust Wikipedia to get things right.)
It also may end up being the centre node for pointers between databases: because Wikipedia is a reasonably good collection of everything (or, everything that a few people at some point decided to write about in a book, which is a reasonably low barrier), it becomes a fairly good central index for pointers to other data sources. Bibliographic/authority control databases like GND, BNB, VIAF and others are already being merged into Wikidata, as are pointers to the identifiers used in some specialist scientific databases. There'll be plenty more where that came from.
Wikidata has a number of structural problems of its own, though, at least when it comes to interacting with the Wikipedia projects it aims to serve.
The model for interlinks connecting different languages assumes a 1:1 correspondence between articles and concepts, although the Wikipedias for each language have different structures, and a given article can document several concepts.
Also, I have the impression that the community of Wikidatans is averse to getting their precious data pool muddied with inconsistent, ambiguous and untidy content. That's understandable, but it means there will be friction whenever the larger community tries to capture knowledge in a distributed way, without following a single well-defined standard. Things can get messy fast, and the conversations I've followed at Wikidata suggest that the project maintainers are likely to put up significant opposition to doing things in the "quick & dirty" way that collaborative editing requires.
I see on your Wikidata user page that you get all the needed principles right (pragmatism over theoretical purity, usage over hypothetical cases, design for humans first), but the history of the project doesn't seem to follow them well in the areas where it has gone live and been used by outsiders.
See [8] and [9] for an introduction to Wikidata. I have no notable experience with Freebase, but I've been contributing to Wikidata for about 2 years and would be happy to answer any questions I can.
There's a Wikidata UI Redesign in development [1] which should improve the default site's visual appeal.
That said, while the San Francisco Wikidata page may currently be uglier than its Freebase counterpart, it is not slower. webpagetest.org has the Wikidata page fully loaded in 8.8 s and the Freebase page in 11.2 s [2, 3]. And while Reasonator is certainly dog slow (21.2 s to fully load! [4]), its San Francisco page is much more polished than Freebase's.
There will be an attempt to reconcile future contributions.
From Denny Vrandecic, current Google researcher working on the Google Knowledge Graph, former project director of Wikidata [1]:
"Freebase has seen a huge amount of effort go into it since it went public in 2007. It makes a lot of sense to make the results of this work available to Wikidata. But knowing Wikidata and its community a bit, it is obvious that we can not and should not simply upload Freebase data to Wikidata: Wikidata would prefer the data to be referenced to external, primary sources.
"In order to do so, Google will soon start to work on an Open Source tool which will run on Wikimedia labs and which will allow Wikidata contributors to find references for a statement and then upload the statement and the reference to Wikidata. We will release several sets of Freebase data ready for consumption by this tool under a CC0 license. This tool should also work for statements already in Wikidata without sufficient references, or for other datasets, like DBpedia and other machine extraction efforts, etc.
"To make sure we get it right, we invite you to participate in the design and development of this tool here:
I understood that there's a significant quality difference from Freebase: that there are a lot of places in Wikidata showing values where a reference should be. Is that true?
And do you have any way to contact you? (My email is in my profile.)
"The move to Wikidata is a bit ironic, given that some of the data sitting inside of Freebase — including musician genres, album names, and record labels, for instance — originated from pages on Wikipedia, which the nonprofit Wikimedia Foundation hosts. And Googlers understand that."
Doesn't Bing use Freebase for some search result panels? I suppose they'll just transfer over to Wikidata, but it seems funny that Google might have just added a lot of development time to some Bing developers to migrate APIs.
On May 11, 2008, the company unveiled a tool for searching a fixed subset of Wikipedia using conversational phrases rather than keywords.
The natural language processing part of Bing (Powerset) is based on Wikipedia data (scraping the content), but they had a prototype based on Freebase too:
On April 16, 2008: "Powerset demonstrated our integration to Freebase. At one point, a group stood in front of the projected computer and threw out queries to see all of the different Freebase types that Powerset could handle." (source: https://web.archive.org/web/20080430113649/http://blog.power... )
I think this decision is proof of the maturity of the Freebase community.
I've often heard of open source communities splitting, which fragments the effort going into each project and lowers the quality of every product. Doing the opposite here should lead to one great product rather than two "not bad" products competing against each other.