Freebase is closing down; data going to WikiData (groups.google.com)
173 points by chatman on Dec 16, 2014 | 36 comments



This is a big shame. Freebase was by far the most consistent of the open data graphs.

With my cynic hat on, I think they're being forced by a larger strategy. The existing Freebase dumps are far too useful to a would-be Google competitor, and I suspect the Knowledge Graph API will be somewhat more restrictive in what you're allowed to do with it.

The thing is, the more you get into this stuff, the more you envy Facebook's position, where people give them structured data on a plate.


> The existing Freebase dumps are far too useful to a would-be Google competitor

from the article:

> The last Freebase data dump will remain available

and even if it wasn't, I'm sure archive.org will grab a copy (if they haven't already).


Part of the value was that the dumps were updated weekly.

Say you're using it to get a list of named entities (proper nouns), with the goal of clustering news stories about a given entity. (If you look at Facebook's Trending News, each headline begins with a proper noun followed by a blurb. Not sure if they use Freebase, but it could be a useful input.)

The value of Freebase will decline over time as the content becomes out of date.


The Wikidata dumps are also updated weekly; see [1].

Wikidata RDF exports are made from those dumps every two months or so and are available at [2]. I imagine that frequency will pick up. You can also generate your own RDF exports using the Wikidata Toolkit [3, 4], or work with the JSON dumps directly, as in the sketch after the links below.

[1] http://dumps.wikimedia.org/other/wikidata/

[2] http://tools.wmflabs.org/wikidata-exports/rdf/

[3] https://www.mediawiki.org/wiki/Wikidata_Toolkit

[4] https://github.com/Wikidata/Wikidata-Toolkit
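
If you'd rather not pull in the Java toolkit, the JSON dumps are easy to work with directly: each dump is one big JSON array with one entity per line, so you can stream it without loading the whole thing into memory. A minimal Python sketch (the filename is illustrative; pick a real one from [1]):

    import gzip
    import json

    # Stream entities out of a weekly Wikidata JSON dump. The dump is a
    # single JSON array with one entity object per line, so we can parse
    # it line by line instead of loading gigabytes of JSON at once.
    def iter_entities(path):
        with gzip.open(path, 'rt', encoding='utf-8') as f:
            for line in f:
                line = line.strip().rstrip(',')
                if line in ('[', ']', ''):
                    continue  # skip the array brackets and blank lines
                yield json.loads(line)

    for entity in iter_entities('wikidata-dump.json.gz'):  # illustrative name
        labels = entity.get('labels', {})
        if 'en' in labels:
            print(entity['id'], labels['en']['value'])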


> Facebook's position where people give them structured data on a plate.

Do they get any structured data other than user profiles?


Yes: various types of pages, covering virtually anything.


I'm sorry if I missed this, but what will happen to the Freebase software? I always found that to be a strong asset, but Wikidata's is less than stellar. They only mention the data and APIs here.


@markbao -- I did get Cayley out the door (http://github.com/google/cayley), which has many parallels to Freebase's graph database and can load and store Freebase data (as well as any other graph data). I've been a little busy with other bits of life at the moment, but it's totally open for contributions!
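
If you want to kick the tires, Cayley exposes an HTTP query endpoint; roughly something like this should work against a local instance on the default port (the "knows" predicate is just a placeholder for whatever graph data you've loaded):

    import json
    import urllib.request

    # Send a query, written in Cayley's Gremlin-inspired dialect, to a
    # local Cayley instance over its HTTP API (default port 64210).
    query = 'g.V("Paul Graham").Out("knows").All()'  # placeholder predicate

    req = urllib.request.Request(
        'http://localhost:64210/api/v1/query/gremlin',
        data=query.encode('utf-8'))  # POST body is the raw query text
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read().decode('utf-8')))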


@markbao, I echo your concerns. Behaviour is just as important as data. In many modern systems, a lot of the behaviour is tied up in the user interface.


Hah, I read this as Firebase... I was bracing myself for the Google hate.


Did the exact same thing. I have a bunch of personal projects on Firebase and my heart skipped a beat.


Me too. I was so confused.


Has anybody here used DBpedia? If so, how do you see it in relation to Wikidata? Do the projects overlap? Or might they serve complementary purposes, with, say, DBpedia extracting data from Wikidata (rather than from Wikipedia directly)?


We are working on a Wikidata-to-DBpedia mapping. I think we will finish next year. We did some extraction for DBpedia 2014. http://blog.dbpedia.org/2014/09/09/dbpedia-version-2014-rele...


DBpedia is data scraped out of Wikipedia and made available as RDF and JSON, with a SPARQL endpoint. The scraping process is... okay. Sometimes it is great, sometimes it is really crappy.
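
To make that concrete, DBpedia's public SPARQL endpoint lives at http://dbpedia.org/sparql. A small Python sketch (the JSON results format is what the Virtuoso-backed endpoint serves; the query just pulls a city's population as extracted from its infobox):

    import json
    import urllib.parse
    import urllib.request

    # Ask DBpedia's public SPARQL endpoint for San Francisco's population,
    # which was scraped out of the Wikipedia infobox.
    sparql = '''
    SELECT ?population WHERE {
      <http://dbpedia.org/resource/San_Francisco>
          <http://dbpedia.org/ontology/populationTotal> ?population .
    }
    '''
    params = urllib.parse.urlencode({
        'query': sparql,
        'format': 'application/sparql-results+json',
    })
    with urllib.request.urlopen('http://dbpedia.org/sparql?' + params) as resp:
        results = json.loads(resp.read().decode('utf-8'))

    for row in results['results']['bindings']:
        print(row['population']['value'])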

Wikidata is intended to be the database behind Wikipedia. Ideally, the infoboxes that show you things like the population of cities will at some point be driven directly from Wikidata. Wikidata can then be a home for disparate data, often imported directly from government and other official sources: US census data, say, would be routinely imported into Wikidata, and the Wikipedia infoboxes would be driven from that.

It may also at some point lead to the creation of an alternative to the current category system on Wikipedia. Wikipedia currently has policies around categorisation of people. For instance, we might have a category called "British physicists", a category called "Jewish physicists" and a category called "LGBT physicists", but because it would be too difficult to maintain, Wikipedia doesn't have a "British Jewish LGBT physicists" category. See http://enwp.org/WP:OVERCAT

What Wikidata means is we might be able to get rid of the category system, or rather have a category system where the categories are based on Wikidata properties. So instead of having all those categories, you could have a faceted navigation system where you say "Show me all the scientists, now show me all the British scientists, now show me all the physicists, now show me all the women physicists" etc. etc. And you could pick any property you choose, not just the ones that Wikipedia category editors think are important.
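
You can already get a taste of this with Magnus Manske's Wikidata Query (WDQ) tool, the engine behind autolist. A rough Python sketch of a faceted query, assuming WDQ's claim[property:item] syntax and its JSON API (P31 = instance of, Q5 = human; P106 = occupation, Q169470 = physicist; P21 = sex or gender, Q6581072 = female):

    import json
    import urllib.parse
    import urllib.request

    # Faceted query against the WDQ API: items that are human AND
    # physicists AND female, intersected via Wikidata property claims.
    query = 'claim[31:5] AND claim[106:169470] AND claim[21:6581072]'

    url = 'http://wdq.wmflabs.org/api?q=' + urllib.parse.quote(query)
    with urllib.request.urlopen(url) as resp:
        result = json.loads(resp.read().decode('utf-8'))

    print(len(result['items']), 'matching items')
    print(result['items'][:10])  # numeric item ids, e.g. 7186 = Marie Curie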

This also gets rid of a whole load of politics around categories: there was a big storm a while back when someone decided to split up the "American novelists" category into "American men novelists" and "American women novelists", and some of the latter decided that this was rather a demotion. Eventually, Wikidata may end up powering the replacement for that, enabling readers to find what they want without editors having to make contentious judgment calls like that.

Where Wikidata becomes quite interesting is that because it ends up being used by Wikipedia, there's some kind of motivation to get it right and keep it up-to-date. It's all too easy for projects like Freebase to import-once-and-forget. But if it is used as the basis for a public-facing project like Wikipedia, there's hopefully some more pressure to get it right. (Obviously, how much you trust that is rather dependent on how much you trust Wikipedia to get things right.)

It also may end up being the centre node for pointers between databases: because Wikipedia is a reasonably good collection of everything (or, everything that a few people at some point decided to write about in books, which is a reasonably low barrier), it becomes a fairly good central index for pointers to other data sources. Bibliographic/authority control databases like GND, BNB, VIAF and others are already being merged into Wikidata, as have pointers to the identifiers used in some specialist scientific databases. There'll be plenty more where that came from.


Wikidata has a number of structural problems of its own, though, at least when it comes to interacting with the Wikipedia projects it aims to serve.

The model for interlinks connecting different languages assumes a 1:1 correspondence between articles and concepts, even though the Wikipedias for each language are structured differently and a given article can document several concepts.

Also, I have the impression that the community of Wikidatans is averse to getting their precious data pool muddied with inconsistent, ambiguous and untidy content. That's understandable, but it means there will be friction whenever the larger community tries to capture knowledge in a distributed way, without following a single well-defined standard. Things can get messy fast, and the conversations I've followed at Wikidata suggest that the project maintainers will put up significant opposition to doing things in the "quick & dirty" way that collaborative editing requires.

I see on your Wikidata user page that you get all the needed principles right (pragmatism over theoretical purity, usage over hypothetical cases, design for humans first), but the history of the project doesn't seem to follow them well in the areas where it has gone live and been used by outsiders.


Yeah, generally the Wikidata folk are taking things quite slowly. Start slow and get simple things right. The software is evolving slowly too.

I'm quietly confident that it might all work out, but it is a bit too early to tell.


Wikidatan here. Here's a quick comparison of Freebase and Wikidata:

Topics / items:

- Freebase: 46,476,860 [1]

- Wikidata: 12,921,731 [2]

Facts / claims:

- Freebase: 2,696,141,481 [1]

- Wikidata: 50,457,200 as of 2014-11-10 [3]

Instances of person / human:

- Freebase: 3,391,533 [4]

- Wikidata: 2,638,614 [5]

License for data

- Freebase: CC-BY [6]

- Wikidata: CC0 [7]

Data on Paul Graham:

- Freebase: http://www.freebase.com/m/017cm9

- Wikidata: https://www.wikidata.org/wiki/Q92650

Data on San Francisco:

- Freebase: http://www.freebase.com/m/0d6lp

- Wikidata: https://www.wikidata.org/wiki/Q62

Data on Python:

- Freebase: http://www.freebase.com/m/05z1_

- Wikidata: https://www.wikidata.org/wiki/Q28865

Data on APOE / Apolipoprotein E:

- Freebase: http://www.freebase.com/m/0byv2v

- Wikidata: https://www.wikidata.org/wiki/Q14890468 (APOE), https://www.wikidata.org/wiki/Q424728 (Apolipoprotein E)

See [8] and [9] for an introduction to Wikidata. I have no notable experience with Freebase, but I've been contributing to Wikidata for about 2 years and would be happy to answer any questions I can. (If you'd rather poke at the data programmatically, there's a small API sketch after the links below.)

[1] http://www.freebase.com/

[2] https://www.wikidata.org

[3] http://tools.wmflabs.org/wikidata-todo/stats.php

[4] http://www.freebase.com/people/person?instances

[5] http://tools.wmflabs.org/autolist/autolist1.html?q=claim[31:5]

[6] http://www.freebase.com/policies/tos

[7] See bottom of [2]

[8] Up and running with Wikidata: http://www.slideshare.net/_emw/up-and-running-with-wikidata

[9] Introducing Wikidata to the Linked Data Web: http://korrekt.org/papers/Wikidata-RDF-export-2014.pdf
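
As promised, a minimal Python sketch pulling the Paul Graham item (Q92650) through the standard wbgetentities API:

    import json
    import urllib.request

    # Fetch the Paul Graham item (Q92650) from the Wikidata API and list
    # its English label, description, and how many statements it has per
    # property. languages=en trims labels/descriptions to English only.
    url = ('https://www.wikidata.org/w/api.php'
           '?action=wbgetentities&ids=Q92650&format=json&languages=en')
    with urllib.request.urlopen(url) as resp:
        data = json.loads(resp.read().decode('utf-8'))

    entity = data['entities']['Q92650']
    print(entity['labels']['en']['value'])  # Paul Graham
    print(entity.get('descriptions', {}).get('en', {}).get('value', ''))
    for prop, statements in entity['claims'].items():
        print(prop, len(statements))  # e.g. P31 (instance of) 1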


Reasonator is a better way to visualize info from Wikidata:

Data on Paul Graham:

- Freebase: http://www.freebase.com/m/017cm9

- Wikidata: https://tools.wmflabs.org/reasonator/?&q=92650

Data on San Francisco:

- Freebase: http://www.freebase.com/m/0d6lp

- Wikidata: https://tools.wmflabs.org/reasonator/?&q=62

Data on Python:

- Freebase: http://www.freebase.com/m/05z1_

- Wikidata: https://tools.wmflabs.org/reasonator/?&q=28865

Data on APOE / Apolipoprotein E:

- Freebase: http://www.freebase.com/m/0byv2v

- Wikidata: https://tools.wmflabs.org/reasonator/?&q=14890468 (APOE), https://tools.wmflabs.org/reasonator/?&q=424728 (Apolipoprotein E)


So sad. Freebase is way ahead and more polished.

Wikidata originated with Wikimedia Deutschland, the German Wikimedia chapter. The idea is good, but the implementation pales in comparison to Freebase (at the moment).

This is the real San Francisco Wikidata page (slow and ugly): https://www.wikidata.org/wiki/Q62

Reasonator takes ages to load and render the content.


There's a Wikidata UI Redesign in development [1] which should improve the default site's visual appeal.

That said, while the San Francisco Wikidata page may currently be uglier than its Freebase counterpart, it is not slower: webpagetest.org has the Wikidata page fully loaded at 8.8 s and the Freebase page at 11.2 s [2, 3]. And while Reasonator is certainly dog slow (21.2 s to fully load! [4]), its San Francisco page is much more polished than Freebase's.

[1] http://www.wikidata.org/wiki/Wikidata:UI_redesign_input

[2] http://www.webpagetest.org/result/141218_DR_9W4/

[3] http://www.webpagetest.org/result/141218_ZA_9WF/

[4] http://www.webpagetest.org/result/141218_6N_9WK/


Will CC0 and CC-BY data remain separate, or will there be an attempt to reconcile future contributions?


There will be an attempt to reconcile future contributions.

From Denny Vrandecic, a Google researcher currently working on the Knowledge Graph and formerly the project director of Wikidata [1]:

"Freebase has seen a huge amount of effort go into it since it went public in 2007. It makes a lot of sense to make the results of this work available to Wikidata. But knowing Wikidata and its community a bit, it is obvious that we can not and should not simply upload Freebase data to Wikidata: Wikidata would prefer the data to be referenced to external, primary sources.

"In order to do so, Google will soon start to work on an Open Source tool which will run on Wikimedia labs and which will allow Wikidata contributors to find references for a statement and then upload the statement and the reference to Wikidata. We will release several sets of Freebase data ready for consumption by this tool under a CC0 license. This tool should also work for statements already in Wikidata without sufficient references, or for other datasets, like DBpedia and other machine extraction efforts, etc.

"To make sure we get it right, we invite you to participate in the design and development of this tool here:

https://www.wikidata.org/wiki/Wikidata:Primary_sources_tool "

[1] https://lists.wikimedia.org/pipermail/wikidata-l/2014-Decemb...


"Google will soon start to work on an Open Source tool which will run on Wikimedia labs".

Hmm. Don't forget how Knol, their Wikipedia competitor, turned out: http://en.wikipedia.org/wiki/Knol

And now they use Wikipedia data for their Knowledge Graph. If Google open-sourced their Knowledge Graph algorithms, that would be another thing.


I understand there's a significant quality difference from Freebase: there are a lot of places in Wikidata showing values where a reference should be. Is that true?

And is there any way to contact you? (My email is in my profile.)


From the first chart in [1], which gives Wikidata statistics for 2014-11-10:

- Total statements: 50,457,200

- Items with referenced statements: 8,188,516 (49.41%)

- Statements referenced to Wikipedia: 18,614,138 (36.89%)

- Statements referenced to other sources: 7,466,240 (14.80%)

"References" to Wikipedia are obviously frowned about, so the relevant datum here is that 14.80% of Wikidata's 50,457,200 statements have references.

I have no idea what that figure is for Freebase. Anyone know how to find what proportion of statements in Freebase are referenced?

The best way to contact me about this is to leave a note at https://www.wikidata.org/wiki/User_talk:Emw.

[1] http://tools.wmflabs.org/wikidata-todo/stats.php


"The move to Wikidata is a bit ironic, given that some of the data sitting inside of Freebase — including musician genres, album names, and record labels, for instance — originated from pages on Wikipedia, which the nonprofit Wikimedia Foundation hosts. And Googlers understand that."

http://venturebeat.com/2014/12/16/google-plans-to-integrate-...


Gee, didn't see that coming :/ Hopefully Wikidata will be able to serve the community well.


I've been having a look at Wikidata and it looks pretty good.


Doesn't Bing use Freebase for some search result panels? I suppose they'll just transfer over to Wikidata, but it seems funny that Google may have just handed some Bing developers a lot of API-migration work.


Microsoft bought Powerset on July 1, 2008: http://en.wikipedia.org/wiki/Powerset_(company)

On May 11, 2008, the company unveiled a tool for searching a fixed subset of Wikipedia using conversational phrases rather than keywords.

The natural language processing part of Bing (Powerset) is based on Wikipedia data (scraping the content), but they had a prototype based on Freebase too:

On April 16, 2008: "Powerset demonstrated our integration to Freebase. At one point, a group stood in front of the projected computer and threw out queries to see all of the different Freebase types that Powerset could handle." (source: https://web.archive.org/web/20080430113649/http://blog.power... )


I think this decision is proof of the maturity of the Freebase community.

I often hear about open source communities splitting, which fragments the forces working on each project and lowers the quality of each product. Doing the opposite here should lead to one great product rather than two "not bad" products competing against each other.


You'll always be able to query this data in RDF with

http://basekb.com/
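
The Freebase RDF dumps that BaseKB is distilled from are just gzipped N-Triples, so you can also scan them with no triple store at all. A hedged Python sketch, assuming the rdf.freebase.com namespace of the official dumps (BaseKB's layout is similar but check its docs; the filename is illustrative):

    import gzip

    # Scan a gzipped Freebase N-Triples dump for the English name of a
    # topic, here Paul Graham (mid m.017cm9). Object literals may contain
    # spaces, so split each line into at most three fields.
    NS = 'http://rdf.freebase.com/ns/'
    subject = '<' + NS + 'm.017cm9>'
    predicate = '<' + NS + 'type.object.name>'

    with gzip.open('freebase-rdf-latest.gz', 'rt', encoding='utf-8') as f:
        for line in f:
            parts = line.rstrip().rstrip('.').rstrip().split(None, 2)
            if (len(parts) == 3 and parts[0] == subject
                    and parts[1] == predicate and parts[2].endswith('@en')):
                print(parts[2])  # "Paul Graham"@en
                break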


A year ago I developed an Android client for Freebase, but unfortunately I never had the chance to finish or publish it.

I guess now it's too late :D

If anybody is interested, the APK is here:

http://goo.gl/6BxAZJ


What is/was Freebase?


Isn't Freebase essentially what Import.io and Kimono are doing?



