So this seems to be about superimposing the relational model on ad-hoc data? It seems like an idea that has been explored ad infinitum. In its current form, it also seems to be in want of a use case.
Like they say in the article, the main issue they'll run into is that people really don't want to build taxonomies manually. My skeptical prediction: unless they get some really snappy text mining to go with the product, it'll be dead in the water. When you want to mine certain data sets ad hoc, it's usually because you've just figured out you need some specific query answered. Having to spend several hours structuring the data before you run the actual query seems about as laborious as just doing the original work yourself. Though I might be of limited vision, or just missing something fundamental here.
When people have existing data with some structure, they can easily convert this into a semantic Silk site. This can be done using various importer tools (such as our CSV importer) and via the read/write API.
When people are creating new content, we're trying to make it as easy as possible to add semantics/tags. It's as easy as making your text bold and we try to provide instant value to users who tag their content, thereby encouraging them to keep doing so. Some things we do are:
- converting currencies
- showing how long ago a date was
- displaying a map for locations
- showing how data relates to other data (tagging the age of a person will show a list of how old other people in the site are)
So, if you have unstructured data and are looking for one quick answer, Silk may not be the right tool to use. When the data has some structure already, you can easily convert it into a Silk site and then enjoy some Silk magic. But ideally, you use Silk to make some specific content available in a structured manner, so you can use it for a longer period of time.
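To make the "how long ago a date was" item above concrete, here is a minimal sketch in Python. It is not Silk's implementation; the function names `whole_years` and `time_ago`, the day/month/year thresholds, and the output wording are all assumptions made for illustration.

```python
from datetime import date


def whole_years(then: date, today: date) -> int:
    """Count completed calendar years between two dates."""
    years = today.year - then.year
    # Subtract one if this year's anniversary has not happened yet.
    if (today.month, today.day) < (then.month, then.day):
        years -= 1
    return years


def plural(count: int, unit: str) -> str:
    """Format a count with a singular or plural unit."""
    return f"{count} {unit}" if count == 1 else f"{count} {unit}s"


def time_ago(then: date, today: date) -> str:
    """Describe roughly how long before `today` the date `then` was."""
    days = (today - then).days
    if days < 0:
        raise ValueError("date is in the future")
    if days == 0:
        return "today"
    if days < 30:
        return plural(days, "day") + " ago"
    years = whole_years(then, today)
    if years >= 1:
        return plural(years, "year") + " ago"
    # Hypothetical simplification: treat a month as 30 days.
    return "about " + plural(days // 30, "month") + " ago"
```

Passing `today` explicitly instead of calling `date.today()` inside the function keeps the result reproducible, which matters if the same tagged date is rendered on several pages.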
Ok, very cool. I might misunderstand the use case then.
Do you do any sort of history-based analysis? E.g. solving "I just imported this, which I tagged with a,b,c from locations x,y,z, now I want to import this, which has d,e,f also from locations x,y,z" - I guess Wikipedia's infoboxes would be a typical example of that kind of semi-structured data. Or does that require setting up an import channel yourself?
The http://world.silkapp.com site was actually built that way. It's a combination of information from Wikipedia & the CIA World Factbook. We built a MediaWiki importer for this, which will be publicly available at some point. Also, some of the functionality may make it into the core product, but we'll have to see how this develops over time.
Silk does use pattern recognition to automatically infer data types already, so you won't have to specify that something is a location, date, currency, number, etc. The editor will find out itself. We are working hard to bring this kind of 'cleverness' to more and more aspects of the editor.
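As a rough illustration of what pattern-based type inference can look like, here is a minimal sketch in Python. It is not how Silk's editor works; the regular expressions, the accepted date formats, and the function name `infer_type` are all assumptions. Locations are left out because recognising them generally needs a gazetteer lookup rather than a pattern.

```python
import re
from datetime import datetime

# Hypothetical patterns; a real editor would need many more cases.
CURRENCY_RE = re.compile(r"^[$€£]\s?\d[\d,]*(\.\d+)?$")
NUMBER_RE = re.compile(r"^-?\d[\d,]*(\.\d+)?$")
DATE_FORMATS = ("%Y-%m-%d", "%d %B %Y", "%B %d, %Y")


def infer_type(value: str) -> str:
    """Guess whether a string is a currency, number, date, or plain text."""
    text = value.strip()
    if CURRENCY_RE.match(text):
        return "currency"
    if NUMBER_RE.match(text):
        return "number"
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(text, fmt)
            return "date"
        except ValueError:
            continue  # Not this format; try the next one.
    return "text"
```

The checks run from most to least specific, so a value such as `$1,200.50` is classified as a currency before the plain-number pattern gets a chance to reject it.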
Seems like they're stuck on the same problems that have always killed the Semantic Web: manual metadata, and incompatible schemas. Without a solution to either, this is no different from what we had in 2003.
As for incompatible schemas, we're not aiming to build one big ontology for the world, but we are aiming to facilitate a large number of datasets that are useful by themselves. We're working towards being able to link various Silk sites to each other and will try to find ways to make as many schemas as possible fit together.
Not to be confused with Jason Rohrer's silk, which is an actually creative solution to a much more core problem, is open-source, and not driven by venture-capital segway hypewaves: http://hypertext.sourceforge.net/silk/
To recap, 'semantics' is the study of meaning. Humans derive meaning from linguistic symbols by comparing them to their prior experiences. The basic question is "Has this symbol, or a closely associated symbol, been present in the context of a memory I have?" If it has, then we 'understand' the symbol by pulling up the memory or memories tied to it. For most symbols (e.g. apple) there is a rich personal history of experiences we can compare it to.
So the mapping here is symbol -> experiences, where experiences are the memories of senses over time.
Thus the data structure that will ultimately solve this problem is one that performs that mapping, not one which maps symbols to a necessarily arbitrary and limited other set of symbols (aka category labels and facts).
I don't see how else they would monetise it, to be honest.
It's one of the main reasons the "semantic web" didn't really catch on back in the day. Those in a position to gather massive amounts of data (namely Google and Facebook) won't share it all with the competition.
They should have a tutorial that walks me through how to accomplish the example project in the promo video. From what I can tell, I would have to import all those prime minister pages from Wikipedia manually?
PS. I'm very interested in the ideas behind the semantic web, and the video hints that Silk may have gotten closer than anyone else at successfully "building the human" into the software design. IMO it's all about removing friction and I see some promising ideas in that regard, so congrats to Silk for that!
1. Freebase aims to build one big repository of structured data for all the information in the world, whereas Silk consists of separate sites with separate owners.
2. Datasets in Silk are also ordinary websites. Technically, there is no need to use the semantic layer of Silk. Admittedly, it's a lot less fun without it :)
1. Our read/write API allows you to get any existing data set into Silk. We have various importers available. No OWL/RDF at this time, but that may very well be developed by ourselves or our developer ecosystem in the future.
2. Silk has a powerful query engine that allows quite complex reasoning. Part of that is already exposed in the product, through our search bar and the so-called "explore mode". We have no UI yet for the more complex queries, but those can be performed through our API.
Good suggestion. BTW, have you looked at http://dbpedia.org/About ? DBPedia is RDF data extracted from Wikipedia data. There is an interesting and useful ecosystem built around it.