
Graph query languages are nice and all, but what about Linked Data here? Queries of schemaless graphs miss lots of data, because without a schema this graph calls it "color", that graph calls it "colour", and another calls it "色" or "カラー". (Of course this is an issue even when there is a defined schema; but inter- or even intra-organizational cohesion hardly just happens without e.g. RDFS and/or OWL and/or SHACL for describing (and changing) the shape of the data.)
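A JSON-LD context can map all of those local names onto one shared URI; a minimal sketch (the alias keys and the widget URI are illustrative):

    {
      "@context": {
        "colour": {"@id": "https://schema.org/color"},
        "色":     {"@id": "https://schema.org/color"}
      },
      "@id": "https://example.org/widget1",
      "colour": "red"
    }

Each dataset keeps its own local key, but both expand to https://schema.org/color, so a query over the merged graph sees a single property.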

So the task, then, is to compile schema-aware SPARQL to GQL, GraphQL, SQL, interminable recursive SQL queries, or whatever it may be.

For GraphQL, there's GraphQL-LD (which somewhat unfortunately contains a hashtag-indeterminate dash). I cite this in full here because it's very relevant to the GQL task at hand:

"GraphQL-LD: Linked Data Querying with GraphQL" (2018) https://comunica.github.io/Article-ISWC2018-Demo-GraphQlLD/

> GraphQL is a query language that has proven to be popular among developers. In 2015, the GraphQL framework [3] was introduced by Facebook as an alternative way of querying data through interfaces. Since then, GraphQL has been gaining increasing attention among developers, partly due to its simplicity of usage and its large collection of supporting tools. One major disadvantage of GraphQL compared to SPARQL is the fact that it has no notion of semantics, i.e., it requires an interface-specific schema. This makes it difficult to combine GraphQL data that originates from different sources. This is further complicated by the fact that GraphQL has no notion of global identifiers, which is possible in RDF through the use of URIs. Furthermore, GraphQL is not as expressive as SPARQL, as GraphQL queries represent trees [4], and not full graphs as in SPARQL.

> In this work, we introduce GraphQL-LD, an approach for extending GraphQL queries with a JSON-LD context [5], so that they can be used to evaluate queries over RDF data. This results in a query language that is less expressive than SPARQL, but can still achieve many of the typical data retrieval tasks in applications. Our approach consists of an algorithm that translates GraphQL-LD queries to SPARQL algebra [6]. This allows such queries to be used as an alternative input to SPARQL engines, and thereby opens up the world of RDF data to the large number of people who already know GraphQL. Furthermore, results can be translated into the GraphQL-prescribed shapes. The only additional requirement is that queries now also need a JSON-LD context, which could be provided by external domain experts.

> In related work, HyperGraphQL [7] was introduced as a way to expose access to RDF sources through GraphQL queries and emit results as JSON-LD. The difference with our approach is that HyperGraphQL requires a service to be set up that acts as an intermediary between the GraphQL client and the RDF sources. Instead, our approach enables agents to directly query RDF sources by translating GraphQL queries client-side.
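To make that concrete, here's roughly the shape of a GraphQL-LD query; a sketch using FOAF terms (the field names are illustrative):

    # GraphQL query (a tree):
    {
      name
      knows { name }
    }

    # JSON-LD context giving the fields global meaning:
    { "@context": {
        "name":  "http://xmlns.com/foaf/0.1/name",
        "knows": "http://xmlns.com/foaf/0.1/knows" } }

    # Approximately the SPARQL it compiles to:
    SELECT ?name ?knows_name WHERE {
      ?s foaf:name ?name .
      ?s foaf:knows ?knows .
      ?knows foaf:name ?knows_name .
    }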

All of these RDFS vocabularies and OWL ontologies provide structure that minimizes the costs of merging and/or querying multiple datasets: https://lov.linkeddata.es/dataset/lov/

All of these schema.org/Dataset entries in the "Linked Open Data Cloud" are easier to query than a schemaless graph: https://lod-cloud.net/ . Though one can query schemaless graphs with SPARQL as well.

For reference, RDFLib has a bunch of RDF graph implementations over various key/value and SQL store backends. RDFLib-sqlalchemy does query parametrization correctly, in order to minimize the risk of query injection. FOR THE RECORD, SQL Injection is the #1 most prevalent security weakness in the CWE Top 25; which is something any new spec and implementation should really consider before launching anything other than, e.g., an overly-verbose JSON-based query language that people end up bolting a micro-DSL onto. https://github.com/RDFLib/rdflib-sqlalchemy
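For instance, with RDFLib you can pass user input as a bound term rather than splicing it into the query string; a minimal sketch (the file name and data are hypothetical):

    from rdflib import Graph, Literal, Namespace

    SCHEMA = Namespace("https://schema.org/")
    g = Graph()
    g.parse("catalog.ttl")  # hypothetical local data

    user_input = "red"  # possibly hostile; stays inert as an RDF term

    # Unsafe: f'... ?s schema:color "{user_input}" ...' (string splicing)
    # Safer: bind the value via initBindings so it is never parsed as
    # query syntax.
    q = "SELECT ?s WHERE { ?s schema:color ?color }"
    for row in g.query(q, initNs={"schema": SCHEMA},
                       initBindings={"color": Literal(user_input)}):
        print(row.s)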

Most practically, I frequently want to read a graph of objects into RAM; update, extend, and interlink them; and then transactionally save the delta back to the store. This requires a few things: (1) an efficient binary serialization protocol like Apache Arrow (SIMD), Parquet, or any of the binary JSON formats like BSON; (2) a transactional local store that can be manually synchronized with the remote store until it's consistent.
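A minimal sketch of that flow with RDFLib's graph_diff (the URLs and triples are illustrative):

    from rdflib import Graph, Namespace
    from rdflib.compare import graph_diff

    EX = Namespace("http://example.org/")

    remote = Graph().parse("http://example.org/data.ttl")  # hypothetical source
    local = Graph()
    local += remote  # in-RAM working copy

    # ... update, extend, and interlink in memory ...
    local.add((EX.book1, EX.reviewedBy, EX.alice))

    # Compute the delta to synchronize back to the store.
    in_both, only_local, only_remote = graph_diff(local, remote)
    additions = only_local.serialize(format="nt")   # triples for INSERT DATA
    deletions = only_remote.serialize(format="nt")  # triples for DELETE DATA
    # (Blank nodes complicate deltas; graph_diff compares isomorphic graphs.)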

SPARQL Update was somewhat of an out-of-scope afterthought. Here's SPARQL 1.1 Update: https://www.w3.org/TR/sparql11-update/
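For reference, a small SPARQL 1.1 Update that atomically replaces a value (the resource URI and values are illustrative):

    PREFIX schema: <https://schema.org/>

    DELETE { <http://example.org/widget1> schema:color ?old }
    INSERT { <http://example.org/widget1> schema:color "green" }
    WHERE  { <http://example.org/widget1> schema:color ?old }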

Here's SOLID, which could be implemented with SPARQL on GQL, too; though all the re-serialization really shouldn't be necessary for EAV triples with a named graph URI identifier: https://solidproject.org/

5 star data: PDF -> XLS -> CSV -> RDF (GQL, AFAIU (but with no URIs(!?))) -> LOD https://5stardata.info/en/




Linked Data tends to live in a semantic web world that makes a lot of open world assumptions. While there are a few systems like this out there, there aren't many. More practically focused systems collapse this worldview down into a much simpler model, and property graphs suit them just fine.

There's nothing wrong with enabling linked data use cases, but you don't need RDF+SPARQL+OWL and the like to do that.

The "semantic web stack" I think has been shown by time and implementation experience to be an elegant set of standards and solutions for problems that very few real world systems want to tackle. In the intervening 2 full generations of tech development that have happened since a lot of those standards were born, some of the underlying stuff too (most particularly XML and XML-NS) went from indispensable to just plain irritating.


> Linked Data tends to live in a semantic web world that makes a lot of open world assumptions. While there are a few systems like this out there, there aren't many. More practically focused systems collapse this worldview down into a much simpler model, and property graphs suit them just fine.

Data integration is cost prohibitive. In n years' time, the task becomes: "Let's move all of these data silos into a data lake housed in our singular data warehouse, and then synchronize and also copy data around to efficiently query it in one form or another."

Linked data enables data integration from day one: it enables linking tragically siloed records across disparate databases.

There are very many systems that share linked data. Some only label some of the properties with URIs in templates; some enable federated online querying.

When you develop a schema for only one application, you're tragically limiting the future value of the data.

> There's nothing wrong with enabling linked data use cases, but you don't need RDF+SPARQL+OWL and the like to do that.

Can you name a property graph use case that cannot be solved with RDFS and SPARQL?

> The "semantic web stack" I think has been shown by time and implementation experience to be an elegant set of standards and solutions for problems that very few real world systems want to tackle.

TBH, I think the problem is that people don't understand the value of linking our data silos through URIs, and so they don't take the time to learn RDFS or JSON-LD. JSON-LD is pretty simple and useful for very important things like SEO: search engine result cards are built from linked data embedded in HTML attributes (RDFa, Microdata) or in JSON-LD.
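This is the kind of snippet search engines read for those cards; a minimal sketch (all values illustrative):

    <script type="application/ld+json">
    {
      "@context": "https://schema.org",
      "@type": "Event",
      "name": "Example Meetup",
      "startDate": "2024-05-01T19:00",
      "location": {"@type": "Place", "name": "Example Hall"}
    }
    </script>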

The action buttons to 'RSVP', 'Track Package', and 'View Issue' on Gmail emails are schema.org JSON-LD.

Applications can use linked data in any part of the stack: the database, the messages on the message queue, in the UI.

You might take a look at all of the use cases that SOLID solves for, and realize how much unnecessary rework has gone into indexing structs and form validation. These are all the same app: UIs for interlinked subclasses of https://schema.org/Thing with unique inferred properties and aggregations thereof.

> In the intervening 2 full generations of tech development that have happened since a lot of those standards were born, some of the underlying stuff (most particularly XML and XML-NS) went from indispensable to just plain irritating.

Without XSD, for example, we have no portable way to share precise decimal fractions.
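With XSD, the lexical form and the datatype URI travel together, so every consumer parses the value identically; a minimal RDFLib sketch (values illustrative):

    from rdflib import Literal
    from rdflib.namespace import XSD

    # xsd:decimal is arbitrary-precision: "19.90" stays exactly 19.90,
    # with no float rounding, on every conforming consumer.
    price = Literal("19.90", datatype=XSD.decimal)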

There's a compact representation of JSON-LD that minimizes record schema overhead (which gzip or lzma generally handle anyway)

https://lod-cloud.net is not a trivial or insignificant amount of linked data: there's real value in structuring property graphs with standard semantics.

Are our brains URI-labeled graphs? Nope, and we spend a ton of time talking in order to share data. Eventually it's "well, let's just get a spreadsheet and define some columns" for these property graph objects. And then the other teams' spreadsheets have very similar columns with different labels and no portable datatypes (no URIs).


> Can you name a property graph use case that cannot be solved with RDFS and SPARQL?

No - that's not the point. Of course you can do it with RDFS + SPARQL. For that matter you could do it with redis. Fully beside the point.

What's important is what the more fluent and easy way to do things is. People vote with their feet, and property graphs are demonstrably easier to work with for most use cases.


“Easier” is completely subjective; there's no way you can demonstrate that.

RDF solves a much larger problem than just the graph data model and query: it addresses data interchange at web scale, using URIs, zero-cost merges, Linked Data, etc.


> “Easier” is completely subjective; there's no way you can demonstrate that.

I agree it's subjective. While there's no exact measurement for this sort of thing, the proxy measure people usually use is adoption; and if you look at, for example, Cypher vs. SPARQL adoption, or Neo4j vs. RDF store adoption, people are basically voting with their feet.

From my personal experiences developing software with both, I've found property graphs much simpler and a better map for how people think of data.

It's true that RDF tries to solve data interchange on the web scale. That's what it was designed for. But the original design vision, in my view, hasn't come to fruition. There are bits and pieces that have been adopted to great effect (things like RDF microformats for tagging HTML docs) but nothing like what the vision was.


What was the vision?

The RDFJS "Comparison of RDFJS libraries" wiki page lists a number of implementations; though none for React or AngularJS yet, unfortunately. https://www.w3.org/community/rdfjs/wiki/Comparison_of_RDFJS_...

There's extra work involved in building general-purpose frameworks for Linked Data; it may have been hard for any firm with limited resources to justify doing things the harder way (for collective returns).

Dokieli (SOLID (LDP), WebID, W3C Web Annotations) is a pretty cool - if deceptively simple-looking - showcase of what's possible with Linked Data; it just needs some CSS and a revenue model to pay for moderation. https://dokie.li/


> property graphs are demonstrably easier to work with for most use cases.

How do you see property graphs as distinct from RDF?

People build terrible apps without schema or validation and leave others to clean that up.


> How do you see property graphs as distinct from RDF?

This is the full answer: https://stackoverflow.com/a/30167732/2920686


I added an answer responding to the comments on the answer you've linked, but didn't add a link from those comments to my answer. Here's that answer, with a reasoner sketch after it:

> (in reply to the comments on this answer: https://stackoverflow.com/a/30167732 )

> When an owl:inverseOf production rule is defined, the inverse property triple is inferred by the reasoner either when adding or updating the store, or when selecting from the store. This is a "materialized relation"

> Schema.org - an RDFS vocabulary - defines, for example, https://schema.org/isPartOf as the inverse property of hasPart. If both are specified, it's not necessary to run another graph pattern query to traverse a directed relation in the other direction. (:book1 schema:hasPart ?o), (?o schema:isPartOf :book1), (?s schema:hasPart :chapter2)

> It's certainly possible to use RDFS and OWL to describe schema for and within neo4j property graphs; but there's no reasoner to e.g. infer inverse properties or do schema validation.

> Is there any RDF graph that neo4j cannot store? RDF has datatypes and languages for objects: you'd need to reify properties where datatypes and/or languages are specified (and you'd be re-implementing well-defined semantics)

> Can every neo4j graph be represented with RDF? Yes.

> RDF is a representation for graphs for which there are very many store implementations that are optimized for various use cases like insert and query performance.

> Comparing neo4j to a particular triplestore (with reasoning support) might be a more useful comparison given that all neo4j graphs can be expressed as RDF.
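To make the inverse-property materialization concrete, here's a minimal sketch using RDFLib with the owlrl reasoner (the example triples are illustrative):

    import owlrl
    from rdflib import Graph, Namespace
    from rdflib.namespace import OWL

    SCHEMA = Namespace("https://schema.org/")
    EX = Namespace("http://example.org/")

    g = Graph()
    g.add((SCHEMA.isPartOf, OWL.inverseOf, SCHEMA.hasPart))
    g.add((EX.book1, SCHEMA.hasPart, EX.chapter2))

    # Materialize the deductive closure; owl:inverseOf triples included.
    owlrl.DeductiveClosure(owlrl.OWLRL_Semantics).expand(g)

    assert (EX.chapter2, SCHEMA.isPartOf, EX.book1) in g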


And then, some time later, I realize that I want/need to: (3) apply production rules to do inference at INSERT/UPDATE/DELETE time or at SELECT time, and indicate which properties were inferred (x is a :Shape and a :Square, so x is also a :Rectangle; x is a :Rectangle with :width and :height defined, so x has an :area); (4) run triggers (that execute code written in a different language) when data is inserted, updated, or linked to; (5) asynchronously yield streaming results to message queue subscribers who were disconnected when the cached pages were updated.
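A sketch of (3) as a SPARQL Update production rule, writing the derived triple into a named graph so inferred properties stay marked (URIs illustrative):

    PREFIX ex: <http://example.org/>

    INSERT {
      GRAPH ex:inferred { ?x ex:area ?area }
    }
    WHERE {
      ?x a ex:Rectangle ;
         ex:width  ?w ;
         ex:height ?h .
      BIND(?w * ?h AS ?area)
    }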



