
Graph query languages are nice and all, but what about Linked Data here? Queries of schemaless graphs miss lots of data, because without a schema this graph calls it "color", that graph calls it "colour", and another calls it "色" or "カラー". (Of course this is an issue even when there is a defined schema; but inter- or even intra-organizational cohesion hardly just happens without e.g. RDFS and/or OWL and/or SHACL for describing (and changing) the shape of the data.)
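A JSON-LD context can map all of those local names onto one shared URI; a minimal sketch (the alias keys and the widget URI are illustrative):

    {
      "@context": {
        "colour": {"@id": "https://schema.org/color"},
        "色":     {"@id": "https://schema.org/color"}
      },
      "@id": "https://example.org/widget1",
      "colour": "red"
    }

Each dataset keeps its own local key, but both expand to https://schema.org/color, so a query over the merged graph sees a single property.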

So the task, then, is to compile schema-aware SPARQL to GQL, GraphQL, SQL, interminable recursive SQL queries, or whatever it may be.

For GraphQL, there's GraphQL-LD (which somewhat unfortunately contains a hashtag-indeterminate dash). I cite this in full here because it's very relevant to the GQL task at hand:

"GraphQL-LD: Linked Data Querying with GraphQL" (2018) https://comunica.github.io/Article-ISWC2018-Demo-GraphQlLD/

> GraphQL is a query language that has proven to be popular among developers. In 2015, the GraphQL framework [3] was introduced by Facebook as an alternative way of querying data through interfaces. Since then, GraphQL has been gaining increasing attention among developers, partly due to its simplicity of usage and its large collection of supporting tools. One major disadvantage of GraphQL compared to SPARQL is the fact that it has no notion of semantics, i.e., it requires an interface-specific schema. This makes it difficult to combine GraphQL data that originates from different sources. This is further complicated by the fact that GraphQL has no notion of global identifiers, which is possible in RDF through the use of URIs. Furthermore, GraphQL is not as expressive as SPARQL, as GraphQL queries represent trees [4], and not full graphs as in SPARQL.

> In this work, we introduce GraphQL-LD, an approach for extending GraphQL queries with a JSON-LD context [5], so that they can be used to evaluate queries over RDF data. This results in a query language that is less expressive than SPARQL, but can still achieve many of the typical data retrieval tasks in applications. Our approach consists of an algorithm that translates GraphQL-LD queries to SPARQL algebra [6]. This allows such queries to be used as an alternative input to SPARQL engines, and thereby opens up the world of RDF data to the large number of people who already know GraphQL. Furthermore, results can be translated into the GraphQL-prescribed shapes. The only additional requirement is that queries now also need a JSON-LD context, which could be provided by external domain experts.

> In related work, HyperGraphQL [7] was introduced as a way to expose access to RDF sources through GraphQL queries and emit results as JSON-LD. The difference with our approach is that HyperGraphQL requires a service to be set up that acts as an intermediary between the GraphQL client and the RDF sources. Instead, our approach enables agents to directly query RDF sources by translating GraphQL queries client-side.
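To make that concrete, here's roughly the shape of a GraphQL-LD query; a sketch using FOAF terms (the field names are illustrative):

    # GraphQL query (a tree):
    {
      name
      knows { name }
    }

    # JSON-LD context giving the fields global meaning:
    { "@context": {
        "name":  "http://xmlns.com/foaf/0.1/name",
        "knows": "http://xmlns.com/foaf/0.1/knows" } }

    # Approximately the SPARQL it compiles to:
    SELECT ?name ?knows_name WHERE {
      ?s foaf:name ?name .
      ?s foaf:knows ?knows .
      ?knows foaf:name ?knows_name .
    }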

All of these RDFS vocabularies and OWL ontologies provide structure that minimizes the costs of merging and/or querying multiple datasets: https://lov.linkeddata.es/dataset/lov/

All of these schema.org/Dataset entries in the "Linked Open Data Cloud" are easier to query than a schemaless graph: https://lod-cloud.net/ . Though one can query schemaless graphs with SPARQL as well.

For reference, RDFLib has a bunch of RDF graph implementations over various key/value and SQL store backends. RDFLib-sqlalchemy does query parametrization correctly, in order to minimize the risk of query injection. FOR THE RECORD, SQL Injection is the #1 most prevalent security weakness in the CWE Top 25; which is something any new spec and implementation should really consider before launching anything other than, e.g., an overly-verbose JSON-based query language that people end up bolting a micro-DSL onto. https://github.com/RDFLib/rdflib-sqlalchemy
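For instance, with RDFLib you can pass user input as a bound term rather than splicing it into the query string; a minimal sketch (the file name and data are hypothetical):

    from rdflib import Graph, Literal, Namespace

    SCHEMA = Namespace("https://schema.org/")
    g = Graph()
    g.parse("catalog.ttl")  # hypothetical local data

    user_input = "red"  # possibly hostile; stays inert as an RDF term

    # Unsafe: f'... ?s schema:color "{user_input}" ...' (string splicing)
    # Safer: bind the value via initBindings so it is never parsed as
    # query syntax.
    q = "SELECT ?s WHERE { ?s schema:color ?color }"
    for row in g.query(q, initNs={"schema": SCHEMA},
                       initBindings={"color": Literal(user_input)}):
        print(row.s)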

Most practically, I frequently want to read a graph of objects into RAM; update, extend, and interlink them; and then transactionally save the delta back to the store. This requires a few things: (1) an efficient binary serialization protocol like Apache Arrow (SIMD), Parquet, or any of the binary JSON formats like BSON; (2) a transactional local store that can be manually synchronized with the remote store until it's consistent.
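A minimal sketch of that flow with RDFLib's graph_diff (the URLs and triples are illustrative):

    from rdflib import Graph, Namespace
    from rdflib.compare import graph_diff

    EX = Namespace("http://example.org/")

    remote = Graph().parse("http://example.org/data.ttl")  # hypothetical source
    local = Graph()
    local += remote  # in-RAM working copy

    # ... update, extend, and interlink in memory ...
    local.add((EX.book1, EX.reviewedBy, EX.alice))

    # Compute the delta to synchronize back to the store.
    in_both, only_local, only_remote = graph_diff(local, remote)
    additions = only_local.serialize(format="nt")   # triples for INSERT DATA
    deletions = only_remote.serialize(format="nt")  # triples for DELETE DATA
    # (Blank nodes complicate deltas; graph_diff compares isomorphic graphs.)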

SPARQL Update was somewhat of an out-of-scope afterthought. Here's SPARQL 1.1 Update: https://www.w3.org/TR/sparql11-update/
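For reference, a small SPARQL 1.1 Update that atomically replaces a value (the resource URI and values are illustrative):

    PREFIX schema: <https://schema.org/>

    DELETE { <http://example.org/widget1> schema:color ?old }
    INSERT { <http://example.org/widget1> schema:color "green" }
    WHERE  { <http://example.org/widget1> schema:color ?old }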

Here's SOLID, which could be implemented with SPARQL on GQL, too; though all the re-serialization really shouldn't be necessary for EAV triples with a named graph URI identifier: https://solidproject.org/

5 star data: PDF -> XLS -> CSV -> RDF (GQL, AFAIU (but with no URIs(!?))) -> LOD https://5stardata.info/en/




Linked Data tends to live in a semantic web world that makes a lot of open world assumptions. While there are a few systems like this out there, there aren't many. More practically focused systems collapse this worldview down into a much simpler model, and property graphs suit them just fine.

There's nothing wrong with enabling linked data use cases, but you don't need RDF+SPARQL+OWL and the like to do that.

The "semantic web stack" I think has been shown by time and implementation experience to be an elegant set of standards and solutions for problems that very few real world systems want to tackle. In the intervening 2 full generations of tech development that have happened since a lot of those standards were born, some of the underlying stuff too (most particularly XML and XML-NS) went from indispensable to just plain irritating.


> Linked Data tends to live in a semantic web world that makes a lot of open world assumptions. While there are a few systems like this out there, there aren't many. More practically focused systems collapse this worldview down into a much simpler model, and property graphs suit them just fine.

Data integration is cost prohibitive. In n years' time, the task becomes: "Let's move all of these data silos into a data lake housed in our singular data warehouse, and then synchronize and also copy data around to efficiently query it in one form or another."

Linked data enables data integration from day one: it enables linking tragically siloed records across disparate databases.

There are very many systems that share linked data. Some only label some of the properties with URIs in templates; some enable federated online querying.

When you develop a schema for only one application, you're tragically limiting the future value of the data.

> There's nothing wrong with enabling linked data use cases, but you don't need RDF+SPARQL+OWL and the like to do that.

Can you name a property graph use case that cannot be solved with RDFS and SPARQL?

> The "semantic web stack" I think has been shown by time and implementation experience to be an elegant set of standards and solutions for problems that very few real world systems want to tackle.

TBH, I think the problem is that people don't understand the value of linking our data silos through URIs, and so they don't take the time to learn RDFS or JSON-LD. JSON-LD is pretty simple and useful for very important things like SEO: search engine result cards are built from linked data embedded in HTML attributes (RDFa, Microdata) or in JSON-LD.
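This is the kind of snippet search engines read for those cards; a minimal sketch (all values illustrative):

    <script type="application/ld+json">
    {
      "@context": "https://schema.org",
      "@type": "Event",
      "name": "Example Meetup",
      "startDate": "2024-05-01T19:00",
      "location": {"@type": "Place", "name": "Example Hall"}
    }
    </script>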

The action buttons to 'RSVP', 'Track Package', and 'View Issue' on Gmail emails are schema.org JSON-LD.

Applications can use linked data in any part of the stack: the database, the messages on the message queue, in the UI.

You might take a look at all of the use cases that SOLID solves for, and realize how much unnecessary rework has gone into indexing structs and form validation. These are all the same app: UIs for interlinked subclasses of https://schema.org/Thing with unique inferred properties and aggregations thereof.

> In the intervening 2 full generations of tech development that have happened since a lot of those standards were born, some of the underlying stuff (most particularly XML and XML-NS) went from indispensable to just plain irritating.

Without XSD, for example, we have no portable way to share precise decimal fractions.
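With XSD, the lexical form and the datatype URI travel together, so every consumer parses the value identically; a minimal RDFLib sketch (values illustrative):

    from rdflib import Literal
    from rdflib.namespace import XSD

    # xsd:decimal is arbitrary-precision: "19.90" stays exactly 19.90,
    # with no float rounding, on every conforming consumer.
    price = Literal("19.90", datatype=XSD.decimal)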

There's a compact representation of JSON-LD that minimizes record schema overhead (which gzip or lzma generally handle anyway)

https://lod-cloud.net is not a trivial or insignificant amount of linked data: there's real value in structuring property graphs with standard semantics.

Are our brains URI-labeled graphs? Nope, and we spend a ton of time talking in order to share data. Eventually it's "well, let's just get a spreadsheet and define some columns" for these property graph objects. And then the other teams' spreadsheets have very similar columns with different labels and no portable datatypes (no URIs).


> Can you name a property graph use case that cannot be solved with RDFS and SPARQL?

No - that's not the point. Of course you can do it with RDFS + SPARQL. For that matter you could do it with redis. Fully beside the point.

What's important is what the more fluent and easy way to do things is. People vote with their feet, and property graphs are demonstrably easier to work with for most use cases.


“Easier” is completely subjective; there's no way you can demonstrate that.

RDF solves a much larger problem than just the graph data model and query: it addresses data interchange at web scale, using URIs, zero-cost merges, Linked Data, etc.


> “Easier” is completely subjective; there's no way you can demonstrate that.

I agree it's subjective. While there's no exact measurement for this sort of thing, the proxy measure people usually use is adoption; and if you look at, for example, Cypher vs. SPARQL adoption, or Neo4j vs. RDF store adoption, people are basically voting with their feet.

From my personal experiences developing software with both, I've found property graphs much simpler and a better map for how people think of data.

It's true that RDF tries to solve data interchange on the web scale. That's what it was designed for. But the original design vision, in my view, hasn't come to fruition. There are bits and pieces that have been adopted to great effect (things like RDF microformats for tagging HTML docs) but nothing like what the vision was.


What was the vision?

The RDFJS "Comparison of RDFJS libraries" wiki page lists a number of implementations; though none for React or AngularJS yet, unfortunately. https://www.w3.org/community/rdfjs/wiki/Comparison_of_RDFJS_...

There's extra work involved in building general-purpose frameworks for Linked Data; it may have been hard for any firm with limited resources to justify doing things the harder way (for collective returns).

Dokieli (SOLID (LDP), WebID, W3C Web Annotations) is a pretty cool - if deceptively simple-looking - showcase of what's possible with Linked Data; it just needs some CSS and a revenue model to pay for moderation. https://dokie.li/


> property graphs are demonstrably easier to work with for most use cases.

How do you see property graphs as distinct from RDF?

People build terrible apps without schema or validation and leave others to clean that up.


> How do you see property graphs as distinct from RDF?

This is the full answer: https://stackoverflow.com/a/30167732/2920686


I added an answer responding to the comments on the answer you've linked, but didn't add a link from those comments to my answer. Here's that answer, with a reasoner sketch after it:

> (in reply to the comments on this answer: https://stackoverflow.com/a/30167732 )

> When an owl:inverseOf production rule is defined, the inverse property triple is inferred by the reasoner either when adding or updating the store, or when selecting from the store. This is a "materialized relation"

> Schema.org - an RDFS vocabulary - defines, for example, https://schema.org/isPartOf as the inverse property of hasPart. If both are specified, it's not necessary to run another graph pattern query to traverse a directed relation in the other direction. (:book1 schema:hasPart ?o), (?o schema:isPartOf :book1), (?s schema:hasPart :chapter2)

> It's certainly possible to use RDFS and OWL to describe schema for and within neo4j property graphs; but there's no reasoner to e.g. infer inverse properties or do schema validation.

> Is there any RDF graph that neo4j cannot store? RDF has datatypes and languages for objects: you'd need to reify properties where datatypes and/or languages are specified (and you'd be re-implementing well-defined semantics)

> Can every neo4j graph be represented with RDF? Yes.

> RDF is a representation for graphs for which there are very many store implementations that are optimized for various use cases like insert and query performance.

> Comparing neo4j to a particular triplestore (with reasoning support) might be a more useful comparison given that all neo4j graphs can be expressed as RDF.
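To make the inverse-property materialization concrete, here's a minimal sketch using RDFLib with the owlrl reasoner (the example triples are illustrative):

    import owlrl
    from rdflib import Graph, Namespace
    from rdflib.namespace import OWL

    SCHEMA = Namespace("https://schema.org/")
    EX = Namespace("http://example.org/")

    g = Graph()
    g.add((SCHEMA.isPartOf, OWL.inverseOf, SCHEMA.hasPart))
    g.add((EX.book1, SCHEMA.hasPart, EX.chapter2))

    # Materialize the deductive closure; owl:inverseOf triples included.
    owlrl.DeductiveClosure(owlrl.OWLRL_Semantics).expand(g)

    assert (EX.chapter2, SCHEMA.isPartOf, EX.book1) in g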


And then, some time later, I realize that I want/need to: (3) apply production rules to do inference at INSERT/UPDATE/DELETE time or at SELECT time, and indicate which properties were inferred (x is a :Shape and a :Square, so x is also a :Rectangle; x is a :Rectangle with :width and :height defined, so x has an :area); (4) run triggers (that execute code written in a different language) when data is inserted, updated, or linked to; (5) asynchronously yield streaming results to message queue subscribers who were disconnected when the cached pages were updated.
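A sketch of (3) as a SPARQL Update production rule, writing the derived triple into a named graph so inferred properties stay marked (URIs illustrative):

    PREFIX ex: <http://example.org/>

    INSERT {
      GRAPH ex:inferred { ?x ex:area ?area }
    }
    WHERE {
      ?x a ex:Rectangle ;
         ex:width  ?w ;
         ex:height ?h .
      BIND(?w * ?h AS ?area)
    }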



