Hacker News new | past | comments | ask | show | jobs | submit login
Microsoft GraphView: Use SQL Server or Azure SQL to efficiently manage graphs (github.com/microsoft)
48 points by us0r on Dec 22, 2015 | hide | past | favorite | 6 comments



I'm hoping I'm missing something important, or that there's a reason the documentation isn't more clear, especially if this is coming from Microsoft.

    "efficiently manage graphs" != "efficiently traverse graphs"
It's very very easy to store a graph in a database. It's very very hard to efficiently walk an arbitrary path between nodes using SQL.

Browsing the user manual, I don't see any examples of graph walks. There are some glorified joins, but there's no real explanation of why a graph database is better suited for the queries show (quite the opposite I feel -- it's harder to read the non-standard SQL!).


Seems like its more geared towards matching relatively small template-based subgraph patterns rather than doing arbitrary walks efficiently. From page 9 of the user manual:

> Below is an example query matching a triangle over the employee-client graph:

    SELECT En1.Name, En2.Name, Cn.Name
    FROM EmployeeNode AS En1, EmployeeNode AS En2, ClientNode AS Cn
    MATCH En1-[Colleagues AS Cg]->En2-[Clients AS C1]->Cn,
          En1-[Clients AS C2]->Cn
    WHERE En1.Name = ‘Jane Doe’ AND C1.credit > 1000


Clearer and more standard SQL Join:

    SELECT En1.Name, En2.Name, Cn.Name
      FROM EmployeeNode AS En1, EmployeeNode AS En2, ClientNode AS Cn,
           Colleagues AS Cg, ClientOf AS C1, ClientOf AS C2
     WHERE En1.Name = ‘Jane Doe’ AND C1.credit > 1000
       AND En1.ID=Cg.ID1 AND En2.ID=Cg.ID2
       AND En1.ID=C1.EnID AND En2.ID=C2.EnID
       AND C1.CnID=C2.CnID

Really all the "MATCH" gets you is cleaner many-to-many join. But is that tradeoff worth the DLL dependency and burden to bringing on new developers?


http://research.microsoft.com/pubs/259290/GraphView%20User%2...:

  CREATE TABLE EmployeeNode
  (
    [ColumnRole:”NodeId”] 
    WorkId varchar(32),
    [ColumnRole:”Property”]
    Name varchar(32),
    [ColumnRole:”Edge”, Reference:”EmployeeNode”]
    Colleagues VARBINARY,
    [ColumnRole:”Edge”, Reference:”ClientNode”, Attributes: {credit:”int”, note:”string”}]
    Clients VARBINARY
  )
=> they also appear to replace "many rows each with key and a single referenced value" by "row for each unique key with packed values for that key" (possibly with multiple rows for a key if the packed data doesn't fit in a VARBINARY(MAX))

So, the database structure looks a bit like an inverse index. That could speed up some queries.

I have been playing with something similar in pure T-SQL, using a cross apply over a table-valued function to get from this packed format back to the normalized table, but wasn't impressed with its performance. Using CLR was next on my list, but I haven't come around it because I don't have high hopes that it will perform, but I may try this.


I dislike Microsoft in general, but I believe you can accomplish this with SQL server easier than other SQL databases. SQL CLR and your own defined types. Maybe you could even go as far as using some dynamic types with ExpandoObject to have fully dynamic lamda expressions on the columns, however I dont know how well this would perform.


I wonder how this will compare with TitanDB?

http://thinkaurelius.github.io/titan/




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: