Trinity - Distributed Graph Database from Microsoft Research

nopal · on March 24, 2011

So many cool things come from Microsoft that lack the story and passion of some of the more well-known open source projects. It's truly disappointing.

_ques · on March 24, 2011

You want a story, here's a passionate story!

MSR Asia is based in Beijing and is one of the fastest growing research outfits in the world. They're slowly showing up all over the place in the CS world, and this is just one of the examples.

Consider Microsoft Academic Search. It BLOWS AWAY Google Scholar, Citeseer etc. in terms of features. Once they attain coverage parity, there is no reason to use anything else.

Academic Search, Trinity, and similar projects out of MSR Asia seem to be having a direct impact on Bing. In just a few short years, a barebones search engine (Live.com) has built an R&D infrastructure, with prototypes and production modules feeding into what is now a pretty rocking search engine. [Example of Academic Search integration with Bing: http://www.bing.com/search?q=donald+knuth , scroll to bottom -- page also shows Freebase integration in middle]

Ironically, Microsoft is playing David in the David vs Goliath story here, and the passion is showing in terms of the ecosystem of computing projects and products that are making it into Bing.

If you want to look outside of Bing, consider the Kinect effort. Did you know that the _hardware_ ships with mathematical models built at MSR, trained using Dryad? (Dryad is a MapReduce competitor out of MSR.) The training is the "secret sauce" and why you don't have to spend days calibrating the Kinect.

None of this is privileged information -- you just have to follow the hyperlinks :) The open source world is built on collaboration and sharing, and hence the "story" is the backbone of most work. But that doesn't mean other people don't have stories and passion!

protomyth · on March 24, 2011

Side note: If you go back to the Bill and Steve interview from D5, listen to Mr. Gates mention the tech of Kinect years before the product. It is interesting how much MSR jibes with that talk.

mhansen · on March 24, 2011

Searching http://www.bing.com/search?q=donald+knuth, I see the Freebase integration, but I don't see anything other than search results and ads at the bottom of the page. What am I looking for?

eagleal · on March 24, 2011

I would say that's Powerset's technology merged into Bing, not direct Freebase.

If you want a more practical application for Trinity look at http://research.microsoft.com/probase/ (linked at the bottom of the page). Also to quote the page:

    Microsoft Bing’s AEther project now uses Trinity for managing AEther’s experimental data,
    which consists of large number of workflows, and the evolutions among the workflows.
    Trinity is the backend graph storage engine of AEther's workflow management system.
    We are adding more functionalities, in particular,
    subgraph matching and frequent subgraph mining, to support the project.

_ques · on March 24, 2011

See: http://i.imgur.com/efj7m.png

"Data provided by MS Academic Search"

mhansen · on March 25, 2011

Strange - that didn't appear for me.

kakuri · on March 24, 2011

Why is it disappointing?

Microsoft has become a company that produces technological advances as a side-effect of their primary goal, which is making money. They are not really interested in sharing, or promoting innovation - they are interested in generating lock-in, collecting licensing fees, and stamping out the competition.

What I find truly disappointing is that there are so many brilliant, creative people letting their work be owned and controlled by companies like Microsoft.

Open source projects get admiration and passion because they are OPEN. The top priority is to share the discovery or creation and promote productivity and innovation among others.

scottchin · on March 24, 2011

Actually, my understanding of Microsoft Research is that their main focus is on promoting and sharing innovative ideas. An MSR researcher's performance is reviewed based on academic publications and the impact their work has on its research communities. If their work happens to help one of MS's products, then that is a bonus.

That's what I've been told at least by someone who works for MSR in Seattle. In fact, some of the most passionate people in my field of study (I'm a graduate student in electrical/computer engineering) work for MSR.

dschobel · on March 24, 2011

^^ this. I didn't know about MSR until I became a grad student in comp sci and started seeing the name pop up time and again in the literature, whether it was graphics, computer vision, AI, etc.

If you like computer science, you like MSR. (Plus, they gave the world F# and C# which are an oasis in the world of enterprise development)

http://scholar.google.com.au/scholar?q=microsoft+research...

sukuriant · on March 24, 2011

I honestly didn't know the exact goal of MSR. That's incredibly pleasing to hear.

neutronicus · on March 24, 2011

Microsoft Research has contributed a lot to GHC.

sushilchoudhari · on March 24, 2011

That is so true. I also did not understand what the real world application around this technology would be. Any ideas there? Something that can be understood in not so layman terms :)

rglullis · on March 24, 2011

A few ideas that come to mind:

1) A distributed Wikipedia/DBpedia. Instead of fighting with deletionists, you would just run your own node of the graph database, and keep/merge/sort changes as you see fit.

2) Recommendation engines that are seeded with the user data, at the edge. For instance, instead of me having to upload my song library to Apple or having a last.fm scrobbler running, I would be able to have a "Genius" feature without ever having data leave my computer. If I want to get my friends' recommendations, I could just add them as peers.

3) A medical expert-system that can analyze my medical records that I hold, instead of one central place. Instead of trusting something like Google Health or Microsoft HealthVault to keep the data safe, I can be the only one with access to it, and only talk to my doctor if this system triggers some sort of alarm.

sushilchoudhari · on March 26, 2011

Thanks for the detailed explanation !

neutronicus · on March 24, 2011

This is a perfect fit for something like Diaspora.

rbanffy · on March 24, 2011

But only if it's open source, portable and scalable.

shriphani · on March 24, 2011

Yeah, they should have put up a video of the researcher against a milk-white background with inspirational music, and while they visualize a subgraph we get to hear the climax of a Coldplay song.

Disappointing indeed.

barista · on March 24, 2011

I don't get why the passion and a story matter, really. What I feel bad about is that they make these cool projects that rarely ship as part of a real product that people buy.

What use is this database if it's not going to make my search results better or make my computer be a little smarter...

shriphani · on March 24, 2011

MSR might just be the Bell Labs of this century! They already employ Tony Hoare (Quicksort), Niraj Kayal (the K in AKS), Simon Peyton Jones (GHC). What a heady list!

vmind · on March 24, 2011

This looks very interesting, although it seems like they are relying on a very high speed network to get around the latency issues inherent in sharding a graph database. (They mention partitioning, but not whether it occurs online).
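To make the latency concern concrete, here is a minimal sketch. It is not Trinity's partitioning scheme, which the page does not describe. It assumes nodes are placed on machines by a simple modulo rule, then counts how many edges a breadth-first traversal follows across machine boundaries. The toy graph, the `owner` function, and the machine counts are all made up for illustration.

```python
from collections import deque

def owner(node: int, machines: int) -> int:
    """Hypothetical placement rule: node id modulo machine count."""
    return node % machines

def bfs_remote_hops(graph, start, machines):
    """Breadth-first traversal counting (local, remote) edge hops.

    A hop is 'remote' when the two endpoints live on different machines,
    which on a real cluster would mean a network round trip.
    """
    seen = {start}
    queue = deque([start])
    local = remote = 0
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if owner(node, machines) == owner(neighbor, machines):
                local += 1
            else:
                remote += 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return local, remote

# Toy ring of 8 nodes, each linked to its two neighbours.
ring = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}

print(bfs_remote_hops(ring, 0, machines=1))  # everything on one machine
print(bfs_remote_hops(ring, 0, machines=4))  # neighbours always land elsewhere
```

On this ring, a naive hash split across four machines turns every single hop into a remote one, which is why either a very fast network or a smarter partitioner matters.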

Related: my third-year project is a graph database, written in Clojure, that loads its data lazily from configurable back ends (databases, caches, APIs). Now I have more evidence for my dissertation that this kind of tech is useful.
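As a rough illustration of the lazy-loading idea (in Python rather than Clojure, and not the project's actual code): a back end is just a function from a node to its neighbours, and the graph only calls it the first time a node is touched. The names `LazyGraph` and `dict_backend` are hypothetical.

```python
from typing import Callable, Dict, Iterable, List

# A back end is any callable that returns the neighbours of a node.
Backend = Callable[[str], Iterable[str]]

class LazyGraph:
    """Graph whose adjacency lists are fetched on first access, then cached."""

    def __init__(self, backend: Backend) -> None:
        self._backend = backend
        self._cache: Dict[str, List[str]] = {}
        self.fetches = 0  # number of times the back end was actually hit

    def neighbors(self, node: str) -> List[str]:
        if node not in self._cache:
            self._cache[node] = list(self._backend(node))
            self.fetches += 1
        return self._cache[node]

def dict_backend(data: Dict[str, List[str]]) -> Backend:
    """Stand-in for a database, cache, or API: looks neighbours up in a dict."""
    def fetch(node: str) -> List[str]:
        return data.get(node, [])
    return fetch

graph = LazyGraph(dict_backend({"a": ["b", "c"], "b": ["c"]}))
graph.neighbors("a")
graph.neighbors("a")  # served from the cache, no second fetch
```

Swapping `dict_backend` for a function that queries a real store would leave `LazyGraph` unchanged, which is the point of making the back end configurable.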

I wonder what their language for queries is going to look like. The only major graph query language I've found is Gremlin, which used to be XQuery derived but recently switched to chained objects.
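For a feel of what a "chained objects" query style looks like, here is a toy sketch. It is not Gremlin's API and says nothing about what Trinity will offer. The `Traversal` class and its `out`, `where`, and `to_list` methods are hypothetical names chosen for the example.

```python
class Traversal:
    """Tiny chainable traversal over labelled edges.

    `edges` maps a source node to a list of (label, destination) pairs.
    Each step returns a new Traversal, so calls can be chained.
    """

    def __init__(self, edges, nodes):
        self._edges = edges
        self._nodes = list(nodes)

    def out(self, label):
        """Follow outgoing edges with the given label from every current node."""
        reached = [
            dst
            for src in self._nodes
            for (lbl, dst) in self._edges.get(src, [])
            if lbl == label
        ]
        return Traversal(self._edges, reached)

    def where(self, predicate):
        """Keep only the current nodes that satisfy the predicate."""
        return Traversal(self._edges, [n for n in self._nodes if predicate(n)])

    def to_list(self):
        return list(self._nodes)

edges = {
    "alice": [("knows", "bob"), ("knows", "carol")],
    "bob": [("knows", "dave")],
    "carol": [("knows", "dave"), ("likes", "jazz")],
}

# Friends of friends: one entry per path, so a node reached twice appears twice.
friends_of_friends = Traversal(edges, ["alice"]).out("knows").out("knows").to_list()
```

Each method call narrows or moves the current set of nodes, so a query reads left to right as a path through the graph.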

al_james · on March 24, 2011

This looks great. I wonder if it's usable for real-time (low latency) queries? Looks a bit batch oriented to me.

Also, is this open source? Can I download and play?

joshu · on March 24, 2011

What is the point? Show us the code or shut up.

iamelgringo · on March 24, 2011

Respectfully, Josh. Comments like this really don't help the conversation much here at HN.

joshu · on March 24, 2011

It's a company talking about an internal tech (in comparison to another company's internal tech, no less.) They didn't release it. It is completely useless from a hacker's POV.

I stand by my comment.

Also, my name isn't Josh.

iamelgringo · on March 27, 2011

I apologize about the name shortening. As I said, no disrespect intended.

wladimir · on March 24, 2011

Same thought occurred to me... "Code or it didn't happen"

If you're interested in this you could play with some other graph-based databases though, for example http://infogrid.org/, or see http://en.wikipedia.org/wiki/Graph_database for an overview. Many open source implementations exist.

rbanffy · on March 24, 2011

If they show you the code, you may get exposed to a lot of proprietary Microsoft IP and your work may be forever tainted by that.

Do you really want it?

BTW, that's why many companies won't allow their engineers to read patents from competitors.

rbanffy · on March 24, 2011

Is there a download link somewhere? Couldn't find it.

BTW, will it run on Linux?

udoprog · on March 24, 2011

Buzzword BINGO!