Show HN: Nebula – a distributed graph database written in C++

FillardMillmore · on Jan 15, 2020

Interesting stuff, thanks for sharing!

How did you decide to use an SQL-like query language rather than a declarative query language like Neo4j uses (Cypher query language)? What do you see as the pros and cons of that decision?

Do you currently have any plans to design an ETL tool to make extracting data from an RDBMS and loading it into NebulaGraph easier?

I didn't see this in any of the documentation (though I did admittedly skim), but is there any sort of visual front-end built in? If not, do you have any plans to make one?

vvern · on Jan 15, 2020

SQL is a declarative language. Cypher is somewhat SQL-like and is also a declarative language; it exists to make graph queries with predicates over nodes and edges easier to express.

FillardMillmore · on Jan 15, 2020

Thank you for the correction. You are correct, SQL is a declarative language. It would've been more accurate for me to refer to Cypher as 'a more expressive declarative query language' (specifically for the graph database paradigm) than a typical SQL-like query language.

jamie-vesoft · on Jan 15, 2020

Thanks for your interest! :)

Good question re data loading! Currently there are two tools for this purpose: 1) Spark Writer which enables you to load data from DWH to Nebula Graph via Spark: https://github.com/vesoft-inc/nebula/blob/master/docs/manual... 2) CSV importer if you don't have to import very large data from RDBMS to Nebula Graph: https://github.com/vesoft-inc/nebula-importer

The visual front-end will be released by the end of this week. :)

de6u99er · on Jan 18, 2020

Apache Drill is nice for querying structured files like CSV.

https://drill.apache.org/

ablekh · on Jan 15, 2020

Looks interesting. Good luck! A couple of questions: 1) why did you decide to create your own graph query language instead of trying to follow the recent graph query language standardization trend (e.g., see https://www.tigergraph.com/2019/02/25/the-road-to-standardiz... and https://www.tigergraph.com/2019/03/15/the-road-to-a-standard... ); 2) do you plan to support semantic features [i.e., RDF/S, Gremlin, SPARQL, inference]; 3) do you plan to have SDKs for popular languages [e.g., Python, TypeScript]; 4) did you benchmark (or plan to do so) Nebula Graph against competition in terms of performance [i.e., data import, query throughput and latency, inference speed, if/when supported]; 5) have you figured out which features will remain open source and which ones will be enterprise-only (I assume that you plan to follow the Open Core business model)? Thanks!

jamie-vesoft · on Jan 16, 2020

Thanks for the questions! Good ones indeed.

1) nGQL will be compatible with the international standard GQL.

2) Currently no.

3) So far the following SDKs are available:

   Python https://github.com/vesoft-inc/nebula-python 

   Java https://github.com/vesoft-inc/nebula-java

   Go https://github.com/vesoft-inc/nebula-go

4) We are working on the benchmark. Stay tuned.

5) Yes we'll follow the Open Core business model like you said. All the features that are open source today will remain open source. :)

Please let me know if you have any further questions.

ablekh · on Jan 16, 2020

It's my pleasure! Thank you for the clarifications. I will definitely follow your efforts. :-)

jamie-vesoft · on Jan 16, 2020

Much appreciated! You may find us on Twitter to follow the most recent updates. :) Also, you are welcome to become a contributor!

rs23296008n1 · on Jan 15, 2020

This looks good. I like the query language.

What's the status of the C++ client?

https://github.com/vesoft-inc/nebula/blob/master/src/client/...

I see it has connect/execute/disconnect. This seems like the minimum required. What can't I do with that basis?

jamie-vesoft · on Jan 15, 2020

Glad you loved the query language!

The C++ client is PR ready now:

https://github.com/vesoft-inc/nebula/pull/1013

maxpert · on Jan 15, 2020

Any production usages that can tell us about how much it can handle? How does it compare to dgraph?

jamie-vesoft · on Jan 15, 2020

Thanks for showing interest in Nebula Graph!

In some production deployments, the graph size reaches hundreds of billions of edges and data size reaches dozens of TB.

lmeyerov · on Jan 15, 2020

Wow, not sure how I missed this! Bravo!

If anyone involved would appreciate access to Graphistry to experiment with a gpu client/cloud visual analytics side (e.g., jupyter & react), let me know (leo@....com). Would love to see them together!

jamie-vesoft · on Jan 15, 2020

Exciting! I know Graphistry from Twitter.

dang · on Jan 15, 2020

We changed the URL from https://github.com/vesoft-inc/nebula to the project page.

jamie-vesoft · on Jan 15, 2020

Hey, thanks for letting me know. May I ask why?

dang · on Jan 15, 2020

Because it's more project-specific and thus stands out more. There are a ton of Github links posted to HN, obviously.

If you'd rather we switch to the other URL, let us know and we can do that.

jamie-vesoft · on Jan 15, 2020

Ah I see. Thanks for the explanation! Could you please change the URL back to the GitHub URL? Also the title? Much appreciated!

dang · on Jan 16, 2020

I've changed the URL back from https://nebula-graph.io/.

The submitted title ("Show HN: An open-source distributed graph database written in C++") led to lots of arguments, so we changed it in accordance with HN's rules about baity titles. That's in https://news.ycombinator.com/newsguidelines.html.

jamie-vesoft · on Jan 17, 2020

Thanks for the explanation. I thought that was the reason you changed the title. Makes sense to me. :)

de6u99er · on Jan 17, 2020

I gave the tutorial using Docker a try. The SQL-like query language is OK until it gets to doing queries with where clauses on the data. Some form of help and auto-completion would be great.

I recommend aligning with the GQL standard: https://www.gqlstandards.org/

And Apache TinkerPop. E.g. an RDF Sail implementation for Nebula. http://tinkerpop.apache.org/

jamie-vesoft · on Jan 18, 2020

Thanks so much for trying Nebula! We really appreciate it.

Sorry about the where clause issue. Would you mind opening an issue about this on our GitHub repo so that we can assign it to the relevant staff?

As to the query language, thanks for your suggestion and nGQL will surely be aligned with the GQL standard. We are keeping a close eye on it. :)

We are planning to support OpenCypher in the first half of 2020 and TinkerPop would be the next.

Thanks again! Here's our Slack group, by the way; you may raise any question there: https://join.slack.com/t/nebulagraph/shared_invite/enQtNjIzM...

slowmotarget · on Jan 15, 2020

It looks really neat! What's your plan for making this project viable in the long run? Do you envisage monetizing hosting, or maybe creating a community vs. paid edition?

jamie-vesoft · on Jan 15, 2020

Glad you liked the project! Hosting service would be the main monetization method. In addition, we will be providing consulting, training and all sorts of enterprise services.

bane · on Jan 15, 2020

This is very interesting. Anybody use it? Alternatives?

lmeyerov · on Jan 15, 2020

We help bring gpu visual analytics & investigation automation to users of all sorts of graph DBs (think tableau & servicenow for graph), so based on our enterprise/big tech/gov/startup interactions:

1. Shortlist (and in no order): Neo4j, AWS Neptune, Datastax Graph, TigerGraph, Azure CosmosDB, and JanusGraph (Titan fork) are the ones we see the most in practice, and not in production but rumor-mill, Dgraph, RedisGraph, & ArangoDB. The three-and-four-letter types seem to roll their own, for better or worse. There are also some super cool ones that don't get visibility outside of the HPC+DoD world, like Stinger & Gunrock. Interestingly, the reality is a ton of our graph users aren't even on graph DBs (think Splunk/ELK/SQL), and for data scientists, just do ephemeral Pandas/Spark. As someone from the early days of the end-to-end GPU computing movement, we're incorporating cuGraph (part of nvidia rapids.ai) into our middle tier, so you get to transparently benefit from it while looking at data in any of the above.

2. I now slice graph DB's more in terms of OLTP (neo4j, janus, neptune, maybe tiger) vs OLAP (spark graphx, cugraph) vs batch (janus, tiger) vs friendly BI/data science (neo4j) vs friendly app dev / multi-modal add-on (CosmosDB, Neo4j, Arango, Redis). Curious to see how this goes -- given the number of contributors, I'm guessing it's doing well in at least one of these. +1 to hearing reports from others!

bane · on Jan 15, 2020

Thanks, I really appreciate the comprehensive write up of what your team is seeing. Any chance of a longer blog post that expands on this, especially pro-cons and performance?

lmeyerov · on Jan 15, 2020

Yes, that is a great idea!

derefr · on Jan 15, 2020

For someone who just wants to run some (intensive) OLAP graph queries on the “graph formulation” of a relational or hierarchical dataset every once in a while (maybe batch, maybe user-initiated, but either way <1QPS), but doesn’t yet have a graph DB and doesn’t really want to maintain their data in a canonical graph formulation, which type of graph DB would you recommend as the simplest-to-maintain, simplest-to-scale “adjunct” to their existing infra?

I.e. what’s the graph DB that best fits the use-case equivalent to “having your data in an RDBMS and then running an indexer agent to feed ElasticSearch for searching”?

lmeyerov · on Jan 15, 2020

My default nowadays is to minimize work via "no graph db": csv/parquet extract -> jupyter notebook of pandas/cugraph/graphistry, and if that isn't enough, then dockerized (=throwaway) neo4j, or if the env has it, spark+graphistry. The answers to some questions can easily switch the answer to, say, "kafka -> tigergraph/janusgraph/neptune", or some push-button neo4j/cosmosdb stuff:

* Primary DB: type / scale, and how fresh do the extracts need to be (daily, last minute?)

* Are queries more search-centric ("entities 4 hops out") or analytics ("personalized pagerank")?

* Graph size: 10M relations, or 10B? Document heavy, or mostly ints & short strings?

* Is the client consuming the graph via a graph UI, or API-only?

* Licensing and $ cost restrictions?

* Push-button or inhouse-developer-managed?

The result of (valid) engineering trade-offs by graph db dev teams means that, currently, adding a graph db as a second system can be tricky. The above represent potential mismatches between source db / graph stack / workload and team burden. Feels like this needs a flow chart!

Happy to answer based on the above, and you can see why I'm curious which areas Nebula will help straddle :)
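As a rough illustration of the "no graph db" default for a search-centric question ("entities 4 hops out"), here is a minimal sketch that uses only the Python standard library rather than pandas/cugraph. The two-column `src,dst` CSV layout, the function names `load_edges` and `within_hops`, and the sample data are all assumptions made for illustration.

```python
import csv
import io
from collections import defaultdict, deque


def load_edges(csv_file):
    """Build an undirected adjacency map from a CSV with `src,dst` columns (assumed layout)."""
    adjacency = defaultdict(set)
    for row in csv.DictReader(csv_file):
        adjacency[row["src"]].add(row["dst"])
        adjacency[row["dst"]].add(row["src"])
    return adjacency


def within_hops(adjacency, start, max_hops):
    """Breadth-first search returning {entity: hop distance} for distance <= max_hops."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


# Stand-in for a CSV extract from the primary database (hypothetical data).
extract = io.StringIO("src,dst\na,b\nb,c\nc,d\nd,e\ne,f\n")
graph = load_edges(extract)
print(within_hops(graph, "a", 4))
```

On this chain-shaped sample, the search from `a` reaches `b` through `e` and stops before `f`, which is five hops away. Something this small stops being enough once the extract no longer fits in memory or the questions become analytics-heavy, which is where the trade-offs listed above start to matter.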

jamie-vesoft · on Jan 16, 2020

Very insightful answer! Thanks for sharing your opinions here. Nebula Graph is good at OLTP use cases where high QPS and low latency are required.

sfgweilr4f · on Jan 15, 2020

I'd say Dgraph (https://dgraph.io/) is the closest competitor, but also Neo4j (https://neo4j.com), which has a longer heritage.

I'd also include redis because of the graph module (https://oss.redislabs.com/redisgraph/).

I've likely missed a bunch of others. Add them as I'm interested in graph db and have only scratched the surface myself.

rapnie · on Jan 15, 2020

There is also decentralized Gun DB: https://gun.eco/

OrientDB is an alternative to Neo: https://orientdb.org/

There are a bunch of DB's compatible with Tinkerpop and e.g. query-able with Gremlin: http://tinkerpop.apache.org/

rapnie · on Jan 15, 2020

There is also Weaviate, still in development, which has a flavor of GraphQL for querying: https://github.com/semi-technologies/weaviate

And this awesome page has some good entries: https://github.com/jbmusso/awesome-graph/blob/master/README....

jhoechtl · on Jan 15, 2020

Alternative: Dgraph

https://dgraph.io/

In architecture and goals it actually closely resembles Dgraph, would love to see an (opinionated) comparison by Manish, the CEO of Dgraph

mrjn · on Jan 16, 2020

(Manish here) Don't know much about Nebula. Feels quite inspired by Dgraph.

shermanye · on Jan 17, 2020

(Sherman here. I'm the founder of Nebula) Nice to meet you here, Manish. Nebula is actually inspired by the Facebook internal project Dragon (https://engineering.fb.com/data-infrastructure/dragon-a-dist...). Fortunately I was one of the founding members of the project. The project was started in 2012. We never heard of dgraph at that time. So I'm not sure who was inspired :-)

The goal of Nebula is to be a general graph database, not just a knowledge graph database. There are some fundamental differences between the two.

We welcome any positive feedback and technical discussion. We would love to learn from the community and to provide a product which truly satisfies customers' needs.

jhoechtl · on Jan 23, 2020

> The goal of Nebula is to be a general graph database, not just a knowledge graph database. There are some fundamental differences between the two.

I am by no means a Graph expert but what are some of the mentioned fundamental differences?

w3clan · on Jan 15, 2020

I had the very same feeling, dgraph is older and has a larger community plus additional features like:

- geospatial features

- good speed, as it is based on the BadgerDB key-value database and the Ristretto cache library.

- http library and other features

One of the advantages I saw in Nebula Graph is role-based access control, which is not available in Dgraph to date.

I am very curious about a benchmark between Nebula Graph and Dgraph.

Also, what storage system is used in Nebula Graph?

ayorosmage · on Jan 15, 2020

ACL is an enterprise dgraph feature: https://dgraph.io/support

jamie-vesoft · on Jan 15, 2020

Geospatial is also available in Nebula Graph. :)

As to the storage system, Nebula Graph is based on multi-group raft and RocksDB.

w3clan · on Jan 16, 2020

Can you add support for a module-based storage system, e.g., if someone wants to use BadgerDB or LevelDB or any other storage system instead of RocksDB?

jamie-vesoft · on Jan 17, 2020

Yes, Nebula Graph supports multiple backend storages by design. So theoretically you are able to use whatever storage you want for whichever graph space in Nebula Graph.

You may take a look at this article about the design of our storage engine: https://github.com/vesoft-inc/nebula/blob/master/docs/manual...

In 2020 we will be working on more plugins. You may stay tuned if that interests you. :)

jhoechtl · on Jan 15, 2020

Geospatial is on the TODO list

jamie-vesoft · on Jan 15, 2020

Geospatial support has already been merged to the code base. :)

w3clan · on Jan 16, 2020

Dgraph has another disadvantage, data redundancy: data associated with multiple indexes is stored multiple times for speed.

Does Nebula also store data multiple times for multiple indexes?

jamie-vesoft · on Jan 17, 2020

Thanks for asking! Sorry I missed this question earlier.

Nebula doesn't store data multiple times for indexes.

And here's how the indexing works in Nebula Graph:

You are allowed to create multiple indexes for one tag or edge type in Nebula Graph. For example, if a vertex has 5 properties attached to it, then you can create one index for each if necessary. Both the indexes and the raw data are stored in the same partition, each with its own data structure, for quick scanning by query statements. Whenever a query contains a "where" clause, the index optimizer decides which index file should be traversed.
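The description above can be illustrated with a toy model. This is a minimal sketch, not Nebula Graph's actual implementation: the `Partition` class, its method names, and the "fewest candidates wins" rule for choosing an index are all hypothetical. It only shows per-property indexes living next to the raw data in one partition, with an optimizer step picking a single index for a "where" clause.

```python
from collections import defaultdict


class Partition:
    """Toy partition holding raw vertices and their per-property indexes.

    Hypothetical model for illustration; not Nebula Graph's storage layout.
    It handles inserts of new vertices only (no updates or deletes).
    """

    def __init__(self):
        self.vertices = {}  # vertex id -> property dict (the raw data)
        self.indexes = {}   # property name -> {value -> set of vertex ids}

    def create_index(self, prop):
        """Build an index on `prop` from the vertices already stored."""
        index = defaultdict(set)
        for vid, props in self.vertices.items():
            if prop in props:
                index[props[prop]].add(vid)
        self.indexes[prop] = index

    def insert(self, vid, props):
        self.vertices[vid] = props
        # Indexes hold only vertex ids, so the raw data is kept once.
        for prop, index in self.indexes.items():
            if prop in props:
                index[props[prop]].add(vid)

    def choose_index(self, where):
        """Pick the indexed property with the fewest candidates (assumed heuristic)."""
        usable = [p for p in where if p in self.indexes]
        if not usable:
            return None
        return min(usable, key=lambda p: len(self.indexes[p].get(where[p], ())))

    def lookup(self, where):
        """Return sorted ids of vertices whose properties equal every item in `where`."""
        prop = self.choose_index(where)
        if prop is None:
            candidates = self.vertices.keys()  # no usable index: full scan
        else:
            candidates = self.indexes[prop].get(where[prop], set())
        return sorted(
            vid for vid in candidates
            if all(self.vertices[vid].get(k) == v for k, v in where.items())
        )
```

With indexes on two properties, `lookup({"team": "red", "age": 30})` would traverse only whichever index yields fewer candidate ids and then check the remaining predicate against the raw data; with no usable index it falls back to scanning every vertex in the partition.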

campoy · on Jan 16, 2020

Hi there, out of curiosity, what do you mean by the data being stored multiple times for speed?

We (I work at Dgraph) have data redundancy when you have multiple replicas for a given group - but that's an optional feature.

Thanks!

emmanueloga_ · on Jan 15, 2020

Check the list of implementations of SPARQL [1].

One of the most interesting picks: RDF4j (java based). It can connect to a lot of different SPARQL servers, but the rdf4j Native Store should be good enough for data sets in the order of the "100 million triples", according to the docs.

I don't know much about it, but not long ago they announced integrated support for "federated queries", which means that if your data set can't fit in a single node, they have a solution to query different servers in the same query [2].

I'm slowly learning through the forest of related technologies, one of the most useful is SHACL [3], which is a language to validate and extract pieces of the graph that match a pattern (very loosely, think a "schema" for graphs).

1: https://en.wikipedia.org/wiki/List_of_SPARQL_implementations

2: https://rdf4j.org/news/2019/10/15/fedx-joins-rdf4j/

3: https://rdf4j.org/documentation/programming/shacl/

rapnie · on Jan 15, 2020

Before using RDF for graphs one should inform themselves on the differences between labeled property graphs and triple stores, and choose the model that best fits their use case.

Take your time and be wary of how objective an article is. Vendors try to lure you in. The following has some good info (but a Neo4j bias): https://dzone.com/articles/rdf-triple-stores-vs-labeled-prop...

emmanueloga_ · on Jan 15, 2020

Good point. Funny you mention that article: I remember encountering both that article and another one that provides some counterpoints! [1]

Also, both those articles are a bit old: RDF* ([2],[3]) is a new extension for RDF that makes it easier to accomplish the same kind of things you can do with property graphs. RDF4j has support for RDF* in the roadmap! [4].

To me, the fact that RDF is 1) a simpler and more general model and 2) an open standard with multiple free and commercial implementations makes RDF a more attractive option than locking into a single proprietary implementation like Neo4j.

--

1: http://www.snee.com/bobdc.blog/2018/04/reification-is-a-red-...

2: http://olafhartig.de/slides/RDFStarInvitedTalkWSP2018.pdf

3: http://blog.liu.se/olafhartig/2019/01/10/position-statement-...

4: https://github.com/eclipse/rdf4j/issues/1484

zozbot234 · on Jan 15, 2020

RDF is an interoperability mechanism, it has nothing to do with the architecture you use internally for your database. You can have a PostgreSQL database and offer an endpoint for querying it via RDF.

jamie-vesoft · on Jan 15, 2020

Currently the project has been deployed in multiple leading internet companies in China, including Tencent, MeiTuan (Chinese Yelp), Red (Chinese Pinterest), Vivo, and so on.

bane · on Jan 15, 2020

That's pretty impressive. I'd love to see some detailed blog posts about setting it up, or using it in production (things to watch out for, good practices for provisioning hardware, etc.).

jamie-vesoft · on Jan 15, 2020

There is a Getting Started series in the GitHub wiki page: https://github.com/vesoft-inc/nebula/blob/master/docs/manual...

Check out the Getting started YouTube video here if you prefer video tutorials: https://www.youtube.com/channel/UC73V8q795eSEMxDX4Pvdwmw

Also some FAQs: https://github.com/vesoft-inc/nebula/blob/master/docs/manual...

If you are interested in the architectural design of the project, here are some articles for your reference:

Overview: https://github.com/vesoft-inc/nebula/blob/master/docs/manual...

Storage engine: https://github.com/vesoft-inc/nebula/blob/master/docs/manual...

Query engine: https://github.com/vesoft-inc/nebula/blob/master/docs/manual...

Feel free to contact us if anything is missing. :)

gibsonf1 · on Jan 15, 2020

We're using Blazegraph (the DB that Amazon Neptune uses) with great success. We only use the SPARQL API with quads so we get nested graphs.

foota · on Jan 15, 2020

The architecture diagram looks interesting, would love to read more about it if anyone finds something.

rainyi2007 · on Jan 15, 2020

I found some articles about the architecture and design of the database in their repo:

Overall structure: https://github.com/vesoft-inc/nebula/wiki/Nebula-Graph-Archi...

Storage engine design: https://github.com/vesoft-inc/nebula/blob/master/docs/manual...

Query engine design: https://github.com/vesoft-inc/nebula/wiki/Query-Engine-Overv...

Hope that helps. :)

ioli · on Jan 15, 2020

Thanks

winrid · on Jan 15, 2020

Love the query language! Very easy to dive into. Any companies using this in production?

jamie-vesoft · on Jan 15, 2020

Glad you loved the query language. Simplicity and versatility are our design goals for the language.

Currently the project has been deployed in multiple leading internet companies in China, including Tencent, MeiTuan (Chinese Yelp), Red (Chinese Pinterest), Vivo, and so on.

gibsonf1 · on Jan 15, 2020

SPARQL?

cetra3 · on Jan 15, 2020

It's not open source, as it is licensed under Commons Clause according to the README, which according to the FAQ is not open source.

I'd be interested in knowing whether the commons clause license has been challenged as the wording is rather simple

thunderbong · on Jan 15, 2020

Honestly, I don't see what's wrong with expecting payment for your work if someone else decides to sell it. Why should 'open source' get conflated with free (as in gratis)?

For me, open source has been an incredible way to learn software - its syntax, its architecture, its control flow, its gotchas.

From my understanding of the license [1], you can see the code, learn from it, do whatever you want with it, modify it if you so please, improve on it, whatever. The only thing you cannot do is sell it. Because you've taken someone else's idea in the first place.

I see this happening all the freakin' time and it pisses me off no end. If I suggest a software to someone, the first thing they ask is 'Is it open source?' What they really mean is 'Is it free?' Why? If someone is expecting to get paid for creating software for others, why is the feeling not reciprocated towards the person who's created the software in the first place?

From what I've seen, most managers and software engineers, expect to get paid for their work but all the software which helps them make that money, they expect for free.

I find that attitude extremely hypocritical, honestly.

[1]: https://commonsclause.com/

zozbot234 · on Jan 15, 2020

Why should open source get conflated with things that are NOT open source? Putting restrictions around "commercial" use (which is notoriously hard to define) is not open source. Discriminating against fields of endeavor is not open source.

If you want to get paid for developing genuine open source software, there are things you can do to that effect. Get paid for support (even maintaining the code is support). Offer to highlight companies that support your software (even if the highlighting is quite trivial, this is enough to unlock 'marketing' expenses and make it easier for business-oriented entities to support you). Start a Patreon page. There are lots of things that can be done without adding any licensing restrictions.

brobdingnagians · on Jan 15, 2020

> "without adding any license restrictions"

That would imply public domain. Every license has some licensing restrictions. MIT, BSD, and associated ones are closest to that, but still have restrictions. "Open source" in the literal sense in English is where the source is open to be looked at by everyone. Lots of software is like that, even fully commercial offerings. AGPL, GPL, and co have pretty drastic limitations on commercial usage (much more than the Commons Clause), but are obviously open source. The author should decide licensing, and if the source is available to be perused-- the English language would tend to call that, "open source". I think "OSI Approved Open Source License" would be a better phrase than the linguistically vague "open source". English has proper nouns for that sort of thing, and if we can go around writing "GNU/Linux", I think specifying the _type_ of open source license really isn't too much to ask for.

zzo38computer · on Jan 15, 2020

There are some licenses effectively like public domain, such as zero-clause BSD, CC0, WTFPL, Unlicense, etc.

GPL does not restrict commercial use any more than non-commercial use. What it does restrict is adding additional restrictions, it requires source code to be distributed, and it does not allow disallowing the user to substitute their own version.

If the source is available to be perused I think it is called "shared source" (or "source available"); "open source" is a subset of that, and is according to the OSI definition. "Free software" is also a subset of "source available". "OSI approved" is a subset of "open source" because OSI approved does not include public domain, even if it is still open source (which in some cases it is) (also some stuff that meets the OSI definition (by both words and intention) might not be OSI approved because OSI has not looked at it yet). And then there is also "FOSS".

ethbro · on Jan 15, 2020

> Putting restrictions around "commercial" use (which is notoriously hard to define) is not open source.

That's... somewhat accurate.

Let's not pretend the GPL team itself didn't have issues with Tivo-ization, that prompted license changes.

Cloud servic-ization is the virtualization of hardware modification locks.

So call it opinions about "commercial" or use another word, but the GPL definitely has them.

zozbot234 · on Jan 15, 2020

> Cloud servic-ization is the virtualization of hardware modification locks.

Hence why the FSF advocates the AGPL for software that's designed to be performed "as a service" over a computer network. But "no tivoization" and AGPL clauses do not deny these uses; they simply enable the end user of the software to exercise her rights with respect to it.

microcolonel · on Jan 15, 2020

Then don't call it open source, because that word in almost all cases involves free use regardless of commercial interest.

It's all well and good, and nothing immoral is done by offering code under this license, but that doesn't make it open source.

codycraven · on Jan 15, 2020

I get that Commons Clause isn't "open source" but I really love the concept. If a company wishes to productize a creator's work it seems reasonable to pay the creator to alternatively license it (if the company doesn't want to put their secrets out to the public).

Meanwhile the creator gets to share their work freely with anyone who wishes to use it as a component of their own product/software in the spirit of open source.

jamie-vesoft · on Jan 15, 2020

That's exactly what Commons Clause is pursuing IMO.

jamie-vesoft · on Jan 15, 2020

As far as I know, Commons Clause can be attached to any open-source project. The main purpose is to prevent cloud providers from monetizing the project without contributing back. So Nebula Graph's main license is Apache 2.0, meaning that to most users it is open source, no different from any other open source project. :)

bluejekyll · on Jan 15, 2020

The real question that has to be answered, and this is the hard one, when does the product begin to be monetized?

Let’s say it’s a full DB option as part of AWS RDS (or whatever that graph DB equivalent is). That probably is clearly monetizing the product. But what if they completely abstract the API and not expose the original one, it’s just the backing engine for a graph DB product?

Now moving away from a direct product, what if it’s just the backing DB AWS uses for managing all of their infrastructure? It’s not being directly monetized at that point but it might be the most critical component for the AWS operations, which means that it is helping them monetize other products. Do they owe in this case? (I’m speaking about the license here, not whether or not they should or should not based on goodness or feature improvements they want to pay to see).

As the DB moves further away from profit centers in an organization, at what point is it no longer being monetized?

Personally, I’d like to see a model where the OSS developers can and are paid in all of these cases for their work, but I’m not always sure there is anything better than a contract to support and build new features (classic OSS support model).

zozbot234 · on Jan 15, 2020

> to most users it is open source

Discriminating by field of endeavor is contrary to the definition of open source software, and has been since before the term even existed. It's not open source, it's effectively Shared Source and developers who care about open source should stay away from this.

adrianN · on Jan 15, 2020

I don't understand why projects don't simply use AGPL and offer a commercial license as an alternative.

fiberoptick · on Jan 15, 2020

There's a subtle difference between AGPL and the Commons Clause licenses.

AGPL requires network-accessible code to be disclosed & licensed under an AGPL-compatible license.

The Commons Clause license outright prohibits SaaS-style offerings of the licensed code.

A lot of startups licensing their code under AGPL might still have AWS et al. eat their lunch, because all Amazon needs to do to remain compliant is to publish any modifications made to the AGPL-ed code.

adrianN · on Jan 15, 2020

AGPL is super-banned at all companies because lawyers deem it a huge risk. In particular it's banned at Amazon. IIRC you aren't even allowed to have AGPL software on your laptop at all.

ethbro · on Jan 15, 2020

In my experience enterprise lawyers deem everything they can't be 100% certain about a huge risk. Legal CYA.

galaxyLogic · on Jan 15, 2020

It seems to me the difference is not so subtle. As I understand from https://commonsclause.com/ you can use Commons Clause licensed code-library as part of your commercial application without having to make your source-code available whereas with AGPL you would have to. But I may be wrong?

orangeshark · on Jan 15, 2020

Commons Clause is added on top of an existing FOSS license. So it will be whatever requirements the base license has, plus an anti-commercial restriction preventing others from offering SaaS services.

stockkid · on Jan 15, 2020

Another approach I have seen is BSL (Business Source License), which is kind of like Commons Clause in that it prohibits commercial offerings of the software, but after a rolling time limit, converts to an open source license. I might be wrong, so please correct me.

https://www.cockroachlabs.com/blog/oss-relicensing-cockroach...

zozbot234 · on Jan 15, 2020

Yes, if you're choosing between Commons Clause and BSL please choose the latter. Because (1) it has a way less confusing name and mechanism of action, and (2) it acknowledges that some people may care about an actual OSS license for your software, and makes it clear how that might be achieved.

e12e · on Jan 15, 2020

Oh wow, this really is just source available, isn't it?

> For purposes of the foregoing, "Sell" means practicing any or all of the rights granted to you under the License to provide to third parties, for a fee or other consideration (including without limitation fees for hosting or consulting/support services related to the Software), a product or service whose value derives, entirely or substantially, from the functionality of the Software.

So, you cannot pay a contractor to set this up, because they can't deliver to you if they charge for setup or hosting?

jamie-vesoft · on Jan 16, 2020

Really appreciate the feedback and discussion! We will seriously consider the license issue.

Please DO let us know if you have any better license options than Commons Clause that can help provide an open-source project for the community while stopping cloud vendors from monetizing it without contributing back.

Thanks again!

pbowyer · on Jan 16, 2020

I follow a few licensing blogs, and I know people are talking about and working on alternatives to Commons Clause. As far as I can tell, all will disappoint those who believe the OSI's definition of "Open Source" is the one true open source.

My bookmarks: https://katedowninglaw.com/blog/ https://writing.kemitchell.com/ (blog & the blogroll for finding others). See also: https://www.google.com/search?q=site%3Awriting.kemitchell.co... http://www.blueoakcouncil.org/

jamie-vesoft · on Jan 16, 2020

Much appreciated! We will definitely check these out and re-consider our license.

shermanye · on Jan 17, 2020

Nice to meet everyone here. As a newcomer, I would like to introduce ourselves a little bit. Nebula is inspired by the Facebook internal project Dragon (https://engineering.fb.com/data-infrastructure/dragon-a-dist...). Fortunately I was one of the founding members of the project. The project was started in 2012. Since then I've spent all my time working on graph databases.

The goal of Nebula is to be a general-purpose, distributed graph database. We welcome any positive feedback and technical discussion. We would love to learn from the community and to provide a product which truly satisfies customers' needs.

pojntfx · on Jan 15, 2020

It's a proprietary license, false advertising ...

jamie-vesoft · on Jan 16, 2020

Thanks for your feedback!

While our original intention is to provide a real open-source graph database project for the community, we also want to prevent cloud vendors from monetizing the project without contributing back to the community. Exactly like what's explained in this TechCrunch article: https://techcrunch.com/2018/09/07/commons-clause-stops-open-...

That being said, Commons Clause seems to be the only license that can be used. To quote the article: "Academics, hobbyists or developers wishing to use a popular open-source project to power a component of their application can still do so."

However, we will seriously consider the license issue. Please do let us know if you know any better licenses that can be used.

Much appreciated!

SQueeeeeL · on Jan 15, 2020

Huh? I'm ready to dunk on any project, but that's just not true... https://github.com/vesoft-inc/nebula/blob/master/README.md#l...

essive · on Jan 15, 2020

Why is this project tied so specifically to China? Honest question here....

flohofwoe · on Jan 15, 2020

What are those specific ties other than the company behind the project being located in China?

essive · on Jan 16, 2020

I just thought it seemed a bit odd to me to have the project tied to large organizations in China so much. I’m not saying to “buy American” but I do think it’s reasonable to be perplexed.

apta · on Jan 15, 2020

Still a red flag.

SQueeeeeL · on Jan 15, 2020

Is it? That's like saying that Intel being located in the US is a red flag for them leaving backdoors in their hardware... oohhh

https://en.wikipedia.org/wiki/RDRAND#Reception

apta · on Jan 15, 2020

And China has backdoors in their stuff. We shouldn't be using it.

flohofwoe · on Jan 15, 2020

So basically "only buy American"? ;)

I don't want to defend that company or the product, or the country they operate from, but the source code is all on GitHub under a permissive license and thus can easily be audited for government backdoors. Where's the problem?

apta · on Jan 15, 2020

Backdoors can still exist in public code bases.

sweetdreamerit · on Jan 15, 2020

"Five-starred Red Flag", indeed ;) https://en.wikipedia.org/wiki/Flag_of_China

apta · on Jan 15, 2020

Better watch out for backdoors then.