WhiteDB – Lightweight NoSQL database written in C, operating in main memory

tammet · on Oct 25, 2013

One of the authors here. A few answers quickly. It does write to disk: you can either dump memory or write all changes to log (turn it on/off yourself). Sure it has a global read/write lock, with several locking strategies to select from (task-fair atomic spinlock queue or a reader-preference or a writer-preference spinlock). It is definitely meant to be a simple library. We strived to document it carefully to make usage as easy as possible. Yes, you can very easily form lists, trees or any other pointer structures. Happy to see it on the Hacker News, we never really expected that :)
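
For readers unfamiliar with the terminology, here is a minimal sketch of what "reader-preference" means for a global read/write lock. It is an illustration in Python using `threading.Lock`, not WhiteDB's C spinlock code; the class name `ReaderPreferenceLock` and its methods are hypothetical.

```python
import threading


class ReaderPreferenceLock:
    """Readers share the lock; a writer needs exclusive access.

    Reader preference: new readers are admitted while other readers hold
    the lock, even if a writer is already waiting.
    """

    def __init__(self):
        self._readers = 0                        # number of active readers
        self._readers_guard = threading.Lock()   # protects the counter
        self._resource = threading.Lock()        # held by a writer or by the reader group

    def acquire_read(self):
        with self._readers_guard:
            self._readers += 1
            if self._readers == 1:
                # The first reader locks out writers on behalf of all readers.
                self._resource.acquire()

    def release_read(self):
        with self._readers_guard:
            self._readers -= 1
            if self._readers == 0:
                # The last reader lets writers in again.
                self._resource.release()

    def acquire_write(self):
        self._resource.acquire()

    def release_write(self):
        self._resource.release()


# Usage: two readers overlap, then a writer takes exclusive access.
lock = ReaderPreferenceLock()
store = {"answer": 42}

lock.acquire_read()
lock.acquire_read()          # a second reader gets in without blocking
value = store["answer"]
lock.release_read()
lock.release_read()

lock.acquire_write()
store["answer"] = 43
lock.release_write()
```

The trade-off is that a steady stream of readers can starve writers; a writer-preference or task-fair queue variant changes who waits, not the basic idea of one lock around the whole database.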

bch · on Oct 25, 2013

Congratulations on your project and the attention it's getting!

Can you explain-more/rationalize GPLv3 licensing w/ the conditional alternate commercial license?

Why not just BSD, MIT, or (at least) LGPL ?

GPL is more understandable on "higher level software" (ie: complete applications), but I don't understand your intent licensing a library this way.

belorn · on Oct 25, 2013

The license page is quite clear why. The authors want applications that are distributed and marketed as database systems to be used by other developers to be under GPLv3.

You might ask why they want that, and that could be an interesting read. My best guess: They are themselves developers.

tammet · on Oct 25, 2013

Making a clean cut between free-as-in-speech on one hand and free-as-in-beer on the other.

mbreese · on Oct 25, 2013

I appreciate the sentiment, but I was looking forward to using this until I saw that part. I (like I assume many others) cannot use a GPL3 library (and I'm in academia). If you want any sort of traction for a library, GPL3 is not the way to go.

This is why the LGPL was created, so that you can have modifications done on your library be free-as-in-speech, but still make the library as a whole useable for a wide variety of other projects, including closed-source versions.

Having a separate requirement to email you for a free-as-in-beer license is just overly complicated for this. The more hurdles you put up for people, the fewer that will adopt the library. I think that licensing is one of those cases where it doesn't pay to be clever. Plus, what happens when you decide to stop maintaining the code? Do you want to keep getting emails for licenses years from now?

Edit: in last paragraph, I said free-as-in-speech, but meant beer (see comment below).

tammet · on Oct 25, 2013

The default GPL is free-as-in-speech. You do not have to email for GPL. You have to email for free-as-in-beer. I assume that in case free-as-in-speech is not OK, it is also not a major hurdle to email for the free-as-in-beer version. In case emailing is a major hurdle, maybe you do not really need the free beer part.

Should we stop maintaining the code or get bored mailing free beer licences, we'll very likely change the licence to LGPL or MIT. Until then beer comes via email.

bch · on Oct 25, 2013

In the case of changing licenses, make sure over the course of your project maintainership that you have the right to relicense all the code, including patches/contributions from others.

I wish it was simply licensed MIT or BSD, but congratulations on your software and sticking to your convictions.

:)

belorn · on Oct 25, 2013

> I (like I assume many others) cannot use a GPL3 library (and I'm in academia).

Is it copyleft in general, or the patent grant that hinders your work in academia? I'm not sure why you should be using other people's work for free, but then go around and sue anyone who copies or improves on your work.

The project wrote down exactly what they wanted to do with their work on their license page. I say good for them. More people should do so and think what they themselves want.

mbreese · on Oct 26, 2013

I've had academic licensing offices balk at the GPL. I've had my fights with people over this, and lost. There are some specific clauses that they didn't like (this was GPL2). However, they rarely have problems with MIT/BSD licenses, so in general, that's what I try to use.

My stance is that since they did the work, the authors of the library can license it however they'd like. But, if they wanted to get more people using their library, I think that they should rethink their approach. LGPL is more appropriate for a library, where you can still have your copyleft approach for the code you wrote, while still promoting wider use.

Here's an extreme edge case... as they said, if they get tired of supporting the email to get a free-as-in-beer license, they will just open it up with an MIT/BSD style license and be done with it. That's great. But what if someone gets hit by a bus? Or someone leaves the project and moves to Antarctica? There would be no practical way to release an unencumbered version.

Really though, they can do what they want - it's their code. But licensing is one of those areas that you really shouldn't try to be clever.

teddyh · on Oct 27, 2013

> But, if they wanted to get more people using their library, I think that they should rethink their approach.

They actually don’t want as many people as possible using their library. That is not the goal when choosing the GPL. The goal is to maximize the number of free users in the world – that is, users who have the freedoms which define Free Software. The mere number of users is inconsequential. If users are what you desire, then by all means, choose a permissive license (MIT/BSD/etc).

tammet · on Oct 25, 2013

I'd just make a remark that even in the GPL world everything is not as simple as it looks. There are GPL versions with _exceptions_ endorsed by RMS, for example. A long time ago I used to work on a Hobbit scheme compiler for the scm interpreter, which was promoted by RMS and became Guile later. scm had such a GPL-with-exceptions clause by RMS, which was stated clearly incorrectly. I take every chance to boast that I convinced RMS to fix the error in his own GPL version for scm :)

bitdiddle · on Oct 26, 2013

Yes, there's a lot of subtlety to licensing. Personally I think it should be taught in computer science schools. Open source software has really changed the dynamics of corporations. Of course we wouldn't have the open source movement without free software and imho free software is more important than ever. You seem to have struck a nice balance with this exception that protects your interests in the database space.

And yes, getting RMS to change something is quite the accomplishment :) His ability to walk the talk is impressive.

otterley · on Oct 25, 2013

Why is shared memory better than mmap(2) here? With the latter, you get persistence for free.
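
To make the "persistence for free" point concrete, here is a small sketch of a file-backed memory mapping in Python. It says nothing about how WhiteDB manages its own memory; the helper names (`write_record`, `read_record`, `demo`) and the temporary file are assumptions made for the example.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE  # size of the mapped region in bytes


def write_record(path, payload):
    """Map a file into memory and store payload at offset 0."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), PAGE) as mem:
            mem[:len(payload)] = payload
            mem.flush()  # ask the OS to write dirty pages back to the file


def read_record(path, length):
    """Map the same file again, as a later process would, and read it back."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), PAGE, access=mmap.ACCESS_READ) as mem:
            return bytes(mem[:length])


def demo():
    fd, path = tempfile.mkstemp()
    try:
        # The file must be at least as large as the region we map.
        with os.fdopen(fd, "wb") as f:
            f.write(b"\0" * PAGE)
        write_record(path, b"hello")
        return read_record(path, 5)
    finally:
        os.remove(path)
```

Because the mapping is backed by a regular file, whatever was stored is still there after the mapping is closed and reopened. The cost is that page write-back is scheduled by the OS unless you flush explicitly.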

danjayh · on Oct 25, 2013

Is there information available anywhere on the required resources? Specifically:

1) How much space does the compiled code require? Can conditional compilation be used to omit unused features?

2) What is the overhead for the various data structures?

I'm thinking that it might be interesting to use this on very limited environments (PIC microcontrollers for example) where every byte matters.

tammet · on Oct 25, 2013

Conditional compilation can quite certainly be used. The best way to find out the space requirements is to try out some of the examples provided. I cannot give the overhead exactly, but it can be read from the source with not too much effort. Send an email to tanel.tammet at gmail.com if you need help with that. In broad terms, we have been very careful with using memory, both for the reasons you state and the reason of getting more bang from the cache.

endgame · on Oct 25, 2013

Good on you for going GPLv3 as your free software license. I think it's really funny to see people here going "b-but you should use my favourite license" instead of "that's a cool thing you've built there".

bch · on Oct 25, 2013

> "b-but you should use my favourite license" instead of "that's a cool thing you've built there".

First of all, "that's cool" and "license issues" aren't mutually exclusive.

Second of all, it's a case of picking an applicable license for the place the software fits into a project. This is a low-level library (cool or not); if an author wishes to "protect" their code, LGPL was built explicitly for this purpose.

To confuse matters, on the project's own page[1] they effectively waive all the rights of the GPL3 except for a very specific corner case.

People asking for license clarification or change are looking to simplify even _beginning_ to use the library.

[1] http://whitedb.org/licence.html

nknighthb · on Oct 25, 2013

What disgusting arrogance. There is no objective standard of "applicability" of any particular license to any particular piece of software. The LGPL is not "built explicitly for this purpose". The LGPL is a compromise license that exists for strategic reasons. There is no clear reason it should be applied here.

That some people may not want or be able to use this library because of its license choice may matter to you, but there's no reason to believe it matters to the authors, and you are not in a position to tell them what should matter to them.

Nothing on that page constitutes a waiver of GPLv3. It is merely an offer of an alternative license to a subset of potential users.

bch · on Oct 25, 2013

> but there's no reason to believe it matters to the authors

One reason may be to increase the number of folks using the library. I'm not saying that would happen, or that the original devs care, but it's conceivable, isn't it?

> and you are not in a position to tell them what should matter to them.

I'm not telling them anything -- I'm expressing an opinion and asking questions. [sidenote: isn't it ironic to be called "disgustingly arrogant" and attacked for asking questions on a matter of free speech? I think my discourse has been polite and respectful[1]].

> Nothing on that page constitutes a waiver of GPLv3. It is merely an offer of an alternative license to a subset of potential users.

Perhaps waiver was the wrong word, but let's look at this a moment:

The offer is to everybody (barring a small (arguably dubious (enter lawyers)) subset of potential users), where (again, presumably) the license is more like MIT than GPL.

At any rate, the decision is in the hands of the authors.

[1] https://news.ycombinator.com/item?id=6614482, https://news.ycombinator.com/item?id=6614179

jafaku · on Oct 26, 2013

He's just letting the author know that the library is unusable with this license. So either people won't use it, or they will use it and violate the license.

Would you seriously consider using (or even trying) a new library that has not even proved to be better than others yet, if you had to comply with GPLv3? I wouldn't.

pjmlp · on Oct 26, 2013

Yes, all the time.

Those that want to get money with the free work of others, should give something back and most authors do offer commercial licenses when asked for.

With licenses like BSD, the living proof is that most companies are leechers.

a-nom-a-ly · on Oct 26, 2013

> Those that want to get money with the free work of others, should give something back and most authors do offer commercial licenses when asked for.

Have you ever considered the fact that it's not all about money and proprietary software? This affects Open Source software as well. If I want to allow others to do as they will with my code, I won't touch a library with such a viral license because I don't want to subject myself or my users to having to even think about it. Also, have you considered...

> With licenses like BSD, the living proof is that most companies are leechers.

That's not a proof of anything... The same license you cite is a living proof that most[1] companies and regular people will contribute even more than they have to without you having to force them to do anything.

[1] most, for some definition of most, because if a company or individual has neither the expertise nor the resources to contribute, then what good does restricting their use of the code achieve? (rhetorical question)

pjmlp · on Oct 29, 2013

Yes, I have considered all of that.

When I started coding, there was no such thing as GNU or open source movement. You got commercial software, shareware, beerware, donationware, whateverware.

I don't have any problem with commercial software, actually I do use quite a lot of it.

What I have problems with, and I have seen it happening a lot, is companies using source code from someone else as a means to cut costs for their binary blobs, without any form of contribution.

So I always defend a dual license scheme: GPL for open source projects, and if some company wants to use the code in a commercial product, it just needs to ask for the commercial license.

The only freedom GPL takes away, is the freedom to abuse the work of others.

lambda · on Oct 27, 2013

The library is not unusable with this license.

Anyone writing GPLv3 software to begin with would be just fine with using this license.

The question you have to ask yourself is: am I OK with releasing my software under the GPLv3, or am I OK with rewriting this library if I decide to relicense my software under some other license. If you're OK with that, then you can use this library.

nknighthb · on Oct 26, 2013

> the library is unusable with this license

No, it isn't. I can use it. So can millions of others. That you can't is most likely the fault of shitty lawyers. Not the authors' problem.

jafaku · on Oct 26, 2013

Odds are you don't really understand GPLv3. See what people like Linus (you might have heard of him) think about it. And he made quite some contributions to the open source world.

nknighthb · on Oct 26, 2013

I understand it just fine, thank you. GPLv3 is hardly the only thing I disagree with Linus on. Childish appeal to irrelevant authority is not an argument.

jafaku · on Oct 26, 2013

You were claiming that it was a problem with my "shitty lawyers", I showed a perfect example of someone who worked on open source his entire life, who can't use GPLv3.

Childish assumptions and irrelevant reference to the argument of authority fallacy are not an argument. Talk about "arrogance"...

nknighthb · on Oct 26, 2013

I claimed it was "most likely" shitty lawyers, because 100% of the people I've heard from who genuinely can't use GPLv3 code anywhere in their work can't do so because a corporate lawyer-drone is in the way.

The rest either won't because they don't like the license, or can't by virtue of their own choices. Linus is one of these people. He has made his choices, which is his right. I (mostly) do not agree with the reasons for those choices, and would not have made the same ones, as is my right.

The identity and stature of the person who makes a choice is irrelevant, and dragging it out as if it makes my opinion invalid is absurd.

jafaku · on Oct 26, 2013

If a lawyer is the only thing stopping someone from using the library, then that person was clearly going to violate the license. Which proves my point: Those who want to steal the code will do it anyway. And those who wanted to give it a legitimate use won't even touch it. The same kids that are slapping GPLv3 on anything they build are probably the ones breaking other people's licenses because they don't understand them. This is their typical response when you call them out: "It's open source!". As if MIT/X11 open source was the same as GPLv3.

I insist: No professional programmer is going to use WhiteDB. Not for open source, not for anything. If there ever is someone willing to comply with GPLv3 just to be able to use WhiteDB, he won't even find out about the library, because nobody is using it.

nknighthb · on Oct 26, 2013

> If a lawyer is the only thing stopping someone from using the library, then that person was clearly going to violate the license.

Just by saying this you prove you've never dealt with corporate lawyers. I know multiple companies where all GPLv3 software has been banned by the legal department entirely. Not just for use as part of a product. It's literally not allowed on the company's computers at all, because the shitty lawyers who couldn't make it in the real world have decided that if the company touches GPLv3 software, all company source code is immediately GPLv3.

They are that stupid.

By the way, if your faith in corporate lawyers is so strong, why bother with courts? We can just have corporate lawyers decide everything, since they'll always get it right. Which is why there are no lawsuits where one side wins and the other side loses.

> I insist: No professional programmer is going to use WhiteDB.

Which particular term of the GPLv3 would prevent me from using WhiteDB in a web application? I'm aware of none whatsoever. Note that this is GPLv3, not AGPLv3, which does have terms which can pose a problem to web applications.

By the way, Red Hat and Canonical make GPLv3 software, contribute to GPLv3 software, and include GPLv3 software in their Linux distributions. Are you accusing their programmers of being unprofessional?

endgame · on Oct 25, 2013

Well said. People need to remember that the L in LGPL stands for "Lesser", not "Library".

jafaku · on Oct 26, 2013

Where lesser means less restrictions, not less freedom.

GPLv3 = The ultimate restriction

belorn · on Oct 29, 2013

HN needs better anti-troll protection against comments like that.

lambda · on Oct 26, 2013

> What disgusting arrogance.

Could we please try to improve the tone of discourse here?

nknighthb · on Oct 26, 2013

What would improve the tone of discourse is if every time someone posted a GPL'd project, the thread didn't fill up with entitled jerks whining about the terms under which they've been offered free stuff.

quanticle · on Oct 26, 2013

Except that no free stuff has been offered. A GPLv3 license is the equivalent of saying, "Hey, look at this cool thing I made! Oh, you want to use it for anything other than GPLv3 projects? Well, you can't have it!"

Forget the intent of the GPLv3. The effect of the GPLv3 has been to damage and subvert free software, rather than promote it.

nknighthb · on Oct 26, 2013

A GPLv3 license is saying that I will share what I created with you if you will share what you create with it with me on the same terms.

If you don't like those terms, you can write your own software from scratch. Nobody is under any obligation to give you anything on any terms at all.

lambda · on Oct 27, 2013

I agree that it would be nice if people wouldn't spend so much time complaining that people create free software, with the request that if you use it you share your changes freely as well. I'm on your side about the actual underlying issue.

What I was requesting was that you don't use phrases like "what disgusting arrogance" and "entitled jerks whining" to make your point. You are unlikely to convince anyone with that kind of language; when people feel like they are attacked, they're a lot more likely to respond defensively than they are to be convinced by you.

jafaku · on Oct 26, 2013

Agreed, most devs that choose GPL3 don't know the first thing about licenses, and just pick it because they will do whatever Stallman tells them to do.

I can't think of any good use case for GPL3. Nobody should use it.

pjmlp · on Oct 26, 2013

> I can't think of any good use case for GPL3. Nobody should use it.

When coding open source software.

For everyone else that wants to make money with the free work of others, a commercial license can always be asked for.

AsymetricCom · on Oct 25, 2013

You know who else built cool things? That's right. Hitler.

yid · on Oct 25, 2013

Looking at the benchmarks, I'm trying to rack my brain about how it outperforms redis consistently on every single benchmark for a simple associative map.

I don't know if this is likely, but it looks like redis doesn't lock memory [1], which means that the benchmarks could be explained by swapping. Depending on the type of shared memory used by whitedb, it could be that its pages are locked and immune to swapping.

[1] https://github.com/antirez/redis/issues/1177

josephg · on Oct 25, 2013

I think the difference is entirely explained by WhiteDB's lack of networking overhead, both in write() calls to the OS[1] and creating & parsing network messages. A fairer comparison would be with leveldb and lmdb[2].

[1] http://highscalability.com/blog/2013/6/19/paper-megapipe-a-n...

[2] http://symas.com/mdb/microbench/

Zariel · on Oct 25, 2013

What's the overhead of the connection to the server? If he is connecting to Redis over TCP and WhiteDB via memory sharing then it's not a huge leap to say TCP is slower than shared memory.
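
A rough way to see the effect described in the two comments above is to time the same lookup done directly in memory and done through a socket round trip. The sketch below is hypothetical and is not the benchmark from the WhiteDB site: the one-key "server", the function names, and the use of `socket.socketpair()` (a local socket pair, typically cheaper than TCP) are all assumptions chosen to keep it self-contained.

```python
import socket
import threading
import time


def serve_one_connection(conn, table):
    """Tiny stand-in for a key-value server: one key in, one value out."""
    with conn:
        while True:
            key = conn.recv(64)
            if not key:          # client closed the connection
                break
            conn.sendall(table.get(key, b"?"))


def lookup_over_socket(conn, key):
    """Send a key and wait for the reply, like a client talking to a server."""
    conn.sendall(key)
    return conn.recv(64)


def compare(n=2000):
    table = {b"k": b"v"}
    client, server = socket.socketpair()
    worker = threading.Thread(target=serve_one_connection, args=(server, table))
    worker.start()

    start = time.perf_counter()
    for _ in range(n):
        via_socket = lookup_over_socket(client, b"k")
    socket_secs = time.perf_counter() - start

    start = time.perf_counter()
    for _ in range(n):
        direct = table[b"k"]
    direct_secs = time.perf_counter() - start

    client.close()               # the server thread sees EOF and exits
    worker.join()
    return via_socket, direct, socket_secs, direct_secs
```

Calling `compare()` returns both results and both timings. The absolute numbers depend on the machine, but every socket lookup pays for system calls and message handling that the in-memory lookup never makes, which is the overhead being discussed.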

bch · on Oct 25, 2013

Why, oh why GPLv3 for a linkable library?

Request something more permissive (LGPL, MIT, BSD)?

kodablah · on Oct 25, 2013

Closed-source licenses are provided per request and the only restriction seems to be that you don't sell a DB system backed by the lib[1]. If you really want to prevent closed-source abuse of your library, a GPL-based dual licensing setup is the only way I can think of.

[1] http://whitedb.org/licence.html

bch · on Oct 25, 2013

Ah -- ok -- I got my info from the COPYING file in the git repo.

Regardless: "those which are distributed and marketed as database systems to be used by other developers" looks to be _full_ of wiggle room within what I suspect[1] is the authors' intent.

[1] "I/somebody will/might make this a higher-order database tool for developers and I don't want anyone to compete with anybody else without forcing everybody involved (with this code) to open their entire codebase" (??)

SandB0x · on Oct 25, 2013

Python bindings! Finally I can stop using dictionaries like a common peasant.

Tobu · on Oct 25, 2013

"very low overhead but maximum possible contention" → an accurate description of the GIL

parhamn · on Oct 25, 2013

Did you read this part?

> Data is read and written directly from/to shared memory

Good luck accessing your dictionaries from another python process....

pekk · on Oct 25, 2013

What is the advantage of this over using dictionaries?

parhamn · on Oct 25, 2013

Accessible across multiple python processes (shared memory).
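
To illustrate the general idea (not the WhiteDB Python bindings, whose API is not shown in this thread), here is a sketch using the standard library's `multiprocessing.shared_memory` module, available from Python 3.8. The helper names are hypothetical. For simplicity the second handle is opened in the same process; a separate process would attach by the same name in the same way.

```python
from multiprocessing import shared_memory


def publish(payload):
    """Create a named shared-memory segment and copy payload into it."""
    shm = shared_memory.SharedMemory(create=True, size=len(payload))
    shm.buf[:len(payload)] = payload
    return shm


def read_by_name(name, length):
    """Attach to an existing segment by name, as another process would."""
    peer = shared_memory.SharedMemory(name=name)
    try:
        return bytes(peer.buf[:length])
    finally:
        peer.close()  # detach only; the creator is responsible for unlink()


def demo():
    owner = publish(b"hello")
    try:
        return read_by_name(owner.name, 5)
    finally:
        owner.close()
        owner.unlink()  # remove the segment once nobody needs it
```

An ordinary dictionary lives on one interpreter's heap, so another process cannot reach it. A named segment like this one gives raw bytes only, which is where a library that lays out records and indexes inside the shared region becomes useful.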

duaneb · on Oct 25, 2013

Indexes, albeit T-tree instead of B-tree.

boyd · on Oct 25, 2013

Does anyone mind sharing what they see as the advantage of this vis-a-vis a Redis, LevelDB, Mongo, etc. (realizing those are all very different)? Is it principally read throughput, write throughput, interfacing, space efficiency, scalability to very large datasets, or something else?

I ask as friends and I are doing research into very space-efficient (read: probabilistic) key-value stores. We have a few scientific use cases, but I'm curious if others would find the ability to scale to very large key- and value spaces (~1e9+ key-value pairs) in a space efficient way practically useful. Or, if interest is principally academic.

Apologies for the (pseudo-)hijack.

Ps. Looks very interesting, I have it on my to-dos to install and check it out more deeply.

throwaway420 · on Oct 25, 2013

Is it accurate to describe this as a faster Redis that doesn't have all of the useful data structures like lists, sets, etc?

meowface · on Oct 25, 2013

Yes, somewhat.

It's also different from Redis because Redis is intended to be run as a server (it stands for Remote Dictionary Server). This is run entirely as a process and communicates via IPC; other machines can't reach the database, only the local machine. This is a big reason why it's very fast. However, it also means you can't distribute the database across multiple servers.

You could think of it like a very fast NoSQL Sqlite, I guess.

marshray · on Oct 26, 2013

Sounds like some specific collection type that one might build with Boost.Interprocess. http://www.boost.org/doc/libs/1_54_0/doc/html/interprocess/a...

stavros · on Oct 26, 2013

I keep wondering why there isn't a libredis you can just link into your program without needing to run a server and all that. It sounds very useful, but I'm not sure how hard it would be to develop.

meowface · on Oct 26, 2013

For the same reason you can't do that with MySQL or Postgres or any other database server. Redis is intended to be run as a server listening for TCP connections, and communicating over a TCP/IP network. If you're running it on localhost, the communication should generally be pretty fast, but you're right that overhead will still be incurred.

It would require adding a lot more code to allow for typical interprocess communication.

lttlrck · on Oct 25, 2013

Probably. Though it would be inaccurate to say it is in any way equivalent to Redis.

symisc_devel · on Oct 25, 2013

The project seems to be a fork of the more mature UnQLite project (http://unqlite.org) except that UnQLite supports pluggable run-time interchangeable storage engines (B+Tree, LH, R+Tree) and has support for on-disk databases as well as in-memory operations.

otterley · on Oct 25, 2013

How does performance compare to Kyoto Cabinet?

http://fallabs.com/kyotocabinet/

elacey · on Oct 25, 2013

Maji, nice work. Enabling graph like functionality with linking records is an interesting addition. Any idea what performance looks like going 2 or 3 levels deep with a cross-linked graph of say 100k vertices and 500k edges?

joshguthrie · on Oct 26, 2013

Ruby bindings are here: https://rubygems.org/gems/whitedb

What has been done so far:

* Create a database

* Create a record

* Set field to a string

If you feel like testing it:

    require "whitedb"

    db = WhiteDB::Database.new("foo", 20000)
    rec1 = db.create_record(5)
    rec2 = db.create_record(4)
    5.times{|x| rec1.set_field(x, "Rec1 #{x}") }
    4.times{|x| rec2.set_field(x, "Rec2 #{x}") }

Then use the wgdb binary (`wgdb foo select 20`) to see your data in your database.

I'm still new to ruby C extensions so the API is quite ugly (no blocks yet, no error checks,...) and I'm currently adding features as I'm following the tutorial.

ateeqs · on Oct 25, 2013

But why exactly is an in-memory NoSQL database necessary? This is a "non-sequitur." The fundamental purpose to using NoSQL databases is when your data spans multi-terabytes (where "multi" > 4) and you need replication and slicing. Main-memory NoSQL database seems unnecessary, doesn't it?

Also, when you say "main memory" NoSQL database, are you saying "never store in page file" but always reside/lock in main memory? If it will go to the page file, it's not really main-memory, is it?

aryastark · on Oct 25, 2013

Pretty much what I was trying to figure out. I can't see a use case here, since we already have things like Berkeley DB and LevelDB. But main memory? Your use case would have to be such that your entire dataset can fit in memory and for some reason you need extreme performance from that dataset, and from multiple processes running on a single machine. I just don't see it. Is the caching performance of other databases so horrible that this is really necessary?

jonny_eh · on Oct 25, 2013

Does it write to disk ever? I could just assume that it does because it's a DB, and would be nearly worthless if it didn't, but it keeps talking about being only in memory.

malkia · on Oct 25, 2013

It's using memory mapped files, so that's why I guess. Not sure if these are portable between architectures.

sigzero · on Oct 25, 2013

So "NoSQLite"? :)

n00j · on Oct 25, 2013

How would this compare with something like an in-memory SQLite database? I guess the big difference is SQL vs NoSQL but would be curious about the performance difference.

gwu78 · on Oct 25, 2013

Will it compile on BSD/Solaris?

a8da6b0c91d · on Oct 25, 2013

Am I missing something or is this just an associative map allocated in shared memory? Seems to be an awful lot of text for something so simple.

xfax · on Oct 25, 2013

I'm convinced that one of these days we're going to see a new "DB" which will be a simple hashmap with collision handling sacrificed for "speed and performance"

adamnemecek · on Oct 25, 2013

But it will be WEBSCALE!!1!

/s

n00j · on Oct 25, 2013

It is webscale...

"Locking We use a database level lock implemented via a task-fair atomic spinlock queue for concurrency control ..."

Fantastic, database level locking!

JulianMorrison · on Oct 25, 2013

If the task is read-heavy (example: shared cache), it's mostly harmless.

daven11 · on Oct 26, 2013

we've come a long way since the 70's...