Hacker News new | past | comments | ask | show | jobs | submit login
LevelDB: A Fast Persistent Key-Value Store (google-opensource.blogspot.com)
169 points by skanuj on July 27, 2011 | hide | past | favorite | 62 comments



An interesting development a while back that I'm surprised hasn't received more attention was Oracle's release of a SQLite-based interface to BDB:

http://www.oracle.com/technetwork/database/berkeleydb/overvi...

It's essentially drop-in compatible with SQLite, but with added concurrency and speed for most operations. (The concurrency addresses a major issue usually keeping SQLite as a prototyping/single-user-only option in web development.)

With LevelDB as a BSD-licensed alternative to BDB, I wonder:

(1) How would the LevelDB-vs-SQLite benchmarks change against SQLite+BDB backend?

(2) Could a SQLite fork with a LevelDB backend get a performance boost?


Thanks for the link! You could theoretically just compile this file against SQLite-based BDB: http://code.google.com/p/leveldb/source/browse/trunk/doc/ben... And get the numbers yourself. (If you do, please post them here.)


> SQLite-based interface to BDB

One thing I didn't get about SQL API for BDB, how does something like

    select * from users where name!='tom'
work ?


You really don't know or care that you're using BDB; it works (to the user) just like SQLite. (Behind the scenes, it's using BDB for the tables/indexes, and so would do various full- or partial- table scans much like SQLite's native on-disk format.)



Hi there! I'm a YC alum (reMail W09) and helped Jeff and Sanjay with LevelDB. Let me know if you have any questions about LevelDB and I'll see if I can help.


I would really like to use this from my Android Java application. Is this possible, and what would be the best way to accomplish this?


How does this compare to other persistent key-value stores such as Membase?


Membase is a clustered data storage service your application uses.

LevelDB is a persistence library.

That makes LevelDB the kind of thing you plug into membase to get the unique properties it has to offer (or at least for fun).


In fact, the Riak guys are planning on doing exactly that: offering LevelDB as one of the storage back-ends, perhaps even the default.

http://blog.basho.com/2011/07/01/Leveling-the-Field/


Any comparisons of performance or functionality against BerkeleyDB?


Does this system have transactions and ACID guarantees?


Sorry if this is a bit off topic but it seems to me like most of Google's opensource projects are more source available than open source. Do they actually take contributions from the community or are they all like android, source made available when its "ready for public consumption"?

LevelDB sounds like something I would like to contribute to but if the reception is going to be chilly I won't bother, maybe pick up mongo or redis instead.


The synchronous writes benchmark is interesting. This is normally bound by # seeks your disk can do per second, which is mostly a function of rotational speed. With 7200RPM drive you get 7200/60 = 120 of these a second. So the 100 and 110 numbers for competitors make sense. 2,400 for LevelDB does not.

Is LevelDB batching writes or is there something more interesting going on?


If you are writing sequentially, then you can write more than the number of seeks.

And that is exactly what LevelDB is doing: writing a log (sequential), and when the memorychunk is full, it is writing it to disk sorted (this is also sequential).


flushing the log in an LSM is only kinda sequential, sadly


Data structures which require a disk seek per random insert are obsolete. LevelDB is using a Log-Structured Merge Tree, one of many write-optimized data structures (but not the best).


This link, comparing LSM trees with fractal trees, is quite interesting: http://www.quora.com/What-are-the-major-differences-between-...


Is LevelDB batching writes

Yes, updates can be done in one atomic batch. Please correct me if I'm wrong, but I don't think Tokyo Cabinet allows it without Tokyo Tyrant.


If you write full disk blocks, wouldn't the disk cache hide the seek latency?


Having write disk cache on would certainly explain it. But that leaves the question of discrepancy with numbers with competitors.

You turn off write-through caching on disks when you run a database unless you are willing to accept corruption (which is worse than data loss) on power outage. And that's why you can't get acceptable write performance out of database without a battery-backed RAID controller (or something other kind of RAM-based write cache with a battery backup).

Here's a simple way to test # fsyncs/s (a.k.a. commit rate) on your system:

  sysbench --test=fileio --file-fsync-freq=1 --file-num=1 \
   --file-total-size=16384 --file-test-mode=rndwr run --max-time=10 \
   | grep "Requests/sec"


If cache is on, any performance discrepancy can be explained away by "usage patterns" :)

Also, do you really mean turn off write-through, or did you mean write-behind? (I can't see how write-through would cause corruption, but maybe I'm missing something...)

Also, I wouldn't be surprised if there's a discrepancy in the flushing code across systems. God knows flushing a file to disk in cross-platform code is an arcane science :)

And finally, as somebody else pointed out, LevelDB seems to order write access sequentially as much as possible.


Bindings for Node.js if anyone is interested: https://github.com/creationix/node-leveldb


From the announcement: it has already been ported to a variety of Unix based systems, Mac OS X, Windows, and Android.

It's worth noting that the makefile includes options to build for iOS. I've successfully done it and my next iOS app will include LevelDB. Also worth noting, thanks to the iOS devices SSDs, it's much faster than with the traditional HDD machines.


Upcoming versions of the Chrome browser include an implementation of the IndexedDB HTML5 API that is built on top of LevelDB

Really excited about seeing IndexedDB run atop of this



Yes, we put up the Google Code site incognito mode back then, but have since added a number of bugfixes and optimizations, so we're actually comfortable announcing the project now.


Interesting how, like in the open-sourced protobuf, there are no commits by Jeff or Sanjay...


dgrogan and myself have been batching changes to LevelDB from our internal code repository to put them on the Google Code page. Playing Google Code site admin didn't seem to me like a good use of Jeff and Sanjay's time.


Yep, its great you guys could separate it from internal dependencies! Congrats.


Jeff and Sanjay wrote the original protocol buffer implementation. The project was taken over by Kenton Varda, who rewrote the C++ and Java parts; this is what was open sourced. See http://temporal.fateofio.org/files/resume


Which is what I point as being interesting.


How is this different than BDB?


BDB is a key\value store for unordered data more similar to Tokyo Cabinet hash databases. Tokyo Cabinet hash databases are a much faster option than BDB if you only need unordered data.

LevelDB is for if you need ordered data, and a more appropriate comparison would be against a B+\tree database.


LevelDB is for if you need ordered data

LevelDB is slower with random reads, but that doesn't mean you shouldn't use it for unordered data - it's still quite fast.


>LevelDB is slower with random reads, but that doesn't mean you shouldn't use it for unordered data - it's still quite fast.

In a positive analysis (should rather than shouldn't), assuming no default choice, it seems rational to use Tokyo Cabinet or CDB _hashmaps_ for unordered data, and LevelDB for ordered data, from a datastructure and performance standpoint. To assert more would probably need a specific use case for context.


In a positive analysis (should rather than shouldn't), assuming no default choice, it seems rational to use Tokyo Cabinet or CDB _hashmaps_ for unordered data, and LevelDB for ordered data, from a datastructure and performance standpoint.

It's as rational as doing optimizations. If the specific performance is extremely critical, yes, it definitely makes sense. But LevelDB does well with random reads [1]. With all its features and its permissive license, I think it's a strong contender as developer's go-to embedded key-value db, like SQLite for relational data.

Don't get me wrong, I do think your comparison is valuable, but I'm afraid it could be misleading; I specially found the wording LevelDB is for if you need ordered data misleading. Someone could read it and assume that LevelDB doesn't do well with unordered data.

[1] Compare today's benchmarks with these http://fallabs.com/tokyocabinet/benchmark.pdf, it looks like random reads in LevelDB are quite faster than BDB.


> and its permissive license

Other variables such as licensing are context dependent. In my context, I generally use embedded databases in shared library form to store arbitrary serialized Lua (much faster to reload than JSON, no need to compile protocol buffers) on my own servers, so in my development context LGPL vs BSD is irrelevant. But perhaps not for an iPhone developer. TokyoCabinet already has excellent bindings for nearly every language I've used, but thats irrelevant to a developer whose application is also C.

> Compare today's benchmarks with these http://fallabs.com/tokyocabinet/benchmark.pdf, it looks like random reads in LevelDB are quite faster than BDB.

I'd bet BDB is a snail, BUT the benchmark you just linked tests 0 databases in common with the one released by Google. What can be observed is that using a hashmap (TC) was over 7X faster than randomly accessing the fastest ordered data structure by the same author (TC-BT-RND) :)


you don't need to pay anyone to use it in your commercial software.


I suppose if you are shipping proprietary binaries, then yes. But otherwise it's effectively GPL'ed.


Exactly. BDB is GPL with an (non-publicly priced) commercial license to embed in apps, and LevelDB is BSD. With LevelDb, one of the most practical use case of a serverless db - embedding it in a client-side application - isn't crippled by it's own license.


it would be cool to make a leveldb backed fork of redis


Pardon my ignorance but what's backing redis currently?


Redis. :) But more importantly, it runs as a server. I think what @jcapote meant is being able to use Redis operations without a server, like Leveldb or sqlite. I would love to see that. There's already some effort towards that direction [1], but using a google backed project instead of building a full library from the ground up could be a saner approach.

[1] https://github.com/seppo0010/redislite


I love the insight about how fast compression (Snappy) is like having faster hard drives.



Your point?


That leveldb has been discussed several times on HN in the last two months. I just didn't break out the links from the search UI.

Downvoters: links to previous context are generally considered a good thing here.


Oh, then its easier to understand that in this query: http://www.hnsearch.com/search#request/all&q=leveldb&...


Yeah, the search results when I linked them had those results. The index was subsequently updated, so the top hits are now this thread, and this article. Oops.


Anyone know how LevelDB compares to Voldemort? From a cursory glance, they are identical in their simple API (get, put, delete)


Voldemort developer here--

Voldemort to LevelDB is what MySQL is to InnoDB: Voldemort is a distributed system that allows multiple engines to be plugged in. Mostly commonly, companies use either BerkeleyDB or MySQL as a storage engine. LinkedIn, Mendeley, EBay and others also use the read only storage engine, where the data is pre-built in Hadoop and loaded into Voldemort.

I am really excited about LevelDB: while there are higher priority projects on my plate right now, we'd very much like to see a LevelDB storage engine. If anyone is interested in contributing one, they're welcome.

The steps are:

1) Creating JNI bindings to LevelDB (or creating a .so version of LevelDB and creating JNA bindings)

2) Implementing the StorageEngine interface with the bindings, including passing in any configuration.

Here is an example of a third party InnoDB/Haildb storage engine for Voldemort:

https://github.com/sunnygleason/v-storage-haildb


My understanding is that Voldemort is a distributed key-value storage system, while LevelDB is a local on-disk key-value storage system.


Note to myself : Search before you post. Apologies, I checked new and front page only!


You were right to post it -- it's a new blog post announcing the non-beta release and benchmarks.


How well does LevelDB work for a mobile device? This might be a nice use case.


LSM-Tree is good!


So that's what Jeff does!


how is this different from membase?


Uhm shouldn't that be ovious by reading the high level descriptions of each? They are for completely different use cases. Membase is a distributed Key/Value server and LevelDB is a Key/Value library.


I see the difference between a server and a library, but both can often be used for the same use case. Just recently I evaluated a few data stores for a project and I didn't care all that much about the distinction. For the servers you're gonna use an API for your programming language anyway, so the programming model isn't that different.


About like the difference between mysql and sqlite.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: