Other people have had trouble wringing competitive performance out of Bw-Trees despite heroic optimization efforts [0]. Why is this implementation going to beat other index structures with just a bit of tuning?
It might not. But the critiques of bw trees in terms of performance that I've seen have not had compelling data in terms of things that matter outside of academia or benchmarking shootouts, like write or space amplification. The bw tree is a cheap thing to abandon after I implement a persistent ART and measure it though. The bwtree is only like 1k of rust on top of the modular pagecache, which is the real heart of the system.
That paper is pretty good but it's comparing bw-tree with much simpler in memory data structures. I think bw-tress might work specially well for fast-disk storage.
This was my interpretation as well. I'm going to compare a disk-backed bwtree with a disk-backed ART, both backed by the same pagecache, and maybe end up with an ART that scatters partial pages on disk, bwtree style. But I need to measure apples to apples on the metrics that matter for storage first. The pagecache is where most of the complexity is in my implementation, and it makes building different kinds of persistent structures on top of it pretty easy. docs.rs/pagecache
Yeah, I'm curious about using sled as a more ssd friendly storage engine for mentat. I'm just starting to experiment with datalog implementations, but I think by having harmony between the storage engine, query language, and hardware properties we can make a really compelling stateful systems. If this is something that interests you, I'd love to work with more people on this.
It means pay more attention to reliability than pop infrastructure and internet companies (who can offset poor reliability with human attention or intentionally deprioritize it to sell more support contacts) tend to put into these things. Specifically, exhaustive concurrency testing of lock-free algorithm interleavings via ptrace driven scheduling, model-based testing in combination with fault injection, ALICE-style file correctness testing, and for the various distributed modules that sit on top, network simulation combined with lineage driven fault injection. This is all very much a work in progress, and I'd love to work with more people on it!
How well does that handle the storage disappearing halfway through a write? How well does it handle power being cut halfway through an update of some sort? How well does it handle some of the written blocks actually making it to the disc and others not? How about if the ones that made it were not the first or the last it issued to be written?
(For predictable answers to these, and many other complex questions, when dealing with data you care about, use sqlite)
ALICE showed that's not always true with sqlite. Sled is being built with an extreme bias toward reliability over features, but as the readme says, it has some time to go before reaching maturity. The tests are quite good at finding new issues and deterministically replaying them, so you can help bake it in by mining bugs using the default test suite and help it get there.
Most database designers assume that a power failure will only affect writes that are pending. Alas, for SSDs and NVMEs that's not always true. A power failure can cause all kinds of corruption.
Long story short: even append-only strategies will not save you.
Indeed. This is why I aggressively checksum everything and pay particular attention to throwing away all data that was written after any detected corruption during recovery. This is easier with the log-only architecture. It's also totally os and filesystem agnostic. I was happily surprised yesterday when it passed tests on fuchsia :]
Users can rely on sequential recovery. At some point I'll probably write a partial recovery tool that gives you all versions of all keys that are at all present anywhere in the readable file though, which won't be much work. Typical best practices encourage moving away from single disk reliance for particularly valuable data, but this library will also work on phones etc... So it is important to support people when a wide variety of things go wrong.
You could work around it (erasure coding, fe),
if you had insights on the failure mode specifics, but it's vendor specific and vendors are not exactly forthcoming.
So the only thing you can do is add checksumming schemes that allow you to detect you have been affected.
The paper is from 2013, so the situation might have improved meanwhile (I wouldn't put any money on it)
>How well does that handle the storage disappearing halfway through a write? How well does it handle power being cut halfway through an update of some sort?
Or the drive catching fire and the user manually deleting all existing backups.
I see that MVCC is a planned feature. Why would you need MVCC for an embedded database? It seems like unnecessary overhead that conflicts with the performance goals.
It's not needed for the single-key atomic record store, which is the sled bwtree index that is the current highest level module. MVCC is implemented in most popular embedded DBs because it is an effective way to manage mixed workloads that seek to read snapshots of the entire database at a single point in time as well as not blocking writes as this happens. This functionality is desirable for transactions that support mixed workloads. That's why I'm building it for a higher level module. This is a collection of modules that let you choose the abstraction and associated complexity that you want. There's also a modular pagecache that is totally decoupled and reusable for your own database experiments.
I think they mean "embedded" as in "embedded in a program", not "targeting embedded hardware". In particular the README states they intend to use it in https://github.com/spacejam/rasputin. This is a use case similar to LMDB, for instance, which has MVCC.
Some are (more or less) free, others are not. Rust simply gives you some great tools to make the choices yourself. The biggest win tends to be that you can get away without a GC, but still keep the guarantee that you're not leaking memory by forgetting to collect it.
To be fair, I think the way GP worded it might still be accurate; they said that you wouldn't leak by forgetting to collect it. I took that as a statement about not needing to manually free things than an assertion that Rust guarantees no memory leaks (which can still occur from things like reference cycles).
I too try (and sometimes fail) to be careful when talking about this. People get all hot and bothered when you go overboard :P It can be tempting to start explaining what an afine type system is, but that can hurt more than help.
> Unsafe Rust exists because, by nature, static analysis is conservative. When the compiler is trying to determine if code upholds the guarantees or not, it’s better for it to reject some programs that are valid than accept some programs that are invalid. That inevitably means there are some times when your code might be okay, but Rust thinks it’s not! In these cases, you can use unsafe code to tell the compiler, “trust me, I know what I’m doing.” The downside is that you’re on your own; if you get unsafe code wrong, problems due to memory unsafety, like null pointer dereferencing, can occur.
> There’s another reason Rust has an unsafe alter ego: the underlying hardware of computers is inherently not safe. If Rust didn’t let you do unsafe operations, there would be some tasks that you simply could not do. Rust needs to allow you to do low-level systems programming like directly interacting with your operating system, or even writing your own operating system! That’s one of the goals of the language. Let’s see what you can do with unsafe Rust, and how to do it.
As the person who wrote those paragraphs, yes. The fundamental reason unsafe exists is not for performance reasons. That being said, the parent is also true that sometimes, unsafe is used to enhance performance. But that's not the usual case, and in fact, can even hurt performance in ways. For example, &mut T can't alias, but *mut T can, and so &mut T can be optimized more aggressively. (We recently turned these optimizations back on: https://github.com/rust-lang/rust/pull/50744 )
You're argument doesn't really make sense. Not all abstractions are workarounds for "unsafe" things. For example, unsafe is needed for dealing with the OS, no amount of abstraction will help you there. At some point you must talk to and accept memory that another process (OS) tells you is correct. I'm not so knowledgeable in Rust so I can't say definitely, but I think you should back up your statement with some (any) examples.
Your parent didn't say that. Speaking about Rust in a thread about a Rust project seems to be very on-topic?
(I would also disagree with that statement, but in a different way: Rust enables zero-cost abstractions, but nothing inherently means they have to be. I can also make quite costly ones, and they'll compile.)
What is results disturbing is that you are arguing just for the sake of arguing. "Algorithm" is a too wide term - everything is an algorithm, especially in programming. Talk was about abstractions, now you're trying to say it was about algorithms. We need "ignore" button here.
I would answer "you can write trash in any language, in Rust you need to find ways how to do it, when in other languages - how to avoid it".
Looks like ycombinator now is the place when people just looking for negativity and what else ti disagree with.
I'm not sure about ARMv5, but v7 and v8-A are Tier 1 for Firefox, so we get a pretty decent smoke test out of them. The thing about embedded is that it's quite diverse, so speaking about it in broad terms is tough. It goes from "lol nope" to "barely works" to "pretty decent" to "great", depending on which thing you're talking about.
We have a whole working group this year working on embedded.
It's been a focus of the team, and been getting better. With efforts being made to move some external tools into the mainline toolchains. Last I checked it was possible, but still a bit cumbersome.
P.S. Really looking forward to writing Rust on AVRs.
I'll also take any project that has thousands (if not millions?) of man-hours together with an extremely solid test suite. That doesn't mean other projects aren't worth exploring or won't be useful.
Then again embedded databases are a great building block for things like etcd. It also might even make sense to have something like this in process because that would remove quite a few failure modes coming from the client connection and simplify deployment
Currently it's even more basic. The current usable parts are a pagecache following the llama approach, some great testing utility libraries, and an index (that you can use as a kv) that follows the bwtree approach. Later it will have structured access support, but it needs some more db components to get there. It is a construction kit as well as a kv.
Not really, when the embedded database is allowed to run its own threads and network connections embedding something like etcd makes perfect sense. It removes the failure modes of the client connection and simplifies deployment.
This is honestly a use case I'm experimenting with using a mix of CRDTs and OT. Our systems are becoming more and more location agnostic and I don't feel that our current data infrastructure is adequate to serve the workloads we're going to be facing as compute migrates to the edge.
It's modular, and there is a paxos implementation, but it has been built totally in simulation so far and I haven't plugged it into an io layer yet. But this is trivial. That said, sled will always be a bwtree index, and the other modular crates will stand on their own.
I am a total devotee to their approach to building simulable systems, although I seek to push it even farther and integrate lineage driven fault injection from an early stage. I see a lot of cool things in what they have done, technically. Sled is free from day one.
[0] https://news.ycombinator.com/item?id=17041616