
Concurrency and multithreading are a major focus of both Go and RocksDB. This introduction makes little mention of those areas, and I'm curious if there's any more to be said on this. The article lists several features being reimplemented, including:

> Basic operations: Set, Get, Merge, Delete, Single Delete, Range Delete

It makes no mention of RocksDB's MultiGet/MultiRead -- is CockroachDB/Pebble limited to one query at a time per thread? I'm genuinely curious how this all translates into Go's M:N coroutine model, both currently and moving forward with Pebble.




Pebble does not currently implement MultiGet, because CockroachDB never used RocksDB's MultiGet operation. CockroachDB can use multiple nodes to process a query by decomposing SQL queries along data boundaries and shipping the query parts to be executed next to the data. CockroachDB can't directly use MultiGet because that API is not compatible with how CockroachDB reads keys.

RocksDB MultiGet is interesting. Parallelism is achieved by using new IO interfaces (io_uring), not by using threads. That approach seems right to me. See https://github.com/facebook/rocksdb/wiki/MultiGet-Performanc.... My understanding is that io_uring support is still a work in progress. We experimented at one point with using goroutines in Pebble to parallelize lookups, but doing so was strictly worse for performance. Experimenting with io_uring is something we'd like to do.
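For anyone wondering what "using goroutines to parallelize lookups" might look like in the abstract, here is a minimal sketch. It is not Pebble's code or API: `Getter`, `mapStore`, and `ParallelGet` are hypothetical names, and the store is a toy in-memory map. It only shows the fan-out shape (one goroutine per key, results gathered in key order) and says nothing about performance.

```go
package main

import (
	"fmt"
	"sync"
)

// Getter is a hypothetical single-key read interface; it is not Pebble's API.
type Getter interface {
	Get(key string) (string, bool)
}

// mapStore is a toy in-memory stand-in for a real storage engine.
type mapStore map[string]string

func (m mapStore) Get(key string) (string, bool) {
	v, ok := m[key]
	return v, ok
}

// ParallelGet starts one goroutine per key and returns results in key order.
// A missing key yields an empty string and found[i] == false.
func ParallelGet(g Getter, keys []string) (vals []string, found []bool) {
	vals = make([]string, len(keys))
	found = make([]bool, len(keys))
	var wg sync.WaitGroup
	for i, k := range keys {
		wg.Add(1)
		go func(i int, k string) {
			defer wg.Done()
			// Each goroutine writes only its own slot, so no lock is needed.
			vals[i], found[i] = g.Get(k)
		}(i, k)
	}
	wg.Wait()
	return vals, found
}

func main() {
	s := mapStore{"a": "1", "b": "2"}
	vals, found := ParallelGet(s, []string{"a", "missing", "b"})
	fmt.Println(vals, found)
}
```

With a real engine, each goroutine adds scheduling and synchronization overhead on top of the lookup itself, which is one plausible reason a fan-out like this can lose to a plain serial loop.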


Indeed the conceptual fork point mentioned is RocksDB 6.2.1 which came before those features. The problem with RocksDB is that one thread only makes one request at a time. I should've phrased my question more succinctly: Is Pebble/CockroachDB capable of saturating the backplane with requests in parallel? Does it multiplex a single query by dispatching smaller requests to a thread-pool?


> Is Pebble/CockroachDB capable of saturating the backplane with requests in parallel?

Yes.

> Does it multiplex a single query by dispatching smaller requests to a thread-pool?

Yes, though it depends on the query. Trivial queries (i.e., single-row lookups) are executed serially, as that is the fastest way to execute them. Complex queries are decomposed along data boundaries, and the query parts are executed in parallel next to where the data is located.



