I find the paper a bit confusing in some parts. Hive is itself a cluster that embeds other clusters.
More than once the paper simply mentions "cluster" or "node" without saying explicitly which cluster it is talking about. Maybe I didn't read it properly, but I find it hard to understand.
Redis is a featurful key-value store. Values can be hash maps, streams, sets, etc. Many people do not _use_ these features, and their mental model of Redis is as an in-memory cache - that's fine, but we're looking at this from the perspective of an _implementer_, not a _user_. So even if no one uses streams, if we decide to provide them, we own that complexity. Hive is declaring its intention not to be as featurful a key-value store as Redis.
None of those other examples are key-value stores. Mongo is the most similar, but really has more in common with MySQL and Postgres than Redis/Hive. Key-value stores view their values as opaque, modulo things like providing embedded hash maps. Queries in key-value stores are pretty simple, and are focused on the _keys_. E.g., iterate over these keys in sorted order and give me the values.
All of these other databases provide advanced querying capabilities by having type systems and the ability to introspect their objects and make decisions based on their values.
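To make the "queries are about the keys" point concrete, here is a minimal sketch of a toy ordered key-value store. `TinyKV`, its methods, and the key names are all made up for illustration; this is not how Redis or Hive is implemented, only the general shape of a key-range scan over opaque values.

```python
import bisect


class TinyKV:
    """Toy ordered key-value store; values are opaque bytes (hypothetical)."""

    def __init__(self):
        self._keys = []  # keys kept in sorted order
        self._data = {}  # key -> opaque value, never inspected by the store

    def put(self, key: str, value: bytes) -> None:
        if key not in self._data:
            bisect.insort(self._keys, key)
        self._data[key] = value

    def scan(self, start: str, end: str):
        """Yield (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        for key in self._keys[lo:hi]:
            yield key, self._data[key]


kv = TinyKV()
kv.put("user:2", b'{"age": 41}')
kv.put("user:1", b'{"age": 29}')
kv.put("order:7", b"\x00\x01")

# ";" is the ASCII character right after ":", so this range covers every "user:" key.
user_rows = list(kv.scan("user:", "user;"))
```

The store never parses the bytes it holds. Something like "users older than 30" would have to be done by the caller after the scan, which is exactly the kind of value-aware query the other databases build in.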
Redis is more of a datastore than a database. Sure it will persist data if you coerce it forcefully to, but works best as a transient store, with data residing somewhere else.
Not really? This is exactly what the comment before you is calling out.
Using the word "forcefully" to mean "configure it to" is weird, and it just sets the tone of the discussion without adding anything meaningful to it.
> but works best as a transient store
"best". It's odd to see such a strong determination of what a tool should be used for. It works fine as a "transient store", and it also works fine as a database.
you're missing the point. it's not the capability to handle data. it's the pain of backing up a rotating dump file and a full datalog. one could try to back up using a replica, but the rdb gets sent over the wire anyway, and that's a load of bandwidth and disk and everything being thrown at the problem. it's madness.
and that's before considering partitioning. good luck synchronizing snapshots of each shard at the same point in time - your restore will have all kinds of inconsistencies for cross-referenced data.
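A rough sketch of the shard-snapshot problem, using plain dicts as stand-ins for two shards. The shard names, keys, and timing are invented for illustration; real snapshots are files on disk, but the ordering issue is the same.

```python
import copy

# Two hypothetical shards: orders on shard_b reference users on shard_a.
shard_a = {"user:1": {"name": "ann"}}
shard_b = {}

# t0: shard A is snapshotted first.
snap_a = copy.deepcopy(shard_a)

# t1: a write touching both shards lands between the two snapshots.
shard_a["user:2"] = {"name": "bob"}
shard_b["order:9"] = {"user": "user:2"}

# t2: shard B is snapshotted later.
snap_b = copy.deepcopy(shard_b)

# After restoring both snapshots, look for orders whose user is missing.
dangling = [key for key, order in snap_b.items() if order["user"] not in snap_a]
```

Here `dangling` ends up holding `order:9`: the restored order points at a user that only exists in the live shard, not in shard A's older snapshot.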
Hey, I apologize if there's been any confusion. I have nothing to do with this article; I was just trying to point out a distinction I thought was being missed.
From what I understood, the "v2" of Scaleway Object Storage was based on OpenIO (which has since been bought by one of their competitors, OVH). So they abandoned that tech and built a new one that could scale a lot more, and this is Hive.