Hive: A Globally-Distributed Key/Value Store [pdf] (scw.cloud)
82 points by wut42 on Jan 28, 2022 | 22 comments


"Hive" seems to be an avoidable collision with Apache Hive.


And the Flutter/Dart key-value store!


And portions of the Windows registry.


I find parts of the paper a bit confusing. Hive is itself a cluster that embeds clusters, and more than once the paper simply mentions a cluster or node without saying explicitly which cluster it is talking about. Maybe I didn't read it properly, but I find it hard to understand.


Odd to refer to Redis as a general-purpose database:

"Hive is not meant to become a general-purpose database, like Redis"

I assume that is an error and they meant to say MySQL (or similar)? Or I'm seriously misunderstanding something.


Redis _is_ a general-purpose database. So is MySQL. They just accept and store data in different ways.


I guess the thing is that it's often not used or viewed as a general-purpose DB, but rather as just an in-memory cache.

There are better examples they could've used in this context, like MySQL (as mentioned), Postgres, Mongo, etc.


Redis is a featureful key-value store. Values can be hash maps, streams, sets, etc. Many people do not _use_ these features, and their mental model of Redis is as an in-memory cache - that's fine, but we're looking at this from the perspective of an _implementer_, not a _user_. So regardless of whether anyone uses streams, if we decide to provide them, we own that complexity. Hive is declaring their intention not to be as featureful a key-value store as Redis.

None of those other examples are key-value stores. Mongo is the most similar, but really has more in common with MySQL and Postgres than Redis/Hive. Key-value stores view their values as opaque, modulo things like providing embedded hash maps. Queries in key-value stores are pretty simple, and are focused on the _keys_. E.g., iterate over these keys in sorted order and give me the values.

All of these other databases provide advanced querying capabilities by having type systems and the ability to introspect their objects and make decisions based on their values.
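
To make the "queries are focused on the keys" point concrete, here is a minimal sketch in Python. It is a toy in-memory store, not Hive's or Redis's actual API; the `TinyKV` class and its `put`/`get`/`scan` methods are names made up for illustration. The store never inspects the values, it only orders and ranges over keys.

```python
import bisect


class TinyKV:
    """Toy ordered key-value store (hypothetical); values are opaque bytes."""

    def __init__(self):
        self._keys = []  # keys kept in sorted order
        self._data = {}  # key -> opaque value, never inspected by the store

    def put(self, key: str, value: bytes) -> None:
        if key not in self._data:
            bisect.insort(self._keys, key)
        self._data[key] = value

    def get(self, key: str):
        return self._data.get(key)

    def scan(self, start: str, end: str):
        """Yield (key, value) pairs for start <= key < end, in sorted key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        for key in self._keys[lo:hi]:
            yield key, self._data[key]


store = TinyKV()
store.put("user:2", b'{"name": "bob"}')
store.put("user:1", b'{"name": "alice"}')
store.put("order:9", b"\x00\x01")

# Range scan over the "user:" prefix (";" is the character right after ":").
users = list(store.scan("user:", "user;"))
```

Asking "which users are named alice" is not something this store can answer without the client decoding every value, which is the kind of query the other databases mentioned are built for.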


Redis is more of a datastore than a database. Sure, it will persist data if you forcefully coerce it to, but works best as a transient store, with data residing somewhere else.


How do you differentiate between a datastore and database?


Not really? This is exactly what the comment before you is calling out.

Using the word "forcefully" to mean "configure it to" is weird, and it just sets the tone of the discussion without adding anything meaningful to it.

> but works best as a transient store

"best". It's odd to see such a strong determination of what a tool should be used for. It works fine as a "transient store", and it also works fine as a database.


sure, as you can use a monkey wrench for all your nuts. it's not going to be as pleasant, and it's going to ruin the nuts eventually.


If you say so. This way of thinking about Redis holds so many organizations back.

Seriously, go try it again as a database.


you're missing the point. it's not the capability to handle data. it's the pain of backing up a rotating dump file and a full datalog. one could try to back up using a replica, but the rdb gets sent over the wire anyway, and that's a load of bandwidth and disk and everything being thrown at the problem. it's madness.

and that's before considering partitioning. good luck synchronizing snapshots of each shard at the same point in time - your restore will have all kinds of inconsistencies for cross-referenced data.
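
The cross-shard inconsistency described above can be sketched with two plain Python dicts standing in for shards. This is a toy model under stated assumptions, not Redis behavior: the shard names, the `snapshot` helper and the key layout are all hypothetical.

```python
# Two hypothetical shards: one holds orders, the other the items they reference.
items = {}
orders = {}


def snapshot(shard):
    """Copy a shard's contents at the moment of the call."""
    return dict(shard)


# t0: the items shard is snapshotted first.
items_snap = snapshot(items)

# t1: a logical write touching both shards lands between the two snapshots.
items["item:1"] = "widget"
orders["order:1"] = "item:1"

# t2: the orders shard is snapshotted later.
orders_snap = snapshot(orders)

# After restoring both snapshots, order:1 points at an item that does not exist.
dangling = [order for order, ref in orders_snap.items() if ref not in items_snap]
```

Avoiding this requires either taking all shard snapshots at one coordinated point in time or keeping cross-referenced keys on the same shard.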


Got it. You wanted an example for a key-value store database.

The article is yours, so who am I to opine on which example to use...

This is just noise...


Hey, I apologize if there's been any confusion. I have nothing to do with this article; I was just trying to point out a distinction I thought was being missed.


From the paper it looks like this backs the Scaleway S3 compatible object storage.

One of my buckets currently has a warning that I have more than 500k objects and that isn't supported, so I'd take this with a grain of salt.


If you have a bucket with this warning, it isn't using Hive yet but the old backend that had scalability issues… which is why Hive is replacing it!


It's not using Hive ("v3") yet then.

From what I understood, the "v2" of Scaleway Object Storage was based on OpenIO (which has been bought by one of their competitors, OVH). So they abandoned the tech and built a new one that could scale a lot more, and this is Hive.

The v1 of their Object Storage was using Riak S2.


All new buckets are using Hive as their backend.


Yes, I was replying to OP saying that "One of my buckets currently […]".

I'm late in replying, but since you are with Scaleway Object Storage, am I right about the timeline of its versions?


Will you be open-sourcing Hive?



