Hive: A Globally-Distributed Key/Value Store [pdf]

djoldman · on Jan 28, 2022

"Hive" seems to be an avoidable collision with apache hive.

xdfgh1112 · on Jan 28, 2022

And the flutter/dart key value store!

johntb86 · on Jan 28, 2022

And portions of the Windows registry.

devwa · on Jan 28, 2022

I find the paper a bit confusing in places. Hive is itself a cluster that embeds other clusters, and more than once the paper simply mentions a cluster or node without saying explicitly which cluster it is talking about. Maybe I didn't read it properly, but I find it hard to understand.

floathub · on Jan 28, 2022

Odd to refer to Redis as a general-purpose database:

"Hive is not meant to become a general-purpose database, like Redis"

Assume that is an error and they meant to say MySQL (or similar)? Or I'm seriously misunderstanding something.

bityard · on Jan 28, 2022

Redis _is_ a general-purpose database. So is MySQL. They just accept and store data in different ways.

rmbyrro · on Jan 28, 2022

I guess the thing is that it's often not actually used or viewed as a general-purpose DB, but rather as just an in-memory cache.

There are better examples they could've used in this context, like MySQL (as mentioned), Postgres, Mongo, etc.

maxbond · on Jan 28, 2022

Redis is a featureful key-value store. Values can be hash maps, streams, sets, etc. Many people do not _use_ these features, and their mental model of Redis is as an in-memory cache - that's fine, but we're looking at this from the perspective of an _implementer_, not a _user_. So regardless of whether anyone uses streams, if we decide to provide them, we own that complexity. Hive is declaring their intention not to be as featureful a key-value store as Redis.

None of those other examples are key-value stores. Mongo is the most similar, but really has more in common with MySQL and Postgres than Redis/Hive. Key-value stores view their values as opaque, modulo things like providing embedded hash maps. Queries in key-value stores are pretty simple, and are focused on the _keys_. E.g., iterate over these keys in sorted order and give me the values.

All of these other databases provide advanced querying capabilities by having type systems and the ability to introspect their objects and make decisions based on their values.
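
To make the "queries are focused on the keys" point concrete, here is a minimal sketch of an ordered key-value store in Python. It is a toy under stated assumptions, not how Hive or Redis is implemented: the `SortedKV` class, its methods, and the `user:` key prefix are all hypothetical names invented for illustration. The store never looks inside a value; the only query beyond `get` is a range scan over sorted keys.

```python
import bisect


class SortedKV:
    """Toy ordered key-value store (hypothetical): values are opaque, queries look only at keys."""

    def __init__(self):
        self._keys = []  # keys kept in sorted order
        self._data = {}  # key -> opaque value, never inspected by the store

    def put(self, key, value):
        if key not in self._data:
            bisect.insort(self._keys, key)  # keep the key list sorted on insert
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def scan(self, start, end):
        """Yield (key, value) pairs with start <= key < end, in sorted key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        for key in self._keys[lo:hi]:
            yield key, self._data[key]


store = SortedKV()
store.put("user:2", b"\x01\x02")
store.put("user:1", b"opaque")
store.put("video:9", b"...")

# ";" is the character right after ":" in ASCII, so this range covers every "user:" key.
result = list(store.scan("user:", "user;"))
```

Anything like "find users whose value contains X" would require the store to understand the value's structure, which is the line between this kind of store and the databases mentioned above.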

avereveard · on Jan 28, 2022

Redis is more of a datastore than a database. Sure, it will persist data if you coerce it forcefully to, but works best as a transient store, with data residing somewhere else.

avinassh · on Jan 29, 2022

How do you differentiate between a datastore and database?

aaomidi · on Jan 28, 2022

Not really? This is exactly what the comment before you is calling out.

Using the word "forcefully" to mean "configure it to", is weird, and is just setting the tone of the discussion without adding anything meaningful to it.

> but works best as a transient store

"best". It's odd to see such a strong determination of what a tool should be used for. It works fine as a "transient store", and it also works fine as a database.

avereveard · on Jan 28, 2022

sure, as you can use a monkey wrench for all your nuts. it's not going to be as pleasant, and it's going to ruin the nuts eventually.

aaomidi · on Jan 28, 2022

If you say so. This way of thinking about redis holds so many organizations back.

Seriously, go try it again as a database.

avereveard · on Jan 28, 2022

you're missing the point. it's not the capability to handle data. it's the pain of backing up a rotating dump file and a full datalog. one could try to back up using a replica, but the rdb gets sent over the wire anyway, and that's a load of bandwidth and disk and everything being thrown at the problem. it's madness.

and that's before considering partitioning. good luck synchronizing snapshots of each shard at the same point in time - your restore will have all kinds of inconsistencies for cross-referenced data.
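
The cross-shard snapshot problem described above can be sketched with two plain Python dicts standing in for shards. This is only an illustration under assumptions: the shard names, the `create_user_with_order` helper, and the user/order data model are hypothetical, and nothing here talks to Redis. It shows what happens when a write touching both shards lands between two snapshots taken at different times.

```python
import copy

shard_users = {}   # hypothetical shard 1: user records
shard_orders = {}  # hypothetical shard 2: orders that reference user ids


def create_user_with_order(user_id, order_id):
    """Write to both shards; nothing makes the pair atomic across shards."""
    shard_users[user_id] = {"name": f"user-{user_id}"}
    shard_orders[order_id] = {"user_id": user_id}


create_user_with_order(1, 100)

# Snapshot shard 1 first...
snap_users = copy.deepcopy(shard_users)

# ...a write touching both shards lands between the two snapshots...
create_user_with_order(2, 200)

# ...then snapshot shard 2, slightly later.
snap_orders = copy.deepcopy(shard_orders)

# Restoring from these snapshots yields an order whose user does not exist.
dangling = [
    order_id
    for order_id, order in snap_orders.items()
    if order["user_id"] not in snap_users
]
```

After a restore from `snap_users` and `snap_orders`, order 200 points at user 2, who is missing from the restored user shard.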

rmbyrro · on Jan 28, 2022

Got it. You wanted an example for a key-value store database.

The article is yours, so who am I to opine on which example to use...

This is just noise...

maxbond · on Jan 28, 2022

Hey, I apologize if there's been any confusion. I have nothing to do with this article; I was just trying to point out a distinction I thought was being missed.

mike_d · on Jan 29, 2022

From the paper it looks like this backs the Scaleway S3 compatible object storage.

One of my buckets currently has a warning that I have more than 500k objects and that isn't supported, so I'd take this with a grain of salt.

Xenthys · on Jan 29, 2022

If you have a bucket with this warning, it isn't using Hive yet but the old backend that had scalability issues… which is why Hive is replacing it!

wut42 · on Jan 31, 2022

It's not using Hive ("v3") yet then.

From what I understood, the "v2" of Scaleway Object Storage was based on OpenIO (which has been bought by one of their competitors, OVH). So they abandoned the tech and built a new one that could scale a lot more, and this is Hive.

The v1 of their Object Storage was using Riak S2.

angristan · on Jan 31, 2022

All new buckets are using Hive as their backend.

wut42 · on Feb 7, 2022

Yes, I was replying to OP saying that "One of my buckets currently […]".

I'm late in replying, but since you are with Scaleway Object Storage, am I right about the timeline of its versions?

atombender · on Jan 29, 2022

Will you be open-sourcing Hive?