
Why would you choose to use a system that doesn't scale by default?

Single user local applications? Fair.

Web applications? Very strange choice imo.

Redis is great, but it is *not* a database, and it's thoroughly rubbish at high-load concurrency without clustering, which is (still) a massive pain in the ass to set up manually.
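From memory, the manual version goes something like this (addresses are placeholders; six nodes for three masters with one replica each) -- and that's before resharding, failover drills, and clients that actually understand cluster redirects:

    # start six instances with cluster mode on (one per node)
    redis-server --port 7000 --cluster-enabled yes \
        --cluster-config-file nodes-7000.conf --appendonly yes

    # then stitch them together into a cluster
    redis-cli --cluster create \
        10.0.0.1:7000 10.0.0.2:7000 10.0.0.3:7000 \
        10.0.0.4:7000 10.0.0.5:7000 10.0.0.6:7000 \
        --cluster-replicas 1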

Of course, you can just use a hosted version from a cloud provider... but it's generally about 10x more expensive than a plain old database.

/shrug

I mean, sure, it's (arguably...) a step up from just using sqlite... really, it's easy, and that's good, but it isn't good enough as a general replacement for a real database.

(To be fair, sqlite has some pretty sophisticated functionality too, even some support for concurrency; it's probably a step up from redis in many circumstances.)
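The concurrency bit is mostly WAL mode. A minimal sketch with Python's stdlib sqlite3 -- table and file names invented for illustration:

    import sqlite3

    conn = sqlite3.connect("app.db", timeout=5.0)
    # WAL lets readers proceed concurrently with a single writer,
    # unlike the default rollback journal, which blocks them
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA synchronous=NORMAL")  # the usual WAL pairing

    conn.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")
    conn.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)", ("greeting", "hello"))
    conn.commit()
    print(conn.execute("SELECT v FROM kv WHERE k = ?", ("greeting",)).fetchone())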




> Why would you choose to use a system that doesn't scale by default?

By all accounts Postgres seems to be a pain to scale off a single machine, much more so than redis.


Postgres is not as automatic as other tools, but that's mostly an artifact of it having been around so long, with the focus being on other things. Few projects have been around as long and stayed as relevant as Postgres.

Most of the time, you really don't need to scale postgres more than vertically (outside of the usual read replicas), and if you have tons of reads (that aren't hitting cache, I guess), then you can scale reads relatively easily. The problem is that the guarantees postgres gives you around your data are research-level hard to preserve across machines -- you either run a quorum protocol or you do two-phase commit.
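If you do want the quorum flavor, it's at least built in these days. A sketch of the primary's postgresql.conf -- the standby names are whatever you registered; s1/s2/s3 here are made up:

    # don't acknowledge a commit until any 2 of the
    # 3 named standbys have confirmed it
    synchronous_commit = on
    synchronous_standby_names = 'ANY 2 (s1, s2, s3)'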

Once you start looking into solutions that scale easily, if they don't ding you on performance, things get murky really quick and all of a sudden you hear a lot of "read-your-writes" or "eventual consistency" -- they're weakening the problem so it can be solved easily.

All that said -- Citus and Postgres-XL do exist. They're not perfect by any means, but you also have solutions that scale at the table level, like TimescaleDB and others. You can literally use Postgres for something it was never designed for and still be in a manageable situation -- try that with other tools.
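For reference, the Citus entry point is plain SQL on an otherwise vanilla Postgres. A sketch assuming the extension is installed; table and column names are invented:

    CREATE EXTENSION citus;

    CREATE TABLE events (
        tenant_id bigint NOT NULL,
        id bigserial,
        payload jsonb
    );

    -- spread the table across the worker nodes, sharded by tenant
    SELECT create_distributed_table('events', 'tenant_id');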

All that said, KeyDB[0] looks pretty awesome: multithreaded, easy clustering, and flash-as-memory in a pinch (see the config sketch below). I'm way more excited to roll that out than I am Redis these days.

[0]: https://github.com/EQ-Alpha/KeyDB
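The multithreading is a couple of lines in keydb.conf -- from memory, so check the docs for your version:

    # keydb.conf -- lift redis's single-threaded ceiling
    server-threads 4
    server-thread-affinity true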


KeyDB is really good. We use it in production to achieve millisecond response times on millions of requests per second.


It really does look absolutely amazing. I feel guilty because I want to run a service on it; there's almost no downside to running it everywhere you'd normally run Redis.

Also in the cool-redis-stuff category:

https://github.com/twitter/pelikan

Doesn't have the feature set that KeyDB has, but both of these pieces of software feel like they could be the basis of a cloud redis product that would be really efficient and fast. I've got some plans to do just that.


Are you Google search? How do you have millions of requests per second?


Lots of industries and applications can get to that scale. My last few companies were in adtech where that is common.


Thanks, I had no idea!


It's likely millions of internal requests, which as another comment mentions, is common in a number of industries.


Which PostgreSQL scaling pain point would you be referring to? Citus?


Redis is not a database. It's a key-value cache. If you're using it as a database, you're gonna have a bad time.


Why so? It has persistence and I'm not aware of any reported data loss happening with it.

It's also got loads of complex and useful commands.


Redis is inherently lossy as a matter of basic design, and that's not even touching on the many other issues born of the NIH solutions rampant within it. You may not hit the behavior until you push real loads through it, but if you talk to anyone who has, I'm confident they'll agree with the criticism: while it may be an excellent cache, it should never be treated as a ground-truth database. It's excellent as a slower memcached with richer features. It's not a database. You can also read Aphyr's reports over the years, which, to be utterly frank, bent over backwards to be charitable.


Data loss can occur between flushes to disk, for example (by default every 2 seconds / every however-many megabytes, I forget). It is most likely possible to fine-tune the configuration to make redis a very reliable data store, but it doesn't come with such settings by default, unlike most RDBMSes.
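For the record, the relevant knobs live in redis.conf. The strict end of the spectrum looks roughly like this, at a real throughput cost:

    # redis.conf -- trade throughput for durability
    appendonly yes
    # fsync the append-only file on every write, instead of
    # the default once-per-second batching
    appendfsync always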


Not all use cases require reliable data storage, and it is OK to lose a few seconds of data. Think simple discussion forums or internal chat applications. There are scenarios where ease of use and single-server scalability pay off in faster development and lower devops cost.


GP was asking why redis is not a reliable storage solution/database. Redis is great as an unreliable (not source-of-truth) storage.


For that temporary use case, how does it compare to memcached?


It mostly boils down to Redis having a richer API, and memcached being faster / more efficient. The new extstore stuff lets memcached leverage fast SSDs to cache stupidly huge datasets. Memcached is also one of the most battle-tested things out there in open source. I've used them both plenty over the years, but I tend to lean towards memcached now unless I really need some Redis API feature.
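To make the API gap concrete, a sketch with the usual Python clients, redis-py and pymemcache (localhost defaults assumed; extstore itself is a server-side flag along the lines of -o ext_path=..., not client API):

    import redis
    from pymemcache.client.base import Client

    # memcached: get/set/incr on opaque blobs -- and that's the point
    mc = Client(("localhost", 11211))
    mc.set("views:home", b"0")
    mc.incr("views:home", 1)

    # redis: server-side data structures, not just blobs
    r = redis.Redis(host="localhost", port=6379)
    r.zadd("leaderboard", {"alice": 1200, "bob": 900})
    print(r.zrevrange("leaderboard", 0, 2, withscores=True))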


I work on a SaaS community forums service and I can assure you data loss is not acceptable to our clients.

As a result we use MySQL w/ memcached, although we are considering a swap to redis for the caching layer.


> Not all use cases require reliable data storage, and it is OK to lose a few seconds of data. Think simple discussion forums or internal chat applications.

That is definitely not ok. I'd be really pissed as a user if I wrote a huge comment and it suddenly disappeared.


It only disappears if there is a catastrophic failure. The likelihood of that happening just as you write a huge comment is lower than hitting a jackpot in Las Vegas -- a sensible risk tradeoff for better development experience and cost.


> a sensible risk tradeoff

Note the tradeoff stops making sense as soon as you're operating at a meaningful scale. A small per-operation likelihood of failure at small scale translates to "I expect a failure a million years from now", whereas at large scale it's more like "a month from now". Accepting the same per-operation risk of data loss might be OK in the former case, but in the latter it's irresponsible -- provided whatever you're storing is not transient data.
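Back-of-the-envelope version, with a per-write loss probability invented purely for illustration:

    # suppose each write has a one-in-a-billion chance of landing
    # in a crash window and being lost (made-up number)
    p_loss = 1e-9

    for writes_per_day in (1_000, 100_000_000):
        days = 1 / (p_loss * writes_per_day)
        print(f"{writes_per_day:>11,} writes/day -> "
              f"~{days:,.0f} days to first expected loss")

    # 1,000 writes/day       -> ~1,000,000 days (millennia)
    # 100,000,000 writes/day -> ~10 days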


You are correct. I wrote the original comment with "single server" in mind, so I assume it's not a meaningful scale, and failures can be dealt with more effectively via a support ticket. Not everything needs to be a growth-trajectory SaaS.


How is that a sensible tradeoff compared to just using something that was actually designed to be a database when you need a database?


You do not need a relational database for a simple chat/forum application.


> sure, it's (arguably...) a step up from just using sqlite

How so? What's wrong with SQLite?


I suppose it's a bit more suitable for networked services than sqlite is, since it's natively a network server speaking its own protocol, while sqlite is natively a local, in-process solution.

...but: I started writing about clustering and the network API, and realized I can't really articulate why those are actually superior in any meaningful way to simply using sqlite, especially given the irritation I've had maintaining them in production in the past...

I guess you're probably right. If I had to pick, I'd probably use sqlite.


I would say Redis with RediSearch is a database.



