
You can tell a lot about a developer by their preferred database.

* Mongo: I like things easy, even if easy is dangerous. I probably write JavaScript exclusively

* MySQL: I don't like to rock the boat, and MySQL is available everywhere

* PostgreSQL: I'm not afraid of the command line

* H2: My company can't afford a database admin, so I embedded the database in our application (I have actually done this)

* SQLite: I'm either using SQLite as my app's file format, writing a smartphone app, or about to realize the difference between load-in-test and load-in-production

* RabbitMQ: I don't know what a database is

* Redis: I got tired of optimizing SQL queries

* Oracle: I'm being paid to sell you Oracle



This might be a stupid question, but surely no one thinks of RabbitMQ as a database, right? I used it extensively from 2012 to 2018, including using things like shovels to build hub-and-spoke topologies, but not once did I think of it as anything other than a message broker.

Did I miss something huge?


>This might be a stupid question, but surely no one thinks of RabbitMQ as a database, right?

Arguably the world's most popular database is Microsoft Excel.


It’s definitely popular, that much is inarguable.


I once worked on a system for notifying customers of events by posting to their APIs. Events came in on a Rabbit queue and got posted.

If a customer's API was down, the event would go back on the queue with a header saying to retry it after some time. You can do some sort of incantation to retrieve only the messages whose header says they're ready to retry. We used exponential backoff, capped at one day, because the API might be down for a week.
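For a flavor of what that incantation can look like, here's a minimal re-publish sketch in Python, assuming the pika client; the queue and header names (events, x-retries, x-retry-at) are made up for illustration:

  import time

  import pika  # assumed AMQP client; any RabbitMQ client would work

  ONE_DAY = 86400

  def schedule_retry(channel, body, headers):
      # Re-publish a failed event with exponential backoff, capped at
      # one day, since the customer's API might be down for a week.
      retries = headers.get("x-retries", 0) + 1
      delay = min(2 ** retries, ONE_DAY)
      channel.basic_publish(
          exchange="",
          routing_key="events",
          body=body,
          properties=pika.BasicProperties(headers={
              "x-retries": retries,
              "x-retry-at": time.time() + delay,  # consumer skips until due
          }),
      )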

I didn't think of RabbitMQ as a database when I started that work, but it looked a lot like it by the time I finished.


Sounds like delay line memory.


lol api could be down for a week? What?


Not everything is or needs to be webscale.


RabbitMQ stores your data, right? Then it's a database! That's pretty much all it takes. A flat file, memory-store, SQL DB, Document store, any of them can be databases if that's where you stick your data!

But also no: RabbitMQ and Kafka and the like are clearly message buses, and though they might also technically qualify as a DB, it would be a poor descriptor.


Ah I see, we are going with the “well technically it stores something, therefore it is a database” joke. Now I’m fully onboard :)

Back when I worked in LA, my CTO used to joke that most places use Microsoft Outlook as a database and Excel as a BI tool.


If memory serves, the original EToys.com code treated the filesystem as a tree-structured database using atomic operations (though no transactions). It worked just fine; then the rewrite with an RDBMS, which should have been stabler and faster, resulted in the famous meltdowns. Admittedly this is cheating a bit, since you can name folders & files with semi-arbitrary or internally structured string keys. By 1997 standards, pure disk access without having to walk the filesystem hierarchy was blazingly fast compared to many of the databases I was using.

[Source: I was friends with the guy who wrote it as well as other EToys employees. God that was a trainwreck.]


Interesting, is there a blog post discussing this in detail? If not, would you be kind enough to go into more detail?


I don't think anyone posted about their particular system, but it's not unknown now. If you google "filesystem as a database" there are some relevant hits. One super simple, probably not ideal, but at least balanced version uses a hash of some primary key (like a customer row id) as the file index, then partitions the items into directories based on successive parts of the hash, with all permutations at each level (or only the populated ones). For example, an item whose key hashes to a32c4214585e9cb7a55474133a5fc986 would be located somewhere like this:

  a32c/4214/585e/9cb7/a554/74133a5fc986
    a32c/
      4214/
        585e/
          9cb7/
            a554/
              74133a5fc986
The advantage of this kind of structure is that you never need to manually scan a directory, since you know exactly what path you're trying to open. You still incur the OS lookup time for the inode-equivalent in the directory entry, but a deeper hierarchy keeps that faster. You can trade off time to traverse the hierarchy against the number of entries in the final directories by adjusting the length of the hash chunk you use at each level. Two characters will put vastly fewer entries at a given level, but vastly increase your directory depth.

Basically, if you're manually scanning the hierarchy for anything but a consistency check or garbage collection, you've already lost.
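In code, mapping a key to its path is only a few lines. A minimal sketch in Python, assuming MD5 and four-character chunks as in the example above:

  import hashlib
  import os

  def item_path(root, key, chunk=4, depth=5):
      # Hash the primary key, use fixed-size chunks of the hex digest as
      # directory names, and keep the remainder as the file name, e.g.
      # root/a32c/4214/585e/9cb7/a554/74133a5fc986 for the hash above.
      digest = hashlib.md5(str(key).encode()).hexdigest()
      parts = [digest[i * chunk:(i + 1) * chunk] for i in range(depth)]
      return os.path.join(root, *parts, digest[depth * chunk:])

You never scan a directory; you hash the key and open() the resulting path directly.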


That's how git stores its objects:

    18:35 $ tree .git/objects/
    .git/objects/
    ├── 02
    │   └── 9581d0c8ecb87cf1771afc0b4c2f1d9f7bfa82
    ├── 3b
    │   └── 97b950623230bd218cef6aebd983eb826b2078
    (...)
    ├── info
    └── pack
        ├── pack-b1fe2364423805afb6b1c03be0811c93b19dedc9.idx
        └── pack-b1fe2364423805afb6b1c03be0811c93b19dedc9.pack

    10 directories, 10 files


One important note: make sure you carefully consider using atomic renames and such when manipulating the files! Overwriting in place is a great way to end up with a corrupted item if something goes desperately wrong and you're not protected by COW or data journaling.
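A minimal sketch of that write-to-temp-then-rename pattern in Python (rename is atomic within a single POSIX filesystem; names here are illustrative):

  import os
  import tempfile

  def atomic_write(path, data):
      # Write to a temp file in the same directory, fsync, then swap it
      # into place so readers never observe a half-written item.
      fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
      try:
          with os.fdopen(fd, "wb") as f:
              f.write(data)
              f.flush()
              os.fsync(f.fileno())
          os.replace(tmp, path)
      except BaseException:
          os.unlink(tmp)
          raise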


Usually you write these sorts of things as append-only with an optional garbage collect. You get a minimal sort of atomicity with that.


I was thinking of doing something similar as a lightweight embedded datastore: apply structure to the filesystem like you would a Redis key.

Would love to talk to anyone on the EToys team or anyone who has done something similar.

I'm @akamaozu on twitter.


Unfortunately eToys imploded a couple of years later (2001) and there were only a few people involved at that stage, so it's possible none of them are in the industry anymore. You might start by looking at email servers; I believe there are a few that use a deeply nested directory hierarchy for much the same reasons. IIRC Apple also does something similar with the interior of the sparsebundles used in Time Machine backups, but I don't know if any of that code is open source.


You laugh, but I bet Excel produces orders of magnitude more real "business intelligence" than all other "BI" tools combined.


Here is an anecdote.

I had to work on a tool that shows what's wrong with an assembly line (missing parts, delays, etc.) so that management can take corrective action. Typical "BI" stuff, but in a more industrial setting.

The company went all out on new technologies: web front-end, responsive design, "big data", distributed computing, etc. My job was to use PySpark to extract indicators from a variety of data sources. Nothing complex, but the development environment was so terrible it turned the simplest task into a challenge.

One day, the project manager (sorry, "scrum master") came in, opened an Excel sheet, imported the data sets, and in about 5 minutes showed me what I had to do. It took me several days to implement...

So basically, my manager with Excel was hundreds of times more efficient than I was with all that shiny new technology.

That experience made me respect Excel and people who know how to use it a lot more, and modern stacks a lot less.

I am fully aware that Excel is not always the right tool for the job, and that modern stacks have a place. For example, Excel does not scale, but there are cases where you don't need scalability. An assembly line isn't going to start processing 100x more parts anytime soon, and one that does will be very different. There are physical limits.


I think you drew the right conclusion from your experience, but I also want to point out that building the first prototype is always anywhere from one to three orders of magnitude easier than building the actual product.

The devil is in the details, and software is nothing but details. The product owner at the company I work for likens it (somewhat illogically, but it works) to constructing walls. You can either pick whatever stones you have lying around, and then you'll spend a lot of time trying to fit them together and you'll have a hell of a time trying to repair the wall when a section breaks. Or you can build it from perfectly rectangular bricks, and it will be easy to make it taller one layer at a time.

Using whatever rocks you have lying around is like building a prototype in Excel. Carefully crafting layers of abstraction using proper software engineering practices means taking the time to make those rectangular bricks before building the wall. The end result is more predictable when life happens to the wall.


Well, in these situations the implicit ask of your company (I've been there myself) is to basically rebuild Excel, trading away some of its power/flexibility for safety, and to move the risk of error away from front-end users (i.e. onto the back-end developers).

Unfortunately, which specific features of Excel are acceptable to remove is unknown until you have already way over-invested in the project.

The best I've seen this done is having Excel as a client for your data store, where read access is straightforward and writes are done via CSV upload (with heavy validation and maybe history rollback).

That way the business can self-service every permutation of dashboard/report they need, and only when a very specific use case arises do you need to start putting engineering effort behind it.
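A sketch of what the heavy-validation step might look like in Python; the column names and schema are hypothetical:

  import csv
  import io

  # Hypothetical schema: column name -> parser that raises on bad input.
  SCHEMA = {"customer_id": int, "amount": float}

  def validate_upload(raw_csv):
      rows, errors = [], []
      reader = csv.DictReader(io.StringIO(raw_csv))
      for lineno, row in enumerate(reader, start=2):  # line 1 is the header
          try:
              rows.append({c: parse(row[c]) for c, parse in SCHEMA.items()})
          except (KeyError, ValueError) as exc:
              errors.append((lineno, repr(exc)))
      # Reject the whole upload on any error so rollback stays trivial.
      return rows, errors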

I suppose you can also supplement the Excel workflow with a pared-down CRUD interface for the inevitable employee allergic to Excel.


I posted elsewhere[0] in this thread about my employer's successful practice of replacing shared spreadsheets with web applications.

Here is another option that we use instead of CSV import.

Our applications support custom reports and custom fields.

Users can define new reports and run them on demand.

They can also define custom field types with validation, data entry support, etc.

This combination provides some of the extensibility of Excel while retaining the advantages of an application.

Edited for wording changes.

[0] https://news.ycombinator.com/item?id=23292374


...And orders of magnitude more wasted time and capital due to inaccurate and isolated data.


People use what they know to solve the problems they have.

You can complain about their solution or see it as an opportunity.

I posted elsewhere[0] in this thread about my employer's practice of replacing shared spreadsheets with web applications.

This approach works quite well for us and I would encourage you to consider it as an option.

[0] https://news.ycombinator.com/item?id=23292374


You bet, but I'd really love to see data that supports that.


> the “well technically it stores something, therefore it is a database” joke

Confluent, the company behind Kafka, are 100% serious about Kafka being a database. It is however a far better database than MongoDB.


Excel can be an excellent source of new line-of-business applications.

Many of my employer's applications started out as a shared spreadsheet or Access database.

Our development team worked with the users and built a web application to solve the same problem.

This approach has a lot of advantages:

* The market exists and has an incumbent. There's a lower risk of a write-off.

* The users are open to process changes. You still have to migrate people off of the spreadsheet, though.

* It's easy to add value with reporting, error checking, concurrent access, and access control.

* You can import the existing data to make the transition easier. This will require a lot of data cleaning.

Edited to add the following text from another post.

You can cover most of the requirements with a set of fixed fields.

The last 10% to 20% of the use cases requires custom reports and custom fields.

Users should be able to define their own reports and run them without your involvement.

They should also be able to define custom field types with validation, data entry support, etc.

If your web application has these two features and other advantages then you should be able to replace Excel.


> Kafka and the like are clearly message buses and though they might also technically qualify as a DB

ksqlDB is actually a database built on top of this.

The thing is that they have an incrementally updated materialized view that serves as the table, while the event stream is similar to a WAL (more of an "ahead-of-write log" in this case?).

Because eventually you can't just go over your entire history for every query.
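The idea in miniature, as a toy Python sketch (not ksqlDB's actual machinery): fold each event into the table as it arrives, instead of replaying the log per query.

  # Toy event log; the "table" is a dict kept in sync incrementally.
  log = [("alice", 5), ("bob", 3), ("alice", 2)]
  table = {}

  def apply(event):
      key, amount = event
      table[key] = table.get(key, 0) + amount  # incremental update

  for event in log:
      apply(event)

  print(table)  # {'alice': 7, 'bob': 3}; queries never rescan the log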


Oh ho ho ho. What weird things we use as databases. I remember when I first started out as a consultant developer, we were using a CMS as our data repository because someone thought that was a good idea. (It wasn't.) The vendor was flown in from the States to help troubleshoot. I will never forget how he looked at me when I had to explain why we had made so many nodes in the content tree: it was because we were using the CMS as a repository.


It's both. It's best used when it's being used as a message broker, but any sufficiently advanced message broker will need many of the features of a database – durability of messages, querying in various ways, etc. I think it's reasonable to think of it as a very specialised database.


I interpret it as they'd probably not call it a database, but they might use it in places where a database would be better suited, and effectively store data in it.


As someone who chose MySQL and provides direction to developers who really like Postgres, and who also uses Postgres for fun, I do find myself having to both defend MySQL as a prudent option and convince them that I know anything at all about Postgres or computer science. :)


I've heard MySQL (well, MariaDB, really) has improved a lot in recent years, but I still can't imagine why I'd ever choose it over Postgres for a professional project. Is there any reason?

It used to be that bargain basement shared-hosting providers would only give you a LAMP stack, so it was MySQL or nothing. But if you're on RDS, Postgres every time for my money.


mysql's admin tools are still far superior to what's available for postgres


What tools are these? Curious as a Postgres user


MySQL Workbench is actually pretty damn good.


As a PostgreSQL user: pgAdmin 4 is the worst.


PgAdmin3 was much better, but then they decided to turn it into a web application and we got PgAdmin4. Very annoying.


I used to use PgAdmin 3, but after... I dunno how many years now, PgAdmin4 is still a buggy mess.

It's really sad, because all the contributors to Postgres have made an AMAZING database that's such a joy to work with. And then there's PgAdmin4, where it's almost like they just don't care...

I don't feel I'm smart enough to contribute anything to PgAdmin4 to try to make it better. So I stick to DataGrip and DBeaver.


I've probably tried a dozen or so.

For MySQL, I haven't found anything that beats SequelPro. For Postgres, I haven't found anything that comes close to parity, but my favorite is Postico.

I know people that swear by IntelliJ for their DB stuff; it just never hit home for me personally, though.


TablePlus is fantastic and supports all the most popular DBs, including Postgres & MySQL. https://tableplus.com/

It’s the only DB client that doesn’t look like it was built in the ’90s. Slick UX & UI. A nice balance between developer tool & admin tool.


For PG, DBeaver does a damn good job for me.


DBeaver is certainly underhyped. Moved over to using it from MySQL Workbench and so far it's been a big improvement.

I can't compare against SequelPro as I don't have a Mac, but DBeaver's worth a try for anyone looking for a cross-platform DB editor.


I spent a little time in it and I really like DBeaver. Free tier works great.


I tend to find that people who argue with me against MySQL bring up things that haven't been true in a long time, such as Unicode or NULL handling.

I'd probably choose Postgres over MySQL for a new project just to have the improved JSON support, but there's upsides to MySQL too:

- Per-thread vs per-process connection handling

- Ease of getting replication running

- Ability to use alternate engines such as MyRocks


Does MySQL have transactional DDL yet? Because if not, it is not a contender as a primary datastore.


Does that mean they fixed “utf-8” or that everyone is just supposed to know that it’s fucking bullshit and always has been?

You can’t cut corners like that without inviting questions about the character of the primary committers. The pecking order in software is about trust.

People don’t let that stuff go easily, which is why you still see people harping on MongoDB. Once someone is labeled a liar and a cheat, everything they say that doesn’t add up is “guilty until proven innocent.”

The utf-8 situation is on top of a bed of half truths. Things like publishing benchmarks with backends that don’t support isolation. A cornerstone of a good DB is handling concurrent access efficiently and correctly. Drawing attention to other benchmarks is a lie by omission. Better than just being incorrect for a decade, certainly, but still sketchy.


MyRocks is a little bit janky in my experience - it doesn't support some transaction isolation levels, fails to handle some workloads that rely on locking (such as a job queue), has failed to upgrade MariaDB minor versions [0], has very sparse documentation, and overall has given me some amount of unexpected behavior.

Though I'm willing to put up with it due to its incredible compression capabilities...

[0]: https://jira.mariadb.org/browse/MDEV-22609


MySQL also has great JSON features (a JSON data type, virtual indexes on it, multi-value (array) indexes, json_table, ...).


I'd be impressed if these were better than PostgreSQL's JSON features. Do you know how they compare?


2 years ago, they were worse, from what I recall.


MySQL has definitely improved a lot since 5.0, but it still has a few warts.


I prefer PostgreSQL, but MySQL provides a better clustering experience if you need more read capacity than a lone node can provide.

Oracle is great if and only if you have a use case that fits its strengths, you have an Oracle-specific DBA, and you do not care about the cost. I have been on teams where we met those criteria, and I genuinely had no complaints within that context.


Given both my experience and prior research, I don't buy that Oracle is ever better than half the stuff on the above list, and I think it's worse than Postgres on every metric.

Every time I need to work with an Oracle DB it costs me weeks of wasted time.

For a specific example: I was migrating a magazine customer to a new platform, and all of the Oracle dumps and reads would silently truncate long text fields... The "Oracle experts" couldn't figure it out, and I had to try 5 different tools before finally finding one that let me read the entire field (it was some flavor of JDBC or something). To me, that's bonkers behavior, and it's just one of the reasons I've sworn them off as anything other than con artists.


SQL Server: I use C# and write line-of-business applications.


My day job involves developing for / customizing / maintaining two separate third-party systems that rely on SQL Server (one of them optionally supports Oracle, but fuck that).

I gotta say, as much as I hate it with a passion, and as often as it breaks for seemingly silly reasons (so many deadlocks), it's at least tolerable (even if I feel like Postgres is better by just about every metric).


Datomic: I'm done already, send more work please


I've been working with a partner company that is using Datomic to back a relatively impressive product - but I don't really see much written about it. What has been your experience?


>* H2: My company can't afford a database admin, so I embedded the database in our application (I have actually done this)

I'm familiar with the variant, "InfoSec won't let us deploy a DB on the same host".


SQLite: I enjoy using reliable and correct databases even at the cost of scalability


SQLite has always intentionally failed to report this error:

  sqlite> create table foo (n int);
  sqlite> insert into foo (n) values ('dave');
  sqlite> select count(*) from foo where n = 'dave';
  1


That is documented as a feature. https://www.sqlite.org/different.html under "Manifest typing".

SQLite does not enforce column types, except for INTEGER PRIMARY KEY columns.
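For what it's worth, you can opt a column back into enforcement with a CHECK constraint on typeof(). A quick sketch via Python's sqlite3 module:

  import sqlite3

  con = sqlite3.connect(":memory:")
  con.execute("CREATE TABLE foo (n INT CHECK (typeof(n) = 'integer'))")
  con.execute("INSERT INTO foo (n) VALUES (1)")       # fine
  con.execute("INSERT INTO foo (n) VALUES ('dave')")  # sqlite3.IntegrityError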


Fair enough, but just because you call a bug a feature doesn’t mean it’s not a bug.

I think most people have realised weak typing is not a good idea in programming languages. It’s especially not a good idea in databases.


Most people have not in fact realized weak typing is not a good idea. I myself vastly prefer strongly typed languages and think they are superior. However there are a huge number of people I work with and know professionally who prefer dynamically typed languages. Weak versus strong typing is a highly subjective opinion. Each one has different costs and benefits and which camp you land in depends in large part on what you value personally.


Most "scripting" plangs are both strong and dynamic typed at the same time.


SQLite accepts a type constraint without enforcing it. That's a bug just like MySQL sometimes accepting a foreign key constraint without enforcing it.


It's not a type constraint. It's a hint to SQLite to try to coerce values when it can. Here's what the link the parent posted says:

> As far as we can tell, the SQL language specification allows the use of manifest typing. Nevertheless, most other SQL database engines are statically typed and so some people feel that the use of manifest typing is a bug in SQLite. But the authors of SQLite feel very strongly that this is a feature. The use of manifest typing in SQLite is a deliberate design decision which has proven in practice to make SQLite more reliable and easier to use, especially when used in combination with dynamically typed programming languages such as Tcl and Python.

It's intended behavior that's compatible with the SQL spec.


That doesn't change the fact that it's not a good idea.


Why?


I admit I was kind of thinking that, even though I appreciated the humor. :) I imagine an awful lot of web sites out there would do just fine with SQLite as their back end.


It's great for small, read-only websites that aren't quite simple enough to just be static HTML sites. Also for desktop app config/data DBs.


Can you elaborate? I've seen benchmarks, and from their website what I understood is that it can handle really massive reads and writes, tens (maybe hundreds) of thousands of ops per second, but I've personally never tested it to that extent.


We're using it in Quassel, and as soon as you go over ~3-4 parallel write/read threads, it starts locking up completely, sometimes taking 30 seconds for simple queries that should really take milliseconds.

The big issue is that SQLite does full-database locking for any operation, so during a write you can't easily read at all.

This can be fixed with WAL mode, but WAL mode was broken in its early versions, and new versions of SQLite aren't in all distros yet, despite being out for almost a decade. And even WAL mode gets abysmal performance.
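For anyone stuck with it, the usual mitigations are WAL mode plus a busy timeout; a sketch using Python's sqlite3 (the pragma values here are illustrative):

  import sqlite3

  con = sqlite3.connect("app.db", timeout=30)
  con.execute("PRAGMA journal_mode=WAL")    # readers stop blocking on writers
  con.execute("PRAGMA busy_timeout=5000")   # wait up to 5s on a locked db
  con.execute("PRAGMA synchronous=NORMAL")  # common WAL-mode durability tradeoff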


As long as you're only doing reads, though, you can forego the locking and it's pretty nice!

Definitely would not use in a read/write situation for a web API for various reasons, though. :-)


Which version 'fixes' WAL in your opinion?


It really can (an LXD cloud setup, from personal experience); the problem is that if you don't serialise your writes, then yeah, fun times are to be had. There are compromises for all databases. People just like telling others their opinion as fact, and how wrong everybody is apart from themselves, of course.


How far does SQLite scale? Obviously not good for anything public facing with thousands of concurrent users, obviously good enough for something you only use yourself, but what about internal tools with a couple hundred users total (few of them concurrent) - where's the limit when it starts slowing down?



Expensify aren't really scaling SQLite in the way that people would expect. To say it's scaling SQLite is not exactly wrong, but probably gives the wrong impression. The users of their database likely wouldn't see it as SQLite, and they don't use the stock SQLite code.

They have their own layer on top that happens to use SQLite as the storage format on disk[1]. This layer means they aren't using full SQLite at the application level, but rather using their custom database in the application, and SQLite within their custom database.

Further, as far as I can tell they've customised the SQLite codebase to remove much of the functionality that SQLite uses to ensure that multiple instances can safely edit the same file on disk together; then they memory-map the file and just have many threads all sharing the same data.

[1]: FoundationDB also does this, and scales to thousands of nodes. The trick is that it's essentially _many_ separate, very simple SQLite databases, each being run independently.


Agreed, it's not vanilla SQLite, but it can give you a rough idea of the quality of the SQLite codebase.


* Cockroach / Spanner: you know what's cooler than millions?


What if they prefer an XML database (like BaseX, eXist, or MarkLogic)?


We ask them politely, yet firmly, to leave.


Psychopath


I've used H2 in a couple of my personal JVM applications, mainly because when it comes to the JVM it's a somewhat nicer fit than SQLite.


I love PostgreSQL, but I don't remember when I last interacted with it via the command line instead of pgAdmin.


What about MSSQL?


"We are a Microsoft-only shop"


I have other boats to rock than MySQL! ;-)


Neo4j?


ha, nailed it dude.


HAHAHAH The RabbitMQ one got me. Have your upvote, sir.


MySQL is actually amazing: it scales better than PgSQL, supports JSON, and is available everywhere. I see no reason to use any other DB for 90% of the use cases you need a DB for.


MySQL does not scale better than PostgreSQL.

I can tell you this emphatically as I spent 6 months trying to eke out performance with MySQL (5.6). PostgreSQL (9.4) handled the load much better without me having to change memory allocators or do any kind of aggressive tuning to the OS.

MySQL has some kind of mutex lock that stalls all threads; it's not noticeable until you have 48 cores, 32 databases, and completely unconstrained I/O.

EDIT: it was PG 9.4 not 9.5


You're comparing tech from 2 different eras... redo the benchmark today and I’ll be surprised if you come to the same results. PGsql even has a wiki page where they discuss implementing MySQL features and changing their architecture so they can scale. https://wiki.postgresql.org/wiki/Future_of_storage#MySQL.2FM...


> You're comparing tech from 2 different eras...

They were both the latest and greatest at the time

> redo the benchmark today and I’ll be surprised if you come to the same results.

I would, but it was not just a benchmark; it was a deep undertaking including but not limited to: optimisations in the Linux kernel, specialised hardware, custom memory allocators, and analysing/tracing/flamegraphing disk/memory access patterns to find hot paths/locks/contention (and at different scales: varying the number of connections, transactions per connection, number of databases, size of data, etc.).

It was 6 months of my life.

> PGsql even has a wiki page where they discuss implementing MySQL features and changing their architecture so they can scale.

Just because mysql has some good ideas doesn't mean it scales better. I know for a fact that it didn't in 2015. I doubt that they have fixed the things I found, I could be wrong. But it would have to be a large leap forward for MySQL and PostgreSQL has had large performance improvements since then too.

Also, I read that page and it says nothing about scaling, just that some storage engines have desirable features (memory tables are very fast, and PGSQL doesn't support them; archive tables are useful for writing to slower media, and you can do the latter with partitioning, but it's not intuitive).


MySQL 5.6 was released in 2012. Postgres 9.5 was released in 2016, so this isn't really a fair comparison.


Whups, I meant 9.4, it's not possible that I could have tested 9.5 as the test was conducted in early 2015, both databases were the latest versions.

yes, I should run the test again, but it was 6 months of my life, and I don't see how much could have changed.


How is MongoDB dangerous or less consistent than PG? I have one for you: I can't use PG or MySQL because if the master is down my app goes down, and then the entire backend fails. How do you do HA with default PG?


https://www.postgresql.org/docs/10/different-replication-sol...

Logical replication or synchronous multimaster replication may meet your needs.


And you can tell a lot about a developer when they post comments like this.

Almost none of it is remotely accurate, e.g. RabbitMQ isn't even a database.


I can't believe the one item that was so obviously added as a joke went right over your head.

It may be a good idea to take a break from the computer and find something less stressful to do.


Perhaps that’s because some other message brokers are now being touted as databases[0][1]; I remember seeing a thread about it on HN a couple of days ago.

[0] https://www.confluent.io/blog/okay-store-data-apache-kafka/

[1] https://dzone.com/articles/is-apache-kafka-a-database-the-20...


Kafka is a very different beast from RabbitMQ.


Kafka is much more like a distributed file system that has queuing semantics baked in than it is an ephemeral queue that implements some level of persistence.

The fact that you put Kafka and RabbitMQ in the same category sort of makes me feel like you're out of your element, Donnie.



Re RabbitMQ: isn't that OP's point?


Man, people really hate Mongo.

We use it for a very specific use case, and it's been perfect for us when we need raw speed over everything. Data loss is tolerable.


It seems like you have the only good use case for it pegged down. I've worked at multiple companies that really, really didn't understand that putting something into the DB comes with some probability that it'll never come out. The arguments were "but it's a dataBASE, it stores data. They'd never sell this as a product if it LOST data; then it wouldn't be a database..."


LOL



