
I used to (and I didn't find much relief from eye drops). As for the headaches, it turned out they were migraines, which I was getting from screwing up my face because my eyes were uncomfortable.


That link isn't particularly convincing. As far as I can see, the only Postgres test performed on the hardware that the top MariaDB entries had was on a positively ancient Postgres version (9.2.1).


The top-performing Postgres instance among all benchmarks is v. 11.0.1008: https://www.passmark.com/baselines/V11/advanced-database-ben...

And I'm seeing versions tested up through 16.3, which was released in May. In fact, even 9.2.1 is less than three years old.


   * In that link, V11 is not the version of Postgres, it's the version of the test. Scroll down to DB Version.
   * Lots of versions are tested, but 9.2.1 is the only version I see on the same hardware that the top MariaDB versions are tested against. The others are on much weaker hardware.
   * Postgres 9.2.1 is 12 years old.
This site is not a good like-for-like comparison.


self-plug, but you could give https://pgexercises.com/ a try. No need to spin up your own DB etc.


Why shouldn't they want the easy way out? I was obese twenty years ago, and lost the weight via diet and exercise. Keeping that weight off is the single hardest thing I have ever done, and a battle I still have to consciously fight every single day. Why should it be that difficult? So that I can pass some kind of purity test?

The fact is that the food we eat has evolved over time, and is too hard to resist overconsuming for a large fraction of our population. If we can create more addictive food, why not create antidotes? If we could easily treat alcohol addiction with a pill, would we tell alcoholics to just apply willpower instead?


Indeed - People In The Know have some concerns with this approach: https://ardentperf.com/2021/07/26/postgresql-logical-replica... .

At $work we did use this approach to upgrade a large, high throughput PG database, but to mitigate the risk we did a full checksum of the tables. This worked something like:

    * Set up logical replica, via 'instacart' approach
    * Attach physical replicas to the primary instance and the logical replica, wait for catchup
    * (very) briefly pause writes on the primary, and confirm catchup on the physical replicas
    * pause log replay on the physical replicas
    * resume writes on the primary
    * checksum the data in each physical replica, and compare
This approach required <1s write downtime on the primary for a very comprehensive data validation.
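
The pause and checksum steps were something like the sketch below (a sketch only - the table name and ordering column are placeholders, and for very large tables you'd likely want to hash in chunks rather than aggregate a whole table into one string):

  -- on each physical replica, once it has caught up (Postgres 10+):
  SELECT pg_wal_replay_pause();

  -- then run the same checksum query on both replicas and compare the output
  SELECT md5(string_agg(t::text, ',' ORDER BY t.id)) AS table_checksum
  FROM some_table AS t;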


> so long as your code is correct

This is a pretty tough definition of correct, though. Without foreign key constraints you'll have a really tough time dealing with concurrency artifacts without raising your isolation levels, which generally brings larger performance concerns.


Why do you think that?

My experience is that if you have a moderate amount of foreign keys, a lot of DBMS (not Postgres) will refuse the `ON DELETE CASCADE` (in the diamond case), and you have to do it "manually" anyway (from your query builder).


I think it because a significant fraction of my career has been spent fixing db-concurrency-related mistakes for people once they hit scale :-).

I’m not talking about using cascade - this applies perfectly well to ON DELETE RESTRICT. FKs are more or less the only standard way to reliably keep relationships between tables correct without raising the isolation level (at least, in most DBs) or using explicit locking schemes that would be slower than the implicit locking that foreign keys perform.


How does this stay correct in the presence of concurrent activity?


Using transactions or UUIDs/ULIDs, though maybe I’m misunderstanding your question. How do foreign key constraints help with concurrency?


Table User: userid, etc

Table Resources: resourceid, userid, etc

If I want to restrict deletion of a user to only be possible after all their resources are deleted, I'm forced into using higher-than-default isolation levels in most DBs. This has significant performance implications. It's also much easier to make a mistake - for example, if when creating a resource I check that the user exists prior to starting the transaction, then start the tran, then do the work, it will allow insertion of resources referencing a nonexistent user.
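
For reference, the schema shape I mean is roughly this (a sketch - names and types are just illustrative):

  CREATE TABLE users (
      userid bigint PRIMARY KEY
  );

  CREATE TABLE resources (
      resourceid bigint PRIMARY KEY,
      -- the FK does the work: a user with remaining resources can't be deleted
      userid bigint NOT NULL REFERENCES users (userid) ON DELETE RESTRICT
  );

With the constraint in place, a resource insert racing with a user delete either lands before the delete or fails its FK check - no orphaned rows either way, all at the default isolation level.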


Add your user check as a where clause on the resources insert?


Can you give an example? I’m not aware of a mechanism like that that will protect you from concurrency artifacts reliably - certainly not a general one.


  start transaction;
  select id from users where id = ? for update;
  if row_count() < 1 then raise 'no user' end if;
  insert into sub_resource (owner, thing) values (?, ?);
  commit;

??


Do that in most relational dbs in the default isolation level (read committed), and concurrently executing transactions will still be able to delete users underneath you after the select.

If we take postgres as an example, performing the select takes exactly zero row level locks, and makes no guarantees at all about selected data remaining the same after you’ve read it.

edit: my mistake - I missed that the select is for update. Yes, this will take explicit locks and thus protect you from the deletion, but is slower/worse than just using foreign keys, so it won't fundamentally help you.

further edit: let's take an example even in a higher isolation level (repeatable read):

  -- setup
  postgres=# create table user_table(user_id int);
  CREATE TABLE
  postgres=# create table resources_table(resource_id int, user_id int);
  CREATE TABLE
  postgres=# insert into user_table values(1);
  INSERT 0 1

  Tran 1:
  postgres=# BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
  BEGIN
  postgres=# select * from user_table where user_id = 1;
   user_id 
  ---------
         1
  (1 row)

  Tran 2:
  postgres=# BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
  BEGIN
  postgres=# select * from resources_table where user_id = 1;
   resource_id | user_id 
  -------------+---------
  (0 rows)
  postgres=# delete from user_table where user_id = 1;
  DELETE 1
  postgres=# commit;
  COMMIT

  Tran 1:
  postgres=# insert into resources_table values (1,1);
  INSERT 0 1
  postgres=# commit;
  COMMIT

  Data at the end:

  postgres=# select * from resources_table;
   resource_id | user_id 
  -------------+---------
             1 |       1
  (1 row)

  postgres=# select * from user_table;
   user_id 
  ---------
  (0 rows)
You can fix this by using SERIALIZABLE, which will error out in this case.
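
For contrast, here's roughly what the foreign key version of the same setup looks like (a sketch - it assumes user_id is promoted to a primary key so it can be referenced). Re-running the two transactions above, Tran 1's insert now errors out - a foreign key violation at the default READ COMMITTED, or a serialization failure at the higher levels - rather than committing an orphaned row:

  postgres=# create table user_table(user_id int primary key);
  CREATE TABLE
  postgres=# create table resources_table(resource_id int, user_id int references user_table(user_id) on delete restrict);
  CREATE TABLE

  -- same Tran 1 / Tran 2 sequence as above: Tran 2's delete commits,
  -- then Tran 1's insert into resources_table fails instead of succeeding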

This stuff is harder than people think, and correctly indexed foreign keys really aren't a performance issue for the vast majority of applications. I strongly recommend just using them until you have a good reason not to.


HA on RDS uses synchronous replication - you won’t lose data on automated failover under any normal circumstances.


Ok that's fine


This is the c-store paper, which was evolved into Vertica: http://www.cs.umd.edu/~abadi/vldb.pdf . It's very readable, worth a look.

edit: column stores existed before c-store, but c-store did some very nifty stuff around integrating compression awareness into the query executor


Thank you, I will read the paper!


> If you have a build farm / CI machines, don't use swap. With swap, if a user schedules too many compiles at once, machine will slow to a halt and become kinda-dead, not quite tripping dead timer, but not making any progress either. Instead, set up the OOM priority on the users processes so they are killed first. If OOM hits, clang is killed, build process fails, and we can go on.

This doesn't really work that well. It's true that if you enable swap and have significant memory pressure for any extended period your machine will grind to a halt, but this will _also_ happen if you don't use swap and rely on the Linux OOM killer.

Indeed, despite the lack of swap, as part of trying to avoid OOM killing applications, Linux will grind the hell out of your disk - because it will drop executable pages out of RAM to free up space, then read them back in again on demand. As memory pressure increases, the period of time between dropping the page and reading it back in again becomes very short, and all your applications run super slowly.

An easy solution to this is a userspace OOM-kill daemon like https://facebookmicrosites.github.io/oomd/ . This works on pressure stall information, so it knows when your system is genuinely struggling to free up memory.
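
For anyone who hasn't met PSI before, the kernel exposes it under /proc/pressure on 4.20+ (when PSI is enabled); the memory file looks roughly like this (numbers invented), where 'some' means at least one task was stalled waiting on memory and 'full' means all non-idle tasks were stalled:

    $ cat /proc/pressure/memory
    some avg10=0.00 avg60=1.23 avg300=0.45 total=123456789
    full avg10=0.00 avg60=0.50 avg300=0.12 total=23456789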

On the historical fleets I've worked on pre-OOMD/PSI, a reasonable solution was to enable swap (along with appropriate cgroups), but target only allowing brief periods of swapin/out. This gives you two advantages:

* allows you to ride out brief periods of memory overconsumption

* allows genuinely rarely accessed memory to be swapped out, giving you more working space compared to having no swap


Eh, I’ve never seen a machine actually use any notable amount of swap and not be functionally death spiraling.

I’m sure someone somewhere is able to use swap and not have the machine death spiral, but from desktop to servers? It’s never been me.

I always disable swap for this reason, and it’s always been the better choice. Not killing something off when you get to that point ASAP is a losing bargain.


FreeBSD isn't Linux, but I've had FreeBSD machines fill their swap and work just fine for months. I had one machine that had a ram issue and started up with a comically small amount of ram (maybe 4 mb instead of 256 mb... It was a while ago) and just ran a little slow, but it was lightly loaded.

I've also had plenty of machines that fill the swap and then processes either crash when malloc fails, or the kernel kills some stuff (sometimes the wrong thing), or things just hang. Measuring memory pressure is tricky; a small swap partition (I like 512 MB, but limit to 2x RAM if you're running vintage/exotic hardware that's got less than 256 MB) gives you some room to monitor and react to memory usage spikes without instantly falling over, but without thrashing for long.

You should monitor (or at least look at) both swap used % and also pages/second. If the pages/second is low, you're probably fine even with a high % use, you can take your time to figure out the issue; if pages/second is high, you better find it quick.
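
On Linux, vmstat is an easy way to eyeball both at once - swpd is swap in use and si/so are the swap-in/swap-out rates (numbers invented, and the exact column layout varies a little between versions; FreeBSD's vmstat reports similar paging figures in its own format):

    $ vmstat 5
    procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
     r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
     1  0 524288  81234  10240 912345    0    1    12    30  150  300  5  2 92  1  0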


The issue is specific to Linux. I’ve had Solaris and SunOS boxes (years ago) also do fine.


Don't mistake every machine you have seen death spiraling using swap, with every machine using swap as death spiraling. Notably, how many machines did you not have to look at, because the swap was doing just fine?


That I’ve administered? None under any significant load!

I even finally disabled it on the lab Raspberry Pis eventually, and on an SBC I use to rclone 20+ TB NVR archives, due to performance problems it was causing.

It’s a pretty consistent signal actually - if I look at a machine and it’s using any swap, it’s probably gotten wonky in the recent past.


Apologies. I forgot I had posted something. :(

I am a little surprised that every machine you admin has had issues related to swap. Feels high.

For the ones that are now using swap and likely went wonky before, how many would have crashed due to said wonkiness?


There are plenty of workloads which sometimes just spike.

Batch processes, for example.

With proper monitoring you can actually act on it yourself instead of just restarting, which just leads to an OOM loop.


If you pushed something to swap, you didn’t have enough RAM to run everything at once. Or you have some serious memory leaks or the like.

If you can take the latency hit to load what was swapped out back in, and don’t care that it wasn’t ready when you did the batch process, then hey, that’s cool.

What I’ve had happen way too many times is something like the ‘colder’ data paths on a database server get pushed out under memory pressure, but the memory pressure doesn’t abate (and rarely will it push those pages back out of swap for no reason) before those cold paths get called again, leading to slowness, leading to bigger queues of work and more memory pressure, leading to doom loops of maxed out I/O, super high latency, and ‘it would have been better dead’.

These death spirals are particularly problematic because, since they’re not ‘dead yet’ and may never be so dead they won’t, for instance, accept TCP connections, they de facto kill services in ways that are harder to detect and repair, and take way longer to do so, than if they’d just flat out died.

Certainly won’t happen every time, and if your machine never gets so loaded and always has time to recover before having to do something else, then hey maybe it never doom spirals.


I try to avoid swap for latency critical things.

I do a lot of CI/CD where we just have weird load, and it would be a waste of money/resources to just shell out for the max memory.

Another example would be something like Prometheus: when it crashes and reads the WAL, memory spikes.

Also, it's probably an unsolved issue to tell applications how much memory they're actually allowed to consume. Java has its direct buffers and heap, etc.

I have plenty of workloads where I prefer to get a warning alert and act on that instead of handling broken builds, etc.


I think the key here is what you mean by using swap. Having a lot of data swapped out is not bad in and of itself - if the machine genuinely isn't using those pages much, then now you have more space available for everything else.

What's bad is a high frequency of moving pages in and out of swap. This is something that can cause your machine to be functionally unavailable. But it is important to note that you can easily trigger somewhat-similar behaviour even with swap disabled, per my previous comment. I've seen machines without swap go functionally unavailable for > 10 minutes when they get low on RAM - with the primary issue being that they were grinding on disk reloading dropped executable pages.

I agree that in low memory situations killing off something ASAP is often the best approach, my main point here is that relying on the Linux OOM killer is not a good way to kill something off ASAP. It kills things off as a last resort after trashing your machine's performance - userspace OOM killers in concert with swap typically give a much better availability profile.


100% agree.

In a situation where a bunch of memory is being used by something that is literally not needed and won’t be needed in a hurry, then it’s not a big deal.

In my experience though, it’s just a landmine waiting to explode: someone will touch it and bam, you have a useless and often difficult-to-fix machine, usually at the most inconvenient time. But I also don’t keep things running that aren’t necessary.

If someone puts swap on something with sufficiently high performance, then obviously this is less of a concern too. Have a handful of extra NVMe or fast SSD lying around? Then ok.

I tend to be using those already though for other things (and sometimes maxing those out, and if I am, almost always when I have max memory pressure), so meh.

I’ve had better experience having it fail early and often so I can fix the underlying issue.


When I reenabled swap on my desktop (after running without swap for years assuming it would avoid the death spiral, only to find out it was almost always worse because there was no spiral: it just froze the whole system almost immediately), it would frequently hold about 25% of my RAM capacity with the system working perfectly fine (this is probably an indication of the amount of memory many desktop apps hold onto without actually using more than anything else, but it was useful). In my experience if you want a quick kill in low memory you need to run something like earlyoom to kill the offending process before the kernel desperately tries to keep things running by swapping out code pages and slowing the system to a crawl.
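
A minimal earlyoom setup is just something like the line below - as I understand its flags, -m and -s are the "available memory %" and "free swap %" thresholds below which it starts killing; the numbers here are only examples, not recommendations:

    # kill the biggest offender once available RAM < 5% and free swap < 10%
    $ earlyoom -m 5 -s 10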


It's only one datapoint, but at this very moment a server at work is using a notable amount of swap, 1.5 GiB to be more precise, while functioning perfectly normally.

    $ free -h
                  total        used        free      shared  buff/cache   available
    Mem:          3.9Gi       1.7Gi       573Mi       180Mi       1.6Gi       1.7Gi
    Swap:         4.0Gi       1.5Gi       2.5Gi


I wish you luck! Only time that’s happened before was memory leaks for me, and it didn’t go very long before death spiraling. But if you’re comfortable with it, enjoy.


It's still working just fine, with still the same amount of swap in use (approximately).


> Eh, I’ve never seen a machine actually use any notable amount of swap and not be functionally death spiraling.

For my low-end notebook with solid-state storage I set the kernel's swappiness setting to 100 percent and this problem got magically fixed. It's rock-solid now.

I don't know how it works but it does.
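
For anyone wanting to try the same thing, the knob is the vm.swappiness sysctl - something like (using the value mentioned above):

    # apply immediately
    $ sudo sysctl vm.swappiness=100

    # persist across reboots
    $ echo 'vm.swappiness=100' | sudo tee /etc/sysctl.d/99-swappiness.conf

Roughly, a higher swappiness makes the kernel prefer swapping out idle anonymous pages over dropping file-backed (e.g. executable) pages, which is exactly the thrashing mode described upthread.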


It's pretty common for me to see a gig or two in swap, never really wanted back, and that RAM used for disk caching instead.


I think "Linux will drop executable pages when there's no swap" is a symptom of machines with a small amount of memory, say 4G or less. So it is pretty outdated for regular servers, and probably only relevant when you are saving money by buying tiny VMs.

Those build servers had at least 64GB of RAM, while the executables were less than 1GB (our entire SDK install was ~2.5GB and it had much more stuff than just clang). So a machine would need to balance finely on memory pressure: high enough to cause clang to be evicted, but low enough to avoid the OOM killer's wrath.

I don't think this is very likely on machines with a decent amount of memory.


Fair enough - I've seen it more commonly in smaller machines, but they're also more common in the fleets I've observed (and the ones that are more likely to run close to the edge memory-wise). I have also seen it in systems up to 32GB RAM, so it's by no means a non-issue in systems that are at least somewhat larger. The general observation that oomd/earlyoom + swap is a better solution than no swap still generally applies.

