Why Reddit was down for 6 hours (reddit.com)
234 points by meghan on March 18, 2011 | 95 comments



I know it's not exactly in vogue these days to tout the merits of bare hardware, but... after all the VPS hubbub over the last couple of years, the best progression for your website still seems to be:

1. No traction? Just put it anywhere, 'cause frankly, it doesn't matter. Cheapest reputable VPS possible. Let's say, Linode.

2. Scaling out, with high concurrency and rapid growth? DEDICATED hardware from a QUALITY service provider--use Rackspace, SoftLayer, et al. Have them rack the servers for you and you'll still get ~3-hour turnarounds on new server orders. That's plenty fast for most kinds of growth. No inventory to deal with, and with deployment automation you're really not doing much "sysadmin-y" work or requiring full-timers who know what Cisco switch to buy.

3. Technology megacorp, top-100 site? Staff up on hardcore net admin and sysadmin types, colocate first, and eventually, take control of/design the entire datacenter.

I simply don't understand why so many of these high-traffic services continue to rely on VPSes for phase 2 instead of managed or unmanaged dedicated hosting. The price per concurrent user is competitive or cheaper for bare metal. Most critically, it's insanely hard to predictably scale out database systems with high write loads when you have unpredictable virtualized (or even networked) I/O performance on your nodes.


reddit actually is a top 100 site, but we don't have nearly the need to host our own datacenter or co-locate. If we do make a move, it will be to #2. I don't want to hire people to be hands on -- I'd rather outsource that and let someone else pay to have spare capacity lying around.


What kind of scale are you at? I mean, about how many 32GiB RAM/8-core servers would you need if you were using real hardware?


We have ~130 servers at Amazon right now. We could probably do it with 50-75 or less, depending on how big the boxes are.


Conde would very easily do this for you; they built an entire datacenter in Delaware just for their child companies to use like this.


Yes, they could. It is an option.


Fair enough; 100 was sort of arbitrary, but there is some point where you're at Google scale and cutting down on power costs or achieving absolute minimum latencies between datacenters has a meaningful impact on the bottom line. Draw a line somewhere, and yeah, Reddit is probably on the other side.


Reddit has always amazed me with what they can do with extremely limited resources.

But it looks like that attitude has finally caught up with them, especially since they are down to just 3 technical staff (from 5 last week), and two of them are brand new.


Have some faith. We'll pull through. :)


Generally with you, except for the Rackspace recommendation. I've been burned and pissed off by Rackspace too many times to ever use or recommend them again.

I tend to try to find at least one good local/regional datacenter for a good portion of a server stack. There are huge benefits (IMO) in being able to drive to your servers and have a face-to-face meeting if there are issues, and/or take your toys and go someplace else if there is a massive outage. If you're in Green Bay, options might be limited, but in any semi-major metropolitan area there are usually enough datacenters that you have multiple choices.


Another thought. If availability is your goal, with the trend toward 'operations as code', I think a small development team can build a system on top of AWS that can automatically respond to arbitrary node/resource/data-center failure. Netflix seems to do this to an extent, with their Chaos Monkey.

That being said, there are situations where you may truly need single-node performance that isn't available on AWS.
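
Not Chaos Monkey itself, but here's a minimal sketch of that "automatically respond to node failure" idea, assuming classic boto, a placeholder AMI ID, a 'role' tag on the instances, and a hypothetical is_healthy() check -- none of which is anything reddit or Netflix actually runs:

  import boto

  AMI_ID = "ami-12345678"  # placeholder image for the app tier

  def is_healthy(instance):
      # Hypothetical health check; a real one would hit the app over HTTP.
      return instance.state == "running"

  conn = boto.connect_ec2()  # credentials come from the environment
  for res in conn.get_all_instances(filters={"tag:role": "app"}):
      for inst in res.instances:
          if not is_healthy(inst):
              # Launch a like-for-like replacement, then retire the sick node.
              conn.run_instances(AMI_ID, instance_type=inst.instance_type,
                                 placement=inst.placement)
              conn.terminate_instances([inst.id])

Run from cron, that's the crude version of a system that heals itself; Chaos Monkey is the complementary piece that kills nodes on purpose to prove the healing actually works.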


Chaos Monkey causes chaos; it does not fix it. :)

But yes, you are right. The goal is to have a system that scales itself. Not an easy task for sure.


http://www.reddit.com/r/blog/comments/g66f0/why_reddit_was_d...

A former employee is not quite as nice to Amazon.


Also note the two other former employees replied in agreement: KeyserSosa[1] and raldi[2].

[1] http://www.reddit.com/r/blog/comments/g66f0/why_reddit_was_d...

[2] http://www.reddit.com/r/blog/comments/g66f0/why_reddit_was_d...


It's so surreal reading their comments as "former employees."

This wasn't the case a few months ago. Good god I'm getting old, now I know how the poor people at plastic.com feel.


> This wasn't the case a few months ago.

For two of them, it wasn't the case a few days ago.



Assuming for a minute that Amazon deserves as much blame as ketralnis is heaping on here, why would the Reddit guys be so reluctant to point this out? Professionalism? Kindheartedness? Even professionalism and mutual respect have limits.

The community loves both the site and the admins, but there are limits to the patience of users, and those limits are being tested by these outages. I would think the Reddit guys would be happy to have a scapegoat to direct the community's rage towards.


Blaming a third party lacks class. The Reddit guys made the decision to rely heavily on EBS, and it came back to bite them. They show a lot of character by taking responsibility for an outage they had very little control over.


Also, it's not a great idea to badmouth someone who is currently providing you with a service and actively working on improving it for you (according to the blog post).


Then again, according to the comments Amazon is mostly actively promising to work on improving it, and has been for a year with little to show for it.


The good news is that it seems they have some leverage on Amazon now.

"Hi Mr. CIO of Amazon. You realize we had a 6 hour outage on one of the largest sites in the US because of you. We don't want to badmouth your service but we can. Care to pay more attention now?"


Reddit's no longer a cash-strapped startup with limited resources and options. It's ultimately their choice to stick with Amazon. If Amazon has been giving them the runaround for more than a whole year, maybe it would've been a smart decision to move to something else.


Happily, the Reddit team is better than the Reddit hivemind with regards to the appropriateness of channeling nerdrage.


I think in general a small group of professionals is going to be a little bit more focused and less knee-jerk-prone than a gigantic mass of similarly-minded people.


Because the whole site is hosted on Amazon, and it would be non-trivial to move from it. They also may or may not be receiving special pricing due to their size?


You shouldn't bite the hand that... hosts you.

And it is pretty unprofessional. Ketralnis thinks he's defending his buddies but he isn't really helping the situation.


I'm not sure that he really cares if he's helping the situation or not, seeing as he's no longer employed there.

I know first-hand that when a team is working on things, and it's very publicly going wrong, and it's due to things that are out of your control, it's beyond frustrating to have everyone think that it IS your fault due to the public stance.

I'm guessing that the Reddit team that is dealing with this is more than a bit pissed that they can't be more public with what the real reasons are.

Personally, I couldn't care less about the "professionalism" or political BS. I'd rather know the real reasons for the problems so that I could be better informed and not run into the same issues.

I'd rather see more candor and less PR.


On the other hand: "Blaming a third party lacks class. The Reddit guys made the decision to rely heavily on EBS, and it came back to bite them. They show a lot of character by taking responsibility for an outage they had very little control over." What do you think is the best solution to that one?


Well, at this point, perhaps it needed to be said.

Reddit isn't vital to anyone's well being, but it is a service that hundreds of thousands (?) use pretty regularly, so it certainly isn't trivial either.

How is he hurting the situation by calling out a service for what it really is?


Actually, as a former employee, he is in a pretty good position to criticise while keeping that criticism at arms-length from Reddit. Not a bad tactic.


A poor craftsman blames his tools.


Indeed. You should always have a second and third hammer any time you swing, with an automatic failover to complete the job in case the first hammer explodes on contact or spontaneously ceases to exist.


Unless of course you happen to be something akin to a cash-strapped startup, when you don't have those kind of luxuries.


I'm pretty sure he was being sarcastic. Even if he wasn't, that's how I took it, just because it was funnier that way.


I was :) I just hear the "a poor craftsman blames his tools" line often, but I'd bet you'd hear Michelangelo singing a different tune if his chisel hammer broke off at the handle and landed on his toe.


More like a poor craftsman selects the wrong tools for the job.


Amazon claims: "Each storage volume is automatically replicated within the same Availability Zone. This prevents data loss due to failure of any single hardware component".

They make it sound like they are already providing RAID or something similar; however, the fact that things like this happen to Reddit, who have built their own RAID on top of Amazon's already-replicated volumes, shows that reliability is not a good reason to go with AWS.
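
For anyone curious what "building your own RAID on top of EBS" looks like in practice, here's a rough sketch using classic boto plus Linux software RAID -- the volume size, device names, and instance ID are placeholders, and waiting for the volumes to become available is omitted:

  import subprocess
  import boto

  conn = boto.connect_ec2()
  zone, instance_id = "us-east-1a", "i-0123abcd"             # placeholders
  devices = ["/dev/sdf", "/dev/sdg", "/dev/sdh", "/dev/sdi"]

  # Provision four EBS volumes and attach them to the database host.
  for dev in devices:
      vol = conn.create_volume(100, zone)                    # 100 GiB each
      conn.attach_volume(vol.id, instance_id, dev)

  # Then, on the instance itself, stripe/mirror them with md:
  subprocess.check_call(["mdadm", "--create", "/dev/md0", "--level=10",
                         "--raid-devices=4"] + devices)

The catch, as this outage showed, is that every md member still rides on the same EBS infrastructure, so striping buys throughput but not independence from an EBS-wide problem.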


EBS isn't really RAIDed; it's virtualized block storage with replicas. The issue Reddit experienced wasn't drive failure, though -- it was network degradation. The solution is to deploy redundant replicas in different availability zones (and/or regions, if you can). Reddit unfortunately wasn't built for that.

This isn't really any different from an on-premise application. An availability zone by definition implies "shared network hardware". Using multiple is what you do when you want redundancy.
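
A sketch of what "use multiple availability zones" means at the API level, again with classic boto -- the AMI, instance type, and zone names here are just placeholders:

  import boto

  conn = boto.connect_ec2()
  zones = ["us-east-1a", "us-east-1b", "us-east-1c"]  # assumed AZ names

  # One replica per availability zone, so a single zone's network trouble
  # doesn't take out every copy of the data at once.
  for zone in zones:
      conn.run_instances("ami-12345678",              # placeholder replica AMI
                         instance_type="m1.large",
                         placement=zone)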


How do you know the issue was network degradation? Is this written up somewhere?


The original Reddit blog post indicates there were latency problems initially. It's not clear what caused the follow-on problems, but the latency may have triggered a bad condition for their replication.


Disk access was the single reason we couldn't go with AWS offerings. The speed/reliability of EBS just isn't where we need it to be for our database servers. I don't blame Amazon for this -- it's just a drawback of their choice to go fully shared-tenant and virtual.


EBS storage aside, they are down to 3 guys? yikes


They have been granted a lot more help from Conde Nast. They're in the process of hiring four more developers.

http://blog.reddit.com/2011/03/so-long-and-thanks-for-all-po...


devops and all, the ratio's still staggering for the number of hits that reddit gets.


Just one programmer, spladug, on the job for 4 months.


Idle speculation is a bit useless, but it really makes you wonder why so many of their team have left recently, doesn't it? I wonder if the stress of keeping up with the increased load (and subsequent downtime) became too much for some of them.


On that note, I have been meaning to ask HN (even if nothing more than an exercise)...

If you had to run a site like Reddit, what would you do?


Most importantly, I wouldn't let the staffing levels get this low.

At this point in its life, reddit should have 6-12 programmers/system administrators + a few support staff, compared with the 3 they have at the moment.

That way they wouldn't be agonizing over the choice between devoting their resources to keeping reddit running in the short term, or moving reddit away from EC2 for long-term stability.


"Most importantly, I wouldn't let the staffing levels get this low."

How would you pay for that?


Reddit has never been low on money. According to one of their old developers, the staffing issues have always been political, not financial.

http://www.reddit.com/r/blog/comments/g66f0/why_reddit_was_d...


Ah yes, the telltale sign of a large bureaucracy: people spending $150 of one kind of money to avoid spending $100 of a different kind of money (in this case, payroll budget versus tech-infrastructure budget).


Well, I'm a SysAdmin/hardware guy myself, so obviously, I'd buy my own hardware and run it. Once a month I'd spin up my off-site backups (which would be on EC2) to make sure I had somewhere to go if the shit hit the fan at my co-lo.

Of course, I've spent my life dealing with infrastructure, so running a rack or two (or ten) of servers is going to be a lot cheaper/easier for me than it would be for someone without that experience. The economics for people unwilling to gain that experience will depend entirely on scale; e.g., do they have enough servers that the lower running cost of owning hardware would pay for someone to manage that sort of thing? (And yes, hiring other people has overhead in and of itself.)

Generally, as much as possible, I avoid building complexity. I find that having a single point of failure with a backup that can be manually brought into place (such as an asynchronously replicated database) is quite often more reliable than fancy home-made SAN solutions. In general, you need to be /very careful/ of complex redundant systems. In fact, I approach it somewhat like crypto: as much as possible, I don't build it myself. I use well-used open-source tools with well-known failure modes.

The other thing to think of is failure domains. Sure, think about single points of failure, but more importantly, think about what goes down if that single point fails. For instance, in my current setup, each Xen host is a single point of failure... but one going down won't take down anything else. I've seen other people design similar systems with improvised SAN setups, thinking "oh, if one node dies, I'll boot the guests on another!" The problem is that if that SAN goes down, everyone is toast, while if one of my hosts goes down, we're talking maybe 1/40th of my customers who are out of action.

I've seen DRBD setups where a guest mirrored itself locally and to a second server... It sounded like a great idea, but the system turned out to be less reliable than my dumb local storage setup, as weirdness in how DRBD dealt with disk and network issues would cause lockups that were much more frequent than the hardware failures that would take down whole nodes in my local disk setup.

In fact, I have a very strong suspicion, born of hard experience and smoking pagers, of SANs that cost less than mid-sized Bay Area condos. And I'm pretty cheap, so that means local storage for me. There has been a lot of activity in that field lately, so I'm very carefully exploring it again, but I certainly wouldn't count on some homemade SAN as being more reliable than local disk.

(To be fair, my experience lies with expensive SANs, used expensive SANs, and homemade SANs, and my only good experience was with the first of those. I don't have a lot of experience with low-cost commercial SANs.)

Moreover, I'm pretty suspicious of disk-over-the-network schemes, even when the expensive "real" SANs are involved. NFS is the only scheme I really trust; it's older than I am. It has problems, but we know about all of them. Overall, I think NFS handles network blips much better than any of the block-device-over-the-network schemes I've used. And you do see network blips. The network simply isn't as reliable as your SATA cable, and the block subsystem isn't designed to deal with devices that are temporarily unavailable.

I've seen a lot of 'clever' redundant setups built by people who are much smarter than I am... quite often, their setup ends up becoming less reliable than my "dumb" systems.


  > I find that having a single point of failure with a backup
  > that can be manually brought in to place (such as an
  > asynchronously replicated database) is quite often more
  > reliable than fancy home-made SAN solutions.
As someone who has invested heavily in implementing HA systems and then seen them be the root of increased downtime, I have come to the same conclusion. A simple fail-over system that will result in short downtime & require limited manual intervention is in many cases the best route.
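
For a concrete example of that simple route, here's roughly what an asynchronously replicated PostgreSQL pair with manual failover looked like in the 9.0 era -- the hostname, user, and trigger-file path are placeholders:

  # postgresql.conf on the primary
  wal_level = hot_standby
  max_wal_senders = 3

  # recovery.conf on the standby
  standby_mode = 'on'
  primary_conninfo = 'host=db-primary.example.com user=replicator'
  trigger_file = '/tmp/promote_me'

Failover is then a human decision: when the primary dies, touch the trigger file (or use pg_ctl promote on 9.1+), repoint the application, and accept a few minutes of downtime instead of trusting an automated HA layer to get it right.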


Then, something really bad happened. Something which made the earlier outage a comparative walk in the park.

Murphy's law on St. Patrick's Day. Doesn't get any better than that.


I didn't even get a chance to have a Guinness today. :(


At this rate, a nice, tall Guinness is probably just what the doctor ordered.


I always love seeing a good technical post-mortem of what went wrong and how it could be fixed in the future...

I'm currently working on building a backend service that has to scale massively as well, and it has been a fun challenge trying to understand exactly where things can go wrong and how wrong they can go...


Wow...they sound like they are really beating themselves up over it.

I know the community can be demanding, but that just seems stressful.


Great writeup. I'd love to hear other people's experience with regards to workarounds when / if EBS goes down (switching over to RDS for a short time, etc.).

The comment about moving to local storage was interesting. Isn't the local storage on EC2 instances extremely limited (like 10-20GB?)


Assuming "local storage" is synonymous with "instance storage," it's 160GB to 1690GB. http://aws.amazon.com/ec2/instance-types/


So, I assume the instances reddit uses have instance-storage root volumes instead of EBS root vols. I've always assumed the persistence of EBS AMIs was a plus without a downside. Why would you opt for instance-storage AMIs instead of EBS-root-volume EC2 AMIs?


Given that EBS booting only became an option in December 2009, I would not be surprised if Reddit had not migrated their instances to that boot/storage method. They acknowledged that in the last two years they hadn't even had time to move one of their databases from a single EBS volume to striped EBS volumes.


We're currently in the process of replacing every one of our hosts with new OS versions. As we do this, we are in fact going to the EBS-based instances.

Those instances actually show the same problems, but they aren't too bad, because once you boot them, you don't need the root vol that much (that's what the instance storage is for).


Some Qs:

Q1. I still don't get the use case for db storage on ephemeral storage.

Q2. If EBS is the problem why are you migrating to S3 backed EBS boot vols? The problem with this is still the time in between snapshots even though it will be shortened.

Some Comments: It will only be a matter of time before S3 disks and hardware start dying like EBS...en masse

I talked with Ketralnis several years ago and know how many VMs you were running back then. Pretty sure you're not too far off from that count even today (even if 2x).

You can still virtualize on a good set of dedicated hardware to emulate your current 'network environment' and get you up and running in the near term _asap_. Obviously you'd build out of that VM environment (with your load) as the days go by. Seriously look into a parallel switchover, though.

If EBS is in fact a huge issue, as has been shown, you really may need to start migrating off unless you want dedicated employees monitoring system health on AWS. Eventually, if problems continue, that is what will happen, with no time left to even develop automation... And why automate on a pile of instability?

Don't forget that every VM you add at this high failure rate increases soft management costs and will eventually eat into your development time...

I don't work for Rackspace (I think they're quite expensive), but you guys might benefit from this level of care to focus on the real issues.


> Q1. I still don't get the use case for db storage on ephemeral storage.

We're still not sure either, so we're investigating to see if it makes sense. One possible option would be to have the master on ephemeral disk with a hot backup on EBS, so there is no data loss.

Another option is to use ephemeral for the master and all but one slave, so we get hot backups without a slowdown.

Still need to look into it more.

The one place we are doing ephemeral right now is Cassandra, with continuous snapshots to EBS. Everything in there can be recalculated, and with an RF of 3, if we lose one node we can run a repair.

> Q2. If EBS is the problem why are you migrating to S3 backed EBS boot vols? The problem with this is still the time in between snapshots even though it will be shortened.

They are just easier to use. The root volume is rarely accessed after it is booted, so the EBS slowdowns aren't really a problem in that case.

> Some Comments: It will only be a matter of time before S3 disks and hardware start dying like EBS...en masse

I don't think so. It is a totally different product built by a totally different team with a different philosophy. S3 was built for durability above all else.

In response to the rest of your comments, you are absolutely right, there are other options. We will certainly be investigating them.


I meant to say several months ago, not years.


Thanks for the follow-up jedberg, I was just guessing based on what has been publicly stated in the various blog posts over the last year or two. I used the same process for my own S3 -> EBS boot volume migration. That took a few weeks, and I didn't have that many instances to migrate in the first place. Given the large number of instances reddit uses and the surprisingly small staff, one would reasonably expect that the migration would not be done.


Thanks for the extra info! I'm doing a lot of work now with python on EC2 and the reddit write ups + presentations have been a huge help. Thanks again.


> python on EC2

#1 tip: Don't use threading. Python threading + EC2 will not work well. Instead rely on the OS doing the task switching and run multiple copies.

If you want more info, I did a talk at Pycon about this and other things: redd.it/b5jyy
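
For anyone following that advice, the stdlib multiprocessing module is the usual way to "run multiple copies" from a single entry point -- this is just an illustrative sketch, not code from the talk:

  from multiprocessing import Pool

  def handle(job):
      # Stand-in for whatever one unit of work actually is.
      return job * 2

  if __name__ == "__main__":
      # Separate OS processes sidestep the GIL and let the kernel do the
      # task switching across cores (or across separate EC2 instances).
      pool = Pool(processes=4)
      print(pool.map(handle, range(100)))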


Duly noted. I started with this talk and have been using it as a guide to scaling edge cases with Python as well as AWS. I thought RAID 10 was overkill before I started digging into the Postgres/EBS mess, but now it seems almost routine enough that Amazon should have it as a configuration option.


Did you get stuck in the Fedora 8 trap as well? It was the 'starter' in 2008/2009 and it took us two years to get off it.


Ubuntu 8.10 for us.


That makes more sense, thanks.


I had two machines running in east-1 last night and one of them went down around the same time reddit did. The other one made it through the night O.K.

EBS problems do seem to be the biggest reliability issue in EC2 right now. The most common symptom is that a machine goes to 100% CPU use and 'locks up'. Stopping the instance and restarting it usually solves the problem.

The events also appear to be clustered in time. I've had instances go for a month with no problems, then it happens 6 times in the next 24 hours.

My sites are small, but one of them runs VERY big batch jobs periodically that take up a lot of RAM and CPU. Being able to rent a very powerful machine for a short time to get the batch job done without messing up the site is a big plus.


This is why you don't outsource your bread and butter, people!

If you want to outsource who makes your lunch, fine, but if your whole business is requests in, data out, you do not put the responsibility of storing your data in someone else's hands.

I get it, Amazon EBS is cheap. But at the end of the day you've got to make sure it's your fingers on the pulse of those servers, not someone else whose priorities and vigilance may not always line up with yours.

(also the cloud is dumb)


It's all a continuum. You could also build your own servers, or design specialized boards with dedicated processors optimized to your application. Everyone is going to choose a point on the continuum and each will have tradeoffs.


You're still outsourcing if you go with a managed dedicated hosting service, or even if you buy hardware and colocate it. Even if you owned the datacenter and the entire backbone, you're still banking on everyone else not fucking up their end of the connection.


Yeah, but at least you can take direct action when your people fuck up.


> We could make some speculation about the disks possibly losing writes when Postgres flushed commits to disk, but we have no proof to determine what happened.

If you read between the lines, this says that EBS lies about the result of fsync(), which is horrifying.


Most consumer /hard drives/ lie about the result of fsync, as a 'performance optimization'.

It's generally possible to fsync, then cut the power before the data is physically on the disk.


Yeah, but even I don't use consumer hard drives in production. (Honestly, I don't know 100% that the 'enterprise' drives are that much better, but I'd guess they'd lie less... I switched because consumer drives tend to hang RAIDs when they fail, while 'enterprise' stuff fails clean.)


There's a section in the Postgres docs about fsync and lying hard drives: http://developer.postgresql.org/pgdocs/postgres/wal-reliabil...

There's also a Postgres contrib module called pg_test_fsync that tests various fsync modes on your hard drive; if the results come back suspiciously high (more syncs per second than the hardware could plausibly do), you can strongly suspect that the disk is lying.


Enterprise hardware lies about fsync, because even if you lose power, there is a little battery on the RAID controller that's enough to flush the cache to disk or to keep it for hours until the machine is powered up again. When the battery goes bad, the write cache is disabled automatically.

On bigger hardware, like SAN storage arrays, the (redundant) batteries keep the whole thing running for a while after the loss of power.


You could consider that the whole battery backup etc. means that it isn't lying about fsync. It says that it's been permanently written, and then it makes sure that it has. It might not have been burnt to spinning metal, but the system as a whole will ensure that it's permanent.


"enterprise" disk is just that; it's just disk. I've not seen a disk with an onboard battery. You can put a raid controller in front of your disks, with or without battery backed cache. (most raid cards have an optional battery module.) But, even many gold plated 'corporate' systems I've seen omit the battery, as it's usually not cheap.

My personal opinion is that RAID controllers are of little value without battery-backed cache.

I just use MD (Linux software RAID) in front of 'enterprise SATA' drives. As far as I can tell, I can't get a RAID card that's better than MD without doubling my total cost for storage, and if I were going to double my storage cost, simply buying twice as many spindles would get me better bang for my buck than a hardware RAID card that cost that much.


Enterprise hard drives might lie about fsync with caching turned on, but they generally let you disable caching and behave correctly.

Of course that completely shafts your performance, but hopefully you were expecting that.
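
On Linux, that's typically done with hdparm for SATA/ATA drives (SAS gear has its own tools); the device name here is just a placeholder:

  # Show the current write-cache setting, then turn the cache off:
  hdparm -W /dev/sda
  hdparm -W 0 /dev/sda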


A comment from an ex-reddit employee linked elsewhere in this HN discussion (http://www.reddit.com/r/blog/comments/g66f0/why_reddit_was_d...) seems to confirm that:

More recently we also discovered that these disks [EBS volumes experiencing performance degradation] will also frequently report that a disk transaction has been committed to hardware but are flat-out lying.


...I was wondering if I was going to be the only one that caught that.

There's not enough information to know if that's even possible or not, or how it might be possible, but as far as I know there aren't any known Postgres issues that would cause that.

I would be extremely reluctant to store data on a network that sometimes didn't actually store the data without raising an error anywhere.


You can actually configure Postgres to not do an fsync() after each transaction; instead the data gets written to disk, which on Linux means written to the disk caches in RAM, and then Linux flushes to disk when it does the rest of its disk writes (anywhere from 5 to 30 seconds is the default on different versions of Linux).
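
The knobs being described are roughly these (a postgresql.conf sketch, not reddit's actual configuration):

  # postgresql.conf -- trading durability for speed
  fsync = off                # what the parent describes: commits only reach the
                             # OS page cache, and Linux writes them out later
  #synchronous_commit = off  # milder option: keep fsync, but let COMMIT return
                             # before the WAL has actually been flushed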


Why would you want to do that? This isn't MongoDB, the real world cares about durability.


I am not sure why I was downvoted -- I wasn't suggesting it as a course of action... it gives a large speed increase, and I was suggesting that possibly this was done as a performance measure.


> If you read between the lines, this says that EBS lies about the result of fsync(), which is horrifying.

Jump to conclusions much?

Do you know if reddit had fsync enabled in the first place?


If you read between the lines, "if you read between the lines" almost always means "if you make an educated jump to conclusions."


So where is the education in jumping from "we have no proof to determine what happened" to an outrageous claim about EBS?

It's very common to turn off fsync on a database for performance reasons. It's far less common to have a network block device driver (which tends to be designed with intermittent outages in mind) lie about fsync.



