Ask HN: Why are distributed systems so polarizing?
71 points by c0mptonFP on Aug 12, 2022 | 63 comments
There is an odd gatekeeping duality on tech forums:

1. You're not a proper engineer if you can't write highly available software that scales to infinity. Real engineers write production-grade, robust, fault-tolerant, scalable, highly available, observable, mission-critical systems.

2. No one actually needs large distributed systems; you're not Google, so stop trying to build them. One server + a backup is enough. Everything else is overkill, complexity, resume-driven engineering. I can handle 50k RPS with one beefy bare-metal machine, written in Rust. Unless you have 10 million customers, which 99.9% of companies don't.

I'm not sure how to feel about this.




Those seem like extreme positions. Reality is more like this:

1) Real engineers write systems that accomplish the organization's goals.

2) Most people don't need to write large distributed systems, but they will end up writing small distributed systems.

3) Small distributed systems can be surprisingly complicated.


2) is because making a distributed system is more interesting and fun than debugging why your ORM is making inefficient DB queries.


This is so true. Hire a bunch of senior engineers to solve junior level problems and they'll turn the junior level problems into senior level problems :p


I'm mostly surprised how developers as a community can both:

- Espouse that we really need to care about communication in every form

- Still manage to let a few 'seniors' capable of solving the problem do so, at the cost of destroying most momentum by making the solution incredibly difficult to reason about and leaving next to zero documentation behind.

Then again, when the business doesn't give you the time to fix this or double-check, it's no wonder. And then hiring managers look cross-eyed at developers for wanting to work on a greenfield project instead of the next dumpster fire.


If you've got an ORM, you almost certainly have a distributed system already (arguably not if your ORM is talking to SQLite in process). Even if DB and application are on the same host, I'd call it at least a potentially distributed system.

I'll do the de minimis anti-ORM screed here and just say that if it's harder to get an ORM to make efficient queries than it is to write efficient queries by hand, maybe an ORM isn't a necessary abstraction layer.


> why your ORM is making inefficient DB queries

It's also crazy that we still have this and other basic problems with databases. Frontend and backend are moving so fast, with so many fun technologies, but even backing up a mid-size database, testing the backup, and restoring it isn't a trivial problem...


The market demand for database work is primarily not in making mid-size databases work better, but in making the "mid-size database" abstraction work on increasingly large (and increasingly distributed) datasets. I'm familiar with large companies who routinely run SQL queries over ~all their logs; a decade ago that would have been somewhere between extremely challenging and impossible.


When you have three folders stuffed with hundreds of files that are laced with your ORM annotations the ship has already sailed.


I'd say that you often want a small distributed system for availability reasons, even if scalability is not a concern.


That only happens the first time though. Distributed systems are the devil when it comes to debuggability.


Some examples of small interesting distributed systems that occur even with a traditional stack:

With multiple visitors on a web app on top of a relational database, you have a small distributed system if they can somehow interact. Even something as simple as a web shop can leave you with an interesting question like what to do when multiple people ordered the last item in stock.
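
One common answer, sketched below, is to make the decrement conditional and atomic so that only one of the racing orders can win (this assumes a SQL store and a hypothetical 'items' table; it's a sketch, not a prescription):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, stock INTEGER)")
    conn.execute("INSERT INTO items VALUES (1, 1)")  # one unit left

    def try_order(item_id):
        # Conditional, atomic decrement: the WHERE clause only matches
        # while stock remains, so only one concurrent buyer succeeds.
        cur = conn.execute(
            "UPDATE items SET stock = stock - 1 WHERE id = ? AND stock > 0",
            (item_id,))
        conn.commit()
        return cur.rowcount == 1

    print(try_order(1))  # True  - this buyer got the last item
    print(try_order(1))  # False - sold out

Even then, what the second buyer should actually see is a product question, which is the interesting part.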

Even just a page with a little bit of ajax going on can give you an interesting small distributed system where a single visitor can interact with themselves.

Or say you need to sync up information between two systems, e.g. export information from the ERP system to a web app or vice versa. That's a small distributed system - there are lots of pitfalls in state transfers.


How is Position 2 extreme? It's just a fact that the vast majority of sites don't need highly available, massively distributed and scalable systems. The small number of organizations that actually need something like that are the rare exceptions.


This is...a refreshing take. Good shout.


The tension comes from our addiction to FAANG-style software engineering.

You can't be a Google engineer without thinking about scale-out. However, Google et al have a different kind of engineering than most other companies: they have tons of requirements of you that make engineering hard, but they also have tons of tools and libraries that make scale easy. In these environments, it makes sense to make even the most trivial systems horizontally scalable.

Conversely, if you do not have Google's or Amazon's distributed system components, and you don't have access to expertise with those tools, scale-out is likely the hardest problem to solve. GCP, AWS and others know this, so they charge you a lot to solve the scale-out problem for you (using their internal tools).

This is the source of the tension. FAANG-style engineering is what pads your resume (and establishes you as a "smart engineer" in the eyes of people who want to work at FAANGs), and simpler systems get things done until you absolutely need to scale out.


> FAANG-style engineering is what pads your resume (and establishes you as a "smart engineer" in the eyes of people who want to work at FAANGs)

This isn't just engineers doing this, though. I worked at startups where there was pressure to build really complex systems with lots of staff. It goes all the way up: the engineer is promoted for building complex "hard" systems (a manager directly asked me "what's hard about" the simple system I built), managers are promoted for hiring more people, directors are promoted for "growing their org", and so on. It's a collective collusion to create more "bullshit jobs". VCs demand that you build these systems with tons of staff to supposedly outcompete FAANGs.


What kind of tools do they have to make scale easy?


I used to work at G, so my view is based on that. The three key infrastructure services that handle this for you are: Colossus (scale-out distributed filesystem), Chubby (lock service), and Spanner/Bigtable (scale-out databases). All three of these handle concurrency for you, and as long as you are okay accepting their concurrency models, you basically get the hard parts of a distributed system done for you. Of these three, only Spanner and Bigtable have similar competitors that are publicly available.

On top of those services, Google also has distributed systems that handle higher-level things like authentication, and they have libraries that help you do things like load balancing and load shedding. To top it off, the monitoring available is top-notch, and there are lots of administration tools that allow you to make sure that your service and all of your customers are well-behaved.


It is my opinion that too many programmers jump too soon to scale-out systems before making sure that every node is as highly optimized as it could be.

Data will always be able to outgrow hardware's capability of processing it in a timely manner. Parallel systems are critical to handle database tables with billions of rows; file systems with hundreds of millions of files; and NoSql stores with an ever increasing number of KV pairs or documents. So a distributed system becomes necessary at some point.

The problem comes when the threshold for scaling out is set too low. A process gets too slow and instead of optimizing the code, they immediately try breaking it up and distributing it. So instead of needing a dozen servers to handle a big problem, the algorithms are inefficient enough that it takes 100 servers or more to solve the same problem in a reasonable amount of time.

I am working on a distributed data system https://didgets.com/ that handles all kinds of data. I have focused on making sure that every node can process large amounts of data in an efficient manner. It can be so much faster when you don't have to coordinate between too many pieces and more of the data is located close to the CPU processing it.


I recommend the whitepaper "Scalability! But at what COST?" A single-threaded system with data locality can often run in less time than a system that is networked.

I also recommend the document "latencies every developer should know", updated by Jeff Dean.

I say this because most of the time a computer is waiting on IO from main memory, SSD, network, or spinning disks. The more sequentially you can stream memory into the CPU the better: row-major access patterns, and structures of arrays or arrays of structures depending on the workload.
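
As a rough illustration of the array-of-structures vs structure-of-arrays point (just a sketch in Python/numpy; the effect is far more pronounced in C or Rust):

    import numpy as np

    N = 1_000_000

    # Array of structures: fields of each record are interleaved in memory.
    aos = np.zeros(N, dtype=[("x", "f8"), ("y", "f8"), ("z", "f8")])

    # Structure of arrays: each field is its own contiguous array.
    soa_x = np.zeros(N)

    # Summing one field strides over every 24-byte record in the AoS layout,
    # while the SoA layout streams one contiguous array through the cache.
    aos["x"].sum()
    soa_x.sum()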

I write multithreaded software, so I feel people reach for multiprocessing software because multithreading is harder to get right than putting a load balancer in front of it.

I write and journal about concurrency, threading and distributed systems in my GitHub, mainly ideas4.


I was reading your ideas4 list and came across #18 Data Structure Synchronization. It referenced Rockset, which I had not heard of before. A brief introduction to their system told me that they do things very similarly to how I implemented file system tags and DB columns in my Didgets system. My tag objects can also be used for content indexing and for creating 'Dictionaries' that are incredibly fast. The bonus is that for each column, the index and the data are one and the same. When you do CRUD operations, you don't have to update more than one copy of the data.


Wow thank you for reading it. I really appreciate that.

I am enamoured with Rockset's converged indexes. They solve the problems of WHERE queries, columnar analytic workloads and row-based iteration of data.

I think we tend to look at data as being relatively static and don't denormalise as much as we could for performance and data locality, so we put up with slow microservices and distributed systems with lots of IO. Once data is in Postgres you don't shift it to other machines that often.

I think the next step of distributed systems is key rebalancing and key SCHEDULING. I plan to design a system that shifts data to create nodes where particular queries are fast with data locality due to that server having all the data needed to fulfil the query. Autoshard at Google is interesting too. It requires denormalisation and data synchronization. I am looking at multimaster postgres or writing my own simple synchronizer. But it is eventually consistent.

I have also implemented a multiversion concurrency control solution and an early Raft implementation that needs to be added to a server. I tend to build components and then plug them together. I am looking at Google's Spanner and TrueTime.

I shall look at your solution.

How do you avoid duplicate copying of data while keeping a single copy of the data? How is it columnar or indexed by value?


Each of my data objects called Didgets (short for Data Widgets) that store tag or column information is a simple Key-Value store. I store each unique value once and have links between it and any keys mapped to it.

So if you have a 1-million-row table of U.S. customers that has a 'state' column, then each state value (e.g. 'California', 'Iowa', 'Florida', etc.) is only stored once. Each value is reference counted, so if you had 50,000 customers who lived in California, that value would have a reference count of 50,000. There would then be 50,000 links between the 'California' value and their respective keys (in this case the row numbers).

If one of your customers moves from California to Texas, you update that value which just decrements the reference count for California and increments the reference count for Texas. Then the row key for that customer is re-linked or mapped from pointing to the California value to pointing to the Texas value.

A query like "SELECT name, state, zip FROM <table> WHERE state LIKE 'T%';" causes the code to find all values in the state column that start with the letter T (Texas, Tennessee). It will then find all the keys mapped to those two values. Then it will load in the data for the name and zip columns and find any values mapped to those keys.

It is incredibly fast partly because it can be multi-threaded (one thread finds the names mapped to the keys while another one finds the zip codes mapped). Analytics are fast too since all the values are reference counted. It can instantly tell you the top 10 states where your customers live.
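
If I'm following the description, a toy model of the value/link scheme might look roughly like this (my own sketch in Python, not the actual Didgets implementation; all names are made up):

    from collections import defaultdict

    class TagColumn:
        def __init__(self):
            self.value_to_keys = defaultdict(set)  # unique value -> row keys
            self.key_to_value = {}                 # row key -> value

        def set(self, key, value):
            old = self.key_to_value.get(key)
            if old is not None:
                self.value_to_keys[old].discard(key)  # drop old link/refcount
            self.value_to_keys[value].add(key)        # add new link/refcount
            self.key_to_value[key] = value

        def refcount(self, value):
            return len(self.value_to_keys[value])     # instant analytics

        def keys_where(self, predicate):
            # WHERE state LIKE 'T%': scan unique values, not every row
            return {k for v, keys in self.value_to_keys.items()
                    if predicate(v) for k in keys}

    state = TagColumn()
    state.set(1, "California"); state.set(2, "Texas"); state.set(3, "Tennessee")
    print(state.keys_where(lambda v: v.startswith("T")))  # {2, 3}
    print(state.refcount("California"))                   # 1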

Here is a short video showing how fast it can do it compared to the same data set stored in Postgres. https://www.youtube.com/watch?v=OVICKCkWMZE


That's really interesting.

What's also coincidental is that I recently read a Quora post about network-model databases where references are direct. They have the linking problem which CODASYL worked towards solving, where you need to update all links when there are changes. My idea 18 data synchronization system is meant to produce one system to handle it. It could also be used for cache invalidation, which is an interesting problem in itself.

Your solution also reminds me of a graph database where nodes and vertices/edges are explicitly stored, but in your case you are using reference counting, which is new to me.

Thanks for sharing your knowledge on this.

From a logical point of view, explicit storage of links should be more efficient than a hash join or nested-loop join. But is there a tradeoff in write amplification? Joins in Postgres are materialised at query time but they're efficient due to B-trees, whereas in your model the data links are all materialised. I think links in graph databases such as Neo4j and Dgraph are materialised as links directly too.


Updating values may or may not result in links being changed. For example, if you imported data where 'Illinois' was misspelled as 'Ilinois' for the 10,000 customers who lived there, updating the value to the correct spelling might not change any of the links.

All the links are stored within a hashed set of data blocks where similar links are stored near each other. This helps minimize any write amplification. You might be able to update 10,000 links while only needing to write out a few blocks to disk.


Do you have a link to the whitepaper you referenced?


Nevermind. I found it once I figured out the full title was "Scalability! But at what cost?"



It's helpful to plan for scale out if it will be needed, though. It can be difficult to build that later if it wasn't planned for.

On the other hand, you do need to consider scaling up. You can get an HPE ProLiant DL385 Gen10 Plus with dual 64-core Epycs, 8 TB of RAM, and almost half a petabyte of flash storage. If you're starting from zero, it's likely a long way until that's not enough, and by then bigger servers might be easily obtainable; IBM has a POWER server that goes to 64 TB, but then you're dealing with IBM and POWER.

If you do need a distributed system, if it's possible, you want to design things as you said, with as little coordination as possible during processing. Thinking about coordination in the system design early can help you make it easier to separate later. Coordination on a single host isn't as expensive as across hosts, but it's not free either, so it's not wasted work to consider early.


I think you should stop reading tech forums as if they're supposed to converge on some sort of coherent consensus and form a singular voice. It's just thousands of opinions partially dictated by who was bored when something was posted.

Read and form your own coherent opinions, but stop asking whole forums to.

Edit: I also want to jump in again and pooh-pooh this redditized style of askhn.


I'm not always expecting a consensus, it's just that frustration-fuelled rants end up being noisy and sometimes difficult to differentiate from rational arguments.

What also bothers me is the hostility and cynicism with which these things are discussed.

Also, you're tripping if you don't think HN has a quasi-consensus on many, many topics.


It's very natural and common to attach a sense of community voice to a forum. And I end up doing it as well, so I get the draw, and I don't mean to pick on you.

> Also, you're tripping if you don't think HN has a quasi-consensus on many, many topics.

My point is it's still important to remember that it's actually thousands of voices from dozens of actual technical communities. It's a bad habit and I work to resist it myself.

If you met these people at a conference you'd come away thinking "wow, this must be an unresolved or open-ended question in the field with a lot of distinct opinions", not unsettled, asking "well, which is it, make up your mind?!?".

You said you weren't sure how to feel about it. This is how you should feel about it. imho


What you said makes a lot of sense. Also quite a few times I come to the conclusion that there's no satisfying answer to a topic, and that's fine.

Just this time around I had a weird gut feeling. So I created this post to get a bit of meta-level feedback.


IMO, it's polarizing because these are political decisions.

Distributed systems are harder to reason about, and so people have different feelings on the maintenance burden. Things get complicated around career goals and personal goals like resume driven development.

The odd gate-keeping duality is due to large super corporations dominating the public discussion, while smaller firms and developers fight back.

The truth is more nuanced though. Most places don't need or have high-quality, highly available software, and can STILL make millions of dollars a year. Banks and critical infrastructure SHOULD have very scalable software in key places. I know for a fact many investment banks have tons of good scalable software, and tons of absolute shit software, depending on where you look.


You’ve presented two extremes here. With no indication of timescale either.

For example, starter projects in general should lean towards option 2 for obvious reasons. But as they grow, naturally you're going to become distributed. You're right that very few companies need infinite, hundreds-of-servers scale, but many companies need "2" scale, right?

Also, Rust is a fairly new language. The majority of companies out there are on slower languages. Are you asking them to fire all the Python people and bring in Rust developers to rewrite everything?

The issue is so complex and nuanced that discussing it without context and detail seems pointless. To be honest if anyone holds any of the above opinions in real life I’m probably just going to smile and nod and move on.


"forest or trees". Obviously theres differences of opinion resulting from different needs.

"gate keeping" is probably more "argument as sport" indulgence than actual passion; I expect few of the discussions you're summarizing have enough detail for anyone to say they're advising on a specific solution.

Put another way: though I'm an advocate of "use what you've got and keep it under your thumb," if the situation actually called for it I wouldn't consider it a bad thing to implement a solution using cloud or CDN services either. I will say I think I'd try to limit the need for them, so that the system could run alone, but the first goal is that the system runs at the level it needs to solve the problem, not that it satisfies abstract design notions.

> I'm not sure how to feel about this.

You don't have to "feel" about it. People disagree. When you face a problem that you are trying to solve, their passionate writing may aid you in finding your solution, but that's worth the same gratitude you give to the rest of the world that offers you knowledge.

I suggest you take a diversity of opinion and vigorous debate on a subject as a reason for joy: vicious rants can be fun to read and the only certainty in dogma is that it will be boring.


One of the mistakes I see in discussions about architecture is assuming that just because something is scalable, scalability must be the only (or even primary) reason for choosing that approach.

I use DynamoDB, not because it is scalable, but because it's a simpler developer experience that covers my rather simple query use cases.
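
For what it's worth, the appeal is that for a simple use case the whole data layer can stay about this small (a sketch; it assumes boto3, AWS credentials, and a hypothetical 'orders' table already exist):

    import boto3

    table = boto3.resource("dynamodb").Table("orders")  # hypothetical table

    # Write and read an item; no connection pools, migrations, or ORM layer.
    table.put_item(Item={"order_id": "o-123", "status": "shipped"})
    item = table.get_item(Key={"order_id": "o-123"}).get("Item")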

Many teams adopt a multi-service architecture (not always "micro" services) not because of scalability, but because it allows multiple development teams to work and release separately (most of the time).

Don't let arguments of scalability and performance distract you from other considerations.


There are a bunch of reasons, but here’s the one topping my list:

1. Heroes!

One man’s hero is another man’s jerk. A single box becomes a single point of failure. Once the hero guy gets pulled out of bed one too many times because the one box is down due to a “simple” change that the “stupid other guy who’s not the real coder” pushed, the hero guy decides they have a duty to protect the tin box with their blood.

It gets messy real fast. Soon you get a tin box that nobody except our “Hero” can touch. The hero has all the passwords and decides where the backups are kept (you don’t want the lesser people to mess them up, do you?), the hero has the encryption keys, and the hero is the only person who knows what to do when the tin box goes up in smoke.

Any org worth its salt does not want that hero guy or their one magic tin box!

TL;DR: Organizations are willing to pay for the slow distributed system that costs 10x to run to avoid the real shitty people situation that often comes with a magic all-powerful single machine.

(edit: tldr)


But instead you just get Mr Hero protecting the AWS account from the developers, which is probably even more of a problem.


People in general notice and give attention far more to the extreme opinions than the more nuanced ones. The vast majority of comments on distributed systems are nothing like what you mention, but they don't receive the same level of attention.


I think because it dis-empowers programmers and gives that power to "architects", SREs, system administrators etc. Decisions that could have been made and enforced through code instead must be coordinated across multiple systems.


It is not even about this specific topic - it is basically like this across most of tech.

It is ORM vs writing SQL queries by hand, it is tabs vs spaces, etc.

How to feel about this?

All these are worthless water-treading arguments. You can simply ignore that stuff and not waste your time forming a feeling about it - any opinion on a forum or blog post is just an opinion. So unless you really know the background of the person stating the opinion and you are absolutely sure that person is an expert in the matter, just ignore it.

People love to extrapolate from their experience and think they have all the answers; they don't understand how big the world really is, how many different companies there are, and how many use cases there are.


Regardless of how much of an expert they are, their experience isn't necessarily your use case either, and technology evolves. What used to be impossible is now possible, and what was bleeding edge technology is now the recommended boring tech.


Microservices - which are related to what we're talking about - have so many benefits when done right. Teams can deploy independently, and interactions between teams just become about API contracts, or data relationships if we're talking async systems / message passing. Sure, there is a whole lot of other complexity and issues (technical and organisational) that can arise here but, scale aside, it is often about solving the problem of people working on a big system. A single monolith is hard to work on if your business requires continuous-deployment-type workflows, for example.

All that said, I'm a big fan of single applications where you don't need it. Eg single teams should often be striving to build single apps with well-defined boundaries and interfaces over putting these modules into their own services. You have to ask yourself why you're doing it. There are many reasons to do so but don't cargo cult and make your team's domain into twenty small services just for the sake of it.

As with everything in software engineering, whether you should do something is answered by "it depends". What the OP is alluding to doesn't seem to reflect the nuanced reality out there or what many really think.


I think both of these are missing the fundamental question of: What does the business, or what do the product teams need? And what is the business willing to pay for it?

We have a whole bunch of simple systems with very lax SLAs around. Those are just running as a single container in the orchestration, because upon failure, they just restart with a minute of downtime or five and it's available enough. For these systems it wouldn't make sense to really think about HA postgres clusters for example. It would be entirely fine to have 2 postgres instances with replication and alerting so the admin can trigger a failover as needed.

However, we also have systems with rather strict and ambitious SLAs, and we're being paid well for the availability of these systems. And at that point, the business decisions start to accrue - the company considers it a selling point to be self-hosting, we have ambitious SLAs, we have a lot of products settling on postgres.

At that point, it makes sense for the company to have 1-2 engineers focusing on an HA postgres setup and a couple more engineers who can handle it during an oncall incident. It took us like 2 years and a lot of head scratching to get where we are (which would be an unacceptable investment of time and money to a startup), but now these rock-solid database clusters are turning into an actual asset for sales and product development.

This has been my learning over the last 2-4 years overall: You kinda have to do the smallest and simplest thing to make the business requirement work. Oftentimes, a simple single node solution or a 2-node setup with some defined manual emergency handling works surprisingly well and you don't need any fully scalable auto-failover setup. In other times, big requirements require the big hammer, but that one doesn't come cheap.


Companies that anticipate scaling needs well in advance and plan accordingly tend to make smoother transitions than if they are always reacting to the latest crisis. Too many wait until it hits the fan before they do what is necessary.

Some small businesses will try to run a set of Excel spreadsheets as the 'company database' until it absolutely breaks on them once the load reaches a certain point. Similarly, companies that opt for a lighter-weight solution such as SQLite might have to scramble when they have to port all their data to Postgres or another database.

In today's environment where a company can go from nothing to a massive data overload in a short period of time, it can be particularly challenging to anticipate what is coming their way. This is one reason why cloud computing is so popular.


You're calling it 'gatekeeping', but in my experience the latter position is mostly an expression of _frustration_. Dealing with the fallout from dozens of different decisions by engineers and directors to build "for scale" without the proper understanding of the costs involved in doing so makes many of us react with exasperation to suggestions in that vein.

We rarely have to convince someone that they _do_ need scalable highly available software - mouthing the idea from across the room is generally sufficient to make a director sign off on such a plan. It's convincing people that the costs those approaches inflict (which can be very difficult to explain, even to other engineers) are _not yet warranted_ that tends to be hard.


It would be better if people didn't put down other developers to feel better about themselves.


Tribal identity, ego, and the need for validation can be troublesome to identify and mitigate.


I think two things are true

1. There's a hard limit to the rate of computation you can do within a single instance. Anything beyond that will require distribution, and you as an engineer have to be able to handle that.

2. That limit is very high nowadays, and distributed systems have very high overheads. Most systems that are distributed don't actually need to be, and are paying the overhead costs unnecessarily.


Having come from a time where loading a small JPEG required a progress bar, it seems like one beefy server machine ought to be able to handle thousands of concurrent requests per second. This seems like it should fit most companies' needs until a hundred million in revenue or so:

1000 req/sec * 6 hours of daytime in one market * 365 days ≈ 7.9 billion requests, so $0.10 revenue/req would be roughly $790 million. Seems like even node.js could do 1000 req/sec unless the database bottleneck is large, let alone something like Go.
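
Spelled out, under the same assumptions:

    req_per_sec = 1000
    seconds_of_traffic = 6 * 3600 * 365          # six busy hours a day, all year
    requests = req_per_sec * seconds_of_traffic  # ~7.9 billion requests/year
    revenue = requests * 0.10                    # ~$790M at $0.10 per request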

Of course, I might be wrong about this, these are just unverified estimates.

But, another reason is that I, personally, don't want to manage computers, I want to do interesting things with code. As soon as you get a second computer, infrastructure starts becoming non-trivial. I am not interested in infrastructure, so I'd like to be sure I've maxed out one computer first.

Also, my set of problems that I'm interested in does not really require distributed systems.


This glosses over availability a bit too quickly. You can't have high availability without at least dipping your toe into distributed systems. (BTW don't tell me about "single box" high availability or fault tolerant systems. I was there when they were created. They're distributed systems wrapped in tin, one with and one without extra circuitry to add complexity and cost.) A lot of people need high availability, including data availability, even if they don't need high scale.

There's a lot of jumping toward higher-than-necessary degrees of scalability, and lots of gatekeeping, but it's still true that for a lot of jobs "one beefy bare metal machine" thinking just won't allow you to meet requirements - with or without a backup that has to be promoted manually.


If you mention that companies can have HA requirements you get hit back with "most companies don't actually run a cost-benefit analysis for the complexity of introducing X 9s of uptime" or "most outages happen due to misconfigurations and human error, so your redundant architecture won't save you"


Not always. Rarely, in my experience. It seems like this crazy Manichaean battle because there really are kind of two different worlds - the mostly-desktop world where there genuinely might not be any availability requirement (data availability is "somebody else's problem") and the server world where there practically always is. People who only know one get on forums like this and act like the others are being irresponsible, and if they tried to do each other's jobs that might be true. But in the real world, people usually agree pretty quickly on which world they're in and act accordingly.


1. Reasonable disagreements on the definition of “simplicity”.

2. Neither option is perfect, some people are victims of bad examples of each, and there are cases where people have made the wrong decisions. Three blind men and the elephant.

3. Normal human emotions in response to change.


No one needs to know about distributed systems until your company wants to send automated emails to customers and someone thinks all requests need retry logic.
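
(Retries are exactly where it bites: without something like an idempotency key, a timed-out request that actually succeeded turns into a customer getting the same email twice. A hand-wavy sketch of the usual fix, with made-up names:)

    import uuid

    sent_keys = set()  # in real life: a unique constraint in a shared store

    def send_email(to, body, idempotency_key):
        # A retry re-uses the same key, so the customer gets one email,
        # not one per attempt.
        if idempotency_key in sent_keys:
            return "already sent"
        sent_keys.add(idempotency_key)
        # ... call the mail provider here ...
        return "sent"

    key = str(uuid.uuid4())                  # generated once per logical send
    send_email("a@example.com", "hi", key)   # first attempt
    send_email("a@example.com", "hi", key)   # retry after a timeout: no duplicate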


That's more in line with how I think about it. All systems are distributed (okay not all), even my nice easy to reason about monoliths have to deal with outside events from various webhooks, or sync records with a customer's ERP, etc, etc.


How would you design such a system?


It probes at "philosophy of computing" to ask what kind of tech is needed to solve the problem.

For many everyday tasks with personal records or small business, paper records continue to work fine provided the scale isn't too large and only a few hands are in the pot. Slightly larger, and you jump over to the spreadsheet.

Once you involve custom development you've moved from commodity solutions into the realm of architects making a bespoke design, and like with architecture, there's a strong desire to be a star and work on a monumental structure, not a shed.

But... there's a gap between spreadsheet and Bigtable, where you can start adding requirements of more "9's" of reliability, deep access control policies, frontend dashboards and the like.

These things aren't the informational problem, they're the control of information problem. They don't follow the literal grain of the technology, but exist in an imagined universe where more and more power is consolidated into the hands of the system's owners.

That is the actual statement of purpose you have to make to justify a big tech kind of solution.

There are distributed systems that are not of that sort, the Internet itself among them. They exist, they have some value, they evolve and gain some complexity, but they don't naturally turn into platform monopolies.

And so the tension of "wanting to build distributed but being unable to justify it" is kind of specific to the economic thrust of SV style business and companies trying to ape that model. They're charged with leveraging tech to grow faster and control more, so they have to invent it. But if your business isn't that, you don't need it. But you can't conquer the world without doing that, so if you don't do that, you aren't playing for the real stakes. And that drives a certain kind of conflict in engineering orgs between pure problem solvers and the power hungry.

The only way out of thinking like that, really, is to let go and find balance. The people who are seriously happy with distributed systems work will do it with no paycheck. And for most other people, the spreadsheet, or at most a SQL database, is where it's at. For all the rest, it's the business card scene in American Psycho; the technical demonstration is simply kayfabe for one's personal advancement.


I think it's a mix of several things at play:

1. Fast growing organizations struggle to keep up with the communication overhead when rapidly onboarding new engineers. Most common open source frameworks lack good interfaces for developing isolated components in the same project. In the short term it's easier to spin up a new project than defining and enforcing interface and dependency boundaries.

2. Cloud providers and consultants are incentivized to propagate the myth that distributed systems are the best solution for all problems.

3. Engineers looking to grow are incentivized to add popular new tools. In particular, the less equity you have in the company, the greater financial incentive you have to become an expert of a tool in high demand and land a job elsewhere with higher pay.

4. In my experience very few engineers learn the fundamentals of computers and systems. Instead they follow "gurus" that tell them what the current "best practices" are. I think it's easier to feel you're doing a good job by making all your code comply with some style guide, or building systems with an architecture discussed in some cloud provider blog.

5. A VP of engineering I worked with told me in private that one of the reasons we were adding a lot of distributed systems components was so that we could sell ourselves as a tech company to VCs in the next funding round rather than a tech enabled business. I doubt that VCs care about this, but it's telling that a VP of eng thinks it matters.

6. If you start breaking up your monolith into a distributed system you won't feel the pain until you have several systems that are struggling to coordinate and keep data consistent. For the first few months or even years you'll only see the upsides of quicker iterations. It can be enough time that all the engineers that added the distributed systems got promoted and left for another job.

For companies growing quickly or large companies I don't see how you're able to mitigate the communication overhead without adding distributed systems. It allows different teams to ignore each other for the most part and respond to the market quicker. It's often easier for teams to re-build systems than to try to coordinate with a different team that has different incentives.

But for all other companies I think people are adding distributed systems prematurely. But lots of individuals in the decision making chain are incentivized to add them. Unless you have an experienced CTO that can enforce a sane policy, it's inevitable that someone will add a distributed system without understanding the nuances that come with it.


Moderate viewpoints don't get upvotes.


I guess that plays a part


... there was that time I had a 3-machine Hadoop cluster at home that was highly effective for the graph processing I was doing.


Number 2 is the factually correct position. Number 1 is the professionally correct position. Resume driven development is the name of the game, and companies reward it.



