In Defense of Simple Architectures (2022) (danluu.com)
590 points by Brajeshwar 7 months ago | 436 comments



This is what I tell engineers. Microservices aren't a performance strategy. They are a POTENTIAL cost-saving strategy, traded off against performance. And an engineering coordination strategy.

Theoretically, if you have a monolith that can be scaled horizontally, there isn't any difference between having 10 replicas of your monolith and having 5 replicas of two microservices with the same codebase, UNLESS you are trying to underscale part of your functionality. You can't underscale part of your app with a monolith. Your pipe has to be big enough for all of it. Generally speaking, though, if you are talking about 10 replicas of something, there's very little money to be saved anywhere.

Even then though the cost savings only start at large scales. You need to have a minimum of 3 replicas for resiliency. If those 3 replicas are too big for your scale then you are just wasting money.

The place where I see any real-world benefit for most companies is just engineering coordination. With a single repo for a monolith I can make 1 team own that repo and tell them it's their responsibility to keep it clean. In a shared monolith, however, 0 people own it because everyone owns it, and the repo becomes a disaster faster than you can say "we need enterprise caching".


You can scale those things with libraries too. The very browser that you're using to read this comment is an example of it: FreeType, Hurfbaz, Pango, Cairo, Uniscribe, GDI, zlib, this, that, and a deep, deep dependency tree built by people who have never talked to each other directly, other than through the documentation of their libraries - and it works perfectly well.

I assure you 98% of companies have a simpler and shallower code base than that of a modern A-class browser.

Microservices were a wrong turn in our industry for 98% of the use cases.


> Hurfbaz

This made me chuckle. :) It's HarfBuzz. Yours sounds like a goblin.


I couldn't recall the correct name, but it is originally Persian, حرف باز, and in that باز would be transliterated as Baz.

But yes, you're right.


I wasn't even aware that this came from Persian! Thanks for educating me.


Services, or even microservices, are more of a strategy to allow teams to scale than services or products to scale. I think that's one of the biggest misconceptions among engineers. On the other end you have the monorepo crew, who are doing it for the same reasons.

On your note about resiliency and scale - it's always a waste of money until shit hits the fan. Then you really pay for it.


> Services, or even microservices, are more of a strategy to allow teams to scale than services or products to scale.

I've never really understood why you couldn't just break up your monolith into modules. So like if there's a "payments" section, why isn't that API stabilized? I think all the potential pitfalls (coupling, no commitment to compatibility) are there for monoliths and microservices, the difference is in the processes.

For example, microservices export some kind of API over REST/GraphQL/gRPC which they can have SDKs for, they can version them, etc. Why can't you just define interfaces to modules within your monolith? You can generate API docs, you can version interfaces, you can make completely new versions, etc.
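
A minimal sketch of what I mean, with every name invented purely for illustration:

    # A "payments" module inside a monolith, exposing a stable, versioned
    # interface. All names here are hypothetical.
    from dataclasses import dataclass
    from typing import Protocol

    @dataclass(frozen=True)
    class ChargeRequest:
        user_id: str
        amount_cents: int

    @dataclass(frozen=True)
    class ChargeResult:
        charge_id: str
        succeeded: bool

    class PaymentsV1(Protocol):
        """The only surface other modules are allowed to import."""
        def charge(self, req: ChargeRequest) -> ChargeResult: ...

    class InMemoryPayments:
        """One implementation; callers only ever see PaymentsV1."""
        def charge(self, req: ChargeRequest) -> ChargeResult:
            # Real logic (processor calls, DB writes) stays behind the interface.
            return ChargeResult(charge_id=f"ch_{req.user_id}", succeeded=True)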

I just feel like this would be a huge improvement:

- You skip the engineering work of building service handler scaffolding (validation, serialization/deserialization, defining errors)

- You avoid the runtime overhead of serialization/deserialization and network latency

- You don't need to build SDKs/generate protobufs/generate clients/etc.

- You never have the problem of "is anyone using this service?" because you can use code coverage tools

- Deployment is much, much simpler

- You never have the problem of "we have to support this old--sometimes broken--functionality because this old service we can't modify depends on it". This is a really undersold point: maybe it's true that microservice architectures let engineers build things without regard for other teams, but they can't remove things without regard for other teams, and this dynamic is like a no limit credit card for tech debt. Do you keep that service around as it slowly accretes more and more code it can't delete? Do you fork a new service w/o the legacy code and watch your fleet of microservices grow ever larger?

- You never have the problem of "how do we update the version of Node on 50 microservices?"


> I've never really understood why you couldn't just break up your monolith into modules

You can! We used to do this! Some of us still do this!

It is, however, much more difficult. Not difficult technically, but difficult because it requires discipline. The organisations I’ve worked at that have achieved this always had some form of dictator who could enforce the separation.

Look at the work done by John Lakos (and his various books) to see how well this can work. Bloomberg did it, so can you!

Splitting your system across a network boundary makes it a distributed system. There are times you need this, but the tradeoff is at least an order-of-magnitude increase in complexity. These days we have a lot of tooling to help manage this complexity, but it's still there. The number of possible failure-state combinations grows exponentially.

Having said all this, the micro service architecture does have the advantage of being an easy way to enforce modularity and does not require the strict discipline required in a monolith. For some companies, this might be the better tradeoff.


> easy way to enforce modularity and does not require the strict discipline required in a monolith

In my experience, microservices require more discipline than monoliths. If you do a microservice architecture without discipline you end up with the "distributed monolith" pattern and now you have the worst of both worlds.


Yes, I completely agree. If your team doesn't have the skills to use a proper architecture within a monolith, letting them loose on a distributed system will make things a lot worse. I've seen that happen multiple times.


> does not require the strict discipline required in a monolith

How so? If your microservices are in a monorepo, one dev can spread joy and disaster across the whole ecosystem. On the other hand, if your monolith is broken into libraries, each one in its own repo, a developer can only influence their part of the larger solution. Arguably, system modularity has little to do with the architecture, and much to do with access controls on the repositories and pipelines.


> Arguably, system modularity has little to do with the architecture, and much to do with access controls on the repositories and pipelines.

Monoliths tend to be in large monolithic repos. Microservices tend to get their own repo. Microservices force an API layer (defined module interface) due to imposing a network boundary. Library boundaries do not, and can generally be subverted.

I agree that modularity has nothing to do with the architecture, intrinsically, simply that people are pushed towards modularity when using microservices.


People make this argument as though it's super easy to access stuff marked "private" in a code base--maybe this is kind of true in Python but it really isn't in JVM languages or Go--and as though it's impossible to write tightly coupled microservices.

The problem generally isn't reaching into internal workings or coupling; the problem is that fixing it requires you to consider the dozens of microservices that depend on your old interface that you have no authority to update or facility to even discover. In a monolith you run a coverage tool. In microservices you hope you're doing your trace IDs/logging right, that all services using your service used it in the window you were checking, and you start having a bunch of meetings with the teams that control those services to coordinate the update.

That's not what I think of when I think of modularity, and in practice what happens is your team forks a new version and hopes the old one eventually dies or that no one cares how many legacy microservices are running.


> It is, however, much more difficult. Not difficult technically, but difficult because it requires discipline.

Before that, people need to know it's even an option.

Years ago, when I showed a dev who had just switched teams how to do this with a feature they were partway through implementing (their original version had it threading through the rest of the codebase), it was like one of those "mind blown" images. He had never even considered this as a possibility before.


I agree that it's possible. From what I've seen, though, it's probably harder than just doing services. You are fighting against human nature, organizational incentives, etc. As soon as the discipline of the developers, or the vigilance of the dictator, lapses, it degenerates.


This is really hard for me to read. How can anyone think that it is harder to write a proper monolith than to implement a distributed architecture?

If you just follow the SOLID principles, you're already 90% there. If your team doesn't have the knowledge (it's not just "discipline", because every proper developer should know that they will make it harder for everyone including themselves if they don't follow proper architecture) to write structured code, letting them loose on a distributed system will make things much, much worse.


It's not really a technical problem. As others have mentioned on various threads, it's a people coordination problem. It's hard to socially/organizationally coordinate the efforts of 100s of engineers on a single thing. It just is. If they're split into smaller chunks and put behind relatively stable interfaces, those people can work on their own thing, roughly however they want. That was a major reason behind the original Bezos service mandate email. You can argue that results in a harder overall technical solution (distributed is harder than monolith) but it is, to me, inarguably much easier organizationally.

You can sort of get there if you have a strong central team working on monolith tooling that enforces module separation, lints illegal coupling, manages sophisticated multi-deployments per use, allows team-based resource allocation and tracking, has per-module performance regression prevention, etc. They end up having many of the organizational problems of a central DBA team, but it's possible. Even then, I am not aware of many (any?) monoliths in this situation that have scaled beyond 500+ engineers where people are actually happy with the situation they've ended up in.


> some form of dictator who could enforce the separation.

Like a lead developer or architect? Gasp!

I wonder if the microservices fad is so that there can be many captains on a ship. Of course, then you need some form of dictator to oversee the higher level architecture and inter-service whatnots... like an admiral.


> You never have the problem of "how do we update the version of Node on 50 microservices?"

And instead you have the problem of "how do we update the version of Node on our 10 million LOC codebase?" Which is, in my experience, an order of magnitude harder.

Ease of upgrading the underlying platform versions of Node, Python, Java, etc. is one of the biggest benefits of smaller, independent services.


> And instead you have the problem of "how do we update the version of Node on our 10 million LOC codebase?"

I think if you get to that scale everything is pretty hard. You'll have a hard time convincing me that it's any easier/harder to upgrade Node on 80 125K LOC microservices than a 10M LOC monolith. Both of those things feel like a big bag of barf.


Upgrading the platform also happens at least 10x less frequently, so that math doesn't necessarily work out in your favour though.


It's much easier to make smaller-scope changes at higher frequency than it is to make large changes at lower frequency. This is the entire reason the software industry adopted CI/CD.


I'm not sure that's measuring what you think. The CI pipeline is an incentive for a good test suite, and with a good test suite the frequency and scope of changes matters a lot less.

CI/CD is also an incentive to keep domain-level scope changes small (scope creep tends to be a problem in software development) in order to minimize disruptions to the pipeline.

These are all somewhat different problems than upgrading the platform you're running, which the test suite itself should cover.


CI/CD is usually a component of DevOps, and any decent DevOps team will have DORA metrics. Time-to-fix and frequency of deploys are both core metrics, and they mirror frequency and scope of changes. You want change often, and small.

Yes, change failure rate is also measured, and that's why good test suites matter, but if you think frequency and scope of change don't matter for successful projects, you haven't looked at the data.

That means frequently updating your dependencies against a small code base is much more useful (and painless) than occasional boil-the-ocean updates.

(As always, excepting small-ish teams, because direct communication paths to everybody on the team can mitigate a lot of problems that are painful at scale)


> I've never really understood why you couldn't just break up your monolith into modules.

I think part of it is that many just don't know how.

Web developers deal with HTTP and APIs all the time, they understand this. But I suspect that a lot of people don't really understand (or want to understand) build systems, compilers, etc. deeply. "I just want to press the green button so that it runs".


Counterpoint: most monoliths are built like that. I wonder if they think that pressing a green button is too easy - like, it HAS to be more complicated, we HAVE to be missing something.


How do you square that with the fact that shit usually hits the fan precisely because of this complexity, not in spite of it? That's my observation & experience, anyway.

Added bits of "resiliency" often add brand new, unexplored failure points that are just ticking time bombs waiting to bring the entire system down.


Not adding that resiliency isn't the answer though - it just means known failures will get you. Is that better than the unknown failures because of your mitigation? I cannot answer that.

I can tell you 100% that eventually a disk will fail. I can tell you 100% that eventually the power will go out. I can tell you 100% that even if you have a computer with redundant power supplies each connected to separate grids, eventually both power supplies will fail at the same time - it just will happen a lot less often than if you have a regular computer not on any redundant/backup power. I can tell you that network cables do break from time to time. I can tell you that buildings are vulnerable to earthquakes, fires, floods, tornadoes, and other such disasters. I can tell you that software is not perfect and eventually crashes. I can tell you that upgrades are hard if any protocol changed. I can tell you there is a long list of other known disasters that I didn't list, but a little research will discover them.

I could look up the odds of the above. In turn, this allows calculating the cost of each mitigation against the likely cost of not mitigating it - but this is only statistical: you may decide something statistically cannot happen, and it happens anyway.

What I cannot tell you is how much you should mitigate. There is a cost to each mitigation that needs to be compared to the value.
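
To make that comparison concrete, with numbers invented purely for illustration:

    # Back-of-envelope: mitigate when the expected annual loss exceeds
    # the annual cost of the mitigation. All figures are made up.
    p_failure_per_year = 0.05            # 5% yearly chance of the failure
    cost_if_it_happens = 200_000         # downtime + recovery, in dollars
    mitigation_cost_per_year = 30_000    # redundant hardware + ops time

    expected_annual_loss = p_failure_per_year * cost_if_it_happens  # 10,000
    worth_it = expected_annual_loss > mitigation_cost_per_year      # False here

And even then, as above, it's only statistical: a "once a century" event is still allowed to happen this year.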


Absolutely yeah, these things are hard enough to test in a controlled environment with a single app (e.g. FoundationDB) but practically impossible to test fully in a microservices architecture. It's so nice to have this complexity managed for you in the storage layer.


Microservices almost always increase the amount of partial failures, but if used properly can reduce the amount of critical failures.

You can certainly misapply the architecture, but you can also apply it well. It's unsurprising that most people make bad choices in a difficult domain.


Fault tolerance doesn't necessarily require microservices (as in separate code bases) though, see Erlang. Or even something like Unison.

But for some reason it seems that few people are working on making our programming languages and frameworks fault tolerant.


Because path dependence is real, we're mostly building on top of a tower of shit. And as computers got faster, it became more reasonable to have huge amounts of overhead. Same reason that docker exists at all.


> How do you square that with the fact that shit usually hits the fan precisely because of this complexity

The theoretical benefit may not be what most teams are going to experience. Usually the fact that microservices are seen as a solution to a problem that could more easily be solved in other, much simpler ways is a pretty good indication that any theoretical benefits are going to be lost through other poor decision making.


Microservices is more about organisation than it is about technology.

And that is why developers have so much trouble getting it right. They can't without having the organisational fundamentals in place. It is simply not possible.

The architectural constraints of microservices will expose the organisational weaknesses at a much higher rate, because of the pressure they put on the organisation to be very strict about ownership, communication, and autonomy.

It takes a higher level of maturity as an organisation to enable the benefits of microservices, which is also why most organisations shouldn't even try.

Stop all the technical nonsense, because it won't solve the root cause of the matter. It's the organisation, not the technology.


Except that most people build microservices in a way that ignores the reality of cloud providers and the fact that they are building (more) distributed systems, and often end up with lower resiliency.


> You can't underscale part of your app with a monolith. Your pipe has to be big enough for all of it.

Not necessarily. I owned one of Amazon’s most traffic-heavy services (3 million TPS). It was a monolith with roughly 20 APIs, all extremely high volume. To keep the monolith simple and be able to scale it up or down independently, the thousands of EC2 instances running it were separated into fleets, each serving a specific use case. Then we could, for example, scale the website fleet while keeping the payment fleet stable. The only catch is that teams needed to be allowed to call only the APIs representing the use case of that fleet (no calling payment-related APIs on the website fleet). Given the low number of APIs and basic ACL control, it was not a challenge.
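
A toy sketch of that ACL idea (all names invented):

    # Each fleet serves only the APIs for its use case and rejects the rest.
    FLEET_ALLOWED_APIS = {
        "website": {"GetProduct", "Search", "GetRecommendations"},
        "payments": {"Charge", "Refund", "GetPaymentStatus"},
    }

    def check_fleet_acl(fleet: str, api: str) -> None:
        if api not in FLEET_ALLOWED_APIS.get(fleet, set()):
            raise PermissionError(f"{api} is not served by the {fleet} fleet")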


> Microservices aren't a performance strategy

Who thinks that more i/o and more frequent cold starts are more performant?

Where I see microservices as very useful is for elasticity and modularization. It's probably slower than a monolith at your scale, but you don't want to fall over when load starts increasing and you need to scale horizontally. Microservices with autoscaling can make that much easier.

But of course, updating services can be a nightmare. It's a game of tradeoffs.


Surely one could split a shared monolith into many internal libraries and modules, facilitating ownership?


Yes, but you are still dealing with situations where other teams are deploying code that you are responsible for. With microservices you can always say, "our microservice wasn't deployed, so it is someone else's problem."

But I think you are pointing out one of the reasons most places don't get benefits from microservices. If their culture doesn't let them do as you describe with a monolith, the overhead of microservices just brings in additional complexities to an already dysfunctional team/organization.


It's crazy to me how opposed to building libraries everyone seems these days. We use a fuckton of libraries, but the smallest division of software we'll build is a service.


It's not crazy to me... I've worked at a couple of places that decided to try to use libraries in a web services context. They were all disasters, and the libraries ended up being considered severe tech debt. The practical aspects of updating all the consumers of your library, and of how the libraries interact with persistence stores, end up being their undoing.


How do you square this with the fact that you likely used dozens of libraries from third parties?


Probably 1000s, hah. It's the speed of change. Third-party libraries don't change much, or fast - and you don't really want them to. The software you build internally at a product company with 500+ engineers changes much, much faster. And you want it to.


It's difficult to own a library end-to-end. Ownership needs to include deployment and runtime monitoring.


> And an engineering coordination strategy.

I've always felt the org chart defines the microservice architecture. It's a way to keep teams out of each other's hair. When you have the same dev working on more than one service, that's an indication you're headed for trouble.


That's not just a feeling, it's a direct consequence of Conway's law. You feel correctly.


> The place where I see any real world benefit for most companies is just engineering coordination. With a single repo for a monolith I can make 1 team own that repo and tell them it's their responsibility to keep it clean. In a shared monolith however 0 people own it because everyone owns it and the repo becomes a disaster faster than you can say "we need enterprise caching".

Microservices are supposed to be so small that they are much smaller than a team's responsibility, so I don't think that is a factor.

The bottleneck in a software engineering company is always leadership and social organisation (think Google - they tried to muscle in on any number of alternative billion dollar businesses and generally failed despite having excellent developers who assembled the required parts rapidly). Microservices won't save you from a manager who doesn't think code ownership is important. Monoliths won't prevent such a manager from making people responsible for parts of the system.

I've worked on monoliths with multiple teams involved. It seems natural that large internal modules start to develop and each team has their own areas. A little bit like an amoeba readying to split, but never actually going through with it.


> In a shared monolith however 0 people own it because everyone owns it and the repo becomes a disaster faster than you can say "we need enterprise caching".

* I give two teams one repo each and tell each one: "This is your repo, keep it clean."

-or-

* I give two teams one folder each in a repo and tell each one: "This is your folder, keep it clean."

If you've got a repo or folder (or anything else) that no one is responsible for, that's a management problem, and micro services won't solve it.

Repos (and folders) don't really have anything to do with organizing a complex system of software -- they are just containers whose logical organization should follow from the logical organization of the system.

Microservices back you into a position where, when the code of one team calls the component of another team, it has to be high-latency, fault-prone, low-granularity, and config-heavy. Some stuff falls that way anyway, so no problem in those cases, but why burn that limitation into your software architecture from the start? Just so you don't have to assign teams responsibility for portions of a repo?


I think doing releases, deployments, and rollbacks is trickier in a monorepo. With multiple services/repos, each team can handle their own release cycles and on-call rotation, and only deal with the bits, tests, and failures they wrote.

You lose some CPU/RAM/IO efficiency, but you gain autonomy and resilience, and faster time to resolution.

https://blog.nelhage.com/post/efficiency-vs-resiliency/

For example, at Grafana we're working through some of these decoupling-to-microservices challenges, because the SREs who deploy to our infra should not need to deal with investigating and rolling back a whole monolith due to some regression introduced by a specific team to a core datasource plugin, or frontend plugin/panel. The pain at scale is very real, and engineering hours are by far more expensive than the small efficiency loss you get by over-provisioning the metal a bit.


I didn't mean to say there are no reasons to have separate repos, just more responding to the post above mine.

But I will note that almost all the benefits listed aren't specific to micro services.

(Also, to me, it's flawed that a dependency can really push to production without coordination. At the least, dependent components should decide when to "take" a new version. There are lot of ways to manage that -- versioned endpoints, e.g., so that dependents can take the new when they are ready. But it's extra complexity that's tricky in practice. e.g., you really ought to semver the endpoints, and keep versions as long as there could be any dependents... but how many is that, and how do you know? From what I've seen, people don't do that and instead dependencies push new versions, ready or not, and just deal with the problems as breakages in production, which is pretty terrible.)


> Your pipe has to be big enough for all of it.

What do you mean by "pipe" here? It's easier to share CPU and network bandwidth across monolith threads than it is across microservice instances. (In fact, that is the entire premise of virtualization - a VM host is basically a way to turn lots of disparate services into a monolith.)


I view micro services as a risk mainly in a Conway sense rather than technology.

Most companies can run on a VPS.


I mostly agree. I'd add though that isolation can be a legitimate reason for a microservice. For example, if you have some non-critical logic that potentially uses a lot of CPU, splitting that out can make sense to be sure it's not competing for CPU with your main service (and bringing it down in the pathological case).

Similarly, if you have any critical endpoints that are read-only, splitting those out from the main service where writes occur can improve the reliability of those endpoints.


> Microservices aren't a performance strategy. They are a POTENTIAL cost saving strategy against performance.

Yeah they were a way to handle databases that couldn't scale horizontally. You could move business logic out of the database/SQL and into Java/Python/TypeScript app servers you could spin up. Now that we have databases like BigQuery and CockroachDB we don't need to do this anymore.


If you plan to scale horizontally with a microservice strategy, you'll be in a lot of pain.

If you have, say, 100x more customers, will you divide your application into 100x more services?

As others said, microservices scale manpower, not performance. It only comes sometimes as a bonus.


Microservices can help with performance by splitting off performance-critical pieces and allowing you to rewrite them in a different stack or language (Rust or Go instead of Ruby or Python).

But yeah, they also tend to explode complexity


An important point that people seem to forget is that you don't need microservices to invoke performant native code, just a dynamic library and FFI support. The entire Python ecosystem is built around this idea.
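
For instance, a minimal ctypes sketch (assuming a Unix-like system with the C math library available):

    # Calling native code in-process via FFI - no service boundary needed.
    import ctypes
    import ctypes.util

    libm = ctypes.CDLL(ctypes.util.find_library("m"))
    libm.cos.argtypes = [ctypes.c_double]
    libm.cos.restype = ctypes.c_double

    print(libm.cos(0.0))  # 1.0, computed by native code, zero network hops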


This is why Python stole enterprise big data and machine learning from Java. It actually has a higher performance ceiling for certain specific situations because, almost uniquely among garbage collected high-level languages, it can call native code without marshaling or memory pinning.


You don't need it but you can also explode your repo, test, and build complexity when it'd be easier to keep them isolated.

For instance, you might not want to require all developers to have a C toolchain installed with certain libraries for a tiny bit of performance-optimized code that almost never gets updated.


I don't know, that seems like confusion of runtime and code organization boundaries. Adding a network boundary in production just to serve some code organization purpose during development seems completely unnecessary to me.

For development purposes you could build and distribute binary artifacts just like you would for any other library. Developers who don't touch native code can just fetch the pre-built binaries corresponding to current commit hash (e.g. from CI artifacts).


>Adding a network boundary in production just to serve some code organization purpose during development seems completely unnecessary to me.

That's a pretty common approach. Service-oriented architecture isn't necessarily for "development" purposes - usually more generic "business" purposes - but organization layout and development team structure are factors.

From an operational standpoint, it's valuable to limit the scope and potential impact code deploys have--strong boundaries like the network offers can help there. In that regard, you're serving a "development" purpose of lowering the risk of shipping new features.


One would imagine that by 2024 there would be a solution for every dynamic language package manager for shipping pre-compiled native extensions, _and_ for safely allowing them to be opted out of in favour of native compilation - I don't use dynamic languages though so don't really know whether that has happened.

My overriding memory of trying to do literally anything with Ruby was having Nokogiri fail to install because of some random missing C dependency on basically every machine...


I think it's gotten a little better but is still a mess. I think the biggest issue is dynamic languages tend to support a bunch of different OS and architectures but these all have different tool chains.

Nodejs, as far as I know, ships entire platform specific tool chains as dependencies that are downloaded with the dependency management tool. Python goes a different direction and has wheels which are statically linked binaries built with a special build tool chain then hosted on the public package repo ready to use.

I don't think anything in Ruby has changed but I haven't used it in a few years.


I am good with splitting off certain parts as services once there is a performance problem. But doing microservices from the start is just a ton of complexity for no benefit and most likely you'll get the service boundaries wrong to some degree so you still have to refactor when there is a need for performance.


Often you can get your boundaries close enough. There will always be cases where, if only you knew then what you know now. However, you don't need to be perfect, just close enough. Web apps have been around in all sizes for 30 years - there is a lot of cultural knowledge. Do not prematurely pessimize just because we don't know what is perfect.

I'm not saying microservices are the right answer. They are a useful tool that sometimes you need and sometimes you don't. You should have someone on your team with enough experience to get the decisions close enough up front. This isn't anything new.


GitHub scaled all the way to acquisition with Ruby on Rails, plus some C for a few performance-critical modules, and it was a monolith.

It doesn’t take a microservice to allow rewriting hot paths in a lower-level language. Pretty much everything has a C-based FFI. This is what makes Python usable for ML—the libraries are written in C/C++.


But you don't need micro services for that. You can always split things when useful, the issue with microservices is the idea that you should split things also when not necessary.


Keep it simple :)


> For example, at a recent generalist tech conference, there were six talks on how to build or deal with side effects of complex, microservice-based, architectures and zero on how one might build out a simple monolith.

Cue my favourite talk about microservices: David Schmitz - 10 Tips for failing badly at Microservices [1]

This guy has amazing delivery -- so dry and funny. He spends 45 minutes talking about all of his microservices mistakes!

[1] https://www.youtube.com/watch?v=r8mtXJh3hzM


Microservices are an organizational technology. They allow small teams to work independently from other teams.

The most frequent problem that I see is that people design microservices that are "as small as possible" instead of "small enough that a small team can own them". For example, I have seen teams of 5-7 developers owning 20 microservices, which is crazy IMHO. I blame the prefix "micro", which is highly misleading. Now, merging together these 20 tiny microservices doesn't make a monolith. A monolith would be a service or application that is owned by multiple/many teams.

The other important aspect is that microservices should be loosely coupled, but what I see is highly coupled microservices. How do you know you have highly coupled microservices? I have seen a few signs:

- They share business logic, and developers end up creating libraries of business logic, which is very problematic.

- They share access to the tables.

- Changes in one often require changes in other microservices.

- Integration and unit tests are not enough. You need E2E tests, but these are hard to write and brittle.


Monoliths aren’t very useful in many organisations where you need to build and connect 300+ systems. They also stop having a simple architecture if you try. Most architecture conferences and talks tend to focus more on the enterprise side of things - and really, why would you need full-time software-focused architects if you’re building something like StackOverflow?

I do think things have gotten a little silly in many places with too much “building like we’re Netflix” because often your microservices can easily be what is essentially a bunch of containerised monoliths.

I think the main issue is that your IT architecture has (or should have) very little to do with tech and everything to do with your company culture and business processes. Sometimes you have a very homogeneous focus, maybe even on a single product, in which case microservices only begin to matter when you’re Netflix. Many times your business will consist of tens to thousands of teams with very different focuses and needs, and in these cases you should just never do monoliths unless you want to end up with technical debt that will hinder your business from performing well down the line.


   > Monoliths aren’t very useful in many organisations where you need to build and connect 300+ systems
Seems like the mistake is building 300+ systems instead of a handful of systems.

A Google team published a paper on this last year: https://dl.acm.org/doi/10.1145/3593856.3595909

    > When writing a distributed application, conventional wisdom says to split your application into separate services that can be rolled out independently. This approach is well-intentioned, but a microservices-based architecture like this often backfires, introducing challenges that counteract the benefits the architecture tries to achieve. Fundamentally, this is because microservices conflate logical boundaries (how code is written) with physical boundaries (how code is deployed). In this paper, we propose a different programming methodology that decouples the two in order to solve these challenges. With our approach, developers write their applications as logical monoliths, offload the decisions of how to distribute and run applications to an automated runtime, and deploy applications atomically. Our prototype implementation reduces application latency by up to 15× and reduces cost by up to 9× compared to the status quo.
Worth the read.


> Seems like the mistake is building 300+ systems instead of a handful of systems.

But that’s not what happens in enterprise organisations. 90% of those are bought “finished” products, which then aren’t actually finished, and you can be most certain that almost none of them are capable of sharing any sort of data without help.

Hell, sometimes you’ll even have 3 of the same system. You may think it’s silly, but it is what it is in non-tech enterprise where the IT department is viewed as a cost-center similar to HR but without the charisma and the fact that most managers think we do magic.

Over a couple of decades I’ve never seen an organisation that wasn’t like this unless it was exclusively focused on doing software development, and even in a couple of those it’s the same old story because they only build what they sell and not their internal systems.

One of the things I’m paid well to do is help transition startups from their messy monoliths into something they can actually maintain. Often with extensive use of cheaper external developers, since the IT department is pure cost (yes, it’s silly), and you just can’t do that unless you isolate software to specific teams and then set up a solid architecture for how data flows between systems. Not because you theoretically can’t, but because the teams you work with often barely know their own business processes. I currently know more about specific parts of EU energy tariffs than the dedicated financial team of ten people who work with nothing else, because I re-designed some of the tools they use and because they have absolutely no process documentation and a high (I’m not sure what it’s called in English, but they change employees all the time). Which is in all regards stupid, but it’s also the reality of sooo many places. Like, the company recently fired the only person who knows how HubSpot works for the organisation during downsizing… that’s the world you have to design systems for, and if you want it to have even a fraction of a chance to actually work for them, you need to build things as small and isolated as possible, going all in on team topologies even if the business doesn’t necessarily understand what that is. Because if you don’t, you end up with just one person who knows how the HubSpot integrations and processes work.

It’s typically the same with monoliths: they don’t have to be complicated messes that nobody knows how they work… in theory… but then they are built and maintained by a range of variously skilled people over 5 years, and suddenly you have teams who hook directly into the massive mess of a DB with their Excel sheets. And what not.


> a high (I’m not sure what it’s called in English, but they change employees all the time).

To help you out, the word is attrition or turnover. Turnover would be more appropriate if the roles are refilled, attrition if the roles are never replaced.

https://www.betterup.com/blog/employee-attrition


Full-circle back to mainframe programming model.


Not really; it's not monolithic compute. It's monolithic codebase, but deployment can be piecemeal.


Building like Netflix is better than the random unguided architectures that result from not thinking. It might not be the best for your problem, though. If you don't need thousands of servers, then the complexity Netflix has to put into their architecture to support that may not be worth the cost. However, if you do scale that far, you will be glad you chose an architecture proven to scale that large.

However, I doubt Netflix has actually documented their architecture in enough detail that you could use it. Even if you hire Netflix architects, they may not themselves know some important parts (they will, of course, know the parts they worked on).


I mostly use Netflix as an example of somewhere you’ve reached a technical point where you need to spread horizontally. As StackOverflow shows, you can scale rather far without doing so if your product isn’t streaming billions of gigabytes of video to the entire world across numerous platforms. So what I mean by it is that many of us will never reach those technical requirements. Sorry that I wasn’t clear. I don’t disagree with what you say at all, but I do think you can very easily “over-design” your IT landscape. Like, we have a few Python services which aren’t built cleverly and run on docker containers without clever monitoring. But they’ve only failed once in 7 years, and that was due to a hardware failure on a controller that died 5 years before it should’ve.


That is how I use Netflix or StackOverflow. Choosing either (despite how different they are!) is better than randomly building unstructured code with no thought to the whole system.


It's a very bold assumption that a team that cannot manage a monolith will somehow lay a robust groundwork that will be the foundation of a future Netflix-like architecture.

By the way - Netflix started as a monolith, and so did most other big services that are still around.

The rest faded away, crushed by the weight of complexity, trying to be "like Netflix".


There are much better options than copying someone else. Copying is better than letting anything happen, but you should do better. You should learn from Netflix, StackOverflow, and the like - no need to remake the same mistakes they did - but your situation is different, so copying isn't right either.


I think a lot of this comes from the additive cognitive bias, which is really deeply entrenched in most of our thinking.

Here's a concrete example, stolen from JD Long's great NormConf talk, "I'd have written a shorter solution but I didn't have the time", which is itself drawing from "People systematically overlook subtractive changes" by Adams, et al., 2021:

Give people a Lego construction consisting of a large, sturdy plinth with a single small spire supporting a platform on top. The challenge is to get it to reliably support a brick. As is, it will collapse because that little spire is so flimsy. Ask participants to modify the structure so that it will support the brick. Incentivize simpler solutions by saying that each additional brick costs $0.50.

Given just this information, people universally try to fix the problem by adding a bunch more bricks to overbuild the structure, and they spend a lot of energy trying to come up with clever ways to reduce the number of bricks they add. But then the research team makes one small tweak to the instructions: they explicitly point out that removing bricks doesn't cost anything. With that nudge, people are much more likely to hit on the best solution.


In software, it's clear why we don't prefer subtractive changes - complexity. If there's any chance that the code in question has non-local effects that can't be reasoned about at the call site, it's risky to make subtractive changes.

Additive changes have the advantage of having a corresponding additive feature to test - you don't need to touch or even understand any of the existing complexity, just glue it on top.

So the cost structure is likely inverted from the study you describe. Additive changes are (locally) cheap. Subtractive changes are potentially expensive.


This is a big factor in why Haskell is widely considered a really fun language to refactor in. The enforced purely functional paradigm makes adding nonlocal effects much more annoying, so later contributors can much more confidently reason about the state of their code (assuming you can find any later contributors in such a small community!).

I don't know whether the Lisps have a similar "fun to refactor" quality to them, since while they are in the functional camp, they're totally different and more flexible beasts architecturally.


IMO Lisps are not nice to refactor in that sense, but a dream to modify in general. In the case of Haskell and Rust, the compiler creates really tight and, more importantly, global feedback loops that have pretty good guarantees (obviously not perfect, but they're stronger than most everything else out there), while Lisp has basically no guarantees and forces you to execute code to know if it will even work at all. This doesn't become apparent until you try to do large-scale refactors on a codebase and suddenly you actually need good test coverage to have any confidence that the refactor went well, vs in Rust or Haskell, if it compiles there's already a pretty reasonable guarantee you didn't miss something.

Fortunately, Lisp codebases IME tend to be smaller and use small, unlikely-to-need-to-be-modified macros and functions, so large refactors may be less common.


I'm not convinced that challenge shows what you say it shows. If someone set me a challenge to "modify this structure to support a brick" and the solution was "take the structure away and put the brick on the ground" I would feel like it was a stupid trick. "Remove the spire and put the brick lower" is less clear, but it's along those lines; "make this church and tower reliably support a clock at the top where everyone can see it", solution: "take the tower away and put the clock on the church roof", no, it's a Captain Kirk Kobayashi-Maru cheat where the solution to the "engineering challenge" is to meta-change-the-goal.

Yes you might get to change the goal in business software development, but you also know that the goal is 'working solutions' not 'build up this existing structure to do X'.


I believe it is as easy to over-engineer as it is to under-engineer. Architecture is the art of doing "just enough"; it must be as simple as possible, but as complex as necessary.

But that is hard to do, and it takes experience. And one thing that does not describe the IT industry well is "experience": don't most software developers have less than 5 years experience? You can read all the books you want and pass all the certifications you can like a modern Agile manager, but at the end of the day you will have the same problem they have: experience takes time.

Inexperienced engineers throw sexy tech at everything; inexperienced managers throw bullshit Agile metrics at everything. Same fight.


> Architecture is the art of doing "just enough"; it must be as simple as possible, but as complex as necessary.

Or as Albert Einstein phrased it: “Everything should be made as simple as possible, but not simpler!”


Well, I have seen a lot of people with 20 years of experience which is actually 2 years of experience repeated 10 times.


That's not what I am saying. I am saying that you cannot make 20 years of experience in 2 years.

You are saying that some people with 20 years of experience are worse architects than others.


No, I am trying to say that inexperience is more insidious in the IT industry, and that the number of years does not give a true picture of one's experience.


Let me put it this way: do you think that somebody can have 20 years of job experience after 2 years on the job?


In my experience over-engineering is far less of a problem than badly engineering. I see way more total messes than "enterprise" engineering.


Well summarized.

Also, the area of experience matters. I myself for instance have become pretty good at designing systems and infrastructure both from scratch up to corporate level. However, that will not help me too much when it comes to mega-corp level (think: FAANG) and other things such as designing public opensource libraries. (libraries and systems/applications require very different tradeoffs).


When was the last time you saw a system fail because it was under engineered? I don’t mean bad code. I mean something like a single file with 100k lines with structs and functions, or a MySQL instance serving millions of customers a day.


I'm going to agree with Dan Luu by asking where I can find more of these sane companies. I want to spend an honest 6h a working day improving a product (Terraform and Webpack are not the product) and spend the rest of my time tending to my garden.


I am with you. Complex architectures where you have to fight the system even for simple changes are a recipe for burnout: you paddle and paddle and you are stuck in the same place.


I think about this almost daily while slogging away in hundreds of lambdas, working on the world’s slowest app server. I think maybe 10% of our effort rubs off as business value. Maybe less.


You aren't looking for a job.

You're looking for a lifestyle business.

They are great. You will get time in the garden, that's awesome.

Vacation someplace with no phones? Out of the question. Weekend in a 3rd world country with character and food and sketchy internet. Not gonna happen.

You want to optimize for free time in the garden? By all means, you can do it, but you lose out in other places, and you pick up other work (taxes, systems, etc.).

Edit: Down vote away, I live this life now, my tomatoes are delicious and I make yogurt too!


Not sure I understand your point. I do my work, doing development work and managing a small team.

Still never work weekends (no phone, no email, no slack), spend time in my garden and just came back from a no-phone vacation.

Salary is above the going rate where I live - the work is remote and salary is based on company headquarters. Taxes are in line with the country I live in.

Not really seeing any downsides here, and as expected morale is quite good overall at work... But finding this company was extremely lucky/difficult.


I learned long ago that I cannot write quality code for 8 hours per day. I need to have several hours of meetings every day just to ensure I don't overdo the coding and write bad code.

Sure I can write code for 14 hours a day, but it will be bad code, sometimes even negative productivity as I introduce so many bugs.


Running a business, even a "lifestyle business" is so substantially different from a job - requiring all kinds of very different tasks, and risk-taking profile - that it doesn't seem reasonable to assume that someone who's looking for such a job is actually looking to run some business.


May I get some more information on how you got into a lifestyle business? Looking to get into that as well.


I know several folks with niche business that pay various levels of their bills.

Software for youth sports, photography, asset tracking, vendor tracking, niche issues in CC Processing, facets of insurance and billing (coding).

Niche businesses happen all over the place, and finding one (for me) was a lot of trial and error, that niche business pays my "bills" and I do consulting work (sporadic, interesting and high value) to round it out (and keep me on my game).

Don't think of it as a business right away. You're going to "play": you want to build them quickly, you want to host them cheaply, you want to toy with selling them. You're managing disappointment and failure, you're learning lessons, and you're keeping it FUN. The moment you start dreaming that it's going to "make it big" is the moment you have to reel yourself back to reality. If you can say "what did I learn" and have a list of things you got from it, then it was a success. At some point you just find one that clicks, and it grows.


I'd argue that this article could be read more like "start as a monolith, and then move towards microservices when sensible", except the penny hasn't fully dropped for the author that the sensible time for their organisation is right now.

The company appears dogmatically locked into their idling python code, their SQL making unpredictable commits, and their SQL framework making it difficult to make schema migrations.

This is a financial services company describing data-integrity bugs in their production platforms.


In what way does "moving towards microservices" relate to solving "data-integrity bugs"?

I would argue the opposite - the more distributed you make the system, the more subtle consistency bugs tend to creep in.


For a financial services company they should be using a compiled language. Something like C# or Java or Rust or Go with Postgres.


He's been talking about the problems/solutions around Wave for a while. Every time I can't help but think if they'd just started with C# or Java, maybe Go or Rust, they'd be in a better position.

Here's what I see them as providing:

- Simple async, no wasted CPU on I/O

- Strong typing defaults

- Reliable static analysis

Half of the "pros" he lists for using GraphQL are just provided out of the box using ASP.NET Core with NSwag to document the endpoints. If they want to keep the client-side composability, create some OData endpoints and you've got it.

> Self-documentation of exact return type

> Code generation of exact return type leads to safer clients

> Our various apps (user app, support app, Wave agent app, etc.) can mostly share one API, reducing complexity

> Composable query language allows clients to fetch exactly the data they need in a single packet roundtrip without needing to build a large number of special-purpose endpoints

Bias flag: I'm primarily a C#/.NET developer. I came up on python and ruby but have had more success in my region getting paid to work on .NET code.


Why? What inherent advantage do those languages have with financial logic?


Python will let you use float division on integers. Compiled, typed languages won't do that. This would be solved by using a double data type, but Python doesn't have one.


You prooobably don't want to use doubles with financial data. Even in Java. Go for BigDecimal instead, for arbitrary precision instead of floating-point math.

And if you're doing that, Python has the Decimal class for the same thing.
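
For example:

    # Binary floats accumulate representation error; Decimal does not.
    from decimal import Decimal

    print(0.1 + 0.2)                        # 0.30000000000000004
    print(Decimal("0.1") + Decimal("0.2"))  # 0.3, exact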


You'd almost certainly just use a library dedicated to mitigating this problem though, of which there are many.


Or you could just use a language with a static type system and avoid a whole class of bugs automatically.


Look, I like static type systems, but not all type systems are created equal, and some (with the easiest example being Go) are downright awful for helping encode business logic, forcing you to write immense amounts of code just to work around their awful type system decisions - code that often introduces bugs because of those issues.


Better static analysis, for one.


We have services in Python and C# at $dayjob, and my experience is that Python has mostly left C# behind here - what is something you can express in the C# type system that MyPy cannot statically check?

Conversely, I find myself constrained by C#'s typesystem not having features now widely popular from Rust and Typescript and available in modern Python: const generics / literals, protocols/structural interfaces and matching over algebraic datatypes, to name a few examples.


I was bewildered reading this message. How come? Python and TypeScript by definition don't have a concept of const generics the way it applies to C++ templates or Rust.

Nor "matching over algebraic datatypes" made any sense - sure sounds slightly below average fancy but comes down to just using C# pattern matching effectively, which the author may have never done. But then it struck me "structural interfaces sure sounds familiar", I go check the pinned repositories in the github profile and it all made sense immediately - the pinned ones are written in Go, which explains the author's woes with confidently navigating something more sophisticated.


> Python and TypeScript by definition don't have a concept of const generics the way it applies to C++ templates or Rust.

Can you help guide me where I'm misunderstanding the type system aspect of const generics that's missing in Python? What I meant was that in Python I can say something like this:

    import typing as t

    class FooEater[NUM_FOOS: int]:
        def eat_foo(self, num_foos: NUM_FOOS):
            print("nomnom")
    
    one_foo = FooEater[t.Literal[1]]()
    one_foo.eat_foo(2)  # type error, NUM_FOOS for one_foo only allows literal '1'

And a PEP695-compatible typechecker will error at build time, saying I can't call `eat_foo` with `2`, because it's constrained to only allow the value `1`.

I admit to not being a type system specialist, but at least as I understand Rust's "const generics" the type system feature is: "Things can be generic over constant values". Isn't that the same as what Python supports?


Sorry - can you put the snark aside and answer the question, then? It seems, given my lack of understanding here as a mere Go simpleton, that it should be trivial for you.

The question I responded to said: "Better static analysis, for one."

I asked: "What is something you can express in the C# type system that MyPy cannot statically check?"

Can you provide some examples where C#'s type system can express something that modern well-typed Python cannot type-check?


And that helps in financial analysis how?


I didn't say anything about financial analysis. But static typing is a good defense mechanism against type errors at runtime (I don't think anyone would argue against that). When you're running a financial services product, the cost of a type error in production can be dramatically higher than in non-financial code. Speaking from firsthand experience.


I don't think it's specific to financial services. He just meant that financial services is an area where you really don't want bugs! And one of the easiest ways to eliminate entire classes of bugs is to use static typing.


The author is ex-Microsoft, ex-Google, and was a senior staff engineer at Twitter; I don't know them, but my experience with their blogging over the last decade is that they seem generally very well informed. To me it seems unlikely that "the penny hasn't dropped", like you say.

On the actual criticism you're raising: in what concrete way would moving to microservices help with reducing data-integrity bugs? In my experience I'd expect the exact opposite: microservice systems are generally harder to test and have more possible interleavings and partial failure paths.


Where does the "idling python code" come from? If it blocks, it blocks, that's not "idling". And I doubt they are running a process per core.


These are all specific problems that can be individually solved. Personally, I don't see how changing their architecture or programming language would solve them.


In addition, they risk reintroducing problems they had already solved with their current setup.


Daily reminder that when one process is blocked, others can be working, so nothing is "idling" on a properly configured web server.


This one is a classic (an instant classic); I can't say anything about it better than Dan did.

What I can offer is a personal anecdote about what an amazing guy he is. I had been a fan of his blog for a while, and at one point I decided to just email him and offer to fly to wherever he was because I wanted to hear his thoughts on a number of topics.

This was in maybe 2016, but even then I didn't expect someone who must get a zillion such emails to even reply, let alone invite me up to Seattle at my leisure! I think I had the flight booked on a Wednesday for a departure on Friday, for one night's stay, and Dan was profoundly generous with his time, we stayed up late into the night chatting and I learned a great deal, especially about the boundary between software and hardware (a topic on which he is a first-order expert with an uncommon gift for exposition).

I had the great fortune of spending some real time with Dan not just once but twice! When I went to NYC to get involved with what is now the Reels ML group, he happened to also be in NYC, and I had the singular pleasure to speak with him at length on a number of occasions: each illuminating and more fun than you can have without a jet ski.

Dan is a singularly rigorous thinker with the dry and pithy wit of a world-class comedian and a heart of gold, truly generous with his expertise and insight. I'm blessed to have met and worked with a number of world-class hackers, but few, few if any are such a joy to learn from.


I agree with the general sentiment that simple architectures are better and monoliths are mostly fine.

But.

I've dealt with way too many teams whose shit is falling over due to synchronous IO even at laughably low volumes. Don't do that if you can avoid it.

"Subtle data-integrity bugs" are not something we should be discussing in a system of financial record. Avoiding them should have been designed in from the start.


So they cannot get data integrity constraints done properly with a single database? Wait until they have to do seven. Also, sounds like not even proper indexes were in place, so database amateur hour.


> shit is falling over due to synchronous IO even at laughably low volumes.

Like what? I ask because even synchronous IO lets you serve millions of requests per month on a cheap VPS.

That's enough in the B2B space to keep a company of 1000 employees in business.


Until one of your IO destinations develops some latency. Or your workflow adds a few more sync IOs into each request. Or you suddenly run outta threads.

Then even if you're only at millions per month you've probably got problems.


> Then even if you're only at millions per month you've probably got problems.

Not in my experience. You may be using metrics for B2C websites which make $1 for each 1 million hits.

B2B works a little differently: you're not putting everyone on the same box, for starters.

I did some contract maintenance for a business recently (they had no tech staff of their own, had contracted out their C#-based appdev to someone else decades ago, and just needed some small changes now), and a busy internal app serving about 8000 employees was running just fine off a 4GB RAM VPS.

Their spend is under $100/m to keep this up. No async anywhere. No performance problems either.

So, sure, what you say makes sense if your business plan is "make $1 off each 1 million visitors". If your business plan is "sell painkillers, not vitamins", you need maybe 10k paying users to pay yourself a f/time salary.


I had a similar thought when C# introduced async/await. "Why all this complexity? What was wrong with good old fashioned blocking calls?"

I don't know the full answer, but I would like to. I think it has to do with the limit on the number of processes/threads that the OS/framework can manage. Once you approach that limit, async/await lets the runtime reuse your (technically not) blocked threads for other work while the IO they started is still pending.


1. Async allows you to parallelise tasks. If a request from a user needs to hit three logically independent endpoints, you don't need to do that in sequence, you can do them in parallel and thus the user will get a response much quicker.

2. OS threads can be expensive in the sense that each one ties up a fair amount of memory, to the point where you could run out of threads. This is worse in some environments than in others. Apart from async, another solution for this is virtual / green threads (as in Erlang, Haskell, and much more recently, Java).

3. Some async implementations enable advanced structured concurrency patterns, such as cancellation, backpressure handling, etc.
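
A minimal asyncio sketch of point 1 (the endpoint names are hypothetical):

    import asyncio

    async def fetch(endpoint: str) -> str:
        await asyncio.sleep(1)  # stand-in for a ~1 second network call
        return f"response from {endpoint}"

    async def handle_request() -> list[str]:
        # Three independent calls run concurrently: ~1s total instead of ~3s
        return await asyncio.gather(
            fetch("/profile"), fetch("/orders"), fetch("/recommendations"))

    print(asyncio.run(handle_request()))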


To be fair, a million requests per month is 20 requests per minute...


> To be fair, a million requests per month is 20 requests per minute...

Which, in B2B, is insanely profitable. At 20 requests/min, with customers paying you $200/m/user, those numbers are fantastic!

I can only dream of having those numbers!


Sure, but in terms of load, a single Raspberry Pi 2 will barely register those requests, even if you serve them with a CGI stack.


Recently I worked on a project that was using synchronous IO in an async framework. That tanked performance immediately: the application could effectively service only one request at a time while subsequent requests queued up.

(Agreed that synchronous IO can serve hundreds of requests per second with the right threading model)
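
A toy reproduction of that failure mode (a sketch, not the project's actual code):

    import asyncio, time

    async def blocking_handler():
        time.sleep(1)           # synchronous IO: stalls the whole event loop
        return "done"

    async def async_handler():
        await asyncio.sleep(1)  # yields; other requests proceed meanwhile
        return "done"

    async def main():
        t0 = time.perf_counter()
        await asyncio.gather(*(blocking_handler() for _ in range(5)))
        print(f"blocking: {time.perf_counter() - t0:.1f}s")  # ~5s, serialized

        t0 = time.perf_counter()
        await asyncio.gather(*(async_handler() for _ in range(5)))
        print(f"async:    {time.perf_counter() - t0:.1f}s")  # ~1s, concurrent

    asyncio.run(main())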


Yeah I'm with this guy right up until that last statement. You should have 0 of these. Not an increasing rate of them.


so the solution to incompetence is adding more complexity?


The most sophisticated architecture I've ever needed at any scale in my FAANG and F500 jobs is a load balancer with SSL support, multiple app servers with thread pools, a sharded database, and message queues. Everything else is just a dressed-up version of this.


Wow, this guy's patreon is something else. https://www.patreon.com/danluu

$16/month "Short-form posts"

$256/month "Sponsor"

$2,048/month "Major Sponsor"

$16,384/month "Patron"

You certainly can't fault the guy for trying, although it does make him seem a bit crazy. And I'm aware of scenarios where raising prices doesn't have the negative effect you might expect. You never know what people might spend but I can't imagine this is a tempting proposition for anyone.

Maybe it's all a ploy to get you to spend $16/month to see how many "Sponsor", "Major", and "Patron" level subscribers there are.


In this very comment page, there is a guy who flew to a different city just to hear Dan's thoughts on random issues.

Also, if you look at his other writings, he's often noted trends in how software developers are paid. And the paradox that with even small tweaks, he's sometimes saved his employers far more than he could have earned in ten lifetimes. He's a free agent now... so why not charge what his advice is actually worth?

Dan is a very unusual person, in that he tries to do what is predicted by the data he has, rather than waiting for it to become conventional wisdom.


I find that architecture should benefit the social structure of the engineering team, and there are limits. I work on one of these “simple architectures” at large scale… and it’s absolute hell. But then, the contributor count to this massive monorepo + “simple architecture” hell numbers in the thousands.

Wave financial is only 350 people according to Wikipedia - I doubt that's 350 engineers. Google and Meta are the only companies I know of that can even operate with a massive monorepo, but I wouldn't call their architecture "simple". And even they make massive internal tooling investments - I mean, Google wrote its own version control system.

So I tend to think “keep it simple until you push past Dunbar’s number, then reorganize around that”. Once stable social relationships break down, managing change at this scale becomes a weird combination of incredible rigidity and absolute chaos.

You might make some stopgap utility and then a month later 15 other teams are using it. Or some other team wants to change something for their product and just submits a bunch of changes to your product with unforeseen breakage. Or some "cost reduction effort" halves memory and available threads, slowing down background processes.

Keeping up with all this means managing hundreds of different threads of communication happening. It’s just too much and nobody can ever ask the question “what’s changed in the last week” because it would be a novel.

This isn’t an argument for monoliths vs microservices, because I think that’s just the wrong perspective. It’s an argument to think about your social structure first, and I rarely see this discussed well. Most companies just spin up teams to make a thing and then don’t think about how these teams collaborate, and technical leadership never really questions how the architecture can supplement or block that collaboration until it’s a massive problem, at which point any change is incredibly expensive.


The way I tend to look at it is to solve the problem you have. Don't start with a complicated architecture because "well once we scale, we will need it". That never works and it just adds complexity and increases costs. When you have a large org and the current situation is "too simple", that's when you invest in updating the architecture to meet the current needs.

This also doesn't mean you shouldn't be forward-thinking. You want the architecture to support growth that will more than likely happen; just keep the expectations in check.


> Don't start with a complicated architecture because "well once we scale, we will need it".

> You want the architecture to support growth that will more than likely happen

The problem is even very experienced people can disagree about what forms of complexity are worth it up-front and what forms are not.

One might imagine that Google had a first generation MVP of a platform that hit scaling limits and then a second generation scaled infinitely forever. What actually happens is that any platform that lives long enough needs a new architecture every ~5 years (give or take), so that might mean 3-5 architectures solving mostly the same problem over the years, with all of the multi-year migration windows in between each of them.

If you're very lucky, different teams maintain the different projects in parallel, but often your team has to maintain the different projects yourselves because you're the owners and experts of the problem space. Your leadership might even actively fend off encroachment from other teams "offering" to obsolete you, even if they have a point.

Even when you know exactly where your scaling problems are today, and you already have every relevant world expert on your team, you still can't be absolutely certain what architecture will keep scaling in another 5 years. That's not only due to kinds of growth you may not anticipate from current users, it's due to new requirements entirely which have their own cost model, and new users having their own workload whether on old or new requirements.

I've eagerly learned everything I can from projects like this and I am still mentally prepared to have to replace my beautifully scaling architectures in another few years. In fact I look forward to it because it's some of the most interesting and satisfying work I ever get to do -- it's just a huge pain if it's not a drop-in replacement so you have to maintain two systems for an extended duration.


Early in my career one of the IT guys told me that one of the people on staff was "a technical magpie". I looked at him with a raised eyebrow and he said "He has to grab every shiny piece of tech that shows up and add it to the pile".

This is where we are.

I can't tell you how many times I have seen projects get done just to pad a PM's or developer's resume. Just because it was the latest and greatest hot-shit thing to use. No sense of whether it would be better, faster, cheaper, or useful.

When cloud was the hot new thing, the company I worked for launched a replatform on AWS. It gave us the ability to get through the initial scaling and sizing with ease. We left right away, because even then the costs did not make sense. Now we see folks crying about "exit fees" that were always there. That assumes your institution even has the gall to own up to years of pissing away money.

Workmanlike functionality isn't sexy, it won't be the hot bullet point on your resume, it won't get you your next job, but it is damn effective.


> Workmanlike functionality isn't sexy, it won't be the hot bullet point on your resume, it won't get you your next job, but it is damn effective.

So, not fun, not rewarding, no intellectual challenge, no career benefit. Why exactly should I want to do it? This isn't the goddamn United Federation of Planets, nor is the company a church - why exactly should I go above and beyond what I agreed to in exchange for my salary? It's not like the bosses go above and beyond either, nor do they believe in company "mission".

To be clear: I understand the importance of actually doing your job right, and benefits of using boring tech, but you are not selling that well here. Employees need food and shelter and creature comforts, and so do their families. They are going to think beyond the current job, because if they won't, nobody else will.


In my experience the boring "I delivered a project very fast with X,Y and Z and saved the company $100mil" will win over "I rearchitected a massive system to run on microservices"

At a certain point in your career, you'll realize that the business manager can override any technical hiring manager. Because at the end of the day, delivering results is sexier than bells and whistles on your resume.


There's an old IEEE article about the billions of dollars lost due to software project failures: https://spectrum.ieee.org/why-software-fails

We don't hear of such failures any more because software projects (or products) no longer "fail" in the traditional sense -- they turn into endless money sinks of re-architectures, re-platforming, tech debt repayment or expensive maintenance, that can continue as long as the company has cash. When the company does run out of cash, it is difficult to say to what extent tech expenses or lack of revenue due to slow software delivery played a part.


> delivered a project very fast with X,Y and Z and saved the company $100mil

The problem is that $100mil is all pixie fairy dust when you're working on a new project. I wish this weren't true, but it works out better for you to implement it as costly and complex as possible, show off how smart you are, then simplify it during a cost-cutting initiative (wow, they must be so smart to make such an obviously complex system so simple).

The secret is that while you think you're getting away with something playing this game you're actually doing exactly what the business wants.


> ...while you think you're getting away with something playing this game you're actually doing exactly what the business wants.

How so? I would think the business wants to spend as little money as possible.


A bit of an aside, but one of the most important things that I've learned over my career is that the business wants to make as much money as possible. This may seem similar to "wants to spend as little money as possible," but there's a big difference.

Your floor is limited because you can only drop your costs to zero, but there's no ceiling on how much revenue you can make.


Nah, they want to bring in as much money as possible, a subtle difference. High complexity (tech debt) and high costs (paying for expensive managed services) in service of time-to-ship are actually great. If the market they predicted doesn't pan out, they find out faster, just shut it down, and chalk it up to R&D costs for the tax break; and if it's so successful it costs them an arm and a leg, that's a "good problem to have."


Well maybe not what it wants, but probably (depending on culture) what it _rewards_.


> In my experience the boring "I delivered a project very fast with X,Y and Z and saved the company $100mil" will win over "I rearchitected a massive system to run on microservices"

Good luck having the opportunity to work on a project where you have even the faintest idea how much money your contribution will make or save. I don't know about you, but never in my 17-year career have I had enough information to even attempt computing these numbers. And even if I could have, it was never part of my job description.

So how did you know your numbers? Or if you didn't, how did you make them up for your interviews?


It's crazy that you don't know. I've been in this industry for 20 years, and apart from when I was extremely junior, I always had a sense of the business impact of my work.


Yeah, a sense. A guess. A gut feeling. Based on what exactly? I sure do get a sense of what will require less effort in the long run, or what will make the user's life easier, or even what is likely to decrease the defect rate… but I dare 95% of programmers, even the subset active here on HN, to reliably assess the monetary impact of those decisions within one order of magnitude, especially compared to the alternatives.

Not to mention the monetary impacts of decisions totally outside my control. I can tell the architect "you suggest A, but B is simpler to use and makes the API 3 times simpler at no loss of functionality", but what's the point of estimating the impact of such a no-brainer when the architect's answer is "you're correct, but we'll do it my way" (real story)? And how do you expect me to estimate the monetary impact of pointing out that their TPM provisioning is missing a verification step? That stuff happens inside the factory; a problem at this stage is unlikely anyway. And even if I could somehow divine my monetary impact, the best I can say now is "I did good work for this company, they didn't listen to me, and now they're going under". Not even kidding, they are going under. I just ended my gig there because I couldn't take it any more.

What are those wonderful places you worked at where you could estimate your impact with reasonable accuracy?


Napkin math and ROI, no one is asking for the cents.

For example, a build-system improvement that shaves 10% off build time across 200 developers who on average get paid $300k a year - that's very easy math, no?
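
Spelled out (the share of dev time spent waiting on builds is an assumed input):

    devs = 200
    avg_salary = 300_000   # $/year
    build_share = 0.05     # assume builds eat ~5% of each dev's time
    improvement = 0.10     # builds get 10% faster

    savings = devs * avg_salary * build_share * improvement
    print(f"${savings:,.0f}/year")  # $300,000/year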

Same for time to deploy, improvements on time to fix a bug, etc. etc.

You can extrapolate and compare initiatives and projects on t-shirt sizes and ROIs. Knowing where yours sit as well.

What places have I worked at? Mostly businesses that made some money and were stable, apart from a startup that was VC-funded and made no money.


I rarely got to know the actual deployment scale of anything I've done. Let's make a list:

Ground software for an observation satellite. My internship was about implementing a dead simple neural "network" (2 hidden layers, no feedback); everything was specified from up top, and we didn't even get to touch the learning algorithms. Impact? I guess a big flat zero, since all the differentiation was in the learning parameters.

Peer-to-peer social network before Facebook. Never made a cent.

Geographic Information System for the military. I was, for obvious reasons, not allowed to know enough to estimate the impact of my work. And even then, all decisions were made by the customer, and once the user (a different entity) saw the Rube Goldberg contraption we duly made for them, they predictably balked, and we did what we could from there. Which was not that much. I did some useful stuff for sure, but mostly I participated in a system that was arguably worse than the one that preceded it.

A visualiser for civil radar data. Data in, little planes on the screen out. And other nice stuff. I designed a simple C++ API that allowed the client to write business code faster than we would have ourselves (if only because of communication overhead), saving weeks of work. That contribution was utterly ignored for personal reasons, and I was eventually out. I have no idea what my actual impact was, because I don't know how far the project even went, or how widely it was eventually deployed.

The maintenance of ground software for small civil observation drones. I did some cool stuff, but then was asked to transfer ownership of this software to a recently bought team (that did stuff similar to the company I worked for). I could have known how many drones were actually deployed, but to be honest my thing just saved a few minutes of flight, while most of the cost is to get the drone and its operator on site. That company was never really profitable, I hope the good people I met there are doing well.

Scripting language for a programmable logic controller test environment. For the military, so I don't think I was allowed to even know the size of the team we'd deliver the software to. I got good feedback from them (they were happy with what I did), and I'm pretty sure my static typing made things easier for them than if I had just picked Lua or something, but how much easier, and how much money it will save in the long run, I have no freaking clue.

Stuff in a missile company I cannot disclose. I believe my impact was almost nil, I couldn't stand their abysmal tech environment.

Maintenance of a ground troops training system (a glorified laser tag, with debrief helpers and so on). Good luck assessing the impact of a better system, and I was just asked to do small tasks I could barely choose anyway.

Prototype ADAS system. It was never deployed. Actual impact was therefore basically nil. Cool stuff to work on though; the CAN bus is a thing of beauty. One of the rare instances where I could actually learn from example, instead of seeing yet again one of the gazillion obvious ways how not to do stuff.

Ground software for some IoT device. Impact fundamentally uncertain, we had yet to sell it to anyone.

Incident reporting software, based upon a more generic distributed base. I made the encryption layer (between users & company server), with security based on PAKE (thus avoiding a PKI, which simplified the work of the sysadmin, at a slight loss of security). Impact fundamentally uncertain, we had yet to sell it to anyone.

Charging stations for electric vehicles. I did the TPM provisioning, and mentioned a low-key security issue along the way. I participated in a questionable micro-service that was meant to help user interfaces (yeah, their IoT stuff had a micro-service architecture). Impact: whatever I did didn't save them: one year after I left, they're now going under.

Preliminary study on the possible use of AMD-SEV to prevent users from peeking at our secret sauce (DRM). I don't think I was allowed to know the list of clients, and it's not even the only alternative. I don't think I could ever have assessed the long term impact of my work there.

Flight recorder for trains (not a flight recorder then, but you get the idea). I just did little tasks here and there, didn't get the chance to have a good bird's eye view of the thing or its environment. Deployment base was knowable, but the business impact of my work was likely minimal, beyond "finish this step so we can show the client we're on track for the next short term milestone". The whole thing is a heap of technical debt, common components are impossible to update (user projects aren't locked to a given revision, they all pull from trunk), the build system is a home made monstrosity that doesn't help more than the standard monstrosities (I hate build systems)… and I was just axed from a round of layoffs.

Cryptographic library I did on my free time: https://monocypher.org/ Nice little thing with a significant user base in the embedded ecosystem (not even my primary target). I controlled everything from start to finish, and I have no idea how many users I have, let alone how much time and money I saved them. In part because it is so simple, with such an outstanding documentation (which I mostly didn't write), that most users don't even have to bug me.

---

To sum this up, my resume looks fairly horrible with respect to what I know of my actual business impact. Most of it, I think, was entirely outside my control. And I don't think I'm exceptional in this.


If I could offer one piece of unsolicited advice: in whatever you do next, make it a point to understand a) how the business is doing and b) what your impact to the business will be.

Make it a point to gather numbers: revenue figures, growth rates, costs, usage metrics. It is true that us underlings aren't usually handed these numbers, but you might be surprised by how easy it is to get them once you start looking. And once you have those numbers, you can derive some good estimates of your own impact on the business.

Having those numbers will help you understand your own impact. They will also help you avoid companies that, for lack of a better word, are destined to fail.

(I'm a fan of your blog btw. I really liked the article on the Monty Hall problems!)


(Thanks for the compliment!)

I should be able to try this in the next few weeks (new gig). But again, I'll be working on a prototype where such projections have already been made, the plans have already been laid out, and I'll mostly have to follow orders with relatively little wiggle room (they also expect me to hit the ground running). My only expected impact there is whether the stuff gets done on time or not. And assuming it is, the main company will throw the prototype away and rewrite everything anyway (at least they're supposed to).


Honest question: if you’ve never known the tangible value of your work, how did you decide what to do? It’s an uncomfortable question to ask, but I genuinely don’t understand how that would be possible.


Your manager tells you? Or, higher up the career ladder, whatever is most urgent for the best-paying customer? Like, I know what's on the list to finish for the next multi-million-dollar payout from a major customer, but how my work contributes to it, compared to work done by 20+ other people involved in dev, operations, qa, deployment, customer negotiations, etc.? Who the fuck knows? Best I can estimate is how much it'll cost the company if I fail to deliver my piece on time.


At a lot of companies I've worked at, the engineers are empowered to decide what to work on; it's not like they're 100% booked by PMs. And even when PMs set the priorities, engineers can argue.

"Your manager tells you what to work on" isn't how big tech runs for the most part, in fact it's a bit sad to think that some people work like that


It's how most people had me work. It's how most of my colleagues had to work too. Just doing what my manager tells me to work on is what is considered normal and expected in… 8 of the 10 or so places I've worked at.

The empowerment you speak of is but a faint dream in my town.


Interesting. That's different from companies I've worked at, where the sw developer usually asks for clarifications and can fight back if they have something else that they feel should be worked on that has a bigger ROI for the company. Companies I've worked at recently have been much more bottom up than top down


We can definitely ask for clarification. But fighting back is grounds for being moved up the ax list. One workaround is trying stuff for free and presenting the results. Free labour for the employer, yay!


> "I rearchitected a massive system to run on microservices"

Saving a company from political mayhem is a pretty good achievement to have on your resume. It's also impressive because most engineering teams give up early on.


That depends on interest rates - right now it's a rare time when saved millions suddenly appear worth more than a freshly rewritten pile of microservices.


Yes, and add in a couple of "I saved the project" or "successfully completed the previously failing project"...


> Why exactly should I want to do it?

Because you're a professional, and part of that means doing things to help your team succeed.

> They are going to think beyond the current job, because if they won't, nobody else will.

This is also right, and a good thing to hold in the other hand.

Reconciling these two forces needs a healthy org that allows employees to grow, along with a recognition that sometimes the forces conflict, and sometimes they don't. All we can do is play the cards we're dealt in the best way.

If you really want to learn new tech, that's what the off hours are for. I say this as someone who has a lot of things that intrude into those hours. I'm (slowly) learning frontend after being a backend/compiler dev for a long time. It's...not easy, but I like it!


> no intellectual challenge

I tend to think that cargo cult programming and resume-driven development are the intellectual path of least resistance. Perhaps it's analogous to, "I'd rather rewrite this than understand how it works", because that requires less intellectual effort. Quality engineering is not achieved by the intellectually lazy, from what I've seen.


You're not wrong, but when you're inheriting a convoluted 50 file React shitfest that could have been a single HTML page and 20 lines of javascript... what are you going to do? Invest time in understanding that, or radically simplify in 20% of the time it takes to grok what you get thrown at you?


Ah, I see you are also a coder of culture...

The trick is to get the Project Management to migrate to a hot new framework: Vanilla JS...

http://vanilla-js.com/


No, a single HTML page and 20 lines of Javascript is clear cut. But there's a _lot_ of instances where it's not that way, and still rewrites are being proposed.


Well I still need to understand what it is doing in order to radically simplify it and still have it do the exact same thing.


Strawman. Why do you even have a 50-file React shitfest to begin with? Hint: perhaps because someone wanted to pad their resume?


Hint: because almost every web developer is a junior who doesn't know what they're doing.

Proof: that's literally what a significant positive growth rate of an occupation means - if the doubling period is N years, then at any given moment, half the workforce has N years of experience or less. I don't remember the estimate for webdev, but I think N was somewhere between 3 and 5 years.


I've seen this. Usually a combination of no economic constraints and technical curiosity on the engineers' side.


> I'd rather rewrite this than understand how it works

Sounds like "how should I know what I think before I hear what I say" ;)


I mean yes, it works that way? Hence inner narrative, for those who have it, and/or talking to yourself via notebook or a text file.


> no intellectual challenge

If it's not intellectually challenging, you're not working on interesting systems. If you have to find interesting tools to find intellectual stimulation, consider a different system to work on.

As an example, I got to learn astrodynamics as part of my last job. Maybe not intellectually stimulating to everyone, but it didn't require me to learn the latest tooling, just an interesting area of math and physics. The tooling and the language for the software weren't that interesting, but that's fine.


I use boring architectures: JS/TS hitting a single C# server hitting a single database. I have had to (and gotten the opportunity to) learn about:

- Environmental chemistry

- Mass balance simulations

- Wastewater treatment processes

- Geotechnical engineering

- Ecology

- Mass transit systems

And quite a bit more. I could not agree with you more. Even without the broad range of interesting subject matter, there's no end to intellectually stimulating work simply trying to architect a system well.


A boring and simple tech stack can mean you focus on delivering features rather than trying to work out which part of your complicated system is broken.

The career benefit to me is that a simple tech stack allows a company to move fast and prosper. A prosperous company is usually financially rewarding even if it's not the most mentally rewarding.

Getting tangled up in shiny new toys can harm your ability to move fast and it can have a negative effect on your career at that particular company. Especially since the shiny new toy today is old and rusty tomorrow, but boring stacks will always be old reliable.


It is difficult to overestimate the value of being able to actually take time off because changes happen in a reasonable time and your software just works without any surprises. Give me a boring tech stack, please!


> So, not fun, not rewarding, no intellectual challenge, no career benefit. Why exactly should I want to do it?

It does help you get the next job. You’re just pitching it wrong.

Instead of “Built boring tech” try “Delivered $5,000,000 return 2 months early”. Watch your inbox blow up. Business leaders don’t care about what you do, they care about results. What you do to get those results is just an unfortunate cost and obstacle to overcome on the way to the outcome.


Most companies out there want you to have certain technologies / keywords in your resume and will automatically reject you if you don't have them.

Yes, building a solid project with boring technology that delivers real business value sounds good in theory but not so good when applying for a new job. Maybe it can help after you somehow manage to pass some initial screening.


> Instead of “Built boring tech” try “Delivered $5,000,000 return 2 months early”.

How do I do that without lying through my teeth? 17 years on the job, and I never had the data to even begin to estimate that kind of thing. It was never my job to know it (I'm a programmer, not an accountant), and it was often actively hidden from me.

And how did you do it? How did you get your numbers, and what did you tell recruiters when you didn't?


Maybe I’ve been extraordinarily lucky, but I’ve always just asked and people were so excited that an engineer would actually care about things that are on their mind all day every day.

Might be more common in companies managed by OKR where you always know the business impact of your work. The business impact is your prime objective and you’re free to figure out the implementation.


Right? I was going to ask OP "have you ever asked anyone?"

Because, IME, managers, etc. love it when you show an interest in how the business works and where your part fits in. It also makes their job easier if they can relate the crappy stuff they have to assign you to how much benefit the business gets from it.


I must be doing something wrong, because most of the time, getting interested in the wider impact of my work is held against me. I just ask why they want the stuff, suggest alternatives, point out issues, and the next day I'm an uncontrollable maverick who means to rewrite everything and waste a ton of time…

This also happens after explicit requests for feedback. Maybe they didn't actually mean it and I'm supposed to "take the hint" or some neurotypical bullshit, but when I hear such requests I tend to take them literally, and provide real feedback on the real issues. Unfortunately for all of us, those tend to be things that ossified years ago and cannot (or will not) be fixed any time soon. Ever, in my experience. Quite the downer.

Last time this happened I ended up being axed from a wave of layoffs. Whether their short term workflow and subsequent technical debt finally caught up to them, or their parent company just wanted to cut costs, I will never know. I do know my technical expertise was highly praised, and middle management felt I wasn't quite aligned with the goals of the company, whatever those were. (I think one significant cause was that I only emailed them about significant problems, and kept to myself and my team lead when it was smooth. I think in the future I will stop trusting them with anything negative.)

So yeah, I can bet they love it when you act interested about their work, but start questioning their decisions (even just the one directly related to your own work and its likely impact on theirs), and the tone of the conversation changes very, very quickly. At least where I've worked.


>start questioning their decisions

I'm not privy to any discussions you've had, of course, but I will comment on this because I see it in many tech people: don't question people's decisions. No matter how you phrase it, it will appear critical. And people (neurotypical or otherwise!) don't like criticism, no matter how much they claim to welcome it.

Those "explicit" requests for feedback? They want positive feedback. If you really must say something negative, suggest the opposite in a positive sense instead. i.e., instead of saying "we should not have done X. It was a bad idea," try saying "I think if we had done Y, things would have worked well" while avoiding any potentially embarrassing references to X.

Also

> neurotypical bullshit

Be careful that you're not "othering" the person you're talking to and dismissing their concerns because that's what I think when I read this.


The reason I speak of "neurotypical bullshit" is that people around me have several times suspected I'm on the spectrum, and I have noticed I have a hard time dealing with subtle social cues. One manifestation of this is that when someone asks me something with a straight face, I tend to assume they actually want what they asked for.

When someone asks me for feedback so they can improve, I tend to concentrate on what could have actual impact, and this biases heavily towards negative feedback: the build system is bulky and not well documented, so newcomers waste time on it; we should replace those global variables with messages if we actually meant to do an actor model; this piece of code is 10 times bigger than it needs to be and will drag us down if we port it as is…

To be honest, phrasing the above in a nice way that avoids hurting people's feelings is exhausting. They want to know how to make their system better? Well, here are the parts that suck the most; addressing those should have the biggest positive impact. Short and to the point. Oh, you didn't want me to tell you that? Why did you fucking ask, then?


Here's the thing: You're right. And also you're not.

Sometimes, often, feedback is about timing. The time to talk about how better to architect a thing is when it's being architected, not 2 months later. Instead of "This is crap" try "Hey I know this works for now, but there's room for improvement here. Next time we're talking about a system like this can I help with design? I have a few ideas what we might improve".

Volunteering to do the work of implementing your improvements goes a long way to making people more receptive to feedback. Otherwise you're just giving people homework and they almost certainly already have enough of that.

edit to add: Yes, creating change is work and it is exhausting. That's why organizations value people who can pull it off.


I wish I could have that good timing, but I'm rarely hired early enough.


Hey 3 years from now the mistakes you’re making today will look like “Wow I wish I was there, that loup-vaillant guy sucked” to some new hotshot :)

That is to say: You can start now. There’s always plenty of new mistakes to make.


It's not that I wish I was there to prevent past mistakes. I just wish I could be allowed to fix the mistakes I see with the power of hindsight.

But there's worse: most of the time they already know. The various issues I balk at were often bothering them for years, and yet fixing them was never the priority. Week after week, month after month, there was always something more pressing, generally some deadline. Such issues have often gone on long enough that the cost of keeping the current cruft has long exceeded the cost of replacing it. But this cost is gradual, so the fix keeps getting pushed further and further towards the end of times.

Perhaps my biggest mistake is not seeing that such organisations are beyond helping. I keep getting fooled by promises of doing things right going forward, when I should instead extrapolate from the past. Either they already work in ways that I can approve of, or at least live with, or I should seek something else on the spot.


The usual resume advice to quantify everything so each bullet point conveys "number go up" also falls apart when you invent something, create a new revenue stream, new product, etc. The previous value was nil, therefore I improved it by... infinity percent?


Exactly. People forget that the final and most important decision for hiring will be at a less technical and much more bean-counting level.

That's the reason why CS graduates with only bells and whistles in their CV have a hard time getting a relevant position - glitter on your resume doesn't deliver value at all.


> That's the reason why CS graduates with only bells and whistles in their CV have hard times getting a relevant position - glitter over your resume doesn't deliver value at all.

If this is true, why does everyone still think that filling up their Technology Bingo card will get them their next job, rather than delivering business value?


> why does everyone still think that filling up their Technology Bingo card will get them their next job

It does for entry and mid level jobs. When you do this you’re advertising that that’s what you’re looking for – a grunt level job.

Unfortunately most job seeking advice out there is for people new to the industry. Because there’s more of them.

Think about it this way: Would you trust a CEO whose resume lists word, excel, powerpoint, and google docs under skills? Probably not, but you sure would expect a ceo knows how to use those.


> So, not fun, not rewarding, no intellectual challenge, no career benefit. Why exactly should I want to do it? … why exactly should I go above and beyond what I agreed to in exchange for my salary?

I think that delivering a solution which works, even if it is not sexy, is exactly what one agreed to in exchange for one’s salary. It may not be fun, it may have no intellectual challenge, and it may have no career benefit, but it is rewarding: the reward is the salary.


They don’t call it work because it’s fun.

Your goal isn’t to be intellectually stimulated at your job. If you want that, read a book. Your job is to deliver reliable, lasting value.

Overcomplicating the architecture for the sake of job security is a con you run on your employer.


> Your goal isn’t to be intellectually stimulated at your job. If you want that, read a book.

And then people are surprised burnout rates are as high as they are. Lack of mental stimulation leading to burnout is the white-collar equivalent of repetitive stress injury at jobs that put strain on the body.

> Your job is to deliver reliable, lasting value.

Nobody is actually paying you for that. In fact, it's probably counterproductive to the business goals.

> Overcomplicating the architecture for the sake of job security is a con you run on your employer.

On the other hand, "work ethics" and professionalism in modern workforce is a con your employer runs on you. The further above and beyond you go, the more work they get out of you for the same pay.

Yes, I'm being a bit obtuse here. But my point is, there needs to be a balance. Or at least a mutual understanding of conflicting incentives. We can't demand facets of professionalism in the way that benefits employers short-term, but deny and scorn those that empower the professional. Independent learning and broadening one's experience is a part of what being a professional means.


The fact that you're on Hacker News all the time probably means that you're very bored in your actual work, along with the FOMO on AI. I don't think you're in a good position to judge what you're judging, or to give business insights. I believe all of your takes in this thread are bad.


> Your goal isn’t to be intellectually stimulated at your job. If you want that, read a book. Your job is to deliver reliable, lasting value.

That's not my goal. That's not even what my employer wants most of the time. Most of the time, it's just about a bunch of rich dudes (and fewer ladies) wanting me to make them even richer. That's how the system works; no need to write the "C" word, or call me the other "C" word just because I say it so bluntly.

My goal is to enjoy my life. I have various ways of enjoying life, many selfish, some altruistic, very few aligned with the will of the rich people up top. My job takes about a fourth of my waking hours (if I worked full time it would be a third), valuable time that I'd rather spend for me and my loved ones, instead of giving it to people who already have way too much. The only reason I can sometimes tolerate unrewarding work is because I don't have a better way to pay the bills.

The reason I don't over-complicate architecture isn't because it will make more money for my employer (sometimes it means making them less money, especially in the short term). I keep it simple because I can't stand wasted effort.


IMHO it's about society.

If you're asking on a personal level, I think that if you keep to the high ground, you're more likely to find long-lasting happiness. David Graeber spends a good deal of the pages in Bullshit Jobs on this topic.


Hopefully, hopefully your incentives are aligned with your team's success.

If they are not, I am truly sorry.


In almost every business setting, your incentives are _partially_ aligned with your employer's. For instance, you usually both want to build a good product; conversely, you want to get paid as much as possible while your employer wants to pay you as little as possible.

If it's all above board, and the non-aligned parts are agreed-to, all is well.


If the pride of a good job done isn't enough motivation for you, then you'll never understand, because you simply don't have the ability to.


Unless you're working pro bono, the "pride of a good job done" isn't enough motivation for you either. Your employer may wish it was, though.

Point is, there is more to the equation. Employees don't exist in isolation, and when jobs actively refuse to take into account that the workers are real, living human beings, with needs and plans and dependents to feed, then resume-driven work will continue.


> Unless you're working pro bono, the "pride of a good job done" isn't enough motivation for you either. Your employer may wish it was, though.

wait, let me guess, if I truly loved my wife I'd also never play a video game by myself.

Watch as my eyes roll out of my head, onto the floor, and out the door.


human factors like drive are more important than most project managers would like to believe

if you have people who are effective, allow them some space for fun and intellectual challenge, even if it takes a bit away from the workload - if you disregard those human factors, something will give out in the end, perhaps catastrophically, as efforts are made to add "sexiness" to the core of the mission-critical workload


Most jobs aren't a source of those things. Why should software development be any different? Introducing unnecessary technical challenges just to make your work interesting often has a negative impact on the end user/customer, which you should give a shit about. Do you think lawyers and architects complain if they aren't allowed to jump on every fad and make their work needlessly complex?


>>> but you are not selling that well here

I did not sell it well, that's fair.

>> Why exactly should I want to do it? This isn't the goddamn United Federation of Planets, nor is the company a church - why exactly should I go above and beyond

HN doesn't want to hear this answer: you do it for the PEOPLE around you.

If you build sexy tech, and then get sexy job and I have to clean up your turds... well you can go fuck yourself. Hope that I'm never going to be the one answering the linked in request for a recommendation or sitting on the other side of the table when you come in for an interview.

The tech job market is bad, and getting worse. You're gonna need an advocate on the inside if you want or need work quickly. That means former co-workers and bosses. No one is gonna hire a clock puncher who did a selfish resume-building project and left. Don't be that guy.


> not rewarding, no intellectual challenge

Don't forget that simple isn't easy. I find it very rewarding and intellectually stimulating to solve a problem with a solution which is as simple as possible - but no simpler.


The problem is when you prioritize your future career over playing your position and achieving results for your current company. It ends up hurting both the company and your own future prospects because this mindset will inevitably get outed by an engineering manager who isn’t easily bamboozled by shiny objects.


Because your work is then stable. Easy to maintain, not getting paged for customer problems. Which leaves you the time to do more work that will be interesting and beneficial.


No it doesn't. If your work is easy, the scale gets ratcheted up until it's nearly impossible. That's why web devs have so much trouble with the seemingly simple task of parsing and producing text.


What scale?

How does easy work affect scale?


I believe he is saying they add expectations and responsibilities till you are back to equilibrium (i.e. overextended).


If the equilibrium is always reached, then why wouldn't I make it easy on myself by making things that are easy to maintain? I want fixing issues to be like blowing out a birthday candle, not dealing with a forest fire. I'd rather blow out 20 candles than deal with a single forest fire.


Of course you would want to do that. However:

1. It's hard to estimate what will or won't be easy to maintain down the line;

2. New tech becomes hot new tech because it promises to be easy to maintain down the line;

3. Most of the maintenance burden is due to choices of other people anyway, so you have limited control over that.

Trying new tech is often a bet on the promise of better maintainability turning out true. It usually doesn't, but the status quo is so bad already that people are grasping at straws.


I tend to stop trusting people/companies/industries which break promise after promise. I want to go with solutions which have proven themselves to be true and stood the test of time. It needs to be worth people’s time to learn, not just today, but in 5 years.

A lot of the time, tech is so focused on the tech that they forget about the real problem they're trying to solve.


True, I've witnessed exactly this a dozen times.


This is the kind of comment I come to HN for.

I think this is an absolutely right read on the situation. To put it in a slightly different context, the magpie developer is more akin to a "sociopath" from V. Rao's "Gervais Principle" [0], doing the least amount of work for the company while forging a path forward for their career. In this case, it just happens to not be within the same company.

[0] https://www.ribbonfarm.com/2009/10/07/the-gervais-principle-...


That guy was just optimizing for future employability, albeit in a short sighted way. Being able to talk in an interview about how you have professional experience with various tech stacks is valuable. That being said, optimizing for that at the cost of current job performance and coworker alienation is just dumb, since job performance and networking leads are more important for landing good jobs. I'm guessing this guy was a serial job hopper who had no expectation of being able to progress up the ladder at the company you were at.


> I'm guessing this guy was a serial job hopper who had no expectation of being able to progress up the ladder at the company you were at.

Sometimes folks find themselves stuck in a kind of typecast role: they're "the guy" who does "the thing" that the company needs right now-- until they don't.

In many places no one will invite typecast folks to transition to different, more interesting roles that align with their interests. Instead the person will simply be discarded when they're no longer needed for that thing they do. To get around this requires some initiative and that means not "asking for permission" to try new stuff. Sometimes it's better to just take a chance and do something new. There's a risk of cargo-culting, of course, but hey there are worse things that can happen.

Dan Luu, as he has indicated many times, comes from workplaces where staff are paid in multiples of 100K. These are elite "end-game" jobs, not "dead-end" jobs. Such staff are very much tied in to the performance of the company objectives (in a real sense ($$$$), not in a mission-statement sense), so yeah, these places ALREADY have resources and tech in place that are marketable in other places. There's no need for folks in those workplaces to desperately get out of some PHP dungeon run by a B.O.F.H. petty tyrant.


> I'm guessing this guy was a serial job hopper who had no expectation of being able to progress up the ladder at the company you were at.

The magpie was practically furniture (over a decade there). We speculated that he had buried a literal body for the CEO, based on what he got away with. "Shiny objects" was an astute call on the part of the IT guy (he was setting up another new MacBook for him).


On the other hand, at least someone was exploring new tech. In the exploration/exploitation problem, going 100% exploitation and only ever using the same boring old tech for everything is not the optimal choice either.


This is also part of the reason you find reliable reseller partners. They can burn cycles figuring out what new tech is useful and what is a waste of time so you can spend your cycles actually getting things done with cool new tech that works without wasting your company's time and money on things that have fatal flaws that aren't immediately obvious.


One reason people hire me is my actual production experience in loads of stacks and architectures.

Actual production experience is, IMO, a requirement for making decisions. No one will make a good decision to ditch or embrace, say, microservices based on an HN conversation and a few blog posts. Nor will they make such a decision based on papers in science journals.

But rather based on failures with monoliths, failures with microservices, successes in Rails, successes in Drupal, and failures in Rails or Drupal. (Or leptos, React, flask, whatnots). Actual felt pain, and drawn learnings. Actual celebrated successes and drawn learnings.

edit: I'll often bring that up in consultancy. "Yes, Rails is great because X and Y. But Z is a rather real danger for your case; we've been bitten by that when building FooBarLy..."

What I'm trying to say: yes, indeed, this person is also collecting experience and evidence to base future decisions on. There's a real need for, and actual benefit in, trying and implementing new tech. If only because otherwise we'd still be maintaining COBOL mainframe spaghetti (oh. wait...)


Be honest with me, how many jobs have you had that cared about your variety of experiences?

I’ve been applying to jobs for months and they’re all looking for go and python devs.

I have production experience with both languages, their common web stacks, and many others (ruby, js, php, c#, elixir, erlang, rust).

I’ve felt that even mentioning that I have experience with other stacks is a turn off to recruiters and EMs.

Nobody seems to care about breadth of experience nowadays.


All of them in the last decade.

But I guess we misunderstand each other. None of them cared that I knew "a lot of stuff that isn't appropriate here".

For example, a recent gig, hired me because I'm not just another Rails "expert", but a Rails expert with Typescript experience, who built large CI/CD pipelines with containers and has built complex (is there another way?) AWS infrastructures etc.

Sometimes they need someone with that exact skill-set. In this case, they needed someone to help them move from yet another "upwork-delivered-rails-spaghetti" to something that could actually be maintained.

I convinced them to drop the react/typescript frontend for now (it was terribly bolted on) and to forego building their own PaaS nightmare on AWS but instead just push to Heroku - for now.

My experience helped them make tough decisions.

Sometimes gigs hire me because I have a weird combination of experiences. But more often because my experience allows me to help them make decisions on architecture and such. Granted, I am often hired as "senior architect" or some such. And one of the first things I do is convince them they should never again hire an "external interim architect", lol.


I envy that position.

Feels like because I didn’t work in go or python at my most recent job, I’m having trouble landing anything. (I have 5 years of golang prior to my most recent job)

My experience working in devops doesn’t seem to matter for SE positions either.


I do freelance gigs. And I charge what's "enough" for me, so price may play a role. And so I have several gigs a year. I'm on my third for 2024.

I give talks on meetups and conferences, so my insights are seen.

I prefer to work in start-ups or scale-ups, so there's a real need for some "know it all".


How did you get started doing that?

It might just be the area I live in, but I wouldn’t even know how to start looking for freelance roles that aren’t just poorly paid 3mo junior contracts.


Exploring tech is great! … for smaller projects: proofs of concept, prototypes, side projects, projects specifically for researching new technologies… heck yeah.

Just not for your prod architecture. Many late night beepers have beeped and phones lit up because the piece that holds together the two pieces that let that thing talk to the database queue monitor thing stopped working so the whole thing went down.


Maybe. Some people really are like collectors chasing the latest thing. You see this in all fields and things. Ever been to someone's house that always has the latest gear in whatever hobby they follow? There is no reason to think people won't do the same in settings other than hobbies.


I'm immediately reminded of my favourite Kurt Vonnegut quote: "Another flaw in the human character is that everybody wants to build and nobody wants to do maintenance."

I've always felt that the magpie syndrome you describe is because of the desire to build new things, rather than maintain what's there.

I watched a repair show recently where a craftsman repaired an old 70s bedside clock. The pride, time and patience he took in repairing the clock was wonderful to see, actively avoiding rebuilding parts if he could reuse what was there, even if there was a crack or blemish.

I've always respected engineers that maintained and fixed software well, and also knew when to reach for the right tool in the toolbox. Better yet, those that knew when not to build something new. Perhaps that's something you learn through experience and doing, but I wonder if it's actively taught, encouraged, or rewarded in the workplace. It really should help get you your next job.


Is it a flaw though? There's a lot of truth in that eCard: "a clean apartment is a sign of a wasted life". How much of technological progress occurred to ease the maintenance burden? Is it a flaw that the washing machine is saving people a ridiculous amount of time (to the point of, arguably, allowing two-income households to exist in the first place)?


i'm glad i have "kurt vonnegut" notifications because this was nice to read.


One of the best jobs I ever had was under a "technical magpie." Did we get shit done? No. Did I get paid a lot of money and always have new stuff to do instead of shoveling CRUD? Absolutely. It was a blast.


Yes, it's basically being in college - while being paid. If your resume is full of those kinds of roles, I'd disregard your resume, and many experienced managers will as well.

Remember that your resume will not hold much value when it gives off "we built this thing with friends in a garage" and little else.

Have you supported anything in production? No? Explain why you should be a candidate for anything other than an entry-level position as a SWE.


> If your resume is full of those kind of roles, I'd disregard your resume and many experienced managers will as well.

That's why you lie about it.


"lie" is a strong word but my resume is always optimized for the role i'm applying for. If I have experience in a technology that's not relevant then i leave it off and use the space/attention for something better matching the role.


I had a job like this for a while. My boss always wanted to be involved in the new stuff and I was the one he threw it at to kick the tires.

Some stuff got done, but nothing too mission critical that kept me up at night and it was pretty relaxed.


Just be careful not to go too far in the opposite direction. There are new things coming all the time. You probably don't want to be writing new COBOL anymore even though it was once a good idea (you might have to maintain it, but you should already know what you replace it with and what your interoperability strategy is)


Isn't there a labor shortage of COBOL engineers to maintain the mainframe code that powers $3T of transaction volume in banking and healthcare, enabling skilled COBOL contractors to name their price?


Only at the salaries those banks want to pay, which aren't high.


Depends on the bank and what the code is. I know of insurance jobs that pay very nice salaries. 9-5 jobs where if you are around at 5:01 they tell you to go home. Vacations are mandatory as well (all banks have mandatory vacations - they need you gone long enough that if you are embezzling money, whatever scheme you had breaks and someone else figures it out investigating what is wrong). It is, however, boring coding that will suck your soul, so many around me accept less pay for more interesting work.


COBOL itself is pretty horrible, but if there's an old tech which I'm happy using and there's still high demand for it, why not?


Using it is fine, but you need to know it is horrible, and you should already have a "this is what new stuff is done in" plan in place. Or at least try to make that plan; there may not be a better alternative yet, but you should be looking.


you can let the industry do the testing for you.

It's like changing to the new version of an OS on day 1 versus waiting 6 months.


When has a decision that’s bad for the decision maker ever been popular?

We see it in the C-suite; we see it with engineers.

I think the travesty of so-called “principal engineers” and “engineering leaders” is their adamant refusal to make doing the Right Thing (TM) sexy.

Your employees are monkeys: act like it.


Yep. Microservices! AWS! Everything Gartner and Thoughtworks says! It'll look good on my resume...

..several years later..

Escalating cloud costs, high staffing cost, staff turnover, heavily reduced margins, decreased productivity, burnout, clients unsatisfied, C-suite paving over this by hiring more marketers...


I wonder how many early stage businesses went tits up because they drank the microservice kool-aid and burned valuable engineering cycles that should have been spent on features on docker spaghetti.


I once interviewed at Fast. One of the questions they asked was how to scale up a rate limiter. In my mind I was wondering why you'd ever need to worry about scaling up a rate limiter. The answer apparently was some kind of microservice.

The company eventually folded[1]. Turns out the company was burning millions of dollars in hiring + infra, while generating only $600,000 in revenue.

[1] https://newsletter.pragmaticengineer.com/p/the-scoop-fast
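For context on how little there is to "scale": a basic in-process token bucket is a few lines of Python. A sketch (the names are mine, nothing to do with Fast's actual system):

    import time

    class TokenBucket:
        # Allow `rate` requests per second, with bursts up to `capacity`.
        def __init__(self, rate: float, capacity: float):
            self.rate = rate
            self.capacity = capacity
            self.tokens = capacity
            self.last = time.monotonic()

        def allow(self) -> bool:
            now = time.monotonic()
            # Refill for the time elapsed since the last call, then spend one token.
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False

Keep one of these per client key in a dict and a single box handles a huge request volume; sharding rate-limit state across a microservice fleet buys you very little at that revenue.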


Alternatively, how many later-stage businesses failed because all their features were in a Rails monolith that no number of engineers could maintain.


The Rails monolith companies probably have a better chance at adapting than the 50 microservices maintained by 10 devs companies.


This. Just silo the monolith out into tenants.

Salesforce, not exactly a small monolith company, did this for a very very long time.


Well, did it look good on the resume?


Someone had to stay behind and muck out the stables...


Yeah, but if that expectation was false, those people were justly punished. And if it was true, the problem is clearly elsewhere.


>I think the value of so-called “principal engineers” and “engineering leaders” is their adamant refusal to unconditionally jump on all the latest bandwagons and instead make judicious selections of technology appropriate to the problem at hand.

FTFY.


As a product manager, I am frequently confronted by UX people who declare something as "standard", a feature that is supposed to be an absolute "must-have", or else our organisation would lose even the last of our users. Unfortunately, developers tend to accept these things as interesting challenges and (knowingly or not) underestimate the effort and complexity needed to implement them.

My very lonesome role in these cases is to insist that this shiny thing is no standard at all and that our users would be quite happy without it.


I see this with a lot of people that do visualizations and dashboards. A lot of fancy stuff going on but most people just want simple tables with filters that apply to other views (mostly rollups) when navigating.

The Wall Street Journal Guide to Information Graphics is essential reading for anyone who regularly presents data.


> I have seen projects get done just to pad a PM or developers resume

This reminds me of the time that I complained that a sensor on the machine we were developing was way too complicated and that there were far simpler ways to accomplish the same thing.

Someone with more history on the project explained. The engineer who designed that sensor expected, and received, a patent for his work. The company was quite generous with patent royalties and no one's getting a patent for an off the shelf float switch!


It's the job of engineering management to stop this. We're supposed to say "why do you need this? Justify the need for this". I.E. "Why do you need kafka here? Will we have enough traffic volume to warrant it? Make a proposal." And they need to follow up and ask "Was that needed? Show how we're using it".

But engineering management is so busy filling out TPS reports they don't have time to actually do any oversight.


That would require that engineering management actually be competent technically. A shockingly large number aren't.


I have rarely seen engineering management that's of any help making these decisions. Either they resist any change or they jump on technology because they have read a LinkedIn article or Gartner report. I have never seen good, fact based technical decisions come from there.


Engineering management is equally likely to assert some kind of baseless "best practices" position without really understanding whether or not it's actually a good idea in this context.


to be fair, when you don't have any pathways for working on $COOL_TECH at your job, designing and justifying something overly complex makes sense


I'd argue it's something a little more fundamental than mere CV padding

Taking the CV aspect literally — this is a sleight of hand because it's a metaphor — I know lots of people who do this stuff that don't have a CV at all.

There's levels to it of course, but I don't really view it as being any different to people who insist on using old machines as a bit (but in the other way, obviously)


Don’t we get paid the big bucks precisely because we have to fix stuff like this? I mean, if maintaining and fixing software were easy, I guess we wouldn’t be earning 6 figures.

In software engineering we have all these principles and patterns and whatnot precisely because we have to deal with a pile of things that don't work well together.


I think our outsized compensation is less because it's hard, and more because of our industry. In tech companies, labor is a relatively small expenditure, so tripling your labor budget to get 5% more "output" can be a very rational thing to do. (Also, the Mythical Man Month means that a smaller, sharper team is often more useful to solve your task than a bigger one.)


It’s not the tech, it’s the business - people pay for new and shiny things to be added, regardless of the actual value they bring. Engineering managers hire for shiny things on your resume precisely because of that business trend.

Tech trend will continue until this business mindset of burning money on shiny things changes.


It's a common behavior. When I started my last job as the software lead at a small company, I was warned that one of the senior engineers on my team was easily distracted by shiny things. They were not wrong. The dude was smart and effective, but I had to spend way too much time keeping him on task.


> Workman like functionality isnt sexy, it wont be the hot bullet point on your resume, it wont get you your next job, but it is dam effective.

When you see opportunities to do such work in a way that delivers something new and valuable, I recommend taking hold of them. I learned this somewhere along the line, but not until I’d missed a few from self-doubt and indulging the inner magpie.

Clear, simple and value focused engineering is _exactly_ what I’m looking for in candidates these days. Focus on the long term, not the short term — just like with investing.


Dealing with a bunch of this right now: no consideration for future growth of the system, and "just dump everything in JSON and it'll be OK"... Tech debt in architectural designs is real, and it takes a lot to trim it back and say, OK, now we are moving to the XYZ tool that works and doesn't need to be shiny. I had a chat with a client once; they needed something and I said "this looks like it'll be a report". They wanted some super-duper dashboard, but all they needed was a small DB + CSV extract for charts etc.


> When cloud was the hot new thing the company that I worked with launched a replatform on AWS. It gave us the ability to get through the initial scaling and sizing with ease. We left right away, because even then the costs did not make sense.

Cases like this always fascinate me. I've led a "move from Data Center to AWS" effort twice, and both times it was at > 50% cost savings. However, I think both were probably small infra footprints compared to many cases like many others.


I wonder if this effect ultimately advantages Google, and other companies that create their own internal tools rather than using the new shiny buzzword.


I understand that's your opinion, but could you show me some badly designed user research results to make this conversation more data driven?


> more data driven

Any other engineering discipline. Practices that are common in IT would count as negligence in other disciplines and might get your permit/license revoked.

IT is the only sector where companies like Cisco or SAP can exist despite the horrible reliability of their products.


As one of my friends, an SAP consultant, said - "The value of SAP isn't that it's actually good, but it's predictably scalable"

As in - you can set up a process in Germany, then replicate it globally with predictable accuracy. And predictability matters a lot in stable, low-margin businesses. Walmart can't spend a few billion on a project that may have -100% to 400% return value when they have the option of a reliable 20%-30% return value.


LIDL famously burned around 500M € on a SAP rollout before pulling the plug.


Provided the transition to SAP doesn't bankrupt you.


It is really funny how SAP is the one single big software company from Europe and it is an absolute dumpster fire.


> IT is the only sector where companies like Cisco or SAP can exist despite the horrible reliability of their products

Come on, other industries have garbage companies putting out garbage products, too.


> Come on, other industries have garbage companies putting out garbage products, too.

That's correct, but we have to admit that the software industry excels at this.


Software is full of monopolies. But monopolies' products are garbage in every industry.


Can you explain it in a different way? I have no idea how it relates to my comment.


Not to take anything away from the angle (which is bang on), but the "we're X people valued at $YBILLIONS" is a weird way to open, and it detracts from the message. I suppose it's a proxy for... seriousness? I dunno. Valuations are just bets on NPV given "what could go right" scenarios, so congrats on having a convincing story to tres-commas your val. But opening a pitch on architecture with "we're a small team building a simple numbers app but lol $BILLIONS" and then using some other company (SO) as the case study ref... ok yup what. This affected casual "#winning but like whatever" style is everywhere now. But I'm a salty old fuck, so it probably only offsides me.


> "we're X people valued at $YBILLIONS" is a weird way to open and detracts from the message.

The most common argument against simpler architecture is, "but that won't scale and we are going to grow to be a valuable company." So the idea that they aren't a $10 million company does seem somewhat relevant.


Writes simple and then drops GraphQL and K8s.


Both of which are justified in the post. Like, complexity where it makes sense is a good thing, especially if that complexity brings benefits.

Complexity for the sake of complexity is foolish.


Enough of this strawman one-liner "complexity for the sake of complexity". Unnecessary complexity is introduced because someone once thought it was necessary. It may not be necessary anymore, or (even worse) it might have not even been necessary at the time and the person who introduced it was just wrong. But all complex architectures start because people think it will bring benefits.


"The person who introduced it was just wrong" is doing an awful lot of heavy lifting in your post. I've seen "this is just the only thing the programmer knows" many times, for instance, in which case one can argue whether an actual "decision" is being made. Certainly "this is the tech the engineer wants to learn" is a real problem in the world, in which case again it's rather debatable whether "necessity" ever crossed their mind as a concern. "I was taught in school this is the one and only way software is designed" is common with fresh grads and they aren't really making a decision because they don't see a decision to be made.

You seem to be implying that all complexity is the result of an informed decision made by someone but this does not match my experience in the field.


> You seem to be implying that all complexity is the result of an informed decision made by someone

My life would be much easier if this were true. One day, maybe....


Agreed, but I rarely see this work out, even in the short term. It certainly seems like complexity for its own sake, and someone once told me that our system was embarrassingly simple so that's why they introduced a load of complexity. Sigh.


> someone once told me that our system was embarrassingly simple so that's why they introduced a load of complexity.

This exceeds my ability to suspend disbelief.


You can't imagine someone saying "look at this system, it's totally basic, programming 101, clown shoes, baby's first web service, probably made by a PHP coder, not enterprise-class at all, we need to put on our big-boy pants, I read some whitepapers from thought leaders at FAANG and we need to implement..."


GraphQL is less complicated than querying dozens of interdependent REST queries.


You can absolutely use Kubernetes to implement a dead-simple three-tier app with minimal ops overhead, especially if you're using a managed cluster.


Just like you can use a fully orchestrated API-driven cloud IaaS to run a simple binary.

Show me your problem, I'll show you how to solve it with the least amount of code and dependencies.


And if something goes wrong, is it simple to debug and understand? Can I reproduce issues locally? Happy path simplicity is easy.


I use minikube daily for local setup; it works perfectly.


As someone who dissed GraphQL for a long time, Hasura/Apollo are in a really good place now, the amount of code you don't have to write and the performance/features you get for free are really compelling. The setup isn't conceptually simple, but it sure does keep your codebase lean.


Yes, but that isn't the thrust of the article.

"Simple architectures" and "boring technology" are slogans meant to keep technologies like GraphQL out of your stack. Writing less code and getting features for free is how one describes "exciting technology".


<smh> Agree. No need for the throwaway on this one!


The problem is that most people have never actually built something in all the architectures they are considering.

They just read blog posts of what to do.

As an analogy, I’ve manufactured a lot of stuff out of wood, concrete, plastic (3D printed) and increasingly more metal. When I need a jig, I know what each material will be like and what I will be getting out of it.


If I tell you that clay is better for your next project then you don't know enough. Am I right or wrong - should you invest a lot of money into clay equipment, and time into learning to use clay? You need to make a decision based on what others say about the advantages of clay - maybe clay isn't better, maybe it is better but not by enough to be worth learning. If this is just for fun, maybe you say you want to learn clay and so you spend months learning to do it, but if this is for a job you really want some confidence that clay is worth learning before investing your time in it.


Well if my day job is building stuff and it’s the reason I picked this career and it seems companies pay $$$$, give great benefits, and lets me take 2-4+ week vacations wherever I want if I’m good at it, and going to work is great because everyone gives you the fun problems knowing very well you know your shit…

Yes, it's worth learning a little about clay. Maybe I'll even enjoy it.

(And I know you’re trying to make an example, but I have already looked into clay a bunch of times. Clay just doesn't have the material properties that my projects tend to need. It'd be like learning COBOL expecting to build front-end apps.)


As a young gun coming from working in games before touching "internet"/"enterprise" software back around 2006, I had an eye on performance matters, and all the early Google papers caught my attention (esp. as I was on a project with an over-optimistic sales CEO who had it in his mind that we'd reach Google valuations within a year).

A sobering account was our second CTO, who told us that their previous 65000-user application ran on a single database/server, so for the rest of that project we just kept most of our application on a single DB and focused on tuning that where needed.


I work at a company with several 100k MAU. Single db + 1 standby replica. No problems.

I believe even Stackoverflow used to run on something similar, not sure.


I like to treat simplicity as an ingrained attitude.

When hiring someone, I love to see if they are visibly annoyed when something is over-engineered. Or when a user interface is too complex.

If they have a strong emotional reaction, it's a good cultural fit :)

The inverse is true too. If someone "gets off" on complexity, I view it as a red flag.


The highlight that 99% of companies should take away from this piece:

>since we’re only handling billions of requests a month (for now), the cost of this is low even when using a slow language, like Python [and simple synchronous code], and paying retail public cloud prices.
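To put numbers on that: 2 billion requests a month is 2 billion requests / ~2.6 million seconds in a month, i.e. roughly 770 requests/second on average. That is well within reach of ordinary synchronous Python workers scaled horizontally behind a load balancer.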


> Another area is with software we’ve had to build (instead of buy). When we started out, we strongly preferred buying software over building it because a team of only a few engineers can’t afford the time cost of building everything. That was the right choice at the time even though the “buy” option generally gives you tools that don’t work. In cases where vendors can’t be convinced to fix showstopping bugs that are critical blockers for us, it does make sense to build more of our own tools and maintain in-house expertise in more areas, in contradiction to the standard advice that a company should only choose to “build” in its core competency. Much of that complexity is complexity that we don’t want to take on, but in some product categories, even after fairly extensive research we haven’t found any vendor that seems likely to provide a product that works for us. To be fair to our vendors, the problem they’d need to solve to deliver a working solution to us is much more complex than the problem we need to solve since our vendors are taking on the complexity of solving a problem for every customer, whereas we only need to solve the problem for one customer, ourselves.

This is more and more my philosophy. I've been working on a data science project with headline scraping (I want to do topic modeling on headlines during the course of the election) and kept preferring roll your own solutions to off the shelf ones.

For instance, instead of using flask (as I did in a previous iteration of this project a few years ago) I went with Jinja2 and rolled my own static site generator. For scraping I used scrapy on my last project, on this one I wrote my own queue and scraper class. It works fantastically.
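For anyone curious, the core of the roll-your-own static site generator really is tiny. A minimal sketch (paths, template names, and page data are made up):

    from pathlib import Path
    from jinja2 import Environment, FileSystemLoader

    # Render each page's context through a Jinja2 template into an HTML file.
    env = Environment(loader=FileSystemLoader("templates"))
    template = env.get_template("page.html")

    pages = [
        {"slug": "index", "title": "Home", "body": "Latest headline topics."},
        {"slug": "about", "title": "About", "body": "A tiny hand-rolled SSG."},
    ]

    out = Path("site")
    out.mkdir(exist_ok=True)
    for page in pages:
        (out / f"{page['slug']}.html").write_text(template.render(**page))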


It was a genuine joy to see a website in this day and age which doesn't try to impose margins on text. I want my entire monitor to be used, and this site delivers.


You might be alone in this take. While I agree a lot of websites do it poorly, the only way I personally can read Danluu is when I make the window smaller.


It's the same reason that cooking recipes often become overly complex. (A sprig of this and a pinch of that.)

We are often embarrassed by the simple.

The simple makes us feel small, unsophisticated, and unimportant.

I always find it amusing when I find a way to solve a problem simply, and my coworkers answer somewhat directly that it seems very simple.

To me that's a mark of success, but to many it's uncomfortable.

Of course, if we didn't work on it, then it's overcomplicated and needs to be rewritten. Go figure.


I agree with the majority of this with respect to simplicity in general, and only really take issue with this bit, about GraphQL:

> • Our various apps (user app, support app, Wave agent app, etc.) can mostly share one API, reducing complexity

This could of course be done with a RESTful API.

> • Composable query language allows clients to fetch exactly the data they need in a single packet roundtrip without needing to build a large number of special-purpose endpoints

This, too, can be done with a RESTful API. Two approaches which immediately spring to mind are passing desired items as query parameters or passing a template item as a query parameter or request body.

> • Eliminates bikeshedding over what counts as a RESTful API

This may actually be a benefit of GraphQL! There’re a ton of soi-disant ‘RESTful’ APIs which simply don’t follow REST principles at all (e.g.: ‘REST’ ≠ ‘JSON RPC’; also HATEOAS is hugely important, even when the hypermedia is in a non-HTML format such as JSON). But I also think that real REST really is a powerful framework for distributed computing. Sometimes the discussion really is bikeshedding, but sometimes it is a definition of terms.
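To make the second point concrete: the query-parameter approach can look like this in Flask (a sketch with invented field names, not any particular production API):

    from flask import Flask, jsonify, request

    app = Flask(__name__)

    # Lookup stubbed with one hard-coded record to keep the sketch short.
    USER = {"id": 42, "name": "Ada", "email": "ada@example.com", "balance": 1700}

    @app.get("/users/<int:user_id>")
    def get_user(user_id):
        # ?fields=name,balance returns only those attributes, so clients
        # fetch exactly what they need without special-purpose endpoints.
        fields = request.args.get("fields")
        wanted = fields.split(",") if fields else list(USER)
        return jsonify({k: USER[k] for k in wanted if k in USER})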


> Two approaches which immediately spring to mind are passing desired items as query parameters or passing a template item as a query parameter or request body.

I've implemented this twice; you quickly end up at something resembling GraphQL after a couple of rounds of "iOS needs the profile picture in two different sizes, dependent on native screen size, and with version 2 of the blurhash"

Might as well use GraphQL and get schemas, libraries for clients and servers and IDE support out of the box at that point. It's no panacea - many of the backend libraries were somewhat low quality last time I checked - but I wouldn't hesitate to use it if I had a somewhat complex main product served by multiple frontends following different UI guidelines.


I might be fully past the idea that what makes the difference is the architecture rather than the team.

I am certainly past the idea that monoliths impose any kind of barrier to complexity. In many ways I find it to be precisely the converse.

Sure, there's nothing meaningful to choose between one giant plate of spaghetti versus a dozen small plates flying in loose formation. I'd still rather have lasagna than either.


> I might be fully past the idea that what makes the difference is the architecture rather than the team.

SO much yes. You may have the prettiest architecture, and the best separation of concerns, and the most modular modularity, but if you don't have the right competencies, processes and people all that would go out the window before you can even finish the power point slides showing off your nice new architecture.

It's like a garden - if you don't tend to it, it turns into a mess, and the fight against entropy never stops. If you also have to fight the team along the way, you've already lost.


Strongly agree here. We wrote a similar post on how we’re scaling to a pretty big size with stock Python and Django: https://www.ethicalads.io/blog/2021/07/handling-100-requests...

So much of the content around the programming web is about huge scale and domain-specific stuff. We’re in a pretty magical place where the core technologies we use can scale to the size of a successful business without much work. We should promote that more!


I think these kinds of articles are a bit glib. Developers implement architecture to solve problems confronting them. Sometimes a new problem then arises which must be dealt with, and so on, until the architecture in hindsight is quite complex. But you can only know this in hindsight. A few companies didn't run into the "piling on" of issues to be fixed, and so look back in hindsight, see their simple architecture, and think, "we know something that everybody else doesn't," when in fact they're simply experiencing some form of survivorship bias.


Architecture isn't done to solve problems confronting you now; it is about solving problems you will face in the future if you don't solve them now.


I think there are lots of examples of Fortune 5000 companies just being a single Rails or Django app for the longest time.

Seems in line with the "we're soon going to see a single-person billion-dollar company" narrative.


Microservices is a deployment pattern, not a development pattern. You could build a monolith and expose its various parts as services behind an Ingress, with each endpoint pointing at the same monolith. In a Java project, for example, each of these service endpoints inside the same binary would only load up the classes/objects relevant to that service. There is no overhead in terms of memory or CPU from deploying a monolith as microservices exposed by endpoints.


Micro services is a team organization pattern, emulating the software service model, except within a micro economy (i.e. a single business). In practice, this means that teams limit communication to the sharing of documentation and established API contracts, allowing people to scale without getting bogged down in meetings.


Could it be that deployment patterns and team organization patterns are the same thing, especially in this age of build-run teams?


Conway might find some correlation, but strictly speaking, no. A service is not bound to any particular deployment pattern. Consider services in the macro economy. Each individual business providing a service is bound to do things differently. As micro services are merely emulation of macro services in the micro, each team is equally free to do things however they wish.


Rails and Django are brilliant when they can host all your capabilities. Keeping it simple is the operational mantra, and rightly so. What's more complex is defining simple. My organisation wants to focus on the business domain, automating business processes for our customers. That means thinking as little as possible about the hosting and operational complexities of what executes that logic. The received wisdom would be to build a Django monolith, put it on Postgres and run it in a container. We did that, and it worked until we needed better scheduling of those containers, and in walks K8s.

What's our solution for simplifying our operational platform? It's to pay AWS to do it for us. And now we use Fargate and Lambda for hosting.

Is that simpler? Or exotic and more complicated? There are tradeoffs, which will vary from team to team and organisation to organisation. To one team, slapping functions in Lambda and knowing how to support them are more straightforward than a team of experts who know how to operate monoliths at scale.

For me, the real conversation is about something other than simplicity. Simplicity is subjective. It's about familiarity and competency. Pick what your organisation is familiar or competent with (or has the ability to find people who are...)


I grant him his point about simple architectures, but now I'm interested in more technical details about Wave's architecture. He mentioned synchronous Python... everything about that sentence goes against what I've learned about network programming. Python is single-threaded, and blocking I/O means that any API built with it would leave other customers waiting. Unless requests are processed so fast that there's no appearance of delays?

Something that seems apparent with handling money in accounting systems (and trading systems) is that you can't really do concurrency anyway. If someone changes their account you need to lock those database records from access by any other thread and make everything atomic. Otherwise it can lead to race conditions where the same funds can be spent multiple times (this is something that has affected Bitcoin exchanges before, quite ironically). So maybe they use a sequential design because it makes more sense from an accounting perspective and eliminates the potential for race conditions? But then I'm still very curious about how performant such an approach is.
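To illustrate the locking I mean: the standard pattern is to serialize access per account inside a transaction, e.g. with Postgres row locks. A sketch, assuming an invented accounts schema:

    import psycopg2

    def transfer(conn, from_id: int, to_id: int, amount: int) -> None:
        with conn:  # psycopg2: commit on success, roll back on exception
            with conn.cursor() as cur:
                # Lock both rows in a consistent order to avoid deadlocks;
                # concurrent transfers touching these accounts queue here,
                # which rules out double-spends.
                for acct in sorted((from_id, to_id)):
                    cur.execute(
                        "SELECT balance FROM accounts WHERE id = %s FOR UPDATE",
                        (acct,),
                    )
                cur.execute(
                    "UPDATE accounts SET balance = balance - %s WHERE id = %s",
                    (amount, from_id),
                )
                cur.execute(
                    "UPDATE accounts SET balance = balance + %s WHERE id = %s",
                    (amount, to_id),
                )

Performance-wise, synchronous workers scale out as plain processes, and only transfers contending for the same rows actually wait on each other.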


> We’re currently using boring, synchronous, Python, ... We previously tried Eventlet, an async framework ..., but ran into so many bugs ...

I had a similar experience using async Rust to make a boring HTTP server. Debuggers can't trace across `await`, so debugging was a very slow manual process. Also, I wasted a lot of time dealing with borrow-checker errors.

I finally gave up and tried using Rust HTTP servers that let you write threaded request handlers, but there was only one (Rouille) and it had show-stopping problems. So I wrote a good one:

https://crates.io/crates/servlin

You can use Servlin to make a boring HTTP server in Rust, with threaded request handlers (no async). I use Servlin to serve https://www.applin.dev , running on Render. I'm also using Servlin (and Applin) to build a mobile app.


When I think financial services and simple architecture, I think of Python, ORM, Kubernetes, Cloud, GraphQL and admission of "data-integrity bugs" and "paying retail public cloud prices".

The article itself mostly discusses tech choices rather than architecture. Half of their choices, they seem to regret.

On the service itself: It seems detrimental for African countries (or any country) to allow external financial services to get ingrained and allow them to milk their own populace and prevent the development of state-owned, cost-neutral, non-profit, long-term solutions for the public good. Payment should be public infrastructure, not private profiteering.

Is anyone here donating Dan $180k/year (+VAT) for his blogging?


Hmm I dunno, this architecture feels simple in some ways but genuinely pretty complex in others. Like they defined their own protocol? That's pretty not-simple.

It's like the StackOverflow example. Sure, the architecture is conceptually simple, but you still bought servers and are maintaining them yourself. That's pretty complicated unto itself. Probably the right decision, but not the "simplest".


This is the thing about simple things, usually to maintain a simple "face" they require quite a lot of sophistication on the inside. Like the stackoverflow example, they have a simple architecture, but to make that work they did a lot of very low level optimizations, have a deep understanding of their stack and database, and also hosted their own servers. Basically, it takes a lot of skill to build something "simple".


Interesting article on simplicity because halfway through is a massive plot twist in which author attempts to justify using GraphQL and k8s.


But he does attempt to make a point about that in the end. I don't agree with the point, but at least he tried.


Agree with this article 100%. At my old company we chose to write our server in Go instead of Java. We ended up regretting it because it was significantly harder to hire developers who could write in Go vs. Java.

Later, I read a Paul Graham essay where he says that a company should choose its programming language based on how fast they can iterate in that language. And it clicked for me.


When did this happen? From what I hear and what I see, Go is starting to get quite popular. Not Java level, of course, but it shouldn't be impossible to hire Go devs now. Especially since it's not the hardest language to learn.


Our stack is a NestJS server, Handlebars templates for SSR, htmx and a PostgreSQL database. Deployed as a container on GCP Cloud Run with a GitHub action. It's delightful to work on and very responsive in a way that a React app with the same data load wouldn't be. At this point it would take a pretty massive reason to get me to adopt the SPA approach again.


I usually also use Cloud Run for running my containers. But I use Cloud Build instead of GH Actions. I'm curious, what's your reasoning? I prefer my repository to just hold code and run build actions close to where it's deployed.


For me, I had used GH Actions before, so it was pretty quick to get up and running. I am sure it would not have been that much more effort to use a cloud build tool in GCP or AWS, but honestly I don't see that much of a pro/con.


I guess in my view, depending on GH Actions means that if it's down I'm blocked, but if it's just a repo I can always point my build system at a different one. I have never experienced GCP being down and would bet it has better overall uptime than GH...


You're right, I misspoke; I use Cloud Build triggered by a GitHub merge to main or staging. So not an action, a trigger from GitHub to GCP.


FastAPI, Jinja, htmx and Postgres here. Simple to add features, not sure I would use a SPA or other similar JS heavy framework unless I had a feature requiring it.
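The whole stack fits in a handful of lines. A minimal sketch (template names invented):

    from fastapi import FastAPI, Request
    from fastapi.responses import HTMLResponse
    from fastapi.templating import Jinja2Templates

    app = FastAPI()
    templates = Jinja2Templates(directory="templates")

    @app.get("/", response_class=HTMLResponse)
    def index(request: Request):
        return templates.TemplateResponse("index.html", {"request": request})

    @app.get("/items", response_class=HTMLResponse)
    def items(request: Request):
        # htmx swaps this HTML fragment straight into the page:
        # no JSON layer, no client-side framework.
        return templates.TemplateResponse(
            "_items.html", {"request": request, "items": ["a", "b"]}
        )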


Old and boring doesn't always mean simple. New ideas can simplify the architecture as well (the author cites GraphQL as an example). Agree that unnecessary complexity should be avoided.

This talk explores the tradeoffs: https://www.youtube.com/watch?v=SxdOUGdseq4


I really think Clojure should be recognised as boring tech more than it is. It seems to have drifted out of public discourse in the last couple of years, which is a shame because it's a rock solid, simple, predictable language that would make an excellent choice for many use cases.

It's just a shame that the syntax will forever be a barrier to widespread adoption.


I don't think the syntax is the main consideration with Clojure.

It's an opinionated language: dynamic, with immutable data structures, hosted on the JVM. I like it quite a bit, personally, and it's been adopted in places where it makes sense.

But many-to-most projects are better off picking another runtime, and the syntax has little to do with why.


IMO: Part of boring tech is that it's easy to learn and therefore, usually, easy to hire for.

Clojure ticks neither of those boxes, unfortunately.


In my opinion the best antidote to overly complex architectures is to have engineering teams and the engineers in them be rewarded based on actual business outcomes.

I suspect the era of VC-money and ZIRP led to a large number of engineers who were disconnected from successful long-term business outcomes, so there was no incentive to simplify.


What is this obsession on HN with "everything can be explained by the end of ZIRP" (zero interest rate policy)? Really: even with overnight rates at 5%, the returns are still awful compared to even moderately successful VC firms. And I do not write this as a fanboi/gurl for VCs. (Many have shitty, immoral business practices to enrich themselves at the expense of hard-working non-executive staff.) Also, "end of VC-money": no such thing. They are practically bursting at the seams with uninvested money gathered during ZIRP.


Are you suggesting that the macroeconomic environment in terms of funding and performance expectations of startups hasn't changed significantly in the last few years?

"VC-money and ZIRP" is a convenient shorthand for what's changed - neither term particularly encapsulates what did change - but I think it's very hard to argue that it's still just as easy to get a bunch of funding without a business model (aside from in AI, perhaps).


This only makes sense if the engineering teams own the process from end to end, at which point they stop being engineering teams.


A sales team gets measured on the sales they make - but they don't own the end to end process of building the thing that they're selling.

It's entirely possible to measure teams on business outcomes without having them own things end-to-end, but to be really effective in this they'll need to collaborate with the other teams across the company - which is generally a desirable outcome.


I agree.

In my experience, this exists in a few, rare (investment bank) trading floor technology teams. From a distance, it looks like traders who are programmers.


Good article! I like that it suggests a pragmatic approach, and introduces complexity only when it's necessary and makes sense to do so. Often engineers advocating for KISS end up (poorly) reimplementing the complex version, when just adopting the "complex" alternative would've avoided them all the development and maintenance headaches.

That said, "simple" and "complex" are subjective terms, and they're not easily defined. Even when everyone aligns on what they are, complexity can still creep in. So it takes constant vigilance to keep it at bay.

In the infinite wisdom of grug[1]:

> given choice between complexity or one on one against t-rex, grug take t-rex: at least grug see t-rex

> grug no able see complexity demon, but grug sense presence in code base

[1]: https://grugbrain.dev/#grug-on-complexity


I don't like this grugbrain approach because complexity is subjective, as you mentioned, and the grugbrain mentions simply saying "no" instead of properly articulating to a dev why something shouldn't be in the codebase.

I mention this because I've been, a lot of times, in a place where instead of adopting "well-maintained library that everyone has been using for more than a decade" I had to write and maintain my own solution, just to appease someone else because they thought it would be less work (it wasn't; maybe for them, definitely not for me).

We need fewer posts in software development where people argue in absolutes.


I mean, I think that grug also agrees that saying "ok" to complexity makes sense sometimes. :)

I agree that this is a deep topic that requires a nuanced discussion, but my point is that I like the pragmatic approach mentioned in the article. You see threads here arguing that GraphQL and k8s are not simple, which is true, but I think it's wrong to hardheadedly avoid these technologies because of their "complexity" rather than reach for them when the right time comes to use them.


Yes, I agree with you, nuance is key. I just have bad experiences of people throwing sentences stolen from these types of articles at me as an argument for "why I should not do X", and then those people never do anything throughout the development cycle; they just keep arguing as dead weight and failing to deliver, cycle after cycle. Sorry, lots of trauma here.


I got told the other day that an ecommerce-to-ERP integration I built 8 years ago, at a company I don't work for anymore, is still running without so much as a reboot. Apparently it is on its third different website with no changes.

Linux, Bash, Python FTW. Everything was designed to be as simple as humanly possible. I mostly built it in my spare time too.

I once showed it to a real developer and he was horrified that some long messages that had to be sent were hardcoded into a file and concatenated to the one bit that changes...it was my prototype that worked so well I never changed it.

It only ran every 15 minutes, since all the systems that depended on the order ran less frequently than that. Everything was handled sequentially, and quite deliberately so.

It actually had a lot of functionality it gained over a couple of months, it had to print some stuff, detect fraud, fix addresses etc.

My favourite bit is that several companies had offered to build it for £mega and decided it couldn't be done! I wish I could describe it in full...


I get Dan's desire for minimalistic styling but a little margin around the page would be nice for readability purposes.

But on a more contextual note: I would contend that microservice architectures are in effect much simpler than monoliths. The UNIX philosophy of small tools that do one thing well is still very applicable. It's easier to deploy small changes quickly than the massive deploys that could take significant amounts of time. Deploying to a FaaS is quick and easy for almost all things.

If your product requires complex coordination/orchestration just as an operational modality, then I'd question the value of that product design regardless of the architecture used to implement it, even for web-scale products. AND this is really a different discussion.


> The UNIX philosophy of small tools that do one thing well is still very applicable. It's easier to deploy small changes quickly than the massive deploys that could take significant amounts of time.

The UNIX philosophy was conceived in the context of a single computer—with dynamic linking the cost of splitting out a new binary is close to zero, so making each module boundary an executable boundary makes some degree of sense.

In the context of a cloud computing network, there's a lot more overhead to splitting out a million tiny modules. Cloud service provider costs, network latency, defending against the possibility of downtime and timeouts—these are all things that UNIX programs never had to deal with but that are everyday occurrences on the cloud.

The trade-offs are very different and so the UNIX philosophy is not equally applicable. You have to weigh the benefits against the risks.


> In the context of a cloud computing network, there's a lot more overhead to splitting out a million tiny modules.

Don't get me wrong - this can be taken to an extreme and from my experience the places where MS seems to fail (the service is not used, or hated by the engineering teams) is in scenarios where it was absolute overkill.

> The trade-offs are very different and so the UNIX philosophy is not equally applicable.

There is a reasonableness standard that exists in both contexts and applies to both in the same way.


> There is a reasonableness standard that exists in both contexts and applies to both in the same way.

It applies to both in the same way in that there exists a definition of reasonable and exceeding it would be bad. But those definitions of reasonable are going to be completely different because the environments are completely different.

I guess my point is that invoking the UNIX philosophy as such brings to mind mostly tools whose scope would frankly be far too small to justify deploying a whole cloud service. It would be better to treat deployable units on the cloud as something new, rather than trying to treat them as if they were a logical extension of single-node UNIX executables.


This post read like a “we want an operationally simple setup,” until the bit about selecting Kube as an orchestration technology. I think it’s still a service oriented architecture and still has to reason about asynchronicity given it uses celery and queues.


I wonder if new serverless platforms, like Modal [0], are even simpler than monoliths. I've been reading some books on software engineering, and it seems clear that network calls, multiple services, integration points, etc. all cause complexity.

However, I do wonder if things like Modal allow you to focus only on writing business logic. No dealing with deploys, network comms, memory, etc.

(No, I don't work at Modal, just wondering "what comes after monolith" -- Modal, Temporal, etc. all seem quite good)

[0] https://modal.com/


> Default GQL encoding is redundant and we care a lot about limiting size because many of our customers have low bandwidth

I'd love to know how you ended up combating this! I'm assuming something like graphql-crunch[0] or graphql-deduplicator[1], but I'd love to know what worked well in practice.

[0]: https://github.com/banterfm/graphql-crunch

[1]: https://github.com/gajus/graphql-deduplicator


I am very glad of the architecture we use at my current company. This is a monolith, BUT with the capacity to be deployed as microservices if needed.

The code is structured in such a way that you only start the sub-services you need. We have nodes that launch almost all services, and some that launch only a few.

If we need to scale just a particular part of the system, we can easily scale the same node, configuring it for just the sub-services we need.
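A sketch of that pattern, assuming Flask-style blueprints (the SERVICES variable and service names are invented for illustration):

    import os
    from flask import Blueprint, Flask

    # Each sub-service is an ordinary module; stubs stand in for the real ones.
    payments = Blueprint("payments", __name__)
    reports = Blueprint("reports", __name__)

    @payments.get("/ping")
    def payments_ping():
        return "payments ok"

    @reports.get("/ping")
    def reports_ping():
        return "reports ok"

    def create_app() -> Flask:
        app = Flask(__name__)
        # SERVICES=payments starts a node serving only that part;
        # left unset, the node runs the whole monolith.
        enabled = os.environ.get("SERVICES", "payments,reports").split(",")
        registry = {"payments": payments, "reports": reports}
        for name, bp in registry.items():
            if name in enabled:
                app.register_blueprint(bp, url_prefix=f"/{name}")
        return app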


> Our architecture is so simple I’m not even going to bother with an architectural diagram.

What's described is far from a simple architecture. Synchronous Python? Kubernetes? I'm guessing there's a legacy monolith with a few microservices or load balancers? Then... GraphQL... why? This is just a hodgepodge of stuff in dire need of an actual refactor.


Arguing for simple architectures is hard.

Everyone wants to argue that yes, they ARE going to need it. Maybe not now but someday, therefore they should have it.

Always reminds me of this scene from Arrested Development.

https://www.youtube.com/watch?v=VVtOkX-gUxU&t=132s


I've been in the situation where someone argued we didn't need something, so go simple - turns out they were wrong (isn't hindsight nice), but by then the system was based around this incorrect assumption and really hard to rework into the thing we needed.

This is why architecture is hard. It is about making the decisions in advance that you will regret getting wrong.


> now the system was based around this incorrect assumption

How big of a userbase and what made it particularly difficult to refactor?


I work on embedded systems so userbase isn't really a concern.

However we have > 10 million lines of C++, and the change would have been a major refactoring of most of them.


So I'm assuming this decision was made quite a while ago? Years, maybe even over a decade?


It took a decade to find the data showing that the decision we made 10 years ago was bad - for those entire 10 years we all thought it was good, until we were looking at a performance issue and realized the root cause was a decision made in the early days that had an unexpected side effect.


Congratulations on your success. Legacy code is never easy to work with, but it's a good problem to have because it means you survived where hardly anyone does.


10 years ago this was a greenfield project that was just getting the first release. Today it is a legacy project that is bringing in a ton of $$$.


I like the article and am a big proponent of simple architectures.

But.

The described use cases have homogeneous workloads. Hence simple architecture. Extrapolate the rest.

And of course there are for sure cases of people increasing complexity for the sake of being needed. That's a topic for sociologists/psychologists to reason about, not for engineers.


Microservices aren't just about breaking apps into pieces; they're about making each piece independently scalable, deployable, and manageable, which is huge for continuous deployment and integration. Sure, they add complexity, but for big, dynamic projects, that trade-off is worth it.


I couldn't agree more. Even when you decide to use a dead simple cloud provider (fly.io for example), you still have to spend time to understand:

- the CLI

- the cost model

- the cost optimizations

I'm tired of this endless looping. A cheap VPS + Docker is enough in most cases (and it's even easier to scale/migrate).


It pains me to see that people need a microservice to even start thinking about system architecture. As if the additional database and additional set of classes could not be done on the same instance. And then everyone shrieks in pain when they see the monthly costs :)


Let's create an awesome-simplicity list on GitHub.

A repo for not-hyped, dead-simple solutions to work with.


Discussed at the time:

In defense of simple architectures - https://news.ycombinator.com/item?id=30936189 - April 2022 (189 comments)


LEPP stack is very effective imho - Linux, Nginx, Python, Postgres.

I've solved a lot of the issues he brings up - gevent, no orm - well-known transaction boundaries, etc. It's pretty pleasant to work in.


I'm always interested in comparisons between top adult sites and Netflix.


I don't understand this comment. Can you explain more?


Netflix always writes those engineering blogs about their fancy microservices and infra

Meanwhile adult sites probably run on some boring PHP and serve orders of magnitude more content (harder to cache) while being free and among the top visited sites in the world.


Very much clicked for me.

Loved this line:

"I’ll discuss a few boring things we do that help us keep things boring."


> Our architecture is so simple I’m not even going to bother with an architectural diagram. Instead, I’ll discuss a few boring things we do that help us keep things boring.

Maybe that is why everyone goes with the latest and greatest sophisticated techniques that are popular on the conference circuit. At least it has a diagram.


That is indeed part of the problem, complex architectures sell better. The problem comes when you have to deliver. But you can always blame the team.


"Simple" is an interesting term. I've always found it to be very relative to what someone's actual experiences and knowledge bases are.

For example, virtual machines are something I and probably most other people consider "simple". They're pretty much just little computers. For the most part, I can just work with VMs the same way I would work with computers. No sweat.

My DevOps pedigree, however, also means I consider Docker containers and even some light k8s stuff as pretty "simple" - when these two things feel to other people like massive overcomplications.

On the other hand, I'm pretty new to GraphQL, and it feels to me vastly more confusing than good old fashioned REST API endpoints, where I can crack open the code and actually see exactly how that particular endpoint works. I am mostly alone in my org when it comes to this feeling, because I haven't digested the abstractions that make GraphQL work as well.

I don't really have a good answer then for what simple is beyond "I know it (kinda) when I see it".


Very good and important observation. In his talk "Simple made easy" [1] Rich Hickey defines simple as opposite of complex and easy as opposite of hard.

The easiness is relative (as you described) and depends on the things you are familiar with. For example, Docker containers and k8s stuff is easy (for you), and GraphQL is hard (for you).

The simplicity should be assessed (somehow) more objectively.

[1] https://www.youtube.com/watch?v=SxdOUGdseq4


Even his website follows a simple plain-HTML architecture


Gall's law:

"A complex system that works is invariably found to have evolved from a simple system that worked. A complex system designed from scratch never works and cannot be patched up to make it work. You have to start over with a working simple system."

It's quite funny how little-known this quote is. Imagine all the developers who would be out of work if we had simpler systems. It's hard to admit, but developers get more work when their systems are complicated; call it hostage-taker syndrome.

Software as a thing is pretty recent. It's important to remember it has a large Silicon Valley influence, and software quality goes unregulated thanks to the freedom-of-speech warriors.

The software we have today is just the combination of good government funding and marketing maniacs.


Look, I love http://motherfuckingwebsite.com and think it's hilarious, and I totally agree with the underlying message of cutting out a lot of the shit and just keeping the web simple sometimes. But is it really necessary to have absolutely no style whatsoever? It makes reading this blog post an absolute chore. It's so off-putting that I can't bring myself to read it. Anyone else, or just me?


What's Wave?


https://www.wave.com/en/

Fintech targeting Africa


Is there a Patreon tier that adds CSS margins? If you're gonna beg for handouts, you could at least make the site readable.


Just hit the reader view button.


Wall of text


Reader view


And by "simple architecture" he means one that includes GraphQL, k8s, and a custom network protocol! LMAO, LMFAO!!


Couple thoughts:

> Despite the unreasonable effectiveness of simple architectures, most press goes to complex architectures

We're over-simplifying these terms. What's simple in one way is complex in another way. It makes for great blog posts to use generic terms like "simple" and "complex" for broad-ranging and highly technical topics, but it means nothing. Be specific or you're just tilting at windmills.

> The cost of our engineering team completely dominates the cost of the systems we operate.

Could this be because you decided to invent a new kind of truck, thus requiring lots of costly engineers, when getting two mechanics to weld a crane onto a flatbed would have done the trick?

NIH is rampant in tech. Everybody thinks they need to invent something new when they can't find off-the-shelf parts that do what they want. When in reality, just making their system a bit more complicated in order to accommodate off-the-shelf would have been faster and cheaper. Sometimes more complex is simpler.

Later on in the article Dan mentions build vs buy, but it's not that simple. You can buy and then modify, and you can build in a way that's cheaper and easier, if uglier and more complex. Design and engineering decisions aren't binary. There are more options than just "only use a complete product" vs "build your own everything", and more business and engineering considerations than just time to market and "scale".

> Rather than take on the complexity of making our monolith async we farm out long-running tasks (that we don’t want responses to block on) to a queue.

See, here's the thing with "simple" vs "complex". It's not "simpler" to use a queue than async. Both are complex. Queues may look simple, but they're not. Async may look complex, and it is. You didn't pick simple over complex, you picked something that looked easy over something that looked hard. You made a bet. Which is fine! But let's not pretend this is somehow good design. You just ignored all the complexity you're going to run into later when your queue becomes a problem. Maybe it won't be a problem? But maybe async wouldn't have been a problem at scale either. "Maybe" doesn't equal "good design", so let's not pretend it does (re: the idea that it's simpler and thus better).
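
To make that concrete: here is roughly what the "farm it out to a queue" move looks like as an in-process toy in Go (the article's actual stack is RabbitMQ/Celery; the channel, route, and sendStatementEmail below are all invented stand-ins). Everything the sketch omits (persistence, retries, backpressure, dead letters) is where the real complexity lives:

    package main

    import (
        "log"
        "net/http"
    )

    type task struct{ userID int }

    // Toy stand-in for a real broker like RabbitMQ.
    var queue = make(chan task, 1024)

    // worker drains the queue off the request path.
    func worker() {
        for t := range queue {
            sendStatementEmail(t.userID) // the slow part
        }
    }

    // handler enqueues and returns immediately instead of blocking.
    func handler(w http.ResponseWriter, r *http.Request) {
        queue <- task{userID: 42}          // hypothetical payload
        w.WriteHeader(http.StatusAccepted) // 202: queued, not done
    }

    func sendStatementEmail(id int) { log.Println("emailing user", id) }

    func main() {
        go worker()
        http.HandleFunc("/statements", handler)
        log.Fatal(http.ListenAndServe(":8080", nil))
    }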

> Some choices that we’re unsure about [..] were using RabbitMQ [..], using Celery [..], using SQLAlchemy [..], and using Python [..]. [...] if we were starting a similar codebase from scratch today we’d think hard about whether they were the right choice

This is a sign of a lack of engineering experience. I don't mean that as a diss; I only mean that the correct choice (of these options) is clear to somebody really familiar with all of them. If you don't have that experience, you need to go find someone who does and ask them.

This is one of the best reasons why you don't want to engineer something yourself: you probably aren't the best engineer in the world, and so probably don't know the kinds of problems you are going to run into. However, if you are engineering something yourself and don't know the right answer, you need to go ask somebody (or many somebodies). If you were inventing some new kind of device that nobody had ever invented, maybe nobody knows? But you probably aren't doing that, so somebody does know.

And if for some reason you can't or won't go ask somebody, then you should use the thing that you[r team] is most familiar with and already knows all the pitfalls of, regardless of nearly any other downside from its use. Unexpected problems are a much bigger thing than expected problems.

> By keeping our application architecture as simple as possible, we can spend our complexity (and headcount) budget in places where there’s complexity that it benefits our business to take on.

And this is still good advice. But, again there is no such thing as a binary choice of "simple vs complex". It's all shades of gray.


> our architecture is a standard CRUD app architecture, a Python monolith on top of Postgres

The rest of the article goes into why the above is not the case.

The language was chosen because it was the CTO's pet, not for simplicity's sake. It wasn't the right choice: "its concurrency support, performance, and extensive dynamism make us question whether it’s the right choice for a large-scale backend codebase"

Synchronous/blocking wasn't chosen for simplicity's sake - the async libraries were buggy! To work around the performance issue:

1) a "custom protocol that runs on top of UDP" was written. No thanks.

2) putting work onto a queue. Event-sourcing, anyone?

> we’re having to split our backend and deploy on-prem to comply with local data residency laws and regulations

It's good that the software was a monolith, otherwise it would have been difficult to split apart /s.

Software is a mix of incidental complexity and essential complexity. If you shun certain things as 'too complicated', but they end up being in the essential-complexity bucket, you're just going to be building them yourself, slower and with more bugs.

Imagine how different the article would have been had they picked technology whose concurrency, performance and typing worked in their favour.


Which language with those characteristics will have the same package ecosystem as Python, so that your engineering team isn't left constantly reinventing the wheel? Do you think it's worth rebuilding stable packages that do a job well just because you don't think the language they were written in was perfect?


Python has good library support, but sometimes it seems like Python advocates speak as if it is somehow uniquely equipped with libraries that nobody else has. There are a dozen other languages with well-established libraries for everything mrkeen mentioned, as well as healthy ecosystems that can be expected to support everything else a financial app may reasonably need.

Python wasn't a bad choice, but even several years ago it wouldn't remotely have been the only viable choice, and if you ignore "what the people involved already knew" it probably wasn't the best. In real life you can't ignore that, and it looms large in any real decision, so I'm not proposing that should have been the metric. But Python's disadvantages can be pretty significant. "Dynamic typing" and "deals with money" is a combination that would make me pretty nervous. YMMV.


> But Python's disadvantages can be pretty significant. "Dynamic typing" and "deals with money" is a combination that would make me pretty nervous. YMMV.

This is an interesting comment. I do not disagree. In your opinion, if not Python, then what languages would be acceptable to you?


Starting today: Go, C#, Rust, possibly TypeScript (not my personal choice but probably acceptable), Java (which despite being poorly done in a lot of places does have the tooling necessary for this if used by decent programmers). C++ with a super-strong static analysis tool like Coverity used from day one. (I consider C or C++ plus something like Coverity to be essentially a different language than using just a C or C++ compiler.)

And that's sticking to reasonably mainstream languages. F#, D, Scala, Haskell, Lisp with strong typing around the money parts (or Clojure), probably a couple of others as well. These have problems with having smaller communities and being harder to hire for, but I'd bet that pretty much all of them still have well-supported libraries for nearly everything a financial company would need to do. (And that you could expect general open source libraries for. Super-specific internal financial protocols may not have libraries in general, or may only be in Java, etc.)

Also I make no exclusivity claims that these are the only appropriate languages, so if you've got a favorite statically-typed language feel free to just mentally edit it in without asking me how I could possibly miss it. (I'm not familiar enough with the Objective-C family to have informed opinions about it, for instance.)

Dynamically-typed languages with progressive typing used rigidly around the money might be OK, but progressive typing makes me nervous and really seems like a bodge in a lot of places. Reading people's experiences with them, even when the person writing the experience is nominally positive on the experience, has not generally impressed me.


In my experience writing software that performs financial calculations, it is easy to screw up in any language. I'm not sure that any of those languages provide a great enough advantage specific to financial calculations, considering all of the other baggage. To state the obvious, most apps that perform financial calculations are not performing financial calculations in 99% of the code.


Well, I was as broad as "statically typed" versus "dynamically typed" for a reason. Dynamically typed languages have a hard time enforcing much of anything. Statically-typed languages can do things like "type Pennies int", or create a structure with a value and attached currency type, or create a structure with a value, currency type, and attached timestamp, or even have all of those at once, while making it practical to have functions that never mix any of them up.
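
A minimal Go sketch of what that buys you (all names invented for illustration):

    package main

    import "fmt"

    // Pennies is a count of a currency's minor unit; the compiler
    // will not let a bare int pose as one without an explicit cast.
    type Pennies int64

    // Money pins an amount to a currency so the two travel together.
    type Money struct {
        Amount   Pennies
        Currency string // "USD", "EUR", ...
    }

    // Add refuses to mix currencies instead of silently summing ints.
    func (m Money) Add(o Money) (Money, error) {
        if m.Currency != o.Currency {
            return Money{}, fmt.Errorf("mixed currencies: %s vs %s", m.Currency, o.Currency)
        }
        return Money{Amount: m.Amount + o.Amount, Currency: m.Currency}, nil
    }

    func main() {
        usd := Money{Amount: 1999, Currency: "USD"}
        eur := Money{Amount: 500, Currency: "EUR"}
        if _, err := usd.Add(eur); err != nil {
            fmt.Println(err) // mixed currencies: USD vs EUR
        }
    }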

I'm writing a lot of email-related code right now, porting something out of Perl into Go. This code supports Google-style "+" addressing, where "anemail+something@gmail.com" and "anemail+other@gmail.com" need to be tracked as separate recipients (as defined by the SMTP protocol), but I also have a slate of functions that need to operate on the mailbox, as identified by "anemail@gmail.com". Using some types to keep the two things straight has greatly cleaned up the translated Go code, whereas in Perl it was all just strings, and there are constant repeated snippets of code trying to (inconsistently!) translate between the two on demand, with no compiler support as to whether it is being done correctly. I wouldn't be surprised if this fixes longstanding and hard-to-diagnose bugs simply by the act of imposing a static type system on top of the mass of dynamically-typed goo. I'm fairly sure I've identified at least one place in the old code where it failed to do the conversion, but in dynamically-typed languages it's hard to even be sure about that.
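
The types in question are nothing fancy; roughly this shape (identifiers illustrative, not my actual code):

    package main

    import (
        "fmt"
        "strings"
    )

    // Recipient is the full SMTP address, "+" tag and all.
    type Recipient string

    // Mailbox is the address with any "+" tag stripped; many
    // Recipients map onto one Mailbox. The compiler now refuses
    // to pass one where the other is expected.
    type Mailbox string

    // MailboxOf normalizes "user+tag@host" down to "user@host".
    func MailboxOf(r Recipient) Mailbox {
        s := string(r)
        at := strings.LastIndex(s, "@")
        if at < 0 {
            return Mailbox(s) // malformed; pass through as-is
        }
        local, domain := s[:at], s[at:]
        if plus := strings.Index(local, "+"); plus >= 0 {
            local = local[:plus]
        }
        return Mailbox(local + domain)
    }

    func main() {
        fmt.Println(MailboxOf("anemail+something@gmail.com")) // anemail@gmail.com
    }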

Statically typed languages do not automatically or magically make dealing with currency easier and safer than dynamically typed languages. A static language's compiler will not peep as you pass around an "integer" that represents US pennies, or maybe Euro pennies, or maybe Yen, or maybe Italian Lira as of March 23, 1981. But if you have a need to do so, and a need to do it safely, static languages give you much more rigid and safe tools to do it with.

(You can create a "currency" type in dynamically-typed languages that is a lot smarter than an int, certainly, but you can't use the interpreter to rigidly enforce that it is used correctly, and the various incremental or progressive typing options I've seen generally fail to impress me. The cultural temptation in a dynamically-typed language to allow functions to also just take an int and then "guess" the rest will be overwhelming, based on my experience, and it takes a strong developer lead to push back against people sneaking that in.)

While you're right that only small bits of the code may be doing calculations, a lot of the app may be dealing with money in some sense. In a statically-typed language used correctly, the money values can at least pass through those parts of the system unambiguously, carrying exactly what they are, and the parts of the system doing the calculation don't need to do a lot of defensive programming to ensure that the currency values are actually of the expected type. Code pulling these values out of the database should put them into the correct type, and code displaying them to users should be using this information correctly. Not just things "calculating" with the money, but anything that touches money values, passes them along, serializes or deserializes them: all of it is cleaned up by having a precise specification of what a value is, maintained all the way through the app.


One answer to your question is C# / .NET.

It combines excellent concurrency primitives, very good performance, and a deep reservoir of packages and libraries.

Of course, it's almost never worth switching languages for an existing product; you just end up with a mess of half-old and half-new tooling and invariably get left having to support both languages.


Real question: From your specific comment, are there any advantages of C# vs Java here?


Compared to .NET, Java has a way worse building, packaging, and dependency-management experience, and a worse ORM experience. Making simple single-file executables requires way more ceremony (versus 1-3 toggles in .NET), and sometimes isn't possible at all. There are a lot of small and large papercuts in places where Java simply does worse: interop is more expensive, and reaching the performance ceiling is way more difficult due to the lack of low-level primitives. C# has terser syntax (especially the functional parts) and more lightweight tooling in the SDK; a lot of the ceremony around just making sure your dev environment works doesn't exist in .NET (as long as you don't start putting everything into custom paths). Building a complete product is "dotnet publish" and running an application is "dotnet run", which requires more setup and scripting with Java.


Not the parent, but C# and Java are very similar in their feature sets. One thing that I find interesting about the respective communities is that Java tends to develop JSRs, where a standard is worked on by multiple vendors and published essentially as an interface. Many vendors then provide an implementation of the interface.

In C#, Microsoft often provides the most popular tooling for common problems. For instance, in web development, Microsoft publishes ASP.NET and Entity Framework. In Java there are JAX-RS (JSR 339) and JPA (JSR 317).

So depending on your preference for a strong centralized environment provided by a single vendor, or a decentralized standards based solution with many vendors, you might lean towards Java or C#.


I'm not familiar enough with Java to answer that question, you may well feel the same about Java, and it may well be true.


Honestly, IMO Python does a good job these days without engineers having to reinvent the wheel. I've used asyncio with Starlette and/or FastAPI in larger-volume systems without many issues. I'm curious what specific issues the author had with async Python, and when that was, since I don't see any such issues in my experience.


There's a weird belief that monoliths are a 'simple' architecture and micro services are 'complicated'.

Microservices are simple. Request handlers or queue listeners, transformation and data layers, caching, database connection pools, clients to other systems. They're easy.
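
As a sketch of just how little there is to one; a whole toy service in Go (the driver, DSN, and endpoint are placeholders):

    package main

    import (
        "database/sql"
        "encoding/json"
        "log"
        "net/http"

        _ "github.com/lib/pq" // assumed Postgres driver
    )

    func main() {
        // Connection pool: database/sql pools for you.
        db, err := sql.Open("postgres", "postgres://localhost/app?sslmode=disable")
        if err != nil {
            log.Fatal(err)
        }

        // Request handler + data layer, and that's the whole service.
        http.HandleFunc("/users/count", func(w http.ResponseWriter, r *http.Request) {
            var n int
            if err := db.QueryRow("SELECT count(*) FROM users").Scan(&n); err != nil {
                http.Error(w, err.Error(), http.StatusInternalServerError)
                return
            }
            json.NewEncoder(w).Encode(map[string]int{"count": n})
        })

        log.Fatal(http.ListenAndServe(":8080", nil))
    }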

Monoliths are where you come across some of the most godawful complicated architectures. Unified object-relational mapping layers. Shared cache systems with per-value QoS support. Hacked runtime hooks that tweak the garbage collection or memory management for particular use cases. Homegrown monitoring stacks and instrumentation systems. External configuration management tools. Custom build tool integrations with Jira.

And the upgrade slogs... when the database engine needs to be updated, or a new version of the underlying web technology is released.

Simple to build and simple to add to are not the same thing as 'simple'.


A monolith is fine if you have a fairly simple process for manipulating your data. Like posting an article, then a comment, with some moderation thrown in.

But when you start adding business rules, and start transforming your data and moving it around, then your monolith will become too complex and often too expensive to run. Lots of moving parts tightly coupled together, long-running transaction wrapping multiple joined tables etc. Rolling out new features will become very challenging as well.

Something like event sourcing is more complex to set up upfront than a monolith, but at least it offers a way to add scale and features later without creating a combinatorial explosion...


Ironically, business rules are often much easier to implement in a monolith, since they tend to require access to basically the entire database and have impact across your code base.

Not saying it needs to be spaghetti all over the code mind you. Just that it's easier to have a module within the monolith rather than a dedicated service.


Especially since it must be fun hunting down race conditions across microservices. Like, microservices throw away every benefit of a single code base in a single language (and its guarantees). They sometimes make sense, but arguably that "sometimes" is quite rare.


The conversation in this (overall) thread is dancing around issues of experience, competence, and maturity. And the age ceiling forcefully pushed by people like Paul Graham of this very HN. When your entire engineering team are "senior developers" with 3 years of experience (lol) and most don't even know what a "linker" does, the fundamental feature of the wunder architecture is obscure and not understood.

Building effective monoliths absolutely demands competent DB experts, schema designers, and developers. The problem that microservices solved was the scarcity of this sort of talent when demand overshot supply by orders of magnitude.

(In a way, the monolith vs microservices debates are echoes of the famous impedance mismatch between object graph runtimes and relational tables and DBMSs.)


Why do we need "scale"? The 2nd cheapest Hetzner offering can probably serve a basic CRUD app to a hundred thousand people just fine, with the DB running on the same machine. And you can just buy a slightly more expensive machine if you need more scale; horizontal scaling is very rarely actually necessary.

Stack Overflow runs on only a couple of (beefy) machines.


This is a fallacy. Adding a network boundary does not make your application less complex. If you can't make a "monolith" make sense, thinking you can do it in a microservices architecture is hubris. If you think long-running transactions / multiple tables are difficult, try doing that in a distributed fashion.


One of the main "problems" with proposing microservices is that, trivially, there is nothing a microservice can do that cannot be done by a monolith designed with discipline. Over the years my monoliths have grown to look an awful lot like a lot of microservices internally, except that they can still benefit from passing things around internally rather than over networks.

(Meaning generally that performance-wise, they clean the clock of any microservice-based system. Serializing a structure, shipping it over a network with compression and encryption, unserializing it on the other end, performing some operation, serializing the result, shipping it over the network with compression and encryption, deserializing the result, and possibly having to link it back up to internal data structures finds it hard to compete with "the data is already in L1, go nuts".)

I've even successfully extracted microservices from them when that became advantageous, and it was a matter of hours, not months, because I've learned some pretty solid design patterns for that.
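
The core pattern is nothing exotic: callers depend on an interface, and "extraction" is swapping the implementation behind it. A hypothetical Go sketch (names and endpoints invented):

    package payments

    import (
        "database/sql"
        "fmt"
        "net/http"
        "strings"
    )

    // Callers inside the monolith only ever see this interface.
    type PaymentService interface {
        Charge(userID int, cents int64) error
    }

    // In-process implementation: no network, data close at hand.
    type localPayments struct{ db *sql.DB }

    func (p *localPayments) Charge(userID int, cents int64) error {
        _, err := p.db.Exec(
            "INSERT INTO charges (user_id, cents) VALUES ($1, $2)",
            userID, cents)
        return err
    }

    // Extracted implementation: same interface, now over HTTP.
    // No caller changes; only the wiring at startup does.
    type remotePayments struct{ baseURL string }

    func (p *remotePayments) Charge(userID int, cents int64) error {
        body := fmt.Sprintf(`{"user_id":%d,"cents":%d}`, userID, cents)
        resp, err := http.Post(p.baseURL+"/charge", "application/json",
            strings.NewReader(body))
        if err != nil {
            return err
        }
        defer resp.Body.Close()
        if resp.StatusCode != http.StatusOK {
            return fmt.Errorf("charge failed: %s", resp.Status)
        }
        return nil
    }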

If you can't design a well-structured monolith, you even more so can't design a microservice architecture.

It's not wise to try to learn too many lessons about what is good and bad from undisciplined, chaotic code bases. Chaos can be imposed on top of any nominal architecture. Chaotic microservices are not any more fun than a chaotic monolith; they're just unfun in a different way. The relevant comparison is a well-structured monolith versus a well-structured microservice architecture, and that's a much more nuanced question.


A few comments point out that replacing a monolith with microservices doesn't reduce complexity. I agree 100%.

That's why I mentioned the Event Sourcing pattern, not "microservices". Think of a single event log as a source of truth where all the data goes, with many consumer processes working in parallel alongside it, picking only those events (and the embedded data) that concern them, reacting to them, then passing them on without knowing what happens later. Loosely coupled, small, self-sufficient components that you can keep adding one next to another without increasing the complexity of the overall system.
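
In sketch form (Go; event kinds and payloads invented):

    package main

    import "fmt"

    // One append-only log is the source of truth.
    type Event struct {
        Kind string // "OrderPlaced", "PaymentReceived", ...
        Data string
    }

    // Each consumer replays the same log, reacts only to the
    // kinds that concern it, and ignores everything else.
    func billing(log []Event) {
        for _, e := range log {
            if e.Kind == "OrderPlaced" {
                fmt.Println("billing: invoice for", e.Data)
            }
        }
    }

    func shipping(log []Event) {
        for _, e := range log {
            if e.Kind == "PaymentReceived" {
                fmt.Println("shipping: dispatch", e.Data)
            }
        }
    }

    func main() {
        log := []Event{
            {"OrderPlaced", "order-1"},
            {"PaymentReceived", "order-1"},
        }
        // Adding a new consumer touches nothing above.
        billing(log)
        shipping(log)
    }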

Maybe Event Sourcing/CQRS can be called "microservices done right", but that's definitely not those microservices (micro-monoliths?) everyone is talking about.


ES has the potential but is too immature of a pattern to be simple. It’s a shame, but let’s not pretend.

For instance, an immutable event log is illegal in many cases (PII). So you have to either do compaction on the log or use an outside mutable store.

Another issue is code evolution: if you change your event-processing logic at runtime, you get a different state when you replay the log. Maybe some users or orders will not be created at all. How do you deal with that? Prevent it with tooling/testing, or generate new events for internal actions?

Also, all the derived state is eventually consistent (so far so good) but for non-toy apps you absolutely need to use derived state to process events, which naively breaks determinism (now your event processing depends on the cursor of the derived state).

Check out Rama [1]. They're solving this problem, and it's super interesting, but again, let's not fool ourselves: we're far from mature and boring now.

Something like it could hopefully become boring in the future. Many of these features could probably be simplified or skipped entirely in later iterations of these patterns.

[1]: https://redplanetlabs.com/learn-rama


"passing it on not knowing what happens later" often is fundamentally not acceptable - you may need proper transactions spanning multiple things, so that you can't finalize your action until/unless you're sure that the "later" part was also completed and finalized.


An individual component participating in a complex operation spanning multiple steps indeed knows nothing about the grand scheme of things. But there will be one event consumer component specifically charged with following the progress of this distributed transaction (aka saga pattern).
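
That coordinator can itself be just another event consumer. A hypothetical Go sketch (event kinds invented):

    package main

    import "fmt"

    type Event struct{ Kind, OrderID string }

    // The saga coordinator is one more event consumer, but one that
    // tracks the progress of a single logical transaction.
    type orderSaga struct {
        paid, shipped bool
        emit          func(Event) // appends back onto the log
    }

    func (s *orderSaga) on(e Event) {
        switch e.Kind {
        case "PaymentCaptured":
            s.paid = true
        case "Shipped":
            s.shipped = true
        case "PaymentFailed":
            // Compensating action instead of a rollback.
            s.emit(Event{"CancelOrder", e.OrderID})
            return
        }
        if s.paid && s.shipped {
            s.emit(Event{"OrderCompleted", e.OrderID})
        }
    }

    func main() {
        s := &orderSaga{emit: func(e Event) { fmt.Println("emit:", e.Kind, e.OrderID) }}
        s.on(Event{"PaymentCaptured", "order-1"})
        s.on(Event{"Shipped", "order-1"}) // emit: OrderCompleted order-1
    }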


> But when you start adding business rules, and start transforming your data and moving it around, then your monolith will become too complex and often too expensive to run. Lots of moving parts tightly coupled together, long-running transaction wrapping multiple joined tables etc. Rolling out new features will become very challenging as well.

Absolutely NONE of that has to happen if you structure your project/code well.


Most CRUD software is little more than the simple processes you describe. At least intrinsically. And if it isn't, it started off that way.



