But with a complex system, how would you know where the bottleneck is if you don’t know how to instrument your entire system, and how would you know the possible solutions?
Right. Like that’s going to happen in an environment where shipping code in two-week sprints is expected. Even in your perfect world where this does happen, it’s not like you could possibly know what types of bottlenecks or usage patterns will appear until you get real users using your code.
Are you suggesting we go back to a waterfall approach and not get fast feedback and learn what works as you are developing?
So within those two weeks while you are “researching”, how are you going to know real usage patterns with real users? Is your research going to perfectly predict where all of the bottlenecks and optimizations need to be in the entire system?
Are you going to perfectly predict the size and number of VMs that you need? The size of the database? Where your users are and the average latency? Are none of your developers going to make mistakes that aren’t apparent until you are running at scale?
There is more to architecting a system than just “code”.
Sure, there is more to “architecting a system than just code”, which is exactly my whole point.
Performance is a feature; it doesn’t get retrofitted. There is only one shot, especially in fixed-budget projects.
While a perfect design is a utopia and there will surely be some unforeseen problems, not designing at all is even worse.
Calculating the initial set of VMs, database size, average users, network latency, you name it, only requires reading the RFP requirements, having technical meetings with all partners about those requirements, and having a team that knows their stuff around CS.
If it is already clear from the deployment scenario that at the very least 4 VMs will be needed, or that a DB node will need 100 GB on average, it would be very risky to just work that out on the go.
As for running at scale, that should already be obvious from the RFP requirements, unless we are speaking about startups dreaming of being the next FAANG.
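As a rough illustration of what that kind of up-front sizing looks like, here is a back-of-envelope sketch; every input below is an assumption for the sake of example, not a number from any real RFP:

```python
# Back-of-envelope capacity estimate from assumed RFP figures.
# All inputs below are illustrative assumptions, not real requirements.

concurrent_users = 5_000          # peak concurrency stated in the RFP
requests_per_user_per_min = 6     # assumed interaction rate
vm_capacity_rps = 150             # requests/second one VM sustains in load tests
avg_record_bytes = 2_048          # average size of one stored record
records_per_user = 10_000         # retention assumption over the contract period

peak_rps = concurrent_users * requests_per_user_per_min / 60
vms_needed = -(-peak_rps // vm_capacity_rps)          # ceiling division
db_bytes = concurrent_users * records_per_user * avg_record_bytes

print(f"peak load  : {peak_rps:.0f} req/s")
print(f"VMs needed : {int(vms_needed)} (before headroom and redundancy)")
print(f"DB storage : {db_bytes / 1e9:.0f} GB before indexes and replication")
```

With those assumed inputs the arithmetic lands on roughly 4 VMs and about 100 GB of storage, which is exactly the kind of figure you want on paper before the first deployment, not after.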
MongoDB is a very good example of running at scale without doing the necessary engineering, but they do have a good marketing department to compensate for it.
Performance is a feature; it doesn’t get retrofitted. There is only one shot, especially in fixed-budget projects.
So you’re saying it’s not possible to add indexes, increase the size of your database, increase the number of read replicas, increase the number of servers in your web farm, reconfigure your database to be multi-master, copy your static assets to a region closer to the customer, or add a CDN after an implementation? I must be imagining the things I’ve been doing with AWS...
Calculating the initial set of VMs, database size, average users, network latency, you name it, only requires reading the RFP requirements, having technical meetings with all partners about those requirements, and having a team that knows their stuff around CS.
So “knowing CS” would have helped us predict, at one of my previous companies, that our customer was going to more than double in size in less than a year through an acquisition? In fact, this has happened at two separate companies. At the other company, we more than doubled in size and revenue literally overnight.
Will “good CS design” help us predict how successful our sales team will be in closing deals? We are a B2B SaaS company where one “customer”, or a new implementation from a current customer, can increase our transaction volume enough that we have to add app servers, or, with enough implementations, increase the size of our database cluster.
If it is already clear from the deployment scenario that at the very least 4 VMs will be needed, or that a DB node will need 100 GB on average, it would be very risky to just work that out on the go.
So now it’s “risky” to click a button and increase the size of our web farm by raising the desired number of servers in our autoscaling group, or is it risky to click another button and increase the size of the VMs in our database cluster? The number of app servers we have for one process goes from 1 to 20 automatically based on the number of messages in the queue. As far as storage space goes, if we need a terabyte instead of 100 GB as our client base grows, I’m sure AWS has some spare hard drives lying around that they can give us. And transparently adding space to a SAN, even on prem, has been a solved problem for a long time, even back at a previous company where we would boast to our clients that we had a whole terabyte of storage space.
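To be concrete, each of those operations is an API call (or a console click) away after launch. Here is a minimal boto3 sketch; the instance identifiers and sizes are hypothetical:

```python
# Sketch of post-launch database scaling on AWS using boto3.
# The identifiers ("app-db", "app-db-replica-1") and instance class are made up.
import boto3

rds = boto3.client("rds")

# Move the primary database to a larger instance class.
rds.modify_db_instance(
    DBInstanceIdentifier="app-db",
    DBInstanceClass="db.r6g.2xlarge",
    ApplyImmediately=True,
)

# Add a read replica to take read traffic off the primary.
rds.create_db_instance_read_replica(
    DBInstanceIdentifier="app-db-replica-1",
    SourceDBInstanceIdentifier="app-db",
)
```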
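For reference, those “risky” button clicks amount to something like this; the names, counts, and volume ID are made up for illustration:

```python
# Sketch: growing a web farm and a data volume after launch.
# Auto scaling group name, sizes, and volume ID are hypothetical.
import boto3

autoscaling = boto3.client("autoscaling")
ec2 = boto3.client("ec2")

# Raise the floor, ceiling, and desired size of the web farm's auto scaling group.
autoscaling.update_auto_scaling_group(
    AutoScalingGroupName="web-farm-asg",
    MinSize=4,
    MaxSize=40,
    DesiredCapacity=20,
)

# Grow a 100 GB data volume to 1 TB online; the filesystem is extended afterwards.
ec2.modify_volume(VolumeId="vol-0123456789abcdef0", Size=1024)
```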
As for running at scale, that should already be obvious from the RFP requirements, unless we are speaking about startups dreaming of being the next FAANG.
Again, you don’t have to dream of being the “next FAANG”. Mergers and acquisitions happen. Getting new clients happens (hopefully). When you are a B2B company, especially a B2B SaaS company with a decent sales team, a sale to a couple of “whales” can mean adding more of everything.
Also, the RFP is not going to tell you that the solution you are implementing and hosting for x users will need to scale to handle 2x within a year of a merger closing. Should we have 5 or 10x the capacity now in anticipation of our sales team producing, or should we scale up as needed?
MongoDB is a very good example of running at scale without doing the necessary engineering, but they do have a good marketing department to compensate for it.
I had no problem with the scalability of Mongo at a previous company. What type of scale, in your experience, is too much for Mongo?
Life is beautiful when one does time-and-material projects.
Every half-baked release can always be improved later, at the customer’s expense.
Likewise, not everyone is doing button clicks on AWS to scale their compute center, and proper knowledge of distributed systems is required in order to scale correctly.
MongoDB’s problems are well known across the interwebs.
I am not going to change your mind, nor will you change mine, so let’s leave it here.
Every half-baked release can always be improved later, at the customer’s expense.
So now it’s an “improvement at the customer’s expense” to add servers and increase the size of servers? How long do you think it takes to do everything I listed to add scale? When I say it’s logging into a website and clicking a few buttons, I am not exaggerating. Of course, in the modern era you modify a CloudFormation template, but that’s an implementation detail.
Likewise, not everyone is doing button clicks on AWS to scale their compute center, and proper knowledge of distributed systems is required in order to scale correctly.
Whether you are button clicking on AWS or using a data center, adding resources is the same. Increasing the size of your primary and secondary databases is the same on prem. It takes more effort and the turnaround time to provision resources is longer, but it’s not magic. Everything I listed except for the CDN is something I’ve done on a team that ran on prem. I’m sure a lot of people can pipe in and say they have done similar things on prem or at a colo with Kubernetes and Docker, but that’s outside of my area of expertise.
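And the CloudFormation route is barely more work. A rough sketch of what “modify a template” means in practice, with a made-up stack name and parameters (this assumes the stack already exposes those parameters):

```python
# Sketch: bumping capacity by updating an existing CloudFormation stack in place.
# Stack name and parameter names are hypothetical.
import boto3

cloudformation = boto3.client("cloudformation")

cloudformation.update_stack(
    StackName="app-prod",
    UsePreviousTemplate=True,
    Parameters=[
        {"ParameterKey": "WebDesiredCapacity", "ParameterValue": "20"},
        {"ParameterKey": "DbInstanceClass", "ParameterValue": "db.r6g.2xlarge"},
    ],
    Capabilities=["CAPABILITY_IAM"],
)
```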
MongoDB’s problems are well known across the interwebs.
I am asking about your personal experience, not what you “read on the internet”.