...and everyone knows code has nothing to do with algorithms and data structures...

scarface74 · on April 13, 2019

Well, if you’re doing yet another software as a service CRUD app or another bespoke app that will never be seen outside of a company - like most developers - knowing:

Given a binary tree, return the level order traversal of its nodes' values. (ie, from left to right, level by level).

Isn’t that useful or how to invert a binary tree.

I would much rather you show some competence in the language we are using. Knowing leetCode isn’t going to help if we need an iOS app....

klibertp · on April 13, 2019

Trees - although not necessarily binary ones - are everywhere. If you don't know about them, your CRUD will explode if objects can form a multi-level hierarchy. Don't make too light of CRUD apps - there's complexity there, too.

vips7L · on April 13, 2019

Everything you're saying is true, but for some reason companies still think they need to test you on hackerrank/leetcode like problems. I would much prefer to do some code in the frameworks I claim to know.

scarface74 · on April 14, 2019

Not companies I interview for. The last time I had any type of algorithm type interview was 1999 when I was applying for a job as a low level cross platform C bit twiddler. Since then all of my interviews have been a combination of soft skill, tell us about your experience, white board architectural discussions type of interviews. Of course they asked me technical questions about the language and stack they were using.

pjmlp · on April 13, 2019

Depends very much on what that CRUD app is doing.

Knowing algorithms and data structures pretty well, is the difference between an update button taking a couple of minutes, or milliseconds.

scarface74 · on April 13, 2019

And I would even say that’s not true most of the time. Why is the update button slow?

- Is your customer in Asia and your servers are in us-east-1? Do we need a multi master database one in each region? Can we make even that faster by doing an eventually consistent write?

- do we really need a synchronous update process or can we use queues to make it more consistent?

- is our web server slow? Should we scale horizontally or vertically? Should we use autoscaling and if so which metric should we use? What should our cooldown time be between autoscaling events? Do we need to autoscale across regions? Where is our traffic coming from?

- is our database indexed properly? Did we look at our slow query logs? Did someone do something stupid like have triggers on our database unnecessarily? Is an RDMS the right choice? Do we need to denormalize the table?

- or is it our code?

This is the thought process my manager was looking for when he interviewed me. Not the best way to traverse a tree.

pjmlp · on April 13, 2019

Answer 5. Update implemented as O(n!) for the input dataset graph because the coder didn't knew any better.

scarface74 · on April 13, 2019

But with a complex system. How would you know where the bottleneck is if you don’t know how to instrument your entire system and how would you know the possible solutions?

pjmlp · on April 13, 2019

By doing a proper architecture design and data analysis before writing a single line of code, instead of coding away without any sense of direction.

Delivering a good result out of that process, requires knowing algorithms and data structures tailored at the problem space.

scarface74 · on April 13, 2019

Right. Like that’s going to happen in an environment where shipping code on two weeks sprints is expected. Even in your perfect world where this does happen, it’s not like you could possible know what type of bottlenecks or usage patterns will happen until you get real users using your code.

Are you suggesting we go back to a waterfall approach and not get fast feedback and learn what works as you are developing?

pjmlp · on April 13, 2019

It does happen in two weeks sprints, that is what refinement planning, spike stories and research sprints are all about.

And in case you missed, the large majority of companies that actually moved into agile, nowadays are doing what we could call scrum-waterfall.

Plenty of bottlenecks and usage patterns are already clear from reading the RFP documents and preparing the respective sales pitch offer.

Surely if one codes away without thinking about overall system architecture, like the TDD proponents, then these problems aren't possible to predict.

scarface74 · on April 14, 2019

So within those two weeks while you are “researching” how are you going to know real usage patterns with real users? Is your research going to perfectly predict where all of the bottlenecks and optimizations need to be in the entire system?

Are you going to perfectly predict the size and number of VMs that you need? The size of the database? Where your users are and the average latency? Are none of your developers going to mistakes that aren’t apparent until you are running at scale?

There is more to architecting a system than just “code”.

pjmlp · on April 14, 2019

Sure there is more to "architecting a system than just “code”.", which is exactly my whole point.

Performance is a feature, it doesn't get retrofitted. There is only one shot, specially in fixed budget projects.

While a perfect design is an utopia, and there will be surely some unforeseen problems, not designing at all is even worse.

To calculate the initial set of VMs, database size, average users, network latency, you name it, it only requires reading the RFP requirements, having technical meetings with all partners about those requirements, and having a team that knows their stuff around CS.

If it already clear from deployment scenario that at very least 4 VMs will be needed, or that a DB node will need 100 GB on average, it would be very risky just to do on the go.

As for running at scale, that should already be obvious from RFP requirements, unless we are speaking about startups dreaming of being the next FANNG.

MongoDB is a very good example of running at scale without doing the necessary engineering, but they do have a good marketing department to compensate for it.

scarface74 · on April 14, 2019

Performance is a feature, it doesn't get retrofitted. There is only one shot, specially in fixed budget projects.

So you’re saying it’s not possible to add indexes, increase the size of your database, increase the number of read replicas, increase the number of servers in your web farm, reconfigure your database to be multi-master, copy your static assets to a region closer to the customer or add a CDN after an implementation? I must be imagining things that I’ve been doing with AWS....

To calculate the initial set of VMs, database size, average users, network latency, you name it, it only requires reading the RFP requirements, having technical meetings with all partners about those requirements, and having a team that knows their stuff around CS.

So “knowing CS” would have helped us predict at one of my previous companies that our customer was going to more than double in size in less than a year through an acquisition? In fact this has happened at two separate companies. The other company we more than doubled in size and revenue literally overnight.

Will “good CS design” help us predict how successful our sales team will be in closing deals? We are a SAAS B2B company where one “customer” or new implementation from a current customer can increase our volume of transactions by enough to have to increase the number of app servers or with enough implementations increase the size of our database cluster.

If it already clear from deployment scenario that at very least 4 VMs will be needed, or that a DB node will need 100 GB on average, it would be very risky just to do on the go.

So now it’s “risky” to click on a button and increase the size of our web farm by increasing the desired number of servers in our autoscaling group or is it risky to click on another button and increase the size of the VMs in our database cluster? The number of app servers we have for one process goes from 1 to 20 automatically based on the number of messages in the queue. As far as storage space, if we need a terabyte as our client base grows instead of 100GB, I’m sure AWS has some spare hard drives laying around that they can give us. But transparently adding space to a SAN even on prem has been a solved problem for a long time. Even back at a previous company where we would boast to our client that we had a whole terabyte of storage space.

As for running at scale, that should already be obvious from RFP requirements, unless we are speaking about startups dreaming of being the next FANNG

Again, you don’t have to dream of being the “next FAANG”. Mergers and acquisitions happen. Getting new clients happen (hopefully). When you are a B2B company especially when you are a SAAS B2B company with a decent sales team, a sale to a couple of “whales” can mean adding more of everything.

Also the RFP is not going to tell you that your company that you are planning on implementing and hosting a solution for x users will need to be able to scale to handle 2x in a year after a merger closes. Should we have 5 or 10x the capacity now in anticipation for our sales team producing or should we scale up as needed?

MongoDB is a very good example of running at scale without doing the necessary engineering, but they do have a good marketing department to compensate for it.

I had no problem with the scalability of Mongo at a previous company. What type of scale do you think in your experience is too much for Mongo?

pjmlp · on April 14, 2019

Life is beautiful when one does time-and-material projects.

Every half backed release can always be improved later, at customer expenses.

Likewise, not everyone is doing button clicks on AWS to scale their compute center, and a proper knowledge of distributed systems is required in order to do correct scaling.

Mongo DB problems are well known across the interwebs.

I am not going to change your mind, nor you will change mine, so lets leave it here.

scarface74 · on April 14, 2019

Every half backed release can always be improved later, at customer expenses.

So now, it’s an “improvement at customer expense” to add servers and increase the size of servers? How long do you think it takes to do everything I listed to add scale? When I say it’s logging into a website and clicking a few buttons, I am not exaggerating. Of course, in the modern era you modify a cloud formation template but that’s an implementation detail.

Likewise, not everyone is doing button clicks on AWS to scale their compute center, and a proper knowledge of distributed systems is required in order to do correct scaling.

Whether you are button clicking on AWS or using a data center, adding resources is the same. Increasing the size of your primary and secondary databases is the same on prem. It takes more effort and the turn around time is higher to provision resources, but it’s not magic. Everything I listed except for the CDN is something I’ve worked for a team that did on prem. I’m sure a lot of people can pipe in and say they have done similar things on prem or at s colo with Kubernetes and Docker. But that’s outside of my area of expertise.

Mongo DB problems are well known across the interwebs.

I am asking about your personal experience not what you “read on the internet”.