I'm the senior dev on my team, and whenever a new dev joined my team they would look at the codebase and go "ew, python2? Just use python3."
That gave me a chance to explain the testing and refactoring cost that would come with changing python versions, and how the benefits to users would be almost zero. And then at some point one of the new juniors said, "hey, there's a lot of filesystem performance improvements and readability improvements (f-strings) in 3. I can get the test/refactor done in a month, and I think it's a net positive." They were right, and now we're on python3.
The reverse of this also happens: a new team manager joins a team of 4-5 devs and goes "eww... a monolith, we'll write an MVP in 3 weeks with microservices, CQRS and all".
Long story short, a year and a half passes, the MVP is still not finished, the architect leaves the company, and some poor guys (from an outsourcing company) are still going at it with the same architecture.
CQRS and microservices have so much overhead; I'm amazed at how many companies adopt microservices without anyone on staff knowing a single thing about distributed systems. I think people underestimate the glue required to make microservices useful. I had a similar situation at my last company, where they spent a year and a half converting their monolith into half-assed microservices and it still wasn't working properly. They also stopped releasing new features on their product during the entire conversion, which obviously led to a drop in customers. The mass exodus of employees at the end was something beautiful, though.
I would argue that the main reason for adoption of microservices is actually to prolong development time (moar dollars) and make devops as convoluted as possible to safeguard data. Or foolishness.
I don't understand the hate for microservices on hn. I have found microservices are a great way to extend monoliths developed with legacy frameworks. There are so many pros with the cons easily avoidable with the right tooling / processes.
I just started working on a new project about half a year ago, completely greenfield. The backend developer (there is only one...) jumped into microservices directly, deploying on AWS Fargate and trying to split out as many things into containers as possible, because that's the "proper" way of doing it. We still have few users as the project is new, but hey, at least we could scale to 100,000s of users in a few minutes, instead of having easier logging, debugging and versioning.
A lot of stuff in software engineering is just driven by what's popular, not what's needed.
A monolith can handle millions of daily unique users. The database is the hard part.
In the web world, if you have less than 10s of millions of daily users, as long as you design an application that holds no state itself, the architecture is usually more important for scaling your team and the size of your codebase rather than the number of users you can handle.
I've been in discussions about internal applications with a few hundred users and very moderate amounts of data being shuffled and stored, and some people, of various backgrounds, are just so convinced that we need to go all in on Kubernetes, microservices, Istio etc.
And all I can think about is "Hey, you could build this with a small team as a simple monolith and with proper caching you could probably run this on one or two raspberry pi, that is the amount of power you actually need here".
Don't get me wrong, I do think they absolutely have their place. In other parts of the company we have much larger software development projects that are making great use of microservice architectures and Kubernetes and are getting a lot out of it. But that is 100+ teams building a product portfolio together.
> And all I can think about is "Hey, you could build this with a small team as a simple monolith and with proper caching you could probably run this on one or two raspberry pi, that is the amount of power you actually need here".
If your product has a global audience who needs to CRUD stuff, caching and a single raspberry pi won't get you very far in the game.
If it's just intranet stuff with a few hundred users and your front-end isn't very chatty, then you're right, you don't need much.
If you don't pile abstractions on top of abstractions on top of kubernetes on top of docker, you'd be surprised how much you can do with a single small-sized instance.
I'm not talking about performance overhead, I'm talking about architecture overhead. Kubernetes doesn't have any advantages over not using Kubernetes for simple applications. Reproducible dev environments and CI you can get easily without Kubernetes, without having to add complex solutions for logging, profiling and other introspection tools.
One could argue that you need reproducible dev environments and CI to be solved BEFORE you even start using Kubernetes.
> Kubernetes doesn't have any advantages over not using Kubernetes for simple applications.
So out-of-the-box support for blue/green deployments and fully versioned deployment history with trivial undo/rollbacks are of no advantage to you?
> Reproducible dev environments and CI you can get easily without Kubernetes, without having to add complex solutions for logging, profiling and other introspection tools.
I'd like to hear what you personally believe is a better alternative to kubernetes.
And by the way, Kubernetes neither provides nor requires distributed tracing tools or "logging, profiling, and other introspection tools". That's something entirely different and separate, and something you only use if for some reason you really want to and make it a point to go out of your way to adopt.
In fact, distributed tracing is a thing not because of Kubernetes but because you're operating a distributed system. If you designed a distributed system and got it up and running somewhere else, you would still end up with the same challenges and the same requirements.
Apparently HackerNews used to run from a single application server + Cloudflare caching, but later moved away from Cloudflare. Unsure how many web servers it runs on now. [0]
In 2016, Stack Overflow ran on 11 IIS web servers, but they really only needed 1.
At least that is what I often think when I hear people describing micro-services. If there is no data sharing between computations, then the problem is embarrassingly parallel [1] and thus easy to scale. The problem is not the monolith, it is the data sharing, which micro-services only solve if each service owns its own data. To be fair, people advocating micro-services also often argue that services should own their data, but in quite a few of the instances I've heard described, this is not the case.
> With microservices, you lose the ability to use database transactions across services/tables. Later on, when people realize this, it's too late.
I don't follow your reasoning. I mean, the ACID vs BASE problem you described is extensively covered in pretty much any microservices 101 course or even MOOC, along with other basic microservices tradeoffs such as the distributed tax, and ways to mitigate or eliminate these issues, like going with bounded contexts. Why do you believe this is a mystery that no one is aware of?
I've seen a couple of projects where the original team didn't think they needed transactions, or underestimated how much harder eventually consistent distributed systems are to reason about than an ACID system.
It doesn’t matter if you have transactions or not. Just make sure you execute things in the right order! Same for referential integrity, if you just make sure nothing ever goes wrong, there’s no need for it! /s
Sequences of interactions with the DB which are either atomically committed to on success of the sequence, or rolled back on failure, so that the DB is in the same state it had before the interactions began.
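A minimal concrete sketch of that behaviour with a single database, using Python's stdlib sqlite3 (the table and amounts are made up for illustration):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
    conn.commit()

    try:
        with conn:  # the connection as a context manager = one transaction
            conn.execute("UPDATE accounts SET balance = balance - 70 WHERE name = 'alice'")
            # ...the matching credit to bob would go here, but something fails first:
            raise RuntimeError("crash mid-sequence")
    except RuntimeError:
        pass

    # alice still has 100: the partial debit was rolled back, the DB is in its prior state.
    print(dict(conn.execute("SELECT name, balance FROM accounts")))

Once those two rows live behind two different services with two different stores, there is no single "with conn:" to lean on; you're into sagas and compensating actions instead.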
One of the fundamental laws of microservices is that each service is responsible for storing its own state. Nothing else is allowed access to its backing store.
Given that, once you have more than one service in your architecture, you cannot coordinate transactions across the distinct storage mechanisms - they are distinct databases and are therefore subject to the CAP theorem and other complexities of distributed computing.
Of course, nothing's stopping you from putting all your features into one service to avoid this thorny problem. It's a wise approach for most of us.
When you do that you have a monolithic architecture, not a microservice one.
If your application is stateless following your definition, then it's trivial to break the monolith down into microservices that each focus on a bounded context while sharing a common DB. This is a very well known microservices pattern.
Yes, and what does that buy you? Instead of a single deployable unit that owns the database, there are many deployable units, none of which can own the database. That doesn't seem like an improvement.
Martin Fowler lists the benefits of microservices as:
• Strong module boundaries
• Independent deployment
• Technology diversity
And with a single database you severely limit the benefits from strong module boundaries and independent deployments. Suddenly you have to synchronize deployments, and anyone can pull or change any data in the database which now must be enforced with discipline which gets you into the same boat as a monolith.
You can still wind up with distributed state errors. E.g.: a user requisitions a billable resource, and the request hits node A. The user immediately terminates the billable resource, and that request hits node B. The requisition request has a long latency (and the termination request does not). So in the billing system, there is a negative time associated with the resource.
Our initial prototype was a cluster-based approach, and once we had a better estimate of user growth and resources we moved to a monolith for all the reasons you cite.
It’s not the log writing that’s hard; it’s the log insight extraction that is. When you compare poring over logs from multiple invocations, even with request identifiers, the tooling is worse than tooling to look over stack traces/core dumps.
A calls B, C, and D. B puts an item on a queue that eventually causes a call to C. Both B and C call D. Which D call is slow/erroring? Is it the AD, the ABD, the ABqCD, or the ACD call?
No, contrarian is a good description of the problem. We see a lot of posters complaining loudly about problems they never had or tools they never used. Some appear to simply enjoy arguing for the sake of it.
Having heard mainly "nothing but guff" justifications for microservices in non-enormous orgs, I think (as usual, and some law I forget dictates) the latter is more likely.
Why do people always associate cqrs with microservices?
I have a project where only 3 developers work; we redesigned it to be CQRS. I was sceptical at first - I especially don't like some of the boilerplate it creates - but I'm now sold on it by combining it with the mediator pattern. Now I can have validation, logging and performance checks on every command and query without repeating code; it works like a middleware.
And ofc, being only 3 devs, it would be nuts to have microservices, so we are happy with a CQRS monolith.
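Roughly the shape of it, as a toy Python sketch (hypothetical names, not the actual project code - the point is just how the middleware pipeline removes the repetition):

    import logging
    import time
    from dataclasses import dataclass

    @dataclass
    class CreateOrder:          # a command is just a plain message object
        customer_id: int
        amount: float

    class Mediator:
        def __init__(self, middlewares):
            self._handlers = {}
            self._middlewares = middlewares

        def register(self, message_type, handler):
            self._handlers[message_type] = handler

        def send(self, message):
            def call(msg):
                return self._handlers[type(msg)](msg)
            pipeline = call
            for mw in reversed(self._middlewares):   # wrap the handler in each middleware
                pipeline = mw(pipeline)
            return pipeline(message)

    def logging_middleware(next_):
        def handle(msg):
            logging.info("handling %s", type(msg).__name__)
            return next_(msg)
        return handle

    def timing_middleware(next_):
        def handle(msg):
            start = time.perf_counter()
            try:
                return next_(msg)
            finally:
                logging.info("%s took %.1f ms", type(msg).__name__,
                             1000 * (time.perf_counter() - start))
        return handle

    def validation_middleware(next_):
        def handle(msg):
            if getattr(msg, "amount", 0) < 0:
                raise ValueError("amount must be non-negative")
            return next_(msg)
        return handle

    logging.basicConfig(level=logging.INFO)
    mediator = Mediator([logging_middleware, timing_middleware, validation_middleware])
    mediator.register(CreateOrder, lambda cmd: f"order created for customer {cmd.customer_id}")
    print(mediator.send(CreateOrder(customer_id=42, amount=9.99)))

Every command and query flows through the same validation/logging/timing wrappers, so the handlers themselves stay boring.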
On the other other hand, I've worked on a system that stuck with "the old way" (like ColdFusion) for so long that it was impossible to even find documentation on the old programming environment, even if you could find somebody willing to maintain it. The longer you wait to upgrade, the more it's going to hurt when you finally do.
I'm not against replacing old systems, but this has to happen in incremental steps.
This way, if an implementation path isn't worth the effort, you just drop it, instead of spending dozens of months on a full rewrite and realizing that you are spending a week to implement CRUD for a simple entity. It's even worse when devs are new to the project and have no knowledge of the domain; I've been burned by this, never again.
At the end of the day, we are not in the research field; we are paid to fix business problems, to deliver something that brings business value. Yeah, it's nice to write some really complex software that would make the system a lot more optimized, but we have to deliver it before our bosses get tired of excuses and pull the plug. Learned that the hard way.
Having been in this situation as well, I find that the aging out of technology MUST be led by a CTO who is competent enough to know the tipping point of cost-benefit. Their job immediately becomes a political choice (pass the pain to the next guy) once that point has passed, and it gets worse with every new CTO. Some of these systems are HUGE (hundreds of millions of lines of ColdFusion, where I worked).
The reason I have been left without product documentation is that Oracle bought the software the organisation was using. Cannot stress enough the value of keeping an offline copy of the documentation for a product rather than relying on a company's website or similar.
Edit: Also want to add software installation files are very important to keep offline as well.
On the other hand, don’t hire who wants to understand everything first either. Sometimes a bad choice was just a bad choice, and it’s a waste of time to figure out if it was taken for a reason.
That's a very "junior senior" type of developer. Rewrite from scratch needs external reasons to be a good idea, because you have a huge uncertainty risk of "i don't know enough about this system to rewrite it, and now i'm spending months hacking back in edge cases to the new no-longer-beautiful-design." Uncertainty risk doesn't mean never do it, but it means make sure you understand it and make sure it's gonna be worth it.
I've never worked in outsourcing and almost every job I had in my >15 years career as a developer included at least somewhat-gnarly & often truly-gnarly code.
It is very rare to see a codebase several years old that can fairly be described as "in good shape".
This sounds like the phenomenon dubbed Chesterton's Fence [0].
A core component of making great decisions is understanding the rationale behind previous decisions. If we don’t understand how we got “here,” we run the risk of making things much worse.
So you helped the new dev understand the current lay of the land. They listened, then suggested an improvement based on their new understanding. You agreed and together improved the code.
This isn't really Chesterton's Fence. It doesn't take any soul searching or insight to answer "why was this not written in Python3 in the first place" - the answer is almost certainly that Python3 or some key libraries didn't exist.
It's a pure cost/benefit analysis question. Switching has some cost, and some benefits, and you have to decide which are likely greater.
Knowing how so many devs are hungry to move to the new shiny thing, the question is often more like "why hasn't this been rewritten in Python3 already?".
Possible answers:
• Legit technical or cost/benefit reasons
• Probably a good idea but no budget/time for the effort
• Somebody already tried and it was a disaster
• No manager has been stupid enough to humor the devs
Most reasonable devs are not jackdaws and do not want a "new and shiny" thing for its shine.
Mostly devs want to get away from the old and increasingly creaky thing. And when the cost/benefit ratio is finally right (that is, it's creaking loudly enough and slowing things down enough), the move hopefully happens.
Really? The only other answer I can imagine is "all our other projects are in Python 2". And generally that means you will reuse some code from those projects. What other reasons have you seen?
I'm working with some old code and spend a lot of my time thinking "There's got to be a reason this is as wonky as it is."
Sometimes I just can't find out (the guy who wrote it is gone, nobody knows) then I have to just do it the way I think it should be, test ... then I find out why, sometimes spectacularly ;)
I recently started a new job and whenever I can I try to document the code I contribute with the "whys" instead of documenting self-explanatory parts. I've extended this to parts of the code that I now maintain but did not initially write as this helps me mentally map out the rationale.
I think this is the most forgotten part of this type of issue. It's not a static issue. With these types of things and time the problem often gets worse, and the answers get easier. They should be periodically re-evaluated.
Parts of the problem get worse as time goes on. (more code to convert, more complexity, more test cases, EOL for the python2 version of some lib, more user data, blah)
Parts of the solution get easier as time goes on. (easier syntax sugars, better testing frameworks, infrastructure abstracted easier, remove dead features, etc)
Parts of the desired solution become less relevant as time goes on. (Why not use golang or node or elixir, php and python are so dated!)...
Just because last year was a bad time for the upgrade doesn't mean today is. Knowing how to get these things done at the right time by the right people for the business is what separates a great engineering manager from one that is just "shipping features and bug fixes".
I am a senior dev and I always try to push new ideas to my team lead (a senior dev, older than me), but all I get is blatant criticism because he says "I've tested it and didn't like it". A clear example: refusing to move to Spring Boot and staying on the dead horse that is JavaEE, which has become more complicated and fragmented than ever since Java 9.
Good Lord. And here I thought I was being a stick-in-the-mud by gently reminding our new upstart that, yes, moving from Spring to Guice would bring us some new functionality, but it would also lose other functionality and have a non-zero migration cost.
Just imagine that all of it is now called JakartaEE, and yes, you have one artifact for the interface and one for the implementation, and they are not even compatible with jpackage (Java 14).
and... if after 3-4 weeks, it was nowhere near completion, or you'd hit so many snags it was going to take several more months, I'd hope you'd have the good sense to put a cap on it, and revisit later. There's nothing inherently wrong about trying something like that, especially if there's some tests in place, a known goal, a reasonable time boundary relative to the potential benefit, and a willingness to stop if the effort is growing beyond the benefit.
Your experience sounds great, but it also sounds like we don't see that level of middle-ground pragmatism enough - it's either "no, we have to stay using PHP4.3.3 because it's what I know" or "we have to rebuild to cloud micro services to be able to scale infinitely without being restricted by schedules or budgets".
We mostly stick to a single master with our code, but this was one case where we branched so we could watch progress via test passing percentage to make sure we were moving fast enough to finish before we got sick of it.
Definitely have been one or two refactors that have been shutdown. We've been lucky to have clients that give us the freedom to retire some technical debt/risk instead of just churning out features. And we're small enough (200 kLoc approx, 6 devs) that full codebase refactors are still doable.
Here's a tricky one that I encountered a year ago.
A junior tried to advocate for browser tests. A senior from the developer productivity team said browser tests couldn't possibly be made non-flaky.
I didn't know how to advise the junior because I agreed with them.
Saying "something is infeasible/costly" is like a blanket statement. There was no way to quantify that.
On the flip side, we couldn't justify the impact of browser tests either. But I felt, at the time, that we should have been biased toward implementing every type of test (rather than not), since there are only 3 types: backend, JS, and browser.
Akin to your example, it was the same argument "X is costly/infeasible".
Your example doesn't have any issue because everyone probably agrees with it. But browser tests being infeasible to set up sounds strange.
There's another type of test: visual diffing. I've used it a lot and I love it.
1) Keep a set of a few thousand URLs to diff, add new ones as needed.
2) Use a tool that can request these URLs from two versions of your app, make a side-by-side visual diff of the resulting pages, and present a report of any differences.
3) Before committing any change to your codebase, run that tool automatically on the whole URL set, comparing the new version of the app with the current version. Then the committer should review the diff report manually and it gets attached to the commit history.
This way, adding coverage for new functionality is easy (you just add some URLs to a text file) and the whole thing runs fast. And it catches all kinds of problems. Not just UI bugs where some bit of CSS messes up something unrelated, but also you can run the frontend diff after making any backend change and it will catch problems as well. It won't solve all your testing needs, but it covers a lot of ground cheaply, so you can concentrate the custom testing code where it's actually needed.
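Not the actual tooling described above, but a rough sketch of step 2 in Python, assuming Playwright for rendering and Pillow for the pixel diff (the URL file and base URLs are made up):

    import io
    from PIL import Image, ImageChops
    from playwright.sync_api import sync_playwright

    def diff_report(paths, base_current, base_new):
        """Screenshot each path on both app versions and report the ones that differ."""
        changed = []
        with sync_playwright() as p:
            browser = p.chromium.launch()
            page = browser.new_page()
            for path in paths:
                shots = []
                for base in (base_current, base_new):
                    page.goto(base + path)
                    shots.append(Image.open(io.BytesIO(page.screenshot(full_page=True))))
                # Different page sizes count as a diff; otherwise compare pixels.
                if shots[0].size != shots[1].size or ImageChops.difference(*shots).getbbox():
                    changed.append(path)
            browser.close()
        return changed

    if __name__ == "__main__":
        with open("urls.txt") as f:                      # step 1: the flat list of URLs
            paths = [line.strip() for line in f if line.strip()]
        for path in diff_report(paths, "https://current.example.com", "https://staging.example.com"):
            print("CHANGED:", path)

A real version would also save the side-by-side images for the manual review in step 3, but the skeleton is genuinely this small.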
My team went through the browser testing issue as well and agree they're hard tests to write. We ended up writing a few using selenium, but we don't have as much coverage as we'd like. Luckily, unlike an interpreter upgrade, we can add them incrementally.
If the test is slow and hard to deflake, then let's add it slowly. Maybe only test the critical path. There are ways to manage it, instead of discarding it entirely.
I'm a senior dev/tech lead and I empower my junior team mates/reports to do what they think is right. As such, I get shot down just as much as I shoot them down.
I learn a lot from them, because I'm asking them to do a lot, and they also learn from me when I review the code or advise them on an alternative. They push back a lot and that's what I want. I want my reports to prove me wrong and show me better.
This approach means that if I really have to put my foot down or enforce something, then there is enough mutual trust to allow that to happen.
I'm trying to get to that point. It's interesting to see that any new member who joins the team has a sort of adjustment period where they still come to ask everything, but eventually they realize that "do what you think is best" is going to be the answer to any technical question regardless, and that it's ok to have (and fix!) their own problems with the code.
There's no excuse not to convert Python 2 to Python 3 - do it slowly, one commit at a time. We did that in Chromium. Using newer, non-deprecated libraries instead of unsupported ones brings a bit better team morale and enjoyment in what you do. Imagine an intern coming in and having to use Fortran.
We wrapped up our conversion about 1 week after the sunset date. I think we could have lived with no new features, but the no security updates issue was a big reason why we upgraded.
Wins for the user were performance improvements and security. The dev quality of life features (better type checking, f-strings, nicer pathing syntax), as someone else mentioned, will improve our development velocity in the future, which gets back to the users as new features faster.
Will it save a person-month in future work? If there are 10 devs working on this project and it will be around for at least another year, that's 120 person-months, so it only needs to buy less than a 1% increase in average efficiency to pay off.
For every one of these stories there are two ways it can go: either the senior dev is jaundiced in their view and the problem was improvable/fixable, or the senior dev is right and the junior dev is about to go and waste a whole load of time on something they've already been told is a waste of time.
As a manager I always hated these situations, firstly because the problem the junior dev is trying to fix is almost never as productive as "Hey, let's upgrade to Python 3"; it's more like "Hey, let's migrate to this specific version of this specific tool that I happened to use on one of my pet projects" or "I've got this incredibly ambitious plan to change everything about this piece of code you gave me to work on" (you're only looking at that code at all because it's simple, relatively unimportant, and we're trying to ease you into the team).
The problem is, if it's the junior dev who's wrong, you're now going to start seeing all these new issues, because you've taken someone whose intuition isn't quite there yet and given them something incredibly complex to do. That's when you get into work 2 weeks later and find their mega-commit touching 2,395 files, changing tabs to spaces and auto-modifying everything to camel case. Taking a chance on that intern's pet project means a lot of support from others on the team.
That is, the fact that Python 2 is not officially supported any more, and at maximum you'll see some fixes for most egregious security holes, is not something you considered back then?
I knew about the security risk, but I wasn't aware of how many quality of life and performance improvements we could use in the new version. And I also learned that some people enjoy researching/debugging the interpreter, whereas I originally thought the refactor would be a chore that hurt morale.
This is a nice counter example. Although it's clear the article has good intentions, it's promoting unscientific thinking with an appeal to authority ("yes, they go brrrrrr extremely quickly much of the time, and senior developers know that!"). I think there is a point to be made about how we all could benefit from quelling our outrage at decisions we initially disagree with before we've heard all the evidence, but that has nothing to do with seniority.
It's not clear to me why you think that this anecdote had any particular intention. My intention was not to promote any position at all, but rather to tell an amusing personal story that I was reminiscing about because of an email I got from a young friend.
Anecdotes are by definition anecdotal; I am not promoting an anti-science position by relating a personal anecdote and I resent the statement that I am doing so.
If you'd like to write a blog article that promotes scientific thinking, I strongly encourage you to do so.
A lot of times the creators/maintainers of a project just haven't put any effort into 'upgrading' stuff. Whether it's a newer version of a language, library, operating system...
Some people will use any excuse to not have to change. Once in a while it's because they are genuinely too busy, but that just signals other issues.
And maybe the Junior developer is allowed to do this because they cost less, so if it gets abandoned the company can take it, but a senior out for a month would impact other areas.
Only if you just fix whatever breaks in the upgrade and never use the features in Python 3. The static typing benefits alone should either make your software more reliable (benefit to users) or speed up feature delivery (benefit to users).
I find that the biggest misunderstanding happens because "new grads" (and I happen to be one) confuse _asymptotic complexity_ with actual complexity.
I'm not sure why, but CS courses and interview questions mostly focus on _asymptotic complexity_ and usually forget to take into consideration the complexity for "little values of n". And funnily enough, in real life n never goes to infinity!
In a strict sense big O notation only cares about what happens when n goes to infinity. The algorithm could behave in any way up to numbers unimaginable (like TREE(3)) but still, its big O wouldn't change.
Maybe what those "new grads" are missing is a feeling for real-world data, and for how a computer behaves in the real world (with caches, latencies, optimised instructions etc.), not just an ideal computer model in their mind when they design algorithms.
It's because Big O is Computer Science. Cache effects are Software Engineering. Professors of CS do a fine job of teaching CS. They even briefly mention that there is an implicit constant factor k in O(k n log(n)), and then they never mention it again. They certainly don't mention that k can easily vary by 128x between algos. AKA: 7 levels of a binary tree. Or that most of the data grads will be dealing with in practice not only won't be infinite, but will actually be less than 128 bytes. Or that, even with huge data and a proven-ideal O() algo, there is often a 10x speed-up to be had with a hybrid structure like a B-tree instead of a binary tree. And another 2-10x with SIMD vs scalar. 100x isn't infinite, but it still counts.
So, grads listen to their CS professors and that's what they know. It's not until they get lectures from greybeard software engineers that they learn about the reality of algos and not just the idealized algos.
> briefly mention that there is a implicit constant factor k in O(k n log(n)) and then they never mention it again
A fine concrete example of this is the Coppersmith–Winograd algorithm (and its derivatives), a matrix multiplication algorithm with impressive complexity properties, but which in practice always loses to the Strassen algorithm, despite Strassen's inferior complexity. [0][1][2]
(Aside: the Strassen algorithm is pretty mind-bending, but also easily shown. If you've got 22 minutes spare, there's a good explanation of it on YouTube. Perhaps there's a more dense source elsewhere. [3])
> It's not until they get lectures from greybeard software engineers that they learn about the reality of algos and not just the idealized algos.
To mirror what some others are saying here, students should also be taught the realities of cache behaviour, SIMD-friendliness, branch prediction, multi-threaded programming, real-time constraints, hardware acceleration, etc.
Even Strassen can be overly expensive for reasonably sized matrices. Almost all matrix multiplication implementations have a threshold, and only beyond it do they use Strassen.
Apparently that threshold can be lowered (http://jianyuhuang.com/papers/sc16.pdf), but even then it's a matrix that's several hundred columns by several hundred rows.
Some CS classes explicitly use Strassen to teach the realities of asymptotic vs wall-clock time complexity, challenging students to come up with a hybrid matrix multiplication algorithm that performs the fastest and switches at the best thresholds of matrix size.
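As a rough illustration of that hybrid idea (not any particular library's implementation), here's Strassen with a cut-over to the plain product below a threshold, assuming square matrices with power-of-two sizes for simplicity:

    import numpy as np

    def strassen(A, B, threshold=128):
        n = A.shape[0]
        if n <= threshold:
            return A @ B                    # below the cut-over, plain multiplication wins
        h = n // 2
        a11, a12, a21, a22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
        b11, b12, b21, b22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
        m1 = strassen(a11 + a22, b11 + b22, threshold)
        m2 = strassen(a21 + a22, b11, threshold)
        m3 = strassen(a11, b12 - b22, threshold)
        m4 = strassen(a22, b21 - b11, threshold)
        m5 = strassen(a11 + a12, b22, threshold)
        m6 = strassen(a21 - a11, b11 + b12, threshold)
        m7 = strassen(a12 - a22, b21 + b22, threshold)
        return np.block([[m1 + m4 - m5 + m7, m3 + m5],
                         [m2 + m4, m1 - m2 + m3 + m6]])

Students can then benchmark different threshold values against plain A @ B and see where (or whether) the asymptotic win shows up on their machine.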
> To mirror what some others are saying here, students should also be taught the realities of cache behaviour, SIMD-friendliness, branch prediction, multi-threaded programming, real-time constraints, hardware acceleration, etc.
Which would have the positive knock-on effect of the textbook being sufficiently obsolete every year or so that the students could no longer trade it in for credit, saving the bookstores money!
More seriously, that knowledge (at least once you attach numbers to it) has a shelf life, and not a very long one. Teaching big-O analysis means the knowledge is timeless, which any good theoretical knowledge is, and moving more towards practice would force the professors to keep on top of the state of the art, and the state of the mainstream, of hardware design in addition to everything else they're doing.
> Which would have the positive knock-on effect of the textbook being sufficiently obsolete every year or so that the students could no longer trade it in for credit, saving the bookstores money!
That's naive. E.g. SIMD has been here for a good while and is going to stay. So has GPGPU, with now quite similar architectures across tons of chips.
And Computer Science can actually be about science for real computers; computers are not 8086s or PDP-11s anymore, and have never been Turing machines. So there actually is existing generic CS and ongoing research that cares about cache effects and so on. Maybe it's applied CS if you want, and some kind of pure CS shouldn't care about that, but I really don't see what the criteria should be for deciding which is which anyway, so IMO there shouldn't be any (not that all research should care about e.g. cache effects, just that it isn't really useful to try to distinguish between the research that does and the research that doesn't).
We don't teach advanced math by only showing what was done at e.g. the beginning of algebra. Neither should we stick to only basic subjects in computer science.
> textbook being sufficiently obsolete every year or so
I wasn't very clear on that point, but didn't mean to suggest it be the same textbook. These other topics deserve courses and books of their own. The algorithms lecturer should be careful to emphasise the limitations of complexity theory though.
> that knowledge (at least once you attach numbers to it) has a shelf life, and not a very long one
Plenty of long-lived principles to be learned there, even if the particulars change over time. Caches are still going to be around in 10 years time.
Without knowledge of the hardware your software runs on you're likely to be one of those people who uses lists instead of vectors without really understanding the difference.
Also, as other people said, the shelf life for that knowledge is actually pretty long. Hardware will always be your platform, no matter how many layers of abstraction you have in between.
> To mirror what some others are saying here, students should also be taught the realities of cache behaviour, SIMD-friendliness, branch prediction, multi-threaded programming, real-time constraints, hardware acceleration, etc.
They are in many places I'm aware of. At least, as an EE (at Stanford, but I've heard MIT and several others do the same), I had to take a digital system design class, but the majority of the class was spent on performance engineering. In fact, the very first (actual) project of the class was to take a 10-line piece of C code, which applies a simple filter in real time to a video, and make it performant. The initial code runs at around .5 FPS.
Our resulting performant code was, of course, many times larger (I think it might have been ~150 lines), but it ran incredibly quickly (110 FPS, iirc), by doing crazy compiler tricks and often calling ASM from within the C code, even though the asymptotic (big O) performance was exactly the same.
For context, this is not just a digital systems thing (my work is in mathematical optimization theory and my undergrad was in photonics and physics), but I do know that this class is not a requirement for CS since it's potentially too hardware oriented. The classes exist, but I'm not sure people are taking them.
Yes, sure. I hadn't meant to imply otherwise. The pure-algorithms lecturer needn't cover these other topics in detail in their course, but should be careful to emphasise the uses and limitations of complexity theory.
> this class is not a requirement for CS since it's potentially too hardware oriented
I don't see the sense in this. Computer scientists publish work on applying GPU acceleration, as they should - that's not electronic engineering work they're doing. We could quibble about whether it's computer science or software engineering.
> I don't see the sense in this. [...] We could quibble about whether it's computer science or software engineering.
I agree, I'm not sure why this is the case either, just my idea as to why it may not be a requirement. (Some part of the class does involve writing a good chunk of a 5-stage RISC processor based on MIPS, but this was still relatively straightforward with just a basic understanding of digital logic.)
I always loved my algorithms and datastructures professor talking about Brodal queues[0] - a datastructure named after him. They're super interesting from a theoretical point of view, but they're not useful for anything.
It wasn't taught to me. And in my previous job I interviewed many dozens of fresh grads. One of my questions was "How much slower is it to sum integers in a trivial linked list vs. a trivial array?" 90% answered "Umm... I don't know. 2x?" When asked why, they all said "1 op to sum the int + 1 op to traverse the pointer." It was amazingly consistent.
This kind of CS-based rationalization is arguably another aspect of what the article comments on. I wrote a benchmark and found the difference in this case to be 3x-3.5x.
I can see how this wouldn’t be covered in an undergrad cs education. I took only a single computer architecture class which was extremely limited in scope. The only reason I knew about vectorization during undergrad is because a friend mentioned it to me once.
Are you saying that the majority of the speed up is from caches and then there's a secondary, much smaller, speed up from the vectorization? Or are you saying all the speed up is from caches and I'm off base here with vectorization.
1) Do you think about cache at all or is it just something you heard mentioned as important that one time?
2) It's a good lead-in to discussing the effects of cache in algorithms. How that conversation goes helps me to understand how that person thinks and discusses complex problems.
A good answer would be "I'm not sure, but probably way, way slower because linked list can point all over memory but arrays cache really well."
An excellent, A+ answer would be "In the best case it might not be too much slower, if you have an intrusive linked list arranged sequentially in memory like an array, as onekorg explained. But in practice most will be 20-200x slower, because they are usually implemented as pointers to nodes containing pointers to data, and each piece is allocated piecemeal in an already fragmented heap. Uncached memory reads can take 100+ cycles, and summing an int is not enough work to hide that, even if the CPU speculatively prefetches."
I mainly expect a surprised reaction that they could be so slow and looked forward to the follow-up discussion.
I sometimes wonder if CS should be renamed Computer Pseudo-Science. I blame Knuth and (mostly) Dijkstra for propagating the falsehood that you can estimate the performance of real code on real hardware with some elegant academic blackboard math, which gives you the wrong answer for many practical applications.
It's not that Big O isn't useful - it's that it's taught as a set of "proofs" which somehow make it appear objective and "correct", when in reality performance is at least as dependent on cache architecture, median size-of-n, memory bandwidth, and other implementation details.
Anyone who graduates CS without having been taught this very forcefully - preferably during a practical project - should be refunded at least some of their course fees.
But Big O was a lot more directly correlated when the CPU wasn't doing "magic" optimizations on its own. It was still an estimation with an invisible constant factor, of course.
This is a key point. The gargantuan performance difference between main memory and the CPU cache (or indeed, the existence of significant CPU caches at all) happened well after big-O was firmly established in the CS curriculum.
My A+ answer is "My guess is [x] but instead of speculating we can create a test to discover the performance. [Describes test]."
Koala_man above says:
> I wrote a benchmark and found the difference in this case to be 3x-3.5x.
The actual number depends on a lot of things, of course (language, architecture, test methodology...), but it is possible that your 20-200x A+ answer is incorrect.
High end 3D mobile games. We regularly measured direct correlations between performance and revenue. Higher performance meant a larger install base of devices could run the app with better responsiveness and lower battery burn. Thus, higher retention and engagement. Thus higher monies.
At a guess, some understanding of this article[1] - if a CPU instruction scaled up to human time takes 1 second then a Level 1 cache lookup takes 2 seconds and a main memory lookup takes 4 minutes.
Imagine for the array it's 1 CPU instruction to load a value, 1 to load the next value, 1 to add them, and one to store the result, that would be 4 instructions per sum; ideally the array would stream into the CPU after a single main memory lookup delay up-front, and then be 4 instructions per pair, summed as fast as the CPU can loop.
The linked list at worst needs an imaginary 1 CPU instruction to load a value, 1 to load the pointer value, 1 to reference the pointer, a delay of 2 seconds to get that value from L1 cache - missed, it's not in cache - 240 seconds stalled waiting for main memory, 1 to add, 1 to store the result. Worst case, >240x slower.
The linked list is not guaranteed to be in contiguous memory, but it might be, so the cache might have the right data in it. The linked list is 50% data, 50% metadata, so the cache is half wasted / can hold half as much data, and if the linked list is coming in from a big memory read quickly, half the bandwidth is carrying pointer addresses not data, so the throughput is halved for that, too, and the processor cycles were already able to happen much faster than the main memory bus max speed. If it's not contiguous memory, you don't know in advance where it is to request all the right memory areas at once - not until you read sequentially to the last item pointer and find there are no more.
Maybe if they are both small, both in contiguous memory, and both go into Level 1 cache after a single main memory delay, it could be only ~2x the time; but the more data there is overall, the more chance the linked list will bust the cache or be discontinuous in memory. And on the plain array side, it might be possible to speed things up with SIMD/SSE instructions, spending fewer cycles adding and storing per element, which the linked list approach might not be amenable to at all[2]; then the best case might be ~4x slower, worst case ~500x slower.
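If you want to see the shape of the gap without writing C, here's a quick sketch in Python - with the caveat that in CPython interpreter overhead and object boxing are folded into the measurement, so the ratio is not a clean cache-miss number:

    import timeit
    from array import array

    class Node:
        __slots__ = ("value", "next")
        def __init__(self, value, next=None):
            self.value, self.next = value, next

    N = 500_000
    contiguous = array("q", range(N))        # packed 64-bit ints in one block of memory

    head = None
    for v in range(N - 1, -1, -1):           # a chain of node objects scattered on the heap
        head = Node(v, head)

    def sum_linked(node):
        total = 0
        while node is not None:
            total += node.value
            node = node.next
        return total

    print("array :", timeit.timeit(lambda: sum(contiguous), number=10))
    print("linked:", timeit.timeit(lambda: sum_linked(head), number=10))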
O dear.... Am I happy that I never studied computer 'science'..... On the other hand, there must be smart computer science students and/or smart places of education where actual learning about processor caches and the like takes place.....
Mentioned, yes, but often not learned. In many situations the only thing that matters is the constant factor. If the number of data items is relatively small, the difference between N log N and N squared may be completely dominated by the constant factor. In addition, there is the challenge of maintaining the code later and making sure it's correct.
Take a look at Real-Time Collision Detection[1]. It takes a great look at both algorithmic complexity and cache awareness. That's how it should be done.
I've looked before, but I've never seen a class dedicated to practical algorithm design. Being able to reason about cache layout, context switch costs, branch prediction behavior, simd refactoring, and basic compiler level optimizations will result in much more performant code. In the real world people often write complex algorithms which operate on structs/classes instead of primitives. This means there's a huge performance hit just from pointer traversal in a hot path, especially if someone naively does math across data inside heap objects. You can easily write a fancy dynamic algorithm approach which has theoretical O(k*n) performance which takes forever in the real world due to abstraction traversal. If you're doing more than one operation, it's often a massive performance boost to cache all your object pointer evaluations into primitive arrays, do simd operations on them, and then populate the objects at the end.
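A toy sketch of that last point in Python/numpy (hypothetical Particle class; the win only shows up once you do several operations on the arrays before writing back):

    import numpy as np

    class Particle:
        def __init__(self, x, v):
            self.x, self.v = x, v

    particles = [Particle(float(i), 0.01 * i) for i in range(100_000)]
    dt = 0.016

    def step_objects(ps, steps=10):
        # Naive: attribute lookups and pointer chasing per object, per step.
        for _ in range(steps):
            for p in ps:
                p.x += p.v * dt

    def step_arrays(ps, steps=10):
        # Gather fields into primitive arrays, do the math vectorized, scatter back once.
        x = np.fromiter((p.x for p in ps), dtype=np.float64, count=len(ps))
        v = np.fromiter((p.v for p in ps), dtype=np.float64, count=len(ps))
        for _ in range(steps):
            x += v * dt
        for p, xi in zip(ps, x):
            p.x = xi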
Does anyone have a good textbook suggestion for cache/simd aware algorithm design? I've seen plenty of papers that cover single examples but never something the scope of a book.
Models are easy when you turn every cow into a sphere. But physicists never believe their models over the real world. Computer Science should be about the science of computers, not hypothetical models acting on hypothetical architectures.
And then we seriously overvalue these "core CS skills" for a lot of positions, many of which will not involve actually using any of it at all. I can't tell you how many people I've worked with who could tell you a bunch of Big O crap off the top of their heads, but all they were working on was web apps and they couldn't seem to remember that running an ORM query in a loop is a bad idea. The way the U.S. uses college education as a prerequisite to employment, yet seems to not be very good at teaching stuff that's actually relevant to the work you allegedly need the degree for, is very disturbing to me.
As a side note: O(k n log(n)) and O(n log(n)) have exactly the same meaning for any non-zero constant k, by the definition of big O notation. The more you know!
After a few years working on real-world code, I understood that faster asymptotic complexity was often slower on average.
After a few more years working on real-world code, I understood that it's usually better to choose the algorithm with better asymptotic complexity, anyway.
Like, sure, my O(n) algorithm will be 10x slower than your O(n^2) algorithm for small n. But users aren't going to notice a few microseconds. Users ARE going to notice my O(n) algorithm being 100000x faster for large n, when it's the difference between milliseconds and minutes.
It's a tough call sometimes. Code legibility is important, and an O(n^2) is so often fast enough (microseconds!) that a more complex algorithm may only be faster for the 0.001th percentile. You're right that sometimes that means we should select it anyway, because it keeps the worst case at microseconds; but realistically, some codepaths change frequently enough that the true metric of a codebase is how easily it's adapted, understood, or modified. Let the guy with the big n eat cake, so to speak.
I regularly see people make this mistake, and they don't grasp it even after correction.
You could make a hash table with a constant time lookup, but the hash takes 1 hour.
Big O only tells you how it scales, not its performance (runtime).
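A contrived sketch of that point in Python - the dict lookup is still O(1), it just hides an absurd constant in a deliberately expensive (hypothetical) key class:

    import hashlib
    import timeit

    class ExpensiveKey:
        def __init__(self, n):
            self.n = n
        def __eq__(self, other):
            return self.n == other.n
        def __hash__(self):
            # Deliberately terrible constant factor: thousands of SHA-256 rounds per hash.
            digest = str(self.n).encode()
            for _ in range(50_000):
                digest = hashlib.sha256(digest).digest()
            return int.from_bytes(digest[:8], "big")

    table = {ExpensiveKey(i): i for i in range(10)}   # O(1) lookups, horrible constant
    items = list(range(10))                           # O(n) scan, tiny constant

    print("dict:", timeit.timeit(lambda: table[ExpensiveKey(3)], number=50))
    print("list:", timeit.timeit(lambda: 3 in items, number=50))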
It's not even that. You could have a normal hash table with a decent hashing function, and you'll still get beaten by a flat array for small n (hundreds, low thousands), because the array is contiguous in memory - so operations like search or moving stuff around after addition make extremely good use of CPU's cache.
> the array is contiguous in memory - so operations like search or moving stuff around after addition make extremely good use of CPU's cache
Also - if I see someone try to use a linked list for an enormous data structure again.... Wow it does not scale worth crap because it turns out that the hardware is actually important, and contiguous memory is amazing.
Oh god. Don't talk to me about linked lists. One of the bigger performance improvements I've made in a certain company is taking the code working with lots of numerical data in linked lists because they had easier syntax, and rewriting it using honest-to-god, contiguous-memory arrays of doubles. After that, we could process three orders of magnitude more numbers per operation, and one order of magnitude more of operations, and we still came ahead.
Maybe you knew the scale up front, but if you didn’t the easier syntax was the right first choice. It may have been the right first choice because it was easier to code even with the scale known up front. Only after measuring and understanding the trade offs should the easier to reason about code have been removed. IMO, thinking about and understanding these trade offs is one of the main differentiators between a junior and senior developer.
> IMO, thinking about and understanding these trade offs is one of the main differentiators between a junior and senior developer.
I agree, but in a way opposite to what you intended. An experienced developer[0] should be able to look at a situation like this and realize that a few more minutes of focus can yield a better (array-based vs. list-based) implementation[1]. There are no downsides to that (arrays were only slightly less convenient in that case, syntax-wise), and the improvements occur regardless of scale. The list-based solution was a bad one even at the scale it was originally written to handle.
I believe a hallmark of an experienced developer is writing performant code from the get-go; this is accomplished by not making stupid mistakes like this, and it costs pretty much nothing in terms of coding time or code complexity. All it takes is a little knowledge and caring about the product's performance.
--
[0] - I hesitate to use the word "senior", because to me, whether it means anything depends on the company one works in. In many, a "senior" developer is just the one that came before all the "junior" hires, and it doesn't matter that that developer is a fresh bootcamp graduate. And once you can put "senior X developer" on your CV, it's likely your next job will give you seniorship immediately as well.
[1] - and an extra few more minutes would give an implementation that doesn't allocate new memory unnecessarily - also a huge performance win.
The most important lesson I've learned from 34 years of writing software is to stop pretending I know shit about the problem I'm trying to solve before I have written an actual working solution. Which means getting there asap is top priority and nothing else matters. Sometimes that code runs fast enough; often it turns out I'm solving the wrong problem, which means performance doesn't matter at all.
Unsorted-array-based maps are sometimes used in the Java world, and for two or three elements will have much less overhead than hash tables. For instance, fastutil has http://fastutil.di.unimi.it/docs/it/unimi/dsi/fastutil/objec.... The map interface and encapsulation into a single “object” is the same.
It occurs to me that I don't know whether any of the major dynamic language implementations with maps/dicts/hashes as a central data structure use a similar approach for very small ones… huh.
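I don't know of one off-hand either, but the idea translates directly; a hypothetical sketch in Python (flat parallel lists while tiny, promote to a dict once it grows):

    class SmallMap:
        def __init__(self, threshold=8):
            self._threshold = threshold
            self._keys, self._vals = [], []     # linear-scan storage while small
            self._dict = None                   # promoted storage once we outgrow it

        def __setitem__(self, key, value):
            if self._dict is not None:
                self._dict[key] = value
                return
            try:
                self._vals[self._keys.index(key)] = value
            except ValueError:
                self._keys.append(key)
                self._vals.append(value)
                if len(self._keys) > self._threshold:
                    self._dict = dict(zip(self._keys, self._vals))
                    self._keys = self._vals = None

        def __getitem__(self, key):
            if self._dict is not None:
                return self._dict[key]
            try:
                return self._vals[self._keys.index(key)]
            except ValueError:
                raise KeyError(key) from None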
> by a flat array for small n (hundreds, low thousands)
Some of us are working in, say, Python. A flat array can outperform at small n, yes, but people overestimate where the tradeoff point is. It's at <5 items:
# A list of [0, 1, 2, 3, 4]
In [10]: linear = list(range(5))
# A hash set, same thing.
In [11]: hashing = set(range(5))
# 44ns / linear search
In [12]: %timeit 3 in linear
44.2 ns ± 0.412 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
# 25ns / hash search!
In [13]: %timeit 3 in hashing
25 ns ± 0.6 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
The hash set outperforms the linear search by nearly 2x, on a list of size 5! (The performance is similar for other common types that end up in hashes, like strings.)
"It's Python!", you say. "Too much chasing of pointers to PyObjects destroy the cache!" And yes, they do; but many people are working in high-level languages like Python or Ruby.
But, for those that aren't, if we repeat the above exercise in Rust, yes the tradeoff will move up, but only to ~60 items, not hundreds or low thousands:
test tests::bench_hash_int ... bench: 14 ns/iter (+/- 1)
test tests::bench_linear_int ... bench: 19 ns/iter (+/- 3)
If you're thinking that somehow accessing the middle item each time bestows an unfair advantage to the hash table, randomizing the desired item doesn't help, either:
test tests::bench_rng_hash_int ... bench: 19 ns/iter (+/- 2)
test tests::bench_rng_linear_int ... bench: 24 ns/iter (+/- 2)
And looking for an item not in the list is definitely not favorable to the linear search. (It's the worst case.)
In my experience, it's almost always easiest to pay mild attention to big O concerns and just use the appropriate data structure for the problem at hand. Cache effects mattering is either rare (you're writing a RESTful microservice to push cat pictures; a cache isn't going to matter once we hit this mobile device's 20-second network latency!) or highly context-dependent (your line of work is always low-level, these concerns crop up more often, and you're consequently on the lookout for them; I don't think this applies to most of us, however).
You only benchmarked the search itself, but in the real world it might also take time to set up the data. You can't really pinpoint a tradeoff point without knowing how many times a data structure will be used.
I ran your Python test on my machine and the hash set was faster in every case: 10x faster at size 50, 2x faster at size 5, 1.3x faster at size 3.
That's not a bad point, and perhaps I should test creation times too. (I wanted to ignore that so the test isolated a single thing - searching - and wasn't mixing two things together.) Both hash sets and vectors are O(n) in setup, and I'd mostly expect their real-world performance to be very similar: hash tables tend to use contiguous arrays for storage and vectors do by definition, and both tend to overallocate (hash tables to prevent collisions, vectors to amortize appends to O(1)). Vectors tend to insert in order, though, whereas insertions in hash sets would bounce around the table, so that might make it more interesting.
Yes, it's always best to reduce the number of variables in benchmarking, but the setup makes a big difference when we are talking about tiny data sets.
I don't know how to test it properly, but I tried looping over both the initialisation and a single search, and the hash sets were much slower, so much so that hash sets trailed lists by microseconds at size 10000. In retrospect, making those data structures unsurprisingly dominated running time and the test kind of lost all meaning. It was clear, however, that it wouldn't take many searches for the hash sets to win, search times for lists were going through the roof.
To be fair, in my experience it is often the case that asymptotic complexity is a good proxy for real-world performance, even for small values of n. Not always, but often.
I think it's fine that the academic courses focus a bit more on what's better in theory than in practice, because there are always caveats to "in practice"; the person who writes the special-purpose genomics libraries was also once a new grad.
I like to start by thinking about cache locality and ensuring linear layout. Next focus on one-time, or minimal memory allocation. Then there are a bunch of small, systemic things you need to get right. After that you can start worrying about worst case big O scenarios.
Of course this depends on your language. A c programmer will have a different mental model than a python one.
In Python, performance is your last consideration, and that's OK. Most things computers do don't need to be fast. Only the innermost loops that run the most do.
This is the philosophy that has led to our software becoming slower despite improvements in hardware.
Performance is always important. Especially for consumer applications, where your software will probably need to run alongside many other processes each competing for resources.
> This is the philosophy that has led to our software becoming slower despite improvements in hardware.
I disagree. Software has gotten slower over time because we are adding more fluff to it (SDK’s, libraries, electron, GUI animations, web interactions, frameworks, etc). Not because the developers are failing to focus on code optimizations.
I used to think this was true, but there's a lot of stuff recently where there really aren't such hotspots. When I profile UI code, for instance, there isn't some big obvious hotspot to optimize; the runtime is spread all throughout the app doing random things. If all the regular code you write is just ludicrously slow, you're going to end up with something that's laggy and without any way to fix it other than rewriting it.
Often, for small values of n, performance matters less anyway; matching that as n gets larger is often a nice bonus. Sometimes this makes the code more complicated, yes, but occasionally it can even make the code simpler, especially in a language with good data structures and algorithms (C++ is a shining example).
I am always uncomfortable with high "big O" algorithms. For example I know that with the data we have now, O(n^2) is fine, but if there is no strict bound on n, we don't know how far n will go in the future. It may also be a vulnerability, where the attacker uses data that is designed to exploit the worst case scenario.
It is more about peace of mind. By using the more efficient algorithm (by asymptotic complexity), I know that my code won't become a bottleneck. That's like using "size_t" instead of "int" in C. I know my array will not exceed 4GB in any practical application, but by using size_t, I know it won't crash if it happens one day. One less thing to worry about.
Almost all well-designed libraries use hybrid approaches, switching from an algorithm optimized for low-level efficiency at low N to a theoretically more efficient algorithm at high N. For example, a sorting algorithm can go from insertion sort (good for low N) to quicksort (very efficient in most cases) to merge sort (guaranteed n log(n), highly parallelizable).
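A rough sketch of that kind of hybrid in Python (insertion sort below a cutoff, quicksort above - real libraries add a depth-limited fallback such as heapsort, or a merge step, on top):

    import random

    def hybrid_sort(a, lo=0, hi=None, threshold=16):
        if hi is None:
            hi = len(a) - 1
        while lo < hi:
            if hi - lo + 1 <= threshold:
                for i in range(lo + 1, hi + 1):      # insertion sort: low overhead on small runs
                    key, j = a[i], i - 1
                    while j >= lo and a[j] > key:
                        a[j + 1] = a[j]
                        j -= 1
                    a[j + 1] = key
                return
            p = a[random.randint(lo, hi)]            # quicksort partition (Hoare style)
            i, j = lo, hi
            while i <= j:
                while a[i] < p:
                    i += 1
                while a[j] > p:
                    j -= 1
                if i <= j:
                    a[i], a[j] = a[j], a[i]
                    i, j = i + 1, j - 1
            if j - lo < hi - i:                      # recurse on the smaller half, loop on the larger
                hybrid_sort(a, lo, j, threshold)
                lo = i
            else:
                hybrid_sort(a, i, hi, threshold)
                hi = j

Calling hybrid_sort(data) sorts the list in place; tuning the threshold against a plain quicksort shows the low-N win described above.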
And they never factor in training or maintenance level complexity, i.e. your ball of mud runs fast but have fun teaching ~5 juniors how to use it two years from now.
This is key. If the number of objects processed will never be large, then it makes more sense to write a quick, easily understood O(n^2) loop in under a minute and move on.
Taking an extra 30 minutes to an hour or longer to optimize and test for large inputs that will realistically never exist is a waste of time and money.
If you feel that the value of N might, in some strange and rare combination of success and changed requirements, exceed the expected amount, add a check for the lowest value that may signify a problem and throw a warning.
import logging

if N > 1000:
    logging.warning("N count of %d may be too large for the existing algorithm. Consider optimizing.", N)
Leave it at that and get on to more important things.
Taking the gap between practical performance and asymptotic complexity to the extreme, you have galactic algorithms (https://en.wikipedia.org/wiki/Galactic_algorithm), which do have the best asymptotic performance, but only on values of n so large they never come up in real life.
When you do big-O analysis you get best case, worst case, and average case. You have to do some thinking about the structure of your data when doing big-O analysis.
It's not that. Something not properly covered in CS courses is that very often, performance is dominated by things that are not evaluated as part of big-O analysis: memory allocations, cache friendliness, and other constant factors.
For example, according to the theory, a hash table is much better suited for key lookup and random additions than a vector. In practice, if you're storing a couple hundred elements, a flat array (with objects stored directly) will be faster because of data locality. If your problems are mostly "do a lot of small N ops" and not "do some large N ops", then big O analysis isn't all that useful anymore.
No, the point here is that big-O analysis means nothing if n is small. If n < 10 your algorithm could be exponential and still do better than a linear algorithm with a constant factor a thousand times larger.
In real life, you will always start with a simple working implementation and go with it.
Then if things are slow, you profile your code with a good profiler while running for some kind of real life scenario and spot the slow parts (also keep in mind that profiling may affect the program's behaviour).
After that you may want to consider alternatives with less asymptotic complexity iff that's the part causing slowness.
Once I was asked to look at a project to see if there was any room for improvement to speed up the program.
After I profiled the program with the test data, I saw that it was severely affected by a "size" method call on a lock-free concurrent list. Since the data structure is lock-free, the size method is not a constant-time operation, and calling it on a large list takes too much time. It was only there to print some statistics, so I changed the code so that it is called only when necessary, not every time an operation occurs. This immediately made the program 2-3 times faster.
I also replaced some algorithms with alternatives of lower algorithmic complexity to make things faster.
Overall, I made the program 6x faster. So sometimes you need to use fancy algorithms, sometimes you just need to change one line of code after profiling.
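The shape of that first fix, as a hedged Python sketch (the names here are made up; the real code was on a lock-free concurrent list whose size() was O(n)):

    class StatsReporter:
        # Only pay for the expensive size computation when stats are
        # actually reported, not on every single operation.
        def __init__(self, report_every=10_000):
            self.ops = 0
            self.report_every = report_every

        def record(self, collection):
            self.ops += 1
            if self.ops % self.report_every == 0:
                # len() stands in for the O(n) size() call from the story.
                print(f"{self.ops} ops, ~{len(collection)} items")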
Although I mostly agree, there are certain types of simple, fundamental performance considerations you want to take into account when writing code the first time, otherwise you're just needlessly dealing with tons of performance issues in prod. Make sure your DB queries are well covered by indexes; if you're repeatedly looking up items from a list, turn it into a map or set first; in FE code, try to keep re-renders to the areas of the UI that need updating; etc.
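For the "repeatedly looking up items from a list" case, the fix is usually a one-liner; a hypothetical sketch:

    def active_orders(orders, active_user_ids):
        # Build the lookup structure once: membership tests drop from
        # O(len(active_user_ids)) per order to O(1).
        active = set(active_user_ids)
        return [o for o in orders if o["user_id"] in active]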
Correctness and simplicity are almost always things you want to focus on over performance, but there’s still SOME level of simple performance best practices that you want to stick to up front. Similar to the tradeoff between YAGNI and software architecture - if you don’t start with some sort of coherent architecture, your project is gonna descend into a spaghetti mess. Ignore simple performance best practices and you’ll spend way too much time fighting performance issues.
> Correctness and simplicity are almost always things you want to focus on over performance
Right, it's a balance good developers know how to tread. Obviously if you can use a set instead of a list and it's a one-line change, go for it. But as the meme in OP's post says, if you're gonna USE SEGMENT TREES and all, then it better be worth the amount of complexity and time you're putting into it.
Part of what makes a good engineer is being able to quickly tell where it's worth optimizing and where it isn't. Or as the meme says, when nested loop goes BRRRRRR.
There is also such thing as "design for performance," and sometimes - just sometimes - any amount of effort spent later on optimization would end up going nowhere, because what you are trying to optimize is in fact a huge steaming POS.
By the way, here’s an anecdote for the flip side: at one of my internships I was working on a tool to process large log files, and by careful application of Aho-Corasick I was able to make it about 50 times faster on the dataset we were dealing with, which made using the tool change from “let’s go grab lunch while this finishes” to “let’s stream the logs through this live”. Sometimes you do know how to make things faster: just make sure to test them before you do, and asking why something is a certain way is always a good thing to do before proclaiming it’s broken.
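For anyone curious about the general approach (this isn't the internship tool, just a sketch using the third-party pyahocorasick package), matching many literal patterns in one pass looks roughly like this:

    import ahocorasick  # third-party "pyahocorasick" package

    patterns = ["ERROR", "timeout", "connection reset"]  # hypothetical patterns

    automaton = ahocorasick.Automaton()
    for i, p in enumerate(patterns):
        automaton.add_word(p, (i, p))
    automaton.make_automaton()

    def scan(line):
        # One pass over the line finds every occurrence of every pattern,
        # instead of one scan of the line per pattern.
        return [(end, p) for end, (_, p) in automaton.iter(line)]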
That's a great example; the most important part of your anecdote is the end, which asks: what was the user impact? There is no prior reason to believe that a 50x speedup is actually a win; taking an algorithm from 100K nanoseconds to 2K nanoseconds when hitting the file system takes billions of nanoseconds is not a win for the user, and taking an algorithm that takes 5000 years down to 100 years is probably not either.
But a 50x win that, as you note, goes from "let's have lunch" to "let's change this one thing ten times and re-do the analysis to see which gets us the best results" is a huge game changer; it's not just that it saves time, it's that it makes possible new ways to work with data.
Another anecdote: A few months ago, I was adding a bunch of features to an open source library. Before changing, I studied the library's code to find out where to put the change best. During that study, I found an instance where the code performed a slow operation, but I didn't attempt to change it because it was out of scope for my change, and I didn't want to introduce bugs [1].
A little while after I made the change, a user filed a bug [2] against the library about slow behavior when passing a large amount of data to it. They were using the public release instead of git master, so thankfully my code wasn't at fault. Thanks to the knowledge I gained from making the change, I could quickly confirm that precisely this slow part was the cause of the performance problem, and I was able to file a PR to reduce the library's complexity from quadratic to linear [3].
It wasn't very complicated from the algorithmic point of view, I only had to create a lookup structure, but the lookup structure had to be created in a certain way so that the library still behaved the same way as tested by the test suite. It also had to support things like duplicates, as the original implementation did; that was in fact a very important feature (TOML arrays).
The open source intrusion detection system Suricata [1] used Aho-Corasick until Intel released Hyperscan [2] as open source. Hyperscan is apparently more performant than Aho-Corasick. If your language can handle C libraries, have you considered trying Hyperscan to see how it compares?
For interest's sake, did you try simply using a decent regex engine as an alternative? Any DFA regex engine implicitly implements Aho-Corasick for you.
I think the engineer before me put a bit of effort into this, but it didn’t work out all that well in this case. This isn’t surprising considering that the standard regex tool on the platform wasn’t known for being particularly performant.
Heh... reminds me of my first proper MS internship, when I too was responsible for speeding up some code, this time in the VS Code Go extension. This code was responsible for tokenization, so it affected pretty much every operation and ran on every edit. Important shit.
Day 1: do some basic hoisting. O(n^3) => O(n^2). Tokenization times for a 10k line file go from ~15s to 500ms. Sweet.
Days 2-30 [1]: ideate, develop, bug fix, iterate on, etc, a novel (to me) data-structure to speed up the program even more. O(n^2) => O(n x log(n)) (expected). Great! Runtime on 10k line file went from 500ms to maybe 300ms. Oooops.
But hey, all the people working in 500k line files must really love the couple seconds my month of toiling (more importantly, my month of not doing other, more impactful things) saved them.
Learned a lot from that experience, and writing this out now I can see how it has impacted engineering decisions I make to this day. I suppose that's the real point of an internship, so time well spent, in a way.
[1] It probably wasn't actually a month, but certainly a significant chunk of the internship.
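For anyone unfamiliar with the "basic hoisting" from Day 1, here's a toy Python sketch of the shape of that fix (the helpers are hypothetical stand-ins, not the real extension code):

    def expensive_summary(lines):
        # Stand-in for an O(len(lines)) computation that doesn't depend on
        # the loop variables.
        return sum(len(ln) for ln in lines)

    def before(tokens, lines):
        out = []
        for t in tokens:
            for ln in lines:
                s = expensive_summary(lines)  # recomputed every iteration: ~O(n^3)
                out.append((t, ln, s))
        return out

    def after(tokens, lines):
        s = expensive_summary(lines)          # hoisted: computed once, ~O(n^2) overall
        out = []
        for t in tokens:
            for ln in lines:
                out.append((t, ln, s))
        return out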
> But hey, all the people working in 500k line files must really love the couple seconds my month of toiling (more importantly, my month of not doing other, more impactful things) saved them.
This stuff matters. These couple seconds per operation may very well be a difference between being able to open a 100k+ LOC file in the same editor you're doing your other work in, vs. giving up in frustration and looking for something else that can handle large files. Large files happen surprisingly often (in particular: log files, data dumps, machine-generated code). A "month of toiling" like this may immediately enable new use cases.
This is true, but my approach was still not a great one. It turns out the data structure was only accessed in a very specific way, which I could have exploited to dramatically simplify the implementation. If I had researched the problem more before diving into my original idea, I could have achieved the same end goal with much less work.
I have adhd. Every "couple seconds" of waiting is a chance for me to lose focus and "wait what was I doing?" 15 minutes later. This stuff definitely matters.
One of the most important things you can do in perf analysis is to know when to stop looking for incremental improvement.
If this subject in particular interests you, we did a lot of work in the C# lexer/parser so that once the file is lexed, it only re-lexes the tokens which changed on every edit. It also does fun stuff like the syntax colourizer only runs on code that's actually on the screen. Getting every operation that depends on the lex/parse of code in the editor down to running in much less than 30ms so that it would not slow down keystrokes was a huge amount of work.
Does the C# parser implement incremental parsing by just using a recursive descent parser with caching, or does it do parsing with nonterminals as lookahead?
It does a full lex and parse. Then if there is an edit, it determines based on the edit which tokens need to be re-lexed; maybe we went from "[10,20]" to "[10.20]" and so now the [ and ] tokens are fine but everything in the middle is changed.
So we go from a data structure representing "original text plus edit" to a data structure representing "original lex plus changes". Now we have the information we need to do the same to the parse tree, which has also been stored. Given the set of tokens in the program which changed, and knowledge of where the textual boundaries are of every parse node, we can restrict the re-parse to the affected syntax tree spine. In the example given, we know that we've still got, say, an array index list but the contents of the list need to be re-parsed, so we re-start the parser on the left bracket.
The algorithm that does this is called "the blender", and reading that code makes my brain feel like it is in a blender. The code was written by Neal Gafter and based on his PhD thesis in incremental parser theory.
The source code is available on github; do a search for "roslyn" and you'll find it.
Wow, that still sounds really long for simply tokenizing a file. I worked on parsers a while ago and, for reference, I benchmarked e.g. the Python parser at 300,000 LOC/second (for tokenization and building an AST) on my machine (an i7 laptop). Also, tokenization complexity should not increase quadratically with the length of the file, should it?
You probably know what you’re doing, just curious why these numbers seem to be off so much to what I would expect. What approach did you use for tokenization if I may ask?
I don't recall the exact numbers to be honest. I know the original was in many seconds, and in the end it was sub 1.
As mentioned in another comment:
The go extension is a thin wrapper around standard go tooling; we weren't tokenizing ourselves, just converting between their tokens and ones we could process. A large part of that was converting from byte offsets to UTF-8 character offsets.
The quadratic behavior was a bug caused by reconverting segments over and over again instead of converting deltas between previously converted subsegments.
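A hedged sketch of that delta idea in Python (assuming, as with token offsets, that every byte offset lands on a character boundary):

    def byte_to_char_offsets(data: bytes, byte_offsets):
        # Walk the buffer once, converting each offset as a delta from the
        # previous one, instead of decoding the whole prefix for every
        # offset (which is what made the original quadratic).
        out = {}
        prev_byte, prev_char = 0, 0
        for b in sorted(byte_offsets):
            prev_char += len(data[prev_byte:b].decode("utf-8"))
            prev_byte = b
            out[b] = prev_char
        return out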
I think you’re right, the end result was actually linear.
I see these senior vs non-senior engineer contrasts pop up a lot. I’m not a huge fan of them.
It seems that there is a spectrum of skills an engineer could excel at: programming, infrastructure, managing, planning, etc. I’ve known senior engineers who only excel at a particular skill. I’ve also known senior engineers who are moderately good at many but not particularly good at one.
In my experience the only difference between a senior and non-senior is that the senior’s distribution of skills makes them more effective.
I’ve also seen senior engineers change roles laterally and become more or less effective, yet still maintain the senior title.
I don't think he dislikes the topic, but rather the way it is framed as senior vs. junior instead of subject matter expert vs. non-expert. The skipto example from your post is not exemplary of the difference between a senior dev and a new grad, it seems very domain-specific.
For over a decade I've wondered whether or not I was good enough to call myself a senior. Now I know. But I also discovered that, to me at least, being senior is not about knowing how to optimise algorithms, or knowing the ins and outs of the compiler, or any technical skill like that. I mean, I used to love algorithm optimisation 20, or even 10 years ago. Still do, I guess. But in most projects, there's little value in it. The value is in being able to carry a complex project to completion in a fairly goal-oriented manner: having an overview of what needs to be done, knowing what to spend your time on, understanding what the user needs and how to accomplish that, getting juniors on board, and making sure we're all working on the same thing. Of course technical skills are relevant, but it's more about how and where to apply them than about the skills themselves.
As a senior engineer who is proud of certain strengths and envious of different strengths I see more often in others than in myself, this seems pretty spot on to me.
If you accept the system that promotes an engineer into a senior position, there is no arguing that a senior engineer does possess skills that a junior doesn't. The junior is going to be evaluated in the same system.
I've seen it a day or two ago. Can't find the picture anywhere now (I've seen it in some group chat). Anyway, beyond the words quoted at the beginning of this article, the meme's "nested loops go brrr" had a picture of a triple-nested loop using Active Record to do some simple database operations.
To which the correct response is: "it's a 'senior developer' in an industry where you get called a 'senior' after a total of 3 years of experience doing programming; don't do stupid shit like this, just use SQL like it's meant to".
It was based on a story similar to the one in OP's blog post. At my first job I used to work with some really talented fresh grads who wanted to show off their algorithm skills and ended up over-engineering stuff.
One of them implemented a trie and stored it in SQLite to implement some string autocomplete where the number of strings was something like 100.
The other implemented a 2D segment tree for doing some grid updates where the size of the grid was small. This inspired the first part of the meme. Segment trees and sqrt decomposition are topics that are popular at programming contests and nowhere else really.
Regarding the triple nested loop, I just wrote the simplest pseudocode that represents nested loops, not necessarily something a "senior" developer would write.
>CHK CHK CHA-GONK BRBBRBBRING! -- Man's Eyes Being Poked Like A Cash Registers' Keys And Jaw Popping Open Like A Till Drawer -- Mad #61, Mar 1961, Page 18 -- Kitzel's Department Store
Many people can write brute force implementations. But while the meme is funny and great, it only shows a part of reality: sometimes you do have to optimize things. Just don't do it too early and only if there is a clear use case that requires the speed. Then you should be able to write fast code, or at least know which library to use that has a good implementation of the algorithm you need.
Some companies, like the GAFAM ones, probably put too large a focus on algorithmic questions, but they can afford to lose otherwise good engineers who are bad at algorithmic questions. They need something to filter the masses of applicants they receive.
Thanks for the context. The example you describe supports the meme better.
Sorry for being harsh, I got triggered by that code inside the printer, because I've dealt with a lot of dumb "I don't know how SQL joins work, so I'll use my ORM to do it and filter the data in code" cases early in my career, and I have sort of an allergy to that now.
I see this so much in Rails codebases that at this point the two are nearly synonymous in my mind. But maybe I’ve been cursed to work only on bad Rails projects or something and there’s a universe of them out there that aren’t full of that sort of thing.
The faces are just "Zoomer" and "Boomer," respectively. Depending on which group the creator is trying to make look ineffectual and clueless, one typically looks smug and amused while the other is losing his shit.
This is a great story. I feel like talented CS students are particularly prone to this sort of myopic thinking; their professors introduce them to functional programming, or the philosophy of UNIX, and they carry that approach everywhere, in a very blunt way. They also carry this little-examined conviction that the world consists of mediocrities and mediocre code and sometimes aren't charitable with the work of others.
Putting it another way, knowledge with the bare minimum experience required to be effective is very sharp, and that sharpness is sometimes what’s required to cut through old, retrospectively “wrong” ways of doing things. I don’t disagree with what you’ve said, I think we all have met plenty of people fitting your description, I just mean to say there’s another side of the coin. As food for thought, much (not all) of the open source software we herald as brilliant today was initially written by people still in or just barely out of grad school.
I agree with the thrust, but I think this has two sides as well.
Having an intricate understanding of how things are connected doesn't necessarily mean that one feels compelled to make them better.
Sometimes, "understanding" actually begets a cynical form of apathy. The whole world is chaotic and full of holes and injustices, so why bother trying to do the right thing, if it won't make a difference in the "big picture"?
Sometimes, knowledge (of your corner of the graph) with a hefty dose of misunderstanding (of the nodes and edges you're about to be traipsing down) is what's needed to most successfully face the challenges at hand. One name for this is "idealism", but without the negative connotation it is sometimes shipped in.
This is more what I was getting at with the whole knowledge + inexperience bit, if this makes sense.
The senior probably had both Knowledge and Understanding, and decided (correctly) in a split second which solution was good enough in all applicable dimensions, including the temporal (is it easy to change this if we have to?)
Being pedantic here, but knowledge is equivalent to understanding (information being knowledge without understanding). Wisdom is the word you were looking for (knowledge being wisdom without experience):
The New Grad had knowledge. The Senior Dev had Wisdom.
I see it differently. Knowledge =/= understanding. There are plenty of ppl who know a lot but have little understanding.
Put another way, knowledge is the nodes. Understanding is grasping the connections. Understanding is the higher power. Understanding is where the magic happens.
Wisdom? Wisdom is next level understanding. It's the maturity of developing connection within the connections.
Not if you read the article. It's pretty clear the New Grad didn't understand the details or the scope of the problem. He was theoretically correct at 50,000 feet. In reality, the details made his theoretical correctness irrelevant.
> They also carry this little-examined conviction that the world consists of mediocrities and mediocre code and sometimes aren't charitable with the work of others.
Hear hear. CS students fresh out of school seem to carry the notion that they are superior to people who haven't studied CS, even when those people have been in the industry for many years. Once the ex-students are working in a professional environment with deadlines and stakeholders, it takes them a couple of months before they realize they have to actually learn how to work as a software engineer now, and that has nothing to do with CS.
I think it's really interesting how complexity theory sort of falls apart in the face of hardware-specifics and concrete use cases. The motto in computer architecture is to make the common case fast (it's in like every textbook), whereas the motto in computer science is to make the program scale/to solve generically. When push comes to shove (as this article shows), the computer architects seem to have the final word, at least when it comes to making things go more brrrrrr.
There's so much stress and attention given to complexity theory for software engineers, to the point that people will cram for hours to make it through FAANG interviews. I understand that it's important and it's just something that everyone has to go through... but data structure and algorithm performance is a Wikipedia search away, and then you choose the appropriate STL container and move on with your life. The same can't be said for gaining an understanding of modern processors.
I'm not saying that one is more important than the other or vice-versa, I'm just saying that it seems wrong to me that not knowing algorithms and data structures can break an interview, whereas not knowing hardware is virtually a non-factor.
The big take away for me is this: if you're not benchmarking, you're cargo culting.
I think you are wrong. Yes, better understanding of hardware implementation (cache sizes and latency, branch prediction and pipelines, specialized instructions) is important. But also in real life those asymptotic complexities will often have similar c and N_0, or your input will be bigger than N_0, so O(n) vs O(n^2) will still matter. Basic analysis of algorithms makes your life easier because, in this simplified model, you can get a rough estimate of time/space requirements. Of course, if you have competing implementations, you will benchmark them (which is not straightforward to do properly).
By the way, we usually reserve "complexity theory" for the part of theoretical computer science (math) concerned with proving lower bounds (like the most famous problem, P vs. NP). What is required for basic analysis of algorithms (without any fancy methods) is not very hard. Yes, you can search the web like we all do, but first you need to know what you are searching for. Also, we reuse algorithms and data structures not only in the final product but also inside our own new algorithms, so you cannot always just search for them.
Regarding interviews, I think the main problem is that many companies and interviewers try to blindly copy the questions (cargo cult) without understanding the point of such interviews. The point IMO is that you evaluate analytic skills of a candidate. Correct/incorrect answer is not everything. You could do the same with math, science problems but algorithms/data structures are closer to programming. Anyway, I feel this is becoming a controversial topic.
In the '80s, alongside IBM, Cray, Heimdall, Amdahl, Hitachi, etc., you had the volume generics: Intel. The generic overtook the custom, enough so that the revenue and economies of scale won big. It came accompanied by a scientific mindset of functional performance (multiply, divide, exponent, sqrt) rather than TPS or $/transaction.
The big iron was a product of large-data-volume business problems - payroll, airline reservations, insurance quotes, credit cards, catalog order stats.
Go's strings.Index/bytes.Index also often use the "find first byte" approach; now on amd64 it's a fancy vector instruction. If it finds the first byte too many times without a full match, it falls back to a Rabin-Karp match using a rolling multiplicative hash.
Battle-tested implementations of fundamental things like string searches, sorts, hashtables, allocators, graphics primitives, etc. are often interesting 'cause you find out a lot about what you need (and/or don't need) to work well in practice as well as protect from exploding worst-case scenarios.
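The "find the first byte, then verify" shape described above is easy to sketch; this is just the idea, without the SIMD scan or the Rabin-Karp fallback that Go adds:

    def index_like(haystack: str, needle: str) -> int:
        if not needle:
            return 0
        first = needle[0]
        i = haystack.find(first)          # stands in for the vectorized byte scan
        while i != -1:
            if haystack.startswith(needle, i):
                return i
            i = haystack.find(first, i + 1)
        return -1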
I sometimes think that you should be able to hint to the compiler that this is a smallish string, this one can be big, this list should mostly stay under 1000 elements, so that it picks the correct version of an algorithm (bonus points if I can provide several algorithms for different use cases).
As a senior dev, I wish that I could say I always knew more than my interns, and that all the code that's there is because it was carefully planned to be that way.
But more often than not, I don't, and it's not.
My take on it is that, for the most part, senior devs are a lot more paranoid about breaking things and don't want to make code changes unless enough people are asking for it and it'll make a notable difference. Even making things strictly faster can break users if they depend on the algorithm being slow (an extreme edge case that you'd usually say is the user's fault anyway, but it could nonetheless be relevant in things like optimizing away a spin-wait).
I'm a dev with some grey hair who feels it would have been useful for all that fantastic domain knowledge from Paterson to get documented in a code comment.
I'd love to hear if either of them ever went back and did that?
In this case I think not. However I strongly agree with your position and have tried hard to ensure that my code is commented such that it explains why the code works as it does. See https://ericlippert.com/2014/09/08/comment-commentary/ for more thoughts on this subject.
I was looking for this, thank you. I cannot get over when people try and keep all this knowledge in their head. Have they never worked on something written by someone other than themselves? I know it usually isn’t, but it really feels like some kind of hubris that makes people allergic to writing a paragraph block comment in the code.
I always envy people who get to work at this level, instead of cobbling together systems that integrate several other systems, each with its own set of flaws, where you can be happy if you manage to make them work together somehow. The algorithm stuff seems pretty simple in comparison: a very local problem that can be profiled well and where you can understand most of the factors at play.
The "cobbling systems together" stuff is code bureaucracy rather than programming. You're no longer dealing with constraints of physics, mathematics and sound system design - you're building a bit-pushing layer in between Kafkaesque monstrosities that were never truly intended to work together.
Sadly, most programming jobs seem to primarily involve code bureaucracy rather than the "algorithmic" level.
I've always seen the "cobbling systems together" stuff as actually software engineering, as that's what you need to have a final production system running. What you describe as "programming" I would say is computer science.
And for me, the work that solves real-world problems tends to be software engineering (using my own definition), rather than computer science (again, using my own definition), which seems to be more about optimizations.
I got the chance at work recently to work solo on a small, greenfield project, where I was fully in control of all the pieces and the problem space was small. I could do it any which way I wanted.
Gods, it was wonderful!
As I've moved up the ladder and worked on complex enterprise systems, with umpteen integrations, overly-strict SAST systems, enforced 90% test coverage and the like, I seldom feel the "joy of code". It was good to feel it again!
For a while I worked in video decoding/encoding. That was true engineering. I loved reading up about cache and assembly and then applying this knowledge to our code. It was ok to work on one function for weeks just to get 20% speed up. Now I do enterprise and it’s just horrible. Between dealing with stakeholders and 3rd party systems that barely work there is no room for systematic engineering.
I was never an expert in video and during 2008 my company went down. The only position I found after a while was doing C#/.NET. Outside Silicon Valley there are not too many hardcore dev roles anyway and in addition I feel a lot of that work is done in Asia now.
Applying algorithms outside of classroom-assignment settings is rarely trivial. It takes some insight into the nature of the problem to spot the opportunity, and good familiarity with the applicable solution in the first place.
This is also why C++ (and hopefully other languages by now) has the short string optimization. On a 64-bit machine, a pointer to a heap-allocated buffer is 8 bytes long. A stupidly large number of the strings used by actual programs are <= 7 bytes long (it's actually more like 20 in C++, because strings also include a length and capacity). Think about it: that's virtually every file extension, path separator, and field delimiter, and a good number of field names, labels, first & last names, usernames, identifiers, URL path components, parameter names, etc. If you can fit the whole string into an immediate value, then you can avoid the heap allocation and deallocation entirely, you can avoid chasing pointers and incurring a cache miss, and you can very significantly reduce the total amount of memory used.
This is also a good reason why, if you're designing a standard library, you really want to have length-prefixed strings instead of null-terminated ones. If you know the length of the strings you're dealing with, you can swap out the algorithm you use, such that you might use brute force for a very small needle, word comparisons if it's exactly 4 bytes long, SSE instructions if it's a word-length multiple, or Boyer-Moore for very long strings.
Imagine that the intern didn't dare to ask such questions.
a - he could've gone out thinking that performance doesn't matter. but it certainly does in a piece of code being used daily by thousands of devs.
b - he could've thought that the simple implementation is faster but missed the fact that skip is implemented in assembly.
c - he could've realized both but missed the why.
and these failure scenarios are likely to happen because this is an intern we're speaking about.
One or two tricks up your sleeve do not matter much, but repeat this a hundred times: which one do you think would be the better programmer?
I think the willingness to challenge authority figures and to be (often) proven wrong and to learn from it is an essential part of becoming better.
Maybe Tim was understanding because he himself challenged people older than him and learned from them in the process.
Advice like "don't reinvent the wheel", "avoid premature optimization", and "write boring code" promotes exploitation,
which is the optimal strategy for older agents.
But newer agents need a higher level of exploration, otherwise they would converge on a suboptimal strategy.
>As Tim explained to me, first off, the reason why VB does this seemingly bizarre “find the first character match, then check if query is a prefix of source” logic is because the skipto method is not written in the naive fashion that I showed here. The skipto method is a single x86 machine instruction.
Oh, my bad then; from this text it sounded like it was written in assembly.
I'd say that for widely used library code it makes sense to implement an efficient algorithm. By its very nature the context in which a library routine will be called is unknown and so it must be prepared for everything. Also for widely used code the amount of total saved machine time can be quite substantial.
In the story the senior dev makes a judgment call (that this routine will be used in LOB applications where the worst case is unlikely to appear, so the implementation is OK) which is probably correct, especially considering other priorities. And of course senior devs are much better equipped to make these kinds of calls than juniors, but they still can and will guess wrong.
> Moreover, Tim explained to me, any solution that involves allocating a table, preprocessing strings, and so on, is going to take longer to do all that stuff than the blazingly-fast-99.9999%-of-the-time brute force algorithm takes to just give you the answer.
That's why a good implementation will dispatch to the best algorithm at runtime!
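A minimal sketch of what that runtime dispatch can look like; the threshold and the long-needle algorithm here are arbitrary illustrative choices, not what any particular library does:

    def find(haystack: str, needle: str, threshold: int = 16) -> int:
        if len(needle) < threshold:
            # Short needles: brute force's tiny constant factor usually wins.
            for i in range(len(haystack) - len(needle) + 1):
                if haystack[i:i + len(needle)] == needle:
                    return i
            return -1
        return horspool(haystack, needle)

    def horspool(haystack: str, needle: str) -> int:
        # Boyer-Moore-Horspool: pays a preprocessing cost up front so it can
        # skip ahead by up to len(needle) characters on a mismatch.
        m, n = len(needle), len(haystack)
        shift = {c: m - i - 1 for i, c in enumerate(needle[:-1])}
        i = 0
        while i <= n - m:
            if haystack[i:i + m] == needle:
                return i
            i += shift.get(haystack[i + m - 1], m)
        return -1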
Of course, if this code were running in a server, or if it were part of a library that was widely used, then suddenly you have the potential for a DoS vulnerability.
The narrative would then be that the microseconds you saved all those devs over the years were wiped out when hackers took down your system.
The senior dev knows the system, knows what they can get away with, and can choose how to use that knowledge. A newbie will always follow the programming rules, like never going over the speed limit on a highway, because they can't yet take many calculated risks.
> NO! YOU CAN’T JUST USE BRUTE FORCE HERE! WE NEED TO USE SEGMENT TREES TO GET UPDATE TIME COMPLEXITY DOWN TO O(LOG N)! BREAK THE DATA INTO CHUNKS AT LEAST! OH THE INEFFICIENCY!!!
I'm the opposite of this stereotype, and I think there are more like me. Two reasons as to why:
(1) Psychological:
I never had this. As a junior dev, I don't like to optimize because I feel a bit of pain when I need to moderately focus. I can do it, and I've done it quite a lot. In that sense, I've experienced quite a bit of pain in my life. It's something I've learned to live with. And when I have to focus, I prefer to focus all the way, because whether I focus on 50% or 100%, the pain is roughly similar. This leaves me in a state of either being lazy(ish) and wanting to program in a simple and understandable way versus being willing to go as deep into a topic as I would need to and focus on all the details step by step.
When I'm intense, I also am still sympathetic towards simple code because I know that I understand that in both states. I only understand complicated code when I'm focused.
(2) There are enough CS grads that know better:
Also, on another note. Efficiency analysis is simply not taught at university. Parts of it are taught, but they're never connected to real world cases.
For efficiency analysis (note I haven't done any myself, but I've read a thing or two and talk quite regularly with a friend who is a performance engineer), I think there are a few perspectives to keep in mind:
1. What is the user's actual behavior.
2. What is the behavior for malicious attackers (if applicable, Intel should do this more, to my knowledge they are doing it more now).
3. How does the compiler translate it to assembly?
4. How does it theoretically look? Specifically: worst-case, best-case, average-case in both theoretical and empirical sense.
Concluding:
I only have one year of experience in software development, where no one cared about performance yet I know better than to look only at the theoretical speed. So I guess I'm a counter example to the stereotype. I'm not alone. Any CS student who reads HN has a high chance of stumbling upon articles like this and will know that performance analysis is not only about calculating the space-time complexity.
The "malice" aspect is a great one and I did not go into that in this post because of course back in the 1990s we were not at all concerned that someone would maliciously craft inputs that would slow down this algorithm.
In modern code we'd want to do a threat model that considered the consequences of untrusted inputs.
It's the whirring sound of a motor. The for loop is metaphorically spinning to loop over everything instead of doing something smarter to find what it needs. The programmer thinks the whirring sound means it's doing a lot of work for him.
It's an interesting dichotomy because the most practical solution could go either way. Maybe it's looping over a table of 20 customers and the wasted time is microseconds that wouldn't even justify ten minutes of developer thought, or maybe it's looping over a million customers and causing all sorts of capacity problems. Maybe the memetic "senior dev" here knows the former is true, or maybe his "senior" experience was all misplaced and he's clueless.
It originated from a meme of the Federal Reserve printing more monopoly money. "Printing machine goes brrrr". As they turn the crank ever faster making the printing machines smoke.
A Visual Basic program is not allowed to have undefined behaviours like a C program; InStr has a specification and that specification has to be implemented; that spec includes defining the behaviour for all inputs.
There's also no null handling here, which was a deliberate omission for clarity. In practice, the convention used inside the VB source code is that null string pointers are semantically the same as empty strings, which introduces some complexities.
`starts()` looks like it's not checking if len(source) < len(query), but if, say, query="foobar" and source="foo", when i = 3 the line
if (source[i] != query[i])
return false;
will evaluate to
if ('\0' != 'b')
return false;
so `starts()` will correctly return false.
In any case, if len(source) < len(query), we'll return false when we get to i=len(source), because source[len(source)] != query[len(source)]: source[len(source)] == '\0' while query[len(source)] != '\0', since len(query) > len(source).
There is a dereference past the bounds of the query in one case in the last code sample, but there is no dereference beyond the bounds of the source string.
You're probably thinking in C# or Java; remember that in C the convention is that a zero char ends strings. If the source string is shorter than the query string then the code will encounter a zero char in the source string at the same time as it encounters a non-zero char in the query string, and the inequality will end the loop before the beyond-bounds dereference.
No null checks in find. If source is null, dereferencing source[i] in the while condition will cause undefined behaviour. If just query is null, the same will occur in the first call to starts.
One of my job tasks was improving our custom string library. In VB!
Sure InStr was slow: but we were trying to do "starts with" across very long strings.
Sure, it worked, but a slightly smarter solution was 100 times faster.
Upon finding it I went on a search, and I found it written nearly identically 10 years before me, in a different version of a string library (hint: that one was faster).
The trick to knowing we needed to improve it: profiling!
It isn't always the case that simplistic algorithms don't cause issues: Bruce Dawson keeps finding O(n^2) algorithms in Windows that keep biting him. I'm always impressed with his work.
I take your point, but let's be fair. My attitude was "this code is bad and I'm going to demonstrate my skill by improving it" when it should have been "please teach me what design and implementation concerns went into the choice of algorithm here". I was lucky to get a gentle and thoughtful correction for my presumptions.
Kudos to you first for recognizing your own error and second for openly admitting it.
I think there are many things to consider here. For new developers, I'd encourage you to look for those "old guys" who really know their stuff. There's a lot of unmined gold you can discover if you find the right ones. I think us older developers would do well to imitate the patience and kindness of Tim Paterson more often. I guess what I'm saying is both sides could do with a huge dose of humility. I know at times I've been the youthful dev out to one-up the "old guys", and I've been the senior dev thinking "these kids today" to myself when dealing with those with a lot less experience. And both of those are bad.
Also, there are many times when a young guy fresh out of college spots a problem, finds a great solution, and makes things ten times better by just doing it! If you're in the business for a long time, it can become too easy to be cynical, and lose your enthusiasm.
I think the best thing we can do is try to let the good stuff from both sides rub off on each other.
It's great that you saw that you had a bad attitude and reflected on it. But ignorant arrogance is a personality trait some people have sometimes, not a new grad trait.
> Is it really so hard to help other people learn, and to accept that the only advantage you have on them is starting earlier?
If only the world were that black-and-white. It took me a long time to realize the habits I learned from my father in this area were toxic. Not everybody got the same upbringing you did, it sounds like yours was more advantageous than mine in that respect.
Except it is a struggle. It's often a struggle to get newer devs to stop wasting time, it's often a struggle to get them to focus on the problem you're trying to solve instead of the new library all the cool kids are using, etc.
I agree with the "more than they deserve" mentality, but let's be honest here: it's a struggle.
We've all been through it as new devs, and we'll all help new devs struggle through it as well.
I've worked with juniors who always scope creep their very simple introductory tasks. Something as simple as "add these two fields to the existing API response" suddenly turns into "change the method signature of 90% of existing methods because DTOs make code lines shorter".
New team members also notice the mess veterans made by feature creeping "add two more fields" to a 100 field monstrosity that is a usability and security nightmare, but no one can improve it because the lead engineer got promoted for building v1 and refuses constructive criticism.
I've seen it with senior devs too: "architecture astronauts" and "principal engineers" interfering with solving problems. Why drag an unnecessary, distracting stereotype into the conversation?
Sometimes it’s not just coding tools/techniques where new grads can benefit from a bit of oldster wisdom, like that one time I convinced a new grad it was wiser to slog through the bureaucratic approval process for a new api than attempt to hack into our company’s Oracle Financials db.
This reminds me of how insertion sort is the most efficient sorting algorithm for small values of n. I remember new-grad me always dismissing insertion sort in every situation because of asymptotic complexity. Engineers are supposed to find the most pragmatic solution without biases.
To be fair, there have also been plenty of times in this situation where the senior dev was wrong. You were right to question things and we should make sure to continue to encourage questions. The difference between a new grad and a senior dev is that the senior dev has the real-life experience to make the "yes/no" decision as to whether the new grad's input actually makes things better (e.g. it could be true that it doesn't matter if it is asymptotically faster if we are always working with small data). It's also a trait of a senior dev to be able to stay level-headed in these situations
Edit: I thought I was responding to someone, but I can't find the comment now.
I find that being open to new ideas, and allowing some time to prove them out rather than deciding beforehand, is the best way to find ideas that move the needle.
Senior comes to you with an idea? Great, prototype it, prove it out. Junior comes to you with an idea? Great, prototype it, prove it out.
Of course, you need to allow some time for prototyping.
When I ran teams, I told them part of their job was to spend the last half of Friday (unless emergency) prototyping their ideas and presenting their favorites at some point.
No one works the last half of Friday anyway, unless it's on this.
Well, this is a nice example of lack of knowledge flow. The senior developer spent his time explaining things that should have been put in the procedure's comment. Yes, that's what comments are for.
As a new grad myself, I completely agree with the author’s freak out when seeing “inefficient” looking code. But I feel like this applies to basically all aspects of a company life as a new grad software engineer. For me at least, there seems to be a lot of inefficient things going on in every aspect of the company. Maybe the author is right in that there is a reason behind those seemingly “inefficient” things. However, the inefficiencies are there because sometimes, people who have been working there for years just have become used to it.
One thing you learn over time is that many of those code horrors are there for a good reason, put there by someone solving a real problem in the best possible way given the time constraints. It's just that the reason is lost to history. This is one reason the "big rewrite" almost always fails: you are throwing away all those hard-learned lessons just because they aren't apparent on a reading of the code.
That is also true (perhaps less often) for byzantine process related procedures.
Seems like the original developer should have left a comment about why their nested for loop was the right implementation (if they knew it was). Would have saved everyone a lot of time.
I would use the standard ISO C strstr instead of writing the function they are calling find.
Yes, we had strstr in 1994; it was in the 1989 ANSI C standard.
On a very small system, strstr will probably just be naively coded, to save space. On a big system, it might use Knuth-Morris-Pratt or Boyer-Moore. I don't have to care; by using strstr, that decision is taken care of.
As I noted in the text, the actual problem that we had to solve was complicated by a great many factors not the least of which was the fact that the library had to handle strings from different string encodings.
I think it is good when new grads have the knowledge to see that those things don't look right and can articulate how and why it should look different.
After all, when you come across a problem that involves more than just 100 chars, it is very helpful to know what you can use to build something that still works with reasonable resources.
The only nested loop implementation I've ever changed for performance was due to the fact that there was a freakin' database call in the inner loop. All I did was refactor the data structure so that the db call was done outside the outer loop. Instant 20x improvement in performance.
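The shape of that fix, as a self-contained sketch with a throwaway in-memory SQLite table (the schema and numbers are made up for illustration):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (order_id INTEGER, price REAL)")
    conn.executemany("INSERT INTO items VALUES (?, ?)",
                     [(i % 100, 9.99) for i in range(10_000)])
    order_ids = range(100)

    # Before: one query per order -- the db call sits inside the loop.
    totals_slow = {
        oid: conn.execute("SELECT SUM(price) FROM items WHERE order_id = ?",
                          (oid,)).fetchone()[0]
        for oid in order_ids
    }

    # After: one grouped query outside the loop, then plain dict lookups.
    totals_fast = dict(conn.execute(
        "SELECT order_id, SUM(price) FROM items GROUP BY order_id"))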
Were the x86 string instructions much faster in 94? That was 486/Pentium era, right? IIRC this has varied over the years with sometimes slow microcoded string instructions and then faster improved instructions again.
The explanation behind the code is exactly the kind of thing that belongs in a comment. The why, not the what. So even if it wasn't a failure of programming, it was a failure of documentation.
> The skipto method is a single x86 machine instruction.
That’s not always a good thing, especially on modern hardware. And obviously, the “single instruction” doesn’t mean it’ll take bounded time to execute…
Tangentially to your point, here's something I haven't thought about much: when these instructions get an interrupt, I imagine they've updated (r|e)si and (r|e)di, (r|e)cx etc. to reflect where they are in their copy or scan loop. So if you get a page fault in the middle, then the kernel does enormous amounts of work in response to it, then resumes that single instruction, it resumes in the middle of the loop, not the start.
So to reiterate one aspect of your point, there might be lots of work, written in C, that occurs in response to the page fault in the middle of your "hardware-backed" single instruction. On top of all the other complexities of cache vs memory access etc. that make scanning and copying memory complicated no matter which way you do it.
But probably in the heyday of Visual Basic, and especially the DOS-based BASICs that preceded it which wouldn't have had virtual memory at all, all of this is less of a concern. The story takes place in a simpler time which serves as a plot device to better illustrate the point.
Pretty pointless as said page fault would occur no matter how you access the string. Besides, the page in question would very likely be already present.
>> Besides, the page in question would very likely be already present.
> Really? How are you so sure?
It was in a BASIC interpreter. Most of the time string needle in a haystack search is done, haystack is relatively fresh, almost certainly on a page that is present. Might not be in CPU dcache, but that's another matter.
REP SCASB on a 1994 Intel Pentium (the same year as the incident in the article) would have taken 9 + 4 * n clock cycles [0][1] to go through a string. So scanning a 4-character string takes 25 clock cycles.
It could be faster than some alternatives under some circumstances. A loop would probably be more code (== icache pressure) and would consume at least a register and a BTB entry.
It's certainly been a while since I last optimized for the original Pentium architecture. Still faintly remember U & V pipes, unexplained causes for stalls, etc.
As even nowadays, it would likely depend on the particular algorithm and data set. I'd be surprised if you can't do better than 4 cycles per char for sufficiently long strings. Most likely for short strings, REP SCASB wins due to setup costs. (Actually that article's skipto method would have of course used REP CMPSB, but that's just splitting hairs.)
Remember that even original Pentium could execute up to two instructions per clock. Unless you messed up with those damn U & V pipes. :-)
The hypothetical faster-than-rep solution would need to process data in 32-bit chunks, faux vector style.
You would be surprised at what is happening in modern computers. I don't know about REP SCASB, but IIRC REP MOVSB is now an insanely efficient way to memcpy on recent Intel microarchitectures (not necessarily always the fastest, but really fast enough for tons of scenarios, and very icache friendly). But it might be less interesting on some other x86 processors.
It makes sense to delegate some of the micro-optimizations to the hardware.
But regular scalar instructions are also optimized like crazy. Write a small loop, and your state of the art microarch might sort of unroll it by using register renaming and speculative execution, so sometimes basically multiple iterations are executed at the same time (and on top of that you sometimes get uOP cache locking, which then improves energy and hyperthreading efficiency).
The way I read this, TIM FREAKIN' PATERSON, the man who wrote the most hated "operating system" in history, put an algorithm with a nasty performance bug into the standard library of VB, and conveniently forgot to document that this function is no good for large and/or repetitive inputs. Confronted with his blunder, he not only doesn't correct his blunder, but instead condescendingly proclaims that those inferior VB coders would never need long strings, just like 640kb ought to be enough for everybody, and if they did anyway, they'd surely use a library written for the purpose by a real programmer.
The arrogant prick should have listened to the new grad.
There's some good wisdom in this story. It reminds me of that article posted here about more page-cache-efficient B-trees vs. Knuth's theoretically ideal version.
There is another type/layer of new grad: the kind without a CS background who isn't able to produce a correct solution at all (let alone balance different concerns).
New Grad: I'd use a linked list here because insertion into the middle is O(1) rather than O(n).
Senior Dev: Linked lists have very many more cache misses than do vectors, and the difference between hitting cache and hitting main memory is such a huge constant factor that for most reasonable list sizes it never makes sense to use a linked list. Use a vector. Checkmate, smug Lisp weenies.
Yeah, the Lisp world figured out cdr-coding in the 1970s and had moved on to chains of vectors by 1985 according to https://cpsc.yale.edu/sites/default/files/files/tr362.pdf. Fortunately they avoided changing the API to make this tradeoff.
I feel that most people will be sympathetic if you explain why you did something a certain way and show that you understood the different approaches available.
Everyone who has done profiling knows that (almost always) the performance problem is in the part where you least expect it. But also that sometimes you wrestle to improve the performance of an algorithm, hitting a hard wall, and then someone comes up with an idea, often resulting from a fresh look at the problem, that yields an improvement of several orders of magnitude. I guess that performance is at least as counter-intuitive as statistics. The same is true for some other things, like scalability and reliability.
Actually, I think it scares the hell out of most developers that it is so difficult to get a grip on these things. It is so easy to think that there is a simple solution, a grand idea that will fix the problem. I still find myself falling into the trap, and this after having developed software for over 30 years. It is the Dunning-Kruger effect over and over again. I guess it's more that as a more senior engineer, you have experienced it a little more often.
> I guess that performance is at least as counter intuitive as statistics.
I don't think this is really true. After you've optimized enough code over the years, you start to get a sense for bottlenecks, and your code is usually "fast enough" even on the first try. When it isn't, finding the problem with a profiler is usually pretty straightforward.
I understand this phenomenon as the new grad and the senior developer optimizing for different things. The new grad is focused solely on the asymptotic complexity of the code. It doesn't matter how slow or how complicated it is in practice; they are focused entirely on using the asymptotically fastest data structure.
The senior developer optimizes a different set of criteria:
1) How hard is it to understand the code and make sure it's correct.
2) How fast the algorithm is in practice.
There are several different reasons why the performance of the algorithm in practice is different than the performance in theory. The most obvious reason is big-O notation does not capture lots of details that matter in practice. An L1 cache read and a disk IOP are both treated the same in theory.
A second reason is the implementation of a complex algorithm is more likely to be incorrect. In some cases this leads to bugs which you can find with good testing. In other cases, it leads to a performance degradation that you'll only find if you run a profiler.
I one time saw a case where a function for finding the right shard for a given id was too slow. The code needed to find, from a list of id ranges, which one a given id fell into. The implementation would sort the id ranges once ahead of time and then run a binary search over the ranges to find the right shard for the id. One engineer took a look at this, realized that we were doing the shard lookups sequentially, and decided to perform the shard lookups in parallel. This made the code faster, but we still would have needed to double the size of our servers in order to provide enough additional CPU to make the code fast enough.
Another engineer hooked the code up to a profiler and made a surprising discovery. It turns out the implementation of the function was subtly incorrect and it was sorting the id ranges on every call. This happened because the code sorted the id ranges inside a Scala mapValues call. It turns out that mapValues does not actually map a function over the values of a hash table. It instead returns an object that, when you look up a key, looks up the value in the original hash table and then applies the function[0]. This results in the function being called on every read.
The solution was to replace mapValues with map. This dramatically improved the performance of the system and basically brought its CPU usage down to zero. Notably, it would have been impossible to discover this issue without either knowing the difference between map and mapValues or using a profiler.
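For anyone who hasn't hit this, here's a minimal sketch of the trap, assuming Scala 2.12-era collections, where Map.mapValues returns a lazy view (in 2.13 it's deprecated in favour of .view.mapValues). The data and names below are made up; the point is that the sort inside mapValues re-runs on every lookup, while the plain map sorts once up front:

    object MapValuesTrap {
      // Hypothetical data: shard id -> unsorted (start, end) id ranges.
      val raw: Map[Int, Seq[(Long, Long)]] = Map(
        1 -> Seq((100L, 199L), (0L, 99L)),
        2 -> Seq((300L, 399L), (200L, 299L))
      )

      // Lazy view: the body, sort included, runs again on EVERY lookup.
      val sortedLazily = raw.mapValues { ranges =>
        println("sorting...")
        ranges.sortBy(_._1)
      }

      // Strict map: the sort runs once per key, at construction time.
      val sortedOnce = raw.map { case (k, ranges) => k -> ranges.sortBy(_._1) }

      def main(args: Array[String]): Unit = {
        sortedLazily(1); sortedLazily(1) // prints "sorting..." twice
        sortedOnce(1); sortedOnce(1)     // no sorting on lookup
      }
    }

If you do want mapValues in 2.13, forcing the view with .toMap gives the same once-per-key behaviour as the map version.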
Recently I worked on the same type of project as someone with 10 yrs of experience & a CS degree from Stanford.
A few months later, I created a project and had a manager with a CS degree. However, when I left, that manager was unable to pick up where I left off, and he ended up leaving soon after.
I have fewer years of experience, but to me, what matters more is how much of that experience went into a relevant business model, a product actually built, or comparable past projects. I.e. a Senior Dev with a CS degree vs. a New Grad with a business background & SWE experience: it's apples to oranges in many cases.
Also, I'd echo another comment here:
>"daxfohl 1 hour ago [-]
> As a senior dev, I wish that I could say I always knew more than my interns, and that all the code that's there is because it was carefully planned to be that way.
>But more often than not, I don't, and it's not. "
There are indeed many not-really-programmers with CS degrees.
On the other hand, sometimes I'm handed a program written by a new grad to maintain/fix/improve, and rapidly determine that it's less work to just chuck it in the bin and start over.
One of the major differentiators between a newbie and an old hand is knowing how to create a piece of software that those who work alongside you or come after you can understand, maintain, and improve.
But it's not uncommon for jr. devs to believe every piece of code deserves the most efficient runtime. Runtime speed causing projects to fail is very uncommon. What does add an incredible amount of work time is combing through a codebase looking for micro-optimizations. I've never once seen a jr. dev who claimed to care about efficiency start by writing benchmarks over large parts of the system and using those to find real bottlenecks. No, they always comb through the code trying to impress the sr. dev with their algorithmic knowledge.
> Runtime speed causing projects to fail is very uncommon.
That's true. What is common, however, is bad runtime performance losing you users and bleeding money. Not doing dumb things (like using a list where a vector would do), and taking a moment every now and then to go over your product with a profiler and fix the biggest bottlenecks early, can save you a ton of money in cloud bills (it might even turn out that your product doesn't need to horizontally scale at all, giving you further reduction-of-complexity benefits). Or you might end up delivering features that were impossible with bad performance (see e.g. https://news.ycombinator.com/item?id=22712103).
So, sometimes we all learn something.