Most software in the industry is slow because it's doing a lot of stuff that it shouldn't. Oftentimes an additional "optimization" layer adds caching, but that makes getting to the root of the issue even harder. The biggest win comes primarily from getting rid of the work you don't need to do, and secondarily from operating on things in batches.
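As a minimal sketch of the batching point (the store and fetch functions below are hypothetical stand-ins, not any particular library's API): the slow shape does one round trip per item in the middle of the loop, while the batched shape fetches everything up front and then computes.

```cpp
#include <cstdio>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical in-memory stand-in for a database or remote service.
static const std::unordered_map<int, std::string> kStore = {
    {1, "alpha"}, {2, "beta"}, {3, "gamma"}};

// Models one round trip per call.
std::string fetch_one(int id) { return kStore.at(id); }

// Models a single round trip for the whole batch.
std::vector<std::string> fetch_many(const std::vector<int>& ids) {
    std::vector<std::string> out;
    out.reserve(ids.size());
    for (int id : ids) out.push_back(kStore.at(id));
    return out;
}

int main() {
    const std::vector<int> ids = {1, 2, 3};

    // Slow shape: N round trips, I/O interleaved with the work.
    for (int id : ids) std::printf("%s\n", fetch_one(id).c_str());

    // Batched shape: one round trip up front, then pure computation.
    for (const std::string& s : fetch_many(ids)) std::printf("%s\n", s.c_str());
}
```

The specific container doesn't matter; the point is that the I/O stops being interleaved with the computation.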
My playbook for optimizing in the real world is something like this:
1. Understand what you're actually trying to compute end-to-end. The bigger the chunk you're trying to optimize, the greater the potential gains.
2. Sketch out what an optimal process would look like: what data do you need to fetch, what computation do you need to do on it, and how often does this need to happen? Don't try to be clever and micro-optimize or cache computations. Just focus on doing only the things you need to do, in a simple way. Use arrays a lot (see the sketch after this list).
3. Understand what the current code is actually doing. How close is it to the sketch above? Are you doing a lot of I/O in the middle of the computation? Do you keep coming back to the same data?
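To make "use arrays a lot" in step 2 concrete, here is a minimal sketch (the types and sizes are illustrative, not from any particular codebase): the same sum computed by chasing pointers through a linked list and by scanning a contiguous array. The array version is the shape to aim for, since a linear scan over contiguous memory is exactly what caches and hardware prefetchers are built for.

```cpp
#include <cstdio>
#include <memory>
#include <vector>

// Pointer-chasing shape: one heap allocation per element, and every step
// of a traversal dereferences a pointer to somewhere else on the heap.
struct Node {
    double value;
    std::unique_ptr<Node> next;
};

double sum_list(const Node* head) {
    double total = 0.0;
    for (const Node* n = head; n != nullptr; n = n->next.get()) total += n->value;
    return total;
}

// Array shape: the same values laid out contiguously, so a traversal is a
// linear scan over memory.
double sum_array(const std::vector<double>& values) {
    double total = 0.0;
    for (double v : values) total += v;
    return total;
}

int main() {
    std::unique_ptr<Node> head;
    std::vector<double> values;
    for (int i = 0; i < 1000; ++i) {
        auto node = std::make_unique<Node>();
        node->value = 1.0;
        node->next = std::move(head);
        head = std::move(node);
        values.push_back(1.0);
    }
    // Both compute the same result; the array layout is the faster shape.
    std::printf("%.1f %.1f\n", sum_list(head.get()), sum_array(values));
}
```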
If you want to understand the limits of how fast computers are, and what optimal performance looks like, I'd recommend two talks that offer a very different perspective from what you usually hear:
1. Mike Acton's "Data-Oriented Design and C++" talk at CppCon 2014: https://www.youtube.com/watch?v=rX0ItVEVjHc
2. Casey Muratori's talk on optimizing a grass-planting algorithm: https://www.youtube.com/watch?v=Ge3aKEmZcqY

Strongly agree. That's perhaps less true for the software I work on these days (LAPACK), but I've seen it so many times over my career. I'm also a big fan of "Efficiency with Algorithms, Performance with Data Structures" by Chandler Carruth at CppCon 2014: https://youtu.be/fHNmRkzxHWs