Performance optimization covers a lot of topics, and the right approach depends on what you are trying to optimize.
1. Latency vs throughput. Often they amount to the same thing, i.e. reducing the time it takes to do something. However, once you pass a certain threshold, techniques that improve throughput start to hurt latency, so it is important to know which one you are after. There are also low-level details if you have rather extreme latency requirements, e.g. pinning cores, tuning kernel settings, etc.
2. Knowledge about the overall system and your input distribution. While this seems trivial, you can often get large performance improvements by avoiding redundant work, either through caching or lazy evaluation. Some computations exist only because their results might be needed later, and lazy evaluation lets you skip them entirely when they are not (see the caching/lazy-evaluation sketch after this list).
3. Better algorithms. Again, this seems trivial, but people are often using algorithms that are far from optimal. And even if an algorithm is asymptotically optimal, there may be faster algorithms for special cases, or algorithms that are faster in practice. Optimizing special cases is rewarding when they occur frequently (see the special-case dispatch sketch below). Do you really need optimal solutions? Can you allow randomization? Can you optimize across queries to make the workload faster overall without optimizing individual operations?
4. Parallelization. Can you parallelize at all? Are your problem instances large enough, or individual stages slow enough, to benefit from it? Do you have computations that are trivially parallelizable and could be offloaded to the GPU? If your code is waiting on events, can you make those waits async? Can you avoid locks or atomic operations in your parallel code (see the lock-free partial-sum sketch below)?
5. Data structure optimization. Can you reduce the number of allocations needed? Can you make the data layout more linear and predictable so the CPU gets better cache utilization (see the AoS vs. SoA sketch below)? Can you compress data that is sparse?
6. Low-level CPU/GPU optimizations. There are a lot of great resources out there, but only go down this path when you are very sure it will be worth it, i.e. when the code in question is actually a bottleneck in your system.
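On point 2, a minimal caching and lazy-evaluation sketch in C++; expensive(), cached_expensive() and LazyValue are hypothetical names for illustration, not any particular library:

```cpp
#include <cstdint>
#include <iostream>
#include <unordered_map>

// Hypothetical expensive computation used for illustration.
uint64_t expensive(uint64_t n) {
    uint64_t acc = 0;
    for (uint64_t i = 0; i < 1'000'000; ++i) acc += (n * i) % 97;
    return acc;
}

// Caching: remember results keyed by input so repeated queries are free.
uint64_t cached_expensive(uint64_t n) {
    static std::unordered_map<uint64_t, uint64_t> cache;
    auto it = cache.find(n);
    if (it != cache.end()) return it->second;  // cache hit: no recomputation
    uint64_t result = expensive(n);
    cache.emplace(n, result);
    return result;
}

// Lazy evaluation: only compute the value if someone actually asks for it.
struct LazyValue {
    uint64_t input;
    bool computed = false;
    uint64_t value = 0;
    uint64_t get() {
        if (!computed) { value = expensive(input); computed = true; }
        return value;
    }
};

int main() {
    std::cout << cached_expensive(42) << '\n';  // computes once
    std::cout << cached_expensive(42) << '\n';  // served from the cache

    LazyValue maybe_needed{7};
    // If this branch never runs, expensive(7) is never evaluated at all.
    if (false) std::cout << maybe_needed.get() << '\n';
}
```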
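On point 3, a sketch of a special-case fast path, assuming you happen to know many of your inputs are small-range integers: dispatch to an O(n + k) counting sort when values fit in 0..255, otherwise fall back to the general comparison sort.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Special-case dispatch: cheap check first, fast path when it applies,
// general-purpose algorithm otherwise.
void sort_maybe_small_range(std::vector<uint32_t>& v) {
    const bool small_range =
        !v.empty() && *std::max_element(v.begin(), v.end()) < 256;
    if (small_range) {
        // Counting sort: one pass to count, one pass to rewrite in order.
        std::array<size_t, 256> counts{};
        for (uint32_t x : v) ++counts[x];
        size_t out = 0;
        for (uint32_t value = 0; value < 256; ++value)
            for (size_t i = 0; i < counts[value]; ++i) v[out++] = value;
    } else {
        // Fall back to the general comparison sort.
        std::sort(v.begin(), v.end());
    }
}
```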
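On point 4, a sketch of lock-free parallelism by partitioning: each thread owns one slice of the input and one output slot, so no locks or atomics are needed and the results are only combined after all threads have joined.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <thread>
#include <vector>

// Split the data across threads; each thread writes only its own partial sum.
uint64_t parallel_sum(const std::vector<uint32_t>& data, unsigned n_threads) {
    std::vector<uint64_t> partial(n_threads, 0);
    std::vector<std::thread> workers;
    const size_t chunk = (data.size() + n_threads - 1) / n_threads;

    for (unsigned t = 0; t < n_threads; ++t) {
        workers.emplace_back([&, t] {
            const size_t begin = t * chunk;
            const size_t end   = std::min(data.size(), begin + chunk);
            uint64_t local = 0;            // thread-private accumulator
            for (size_t i = begin; i < end; ++i) local += data[i];
            partial[t] = local;            // one write per thread, no sharing
        });
    }
    for (auto& w : workers) w.join();
    return std::accumulate(partial.begin(), partial.end(), uint64_t{0});
}
```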
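On point 5, the classic array-of-structs vs. struct-of-arrays layout change (a sketch with made-up particle fields): when a hot loop touches only one field, the SoA layout turns it into a dense, prefetch-friendly linear scan instead of dragging whole structs through the cache.

```cpp
#include <vector>

// Array-of-structs: all fields of one particle are adjacent, so summing one
// field pulls every other field through the cache as well.
struct ParticleAoS {
    float x, y, z;
    float mass;
    float charge;
};

float total_mass_aos(const std::vector<ParticleAoS>& ps) {
    float sum = 0.f;
    for (const auto& p : ps) sum += p.mass;  // loads 20 bytes per 4 bytes used
    return sum;
}

// Struct-of-arrays: each field is its own contiguous array.
struct ParticlesSoA {
    std::vector<float> x, y, z;
    std::vector<float> mass;
    std::vector<float> charge;
};

float total_mass_soa(const ParticlesSoA& ps) {
    float sum = 0.f;
    for (float m : ps.mass) sum += m;        // dense, sequential reads only
    return sum;
}
```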
If you make your batches small, you can get pretty much all of the benefit without adding (appreciable) latency, e.g. batch incoming web requests in 2-5 ms windows. Depending on what work a request involves, you might 10x your throughput and actually reduce latency if you were close to the limit of what your database could handle without batching.
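A rough sketch of that kind of micro-batching, with hypothetical names (MicroBatcher, flush_to_db) standing in for the real request and database plumbing; requests queue for at most one window before being flushed together:

```cpp
#include <chrono>
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

class MicroBatcher {
public:
    explicit MicroBatcher(std::chrono::milliseconds window) : window_(window) {}

    // Called by request handlers: enqueue and wake the flusher thread.
    void submit(std::string request) {
        std::lock_guard<std::mutex> lock(mu_);
        pending_.push_back(std::move(request));
        cv_.notify_one();
    }

    // Run on a dedicated thread: wait for the first request, hold the batch
    // open for one window so more requests can join, then flush them together.
    void run() {
        for (;;) {
            std::unique_lock<std::mutex> lock(mu_);
            cv_.wait(lock, [&] { return !pending_.empty(); });

            lock.unlock();
            std::this_thread::sleep_for(window_);  // batch window, e.g. 2-5 ms
            lock.lock();

            std::vector<std::string> batch;
            batch.swap(pending_);
            lock.unlock();
            flush_to_db(batch);  // one round trip for the whole batch
        }
    }

private:
    // Hypothetical sink: a real system would issue one batched DB call here.
    void flush_to_db(const std::vector<std::string>& batch) {
        std::printf("flushing %zu requests in one round trip\n", batch.size());
    }

    std::chrono::milliseconds window_;
    std::mutex mu_;
    std::condition_variable cv_;
    std::vector<std::string> pending_;
};
```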
Agreed. When dealing with low-hanging fruit, you often improve both.
Later though, once your system is somewhat optimized, you tend to face genuine latency vs throughput decisions. For most people, slight increases in latency are the price of large increases in throughput, but that may just be my experience.
Please do read a bit about the history of TCP and how latency impacts overall throughput. It's a classic case, and it applies to any processing where you need the results of some steps before you can proceed.
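As a rough worked example: without window scaling, TCP caps the data in flight at a 64 KB receive window, so on a path with a 100 ms round-trip time a single connection tops out around 64 KB / 0.1 s ≈ 5 Mbit/s, no matter how fast the link is. Latency directly limits throughput whenever each step has to wait on acknowledgement of the previous one.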