I mean, yes and no. It was a software challenge to hit the hardware limit, but the hardware limits were also much lower. My team stopped optimizing when we maxed out the PCI bus in ~2001.
I don't see how you could have read the article and come to this conclusion. The first few sentences of the article even go into detail about how a cheap $1200 consumer grade computer should be able to handle 10,000 concurrent connections with ease. It's literally the entire focus of the second paragraph.
2003 might seem like ancient history, but computers back then absolutely could handle 10,000 concurrent connections.
In spring 2005 Azul introduced a 24 core machine tuned for Java. A couple years later they were at 48 and then jumped to an obscene 768 cores which seemed like such an imaginary number at the time that small companies didn’t really poke them to see what the prices were like. Like it was a typo.
We're slowly getting back to similarly-sized systems. IBM now has POWER systems with more than 1,500 threads (although I assume those are SMT8 configurations). This is a bit annoying because too many programs assume that the CPU mask fits into 128 bytes, which limits the CPU (hardware thread) count to 1,024. We fixed a few of these bugs twenty years ago, but as these systems fell out of use, similar problems are back.
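To make the arithmetic behind that limit concrete, here is a rough sketch. The helper names are made up for illustration, and the 64-bit word size and the 1,536-thread machine are assumptions, not details of any particular system. A fixed 128-byte mask holds 128 × 8 = 1,024 bits, one per hardware thread, so anything beyond that needs a dynamically sized mask (glibc offers `CPU_ALLOC` and friends for this).

```python
BITS_PER_BYTE = 8
WORD_BYTES = 8  # assumption: the mask is stored as an array of 64-bit words


def max_cpus(mask_bytes: int) -> int:
    """Number of hardware threads a bitmask of this size can represent."""
    return mask_bytes * BITS_PER_BYTE


def mask_bytes_needed(cpu_count: int, word_bytes: int = WORD_BYTES) -> int:
    """Bytes needed for one bit per CPU, rounded up to whole words."""
    bits_per_word = word_bytes * BITS_PER_BYTE
    words = -(-cpu_count // bits_per_word)  # ceiling division
    return words * word_bytes


if __name__ == "__main__":
    print(max_cpus(128))            # fixed 128-byte mask
    print(mask_bytes_needed(1536))  # hypothetical 1,536-thread machine
```

Under those assumptions, a program that hard-codes a 128-byte mask cannot describe a thread numbered 1,024 or higher, while a mask sized from the actual CPU count can.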
> Driven by 1,024 Dual-Core Intel Itanium 2 processors, the new system will generate 13.1 TFLOPs (Teraflops, or trillions of calculations per second) of compute power.
This is equal to the combined single precision GPU and CPU horsepower of a modern MacBook [1]. Really makes you think about how resource-intensive even the simplest of modern software is...
Note that those 13.1 TFLOPs are FP64, which isn't supported natively on the MacBook GPU. On the other hand, local/per-node memory bandwidth is significantly higher on the MacBook. (Apparently, SGI Altix only had 8.5 to 12.8 GB/s.) Total memory bandwidth on larger Altix systems was of course much higher due to the ridiculous node count. Access to remote memory on other nodes could be quite slow because it had to go through multiple router hops.
Half serious. I guess what I was saying is that this kind of science is still very useful, but more to the nginx developers themselves. Most users don't have to worry about it anymore.
They didn't have this kind of compute back when the article was written, which is the point of the article.