My immediate question is how much of this is due to the analysis, caching, and MySQL optimization, and how much is due to the beefing up of their database servers (50% faster CPUs, 4x RAM).
My impulse would be to think that the hardware is responsible for most of it, which makes the article somewhat less interesting.
That occurred to me as well. A more rigorous analysis would include performance numbers for different combinations of optimizations turned on, including the new hardware. If you run a single experiment but change multiple variables, you have no way of knowing which variable, or combination of variables, actually makes the difference.
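Something like the sketch below is what I have in mind: toggle each optimization independently and measure the same workload against each combination. The flag names and the run_workload helper are hypothetical, just to illustrate the methodology.

    require 'benchmark'

    # Hypothetical toggles -- vary one optimization at a time.
    combinations = [
      { :caching => false, :conditional_get => false },  # baseline
      { :caching => true,  :conditional_get => false },
      { :caching => false, :conditional_get => true  },
      { :caching => true,  :conditional_get => true  },
    ]

    combinations.each do |flags|
      elapsed = Benchmark.realtime do
        # run_workload (hypothetical) would replay a representative
        # request log against the app with these optimizations enabled.
        run_workload(flags)
      end
      puts "#{flags.inspect}: #{elapsed.round(2)}s"
    end

Repeat the whole matrix once on the old hardware and once on the new hardware, and you can actually attribute the improvement.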
The point of the post wasn't to provide a rigorous performance analysis; it was to highlight the overall improvement. We prefer to attack the problem of performance from as many avenues as possible so that we can get the most benefit.
That said, the vast majority of the improvement came from the application changes, not the hardware upgrades. Between server-side caching via memcached and conditional GET using If-Modified-Since headers and ETags, we saw dramatic improvements. We've also spent a lot of time analyzing the behavior of the Ruby garbage collector in our applications and making changes where appropriate to improve performance.
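To make that concrete, here's a minimal Rails-style sketch of those two techniques. This isn't our actual code: the controller and model names are illustrative, and it assumes the cache store has been configured to point at memcached (config.cache_store = :mem_cache_store).

    class PostsController < ApplicationController
      def show
        # Server-side caching: fetch from memcached, falling back to
        # the database on a miss.
        @post = Rails.cache.fetch("post/#{params[:id]}") do
          Post.find(params[:id])
        end

        # Conditional GET: fresh_when sets the ETag and Last-Modified
        # headers; if they match what the client sent in If-None-Match /
        # If-Modified-Since, Rails returns 304 Not Modified with no body.
        fresh_when(:etag => @post, :last_modified => @post.updated_at)
      end
    end

The conditional GET piece is cheap to add and saves both rendering time and bandwidth whenever the client already has a current copy.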
Did the database server upgrades make a difference? Of course they did, but it was an incremental improvement -- think along the lines of 10%. The point was to give us a longer scalability runway before we have to do things like sharding the database, not to provide an instant performance increase.
This comment is actually better than the original posting. I could get all FriendFeed, Dave Winer-ish and start talking about "conversations." :-)
In general, in my opinion, profiling the language layer of a web application is a minor win, because it's usually just glue. The real payoffs are on the client side (which you've just brought up) and the database side (which I wanted to make more prominent).
An overview is nice, but it doesn't tell the reader which techniques made the most difference. Naively applying performance optimizations without measuring their individual impact is not a good idea. It sounds like you didn't apply them naively, but the reader has no way of knowing that from the post.
Is anyone here using New Relic RPM? The demo looks slick. We're currently doing things a little more ad hoc for Kongregate, just using pl_analyze - http://rails-analyzer.rubyforge.org/pl_analyze/