How a 1-Engineer Rails Site Scaled to 10 Million Requests Per Day (railsinside.com)
42 points by wgj on Oct 9, 2009 | 28 comments



The summary: A surprisingly large investment in complex platform development and extensive hardware purchases.

Let's be really, really generous and say peak load on 10 million requests is 5 hours, and pack all 10 million requests into 5 hours. That's 2 million requests/hour, 33,333 requests/minute, 555 requests/second.
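As a quick check of that arithmetic, here is a minimal sketch. The function name `peak_rate` is purely illustrative, and the 5-hour peak window is the generous assumption stated above, not a measured figure.

```python
def peak_rate(requests_per_day: int, peak_hours: float) -> dict:
    """Pack a whole day's requests into `peak_hours` and return the implied rates."""
    per_hour = requests_per_day / peak_hours
    per_minute = per_hour / 60
    per_second = per_minute / 60
    return {
        "per_hour": per_hour,
        "per_minute": per_minute,
        "per_second": per_second,
    }


# 10 million requests/day squeezed into an assumed 5-hour peak window.
rates = peak_rate(10_000_000, 5)
print({name: int(value) for name, value in rates.items()})
# prints {'per_hour': 2000000, 'per_minute': 33333, 'per_second': 555}
```

Spreading the same traffic evenly over 24 hours would give roughly 115 requests/second, so the 5-hour figure is deliberately pessimistic.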

You can easily handle 5-10k requests a second with a few ms response time -- including dynamic templating and hitting a backend database -- on a 2.66 GHz 4 core Xeon 5150 with a couple gigabytes of RAM, using a servlet runtime and a JVM-based language.

Basically, you could take that entire solution stack and compress it down to a few items with far fewer moving parts, and spend half as much on engineering and capital expenditures.

My hope is that efforts like MacRuby bring this level of free performance to the Ruby world, and that languages like Scala and Clojure make the JVM more attractive, too.


Looking at this setup, I agree that the server costs are high for the amount of traffic they have. That being said, I don't believe your proposal is a fair comparison.

Of course, using Java on bare metal is going to have more performance/$ than ruby on Xen. That's not the point. The point is to use tools that maximize human capital at the expense of server costs.

I don't know what you mean by reducing moving parts. The only parts involved in handling the requests for this setup are Nginx and HAProxy. All the other technology parts (Tokyo Cabinet, memcached, Sphinx) are used to scale the database, something your stack still has to achieve.

The tradeoffs between using bare metal vs virtualization and using Rails vs Java are well known at this point. Recommending a NASCAR driver to use a Formula 1 car isn't helpful advice.


Of course, using Java on bare metal is going to have more performance/$ than ruby on Xen. That's not the point. The point is to use tools that maximize human capital at the expense of server costs.

That assumes those tools maximize human capital more than tools that both maximize human capital and minimize server cost. Do they?

I don't know what you mean by reducing moving parts. The only parts involved in handling the requests for this setup are Nginx and HAProxy. All the other technology parts (Tokyo Cabinet, memcached, Sphinx) are used to scale the database, something your stack still has to achieve.

Lucene, ehcache -- in-process libraries to achieve the same effect but without all the moving parts. Or, instead of lucene, PostgreSQL's built-in FTS.

You also don't need nginx, haproxy, and a complex architectural division between static/non-static content.

The tradeoffs between using bare metal vs VM and using Rails vs Java are well known at this point.

I'm not sure I agree, given the constant repetition of "Ruby is more productive" as a justification for poor performance and complex architecture. Is it more productive than Scala? Clojure? Groovy? What about even just using JRuby?


In-process libraries that are orthogonal to your domain aren't a golden ticket. SOA lets you scale the parts that need to, one service at a time, be that caching, search, storage or domain logic.

I disagree that in-process libraries are actually less complex architecture, it just hides the complexity within a single process.


In-process libraries that are orthogonal to your domain aren't a golden ticket. SOA lets you scale the parts that need to, one service at a time, be that caching, search, storage or domain logic.

SOA is also considerably more complex to architect, configure, and administer. If you don't need it yet, why pay for it?

I disagree that in-process libraries are actually less complex architecture, it just hides the complexity within a single process.

I'm not sure what you mean by "hide". Using these tools you can scale up a much larger order of magnitude before having to worry about breaking out services.

We're talking about the difference between 1-2 servers and 10-20, and that's a fairly substantial difference in man hours, architectural complexity, physical plant, etc, especially early on in a startup.

Chances are good you'll never even need to grow to that scale, either -- we ran a game site with 4 million in yearly revenue (and quite a bit more than 10 million requests/day) on about 8 mid-range servers.


Thanks for sharing your experiences.

With commonly used processes (like memcached and sphinx search) and normal client libraries, I have not found a substantial increase in time to architect and configure. You are right about time to administer, but in my experience that is front-loaded, and the added ease of troubleshooting problems readily offsets it.

By hide, I mean you still have services that your domain logic is consuming, but they appear to be part of your business application in a mono-process architecture instead of being "exposed" in SOA.

If adding servers is painless and automated, then I disagree that there is a substantial difference between 2 and 10. If you want to have fast recovery, reproducibility and transparency, a cluster of 2 servers takes almost as much effort as a cluster of 10.

Tuning applications doesn't add nearly as much to the bottom line as feature development does, especially early on in a start-up. Twitter is an obvious example of not performance tuning fast enough, but there are 1000 times as many examples of scalable websites we'll never hear of[1].

The conventions and convenience of "mainstream" rails architectures make this SOA approach nearly painless.

However, I do agree that rails is obnoxiously slow. To me, this is a bigger problem for your average response time than for the total throughput of the architecture. Further, I think we agree that scaling is not a problem, until it is a problem.

One thing you may not be aware of is that in the ruby world, sphinx and memcached are actually easier to use than their in-process equivalents. Further, the tendency in the rails world to go with MySQL rules out all the awesomeness of Postgres.


You can support 10,000 requests/second hitting a servlet engine, with dynamic templates, and backend DB calls, on one quad core box? I'd like to give you a job.

Maybe I've been working with heavyweight frameworks too long, but that seems an order of magnitude off at least. Any stats on Servlet JVMs that can do 10k requests/second on a single quad core?


You can support 10,000 requests/second hitting a servlet engine, with dynamic templates, and backend DB calls, on one quad core box?

5-10k, depending; a local instance of a recent webapp I wrote can run 5.9k requests/sec on a simple JSP-templated page backed by a database request. It's running on tomcat, using servlets with a lightweight REST API, and postgresql.

The webapp I'm working on right now has the following performance profile for a page that fetches the user-list from the backing database. Not as fast, but 2k req/sec is not bad, and I haven't done any profiling on the new stack we're using at all yet:

  Server Software:        Jetty(6.1.x)
  Server Hostname:        localhost
  Server Port:            8080

  Document Path:          /users
  Document Length:        5855 bytes

  Concurrency Level:      4
  Time taken for tests:   0.508 seconds
  Complete requests:      1000
  Failed requests:        0
  Write errors:           0
  Total transferred:      6045474 bytes
  HTML transferred:       5855000 bytes
  Requests per second:    1968.41 [#/sec] (mean)
  Time per request:       2.032 [ms] (mean)
  Time per request:       0.508 [ms] (mean, across all concurrent requests)
  Transfer rate:          11621.09 [Kbytes/sec] received

  Connection Times (ms)
                min  mean[+/-sd] median   max
  Connect:        0    0   0.1      0       1
  Processing:     1    2   1.7      2      22
  Waiting:        1    2   1.7      2      22
  Total:          1    2   1.7      2      22

  Percentage of the requests served within a certain time (ms)
    50%      2
    66%      2
    75%      2
    80%      2
    90%      3
    95%      3
    98%      4
    99%     12
   100%     22 (longest request)
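
For anyone reading the numbers above, this is a small sketch of how the headline ab figures relate to each other. The function name `ab_summary` is made up for illustration; the inputs are the request count, elapsed time, and concurrency level printed in the report.

```python
def ab_summary(complete_requests: int, seconds: float, concurrency: int) -> dict:
    """Recompute ApacheBench's summary figures from its raw totals."""
    # Throughput: completed requests divided by wall-clock time.
    requests_per_second = complete_requests / seconds
    # First "Time per request" line: mean latency seen by each concurrent client.
    per_request_ms = concurrency * seconds * 1000 / complete_requests
    # Second "Time per request" line: wall-clock time per request across all clients.
    across_all_ms = seconds * 1000 / complete_requests
    return {
        "requests_per_second": requests_per_second,
        "per_request_ms": per_request_ms,
        "across_all_ms": across_all_ms,
    }


# Values taken from the report: 1000 requests, 0.508 seconds, concurrency 4.
summary = ab_summary(1000, 0.508, 4)
print(summary)
```

The recomputed throughput lands within a fraction of a request/second of the reported 1968.41, the small gap coming from ab printing the elapsed time rounded to milliseconds. Note that this run used only 4 concurrent clients against localhost, so it says nothing about behavior under higher concurrency or over a network.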


I'd like to give you a job.

Something in your tone tells me you're not serious ...

Maybe I've been working with heavyweight frameworks too long, but that seems an order of magnitude off at least.

I couldn't say; I do have local 3rd party spring-based webapps that take 5 minutes just to start up, not to mention 300ms+ to render a single page, so I wouldn't be surprised.


Your methodology doesn't reflect actual usage of the site. I suspect there are hardly any (if any) writes happening when you hit /users, the database is repeatedly fetching a hot query (or you are pulling from a cache that isn't changing because there are no writes happening), your dataset could be minimal, etc.


That may be partially true[1], but I'd challenge you to re-implement this non-optimized case (ie, this is a rough webapp, no caching, etc) in another runtime on similar hardware and see:

1) Whether it can support anywhere near the same level of concurrent requests with similar response times.

2) How much complexity (nginx + unicorn + memcached + puppies) is required to achieve this in comparison to a servlet engine (eg, tomcat) and your webapp.

[1] Most web applications are read-heavy, low on writes, and scaling up write capability generally requires scaling up the database. You can grow quite a bit with simple caching and monolithic database scaling before having to tackle more complex distributed data architectures.
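
To make the "simple caching" point in [1] concrete, here is a minimal sketch of an in-process read-through cache with a time-to-live. Everything in it is hypothetical: `ReadThroughCache`, `load_user_list`, and the 30-second TTL are illustrative names and values, not part of any stack discussed here, and the class is not thread-safe as written.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class ReadThroughCache:
    """Tiny in-process cache: serve fresh entries, call the loader on a miss."""

    def __init__(self, loader: Callable[[Hashable], Any], ttl_seconds: float) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        # key -> (expiry timestamp, cached value)
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get(self, key: Hashable) -> Any:
        now = time.monotonic()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: no backend call
        value = self._loader(key)  # miss or expired: hit the backend once
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: Hashable) -> None:
        """Drop an entry, e.g. right after a write changes the underlying data."""
        self._entries.pop(key, None)


# Hypothetical stand-in for a database query; the counter tracks backend calls.
db_hits = {"count": 0}


def load_user_list(key: Hashable) -> list:
    db_hits["count"] += 1
    return ["alice", "bob"]


cache = ReadThroughCache(load_user_list, ttl_seconds=30)
for _ in range(1000):
    cache.get("users")
print(db_hits["count"])  # prints 1
```

In a read-heavy workload the rare writes would call `invalidate` for the affected keys, so the database only sees a query when the data may actually have changed or the TTL has lapsed.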


If those 1968.41 [#/sec] were on a quad core then python can match it.


I maintain an internal app at work that handles ~3k requests per second per dual core server. The site runs on coldfusion on windows. It's a conferencing app, so users poll every second for updates. Nothing special about it, just putting this out there to show that the bog standard install of a jvm can be pretty powerful.


Coldfusion runs on the JVM?



Open source to boot!

http://openbluedragon.org/


urk - this story keeps coming back to HN.

Ruby isn't the issue here - most of the hardware spend (tiny as it is) went toward master and slave databases. Our database isn't huge, but it isn't small either. The working set doesn't fit in 32 GB of memory - lots of memory and fast disks are critical (even with caching and all that) and that takes 2 non-cheap servers.

As far as the platform goes - list of technologies aside, I wouldn't agree that it is complex. I also wouldn't trade some of these pieces (Sphinx and HAProxy, in particular) for anything.

I worked on Java web apps for 8 years before starting Ravelry and I definitely don't regret choosing Ruby for this project :) Did I have to buy 1 more server than I would have had to with Java? Yeah, I think so. ...but the trade-off was definitely worth it.


Ruby isn't the issue here - most of the hardware spend (tiny as it is) went toward master and slave databases. Our database isn't huge, but it isn't small either. The working set doesn't fit in 32 GB of memory - lots of memory and fast disks are critical (even with caching and all that) and that takes 2 non-cheap servers.

7 machines is getting into part-time sysadmin territory.

My estimate of peak load was somewhat generous in averaging it across only 5 hours -- could you elaborate on the actual peak load?

I'm surprised that something that's orders of magnitude slower than the alternatives isn't at least part of the issue, given the amount of front-end stack required, hired employees, and 3 6-core (6?) 40 GB front-end webservers required for 10 million hits/day.

I worked on Java web apps for 8 years before starting Ravelry and I definitely don't regret choosing Ruby for this project :) Did I have to buy 1 more server than I would have had to with Java? Yeah, I think so. ...but the trade-off was definitely worth it.

The JVM truly doesn't require Java, Spring, or J2EE, and it's an incredibly impressive piece of technology to just discard.


"..given the amount of front-end stack required, hired employees, and 3 6-core (6?) 40 GB front-end webservers"

Could you have misread the article?

We have one developer/sysadmin (myself, and I'm pretty much half-time on the development work and no-time on the sysadmin work) and the 6 processors/40 GB RAM is the total amount of app server resources across all machines.

I am a big fan of the JVM and I don't want to get into an argument about language productivity, but since I believe that I have saved a lot of time (and therefore, money) with Ruby, I don't really see the ~$5,000 spent on app servers over 2 years as an issue.


How do you envision that scaling for, say, a factor of five increased load?


Same as anything else -- you start by adding a few more front-end servers (or VMs), or scaling up the ones you have just a bit more. You don't have to do nearly as much work, and not nearly as soon.

If load levels continue to grow, leverage ehcache to handle caching directly in the VM (if you're not already), begin splitting responsibilities across different service implementations, etc.


Why is scaling a Rails site to 10M requests per day that big of a deal? Scaling anything to 10M has its challenges but at the end of the day, you rarely read about people making a big deal out of scaling a php website of that size.


It's only noteworthy because of the early reputation Rails had for being slow (earned or not).


The real meat about how it was done is here: http://highscalability.com/blog/2009/9/22/how-ravelry-scales...

and that info was taken from this interview: http://www.tbray.org/ongoing/When/200x/2009/09/02/Ravelry

The linked article is just a summary.


Yes, it was mostly an "if you haven't seen this stuff, you need to check it out" for my subscribers. I'm not sure why it's become so popular.. I guess a lot of people still hadn't heard of it! (So mission accomplished, in a way..)


This is at least the 4th article on Ravelry's scaling accomplishments here on HN.


Yeah. I made this post and it was really just a "throw away" piece for any of my subscribers who hadn't already heard about it - I'm as surprised as Obama is at winning the Nobel ;-)

I guess that even when a story blows up in the techie world, there are enough people who didn't hear about it that it can be repeated several times and still do well.


I'm the OP of this one, sorry. And I saw Casey above groaned a little bit at it surfacing again. :)

But seriously, even with all the repostings, there are a lot of us who somehow hadn't already seen it. Even though the headline, and the story, emphasize the scaling challenges and solutions, the larger story is how you can bootstrap something really successful with minimal resources.

Peter, thanks for running a great blog.


Apparently it takes 4 times for me to see anything, then ;)



