Rails + MySQL scaling on a budget (muziboo.com)
80 points by prateekdayal on March 9, 2011 | 32 comments


The single best advice I can give for scaling Rails, and it is widely applicable to other frameworks: never block an HTTP request/response cycle on I/O. In Rails, for example, if you have an external API call which routinely takes 3 seconds to complete, then that call costs about three hundred megabyte-seconds of RAM time to service, because the 100 MB or so used by your mongrel is effectively worthless while you wait.

When you run into resource constraints in Rails, odds are you are running out of RAM first. It's a memory munching beast in most common deployment scenarios.

It is much more efficient to offload those blocking requests to a worker process and poll for results, since a single mongrel can blow through lots of poll requests and useful work in five seconds. Bonus points: your web app will feel snappier, because progressive rendering tricks users into thinking 5 seconds is not actually 5 seconds.
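For anyone who wants to see the shape of the offload-and-poll pattern, here is a rough sketch in plain Ruby. Everything in it is an assumption made for illustration: `JobStore`, `submit` and `poll` are made-up names, and the worker is a thread inside the same process, whereas a real deployment would use a proper queue and a separate worker process. The `sleep` stands in for the slow external API call.

```ruby
require "securerandom"

# Hypothetical in-process job store. It only imitates the pattern: the
# request handler enqueues work and returns, a worker does the slow part,
# and a cheap poll endpoint reports progress.
class JobStore
  def initialize
    @results = {}
    @lock = Mutex.new
    @queue = Queue.new
    @worker = Thread.new { work_loop }
  end

  # Called from the request handler: enqueue the slow call and return at once.
  def submit(&slow_call)
    id = SecureRandom.hex(8)
    @lock.synchronize { @results[id] = { status: :pending } }
    @queue << [id, slow_call]
    id
  end

  # Called from the poll endpoint: a quick lookup, no blocking I/O.
  def poll(id)
    @lock.synchronize { @results.fetch(id).dup }
  end

  private

  # Runs in the worker thread and records the outcome of each job.
  def work_loop
    loop do
      id, slow_call = @queue.pop
      outcome =
        begin
          { status: :done, value: slow_call.call }
        rescue StandardError => e
          { status: :failed, error: e.message }
        end
      @lock.synchronize { @results[id] = outcome }
    end
  end
end

store = JobStore.new
job_id = store.submit { sleep 0.2; "api response" } # stand-in for the slow API call
first = store.poll(job_id)                          # returns immediately
sleep 0.05 while store.poll(job_id)[:status] == :pending
final = store.poll(job_id)
```

The point is that `submit` and `poll` both return right away, so the process serving HTTP requests never sits idle holding its memory while the slow call runs.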


My single-VPS-hosted app spends a lot of time waiting on DB queries, which themselves spend a lot of time waiting on disk I/O.

I've therefore found that adding RAM and doing nothing with it so it can be used as disk cache (or, for Postgres, allocating some of it as shared_buffers) is the best non-non-blocking solution.

(I'd originally allocated almost all RAM to running up to 16 or so Passenger instances. Once more than 3 or 4 of these were serving requests, everything ground to a halt waiting on the DB. Now I'm down to about 4 instances, with the rest of the RAM for cache, and they cope fine with the same load).


Yes, that's great advice. We use beanstalkd for asynchronous tasks (like sending emails or updating Twitter) and backgroundrb for jobs that need polling (re-encoding an MP3 file and showing a progress bar to the user).


The single best advice I can give for scaling Rails, and it is applicable to nearly every other framework: give it more memory. Generally, when you run into a bottleneck in Rails, you're running out of RAM first. You can get a couple of 8GB sticks for around $500; just over-provision your server and worry about the problem later.


Sorry, but that's terrible advice, especially for bootstrapped businesses. Throwing resources at a problem will only scale as large as your wallet is. It's both wasteful and lazy.


Throwing money at problems is sometimes the right answer, and I say that as someone who is so bootstrapped I can occasionally taste the shoe leather. Heck, I spend ~$500 a month on hosting when I could probably do it in half that on renting a dedicated server (or just buy a big beefy box and pay a pittance for a spot on a rack somewhere). I don't because it isn't worth my time to even think about migrating.

My favorite anecdote: asking tptacek to help me squeeze a few megs out of Redis to avoid having to bump my VPS up a tier. That would have been, heck if I know, $BOATLOADS / hr in engineer costs to avoid $30 per month on my Slicehost bill. He restored me to sanity fairly quickly.


I hear people whine about their VPS hosting costs at the same time as enjoying $100 of booze and cheese at the bar.


But that's what you're doing, whether you choose to add hardware or improve the code -- it's not like your time as a business owner is free, and every moment you're spending on the codebase is a moment you're not marketing or selling.

The real question is which is more valuable to your business.

Consider these two examples:

Let's say I'm a bootstrapper, and have a startup that earns $3k monthly. My customers are happy, and my business is growing, but I'm having some serious scaling problems.

Scenario One: I've outgrown the small Linode that I started on. If doubling the cost of my Linode allows me to service 2X the number of customers, I do it, because $3k in revenue is way higher than the cost.

Scenario Two: I've maxed out the biggest Linode, and the next step up is a cluster of dedicated nodes at Rackspace for $4k/month, which will allow me to scale to 2X my current volume. At this point, I fix the code, because my earnings will drop even if I get twice the number of customers.
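The arithmetic behind the two scenarios, as a back-of-the-envelope sketch. Only the $3k revenue and the $4k/month cluster come from the comment above. The two Linode prices are placeholders I made up, and I am assuming revenue scales linearly with the number of customers.

```ruby
# From the comment: $3k monthly revenue, $4k/month for the cluster.
REVENUE_NOW     = 3_000
CLUSTER_HOSTING = 4_000

# Assumptions, NOT from the comment: placeholder monthly hosting prices.
SMALL_LINODE    = 20
BIGGEST_LINODE  = 160

# Monthly profit, ignoring every cost except hosting.
def monthly_profit(revenue, hosting)
  revenue - hosting
end

# Scenario One: double the small plan, serve twice the customers.
one_before = monthly_profit(REVENUE_NOW, SMALL_LINODE)
one_after  = monthly_profit(REVENUE_NOW * 2, SMALL_LINODE * 2)

# Scenario Two: move from the biggest plan to the cluster, serve twice the customers.
two_before = monthly_profit(REVENUE_NOW, BIGGEST_LINODE)
two_after  = monthly_profit(REVENUE_NOW * 2, CLUSTER_HOSTING)

puts "Scenario One: #{one_before} -> #{one_after}"
puts "Scenario Two: #{two_before} -> #{two_after}"
```

With doubled revenue of $6k and a $4k hosting bill, Scenario Two leaves $2k a month, which is less than today's profit for any current hosting bill under $1k.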


Here are a few other ways to optimize your Rails application (with the descriptions taken from the docs):

- https://github.com/flyerhzm/bullet/ : The Bullet plugin/gem is designed to help you increase your application’s performance by reducing the number of queries it makes. It will watch your queries while you develop your application and notify you when you should add eager loading (N+1 queries), when you’re using eager loading that isn’t necessary and when you should use counter cache.

- https://github.com/sdsykes/slim_scrooge/ : SlimScrooge implements inline query optimisation, automatically restricting the columns fetched based on what was used during previous passes through the same part of your code.

- http://guides.rubyonrails.org/caching_with_rails.html : Caching With Rails from the awesome rails guides

- http://guides.rubyonrails.org/performance_testing.html : Performance Testing Rails Applications from the rails guides


Bullet is nice, but I do occasionally have an experience where it will yell at me about unused eager loading associations when they're not actually unused.

I suppose I could dig into the bullet code a bit, but I've been too busy.

Has this come up for you before?


I had the same error from time to time, but it seems resolved : https://github.com/flyerhzm/bullet/issues/closed#issue/20

You should open a ticket, and possibly write a spec to reproduce it.


I'll do another code review and make certain that I'm not crazy and just missing something, then I'll send a spec his way.


Not to be obnoxious, but I wasn't very impressed by this post ... use explain plans to add indexes to your tables? Use passenger instead of mongrel instances? Don't use rand() because it does a table scan? These are things a developer should know and be doing already.

There are also mistakes in there like saying you cannot add indexes to your tables using rails (can do this using migrations).


Agreed, I had the feeling reading this article that these devs just discovered the idea of indexes on a database. This should be common knowledge to a developer.


> Rails does not (cannot) add indexes to your database.

Am I missing something in this sentence? You can add indexes in migrations.


I think the unspoken implication is "for you.". Many newbie Rails devs assume that a line like "has_many :friends" is magic and makes user.friends execute quickly. They won't understand this explanation, but that can plausibly cause a full table scan, or worse, N+2 full table scans if you have something like:

photos_i_am_in = user.friends.map {|f| f.photos.select {|p| p.include? user}}.flatten

(Code somewhat minimalist - don't think too hard about it.)


The #map and #select approach in your example is even worse since it loads all of the associated friend and photo records into memory and tries to use Ruby for queries that should be performed by the database. If there are more than a handful of those records, that process will balloon up and could bring the server to a halt.
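To make the difference concrete, here is a toy simulation in plain Ruby. It is not ActiveRecord: `ToyDB` and its methods are invented for this sketch, and each method call simply counts as one round trip to the database. It compares filtering in Ruby (one query for the friend list plus one per friend) against a single set-based query.

```ruby
# Toy stand-in for a database. Hypothetical, for illustration only:
# every public method call counts as one query.
class ToyDB
  attr_reader :query_count

  def initialize(friendships, photo_tags)
    @friendships = friendships # [[user_id, friend_id], ...]
    @photo_tags  = photo_tags  # [[photo_id, owner_id, tagged_user_id], ...]
    @query_count = 0
  end

  # One query: the ids of a user's friends.
  def friend_ids(user_id)
    @query_count += 1
    @friendships.select { |u, _| u == user_id }.map { |_, f| f }
  end

  # One query: every photo row belonging to one owner.
  def photos_owned_by(owner_id)
    @query_count += 1
    @photo_tags.select { |_, o, _| o == owner_id }
  end

  # One set-based query: photos owned by my friends in which I am tagged.
  def photos_of_user_owned_by_friends(user_id)
    @query_count += 1
    friends = @friendships.select { |u, _| u == user_id }.map { |_, f| f }
    @photo_tags.select { |_, o, t| friends.include?(o) && t == user_id }
               .map(&:first).uniq
  end
end

friendships = [[1, 2], [1, 3], [1, 4]]
photo_tags  = [[10, 2, 1], [11, 2, 5], [12, 3, 1], [13, 4, 6]]

# Filtering in Ruby: fetch the friends, then every photo of every friend.
db = ToyDB.new(friendships, photo_tags)
slow = db.friend_ids(1).flat_map do |friend|
  db.photos_owned_by(friend).select { |_, _, tagged| tagged == 1 }.map(&:first)
end.uniq
slow_queries = db.query_count

# Letting the "database" do the filtering in one pass.
db2 = ToyDB.new(friendships, photo_tags)
fast = db2.photos_of_user_owned_by_friends(1)
fast_queries = db2.query_count
```

Both approaches find the same photos, but the first issues one query per friend and drags every photo row into Ruby before throwing most of them away.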


Sorry. I meant rails can't do it automatically for you. You will have to figure out based on your queries and add the indexes yourself. Once you have figured out what indexes to add, you can use add_index in your migration file.


Choosing indexes is a world of compromise: indexes slow down operations that involve writing. When you have an index that is rarely or never used, you slow down write operations on the table for nothing. So you have to choose indexes carefully, and there's no way any framework could do this for you. It depends entirely on your application, and the right indexes can change over time.
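A toy illustration of that trade-off in plain Ruby. This is not how MySQL implements indexes: `ToyTable` is a made-up class that keeps a hash from column value to row positions, just to show that lookups examine fewer rows while every insert pays for extra bookkeeping.

```ruby
# Hypothetical toy table with an optional hash index on one column.
class ToyTable
  attr_reader :rows_examined, :index_writes

  def initialize(indexed_column: nil)
    @rows = []
    @indexed_column = indexed_column
    @index = Hash.new { |hash, key| hash[key] = [] }
    @rows_examined = 0
    @index_writes = 0
  end

  # Every insert into an indexed table also has to update the index.
  def insert(row)
    @rows << row
    return unless @indexed_column

    @index[row[@indexed_column]] << (@rows.size - 1)
    @index_writes += 1
  end

  # Indexed lookups touch only matching rows; anything else is a full scan.
  def where(column, value)
    if column == @indexed_column
      positions = @index.fetch(value, [])
      @rows_examined += positions.size
      positions.map { |i| @rows[i] }
    else
      @rows_examined += @rows.size
      @rows.select { |row| row[column] == value }
    end
  end
end

plain   = ToyTable.new
indexed = ToyTable.new(indexed_column: :user_id)

1000.times do |i|
  row = { id: i, user_id: i % 100 }
  plain.insert(row)
  indexed.insert(row)
end

plain_result   = plain.where(:user_id, 7)
indexed_result = indexed.where(:user_id, 7)
```

The indexed table examines only the matching rows on the lookup, but it did one extra write per insert to get there. If nothing ever queries by that column, those writes are pure overhead.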


I think he intended to say that Rails does not (cannot) automatically add indexes to your database, which is true and makes more sense in the context.


Are there any frameworks that do this?


If there are they shouldn't.

As mickeyben says above, choosing indexes is a compromise - you sacrifice performance in one area for performance in another. Only you can make the right decision there, because only you can begin to know how the app will be used in production.


DataMapper's dm-constraints plugin (https://github.com/datamapper/dm-constraints) can automatically add foreign key constraints to your associations, which has the side effect (in MySQL at least) of creating indices on the foreign key columns if they don't already exist.


Yeah, that kind of blew me away too. I posted a comment on the article about it. Even if Rails didn't support it natively you can always execute raw SQL statements in your migrations and just use your DB-specific syntax there to do it.


You are right. That's a pretty glaring hole.


We did similar things on our server (though with much much less traffic but all dynamic content + API).

We moved from passenger to nginx+unicorn and the performance increase was awesome, plus it uses less RAM (which again means more RAM for more app servers!). Our box was an old quad core with 4 GB of RAM, so it got a bit tight with everything on one machine. The DB blocking problem could easily be solved with a cheap VPS as a slave DB, and then you can do backups off that. :-) Depending on your kind of website, caching with varnish is great; we also made great use of memcached for fragments and other little things. And since our main CPU-consuming background task did just one thing, in the end it was rewritten in C, which also helped a lot. Offloading all files to Amazon S3 is also a very good way of reducing load if you have a lot of them.


Were you on passenger for apache or passenger for nginx, and were you on passenger 3? Those make quite a difference. Memory usage and performance have been greatly improved in 3, with important parts optimized in C. The dynamic multiprocessing can save you more memory during idle time.


We were using passenger with apache, if I remember correctly. It was a few years ago and I was more on the developer side than the sysadmin side of things back then.

It's probably worth taking another look. For my Ruby applications at the moment I use nginx+unicorn, which works really well.


Even if you aren't "on a budget," it's always a good idea to spend some time benchmarking and tuning your setup. You never know if you'll receive a spike of traffic or what the limits of your setup are. Most of the time, between the OS, web server, application stack, and database, there are a few knobs that can be adjusted with minimal effort that will yield big gains in headroom. The author's suggestion of Munin is a great first step and a nice way to keep from getting caught with your pants down, and it can usually be installed and configured in a few minutes.

This is sort of the systems side of always making sure HTTP compression is working, minifying your JavaScript, crushing your PNGs, and keeping an eye on the aggregate size of web pages. Even if it loads fast for YOU, someone might be on a slow DSL line (or, gasp, DIALUP) or a mobile connection.


I had an article published about MySQL tuning a while ago:

http://blog.scoutapp.com/articles/2011/02/10/understanding-d...

Always tune for your workload. Be sure to index your damn tables. And yes, measure.


As an aside, beware using idiomatic ruby while writing ActiveRecord queries.

Model.all.inject { sum or average code } will destroy you once you get into production. Working at an otherwise excellent Rails shop I had to launch a mini-jihad to drive this stuff out.

(NB my code is wrong because I haven't written Ruby for a while).


I've faffed around with optimising LAMP for years in order to get Wordpress to perk up. Aside from WP-supercache, the biggest difference I ever made was moving MySQL onto a second server. It halved page generation times and the whole site is much, much snappier.



