
Our watchdog process calls the EC2 APIs directly to identify how many instances are running, which ones are spot instances, etc. Boto, the AWS client library for Python, makes that pretty easy. The watchdog isn't very sophisticated -- it just checks to make sure that the correct number of instances are running in each auto-scale group. Our application servers aren't very efficient in certain respects, so we don't trust metrics like usage/load to make auto-scaling decisions.
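For anyone curious, here is a rough sketch of what a check like that could look like. It is not our actual code. It assumes boto3 (the current successor to Boto), and the `pool` tag key, the function names, and the shape of the `desired` dict are all made up for illustration.

```python
from collections import Counter


def fetch_running_instances(tag_key="pool"):
    """List running instances with their group tag and spot flag.

    Assumption: each instance carries a tag (hypothetically named "pool")
    that says which group it belongs to.
    """
    import boto3  # imported lazily so the check below runs without AWS access

    ec2 = boto3.client("ec2")
    paginator = ec2.get_paginator("describe_instances")
    pages = paginator.paginate(
        Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
    )
    found = []
    for page in pages:
        for reservation in page["Reservations"]:
            for inst in reservation["Instances"]:
                tags = {t["Key"]: t["Value"] for t in inst.get("Tags", [])}
                found.append({
                    "id": inst["InstanceId"],
                    "group": tags.get(tag_key),
                    # InstanceLifecycle is only present for spot/scheduled instances
                    "spot": inst.get("InstanceLifecycle") == "spot",
                })
    return found


def check_groups(desired, instances):
    """Return {group: running - desired} for every group that is off target."""
    running = Counter(i["group"] for i in instances)
    return {
        group: running[group] - want
        for group, want in desired.items()
        if running[group] != want
    }
```

A negative value means the group is short that many instances; a positive value means there are extras. What to do about a mismatch (launch, terminate, or just alert) is left out on purpose.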

If I were doing it over again, I'd just use Amazon's auto-scaling features for all of this. At the time we built this, EC2's auto-scaling didn't support some of the features we needed. Since then, they've made it a lot easier to do things like set up a repeating schedule for auto-scaling, rather than using metrics.
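A repeating schedule might be set up along these lines. This is only a sketch using boto3's `put_scheduled_update_group_action`; the group name `web-pool`, the action names, the cron times (UTC), and the sizes are all placeholders, not anything we actually run.

```python
def scheduled_action(group, name, cron, size):
    """Build the arguments for one recurring scale-to-fixed-size action."""
    return {
        "AutoScalingGroupName": group,
        "ScheduledActionName": name,
        "Recurrence": cron,  # cron expression for the repeating schedule
        "MinSize": size,
        "MaxSize": size,
        "DesiredCapacity": size,
    }


def apply_actions(actions):
    """Register each scheduled action with EC2 Auto Scaling."""
    import boto3  # imported lazily; needs AWS credentials to actually run

    client = boto3.client("autoscaling")
    for kwargs in actions:
        client.put_scheduled_update_group_action(**kwargs)


# Hypothetical schedule: bigger pool on weekday mornings, smaller in the evening.
ACTIONS = [
    scheduled_action("web-pool", "weekday-morning-scale-up", "0 8 * * 1-5", 12),
    scheduled_action("web-pool", "weekday-evening-scale-down", "0 20 * * 1-5", 6),
]
```

Calling `apply_actions(ACTIONS)` would register both actions. Pinning min, max, and desired to the same number keeps the group at a fixed size, so no load metrics are involved.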

We only have one EC2 AMI that we use for all of our servers. That AMI is pretty basic; it only does enough to connect to our Puppet configuration management servers. Puppet then configures the boxes as web servers (or databases, or...) and adds them to the appropriate load balancer.




Very interesting, thanks! I've spent a bit of time working on a library to accomplish this, and Boto has been extremely helpful.

> The watchdog isn't very sophisticated -- it just checks to make sure that the correct number of instances are running in each auto-scale group.

How did you decide on the right number of instances? The article mentions 20%--is that based on latency, a cost-saving target, or something else?


We revise the "right number of instances" every few weeks based on latency and traffic numbers. But sometimes when we release updates, we'll find that we suddenly need a lot more capacity (or a lot less if we improved performance). We have automated tools to help us notice performance regressions. Once we decide that we need to change the pool size, we adjust the watchdog configuration by hand.
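To give a flavor of the "notice performance regressions" part, a very simplified check could compare median latency before and after a release. This is an illustrative sketch rather than our tooling; the function name and the 25% tolerance are invented for the example.

```python
from statistics import median


def latency_regressed(baseline_ms, current_ms, tolerance=0.25):
    """Flag a regression when median latency grows by more than `tolerance`.

    baseline_ms / current_ms: latency samples in milliseconds from before
    and after a release. The 25% default is an arbitrary placeholder.
    """
    base = median(baseline_ms)
    now = median(current_ms)
    return now > base * (1 + tolerance)
```

A flag from something like this only prompts a person to look; the pool size itself still gets changed by hand in the watchdog configuration.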



