We've been using these for about a month at Justin.tv. They are awesome. Capacity increased from about 350 Mbps per m1.xlarge instance to nearly 700 Mbps per c1.xlarge instance.
As a web app developer, I've found memory to be the greatest bottleneck for both web and database servers. But this will no doubt be useful for sites like Animoto that need to do a lot of offline processing.
Agreed. I'd love to read an announcement about "High Memory" instances. It'd be great to be able to pay 15 cents per hour and get the basic instance with 3 GB of RAM, or 20 cents per hour to get 5 GB of RAM.
RAM just hasn't gotten cheaper at the same rate as clock cycles. Offering lots of memory for very little additional money couldn't possibly be cost-effective, unless Amazon's own applications are also mostly memory-constrained, in which case the hardware they already buy would come with plenty of RAM to spare.
Regardless, you can almost always trade CPU time and/or I/O for memory in web apps. Limit your RAM caching, and move less-frequently-used data to temporary files on disk. Be more judicious in your SQL, so that you don't return large result sets to your application servers only to filter them further in your business logic. Hell, run one or two fewer app server processes to begin with, and just bring up another EC2 image to handle load if you need it.
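To make the SQL point concrete, here's a minimal sketch (in Python, against an invented "orders" table, purely for illustration) of the difference between filtering in application code and pushing the filter into the query; only the second version keeps the full result set off your app server:

    import sqlite3

    # Invented example table, purely for illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, status TEXT, total REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                     [(1, "open", 10.0), (2, "shipped", 25.0), (3, "open", 7.5)])

    # Memory-hungry: pull the whole table into the app and filter it there.
    all_rows = conn.execute("SELECT id, status, total FROM orders").fetchall()
    open_orders = [r for r in all_rows if r[1] == "open" and r[2] > 5]

    # Leaner: push the filter into SQL and iterate over the cursor instead of
    # materializing the whole result set in memory.
    for row in conn.execute(
            "SELECT id, total FROM orders WHERE status = ? AND total > ?",
            ("open", 5)):
        pass  # process one row at a time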
Has anyone benchmarked a real machine against an EC2 instance on something concrete and processor-bound? I'd be interested in hearing from anyone who has personally made such a comparison.
That wouldn't be consistent over time. The EC2 Compute Units represent the power of a certain type of CPU at a certain point in time. Performance for a given clock speed has increased dramatically over the past few years. If you have a suggestion for something better, I can definitely pass it along to the EC2 team.
I like the EC2 Compute Units in general -- they are much better than keeping track of many different models of CPUs. There are, however, a couple of important limitations with this approach: First, different processors vary considerably in their integer-to-floating-point performance ratios, so your notion of how a modern CPU compares to a 1.0 GHz 2007 Opteron might be different from mine. Second, because the rated performance of instances is a minimum value and some instances (presumably the ones lucky enough to land on newer boxes) have slightly higher performance, you can't test code on one instance and then expect it to take the same amount of time when run on a different instance of the same nominal size.
To solve these two issues, I'd suggest taking the SPEC benchmarks (yes, they're crude... but they're better than nothing), publishing minimum SPECint and SPECfp values per EC2 Compute Unit, and then adding an API call to say "tell me what the SPEC values are for my instance i-12345678".
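Purely as an illustration of what such a call might return -- describe_instance_performance is invented here, not a real EC2 API action, and the numbers are made up:

    # Hypothetical client-side sketch; a real version would issue a signed
    # EC2 API request instead of returning canned values.
    def describe_instance_performance(instance_id):
        """Pretend API call that reports the SPEC ratings of the physical
        box a given instance landed on."""
        return {
            "instanceId": instance_id,
            "SPECint_base2000": 800,  # invented value
            "SPECfp_base2000": 760,   # invented value
        }

    print(describe_instance_performance("i-12345678"))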
Of course, there are more complicated approaches which could work -- I can imagine a system where instead of having a pool of "small" instances, there are some "1 core, SPECint_base2000 750" instances, some "1 core, SPECint_base2000 800" instances, et cetera, and different prices for each (maybe even fluctuating from hour to hour based on demand or an auction system).
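As a toy sketch of how a client might shop across such tiered pools (pool names, ratings, and prices are all invented):

    # Invented example pools: (pool name, SPECint_base2000 rating, $/hour).
    pools = [
        ("1core-spec750", 750, 0.10),
        ("1core-spec800", 800, 0.11),
        ("1core-spec900", 900, 0.14),
    ]

    def cheapest_pool(min_specint):
        """Pick the cheapest pool that meets a minimum SPECint rating."""
        candidates = [p for p in pools if p[1] >= min_specint]
        return min(candidates, key=lambda p: p[2]) if candidates else None

    print(cheapest_pool(780))  # -> ('1core-spec800', 800, 0.11)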
Ignoring NetBurst and assuming that Amazon decommissions servers after three or four years, I think frequency is a decent proxy for performance. If I built EC2 I would expose the real frequency and processor type, so instead of renting a "c1.xlarge" instance, you'd ask for "8x2.5GHz-IntelCore-7GB".