AWS Case Study: Parse (YC S11) (amazon.com)
68 points by goronbjorn on March 7, 2013 | 21 comments



I'm really humbled to see that the architecture we came up with at reddit, with help from Heroku and JustinTV, and which I then taught to Amazon and AWS, is still the one AWS is highlighting today (and presumably teaching its customers). It's an amazing feeling to see your work live on beyond your involvement.

* They flew me up to Seattle to teach Amazon retail how to move to AWS back when retail had just started that process. Amusingly, the guy mainly responsible for that move now works for Pinterest (a company that "does cloud right").


Was it a strange experience to teach a company how to use its own product?


I was surprised when they asked, that's for sure.

It was a little odd because I felt like there was no way I could teach these people anything they didn't already know, but I got really good questions and feedback, so by the end I didn't feel nearly as weird about it.


This happens fairly often: customers will use your product in unexpected ways, and sometimes to much better effect than you originally imagined.


I realize the diagram is likely heavily over-simplified, but what's the reason for the nginx layer? It doesn't look like you guys are serving assets through those boxes, so it seems like you have an unnecessary extra hop from the ELB to the app servers. My guess is that this is either nginx caching, a holdover from the pre-AWS architecture, or a measure to keep the system decoupled from ELB.


I can't speak for Parse, but I've come up with something similar in the past. Nginx/HAProxy as a combo is far more flexible than the ELB alone: you might want it for rate limiting, better load-balancing algorithms, better logging, tweaking headers, handling errors, or controlling buffer sizes, for example.
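For illustration, a minimal nginx sketch of that kind of layer (upstream name, addresses, and limits are all made up, not Parse's actual config):

    # hypothetical nginx.conf excerpt; all values are placeholders
    http {
        # throttle each client IP to 50 req/s, tracked in a 10 MB zone
        limit_req_zone $binary_remote_addr zone=api:10m rate=50r/s;

        upstream app {
            least_conn;              # a smarter algorithm than plain round-robin
            server 10.0.1.10:8080;   # placeholder app-server addresses
            server 10.0.1.11:8080;
        }

        server {
            listen 80;
            location / {
                limit_req zone=api burst=100;   # absorb short bursts, shed abuse
                proxy_set_header X-Request-Start "t=${msec}";  # e.g. queue-time metrics
                proxy_pass http://app;
            }
        }
    }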


Or for preventing DoS attacks from slow client connections. Unicorn is not designed to be exposed to the outside world; it needs a reverse proxy that does request buffering, like nginx (which the Unicorn docs recommend).
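A rough sketch of the buffering side, assuming nginx in front of a unicorn upstream (the upstream name and all values are arbitrary):

    # nginx reads the whole request from the slow client first,
    # so a unicorn worker is only tied up for the actual work
    location / {
        client_body_buffer_size 16k;   # buffer request bodies in memory up to this
        client_max_body_size    8m;    # reject oversized uploads outright
        proxy_buffering         on;    # also buffer responses back to slow readers
        proxy_pass http://unicorn_upstream;   # hypothetical upstream name
    }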


True, though the ELB also does some simple buffering of HTTP requests.


We use nginx for serving static files and doing some trivial top-level routing. HAProxy is for load-balancing among families of unicorn app servers.
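A hypothetical haproxy stanza for one such family of unicorn servers (names, ports, and addresses invented for illustration):

    # sketch only, not Parse's actual config
    frontend api_in
        bind 127.0.0.1:8000               # nginx proxies non-static traffic here
        default_backend unicorn_api

    backend unicorn_api
        balance leastconn                 # send work to the least-loaded server
        server app1 10.0.1.10:8080 check  # health-checked unicorn app servers
        server app2 10.0.1.11:8080 check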


> I realize the diagram is likely heavily over-simplified, but what's the reason for the nginx layer

Probably to load-balance between the unicorn instances on a particular Amazon EC2 instance.


Can you tell us more about the RAID10 comment?

Are you doing RAID0 instead for your disk volumes now?

What would happen if there were another EBS outage that took out a lot of your MongoDB instances? Would you just recreate them from the latest S3 backups?


Why Squid behind the cloud code servers? Are you moving from MongoDB to Cassandra?


Squid handles external HTTP requests that are made from cloud code.
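For illustration, the shape of that forward-proxy setup in squid terms might look like this (hypothetical squid.conf excerpt with a made-up subnet):

    # allow only the cloud-code tier to proxy outbound HTTP through squid
    http_port 3128
    acl cloudcode src 10.0.2.0/24    # placeholder subnet for cloud-code servers
    http_access allow cloudcode
    http_access deny all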


That makes sense, thanks.


Why both redis and memcached?


I can't speak for Parse, but I use both of these technologies on one site. Memcached is, well, for any kind of data that you want to cache: results of SQL queries, blocks of the site (HTML), etc. Redis, on the other hand, is a final data store in its own right (like MySQL, since it's also persistent), but it's faster than MySQL for certain things. For example, something like "List of profiles I've visited / who have visited me recently on Facebook": that seems like a very simple MySQL table and query, but once it gets big it starts to get slow. With Redis you can instead have one key per user, like "<id_user>_visit_history", which is a sorted set of <time, id_user> pairs. You could do the same with Memcached (although it doesn't support the same data types), but then it wouldn't be persistent.
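Concretely, that visit-history pattern with the redis-rb gem might look like this (key names and sizes are just illustrative):

    require "redis"

    redis = Redis.new

    # record a visit: one sorted set per user, scored by timestamp
    def record_visit(redis, viewer_id, visited_id)
      key = "#{visited_id}_visit_history"
      redis.zadd(key, Time.now.to_i, viewer_id)
      redis.zremrangebyrank(key, 0, -101)   # keep only the 100 newest visitors
    end

    # "who visited me recently?" -- newest first, no slow SQL scan
    def recent_visitors(redis, user_id, count = 10)
      redis.zrevrange("#{user_id}_visit_history", 0, count - 1)
    end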


Right ... that's my point. Redis is as fast as, if not faster than, memcached for caching exactly the things you talked about (results of SQL queries, blocks of the site, etc.), so why not just use redis for everything and eliminate memcached?


Here's one possible answer, for one specific use case: http://engineering.pinterest.com/posts/2012/memcache-games/


We use redis because we use resque extensively for the push-notification architecture. Memcache is superior to redis for simple caching in front of data sources.
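For illustration, the two roles side by side in Ruby (the job class, cache key, TTL, and lookup are hypothetical, not Parse's code):

    require "resque"   # redis-backed job queue
    require "dalli"    # memcached client

    # resque stores this queue in redis
    class PushNotificationJob
      @queue = :push

      def self.perform(device_token, message)
        # ... deliver the push notification here ...
      end
    end

    Resque.enqueue(PushNotificationJob, "abc123", "hello")

    # memcache as a simple look-aside cache in front of a data source
    cache = Dalli::Client.new("localhost:11211")
    user  = cache.fetch("user:42", 300) do
      Database.load_user(42)   # placeholder for the real lookup; cached for 300s
    end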


Any reason why you haven't tried Sidekiq, which is much more efficient in terms of memory use? Or is Resque just good enough for now?


Hm, well, Sidekiq wasn't out when we started developing. :) Memory has never been a constraint for us with resque, though. The only problems we've had with it have been things like improper signal handling, which is pretty easy to deal with.



