Recently I switched jobs and joined some old employers and friends on a new endeavour, an advertisement platform. So far things have been very good, so good in fact that we're quickly growing out of our current infrastructure. Last christmas everything went crazy and the, then only developer, rewrote a bunch of things to make it scale better.
Currently we write away every single transaction to txt files and once every X minutes gather those and start inserting them into 2 Mysql shards. After that we run a bunch of group by queries to generate cross table reports. Currently our CMS is very limited by this, you can only view data relations in the predefined tables and we would like to have (a lot) more flexibility here.
Perhaps the biggest issue of all is that both I and the other developer do not have experience with handeling this much data. We both have worked on systems which handle a fair bit of data, but not nearly as much as this. We've been looking at MapReduce for MongoDB but already run into problems when we try to insert a weeks worth of data, which is about 350gb, because we only get so many IOPS at AWS. If we want more IOPS, it gets really expensive really fast. We're also not entirely sure yet if MongoDB and MapReduce is the way to go.
What I'm looking for is a few pointers to help us get headed in the right direction. Which databases should we be looking at? Should we be looking in an entirely different direction? Are there some (recent?) blog posts about this which can help us out?