This is coming from the confused part of my brain: 45k transactions a day isn't that much, and processing 300KB XML files should be trivial. What am I missing? Or was it tongue in cheek?
I guess my response was a bit intense, but I'll stand by saying that processing 45k 300KB files a day is a non-trivial task. Making the site handle that amount of traffic (serving the pages) is easy; processing the data isn't. Consider that it's being done through their proprietary Firefox extension, which works on two browsers and operating systems, in order to pull info off your devices. Then it has to go to some cluster of background job processors. If you think growing your database/storage by 2-5 GB a day is 'trivial', then I commend you. In my experience, it's an easy recipe for setbacks. For example, our dataset is only 5 GB, yet at 5 GB we're already at the point of multi-hour schema changes in our data storage format. So, say you want to support handling laps from an uploaded logfile: a data migration of how many terabytes, spanning how many hours?
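To make the point concrete, here's a back-of-envelope sketch of the numbers above. The raw-to-stored ratio and the 3-hour/1 TB migration figures are my own illustrative assumptions, not anything the parent comment stated:

```python
# Back-of-envelope math for the figures discussed above.
# The retention ratio and migration baseline are assumed, not measured.

FILES_PER_DAY = 45_000
FILE_SIZE_KB = 300

# Raw XML ingested per day, in gigabytes.
raw_gb_per_day = FILES_PER_DAY * FILE_SIZE_KB / 1024 / 1024
print(f"raw ingest: ~{raw_gb_per_day:.1f} GB/day")  # ~12.9 GB/day

# Suppose parsing keeps only 15-40% of the raw bytes (assumed ratio);
# that lands squarely in the 2-5 GB/day growth range mentioned above.
stored_low = raw_gb_per_day * 0.15
stored_high = raw_gb_per_day * 0.40
print(f"stored growth: ~{stored_low:.1f}-{stored_high:.1f} GB/day")  # ~1.9-5.1 GB/day

# If a schema change takes ~3 hours at 5 GB, scaling linearly (a rough
# and usually optimistic assumption) to a 1 TB dataset:
hours_at_5gb = 3
dataset_tb = 1
est_hours = hours_at_5gb * (dataset_tb * 1024 / 5)
print(f"naive migration estimate: ~{est_hours:.0f} hours")  # ~614 hours
```

Even with generous assumptions, the migration estimate runs to weeks of wall-clock time, which is the setback being described.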
It's ok, it seems as if you have real experience doing this, in contrast to the armchair "scaling experts" flooding HN that balk at problems like this.
Noticing this a few days late, but I feel the need to defend myself. I've worked for two data mining and data aggregation companies, and I'm currently working for a real-time vertical search company. We do 45k in the blink of an eye; real-time search over millions of documents is HARD. Granted, we have lots of hardware and bandwidth. I'd put 45k at the entry level of scaling problems and see no reason to brag or get excited over it, which is why I asked for clarification.