How much did it cost to store the data on S3, where you said, "However, the data is on S3"? Or was it there so briefly that it didn't cost much? Were there bandwidth costs in/out of S3 too?
Edit: Actually, I read the S3 parts again, and it sounds like the CommonCrawl project pays the S3 costs, I think, since it looks like you're using their domain data?