I sort of have this feeling too, but Twitter *is* a business after all, and host...

slapshot · on April 5, 2011

> until end users pony up for the service, then they're the product.

Exactly. It should come as no surprise when any free site that accumulates a massive amount of data turns around and starts selling that data -- even if users feel like their privacy is being violated.

Twitter really has nothing to sell but data. Same for Facebook and others. They can sell that data indirectly (by allowing targeted advertising) or directly (by selling massive blocks of data for $0.30 an hour), but nobody should be surprised when it happens; it's all that they have to sell.

chc · on April 5, 2011

Actually, Twitter has told us how many tweets they're hosting, and tweets are mercifully short. As of about a month ago, Twitter says it gets about 140 million tweets per day. Assuming the maximum length of 140 characters, this means they're storing about 18 GB per day. At Amazon S3 rates (which are considerably higher than what Twitter pays if they have a working brain anywhere in their corporate structure), that means that their monthly storage bill grows by about $1 for each day of tweets. After five years of storage at that rate, their monthly storage costs (again, at S3 rates) would be around $2000. If they're making less than $2000 per month with that wealth of data, nickel-and-diming developers is a drastically misguided underreaction.

I'm not pretending this is all it takes to run Twitter, but I'd be surprised if storing a few TB a year is a major cost center. (Serving up so many concurrent users seems like a much bigger and more expensive problem — that's an average of 1600 tweets per second, to say nothing of readers, and I suspect tweet rates are very lumpy.)
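For anyone who wants to check the arithmetic above, here is a rough sketch in Python. It assumes one byte per character, and it reads the "$1/day" figure as "each day of tweets adds about $1 to the monthly bill"; that reading and the constant names are assumptions for illustration, not actual S3 pricing.

```python
# Back-of-envelope check of the figures in the comment above.
TWEETS_PER_DAY = 140_000_000   # figure quoted in the comment
MAX_TWEET_BYTES = 140          # assumption: 1 byte per character
SECONDS_PER_DAY = 86_400

bytes_per_day = TWEETS_PER_DAY * MAX_TWEET_BYTES
gib_per_day = bytes_per_day / 2**30            # roughly the "18 GB per day"
tweets_per_second = TWEETS_PER_DAY / SECONDS_PER_DAY

# Assumption: each day's worth of tweets adds about $1 to the monthly bill.
DOLLARS_PER_MONTH_PER_DAY_OF_TWEETS = 1.0
days_in_five_years = 5 * 365
monthly_cost_after_5y = days_in_five_years * DOLLARS_PER_MONTH_PER_DAY_OF_TWEETS
tb_after_5y = bytes_per_day * days_in_five_years / 1e12

print(f"{gib_per_day:.2f} GiB/day, {tweets_per_second:.0f} tweets/s")
print(f"{tb_after_5y:.1f} TB stored, ~${monthly_cost_after_5y:.0f}/month after 5 years")
```

Under those assumptions the numbers land close to the ones quoted: a bit over 18 GiB per day, a little over 1600 tweets per second, and a monthly bill somewhat under $2000 after five years.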

simonw · on April 5, 2011

"Assuming the maximum length of 140 characters, this means they're storing about 18 GB per day"

That's a huge underestimation. A tweet isn't 140 characters - it's 140 characters plus a huge chunk of surrounding metadata and indexes (who tweeted, when they tweeted, where they tweeted from, was it a reply, did it mention anyone, did it include any hash tags, did it link to anything, was it a retweet, its unique ID, how many users was it delivered to...) - all massively denormalised for performance reasons. See http://www.scribd.com/doc/30146338/map-of-a-tweet for an idea of the data involved.

Then there's the fact that a reference to each tweet has to be written into the "inbox" of every user that receives it - so if Tim O'Reilly says something, a reference to that tweet gets written 1,452,801 times, once for each of his followers.
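A minimal sketch of that fan-out-on-write idea, in Python. The `deliver` function, the in-memory dictionaries, and the 8-byte reference size are all hypothetical stand-ins for illustration; nothing here reflects how Twitter actually stores inboxes.

```python
from collections import defaultdict

def deliver(tweet_id, author, followers, inboxes):
    """Append a reference to tweet_id to every follower's inbox.

    Returns the number of inbox writes performed.
    """
    writes = 0
    for follower in followers.get(author, ()):
        inboxes[follower].append(tweet_id)
        writes += 1
    return writes

# Toy data: one author with three followers (hypothetical names).
followers = {"alice": ["bob", "carol", "dave"]}
inboxes = defaultdict(list)
writes = deliver(1001, "alice", followers, inboxes)

# Scale the same idea to the follower count mentioned above.
FOLLOWER_COUNT = 1_452_801
REFERENCE_BYTES = 8  # assumption: each inbox entry is one 64-bit tweet ID
fanout_bytes = FOLLOWER_COUNT * REFERENCE_BYTES

print(writes, "writes in the toy example")
print(f"{fanout_bytes / 1e6:.1f} MB of references for a single tweet")
```

Even with a bare 8-byte reference per inbox, a single 140-character tweet from an account that size would turn into several megabytes of writes.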

On top of that, there's all of the associated stats collection, including link click tracking and a ton of data around who is doing what in the Twitter interface.

This article from last year suggests that Twitter were storing 8TB/day back in September, and it's only going to have gone up since then: http://techcrunch.com/2010/09/17/twitter-seeing-6-billion-ap...
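To put the two estimates side by side, here is a rough Python sketch of what 8 TB/day would imply per tweet. It mixes the 8 TB/day figure from last year with the more recent 140 million tweets/day figure quoted upthread, so treat the result as an order-of-magnitude guess only.

```python
# Rough comparison of the 8 TB/day figure with the raw-text estimate.
STORED_BYTES_PER_DAY = 8 * 10**12   # 8 TB/day, decimal terabytes assumed
TWEETS_PER_DAY = 140_000_000        # figure quoted upthread, from a later date
RAW_TWEET_BYTES = 140               # assumption: 1 byte per character

implied_bytes_per_tweet = STORED_BYTES_PER_DAY / TWEETS_PER_DAY
overhead_ratio = implied_bytes_per_tweet / RAW_TWEET_BYTES

print(f"~{implied_bytes_per_tweet / 1000:.0f} KB stored per tweet")
print(f"~{overhead_ratio:.0f}x the raw 140 characters")
```

That works out to tens of kilobytes stored per tweet, a few hundred times the raw text, which is the gap that metadata, inbox references, and stats collection would have to account for.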

xtacy · on April 5, 2011

I think storage is not just the tweets. It's also the metadata, especially the ReTweets. Considering that interest in content dies down pretty quickly after it's posted, I would imagine that caching is extremely important. Caching infrastructure would require a lot of memory, which means paying for RAM, and RAM costs a lot more than disk.

dotBen · on April 5, 2011

This is what is known as a straw-man argument, and it does nothing to move the conversation forward.

Clearly, storing just the 'tweet' contents alone would be unhelpful: what about the username, or any of the other 40+ metadata points a tweet carries?

What about the mechanisms needed to store, sort, search, and send those tweets? I could go on.

Also, what were you expecting - Twitter to run their business at-cost?

chc · on April 5, 2011

As far as I can tell, the comment I was replying to was about Twitter's storage costs. So the fact that my response focused on storage and how much it costs does not make it a straw man, which would involve arguing against something other than what I was replying to. Moreover, the point of my comment was that Twitter's storage costs are not the interesting part of their operation, so your question "Were you expecting … Twitter to run their business at-cost?" actually is a straw man. You're just repeating what I said, except with slightly less hard data.

gbhn · on April 5, 2011

I heard on This Week in Tech that all the history they sent to the Library of Congress was 4TB in size. So yes, storage isn't the issue.

Splines · on April 5, 2011

True enough. For the record, that was just a tongue-in-cheek phrase that I used. You're right; the bandwidth, machine time, and manpower probably make up the bulk of their costs.