An informative post on YES Cloud outlines the infrastructure implications of tweeting on Twitter. Using stats from tweetstats.com, it shows that Tweeters are responsible for nearly 2 million tweets per day. With Twitter's popularity currently soaring and the service moving beyond its current niche use, it is frightening to think how many tweets it is going to have to handle.
To get a better sense of the infrastructure needed, Prashant Gandhi, the blogger from Yes Cloud, did some calculations in the original post of what Twitter may have to handle:
Average Tweet Size: 100 bytes
# of Tweets: 10 per tweeter per day
# of Tweeters: 1 billion worldwide
Tweet Rate: 10 billion tweets per day
Tweet Storage: 100 Gigabytes per day (with 10:1 compression)
Storage needs also appear to be quite manageable: 100GB/day means ~37TB/year.
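The storage figures above follow from simple multiplication. Here is a minimal sketch that reproduces them, assuming decimal units (1 GB = 10^9 bytes, 1 TB = 10^12 bytes) and a 365-day year; the constant and function names are hypothetical, chosen only for illustration.

```python
# Back-of-envelope storage estimate using the figures quoted in the post.
# Assumes decimal units (1 GB = 1e9 bytes, 1 TB = 1e12 bytes) and a 365-day year.
AVG_TWEET_BYTES = 100
TWEETS_PER_TWEETER_PER_DAY = 10
TWEETERS = 1_000_000_000
COMPRESSION_RATIO = 10  # 10:1 compression


def daily_tweets() -> int:
    # 1 billion tweeters x 10 tweets each = 10 billion tweets per day
    return TWEETERS * TWEETS_PER_TWEETER_PER_DAY


def daily_storage_bytes() -> float:
    # Raw bytes per day, divided by the assumed compression ratio
    raw = daily_tweets() * AVG_TWEET_BYTES
    return raw / COMPRESSION_RATIO


def yearly_storage_bytes(days: int = 365) -> float:
    return daily_storage_bytes() * days


if __name__ == "__main__":
    print(f"Tweets/day:   {daily_tweets():,}")
    print(f"Storage/day:  {daily_storage_bytes() / 1e9:.0f} GB")
    print(f"Storage/year: {yearly_storage_bytes() / 1e12:.1f} TB")
```

The exact yearly figure is 36.5 TB, which the post rounds to ~37TB.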
Each tweet is essentially an HTTP transaction (request and response). The tweet rate of 10B/day translates to ~115K HTTP transactions/sec for tweets uniformly distributed throughout the day. Assuming that the compute infrastructure (aggregate of web, application, database servers) can process 1000 transactions/sec/server, about 115 servers are needed. If a peak to average ratio of 3:1 is assumed, then about 350 servers are needed.
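The server estimate can be checked the same way. This is a minimal sketch, assuming (as the post does) one HTTP transaction per tweet, a uniform spread over 86,400 seconds, 1000 transactions/sec per server and a 3:1 peak-to-average ratio; rounding up to whole servers is an added assumption, and all names are hypothetical.

```python
import math

# Inputs taken from the post's assumptions.
TWEETS_PER_DAY = 10_000_000_000
SECONDS_PER_DAY = 24 * 60 * 60
TX_PER_SERVER_PER_SEC = 1_000
PEAK_TO_AVERAGE = 3


def average_tps(tweets_per_day: int = TWEETS_PER_DAY) -> float:
    # One HTTP request/response per tweet, spread evenly across the day.
    return tweets_per_day / SECONDS_PER_DAY


def servers_needed(tps: float, per_server: int = TX_PER_SERVER_PER_SEC) -> int:
    # Round up: a fractional server still needs a whole machine.
    return math.ceil(tps / per_server)


if __name__ == "__main__":
    avg = average_tps()
    peak = avg * PEAK_TO_AVERAGE
    print(f"Average load: {avg:,.0f} tx/sec -> {servers_needed(avg)} servers")
    print(f"Peak load:    {peak:,.0f} tx/sec -> {servers_needed(peak)} servers")
```

Rounding up gives 116 servers at average load and 348 at peak, in line with the post's "about 115" and "about 350".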
Now, given that Twitter has more than enough problems scaling its current infrastructure, do you really think it would be able to handle volumes at this scale? Or will it re-architect, if it hasn't already? (I’m waiting for that call, Twitter execs...)
Now, GigaSpaces is recognized as a technology that can scale and handle huge loads. It has proved this numerous times in platform tests, in banks handling huge volumes of market data, and on platforms as dynamic as the iPhone launch. I am starting to see more and more dynamic media, retail and innovative Web 2.0 platform vendors using GigaSpaces to handle peak loads that they otherwise could not. GigaSpaces is the difference between scaling out your platform in real time and servicing your customers, or not. Simple as that. Throw in the cloud to handle the breakout scaling that the existing data centre cannot, and GigaSpaces becomes even more compelling.