Capping IT Off

The Physics and limits of Big Data – storage isn’t your issue

You hear lots of things in the world of Big Data about terabytes of data being created in ever-shorter periods of time and the need to store all that information. What is often overlooked, however, is the network bandwidth required to move and process this information, and what that will mean for data center design.

Let's take something understandable as a baseline: TV channels, something we know is stored and managed today and then broadcast to people. Let's assume a provider has 200 channels and that all of them are broadcast in full 1080p HD, which works out at about 8Mbps per channel.

So if we are a media company and want to store a year's worth of programmes, this means we have:

An 8Mbps feed gives roughly 1MB (megabyte) per second, so for a full year (60 seconds x 60 minutes x 24 hours x 365 days = 31,536,000 seconds) across 200 channels we get 6,307,200,000MB, or roughly 6.3 Petabytes.

Which means we need 6.3 Petabytes of new storage each year to handle that amount of video. You also need about 1.6Gbps of sustained bandwidth (200 channels x 8Mbps) to handle the incoming feeds. If we assume you are storing this on 3TB disks, you need roughly 2,100 disks to hold it all without duplication; if we assume it's important to keep at least one duplicate copy, that becomes 4,200 disks.
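
If you want to check that arithmetic yourself, here is the same sum as a quick Python sketch. It uses decimal units throughout (1PB = 1,000,000,000MB), matching the rough figures above:

channels = 200
mbps_per_channel = 8                     # full 1080p HD feed per channel
mb_per_second = mbps_per_channel / 8     # 8 megabits/s is roughly 1 megabyte/s
seconds_per_year = 60 * 60 * 24 * 365    # 31,536,000 seconds

mb_per_year = mb_per_second * seconds_per_year * channels
print(f"Storage per year: {mb_per_year:,.0f} MB (~{mb_per_year / 1e9:.1f} PB)")

feed_gbps = channels * mbps_per_channel / 1000
print(f"Aggregate inbound feed: {feed_gbps:.1f} Gbps")

disk_tb = 3
disks = mb_per_year / 1e6 / disk_tb      # MB -> TB, then 3TB per disk
print(f"3TB disks needed: ~{disks:,.0f}, or ~{disks * 2:,.0f} with a duplicate copy")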

A normal 3.5-inch hard drive has dimensions of 146mm x 101.6mm x 25.4mm (assuming the larger form factor needed for a 3TB drive), so if we stack the drives 2 x 3 x 12 we get 72 drives and 216TB in a rough 30cm cube. To hold our 6.3PB we could arrange those cubes in a 5 x 3 x 2 block (1.5m x 90cm x 60cm), which even gives us a tiny bit of growth room. This is of course just the raw disks, no cables, no power, but we are already talking about almost a cubic meter of disk for just a single year of TV.
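
The disk-stacking sums work out like this, as a rough sketch only; real drives need trays, cabling and airflow, so treat these dimensions as a lower bound:

# Rough physical volume of the raw disks (dimensions in mm).
drive_mm = (146, 101.6, 25.4)      # 3.5-inch drive: length x width x thickness
stack = (2, 3, 12)                 # 2 long x 3 wide x 12 high

cube_mm = [n * d for n, d in zip(stack, drive_mm)]
drives_per_cube = 2 * 3 * 12       # 72 drives
tb_per_cube = drives_per_cube * 3  # 216 TB on 3TB disks
print(f"One stack: {drives_per_cube} drives, {tb_per_cube} TB, "
      f"{cube_mm[0]:.0f} x {cube_mm[1]:.0f} x {cube_mm[2]:.0f} mm (roughly a 30cm cube)")

# A 5 x 3 x 2 block of those ~30cm cubes holds a year of TV with a little headroom.
cubes = 5 * 3 * 2
print(f"{cubes} stacks = {cubes * tb_per_cube / 1000:.2f} PB "
      f"in roughly 1.5m x 0.9m x 0.6m of raw disk")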

If we assume pizza-box style rack servers for those disks, with a standard sort of Hadoop set-up of 10 disks per 2U pizza box plus a couple of CPUs (8-12 cores) and a bit of RAM, then we have 30TB per pizza box, and with 21 of those boxes in a 42U rack we get 630TB per rack. Each rack is roughly 0.6m x 1m x 2m, so our 6.3 Petabytes takes up ten racks (twenty if we keep the duplicate copy), a single row of which occupies about 6m x 1m x 2m of floor space. Or, to put it all together in an infographic...
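
The rack-level version of the same sketch, keeping to a single copy of the data as the article's rack count does:

# Rack-level maths: 2U pizza boxes with 10 x 3TB disks each, in 42U racks.
disks_per_server = 10
tb_per_server = disks_per_server * 3     # 30 TB per pizza box
servers_per_rack = 42 // 2               # 21 x 2U servers in a 42U rack
tb_per_rack = servers_per_rack * tb_per_server
print(f"Capacity per rack: {tb_per_rack} TB")          # 630 TB

racks = 6300 / tb_per_rack               # one year (6.3 PB), single copy
rack_w, rack_d, rack_h = 0.6, 1.0, 2.0   # meters
print(f"Racks for a year of TV: {racks:.0f}, "
      f"a row of {racks * rack_w:.0f}m x {rack_d:.0f}m x {rack_h:.0f}m")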

The reason I've gone through all this maths is to make a point: that really isn't an overly scary amount of floor space or rack space. It's a lot, but it's not terrifying. Even after 20 years of 200 channels of HD TV we are looking at 126 Petabytes and therefore 200 racks, which is very roughly 20 rows of 10 racks and, assuming even spacing for aisles and rack space, works out to about 38 x 10 standard 60cm floor tiles, or 22.8m x 6m x 2m, which doesn't come close to filling a football field. And this is streamed video, which is at the very high end of data feeds, a level that most businesses won't come close to matching.
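
As a final sanity check, here is that 20-year extrapolation. The 22.8m depth simply back-fills the article's figure for 20 rows plus aisle spacing, and 105m x 68m is taken here as a typical football pitch size (the article just says "football field"):

# Twenty years of 200 HD channels, on the same assumptions as above.
years = 20
total_pb = 6.3 * years                   # 126 PB
total_racks = 10 * years                 # 200 racks in 20 rows of 10

row_width_m = 10 * 0.6                   # 10 racks of 0.6m per row = 6m
depth_m = 22.8                           # the article's 20-row depth incl. aisle spacing
pitch_m2 = 105 * 68                      # a typical football pitch
print(f"{total_pb:.0f} PB in {total_racks} racks: roughly {depth_m}m x {row_width_m:.0f}m x 2m")
print(f"That covers about {depth_m * row_width_m / pitch_m2:.1%} of a 105m x 68m pitch")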

So in the world of Big Data, storage is just a cost; it's not the issue. The harder question, as the opening suggested, is the network bandwidth you need to get all that data in and move it around in the first place.
