Capping IT Off

Opinions expressed on this blog reflect the writer’s views and not the position of the Capgemini Group

MegaUpload, an Information Management cloud

This post is not intended to discuss the legal aspects of the recent MegaUpload events. Its goal is to present a high-level, technical view of how MegaUpload was designed.

As you know, MegaUpload was a collection of websites allowing users to upload and download any file. The MegaUpload information system was a purely cloud-based solution: offering services, storing petabytes of data and billing were all done using cloud services. As information management was at the heart of these cloud services, I think it is worth investigating from a technical point of view.

The MegaUpload indictment contains some really interesting figures and technical facts.

Hosting capacity:

Carpathia Hosting is a North American cloud provider based in Virginia. MegaUpload was a Carpathia customer, renting about 25 petabytes of storage space and about a thousand servers (half of them physically located in the USA). That works out to an average of about 25 terabytes of storage per server.
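The per-server average can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the function name is mine, and I assume decimal units (1 petabyte = 1,000 terabytes) and the rounded figures quoted above.

```python
# Back-of-the-envelope check of the Carpathia figures.
# Assumption: decimal units, 1 PB = 1000 TB.
TB_PER_PB = 1000


def average_tb_per_server(total_pb: float, server_count: int) -> float:
    """Return the average storage per server, in terabytes."""
    return total_pb * TB_PER_PB / server_count


# About 25 PB spread over about a thousand servers.
print(average_tb_per_server(25, 1000))
```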

Leaseweb is based in the Netherlands and provided more than 600 servers to MegaUpload, spread across the Netherlands, Belgium, Germany and the US. Cogent Communications provided 36 servers in the US and in France. I have no information about the storage capacity these two providers supplied.

Compression techniques:

As most of the stored files were already highly compressed (video or audio formats), space was saved by avoiding duplicates instead. Every time a new file was uploaded to MegaUpload, a unique identifier was generated through an MD5 hash calculation to determine whether the same file had already been stored, and so avoid storing it twice (or more). Note that MegaUpload gave the new upload a different URL from the one given to the initial upload, so this "compression" by avoiding twin files was fully transparent to users.
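To make the idea concrete, here is a minimal sketch of hash-based duplicate avoidance. It is an illustration only, not MegaUpload's actual code: the class name, the in-memory dictionaries and the "link-N" identifiers are all assumptions of mine. Only the use of an MD5 hash as the file identifier comes from the description above.

```python
import hashlib


class DedupStore:
    """Hypothetical in-memory store that keeps one copy per distinct content."""

    def __init__(self) -> None:
        self._blobs = {}   # MD5 hex digest -> file content (stored once)
        self._links = {}   # public link id -> MD5 hex digest
        self._counter = 0

    def upload(self, data: bytes) -> str:
        """Store data if its hash is new; always return a fresh link."""
        digest = hashlib.md5(data).hexdigest()
        if digest not in self._blobs:
            self._blobs[digest] = data  # first time this content is seen
        self._counter += 1
        link = f"link-{self._counter}"  # each upload gets its own link
        self._links[link] = digest
        return link

    def download(self, link: str) -> bytes:
        """Resolve a link to the single stored copy of its content."""
        return self._blobs[self._links[link]]

    def stored_copies(self) -> int:
        """Number of distinct contents physically stored."""
        return len(self._blobs)


store = DedupStore()
first = store.upload(b"same video bytes")
second = store.upload(b"same video bytes")
print(first != second, store.stored_copies())
```

Two uploads of the same bytes produce two different links but only one stored copy, which is the behaviour described above. A real system would keep the mapping in a database and the content on disk, and would have to take into account that MD5 is not collision resistant.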

Information Access:

To allow internet users to find the stored files, websites were created describing the content (name, kind of data, photos, names of the actors or singers …). In Business Intelligence we call this generating metadata, and it is part of the presentation layer (like a Business Objects universe or the metadata description in Documentum).

Internet search engines such as Google, Bing or Yahoo automatically indexed all this public metadata content, allowing users to find the right website and, from there, the file location through a MegaUpload URL.


Billing:

As with all commercial websites, billing is a mandatory component. There was indirect billing through advertisements on the MegaUpload websites, and direct billing through subscriptions that let end users improve their upload and download capacity (bandwidth and number of simultaneous downloads).

Apparently subscriptions represented 75% of the revenue, whilst advertising was about 25%.

The final billing relationship (payment) with customers, providers and advertisers was once again handled purely in the cloud, through PayPal, Moneybookers, AdBrite or PartyGaming.

High level architecture design:

Megaupload Information Management Architecture

If you look at this simple high-level architecture design, it looks like a public cloud-based information management solution, with an integration layer, a storage layer, a presentation layer, a usage layer and, as it was an e-commerce website, a billing layer.

As you can see, MegaUpload was a very primitive cloud-based information system. I'm sure there were also BI tools for internal use (analyzing clickstreams and customer activity), but I have no precise information about them. Maybe a future post will cover this area.

About the author

Manuel Sevilla
