Rolling product analytics are done locally, in region. Finance and the like is handled by the Enterprise Data Warehouse at a fairly abstracted layer. But annual reporting to key regulators is a challenge at this scale: keeping all transactional data globally, with an audit trail back to source (even for, say, just the US), is an expensive task.
Let’s take a record size of 1,250 bytes for full transactional data with traceability to source, etc.

1,250 bytes × 1 BN transactions ≈ 1.14 TiB per year
For the last 10 years of data we are at 45 TB. For 20 years we are at 910 TB – all transactional data we dare not throw away, given the risk of long-term class actions and regulator challenges…
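A quick sanity check of the per-year arithmetic, assuming a flat 1 BN transactions per year (the 45 TB and 910 TB figures above presumably build in growth of transaction volumes over the period):

```python
RECORD_BYTES = 1250                # full transactional record incl. traceability
TXNS_PER_YEAR = 1_000_000_000      # 1 BN transactions per year

bytes_per_year = RECORD_BYTES * TXNS_PER_YEAR
tib_per_year = bytes_per_year / 2**40   # binary terabytes (TiB)

print(f"{bytes_per_year / 1e12:.2f} TB (decimal), {tib_per_year:.2f} TiB per year")
# at flat volume: 10 years of this is ~11.4 TiB, 100 years ~114 TiB
```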
…It’s likely I need it once a year, and will probably have to keep it for up to 100 years
“Cold store” the data to Amazon Glacier, or take the approach open-sourced by Facebook – a Blu-ray archive capable of multi-petabyte stores. It doesn’t really matter which; both offer long-term certainty. Recovery time is largely irrelevant, provided it is minutes and not hours. Cold store is cheap – an easy win when meeting regulator requirements is a pure cost overhead.
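To see why cold store is the easy win, compare annual holding costs for the data volumes above. The per-GB-month rates below are illustrative assumptions for a cold archive tier versus a hot object-store tier, not current price-list figures:

```python
# Assumed rates (USD per GB-month) -- placeholders for illustration only.
COLD_USD_PER_GB_MONTH = 0.004    # archive tier (e.g. Glacier-class storage)
HOT_USD_PER_GB_MONTH = 0.023     # hot tier, for contrast

def annual_cost_usd(size_tb, usd_per_gb_month):
    """Yearly storage cost for size_tb terabytes at a per-GB-month rate."""
    return size_tb * 1024 * usd_per_gb_month * 12

for tb in (45, 910):
    cold = annual_cost_usd(tb, COLD_USD_PER_GB_MONTH)
    hot = annual_cost_usd(tb, HOT_USD_PER_GB_MONTH)
    print(f"{tb} TB: cold ~${cold:,.0f}/yr vs hot ~${hot:,.0f}/yr")
```

Even at petabyte-adjacent scale the cold-tier bill stays in the tens of thousands per year, which is what makes a keep-everything policy defensible.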
Once a year, as the regulator demands:
- Extract either the whole global data set or a single region into Hadoop, together with previous years as needed
- Distill the data sets using large-scale data set tools such as Pivotal’s PDD
- Provide the regulator with the line-of-business view needed
- On completion, shut down and archive the new data sets to cold store
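The four annual steps above can be sketched as an orchestration skeleton. Every function here is a hypothetical stub standing in for the real tooling (archive retrieval, Hadoop bulk load, distillation queries, re-archival):

```python
def retrieve_from_cold_store(years, region="global"):
    """Stub: initiate archive retrieval (e.g. a Glacier job) for the years needed."""
    return [f"{region}-{y}.dat" for y in years]

def load_into_hadoop(archives):
    """Stub: bulk-load the retrieved archives into the Hadoop cluster."""
    return {"tables": archives}

def distill(dataset, line_of_business):
    """Stub: run the large-scale distillation for one line-of-business view."""
    return f"report-{line_of_business}"

def archive_and_shutdown(dataset):
    """Stub: write the new data sets back to cold store, then tear the cluster down."""
    return True

def annual_regulator_run(years, line_of_business):
    archives = retrieve_from_cold_store(years)
    dataset = load_into_hadoop(archives)
    report = distill(dataset, line_of_business)
    assert archive_and_shutdown(dataset)
    return report

print(annual_regulator_run(range(2005, 2015), "retail-banking"))
```

The point of the skeleton is the shape of the workflow: the cluster exists only for the duration of the run, and everything it produces goes straight back to cold store.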
This meets the demand case where you need insight just once a year.