I’ve been working with one of my favourite clients recently, a Mr Santa Claus – who is faced each year with the ultimate big data challenge, at a level of complexity that implies he has solved many of the problems businesses are facing today. In chatting with him I see that he is faced with three key streams of big data challenge: demand, supply, and fulfilment*.
- Unstructured data from 526 million letters to Santa (wish lists), in multiple languages
- Correlation of these to an accurate GIS database with up-to-date information on chimney access and sleigh parking
- Notes on previous quality and quantity of carrots and mince pies (the tradition where I am)
- Assuming 1 present each at 1kg and a 30cm cube, total supply components of at least 526,000 tonnes and 14,202,000 m³ of space required, built to time and potentially “just in time” based on last-minute letters from children
- A predictive supply chain linked to the above wish lists, with a level of sentiment analysis to determine subtleties of true demand – from whatever is cool from friends in the school yard to exact colour and size based on child preference
- Delivery to approx. 22 million children per hour – 365,000 per minute, 6,100 per second.
- A list of who has been naughty, and who has been nice, updated in real time – so a 526 million row database with structured data. Our recent solution, “Anomalous Behaviour Detection” can really help parents here.
- Automated routing based on weather, flight conditions, etc. around the world (to avoid disrupting any flights) – an average of 102,465 flights per day to be tracked
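The headline numbers above hang together surprisingly well. As a back-of-envelope check (using the same rounded figures as the post – the 1kg mass and 30cm cube are Santa’s assumptions, not mine):

```python
# Back-of-envelope check of Santa's numbers, using the figures from the post.
CHILDREN = 526_000_000          # one letter, one present per child
PRESENT_KG = 1                  # assumed mass per present
PRESENT_SIDE_M = 0.30           # assumed 30cm cube per present

total_tonnes = CHILDREN * PRESENT_KG / 1000
total_volume_m3 = CHILDREN * PRESENT_SIDE_M ** 3

per_hour = 22_000_000           # deliveries per hour on the night
per_minute = per_hour / 60
per_second = per_hour / 3600

print(f"{total_tonnes:,.0f} tonnes, {total_volume_m3:,.0f} m^3 of presents")
print(f"{per_minute:,.0f} deliveries/min, {per_second:,.0f} deliveries/s")
```

The per-minute and per-second figures come out a touch higher than the rounded ones in the list (roughly 366,700/min and 6,111/s), which is why everything here carries an “approx.”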
How might he have built a solution to this challenge? Possibly by taking a Santa Data Lake approach using the Pivotal Big Data Suite and:
- Store everything from the wish lists and letters from all children for, say, the last 15 years (to allow trend analysis) – about 3.5PB in total based on a 500KB high-res scan of each letter. Store this in Hadoop for cost and simplicity.
- Ingest the letter data and analyse it. Text analysis and OCR, probably using data science tools (e.g. MADlib), to create a sanitised, structured data view tied to the GIS and address database.
- Use batch analysis from the history to determine exact present specification
- Ensure a two-way interface to an ERP system for production – SAP, etc. – to ensure present creation by Santa’s helpers whilst monitoring updates via sentiment analysis for the most popular colours of key presents.
- Distil data sets as needed from Hadoop to In-Memory grid for delivery day – Santa will need real time updates
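The ingest step above is the interesting one: turning 526 million free-text letters into structured rows. A minimal sketch of what that might look like once a letter has been OCR’d to plain text – the `parse_wish_list` function and its regex are entirely hypothetical, and real letters (and real children) are far messier than this:

```python
import re

# Hypothetical extraction pass: OCR'd letter text in, one structured row
# per wished-for present out, ready to join to the naughty/nice database.
WISH_PATTERN = re.compile(
    r"i (?:want|would like) (?:a|an|some) ([\w\s-]+)", re.IGNORECASE
)

def parse_wish_list(child_id: int, letter_text: str) -> list[dict]:
    """Extract one structured row per present mentioned in a letter."""
    rows = []
    for match in WISH_PATTERN.finditer(letter_text):
        rows.append({
            "child_id": child_id,
            "item": match.group(1).strip(),
            "source": "letter",
        })
    return rows

letter = "Dear Santa, I want a red bicycle. Also I would like some carrots!"
print(parse_wish_list(42, letter))
```

In practice this is where the sentiment analysis earns its keep – distinguishing “whatever my friends have” from a firm demand for the blue one, in size 4.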
Insight delivered in real time:
- Automated routing – based on the 100k plus flights, likely routes and altitudes with corresponding updates to Rudolph.
- A sleigh-mounted tablet solution would be ideal to guide Santa to the next delivery – given the 6,100 deliveries per second and the timing involved, communication latency will be a real issue, so it will need to be hosted locally on the sleigh.
- Last minute updates will be delivered by micro-batch via satellite
- Using data compaction techniques he can reduce the structured data to its lowest point of entropy, vastly reducing the in-memory requirements – down to approximately 50KB for a single delivery – but still a 1TB in-memory grid, which will need to be fault tolerant.
- Action – Santa will probably hold an hour’s worth of data sets at any one point, allowing him to deliver each present on time.
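The 1TB figure above follows directly from the hour-long rolling window – a quick sanity check, again using the post’s own numbers:

```python
# Sizing the in-memory grid: ~50KB per compacted delivery record,
# a rolling one-hour window of ~22 million deliveries.
DELIVERY_KB = 50
DELIVERIES_PER_HOUR = 22_000_000

window_bytes = DELIVERY_KB * 1000 * DELIVERIES_PER_HOUR
window_tb = window_bytes / 1e12

print(f"{window_tb:.1f} TB in memory for a one-hour window")
```

About 1.1TB – comfortably “a 1TB in-memory grid”, with fault tolerance the harder part of the problem than capacity.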
Looking ahead, Santa will also need to consider his digital strategy – he should take a strategic approach if he wants to be successful, as we found successful companies do in “Leading Digital”.
- Santa will need to consider real-time interactions with the Santa Tracker to improve his engagement with the 12-18 year-old demographic
- Mobile apps are increasingly important when delivered as a cohesive part of the customer experience.
On that note… Season’s greetings to one and all, and I hope Santa brings you what you want this holiday.
* All numbers are approximated based on this rather fun article