Capping IT Off

Opinions expressed on this blog reflect the writer’s views and not the position of the Capgemini Group

The AAA of Big Data and Analytics

The traditional data warehouse does not provide the level of transactional granularity required for predictive modeling, so data scientists have been left to their own devices to create what we call analytical datasets. They spend enormous amounts of time collating and cleaning data, sometimes extracting it directly from operational data sources, all before any serious modeling can begin. Creating these datasets can account for 50-60% of the time a data scientist or predictive modeler spends!
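To make that concrete, here is a minimal sketch, in Python with pandas, of the kind of collation and cleaning work involved; the file names and column names are purely illustrative, not from any specific project.

# A minimal sketch (hypothetical files and columns) of building an analytical dataset.
import pandas as pd

# Pull raw extracts from (hypothetical) operational sources
transactions = pd.read_csv("transactions_extract.csv", parse_dates=["txn_date"])
customers = pd.read_csv("crm_extract.csv")

# Basic cleaning: drop duplicates, fill gaps, standardise keys
transactions = transactions.drop_duplicates(subset=["txn_id"])
transactions["amount"] = transactions["amount"].fillna(0.0)
customers["customer_id"] = customers["customer_id"].str.strip()

# Aggregate transactions to one row per customer (the modeling grain)
features = (
    transactions
    .groupby("customer_id")
    .agg(total_spend=("amount", "sum"),
         txn_count=("txn_id", "count"),
         last_txn=("txn_date", "max"))
    .reset_index()
)

# Join onto customer attributes to form the analytical dataset
analytical_dataset = customers.merge(features, on="customer_id", how="left")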
The focus of traditional data warehouses is more on KPIs, reports, metrics and an optimized warehouse structure that facilitates drilldowns and faster report retrieval. As far as I remember, data warehouses were never designed with the data scientist or predictive modeler in mind.
Big data technologies are creating a shift in how the traditional data modeler approaches his or her work!
Availability:  New sources of data, such as social media, are available today. Text mining algorithms mean that insights can be extracted from long texts such as blogs and call center conversations, further helping data scientists understand customer behavior. Access to data at a much more granular level, such as RFID feeds, smart meter data, geolocation and network data, introduces additional dimensions that the data scientist can exploit in building predictive models.
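As an illustration of the text-mining point, the sketch below turns a few invented call-center notes into term weights a model could consume; the example notes, the feature limit and the choice of scikit-learn's TfidfVectorizer are assumptions made for the example only.

# An illustrative sketch of pulling simple signals out of free text such as call-center notes.
from sklearn.feature_extraction.text import TfidfVectorizer

notes = [
    "customer unhappy with billing, threatened to cancel",
    "asked about upgrade options for family plan",
    "complained about network coverage, second call this month",
]

# Turn the raw text into term weights a downstream model can consume
vectorizer = TfidfVectorizer(stop_words="english", max_features=50)
text_features = vectorizer.fit_transform(notes)

print(vectorizer.get_feature_names_out())
print(text_features.toarray().round(2))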
Ability:  Traditionally, much predictive modeling has been done by data scientists on historical sample datasets. Big data platforms can store and process far larger volumes of data in a fraction of the time, so the data scientist now has the option to develop models on much larger datasets rather than the samples used previously. Models developed on larger datasets are more representative of the underlying population, and more anomalies in the population can be factored in, since the data scientist has looked at the population rather than a sample. Additionally, for the first time, the data scientist has access to both structured and unstructured data on a single platform, allowing for shorter modeling cycle times and greater model accuracy.
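A hedged sketch of what modeling on the full population can look like on a big data platform: it assumes a Spark cluster, a hypothetical parquet table of customer features with invented column names, and uses Spark MLlib purely as one possible example, not as the specific tooling referred to above.

# A sketch of fitting a model on the full population rather than a downsampled extract.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("full-population-model").getOrCreate()

# Hypothetical table: one row per customer, structured and text-derived features together
df = spark.read.parquet("/data/customer_features")

assembler = VectorAssembler(
    inputCols=["total_spend", "txn_count", "complaint_score"],  # assumed columns
    outputCol="features",
)
model_input = assembler.transform(df)

# Fit on the full dataset rather than a sample pulled down to a workstation
lr = LogisticRegression(featuresCol="features", labelCol="churned")
model = lr.fit(model_input)
print(model.summary.areaUnderROC)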
Agility:  With a combination of smart carts, market basket analysis, forecasting techniques and an understanding of customer behavior, a retailer can make an offer to a customer based on their current basket, leading to increased sales. Changes in observed calling patterns can generate alerts that a telecom company can use to make offers to a customer before a competitor reaches out to them. Trends in machine logs can be monitored in real time to pre-empt a machine breakdown and reduce downtime. The combination of traditional modeling and a superimposed, robust rules engine can effectively decrease response times to events and customers. Very recently, we helped a client segment their customer base; the segmentation exercise was dynamic, with a superimposed rules engine detecting pattern changes that may predict a segment change, so that offers were tuned to recent behavior rather than historical patterns.
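To illustrate the idea of a rules engine superimposed on a segmentation (this is an invented example, not the client solution), the sketch below flags customers whose recent behavior has drifted far from their segment's baseline, so offers can follow recent behavior rather than history.

# Illustrative rule layered on top of a segmentation model; baselines and threshold are assumed.
SEGMENT_BASELINES = {
    "value_seeker": 40.0,   # assumed average weekly spend per segment
    "premium": 180.0,
}

def flag_segment_shift(segment: str, recent_weekly_spend: float, threshold: float = 0.5) -> bool:
    """Flag a customer whose recent spend deviates from the segment baseline
    by more than `threshold` (50% by default)."""
    baseline = SEGMENT_BASELINES[segment]
    return abs(recent_weekly_spend - baseline) / baseline > threshold

# A "value_seeker" spending like a premium customer triggers the rule
print(flag_segment_shift("value_seeker", 95.0))   # True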
The increasing availability of data, together with the ability and agility that big data technologies afford, is enabling the data scientist to create ever more value.

About the author

Mamatha Upadhyaya
2 Comments
Accuracy could be a good candidate for a fourth A, e.g. taking into account the new targeting opportunities that Big Data & Analytics can offer.
maupadhy
Yes, absolutely. Model accuracy versus cost, an ongoing dilemma for predictive modelers, could be the subject of another blog!
