The need for Data Scientists increases daily with the volume of big data. Their skill set will be essential both for retrieving the information required and for finding a use for the masses of data accumulating across different businesses and industries.

Typical Business Intelligence is not built for Big Data analysis, which requires a different approach: more realtime, more forward-looking (where business intelligence looks backwards), with pattern recognition and new visualisations that can show large amounts of data and let users interact with it. In the movie The Matrix I saw the first attempt (already 16 years ago!) at realtime analysis of a whole virtual earth.

Sentiment analysis is being improved by new taxonomies, which makes it ever more complex to build models that increase accuracy. This prompted a question: which Big Data scenarios require which sort of approach? Although you can always find a reason to use one approach or the other, I find the following to be the most common solutions for getting ROI from the Big Data scenario in question.

Which of the Big Data scenarios require realtime/predictive analysis?

Smart Meter Analytics

Massive volumes of machine-generated data from smart meters can be captured for analysis and efficient long-term storage. Utilities can use that data to detect patterns and make recommendations to customers. The end result: more efficient energy distribution and improved customer relationships.

Predictive Analysis: Yes, pattern recognition
Realtime analysis: No

Fraud Detection & Anti-Money Laundering

Financial services companies are challenged to strike a balance between expanding their service offerings and reducing their exposure to perpetrators of fraud. Capgemini offers a proven solution for this, with a solution approach using SAS and Big Data.
As new systems and features are rolled out to customers, perpetrators constantly adapt to fraud prevention techniques. Sophisticated firms are perfecting their fraud detection models and using real-time fraud detection engines to catch bad behaviour in flight.

Predictive Analysis: Yes, pattern recognition
Realtime analysis: Yes

Genome Processing & DNA Sequencing

According to the National Human Genome Research Institute, it costs about US$10,000 to sequence one genome using standard technology, and this process generates about 100GB of compressed data. Apache Hadoop has made it affordable for scientists to process and store this data, helping to reduce the cost of sequencing a genome to less than US$100.

Predictive Analysis: No
Realtime analysis: No, but processing power is required. 

User Engagement & Digital Content Analysis

In the world of media and entertainment, the longer you can keep your customers engaged, the more successful you’ll be. Media and entertainment companies track user interactions across systems and channels in real time to personalize the user experience, improve website stickiness and increase customer loyalty.

Predictive Analysis: Yes, predict next purchase
Realtime analysis: Yes

Sentiment & Social Media Analysis

Companies that sift through social media and user-generated content (UGC) data to better understand customer sentiments can respond to those customers in a more personal manner, and establish strong customer loyalty.

Predictive Analysis: No
Realtime analysis: No

I have some experience with Netbase, but it can be quite challenging to draw an accurate picture if there are few followers out there. The numbers have to support it too.

360-Degree Customer View & Customer Churn Prevention

Retail data is often stored in silos, so it’s difficult to correlate data about customer purchases, marketing campaign results, and online browsing behavior. The retailers that figure out how to bring all that information together into a single, multi-channel view will come out on top.

Predictive Analysis: No
Realtime analysis: Yes

Traffic Management Information

Researchers at KTH Royal Institute of Technology in Sweden are using streaming analytics technology to gather real-time information from the Global Positioning System (GPS) devices on nearly 1,500 taxi cabs in the city, and will expand this to gather data from delivery trucks, traffic sensors, transit systems, pollution monitors and weather information.

Predictive Analysis: Yes
Realtime analysis: Yes

4/7 Predictive
4/7 Realtime
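
For anyone who wants to check the score, here is a minimal Python sketch of the tally. The Yes/No verdicts are copied from the scenarios above; the names `scenarios` and `tally` are only illustrative, so adjust the table if you disagree with my verdicts or want to add scenarios of your own.

```python
# Verdicts copied from the scenario list in this post: (predictive, realtime).
scenarios = {
    "Smart Meter Analytics": (True, False),
    "Fraud Detection & Anti-Money Laundering": (True, True),
    "Genome Processing & DNA Sequencing": (False, False),
    "User Engagement & Digital Content Analysis": (True, True),
    "Sentiment & Social Media Analysis": (False, False),
    "360-Degree Customer View & Customer Churn Prevention": (False, True),
    "Traffic Management Information": (True, True),
}


def tally(verdicts):
    """Count how many scenarios call for predictive and for realtime analysis."""
    predictive = sum(1 for is_predictive, _ in verdicts.values() if is_predictive)
    realtime = sum(1 for _, is_realtime in verdicts.values() if is_realtime)
    return predictive, realtime, len(verdicts)


predictive, realtime, total = tally(scenarios)
print(f"{predictive}/{total} Predictive")
print(f"{realtime}/{total} Realtime")
```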

It’s a draw … Have I left some out? Of course! That gives you, the reader, a chance to post your comments.

Note: There are some big data sources and scenarios which are not widely used in the business world. Particle accelerator data, race information (F1), air traffic control information and R&D information in general (pharmaceuticals) are all big data generators.



References used: Cloudera