We are in an age where users engage online with multiple businesses across various contexts. The user intents learned from these engagements are, by and large, used to drive cross-selling and up-selling of products through the allotted ad spaces. Learning a behavior and deriving recommendations from it can carry large margins of error, because the majority of online engagements may not be ones that significantly influence business outcomes. Also, a user's behavior on a media platform differs significantly from the same user's behavior on a commerce platform. Hence, deriving the business-influencing context from a larger set of behavioral traits becomes a major challenge. Most AI engines have a “knowing” phase as well as a “learning” phase. The following diagram is a standard, simple view of widely followed models. There is a continuous phase of learning, analyzing, correlating and recommending, with error filters that maintain the sanity of the data and the accuracy of the system.

The key challenge of user recommendations is to create a personalized strategy for a large volume of users interacting with a platform. This is usually solved with a segmentation approach applied to a group of users or to patterns of behavior. That puts considerable pressure on recommendation accuracy, because a single strategy is applied across an entire segment. However, the user context is a significant factor that can eliminate errors in various enterprise-applied strategies. The context of a user can be simplified to finding answers to questions such as: who are you, where are you, why are you in this transaction, what do you want, where did you come from, where are you navigating, what are you looking at within the transaction to make the decision, etc. Context gives a recommendation more dimensions to match on, for example, whether the user's current context can be mapped to past learned data and context. In summary, context helps us understand the user better, and hence recommendations can be more accurate.
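To make the contrast between segment-level and context-level recommendation concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a production method: the context dimensions (`who`, `where`, `why`, `referrer`, and so on), the `PastEngagement` record, and the "fraction of matching dimensions" similarity score are all hypothetical choices made for this example. The segment baseline simply recommends the most popular items across everyone, while the context version weights each past item by how closely its learned context matches the user's current one.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass

# Hypothetical context dimensions, loosely mirroring the questions in the text:
# who, where, why, what, where-from (referrer), where-to (navigation), looking-at.
DIMENSIONS = ("who", "where", "why", "what", "referrer", "navigation", "viewing")


@dataclass
class PastEngagement:
    context: dict[str, str]  # learned context of a past transaction (hypothetical schema)
    item: str                # what the user ended up engaging with or buying


def context_similarity(current: dict[str, str], past: dict[str, str]) -> float:
    """Fraction of the dimensions known in both contexts on which they agree."""
    shared = [d for d in DIMENSIONS if d in current and d in past]
    if not shared:
        return 0.0
    return sum(current[d] == past[d] for d in shared) / len(shared)


def segment_recommend(history: list[PastEngagement], top_n: int = 2) -> list[str]:
    """Segment-style baseline: the most popular items across the whole segment."""
    return [item for item, _ in Counter(e.item for e in history).most_common(top_n)]


def context_recommend(current: dict[str, str], history: list[PastEngagement],
                      top_n: int = 2, min_similarity: float = 0.5) -> list[str]:
    """Score each past item by how closely its context matches the current one."""
    scores: Counter = Counter()
    for e in history:
        sim = context_similarity(current, e.context)
        if sim >= min_similarity:  # ignore engagements from unrelated contexts
            scores[e.item] += sim
    return [item for item, _ in scores.most_common(top_n)]


# Toy, made-up data: one segment mixing gift shoppers and self-buyers.
history = [
    PastEngagement({"who": "returning", "where": "mobile", "why": "gift", "referrer": "video_ad"}, "gift_card"),
    PastEngagement({"who": "returning", "where": "mobile", "why": "gift", "referrer": "search"}, "flowers"),
    PastEngagement({"who": "new", "where": "desktop", "why": "self", "referrer": "search"}, "laptop"),
    PastEngagement({"who": "new", "where": "desktop", "why": "self", "referrer": "search"}, "laptop"),
    PastEngagement({"who": "new", "where": "desktop", "why": "self", "referrer": "email"}, "mouse"),
]
current = {"who": "returning", "where": "mobile", "why": "gift", "referrer": "video_ad"}

if __name__ == "__main__":
    print("segment:", segment_recommend(history))
    print("context:", context_recommend(current, history))
```

On this toy data the segment baseline leads with `laptop` because it is the most popular item overall, whereas the context-aware version returns `gift_card` and `flowers`, the items bought in contexts that resemble the current gift-shopping session. A real system would learn weights per dimension rather than treat every dimension equally.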

Another dimension of the problem is that socially created content at large scale needs diligent curation and classification or tagging procedures. Often, content learning and filtering pose bigger challenges than user learning. Extracting context from content is a complex task that involves deep media content analysis. This is independent of the behavior of the creator or of the users (audience) who consume the content. Hence, user learning is not a good yardstick for gauging content. The AI training phase needs a wider and larger set of data and dimensions (images, objects, actions, events, and patterns) to classify new content. Image and video recognition is a significant area of innovation that can address the majority of these problems. A foolproof video recognition AI engine will have to be immune to frame injections that can lead to false tagging. A recent research study from the University of Washington revealed the vulnerability of recognition to noise frame injections.
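To illustrate why sparse frame injection can produce false tags, and one simple way to blunt it, here is a minimal Python sketch. It assumes, purely for illustration, that some upstream classifier emits exactly one label per frame; real video classifiers usually return several labels with confidence scores. The `persistent_tags` heuristic and its `min_run` and `min_fraction` thresholds are hypothetical choices for this example, not the defense used by any particular product or proposed in the study. It keeps a label only when it persists across several consecutive frames, so isolated injected frames no longer become video-level tags.

```python
from __future__ import annotations

from collections import Counter


def naive_tags(frame_labels: list[str]) -> set[str]:
    """Tag the video with every label seen in any frame (easily poisoned)."""
    return set(frame_labels)


def persistent_tags(frame_labels: list[str], min_run: int = 3,
                    min_fraction: float = 0.05) -> set[str]:
    """Keep a label only if it appears in at least `min_run` consecutive frames
    and covers at least `min_fraction` of all frames (hypothetical thresholds)."""
    if not frame_labels:
        return set()
    longest_run: dict[str, int] = {}
    run_label, run_len = None, 0
    for label in frame_labels:
        # Extend the current run, or start a new one when the label changes.
        if label == run_label:
            run_len += 1
        else:
            run_label, run_len = label, 1
        longest_run[label] = max(longest_run.get(label, 0), run_len)
    counts = Counter(frame_labels)
    total = len(frame_labels)
    return {
        label
        for label, run in longest_run.items()
        if run >= min_run and counts[label] / total >= min_fraction
    }


# Toy example: a 100-frame cooking video with a single "weapon" frame
# injected every 25 frames (frames 0, 25, 50 and 75).
frames = ["cooking"] * 100
for i in range(0, 100, 25):
    frames[i] = "weapon"

if __name__ == "__main__":
    print("naive:", naive_tags(frames))
    print("persistent:", persistent_tags(frames))
```

The naive tagger reports both `cooking` and `weapon`, while the persistence filter keeps only `cooking`. The trade-off is that a legitimately brief scene can also be dropped, and an attacker who injects longer runs of frames would get past this particular check, so it is a starting point rather than the immunity the text calls for.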

Recently, Google has come under fire over brand safety issues affecting advertisers who use Google Ads: inaccurate ad placement put advertisers' ads next to inappropriate video content on YouTube. In response, Google has said it will put more focus on machine learning algorithms that learn context accurately and serve the right set of advertisements in the right user context. Google has already applied real-time video learning and interpretation extensively in its self-driving car algorithms, and it will most likely carry some of the key learnings from that deep learning work over to the advertisement placement space. This is essentially Google’s way of “knowing” the content better, beyond “learning” it.

Filtering out inaccurate content learning, and the wrong advertisement, product, and content strategies that follow from it, should be a key criterion for recommendation use cases. The immunity of recommendations to noise in content should be a key acceptance criterion for recommendation approaches. Enterprises will have to focus on solving the learning challenge as well as on knowing the content and the users.

Content recognition, classification, and tagging form a vast research area aimed at addressing most of the key vulnerabilities of social content creation, e.g., fake news filtering, keeping objectionable content away from advertisements, and inaccurate recommendations that affect customer loyalty. These innovations have applicability beyond content hygiene itself; they shape the discipline of visual intelligence, creating more digital transformation use cases for media and entertainment, e-commerce, industrial cataloging, etc.

In the future, video, images, and audio will become the default media of interaction, rather than text, for most business transformation use cases. This is because the amount of information exchanged and conveyed by media is significantly larger than by textual modes. Using video, images, and audio as the medium directly amplifies data accuracy and boosts productivity in decision making. For example, voice-activated CRM, service audits by video recording, status updates by images, product lookup by real-time photos or videos, parts identification by photographs, and more will disrupt the way workflows are done today. Drone-based data collection, shipment delivery, autonomous drones, etc. will be driven by advanced vision algorithms. The mass adoption of these technologies will require dedicated recognition capabilities that directly support business decision making with minimal swipes and taps. Learning and knowing the user and the environment will become the new normal of digital living.


April 8, 2017