Do you have ideas to develop using AI but are not sure whether they will work or scale appropriately? Previous blogs highlighted strategies to ensure the maturity of AI initiatives and the struggle of assessing business value. This blog focuses on quality – specifically, data quality.
Data abundance and, most importantly, data quality are the key enablers of any AI project’s success. Before embarking on the long-term journey of AI (neither cheap nor fast), it is essential to resolve – to the greatest extent possible – any known issues with the data itself (reliability, privacy, representativeness, bias identification, etc.) until there are no doubts about its quality. Poor data quality can lead to bias, spurious correlations, misidentification of trends and many other negative effects in AI model results.
The AI hierarchy of needs represents the stages that successful data-driven organisations navigate when developing AI initiatives with prospects of industrialisation. One should not progress to a higher level until the requirements of the lower, foundational levels are satisfied. Horizontal growth of the pyramid represents the successive evolution from an initial Proof of Concept (PoC) (a narrow, vertical section of the pyramid, where each level works well end-to-end) to industrialised AI applications (the whole pyramid, where the widening stands for enhancements at each level).
The AI Hierarchy of Needs
1st Data collection:
This stage demands a clear understanding of the data sources and availability of data to deliver the business outcomes. A PoC may be proven on a specific set of data – perhaps even open-source data – isolated for the purposes of the project. Maturity will entail setting up strong data governance strategies (also covering storage and management processes) to boost data trust and support the scaling of AI initiatives.
2nd Data flow:
A reliable data flow with fundamental Extract, Transform, Load (ETL) processes, including data cleansing, is defined to prepare the data for analysis. There is a notable transition between the functional data pipelines set up to support a PoC and the optimised solutions that make use of full data integration with IT systems. Organisations operating on fragmented, legacy IT architectures that hamper the normal operations of data teams (engineers, analysts, and scientists) could consider a data estate modernisation (addressing cloud resources, scalability, and security) aligned with their AI roadmap.
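To make the ETL idea concrete, here is a minimal sketch of an extract-transform-load step with basic cleansing. All names and data (`raw_records`, the `sales` table, the fields) are hypothetical, standing in for whatever a real pipeline would extract from source systems.

```python
import sqlite3

# Hypothetical raw records, standing in for data extracted from a source system.
raw_records = [
    {"customer_id": "1", "amount": " 120.50 "},
    {"customer_id": "2", "amount": "N/A"},       # unparseable -> dropped
    {"customer_id": "1", "amount": " 120.50 "},  # exact duplicate -> dropped
]

def transform(records):
    """Cleanse: strip whitespace, drop unparseable rows, deduplicate."""
    seen, clean = set(), []
    for r in records:
        try:
            amount = float(r["amount"].strip())
        except ValueError:
            continue  # data cleansing: discard rows that cannot be parsed
        key = (r["customer_id"], amount)
        if key in seen:
            continue  # data cleansing: discard duplicates
        seen.add(key)
        clean.append(key)
    return clean

def load(rows, conn):
    """Load the cleansed rows into the analytical store."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer_id TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)

conn = sqlite3.connect(":memory:")
load(transform(raw_records), conn)
print(conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0])  # 1 clean row survives
```

A PoC pipeline may be little more than this; maturity means hardening each step (scheduling, monitoring, schema checks) rather than changing the shape of the flow.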
3rd Analytics:
At this stage we have data effectively integrated into the client environment and are confident about its quality. At the PoC stage, analytics focuses on identifying the metrics and KPIs to track, perhaps with the support of dashboards and reports. Analytics is executed with the end goal of applying AI. This includes deciding on a simple baseline to assess the performance of our solution and thinking about the data as features (Are there labels? Do we need to increase the diversity of the data?). More powerful analytics solutions may be incorporated as the development matures (data mining, descriptive analytics, and diagnostic analytics).
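The simplest such baseline is often a majority-class predictor: any model must beat it to justify its cost. A minimal sketch, using hypothetical churn labels:

```python
from collections import Counter

# Hypothetical labelled outcomes for a churn PoC (1 = customer churned).
labels = [0, 0, 0, 1, 0, 1, 0, 0, 1, 0]

# Majority-class baseline: always predict the most frequent label.
majority_label, majority_count = Counter(labels).most_common(1)[0]
baseline_accuracy = majority_count / len(labels)

print(f"Baseline predicts {majority_label} with accuracy {baseline_accuracy:.0%}")
```

Here the baseline scores 70% by always predicting "no churn", so a model reporting 70% accuracy has learned nothing; the 30% positive rate is also a cue to the feature questions above (labels exist, but the classes are imbalanced).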
4th AI:
It is time for AI. Off-the-shelf AI models can be good enough to prove the value of a PoC, with model updates, such as algorithm fine-tuning, forming a healthy model lifecycle (Test-Learn-Validate). The deployment of updates is governed by agile practices, in particular MLOps and DataOps (a blend of development and operations with agile and lean processes).
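The Test-Learn-Validate cycle can be sketched as a toy tuning loop. The "model" here is a deliberately trivial threshold classifier and the data is invented; the point is the shape of the cycle, in which a hyperparameter is learned on a training split and the update is gated by a held-out validation score.

```python
# Hypothetical (score, label) pairs from an upstream off-the-shelf model.
train = [(0.2, 0), (0.4, 0), (0.6, 1), (0.9, 1), (0.1, 0), (0.8, 1)]
valid = [(0.3, 0), (0.7, 1), (0.5, 1), (0.2, 0)]

def accuracy(threshold, data):
    """Fraction of examples where 'score >= threshold' matches the label."""
    return sum((score >= threshold) == bool(label) for score, label in data) / len(data)

# Learn: pick the decision threshold that maximises training accuracy.
candidates = [0.3, 0.5, 0.7]
best = max(candidates, key=lambda t: accuracy(t, train))

# Validate: the held-out score, not the training score, gates deployment.
print(f"threshold={best}, validation accuracy={accuracy(best, valid):.0%}")
```

In a mature MLOps setup this loop runs automatically on each retrain, with the validation gate wired into the deployment pipeline.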
Thinking carefully about each stage in the hierarchy sets a solid foundation on which AI development and deployment can thrive. Aim for a PoC that fulfils all stages and iterate; build the pyramid, then grow it.
Sergi Capape – Data Science Consultant, Analytics & AI
Sergi offers expertise in data science and consulting services to organisations across a wide spectrum (from engineering to financial services) to improve value delivery.