Enabling text-based insights: infrastructure matters

Techniques in data collection and analytics are soft assets that need an adequate pipeline to realize their full potential. Check out this blog post for a discussion on infrastructure for NLP in the public sector.

As readers may have noticed, techniques for data collection and analysis are already well established and still evolving. These "soft" assets, however, can only deliver their full value when the underlying hardware and infrastructure are in place.

Perhaps the more practical question is why these use cases are not yet widely applied. Beyond path dependence within the public sector and limited awareness of big data before the wave of artificial intelligence, we argue that the absence of modern infrastructure for data collection and analysis keeps NLP techniques from reaching their full potential.

Figure: A complete NLP pipeline is needed to realize its full potential.

For one thing, OCR and various data-mining techniques require sufficient storage and adequate servers to digitize documents. Without reliable infrastructure for data storage and pre-processing, cutting-edge analytical algorithms cannot be employed at all. On the analytics side, the public sector needs cloud-based solutions or high-performance computing (HPC) to derive text-based insights effectively. Without big-data infrastructure, there is no valuable evidence to derive.
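The stages described above can be sketched as a minimal ingestion pipeline. This is an illustrative toy, not any specific product's API: the function names are hypothetical, and the "OCR" step is a stand-in that assumes the scans have already been converted to text.

```python
# Minimal sketch of a text-ingestion pipeline: digitize -> store -> preprocess.
# All function names are illustrative; a real deployment would back each stage
# with OCR software, durable storage, and a proper preprocessing service.
import re

def digitize(raw_pages):
    """Stand-in for an OCR step: here the 'scans' are already plain text."""
    return [page.strip() for page in raw_pages]

def store(documents, archive):
    """Stand-in for durable storage: append each document to an archive."""
    archive.extend(documents)
    return archive

def preprocess(document):
    """Basic normalization before any analytical algorithm runs."""
    return re.findall(r"[a-z0-9]+", document.lower())

archive = []
store(digitize(["  Parental Leave Application  ", "Budget Report 2020"]), archive)
tokens = [preprocess(doc) for doc in archive]
print(tokens)
# [['parental', 'leave', 'application'], ['budget', 'report', '2020']]
```

The point of the sketch is the dependency chain: if the storage stage in the middle is missing or unreliable, nothing downstream of it can run, no matter how sophisticated the analytics are.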

In addition, human intervention plays an important role in the NLP pipeline. Applying quantitative methods does not make qualitative work obsolete. On the contrary, for any supervised or rule-based model training, human labeling is essential for success. Qualitative knowledge of specific administrative or political issues can also inform NLP developers about model-relevant details, so that the pipeline can be tailored to the needs of different tasks. For instance, particular word combinations such as "parental leave" are important features in some policy fields; splitting the phrase into "parental" and "leave" is not informative.
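The "parental leave" example can be made concrete with a small tokenizer sketch. The phrase list here is a hypothetical, hand-curated input of the kind domain experts would supply; real pipelines often use a multi-word-expression tokenizer or learned collocation statistics instead.

```python
# Sketch: keeping domain phrases like "parental leave" together as one feature
# instead of splitting them into uninformative single words.
# DOMAIN_PHRASES is a hypothetical expert-curated list, not learned from data.
DOMAIN_PHRASES = {"parental leave", "income tax"}

def tokenize(text, phrases=DOMAIN_PHRASES):
    """Greedy whitespace tokenizer that merges known two-word phrases."""
    words = text.lower().split()
    tokens, i = [], 0
    while i < len(words):
        pair = f"{words[i]} {words[i + 1]}" if i + 1 < len(words) else ""
        if pair in phrases:
            tokens.append(pair.replace(" ", "_"))  # one feature, not two
            i += 2
        else:
            tokens.append(words[i])
            i += 1
    return tokens

print(tokenize("new rules on parental leave announced"))
# ['new', 'rules', 'on', 'parental_leave', 'announced']
```

This is exactly where qualitative expertise enters the pipeline: someone who knows the policy field supplies the phrases that the purely quantitative tokenizer would otherwise destroy.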

To sum up, NLP techniques offer the public sector a variety of new opportunities to analyze relevant data held in unstructured text form. Many established use cases already apply cutting-edge algorithms to amass large amounts of data and deliver politically and administratively relevant insights in an automated fashion. Nevertheless, a sound infrastructural pipeline, together with valuable qualitative insights and labeling practices, is essential for the success of the analytic engine.

