Enabling text-based insights: infrastructure matters


Techniques in data collection and analytics are soft assets that need an adequate pipeline to realize their full potential. Check out this blog post for a discussion on infrastructure for NLP in the public sector.

As readers may have noticed, techniques for data collection and analysis are well established and still evolving. These "soft" assets, however, can only deliver their full value when the underlying hardware requirements are met.

Perhaps the more practical question is why these use cases are not widely applied. Besides path dependence within the public sector and a lack of awareness of big data before the wave of artificial intelligence, we argue that the absence of a modern data collection and analytics infrastructure restricts the full potential of NLP techniques.

Figure: A complete NLP pipeline is needed to realize the techniques' full potential.

For one thing, OCR and various data-mining techniques require sufficient storage and adequate servers to digitize documents at scale. Without reliable data storage and pre-processing infrastructure, cutting-edge analytical algorithms cannot be employed at all. As for analytics, the public sector needs cloud-based solutions or high-performance computing (HPC) to derive text-based insights effectively. Without big-data infrastructure, there is no valuable evidence to derive.
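To make the pre-processing stage concrete, here is a minimal sketch of what a storage-and-ingestion step for digitized documents might look like. The function names and the deduplication-by-hash design are illustrative assumptions, not a reference to any specific platform; a production pipeline would write to durable storage rather than an in-memory dict.

```python
import hashlib
import unicodedata

def preprocess_document(raw_text):
    """Normalize a digitized (e.g., OCR'd) document for downstream analysis."""
    # Unicode normalization repairs common OCR artifacts such as ligatures.
    text = unicodedata.normalize("NFKC", raw_text)
    # Collapse whitespace left over from page layout.
    return " ".join(text.split())

def ingest(documents):
    """Deduplicate documents by content hash before they reach analytics."""
    store = {}
    for doc in documents:
        clean = preprocess_document(doc)
        key = hashlib.sha256(clean.encode("utf-8")).hexdigest()
        store[key] = clean  # identical documents map to the same key
    return store

docs = ["Annual  report\n2019", "Annual report 2019", "Budget plan"]
print(len(ingest(docs)))  # → 2 (the two reports deduplicate)
```

Hashing after normalization means that layout-only differences (extra spaces, line breaks from scanned pages) do not produce spurious duplicates in storage.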

In addition, I would like to highlight the role of human intervention in the NLP pipeline. Applying quantitative methods does not make qualitative work obsolete. On the contrary, for all supervised or rule-based model training, human labeling is essential to success. Qualitative knowledge about specific administrative or political issues can also alert NLP solution developers to model-relevant concerns, so that the pipeline can be tailored to the needs of different tasks. For instance, particular word combinations such as "parental leave" are important features in some policy fields; splitting them into "parental" and "leave" loses that information.

To sum up, NLP techniques offer the public sector a variety of new opportunities to analyze relevant data in unstructured text form. Many established use cases already apply cutting-edge algorithms to amass large amounts of data and deliver politically and administratively relevant insights in an automated fashion. Nevertheless, an infrastructural pipeline is essential to the success of the analytic engine, as are valuable qualitative insights and labeling practices.

