Enabling text-based insights: infrastructure matters

Techniques in data collection and analytics are soft assets that need an adequate pipeline to realize their full potential. Check out this blog post for a discussion on infrastructure for NLP in the public sector.

As readers may have noticed, the techniques for data collection and analysis are already well established and still evolving. These “soft” assets, however, can only reach their full potential when the underlying hardware conditions are met.

Perhaps the more practical question is why these use cases are not yet widely applied. Beyond path dependence within the public sector and the limited awareness of big data before the current wave of artificial intelligence, we argue that the absence of modern data collection and analytics infrastructure keeps NLP techniques from realizing their full potential.

Figure: A complete NLP pipeline for realizing its full potential.

For one thing, OCR and various data mining techniques require sufficient storage and adequate servers to digitize documents in the first place. Without reliable data storage and pre-processing infrastructure, there is no way to employ cutting-edge analytical algorithms. As for analytics, the public sector needs cloud-based solutions or high-performance computing (HPC) to derive text-based insights effectively. Without an infrastructure built for big data, there will be no valuable evidence to derive.
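To make the digitization step concrete, here is a minimal sketch of such a pipeline: scanned pages are OCR'd into plain text, the text is persisted so downstream analytics do not have to repeat the extraction, and a toy frequency count stands in for the analytical stage. The libraries (pytesseract and Pillow), folder names, and file formats are illustrative assumptions, not part of any specific public-sector setup.

```python
# A minimal sketch of a digitization-plus-analysis pipeline, assuming pytesseract
# (Tesseract OCR bindings) and Pillow are installed and that scanned pages sit in
# ./scans/ as PNG images -- the libraries and paths are illustrative assumptions.
from collections import Counter
from pathlib import Path

from PIL import Image
import pytesseract

SCAN_DIR = Path("scans")       # hypothetical folder of scanned documents
TEXT_DIR = Path("extracted")   # hypothetical storage location for OCR output
TEXT_DIR.mkdir(exist_ok=True)

word_counts = Counter()
for image_path in SCAN_DIR.glob("*.png"):
    # OCR the scanned page into plain text.
    text = pytesseract.image_to_string(Image.open(image_path), lang="eng")

    # Persist the digitized text so downstream analytics do not re-run OCR.
    (TEXT_DIR / f"{image_path.stem}.txt").write_text(text, encoding="utf-8")

    # Toy "analytics" step: aggregate word frequencies across documents.
    word_counts.update(text.lower().split())

print(word_counts.most_common(10))
```

Even this toy example shows where storage and compute come in: the OCR output has to live somewhere reliable, and anything more ambitious than a word count quickly calls for cloud or HPC resources.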

In addition, I would like to mention the role of human intervention in the NLP pipeline. Applying quantitative methods does not make qualitative work obsolete. On the contrary, for all supervised or rule-based model training, human labeling is essential. Qualitative knowledge about specific administrative or political issues can also inform NLP developers about model-relevant details, so that the pipeline can be tailored to the needs of each task. For instance, particular word combinations such as “parental leave” are important features in some policy fields; splitting the phrase into “parental” and “leave” is not informative, as the sketch below illustrates.
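As one way to keep such multi-word expressions intact, the sketch below contrasts plain unigram features with features that include bigrams (or, equivalently, a phrase list curated by domain experts). It assumes scikit-learn is available, and the example sentences are invented for illustration.

```python
# A minimal sketch of how domain knowledge about phrases can be encoded as
# features, assuming scikit-learn is installed; the sentences are invented.
from sklearn.feature_extraction.text import CountVectorizer

documents = [
    "The committee debated the parental leave reform.",
    "Employees may leave early on Fridays.",
]

# Unigrams alone split "parental leave" into two uninformative tokens.
unigrams = CountVectorizer(ngram_range=(1, 1))
unigrams.fit(documents)

# Including bigrams (or a curated phrase list supplied by domain experts)
# keeps "parental leave" as a single, policy-relevant feature.
phrases = CountVectorizer(ngram_range=(1, 2))
phrases.fit(documents)

print("parental leave" in unigrams.get_feature_names_out())  # False
print("parental leave" in phrases.get_feature_names_out())   # True
```

Which phrases deserve this treatment is exactly the kind of qualitative, domain-specific judgment that human experts bring to the pipeline.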

To sum up, NLP techniques provide the public sector with a variety of new opportunities to analyze relevant data in unstructured text form. Many established use cases already apply cutting-edge algorithms to large amounts of data and deliver politically and administratively relevant insights in an automated fashion. Nevertheless, an infrastructural pipeline is essential to the success of the analytical engine, as are valuable qualitative insights and careful labeling practices.
