Enabling text-based insights: infrastructure matters


Techniques in data collection and analytics are soft assets that need an adequate pipeline to realize their full potential. This post discusses the infrastructure that NLP in the public sector requires.

As readers may have noticed, techniques for data collection and analysis are already well established and continue to develop. These “soft” assets, however, can only reach their full functionality when the necessary hardware conditions are met.

Perhaps the more practical question is why these use cases are not more widely applied. Besides path dependence within the public sector and the lack of awareness of big data before the wave of artificial intelligence, we argue that the absence of modern data collection and analytical infrastructure restricts the full potential of NLP techniques.

Note: A complete pipeline of NLP to realize its full potential.

For one thing, OCR and various data mining techniques require sufficient storage and adequate servers to digitize documents. Without reliable data storage and pre-processing infrastructure, cutting-edge analytical algorithms cannot be employed. As for analytics, the public sector needs cloud-based solutions or high-performance computing (HPC) to derive text-based insights effectively. Without an infrastructure for big data, no valuable evidence can be derived.

In addition, I would like to highlight the role of human intervention in the NLP pipeline. Applying quantitative methods does not make qualitative work obsolete. On the contrary, for all supervised or rule-based training of models, human labeling is essential for future success. Qualitative knowledge about specific administrative or political issues can also inform NLP solution developers about model-relevant issues, so that the pipeline can be tailored to the needs of various tasks. For instance, particular word combinations such as “parental leave” are important features in some policy fields. Dividing such a phrase into “parental” and “leave” is not informative.
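To make the “parental leave” point concrete, here is a minimal sketch of a tokenizer that keeps expert-supplied phrases together as single features. It is only an illustration under stated assumptions: the phrase list `DOMAIN_PHRASES`, the function name `tokenize`, and the underscore-joining convention are hypothetical choices, not part of any particular NLP library or of the pipeline discussed in this post.

```python
import re

# Hypothetical phrase list; in practice it would come from domain experts.
DOMAIN_PHRASES = ["parental leave", "minimum wage"]


def tokenize(text, phrases=DOMAIN_PHRASES):
    """Lowercase the text and split it into tokens, keeping known phrases whole."""
    text = text.lower()
    # Handle longer phrases first so a short entry cannot break up a longer one.
    for phrase in sorted(phrases, key=len, reverse=True):
        # Allow any run of whitespace between the words of the phrase.
        pattern = r"\b" + r"\s+".join(re.escape(w) for w in phrase.split()) + r"\b"
        text = re.sub(pattern, phrase.replace(" ", "_"), text)
    # Tokens are runs of letters, digits, and the joining underscore.
    return re.findall(r"[a-z0-9_]+", text)


tokens = tokenize("The bill extends parental leave to 14 weeks.")
print(tokens)
```

With this approach, “parental leave” reaches the model as the single feature `parental_leave` rather than as two unrelated words. A fixed phrase list is the simplest option; statistical collocation detection could play the same role when no expert list is available.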

To sum up, NLP techniques provide the public sector with a variety of new opportunities to analyze relevant data in unstructured text form. Many established use cases already apply cutting-edge algorithms to large amounts of data and deliver politically and administratively relevant insights in an automated fashion. Nevertheless, the success of the analytic engine depends on an infrastructural pipeline, as well as on valuable qualitative insights and labeling practices.

