
Improved identification through AI-driven document control

Kilian Toelge
30 Mar 2022

How can public security and safety organizations make use of Artificial Intelligence (AI) when working with privacy-sensitive documents?


Public security and safety organizations can use AI-driven solutions to tackle the growing problem of identity fraud.
Public security and safety authorities must deal with the complexity of identity documents.
A lack of training data calls for splitting the process into generic AI components.
A hybrid of people, data and AI makes the application future-proof.

The struggle between order and crime is an ongoing phenomenon in society, and innovation plays a crucial role on both sides. Europol research from 2020[i] shows that criminals have been using AI for some time to fraudulently obtain money or other benefits.

Identity documents are the most important documents that people possess, and some countries require everyone over a certain age to have at least one. Identity documents contain an individual’s basic information and are important during official activities or events, such as traveling, opening a bank account, taking out insurance, or a police check.

This makes it all the more worrying that the development and availability of advanced image editing and printing techniques are driving an increase in identity fraud. In the past six years, this form of crime has increased by more than 500 percent in the Netherlands alone. Further afield, some 47 percent of Americans experienced financial identity theft in 2020[ii]. It is a big problem elsewhere too: a report published in 2021 stated that France had seen an explosion in identity and biometric document fraud, with rates four times higher than in the rest of Europe.[iii]

Figure 1: Identity theft and fraud complaints in the USA, 2016-2020 (US Federal Trade Commission, Consumer Sentinel Network[iv])

Identity theft complaints (as shown in the bottom layer of the graphic above) increased by roughly 250% in just 5 years, according to the US Federal Trade Commission.

This does not mean that identity documents are insecure. Countries regularly introduce new and more complex security features for identity documents precisely to make it as difficult as possible for fraudsters.

The problem lies with the controlling authorities: government agencies and institutions such as banks, insurers, airports and embassies. They have to carry out increasingly complex and specific checks to validate the authenticity of an identity document. It is therefore important that the tools they use evolve over time and incorporate these new checks.

Challenges in the field of document control

Three major challenges with respect to document control must be taken into account when developing such an application.

The complexity of identity documents

One of the biggest challenges is the complexity of the documents themselves. There are around two hundred countries in the world, each with its own identity documents, and a single country often has more than ten different valid document types and models, such as ordinary passports, service passports, ID cards and residence permits. Every document has between fifty and a hundred security features. These fall into two groups: standardized features that follow international agreements on the structure of a document, such as the check digits in the MRZ (Machine Readable Zone) at the bottom of a passport, and country-specific and/or model-specific features, such as a country-specific hologram.
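The MRZ check digit is a good example of a standardized check that software can perform reliably. ICAO Doc 9303 defines it as follows: each character is converted to a number (digits keep their value, letters A to Z become 10 to 35, and the filler `<` counts as 0), the numbers are multiplied by the repeating weights 7, 3, 1 and summed, and the check digit is that sum modulo 10. The Python sketch below implements only this rule; the function names are illustrative, and a real check would also parse the MRZ lines to locate each field and its printed check digit.

```python
# Sketch of the ICAO Doc 9303 check-digit rule used in the MRZ.
DIGITS = "0123456789"


def mrz_char_value(ch: str) -> int:
    """Map one MRZ character to its numeric value."""
    if ch in DIGITS:
        return int(ch)
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10  # A=10, B=11, ..., Z=35
    if ch == "<":
        return 0  # filler character
    raise ValueError(f"invalid MRZ character: {ch!r}")


def mrz_check_digit(field: str) -> int:
    """Weight characters 7, 3, 1 (repeating), sum them, take the result modulo 10."""
    weights = (7, 3, 1)
    total = sum(mrz_char_value(ch) * weights[i % 3] for i, ch in enumerate(field))
    return total % 10


def field_is_valid(field: str, check_char: str) -> bool:
    """Compare the computed check digit with the one printed in the MRZ."""
    return str(mrz_check_digit(field)) == check_char


print(mrz_check_digit("L898902C3"))      # 6
print(field_is_valid("740812", "2"))     # True
print(field_is_valid("L898902C3", "5"))  # False
```

A failed check digit does not prove forgery on its own, but it is cheap to compute and flags altered or misread fields immediately.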

The complexity of documents is such that forensic document experts and specialized equipment are needed to investigate every aspect. Many people and companies have little or no access to such knowledge and resources. In addition, some security features are unknown or may not be disclosed to everyone. Finally, some processes simply do not allow enough time for in-depth checks.

The complexity of the form and structure of identity documents is clearly a challenge for automated processes. In addition to the form, the content of the documents also poses problems.

Privacy-sensitive data

A second challenge in developing an application to validate identity documents is the lack of data. Identity documents contain personal and sensitive data about their holder, which means that scans may not be stored and retained to develop an AI model. A complete AI solution, however, would require a large number of documents. Authentic documents alone would not be enough either: the dataset would also need to contain dozens of examples of genuine forgeries of each security feature for every document. This is not feasible, because only a limited number of forgeries are known for any given security feature.

In addition, some countries issue documents that do not fully adhere to the standards, and production errors occur. Examples of these must be fed to the AI model as well, so that it learns that such deviations exist and are not counterfeits.

The sensitivity of the data and human error in the production process mean that it is impossible to apply a complete AI solution to the verification of identity documents. Both the documents themselves and the availability of the data cause limitations in the application.

Technical limits of scanning equipment

As a third challenge, there are the technical limits of current scanners and control processes. Although new scanners are regularly introduced with higher resolutions and additional functionality, such as the ability to recognize very small print (so-called microprint), there will always be security features that cannot be checked on a scan. Countries deliberately develop security features that can only be checked on the physical identity document using specialist forensic equipment, so common document scanners cannot see them. In addition, some control bodies, such as the police, cannot always use these advanced scanners, because they cannot take them out on the street.

As a result, the application used for automatic checks is constrained by the technical limits and the availability of the scanners.

Hybrid AI solutions as a future-proof application

How can existing technologies best be used to meet these challenges?

The added value of document templates

In the initial stage, it is important to collect and store information about the various documents and their security features in a structured way. There are multiple publicly available collections of this information, such as PRADO and Edison. This information can be used to create a separate document template for each type and model of identity document, which will serve as the basis for the application.

It is advisable to reuse repeating elements and shared structural properties, such as those defined in the ICAO standards, to minimize the required database storage and the workload for those entering data into the system. In this way, a database can be built that contains a template for every known identity document, with the corresponding security features, variations and checks. These document templates make the complexity of identity documents manageable and enable the application to apply country-specific and model-specific checks in addition to the standard ICAO checks.
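As a minimal sketch of how templates can reuse shared structure, assuming a simple in-memory model rather than any particular database, each template below stores only its own features and points to a shared parent (here a generic template for ICAO TD3, the passport-size format). All identifiers, such as `UTO-P-2021`, and the `automatic`/`manual` categories are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class SecurityFeature:
    name: str
    check: str  # hypothetical categories: "automatic" or "manual"


@dataclass
class DocumentTemplate:
    template_id: str
    features: list[SecurityFeature] = field(default_factory=list)
    parent: DocumentTemplate | None = None  # shared template to inherit from

    def all_features(self) -> list[SecurityFeature]:
        """Inherited features first, then the ones specific to this template."""
        inherited = self.parent.all_features() if self.parent else []
        return inherited + self.features


# Generic base, stored once and shared by every passport with the same layout.
icao_passport = DocumentTemplate(
    "ICAO-TD3",
    [SecurityFeature("MRZ check digits", "automatic"),
     SecurityFeature("photo zone position", "automatic")],
)

# Hypothetical country/model template: only the extra features are entered.
uto_passport_2021 = DocumentTemplate(
    "UTO-P-2021",
    [SecurityFeature("country hologram", "manual"),
     SecurityFeature("microprint on photo border", "automatic")],
    parent=icao_passport,
)

for feature in uto_passport_2021.all_features():
    print(feature.check, "-", feature.name)
```

Because the ICAO features live only in the parent, a change to the shared layout has to be entered in one place, which is where the savings in storage and data entry come from.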

The challenge with regard to the form of the documents can thus be solved by the use of document templates, but what about the challenge with regard to the sensitivity of the data?

Applied generic AI

The lack of data for a complete AI solution does not mean that the power of AI cannot be used in the validation of identity documents. It is possible to break up the validation process into steps that are generic enough to make it possible to develop specialized AI components, which can perform the required tasks without access to large amounts of privacy-sensitive data.

Figure 2: Validation process broken up into generic steps with specialized AI components.

For example, a deep learning model that compares a newly scanned document with a database containing only one sample document (specimen) per document template can recognize the matching document template; this is the classification of the document. This step is important for retrieving the country-specific and model-specific information for the scanned document from the database.
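One way to implement this one-specimen-per-template classification is nearest-neighbor matching on embeddings: a deep learning model (assumed here, not shown) turns each scan and each specimen into a vector, and the scan is assigned to the template whose specimen is most similar. The sketch below uses cosine similarity and a hypothetical threshold of 0.8, below which the document is treated as unknown; the three-dimensional vectors are toy values standing in for real embeddings.

```python
from __future__ import annotations

import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(scan_embedding: list[float],
             specimen_embeddings: dict[str, list[float]],
             threshold: float = 0.8) -> tuple[str | None, float]:
    """Return (template_id, score) of the closest specimen, or (None, score)
    when even the best match is below the threshold (unknown document)."""
    best_id, best_score = None, -1.0
    for template_id, specimen in specimen_embeddings.items():
        score = cosine_similarity(scan_embedding, specimen)
        if score > best_score:
            best_id, best_score = template_id, score
    if best_score < threshold:
        return None, best_score
    return best_id, best_score


# Toy embeddings: one specimen per (hypothetical) template.
specimens = {"UTO-P-2021": [0.9, 0.1, 0.0], "UTO-ID-2019": [0.1, 0.9, 0.2]}
print(classify([0.85, 0.15, 0.05], specimens))  # matches UTO-P-2021
print(classify([0.0, 0.0, 1.0], specimens))     # no template above threshold
```

Since each template needs only a single specimen, supporting a new document model means adding one embedding rather than collecting many privacy-sensitive scans.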

In addition, object recognition can be used to crop the document from the scan. The AI for this can be trained on all kinds of documents and scans, so it does not depend on sensitive identity data. Facial recognition, like that used on mobile phones, makes it possible to compare the photo on the document with a live recording of the holder. The text on the document can be read with the latest OCR (Optical Character Recognition) techniques so that it can be verified further in a later step of the process.
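As an illustration of such a later verification step, the sketch below compares fields read by OCR from the visual inspection zone (VIZ) with the same fields decoded from the MRZ. It assumes the OCR output has already been mapped to named fields in MRZ-style notation; the field names are hypothetical, and real data would also need transliteration (for example of accented names) and date-format conversion, which are not handled here.

```python
from __future__ import annotations


def normalize(value: str) -> str:
    """Upper-case and keep only letters and digits, so '<' fillers and spaces are ignored."""
    return "".join(ch for ch in value.upper() if ch.isalnum())


def cross_check(viz: dict[str, str], mrz: dict[str, str]) -> dict[str, bool]:
    """Compare every field that was read from both the VIZ and the MRZ."""
    return {key: normalize(viz[key]) == normalize(mrz[key])
            for key in sorted(viz.keys() & mrz.keys())}


viz = {"document_number": "L898902C8", "surname": "Eriksson", "nationality": "UTO"}
mrz = {"document_number": "L898902C3", "surname": "ERIKSSON<<", "date_of_birth": "740812"}
print(cross_check(viz, mrz))  # {'document_number': False, 'surname': True}
```

A mismatch like the document number above could come from tampering or simply from an OCR misread, so it is best treated as a signal for closer inspection rather than a verdict.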

None of these specialized AI components require sensitive identity documents or data for their development. So, instead of a full AI solution, the data problem can be solved by making smart use of multiple, more specific AI components. In addition, it remains clear which steps are carried out and how, which prevents a ‘black box’ phenomenon. This approach delivers more reliable and more manageable results while leveraging the latest advances in AI.
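To show how a chain of specialized components keeps each step visible, here is a minimal sketch in which every component is a function that returns a named result, and the pipeline keeps all results as an audit trail. The step functions are hypothetical stand-ins for the components described above (cropping, classification, MRZ validation, face matching), and the 0.9 face-score threshold is an arbitrary example value.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class StepResult:
    step: str
    passed: bool
    detail: str


def run_pipeline(scan: dict, steps: list[Callable[[dict], StepResult]]) -> list[StepResult]:
    """Run each step in order and keep every result as an audit trail.
    Stops at the first failure, because later steps depend on earlier ones
    (for example, no template-specific checks without a classification)."""
    trail: list[StepResult] = []
    for step in steps:
        result = step(scan)
        trail.append(result)
        if not result.passed:
            break
    return trail


# Hypothetical stand-ins for the specialized AI components.
def crop_document(scan: dict) -> StepResult:
    return StepResult("crop", "image" in scan, "document region located")


def classify_document(scan: dict) -> StepResult:
    template = scan.get("template")
    return StepResult("classify", template is not None, f"template={template}")


def check_mrz(scan: dict) -> StepResult:
    ok = scan.get("mrz_valid", False)
    return StepResult("mrz", ok, "check digits match" if ok else "check digit mismatch")


def match_face(scan: dict) -> StepResult:
    ok = scan.get("face_score", 0.0) >= 0.9
    return StepResult("face", ok, f"score={scan.get('face_score')}")


scan = {"image": b"...", "template": "UTO-P-2021", "mrz_valid": False}
for r in run_pipeline(scan, [crop_document, classify_document, check_mrz, match_face]):
    print(f"{r.step:<9}{'PASS' if r.passed else 'FAIL'}  {r.detail}")
```

Stopping at the first failure is one possible design choice; an implementation could also run the independent checks regardless and report every outcome to the inspector.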

Dealing with technical limits

The problem of the scanners’ technical limits and those of the scanning process cannot be solved by an application. In order to perform a complete document inspection, it is and will continue to be necessary to inspect the physical identity document manually. A manual inspection supported by technology is recommended to solve the previously identified complexity problem, and to minimize human errors.

In the first stage, the classification from the application can be (re)used to show the correct document template for the manual inspection, which saves time. The overview of the document template can also be arranged so that the inspector is guided step by step through the manual inspection associated with this specific document. This ensures that all the security features are checked correctly. The overview can also show the inspector additional information: for example, an alert about known forgeries of the document shown, so that the inspector pays extra attention to them.
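A guided manual inspection can be driven by the same document template. The sketch below, assuming the template lists its manual checks in the order the inspector should perform them and that known forgeries are recorded per feature, turns them into a numbered checklist with alerts attached. The feature identifiers, instructions and alert text are hypothetical examples.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ManualCheck:
    feature_id: str
    instruction: str


def build_checklist(checks: list[ManualCheck], known_forgeries: dict[str, str]) -> list[str]:
    """Number the manual checks in template order and flag known forgeries."""
    lines = []
    for number, check in enumerate(checks, start=1):
        line = f"{number}. {check.instruction}"
        if check.feature_id in known_forgeries:
            line += f" [ALERT: {known_forgeries[check.feature_id]}]"
        lines.append(line)
    return lines


checks = [
    ManualCheck("hologram", "Tilt the document and check that the hologram changes color"),
    ManualCheck("microprint", "Use a loupe to read the microprint around the photo"),
    ManualCheck("laser_engraving", "Feel for the tactile laser engraving on the date of birth"),
]
alerts = {"hologram": "forgeries with a printed, non-diffractive hologram are known"}

for line in build_checklist(checks, alerts):
    print(line)
```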

This approach can also be used by the police to remedy the lack of document scanners on the street. In addition to the automatic checks via the camera of their mobile phone, they could also use a mobile application to access the manual inspection and carry out a more thorough inspection.

The addition of a step-by-step manual inspection of some security features by humans circumvents the scanners’ technical limits.

Working together to combat identity fraud

Identity fraud is an ever-increasing problem. Countries will start to incorporate more and more complex security features into their identity documents to make life as difficult as possible for fraudsters. This means that government agencies and other institutions will need smart, scalable, and future-proof tools to continue to inspect these characteristics.

The sensitivity of the personal data is holding back implementation of the complete AI solutions already used in other areas. This means that control bodies must switch to specialized AI components to keep up with technological progress and be able to use the power of AI in the future.

Find out more

This article has been adapted from a chapter in the Trends in Safety 2021-2022 report giving European leaders insight into the safety and security trends affecting citizens in the Netherlands.

  • The full report is available in Dutch.
  • An executive summary is available in English.

For information on Capgemini’s Public Security and Safety solutions, visit our website.


[i] Europol, Malicious Uses and Abuses of Artificial Intelligence (2020)

[ii] U.S. Identity Theft: The Stark Reality –




Kilian Toelge

Data Scientist & Software Architect
Kilian specializes in creating tailor-made AI solutions. In recent years, he has been designing and developing an application for the validation and verification of identity documents through AI in the public sector. Email id –