Machine learning models, alternative data sources expand banks’ credit-scoreable population

Gunjan Aggarwal

The recent implementation of the General Data Protection Regulation (GDPR) within the European Union and European Economic Area (EEA) gives consumers control over their personal data and aims to simplify the international business regulatory environment. While the GDPR’s ultimate impact on the global financial services industry will evolve, one thing is clear today – leading-edge data management systems are more critical than ever. For lenders, the ability to efficiently score and approve the most credit-worthy customers while maintaining regulatory compliance is a must-have competence.

Credit card debt in the United States reached its highest point ever last year, surpassing US$1 trillion, with the average household carrying US$16,883.[1] So it goes without saying that technology that could improve a bank’s returns on credit held, or that could grow market share, is worth a second look. It’s no surprise, then, that both incumbent banks and startups are exploring innovative new underwriting models.

Are you overlooking deserving customers?

Historically, consumer credit tests have included an evaluation of an individual’s creditworthiness, debt burden, borrowing frequency, and social and community considerations. However, the linear nature of these statistical models makes it difficult to include and analyze the growing volume and variety of Big Data that can help lenders make informed credit decisions. A machine learning model, unconstrained by some of the assumptions of classic statistical models, can yield insights that a human analyst might not reach.

Through machine learning (ML) models, lenders can now directly implement algorithms that assess customer risk and assign scores, even to thin-file or no-file customers (individuals without recent credit files or those with little credit history).
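
As a rough illustration of the idea, the sketch below fits a small logistic regression on alternative-data features for applicants who have no bureau file. It is a minimal sketch under stated assumptions, not a production underwriting model: the feature names (share of utility bills paid on time, address stability), the training rows, and the learning settings are all hypothetical and synthetic.

```python
import math

# Synthetic, illustrative rows for thin-file applicants (hypothetical data).
# Features: (share of utility bills paid on time, address stability), both 0-1.
# Label: 1 = repaid a past loan, 0 = defaulted.
TRAINING_ROWS = [
    ((0.95, 0.8), 1),
    ((0.90, 0.6), 1),
    ((0.85, 0.9), 1),
    ((0.80, 0.5), 1),
    ((0.40, 0.2), 0),
    ((0.30, 0.4), 0),
    ((0.50, 0.1), 0),
    ((0.20, 0.3), 0),
]


def sigmoid(z):
    """Map a raw score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, learning_rate=0.5, epochs=2000):
    """Fit weights and bias with plain batch gradient descent on log loss."""
    n_features = len(rows[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for features, label in rows:
            z = bias + sum(w * x for w, x in zip(weights, features))
            error = sigmoid(z) - label
            for i, x in enumerate(features):
                grad_w[i] += error * x
            grad_b += error
        # Average the gradients over the batch, then take one step.
        weights = [w - learning_rate * g / len(rows)
                   for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(rows)
    return weights, bias


def repayment_probability(model, features):
    """Score one applicant with the fitted model."""
    weights, bias = model
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


model = train_logistic(TRAINING_ROWS)
reliable = repayment_probability(model, (0.9, 0.7))  # pays bills on time
risky = repayment_probability(model, (0.3, 0.3))     # frequently late
print(f"reliable applicant: {reliable:.2f}, risky applicant: {risky:.2f}")
```

Neither applicant has a bureau score, yet the model still ranks the one with the stronger payment record as the better risk. A real deployment would need far more data, validation, and fair-lending review.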

Thin files may be thick with opportunity

Considering that one in 10 adults in the United States has no credit history with any of the three leading credit bureaus, algorithmic ML capabilities could have an enormous impact on lenders’ revenue-generating potential.[2]

Moreover, ML results are explainable for compliance and internal and external communication. Machine learning models can liberate banks from exclusive reliance on third-party credit companies.

In fact, ML models can enhance credit bureau reports by recalculating existing consumer credit indexes based on external data sets,[3] which often enables banks to assess previously overlooked credit applications more meaningfully, and to accept them.

ML models can interpret countless consumer attributes

US consumer contact data
Email addresses: 7+ billion, 620 million unique
Postal addresses: 250 million
Mobile phone numbers: 110 million
Address history file: 2+ billion

Global business contact data
North America: 111,529,591
Europe (GDPR restrictions): 53,027,013
Latin America: 19,700,218

Other data types
Email reputation file: 1.6 billion
Compromised credentials file: 1.25 billion
US voter registration records: 96 million
Criminal records: 250 million
Under-banked consumer records: 100 million
Mortgage application records: 10 million
Social media records: 150 million

Source: Versium LIFEDATA® Data Card 2018.

Whether a bank wants to more efficiently manage current credit customers or take a closer look at the millions of consumers currently considered unscorable, alternative data sources can provide a 360-degree view that goes well beyond traditional credit scoring. Third-party data sets can unveil consumer information (such as social media activity, texting, travel history, frugal phone patterns, or on-time utility bill payments) that can increase the predictive accuracy of the credit scores of millions of credit prospects – consumers who may be desirable but have been invisible to lenders before now.
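
To make "unscorable" concrete, the sketch below counts which applicants a lender could score from bureau data alone and which become scorable once an alternative data source is joined in. The records, the field names (`bureau_score`, `utility_on_time_rate`), and the rule that any alternative record makes an applicant scorable are all hypothetical simplifications.

```python
# Hypothetical applicant pool: None means no usable credit bureau file.
applicants = [
    {"id": "A1", "bureau_score": 712},
    {"id": "A2", "bureau_score": None},
    {"id": "A3", "bureau_score": None},
    {"id": "A4", "bureau_score": 655},
    {"id": "A5", "bureau_score": None},
]

# Hypothetical third-party data set, keyed by applicant id.
alternative_data = {
    "A2": {"utility_on_time_rate": 0.97},
    "A5": {"utility_on_time_rate": 0.62},
}


def scorable_ids(applicants, alternative_data=None):
    """Return ids of applicants with a bureau score or an alternative record."""
    alt = alternative_data or {}
    return [
        a["id"]
        for a in applicants
        if a["bureau_score"] is not None or a["id"] in alt
    ]


bureau_only = scorable_ids(applicants)
with_alternative = scorable_ids(applicants, alternative_data)
print(f"Scorable with bureau data only: {len(bureau_only)} of {len(applicants)}")
print(f"Scorable with alternative data: {len(with_alternative)} of {len(applicants)}")
```

In this toy pool, applicants A2 and A5 move from invisible to scorable, while A3 remains out of reach until some data source covers them.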

ML models that leverage alternative data sets can target population segments ignored by banks that rely exclusively on traditional credit-scoring models – which can lead to a commanding competitive advantage.

A strategic approach, knowledge of the dynamic regulatory landscape, and access to large amounts of data are must-haves for banks aiming to establish an ML-based credit index to expand their customer roster.

From data ingestion and preparation to discovery and real-time data analysis that uses open source or commercial tools, Capgemini’s Analytics and Data Science team – a part of the Insights and Data practice – can help banks learn more about best practices to reach their business objectives.

Tap into Capgemini analytics and data science expertise

To find out how Capgemini’s analytics solutions and services drive innovation, enable agile new ways of working, and increase business value, contact Gunjan Aggarwal or learn more about our Smart Analytics platform for financial services.

As Capgemini’s practice leader for Analytics and Artificial Intelligence, Gunjan Aggarwal helps clients face today’s dynamic business environment by offering innovative thinking and sound technology advice for measurable results. A digital native with years of information technology and systems integration experience, Gunjan is an expert when it comes to building standardized and custom data science solutions.

[1] CNBC, “Credit card debt hits a record high. It’s time to make a payoff plan,” Jessica Dickler, January 23, 2018, CNBC.com

[2] Consumer Financial Protection Bureau, “Who are the Credit Invisible?” Michele Scarbrough, December 12, 2016,

[3] External data sets might include: business and firmographic information, personal contact and directory assistance information, business news coverage, online and offline data indexing, social media sites, corporate sites, SEC filings, blogs, government data sources.