Analyzing software repositories to improve testing: Changing the game for software testers

Vivek Jaykrishnan

At Capgemini, we work with our clients to apply analytics to the software repositories they use on a day-to-day basis. Why? Quite simply, to improve the way they test. But I’m occasionally asked whether such analytics is really helpful or simply hype around something new. My answer is unequivocal: analyzing software repositories really does improve testing quality.

These repositories hold a wealth of information about how people collaborate to build software. This is information that can be mined to better understand prior experience and dominant patterns, and from which historical data can be extracted to use as predictors.

But, of course, in a collaborative product development environment, this data is spread across different repositories and represented in different forms. The development and test (engineering) cycle alone involves quite a few entities or artifacts, such as tests, code, requirements, and defects, and multiple repositories contain information that could provide insights.

Learning from diverse data categories

For each of these entities, data falls into different categories:

  • Interaction data, such as review comments, email and chat transcripts, defect notes, etc.
  • Descriptive data, such as attributes of code, test case metadata, defect and code quality attributes, etc.
  • Attitudinal data, such as user forum opinions, survey results, social media data, and beta testing feedback
  • Behavioral data, such as requirements, commit history, test execution and use history, and UML.

The volume and disparate nature of this data raise challenges, notably around data availability, data quality, and data accessibility. So how does analytics help? We can learn from the data we analyze. This is nothing new. Decades of research in analytics, visualization, statistics, artificial intelligence (AI), etc. have generated many powerful methods for learning from data.

Making AI part of your analytics toolkit

AI is increasingly part of the analytics toolkit. Based on recent progress, specifically in the field of Deep Machine Learning, there is a growing conviction that AI has the potential to significantly improve the way we leverage IT in pretty much any business domain. Analysts project substantial double-digit growth rates for AI-empowered business in the coming years. IDC expects spending on AI technologies by companies to grow to $47 billion in 2020 from a projected $8 billion in 2016.

But let’s take a step back and consider the huge value of analytics and AI in testing. I firmly believe that the testing function can improve and optimize the test strategy through better analytics of software repositories. There are four particular applications for analytics in this area that yield highly beneficial outcomes:

  • Descriptive: understand what is happening in your test operations, for example how effective your tests are
  • Diagnostic: understand why things (defects or test failures) are happening, for example why some tests never fail
  • Predictive: gain insight into what is likely to happen, such as which tests will fail
  • Prescriptive: identify what you need to do, for example which tests you need to run to reduce the test cycle time to two weeks.
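To make the descriptive and prescriptive cases a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the execution history, the test names, and the durations are made up, and a plain historical failure rate stands in for what would, in practice, be a proper predictive model. It is not a description of any particular tool.

```python
from collections import defaultdict

# Hypothetical execution history: (test name, passed?, duration in minutes).
history = [
    ("test_login", False, 4),
    ("test_login", True, 4),
    ("test_checkout", False, 10),
    ("test_checkout", False, 10),
    ("test_search", True, 6),
    ("test_search", True, 6),
]

def failure_rates(records):
    """Descriptive: share of recorded runs in which each test failed."""
    runs = defaultdict(int)
    failures = defaultdict(int)
    for name, passed, _ in records:
        runs[name] += 1
        if not passed:
            failures[name] += 1
    return {name: failures[name] / runs[name] for name in runs}

def select_tests(records, budget_minutes):
    """Prescriptive: greedily pick the tests with the highest historical
    failure rate that still fit into the time budget."""
    rates = failure_rates(records)
    durations = {name: minutes for name, _, minutes in records}
    selected, used = [], 0
    for name in sorted(rates, key=lambda n: rates[n], reverse=True):
        if used + durations[name] <= budget_minutes:
            selected.append(name)
            used += durations[name]
    return selected

rates = failure_rates(history)
# Tests that never failed are candidates for a diagnostic review.
never_failed = [name for name, rate in rates.items() if rate == 0.0]
plan = select_tests(history, budget_minutes=15)
```

With this sample data, `never_failed` contains only `test_search`, and the 15-minute plan keeps `test_checkout` and `test_login`. A real selection strategy would weigh more than failure history, for example code churn and requirement coverage.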

Right data, right quality, right access

To get to these outcomes, however, the testing function must first identify the right project or program with the right data sources on which to apply analytics, and then assess whether those data sources are analytics-ready by checking data quality, data availability, and data accessibility. This is an elaborate process in which various characteristics can establish a project as a candidate for applying analytics to improve the way testing is undertaken. These characteristics include:

  • Active projects with sufficient historical data
  • ALM tool usage with some level of integration or process template
  • Active projects with a large number of test cases and substantial code churn, which offer better ROI than a tiny project
  • An existing source-to-test mapping is an advantage
  • Tests could be automated, manual, or both
  • Regression test cycles that cannot be completed because of test volume, limited time, or resource constraints
  • Teams do not have a regression selection strategy or a good test review process.

There are other challenges that must be overcome on this journey, such as a lack of integration between software repositories, poor data quality due to process violations, an inability to validate recommendations due to a lack of SME bandwidth, and a shortage of the required skillset (data science and testing experience). But the value of software repositories for analytics makes the effort worthwhile.

From optimized test coverage, auto generation of test scripts, and test sets aligned with real application usage, to the ability to predict and advise on the marginal value of additional testing to ensure release readiness, and better visualization of quality (e.g., cost of a test bug, test efficiency, etc.), analytics is changing the game for software testers.

Taking an intelligent approach to QA

The latest World Quality Report 2017 shows that 99% of organizations face challenges with quality validation in agile projects and that only 16% of test activities are automated, clearly indicating the need for an intelligent approach to QA.

At Capgemini, we recommend analytics as one of the key steps towards Cognitive QA, where AI, robotics, analytics, and automation come together with human insight and reasoning to transform QA and testing into a true business enabler.

With our intelligent Cognitive QA approach, we enable smart quality decision-making based on factual project data, actual usage patterns, and user feedback. It is how we ensure our clients’ QA and testing operations deliver quality with speed in a complex connected world at optimized cost.