
Insights & Data Terminology Glossary

Minna Lind
5 May 2022

The world of insights and data can sometimes feel inaccessible due to the amount of industry-specific jargon used. What’s more, it’s an area that can evolve quickly as new terms come up or even take on new or expanded meanings. That’s where we come in.

The Insights & Data team at Capgemini Finland has expertise spanning an extensive range of tasks and projects in this area. We’ve used that knowledge to gather many of the key terms you may come across and to offer clear definitions of what they mean.

All these, and more, are areas where we can assist in creating and executing effective and efficient solutions. Please get in contact if you’d like to discuss this with us further.

Algorithm

An algorithm is a series of steps that explains what calculations a computer should perform to transform data into a more useful form or output. For example, it can help analyse potential solutions to a problem to find the most efficient option.
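
To make this concrete, here is a minimal Python sketch with made-up data: a fixed series of steps that turns input data (candidate routes and their costs) into a more useful output, the most efficient option.

```python
# A toy algorithm: scan a list of candidate routes and return the
# cheapest one, step by step. Route names and costs are invented.

def cheapest_route(routes: dict[str, float]) -> str:
    """Return the name of the route with the lowest cost."""
    best_name, best_cost = "", float("inf")
    for name, cost in routes.items():   # step through every option
        if cost < best_cost:            # keep the best one seen so far
            best_name, best_cost = name, cost
    return best_name

print(cheapest_route({"via_hub": 12.5, "direct": 9.9, "overnight": 24.0}))
# -> "direct"
```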

Artificial intelligence (AI)

This is a term used in so many ways and settings in the modern world that it’s easy to think of it too broadly. In the context of insights and data, artificial intelligence refers to computers making reasonable deductions about which actions to take based on data. As more data is made available, the AI system should be able to improve the accuracy of its decisions; this process of learning is a key part of the solution.

This type of artificial intelligence opens up the possibility of business solutions that can operate at speed and scale to perform tasks that previously required human judgement.

Augmented Analytics

This is where you use the power of artificial intelligence and machine learning to take on some of the analysis tasks usually completed by a data specialist. By utilising augmented analytics, companies can conduct challenging analyses even when data scientists aren’t available, or at a scale where there simply aren’t enough staff to complete the work.

It’s a process of automatically taking company data, processing it, checking for insights and translating the outcomes into a format that non-technical employees can use.
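
As a hypothetical illustration of that process, the sketch below scans a metric automatically and translates a statistical finding into plain language a non-technical employee can act on; the data and the anomaly rule are invented for the example.

```python
import statistics

# Invented monthly sales figures; April is the odd one out.
monthly_sales = {"Jan": 110, "Feb": 115, "Mar": 108, "Apr": 62, "May": 112}

mean = statistics.mean(monthly_sales.values())
stdev = statistics.stdev(monthly_sales.values())

for month, value in monthly_sales.items():
    if abs(value - mean) > 1.5 * stdev:  # crude anomaly rule for the sketch
        print(f"Insight: {month} sales ({value}) deviate strongly "
              f"from the typical level (~{mean:.0f}).")
```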

Big Data

This refers to data sets that are too large or unstructured to be handled in a traditional way or with a traditional software solution.

Business Intelligence (BI)

Business intelligence brings together all aspects of collecting, processing and presenting data into a format that helps drive business decisions and strategies. This usually involves a software solution.

Cloud computing

A form of delivering computing power over the internet rather than in-house. People commonly think of cloud storage, but it also covers everything from applications and databases to raw processing power and other computing resources.

Data analytics

The concept of drawing together data into a set and examining it to spot patterns or draw conclusions.
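
A small pandas sketch of the idea, using made-up order data: gather records into one data set, then aggregate to spot a pattern.

```python
import pandas as pd

# Invented order records drawn together into one data set.
orders = pd.DataFrame({
    "region":  ["North", "North", "South", "South", "South"],
    "revenue": [120.0, 95.0, 210.0, 180.0, 160.0],
})

# Aggregate to reveal a pattern: which region drives revenue?
print(orders.groupby("region")["revenue"].agg(["count", "mean", "sum"]))
```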

Data Catalogs

See Metadata management & Data Catalogs

Data for Net Zero

By Data for Net Zero, we refer to the entire ecosystem of carbon costing, planning and accounting, powered by sustainability data, with analytics and AI built on top. By truly understanding its footprint across the whole value chain and changing behaviour based on that, an enterprise can transparently and genuinely drive its impact on the green revolution.

Data Governance

See Data Management & Data Governance

Data Lakehouse

Data lakehouse is an architecture model that combines the flexibility of data lakes with the data management benefits of data warehousing. It enables analytics, business intelligence (BI) and machine learning (ML) on all the data: structured, semi-structured and unstructured.

The data mesh and data lakehouse models of data management can be used together to bring modern data architecture to all types of data and workloads.

Data Management & Data Governance

The terms data management and data governance are often used interchangeably. However, data management refers to managing the whole data lifecycle within an organization, whereas data governance is the core component within data management, tying together the other disciplines that sit under it, such as data quality, metadata management, data architecture, data warehousing and master data management.

Data Governance includes the processes, policies, standards, roles and responsibilities that ensure a shared understanding of, and trust in, data. This improves the customer and employee experience and enables value to be created from the data. For example, it helps decisions be made on accurate data, resulting in a quicker time-to-market.

Data governance is an essential tactic to help companies succeed in the face of ever-increasing quantities of data.

Data Mesh

Data Mesh is an architectural and organizational model that seeks to decentralize and distribute ownership of, and responsibility for, data delivery to the people closest to the data (i.e. Data Domain Experts), while leveraging a standardized set of designs for self-service. This enables quick and efficient data sharing between Data Producers and Data Consumers.

The key principles of data mesh, illustrated in the sketch after this list, are:

  • Domain Oriented Data
  • Data as a Product
  • Data Infra as a Platform
  • Federated Governance
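
As a rough illustration of the “data as a product” principle, the sketch below models a data product as a self-describing contract published by its owning domain team; all names and fields are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class DataProduct:
    name: str                    # e.g. "orders.daily_summary"
    domain: str                  # owning domain (domain-oriented data)
    owner: str                   # accountable data domain expert
    schema: dict[str, str]       # column -> type: the product contract
    tags: list[str] = field(default_factory=list)  # hooks for federated governance

orders_product = DataProduct(
    name="orders.daily_summary",
    domain="sales",
    owner="sales-data-team@example.com",
    schema={"order_date": "date", "region": "string", "revenue": "double"},
    tags=["pii:none", "sla:daily"],
)
print(orders_product.name, "owned by", orders_product.owner)
```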

Data Modeling

Data modeling is the process of defining the structure and relationship of data based on actual business processes. Data modelers conduct it in close collaboration with business stakeholders.

It takes place in three sequential layers:

  1. The conceptual model captures the entities in the business process and the relationships between them.
  2. The logical model captures all the data attributes required by the business.
  3. The physical model defines how the data is actually stored, ensuring performance while remaining understandable to the business user.

An efficient and user-friendly data model is a key enabler for Self-Service Analytics.
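
The hypothetical sketch below compresses the three layers for a simple “customer places order” process; the entities, attributes and DDL are illustrative only.

```python
# 1. Conceptual: entities and their relationship.
#    Customer --places--> Order

# 2. Logical: the attributes the business requires, independent of storage.
from dataclasses import dataclass
import datetime

@dataclass
class Customer:
    customer_id: int
    name: str

@dataclass
class Order:
    order_id: int
    customer_id: int           # relationship back to Customer
    order_date: datetime.date
    total_amount: float

# 3. Physical: how the tables are actually stored, tuned for performance
#    (types, keys, constraints). Shown here as illustrative SQL DDL.
ORDER_DDL = """
CREATE TABLE orders (
    order_id     INTEGER PRIMARY KEY,
    customer_id  INTEGER NOT NULL REFERENCES customers (customer_id),
    order_date   DATE NOT NULL,
    total_amount NUMERIC(12, 2) NOT NULL
);
"""
```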

DataOps

This is a set of practices to democratize the use of data and improve its accessibility to the business by setting up an agile, cooperative process between data analysts, data engineers, business users and IT operations. It improves the quality, agility and speed of data ingestion and preparation, and provisions data for use in AI and analytics use cases.
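
One hedged illustration of the practice: an automated data quality check that could run on every ingestion, so problems surface before the data reaches analytics or AI use cases. The rules and data are invented.

```python
import pandas as pd

def check_orders(df: pd.DataFrame) -> list[str]:
    """Return a list of data quality violations (empty means OK)."""
    problems = []
    if df["order_id"].duplicated().any():
        problems.append("duplicate order_id values")
    if df["revenue"].lt(0).any():
        problems.append("negative revenue values")
    if df["region"].isna().any():
        problems.append("missing region values")
    return problems

batch = pd.DataFrame({
    "order_id": [1, 2, 2],
    "revenue":  [99.0, -5.0, 42.0],
    "region":   ["North", None, "South"],
})
print(check_orders(batch))  # all three rules fire on this bad batch
```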

Data platform

A data platform is a solution that takes in data, processes it and then presents it to users or other software in a format they can use.

Data virtualization

This allows you to access data wherever it resides in your organization, giving you the power to quickly draw conclusions from multiple data sources. It helps overcome data silos as roadblocks to analysis.
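
As one hedged example of the idea, DuckDB can query heterogeneous sources in place, as if they were one database; the file names below are hypothetical stand-ins for two separate systems.

```python
import duckdb

# Join a CSV export and a Parquet extract without copying either
# into a new silo first. File names are invented for the sketch.
result = duckdb.sql("""
    SELECT c.region, SUM(o.revenue) AS revenue
    FROM 'customers.csv'  AS c
    JOIN 'orders.parquet' AS o ON o.customer_id = c.customer_id
    GROUP BY c.region
    ORDER BY revenue DESC
""").df()
print(result)
```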

ESG reporting

An ESG report is a report published by a company about its environmental, social and governance (ESG) impacts. Together, these give investors a snapshot of how sustainable and responsible a company is, so they can be confident it is taking its values seriously. Translating values into consistent behaviours shows commitment to good ESG performance.

Knowledge graphs

A knowledge graph provides a way of storing the meaning behind data in a formalized way. It enables the creation of a comprehensive and easily extendable data model that can be queried, reasoned over, naturally visualized and validated by humans in an explorable graph structure. Advances in the area are based on using graph representations to capture semantics with logic, and on making the results available as machine-readable contextual information.
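
A minimal sketch using the rdflib library, with invented facts: meaning is stored as subject-predicate-object triples that can then be queried.

```python
from rdflib import Graph, Namespace, RDF

EX = Namespace("http://example.org/")  # hypothetical vocabulary
g = Graph()
g.add((EX.Alice, RDF.type, EX.Employee))
g.add((EX.Alice, EX.worksOn, EX.DataPlatform))
g.add((EX.DataPlatform, EX.ownedBy, EX.SalesDomain))

# Ask the graph a question with SPARQL.
for row in g.query(
    "SELECT ?project WHERE { ?person <http://example.org/worksOn> ?project }"
):
    print(row.project)
```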

Machine learning

A subset of AI, machine learning allows computers to learn patterns from data rather than relying on explicitly programmed rules.
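
A minimal scikit-learn sketch with made-up data: the model learns the pattern from labelled examples rather than from hand-written rules.

```python
from sklearn.linear_model import LogisticRegression

# Invented features: [monthly_usage_hours, support_tickets]; label 1 = churned.
X = [[2, 5], [3, 4], [25, 0], [30, 1], [4, 6], [28, 0]]
y = [1, 1, 0, 0, 1, 0]

model = LogisticRegression().fit(X, y)
print(model.predict([[5, 4], [27, 1]]))  # -> likely [1, 0]
```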

Master data

This is the data that is key to an organization’s needs: data commonly used across the organization that should be mastered in one place, so that only a single version of the truth exists for it. It usually wouldn’t include transactional data, referring instead to stable data that doesn’t change often.

Master data management

Master data management (MDM) focuses on identifying an organization’s key entities, like customers and products, and maintaining a single version of the truth for that data. The target is to ensure the information is complete and accurate. Successful MDM needs data governance, since defining and following the processes and roles for managing the data is central to having high-quality master data.
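
One MDM task, consolidating duplicate records into a single “golden record”, might look like the hypothetical sketch below; the survivorship rule used (prefer the newest non-empty value) is just one possible policy.

```python
# Invented customer records for the same entity from two systems.
crm_record  = {"id": "C-17", "name": "Anna Virtanen", "email": None,
               "updated": "2022-01-10"}
shop_record = {"id": "C-17", "name": "Anna Virtanen",
               "email": "anna.virtanen@example.com", "updated": "2022-03-02"}

def merge(records: list[dict]) -> dict:
    """Survivorship rule for the sketch: prefer the newest non-empty value."""
    golden = {}
    for record in sorted(records, key=lambda r: r["updated"]):
        for key, value in record.items():
            if value is not None:
                golden[key] = value   # later (newer) records overwrite
    return golden

print(merge([crm_record, shop_record]))
```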

Metadata management & Data Catalogs

Metadata is data about data; it helps answer the what, who, when and where questions about the data. Metadata describes an organization’s data assets, enabling people to find, understand, access and trust the data they need. Data Catalogs are tools for managing an organization’s metadata and can store different types of it. For example, technical metadata describes the location of the data, whilst business metadata provides the definitions, standards and context for the data.
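
An illustrative catalog entry might combine both metadata types like this; all field names and values are hypothetical.

```python
catalog_entry = {
    "dataset": "sales.orders",
    "technical": {                       # where and how the data lives
        "location": "s3://datalake/sales/orders/",
        "format": "parquet",
        "updated": "daily at 04:00 UTC",
    },
    "business": {                        # what it means and who owns it
        "definition": "One row per confirmed customer order.",
        "owner": "Sales domain team",
        "standard": "Revenue reported in EUR, excluding VAT.",
    },
}

# A catalog's search answers the what/who/when/where questions:
print(catalog_entry["business"]["owner"], "-",
      catalog_entry["technical"]["location"])
```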

MLOps

MLOps is a set of practices combining machine learning (ML), DevOps and data engineering. It aims to deploy and maintain ML systems in production reliably and efficiently. With MLOps, you can store and version models in production, collect continuous feedback on model behavior and maintain quality models to keep your business and customers ahead.
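
As a hedged sketch of one MLOps practice, experiment tracking, the example below uses MLflow to record a run’s parameters, metrics and versioned model artifact; it assumes a default local MLflow setup and uses invented data.

```python
import mlflow
import mlflow.sklearn
from sklearn.linear_model import LogisticRegression

X, y = [[2, 5], [25, 0], [3, 4], [30, 1]], [1, 0, 1, 0]  # made-up data

with mlflow.start_run():
    model = LogisticRegression(C=1.0).fit(X, y)
    mlflow.log_param("C", 1.0)                            # hyperparameter
    mlflow.log_metric("train_accuracy", model.score(X, y))
    mlflow.sklearn.log_model(model, "churn-model")        # versioned artifact
```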

Self-service analytics

Self-service analytics is a form of business intelligence (BI) that enables business users to query curated datasets and build visual analytics/reports with minimal support from a central BI team.

To realize the benefits of this approach, it is important to:

  • Involve key business stakeholders throughout the data modeling process. This ensures that the final data model captures the business analytics requirements whilst keeping it understandable for the users.
  • Build curated datasets supported by a business glossary and a well-documented data model. Leverage data cataloguing tools to make the dataset and support documentation accessible to end-users.
  • Define a process for the BI team to govern and monitor the usage. Utilize the feedback to enhance existing datasets, build new datasets and take over the most used reports created by key business users.

Sustainability Analytics

Sustainability or Carbon Analytics combines carbon costing, carbon planning, carbon accounting and actual data from various data sources. When all the data is collected in a central hub, such as a cloud data platform, analytics and reporting can take place.

Carbon analytics should always cover the entire value chain of an enterprise. Focusing only on, for example, direct emissions from a company’s own operations (Scope 1) while excluding purchased energy and the wider supply chain (Scopes 2 and 3) means capturing only a minor part of the enterprise’s actual environmental footprint.
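
A hypothetical sketch of the central-hub step: emissions actuals from several sources are rolled up per GHG Protocol scope, making visible how much of the footprint sits outside Scope 1. All figures are invented.

```python
import pandas as pd

emissions = pd.DataFrame({
    "source": ["own fleet", "factory gas", "purchased power",
               "suppliers", "product use"],
    "scope":  [1, 1, 2, 3, 3],
    "tco2e":  [1200.0, 800.0, 2500.0, 14000.0, 9000.0],
})

by_scope = emissions.groupby("scope")["tco2e"].sum()
print(by_scope)
print(f"Scope 1 share: {by_scope[1] / by_scope.sum():.0%}")  # ~7%
```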

Author

Minna Lind

Data Management and Governance Lead

Email: minna.lind@capgemini.com
