Doing the right thing with data: start exploring data ethics now

Capgemini

April 21, 2020

We live in the age of data. The amount of data available worldwide more than doubled between 2015 and 2019, and still each year the pace at which data is collected continues to accelerate (source: IDC). Data enables organizations to be insights-driven and innovative in problem solving, allowing them to be effective in growing, evolving, and achieving their goals. Data can provide many benefits – financial and other – and is therefore an important asset to any organization. However, if not managed properly, it can become a liability, too.

Advanced technologies such as artificial intelligence and machine learning provide enormous advantages, but inevitably bring challenges as well. Now that decisions previously left to humans are increasingly delegated to algorithms, the topic of data ethics frequently makes the news, pressing twenty-first century organizations to be aware of its relevance and importance. The report Why addressing ethical questions in AI will benefit organizations, by the Capgemini Research Institute, surveyed 1,580 interviewees and showed that “executives in nine out of ten organizations believe that ethical issues have resulted from the use of AI systems over the last 2–3 years, with examples such as collection of personal patient data without consent in healthcare, and over-reliance on machine-led decisions without disclosure in banking and insurance.”

“However, data ethics is not limited to AI.”

It also stretches to everyday data-related activities, for example processing data in simple applications such as Microsoft Excel or sharing documents via email. While the unethical use of data can be the outcome of malicious intent, it often actually results from a lack of digital literacy, or too little awareness or consideration of the complex implications of using digital technologies and tools.

The difference between compliance and ethics

Ethics is fundamentally different from compliance. Being compliant means following a strict set of rules, specified for example in law or professional codes of conduct; being ethical means doing what is “good.” US Justice Potter Stewart defined ethics as: “knowing the difference between what you have a right to do and what is right to do.” We will humbly leave the answer to the ambitious question “what is good” to philosophers and focus in what follows only on the unambiguously unethical use of data, which often implies serious reputational risk for organizations.

To mitigate this risk, many organizations have established sets of principles that capture their general ethical profile, such as being inclusive and honest.

Capgemini itself has held honesty among its core values since its formation in 1967. However, many organizations have yet to make the leap to thinking about data ethics specifically. The recently published report Conversations towards ethical AI by the Capgemini Research Institute quotes Luciano Floridi, professor of philosophy and ethics of information at the University of Oxford. He says: “a company needs to understand that doing the right thing is a win-win situation. It’s good for business and it’s good for society.” Hence, data ethics themes such as accountability and respect for privacy are gaining momentum and require the attention of most employees in organizations – managers and data scientists in particular. The value created by data will not endure if organizations fail to understand the implications of its use.

Ethical questions play a role in all stages of data processing

The process of turning data from a raw state into something valuable – whether insights, decisions, or actions – can be subdivided, for simplicity, into three phases: acquisition, processing, and generating impact. This model helps analyze the ethical implications along the workflow, with the right tools at the right stage.

Data acquisition includes any form of data collection, such as creating it (e.g., manually generating datasets or collecting it through IoT sensors), receiving it (e.g., extracting it from a CRM system), or buying it (from specialized firms). The next phase, processing data, consists of any work done to the data, from combining it to analyzing it and eventually disposing of it. In the last phase of the data cycle, the goal is to generate impact through insights or decision making.

In all phases, organizations face different challenges and risks when it comes to data ethics.

When acquiring data, one should be aware of possible weaknesses in the dataset: is it a representative sample without significant bias? In addition, when collecting personal data, dealing with it ethically implies that the people described by the data are at least aware that their data is being collected, have consented to its collection, and know for which purposes it will be used.

Processing data is the actual analysis of the data and has its own distinct problems: is the data appropriately anonymized? Could the data you use be a proxy for something else? Would the people described in the data (if any) agree with the way that the data is processed and shared in the organization and beyond? Moreover, artificial intelligence algorithms are increasingly complex, even for the data scientists constructing them. Ideally, organizations should be aware of the reasoning of their algorithms and the combinations that machine learning systems are able to make, in order to make the right ethical decisions. Capgemini Invent can help you set up understandable and transparent AI algorithms.

The question in the impact phase is: what impact? And, perhaps even more importantly: impact on whom? The use of data and decisions based on data analysis have real-world implications. Algorithms, no matter how simple or complex, do not absolve organizations of responsibility for their actions. The impact on vulnerable groups in society in particular should be sufficiently considered, to mitigate the risk of discrimination resulting from your organization’s analysis.

What are the ethical themes in play while using data?

In ordinary life, between individuals, ethics often boils down to being a good citizen – treating others as you would like to be treated yourself. In data ethics, we advise organizations to focus on specific themes. Following our review of academic literature and ethical codes, we have selected what we consider to be the four most important themes: transparency, accountability, privacy, and bias. These four themes are not mutually exclusive, but complementary and overlapping:

Transparency

Transparency enables democracy and informed decision making and is therefore one of the pillars of a healthy society. In data analytics, transparency can help prevent abuses of institutional power while also encouraging individuals to feel safe in sharing their own data. Transparency is the essential theme in data ethics, since it enables people to examine the use of their data. Moreover, the recent Capgemini Research Institute report stresses that transparency should be accompanied by understandability, since transparency without understandability still offers only limited insight into the workings of an algorithm.

Accountability

Accountability is of vital importance when working with data. It raises the question “who is responsible when things go haywire?” When data is irregularly collected, irresponsibly anonymized, or wrongfully interpreted, the people responsible should be held accountable. From the start, data governance should clearly delineate who is responsible for which actions and what data. Furthermore, in the case of more complex models, algorithmic accountability – meaning that organizations are responsible for the outcomes of their algorithms – should be ensured.

Privacy

Privacy implies that confidential information remains confidential. Privacy as an ethical theme starts at a baseline where personal data should not be shared without consent and should be secured sufficiently. However, the concept of privacy has broadened and now includes the responsibility to anonymize data sufficiently to ensure that it cannot be traced back to the people it describes.
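As a rough illustration of what "anonymized sufficiently" can mean in practice, the sketch below counts how many records share each combination of indirectly identifying fields, an idea commonly known as k-anonymity. It is a minimal sketch, not a complete anonymization method: the field names, the sample records, and the helper smallest_group_size are all hypothetical and chosen only for this example.

```python
from collections import Counter


def smallest_group_size(records, quasi_identifiers):
    """Return the size of the smallest group of records that share the same
    combination of quasi-identifier values (the "k" in k-anonymity).
    Returns 0 for an empty dataset."""
    if not records:
        return 0
    groups = Counter(
        tuple(record[field] for field in quasi_identifiers)
        for record in records
    )
    return min(groups.values())


# Hypothetical records: names already removed, age and postcode coarsened.
records = [
    {"age_band": "30-39", "postcode_area": "10", "diagnosis": "A"},
    {"age_band": "30-39", "postcode_area": "10", "diagnosis": "B"},
    {"age_band": "40-49", "postcode_area": "10", "diagnosis": "A"},
]

k = smallest_group_size(records, ["age_band", "postcode_area"])
print(k)  # 1: the only 40-49 record is unique and could be traced back
```

A result of 1 means at least one person is alone in their group, so removing names was not enough on its own. Which fields count as indirectly identifying, and what minimum group size is acceptable, are judgment calls that depend on the dataset and its context.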

Bias

Bias is an inclination for or against a person, group, area, or subject. Unfortunately, the outcomes of algorithms are not always bias-free, but rather carry the biases of analysts, data collectors, managers, policymakers, and all other stakeholders who influence the data throughout the cycle we described. Furthermore, each dataset is created within a societal and historical context and therefore risks carrying the biases present in society at the time of creation. It is therefore important for users to consider the possible biases that could arise in their datasets. Additionally, team diversity should be a priority for all organizations working with data, to mitigate the forming of biases in their datasets and algorithms.
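One simple way to start looking for such biases is to compare how often a favorable outcome occurs for each group in a dataset. The sketch below does this on made-up data; the field names, the sample decisions, and the helper positive_rate_by_group are assumptions for illustration only, and a large gap between groups is a prompt for further investigation rather than proof of discrimination.

```python
from collections import defaultdict


def positive_rate_by_group(rows, group_field, outcome_field):
    """Return the share of rows with a truthy outcome, per group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for row in rows:
        totals[row[group_field]] += 1
        if row[outcome_field]:
            positives[row[group_field]] += 1
    return {group: positives[group] / totals[group] for group in totals}


# Hypothetical decisions produced by some model or process.
decisions = [
    {"group": "A", "approved": True},
    {"group": "A", "approved": True},
    {"group": "A", "approved": False},
    {"group": "A", "approved": True},
    {"group": "B", "approved": True},
    {"group": "B", "approved": False},
    {"group": "B", "approved": False},
    {"group": "B", "approved": False},
]

rates = positive_rate_by_group(decisions, "group", "approved")
highest = max(rates.values())
# Ratio of the lowest to the highest approval rate; 1.0 means equal rates.
ratio = min(rates.values()) / highest if highest > 0 else 1.0

print(rates)            # {'A': 0.75, 'B': 0.25}
print(round(ratio, 2))  # 0.33
```

Here group B is approved a third as often as group A. Whether that difference is justified cannot be read from the numbers alone; it has to be examined with knowledge of how the data was collected and what the decision is used for.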

The data ethical themes and the data cycle together provide an excellent starting point for organizations that want to think of their use of data in a more ethical way.

The figure below is a graphical interpretation that encapsulates the themes and the cycle in one simple overview.

In conclusion, data ethics is an emerging topic in a world where data is created at unprecedented speed and algorithms are increasingly important. Organizations implement ethical guidelines and tools for two reasons: first, because treating data ethically is simply the right thing to do and second, because unethical ways of working with data make organizations susceptible to reputational damage. Whatever the reason, becoming more ethical with data is crucial for organizations to flourish in the age of data. The combination of the data cycle and the ethical themes in this blog enables organizations to do just that, by helping them ask the right questions when acquiring, using, and creating value with data.

About the authors

Jochem Dogger and Sari IJsseldijk are consultants in the Data Strategy & Data Science unit of Capgemini Invent. They help organizations become digital leaders by translating business questions to data solutions and facilitating a data-ready environment where data can be leveraged to its full extent in an ethical way.

Our expert in this field is Gianfranco Cecconi, a principal consultant in the Insights-Driven Enterprise practice of Capgemini Invent. He is an advocate for open and ethical technology, currently leading Capgemini’s work for the European Union in the space of open data and data sharing.