Five ways to battle data waste

Roosa Säntti
14 September 2022

There is an increasing focus on reducing the environmental footprint of data centers and cloud services. Interestingly enough, that is not yet the case for data itself. But with more organizations aspiring to become data-powered, the issue of data waste is lurking around the corner. We introduce five ways to begin battling data waste – with an additional key benefit: getting a better grip on the corporate data landscape.

“My data is bigger than yours”: we used to take pride in storing as much data as possible – because we could, prices were low, and future killer algorithms were waiting. Having more data seemed the hallmark of a true, successful data-powered enterprise.

It turns out this consumes loads of energy and precious natural resources, and it creates a growing heap of unsustainable e-waste. We need to become more aware of what data we really need to store, how many times we duplicate it, and how long we keep it available. Also, although AI may be key to addressing climate challenges, it consumes plenty of energy itself. Just think about how much energy it takes to perform one training cycle for a major AI language transformer model (hint: a great deal – say, five times the lifetime CO2 emissions of an average American car). The battle against data waste will therefore be a continuous, delicate balancing act – and it has only just begun.

And it’s a battle with benefits: many of the measures that can already be taken bring additional value for organizations that want to become data-powered, even to the point that the positive impact on overall data mastery may dwarf the sustainability impact.

Here are five suggestions to get your quest going:

1. Get the data first

As with any other transformational objective, you should map your current situation before you start improving. Battling data waste begins with getting data on what data you actually have. Only then will you be able to assess how much of it really is unsustainable data waste, for example by analyzing how often data is used, by how many people, and for what types of purposes. Many data catalog tools (such as Alation, see a separate article in this magazine) are perfectly equipped for this, and increasingly they feature intelligent automation and AI to do the heavy lifting of scanning the data landscape. Having an up-to-date data catalog brings many obvious additional benefits to a data-powered business as well, so every minute of activity in this area is typically well spent.
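The usage analysis described above can be sketched in a few lines. This is only an illustration, assuming usage statistics have already been exported from a catalog tool; the `DatasetUsage` fields, the thresholds, and the sample data sets are all hypothetical and not taken from any specific product.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DatasetUsage:
    # Hypothetical fields; a real catalog export will have its own schema.
    name: str
    size_gb: float
    last_accessed: date
    reads_last_90_days: int
    distinct_users: int


def waste_candidates(datasets, today, max_idle_days=365, min_reads=1):
    """Return data sets that look unused: idle too long or read too rarely."""
    flagged = []
    for ds in datasets:
        idle_days = (today - ds.last_accessed).days
        if idle_days > max_idle_days or ds.reads_last_90_days < min_reads:
            flagged.append(ds)
    # Largest first, since those carry the biggest storage footprint.
    return sorted(flagged, key=lambda ds: ds.size_gb, reverse=True)


# Made-up sample data for illustration only.
catalog = [
    DatasetUsage("sales_2015_raw", 1200.0, date(2019, 3, 1), 0, 0),
    DatasetUsage("customer_master", 80.0, date(2022, 9, 1), 450, 37),
    DatasetUsage("clickstream_tmp", 5400.0, date(2021, 1, 15), 0, 0),
]
candidates = waste_candidates(catalog, today=date(2022, 9, 14))
print([ds.name for ds in candidates])
```

The flagged list is a starting point for a conversation with data owners, not a deletion list: rarely read data can still be legally required or strategically valuable.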

2. Map the environmental impact

Once you know what data you have, it is a matter of understanding its real environmental impact. Data is stored in storage systems, as part of an IT infrastructure and a supporting network (in a data center or in the cloud). All these resources consume energy, create e-waste, and have a carbon footprint. An increasing number of publicly available carbon calculators help to establish the sustainability cost of the elements of the data landscape, not only focusing on Scope 1, but covering the entire ‘supply chain’ of Scopes 2 and 3. Once this data is established, it should be routinely added to the metadata management and catalog facilities of the organization – for current and future reference and use. As with every sustainability effort, you want to focus on the data sets that have the most negative impact.

“But it is indeed a balancing act, as the data can be part of a solution or an initiative that delivers societal benefits that far outweigh its sustainability costs.”
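To make the idea of a per-data-set sustainability cost concrete, here is a rough sketch of a storage footprint estimate. It assumes a simple model (stored volume × number of copies × energy per terabyte × data center overhead × grid carbon intensity). The function name and all coefficients are placeholders for illustration, not measured values; real figures should come from your provider or a carbon calculator. The sketch covers operational electricity only, not embodied emissions or e-waste.

```python
def storage_footprint_kg_co2e(size_tb, replicas, kwh_per_tb_year, pue,
                              grid_kg_co2e_per_kwh):
    """Rough yearly footprint of keeping a data set in storage (kg CO2e)."""
    # Energy drawn by the storage itself, scaled up by data center overhead (PUE).
    energy_kwh = size_tb * replicas * kwh_per_tb_year * pue
    return energy_kwh * grid_kg_co2e_per_kwh


# Placeholder coefficients (assumptions): replace with real figures.
ASSUMED_KWH_PER_TB_YEAR = 20.0
ASSUMED_PUE = 1.5
ASSUMED_GRID_KG_CO2E_PER_KWH = 0.4

# Hypothetical data sets: name -> (size in TB, number of copies kept).
datasets = {"clickstream_tmp": (10.0, 3), "customer_master": (0.5, 2)}

footprints = {
    name: storage_footprint_kg_co2e(size_tb, replicas, ASSUMED_KWH_PER_TB_YEAR,
                                    ASSUMED_PUE, ASSUMED_GRID_KG_CO2E_PER_KWH)
    for name, (size_tb, replicas) in datasets.items()
}

# Rank so that attention goes to the data sets with the largest impact first.
ranked = sorted(footprints.items(), key=lambda item: item[1], reverse=True)
for name, kg in ranked:
    print(f"{name}: {kg:.1f} kg CO2e per year")
```

Storing the resulting figure next to each data set in the catalog keeps it available for later decisions.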

3. Get rid of it

Ever seen Hoarders? It’s a reality TV show that features compulsive hoarders: people who are addicted to filling their homes with objects, and how that spills out into their lives. You don’t want to be a data hoarder. Just keeping data for the sake of it – or because it might come in handy in some unforeseen way – can leave you with a high sustainability bill. And it simply costs money too, for that matter. So, just as with application rationalization, data should have a managed lifecycle that not only involves creating and using it, but also features clear policies for decommissioning unused, redundant, or simply wasteful data. Organizations sometimes tend to hold on to their established IT assets (including data) for nothing more than emotional, non-rational reasons. Where the cost equation may not be enough to break that spell, sustainability impact might just do fine.
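A managed lifecycle can start as a very small, explicit rule. The sketch below is one hypothetical policy, not a recommendation: the function name, the one-year and three-year thresholds, and the `legal_hold` flag are assumptions chosen only to show the shape of such a rule.

```python
from datetime import date, timedelta


def lifecycle_action(last_accessed, today, legal_hold=False,
                     archive_after=timedelta(days=365),
                     delete_after=timedelta(days=3 * 365)):
    """Decide what to do with a data set based on how long it has been idle."""
    if legal_hold:
        # Data under a retention obligation is never touched by this rule.
        return "keep"
    idle = today - last_accessed
    if idle >= delete_after:
        return "decommission"
    if idle >= archive_after:
        return "archive"
    return "keep"


today = date(2022, 9, 14)
print(lifecycle_action(date(2022, 9, 1), today))
print(lifecycle_action(date(2021, 1, 15), today))
print(lifecycle_action(date(2019, 3, 1), today))
print(lifecycle_action(date(2019, 3, 1), today, legal_hold=True))
```

Writing the policy down as code, however simple, makes it reviewable and takes emotion out of the individual decision.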

4. Stop at the gates

It’s a well-established practice within Permaculture (see our separate article in this magazine about ‘Permacomputing’ for more): you don’t recycle, reuse, and repurpose as an afterthought; it is an integrated part of your design and approach, right from the start. A lot of wasteful data can be avoided by never ingesting it in the first place. So, no more room for the typical Big Data era mindset that whatever data is available should be stored, because storage in the cloud is cheap and you never know what use it may have. Later. Sometime. Maybe. Instead, think in terms of Small Data, Tiny Data, or simply Smart Data: be much pickier about the data sets you bring on board, the objectives you have for them, and the quality of the data points inside. Select data that is fit for your purposes. Think more upfront, clean so much less later.
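“Stopping at the gates” can be as simple as a checklist that every new data set must pass before it is ingested. The sketch below assumes each ingestion request is a plain dictionary; the field names (`purpose`, `owner`, `quality_score`) and the 0.9 threshold are hypothetical and would differ per organization.

```python
def admit(request, min_quality=0.9):
    """Return (accepted, reasons) for a proposed data set ingestion."""
    reasons = []
    if not request.get("purpose"):
        reasons.append("no stated purpose")
    if not request.get("owner"):
        reasons.append("no owner")
    if request.get("quality_score", 0.0) < min_quality:
        reasons.append("quality below threshold")
    # Accepted only when no objection was raised.
    return (not reasons, reasons)


# Made-up requests for illustration.
good = {"name": "orders_daily", "purpose": "demand forecasting",
        "owner": "supply-chain", "quality_score": 0.97}
vague = {"name": "web_logs_all", "purpose": "", "owner": "unknown-team",
         "quality_score": 0.6}

print(admit(good))
print(admit(vague))
```

Returning the reasons alongside the decision tells the requester what to fix, which is the “think more upfront” part of the practice.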

5. Do not duplicate

Data architecture is not necessarily a well-established practice within many complex organizations. As a result, data is often unnecessarily copied multiple times from the central data organizations to various business domains, and vice versa. Each instance starts to lead its own life, serving all sorts of different purposes, rapidly adding to a growing pile of potential data waste. And it all tends to be unaligned and unsynchronized. New architectural approaches – notably Data Mesh – assign the ownership of specific data sets much more explicitly to specific business domains. Data is typically held – and stored – by the business domain and made available through flexible integration mechanisms (such as APIs), so that duplication is unnecessary, even undesirable. Other integration technologies, such as data virtualization, can achieve the same.
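Before changing the architecture, it helps to see where exact copies already exist. A minimal sketch, assuming data set contents can be read as bytes: group them by a SHA-256 content hash and report groups with more than one member. The data set names and payloads are made up, and this only finds byte-identical copies; re-sorted, re-encoded, or partially overlapping copies need other techniques.

```python
import hashlib
from collections import defaultdict


def find_duplicates(datasets):
    """Group data set names whose content is byte-for-byte identical."""
    by_digest = defaultdict(list)
    for name, payload in datasets.items():
        digest = hashlib.sha256(payload).hexdigest()
        by_digest[digest].append(name)
    # Keep only groups that actually contain more than one copy.
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


# Hypothetical in-memory payloads standing in for real files or tables.
datasets = {
    "central/customers.csv": b"id,name\n1,Ada\n2,Grace\n",
    "marketing/customers_copy.csv": b"id,name\n1,Ada\n2,Grace\n",
    "finance/invoices.csv": b"id,amount\n1,100\n",
}
print(find_duplicates(datasets))
```

For large data sets, the payload would be hashed in chunks from storage rather than held in memory.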

Lastly, don’t forget the people. As with everything around data, we can only accomplish so much without involving and empowering people to be and lead the change. Data catalogs and API-first architectures are great tools to drive more sustainable use of data and AI. But if there are no people embracing the direction (a sustainable data vision and strategy) and no ownership of the data (internalizing which data is used, why and how much) – failure is a given. True Data Masters battle data waste by harnessing both: data foundations and data behaviors.

There are many more ways to stop data waste, such as relying more on shared data between multiple ecosystem partners, procuring data and pre-trained algorithms from external providers, limiting the movement of data, and switching to energy-saving storage media. One thing is for sure: even if reducing data waste does not deliver a substantial sustainability impact at first sight, each and every activity suggested adds to a higher level of data mastery. And that – in all cases – is priceless.

INNOVATION TAKEAWAYS

Data has a sustainability cost

With its obvious merits, data has an impact on the environment in terms of its dependency on natural resources and energy and its carbon footprint; hence data waste must be actively addressed.

The quest against data waste

There are many ways to decrease harmful data waste, but they all start with a better understanding of the current data landscape and its environmental impact.

Battle with benefits

Reducing data waste can have an obvious positive environmental impact, but while doing so organizations will see their level of data mastery lifted as well.

Interesting read?

Capgemini’s Innovation publication, Data-powered Innovation Review | Wave 4, features 18 such articles crafted by leading Capgemini and partner experts, with inspiring examples ranging from digital twins in the industrial metaverse, “humble” AI, and serendipity in user experiences all the way to permacomputing and the battle against data waste. In addition, several articles are written in collaboration with key technology partners such as Alation, Cognite, Toucan Toco, DataRobot, and The Open Group to reimagine what’s possible.

Authors

Roosa Säntti

Head of Insights & Data Finland
Roosa Säntti heads the Insights & Data practice in Finland and is also an active member of Capgemini’s global I&D Innovation Network. Roosa is a business builder at heart and believes that with data, we can truly drive businesses and society towards a more sustainable future. She is also a big supporter of diversity and sees it fueling innovation in her own teams as well.

Ron Tolido

CTO of Global Insights & Data
Ron is an EVP, certified master architect, and Chief Technology Officer of the Insights & Data Global Business Line. In addition to Data-powered Innovation Review, he is also the lead author of Capgemini’s renowned TechnoVision series on technology trends. Based in the Netherlands, Ron is an executive lecturer at several business universities.