Getting a grip on Data



Recently, I was called in to advise on a dispute concerning the delivery of the required data to a Business Intelligence environment from one of the source systems. The problem was that there was not enough time in the weekend to run both the month-end closing and the BI feed. On the table were increased processing power, a separate environment and a redesign of the extract software. No one considered cleaning up the source system. There is data in there dating back to the early 70s that fits neither the current business nor its business rules, and still we do not think about cleaning it up.
Can’t complain really. I seem to be doing the same thing at home. At the beginning of the year, I take some time out to organise all the paper mail of the year before, put it in nice stacks, punch holes in it and place it in binders. The binders get full and, after having put far too much into a single binder, I go out and buy a new one. I have a full overview of all my spending in 1975, if you’re interested.
The trouble is: no one is interested. Ok, maybe on a cold, rainy Sunday morning I would like to stroll down memory lane, but does that justify keeping it?
“It doesn’t eat bread,” my father used to say to my mother when she urged him to clean up his junk. Leave it alone, it doesn’t cost any money.
Companies seem to react along the same lines to cleaning up their source systems: it’s going to cost us a lot more money to clean it up than to buy a few more disks. Unfortunately, the cost doesn’t end there. The older systems especially require that ‘everything’ is read during monthly processes such as premium runs and invoice runs. The old data is a dead weight that is constantly present. This means that besides the storage, you also need to invest in processing power. Add a CPU, more internal memory.
Storage needs to be backed up, and consequently we have to buy more auxiliary storage as well. Plus there is the time it takes to copy old data that has been copied so many times already.
No wonder the weekend is too short.
After my vacation last year, I came back with 2,500 pictures at 2.5 MB on average. Add to that the 3,500 pictures of my fellow vacationers and that is a lot of storage. Now, the sane thing to do would be to select a few good ones and keep these as the reminder of a great vacation. Well, I did select about 300 and copied them to a separate folder, downsized these to another folder and then uploaded them to my webpage. Companies do the same: copy some data from one system to another, do a bit of aggregation perhaps, or cleaning, and then copy it to yet another system. And much like me at home, we keep the original and the copies. And then we back it all up to auxiliary storage.
At the beginning of 2009, I had 2.6 terabytes of hard-drive storage at home. At the end of 2009, it is 5 terabytes. Copies, back-ups and back-ups of back-ups, just to be sure that that picture I will probably never look at again will be preserved.
The structured data in our source systems, although massive, isn’t even our biggest problem. It’s the email, the documents and the web pages we have created. Emails, with or without attachments, are joyfully sent to long lists of addressees. Forward one and you have another copy in your outbox. People reply and, hey presto, more copies. And what happens when the administrators warn you about your mailbox size? You complain and ask for more, or you move the emails to your archive. A lengthy email discussion between a few people amounts to staggering storage consumption, and after a little while none of the new replies holds the complete story. Are we at least considering tools like Twitter and Yammer as alternatives to email discussions? These are scary tools for the person who wants to be in control. How do you make sure that that interesting discussion will still be there when you need it again? At least with email, you’ve got your own copy, which you will not delete. Ever!
The trouble with discussions and knowledge bases is that they tend to grow old. A discussion about the new version of ToolX is set in a time-context: “it is the best tool on the market” is only valid for a short time. The evaluation of the latest smartphone is interesting, but not after 2 years. We’ve moved on. We do not want to see old evaluations pop up when we search for smartphones. Who cares what was hot in 2005? But as long as it’s on the system, it will keep appearing, blocking our view of what we are looking for. 2,000 hits on a search? Please, that’s not helpful at all. It will take me another weekend to get through those!
Outdated data is very costly: not only does it frustrate our searches, our Data Quality measurements and our computer power, it also frustrates our ability to progress, to innovate and to be agile. We are forced to make amendments to brand new systems because otherwise we can’t move the old data in. This in turn opens these new systems to new data imperfections. The introduction of a new system is a perfect opportunity to get rid of old data. We hardly ever take that opportunity. “Who decides what can be discarded?” and “How do we prove that the conversion worked when we don’t convert everything?” are the main reasons not to take it. We would rather just move everything over first and then start a project to do the clean-up. A pity we had to corrupt our brand new system, and a pity the clean-up project never gets started.
Yes, there are some conversions that change or clean up old data so that it fits in the new system. But is that always a good idea? Changing old data distorts the little value it had left. Repairing 29th February 1991 into the 28th isn’t preserving old data, it’s corrupting it.
And it doesn’t help us get through the weekend.
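The 29th February 1991 case can be made concrete. What follows is only a rough sketch in Python, not how any particular conversion tool works: the row layout and the `parse_legacy_date` helper are made up for illustration. It sets impossible dates aside for a human decision instead of quietly turning them into the 28th.

```python
from datetime import date


def parse_legacy_date(year, month, day):
    """Return a date, or None when the legacy value is not a real calendar date.

    Hypothetical helper: it refuses to guess, so nothing is silently 'repaired'.
    """
    try:
        return date(year, month, day)
    except ValueError:
        return None


# Made-up legacy rows; only the dates matter here.
legacy_rows = [
    {"id": 1, "ymd": (1991, 2, 28)},
    {"id": 2, "ymd": (1991, 2, 29)},  # 1991 was not a leap year
    {"id": 3, "ymd": (1992, 2, 29)},  # 1992 was a leap year
]

convertible = [r for r in legacy_rows if parse_legacy_date(*r["ymd"]) is not None]
rejected = [r for r in legacy_rows if parse_legacy_date(*r["ymd"]) is None]

print("convert:", [r["id"] for r in convertible])
print("needs a decision:", [r["id"] for r in rejected])
```

Whether the rejected rows are then discarded or kept untouched in an archive remains a decision for whoever owns the data; the sketch only makes sure that decision is not taken silently by the conversion.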
Action is called for. For me and for the companies. We need someone to stand up and take control, get a grip. At home, it is simple: that would be me. I will decide which data can be declared obsolete (Data User), I will set up rules to identify which data falls into that category (Data Owner) and I will clean it up (Data Steward). For older papers with value, I will also decide whether paper is still the best medium. If necessary, I will transfer it into an electronic document and shred the paper. I can apply the same principle to my electronic archive: any document only once, in its original format, plus one back-up.
In companies we can do the same thing. We appoint Data Owners and Data Stewards. If we have already appointed them, we probably need to give them more power so that they can make the difference. Set boundaries for personal storage such as email. Personal storage is nothing more than yet more copies of Company data. Introduce a use-by-date for data, urge Knowledge Workers to clean up outdated information, select the right tools for discussions and try to get rid of attachments in emails. There are much better ways to share documents. Set targets to get rid of at least 10% of outdated data every year.
And be open about this. Share (preferably not through email) what you are about to do and why. Make sure we are all in on it.
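To give an idea of what a use-by-date could look like in practice, here is a small sketch in Python. Everything in it is an assumption made for illustration: the `Record` type, its `use_by` field and the sample data are invented and do not describe any real system. It splits records into current and outdated ones and reports the outdated share, the kind of figure a Data Steward could hold against a yearly clean-up target.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Record:
    key: str
    use_by: date  # hypothetical field: the last day this record is considered current


def split_by_use_by(records, today):
    """Return (current, outdated) lists based on each record's use-by date."""
    current = [r for r in records if r.use_by >= today]
    outdated = [r for r in records if r.use_by < today]
    return current, outdated


def outdated_share(records, today):
    """Fraction of records whose use-by date has passed (0.0 for an empty list)."""
    if not records:
        return 0.0
    _, outdated = split_by_use_by(records, today)
    return len(outdated) / len(records)


# Invented sample data.
records = [
    Record("policy-1975", date(1985, 12, 31)),
    Record("smartphone-review", date(2007, 6, 30)),
    Record("invoice-2009-11", date(2016, 12, 31)),
    Record("contract-current", date(2030, 1, 1)),
]

today = date(2010, 1, 1)
current, outdated = split_by_use_by(records, today)
print("candidates for clean-up:", [r.key for r in outdated])
print(f"outdated share: {outdated_share(records, today):.0%}")
```

The sketch only identifies candidates. Deciding what is actually removed stays with the Data Owner.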
Let’s try to bend the curve of ever-increasing storage by getting rid of outdated data. But let’s not make it a project. Let’s make it a way of life. A culture. Let’s be sure that we can easily identify data when it is outdated, and let’s remove it. Let’s get our weekend back. After all, wouldn’t it make more sense to use that processing power to generate more business rather than pumping old data around?
Let’s get a grip.
