Tristan Mobbs


How Classifying and Fixing Dirty Data Can Help Even Those Using Spreadsheets

Examining the ground we stand on

Published: Wednesday, March 9, 2022 - 13:02

All too often the topic of fixing dirty data is neglected in the plethora of online media covering artificial intelligence (AI), data science, and analytics. This is wrong for many reasons.

To highlight just one, confidence in the quality of data is the vital foundation of all analysis. This topic remains relevant for all levels of complexity, from spreadsheets to complex machine-learning models.

So, I was delighted to review Susan Walsh’s book, Between the Spreadsheets: Classifying and Fixing Dirty Data (Facet Publishing, 2021). Here are some highlights from her book, and my own advice on who should read it.

Yes, we’re finally talking about dirty data

Between the Spreadsheets covers a topic that gets less daylight amid the glamour of AI and machine learning. Having followed Walsh for a while on LinkedIn, I've enjoyed how she highlights the benefits of making sure your data have their COAT on (consistent, organized, accurate, trustworthy), and how she takes the time to explain the challenges associated with poor data quality—as well as the painful real-world consequences of it.

Exploring dirty data in the world of procurement, Walsh looks at spend data classification and provides real examples of how she would go about validating and sorting out the dirty data.
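Spend classification can be made concrete with a small example. Below is a minimal Python sketch of keyword rules that assign each spend line to a category. It is not Walsh's method, since her book works in Excel. The categories, the keywords, and the `classify` function are hypothetical. Any line that matches no rule is labeled "Unclassified" so that a person can review it, rather than having the code guess.

```python
import re

# Hypothetical category rules for illustration only; real spend taxonomies
# are far larger and are usually maintained alongside the data.
RULES = {
    "IT": {"laptop", "software", "licence", "license"},
    "Facilities": {"cleaning", "electricity", "rent"},
    "Travel": {"hotel", "flight", "taxi"},
}

def classify(description: str) -> str:
    """Return the first category with a keyword that appears as a whole word."""
    # Split into lower-case words so "rent" does not match inside "current".
    words = set(re.findall(r"[a-z]+", description.lower()))
    for category, keywords in RULES.items():
        if words & keywords:
            return category
    # No rule matched: flag the line for manual review instead of guessing.
    return "Unclassified"

for line in ["Dell laptop x2", "Office cleaning - March", "Hotel, Leeds", "Misc invoice 4471"]:
    print(f"{line} -> {classify(line)}")
```

The sketch matches whole words on purpose. A substring rule would file a "Current account fee" line under Facilities, because that line contains "rent".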

Data horror stories to inspire action

Data quality and data validation are often unloved topics in the world of data, but they’re crucial. Anyone who works with data will have encountered errors that had real consequences for the company or the people involved. In the energy industry, I often saw people billed by the company I worked for even though they weren’t on supply with it. It gets worse when the debt collectors are about to be unleashed.

Data errors have caused major problems, from NASA losing its $125 million Mars Climate Orbiter because of a metric conversion error, to a town in Yorkshire, England, not paying its gas bill for 17 years. The impact can range from looking a bit foolish to something extremely costly. Between the Spreadsheets shows the importance of these topics and offers practical examples of steps to ensure your data have their COAT on.

Cleaning data can be tedious, but Walsh’s practical examples guide you through a process that helps make it a little less painful. As the benefits are stated and the horror stories shared, this book will motivate you to get on with it and clean up your data.

How can you get started with your dirty data?

We can all spot errors and clean up our data with Walsh’s guidance along with our own methods and techniques. Often, it can be as simple as sorting the data, as a schoolboy found out when he corrected NASA (yes, them again). Rocket science is easy compared to data validation, apparently.
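Sorting helps because it places similar values next to each other, which makes variants of the same name or a stray entry stand out. The Python sketch below imitates that spreadsheet habit. It sorts hypothetical supplier names case-insensitively and flags adjacent entries that look alike. The supplier list, the `adjacent_near_duplicates` helper, and the 0.8 similarity threshold are assumptions made for illustration. `difflib.SequenceMatcher` comes from Python's standard library.

```python
from difflib import SequenceMatcher

def adjacent_near_duplicates(values, threshold=0.8):
    """Sort values case-insensitively and return adjacent pairs that look alike.

    The threshold is an arbitrary assumption; tune it against your own data.
    """
    ordered = sorted(values, key=str.lower)  # like sorting a spreadsheet column A to Z
    pairs = []
    for first, second in zip(ordered, ordered[1:]):
        score = SequenceMatcher(None, first.lower(), second.lower()).ratio()
        if score >= threshold:  # identical entries score 1.0 and are flagged too
            pairs.append((first, second))
    return pairs

# Hypothetical supplier names as they might appear in a spend extract.
suppliers = ["Acme Ltd", "Zenith Corp", "ACME Ltd.", "Acme Limited", "Zenith Corp", "Blue Sky plc"]
for first, second in adjacent_near_duplicates(suppliers):
    print(f"Check: {first!r} vs {second!r}")
```

Comparing only adjacent pairs keeps the check cheap. The tradeoff is that a variant which sorts far away, such as "The Acme Company", will be missed. A person still has to decide which spelling is correct.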

Throughout her book, Walsh provides guidance on how to clean up your data in Excel. She shares tips and tricks, such as how to spot common misspellings, and explains the challenges of replacing data without context.
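A short example shows why replacing data without context is risky. In the Python sketch below, a blanket replacement of the abbreviation "St" with "Street", similar in spirit to a careless Replace All, damages other words. A narrower rule expands "St" only when it is the last word of an address, and it leaves "Station" and "St Albans" untouched. The addresses and both helper functions are hypothetical. The end-of-line rule is also an assumption, and real address data could break it.

```python
import re

def naive_expand(address: str) -> str:
    # Blanket find-and-replace: every "St" becomes "Street", wherever it appears.
    return address.replace("St", "Street")

def contextual_expand(address: str) -> str:
    # Expand "St" only when it is a whole word at the end of the address,
    # where it most likely means "Street" rather than "Saint".
    return re.sub(r"\bSt$", "Street", address)

# Hypothetical address lines.
for address in ["12 High St", "4 Station Rd", "9 St Albans Rd"]:
    print(f"{address!r}: naive={naive_expand(address)!r}, contextual={contextual_expand(address)!r}")
```

Whichever rule you choose, review a sample of the changed rows before you accept them.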

Walsh also highlights the importance of regularly cleaning up your data. If quality is regularly checked, then cleanup is a relatively small task. If neglected, however, the task can become huge. The longer it’s left, the bigger the effect on the business, too. What dodgy decisions might be based on false information in your organization?

Who can benefit from reading this book?

None of the techniques and methods shared by Walsh is particularly complex. Most people working with data should have the skills to execute the advice and methods in this book. By illuminating the issues, Walsh hopefully will motivate more people to take an interest in ensuring their data are cleaned on a regular basis.

If you have limited experience in managing and maintaining data, then this book is for you. If you work in finance, procurement, or marketing, you deal with data daily but may not have the technical knowledge of a data team. For this reason, Walsh’s Excel tips are relatable and easy to implement. So almost anyone can improve the quality of their data using this book as a prompt or guide.

I’d be betraying confidences if I shared specific examples, but I too have seen big organizations make costly mistakes due to data errors. These days, it feels like all the focus on advanced analytics and data science is making such neglect even more likely. Thank goodness for the rise of DataOps as a topic. Hopefully, these techniques and a growing legion of chief data officers (CDOs) can ensure dirty data are tackled quickly and often. As leaders in data, analytics, and insight, let’s resolve to continually question the quality of our data. They are the very ground we stand on.


About The Author


Tristan Mobbs

As a true data translator, Tristan Mobbs excels at giving data meaning. Mobbs’s goal is to ensure that analytics deliver business results.