Davis Balestracci

Statistics

Watch Out for the Big Data Steamroller

The most seductive waste of all

Published: Monday, January 14, 2019 - 13:03

“People think that if you collect enormous amounts of data you are bound to get the right answer. You are not bound to get the right answer unless you are enormously smart.”
Bradley Efron

There has been an explosion in new technology for acquiring, storing, and processing data. The “big data” movement (and its resulting subindustry, data mining) is becoming more prevalent and having major effects on how quality professionals and statisticians, and everyone else, do their jobs.

Big data is a collection of data sets that are too large and complex to be processed using traditional database and data-processing tools. Any change this big will require new thinking. 

Rocco Perla, a colleague for whom I have the utmost respect, feels that the unprecedented attention now focused on analytics and data-driven decision making has also introduced a number of challenges.

It is poor practice to rely on whatever data happen to be available or to assume sophisticated analytics can overcome poor data quality. The fact that data reside in electronic files says nothing regarding the quality of the data. Observational data are teeming with reproducibility issues, especially if they result from merging many different sources.

Intuitively, as the amount of information increases, one would think that the degree of confusion decreases. But this is true only to a point because the situation eventually reverses, reaching a point where more information leads to increasing confusion—low amounts of information and high amounts of information will both lead to a high degree of confusion! There are real economic implications of information overload—including a loss of up to 25 percent of the working day for most knowledge workers.

W. Edwards Deming often made the point that information is not knowledge. The speed of today’s worldwide instant communication does not help anyone to understand the future and the obligations of management. Do we really need constant updating to cope with the rapidly changing future? It could hardly be accomplished by watching every moment of television or reading every newspaper!

Data maturity = data sanity

Perla introduces a concept he calls “data maturity” to begin to make sense of all the available data, especially with the growing sub-industry of project work inherent in many improvement philosophies. It has the following five characteristics.

1. Projects and their supporting data (e.g., reports, dashboards) are viewed as a resource expenditure that adds various costs and complexity for needed support, which should be only temporary.
No more automatic “data on demand.” Organizations that are not data mature often view requests for data as going into a magical black box that, with limited input, thought, and resources, is able to produce the desired end-product perpetually.

2. Projects don’t go on forever. Any project needs formal closure to be retired, after which its measures are also retired.
Any transition from diagnostic and testing data to the consideration of collecting data to hold any gains should be done deliberately and with thought. It is not unusual in many organizations for data, reports, and projects to reach a point where they are never or rarely looked at by the requester in the extended future. If one is not careful, new measures become additive to old measures and are allowed to continue in perpetuity and to clutter the environment.

3. All measures are operationally defined.
In data-mature organizations, operational definitions are viewed as sacred because they are the only way to ensure a shared understanding of the work to be done.

It is not splitting hairs to formally define adjectives such as “good,” “reliable,” “unsafe,” or “unemployed” so that, regardless of who measures the situation, that person would come up with the same literal number or decision, e.g., thresholds where an environment goes from “safe” (x = 0) to “unsafe” (x = 1).

Even though people may not agree on the actual definition, they will agree to use it and continue to use it as long as the ability to make the desired decision inherent in the objective of the collection remains strong. The objective here is to reduce the “human variation in perception” factor, which will contaminate any data and render them virtually useless.

Management cannot knowingly allow for idiosyncratic or capricious interpretation of a measure, especially if there is potential for distortion to meet an arbitrary goal. 

For example, is Pluto a planet (x = 1) or isn’t it (x = 0)? It depends: What is the objective for asking this question? (Never mind how you feel about it!)
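As a minimal sketch of what an operational definition can look like once it is written down, the snippet below encodes the "safe" (x = 0) versus "unsafe" (x = 1) decision as an explicit threshold rule. The class name, the noise example, and the 85 dB threshold are all hypothetical choices made for illustration, not something taken from Perla or from any standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperationalDefinition:
    """A written-down rule that turns a reading into a 0/1 decision."""
    name: str
    threshold: float
    unit: str

    def classify(self, reading: float) -> int:
        # Return 1 when the reading meets or exceeds the threshold, else 0.
        return 1 if reading >= self.threshold else 0


# Hypothetical definition: the 85 dB cutoff is an illustrative assumption.
UNSAFE_NOISE = OperationalDefinition(
    name="unsafe noise level", threshold=85.0, unit="dB"
)

readings = [72.5, 85.0, 91.3]
decisions = [UNSAFE_NOISE.classify(r) for r in readings]
print(decisions)
```

Because the rule is explicit, anyone applying it to the same readings gets the same decisions, which is the point of reducing human variation in perception. People may still disagree about whether the cutoff is the right one, but they can agree to use it.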

4. Improvement measures are clearly and explicitly linked to any changes being tested in the system—and they are collected frequently over a period of time.
The purpose of improvement is to understand, as quickly as possible, if the changes made to a system are leading to improvement and to inform the next test. Many improvement practitioners shy away from annotated run and control charts because direct causal links between changes and outcomes often are not possible.

5. The dominant form of analysis includes charts of data collected over time to determine, and react appropriately to, common and special causes of variation.
Organizations that are data mature understand the importance of examining data over time and using the charts developed by Walter Shewhart (mainly the control chart for individuals) to distinguish between special and common cause variation. These charts are based in what are called analytic statistics, and they neither resemble what is taught in most academic courses (enumerative statistics) nor involve formal hypothesis testing.
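For readers who have not seen one, here is a rough sketch of how the limits on a control chart for individuals are commonly computed: the center line is the average of the data, and the limits sit 2.66 average moving ranges on either side of it. The function names and the monthly figures are made up for illustration, and the sketch applies only the simplest signal (a point outside the limits), not the full set of run rules.

```python
from statistics import mean


def individuals_chart_limits(values):
    """Return (center, lower, upper) limits for an individuals chart."""
    if len(values) < 2:
        raise ValueError("need at least two points to form a moving range")
    center = mean(values)
    # Moving range: absolute difference between consecutive points.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    avg_mr = mean(moving_ranges)
    # 2.66 = 3 / 1.128, the usual scaling constant for two-point moving ranges.
    half_width = 2.66 * avg_mr
    return center, center - half_width, center + half_width


def special_cause_points(values):
    """Return the indexes of points that fall outside the limits."""
    _, lower, upper = individuals_chart_limits(values)
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical monthly counts, invented purely for illustration.
monthly = [52, 48, 50, 47, 53, 49, 51, 50, 48, 52, 75, 49]
center, lower, upper = individuals_chart_limits(monthly)
print(center, lower, upper)
print(special_cause_points(monthly))
```

With these invented numbers, only the eleventh value (75) falls outside the limits. The other months, despite bouncing up and down, show nothing beyond common cause variation and would not justify a month-by-month explanation.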

How does one filter all this information and make sense of it? 

In my experience, virtually without exception, these ideas begin to seriously challenge the way leaders and organizations currently think about information, data, and decision making. Environmental pressures remain constant, and technology is not going to slow down. This creates more dependence on the tremendous inertia of the status quo and a toxic blindness to the deceptive power of this different approach’s counter-intuitive simplicity. This blindness makes people especially prone to... the most seductive waste of all?

Hand-in-hand with the explosion of big data will be an industry trying to sell you solutions for analyzing it. In fact, a client of mine sent me the figure below:

“Eye Chart” not to be confused with “I-chart” (control chart for individuals)

In this case, the vendor promises highly visual, easy-to-understand information that will allow insight into not only what the level of employee engagement is, but also the level of leadership effectiveness within each department. The vendor claims its analyses are statistically robust and repeatable over many years (and I’m willing to bet that, unfortunately, most of them use this all too common—and wrong—analysis). Note how they “delight their customers” by also throwing in red, yellow, and green color-coding.

Another claim that is possibly some good news, but for an entirely different reason: This one tool is a fraction of the cost of a typical survey.

Well, at least you could save money on all those ongoing, silly customer satisfaction surveys. How many of your electronic data files are filled with those?

Data have their place, but the challenge is to maximize their ability to serve us humans with all our limitations—not the reverse.

Perla concludes that the most effective future leaders will leverage this approach to data with a vision, energy, intellect, and moral compass that come from within, not so much from a report or balanced scorecard.

Let’s stop the “data tail” from wagging the process dog, shall we?


About The Author


Davis Balestracci

Davis Balestracci is a past chair of ASQ’s statistics division. He has synthesized W. Edwards Deming’s philosophy as Deming intended—as an approach to leadership—in the second edition of Data Sanity (Medical Group Management Association, 2015), with a foreword by Donald Berwick, M.D. Shipped free or as an ebook, Data Sanity offers a new way of thinking using a common organizational language based in process and understanding variation (data sanity), applied to everyday data and management. It also integrates Balestracci’s 20 years of studying organizational psychology into an “improvement as built in” approach as opposed to most current “quality as bolt-on” programs. Balestracci would love to wake up your conferences with his dynamic style and entertaining insights into the places where process, statistics, organizational culture, and quality meet.