
Paul Laughlin


Learning From The Book of Why

Correlation vs. causality

Published: Tuesday, May 3, 2022 - 12:02

As I started reading The Book of Why: The New Science of Cause and Effect, by Judea Pearl and Dana Mackenzie (Basic Books, 2018), I was reminded how often analysts trot out the bromide “correlation is not causation.” It’s a well-known warning. Indeed, I often encourage those learning data visualization to ensure their designs don’t imply causation. But this book helped me think more deeply about it.

This is not a light read—but it is an important one. As Pearl and Mackenzie make so clear in case studies, causation is central to how we think as human beings. Most nonstatisticians are not really interested in patterns and correlations in the data. They want to know what causes an effect so they can take appropriate action.

Sadly, this book also chronicles how, for decades, the formal study of statistics has not helped that quest. Leading figures (who are critiqued in this book) declared causation unknowable, or insisted that the only answers lay in more data. Alongside the veneration of randomized controlled trials (RCTs), this mindset fed the Big Data revolution. But without a documented causal model of the real world, more data can mislead and produce false results. When you only have a hammer, you start to see nails everywhere.

A ladder to guide our understanding of causation

But I am getting ahead of myself. Let me first share an overview of what to expect in this book. Then I will go on to explain some of the helpful frameworks and approaches Pearl and Mackenzie outline. After a helpful scene-setter on background and why this subject matters, the authors share a ladder as the chief framework. It is called the ladder of causation, and it makes clear the difference between three levels of understanding causation:
1. Seeing = Association (i.e., correlation: What if I see...? What does a survey tell us about election results?)
2. Doing = Intervention (i.e., experiments: What if I do...? What if we ban cigarettes?)
3. Imagining = Counterfactuals (i.e., understanding possibilities: What if I had done...? Why? What if I had never smoked?)

Image: the ladder of causation, from The Book of Why: The New Science of Cause and Effect, by Judea Pearl and Dana Mackenzie

Building on that fundamental structure, the authors go on to share the history of both progress and setbacks. New heroes and villains emerge in the struggle to make progress within the statistics mainstream, including the important contributions of Thomas Bayes and of Bayesian networks. Along the way, and beyond the history lesson, the authors explain what is needed for each rung of the ladder and emphasize the importance of confounding variables, paradoxes, designing interventions, and modeling counterfactuals.

This critical textbook for understanding causal science finishes by considering the relevance of this progress to AI and Big Data. As I considered when reviewing Rebooting AI: Building Artificial Intelligence We Can Trust (Gary Marcus and Ernest Davis; Pantheon, 2019), we are still a long way from truly helpful general intelligence: robots that perceive what they do and why. The book explores why the answers lie more in causal science than in ever more Big Data.

Models to help analysts think deeper about the real world

As I mentioned before, this is not an easy read. That is not because it is badly written. Not at all. But it’s not an easy topic.

This book will challenge you to think. It’s a chance to go “back to school” and learn a science that should be as commonly taught as other branches of math and statistics. This review cannot do all that material justice, but here are a few tools and approaches I recommend analysts explore and learn.

Early on, we are introduced to causal models: diagrams with arrows that capture our understanding of known or possible causal relationships. Like analysis informed by real-world domain knowledge, this makes all the difference. Time and again, the authors show how such a visual graph can guide both the approach and the mathematics needed. Building on that, the reader (with sufficient math background) learns the do-calculus: a mathematical language for expressing causal models and working out what happens if you take certain actions.
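To make the idea concrete, here is a minimal sketch of the kind of reasoning a causal diagram enables. All the variable names and probabilities below are my own invented illustration, not an example from the book: a confounder Z causes both a treatment T and an outcome Y, so the naive difference in outcome rates between treated and untreated is inflated by the backdoor path T ← Z → Y. Adjusting for Z (the simplest case of the backdoor adjustment that do-calculus formalizes) recovers the true effect.

```python
import random

random.seed(0)

# Hypothetical causal model (illustrative numbers, not from the book):
#   Z (confounder) raises both the chance of treatment T and of outcome Y.
#   The true causal effect of T on Y is +0.2.
n = 100_000
data = []
for _ in range(n):
    z = random.random() < 0.5                          # confounder
    t = random.random() < (0.8 if z else 0.2)          # Z makes T more likely
    y = random.random() < (0.1 + 0.2 * t + 0.5 * z)    # T adds 0.2, Z adds 0.5
    data.append((z, t, y))

def p_y(rows):
    """Fraction of rows with outcome Y = 1."""
    return sum(y for _, _, y in rows) / len(rows)

# Naive comparison: mixes the causal path with the backdoor path, so it is biased.
treated = [r for r in data if r[1]]
control = [r for r in data if not r[1]]
naive = p_y(treated) - p_y(control)

# Backdoor adjustment: compare within each stratum of Z, then average over P(Z).
adjusted = 0.0
for z_val in (True, False):
    stratum = [r for r in data if r[0] == z_val]
    tz = [r for r in stratum if r[1]]
    cz = [r for r in stratum if not r[1]]
    adjusted += (p_y(tz) - p_y(cz)) * len(stratum) / n

print(f"naive difference:   {naive:.3f}")     # inflated, near 0.5
print(f"adjusted estimate:  {adjusted:.3f}")  # near the true effect, 0.2
```

The point is not the arithmetic but the diagram: only because we assumed Z → T and Z → Y do we know that conditioning on Z is the right adjustment. With a different graph, the same conditioning could introduce bias rather than remove it.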

There are two further topics I would recommend analysts and statisticians learn from this book, although all of the above is powerful enough by itself. Current approaches can be transformed even more by climbing to the top of the ladder of causation, together with being aware of what can go wrong. Statisticians will find the sections on counterfactuals informative and, at times, philosophical. It’s amazing to reflect on how easily a human child can imagine what might have happened had they acted differently, and yet how long that thought process lacked a mathematics to represent it. Lastly, I would direct all analysts to read the explanations and mitigations for Simpson’s paradox, the Monty Hall paradox, and Berkson’s paradox.
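Simpson’s paradox in particular can be shown in a few lines. The admissions-style numbers below are my own invented illustration (not figures from the book): group A is admitted at a higher rate than group B within every department, yet at a lower rate overall, because A applied mostly to the harder department. Which comparison answers your question depends on the causal model, which is exactly the book’s point.

```python
# Hypothetical (admitted, applied) counts per department; illustrative only.
dept_easy = {"A": (15, 20), "B": (60, 90)}   # A: 75%  vs B: ~67%
dept_hard = {"A": (20, 80), "B": (2, 10)}    # A: 25%  vs B: 20%

def rate(admitted, applied):
    return admitted / applied

# Within each department, group A has the higher admission rate.
a_wins_easy = rate(*dept_easy["A"]) > rate(*dept_easy["B"])
a_wins_hard = rate(*dept_hard["A"]) > rate(*dept_hard["B"])

# In aggregate, the comparison reverses: A loses.
a_total = rate(15 + 20, 20 + 80)   # 35/100 = 35%
b_total = rate(60 + 2, 90 + 10)    # 62/100 = 62%

print(a_wins_easy, a_wins_hard)    # True True
print(a_total < b_total)           # True: the aggregate trend reverses
```

No amount of extra data resolves the reversal; only a causal model (is department a confounder or a mediator?) tells you which of the two answers is the right one to act on.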

Could you benefit from reading the book?

Despite my warnings that this book will require you to think and flex your math muscles, I recommend it. Indeed, this book makes a good case for all data scientists to study causal science. It also shows why this level of understanding will help analysts. I hope you take up my recommendation and can’t wait to see a lot more causal diagrams.

What would you have analyzed differently if you knew how to answer the simple question, Why?

First published April 13, 2022, on the Customer Insight Leader blog.


About The Author


Paul Laughlin

Paul Laughlin is a speaker, writer, blogger, CustomerInsightLeader.com enthusiast, and the founder and managing director of Laughlin Consultancy.