Statistics Article

By: Danielle Underferth

As municipalities clamor for a slice of President Biden’s $1.2 trillion infrastructure spending bill, one Johns Hopkins scientist is re-examining one of the basic elements of road-building: Determining the width of road lanes. But determining the width that provides the highest level of safety, access, and comfort for every road user—drivers, cyclists, and pedestrians—is complex, says Shima Hamidi, an assistant professor in Johns Hopkins’ Department of Environmental Health and Engineering, which is shared by the Whiting School of Engineering and the Bloomberg School of Public Health.

It’s a data problem, she says, and she wants to help cities solve it.

Hamidi is undertaking a massive collection of data on urban streets across the United States to answer one question: How low can cities go on street width to make room for bike lanes and wider sidewalks?

By: Tristan Mobbs

All too often, the topic of fixing dirty data is neglected amid the plethora of online media covering artificial intelligence (AI), data science, and analytics. This neglect is a mistake for many reasons.

To highlight just one, confidence in the quality of data is the vital foundation of all analysis. This topic remains relevant for all levels of complexity, from spreadsheets to complex machine-learning models.

So, I was delighted to review Susan Walsh’s book, Between the Spreadsheets: Classifying and Fixing Dirty Data (Facet Publishing, 2021). Here are some highlights from her book, and my own advice on who should read it.

By: Atul Minocha

Do you ever feel like you’re spending money like crazy on marketing and getting little or nothing in return? If so, you might be tempted to pull the plug on marketing altogether. That would be a big mistake.

An effective marketing strategy can mean the difference between your organization’s success and failure. To maximize your strategy, there are eight common marketing mistakes you should avoid at all costs.

#1 Focusing solely on data

Most marketers firmly believe the old saying, “What doesn’t get measured doesn’t get improved.” They track various metrics, hoping the data will show them how to improve customer engagement.

The problem is some of the most important elements of customer engagement—like emotional response—can’t be tracked easily. How do you measure whether or not you’re tugging at their heartstrings?

The real power of marketing comes from synergy of both the left brain (data) and the right brain (emotion). Focusing solely on the data will never lead to optimal results.

By: Donald J. Wheeler

Most of the world’s data are obtained as byproducts of operations. These observational data track what happens over time and have a structure that requires a different approach to analysis than that used for experimental data. An understanding of this approach will reveal how Shewhart’s generic, three-sigma limits are sufficient to define economic operation for all types of observational data.

Management requires prediction, yet all data are historical. To use historical data to make predictions, we will have to use some sort of extrapolation. We might extrapolate from the product we have measured to a product not measured, or we might even extrapolate from the product measured to a product not yet made. Either way, the problem of prediction requires that we know when these extrapolations are reasonable and when they are not.

The structure of observational data

Before we talk about prediction, we need to consider the structure of observational data. For any one product characteristic we can usually list dozens, or even hundreds, of cause-and-effect relationships that affect that characteristic. Some of these causes will have larger effects than others. So, if we had perfect knowledge, we could arrange the causes in order according to their effects to obtain a Pareto chart like the one shown in Figure 1.
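Wheeler's generic three-sigma limits are most often computed with an individuals (XmR) chart. The sketch below illustrates the standard calculation on made-up observational data; the values and the variable names are illustrative, not taken from the article.

```python
# Minimal sketch of Shewhart-style three-sigma limits for an individuals
# (XmR) chart. The data values below are invented for illustration.
from statistics import mean

def xmr_limits(values):
    """Return (LCL, center, UCL) for an individuals chart.

    Limits are X-bar +/- 2.66 * average moving range, where
    2.66 = 3/d2 and d2 = 1.128 for moving ranges of size 2.
    """
    center = mean(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = mean(moving_ranges)
    return center - 2.66 * mr_bar, center, center + 2.66 * mr_bar

data = [10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.0, 9.7]
lcl, center, ucl = xmr_limits(data)
print(f"LCL={lcl:.2f}  center={center:.2f}  UCL={ucl:.2f}")
```

Points outside these limits signal causes large enough to be worth investigating; points inside are consistent with routine variation.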

By: Anthony D. Burns

I’m a chemical engineer. The fundamentals of the chemical engineering profession were laid down 150 years ago by Osborne Reynolds. Although chemical engineering has seen many advances, such as digital process control and evolutionary process optimization, every engineer understands and uses Reynolds’ work. Most people have heard of the Reynolds number, which plays a key role in calculating the flow of gases and liquids. There are no fads. Engineers use the fundamentals of the profession.
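The Reynolds number mentioned above is simply Re = ρvD/μ. The sketch below computes it for pipe flow; the fluid properties are approximate values for water at 20°C, used only as an example.

```python
# Illustration of the Reynolds number: Re = rho * v * D / mu.
# Property values below are approximate (water at 20 C), for example only.
def reynolds_number(density, velocity, diameter, viscosity):
    """Dimensionless Reynolds number rho*v*D/mu."""
    return density * velocity * diameter / viscosity

rho = 998.0    # density, kg/m^3 (water)
mu = 1.0e-3    # dynamic viscosity, Pa*s
v = 1.5        # mean velocity, m/s
D = 0.05       # pipe diameter, m

Re = reynolds_number(rho, v, D, mu)
# Pipe flow is typically turbulent above Re ~ 4000
print(f"Re = {Re:.0f}, {'turbulent' if Re > 4000 else 'laminar or transitional'}")
```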

Fads, fads, fads

By contrast, in the past 70 years, “quality” has seen more than 20 fads. The fundamentals have been forgotten and corrupted. Quality has been lost. Quality managers engage in an endless pursuit of magic pudding that will fix all their problems.

Alarmingly, the latest “quality” fad, Agile, has nothing to do with quality. It’s a software development fad that evolved from James Martin’s rapid application development (RAD) fad of the 1980s. This in turn grew into the rapid iterative processing (RIP) fad. When it comes to quality today, anything will do, no matter how unrelated.

By: W. Edwards Deming

Editor’s note: The following is from a transcript of a forgotten speech given in Tokyo in 1978 by W. Edwards Deming for the Union of Japanese Scientists and Engineers (JUSE). Because the original was a poor photocopy, there are small portions of text that could not be transcribed. Transcript courtesy of Mike McLean.

The spectacular leap in quality of most Japanese manufactured products, from third-rate to top quality and dependability, with astounding economy in production, started off in 1950 with a meteoric flash, and still continues. The whole world knows about Japanese quality and the sudden surge upward that began in 1950, but few people have any idea how it happened.

It seems worthwhile to collect in one place the statistical principles of administration that made possible the revolution of quality in Japan, as even at this date, most of these principles are not generally understood or practiced in America. It is for this reason that the title speaks of new principles.

The relative importance of some of the principles explained here has, of course, changed over the years since 1950. Some principles stated here have emerged as corollaries of earlier principles. Other corollaries could be added, almost without end.

By: William A. Levinson

Part one of this article showed that it is possible, by means of a Visual Basic for Applications program in Microsoft Excel, to calculate the fraction of in-specification product that is rejected by a non-capable gage, as well as the fraction of nonconforming product that is accepted. This calculation requires only 1) the process performance metrics, including the parameters of the distribution of the critical to quality characteristic, which need not be normal; and 2) the gage variation as assessed by measurement systems analysis (MSA).

Part two of the series shows how to optimize the acceptance limits to either minimize the cost of wrong decisions, or assure the customer that it will receive no more than a specified fraction of nonconforming work.
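The two error fractions described above can also be estimated by simulation. The rough Monte Carlo sketch below is not the article's Excel/VBA program; the specification limits, process distribution, and gage variation are illustrative assumptions, and the process is taken as normal here even though the article's method does not require normality.

```python
# Rough Monte Carlo sketch (illustrative, not the article's VBA program)
# of the two misclassification rates: in-spec product rejected, and
# nonconforming product accepted, given gage variation from MSA.
import random

random.seed(1)
LSL, USL = 9.0, 11.0               # specification limits (assumed)
proc_mu, proc_sigma = 10.0, 0.4    # process distribution (assumed normal)
gage_sigma = 0.15                  # gage variation from MSA (assumed)

N = 200_000
false_reject = false_accept = in_spec = out_spec = 0
for _ in range(N):
    true_value = random.gauss(proc_mu, proc_sigma)
    measured = true_value + random.gauss(0.0, gage_sigma)
    truly_good = LSL <= true_value <= USL
    accepted = LSL <= measured <= USL
    if truly_good:
        in_spec += 1
        if not accepted:
            false_reject += 1
    else:
        out_spec += 1
        if accepted:
            false_accept += 1

print(f"in-spec product rejected:  {false_reject / in_spec:.3%}")
print(f"nonconforming accepted:    {false_accept / out_spec:.3%}")
```

Shifting the acceptance limits inward or outward in such a simulation changes the balance between the two error rates, which is the trade-off the optimization addresses.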

By: William A. Levinson

IATF 16949:2016 clause 7.1.5.1.1 requires measurement systems analysis (MSA) to quantify gage and instrument variation. The deliverables of the generally accepted procedure are the repeatability or equipment variation, and the reproducibility or appraiser variation. The Automotive Industry Action Group1 adds an analytic process with which to quantify the equipment variation (repeatability) of go/no-go gages if these come in specified dimensions, or can be adjusted to selected dimensions.

The anvils of a snap gage can, for example, be set accurately to specified dimensions with Johansson gage blocks. Pin gages (also known as plug gages), on the other hand, come in small but discrete increments. If the precision to tolerance (P/T) ratio is greater than the generally accepted target, the gage cannot distinguish reliably between good and nonconforming product near the specification limits. This means nonconforming work will reach internal or external customers, while good items will be rejected, as shown in Figure 1 below.
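The P/T ratio itself is a one-line calculation. The sketch below uses the common convention P/T = 6σ_gage / (USL − LSL); the numbers are illustrative, and a frequently cited acceptance target is P/T ≤ 0.10 (with up to 0.30 sometimes tolerated).

```python
# Sketch of the precision-to-tolerance (P/T) ratio. The multiplier k = 6
# is one common convention (5.15 is another); numbers are illustrative.
def pt_ratio(gage_sigma, lsl, usl, k=6.0):
    """P/T = k * sigma_gage / (USL - LSL)."""
    return k * gage_sigma / (usl - lsl)

ratio = pt_ratio(gage_sigma=0.02, lsl=9.0, usl=11.0)
print(f"P/T = {ratio:.2f}")   # 0.06 here: adequate by the 10% rule
```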

By: Saligrama Agnihothri

Health-tracking devices and apps are becoming part of everyday life. More than 300,000 mobile phone applications claim to help with managing diverse personal health issues, from monitoring blood glucose levels to conceiving a child.

But so far the potential for health-tracking apps to improve healthcare has barely been tapped. Although they allow a user to collect and record personal health data, and sometimes even share it with friends and family, these apps typically don’t connect that information to a patient’s digital medical chart, or make it easier for healthcare providers to monitor or share feedback with their patients.

By: William A. Levinson

The first part of this series introduced measurement systems analysis for attribute data, or attribute agreement analysis. AIAG1 provides a comprehensive overview, and Jd Marhevko2 has done an outstanding job of extending it to judgment inspections as well as go/no-go gages. Part two will cover the analytical method, which allows more detailed quantification of the gage standard deviation and also bias, if any, with the aid of parts that can be measured in terms of real numbers.

Part one laid out the procedure for data collection as well as the signal detection approach, which identifies and quantifies the zone around the specification limits where inspectors and gages will not obtain consistent results. The signal detection approach can also deliver a rough estimate of the gage’s repeatability or equipment variation. Go/no-go gages that can be purchased in specific dimensions, or set to specific dimensions (e.g., with gage blocks) do indeed have gage standard deviations even though they return pass/fail results.
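The idea behind the analytic method can be sketched briefly: an attribute gage checks reference parts of known size many times, and the acceptance fraction at each size traces out a normal CDF whose spread is the gage standard deviation. Fitting a line to the probit-transformed acceptance fractions recovers that spread. All numbers below are invented for illustration; this is a sketch of the general probit idea, not AIAG's exact worked procedure.

```python
# Rough sketch of the analytic-method idea for an attribute gage:
# probit-transform the acceptance fractions at known part sizes, then
# fit a line z = (x - mu)/sigma to estimate the gage standard deviation.
# All data below are invented for illustration.
from statistics import NormalDist

# (reference part size, fraction of trials in which the gage accepted it)
trials = [(9.95, 0.05), (9.98, 0.20), (10.00, 0.50),
          (10.02, 0.80), (10.05, 0.95)]

xs = [size for size, _ in trials]
zs = [NormalDist().inv_cdf(p) for _, p in trials]   # probit transform

# Least-squares slope of z on x; slope = 1/sigma_gage
n = len(xs)
x_bar, z_bar = sum(xs) / n, sum(zs) / n
slope = (sum((x - x_bar) * (z - z_bar) for x, z in zip(xs, zs))
         / sum((x - x_bar) ** 2 for x in xs))
gage_sigma = 1.0 / slope
threshold = x_bar - z_bar / slope   # size at 50% acceptance (bias point)
print(f"estimated gage sigma = {gage_sigma:.4f}, 50% point = {threshold:.4f}")
```

Comparing the 50 percent acceptance point to the nominal gage dimension gives an estimate of bias, and the fitted sigma quantifies the repeatability that a pass/fail gage would otherwise conceal.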