Statistics Article

By: Tom Siegfried, Knowable Magazine

If Fyodor Dostoyevsky had been a mathematician, he might have written a book called Crime and Statistics. However, since “statistics” doesn’t have quite the same ring as “punishment,” it wouldn’t have sold as well.

But such a book would make a better guide for formulating crime-fighting policy. Analyzing criminal behavior scientifically, using proper statistical methods, could help criminologists better understand crime and what to do about it.

“The field needs broader and deeper scientific examination,” writes statistician-criminologist Greg Ridgeway in an upcoming Annual Review of Statistics and Its Application.

By: David Currie

This is part three of a three-part series. Read about good metrics in part one and bad metrics in part two.

Have you ever had occasion to dread a metric reviewed month after month, where the metric defies logic, and no action taken seems to be reflected in it? It is most likely a bad metric in so many respects that it has turned ugly. Let’s look at a sample ugly metric.

By: Quality Digest

Annalise Suzuki, director of technology and engagement at software provider Elysium Inc., spoke to Quality Digest about the importance of model-based definitions (MBD) for data quality, validation, and engineering change management. With the increase of digital 3D models in the manufacturing workflow, companies are appreciating their value for speeding product development, improving quality and performance, and allowing for greater automation. Here, Suzuki answers seven questions about the current and future role of the model-based enterprise (MBE) in industry.

By: Anthony Chirico

Everybody wants to design and conduct a great experiment! To find enlightenment by the discovery of the big red X and perhaps a few smaller pink x’s along the way. Thoughtful selection of the best experiment factors, the right levels, the most efficient design, the best plan for randomization, and creative ways to quantify the response variable consume our thoughts and imagination. The list of considerations and trade-offs is quite impressive. Then, finally, after optimizing all these considerations, and successfully running the experiment, and then performing the analysis... there is the question of “statistical significance.” Can we claim victory and success?

The answer lies in part in the critical value provided by a table of critical values or by a computer program. If our calculated test statistic exceeds the critical value, we will reject the null hypothesis and claim there is a difference among the treatment averages. If our calculated test statistic does not exceed the critical value, we will fail to reject the null hypothesis. This is the moment of truth at the end of all our hard work. This is a moment of anticipation and excitement.
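The decision rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three treatment groups are invented numbers, the test is assumed to be a one-way analysis of variance F-test, and the critical value 3.89 is the usual table entry for a 5-percent significance level with 2 and 12 degrees of freedom.

```python
from statistics import mean

def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way layout."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    # Variation of the treatment averages around the grand average
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own treatment average
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical responses for three treatments (not from the article)
treatments = [
    [20.1, 21.3, 19.8, 20.7, 20.4],
    [22.0, 23.1, 21.7, 22.6, 22.3],
    [20.5, 20.9, 21.2, 20.2, 20.8],
]

# Table value for alpha = 0.05 with (2, 12) degrees of freedom
F_CRITICAL = 3.89

f_stat, df_between, df_within = one_way_f(treatments)
reject_null = f_stat > F_CRITICAL
print(f"F = {f_stat:.2f} on ({df_between}, {df_within}) df; reject null: {reject_null}")
```

With a different number of treatments or observations, the degrees of freedom change and a different critical value has to be looked up.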

By: Scott A. Hindle

Walter Shewhart, father of statistical process control and creator of the control chart, put a premium on the time order sequence of data. Since many statistics and graphs are unaffected by this, you might wonder what the fuss is about. Read on to see why.

Figure 1 shows a series of measurements over 11 months. Each measurement value is from one production batch, with the date of each production given. The date is formatted as day first, and month second, meaning that “06.01”—the first measurement of 69.4—is from January 6.


Figure 1: Measurement data in time order of production.
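To illustrate why time order matters, here is a small Python sketch. The values are invented (they are not the Figure 1 data), and the average moving range is used as one common example of an order-sensitive statistic. The same ten values are arranged two ways: the average and standard deviation do not change, but the average moving range does.

```python
import math
from statistics import mean, stdev

def average_moving_range(values):
    """Mean absolute difference between consecutive values."""
    return mean(abs(b - a) for a, b in zip(values, values[1:]))

# Hypothetical measurements in production order (a steady upward drift)
in_order = [69.4, 69.8, 70.1, 70.5, 70.9, 71.2, 71.8, 72.1, 72.6, 73.0]
# The same ten values with the time order scrambled
scrambled = [71.2, 69.4, 73.0, 70.5, 72.1, 69.8, 72.6, 70.1, 71.8, 70.9]
assert sorted(in_order) == sorted(scrambled)

# Order-insensitive statistics agree for both arrangements
same_mean = math.isclose(mean(in_order), mean(scrambled))
same_stdev = math.isclose(stdev(in_order), stdev(scrambled))

# The order-sensitive statistic does not
amr_in_order = average_moving_range(in_order)
amr_scrambled = average_moving_range(scrambled)

print(same_mean, same_stdev)
print(round(amr_in_order, 2), round(amr_scrambled, 2))
```

A histogram or a global standard deviation would treat both arrangements as identical, which is why they cannot reveal a drift that a time-ordered chart shows at a glance.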

By: Minitab LLC

Machine learning as a tool in your analytical toolkit can help accelerate the discovery of insights in data that can create a more efficient manufacturing process and drive innovation.

Machine learning in the spotlight

The availability of technologies that let us monitor, collect, exchange, analyze, and deliver information will only continue to expand. With this growing network of devices creating a loop between the physical and digital worlds, we now have access to high volumes of data about manufacturing operations like never before. Leveraging these data to drive actionable insights will be the key to prioritizing improvements and driving innovation for overall competitiveness.


By: Minitab LLC

Process validation is vital to the success of companies that manufacture pharmaceutical drugs, vaccines, test kits, and a variety of other biological products for people and animals. According to FDA guidelines, process validation is “the collection and evaluation of data, from the process design stage through commercial production, which establishes scientific evidence that a process is capable of consistently delivering a quality product.”

The FDA recommends three stages for process validation. Let’s explore the stage goals and the types of activities and statistical techniques typically conducted within each. You can use Minitab Statistical Software to run any of the analyses here. If you don’t yet have Minitab, try it free for 30 days.

Stage 1: Process design

Goal: Design a process suitable for routine commercial manufacturing that can consistently deliver a product that meets its quality attributes.

It is important to demonstrate an understanding of the process and characterize how it responds to various inputs within process design.

By: David Currie

This is the second article in a three-part series to help readers distinguish good metrics from bad. In part one we discussed good metrics. Here, we will look at a bad metric and consider how to change it into a useful, good metric. A bad metric is one that fails in one or more of the attributes of a good metric and is often not usable for the purpose it was intended.

Attributes of a good metric

A good metric:
• Supports the goals and objectives of the quality system
• Contains data with sufficient detail to allow analysis of specific defects
• Contains data that have been carefully collected, and checked for accuracy and completeness
• Contains data that are combined in a way that clearly represents the process
• Uses a data-collection process that is clearly understood
• Demonstrates a clear relationship between the process and the data being used
• Has a metric-review interval that matches the response time for corrections
• Results in process improvement and overall cost savings

By: Anthony Chirico

Perhaps the reader recognizes d2 as slang for “designated driver,” but quality professionals will recognize it as a control chart constant used to estimate short-term variation of a process. The basic formula, σ̂ = R̄ / d2 (the average range of small samples divided by d2), is widely used in control charting for estimating the short-term variation. But what exactly is d2 and why should we care?

L.H.C. Tippett

To find some answers to this question, we need to consult the 1925 work of L.H.C. Tippett [1]. Leonard Henry Caleb Tippett was a student of both Professor K. Pearson and Sir Ronald A. Fisher in England. Tippett pioneered “Extreme Value Theory,” and while advancing the ideas of Pearson’s 1902 paper on Galton’s Difference Problem [2], he noted that the prior work on understanding the distribution of the range for a large number of samples was deficient.

Tippett proceeded to use calculus and hand calculations to integrate and determine the first, second, third, and fourth moments of the range for samples drawn from a standard normal distribution. That is, he calculated the mean, variance, skewness, and kurtosis for sample sizes of two through 1,000 by hand.
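The first of those moments, the mean range of samples from a standard normal distribution, is the quantity tabulated as d2. What Tippett integrated by hand can be approximated today by simulation. The Python sketch below is a Monte Carlo illustration, not Tippett's method; the function names, trial count, and seed are arbitrary choices made for this example.

```python
import random

def simulate_d2(n, trials=50_000, seed=0):
    """Approximate d2 as the average range of standard normal samples of size n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        total += max(sample) - min(sample)
    return total / trials

def sigma_from_ranges(subgroups, d2):
    """Estimate short-term sigma as (average subgroup range) / d2."""
    ranges = [max(g) - min(g) for g in subgroups]
    return (sum(ranges) / len(ranges)) / d2

if __name__ == "__main__":
    for n in range(2, 6):
        print(n, round(simulate_d2(n), 3))
```

The simulated values should land close to, though not exactly on, the commonly tabulated constants of 1.128, 1.693, 2.059, and 2.326 for sample sizes two through five.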

By: Minitab LLC

Choosing the correct linear regression model can be difficult. Trying to model it with only a sample doesn’t make it any easier. Let’s review some common statistical methods for selecting models and the complications you may face, and then look at some practical advice for choosing the best regression model.

It starts when a researcher wants to mathematically describe the relationship between some predictors and the response variable. The research team tasked to investigate typically measures many variables but includes only some of them in the model. The analysts try to eliminate the variables that are not related and include only those with a true relationship. Along the way, the analysts consider many possible models.

They strive to achieve a Goldilocks balance with the number of predictors they include.
• Too few: An underspecified model tends to produce biased estimates.
• Too many: An overspecified model tends to have less precise estimates.
• Just right: A model with the correct terms has no bias and the most precise estimates.
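The "too few" case can be made concrete with a small simulation. The Python sketch below is illustrative only: the data are generated from an assumed true model y = 2 + 1·x1 + 1·x2 + noise, with x2 correlated with x1, and the coefficients, sample size, and seed are arbitrary. Leaving x2 out pushes the estimated x1 coefficient toward about 1.8 instead of 1. The "too many" case is not shown.

```python
import random

def centered_sums(a, b):
    """Sum of cross-products of a and b around their means."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b))

def simple_slope(x, y):
    """Least-squares slope of y on a single predictor x."""
    return centered_sums(x, y) / centered_sums(x, x)

def two_predictor_slopes(x1, x2, y):
    """Least-squares slopes of y on x1 and x2 (normal equations, 2x2 case)."""
    s11, s22, s12 = centered_sums(x1, x1), centered_sums(x2, x2), centered_sums(x1, x2)
    s1y, s2y = centered_sums(x1, y), centered_sums(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

# Simulated data from an assumed true model (hypothetical values)
rng = random.Random(42)
n = 5000
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [0.8 * a + rng.gauss(0, 0.6) for a in x1]  # x2 is correlated with x1
y = [2 + 1.0 * a + 1.0 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]

slope_underspecified = simple_slope(x1, y)       # x2 omitted
b1_full, b2_full = two_predictor_slopes(x1, x2, y)

print(round(slope_underspecified, 2), round(b1_full, 2), round(b2_full, 2))
```

The underspecified slope absorbs the effect of the omitted predictor, so collecting more data does not remove the bias; only adding the missing term does.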
