Bruno Scibilia

Customer Care

Creating Value From Your Data

A simple example of the benefits of analyzing an enormous database

Published: Tuesday, September 27, 2016 - 14:37

Huge potential benefits may be waiting in the data on your servers, and these data can serve many different purposes. Better data allow better decisions, of course. For instance, banks, insurance firms, and telecom companies already own large amounts of data about their customers, and these resources are useful for building a more personal relationship with each customer.

Some organizations already use data from agricultural fields to build complex and customized models based on an extensive number of input variables (e.g., soil characteristics, weather, and plant types) in order to improve crop yields. Airline companies and large hotel chains use dynamic pricing models to improve their yield management. Data are increasingly being referred to as the new “gold mine” of the 21st century.

A few factors underlie the rising prominence of data (and, therefore, data analysis).

Huge volumes of data

Data acquisition has never been easier, thanks to sensors in manufacturing plants and connected objects, as well as data from internet use and web clicks, from credit cards, loyalty cards, and customer relationship management (CRM) databases, and from satellite images, to name just a few examples. Data can also be stored at costs that are lower than ever before; huge storage capacity is now available on the cloud and elsewhere. The amount of data being collected is not only huge, it is growing exponentially.

Unprecedented velocity

Connected devices, like our smartphones, provide data in almost real time, and these data can be processed quickly. It’s now possible to react to any change, almost immediately.

Incredible variety

The data collected aren’t restricted to billing information; every source of data is potentially valuable for a business. Companies now collect not only massive amounts of numeric data, but also unstructured data such as videos and pictures, from a wide variety of situations.

But the explosion of data available to us is prompting every business to wrestle with an extremely complicated problem:

How can we create value from these resources?

Simple methods, such as counting the words used in queries submitted to company websites, do provide insight into the general mood and concerns of your customers. Web vendors often use simple statistical correlations to suggest a related purchase just after a customer buys a product online. Very simple descriptive statistics are also useful.
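To make the word-counting idea concrete, here is a minimal sketch in Python using only the standard library. The sample queries are entirely hypothetical, invented for illustration:

```python
from collections import Counter
import re

# Hypothetical customer queries (illustrative data only)
queries = [
    "my flight was delayed again",
    "refund for delayed flight",
    "how to change a booking",
    "flight delayed two hours, want refund",
]

# Tokenize all queries into lowercase words
words = re.findall(r"[a-z']+", " ".join(queries).lower())

# Count word frequencies; the most frequent terms hint at
# what customers currently care about
counts = Counter(words)
print(counts.most_common(3))
```

Even this trivial count immediately surfaces “flight,” “delayed,” and “refund” as the dominant themes, before any modeling is attempted.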

Imagine what could be achieved from advanced regression models or powerful statistical multivariate techniques, which can be applied easily with statistical software packages like Minitab.

Let’s consider an example of how one company benefited from analyzing a very large database.

In the airline industry, many steps (such as security and safety checks, and cleaning the cabin) are needed before a plane can depart. Because delays negatively affect customer perceptions and productivity, airline companies routinely collect a very large amount of data related to flight delays and times required to perform tasks before departure. Sometimes this information is automatically collected, and sometimes it’s manually recorded.

A major airline company intended to use these data to identify the crucial milestones among a very large number of preparation steps, and to determine which ones often triggered delays in departure times. The company used Minitab’s stepwise regression analysis to quickly focus on the few variables that played a major role among a large number of potential inputs. Many variables turned out to be statistically significant, but two among them clearly made a major contribution (see X6 and X10 below).
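The same forward-selection idea (a close cousin of Minitab’s stepwise regression) can be sketched in Python with scikit-learn’s SequentialFeatureSelector. The data below are synthetic, constructed so that two of twelve candidate milestones truly drive the delay, mirroring the X6/X10 situation described in the article:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.feature_selection import SequentialFeatureSelector

rng = np.random.default_rng(0)

# Synthetic stand-in for the airline data: 12 candidate milestone
# durations X1..X12; only X6 and X10 actually drive the delay y
X = rng.normal(size=(500, 12))
y = 5.0 * X[:, 5] + 4.0 * X[:, 9] + rng.normal(scale=0.5, size=500)

# Forward selection: greedily add the predictors that most improve the fit
selector = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=2, direction="forward"
)
selector.fit(X, y)

picked = [f"X{i + 1}" for i in np.flatnonzero(selector.get_support())]
print(picked)
```

On this synthetic data the selector recovers X6 and X10, the two planted drivers, while ignoring the ten noise variables.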

When huge databases are used, statistical analyses can become overly sensitive and detect even very small differences due to the large sample and power of the analysis. P values often tend to be quite small (p < 0.05) for a large number of predictors.
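This sensitivity is easy to demonstrate. In the hypothetical example below, two groups differ by a practically negligible 0.05 standard deviations, yet with 100,000 observations per group the t-test still reports an extremely small p-value:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Two groups whose means differ by only 0.05 standard deviations,
# a difference with no practical importance
a = rng.normal(loc=0.00, scale=1.0, size=100_000)
b = rng.normal(loc=0.05, scale=1.0, size=100_000)

# With n = 100,000 per group, even this tiny effect is "significant"
t, p = stats.ttest_ind(a, b)
print(f"p = {p:.2e}")
```

This is why, with very large samples, it pays to look beyond p-values at each variable’s actual contribution or effect size, as the airline did next.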

However, in Minitab, if you click on Results in the regression dialog box and select Expanded tables, the contribution from each variable will be displayed. When considered together, X6 and X10 accounted for more than 80 percent of the overall variability (with the largest F values by far). The contributions from the remaining factors were much smaller. The airline then ran a residual analysis to cross-validate the final model.

In addition, a multivariate technique called principal component analysis (PCA) was performed in Minitab to describe the relations between the most important predictors and the response. Milestones were expected to be strongly correlated with the subsequent steps.

The graph above is a loading plot from the principal component analysis. Lines that point in the same direction and lie close to one another indicate variables that are strongly correlated with each other and can therefore be grouped together.

A group of nine variables turned out to be strongly correlated to the most important inputs (X6 and X10) and to the final delay times (Y). Delays at the X6 stage obviously affected the X7 and X8 stages (subsequent operations), and delays from X10 affected the subsequent X11 and X12 operations.
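The grouping that a loading plot reveals can be sketched in Python with scikit-learn’s PCA. The data below are synthetic and hypothetical: one milestone (standing in for X6) propagates its delays into the two subsequent steps, so all three load heavily on the first principal component, while an unrelated milestone does not:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
n = 400

# Hypothetical milestone durations: x7 and x8 inherit delays from x6,
# so the three variables form one correlated group
x6 = rng.normal(size=n)
x7 = x6 + rng.normal(scale=0.2, size=n)
x8 = x6 + rng.normal(scale=0.2, size=n)
other = rng.normal(size=n)  # an unrelated milestone

data = np.column_stack([x6, x7, x8, other])
pca = PCA(n_components=2).fit(StandardScaler().fit_transform(data))

# First-component loadings: the correlated group shares large loadings
# of the same sign, while the unrelated variable's loading stays near zero
loadings = pca.components_[0]
print(np.round(loadings, 2))
```

Plotting the first two components’ loadings against each other would reproduce the kind of loading plot described above, with x6, x7, and x8 clustered along one direction.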


This analysis provided simple rules that this airline’s crews could follow in order to avoid delays, making passengers’ next flight more pleasant. The airline can repeat this analysis periodically to search for the next most important causes of delays. Such an approach can propel innovation and help organizations replace traditional and intuitive decision-making methods with data-driven ones.

What’s more, using data to improve operations isn’t restricted to the corporate world. Increasingly, public administrations and nongovernmental organizations are making large, open databases easily accessible to communities and to virtually anyone.


About The Author

Bruno Scibilia

Bruno Scibilia is a technical training specialist and leads several Minitab public training courses in Paris. He joined the Paris-based Minitab office in 2007 as a technical support specialist responding to customer queries regarding installation and use of Minitab Statistical Software and the project management software, Quality Companion. Prior to joining Minitab, Scibilia worked as a statistical engineer in the semiconductor industry for six years, and was a practitioner of quality improvement statistical techniques within a manufacturing company for a number of years. He has a Ph.D. in design of experiments and he is an ASQ-certified Six Sigma Black Belt.