The Four Questions of Data Analysis

Homogeneity is the primary question of analysis.

The four questions of data analysis are the questions of description, probability, inference, and homogeneity. Any data analyst needs to know how to organize and use these four questions to be able to obtain meaningful and correct results.

The description question

Given a collection of numbers, are there arithmetic values that will summarize the information contained in those numbers in some meaningful way?

The objective is to capture those aspects of the data that are of interest. Intuitive summaries such as totals, averages, and proportions need little explanation. Other, less commonly used summaries may require some explanation, and even some justification, before they make sense. In the end, however, to be effective, a descriptive statistic has to make sense: it has to distill some essential characteristic of the data into a value that is appropriate and understandable. In every case, this distillation takes the form of some arithmetic operation:

Data + Arithmetic = Statistic
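
As a minimal sketch of this relationship, the three intuitive summaries named above (a total, an average, and a proportion) can each be written as one arithmetic operation on the data. The values and the threshold of 5 are hypothetical, chosen only for illustration; they do not come from the article.

```python
# Hypothetical data, invented for illustration only.
values = [4, 7, 5, 9, 6, 8, 5]

# Data + Arithmetic = Statistic
total = sum(values)                      # total: add the values
average = total / len(values)            # average: total divided by the count
proportion_above_5 = sum(v > 5 for v in values) / len(values)  # share of values above 5

print(f"total={total}, average={average:.3f}, proportion above 5={proportion_above_5:.3f}")
```

Each statistic is only arithmetic; whether it is a meaningful summary depends on the data behind it.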

…


Comments


Good article.

It reminds me of an old story. A young man was caught breaking into the king's harem, for which the penalty was death. However, the king recognised that he too was once a horny young man and, being a gambling man, decided to give the young fellow a chance to live. The young man was blindfolded and asked to draw from a bowl containing 100 black beads and 100 white beads. If he drew black, he'd be put to death. If he drew white, he'd go free. The young man was a statistician. What did the young man do?

Before going on to the answer, homogeneity of data brings to mind the current ClimateGate scandal. Global temperature trends over the past 150 years have been based on changing universes. 150 years ago, there were just a handful of weather stations, with very poor measurement systems, quite different to those today. The number of stations grew to 8000 in the 1980s, then fell to around 4000. How can any valid conclusions be drawn about an "average"?

Back to the story. The young man, being aware of homogeneity, asked if he could put the beads into two bowls, while keeping each bowl well mixed. The king saw no problem, so he said yes. The young man placed one white bead in one bowl with the remainder in the other, dramatically improving his chances of living.
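
The size of the improvement can be checked with a little arithmetic. The sketch below assumes, which the story does not state, that the blindfolded young man picks one of the two bowls with equal probability and then draws one bead at random from it. The function name `survival_probability` is hypothetical.

```python
from fractions import Fraction

def survival_probability(bowls):
    """Chance of drawing a white bead.

    bowls: list of (white, black) bead counts.
    Assumption: a bowl is picked uniformly at random, then a bead
    is drawn uniformly at random from that bowl.
    """
    n = len(bowls)
    return sum(Fraction(1, n) * Fraction(white, white + black)
               for white, black in bowls)

# Original setup: one bowl with 100 white and 100 black beads.
one_bowl = survival_probability([(100, 100)])

# The young man's split: one white bead alone, the other 199 beads together.
two_bowls = survival_probability([(1, 0), (99, 100)])

print(one_bowl, float(one_bowl))    # 1/2 0.5
print(two_bowls, float(two_bowls))  # 149/199, about 0.749
```

Under that assumption, his chance of living rises from 1/2 to 149/199, or roughly 75 percent.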