The Law of Large Numbers and Big Data

The assumption behind big data techniques

Michal Balog / Unsplash

Donald J. Wheeler
Mon, 07/21/2025 - 12:03

In statistics class we learn that we can reduce the uncertainty in our estimates by using more and more data. This effect has been called the “law of large numbers” and is one of the primary ideas behind the various big data techniques that are becoming popular today. Here we’ll look at how the law of large numbers works in some simple situations to gain insight into how it will work in more complex scenarios.


When we use a statistic to estimate some property of a process or system, we have to consider the inherent uncertainty in our estimate. And, as theory suggests, this uncertainty will shrink as the amount of data used in our computation increases.

To illustrate this relationship, consider the results of a series of drawings from a bead box containing a mix of red and white beads as shown in Figure 1. A sample of these beads is obtained using a paddle with 50 holes on one side. The number of red beads in a sample of 50 beads is recorded; then the 50 beads are returned to the bead box, and the whole box is stirred up before the next sample is drawn.


Figure 1: My bead box and sampling paddle
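For readers who want to replicate the drawings without a bead box, the experiment is easy to simulate. The sketch below (in Python) is a minimal, idealized model: it assumes a perfectly stirred box of 4,800 beads with 10% red, the census counts reported later in this column, and it ignores the mechanical sampling issues discussed below.

    import random

    # Idealized bead-box simulation: 4,800 beads, 10% red (the census counts
    # reported later in this column). A "paddle" draw takes 50 beads without
    # replacement; the beads are then returned and the box is re-stirred,
    # which here simply means drawing again from the full list.
    BOX_SIZE, RED_COUNT, SAMPLE_SIZE = 4800, 480, 50
    box = [1] * RED_COUNT + [0] * (BOX_SIZE - RED_COUNT)   # 1 = red, 0 = white

    def draw_sample(rng=random):
        """Return the number of red beads in one sample of 50."""
        return sum(rng.sample(box, SAMPLE_SIZE))

    # Ten samples, as in the first experiment described below
    reds = [draw_sample() for _ in range(10)]
    print(reds, "total red beads:", sum(reds), "out of", 10 * SAMPLE_SIZE)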

After my first 10 samples, I’d found a total of 65 red beads while looking at 500 beads. This gives a point estimate for the proportion of red beads in the bead box of

65/500 = 0.130

and the usual 90% interval estimate for the proportion of red beads in the box is:

0.130 ± 1.645 √[ 0.130 (0.870) / 500 ] = 0.130 ± 0.0247

It’s the uncertainty of ±0.0247 that we want to track as the number of samples increases. In the next 10 samples, I found 43 red beads out of 500 beads sampled. Combining the results from both experiments, we have a point estimate for the proportion of red beads in the bead box of

108/1000 = 0.108

and the usual 90% interval estimate for the proportion of red beads is:

0.108 ± 1.645 √[ 0.108 (0.892) / 1000 ] = 0.108 ± 0.0161

As we went from using 10 samples to using 20 samples in our estimate, the uncertainty dropped from ±0.0247 to ±0.0161. With increasing amounts of data, our estimates come to have lower levels of uncertainty. Figure 2 shows the results of 20 repetitions of this experiment of drawing 10 samples of 50 beads each from this bead box. The first column gives the cumulative number of red beads observed. The second column gives the cumulative number of beads sampled. The third column lists the cumulative point estimates of the proportion of red beads in the box. The last two columns list the end points for the 90% interval estimates for the cumulative proportions of red beads. Figure 3 shows these last three columns plotted against the number of the experiment.
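The cumulative computations in Figure 2 can be scripted. The short Python sketch below uses the usual normal-theory interval, p̂ ± 1.645 √[ p̂(1 − p̂)/n ], and reproduces the ±0.0247 and ±0.0161 uncertainties found above; the function name is mine.

    import math

    def interval_90(red_total, beads_total):
        """Cumulative point estimate and the usual 90% interval estimate
        for a proportion: p_hat +/- 1.645 * sqrt(p_hat * (1 - p_hat) / n)."""
        p_hat = red_total / beads_total
        half_width = 1.645 * math.sqrt(p_hat * (1.0 - p_hat) / beads_total)
        return p_hat, p_hat - half_width, p_hat + half_width

    # The first two experiments above: 65 reds in 500 beads, then a
    # cumulative total of 108 reds in 1,000 beads.
    for red, n in [(65, 500), (108, 1000)]:
        p, low, high = interval_90(red, n)
        print(f"n = {n:5d}   p_hat = {p:.4f}   90% interval: {low:.4f} to {high:.4f}")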


Figure 2: 20 bead box experiments


Figure 3: Cumulative proportions of red beads and 90% interval estimates

As we look at Figure 3, we see the point estimates converge on a value near 11% and stabilize there while the uncertainty keeps dropping. This is the picture shown in the textbooks, and it’s the source of the “law of large numbers.” Bigger and bigger data sets result in better estimates of the process parameters.

So here, after inspecting 10,000 beads and finding 1,105 red beads, we might reasonably conclude that the bead box contains about 11% red beads, and our uncertainty for this estimate would be plus or minus 0.50%. This is just what we are taught to do in our statistics classes: collect our data, compute our estimate, and state the uncertainty of that estimate. In practice we can do no more.
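Running the interval_90 sketch above on these totals, interval_90(1105, 10000), gives a point estimate of 0.1105 with a half-width of roughly 0.0052, which matches the ±0.50% quoted here.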

However, the uncertainty computations can never take into account the biases that occur in practice. In the example above, the bead box contains 4,800 beads. Our 200 drawings effectively looked at every bead in the box twice. Yet, by actual count, the box contains only 10% red beads, a value that is outside the interval estimates from Experiment 6 on.


Figure 4: Statistics converge, but not to census value.

As we collected more and more data, our point estimate did converge to a value. But as Figure 4 shows, it didn’t converge to the proportion found by a complete count of the beads in the box. The box didn’t change. There were 10% red beads in the box for every draw. Yet the result we get when sampling with a paddle isn’t the same as the complete count.

So, here we come to the first problem with the law of large numbers. The whole body of computations involved with estimation is built on certain assumptions. One of these is that we’ve drawn random samples from the system or process being studied. Random samples are samples where every item in the lot has the same chance of being included in the sample. Random sampling is simply a way of getting samples that are representative of the lot as a whole. In practice, the complexity of random sampling usually guarantees that we’ll use some sort of sampling system or sampling device. Here, we used mechanical sampling. And regardless of how careful we may be, mechanical sampling doesn’t satisfy the assumption of random sampling.

The paddle inevitably will fill as it goes into the box, and this will favor the upper layers over the lower layers. Given the difficulty of thoroughly stirring the whole box, any stratification within the box will skew the results of mechanical sampling—thus, the difference between the results of 11% and 10%.
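To see how this kind of layer effect can pull the estimates away from the census value, consider the following purely illustrative simulation. It is not a model of the actual bead box; it simply assumes the red beads are somewhat concentrated in the upper half and that the paddle fills mostly from that half.

    import random

    # Illustrative only: 4,800 beads, 10% red overall, but with the red beads
    # slightly concentrated in the upper half of the box. The simulated paddle
    # draws 40 of its 50 beads from the upper half and 10 from the lower half.
    rng = random.Random(1)
    upper = [1] * 300 + [0] * 2100    # upper half: 300 of 2,400 red (12.5%)
    lower = [1] * 180 + [0] * 2220    # lower half: 180 of 2,400 red (7.5%)

    def paddle_sample():
        return sum(rng.sample(upper, 40)) + sum(rng.sample(lower, 10))

    draws = [paddle_sample() for _ in range(200)]
    print("mechanical-sampling estimate:", sum(draws) / (200 * 50))  # near 0.115
    print("census value:                ", 480 / 4800)               # exactly 0.100

No matter how many paddle draws are added to this simulation, the cumulative estimate settles near 11.5% rather than 10%; more data only narrows the interval around the wrong value.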

Of course, in practice, where we can’t stop the experiment and find the “true value” by counting all the beads, there will be no way to even detect these biases. Thus, in practice, the first problem with the law of large numbers is that, because of the way we obtain our data, our estimates may not converge to the values that we expect them to. But we rarely will know when this happens. We may take reasonable steps to ensure that our samples are representative of the process, but we seldom have any way to confirm this assumption.

And if this problem isn’t enough to give you pause, there’s an even bigger problem with the law of large numbers.

To illustrate this second problem, I’ll use the batch weight data shown in Figure 5. There you’ll find the weights, in kilograms, of 259 successive batches produced during one week at a plant in Scotland. For purposes of this example, assume that the specifications are 850 kg to 990 kg. We’ll look at how the law of large numbers works with the capability ratio by computing a cumulative capability ratio after each group of 10 batches. These 26 capability ratios should converge to some value as the uncertainty drops with the increasing amounts of data used.

Figure 6 shows the number of batches used for each computation, the capability ratios found, and the 90% interval estimates for these capability ratios. These values are plotted in sequence in Figure 7.


Figure 5: The batch weight data


Figure 6: Number of batches used for each computation, capability ratios, and 90% interval estimates
Figure 7: Plotted sequential values from Figure 6
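A sketch of the cumulative computation is shown below. It assumes the capability ratio is Cp = (USL − LSL)/(6σ̂), with σ̂ estimated from the average moving range as mR̄/1.128; Wheeler may have used a different dispersion estimate for Figure 6, and the batch weights themselves are those listed in Figure 5.

    # Cumulative capability ratio sketch. Assumed definition (not necessarily
    # the one used for Figure 6): Cp = (USL - LSL) / (6 * sigma_hat), with
    # sigma_hat estimated from the average moving range, mR_bar / 1.128.
    USL, LSL = 990.0, 850.0

    def capability_ratio(weights):
        moving_ranges = [abs(a - b) for a, b in zip(weights[1:], weights[:-1])]
        sigma_hat = (sum(moving_ranges) / len(moving_ranges)) / 1.128
        return (USL - LSL) / (6.0 * sigma_hat)

    def cumulative_ratios(weights, step=10):
        """Capability ratio after each successive group of `step` batches,
        with one final ratio that uses all of the data."""
        cutoffs = list(range(step, len(weights) + 1, step))
        if cutoffs[-1] != len(weights):
            cutoffs.append(len(weights))
        return [capability_ratio(weights[:n]) for n in cutoffs]

    # Usage with the 259 batch weights of Figure 5 (not reproduced here):
    # ratios = cumulative_ratios(batch_weights)   # 26 cumulative ratios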

These capability ratios don’t converge to any single value. Rather they continue to drop down from one level to another. As the law of large numbers dictates, the uncertainties decrease with increasing amounts of data. However, when the quantity being estimated is changing, the reduction in uncertainty is meaningless. We get better and better estimates of something, but that something may have already changed by the time we have the estimate.

To understand the behavior of these capability ratios, we need to look at the XmR chart for these data in Figure 8. The limits shown are based on the first 60 values. This baseline was chosen to obtain a reasonable characterization of the inherent process variation. Here, we see a process that is not only unpredictable, but also one that gets worse as the week wears on.


Figure 8: XmR chart for the batch weight data
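The limits on an XmR chart come directly from the data: the X chart is centered at the average with natural process limits at X̄ ± 2.66 mR̄, and the moving range chart has an upper limit of 3.268 mR̄, where mR̄ is the average moving range. A minimal Python sketch of that computation, using the same 60-value baseline as Figure 8, follows; the batch weights are those of Figure 5.

    def xmr_limits(values, baseline=60):
        """Natural process limits for an XmR chart from the first `baseline` values:
        X chart:  X_bar +/- 2.66 * mR_bar;  mR chart upper limit: 3.268 * mR_bar."""
        base = values[:baseline]
        moving_ranges = [abs(a - b) for a, b in zip(base[1:], base[:-1])]
        x_bar = sum(base) / len(base)
        mr_bar = sum(moving_ranges) / len(moving_ranges)
        return {
            "x_center": x_bar,
            "x_lower": x_bar - 2.66 * mr_bar,
            "x_upper": x_bar + 2.66 * mr_bar,
            "mr_upper": 3.268 * mr_bar,
        }

    def signals(values, limits):
        """Indexes of points outside the natural process limits
        (the simplest detection rule)."""
        return [i for i, x in enumerate(values)
                if x < limits["x_lower"] or x > limits["x_upper"]]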

The law of large numbers explicitly assumes that all the data come from one and the same process. Here the process is changing without warning. As a result, no amount of data will ever be sufficient to provide a good estimate of any process characteristic! This isn’t due to any flaw in the computations. Rather, it’s due to the mistaken notion that the process characteristics are unchanging.

The law of large numbers

When we’re seeking to describe the properties of a fixed lot or batch by using samples, the law of large numbers assumes that the samples are representative of that lot or batch. Whenever circumstances result in samples that aren’t representative of the batch, the estimates may converge to the wrong value. And one way this happens is to have a batch that’s not homogeneous.

When seeking to describe a continuing process, the law of large numbers assumes that the future will be the same as the past. When the past data show evidence of a lack of homogeneity, the extrapolation from the past to the future is unjustified, and the estimates may not converge to any specific value. So, once again, the law of large numbers has problems when there’s a lack of homogeneity within the data.

Implications

One of the organizing principles behind big data techniques is the law of large numbers. When our samples are representative of the process being sampled, and when the process remains unchanged over time, these techniques will work. But when your data come from a system like that in Figure 8, all of the traditional statistical techniques break down.

But who would ever operate their process like Figure 8? The plant in Scotland did—but not deliberately. Unpredictable operation happens when you’re not looking. It happens without your knowledge or intent. It happens because the causes of unpredictable behavior are input variables that are overlooked, ignored, or unknown.

So what does this mean for big data techniques? The database will never contain the unknown variables, and it’s unlikely to contain the variables that have been overlooked or ignored as well. So those variables that are taking your process on walkabout are precisely those variables that will be missing from your big data model. And when major variables are missing from a model, that model will inevitably be erroneous and misleading.

This is why dumping all your data into a database and hoping that big data will provide insights into your process is little more than wishful thinking. You’ll get some model, and that model might describe something, but it’s unlikely to describe the future. When your process is operated unpredictably, no amount of data will ever allow you to reliably estimate your process characteristics.

So how many data do you need?

The law of large numbers, and by extension big data techniques, fail when applied to data generated by a process that is being operated unpredictably. This is because big data techniques will always be blind to assignable causes of exceptional variation.

Big data techniques work in the same way that experimental techniques work. They seek to identify specific relationships between input variables and response variables. They can only evaluate those relationships when you have data on both the input variables and the responses. But processes like the one in Figure 8 are subject to unknown inputs, and the only way to identify these unknown inputs is to use a process behavior chart.

A process behavior chart doesn’t search for specific relationships. Rather, it examines the overall behavior of the response variable over time. When the process behavior chart shows evidence that the response variable has changed, you have an opportunity to identify the assignable cause of that change. If you then control this assignable cause, you’ll reduce the variation in the response variable, resulting in better quality, higher productivity, and better competitive position.
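Continuing the XmR sketch from earlier, this is what that looks like in practice: limits computed from a stable baseline, and each new value checked as it arrives (a hypothetical usage, with batch_weights standing in for the Figure 5 data).

    # Hypothetical real-time use of the xmr_limits sketch above.
    limits = xmr_limits(batch_weights, baseline=60)

    for i, weight in enumerate(batch_weights[60:], start=61):
        if weight < limits["x_lower"] or weight > limits["x_upper"]:
            print(f"Batch {i}: {weight} kg is outside the natural process limits; "
                  "look for the assignable cause now, while the trail is warm.")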

Finding assignable causes and using them to improve your process doesn’t require big databases and fancy computer programs. Rather, it requires the careful analysis of small amounts of data collected and analyzed in real time. So how many data do you need to improve your process? Just enough to detect the assignable causes that affect your process. This may not be as impressive as a fancy model coming out of a big data approach, but it has been proven to be a whole lot more effective in practice.

Donald Wheeler’s complete “Understanding SPC” seminar may be streamed for free. For details, see spcpress.com; for an example, see this column in Quality Digest.

Comments

Submitted by Alfredo (not verified) on Mon, 07/21/2025 - 14:54

Great Insight!

This is a very good reminder for those who spend more time in front of a computer screen than they do on the floor. Statistical processes are great and very useful, but I love that you are reminding us that we need to apply some common sense to the numbers before we blindly believe the conclusions.

I would love for you (or someone else) to now draw the line between this and Large Language Models.

© 2025 Quality Digest. Copyright on content held by Quality Digest or by individual authors. Contact Quality Digest for reprint information.
“Quality Digest” is a trademark owned by Quality Circle Institute Inc.
