This mini-lecture gives you three things. First, it explains the Central Limit Theorem. Second, it gives examples of probability calculations. Third, it shows the connection between the Central Limit Theorem and sampling distributions.

The Central Limit Theorem

[Normal graphic]

A graphic of a Normal distribution. According to the Central Limit Theorem, sample means are approximately Normally distributed for large sample sizes.

The proof of this theorem is beyond the scope of the course. Its statement is not, nor are its uses. In fact, the results from Section 8.1 rely on it. Page 334 gives a simple statement of it. Note four things about that definition. First, it talks about the distribution of $\overline{X}$ (sample means). Second, as the sample size increases, the actual distribution of $\overline{X}$ gets closer and closer to Normal. Third, the mean of $\overline{X}$ is the mean of X, μ. Fourth, the standard deviation of $\overline{X}$ is the standard deviation of X divided by the square root of the sample size: $\sigma / \sqrt{n}$.

That’s all there is to the Central Limit Theorem. If you wish to approximate the distribution of a sample mean, you need three pieces of information: n, μ, and σ. With those, you have

$$ \overline{X} \sim N \left( \mu; \frac{\sigma}{\sqrt{n}} \right) $$

Note that this distribution is approximate. However, sample sizes of n > 30 usually give a sufficient approximation.
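As a minimal sketch of the statement above, the following Python snippet builds the approximate distribution of the sample mean from n, μ, and σ. The function name `sampling_distribution` and the numbers passed to it are illustrative assumptions, not values from the course; `NormalDist` comes from Python's standard `statistics` module.

```python
from math import sqrt
from statistics import NormalDist


def sampling_distribution(mu: float, sigma: float, n: int) -> NormalDist:
    """Approximate distribution of the sample mean under the Central Limit Theorem."""
    if n <= 0:
        raise ValueError("sample size n must be positive")
    # The mean stays at mu; the standard deviation shrinks to sigma / sqrt(n).
    return NormalDist(mu=mu, sigma=sigma / sqrt(n))


# Illustrative values only (assumed for this sketch).
x_bar = sampling_distribution(mu=50.0, sigma=12.0, n=36)
print(x_bar.mean, x_bar.stdev)
```

Only the standard deviation depends on n; the mean of the sample mean is always μ.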

The Central Limit Theorem seems rather odd. If the data are not Normally distributed, how can the sample means be Normally distributed? In fact, what does it mean for a sample mean to have a distribution? Well, the answer to the second question is this: The sample mean is based on the sample collected. Thus, as the sample is a random collection of people/places/things, the sample mean is also random.

To see this, let us compare the distribution of grades for a statistics course and the distribution of the sample means. First, here is a histogram of the observations (the data, the scores):

[histogram of observations]

Note that the distribution of those scores is definitely not Normal. They are more bowl-shaped than bell-shaped. Now, look at the distribution of sample means from these data.

[histogram of sample means]

Note that the distribution of those sample means is almost perfectly Normal.

Aside: Getting those sample means

So, how did I obtain the distribution of the sample means? I bootstrapped them: Since the sample size is n=50, I just randomly sampled 50 scores from the original data and calculated their sample mean. Repeating this process a million times leaves me with a million sample means drawn from this sample. The golden graphic is just a histogram of those sample means.
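The procedure just described can be sketched in a few lines. This is a Python illustration, not the R script actually used, and it rests on several assumptions: the scores are hypothetical bowl-shaped values generated here as a stand-in for the course data, the resampling is done with replacement (the usual meaning of bootstrapping), and only 10,000 replications are run so that it finishes quickly.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical bowl-shaped scores on a 0-100 scale (stand-in for the course data).
n = 50
scores = [100 * random.betavariate(0.5, 0.5) for _ in range(n)]

# Bootstrap: resample n scores with replacement and record each resample's mean.
replications = 10_000  # the lecture used a million; fewer keeps this quick
boot_means = [fmean(random.choices(scores, k=n)) for _ in range(replications)]

# The bootstrap means should centre on the sample mean,
# with a spread close to (standard deviation of the scores) / sqrt(n).
print("sample mean:            ", round(fmean(scores), 2))
print("mean of bootstrap means:", round(fmean(boot_means), 2))
print("sd of bootstrap means:  ", round(pstdev(boot_means), 2))
print("sd of scores / sqrt(n): ", round(pstdev(scores) / sqrt(n), 2))
```

A histogram of `boot_means` should look bell-shaped even though the scores themselves do not.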

I did not use Excel to perform this bootstrapping. I used a statistical program called R. Any statistical program that allows for scripting would work just as well, so Stata and SAS would provide the same results. Using the right program makes some tasks much easier.

We are examining the sample mean, not the sample.

Here is the upshot: The data may not follow a Normal distribution, but the sample means will come close (as long as the sample size is large enough). Understanding this sentence is important, because it marks our transition from talking about the observations to talking about the mean of the population.

Probability Examples

[The Gateway Arch]

The Gateway Arch in St. Louis, MO. Photo courtesy the Library of Congress.

These examples may seem contrived. They are. The real strength and importance of the Central Limit Theorem comes in taking a sample and estimating the population mean. However, to better understand the process and the concept, these examples give you the population mean and ask you to calculate certain probabilities.

A. Let us be given that the average (mean) thickness of a stainless steel panel in the Gateway Arch is 10cm, with a standard deviation of 1cm. If I measure the thicknesses of 16 panels, what is the probability of the sample mean being less than 9.75cm?

In this problem, we are given μ=10cm, σ=1cm, and n=16. These pieces of information tell us

$$ \overline{X} \sim N \left( 10; \frac{1}{\sqrt{16}} \right) = N \left( 10; 0.25 \right) $$

We need to calculate $P[ \overline{X} < 9.75 ]$. In Excel, that would be =NORM.DIST(9.75,10,0.25,TRUE). This gives you a probability of 0.1587. (Try calculating the answer yourself before checking it against this value.)

This means there is a 15.87% chance of a sample of 16 panels having an average thickness less than 9.75cm.
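As a check on Example A, the same probability can be found by standardizing the sample mean. Here Z denotes a standard Normal variable, a symbol introduced only for this illustration:

$$ P[ \overline{X} < 9.75 ] = P \left[ Z < \frac{9.75 - 10}{0.25} \right] = P[ Z < -1 ] \approx 0.1587 $$

A sample mean of 9.75cm is one standard deviation (of the sample mean) below μ, which is why the probability matches the familiar lower-tail value for Z = -1.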

B. Let us be given that the average (mean) thickness of a panel on the Gateway Arch is 10cm, with a standard deviation of 1cm. If I collect 25 panels, what is the probability of the sample mean being less than 9.75cm?

In this problem, we are given μ=10cm, σ=1cm, and n=25. Note that the sample size is different. How does that change our distribution for $\overline{X}$? Only in the standard deviation:

$$ \overline{X} \sim N \left( 10; \frac{1}{\sqrt{25}} \right) = N \left( 10; 0.20 \right) $$

Again, we need to calculate $P[ \overline{X} < 9.75 ]$. In Excel, that would be =NORM.DIST(9.75,10,0.20,TRUE). This gives you a probability of 0.1056.

This means there is a 10.56% chance of a sample of 25 panels having an average thickness less than 9.75cm.

C. To make the point more clear, let us be given that the average (mean) thickness of a stainless steel panel on the Gateway Arch in St. Louis is 10cm, with a standard deviation of 1cm. If I measure 100 panels, what is the probability of the sample mean being less than 9.75cm?

How does this differ from the previous examples? Here, we are given μ=10cm, σ=1cm, and n=100. These pieces of information tell us

$$ \overline{X} \sim N \left( 10; \frac{1}{\sqrt{100}} \right) = N \left( 10; 0.10 \right) $$

Again, we need to calculate $P[ \overline{X} < 9.75 ]$. In Excel, that would be =NORM.DIST(9.75,10,0.10,TRUE), which gives you a probability of 0.0062.

This means there is a 0.62% chance of a sample of 100 panels having an average thickness less than 9.75cm.

D. Just one last time: Let us be given that the average (mean) thickness of a stainless steel panel on the Gateway Arch in St. Louis is 10cm, with a standard deviation of 1cm. If I measure 400 panels, what is the probability of the sample mean being less than 9.75cm?

How does this differ from the previous examples? Here, we are given μ=10cm, σ=1cm, and n=400. These pieces of information tell us

$$ \overline{X} \sim N \left( 10; \frac{1}{\sqrt{400}} \right) = N \left( 10; 0.05 \right) $$

Again, we need to calculate $P[ \overline{X} < 9.75 ]$. In Excel, that would be =NORM.DIST(9.75,10,0.05,TRUE), which is 0.0000 to four decimal places (the actual value is about 0.0000003).

This means there is essentially no chance (0.00%, to two decimal places) of a sample of 400 panels having an average thickness less than 9.75cm.
 

The sample size only affects the standard deviation of the sampling distribution, not the expected value (mean).

Note the difference in the probabilities for the four examples, which only differed in the sample size. The only effect of the sample size is on the standard deviation of the sample mean. As n gets larger, the standard deviation gets smaller. A small standard deviation indicates that the distribution is very peaked around the population mean. That is, if we increase the sample size, our sample means will tend to be much closer to the population mean. Our estimate is more precise.
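For readers who prefer a script to a spreadsheet, Examples A through D can be reproduced in one loop. This is a sketch in Python using the standard `statistics.NormalDist` class in place of Excel's NORM.DIST; the variable names are assumptions chosen for this illustration.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, cutoff = 10.0, 1.0, 9.75  # values from Examples A-D (cm)

probabilities = {}
for n in (16, 25, 100, 400):
    std_error = sigma / sqrt(n)           # standard deviation of the sample mean
    x_bar = NormalDist(mu, std_error)     # CLT approximation for the sample mean
    probabilities[n] = x_bar.cdf(cutoff)  # same role as NORM.DIST(..., TRUE)
    print(f"n={n:>3}  sd={std_error:.2f}  P[mean < {cutoff}] = {probabilities[n]:.4f}")
```

Only `n` changes between iterations, so any change in the printed probability comes from the shrinking standard deviation.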

The rest of Chapter 7 consists of motivating and explanatory examples.

Useful Theory Videos

The following are videos about the theory of the Central Limit Theorem. They are not software-specific. The actual calculations are done using the same techniques discussed in The Normal Distribution.

In addition to these videos, there are many videos on YouTube for learning about the Central Limit Theorem. The following search link will take you to YouTube and provide you with a non-exhaustive list: The Central Limit Theorem.

That is it. In this mini-lecture, we looked at the Central Limit Theorem and what it can be used for. We saw how to perform probability calculations about the sample mean using sampling distributions. Without question, the Normal distribution is the most useful distribution we have. If you do not recall how to calculate probabilities of the Normal distribution using your program of choice, here is The Normal Distribution lecture. Feel free to visit it.