This page gives two full examples of how I could use confidence intervals in my research. Remember that understanding confidence intervals requires that you also understand sampling distributions, so you may want to revisit that mini-lecture.

Before you read through, and work through, these examples, you may also want to refresh your memory on the Binomial distribution, the Normal distribution, and the first GM Recall example. That refresher will make this more of a review than something new.

Confidence Intervals I: Estimating the population mean

[The GM Technical Center]

The General Motors Technical Center in Warren, MI. Photo courtesy the Library of Congress.

Last part’s lengthy GM Recall example started with me saying “the average length of the ignition switches is actually μ = 14.45 cm.” Since I used a Greek letter, the 14.45 cm value refers to the population mean. The population is the set of all ignition switches that were made, will be made, and could have been made by this production process. The population is not observable; in real life, it never is. So, how do we determine the population mean? We estimate it using confidence intervals.

The first step is to collect data. We need a sample from the population in order to learn anything about the population. The sample is not the population, but it does represent the population, provided the sample we draw is a random sample. To obtain one, we randomly select ignition switches produced by the process and measure their lengths.

Since it is a sample of data, it has a sample mean and a sample standard deviation. These are our best single-number estimates, called point estimates, of the corresponding population parameters. Point estimates are very important, but they are just single numbers. By themselves, they do not indicate how precise they are.

To indicate precision, we can provide either the margin of error or a confidence interval for the population mean. Since confidence intervals are used more frequently, let us calculate one.

The population is not observable. The sample is.

By hand, we find the correct formula, plug in the numbers we have, and crank through the calculations. My sample is of size $n=30$, has mean $\bar{x}=14.41$ cm and has standard deviation $s=0.20$ cm. I want a 95% confidence interval because I usually use $\alpha=0.05$.

The formula for the endpoints of the confidence interval is on page 358 of the book:

$$ \text{Bounds} = \bar{x} \pm t_{\alpha/2, n-1} \frac{s}{\sqrt{n}} $$

The book does not include the number of degrees of freedom in the t-multiplier. I do. With this notation, the lower bound on the population mean is

$$ \begin{align} \text{Lower bound} &= \bar{x} - t_{\alpha/2, n-1} \frac{s}{\sqrt{n}} \\[1em] & = 14.41 - t_{0.025, 29} \frac{0.20}{\sqrt{30}} \\[1em] & = 14.41 - 2.045 \frac{0.20}{\sqrt{30}} \\[1em] & = 14.41 - 0.07467 \\[1em] & = 14.3353 \end{align} $$

Similarly, the upper bound is

$$ \begin{align} \text{Upper bound} &= \bar{x} + t_{\alpha/2, n-1} \frac{s}{\sqrt{n}} \\[1em] & = 14.41 + t_{0.025, 29} \frac{0.20}{\sqrt{30}} \\[1em] & = 14.41 + 2.045 \frac{0.20}{\sqrt{30}} \\[1em] & = 14.41 + 0.07467 \\[1em] & = 14.4847 \end{align} $$
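The two calculations above can be bundled into one small function. This is a minimal Python sketch, not the book's method: the function name `t_interval` is hypothetical, and the t multiplier is passed in by hand (2.045, read from Table A.4) rather than computed.

```python
import math


def t_interval(xbar: float, s: float, n: int, t_mult: float) -> tuple[float, float]:
    """Return (lower, upper) = xbar -/+ t_mult * s / sqrt(n).

    Assumption of this sketch: t_mult is the critical value t_{alpha/2, n-1},
    looked up in a t table by the caller rather than computed here.
    """
    margin = t_mult * s / math.sqrt(n)  # the margin of error
    return xbar - margin, xbar + margin


# The ignition-switch sample: n = 30, mean 14.41 cm, sd 0.20 cm, t_{0.025, 29} = 2.045
lower, upper = t_interval(14.41, 0.20, 30, 2.045)
print(f"{lower:.4f} {upper:.4f}")  # 14.3353 14.4847
```

Keeping the multiplier as an argument mirrors the by-hand workflow: the only table lookup is isolated in one place, and the rest is the formula on page 358.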

In the formula, the term $t_{\alpha/2,\, n-1}$ is the multiplier (the critical value) taken from the t distribution, which is the distribution of the test statistic $(\bar{x}-\mu)/(s/\sqrt{n})$. I got the number from Table A.4. The number of degrees of freedom, df, is the sample size minus 1 (when dealing with a single population), so here df = 29. I divided α by 2 because I needed a two-sided confidence interval (the usual), which splits α equally between the two tails.
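To see where a table value such as 2.045 comes from, here is a from-scratch numerical sketch. It is an illustration, not how Table A.4 was built: it integrates the t density with Simpson's rule and then uses bisection to find the point with area α/2 in the upper tail. The function names are hypothetical; in practice, `scipy.stats.t.ppf(1 - alpha/2, df)` or the table gives the same number.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x: float, df: int, intervals: int = 2000) -> float:
    """P(T <= x) for x >= 0: 0.5 (left half, by symmetry) plus Simpson's rule on [0, x]."""
    h = x / intervals
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, intervals):
        weight = 4 if i % 2 == 1 else 2  # Simpson weights 1, 4, 2, 4, ..., 4, 1
        total += weight * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_critical(alpha: float, df: int) -> float:
    """Two-sided critical value: the t with P(T > t) = alpha / 2, found by bisection."""
    target = 1 - alpha / 2
    lo, hi = 0.0, 100.0  # assumed bracket; wide enough for common alpha and df >= 1
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(t_critical(0.05, 29), 3))  # 2.045
```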

If we selected a different sample of ignition switches, measured their lengths, and performed the above calculations, the confidence limits would change. If we repeated this experiment a gazillion times, approximately 95% of the resulting confidence intervals would contain the population mean. However, we cannot say there is a 95% probability that any one specific confidence interval contains the population mean. Thus, the correct conclusion is: “We are 95% confident that the average length of ignition switches produced by this process (the true population mean) is between 14.3353 and 14.4847 cm.”
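The "gazillion times" interpretation can be checked by simulation. This Python sketch assumes a hypothetical population in which switch lengths are Normal with μ = 14.45 cm and σ = 0.20 cm (illustrative values, not GM data). It draws many samples of 30, builds a 95% t interval from each, and reports the fraction of intervals that contain μ.

```python
import math
import random
import statistics

# Hypothetical population parameters, chosen only for this simulation.
MU, SIGMA = 14.45, 0.20
N = 30
T_MULT = 2.045  # t_{0.025, 29}, as read from a t table


def covers(rng: random.Random) -> bool:
    """Draw one sample, build the 95% t interval, and report whether it contains MU."""
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    xbar = statistics.fmean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    margin = T_MULT * s / math.sqrt(N)
    return xbar - margin <= MU <= xbar + margin


def coverage(reps: int = 10_000, seed: int = 1) -> float:
    """Fraction of simulated intervals that contain the true population mean."""
    rng = random.Random(seed)
    return sum(covers(rng) for _ in range(reps)) / reps


cov = coverage()
print(cov)  # close to 0.95; the exact value depends on the seed
```

Any single simulated interval either contains μ or it does not; the 95% describes the long-run fraction, which is exactly the distinction drawn above.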

Using Technology

In Excel, the function =CONFIDENCE.T(0.05,0.20,30) calculates the margin of error for the above example. Thus, the endpoints of the 95% confidence interval are

=14.41-CONFIDENCE.T(0.05,0.20,30)

and

=14.41+CONFIDENCE.T(0.05,0.20,30)

Confidence Intervals II: Estimating the population proportion

In the above example, I showed how to calculate confidence intervals for population means when $\sigma$ is not known (the realistic case). Here, I show you how to calculate them for population proportions.

Let us continue the previous example, offering a different take on the issue. Instead of estimating the mean length of the ignition switches, we can estimate the proportion of switches that do not fit. In some ways, this makes more sense, because that proportion is all we really care about when calculating the costs of a recall. It is also less expensive, in time, to determine whether a switch is the wrong length than to actually measure its length.

As a side note, there are things called “go/no go gauges” that quickly determine whether a part has a measurement within tolerances. One example checks whether bolt holes are the right size. Here is a video demonstrating one go/no go gauge created by Daniels Manufacturing Corporation. This go/no go gauge is used to test whether the tool’s opening is the right size. The actual testing starts at 1:03, so you may want to jump to that spot.

Another example is a go/no go test that ensures batter has the right consistency. Watch the video at Science Channel’s “How It’s Made” to see how Whoopie Pies are made. When you get to the viscometer at 1:05, you are viewing an example of a go/no go test: if the batter spreads too much or too little, it is discarded (no go).

To estimate the proportion of ill-fitting switches, we take a random sample of ignition switches and check their lengths. Because we are counting faulty parts, each switch is either outside the specifications of 14.4 to 14.5 cm (a “success,” the outcome we count) or within them (a “failure”). Let us say that the sample of 1000 ignition switches had 8 that were either too long or too short.
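As a small sketch of how the count x is tallied, the function below plays the role of a go/no go gauge: it reports only whether a length falls outside the 14.4 to 14.5 cm specification, not the length itself. The function name and the six lengths are hypothetical, chosen just to show the counting.

```python
def out_of_spec(length_cm: float, low: float = 14.4, high: float = 14.5) -> bool:
    """Go/no go check: True (a counted 'success') when a switch is too short or too long."""
    return not (low <= length_cm <= high)


# Hypothetical lengths (cm) for a handful of switches, for illustration only.
lengths = [14.42, 14.47, 14.38, 14.45, 14.51, 14.44]
x = sum(out_of_spec(v) for v in lengths)  # number of ill-fitting switches
n = len(lengths)
print(x, n)  # 2 6
```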

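To get a better appreciation for the software, let’s do the calculations by hand (ewww!). The point estimate for the population proportion of faulty parts is $\hat{p} = x/n = 8/1000 = 0.008$. From this quantity, we can calculate the confidence bounds. Because the sampling distribution of $\hat{p}$ is approximately Normal when $n$ is large, the multiplier now comes from the standard Normal distribution rather than the t distribution. The lower bound for the 95% confidence interval on the population proportion is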

$$ \begin{align} \text{Lower bound} &= \hat{p} - Z_{\alpha/2} \sqrt{ \frac{\hat{p}(1-\hat{p})}{n}} \\[1em] & = 0.008 - Z_{0.025} \sqrt{ \frac{0.008(1-0.008)}{1000}} \\[1em] & = 0.008 - 1.96 \sqrt{ 0.000007936 } \\[1em] & = 0.008 - 0.005521 \\[1em] & = 0.00248 \\ \end{align} $$

Similarly, the upper bound is

$$ \begin{align} \text{Upper bound} &= \hat{p} + Z_{\alpha/2} \sqrt{ \frac{\hat{p}(1-\hat{p})}{n}} \\[1em] & = 0.008 + Z_{0.025} \sqrt{ \frac{0.008(1-0.008)}{1000}} \\[1em] & = 0.008 + 1.96 \sqrt{ 0.000007936 } \\[1em] & = 0.008 + 0.005521 \\[1em] & = 0.01352 \\ \end{align} $$
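The same normal-approximation interval can be sketched in Python using only the standard library. The function name `proportion_interval` is hypothetical. It computes $Z_{\alpha/2}$ exactly with `statistics.NormalDist().inv_cdf` (1.95996… rather than the rounded 1.96), so digits beyond the fifth decimal place may differ slightly from the hand calculation.

```python
import math
from statistics import NormalDist


def proportion_interval(x: int, n: int, conf: float = 0.95) -> tuple[float, float]:
    """Normal-approximation interval: p_hat -/+ z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = x / n                                  # point estimate
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # Z_{alpha/2}, about 1.96 for 95%
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# 8 faulty switches out of 1000 sampled
lower, upper = proportion_interval(8, 1000)
print(f"{lower:.5f} {upper:.5f}")  # 0.00248 0.01352
```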

Thus, we are 95% confident that the proportion of faulty ignition switches produced by this process is between 0.00248 and 0.01352.

Using Technology

For the Excel people out there, you are out of luck. There is no function in the base distribution that calculates the margin of error or the confidence interval for a single population proportion. You will have to enter the above calculations yourself, for example =0.008-NORM.S.INV(0.975)*SQRT(0.008*(1-0.008)/1000) for the lower bound.

And that is it. This example showed how to calculate confidence intervals for the population mean and for the population proportion. While it showed how to do the calculations by hand, this is about the last time we will be able to do so. The calculations are about to get very difficult. Use technology and smile!