The School Zone Example
A school zone in Chillicothe, OH (1940). Note that children had to beware of speeders even back then. Photo courtesy of the Library of Congress.
Every morning, I drive by an elementary school. While the posted speed limit is 20 mph, I am always passed by a couple of cars going about 35 mph. This is dangerous for the children. I contacted the City of Stillwater to let them know that the speed limit is routinely ignored. Before spending the money to post police officers in the school zone, they wanted an estimate of the proportion of speeders.
Being a statistician, I knew it would not take too much effort to collect the data and crunch the numbers. All I needed to do was take a sample of cars driving through the school zone, measure their speed, and determine if they were speeding. So, I gave it to a student as a project. For the sake of brevity, let us call this student Captain Kirk (who has since graduated from OSU).
Step 1: Collect the data
The city wanted an estimate of the population proportion. So, Kirk needed to collect data from a sample of all cars passing through the school zone. The sample needed to be representative of the population. To accomplish this, he sat in the school zone from 7:00 until 8:30 am (the period when the limit is 20 mph) every day, Monday through Friday. He counted two things: the number of cars passing through and the number of cars going at least 30 mph, that is, at least 10 mph over the limit. He was able to measure this second variable because he was a university policeman with a personal radar gun. Here is a summary of the data he collected:
Day | Monday | Tuesday | Wednesday | Thursday | Friday | Total
Speeders | 31 | 12 | 33 | 15 | 46 | 137
Cars | 324 | 280 | 391 | 256 | 310 | 1561
Step 2: Analyze the data
Statistics helps us understand the population based on the sample.
Note that there is variation across the days. There is also variation across the weeks, which you cannot see here. Both the number of cars passing through and the number of cars speeding are random variables. So is their ratio: the proportion of speeding cars in the sample. Since we cannot observe every car that will ever pass through the zone (that is, the entire population), we must rely on our sample and on statistics.
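To make the day-to-day variation concrete, the daily sample proportions can be computed directly from the table in Step 1. The following is a minimal Python sketch; the variable names are illustrative and are not part of the original analysis.

```python
# Daily counts copied from the table in Step 1.
days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
speeders = [31, 12, 33, 15, 46]
cars = [324, 280, 391, 256, 310]

# Sample proportion of speeders observed on each day.
daily_proportions = {
    day: s / c for day, s, c in zip(days, speeders, cars)
}

for day, proportion in daily_proportions.items():
    print(f"{day:<10} {proportion:.3f}")

# The column totals should match the last column of the table.
total_speeders = sum(speeders)
total_cars = sum(cars)
print(f"{'Total':<10} {total_speeders / total_cars:.3f}")
```

The daily proportions range from roughly 4% on Tuesday to roughly 15% on Friday, which is why a single day's count would be a poor basis for the estimate.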
Using Section 8.4 (page 367), the point estimate of the proportion of speeders is $\hat{p} = 137/1561 \approx 0.087764$. Note that Section 8.4 tells us that the methods to calculate confidence intervals work only if $n\hat{p}$ and $n(1-\hat{p})$ are both at least 10. Both hold in this example: the number of successes is $n\hat{p} = 137$ and the number of failures is $n(1-\hat{p}) = 1424$. Thus, we can use these methods to calculate the confidence interval.
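The point estimate and the sample-size check are simple enough to verify by machine. Here is a short Python sketch under the assumption that the totals from Step 1 are entered by hand; the names `n`, `successes`, and `conditions_met` are chosen for illustration only.

```python
n = 1561          # total cars observed during the week
successes = 137   # cars going at least 30 mph
failures = n - successes

# Point estimate of the population proportion of speeders.
p_hat = successes / n

# Large-sample condition: both expected counts must be at least 10.
conditions_met = n * p_hat >= 10 and n * (1 - p_hat) >= 10

print(f"p_hat = {p_hat:.6f}")
print(f"successes = {successes}, failures = {failures}")
print(f"conditions met: {conditions_met}")
```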
So, using the formula on page 367, the lower bound of a 95% confidence interval is
$$ \begin{align} \text{Lower bound} &= \hat{p} - Z_{\alpha/2} \sqrt{ \frac{\hat{p}(1-\hat{p})}{n}} \\[1em] & = 0.087764 - Z_{0.025} \sqrt{ \frac{0.087764(1-0.087764)}{1561}} \\[1em] & = 0.087764 - 1.96 \sqrt{0.0000512886} \\[1em] & = 0.087764 - 0.014036764 \\[1em] & = 0.0737 \\ \end{align} $$
Similarly, the upper bound is
$$ \begin{align} \text{Upper bound} &= \hat{p} + Z_{\alpha/2} \sqrt{ \frac{\hat{p}(1-\hat{p})}{n}} \\[1em] & = 0.087764 + Z_{0.025} \sqrt{ \frac{0.087764(1-0.087764)}{1561}} \\[1em] & = 0.087764 + 1.96 \sqrt{0.0000512886} \\[1em] & = 0.087764 + 0.014036764 \\[1em] & = 0.1018 \\ \end{align} $$
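The two hand calculations above can be checked with a few lines of Python. This is a sketch of the same normal-approximation formula (often called the Wald interval), not a replacement for the textbook method; the function name `wald_interval` is hypothetical, and the critical value comes from the standard library's `statistics.NormalDist` (Python 3.8 or later) rather than from a table.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(successes, n, confidence=0.95):
    """Normal-approximation confidence interval for a proportion."""
    p_hat = successes / n
    alpha = 1 - confidence
    # Critical value Z_{alpha/2}; about 1.96 when confidence is 0.95.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


lower, upper = wald_interval(137, 1561)
print(f"95% CI: ({lower:.4f}, {upper:.4f})")
```

Because the code uses an unrounded critical value instead of 1.96, its bounds may differ from the hand calculation in the later decimal places, but they should agree to the four decimal places reported above.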
Step 3: Conclusions
I provided my data and my calculations to the City of Stillwater. I am 95% confident that the proportion of drivers going at least 10 mph over the speed limit in the school zone is between about 7.4% and 10.2%. Now, it is up to the city to determine if that is too high and what to do about it. If they do something, then we should repeat the same study to see if the intervention accomplished its goal. But comparing two populations will need to be left for a later module.