This extended example shows how p-values can be used. It follows the scientific method to emphasize the underlying structure of research.
The Research Question
A school zone with crossing guard in Oakland, CA (1943). Photo courtesy the Library of Congress.
The research question is a question that frames your interest in broad terms. It ends in a question mark and should be interesting to someone. For this example, the student (Captain Kirk) was completing an assignment for me. The interest was higher than expected.
Do people tend to slow down for the school zone?
North of our campus, there is an elementary school on Washington Street. The school zone has a speed limit of 30 mph, even when children are not present. It is located in a 35 mph zone. As a police officer, Kirk had access to his own personal radar gun, so getting the speeds was not a problem for him.
What was a problem, however, was making sure that Kirk’s presence did not influence the behavior of the drivers. For some reason, anyone holding a radar gun, even someone without a uniform, slows people down. Also, a middle-aged man hiding in a bush near an elementary school may create its own set of issues.
To avoid these issues, Kirk decided to park his car along a side street near the school. This kept him hidden. It also allowed him to get speed measurements. We tested the accuracy of his radar gun and found that it was accurate to within 0.50 mph.
Note all of the work we had to do to make sure the measurements were representative of the population. This shows the importance — and difficulty — of planning.
The Research Hypothesis
The research hypothesis is a proposed answer to the research question. From experience, we decided that the answer would be that fewer than half of the drivers would slow down for the school zone. In symbols, the research hypothesis is
p < 0.500
The population parameter is p, the population proportion of drivers who slow down for the school zone. The (in)equality sign is “fewer than.” The hypothesized value is 50%. Thus, we believe that the proportion of people who will slow down for the school zone is less than 50%. (Personally, I think it will be around 5%, but I am cynical.)
Remember: The research hypothesis is what the scientist cares about… the only thing. However, because of probability and the randomness of life, statisticians need two other hypotheses: the null and the alternative. From Table 1 in the p-values mini-lecture, the null hypothesis is
H0 : p ≥ 0.500
The alternative hypothesis is either the research hypothesis or its opposite. If there is an equals part to the research hypothesis, then the alternative hypothesis is the opposite of the research hypothesis. If there is no equals part, then the alternative hypothesis is the research hypothesis. Thus, for this example,
H1 : p < 0.500
Planning
Automobiles at the Cadillac Ranch (2013). Photo courtesy Carol M. Highsmith and the Library of Congress.
Now that we have our hypotheses and a better understanding of the processes involved in creating the data, we can explicitly write our plan, which allows others to replicate our work.
- Select a random day of the week.
- Select three time periods (one hour each) during that day.
- Park the car as discussed and measure how many cars reduce their speed to 30 mph or less when they enter the school zone.
- Repeat these three steps until at least 1000 cars are measured.
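The random selection in the first two steps can be sketched in a few lines of Python. This is only an illustration: the plan does not say which days or hours are eligible, so the `DAYS` and `CANDIDATE_HOURS` lists and the `draw_session` helper are hypothetical choices, as is the decision to pick three distinct hours.

```python
import random

# Hypothetical eligibility lists; the original plan does not specify them.
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]
CANDIDATE_HOURS = list(range(7, 19))  # start times 7:00 through 18:00


def draw_session(rng):
    """Pick one random day and three distinct one-hour periods on that day."""
    day = rng.choice(DAYS)
    hours = sorted(rng.sample(CANDIDATE_HOURS, 3))  # sample without replacement
    return day, hours


rng = random.Random()  # pass a seed, e.g. random.Random(1), to make the draw repeatable
day, hours = draw_session(rng)
print(day, [f"{h}:00-{h + 1}:00" for h in hours])
```

Recording the seed along with the plan is one way to let others replicate exactly which sessions were drawn.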
The second aspect of planning is planning the analysis. Here, it will be straightforward. We need to draw a conclusion about a single population based on a proportion. The correct test is the one-sample proportion test.
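For reference, the one-sample proportion test is usually carried out with the large-sample normal approximation. Assuming that form is the one intended here, the test statistic is shown below. The symbols x (number of cars that slow down), n (number of cars measured), and p̂ are notation introduced for this sketch; p₀ = 0.500 is the hypothesized value.

```latex
\[
  z = \frac{\hat{p} - p_0}{\sqrt{p_0 (1 - p_0) / n}},
  \qquad \hat{p} = \frac{x}{n}
\]
```

Because the alternative hypothesis is p < 0.500, the p-value is the lower-tail probability P(Z ≤ z) for a standard normal Z.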
Execute the Plan
Now that we have a plan, we just need to execute it.
The student collected all of the data. Throughout it all, he kept aware of the data and thought of ways of extending the analysis (see next week). However, for this null hypothesis, he found that 21 of the 1045 cars reduced their speed to 30 mph or less. He also found that only 54 out of the 1045 cars reduced their speed at all.
Analyze the Data
Unfortunately, Excel does not offer the one-sample proportion test as a function in its basic distribution. So, you will need to do these calculations “by hand” in Excel.
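If you would like to check your spreadsheet, the same “by hand” arithmetic can be sketched in Python. This is a minimal sketch that assumes the usual normal-approximation (z) version of the test; the function name `one_sample_proportion_ztest` is made up for this illustration and is not part of any library.

```python
import math


def one_sample_proportion_ztest(successes, n, p0):
    """Lower-tailed z-test of H0: p >= p0 against H1: p < p0.

    Assumes the large-sample normal approximation.
    """
    p_hat = successes / n                    # sample proportion
    se = math.sqrt(p0 * (1 - p0) / n)        # standard error under the null
    z = (p_hat - p0) / se                    # test statistic
    # Lower-tail standard normal probability: P(Z <= z)
    p_value = 0.5 * math.erfc(-z / math.sqrt(2))
    return p_hat, z, p_value


# 21 of the 1045 cars slowed to 30 mph or less; hypothesized value is 0.500
p_hat, z, p_value = one_sample_proportion_ztest(21, 1045, 0.5)
print(f"sample proportion = {p_hat:.4f}")
print(f"test statistic z  = {z:.2f}")
print(f"p-value           = {p_value:.3g}")
```

With the data from this example, the test statistic rounds to -31.03 and the p-value is far below 0.0001.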
Interpret the Results
According to our calculations, the p-value is essentially 0. (The test statistic is a whopping -31.03!) As the p-value is less than our α = 0.05, we reject the null hypothesis in favor of the alternative. In other words, we conclude that the proportion of people who slow down for this school zone is less than 50%.
Go Beyond the Yes/No
Alright, it’s cool that we know the proportion of people slowing down at this school zone is less than 50%. But, it would be interesting to know how many actually do slow down. This question requires us to calculate a confidence interval for p.
Using any statistics program, we can go further. Not only do we conclude that the proportion of drivers who slow down for the school zone is less than 50%, we are 95% confident that the (population) proportion is between 1.16% and 2.86%.
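The example does not say which confidence-interval method produced these endpoints. The simple normal-approximation (Wald) interval reproduces them, so the sketch below assumes that method. The helper name `wald_interval` is hypothetical, and 1.96 is the usual critical value for 95% confidence.

```python
import math


def wald_interval(successes, n, z_star=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p_hat = successes / n
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# 21 of the 1045 cars slowed to 30 mph or less
low, high = wald_interval(21, 1045)
print(f"95% CI: {100 * low:.2f}% to {100 * high:.2f}%")
```

Other methods (Wilson or exact intervals, for example) would give slightly different endpoints, which matters most when the sample proportion is as close to 0 as it is here.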
The Conclusion
It may help for you to see the entire conclusion written up. While this course does not require you to write these lengthy conclusions, you may gain from seeing how everything merges together in a final explanation.
We would like to test whether at least half of the drivers tend to slow down for the school zone on Washington Street outside Will Rogers Elementary School. To test this, we used a radar gun and the camouflage of a parked car to determine how many did so at various times during the week. Of the 1045 cars we measured, only 21 slowed down to the posted speed limit.
To test the hypothesis, we performed a one-sample proportion test. According to that test, fewer than half of the drivers slowed down for the school zone (p < 0.0001). In fact, a 95% confidence interval for the proportion of cars that did slow down to 30 mph is from 1.16% to 2.86%.
This example showed how the last two weeks fit together to give us usable information about reality based on a sample. Here, we were able to estimate the proportion of drivers who slow down for the school zone outside Will Rogers Elementary School.
The next step is for a policy analyst to devise a method for increasing the slow-down rate. Once that plan is in place, we return and collect more data. We then compare the slow-down rates from before the implementation with those after. That comparison requires methods from next week.
And that is it. These two examples covered some additional aspects of hypothesis testing. They both started with the research question, and continued through the scientific method, ending with the actual conclusions we could draw based on the data we carefully collected.