This example works through one of the three tests you can run with the information in this module: the two-sample proportions test. It follows the scientific method to emphasize the underlying structure of research. Much of it repeats the School Zone example from before. Repetition is not bad. Do this example after completing Section 10.3.

The Research Question

[an early car]

A car that does not need to slow down for the School Zone in Monroe County, WI. Photo courtesy the Library of Congress.

The research question is a question that frames your interest in broad terms. It ends in a question mark and should be interesting to someone. For this example, Kirk was extending his project for me. The interest was higher than expected.

Does the probability that a driver slows down for the school zone depend on whether the driver is wearing a seat belt?

North of campus sits the awesome Will Rogers Elementary School on Washington Street. The school zone has a speed limit of 30 mph, even when children are not present. It is located in a 35 mph zone. As a police officer, Kirk had access to a radar gun, so getting the speeds was not a problem. A couple of weeks ago, we determined that we are 95% confident that the true proportion of drivers who drive the speed limit in the school zone is between 1.16% and 2.86%; that is, p ∈ (0.0116, 0.0286). This week, we want to see whether that proportion differs depending on whether drivers wear their seat belts.

The Research Hypothesis

The research hypothesis is a proposed answer to the research question. From experience, we decided that the answer would be “yes”: those wearing their seat belts are more likely to go the speed limit. Translated into symbols, this is

p_b > p_n

The population parameter is p, the population proportion of drivers who slow to the speed limit in the school zone. The (in)equality sign is “greater than.” The two groups are the “belted” group and the “non-belted” group (hence the subscripts b and n). Importantly, note that this research hypothesis is equivalent to

p_b − p_n > 0

Putting it in this form may help you see the connection with what we have done in the past.

Remember: The research hypothesis is what the scientist cares about… the only thing. However, because of probability and the randomness of life, statisticians need two other hypotheses: the null and the alternative. The null hypothesis here is

H_0 : p_b − p_n ≤ 0

The alternative hypothesis is either the research hypothesis or its opposite. If there is an equals part to the research hypothesis, then the alternative hypothesis is the opposite of the research hypothesis. If there is no equals part, then the alternative hypothesis is the research hypothesis. Thus, for this example,

H_1 : p_b − p_n > 0

Recall Table 1 in a previous mini-lecture. Again, feel free to learn that table.

Planning

Now that we have our null hypothesis and a better understanding of the processes involved in collecting the data, we can explicitly write our plan, which allows others to replicate our work. This is almost the same plan as last week. The only difference is that one additional measurement is made.

  1. Select a random day of the week.
  2. Select three time periods (one hour each) during that day.
  3. Park the car as discussed.
  4. Measure whether the driver reduces the car’s speed to 30 mph or less when they enter the school zone.
  5. Determine if the driver is wearing a seat belt.
  6. Repeat the steps above until at least 1000 cars are measured.

The second aspect of planning is planning the analysis. Here, it will be straightforward. We need to compare two populations on the basis of a proportion. The correct test is the two-sample proportions test.

Note that we are not trying to understand either sub-population individually. The research question and hypothesis are about comparing the two.
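For reference, here is one common form of the test statistic for the two-sample proportions test named above. It is a sketch that assumes the pooled standard error; the text does not say which form the software uses. The symbols x_b, x_n (number of drivers who slowed) and n_b, n_n (number of drivers observed) are notation introduced here for illustration.

```latex
\hat{p}_b = \frac{x_b}{n_b}, \qquad
\hat{p}_n = \frac{x_n}{n_n}, \qquad
\hat{p} = \frac{x_b + x_n}{n_b + n_n}

z = \frac{\hat{p}_b - \hat{p}_n}
         {\sqrt{\hat{p}\,(1 - \hat{p})\left(\dfrac{1}{n_b} + \dfrac{1}{n_n}\right)}}
```

Because the alternative hypothesis is one-sided (“greater than”), large positive values of z count as evidence against the null hypothesis.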

Execute the Plan

[A seat belt]

A bucket seat with Schroth six-point harness in a 2010 Porsche 911 GT3 RS. Photo courtesy The Car Spy.

Now that we have a plan, we just need to execute it.

Kirk collected all of the data. Throughout it all, he kept aware of the data and thought of ways of extending the analysis. For this null hypothesis, however, he found that 19 of the 739 belted drivers reduced their speed to 30 mph or less, but only 2 of the 306 non-belted drivers did.

Analyze the Data

Now, with the data, we can test the null hypothesis. Here is the data in tabular form:

Table 1: Data from the seatbelt/speeding example.
Group     Belted   Unbelted
Slowed        19          2
Total        739        306

Here is the data and work for this example. Feel free to download it to see the calculations performed. The first sheet is the summarized data; the second holds the calculations.
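If you would rather follow along in code than in the spreadsheet, the first step can be sketched in a few lines of Python. This is only an illustration, not the contents of the downloadable file; the variable names are made up for this sketch, and the counts come straight from Table 1.

```python
# Counts from Table 1 (the variable names are illustrative, not from the spreadsheet).
slowed_belted, total_belted = 19, 739
slowed_unbelted, total_unbelted = 2, 306

# Sample proportion of drivers who slowed to 30 mph or less in each group.
p_hat_belted = slowed_belted / total_belted
p_hat_unbelted = slowed_unbelted / total_unbelted

# The quantity the hypotheses are about: the difference in proportions.
difference = p_hat_belted - p_hat_unbelted

print(f"Belted:     {p_hat_belted:.2%}")    # 2.57%
print(f"Unbelted:   {p_hat_unbelted:.2%}")  # 0.65%
print(f"Difference: {difference:.2%}")      # 1.92%
```

The observed difference is positive, which is in the direction of the research hypothesis. Whether it is large enough to rule out chance is what the test decides.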

Interpret the Results

According to our software, the p-value is 0.0222 (the test statistic is z = 2.01). As the p-value is less than our α = 0.05, we reject the null hypothesis and conclude in favor of the alternative. In other words, we conclude that belted drivers do tend to slow down for school zones at a greater rate than non-belted drivers.
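The reported numbers can be checked by hand. The sketch below assumes the software used a one-sided z-test with a pooled standard error; the text does not say so, but that assumption reproduces the reported values. The function name `two_proportion_z_test` is hypothetical, and only the Python standard library is used.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Upper-tailed pooled z-test of H0: p1 - p2 <= 0 against H1: p1 - p2 > 0.

    Assumption: pooled standard error (the source does not state the form used).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion under the null
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 1 - NormalDist().cdf(z)  # area in the upper tail
    return z, p_value


# Group 1 = belted drivers, group 2 = non-belted drivers.
z, p_value = two_proportion_z_test(19, 739, 2, 306)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")  # z = 2.01, p-value = 0.0222
```

Only the upper tail is used because the alternative hypothesis is “greater than”; a two-sided test would double the p-value.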

It appears as though those who are willing to break the law once are more likely to break it again.

The Conclusion

It may help for you to see the entire conclusion written up. While this course does not require you to write these lengthy conclusions, you may gain from seeing how everything merges together in a final explanation.

We would like to test if seat-belted drivers tend to slow down for the school zone on Washington Street for Will Rogers Elementary School more often than non-belted drivers. To test this, we used a radar gun and the camouflage of a parked car to determine how many did so at various times during the week. Of the 1045 cars we measured, only 21 slowed down to the posted speed limit. This came out to be 19 of the 739 belted drivers and 2 of the 306 non-belted drivers.
To test the hypothesis, we performed a two-sample proportions test. According to that test, belted drivers did tend to slow down for the school zone at a higher rate than non-belted drivers (p=0.0222). In fact, we are 95% confident that belted drivers will slow down more frequently than non-belted drivers by 0.46 to 3.37 percentage points.
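The confidence interval in that conclusion can be reproduced as well. The sketch below assumes a two-sided Wald interval with an unpooled standard error; that is an assumption about what the software did, although it does give the same endpoints. The function name `difference_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def difference_interval(x1, n1, x2, n2, confidence=0.95):
    """Wald confidence interval for p1 - p2 (assumed form, unpooled standard error)."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    # Two-sided critical value, about 1.96 for 95% confidence.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se


# Group 1 = belted drivers, group 2 = non-belted drivers.
lower, upper = difference_interval(19, 739, 2, 306)
print(f"{lower:.2%} to {upper:.2%}")  # 0.46% to 3.37%
```

Note that the interval uses each group's own proportion in the standard error, whereas the test pools the two groups. The pooling only makes sense under the null hypothesis that the two proportions are equal.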

Note that we got excited when our p-value allowed us to reject the null hypothesis. However, the detected difference in slow-down rates was rather pathetic. This illustrates the difference between statistical significance (p-value) and practical significance (confidence interval). It is “great” that there is a difference (p-value), but that difference is quite minor.

And that is it. This example showed how to test comparisons between two population proportions. Here, we were able to determine that belted drivers had a higher probability of slowing down for a school zone than did non-belted drivers.