This page provides an example of one of the three tests you can run with the information in this module: the two-sample means test. The example follows the scientific method to emphasize the underlying structure of research. It uses the techniques of Section 10.1. The videos at the bottom show the calculations in Excel.
The Research Question
The Milton S. Eisenhower Library at the Johns Hopkins University. Photo courtesy of the Library of Congress.
The research question is a question that frames your interest in broad terms. It ends in a question mark and should be interesting to someone. For this example, the student was completing an assignment for me. The interest was higher than expected.
Are male students at Oklahoma State University taller than female students, on average?
This seems to be a settled question with a well-known answer. Thus, it may not be interesting, but it does help illustrate the importance of sample size and power in statistical tests.
The Hypotheses
The research hypothesis is a proposed answer to the research question. From experience, we decided that the answer would be “yes”: male students are taller than female students, on average. Translated into symbols, this is
$$\mu_m > \mu_f$$
The population parameter is μ, the population mean. The (in)equality sign is “greater than.” The two groups are the “male” group and the “female” group (hence the subscripts).
The Null Hypothesis
The research hypothesis is what the scientist cares about, and the only thing. However, because of probability and the randomness of life, statisticians need two other hypotheses: the null and the alternative. For this course, the null hypothesis always contains the equality. If the research hypothesis already has an equals part, it is the null hypothesis; if not, the null hypothesis is its opposite. Since the research hypothesis here (“greater than”) has no equals part, the null hypothesis is
$$H_0: \mu_m \le \mu_f$$
This null hypothesis is identical to
$$H_0: \mu_m - \mu_f \le 0$$
Why would I write it in this form? This form offers two things. First, it emphasizes that we are testing one single hypothesis about $\mu_m - \mu_f$. We are not testing two hypotheses: one about $\mu_m$ and one about $\mu_f$.
Second, it emphasizes that we are hypothesizing that this (population) value is less than or equal to zero. There may come a time when you need to test whether the difference is some value other than 0. Perhaps you need to test a hypothesis that July is 107°F hotter than August, on average. In such a case, the null hypothesis would be $\mu_{Jul} - \mu_{Aug} = 107$.
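The hypothesized difference enters the test statistic directly. For the unpooled two-sample test used later in this example, a standard textbook form of the statistic is shown below. The symbols are chosen here for illustration: $\bar{x}$ is a sample mean, $s$ a sample standard deviation, $n$ a sample size, and $\Delta_0$ the difference stated in the null hypothesis ($0$ in this example, $107$ in the July and August illustration).

$$
t = \frac{(\bar{x}_m - \bar{x}_f) - \Delta_0}{\sqrt{\dfrac{s_m^2}{n_m} + \dfrac{s_f^2}{n_f}}}
$$

Writing the null hypothesis as a statement about $\mu_m - \mu_f$ makes it clear that only the single number $\Delta_0$ changes from one problem to the next.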
The Alternative Hypothesis
The alternative hypothesis is either the research hypothesis or its opposite. If there is an equals part to the research hypothesis, then the alternative hypothesis is the opposite of the research hypothesis. If there is no equals part, then the alternative hypothesis is the research hypothesis. Thus, for this example,
$$H_1: \mu_m - \mu_f > 0$$
Recall Table 1 in a previous mini-lecture. Again, feel free to learn that table.
Planning
Now that we have our null hypothesis and a better understanding of the processes involved in creating the data, we can explicitly write our plan, which allows others to replicate our work. This is not an ideal plan, but it is what the student did.
- Go to the library.
- Approach 10 students, 5 male and 5 female.
- Measure their heights.
- Measure their genders.
- Record the data.
The second aspect of planning is planning the analysis. Here, it is straightforward. We need to draw a conclusion that compares the means of two populations, so the correct test is the two independent sample means test.
It is the independent samples test because we are not taking repeated measurements on 5 students to get the 10 measurements. If we were measuring the heights of five students in July and the same five students in August, the samples would be dependent. That is not the case here: the males and females are different people.
Execute the Plan
Now that we have a plan, we just need to execute it.
The student collected all of the data. For the male students, he got an average height of 70 in. and a standard deviation of 1.9 in. For the female students, he got an average height of 68 in. and a standard deviation of 2.0 in.
Here is the raw data:
Male (in.) | Female (in.) |
---|---|
70 | 69 |
68 | 69 |
69 | 68 |
73 | 70 |
70 | 67 |
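Computing the sample statistics directly from the table is a useful check. This minimal sketch uses Python's standard `statistics` module; `stdev` is the sample standard deviation (denominator n − 1), the same quantity Excel's `STDEV.S` reports. As printed, the male column gives a mean of 70 in. and a standard deviation of about 1.87 in. (which rounds to the 1.9 in. quoted above). The female column gives a mean of 68.6 in. and a standard deviation of about 1.14 in., which does not match the 68 in. and 2.0 in. quoted above. The reported p-value of 0.0706 agrees with the quoted summary statistics rather than with the female column as printed, so the table may contain a transcription error.

```python
from statistics import mean, stdev

# Heights (inches) exactly as listed in the raw-data table.
male = [70, 68, 69, 73, 70]
female = [69, 69, 68, 70, 67]

for label, heights in (("male", male), ("female", female)):
    # stdev() uses the n - 1 denominator (sample standard deviation).
    print(f"{label}: n = {len(heights)}, "
          f"mean = {mean(heights):.2f}, sd = {stdev(heights):.2f}")
```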
Analyze the Data
Now, with the data, we can test the null hypothesis. In Excel, we use this menu trail: DATA | Data Analysis | t-Test: Two-Sample Assuming Unequal Variances. Fill in the necessary information, then click OK. Doing so gives a new worksheet with the results. Unfortunately, this tool only tests hypotheses; it does not report confidence intervals. So, you will need to calculate confidence intervals “by hand” in Excel.
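The calculation behind an unequal-variances (Welch) t-test can be sketched in Python with only the standard library. This is a minimal sketch under stated assumptions, not Excel's implementation: the function names `t_pdf`, `t_upper_tail`, and `welch_one_sided` are hypothetical, and the p-value comes from integrating the t density numerically with Simpson's rule rather than from a statistics library. It uses the summary statistics reported above, and the `delta0` parameter covers a nonzero hypothesized difference such as the July and August example.

```python
import math


def t_pdf(u: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const) * (1 + u * u / df) ** (-(df + 1) / 2)


def t_upper_tail(t: float, df: float, steps: int = 2000) -> float:
    """P(T > t), from Simpson's rule on [0, |t|] plus symmetry about 0."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 < T < |t|)
    return 0.5 - area if t >= 0 else 0.5 + area


def welch_one_sided(mean1, sd1, n1, mean2, sd2, n2, delta0=0.0):
    """Unpooled t test of H0: mu1 - mu2 <= delta0 vs H1: mu1 - mu2 > delta0."""
    v1 = sd1 ** 2 / n1  # squared standard error of mean1
    v2 = sd2 ** 2 / n2  # squared standard error of mean2
    t = (mean1 - mean2 - delta0) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite degrees of freedom (left unrounded here).
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df, t_upper_tail(t, df)


# Summary statistics reported for the 5 male and 5 female students.
t, df, p = welch_one_sided(70, 1.9, 5, 68, 2.0, 5)
print(f"t = {t:.3f}, df = {df:.2f}, one-sided p = {p:.4f}")
print("reject H0" if p < 0.05 else "fail to reject H0")
```

With these rounded inputs the sketch gives t ≈ 1.62 on about 7.98 degrees of freedom and a one-sided p-value of about 0.072, so it also fails to reject the null hypothesis at α = 0.05. The small gap from the reported 0.0706 appears to come from rounding the male standard deviation to 1.9 in.; with the unrounded value (about 1.87 in.) the p-value is very close to 0.0706. Spreadsheet tools may also round the degrees of freedom, which can shift the last digit.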
Get out of the pool!
The pool is closed.
So, why do we not pool? Pooling combines the two sample variances into a single estimate, and that is justified only if we assume the two population variances are the same. Since we do not know the individual population variances, how can we know whether they are equal? Thus, we should not pool (unless our boss says otherwise).
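For comparison, standard textbook forms of the pooled and unpooled calculations are below, with notation chosen here for illustration. With equal sample sizes, as in this example, the two standard errors are algebraically identical, so only the degrees of freedom differ; the choice matters more when the sample sizes differ.

$$
s_p^2 = \frac{(n_m - 1)s_m^2 + (n_f - 1)s_f^2}{n_m + n_f - 2}, \qquad
SE_{\text{pooled}} = \sqrt{s_p^2\left(\frac{1}{n_m} + \frac{1}{n_f}\right)}, \qquad
df_{\text{pooled}} = n_m + n_f - 2
$$

$$
SE_{\text{unpooled}} = \sqrt{\frac{s_m^2}{n_m} + \frac{s_f^2}{n_f}}, \qquad
df_{\text{unpooled}} = \frac{\left(\dfrac{s_m^2}{n_m} + \dfrac{s_f^2}{n_f}\right)^2}{\dfrac{(s_m^2/n_m)^2}{n_m - 1} + \dfrac{(s_f^2/n_f)^2}{n_f - 1}}
$$

With $n_m = n_f = 5$, $s_m = 1.9$, and $s_f = 2.0$: $s_p^2 = (4 \times 3.61 + 4 \times 4.00)/8 = 3.805$, so both standard errors equal $\sqrt{1.522} \approx 1.234$, while the degrees of freedom are 8 (pooled) and about 7.98 (unpooled).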
Interpret the Results
Again, now that we have performed our analysis, we tell the world about it in our conclusion.
Because the p-value of 0.0706 is greater than our α=0.05, we cannot reject the null hypothesis. We cannot conclude that male students are taller than female students, on average.
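A confidence bound tells the same story. As a worked sketch using the reported summary statistics, $SE = \sqrt{1.9^2/5 + 2.0^2/5} = \sqrt{1.522} \approx 1.234$, and rounding the unpooled degrees of freedom to 8 (so $t_{0.05,8} \approx 1.860$ and $t_{0.025,8} \approx 2.306$), the one-sided 95% lower confidence bound and the two-sided 95% confidence interval for $\mu_m - \mu_f$ are:

$$
(\bar{x}_m - \bar{x}_f) - t_{0.05,8}\,SE = 2 - 1.860 \times 1.234 \approx 2 - 2.295 = -0.295
$$

$$
(\bar{x}_m - \bar{x}_f) \pm t_{0.025,8}\,SE = 2 \pm 2.306 \times 1.234 \approx 2 \pm 2.845 = (-0.845,\ 4.845)
$$

Because the lower bound is below 0, a difference of zero remains plausible, which matches the decision not to reject the null hypothesis.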
A Quick Discussion
We all know that men are taller than women, on average. What happened here? The sample size was too small to be able to detect a difference. The resulting test was not powerful enough.
I leave it as practice for you to determine if a sample size of 15 males and 15 females (with the same sample statistics) is large enough to detect a difference. Leave a message in this discussion forum if you do this.
And that is it. This example showed how to test a comparison between two population means. Here, we were unable to conclude that males are taller than females, on average. Note that we did not conclude that females were taller than males or that females were the same height as males, on average. We failed to reject the null hypothesis. This means only that our data were not strong enough to conclude anything about the relationship between the average male height and the average female height.