A Fair Coin?

This example shows how (and why) to perform the Chi-Square goodness-of-fit test. Before working through it, please review the Chi-Square goodness-of-fit test and write down your questions about it.

The Research Question


The 2016 D Lincoln Shield Cent Penny. Photo courtesy USA Coin Book.

The research question is a question that frames your interest in broad terms. It ends in a question mark and should be interesting to someone. Since this is a “toy” example, it may be interesting to no one beyond its pedagogical use.

Is the 2016 US penny fair? That is, if I spin it on the table-top, will it come up heads 50% of the time?

The Hypotheses

This is a case where the research hypothesis really does not matter. Either it will be fair or it will not be fair. Either answer to that question will produce the same null and alternative hypotheses:

H₀ : p_H = p_T
H₁ : p_H ≠ p_T

Note: One could answer this research question using a previous test we have studied (one-sample proportions test). However, I am looking at this in terms of it being a Chi-Square test to illustrate that there are often multiple tests that can be used in statistics. Sometimes, the tests are equivalent. Sometimes, it is unclear which test is better. Sometimes, one test is always better. One thing statisticians do is determine which test is better, when there are multiple tests.

Planning


The Silver Australian Dollar. Photo courtesy Penny Pincher Coins.

Now that we have our null and alternative hypotheses, we begin planning the data collection and analysis. The data collection is rather easy: We spin a 2016 penny many times. The more times we spin it, the better our estimate. Unfortunately, the more times we spin it, the more resources this analysis consumes (time, here). There is always a trade-off between power and resource consumption... alas! So, since I have a free hour, I will spin the coin 1000 times.

Now that we have planned the data collection, we need to plan the data analysis. We will use the Chi-Square test here. That requires we know the observed counts (we get those from the data collection) and the expected counts (we get those from our null hypothesis).

The expected counts come from our null hypothesis that the coin is fair. The term “fair” means the probability of a Head coming up is the same as the probability of a Tail coming up: 50% each. Since we are spinning the coin 1000 times, the expected number of Heads is $np = 1000 \times 0.500 = 500$. Similarly, the expected number of Tails is $500$.
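The same arithmetic can be sketched in a few lines of Python. This is only an illustration; the variable names (`n_spins`, `null_probs`, `expected`) are my own and do not come from StatCrunch or Excel.

```python
# Expected counts under the null hypothesis of a fair coin.
n_spins = 1000                             # planned number of spins
null_probs = {"Head": 0.5, "Tail": 0.5}    # H0: p_H = p_T

# Expected count for each face is n * p under the null hypothesis.
expected = {face: n_spins * p for face, p in null_probs.items()}

print(expected)  # {'Head': 500.0, 'Tail': 500.0}
```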

Execute the Plan

I spent an hour collecting this really exciting data. Here is that tabulation:

Table 1: Observed frequency distribution for each face in coin spinning.
Face:	Head	Tail
Observed:	612	388

From this, note that there are n=1000 spins.

Analyze the Data

Now, with the data, we can test the null hypothesis. This can be done in StatCrunch and Excel. However, before doing it in either, you need to calculate (or have Excel calculate) the expected frequency distribution, which we did above. The expected number of Heads is 500; of Tails, 500.

Table 2: Expected frequency distribution for each face in coin spinning.
Face:	Head	Tail
Expected:	500	500

Now, we have the observed counts and expected counts. Since we are doing this completely by hand, note the formula for the test statistic is

$$ X^2 = \sum \frac{(\text{Obs} - \text{Exp})^2}{\text{Exp}} $$

So, to calculate this test statistic, we need to first calculate the $\text{Obs} - \text{Exp}$:

Table 3: Expanded table for each face in coin spinning to include the deviance row.
Face:	Head	Tail
Observed:	612	388
Expected:	500	500
Obs − Exp:	112	-112

Now, let us finish the numerator of the test statistic calculation by squaring the values of the third row:

Table 4: Expanded table for each face in coin spinning to include the squared deviance row.
Face:	Head	Tail
Observed:	612	388
Expected:	500	500
Obs − Exp:	112	-112
(Obs − Exp)²:	12544	12544

Next, we divide the numerators by the Expected values. I enter that as the next row:

Table 5: Expanded table for each face in coin spinning to include the adjusted squared deviance row.
Face:	Head	Tail
Observed:	612	388
Expected:	500	500
Obs − Exp:	112	-112
(Obs − Exp)²:	12544	12544
(Obs − Exp)²/Exp:	25.088	25.088

That last line is everything in calculating the test statistic except for the summation, Σ. So, we finish our calculation by adding up those adjusted squared deviances: We get $X^2 = 25.088 + 25.088 = 50.176$.
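As a check on the hand calculation, the table-building steps above can be mirrored in a short Python sketch. The counts are those from Tables 1 and 2; the names `observed`, `expected`, `contributions`, and `x2` are illustrative choices of mine.

```python
# Observed counts (Table 1) and expected counts (Table 2).
observed = {"Head": 612, "Tail": 388}
expected = {"Head": 500, "Tail": 500}

# One entry per face: (Obs - Exp)^2 / Exp, the last row of Table 5.
contributions = {}
for face in observed:
    deviation = observed[face] - expected[face]
    contributions[face] = deviation ** 2 / expected[face]

# The test statistic is the sum of those entries.
x2 = sum(contributions.values())

print(contributions)
print(round(x2, 3))  # 50.176
```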

Interpret the Results


The 2008 Canadian Dollar, otherwise known as the Loonie. Photo courtesy Coin News.

The traditional method: Using the Chi-Square Table in our book, we find the critical value (1 degree of freedom, since there are two faces; α = 0.05) to be 3.84. Since our observed test statistic, 50.176, is greater than the critical value, we reject the null hypothesis.
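If the table is not at hand, the critical value can be approximated with Python's standard library. This sketch relies on a fact that holds only for 1 degree of freedom: a Chi-Square variable with 1 degree of freedom is the square of a standard normal variable, so its upper 5% critical value is the square of the two-sided 5% normal critical value. With more degrees of freedom, this shortcut does not apply and a table or statistical software is needed.

```python
from statistics import NormalDist

alpha = 0.05   # significance level used in this example
x2 = 50.176    # test statistic computed by hand above

# Two-sided standard normal critical value (about 1.96).
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# For 1 degree of freedom only: chi-square critical value = z_crit squared.
critical_value = z_crit ** 2

print(round(critical_value, 2))  # 3.84
print(x2 > critical_value)       # True, so we reject the null hypothesis
```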

The p-value method: To calculate a p-value, we need the computer. From the computer, we get a p-value that is much less than 0.0001. Since the p-value is less than our alpha (0.05), we reject the null hypothesis. This is (and will always be) the same conclusion as we got in the traditional method above.
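For readers curious about what the computer does here, the following is a minimal sketch using only Python's standard library. It again assumes 1 degree of freedom, where the upper-tail Chi-Square probability equals the two-sided standard normal probability and can be written with the complementary error function `math.erfc`. It also cross-checks against the one-sample proportions test mentioned in the earlier note: with two categories, the squared z statistic matches $X^2$. The variable names are my own.

```python
import math

x2 = 50.176   # test statistic computed by hand above

# For 1 degree of freedom only: P(chi-square > x2) = P(|Z| > sqrt(x2)).
p_value = math.erfc(math.sqrt(x2 / 2))

# Cross-check with the one-sample proportions (z) test.
n, heads, p0 = 1000, 612, 0.5
z = (heads / n - p0) / math.sqrt(p0 * (1 - p0) / n)
p_value_z = math.erfc(abs(z) / math.sqrt(2))   # two-sided p-value

print(p_value < 0.0001)             # True
print(round(z ** 2, 3))             # 50.176, the same as X^2
print(math.isclose(p_value, p_value_z, rel_tol=1e-6))
```

In this two-category setting the two tests agree, which is one instance of the "sometimes, the tests are equivalent" remark above.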

The Discussion

But, what does this mean in real life? Since we rejected the null hypothesis, we can conclude that the 2016 penny is not fair when we spin it. (It is still fair when we flip it, however.) This is something to keep in mind if we need to decide between two options. If you want the outcome to be fair, flip the penny. If you want the Head option to win, spin it. While you are not guaranteed a win, you do have a greater probability of winning.

And that is it. This example showed how to test if the observed distribution is sufficiently close to the hypothesized distribution. Here, we did detect a difference between the two. As such, we were able to conclude that the 2016 penny was not fair when spun.