This provides an example of how (and why) to perform the chi-squared goodness-of-fit test. Before working through this, please re-read Section 12.1. That should give you some background. More background on the Benford Test can be found at Wikipedia.

The Research Question

[The Giza Necropolis]

The Giza Necropolis. Photo courtesy Yasser Nazmi.

The research question frames your interest in broad terms. It ends in a question mark and should be interesting to someone. For this example, the student was working on research with me. She was interested in this problem because she came from Egypt, fleeing just before Mubarak’s regime fell.

Is there evidence of unfairness in the 2014 Egyptian Constitutional Referendum?

The Hypotheses

This is a case where the research hypothesis really does not matter. We are testing for the presence of unfairness in the 2014 Egyptian Constitutional Referendum. It is, in such cases, better for the scientist to stand back from the analysis and just allow the data to tell the story.

Even though there is no research hypothesis, there are still the null and the alternative hypotheses. These are formulated in the usual way. The null hypothesis is that there is “no difference” between the hypothesized distribution and the observed data. The alternative is the opposite of the null hypothesis (that there is a difference).

Planning

[Saint Katherine]

Saint Katherine in the Sinai Peninsula. Photo courtesy Zoltan Matrahazi.

Now that we have our null hypothesis, we begin planning the data collection and analysis. The data collection is rather easy: the data is collected and sent to me by the Egyptian government. (Sometimes, it is really nice being an academic!)

The distribution we will use is the Benford distribution. It is frequently used in digit tests, which are a set of tests designed to test hypotheses about how the digits in a set of numbers are distributed. The Benford distribution specifies the probabilities of leading digits (the first digit in the number) for “natural” (naturally occurring) numbers.

In other words, we are testing if the observed frequency distribution matches the distribution expected under the Benford distribution. Because we are comparing an observed distribution to a hypothesized distribution, we can use the Chi-Square Goodness-of-Fit test.
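For those who like to see where the numbers come from: the Benford probabilities are usually written as $P(d) = \log_{10}(1 + 1/d)$ for a leading digit $d$ from 1 to 9. Here is a minimal Python sketch of that formula (the function name `benford_probabilities` is just my choice for illustration). Rounded to three decimals, the values should agree with the probabilities in Table 2.

```python
import math


def benford_probabilities():
    """Return {digit: P(leading digit = digit)} for digits 1 through 9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


probs = benford_probabilities()

# Print each leading digit with its Benford probability, to three decimals.
for digit, p in probs.items():
    print(f"{digit}: {p:.3f}")

# The nine probabilities should sum to 1 (up to floating-point error).
print(f"total: {sum(probs.values()):.3f}")
```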

Execute the Plan

My research assistant collected the data from Egypt. She then tabulated the leading digits for the “Yes” votes. Here is that tabulation:

Table 1: Observed frequency distribution for leading digits in the 2014 Egyptian Constitutional Referendum.
Digit:      1   2   3   4   5   6   7   8   9
Observed:  10   5   2   4   2   2   1   1   0

From this, note that there are $n=27$ provinces.
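If you want to do this tabulation yourself, here is a small Python sketch. The helper names (`leading_digit`, `tabulate_leading_digits`) and the vote totals in `example_votes` are hypothetical, invented only to show the mechanics; they are not the referendum data. The last lines simply check that the counts in Table 1 add up to the number of provinces.

```python
from collections import Counter


def leading_digit(count):
    """Return the first digit of a positive integer."""
    if count <= 0:
        raise ValueError("leading digits are only defined here for positive counts")
    return int(str(count)[0])


def tabulate_leading_digits(counts):
    """Return a list of frequencies for leading digits 1 through 9."""
    tally = Counter(leading_digit(c) for c in counts)
    return [tally.get(d, 0) for d in range(1, 10)]


# HYPOTHETICAL vote totals, made up for illustration (NOT the actual data).
example_votes = [1520, 987, 23011, 1102, 4410]
example_table = tabulate_leading_digits(example_votes)
print(example_table)

# Observed frequencies copied from Table 1.
observed = [10, 5, 2, 4, 2, 2, 1, 1, 0]
n = sum(observed)
print(n)
```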

Analyze the Data

Now, with the data, we can test the null hypothesis. This can be done in either StatCrunch or Excel. However, before doing it in either, you need to calculate (or have Excel calculate) the expected frequency distribution. According to Benford’s Law, the (relative frequency) distribution of leading digits is

Table 2: Probability distribution for leading digits according to the Benford distribution.
Digit:        1      2      3      4      5      6      7      8      9
Probability:  0.301  0.176  0.125  0.097  0.079  0.067  0.058  0.051  0.046

That is the table of probabilities (or expected relative frequencies). It is not the table of expected frequencies. Since relative frequencies are $p$ and expected values are $E[X] = np$, we just need to multiply each of those probabilities by the sample size, $n=27$ (this sure looks like the expected value from a Binomial distribution, right?):

Table 3: Expected frequency distribution for leading digits according to the Benford distribution.
Digit:     1      2      3      4      5      6      7      8      9
Expected:  8.127  4.752  3.375  2.619  2.133  1.809  1.566  1.377  1.242

And that is it, the expected frequency distribution of leading digits. The Chi-Square Goodness-of-Fit test compares what we observed to this expected distribution. And so, we turn to technology to perform the necessary test.

Here is a link to the Excel spreadsheet that performs these calculations.
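If you would rather see the arithmetic outside of a spreadsheet, here is a minimal Python sketch of the same two steps: multiply each Table 2 probability by $n$ to get the expected frequencies, then add up $(O - E)^2 / E$ over the nine digits to get the test statistic. The variable names are my own, and I am assuming the rounded probabilities from Table 2 (that is what reproduces Table 3).

```python
# Observed frequencies (Table 1) and Benford probabilities (Table 2).
observed = [10, 5, 2, 4, 2, 2, 1, 1, 0]
benford = [0.301, 0.176, 0.125, 0.097, 0.079, 0.067, 0.058, 0.051, 0.046]

# Sample size: the number of provinces.
n = sum(observed)

# Expected frequency for each digit is n * p.
expected = [n * p for p in benford]

# Chi-square goodness-of-fit statistic: sum of (O - E)^2 / E.
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print([round(e, 3) for e in expected])
print(round(chi_sq, 4))
```

The first line printed should agree with Table 3.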

Interpret the Results

[Essam Azzam Street]

Essam Azzam Street in Cairo. Photo courtesy Mohammed Moussa.

According to our statistics program, the test statistic is $X^2 = 3.3112$ on $9 - 1 = 8$ degrees of freedom, giving a p-value of 0.9133. As the p-value is greater than our $\alpha = 0.05$, we cannot reject the null hypothesis. In other words, we did not detect evidence of unfairness in the 2014 Egyptian Constitutional Referendum.
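To see where a p-value like that comes from, here is a hedged Python sketch. It is not what StatCrunch or Excel do internally; it uses a closed form for the upper tail of the chi-square distribution that only holds when the degrees of freedom are even (here there are nine digit categories, so 8 degrees of freedom). The function name `chi2_sf_even_df` is my own.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable, even df only.

    For df = 2k the tail equals exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only holds for even, positive df")
    half = x / 2
    k = df // 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(k))


chi_sq = 3.3112  # test statistic reported in the text
df = 9 - 1       # nine digit categories, minus one
alpha = 0.05

p_value = chi2_sf_even_df(chi_sq, df)
decision = "reject H0" if p_value < alpha else "fail to reject H0"

print(round(p_value, 4))
print(decision)
```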

Remember wondering why you could not “accept” the null hypothesis when the p-value was large? Well, this shows you the reason. There are two possible reasons we failed to reject the null hypothesis: there was actually no fraud, or the test could not detect the fraud that existed. One cause of the latter is low power, usually due to a small sample size. Here, $n=27$, which is rather small.
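To make the point about power concrete, here is a rough simulation sketch in Python. Everything about the alternative is an assumption of mine: I pretend that the leading digits are really uniform (each digit has probability 1/9), draw many samples, and count how often the test rejects the Benford null at $\alpha = 0.05$. This is purely a hypothetical scenario and says nothing about what happened in the actual referendum. Keep in mind, too, that the chi-square reference distribution is only approximate when the expected counts are this small.

```python
import math
import random

# Benford probabilities for leading digits 1 through 9.
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]
DIGITS = list(range(1, 10))


def chi2_sf_df8(x):
    """Upper-tail chi-square probability for 8 degrees of freedom (closed form)."""
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(4))


def gof_p_value(counts, probs):
    """P-value of the chi-square goodness-of-fit test for nine categories."""
    n = sum(counts)
    stat = sum((o - n * p) ** 2 / (n * p) for o, p in zip(counts, probs))
    return chi2_sf_df8(stat)


def estimate_power(n, alt_probs, reps=2000, alpha=0.05, seed=1):
    """Share of simulated samples (drawn from alt_probs) that reject Benford."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        sample = rng.choices(DIGITS, weights=alt_probs, k=n)
        counts = [sample.count(d) for d in DIGITS]
        if gof_p_value(counts, BENFORD) < alpha:
            rejections += 1
    return rejections / reps


# HYPOTHETICAL alternative: every leading digit equally likely.
uniform = [1 / 9] * 9

power_27 = estimate_power(27, uniform)    # the sample size in this example
power_270 = estimate_power(270, uniform)  # ten times as many provinces

print(f"estimated power, n = 27:  {power_27:.2f}")
print(f"estimated power, n = 270: {power_270:.2f}")
```

Under this made-up alternative, the larger sample should reject far more often than the sample of 27, which is the sense in which a small sample size limits what the test can detect.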

The Discussion

Again, we failed to reject the null hypothesis. This does not mean there was no unfairness in the election. It just means we did not detect it with this test. Was there no unfairness? Was our sample size too small? Who knows. All we know is that we did not find unfairness using this method.

And that is it. This example showed how to test if the observed distribution is close enough to the hypothesized distribution. Here, we did not detect a difference between the two. This could be due to the lack of unfairness in the election or to the small sample size (low power). We cannot tell without additional data.

If you are interested in this problem, I have done further analysis of this election. You can find it in the post at the Center for Electoral Forensics. It may be interesting because the analysis relies on linear regression, which was the last module.