The Binomial Distribution
A simple coin can be used to illustrate the Binomial distribution. Flip a fair coin n times. The number of heads will have a Binomial(n, p = 0.500) distribution.
A Bernoulli experiment is a single trial with two possible outcomes. The probability of a success is p (also known as the “success probability”). The probability of a failure is q = 1 − p.
If we perform a Bernoulli experiment n times and measure only the number of successes, we have a Binomial experiment. Technically, there are four requirements for an experiment to be a Binomial experiment. Those requirements are listed in your text on Page 266. Here is another way of expressing the requirements:
- The number of trials, n, is fixed and known.
- The outcome of one trial does not depend on the outcome of the others.
- Each trial has two possible outcomes, success or failure.
- The probability of a success is the same for each trial.
Note that different sources have a different number of requirements. Your textbook lists four requirements. Many others will list a fifth requirement: The random variable is the number of successes.
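The requirements above can be made concrete with a small simulation. This is only a sketch: the function names bernoulli_trial and binomial_experiment and the seed value are illustrative choices, not something from the textbook. Each trial has two outcomes, the trials do not influence one another, p is the same on every trial, n is fixed in advance, and the quantity reported is the number of successes.

```python
import random

def bernoulli_trial(p, rng):
    """One trial with two outcomes: True (success) with probability p, else False."""
    return rng.random() < p

def binomial_experiment(n, p, rng):
    """Run n independent Bernoulli trials with the same p and count the successes."""
    return sum(bernoulli_trial(p, rng) for _ in range(n))

rng = random.Random(2024)  # arbitrary seed (an assumption) so the run is repeatable
heads = binomial_experiment(10, 0.5, rng)
print(heads)  # some whole number from 0 to 10
```

Running it again with a different seed will usually give a different count, which is exactly why the number of heads is treated as a random variable.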
Check the following to see if each is a Binomial experiment. Decide for each one before reading the answer that follows it.
- I flip a coin 100 times and measure the number of heads flipped.
This is a Binomial experiment. It meets all four requirements.
- I pass through five stoplights on the way home and measure the number of times I have to stop.
This is not a Binomial experiment. The stops at the lights are not independent events. If I stop at the first light, the probability of my having to stop at the next is reduced… unless the traffic engineer hates humanity, in which case it is higher. =)
- The crime rate in a city of 100,000 people.
This is not a Binomial experiment. The number of trials is not known. A person can commit more than one crime in a given year. If we were measuring the number of people who commit a crime, then it would be a Binomial experiment.
- The number of days this week that my business has more than 100 customers.
This is a Binomial experiment. It meets all four requirements.
- The number of flips it takes for me to get 100 heads.
This is not a Binomial experiment. We do not know the number of trials (only the number of successes).
- The number of stoplights I pass through before having to stop.
This is not a Binomial experiment. We do not know the number of trials (only the number of failures).
- The recidivism rate in Oklahoma.
This is a Binomial experiment. The recidivism rate is measured as the number of people released who are returned to jail/prison.
- The number of customers I have this week.
This is not a Binomial experiment. The number of trials is unknown… or the number of outcomes per trial is more than two. It depends on how you define “trial.”
Why this is Important
If we know we have a Binomial experiment, we know a lot about it. We know the probability of each possible outcome. We know what value is most likely. We know the average value (mean). Determining these things is a matter of using the correct formula (or the correct technology). The most important formulas are on pages 266 and 272. To practice them, click on the Project Scarlet link to the right.
Short Example
I have a fair coin. I flip that coin 10 times. What is the probability of getting exactly 7 heads in those 10 flips? Note that the description tells us $n = 10$ and $p = 0.500$.
Let us use Excel to answer this question. In a blank cell, type =BINOM.DIST(7,10,0.500,FALSE). Once you hit the Enter key, Excel tells you that the probability of getting exactly 7 heads is 0.1171875.
The Excel formula has four slots. The first is for the number in the probability statement. Here, since we were calculating P[X = 7], this number is 7. The second is n, the number of trials (coin flips). The third is p, the success probability (probability of a head on each flip). The fourth is FALSE if you are calculating an = probability and TRUE if you are calculating an ≤ probability. As this probability statement was P[X = 7], we used FALSE.
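For readers without Excel, the same four slots can be mimicked in Python. The function binom_dist below is a hypothetical stand-in written for this page, not part of Excel or of any library. It assumes the standard Binomial probability formula, P[X = x] = C(n, x) p^x (1 − p)^(n − x), and adds those probabilities up when the fourth slot is True.

```python
from math import comb

def binom_dist(k, n, p, cumulative):
    """Hypothetical stand-in for Excel's BINOM.DIST(k, n, p, cumulative)."""
    def pmf(x):
        # P[X = x] = C(n, x) * p^x * (1 - p)^(n - x)
        return comb(n, x) * p**x * (1 - p)**(n - x)

    if cumulative:
        # "<=" probability: add up P[X = 0], P[X = 1], ..., P[X = k]
        return sum(pmf(x) for x in range(k + 1))
    # "=" probability: just P[X = k]
    return pmf(k)

# The coin example: exactly 7 heads in 10 flips of a fair coin
print(binom_dist(7, 10, 0.5, False))
```

Changing the first slot and the last slot is all that is needed to try the practice questions that follow.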
Now, for practice, calculate these five probabilities:
- What is the probability of getting exactly 3 heads?
P[H = 3] = 0.1171875. The Excel formula is =BINOM.DIST(3,10,0.5,FALSE)
- What is the probability of getting exactly 1 head?
P[H = 1] = 0.009765625. The Excel formula is =BINOM.DIST(1,10,0.5,FALSE)
- What is the probability of getting two or fewer heads?
P[H ≤ 2] = 0.0546875. The Excel formula is =BINOM.DIST(2,10,0.5,TRUE)
- What is the probability of getting 7 or more heads?
P[H ≥ 7] = 1 − P[H ≤ 6] = 0.171875. The Excel formula is =1-BINOM.DIST(6,10,0.5,TRUE)
- What is the probability of getting more than 6 heads?
P[H > 6] = 1 − P[H ≤ 6] = 0.171875. The Excel formula is =1-BINOM.DIST(6,10,0.5,TRUE)
Here are some hints. These are the five questions above written in symbols:
- P[H = 3]
- P[H = 1]
- P[H ≤ 2]
- P[H ≥ 7]
- P[H > 6] = P[H ≥ 7]
That’s it! The computer did the calculations, as it needed to do.
Remember: Statistics is more about the interpretation than it is about the calculations. Get the computer to do the calculations for you. Spend your brain power on interpreting the results.
In a couple of the above examples, the inequality is either > or ≥. Since BINOM.DIST calculates only = or ≤ probabilities, we used the Rule of Complements (Page 229) to rewrite each one as 1 minus a ≤ probability. You will see that rule quite frequently, so make sure you understand it.
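To see that the Rule of Complements gives the same answer as adding up the upper outcomes directly, here is a short check in Python. It is a sketch only: the helper names binom_pmf and binom_cdf are made up for this illustration, and the standard Binomial probability formula is assumed.

```python
from math import comb

def binom_pmf(x, n, p):
    """P[X = x] for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binom_cdf(k, n, p):
    """P[X <= k] for X ~ Binomial(n, p)."""
    return sum(binom_pmf(x, n, p) for x in range(k + 1))

n, p = 10, 0.5

# Rule of Complements: P[H >= 7] = 1 - P[H <= 6]
by_complement = 1 - binom_cdf(6, n, p)

# Direct route: add P[H = 7] + P[H = 8] + P[H = 9] + P[H = 10]
by_direct_sum = sum(binom_pmf(x, n, p) for x in range(7, n + 1))

print(by_complement, by_direct_sum)
```

Both routes should agree with the 0.171875 found in the practice answers. The complement is usually less work, since it needs only one ≤ probability.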
Uses of the Binomial
In my experience as a statistician, I have used the Binomial distribution to model many processes. These processes range from presidential elections to traffic light placement. Once I knew I had a Binomial experiment, I could ignore a lot and focus on estimating the two parameters: n and p. Usually, n is easy to determine; it is the number of trials. Just as usually, p is hard to determine. A lot of time goes into estimating the success probability, p.
Once I have determined n and p, I know everything about the process. For instance, I know that the expected number of successes is $np$ (see Page 272). I know that the variance of the number of successes is $npq$ and the standard deviation is $\sqrt{npq}$ (see Page 272). I know the probabilities of all possible outcomes and of all combinations of outcomes. There is a formula available for this (Page 266), a table (Table A.1), and several calculators (e.g., StatTrek). The difficulty is only in estimating p, which is beyond the scope of this part, but will be covered in Part III: Inference.
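As a quick illustration of how much follows from n and p alone, the sketch below computes the mean, variance, and standard deviation for the 10-flip fair coin used earlier. The function name binomial_summary is a hypothetical helper invented here, and the inputs n = 10 and p = 0.5 are simply the values from the Short Example.

```python
from math import sqrt

def binomial_summary(n, p):
    """Return (mean, variance, standard deviation) of a Binomial(n, p) count."""
    q = 1 - p                # probability of a failure
    mean = n * p             # expected number of successes, np
    variance = n * p * q     # npq
    return mean, variance, sqrt(variance)

mean, variance, sd = binomial_summary(10, 0.5)
print(mean, variance, round(sd, 4))
```

For the fair coin this gives an expected 5 heads, a variance of 2.5, and a standard deviation of about 1.58 heads.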