A graphic of a Normal distribution. Note that the Normal distribution is unimodal, symmetric, and bell-shaped. All Normal distributions look like this. The only differences are the numbers along the x-axis.
In the previous chapter, we covered the Binomial distribution, which is a very important distribution for modeling discrete random variables. The distribution for this chapter, the Normal distribution, is even more important. Where the Binomial distribution can model only certain types of discrete random variables, the Normal distribution can approximately model the distribution of the sample mean (almost always).
The Probability Density Function
The authors would have started Chapter 6 with the Normal distribution if it had a nice shape. It does not. Thus, areas (probabilities) are much more difficult to calculate than in the case of the Uniform distribution. In fact, not even calculus gives us an exact closed-form formula for these areas. This is the reason for needing to tabulate areas corresponding to different z-values (Table A.3).
For those who are interested, here is the actual pdf for the Normal distribution.
$$ f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}} $$
Note that there is an $x$ on the right-hand side of the pdf. That means the height of the pdf depends on the value of $x$, so its graph is not flat. This is why the areas (probabilities) are more difficult to calculate.
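For readers who would like to see the pdf as a calculation, here is a minimal Python sketch. It writes out the Normal pdf with mean μ and standard deviation σ, including the leading factor 1/(σ√(2π)), and compares it with `statistics.NormalDist` from the Python standard library. The function name `normal_pdf` and the sample points are illustrative assumptions, not part of the text.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Height of the N(mu; sigma) density at x, written out term by term."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)


# The height changes with x, which is why the graph is not flat.
# The sample points below are arbitrary choices for illustration.
for x in (-2.0, 0.0, 1.5):
    by_hand = normal_pdf(x, mu=0.0, sigma=2.0)
    by_library = NormalDist(mu=0.0, sigma=2.0).pdf(x)
    print(x, round(by_hand, 6), round(by_library, 6))
```

The two columns should agree, which is a quick check that the formula has been typed correctly.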
That is the bad news. The good news is that Excel can calculate those areas for you. Alternatively, if you would like, you can use Table A.3 to help calculate those probabilities. Table A.3, like most standard Normal tables, provides the probabilities P[Z ≤ z] for values of z between -4 and 4.
As always, make the computer do the calculations. You should worry about interpretation.
In Excel, the function is =NORM.DIST. Its arguments are, in order, the value x, the mean, the standard deviation, and TRUE (to request the cumulative probability). To use it, one needs to get the probability statement in the form P[X ≤ x] (or, equivalently, P[X < x]), where x is a number.
That may seem difficult. However, there are only four options for probability statements. The following table shows how to get each option into the “correct inequality form”:
1. | P[X ≤ x] |
2. | P[X ≥ x] | = | 1 - P[X ≤ x] |
3. | P[x1 ≤ X ≤ x2] | = | P[X ≤ x2] - P[X ≤ x1] |
4. | P[X ≤ x1 or X ≥ x2] | = | P[X ≥ x2] + P[X ≤ x1] | = | 1 - P[X ≤ x2] + P[X ≤ x1] |
As these are the only four possible probability statements, you should learn this table. By the way, the first two are the most common.
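The four rows of the table can also be sketched as four small Python functions. This is only an illustration: the function names are made up for this example, and `NormalDist(...).cdf(x)` from the standard library is used as the stand-in for P[X ≤ x].

```python
from statistics import NormalDist


def p_at_most(dist, x):
    """Type 1: P[X <= x], already in the required form."""
    return dist.cdf(x)


def p_at_least(dist, x):
    """Type 2: P[X >= x] = 1 - P[X <= x]."""
    return 1.0 - dist.cdf(x)


def p_between(dist, x1, x2):
    """Type 3: P[x1 <= X <= x2] = P[X <= x2] - P[X <= x1]."""
    return dist.cdf(x2) - dist.cdf(x1)


def p_outside(dist, x1, x2):
    """Type 4: P[X <= x1 or X >= x2] = 1 - P[X <= x2] + P[X <= x1]."""
    return 1.0 - dist.cdf(x2) + dist.cdf(x1)


# Illustrative check with a standard Normal variable and the bounds -1 and 1.
Z = NormalDist(mu=0.0, sigma=1.0)
print(round(p_between(Z, -1.0, 1.0), 4))  # 0.6827
print(round(p_outside(Z, -1.0, 1.0), 4))  # 0.3173
```

Note that Types 3 and 4 are complements of each other: for the same two bounds, their probabilities add to 1.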
Probability Examples
A. Let us be given that the random variable X follows a Normal distribution with mean 0 and standard deviation 2. What is the probability that X is less than 2?
The first step is to translate the words above into probability terms. Here, we are given X ~ N(μ=0; σ=2). We are asked to calculate P[X < 2].
Ans. =NORM.DIST(2,0,2,TRUE)
Ans. 0.8413
B. Let us be given that IQ scores follow a Normal distribution with mean 100 and standard deviation 15. What is the probability that a randomly selected person will have an IQ greater than 90?
Again, the first step is to abstract the words into symbols. Here, we are given IQ ~ N(μ=100; σ=15). We need to calculate P[IQ > 90]. Note the direction of the inequality in the probability statement. It is a Type 2 probability statement, >. Excel can only calculate probabilities when the inequality is < (or ≤). Thus, we will need to switch its direction. From the table above, note that P[IQ > 90] is identical to 1 − P[IQ ≤ 90]. Thus, we have:
Ans. =1-NORM.DIST(90,100,15,TRUE)
Ans. 0.7475. Thus, about 75% of people have IQs greater than 90.
C. The height of adult males in Zimbabwe is Normally distributed with mean 68 inches and standard deviation 4 inches. What is the proportion of males that are between 5 feet and 6 feet tall?
Translating this into probability symbols is straightforward. We are given H ~ N(μ=68; σ=4). Since 5 feet is 60 inches and 6 feet is 72 inches, we are asked to calculate P[60 < H < 72]. This is a Type 3 probability statement. It is equivalent to P[H < 72] − P[H < 60]. Thus, we will need to calculate two probabilities and subtract them:
Ans. =NORM.DIST(72,68,4,TRUE)-NORM.DIST(60,68,4,TRUE)
Ans. 0.8186. Thus, about 82% of Zimbabwean males are between 5 and 6 feet tall.
D. Suppose the thickness of a penny must be between 0.059 and 0.060 inches. If it is outside these bounds, the Treasury rejects it. If the newest penny punch produces pennies with thicknesses that follow a Normal distribution with mean 0.0598 inches and standard deviation 0.0001 inches, what proportion of pennies are rejected?
In probability symbols, this tells us T ~ N(μ=0.0598; σ=0.0001). We need to calculate P[T < 0.059 or T > 0.060]. This compound probability statement is a Type 4 probability statement. From the table, we know it is equivalent to P[T < 0.059] + P[T > 0.060], which is the same as P[T < 0.059] + 1 - P[T ≤ 0.060]. Using technology to calculate this, we have:
Ans. =1-NORM.DIST(0.060,0.0598,0.0001,TRUE)+NORM.DIST(0.059,0.0598,0.0001,TRUE)
Ans. 0.02275. Thus, approximately 2% of the pennies are rejected.
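For readers working outside Excel, Examples A through D can be cross-checked with a short Python sketch. It assumes Python's standard-library `statistics.NormalDist`, whose `cdf(x)` method plays the same role as =NORM.DIST(x, mean, sd, TRUE). The variable names are illustrative.

```python
from statistics import NormalDist

# Example A: X ~ N(0; 2), P[X < 2]  (Type 1)
answer_a = NormalDist(0, 2).cdf(2)

# Example B: IQ ~ N(100; 15), P[IQ > 90]  (Type 2)
answer_b = 1 - NormalDist(100, 15).cdf(90)

# Example C: H ~ N(68; 4), P[60 < H < 72]  (Type 3)
heights = NormalDist(68, 4)
answer_c = heights.cdf(72) - heights.cdf(60)

# Example D: T ~ N(0.0598; 0.0001), P[T < 0.059 or T > 0.060]  (Type 4)
thickness = NormalDist(0.0598, 0.0001)
answer_d = 1 - thickness.cdf(0.060) + thickness.cdf(0.059)

print(round(answer_a, 4))  # 0.8413
print(round(answer_b, 4))  # 0.7475
print(round(answer_c, 4))  # 0.8186
print(round(answer_d, 5))  # 0.02275
```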
Those are the basics of the Normal distribution. Everything else is just practice in calculations. Remember this: Probabilities can only take values between 0 and 1. If you get a probability outside that range, you did your calculations incorrectly. The next section covers the z-transform. This transform is used to scale any Normal random variable into a standard Normal random variable. If you are using Table A.3, you will need to perform the z-transform; Table A.3 only gives probabilities for the standard Normal distribution.
The Z-Transform
The above section shows the mechanics of calculating probabilities using technology. There is still a group in statistical education who believe it is important to use tabulated values. This section covers how to use Table A.3, which tabulates values for the standard Normal distribution.
All Normal distributions can be transformed into the standard Normal distribution using a process called standardization. Let us be given that
$$ X \sim N( \mu; \sigma) $$
That is, let X be a random variable with a Normal distribution having mean μ and standard deviation σ (page 294). If we define a new random variable Z by
$$ Z = \frac{X - \mu}{\sigma} $$
then we say “Z has a standard Normal distribution” (page 296), which is symbolized as
$$ Z \sim N( 0; 1) $$
Values for P[Z ≤ z] are tabulated in Table A.3.
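As a hedged illustration of standardization, the sketch below reuses the IQ numbers from Example B (μ=100, σ=15, x=90) and checks that P[X ≤ x] computed directly matches P[Z ≤ z] computed after the z-transform. The function name `z_transform` is an assumption for this example, and `statistics.NormalDist` is Python's standard-library Normal distribution.

```python
from statistics import NormalDist


def z_transform(x, mu, sigma):
    """Standardize x from N(mu; sigma): z = (x - mu) / sigma."""
    return (x - mu) / sigma


mu, sigma, x = 100.0, 15.0, 90.0
z = z_transform(x, mu, sigma)

# Probability from the original distribution, X ~ N(100; 15).
direct = NormalDist(mu, sigma).cdf(x)

# Probability from the standard Normal, Z ~ N(0; 1), evaluated at z.
standardized = NormalDist(0.0, 1.0).cdf(z)

print(round(z, 4))             # -0.6667
print(round(direct, 4))        # 0.2525
print(round(standardized, 4))  # 0.2525
```

The two probabilities agree, which is why a single table for the standard Normal distribution is enough for every Normal distribution.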
Reading Table A.3 is easy once you get the hang of it. The numbers along the margins are the z-values. The numbers in its interior are the probabilities P[Z ≤ z]. For instance, to calculate P[Z < 1.23], follow this process:
- Go to Table A.3
- Find “1.2” along the left margin.
- Find “.03” along the top margin.
- Read the number at the intersection of that row and column.
That value is 0.8907.
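The lookup steps above can be mimicked in a few lines of Python. This is a sketch under two assumptions: that each table cell holds P[Z ≤ row + column], and that the table rounds to four decimal places (as the value 0.8907 suggests). The function name `table_cell` is hypothetical.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def table_cell(row, column):
    """P[Z <= row + column], rounded to four decimals like a printed table."""
    return round(STANDARD_NORMAL.cdf(row + column), 4)


# Row "1.2" on the left margin, column ".03" on the top margin.
print(table_cell(1.2, 0.03))  # 0.8907

# Rebuild the whole "1.2" row, one cell per second-decimal column.
row_values = [table_cell(1.2, hundredth / 100) for hundredth in range(10)]
print(row_values)
```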
Note that the values in Table A.3 can also be calculated using Excel. In Excel, =NORM.DIST(1.23,0,1,TRUE) gives the answer. Because of how easy it is to use technology to get an answer that is more accurate and precise, I do not recommend the table. Use technology.
Check Yourself
Please calculate the probability for each of the following statements.
- P[Z ≤ 1.14] = 0.8729.
Excel:=NORM.DIST(1.14,0,1,TRUE)
- P[Z < -0.76] = 0.2236.
Excel:=NORM.DIST(-0.76,0,1,TRUE)
- P[Z ≥ 1.14] = 0.1271.
Excel:=1-NORM.DIST(1.14,0,1,TRUE)
- P[Z > -2.36] = 0.9909.
Excel:=1-NORM.DIST(-2.36,0,1,TRUE)
- P[1.15 < Z < 2.36] = 0.1159.
Excel:=NORM.DIST(2.36,0,1,TRUE)-NORM.DIST(1.15,0,1,TRUE)
- Let X ~ N(μ=1, σ=2). P[X < 2.22] = 0.7291.
Excel:=NORM.DIST(2.22,1,2,TRUE)
- Let X ~ N(μ=10, σ=1). P[X < 12.12] = 0.9830.
Excel:=NORM.DIST(12.12,10,1,TRUE)
- Let X ~ N(μ=-5, σ=3). P[X ≥ 0.15] = 0.0430.
Excel:=1-NORM.DIST(0.15,-5,3,TRUE)
- Let X ~ N(μ=1, σ=0.2). P[0.80 < X < 1.15] = 0.6147.
Excel:=NORM.DIST(1.15,1,0.2,TRUE)-NORM.DIST(0.80,1,0.2,TRUE)
- Let X ~ N(μ=0, σ=2). P[0.80 < X < 1.15] = 0.0619.
Excel:=NORM.DIST(1.15,0,2,TRUE)-NORM.DIST(0.80,0,2,TRUE)
More Practice
Would you like some more practice? Project Scarlet offers it to you. Go to http://statistics.kvasaheim.com/distributions/norm_cdf.php to see!