4.11 Normal Distribution $N(\mu,\sigma^{2})$
1. Description of the Distribution
It is the distribution of a continuous random variable $X$ with values from $-\infty$ to $+\infty$. The parameters of this distribution are:
- $\mathbf{\mu} =$ mean
- $\mathbf{\sigma} =$ standard deviation
The "behavior" of the probability is described by a function which creates a symmetrical bell-shaped curve.
Roughly speaking, the most likely values lie around the mean $\mu$, and all the other values of $X$ spread out symmetrically about the mean. As we move away from the mean (either to the left or to the right), the probability decreases rapidly.
We say that $X$ follows a normal distribution with mean $\mu$ and standard deviation $\sigma$ (or variance $\sigma^2$) and we write:
$X \sim N(\mu, \sigma^2)$
Typical examples of variables that follow a normal distribution:
- Weight of people
- Height of people
- Time spent in a supermarket
- Weight of a pack of coffee labeled 500 g.
2. The Standard Percentages & Spread
For example, suppose that for a Greek man the mean weight is $\mu=75$ kg and the standard deviation is $\sigma=10$ kg. It is estimated that:
| Percentage of the population | Ranges in general | Between (for our problem) |
|---|---|---|
| about 68.3% of the population | $\mu-\sigma$ and $\mu+\sigma$ | [65, 85] |
| about 95% of the population | $\mu-2\sigma$ and $\mu+2\sigma$ | [55, 95] |
| about 99.7% of the population | $\mu-3\sigma$ and $\mu+3\sigma$ | [45, 105] |
- The whole area under the curve is 1 (i.e. 100%).
- The area before the mean as well as the area after the mean is 0.5 (i.e. 50%).
- Theoretically, the distribution of $X$ ranges from $-\infty$ to $+\infty$. In practice, we assume that almost the whole population (99.7%) ranges between $\mu-3\sigma$ and $\mu+3\sigma$.
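- The standard deviation $\sigma$ indicates the spread of the population.
For example, suppose that the weight of Italian men has the same mean $\mu=75$ kg but a smaller standard deviation, $\sigma=8$ kg. Both populations have the same mean, but Italians are clustered closer to the mean than Greeks.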
Almost the whole population ($\mu \pm 3\sigma$) is:
$75 \pm 30 \implies 45\text{ to }105$ kg for Greeks.
$75 \pm 24 \implies 51\text{ to }99$ kg for Italians.
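The percentages in the table can be checked numerically. The following is a minimal sketch that uses `NormalDist` from Python's standard `statistics` module in place of the GDC; the helper name `share_within` is an illustrative choice and does not come from the text.

```python
from statistics import NormalDist

# Weight of Greek men from the example: mean 75 kg, standard deviation 10 kg.
weights = NormalDist(mu=75, sigma=10)


def share_within(dist, k):
    """Return P(mu - k*sigma <= X <= mu + k*sigma) for a normal distribution."""
    low = dist.mean - k * dist.stdev
    high = dist.mean + k * dist.stdev
    return dist.cdf(high) - dist.cdf(low)


for k in (1, 2, 3):
    low = weights.mean - k * weights.stdev
    high = weights.mean + k * weights.stdev
    share = share_within(weights, k)
    print(f"within {k} sd: [{low:.0f}, {high:.0f}] kg, share = {share:.4f}")
```

The computed shares are roughly 0.683, 0.954 and 0.997, so the "about 95%" in the table is a rounded value.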
3. Three Types of Problems
We will distinguish three types of problems. In all of them we use the GDC to find the results. On a Casio fx, navigate to MENU $\to$ STAT $\to$ DIST $\to$ NORM and choose Ncd or InvN.
- Data: always use Variable.
- Ncd is used when we are asked to find a probability.
- InvN is used when we already know the probability and need the corresponding value of $X$.
PROBLEM 1: FIND PROBABILITY ($\mu, \sigma$ known, we use Ncd)
Consider again the example where $X = \text{the weight of a Greek man}$, with $\mu=75$ kg and $\sigma=10$ kg.
Find the probability that a Greek man weighs:
- (a) between 60 and 82 kg [that is $P(60 \le X \le 82)$]
- (b) more than 82 kg [that is $P(X \ge 82)$]
- (c) less than 60 kg [that is $P(X \le 60)$]
Solution: We use Ncd in the GDC. We set $\sigma=10$, $\mu=75$.
| Question | Ncd Parameters | Press EXE (Result) |
|---|---|---|
| (a) | Lower 60, Upper 82 | 0.691 |
| (b) | Lower 82, Upper 999999... | 0.242 |
| (c) | Lower -999999..., Upper 60 | 0.067 |
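The three results in the table can be reproduced without a GDC. This is a sketch that assumes Python's standard `statistics.NormalDist` as a stand-in for the calculator; the helper `ncd` is a hypothetical name chosen to mirror the Ncd command.

```python
from statistics import NormalDist

# X = weight of a Greek man, with mu = 75 kg and sigma = 10 kg.
X = NormalDist(mu=75, sigma=10)


def ncd(dist, lower, upper):
    """Probability that the variable lies between lower and upper."""
    return dist.cdf(upper) - dist.cdf(lower)


p_a = ncd(X, 60, 82)   # (a) P(60 <= X <= 82)
p_b = 1 - X.cdf(82)    # (b) P(X >= 82): everything above 82
p_c = X.cdf(60)        # (c) P(X <= 60): everything below 60

print(round(p_a, 3), round(p_b, 3), round(p_c, 3))
```

For the one-sided cases the sketch uses the cumulative probability directly instead of a very large bound such as 999999, which is only needed on the calculator.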
For $P(60 \le X \le 82)$ the GDC gives $p=0.691$. Below this result, some extra information is given:
z:Low = -1.5, z:Up = 0.7
This means that the lower bound is 1.5 standard deviations below $\mu$, and the upper bound is 0.7 standard deviations above $\mu$:
$75 - 1.5 \times 10 = 60$
$75 + 0.7 \times 10 = 82$
We will refer to those values $Z$ later on; they are known as standardized values.
The probabilities above refer to a selection of one person only.
$P(\text{a person is between 60 and 82 kg}) = 0.691$
$P(\text{a person is not between 60 and 82 kg}) = 1 - 0.691 = 0.309$
If we randomly select 10 people, what is the probability that exactly three of them are between 60 and 82 kg?
The normal distribution gives us $p=0.691$ for a single person.
Let $Y$ be the number of people, out of the 10 selected, who are between 60 and 82 kg. Then $Y$ follows a binomial distribution with $n=10$ and $p=0.691$.
Hence, for $Y \sim B(10, 0.691)$, we use Bpd with $x=3$ to obtain: $P(Y=3) = \mathbf{0.0106}$.
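The binomial step can be verified with the formula $P(Y=k) = \binom{n}{k} p^k (1-p)^{n-k}$. A minimal sketch, where `binomial_pmf` is an illustrative helper standing in for the Bpd command:

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(Y = k) for Y ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# p = 0.691 is the single-person probability found with the normal distribution.
p_three = binomial_pmf(3, 10, 0.691)
print(round(p_three, 4))
```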
PROBLEM 2: PROBABILITY IS GIVEN ($\mu, \sigma$ known, we use InvN)
Again, let $\mu=75$ kg and $\sigma=10$ kg for the variable $X = \text{the weight of a Greek man}$.
The probability that somebody weighs less than $a$ is 0.067. That is: $P(X \le a) = 0.067$. Find $a$.
Solution: We use InvN in the GDC with $\sigma=10$, $\mu=75$:
- Tail: Left (it is the area before $a$)
- Area: 0.067
Press EXE and obtain $\mathbf{a = 60}$ kg.
| Tail setting | When to use it |
|---|---|
| Tail: Left | If we know the area before some value |
| Tail: Right | If we know the area after some value |
Hence, the same answer may be obtained by using the right tail. If the area before $a$ is 0.067, the area after $a$ is $1 - 0.067 = 0.933$.
In other words $P(X \ge a) = 0.933$. Then:
- Tail: Right
- Area: 0.933
Press EXE and obtain again $\mathbf{a = 60}$ kg.
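Both tail settings can be mimicked with the inverse cumulative distribution function. The sketch below assumes Python's `statistics.NormalDist`; the helper `inv_n` and its `tail` argument are hypothetical names modelled on the calculator's InvN screen.

```python
from statistics import NormalDist

X = NormalDist(mu=75, sigma=10)


def inv_n(dist, area, tail="left"):
    """Return the value a with P(X <= a) = area (left) or P(X >= a) = area (right)."""
    if tail == "left":
        return dist.inv_cdf(area)
    if tail == "right":
        # An area of `area` after a means an area of 1 - area before a.
        return dist.inv_cdf(1 - area)
    raise ValueError("tail must be 'left' or 'right'")


a_left = inv_n(X, 0.067, tail="left")
a_right = inv_n(X, 0.933, tail="right")
print(round(a_left, 1), round(a_right, 1))
```

Both calls return the same value, which rounds to 60 kg.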
EXAMPLE 1 (Comprehensive Application)
The mass of packs of a certain type of coffee is normally distributed with a mean of 500 g and standard deviation of 15 g. That is, $X \sim N(500, 15^2)$.
The probability that a pack has a mass of at least 520 g is found with Ncd: $P(X \ge 520) \cong \mathbf{0.091}$.
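We use InvN to find the mass $a$ below which the lightest 4% of packs lie and the mass $b$ above which the heaviest 4% lie:
$P(X \le a) = 0.04 \implies \mathbf{a \cong 474\text{ g}}$
$P(X \ge b) = 0.04 \implies \mathbf{b \cong 526\text{ g}}$
Packs lighter than $a$ or heavier than $b$ are rejected, so the total rejected proportion is $4\% + 4\% = 8\%$.
Out of 1600 packs, the expected number of rejected packs is $1600 \times 0.08 = \mathbf{128\text{ packs}}$.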
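The probability that two randomly selected packs are both rejected is $(0.08)^2 = \mathbf{0.0064}$.
For 5 randomly selected packs, the number $Y$ of rejected packs follows a binomial distribution with $n=5$ and $p=0.08$.
The probability that at least one of them is rejected is $P(Y \ge 1) = 1 - P(Y=0) = 1 - (0.92)^5 \cong \mathbf{0.341}$.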
Finding the quartiles works in the same way as question (b), the InvN calculation above. The area before $Q_1$ is 0.25, while the area before $Q_3$ is 0.75. We use InvN:
$P(X \le Q_1) = 0.25 \implies \mathbf{Q_1 \cong 490\text{ g}}$
$P(X \le Q_3) = 0.75 \implies \mathbf{Q_3 \cong 510\text{ g}}$
(Alternatively, we could use Tail: Central with Area = 0.5 to find both quartiles directly, since the central 50% of the distribution lies between $Q_1$ and $Q_3$.)
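The numbers in this example can be cross-checked in a few lines. This is a sketch under the assumption that Python's `statistics.NormalDist` replaces the GDC; the variable names are illustrative, and the 8% rejection proportion is taken from the calculation above.

```python
from statistics import NormalDist

# Mass of a coffee pack: X ~ N(500, 15^2).
mass = NormalDist(mu=500, sigma=15)

p_over_520 = 1 - mass.cdf(520)        # Ncd: P(X >= 520)

a = mass.inv_cdf(0.04)                # InvN, left tail of 4%
b = mass.inv_cdf(1 - 0.04)            # InvN, right tail of 4%

p_reject = 0.04 + 0.04                # rejected proportion (both tails)
expected_rejected = 1600 * p_reject   # expected rejects among 1600 packs
p_both = p_reject**2                  # two packs, both rejected
p_at_least_one = 1 - (1 - p_reject) ** 5   # 5 packs, at least one rejected

q1 = mass.inv_cdf(0.25)               # lower quartile
q3 = mass.inv_cdf(0.75)               # upper quartile

print(round(p_over_520, 3), round(a), round(b))
print(round(expected_rejected), round(p_both, 4), round(p_at_least_one, 3))
print(round(q1), round(q3))
```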
4. Standardisation - Normal Distribution N(0,1)
Consider the random variable $Z$ which follows a normal distribution with $\mu=0$ and $\sigma=1$. This is the standardised normal distribution: $Z \sim N(0,1)$.
Any variable $X$ that follows a normal distribution $N(\mu,\sigma^2)$ can be transformed into the standardised normal variable $Z$ by the formula:
$Z = \dfrac{X-\mu}{\sigma}$
For example, for the weights with $\mu=75$ and $\sigma=10$:
The standardised value of $x=60$ is $Z_{60} = \dfrac{60-75}{10} = \mathbf{-1.5}$
The standardised value of $x=82$ is $Z_{82} = \dfrac{82-75}{10} = \mathbf{0.7}$
Checking with the GDC confirms that $P(60 \le X \le 82) = 0.691$ is equal to $P(-1.5 \le Z \le 0.7) = 0.691$.
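The equality of the two probabilities can also be confirmed numerically. A minimal sketch, assuming Python's `statistics.NormalDist`; `standardise` is an illustrative helper that implements $z = (x-\mu)/\sigma$.

```python
from statistics import NormalDist


def standardise(x, mu, sigma):
    """Number of standard deviations by which x lies above the mean."""
    return (x - mu) / sigma


X = NormalDist(mu=75, sigma=10)   # weights
Z = NormalDist(mu=0, sigma=1)     # standardised normal distribution

z_low = standardise(60, 75, 10)
z_up = standardise(82, 75, 10)

p_x = X.cdf(82) - X.cdf(60)         # P(60 <= X <= 82)
p_z = Z.cdf(z_up) - Z.cdf(z_low)    # P(z_low <= Z <= z_up)

print(z_low, z_up, round(p_x, 3), round(p_z, 3))
```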
5. Reversing the Process to Find Missing Parameters
PROBLEM 3: FIND $\mu$ OR $\sigma$ (we use Standardisation and InvN)
A random variable $X$ follows a normal distribution with $\mu=150$ and $\sigma$ unknown. It is given that 25% is less than 140, that is $P(X \le 140) = 0.25$. Find $\sigma$.
For $X=140$, the standardized value $Z_{140}$ is obtained from the GDC by using InvN on the standard normal distribution ($\mu=0$, $\sigma=1$):
- Tail: Left
- Area: 0.25
- $\sigma$: 1, $\mu$: 0
Press EXE and obtain $Z_{140} = -0.6745$.
Then substitute into the standardisation formula $Z = \dfrac{X-\mu}{\sigma}$: $$ \begin{aligned} -0.6745 &= \dfrac{140-150}{\sigma} \\ \implies \sigma &= \dfrac{-10}{-0.6745} \\ \implies \sigma &\cong \mathbf{14.83} \end{aligned} $$
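The same method works when $\mu$ is the unknown. Suppose instead that $\sigma = 14.83$ is known, $\mu$ is unknown, and again $P(X \le 140) = 0.25$. The standardized value is the same as before, $Z_{140} = -0.6745$, so: $$ \begin{aligned} -0.6745 &= \dfrac{140-\mu}{14.83} \\ \implies 140-\mu &\cong -10 \\ \implies \mu &\cong \mathbf{150} \end{aligned} $$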
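The two rearrangements can be checked numerically. This sketch assumes Python's `statistics.NormalDist` for the InvN step; the variable names are illustrative, and the second calculation reuses the rounded value 14.83 as in the working above.

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)

# InvN on N(0, 1): Tail Left, Area 0.25.
z_140 = Z.inv_cdf(0.25)

# Case 1: mu = 150 known, sigma unknown. From z = (140 - mu) / sigma.
sigma = (140 - 150) / z_140

# Case 2: sigma = 14.83 known, mu unknown. From z = (140 - mu) / sigma.
mu = 140 - z_140 * 14.83

print(round(z_140, 4), round(sigma, 2), round(mu, 1))
```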
EXAMPLE 2 (System of Equations for Unknown $\mu$ and $\sigma$)
For a normally distributed random variable $X$ we know that 35% of the values are less than 60 and 25% are more than 90. That is: $P(X \le 60) = 0.35$ and $P(X \ge 90) = 0.25$. Find $\mu$ and $\sigma$.
We use InvN on the standard normal distribution to find the two standardized values:
- Tail: Left, Area: 0.35, $\sigma$: 1, $\mu$: 0 $\implies Z_{60} = -0.385$
- Tail: Right, Area: 0.25, $\sigma$: 1, $\mu$: 0 $\implies Z_{90} = 0.674$
Setting up the two simultaneous equations gives: