4.11 Normal Distribution $N(\mu,\sigma^{2})$

1. Description of the Distribution

It is the distribution of a continuous random variable $X$ with values from $-\infty$ to $+\infty$. The parameters of this distribution are:

  • $\mathbf{\mu} =$ mean
  • $\mathbf{\sigma} =$ standard deviation

The "behavior" of the probability is described by a function which creates a symmetrical bell-shaped curve:

[Figure: symmetric bell-shaped curve centred at $\mu$; horizontal axis $x$ from $-\infty$ to $+\infty$]

Roughly speaking, values near the mean $\mu$ are the most likely, and the other values of $X$ spread out symmetrically about it. As we move away from the mean (to the left or to the right), the probability density falls off rapidly.
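The symmetry and the rapid fall-off can be checked numerically. The sketch below is not part of the GDC workflow: it uses `NormalDist` from Python's standard `statistics` module, and the values $\mu=75$, $\sigma=10$ are illustrative assumptions.

```python
from statistics import NormalDist
from math import isclose

# Assumed illustrative parameters: mu = 75, sigma = 10
X = NormalDist(mu=75, sigma=10)

# Height of the bell curve at the mean and at 1, 2, 3 standard deviations away
for x in (45, 55, 65, 75, 85, 95, 105):
    print(f"x = {x:3d}  density = {X.pdf(x):.5f}")

# Symmetry: points equally far from the mean have the same density
symmetric = isclose(X.pdf(65), X.pdf(85))

# Fall-off: the density shrinks as we move away from the mean
decreasing = X.pdf(75) > X.pdf(85) > X.pdf(95) > X.pdf(105)
```

The printed densities peak at $x=75$ and are mirrored on either side of it.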

We say that $X$ follows a normal distribution with mean $\mu$ and standard deviation $\sigma$ (or variance $\sigma^2$) and we write:

$X \sim N(\mu,\sigma^{2})$
General Context: It is the most common distribution in nature. Random variables that depend on many independent factors tend to follow it, at least approximately. For example:
  • Weight of people
  • Height of people
  • Time spent in a supermarket
  • Weight of a pack of coffee labeled 500 g.

2. The Standard Percentages & Spread

For example, suppose that for a Greek man the mean weight is $\mu=75$ kg and the standard deviation is $\sigma=10$ kg. It is estimated that:

| Percentage of the population | Range in general | Range for our problem (kg) |
|---|---|---|
| about 68.3% | between $\mu-\sigma$ and $\mu+\sigma$ | [65, 85] |
| about 95% | between $\mu-2\sigma$ and $\mu+2\sigma$ | [55, 95] |
| about 99.7% | between $\mu-3\sigma$ and $\mu+3\sigma$ | [45, 105] |
[Figure: normal curve with the area between 65 and 85 shaded (68.3%), centred at 75]
NOTICE:
  • The whole area under the curve is 1 (i.e. 100%).
  • The area before the mean as well as the area after the mean is 0.5 (i.e. 50%).
  • Theoretically, the distribution of $X$ ranges from $-\infty$ to $+\infty$. In practice, we assume that almost the whole population (99.7%) ranges between $\mu-3\sigma$ and $\mu+3\sigma$.
  • The standard deviation $\sigma$ measures the spread of the population: the larger $\sigma$ is, the wider the curve.
Example of Spread: Assume Greeks have $\mu=75$ kg, $\sigma=10$ kg and Italians have $\mu=75$ kg, $\sigma=8$ kg.
This implies both populations have the same mean, but Italians are clustered closer to the mean than Greeks.
Almost the whole population ($\mu \pm 3\sigma$) is:
$75 \pm 30 \implies 45\text{ to }105$ kg for Greeks.
$75 \pm 24 \implies 51\text{ to }99$ kg for Italians.
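The standard percentages and the two $\mu \pm 3\sigma$ ranges can be reproduced with a short script. This is a sketch only: it assumes Python's standard `statistics.NormalDist` as a stand-in for the GDC, and the helper names `coverage` and `k_sigma_interval` are hypothetical.

```python
from statistics import NormalDist

def coverage(k):
    """Proportion of a normal population within k standard deviations of the mean."""
    Z = NormalDist()  # standard normal; the proportion does not depend on mu or sigma
    return Z.cdf(k) - Z.cdf(-k)

def k_sigma_interval(mu, sigma, k):
    """Interval [mu - k*sigma, mu + k*sigma]."""
    return (mu - k * sigma, mu + k * sigma)

for k in (1, 2, 3):
    print(f"within {k} sigma: {100 * coverage(k):.1f}%")

greeks = k_sigma_interval(75, 10, 3)    # (45, 105)
italians = k_sigma_interval(75, 8, 3)   # (51, 99)
print("Greeks:", greeks, "Italians:", italians)
```

The same three percentages hold for every normal distribution; only the endpoints of the intervals change with $\mu$ and $\sigma$.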

3. Three Types of Problems

We will distinguish three types of problems. In all these problems, we use the GDC in order to find the results. For Casio fx, navigate to MENU $\to$ STAT $\to$ DIST $\to$ NORM: We use Ncd or InvN.

  • Data: always use Variable.
  • Ncd is used when we ask to find a probability.
  • InvN is used when we already know the probability.

PROBLEM 1: FIND PROBABILITY ($\mu, \sigma$ known, we use Ncd)

Consider again the example where $X = \text{the weight of a Greek man}$, with $\mu=75$ kg and $\sigma=10$ kg.
Find the probability that a Greek man weighs:

  • (a) between 60 and 82 kg [that is $P(60 \le X \le 82)$]
  • (b) more than 82 kg [that is $P(X \ge 82)$]
  • (c) less than 60 kg [that is $P(X \le 60)$]
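[Figure: normal curve centred at 75, split at 60 and 82 into three areas: 0.067 (below 60), 0.691 (between 60 and 82), 0.242 (above 82)]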

Solution: We use Ncd in the GDC. We set $\sigma=10$, $\mu=75$.

| Question | Ncd parameters | Result (press EXE) |
|---|---|---|
| (a) | Lower 60, Upper 82 | 0.691 |
| (b) | Lower 82, Upper 999999... | 0.242 |
| (c) | Lower -999999..., Upper 60 | 0.067 |
NOTICE: GDC Output Details
For $P(60 \le X \le 82)$ the GDC gives $p=0.691$. Below this result, some extra information is given:
z:Low = -1.5, z:Up = 0.7
This means that the lower bound is 1.5 standard deviations below $\mu$, and the upper bound is 0.7 standard deviations above $\mu$:
$75 - 1.5 \times 10 = 60$
$75 + 0.7 \times 10 = 82$
We will return to these $z$-values later on; they are known as standardised values.
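For readers without a GDC at hand, the three Ncd results and the two $z$-values can be cross-checked as follows. This is a sketch that assumes Python's standard `statistics.NormalDist` in place of Ncd; it is not the calculator's own procedure.

```python
from statistics import NormalDist

X = NormalDist(mu=75, sigma=10)  # weight of a Greek man, in kg

# (a) P(60 <= X <= 82): area between the two bounds
p_between = X.cdf(82) - X.cdf(60)
# (b) P(X >= 82): area to the right of 82
p_above = 1 - X.cdf(82)
# (c) P(X <= 60): area to the left of 60
p_below = X.cdf(60)

# Standardised bounds, as reported by the GDC under the result
z_low = (60 - 75) / 10
z_up = (82 - 75) / 10

print(round(p_between, 3), round(p_above, 3), round(p_below, 3))
print(z_low, z_up)
```

No large placeholder such as 999999 is needed here, because the cumulative function already measures the whole area to the left of a value.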
Combining Normal and Binomial Distributions
The probabilities above refer to a selection of one person only.
$P(\text{a person is between 60 and 82 kg}) = 0.691$
$P(\text{a person is not between 60 and 82 kg}) = 1 - 0.691 = 0.309$

If we randomly select 10 people, what is the probability that exactly three of them are between 60 and 82 kg?
The normal distribution gives us $p=0.691$.
A binomial distribution (of a new variable $Y$) is defined with $n=10$ and $p=0.691$.
Hence, for $Y \sim B(10, 0.691)$, we use Bpd(3) to obtain: $P(Y=3) = \mathbf{0.0106}$.
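The Bpd result can be verified directly from the binomial formula $P(Y=k)=\binom{n}{k}p^k(1-p)^{n-k}$. The sketch below uses only Python's standard library; the helper name `binomial_pmf` is hypothetical.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(Y = k) for Y ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# p = 0.691 comes from the normal distribution: P(60 <= X <= 82)
p_three = binomial_pmf(3, 10, 0.691)
print(round(p_three, 4))
```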

PROBLEM 2: PROBABILITY IS GIVEN ($\mu, \sigma$ known, we use InvN)

Again, let $\mu=75$ kg and $\sigma=10$ kg for the variable $X = \text{the weight of a Greek man}$.
The probability that somebody weighs less than $a$ is 0.067. That is: $P(X \le a) = 0.067$. Find $a$.

[Figure: normal curve centred at 75 with the area 0.067 shaded to the left of $a$]
Solution: We use InvN. We set the parameters $\sigma=10$, $\mu=75$. Then:
Tail: Left (it is the area before $a$)
Area: 0.067
Press EXE and obtain $\mathbf{a = 60}$ kg.
  • Tail: Left is used if we know the area before some value.
  • Tail: Right is used if we know the area after some value.

Hence, the same answer may be obtained by using the right tail. If the area before $a$ is 0.067, the area after $a$ is $1 - 0.067 = 0.933$.
In other words $P(X \ge a) = 0.933$. Then:
Tail: Right
Area: 0.933
Press EXE and obtain again $\mathbf{a = 60}$ kg.
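The equivalence of the two tails can be illustrated in code. This sketch assumes the inverse cumulative function `inv_cdf` of Python's standard `statistics.NormalDist` as a counterpart of InvN with Tail: Left; a right-tail area is first converted to the matching left-tail area.

```python
from statistics import NormalDist

X = NormalDist(mu=75, sigma=10)

# Tail: Left, Area 0.067 -> value a with P(X <= a) = 0.067
a_left = X.inv_cdf(0.067)

# Tail: Right, Area 0.933 -> same value, since the left area is 1 - 0.933
a_right = X.inv_cdf(1 - 0.933)

print(round(a_left, 1), round(a_right, 1))
```

Both calls return the same boundary, which rounds to 60 kg.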

EXAMPLE 1 (Comprehensive Application)

The mass of packs of a certain type of coffee is normally distributed with a mean of 500 g and standard deviation of 15 g. This implies $X \sim N(500, 15^2)$.

(a) Find the probability that a pack weighs more than 520 g.
We use Ncd: $P(X \ge 520) \cong \mathbf{0.091}$.
(b) The lightest 4% of the packs weigh less than a, while the heaviest 4% weigh more than b. Find a and b.
We use InvN:
$P(X \le a) = 0.04 \implies \mathbf{a \cong 474\text{ g}}$
$P(X \ge b) = 0.04 \implies \mathbf{b \cong 526\text{ g}}$
(c) The packs in question (b) are rejected. In a daily production of 1600 packs, how many are expected to be rejected?
The total rejected proportion is $4\% + 4\% = 8\%$.
$1600 \times 0.08 = \mathbf{128\text{ packs}}$.
(d) We select 2 packs. Find the probability that both are rejected.
$(0.08)^2 = \mathbf{0.0064}$.
(e) We select 5 packs. Find the probability that at least one is rejected.
Let $Y$ be the number of rejected packs among the 5. Then $Y$ follows a binomial distribution with $n=5$ and $p=0.08$.
$P(Y \ge 1) \cong \mathbf{0.341}$.
(f) Find $Q_1$ and $Q_3$, the lower and upper quartiles of the weights.
This is similar to question (b). The area before $Q_1$ is 0.25, while the area before $Q_3$ is 0.75. We use InvN:
$P(X \le Q_1) = 0.25 \implies \mathbf{Q_1 \cong 490\text{ g}}$
$P(X \le Q_3) = 0.75 \implies \mathbf{Q_3 \cong 510\text{ g}}$
(For the quartiles in particular, we could instead use Tail: Central with Area = 0.5 to obtain both $Q_1$ and $Q_3$ directly.)
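All six parts of Example 1 can be reproduced in a few lines. This is a sketch under the assumption that Python's standard `statistics.NormalDist` plays the role of Ncd and InvN; the variable names are illustrative.

```python
from statistics import NormalDist

X = NormalDist(mu=500, sigma=15)  # mass of a pack, in grams

# (a) P(X >= 520)
p_over_520 = 1 - X.cdf(520)

# (b) lightest 4% below a, heaviest 4% above b
a = X.inv_cdf(0.04)
b = X.inv_cdf(1 - 0.04)

# (c) expected number of rejected packs out of 1600
p_reject = 0.04 + 0.04
expected_rejected = 1600 * p_reject

# (d) both of 2 packs rejected (independent selections)
p_both = p_reject**2

# (e) at least one of 5 rejected = 1 - P(none rejected)
p_at_least_one = 1 - (1 - p_reject) ** 5

# (f) quartiles: areas 0.25 and 0.75 to the left
q1, q3 = X.inv_cdf(0.25), X.inv_cdf(0.75)

print(round(p_over_520, 3), round(a), round(b))
print(round(expected_rejected), round(p_both, 4), round(p_at_least_one, 3))
print(round(q1), round(q3))
```

Part (e) uses the complement $1 - P(Y=0)$, which gives the same value as summing the binomial probabilities for $Y = 1, \dots, 5$.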

4. Standardisation - Normal Distribution N(0,1)

Consider the random variable $Z$ which follows a normal distribution with $\mu=0$ and $\sigma=1$. This is called the standardised normal distribution: $Z \sim N(0,1)$.

Any variable $X$ that follows a normal distribution can be transformed into the standardised normal variable $Z$ by the formula:

$Z = \dfrac{X-\mu}{\sigma}$
For our primary example, $X \sim N(75, 10^2)$, standardising yields:
$Z = \dfrac{X-75}{10}$
We calculate that:
The standardised value of $x=60$ is $Z_{60} = \dfrac{60-75}{10} = \mathbf{-1.5}$
The standardised value of $x=82$ is $Z_{82} = \dfrac{82-75}{10} = \mathbf{0.7}$

A check by GDC confirms that $P(60 \le X \le 82) = 0.691$ is equal to $P(-1.5 \le Z \le 0.7) = 0.691$.
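[Figure: two normal curves linked by standardisation. Left: $X$ with area 0.691 between 60 and 82, centred at 75. Right: $Z$ with area 0.691 between -1.5 and 0.7, centred at 0]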
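The equality of the two probabilities can also be checked numerically. The sketch below assumes Python's standard `statistics.NormalDist`; the helper name `standardise` is hypothetical.

```python
from statistics import NormalDist

def standardise(x, mu, sigma):
    """z = (x - mu) / sigma"""
    return (x - mu) / sigma

mu, sigma = 75, 10
X = NormalDist(mu, sigma)
Z = NormalDist()  # N(0, 1)

z_low = standardise(60, mu, sigma)   # -1.5
z_up = standardise(82, mu, sigma)    # 0.7

p_x = X.cdf(82) - X.cdf(60)          # P(60 <= X <= 82)
p_z = Z.cdf(z_up) - Z.cdf(z_low)     # P(-1.5 <= Z <= 0.7)

print(round(p_x, 3), round(p_z, 3))
```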

5. Reversing the Process to Find Missing Parameters

PROBLEM 3: FIND $\mu$ OR $\sigma$ (we use Standardisation and InvN)

A random variable $X$ follows a normal distribution with $\mu=150$ and $\sigma$ unknown. It is given that 25% is less than 140, that is $P(X \le 140) = 0.25$. Find $\sigma$.

Solution: We use the fundamental standardisation formula: $Z = \dfrac{X-\mu}{\sigma}$.
For $X=140$, the standardised value $Z_{140}$ does not depend on $\mu$ or $\sigma$: it satisfies $P(Z \le Z_{140}) = 0.25$ for $Z \sim N(0,1)$. We obtain it by the GDC using InvN with $\mu=0$, $\sigma=1$:
Tail: Left
Area: 0.25
$\sigma$: 1, $\mu$: 0
Press EXE and obtain $Z_{140} = -0.6745$.

Substituting into the standardisation formula: $$ \begin{aligned} -0.6745 &= \dfrac{140-150}{\sigma} \\ \implies \sigma &= \dfrac{-10}{-0.6745} \\ \implies \sigma &\cong \mathbf{14.83} \end{aligned} $$
An analogous procedure handles an unknown $\mu$. Suppose that $\sigma=14.83$ but $\mu$ is unknown, under the same condition $P(X \le 140) = 0.25$.
The standardised value $Z_{140}$ is the same as before, since it depends only on the area: $Z_{140} = -0.6745$. $$ \begin{aligned} -0.6745 &= \dfrac{140-\mu}{14.83} \\ \implies 140-\mu &\cong -10 \\ \implies \mu &\cong \mathbf{150} \end{aligned} $$
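Both versions of Problem 3 can be sketched in code. The snippet assumes Python's standard `statistics.NormalDist`, with `inv_cdf` on the standard normal standing in for InvN with $\mu=0$, $\sigma=1$.

```python
from statistics import NormalDist

# Standardised value with 25% of the area to its left
z_140 = NormalDist().inv_cdf(0.25)

# Unknown sigma, known mu = 150:  z = (140 - 150) / sigma
sigma = (140 - 150) / z_140

# Unknown mu, known sigma = 14.83:  z = (140 - mu) / 14.83
mu = 140 - z_140 * 14.83

print(round(z_140, 4), round(sigma, 2), round(mu, 1))
```

The recovered $\mu$ differs from 150 only in the third decimal place, because $\sigma = 14.83$ is itself a rounded value.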

EXAMPLE 2 (System of Equations for Unknown $\mu$ and $\sigma$)

For a normally distributed random variable $X$ we know that 35% is less than 60 and 25% is more than 90. That is: $P(X \le 60) = 0.35$ and $P(X \ge 90) = 0.25$. Find $\mu$ and $\sigma$.

Solution: The standardised values $Z_{60}$ and $Z_{90}$ can be obtained by the usual InvN procedure on the GDC:
Tail: Left, Area: 0.35, $\sigma$: 1, $\mu$: 0 $\implies Z_{60} = -0.385$
Tail: Right, Area: 0.25, $\sigma$: 1, $\mu$: 0 $\implies Z_{90} = 0.674$

The standardisation formula then gives two equations:
$-0.385 = \dfrac{60-\mu}{\sigma} \quad \text{and} \quad 0.674 = \dfrac{90-\mu}{\sigma}$
We thus obtain the linear system: $$ \begin{aligned} \mu - 0.385\sigma &= 60 \\ \mu + 0.674\sigma &= 90 \end{aligned} $$ Solving the simultaneous equations gives $\mathbf{\mu = 70.9}$ and $\mathbf{\sigma = 28.3}$.
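The two-equation system can be solved by elimination in code. This is a sketch assuming Python's standard `statistics.NormalDist`; subtracting the first equation from the second removes $\mu$ and leaves $\sigma$.

```python
from statistics import NormalDist

Z = NormalDist()  # N(0, 1)

z_60 = Z.inv_cdf(0.35)       # Tail: Left, Area 0.35
z_90 = Z.inv_cdf(1 - 0.25)   # Tail: Right, Area 0.25 -> left area 0.75

# mu + z_60 * sigma = 60  and  mu + z_90 * sigma = 90
# Subtracting: (z_90 - z_60) * sigma = 30
sigma = (90 - 60) / (z_90 - z_60)
mu = 60 - z_60 * sigma

print(round(z_60, 3), round(z_90, 3))
print(round(mu, 1), round(sigma, 1))

# Check: the fitted distribution reproduces the two given probabilities
X = NormalDist(mu, sigma)
check_left = X.cdf(60)
check_right = 1 - X.cdf(90)
```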
Notice (Only for HL): Since you know $E(X)$ and $Var(X)$, you also know $E(X^2)$. Indeed: $$ \begin{aligned} Var(X) &= E(X^2) - (E(X))^2 \\ \implies E(X^2) &= Var(X) + (E(X))^2 \\ \implies E(X^2) &= \sigma^2 + \mu^2 \end{aligned} $$
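As a numerical illustration, the formula can be compared with a direct approximation of $E(X^2)=\int x^2 f(x)\,dx$. The sketch assumes the earlier example values $\mu=75$, $\sigma=10$ and Python's standard `statistics.NormalDist`; the midpoint sum over $\mu \pm 10\sigma$ is an approximation chosen here for simplicity.

```python
from statistics import NormalDist

mu, sigma = 75, 10  # assumed example values
X = NormalDist(mu, sigma)

# Formula: E(X^2) = sigma^2 + mu^2
e_x2_formula = sigma**2 + mu**2

# Midpoint-rule approximation of the integral of x^2 * f(x) over mu +/- 10 sigma
n = 20000
lo, hi = mu - 10 * sigma, mu + 10 * sigma
h = (hi - lo) / n
e_x2_numeric = h * sum(
    (lo + (i + 0.5) * h) ** 2 * X.pdf(lo + (i + 0.5) * h) for i in range(n)
)

print(e_x2_formula, round(e_x2_numeric, 3))
```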