4.2 Measures of Central Tendency and Spread

Consider the following numerical data (which represents either a sample or a whole population):
10, 20, 20, 20, 30, 30, 40, 50, 70, 70, 80

The total number of entries is $n=11$. To properly describe this dataset, we use 3 measures of central tendency and 3 measures of spread. The central tendency measures indicate a representative central value, while the spread measures indicate how dispersed the data is.

1. Measures of Central Tendency (The 3 M's)

A) MEAN: The sum of all values divided by $n$.
$$\text{mean} = \dfrac{10+20+20+20+30+30+40+50+70+70+80}{11} = \mathbf{40}$$

B) MODE: The most frequent value.
Here, $\text{mode} = \mathbf{20}$.

C) MEDIAN: The middle value, once the data have been placed in ascending order.
Here, it is the sixth number in the ordered list: $\text{median} = \mathbf{30}$.
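
As a quick check, the three values above can be reproduced with Python's standard `statistics` module. This is only an illustrative sketch; the variable names are chosen here and are not part of the notes.

```python
from statistics import mean, median, mode

# The dataset from the start of this section (already in ascending order).
data = [10, 20, 20, 20, 30, 30, 40, 50, 70, 70, 80]

data_mean = mean(data)      # sum of all values divided by n = 440 / 11
data_mode = mode(data)      # the most frequent value
data_median = median(data)  # the middle value of the ordered list

print(data_mean, data_mode, data_median)
```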

NOTICE & NOTATION

Calculating the Median Position:
The median is not simply the $\dfrac{n}{2}$-th entry as one might expect. Instead, the median corresponds to the $\dfrac{n+1}{2}$-th entry.
  • If $n=11$, the median is the $\dfrac{11+1}{2} = 6^{\text{th}}$ entry.
  • For an even number of entries, the median is the mean of the two middle values. For example, if $n=10$, then $\dfrac{10+1}{2} = 5.5$, so the median is the mean of the $5^{\text{th}}$ and $6^{\text{th}}$ entries.
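
The position rule can be sketched in a few lines of Python. The function name `median_by_position` is hypothetical and serves only to illustrate the $\dfrac{n+1}{2}$ rule, including the case where the position ends in .5.

```python
def median_by_position(values):
    """Return the median using the (n + 1) / 2 position rule."""
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) / 2  # 1-based position; ends in .5 when n is even

    if position.is_integer():
        # Odd n: the median is a single entry of the ordered list.
        return ordered[int(position) - 1]

    # Even n: take the mean of the two entries on either side of the position.
    below = ordered[int(position) - 1]
    above = ordered[int(position)]
    return (below + above) / 2


print(median_by_position([10, 20, 20, 20, 30, 30, 40, 50, 70, 70, 80]))
print(median_by_position([10, 20, 30, 40, 50, 60, 70, 80, 90, 100]))
```

Subtracting 1 from the position converts the 1-based counting used in the text to Python's 0-based list indices.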

Standard Variables:
  • The median is also denoted by $Q_2$.
  • The mean is denoted by the Greek letter $\mu$ for an entire population, or by the Latin letter $\bar{x}$ for a sample.
  • Formula for the mean: $\mu = \dfrac{\sum x_i}{n}$.

EXAMPLE 1

a) Find the integers $a \le b \le c$, given that mean = 4, mode = 5, and median = 5.
  • The median implies that $b=5$.
  • The mode implies that 5 occurs at least twice, so $c=5$ as well (taking $a=5$ instead would make the mean at least 5).
  • Using the mean formula:
$$\begin{aligned} \dfrac{a+5+5}{3} &= 4 \\ a+10 &= 12 \\ a &= 2 \end{aligned}$$
Therefore, the numbers are $\mathbf{2, 5, 5}$.

b) Find the integers $a \le b \le c \le d$, given that mean = 5, mode = 7, and median = 6.
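  • The median implies $\dfrac{b+c}{2} = 6$, that is, $b+c=12$.
  • Since the mode is 7, the value 7 must occur at least twice. Because $b \le c$ and $b+c=12$, we have $b \le 6$, so neither $a$ nor $b$ can equal 7. Hence $c=d=7$ and $b=5$.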
  • Using the mean formula:
$$\begin{aligned} \dfrac{a+5+7+7}{4} &= 5 \\ a+19 &= 20 \\ a &= 1 \end{aligned}$$
Therefore, the numbers are $\mathbf{1, 5, 7, 7}$.
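
Both answers in Example 1 can be cross-checked by brute force. The sketch below assumes the integers lie between 0 and 20 (a search range chosen here for illustration, not given in the problem) and treats "mode" as a unique most frequent value; the function name `find_sets` is hypothetical.

```python
from itertools import combinations_with_replacement
from statistics import median, multimode


def find_sets(size, target_mean, target_mode, target_median, lo=0, hi=20):
    """List all ordered integer tuples in [lo, hi] with the given statistics."""
    solutions = []
    # combinations_with_replacement yields tuples in non-decreasing order,
    # which matches the condition a <= b <= c (<= d).
    for candidate in combinations_with_replacement(range(lo, hi + 1), size):
        if sum(candidate) != target_mean * size:
            continue  # the mean does not match
        if median(candidate) != target_median:
            continue
        if multimode(candidate) != [target_mode]:
            continue  # the mode must be unique and equal to the target
        solutions.append(candidate)
    return solutions


print(find_sets(3, target_mean=4, target_mode=5, target_median=5))
print(find_sets(4, target_mean=5, target_mode=7, target_median=6))
```

Under these assumptions, each search returns a single tuple, in agreement with the worked solutions above.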

2. Measures of Spread

We use the same set of numerical data for consistency:
10, 20, 20, 20, 30, 30, 40, 50, 70, 70, 80

A) STANDARD DEVIATION:
The standard deviation is perhaps the most "reliable" measure of spread, as it takes all data into consideration. It measures how far the entries are spread out from the mean.
  • It is denoted by $\sigma$ (for a whole population) or by $s_n$ (for a sample).
  • For our example dataset, using a GDC gives $\sigma = \mathbf{22.96}$.
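
For readers without a GDC, the same value can be obtained from the standard population formula
$$\sigma = \sqrt{\dfrac{\sum (x_i - \mu)^2}{n}}$$
The following Python sketch applies that formula step by step and compares the result with the standard library's `pstdev`. The variable names are chosen here for illustration.

```python
from math import sqrt
from statistics import pstdev

data = [10, 20, 20, 20, 30, 30, 40, 50, 70, 70, 80]

n = len(data)
mu = sum(data) / n  # the mean, 40

# Mean of the squared deviations from the mean (the population variance).
variance = sum((x - mu) ** 2 for x in data) / n

sigma = sqrt(variance)
print(round(sigma, 2))  # 22.96

# The standard library's population standard deviation should agree.
library_sigma = pstdev(data)
```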

B) RANGE:
$\text{Range} = (\text{maximum value}) - (\text{minimum value})$.
Here, $\text{Range} = 80 - 10 = \mathbf{70}$.

C) INTERQUARTILE RANGE (IQR):
$\text{IQR} = Q_3 - Q_1$.
  • $Q_1$ = LOWER QUARTILE = the median of the values occurring before $Q_2$.
  • $Q_3$ = UPPER QUARTILE = the median of the values occurring after $Q_2$.

For our dataset, $Q_2 = 30$. Before the median we have 5 numbers, making $Q_1 = 20$ (the 3rd entry). After the median we have another 5 numbers, making $Q_3 = 70$ (the 3rd entry from the end).
Therefore, $\text{IQR} = 70 - 20 = \mathbf{50}$.
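
The range and IQR for this dataset can be checked with a short Python sketch. It relies on $n=11$ being odd, so the median is a single entry that is left out of both halves. The variable names are illustrative only.

```python
from statistics import median

data = [10, 20, 20, 20, 30, 30, 40, 50, 70, 70, 80]
ordered = sorted(data)
n = len(ordered)  # 11, an odd number of entries

data_range = ordered[-1] - ordered[0]  # maximum minus minimum

middle = n // 2                     # 0-based index of the median entry
q2 = ordered[middle]
q1 = median(ordered[:middle])       # median of the entries before Q2
q3 = median(ordered[middle + 1:])   # median of the entries after Q2
iqr = q3 - q1

print(data_range, q1, q2, q3, iqr)
```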

[Figure: box-and-whisker plot of the dataset on an axis from 0 to 80, marking Min, $Q_1$, $Q_2$, $Q_3$ and Max.]

EXAMPLE 2 (Finding Quartiles)

Remember the fundamental rules for defining quartiles:

  • For the value of the median $Q_2$, we consider the $\dfrac{n+1}{2}$-th entry.
  • For the values of $Q_1$ and $Q_3$, we strictly consider only the entries before and the entries after the median respectively.
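
a) For $n=7$ entries: $10, 20, 30, 40, 50, 60, 70$
The median is $Q_2 = 40$ (the $4^{\text{th}}$ entry).
Hence, $Q_1 = \mathbf{20}$ (the median of $10, 20, 30$) and $Q_3 = \mathbf{60}$ (the median of $50, 60, 70$).

b) For $n=8$ entries: $10, 20, 30, 40, 50, 60, 70, 80$
The median is $Q_2 = 45$ (position $4.5$, i.e. the mean of the $4^{\text{th}}$ and $5^{\text{th}}$ entries).
Hence, $Q_1 = \mathbf{25}$ (the median of $10, 20, 30, 40$) and $Q_3 = \mathbf{65}$ (the median of $50, 60, 70, 80$).

c) For $n=9$ entries: $10, 20, 30, 40, 50, 60, 70, 80, 90$
The median is $Q_2 = 50$ (the $5^{\text{th}}$ entry).
Hence, $Q_1 = \mathbf{25}$ (the median of $10, 20, 30, 40$) and $Q_3 = \mathbf{75}$ (the median of $60, 70, 80, 90$).

d) For $n=10$ entries: $10, 20, 30, 40, 50, 60, 70, 80, 90, 100$
The median is $Q_2 = 55$ (position $5.5$, i.e. the mean of the $5^{\text{th}}$ and $6^{\text{th}}$ entries).
Hence, $Q_1 = \mathbf{30}$ (the median of $10, 20, 30, 40, 50$) and $Q_3 = \mathbf{80}$ (the median of $60, 70, 80, 90, 100$).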
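
The two rules can be combined into one small Python function that handles both odd and even $n$. This is a sketch of the convention used in this section only; the function name `quartiles` is hypothetical, and library quantile routines often use interpolation conventions that can give different values for some $n$.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) using the 'median of each half' rule."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2

    lower = ordered[:half]  # entries before the median
    if n % 2 == 1:
        upper = ordered[half + 1:]  # odd n: skip the median entry itself
    else:
        upper = ordered[half:]      # even n: the median falls between entries

    return median(lower), median(ordered), median(upper)


# The four cases of Example 2: n = 7, 8, 9, 10.
for n in (7, 8, 9, 10):
    entries = [10 * k for k in range(1, n + 1)]
    print(n, quartiles(entries))
```

When $n$ is odd the median entry belongs to neither half, while for even $n$ the list splits cleanly into two halves of equal size.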