4.2 Measures of Central Tendency and Spread
Consider the following numerical data (which represents either a sample or a whole population):
10, 20, 20, 20, 30, 30, 40, 50, 70, 70, 80
The entries are already arranged in ascending order, and their total number is $n=11$. To describe this dataset properly, we use 3 measures of central tendency and 3 measures of spread. The measures of central tendency give a single representative central value, while the measures of spread indicate how widely dispersed the data are.
1. Measures of Central Tendency (The 3 M's: Mean, Median, Mode)
NOTICE & NOTATION
When the data are arranged in ascending order, the median is not the $\dfrac{n}{2}$-th entry, as one might expect, but the $\dfrac{n+1}{2}$-th entry.
- If $n=11$, the median is the $\dfrac{11+1}{2} = 6^{\text{th}}$ entry.
- When $n$ is even, $\dfrac{n+1}{2}$ is not a whole number, and the median is the mean of the two middle values. For example, if $n=10$, then $\dfrac{10+1}{2} = 5.5$, so the median is the mean of the $5^{\text{th}}$ and $6^{\text{th}}$ entries.
- The median is also denoted by $Q_2$.
- The mean is denoted by the Greek letter $\mu$ for an entire population, or by the Latin letter $\bar{x}$ for a sample.
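- Formula for the mean: $\mu = \dfrac{\sum x_i}{n}$, i.e. the sum of all entries divided by their number (the same formula gives $\bar{x}$ for a sample).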
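The notation above can be checked on the example dataset with a short Python sketch. The helper names `mean`, `median` and `modes` are our own illustrative choices, not a standard API; `median` follows the $\dfrac{n+1}{2}$-th entry rule, and `modes` returns every value that ties for the highest frequency, since a dataset may have more than one mode.

```python
from collections import Counter

def mean(values):
    """Arithmetic mean: the sum of the entries divided by n."""
    return sum(values) / len(values)

def median(values):
    """Median via the (n+1)/2-th entry rule (positions are 1-indexed)."""
    data = sorted(values)
    n = len(data)
    pos = (n + 1) / 2                 # e.g. 6.0 for n = 11, 5.5 for n = 10
    if pos.is_integer():
        return data[int(pos) - 1]     # convert the 1-indexed position to 0-indexed
    lo = int(pos)                     # e.g. 5 when pos = 5.5
    # Half-integer position: average the lo-th and (lo+1)-th entries
    return (data[lo - 1] + data[lo]) / 2

def modes(values):
    """All values that occur most often (there may be more than one)."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

data = [10, 20, 20, 20, 30, 30, 40, 50, 70, 70, 80]
print(mean(data))    # 40.0
print(median(data))  # 30
print(modes(data))   # [20]
```

For the dataset above this gives a mean of 40, a median of 30 (the $6^{\text{th}}$ entry) and a mode of 20, since 20 appears three times.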
EXAMPLE 1
- The median implies that $b=5$.
- The mode then implies that $c=5$ as well.
- Using the mean formula:
- The median implies that either $b=c=6$ or ($b=5$ and $c=7$).
- Since the mode is 7, we obtain $b=5$ and $c=d=7$.
- Using the mean formula:
2. Measures of Spread
We use the same set of numerical data for consistency:
10, 20, 20, 20, 30, 30, 40, 50, 70, 70, 80
The standard deviation is perhaps the most "reliable" measure of spread, as it takes every entry into account. It measures how far the entries typically lie from the mean.
- It is denoted by $\sigma$ (for a whole population) or by $s_n$ (for a sample).
- For our example dataset, using a graphic display calculator (GDC) gives $\sigma = \mathbf{22.96}$.
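The calculator value can be reproduced from the defining formula $\sigma = \sqrt{\dfrac{\sum (x_i-\mu)^2}{n}}$, the square root of the mean squared deviation from the mean. The sketch below assumes this divide-by-$n$ form (which is also how $s_n$ is computed for a sample) and cross-checks it against `statistics.pstdev` from Python's standard library; `population_sd` is a hypothetical helper name.

```python
import math
import statistics

data = [10, 20, 20, 20, 30, 30, 40, 50, 70, 70, 80]

def population_sd(values):
    """sigma = sqrt( sum((x - mu)^2) / n ), dividing by n (not n - 1)."""
    n = len(values)
    mu = sum(values) / n                              # here mu = 40
    squared_deviations = [(x - mu) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / n)     # sqrt(5800 / 11)

print(round(population_sd(data), 2))       # 22.96
print(round(statistics.pstdev(data), 2))   # 22.96 (standard-library cross-check)
```

By contrast, `statistics.stdev` divides by $n-1$, so it returns a larger value and would not match $\sigma$ or $s_n$ here.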
$\text{Range} = (\text{maximum value}) - (\text{minimum value})$.
Here, $\text{Range} = 80 - 10 = \mathbf{70}$.
$\text{IQR} = Q_3 - Q_1$.
- $Q_1$ = LOWER QUARTILE = the median of the values occurring before $Q_2$.
- $Q_3$ = UPPER QUARTILE = the median of the values occurring after $Q_2$.
For our dataset, $Q_2 = 30$ (the $6^{\text{th}}$ entry). The 5 entries before the median are 10, 20, 20, 20, 30, so $Q_1 = 20$ (the 3rd of these 5). The 5 entries after the median are 40, 50, 70, 70, 80, so $Q_3 = 70$ (the 3rd of these, i.e. the 3rd entry from the end).
Therefore, $\text{IQR} = 70 - 20 = \mathbf{50}$.
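The quartile rule described above (split the ordered data at the median, then take the median of each half, leaving the median entry out of both halves when $n$ is odd) can be sketched in Python as follows, together with the IQR and the range. The function names `median` and `quartiles` are illustrative assumptions, not a library API.

```python
def median(sorted_values):
    """Median of already-sorted data via the (n+1)/2-th entry rule."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]                          # the ((n+1)/2)-th entry
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2   # mean of the two middle entries

def quartiles(values):
    """Return (Q1, Q2, Q3), excluding the median entry from both halves."""
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]          # entries strictly before the median
    upper = data[(n + 1) // 2 :]    # entries strictly after the median
    return median(lower), median(data), median(upper)

data = [10, 20, 20, 20, 30, 30, 40, 50, 70, 70, 80]
q1, q2, q3 = quartiles(data)
print(q1, q2, q3)                          # 20 30 70
print("IQR =", q3 - q1)                    # IQR = 50
print("Range =", max(data) - min(data))    # Range = 70
```

Library routines such as Python's `statistics.quantiles` use interpolation conventions of their own, so in general they need not reproduce quartiles found with this rule.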
EXAMPLE 2 (Finding Quartiles)
Remember the fundamental rules for defining quartiles:
- For the value of the median $Q_2$, we consider the $\dfrac{n+1}{2}$-th entry.
- For the values of $Q_1$ and $Q_3$, we consider only the entries strictly before and strictly after the median, respectively; when $n$ is odd, the median entry itself belongs to neither half.
The median is $Q_2 = 40$ (the $4^{\text{th}}$ entry).
Hence, $Q_1 = \mathbf{20}$ and $Q_3 = \mathbf{60}$.
The median is $Q_2 = 45$ (the $4.5^{\text{th}}$ entry).
Hence, $Q_1 = \mathbf{25}$ and $Q_3 = \mathbf{65}$.
The median is $Q_2 = 50$ (the $5^{\text{th}}$ entry).
Hence, $Q_1 = \mathbf{25}$ and $Q_3 = \mathbf{75}$.
The median is $Q_2 = 55$ (the $5.5^{\text{th}}$ entry).
Hence, $Q_1 = \mathbf{30}$ and $Q_3 = \mathbf{80}$.
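The four cases above can be checked with the same rule. Since the datasets themselves are not reproduced in the text, the sketch assumes, purely for illustration, the evenly spaced lists 10, 20, ..., 10n for n = 7, 8, 9 and 10; these hypothetical lists happen to be consistent with the stated values of $Q_1$, $Q_2$ and $Q_3$.

```python
def median(sorted_values):
    """Median of already-sorted data via the (n+1)/2-th entry rule."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def quartiles(values):
    """Return (Q1, Q2, Q3), using only entries strictly before/after the median."""
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]
    upper = data[(n + 1) // 2 :]
    return median(lower), median(data), median(upper)

# ASSUMPTION: hypothetical datasets 10, 20, ..., 10n standing in for the
# example data, which is not shown in the text.
for n in range(7, 11):
    data = list(range(10, 10 * n + 1, 10))
    print(n, quartiles(data))
```

Each line prints $n$ followed by $(Q_1, Q_2, Q_3)$. In the even cases the median sits at a half-integer position, so some values are averages and print as floats (for example 45.0 when $n=8$).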