4.3 Frequency Tables & Grouped Data

1. Frequency and Cumulative Frequency Tables

Consider again the numerical data set we used previously:
10, 20, 20, 20, 30, 30, 40, 50, 70, 70, 80

Instead of listing every individual value, we can present the data more compactly in a Frequency Table. To help locate the median and quartiles, it is useful to add an extra column for the Cumulative Frequency (c.f.), which is the running total of the frequencies.

| Data ($x$) | Frequency ($f$) | Cumulative frequency (c.f.) |
|---|---|---|
| 10 | 1 | 1 |
| 20 | 3 | 4 |
| 30 | 2 | 6 |
| 40 | 1 | 7 |
| 50 | 1 | 8 |
| 70 | 2 | 10 |
| 80 | 1 | 11 |
| Total | $n = 11$ | |
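A frequency table like this can be built directly from the raw list. The Python sketch below is one way to do it, assuming the data fit in a plain list; the names `values`, `freqs` and `cum_freqs` are illustrative choices. It counts each value with `collections.Counter` and forms the running total with `itertools.accumulate`.

```python
from collections import Counter
from itertools import accumulate

# Raw data set from the text.
data = [10, 20, 20, 20, 30, 30, 40, 50, 70, 70, 80]

counts = Counter(data)                # value -> frequency
values = sorted(counts)               # distinct values in increasing order
freqs = [counts[x] for x in values]   # frequency column
cum_freqs = list(accumulate(freqs))   # running total: cumulative frequency column

for x, f, cf in zip(values, freqs, cum_freqs):
    print(f"{x:>4} {f:>3} {cf:>4}")
print("n =", cum_freqs[-1])
```

The last cumulative frequency always equals $n$, which gives a quick check on the table.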

EXAMPLE 1 (Extracting Measures from a Frequency Table)

1. Mean ($\mu$): Multiply each data value by its frequency, add these products, and divide the sum by the total frequency $n$.
$$\text{Mean} = \dfrac{\sum (x \times f)}{n} = \dfrac{10(1) + 20(3) + 30(2) + 40(1) + 50(1) + 70(2) + 80(1)}{11} = \mathbf{40}$$
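The same weighted sum translates directly into code. A minimal Python sketch, with the table's two columns typed in as lists (the variable names are illustrative):

```python
# Columns of the frequency table.
values = [10, 20, 30, 40, 50, 70, 80]
freqs = [1, 3, 2, 1, 1, 2, 1]

n = sum(freqs)                                        # total frequency
mean = sum(x * f for x, f in zip(values, freqs)) / n  # sum(x * f) / n
print(mean)  # 40.0
```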

2. Mode: The entry $x$ that corresponds to the highest frequency.
The highest frequency is 3, which corresponds to the value $x = \mathbf{20}$.
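In code, the mode is the value paired with the largest frequency. The sketch below returns a list so that ties (a data set with several modes) are reported rather than silently broken; that design choice is ours, not something the text prescribes.

```python
values = [10, 20, 30, 40, 50, 70, 80]
freqs = [1, 3, 2, 1, 1, 2, 1]

highest = max(freqs)
# Keep every value whose frequency equals the maximum, in case of ties.
modes = [x for x, f in zip(values, freqs) if f == highest]
print(modes)  # [20]
```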

3. Median ($Q_2$): The middle value, located at position $\dfrac{n+1}{2} = \dfrac{11+1}{2} = 6$, i.e. the $6^{\text{th}}$ entry.
Looking at the cumulative frequency column, the running total reaches 6 exactly at the row where $x = \mathbf{30}$. Thus, the median is 30.
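Locating an entry by its position amounts to finding the first row whose cumulative frequency is at least that position. A Python sketch using `bisect.bisect_left` on the c.f. column; the helper name `value_at` is hypothetical, and the even-$n$ branch (averaging the two middle entries) is the usual convention rather than something this example needs.

```python
from bisect import bisect_left
from itertools import accumulate

values = [10, 20, 30, 40, 50, 70, 80]
freqs = [1, 3, 2, 1, 1, 2, 1]
cum = list(accumulate(freqs))  # c.f. column: [1, 4, 6, 7, 8, 10, 11]
n = cum[-1]

def value_at(position):
    """Return the position-th entry (1-based) of the sorted data:
    the value in the first row whose c.f. is >= position."""
    return values[bisect_left(cum, position)]

if n % 2 == 1:
    median = value_at((n + 1) // 2)  # single middle entry
else:
    median = (value_at(n // 2) + value_at(n // 2 + 1)) / 2
print(median)  # 30
```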

4. Interquartile Range (IQR): $\text{IQR} = Q_3 - Q_1$.
  • There are 5 entries before the median. The middle of those 5 entries is the $3^{\text{rd}}$ entry. In the c.f. column, the running total goes from 1 to 4 at the row where $x = 20$, so entries 2 to 4 all equal 20, including the 3rd. So, $Q_1 = \mathbf{20}$.
  • There are 5 entries after the median. The middle of those 5 entries is the $3^{\text{rd}}$ entry from the end (the $9^{\text{th}}$ entry overall). In the c.f. column, the running total goes from 8 to 10 at the row where $x = 70$, so entries 9 and 10 both equal 70. So, $Q_3 = \mathbf{70}$.
Therefore, $\text{IQR} = 70 - 20 = \mathbf{50}$.
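The quartile procedure above (take the middle of the entries below the median and the middle of the entries above it) can be sketched in Python as follows. `value_at` and `middle_of` are hypothetical helper names. Statistical software implements several different quartile conventions, so a library function may give slightly different values for small data sets.

```python
from bisect import bisect_left
from itertools import accumulate

values = [10, 20, 30, 40, 50, 70, 80]
freqs = [1, 3, 2, 1, 1, 2, 1]
cum = list(accumulate(freqs))
n = cum[-1]

def value_at(position):
    # The position-th sorted entry lies in the first row whose c.f. >= position.
    return values[bisect_left(cum, position)]

def middle_of(first, last):
    """Middle value of positions first..last (inclusive); averages the
    two central entries when the count is even."""
    count = last - first + 1
    mid = first + (count - 1) // 2
    if count % 2 == 1:
        return value_at(mid)
    return (value_at(mid) + value_at(mid + 1)) / 2

median = middle_of(1, n)
half = n // 2                     # entries below (or above) the median when n is odd
q1 = middle_of(1, half)           # middle of the lower half
q3 = middle_of(n - half + 1, n)   # middle of the upper half
iqr = q3 - q1
print(q1, median, q3, iqr)  # 20 30 70 50
```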

2. Grouped Data

When data is continuous or spans a wide range of values, it is organized into class intervals (groups). Because the exact individual data points are lost, we represent each class by its midpoint.

Scenario: Suppose that 100 students took an exam marked out of 60, and that their scores are distributed according to the following grouped frequency table.

| Score interval ($x$) | Midpoint | Frequency ($f$) | Cumulative frequency (c.f.) |
|---|---|---|---|
| $0 \le x < 10$ | 5 | 8 | 8 |
| $10 \le x < 20$ | 15 | 12 | 20 |
| $20 \le x < 30$ | 25 | 10 | 30 |
| $30 \le x < 40$ | 35 | 25 | 55 |
| $40 \le x < 50$ | 45 | 35 | 90 |
| $50 \le x < 60$ | 55 | 10 | 100 |
| Total | | $n = 100$ | |
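The Midpoint and c.f. columns follow mechanically from the class boundaries and frequencies. A short Python sketch, storing each interval as a `(lower, upper)` pair (our own representation, chosen for illustration):

```python
from itertools import accumulate

# Class intervals [lower, upper) and their frequencies, copied from the table.
intervals = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
freqs = [8, 12, 10, 25, 35, 10]

midpoints = [(lo + hi) / 2 for lo, hi in intervals]  # centre of each class
cum_freqs = list(accumulate(freqs))                   # running total

for (lo, hi), m, f, cf in zip(intervals, midpoints, freqs, cum_freqs):
    print(f"{lo:>2} <= x < {hi:<2}  {m:>5}  {f:>3}  {cf:>4}")
```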

EXAMPLE 2 (Calculating Estimates for Grouped Data)

Because we do not have the exact scores of the 100 students, any mean or standard deviation we calculate is only an estimate: we assume that every score in a class equals that class's midpoint.
Calculating the Estimated Mean ($\mu$):
$$\begin{aligned} \mu &= \dfrac{\sum (\text{Midpoint} \times f)}{n} \\ &= \dfrac{5(8) + 15(12) + 25(10) + 35(25) + 45(35) + 55(10)}{100} \\ &= \dfrac{40 + 180 + 250 + 875 + 1575 + 550}{100} \\ &= \dfrac{3470}{100} \\ &= \mathbf{34.7} \end{aligned}$$
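The same midpoint-weighted sum in Python, as a check on the arithmetic (midpoints and frequencies copied from the table):

```python
midpoints = [5, 15, 25, 35, 45, 55]
freqs = [8, 12, 10, 25, 35, 10]

n = sum(freqs)  # total frequency
# Estimated mean: each score is treated as its class midpoint.
est_mean = sum(m * f for m, f in zip(midpoints, freqs)) / n
print(est_mean)  # 34.7
```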
GDC Tip: Enter the midpoints into List 1 and the frequencies into List 2. Then go to 1-Var Stats, set 1Var XList : List1 and 1Var Freq : List2, and run the calculation to obtain the estimated mean and standard deviation.
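Without a GDC, the two 1-Var Stats figures can be reproduced from the midpoints and frequencies, again treating every score as its class midpoint. This sketch computes the population standard deviation $\sigma = \sqrt{\sum f(\text{Midpoint} - \mu)^2 / n}$; calculators usually also report a sample standard deviation (dividing by $n - 1$), which this sketch does not compute.

```python
import math

midpoints = [5, 15, 25, 35, 45, 55]
freqs = [8, 12, 10, 25, 35, 10]

n = sum(freqs)
mean = sum(m * f for m, f in zip(midpoints, freqs)) / n
# Population variance: frequency-weighted mean of squared deviations from the mean.
variance = sum(f * (m - mean) ** 2 for m, f in zip(midpoints, freqs)) / n
sigma = math.sqrt(variance)
print(round(mean, 2), round(sigma, 2))  # 34.7 14.31
```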

3. Cumulative Frequency Curve (Ogive)

A cumulative frequency curve (ogive) plots the running total of frequencies against the upper boundary of each class interval, starting from a c.f. of 0 at the lowest class boundary. The result is typically an S-shaped curve, which is useful for estimating the median, quartiles, and other percentiles graphically.

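[Figure: Cumulative frequency curve (ogive) for the exam scores. Horizontal axis: Scores (Upper Boundaries), 0 to 60; vertical axis: Cumulative Frequency (c.f.), 0 to 100. Marked readings: $Q_1 \approx 25$, $Q_2 \approx 38$, $Q_3 \approx 46$.]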
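The points the ogive passes through are (upper boundary, c.f.) pairs, plus a starting point with c.f. 0 at the lowest boundary. A Python sketch that lists them from the grouped table (variable names are illustrative; the `initial` argument of `itertools.accumulate` needs Python 3.8 or later):

```python
from itertools import accumulate

lower_start = 0                          # lowest class boundary
upper_bounds = [10, 20, 30, 40, 50, 60]  # upper boundary of each class
freqs = [8, 12, 10, 25, 35, 10]

xs = [lower_start] + upper_bounds
ys = list(accumulate(freqs, initial=0))  # c.f. values, starting from 0
points = list(zip(xs, ys))
print(points)
```

These are the points to plot and join with a smooth curve.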

Reading the Curve:

  • Step 1: Mark $25\%$, $50\%$, and $75\%$ of $n$ on the cumulative frequency axis (the y-axis); these three values split the total into four equal parts. (Here $n = 100$, so $y = 25$, $y = 50$, and $y = 75$.)
  • Step 2: Draw horizontal tracking lines from these y-values until they intersect the plotted S-curve.
  • Step 3: Draw vertical lines from those intersection points straight down to the x-axis to read the estimated scores.
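Steps 1 to 3 can be imitated numerically by joining the plotted points with straight line segments (a cumulative frequency polygon) and interpolating linearly. The straight-line joining is an assumption of this sketch: a smooth hand-drawn curve will give slightly different readings. `read_x` is a hypothetical helper name.

```python
from bisect import bisect_left

# Ogive points: upper boundaries (with the lowest boundary first) and c.f. values.
xs = [0, 10, 20, 30, 40, 50, 60]
ys = [0, 8, 20, 30, 55, 90, 100]

def read_x(target):
    """Score at which the polygon reaches cumulative frequency `target`."""
    if not ys[0] <= target <= ys[-1]:
        raise ValueError("target is outside the range of the curve")
    i = bisect_left(ys, target)  # first point with c.f. >= target
    if ys[i] == target:
        return xs[i]
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    # Linear interpolation along the segment from (x0, y0) to (x1, y1).
    return x0 + (target - y0) * (x1 - x0) / (y1 - y0)

n = ys[-1]
q1, q2, q3 = (read_x(p * n) for p in (0.25, 0.5, 0.75))
print(q1, q2, round(q3, 1))  # 25.0 38.0 45.7
```

This gives $Q_1 = 25$, $Q_2 = 38$, and $Q_3 \approx 45.7$, in line with the graphical estimates.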

Based on the geometry of the curve above, we can visually estimate the quartiles: Lower Quartile ($Q_1$) $\approx 25$, Median ($Q_2$) $\approx 38$, and Upper Quartile ($Q_3$) $\approx 46$.

Note: The values $Q_1, Q_2, Q_3$ are also referred to interchangeably as the $25^{\text{th}}$-percentile, $50^{\text{th}}$-percentile, and $75^{\text{th}}$-percentile respectively. Using this exact same graphical tracking method, you can estimate any percentile by starting at the required percentage of $n$ on the y-axis.
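The same interpolation extends to any percentile by replacing 25%, 50%, or 75% with $p\%$ of $n$. A sketch under the same straight-segment assumption (the function names `read_x` and `percentile` are illustrative):

```python
from bisect import bisect_left

xs = [0, 10, 20, 30, 40, 50, 60]
ys = [0, 8, 20, 30, 55, 90, 100]
n = ys[-1]

def read_x(target):
    """Score at which the straight-segment ogive reaches c.f. = target."""
    if not ys[0] <= target <= ys[-1]:
        raise ValueError("target is outside the range of the curve")
    i = bisect_left(ys, target)
    if ys[i] == target:
        return xs[i]
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return x0 + (target - y0) * (x1 - x0) / (y1 - y0)

def percentile(p):
    """Estimated p-th percentile: start at p% of n on the c.f. axis."""
    return read_x(p * n / 100)

print(percentile(90))            # 50
print(round(percentile(10), 2))  # 11.67
```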