4.9 Discrete Distributions

1. Discrete vs. Continuous Variables

A random variable $X$ takes values in a domain. It can be:

  • Discrete: e.g., $X \in \{10, 20, 30\}$ or $X \in \{0, 1, 2, 3, \dots\}$. A discrete variable takes values in a finite or countable set.
  • Continuous: e.g., $X \in [10, 20]$ or $X \in \mathbb{R}$. A continuous variable takes values in an interval or intervals.

This section covers discrete random variables.

Discrete Random Variable

Let $X$ be a variable that takes the values 10, 20, and 30 with probabilities 0.2, 0.3, and 0.5. This can be shown in a table:

| $X$ | 10 | 20 | 30 |
|---|---|---|---|
| $P(X=x)$ | 0.2 | 0.3 | 0.5 |

To express that the probability of $X=10$ is 0.2, we write $P(X=10) = 0.2$. Similarly, $P(X=20) = 0.3$ and $P(X=30) = 0.5$.

For any discrete random variable $X$ with values $x_1, x_2, x_3, \dots$ and probabilities $p_1, p_2, p_3, \dots$:
  • (i) $p_i \ge 0$ for all $i$ (probabilities are non-negative)
  • (ii) $\sum p_i = 1$, meaning $p_1 + p_2 + p_3 + \dots = 1$ (the sum of probabilities is 1)
(This defines a probability function $P: x_i \mapsto p_i$).
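Conditions (i) and (ii) can be checked mechanically. The following is a minimal sketch; the function name `is_valid_distribution` and the tolerance `tol` are illustrative choices, not part of the text.

```python
import math

def is_valid_distribution(probs, tol=1e-9):
    """Return True if all probabilities are non-negative and sum to 1."""
    non_negative = all(p >= 0 for p in probs)                   # condition (i)
    sums_to_one = math.isclose(sum(probs), 1.0, abs_tol=tol)    # condition (ii)
    return non_negative and sums_to_one

print(is_valid_distribution([0.2, 0.3, 0.5]))   # the table from the text
print(is_valid_distribution([0.2, 0.3, 0.6]))   # sums to 1.1, violates (ii)
print(is_valid_distribution([-0.1, 0.6, 0.5]))  # negative entry, violates (i)
```

The tolerance is there because decimal probabilities are generally not stored exactly as floating-point numbers.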

2. The Expected Value $\mu = E(X)$

The mean $\mu$, or expected value $E(X)$, is defined as:

$E(X) = \sum x_i p_i = x_1 p_1 + x_2 p_2 + x_3 p_3 + \dots$

For the previous example, the expected value is:

$E(X) = 10 \times 0.2 + 20 \times 0.3 + 30 \times 0.5 = 23$
Explanation for $\mu = E(X)$
This mean agrees with the ordinary arithmetic mean of a data set. Consider these ten numbers:
$10, 10, 20, 20, 20, 30, 30, 30, 30, 30$
If one of these numbers is selected at random, the probabilities of getting 10, 20, or 30 are 0.2, 0.3, and 0.5, as in the table. The mean of the ten numbers is:
$\mu = \dfrac{10 \times 2 + 20 \times 3 + 30 \times 5}{10} = 10 \times \dfrac{2}{10} + 20 \times \dfrac{3}{10} + 30 \times \dfrac{5}{10} = \mathbf{23}$
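As a quick check of this equivalence, the sketch below computes $E(X)$ from the table and the arithmetic mean of the ten numbers. It uses exact fractions to avoid rounding; the helper name `expected_value` is an illustrative choice.

```python
from fractions import Fraction

def expected_value(values, probs):
    """E(X) = sum of x_i * p_i."""
    return sum(x * p for x, p in zip(values, probs))

values = [10, 20, 30]
probs = [Fraction(2, 10), Fraction(3, 10), Fraction(5, 10)]
mu = expected_value(values, probs)

# The ten numbers from the explanation above.
sample = [10, 10, 20, 20, 20, 30, 30, 30, 30, 30]
sample_mean = Fraction(sum(sample), len(sample))

print(mu, sample_mean)  # 23 23
```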

EXAMPLE 1

Consider the following probability distribution:

| $X$ | 10 | 20 | 30 |
|---|---|---|---|
| $P(X=x)$ | $a$ | $b$ | 0.5 |

Given that $E(X) = 23$, find $a$ and $b$.

Solution: Use two equations based on the properties of distributions:
  • Sum of probabilities is 1: $a + b + 0.5 = 1 \implies a + b = 0.5$
  • Expected value formula: $10a + 20b + 30(0.5) = 23 \implies 10a + 20b = 8$
Solving this system yields: $\mathbf{a = 0.2}$ and $\mathbf{b = 0.3}$.
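The system can also be solved programmatically. This sketch uses Cramer's rule with exact fractions; the helper `solve_2x2` is a hypothetical name introduced here, and elimination by hand gives the same result.

```python
from fractions import Fraction

def solve_2x2(a11, a12, b1, a21, a22, b2):
    """Solve a11*x + a12*y = b1 and a21*x + a22*y = b2 by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("the system has no unique solution")
    x = Fraction(b1 * a22 - a12 * b2) / det
    y = Fraction(a11 * b2 - b1 * a21) / det
    return x, y

# a + b = 0.5   and   10a + 20b = 8
a, b = solve_2x2(1, 1, Fraction(1, 2), 10, 20, 8)
print(a, b)  # 1/5 3/10
```

Here $1/5 = 0.2$ and $3/10 = 0.3$, matching the solution above.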

Probability distributions apply to betting games:

EXAMPLE 2

Select one number from 10, 20, 30 at random, with the probabilities 0.2, 0.3, and 0.5 from the previous table.

  • Selecting 10 earns 6 points
  • Selecting 20 earns 1 point
  • Selecting 30 loses 2 points

What is the expected number of points per game?

Solution: Organize the data in a table:
| $X$ | 10 | 20 | 30 |
|---|---|---|---|
| Profit | 6 points | 1 point | -2 points |
| Prob | 0.2 | 0.3 | 0.5 |

Calculate the expected profit:
$\text{Expected profit} = 6 \times 0.2 + 1 \times 0.3 - 2 \times 0.5 = 1.2 + 0.3 - 1.0 = \mathbf{0.5}$
We earn an average of 0.5 points per game.
Explanation: Playing this game 10 times yields an expected 5 points. In 10 games, we expect to get:
  • 2 times the number 10 $\implies 2 \times 6 = 12$ points
  • 3 times the number 20 $\implies 3 \times 1 = 3$ points
  • 5 times the number 30 $\implies 5 \times (-2) = -10$ points
In total, $12 + 3 - 10 = 5$ points over 10 games.
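The "average per game" reading can be illustrated with a simulation. The sketch below computes the expected points from the table and then estimates the same quantity by playing many simulated games; the names `payoffs`, `expected_points`, and `simulate_average`, the seed, and the number of games are all illustrative assumptions.

```python
import random

# Payoff table from Example 2: number drawn -> (points, probability).
payoffs = {10: (6, 0.2), 20: (1, 0.3), 30: (-2, 0.5)}

def expected_points(table):
    """Sum of points * probability over all outcomes."""
    return sum(points * prob for points, prob in table.values())

def simulate_average(table, games, seed=0):
    """Play `games` simulated rounds and return the average points per game."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    points = [pts for pts, _ in table.values()]
    weights = [prob for _, prob in table.values()]
    draws = rng.choices(points, weights=weights, k=games)
    return sum(draws) / games

print(expected_points(payoffs))                  # close to 0.5 (floating point)
print(simulate_average(payoffs, games=100_000))  # an estimate near 0.5
```

The simulated average varies with the seed and the number of games; it only approaches 0.5 as the number of games grows.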

EXAMPLE 3

We throw two dice.

  • TWO SIXES earns 15€
  • ONE SIX earns 1€
  • NO SIX loses 1€

Find the expected profit per game.

Solution: Organize the data in a table:
| Result | TWO SIXES | ONE SIX | NO SIX |
|---|---|---|---|
| Profit | 15€ | 1€ | -1€ |
| Prob | $\dfrac{1}{36}$ | $\dfrac{10}{36}$ | $\dfrac{25}{36}$ |

The expected profit per game is:
$\text{Expected profit} = 15 \times \dfrac{1}{36} + 1 \times \dfrac{10}{36} - 1 \times \dfrac{25}{36} = \dfrac{15 + 10 - 25}{36} = \mathbf{0}$
This is a FAIR GAME! The expected net profit is zero.
Notice: If the first prize were 14€ instead of 15€, the expected profit would be $-\dfrac{1}{36}$. If we play 36,000 times, we expect to lose 1,000€.
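The probabilities in the table can be verified by listing all 36 equally likely outcomes of two fair dice. The following sketch does that and recomputes the expected profit for both prize values; the function names and the `top_prize` parameter are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def six_count_distribution():
    """Probability of rolling 0, 1, or 2 sixes with two fair dice."""
    rolls = list(product(range(1, 7), repeat=2))  # all 36 outcomes
    counts = Counter((d1 == 6) + (d2 == 6) for d1, d2 in rolls)
    return {k: Fraction(n, len(rolls)) for k, n in counts.items()}

def expected_profit(top_prize=15):
    """Expected profit in euros when two sixes pay `top_prize`."""
    profit = {2: top_prize, 1: 1, 0: -1}
    dist = six_count_distribution()
    return sum(profit[k] * p for k, p in dist.items())

print(expected_profit(15))          # 0, a fair game
print(expected_profit(14))          # -1/36
print(36000 * expected_profit(14))  # -1000
```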

3. Median and Mode

These measures are defined for probability distributions as follows:

  • MODE = The value $X=a$ with the highest probability.
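  • MEDIAN = The value $X=m$ at which the cumulative probability first exceeds 0.5. If the cumulative probability is exactly 0.5 at some value, the median is the midpoint between that value and the next one.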

EXAMPLE 4 (Finding Mode and Median)

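First distribution:

| $X$ | 10 | 20 | 30 |
|---|---|---|---|
| $P(X=x)$ | 0.4 | 0.3 | 0.3 |

  • MODE = 10 (Highest probability)
  • MEDIAN = 20 (The cumulative probability is 0.4 at $X=10$ and 0.7 at $X=20$, so it first passes 0.5 at 20)

Second distribution:

| $X$ | 10 | 20 | 30 |
|---|---|---|---|
| $P(X=x)$ | 0.2 | 0.3 | 0.5 |

  • MODE = 30 (Highest probability)
  • MEDIAN = 25 (At $X=20$, the cumulative probability is exactly 0.5, so the median is the midpoint between 20 and 30: $\dfrac{20+30}{2} = 25$).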
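The two rules used in Example 4 can be sketched in code. This assumes the values are listed in increasing order and uses exact fractions so that the test "cumulative probability equals 0.5" is reliable; the function names are illustrative, and ties for the mode are resolved by taking the first value.

```python
from fractions import Fraction

def mode(values, probs):
    """Value with the highest probability (first one if there is a tie)."""
    return max(zip(values, probs), key=lambda pair: pair[1])[0]

def median(values, probs):
    """Median of a discrete distribution; values must be in increasing order."""
    half = Fraction(1, 2)
    cumulative = Fraction(0)
    for i, (x, p) in enumerate(zip(values, probs)):
        cumulative += p
        if cumulative > half:
            return x  # cumulative probability first passes 0.5 here
        if cumulative == half:
            return Fraction(x + values[i + 1], 2)  # midpoint with next value
    raise ValueError("cumulative probability never reaches 1/2")

values = [10, 20, 30]
first = [Fraction(4, 10), Fraction(3, 10), Fraction(3, 10)]
second = [Fraction(2, 10), Fraction(3, 10), Fraction(5, 10)]

print(mode(values, first), median(values, first))    # 10 20
print(mode(values, second), median(values, second))  # 30 25
```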

4. Variance (HL ONLY)

The variance of a discrete random variable is:

$Var(X) = E\left[(X - \mu)^2\right]$

Expanded, this is:

$Var(X) = (x_1 - \mu)^2 \times p_1 + (x_2 - \mu)^2 \times p_2 + (x_3 - \mu)^2 \times p_3 + \dots$

An alternative formula is:

$Var(X) = E(X^2) - \mu^2$

where

$E(X^2) = x_1^2 \times p_1 + x_2^2 \times p_2 + x_3^2 \times p_3 + \dots$
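The alternative formula follows from expanding the square in the definition. A short derivation, using only $\sum p_i = 1$ and $\sum x_i p_i = \mu$:

$$
\begin{aligned}
Var(X) &= \sum (x_i - \mu)^2 p_i \\
&= \sum x_i^2 p_i - 2\mu \sum x_i p_i + \mu^2 \sum p_i \\
&= E(X^2) - 2\mu \cdot \mu + \mu^2 \cdot 1 \\
&= E(X^2) - \mu^2
\end{aligned}
$$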

EXAMPLE 5

Using the initial probability distribution:

| $X$ | 10 | 20 | 30 |
|---|---|---|---|
| $P(X=x)$ | 0.2 | 0.3 | 0.5 |

Since $\mu = E(X) = 23$, the variance using the first formula is:
$Var(X) = (10-23)^2 \times 0.2 + (20-23)^2 \times 0.3 + (30-23)^2 \times 0.5 = \mathbf{61}$
Using the alternative formula with $E(X^2)$:

$$
\begin{aligned}
E(X^2) &= 10^2 \times 0.2 + 20^2 \times 0.3 + 30^2 \times 0.5 \\
&= 100(0.2) + 400(0.3) + 900(0.5) \\
&= 20 + 120 + 450 \\
&= 590 \\
\implies Var(X) &= E(X^2) - \mu^2 = 590 - 23^2 = \mathbf{61}
\end{aligned}
$$
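Both variance formulas can be compared numerically. The sketch below evaluates each one for the distribution of Example 5 with exact fractions; the function names `variance_definition` and `variance_shortcut` are illustrative.

```python
from fractions import Fraction

def mean(values, probs):
    """mu = E(X) = sum of x_i * p_i."""
    return sum(x * p for x, p in zip(values, probs))

def variance_definition(values, probs):
    """Var(X) as the sum of (x_i - mu)^2 * p_i."""
    mu = mean(values, probs)
    return sum((x - mu) ** 2 * p for x, p in zip(values, probs))

def variance_shortcut(values, probs):
    """Var(X) as E(X^2) - mu^2."""
    mu = mean(values, probs)
    e_x2 = sum(x ** 2 * p for x, p in zip(values, probs))
    return e_x2 - mu ** 2

values = [10, 20, 30]
probs = [Fraction(2, 10), Fraction(3, 10), Fraction(5, 10)]

print(variance_definition(values, probs))  # 61
print(variance_shortcut(values, probs))    # 61
```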