4.9 Discrete Distributions in General

1. Discrete vs. Continuous Variables

Roughly speaking, a random variable $X$ takes values in a given domain at random. It may be:

  • Discrete: e.g., $X \in \{10, 20, 30\}$ or $X \in \{0, 1, 2, 3, \dots\}$. A discrete variable takes values in a finite or countable set.
  • Continuous: e.g., $X \in [10, 20]$ or $X \in \mathbb{R}$. A continuous variable takes on values in some interval(s).

In this section, we only deal with discrete random variables.

Discrete Random Variable

Let $X$ be a random variable that takes the values 10, 20, 30 with probabilities 0.2, 0.3, 0.5 respectively. We often present this in a table:

$$\begin{array}{c|ccc} X & 10 & 20 & 30 \\ \hline P(X=x) & 0.2 & 0.3 & 0.5 \end{array}$$
To express that the probability that $X=10$ is 0.2, we write $P(X=10) = 0.2$. Similarly, $P(X=20) = 0.3$ and $P(X=30) = 0.5$.

Clearly, for a discrete random variable $X$ with values $x_1, x_2, x_3, \dots$ and probabilities $p_1, p_2, p_3, \dots$, the following hold:
  • (i) $p_i \ge 0$, for all $i$ (all probabilities are non-negative numbers)
  • (ii) $\sum p_i = 1$, i.e., $p_1 + p_2 + p_3 + \dots = 1$ (their sum is always 1)
(We also say that this defines a probability function $P: x_i \mapsto p_i$.)
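Properties (i) and (ii) can be checked mechanically. The sketch below is only an illustration: the function name `is_probability_function` is hypothetical, and probabilities are passed as strings so that `Fraction` stores them exactly (a float such as `0.2` is not exactly 0.2, which could make the sum test fail spuriously).

```python
from fractions import Fraction

def is_probability_function(probs):
    """Check (i) every p_i >= 0 and (ii) the p_i sum to exactly 1."""
    # Fractions avoid floating-point rounding when testing the sum.
    ps = [Fraction(p) for p in probs]
    return all(p >= 0 for p in ps) and sum(ps) == 1

# The table above, stored as value -> probability.
dist = {10: "0.2", 20: "0.3", 30: "0.5"}
print(is_probability_function(dist.values()))         # True
print(is_probability_function(["0.2", "0.3", "0.4"]))  # False: the sum is 0.9
```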

2. The Expected Value $\mu = E(X)$

The mean $\mu$, also called the expected value $E(X)$, is defined by:

$E(X) = \sum x_i p_i = x_1 p_1 + x_2 p_2 + x_3 p_3 + \dots$

For our previous example, the expected value (mean) is:

$E(X) = 10 \times 0.2 + 20 \times 0.3 + 30 \times 0.5 = 23$
NOTICE: Explanation for $\mu = E(X)$
In fact, this mean is no different from the ordinary mean in statistics. Consider the following ten numbers:
$10, 10, 20, 20, 20, 30, 30, 30, 30, 30$
If we pick one of these ten numbers at random, the probabilities of obtaining 10, 20, or 30 are exactly those in our table (0.2, 0.3, 0.5). The statistical mean is:
$\mu = \dfrac{10 \times 2 + 20 \times 3 + 30 \times 5}{10} = 10 \times \dfrac{2}{10} + 20 \times \dfrac{3}{10} + 30 \times \dfrac{5}{10} = \mathbf{23}$
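The two computations above can be compared directly. This is a minimal sketch assuming the distribution is stored as a `{value: probability}` dictionary of exact fractions; `expected_value` is a hypothetical helper name, while `statistics.mean` is the standard-library mean.

```python
from fractions import Fraction
from statistics import mean

def expected_value(dist):
    """E(X) = sum of x_i * p_i over a {value: probability} table."""
    return sum(x * p for x, p in dist.items())

dist = {10: Fraction("0.2"), 20: Fraction("0.3"), 30: Fraction("0.5")}
print(expected_value(dist))  # 23

# The ordinary mean of the ten numbers gives the same result.
data = [10, 10, 20, 20, 20, 30, 30, 30, 30, 30]
print(mean(data))  # 23
```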

EXAMPLE 1

Consider the following probability distribution:

$$\begin{array}{c|ccc} X & 10 & 20 & 30 \\ \hline P(X=x) & a & b & 0.5 \end{array}$$
Given that $E(X) = 23$, find the values of $a$ and $b$.

Solution: We use two equations that follow from the properties of a distribution:
  • Sum of probabilities is 1: $a + b + 0.5 = 1 \implies a + b = 0.5$
  • Expected value formula: $10a + 20b + 30(0.5) = 23 \implies 10a + 20b = 8$
Multiplying the first equation by 10 and subtracting it from the second gives $10b = 3$. Hence $\mathbf{a = 0.2}$ and $\mathbf{b = 0.3}$.
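The elimination step can be reproduced with exact fractions. This is an illustrative sketch of the hand calculation for this specific 2×2 system, not a general linear solver; the variable names `s` and `t` (the two right-hand sides) are assumptions introduced here.

```python
from fractions import Fraction

# Example 1 as a linear system:
#     a +   b = 1 - 0.5          (probabilities sum to 1)
#   10a + 20b = 23 - 30 * 0.5    (expected value equals 23)
s = 1 - Fraction("0.5")          # right-hand side for a + b
t = 23 - 30 * Fraction("0.5")    # right-hand side for 10a + 20b

# Eliminate a: (10a + 20b) - 10(a + b) = 10b
b = (t - 10 * s) / 10
a = s - b
print(a, b)  # 1/5 3/10
```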

Probability distributions are commonly used to analyse betting games:

EXAMPLE 2

Consider again the table above, and suppose we now select one of the numbers 10, 20, 30 at random with these probabilities.

  • If we select 10 we earn 6 points
  • If we select 20 we earn 1 point
  • If we select 30 we lose 2 points

What is the expected number of points in one game?

Solution: We extend our table as follows:
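$$\begin{array}{c|ccc} X & 10 & 20 & 30 \\ \hline \text{Profit} & 6 \text{ points} & 1 \text{ point} & -2 \text{ points} \\ \text{Prob} & 0.2 & 0.3 & 0.5 \end{array}$$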
We compute the expected profit:
$\text{Expected profit} = 6 \times 0.2 + 1 \times 0.3 - 2 \times 0.5 = 1.2 + 0.3 - 1.0 = \mathbf{0.5}$
That is, in each game we earn 0.5 points on average.
Explanation: If we play this game 10 times, we expect to earn 5 points in total. Indeed, in 10 games we expect to obtain:
  • 2 times the number 10 $\implies 2 \times 6 = 12$ points
  • 3 times the number 20 $\implies 3 \times 1 = 3$ points
  • 5 times the number 30 $\implies 5 \times (-2) = -10$ points
In total, $12 + 3 - 10 = 5$ points over 10 games.
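The long-run interpretation above can also be checked by simulation. The sketch below computes the exact expected profit and then averages the profit over many simulated games; the helper names and the fixed `seed` are assumptions chosen for reproducibility, and the simulated average only approximates 0.5.

```python
import random
from fractions import Fraction

# Example 2: value -> (profit in points, probability)
game = {10: (6, Fraction("0.2")), 20: (1, Fraction("0.3")), 30: (-2, Fraction("0.5"))}

def expected_profit(table):
    """Sum of profit * probability over all outcomes."""
    return sum(profit * p for profit, p in table.values())

def simulate(table, n_games, seed=0):
    """Average profit per game over n_games random draws (Monte Carlo check)."""
    rng = random.Random(seed)
    profits = [profit for profit, _ in table.values()]
    weights = [float(p) for _, p in table.values()]
    draws = rng.choices(profits, weights=weights, k=n_games)
    return sum(draws) / n_games

print(expected_profit(game))    # 1/2
print(simulate(game, 100_000))  # close to 0.5; the exact value depends on the seed
```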

EXAMPLE 3

We throw two dice.

  • If we obtain TWO SIXES we earn 15€
  • If we obtain ONLY ONE SIX we earn 1€
  • If we obtain NO SIX we lose 1€

Find the expected profit in one game.

Solution: Let us organize our data in a table:
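$$\begin{array}{c|ccc} \text{Result} & \text{TWO SIXES} & \text{ONE SIX} & \text{NO SIX} \\ \hline \text{Profit} & \text{15€} & \text{1€} & \text{-1€} \\ \text{Prob} & \dfrac{1}{36} & \dfrac{10}{36} & \dfrac{25}{36} \end{array}$$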
The expected amount earned per game is:
$\text{Expected profit} = 15 \times \dfrac{1}{36} + 1 \times \dfrac{10}{36} - 1 \times \dfrac{25}{36} = \dfrac{15 + 10 - 25}{36} = \mathbf{0}$
This is a FAIR GAME! We expect neither to earn nor to lose on average.
Notice: If the first prize were 14€ instead of 15€, the expected profit would be $\dfrac{14 + 10 - 25}{36} = -\dfrac{1}{36}$. In other words, if we play the game 36,000 times (that is, bet 36,000€), we expect to lose about 1,000€ in total.
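The probabilities $\dfrac{1}{36}, \dfrac{10}{36}, \dfrac{25}{36}$ and both expected profits can be confirmed by listing all 36 equally likely outcomes of two fair dice. This is a brute-force sketch; `dice_game_expected_profit` is a hypothetical name, and the prize for two sixes is a parameter so that the 15€ and 14€ versions can be compared.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # all 36 (die1, die2) pairs

# How many outcomes give 2, 1 or 0 sixes?
counts = Counter((d1 == 6) + (d2 == 6) for d1, d2 in outcomes)
print(counts[2], counts[1], counts[0])  # 1 10 25

def dice_game_expected_profit(prize_two_sixes):
    """Expected profit of Example 3; each outcome has probability 1/36."""
    total = Fraction(0)
    for d1, d2 in outcomes:
        sixes = (d1 == 6) + (d2 == 6)
        profit = {2: prize_two_sixes, 1: 1, 0: -1}[sixes]
        total += Fraction(profit, 36)
    return total

print(dice_game_expected_profit(15))  # 0  (fair game)
print(dice_game_expected_profit(14))  # -1/36
```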

3. Median and Mode

These measures, familiar from descriptive statistics, are defined analogously for probability distributions:

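  • MODE = the value $X=a$ with the highest probability.
  • MEDIAN = the value $X=m$ that splits the distribution into two equal parts (0.5 and 0.5). In practice, it is the first value at which the cumulative probability exceeds 0.5; if the cumulative probability equals exactly 0.5 at some value, the median is the midpoint between that value and the next one.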

EXAMPLE 4 (Finding Mode and Median)

$$\begin{array}{c|ccc} X & 10 & 20 & 30 \\ \hline P(X=x) & 0.4 & 0.3 & 0.3 \end{array}$$
  • MODE = 10 (Highest probability is 0.4)
  • MEDIAN = 20 (the cumulative probability is 0.4 at 10 and 0.7 at 20, so it first exceeds 0.5 at 20)
$$\begin{array}{c|ccc} X & 10 & 20 & 30 \\ \hline P(X=x) & 0.2 & 0.3 & 0.5 \end{array}$$
  • MODE = 30 (Highest probability is 0.5)
  • MEDIAN = 25 (Why? At $X=20$ the cumulative probability is exactly 0.5, and the remaining probability 0.5 sits at $X=30$. Any value between 20 and 30 splits the distribution into two equal halves, so we take the midpoint $\dfrac{20+30}{2} = 25$.)
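The mode and median rules used in Example 4 can be written as short functions. This is a sketch under stated assumptions: the mode function assumes a single most likely value (with ties it simply returns the first one), and the median function encodes this section's midpoint convention for the case where the cumulative probability hits exactly 0.5. Both function names are hypothetical.

```python
from fractions import Fraction

def mode(dist):
    """Value with the highest probability (assumes a single maximum)."""
    return max(dist, key=dist.get)

def median(dist):
    """First value where the cumulative probability exceeds 0.5;
    if it equals 0.5 exactly, return the midpoint with the next value."""
    values = sorted(dist)
    cumulative = Fraction(0)
    for i, x in enumerate(values):
        cumulative += dist[x]
        if cumulative > Fraction(1, 2):
            return x
        if cumulative == Fraction(1, 2):
            return Fraction(x + values[i + 1], 2)
    raise ValueError("probabilities must sum to 1")

d1 = {10: Fraction("0.4"), 20: Fraction("0.3"), 30: Fraction("0.3")}
d2 = {10: Fraction("0.2"), 20: Fraction("0.3"), 30: Fraction("0.5")}
print(mode(d1), median(d1))  # 10 20
print(mode(d2), median(d2))  # 30 25
```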

4. Variance (Only for HL)

We define the variance of a discrete random variable as:

$Var(X) = E\left[(X - \mu)^2\right]$

Written out in full, this is:

$Var(X) = (x_1 - \mu)^2 \times p_1 + (x_2 - \mu)^2 \times p_2 + (x_3 - \mu)^2 \times p_3 + \dots$

An equivalent formula, which is usually much faster in practice, is:

$Var(X) = E(X^2) - \mu^2$

where

$E(X^2) = x_1^2 \times p_1 + x_2^2 \times p_2 + x_3^2 \times p_3 + \dots$
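To see why the two formulas agree, expand the square inside the sum and use $\sum x_i p_i = \mu$ and $\sum p_i = 1$:

$$ \begin{aligned} \sum (x_i - \mu)^2 p_i &= \sum x_i^2 p_i - 2\mu \sum x_i p_i + \mu^2 \sum p_i \\ &= E(X^2) - 2\mu \cdot \mu + \mu^2 \cdot 1 \\ &= E(X^2) - \mu^2 \end{aligned} $$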

EXAMPLE 5

Consider again the baseline probability distribution:

$$\begin{array}{c|ccc} X & 10 & 20 & 30 \\ \hline P(X=x) & 0.2 & 0.3 & 0.5 \end{array}$$
We have seen previously that $\mu = E(X) = 23$. Therefore, using the definition:
$Var(X) = (10-23)^2 \times 0.2 + (20-23)^2 \times 0.3 + (30-23)^2 \times 0.5 = 169(0.2) + 9(0.3) + 49(0.5) = 33.8 + 2.7 + 24.5 = \mathbf{61}$
Alternatively, using $E(X^2)$: $$ \begin{aligned} E(X^2) &= 10^2 \times 0.2 + 20^2 \times 0.3 + 30^2 \times 0.5 \\ &= 100(0.2) + 400(0.3) + 900(0.5) \\ &= 20 + 120 + 450 \\ &= 590 \end{aligned} $$ $$ \begin{aligned} Var(X) &= E(X^2) - \mu^2 \\ &= 590 - 23^2 \\ &= 590 - 529 \\ &= \mathbf{61} \end{aligned} $$
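Both variance formulas can be evaluated side by side to confirm they agree. A minimal sketch, assuming a `{value: probability}` dictionary of exact fractions; the two function names are hypothetical.

```python
from fractions import Fraction

def variance_definition(dist):
    """Var(X) = sum of (x_i - mu)^2 * p_i."""
    mu = sum(x * p for x, p in dist.items())
    return sum((x - mu) ** 2 * p for x, p in dist.items())

def variance_shortcut(dist):
    """Var(X) = E(X^2) - mu^2."""
    mu = sum(x * p for x, p in dist.items())
    e_x2 = sum(x ** 2 * p for x, p in dist.items())
    return e_x2 - mu ** 2

dist = {10: Fraction("0.2"), 20: Fraction("0.3"), 30: Fraction("0.5")}
print(variance_definition(dist), variance_shortcut(dist))  # 61 61
```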