4.9 Discrete Distributions in General
1. Discrete vs. Continuous Variables
Roughly speaking, a random variable $X$ takes values in a given domain at random. It may be:
- Discrete: e.g., $X \in \{10, 20, 30\}$ or $X \in \{0, 1, 2, 3, \dots\}$. A discrete variable takes values in a finite or countable set.
- Continuous: e.g., $X \in [10, 20]$ or $X \in \mathbb{R}$. A continuous variable takes values in one or more intervals.
In this section, we only deal with discrete random variables.
Discrete Random Variable
Let $X$ be a variable which takes on the values 10, 20, 30 with probabilities 0.2, 0.3, 0.5 respectively. We often use a table:
| $X$ | 10 | 20 | 30 |
|---|---|---|---|
| $P(X=x)$ | 0.2 | 0.3 | 0.5 |
For a discrete random variable $X$ with values $x_1, x_2, x_3, \dots$ and probabilities $p_1, p_2, p_3, \dots$, the following always hold:
- (i) $p_i \ge 0$, for all $i$ (all probabilities are non-negative numbers)
- (ii) $\sum p_i = 1$, i.e., $p_1 + p_2 + p_3 + \dots = 1$ (their sum is always 1)
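Properties (i) and (ii) are easy to check mechanically. The following is a minimal sketch, not part of the original text: the function name `is_valid_distribution` and the tolerance `tol` are illustrative assumptions, and the tolerance is there only because decimal probabilities are stored as floating-point numbers.

```python
import math

def is_valid_distribution(probs, tol=1e-9):
    """Return True if all probabilities are non-negative and sum to 1 (within tol)."""
    non_negative = all(p >= 0 for p in probs)                   # property (i)
    sums_to_one = math.isclose(sum(probs), 1.0, abs_tol=tol)    # property (ii)
    return non_negative and sums_to_one

print(is_valid_distribution([0.2, 0.3, 0.5]))    # the table above
print(is_valid_distribution([0.4, 0.4, 0.4]))    # fails (ii): the sum is 1.2
print(is_valid_distribution([-0.1, 0.6, 0.5]))   # fails (i): negative entry
```

Only the first list describes a probability distribution; the other two each violate one of the properties.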
2. The Expected Value $\mu = E(X)$
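The mean $\mu$, also called the expected value $E(X)$, is defined by:
$\mu = E(X) = \sum x_i p_i = x_1 p_1 + x_2 p_2 + x_3 p_3 + \dots$
For our previous example, the expected value (mean) is:
$E(X) = 10 \times 0.2 + 20 \times 0.3 + 30 \times 0.5 = 2 + 6 + 15 = 23$
In fact, this mean is no different from the usual mean in statistics. Consider the following ten numbers:
$10, 10, 20, 20, 20, 30, 30, 30, 30, 30$
The values 10, 20, 30 occur with relative frequencies 0.2, 0.3, 0.5, and the mean of the ten numbers is $\dfrac{230}{10} = 23$.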
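As a numerical sketch of this comparison, the code below computes $E(X)$ for the 10/20/30 table and the ordinary mean of a ten-number sample. The sample (10 twice, 20 three times, 30 five times) is an assumption chosen so that its relative frequencies match the probabilities 0.2, 0.3, 0.5. The helper name `expected_value` is also illustrative.

```python
def expected_value(values, probs):
    """E(X) = sum of x_i * p_i over all values of X."""
    return sum(x * p for x, p in zip(values, probs))

values = [10, 20, 30]
probs = [0.2, 0.3, 0.5]
mu = expected_value(values, probs)

# Assumed sample: relative frequencies 2/10, 3/10, 5/10 match the table.
sample = [10, 10, 20, 20, 20, 30, 30, 30, 30, 30]
sample_mean = sum(sample) / len(sample)

print(mu, sample_mean)
```

Both quantities equal 23 (up to floating-point rounding in the first), which is why the expected value can be read as a long-run average.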
EXAMPLE 1
Consider the following probability distribution. Given that $E(X) = 23$, find $a$ and $b$.
| $X$ | 10 | 20 | 30 |
|---|---|---|---|
| $P(X=x)$ | $a$ | $b$ | 0.5 |
Solution: We use two relations that follow from the properties of a distribution:
- Sum of probabilities is 1: $a + b + 0.5 = 1 \implies a + b = 0.5$
- Expected value formula (with $E(X) = 23$): $10a + 20b + 30(0.5) = 23 \implies 10a + 20b = 8$
Solving the two equations simultaneously gives $a = 0.2$ and $b = 0.3$.
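The two relations $a + b = 0.5$ and $10a + 20b = 8$ form a linear system in $a$ and $b$. One possible way to solve it is sketched below using Cramer's rule with exact fractions. The helper `solve_2x2` is hypothetical and written only for this illustration.

```python
from fractions import Fraction

def solve_2x2(a11, a12, b1, a21, a22, b2):
    """Solve a11*x + a12*y = b1 and a21*x + a22*y = b2 by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("system has no unique solution")
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - b1 * a21) / det
    return x, y

# a + b = 1/2 and 10a + 20b = 8, written with exact fractions
a, b = solve_2x2(Fraction(1), Fraction(1), Fraction(1, 2),
                 Fraction(10), Fraction(20), Fraction(8))
print(a, b)  # prints: 1/5 3/10
```

The result, $a = 1/5 = 0.2$ and $b = 3/10 = 0.3$, can be substituted back into both relations as a check.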
Probability distributions arise naturally in games of chance:
EXAMPLE 2
Consider again the table above. This time we select one of the numbers 10, 20, 30 at random, with the probabilities shown there.
- If we select 10 we earn 6 points
- If we select 20 we earn 1 point
- If we select 30 we lose 2 points
What is the expected number of points in one game?
| $X$ | 10 | 20 | 30 |
|---|---|---|---|
| Profit | 6 points | 1 point | -2 points |
| Prob | 0.2 | 0.3 | 0.5 |
$\text{Expected profit} = 6 \times 0.2 + 1 \times 0.3 - 2 \times 0.5 = 1.2 + 0.3 - 1.0 = \mathbf{0.5}$
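That is, in each game we earn 0.5 points on average. Indeed, in 10 games we expect to select:
- 2 times the number 10 $\implies 2 \times 6 = 12$ points
- 3 times the number 20 $\implies 3 \times 1 = 3$ points
- 5 times the number 30 $\implies 5 \times (-2) = -10$ points
In total, $12 + 3 - 10 = 5$ points in 10 games, i.e., 0.5 points per game.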
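The "average per game" reading can also be illustrated by simulation. The sketch below compares the exact expected profit with the average over many simulated games. The function name `simulate_average`, the number of games, and the fixed seed are assumptions made only for this illustration.

```python
import random

profits = [6, 1, -2]       # points for selecting 10, 20, 30
probs = [0.2, 0.3, 0.5]

# Exact expected profit: sum of profit * probability
expected_profit = sum(g * p for g, p in zip(profits, probs))

def simulate_average(n_games, seed=0):
    """Play n_games and return the average number of points per game."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    outcomes = rng.choices(profits, weights=probs, k=n_games)
    return sum(outcomes) / n_games

print(expected_profit)
print(simulate_average(100_000))
```

The simulated average will not equal 0.5 exactly, but it should settle close to it as the number of games grows.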
EXAMPLE 3
We throw two dice.
- If we obtain TWO SIXES we earn 15€
- If we obtain ONLY ONE SIX we earn 1€
- If we obtain NO SIX we lose 1€
Find the expected profit in one game.
| Result | TWO SIXES | ONE SIX | NO SIX |
|---|---|---|---|
| Profit | 15€ | 1€ | -1€ |
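| Prob | $\dfrac{1}{36}$ | $\dfrac{10}{36}$ | $\dfrac{25}{36}$ |
$\text{Expected profit} = 15 \times \dfrac{1}{36} + 1 \times \dfrac{10}{36} - 1 \times \dfrac{25}{36} = \dfrac{15 + 10 - 25}{36} = \mathbf{0}$
That is, on average we neither win nor lose: the game is fair.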
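The probabilities in the table can be verified by listing all 36 equally likely outcomes of two dice. The following is a small sketch under that assumption (fair, independent dice). The names `profit` and `counts` are illustrative.

```python
from fractions import Fraction
from itertools import product

def profit(sixes):
    """Payout in euros for a given number of sixes."""
    return {2: 15, 1: 1, 0: -1}[sixes]

# Count how many of the 36 outcomes show 0, 1 or 2 sixes
counts = {0: 0, 1: 0, 2: 0}
for d1, d2 in product(range(1, 7), repeat=2):
    sixes = (d1 == 6) + (d2 == 6)
    counts[sixes] += 1

probs = {k: Fraction(n, 36) for k, n in counts.items()}
expected_profit = sum(profit(k) * p for k, p in probs.items())

print(counts)            # prints: {0: 25, 1: 10, 2: 1}
print(expected_profit)   # prints: 0
```

The counts 25, 10 and 1 reproduce the probabilities $\dfrac{25}{36}$, $\dfrac{10}{36}$, $\dfrac{1}{36}$, and the expected profit comes out as exactly 0.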
3. Median and Mode
These measures, familiar from descriptive statistics, are defined analogously for probability distributions:
- MODE = the value $X=a$ with the highest probability.
- MEDIAN = the value $X=m$ at which the cumulative probability splits into two equal parts (0.5 - 0.5).
EXAMPLE 4 (Finding Mode and Median)
| $X$ | 10 | 20 | 30 |
|---|---|---|---|
| $P(X=x)$ | 0.4 | 0.3 | 0.3 |
- MODE = 10 (Highest probability is 0.4)
- MEDIAN = 20 (The cumulative probability is 0.4 at 10 and 0.7 at 20, so it first exceeds 0.5 at 20)
For a second distribution:
| $X$ | 10 | 20 | 30 |
|---|---|---|---|
| $P(X=x)$ | 0.2 | 0.3 | 0.5 |
- MODE = 30 (Highest probability is 0.5)
- MEDIAN = 25 (Why? The cumulative probability is exactly 0.5 at $X=20$, and the remaining 0.5 lies at $X=30$. Any value between 20 and 30 splits the distribution into two equal parts, so by convention we take the midpoint, $\dfrac{20+30}{2} = 25$.)
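The two rules used in this example can be sketched in code. The functions `mode` and `median` below are illustrative. They assume the values are listed in increasing order and that the probabilities form a valid distribution. Probabilities are given as exact fractions so that the test "cumulative probability equals 0.5" is not affected by floating-point rounding.

```python
from fractions import Fraction

def mode(values, probs):
    """Value with the highest probability (the first one if there is a tie)."""
    return max(zip(values, probs), key=lambda vp: vp[1])[0]

def median(values, probs):
    """Median of a discrete distribution; values must be in increasing order."""
    half = Fraction(1, 2)
    cumulative = Fraction(0)
    for i, (x, p) in enumerate(zip(values, probs)):
        cumulative += p
        if cumulative > half:
            return x                    # cumulative probability passes 0.5 here
        if cumulative == half:
            # exactly 0.5: take the midpoint with the next value
            return Fraction(x + values[i + 1], 2)
    raise ValueError("probabilities do not reach 0.5")

values = [10, 20, 30]
first = [Fraction(4, 10), Fraction(3, 10), Fraction(3, 10)]
second = [Fraction(2, 10), Fraction(3, 10), Fraction(5, 10)]

print(mode(values, first), median(values, first))     # prints: 10 20
print(mode(values, second), median(values, second))   # prints: 30 25
```

The midpoint branch implements the convention used for the second table. Other texts may adopt a different convention when the cumulative probability is exactly 0.5.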
4. Variance (Only for HL)
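We define the variance of a discrete random variable as:
$\text{Var}(X) = E\left[(X - \mu)^2\right]$
That is:
$\text{Var}(X) = \sum (x_i - \mu)^2 p_i = (x_1 - \mu)^2 p_1 + (x_2 - \mu)^2 p_2 + (x_3 - \mu)^2 p_3 + \dots$
An equivalent (and in practice much faster) formula is:
$\text{Var}(X) = E(X^2) - \mu^2$
where
$E(X^2) = \sum x_i^2 p_i = x_1^2 p_1 + x_2^2 p_2 + x_3^2 p_3 + \dots$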
EXAMPLE 5
Consider again the probability distribution from the earlier examples. Find the variance of $X$.
| $X$ | 10 | 20 | 30 |
|---|---|---|---|
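| $P(X=x)$ | 0.2 | 0.3 | 0.5 |
Solution: The mean is $\mu = 10 \times 0.2 + 20 \times 0.3 + 30 \times 0.5 = 23$. Then
$E(X^2) = 10^2 \times 0.2 + 20^2 \times 0.3 + 30^2 \times 0.5 = 20 + 120 + 450 = 590$
$\text{Var}(X) = E(X^2) - \mu^2 = 590 - 23^2 = 590 - 529 = \mathbf{61}$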
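For this distribution, the variance can be computed both from the definition and from the shortcut formula, and the two should agree. The sketch below uses exact fractions for the probabilities. The variable names are illustrative.

```python
from fractions import Fraction

values = [10, 20, 30]
probs = [Fraction(2, 10), Fraction(3, 10), Fraction(5, 10)]

# Mean: sum of x_i * p_i
mu = sum(x * p for x, p in zip(values, probs))

# Definition: sum of (x_i - mu)^2 * p_i
var_definition = sum((x - mu) ** 2 * p for x, p in zip(values, probs))

# Shortcut: E(X^2) - mu^2
e_x2 = sum(x ** 2 * p for x, p in zip(values, probs))
var_shortcut = e_x2 - mu ** 2

print(mu, e_x2, var_definition, var_shortcut)  # prints: 23 590 61 61
```

Both routes give 61. The shortcut is usually quicker by hand because it avoids subtracting $\mu$ from every value.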