4.12 Continuous Distributions in General (for HL)
1. Description of the Problem
Let $X$ be a random variable that takes values in the interval $[0, 4]$. Suppose also that the probability is not distributed uniformly over this interval; instead, $X$ is more likely to take values near 4.
In such a case, a continuous function $f(x)$ describes how the probability is distributed. Assume that the graph of this function is a straight line rising from $(0, 0)$ to $(4, 0.5)$, so that:
- (i) $f(x) \ge 0$
- (ii) the area of the triangle under the graph is 1
2. General Properties of Continuous Random Variables
For a continuous random variable we measure only the probability of an interval, not of a fixed value; we agree that the probability that $X$ takes on a particular value $a$ is 0, that is $P(X=a) = 0$.
In general, a continuous random variable $X$ with probability density function (or pdf) $f(x)$ satisfies:
- (i) $f(x) \ge 0$, i.e. the function is non-negative
- (ii) $\displaystyle\int_{-\infty}^{+\infty} f(x) dx = 1$, i.e. the total area under the curve is 1
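The probability that $X$ takes values between $a$ and $b$ is evaluated by integration:
$P(a \le X \le b) = \displaystyle\int_{a}^{b} f(x) dx$
For our introductory example, $f(x) = \dfrac{x}{8}$ for $0 \le x \le 4$ (and $f(x) = 0$ elsewhere), and condition (ii) reads:
$\displaystyle\int_{0}^{4} \dfrac{x}{8} dx = \left[ \dfrac{x^2}{16} \right]_0^4 = 1$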
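The two pdf conditions and an interval probability can also be checked numerically. The following is a minimal sketch in plain Python, assuming composite Simpson's rule as the integration method; the helper names `simpson` and `pdf` and the interval $[1, 3]$ are chosen here for illustration only and do not come from the text.

```python
def simpson(f, a, b, n=200):
    """Approximate the integral of f over [a, b] with composite Simpson's rule.

    n is the (even) number of subintervals.
    """
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def pdf(x):
    """pdf of the introductory example: x/8 on [0, 4], zero elsewhere."""
    return x / 8 if 0 <= x <= 4 else 0.0


# Condition (i): f(x) >= 0, checked on a grid of sample points in [0, 4].
is_non_negative = all(pdf(4 * k / 100) >= 0 for k in range(101))

# Condition (ii): the total area under the curve should be 1.
total_area = simpson(pdf, 0, 4)

# P(1 <= X <= 3); the endpoints 1 and 3 are picked only for illustration.
prob_1_to_3 = simpson(pdf, 1, 3)

print(is_non_negative, total_area, prob_1_to_3)
```

By hand, the same probability is $\left[ \dfrac{x^2}{16} \right]_1^3 = \dfrac{9}{16} - \dfrac{1}{16} = 0.5$, so the numerical value should agree with 0.5 up to rounding.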
3. Expected Value, Variance, Mode, Median, and Quartiles
Comparison of Definitions
| Concept | $X$ DISCRETE | $X$ CONTINUOUS |
|---|---|---|
| Mean $\mu = E(X)$ | $\displaystyle\sum x_i p_i$ | $\displaystyle\int_{-\infty}^{+\infty} x f(x) dx$ |
| $E(X^2)$ | $\displaystyle\sum x_i^2 p_i$ | $\displaystyle\int_{-\infty}^{+\infty} x^2 f(x) dx$ |
| Variance $Var(X)$ | $E(X^2) - \mu^2$ | $E(X^2) - \mu^2$ |
Applying these definitions to our running example $f(x) = \dfrac{x}{8}$ ($0 \le x \le 4$):
THE EXPECTED VALUE $\mu = E(X)$
$\mu = E(X) = \displaystyle\int_{-\infty}^{+\infty} x f(x) dx = \int_{0}^{4} x \left(\dfrac{x}{8}\right) dx = \int_{0}^{4} \dfrac{x^2}{8} dx = \left[ \dfrac{x^3}{24} \right]_0^4 = \mathbf{\dfrac{8}{3}} \approx 2.67$
THE VARIANCE $Var(X)$
First compute $E(X^2) = \displaystyle\int_{-\infty}^{+\infty} x^2 f(x) dx = \int_{0}^{4} x^2 \left(\dfrac{x}{8}\right) dx = \int_{0}^{4} \dfrac{x^3}{8} dx = \left[ \dfrac{x^4}{32} \right]_0^4 = \mathbf{8}$
Then evaluate $Var(X) = E(X^2) - \mu^2 = 8 - \left(\dfrac{8}{3}\right)^2 = 8 - \dfrac{64}{9} = \mathbf{\dfrac{8}{9}}$.
Note: The defining formula $Var(X) = \int_{-\infty}^{+\infty} (x-\mu)^2 f(x) dx = \int_{0}^{4} \left(x-\dfrac{8}{3}\right)^2 \dfrac{x}{8} dx$ gives exactly the same result but is more laborious to evaluate directly.
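As a rough numerical cross-check of the mean and of both variance formulas for $f(x) = \dfrac{x}{8}$, one possible sketch is given below. It assumes composite Simpson's rule for the integrals; the names `simpson`, `pdf`, `var_shortcut` and `var_direct` are illustrative and not part of the text.

```python
def simpson(f, a, b, n=200):
    """Approximate the integral of f over [a, b] with composite Simpson's rule."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def pdf(x):
    """pdf of the running example on [0, 4]."""
    return x / 8


# E(X) and E(X^2) as integrals of x*f(x) and x^2*f(x) over [0, 4].
mean = simpson(lambda x: x * pdf(x), 0, 4)
second_moment = simpson(lambda x: x**2 * pdf(x), 0, 4)

# Shortcut formula: Var(X) = E(X^2) - mu^2.
var_shortcut = second_moment - mean**2

# Defining formula: Var(X) = integral of (x - mu)^2 * f(x).
var_direct = simpson(lambda x: (x - mean) ** 2 * pdf(x), 0, 4)

print(mean, second_moment, var_shortcut, var_direct)
```

Both variance values should agree with $\dfrac{8}{9} \approx 0.889$ up to rounding, which illustrates that the shortcut and the defining formula are equivalent.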
MODE
It is the value of $x$ at which $f(x)$ attains its maximum. From the triangular graph, the maximum occurs at the right endpoint of the interval.
MODE = 4
MEDIAN
It is the value $m$ that splits the area under the pdf into two equal halves: $P(X \le m) = 0.5$. In practice, we find $m$ by solving $\int_{-\infty}^{m} f(x) dx = 0.5$.
$\displaystyle\int_{0}^{m} \dfrac{x}{8} dx = 0.5 \iff \left[ \dfrac{x^2}{16} \right]_0^m = 0.5 \iff \dfrac{m^2}{16} = 0.5 \iff m^2 = 8 \iff \mathbf{m = \sqrt{8}} = 2\sqrt{2} \approx 2.83$ (taking the positive root, since $0 \le m \le 4$)
QUARTILES
The lower quartile $Q_1$ and the upper quartile $Q_3$ are the values below which 25% and 75% of the total area lie, respectively.
$P(X \le Q_1) = 0.25 \implies \displaystyle\int_{-\infty}^{Q_1} f(x) dx = 0.25 \implies \left[ \dfrac{x^2}{16} \right]_0^{Q_1} = 0.25 \implies \dfrac{Q_1^2}{16} = \dfrac{1}{4} \implies Q_1^2 = 4 \implies \mathbf{Q_1 = 2}$
$P(X \le Q_3) = 0.75 \implies \displaystyle\int_{-\infty}^{Q_3} f(x) dx = 0.75 \implies \left[ \dfrac{x^2}{16} \right]_0^{Q_3} = 0.75 \implies \dfrac{Q_3^2}{16} = \dfrac{3}{4} \implies Q_3^2 = 12 \implies \mathbf{Q_3 = 2\sqrt{3}}$
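The median and the quartiles are all solutions of $P(X \le q) = p$ for different values of $p$, so they can be found with one routine. The sketch below assumes bisection as the root-finding method and uses the cumulative area $\dfrac{x^2}{16}$ obtained above; the function names `cdf` and `quantile` are hypothetical helpers introduced for illustration.

```python
import math


def cdf(x):
    """P(X <= x) for the pdf x/8 on [0, 4], i.e. x^2/16 on that interval."""
    if x <= 0:
        return 0.0
    if x >= 4:
        return 1.0
    return x * x / 16


def quantile(p, lo=0.0, hi=4.0, tol=1e-12):
    """Solve cdf(q) = p by bisection; cdf is increasing on [lo, hi]."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < p:
            lo = mid   # too little area to the left: move right
        else:
            hi = mid   # enough area to the left: move left
    return (lo + hi) / 2


q1 = quantile(0.25)
median = quantile(0.5)
q3 = quantile(0.75)

print(q1, median, q3)
print(2.0, math.sqrt(8), 2 * math.sqrt(3))  # exact values for comparison
```

Bisection is used here only because it mirrors the idea of moving a boundary until the required area is reached; for this pdf the equation $\dfrac{q^2}{16} = p$ can of course be solved exactly as $q = 4\sqrt{p}$.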
EXAMPLE 1: A Piecewise-Defined pdf
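Let $X$ be a continuous random variable in $[0, 4]$ with pdf defined as:
$f(x) = \begin{cases} \dfrac{x}{4}, & 0 \le x \le 2 \\ 1 - \dfrac{x}{4}, & 2 \le x \le 4 \end{cases}$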
Confirming that $f(x)$ is a valid pdf:
$\displaystyle\int_{-\infty}^{+\infty} f(x) dx = \int_{0}^{2} \dfrac{x}{4} dx + \int_{2}^{4} \left(1 - \dfrac{x}{4}\right) dx = \dots = \dfrac{1}{2} + \dfrac{1}{2} = \mathbf{1}$.
[In fact, it is much easier to find the area directly from the geometry of the triangular graph: Area = $\dfrac{1}{2} \times 4 \times 0.5 = 1$.]
The Expected Value:
$\mu = E(X) = \displaystyle\int_{-\infty}^{+\infty} x f(x) dx = \int_{0}^{2} \dfrac{x^2}{4} dx + \int_{2}^{4} \left(x - \dfrac{x^2}{4}\right) dx = \dots = \mathbf{2}$.
[Again, $\mu = 2$ follows at once from the symmetry of the graph about $x = 2$.]
The Variance:
We first find $E(X^2)$:
$E(X^2) = \displaystyle\int_{-\infty}^{+\infty} x^2 f(x) dx = \int_{0}^{2} \dfrac{x^3}{4} dx + \int_{2}^{4} \left(x^2 - \dfrac{x^3}{4}\right) dx = \dots = 1 + \dfrac{11}{3} = \mathbf{\dfrac{14}{3}}$.
Then the variance is: $Var(X) = \dfrac{14}{3} - 2^2 = \dfrac{14}{3} - \dfrac{12}{3} = \mathbf{\dfrac{2}{3}}$.
The Median:
Since $\int_{0}^{2} \dfrac{x}{4} dx = 0.5$, exactly half of the area lies to the left of $x = 2$, so the median is $m = 2$.
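The results of Example 1 can be checked numerically by integrating each piece of the pdf separately, just as in the hand calculation. This is a sketch under the assumption that the two pieces are $\dfrac{x}{4}$ on $[0, 2]$ and $1 - \dfrac{x}{4}$ on $[2, 4]$, as the integrals above indicate; `simpson`, `f1`, `f2` and `integrate_against_pdf` are illustrative names.

```python
def simpson(f, a, b, n=200):
    """Approximate the integral of f over [a, b] with composite Simpson's rule."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def f1(x):
    """Left piece of the pdf, valid on [0, 2]."""
    return x / 4


def f2(x):
    """Right piece of the pdf, valid on [2, 4]."""
    return 1 - x / 4


def integrate_against_pdf(g):
    """Integral of g(x) * f(x) over [0, 4], split at the breakpoint x = 2."""
    left = simpson(lambda x: g(x) * f1(x), 0, 2)
    right = simpson(lambda x: g(x) * f2(x), 2, 4)
    return left + right


total_area = integrate_against_pdf(lambda x: 1.0)   # should be 1
mean = integrate_against_pdf(lambda x: x)           # E(X)
second_moment = integrate_against_pdf(lambda x: x**2)  # E(X^2)
variance = second_moment - mean**2
left_area = simpson(f1, 0, 2)                       # area to the left of x = 2

print(total_area, mean, second_moment, variance, left_area)
```

Splitting at $x = 2$ matters: each piece is a polynomial, so integrating the pieces separately avoids integrating across the corner of the graph.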
In general, the median of a pdf defined in two pieces is found as follows. Let $f(x) = \begin{cases} f_1(x), & a \le x \le b \\ f_2(x), & b \le x \le c \end{cases}$
We first compute the area under the first piece: $\displaystyle\int_{a}^{b} f_1(x) dx = A$.
If $\mathbf{A > 0.5}$, the median $m$ lies between $a$ and $b$, so we find it by solving: $\displaystyle\int_{a}^{m} f_1(x) dx = 0.5$.
If $\mathbf{A < 0.5}$, the median $m$ lies between $b$ and $c$, so we find it by requiring that the area to its right is 0.5: $\displaystyle\int_{m}^{c} f_2(x) dx = 0.5$.
(If $A = 0.5$ exactly, as in Example 1, the median is $m = b$.)
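The case analysis above can be sketched as a small routine. This is only one possible implementation, assuming Simpson's rule for the areas and bisection for locating $m$; the function `median_piecewise` and its parameters are hypothetical names, and the test pdfs are the two examples from this section.

```python
def simpson(f, a, b, n=200):
    """Approximate the integral of f over [a, b] with composite Simpson's rule."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def median_piecewise(f1, f2, a, b, c, tol=1e-10):
    """Median of a pdf equal to f1 on [a, b] and f2 on [b, c]."""
    area_left = simpson(f1, a, b)  # this is A in the text
    if area_left >= 0.5:
        # Median in [a, b]: solve integral_a^m f1 = 0.5.
        lo, hi = a, b

        def shortfall(m):
            return simpson(f1, a, m) - 0.5  # increasing in m
    else:
        # Median in [b, c]: solve integral_m^c f2 = 0.5.
        lo, hi = b, c

        def shortfall(m):
            return 0.5 - simpson(f2, m, c)  # increasing in m

    # Bisection on an increasing function that changes sign in [lo, hi].
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if shortfall(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Example 1: A = 0.5, so the median should come out at b = 2.
m_tent = median_piecewise(lambda x: x / 4, lambda x: 1 - x / 4, 0, 2, 4)

# Running example x/8, artificially split at b = 2: A = 0.25 < 0.5.
m_right = median_piecewise(lambda x: x / 8, lambda x: x / 8, 0, 2, 4)

# Same pdf split at b = 3: A = 9/16 > 0.5, so the first branch is used.
m_left = median_piecewise(lambda x: x / 8, lambda x: x / 8, 0, 3, 4)

print(m_tent, m_right, m_left)
```

The two splits of $f(x) = \dfrac{x}{8}$ exercise both branches and should give the same median, $\sqrt{8} \approx 2.83$, since the breakpoint $b$ does not change the distribution.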