4.12 Continuous Distributions in General (for HL)

1. Description of the Problem

Let $X$ be a random variable which takes values in the interval $[0, 4]$. Suppose also that the probability is not "uniformly" distributed throughout this interval: it is more likely that $X$ takes values near 4.

In such a case we have a continuous function which describes the behavior of the probability. Assume that this function is:

$f(x) = \dfrac{x}{8}, \quad 0 \le x \le 4$
[Figure: graph of $y = f(x)$ for $0 \le x \le 4$, a straight line rising from $(0, 0)$ to $(4, 0.5)$.]
That is, the probability density increases as we move towards 4. The choice of function is not accidental! Notice that:
  • (i) $f(x) \ge 0$
  • (ii) the area of the triangle under the graph is 1
The probability that $X$ is between 0 and 2 is given by the corresponding area under the curve, that is 0.25 (25% of the total area). We write:
$P(0 \le X \le 2) = 0.25$
Similarly, $P(2 \le X \le 4) = 0.75$.

2. General Properties of Continuous Random Variables

For a continuous random variable we measure only the probability of an interval, not of a fixed value; we agree that the probability that $X$ takes on a particular value $a$ is 0, that is $P(X=a) = 0$.

In general, for a continuous random variable $X$ with probability density function (or pdf) $f(x)$, the following hold:

  • (i) $f(x) \ge 0$, i.e. the function is non-negative
  • (ii) $\displaystyle\int_{-\infty}^{+\infty} f(x) dx = 1$, i.e. the total area under the curve is 1

The probability that $X$ takes values between $a$ and $b$ is evaluated by integration:

$P(a \le X \le b) = \int_{a}^{b} f(x) dx$
Notice: $P(a \le X \le b)$ and $P(a < X < b)$ are exactly the same, as the probability that $X$ takes a particular value, say $P(X=a)$, is zero!

For our introductory example, we have $f(x) = \dfrac{x}{8}$ for $0 \le x \le 4$, and:

$\displaystyle\int_{-\infty}^{+\infty} f(x) dx = \int_{0}^{4} \dfrac{x}{8} dx = \left[ \dfrac{x^2}{16} \right]_0^4 = 1$
$P(0 \le X \le 2) = \displaystyle\int_{0}^{2} \dfrac{x}{8} dx = \left[ \dfrac{x^2}{16} \right]_0^2 = \dfrac{4}{16} = \mathbf{0.25}$
$P(2 \le X \le 4) = 1 - 0.25 = \mathbf{0.75}$
Notice also that for particular values of $X$, the probability is exactly 0. For example: $P(X=2) = 0$, $P(X=3.7) = 0$.
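The three results above can also be checked numerically. The following is only a minimal sketch, assuming a simple midpoint rule is accurate enough for this pdf; the helper names `f` and `integrate` are illustrative and not part of the original notes.

```python
def f(x):
    """pdf of the running example: f(x) = x/8 on [0, 4], and 0 elsewhere."""
    return x / 8 if 0 <= x <= 4 else 0.0


def integrate(g, a, b, n=10_000):
    """Approximate the integral of g over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


total_area = integrate(f, 0, 4)   # property (ii): should be close to 1
p_0_to_2 = integrate(f, 0, 2)     # P(0 <= X <= 2): should be close to 0.25
p_2_to_4 = integrate(f, 2, 4)     # P(2 <= X <= 4): should be close to 0.75

print(total_area, p_0_to_2, p_2_to_4)
```

A single point contributes no area, which is why `integrate(f, 2, 2)` returns 0, in agreement with $P(X=2) = 0$.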

3. Expected Value, Variance, Mode, Median, and Quartiles

Comparison of Definitions

| Concept | $X$ DISCRETE | $X$ CONTINUOUS |
|---|---|---|
| Mean $\mu = E(X)$ | $\displaystyle\sum x_i p_i$ | $\displaystyle\int_{-\infty}^{+\infty} x f(x) dx$ |
| $E(X^2)$ | $\displaystyle\sum x_i^2 p_i$ | $\displaystyle\int_{-\infty}^{+\infty} x^2 f(x) dx$ |
| Variance $Var(X)$ | $E(X^2) - \mu^2$ | $E(X^2) - \mu^2$ |

Applying these definitions to our running example $f(x) = \dfrac{x}{8}$ ($0 \le x \le 4$):

  • THE EXPECTED VALUE $\mu = E(X)$
    $\mu = E(X) = \displaystyle\int_{-\infty}^{+\infty} x f(x) dx = \int_{0}^{4} x \left(\dfrac{x}{8}\right) dx = \int_{0}^{4} \dfrac{x^2}{8} dx = \left[ \dfrac{x^3}{24} \right]_0^4 = \mathbf{\dfrac{8}{3}} \approx 2.67$
  • THE VARIANCE $Var(X)$
    First compute $E(X^2) = \displaystyle\int_{-\infty}^{+\infty} x^2 f(x) dx = \int_{0}^{4} x^2 \left(\dfrac{x}{8}\right) dx = \int_{0}^{4} \dfrac{x^3}{8} dx = \left[ \dfrac{x^4}{32} \right]_0^4 = \mathbf{8}$
    Then evaluate $Var(X) = E(X^2) - \mu^2 = 8 - \left(\dfrac{8}{3}\right)^2 = 8 - \dfrac{64}{9} = \mathbf{\dfrac{8}{9}}$.
    Note: The defining formula $Var(X) = \int_{-\infty}^{+\infty} (x-\mu)^2 f(x) dx = \int_{0}^{4} \left(x-\dfrac{8}{3}\right)^2 \dfrac{x}{8} dx$ gives exactly the same result but takes much longer to calculate directly.
  • MODE
    It is the value of $x$ where $f(x)$ attains its maximum. Since $f(x)$ is increasing on $[0, 4]$, the maximum occurs at the right endpoint.
    MODE = 4
  • MEDIAN
    It is the value $m$ that splits the total area into two equal halves: $P(X \le m) = 0.5$. In practice, we find $m$ by solving $\int_{-\infty}^{m} f(x) dx = 0.5$.
    $\displaystyle\int_{0}^{m} \dfrac{x}{8} dx = 0.5 \iff \left[ \dfrac{x^2}{16} \right]_0^m = 0.5 \iff \dfrac{m^2}{16} = 0.5 \iff m^2 = 8 \iff \mathbf{m = \sqrt{8}} \approx 2.83$ (we keep only the positive root, since $0 \le m \le 4$)
  • QUARTILES
    The lower quartile $Q_1$ and upper quartile $Q_3$ are the values of $x$ with 25% and 75% of the total area to their left, respectively.
    $P(X \le Q_1) = 0.25 \implies \displaystyle\int_{-\infty}^{Q_1} f(x) dx = 0.25 \implies \left[ \dfrac{x^2}{16} \right]_0^{Q_1} = 0.25 \implies \dfrac{Q_1^2}{16} = \dfrac{1}{4} \implies Q_1^2 = 4 \implies \mathbf{Q_1 = 2}$
    $P(X \le Q_3) = 0.75 \implies \displaystyle\int_{-\infty}^{Q_3} f(x) dx = 0.75 \implies \left[ \dfrac{x^2}{16} \right]_0^{Q_3} = 0.75 \implies \dfrac{Q_3^2}{16} = \dfrac{3}{4} \implies Q_3^2 = 12 \implies \mathbf{Q_3 = 2\sqrt{3}}$
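The values found in this section can be cross-checked numerically. This is a rough sketch rather than a definitive method: it assumes a midpoint rule for the integrals and bisection on the cumulative area for the median and quartiles, and the names `f`, `integrate` and `quantile` are illustrative only.

```python
import math


def f(x):
    """pdf of the running example: f(x) = x/8 on [0, 4], and 0 elsewhere."""
    return x / 8 if 0 <= x <= 4 else 0.0


def integrate(g, a, b, n=20_000):
    """Approximate the integral of g over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


def quantile(p, lo=0.0, hi=4.0, tol=1e-9):
    """Solve P(X <= q) = p for q by bisection on the cumulative area."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if integrate(f, 0, mid, n=2_000) < p:
            lo = mid      # not enough area yet, so q lies to the right
        else:
            hi = mid      # enough area, so q lies to the left (or at mid)
    return (lo + hi) / 2


mean = integrate(lambda x: x * f(x), 0, 4)                    # compare with 8/3
e_x2 = integrate(lambda x: x**2 * f(x), 0, 4)                 # compare with 8
var_shortcut = e_x2 - mean**2                                 # compare with 8/9
var_direct = integrate(lambda x: (x - mean)**2 * f(x), 0, 4)  # defining formula

median = quantile(0.5)    # compare with sqrt(8)
q1 = quantile(0.25)       # compare with 2
q3 = quantile(0.75)       # compare with 2*sqrt(3)

print(mean, var_shortcut, var_direct, median, q1, q3)
```

Computing the variance both ways illustrates the note above: the shortcut $E(X^2) - \mu^2$ and the defining integral agree up to the error of the numerical integration.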

EXAMPLE 1: A Piecewise-Defined pdf

Let $X$ be a continuous random variable in $[0, 4]$ with pdf defined as:

$f(x) = \begin{cases} \dfrac{x}{4}, & 0 \le x \le 2 \\ 1 - \dfrac{x}{4}, & 2 \le x \le 4 \end{cases}$
[Figure: graph of $y = f(x)$ for $0 \le x \le 4$, a triangle with vertices $(0, 0)$, $(2, 0.5)$ and $(4, 0)$.]
  • Confirming that $f(x)$ is a valid pdf:
    $\displaystyle\int_{-\infty}^{+\infty} f(x) dx = \int_{0}^{2} \dfrac{x}{4} dx + \int_{2}^{4} \left(1 - \dfrac{x}{4}\right) dx = \dots = \dfrac{1}{2} + \dfrac{1}{2} = \mathbf{1}$.
    [In fact, it would be much easier to find the area directly from the geometry of the triangular graph! Area = $\dfrac{1}{2} \times 4 \times 0.5 = 1$.]
  • The Expected Value:
    $\mu = E(X) = \displaystyle\int_{-\infty}^{+\infty} x f(x) dx = \int_{0}^{2} \dfrac{x^2}{4} dx + \int_{2}^{4} \left(x - \dfrac{x^2}{4}\right) dx = \dots = \mathbf{2}$.
    [Again, it is clear from the symmetry of the graph about $x = 2$ that $\mu=2$.]
  • The Variance:
    We find $E(X^2)$ first:
    $E(X^2) = \displaystyle\int_{-\infty}^{+\infty} x^2 f(x) dx = \int_{0}^{2} \dfrac{x^3}{4} dx + \int_{2}^{4} \left(x^2 - \dfrac{x^3}{4}\right) dx = \dots = 1 + \dfrac{11}{3} = \mathbf{\dfrac{14}{3}}$.
    Then the variance is: $Var(X) = E(X^2) - \mu^2 = \dfrac{14}{3} - 2^2 = \dfrac{14}{3} - \dfrac{12}{3} = \mathbf{\dfrac{2}{3}}$.
  • The Median:
    Since $\int_{0}^{2} \dfrac{x}{4} dx = 0.5$, exactly half of the area lies to the left of $x = 2$, so the median is $m = 2$.
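For reference, the intermediate steps abbreviated by the dots in this example can be written out as follows. This is one possible working, using the standard antiderivatives of each piece.

$\displaystyle\int_{0}^{2} \dfrac{x}{4} dx = \left[ \dfrac{x^2}{8} \right]_0^2 = \dfrac{1}{2}, \qquad \int_{2}^{4} \left(1 - \dfrac{x}{4}\right) dx = \left[ x - \dfrac{x^2}{8} \right]_2^4 = 2 - \dfrac{3}{2} = \dfrac{1}{2}$
$\displaystyle\int_{0}^{2} \dfrac{x^2}{4} dx = \left[ \dfrac{x^3}{12} \right]_0^2 = \dfrac{2}{3}, \qquad \int_{2}^{4} \left(x - \dfrac{x^2}{4}\right) dx = \left[ \dfrac{x^2}{2} - \dfrac{x^3}{12} \right]_2^4 = \dfrac{8}{3} - \dfrac{4}{3} = \dfrac{4}{3}, \qquad \text{so } E(X) = \dfrac{2}{3} + \dfrac{4}{3} = 2$
$\displaystyle\int_{0}^{2} \dfrac{x^3}{4} dx = \left[ \dfrac{x^4}{16} \right]_0^2 = 1, \qquad \int_{2}^{4} \left(x^2 - \dfrac{x^3}{4}\right) dx = \left[ \dfrac{x^3}{3} - \dfrac{x^4}{16} \right]_2^4 = \dfrac{16}{3} - \dfrac{5}{3} = \dfrac{11}{3}, \qquad \text{so } E(X^2) = 1 + \dfrac{11}{3} = \dfrac{14}{3}$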
Notice: Analytical approach for the median of a piecewise pdf:
Let $f(x) = \begin{cases} f_1(x), & a \le x \le b \\ f_2(x), & b \le x \le c \end{cases}$

We first compute $\displaystyle\int_{a}^{b} f_1(x) dx = A$.
If $\mathbf{A > 0.5}$, the median $m$ lies between $a$ and $b$, so we find $m$ by solving: $\displaystyle\int_{a}^{m} f_1(x) dx = 0.5$.
If $\mathbf{A < 0.5}$, the median $m$ lies between $b$ and $c$, so we find $m$ by requiring that the area to its right is 0.5: $\displaystyle\int_{m}^{c} f_2(x) dx = 0.5$.
If $\mathbf{A = 0.5}$, the median is $m = b$, as in Example 1.
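The case analysis above can be sketched in code. This is only an illustration under stated assumptions: the integrals are approximated with a midpoint rule, the equations are solved by bisection rather than analytically, and the boundary case $A = 0.5$ is handled by returning $b$. The names `integrate` and `piecewise_median`, and the second pdf used below, are hypothetical and not taken from the notes.

```python
def integrate(g, a, b, n=2_000):
    """Approximate the integral of g over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


def piecewise_median(f1, f2, a, b, c, tol=1e-9):
    """Median of a pdf equal to f1 on [a, b] and f2 on [b, c]."""
    A = integrate(f1, a, b)            # area under the first piece
    if abs(A - 0.5) < 1e-9:
        return b                       # half of the area lies left of b
    if A > 0.5:
        # Median in [a, b]: solve integral from a to m of f1 = 0.5.
        lo, hi = a, b
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if integrate(f1, a, mid) < 0.5:
                lo = mid               # too little area on the left
            else:
                hi = mid
    else:
        # Median in [b, c]: solve integral from m to c of f2 = 0.5.
        lo, hi = b, c
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if integrate(f2, mid, c) > 0.5:
                lo = mid               # too much area on the right
            else:
                hi = mid
    return (lo + hi) / 2


# Example 1 from the text: A = 0.5, so the median is b = 2.
m_example1 = piecewise_median(lambda x: x / 4, lambda x: 1 - x / 4, 0, 2, 4)

# Hypothetical pdf (an assumption for illustration): x/2 on [0, 1], 1/2 on [1, 2.5].
# Here A = 0.25 < 0.5, and solving (2.5 - m)/2 = 0.5 by hand gives m = 1.5.
m_hypothetical = piecewise_median(lambda x: x / 2, lambda x: 0.5, 0, 1, 2.5)

print(m_example1, m_hypothetical)
```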