Random Variables
Random variables
Definition 1. A random variable is a function $X: \Omega \to \mathbb{R}$ with the property that $\{ \omega\in \Omega \;\vert\; X(\omega) \leq x\} \in \mathcal{F}$ for each $x \in \mathbb{R}$. Such a function is said to be $\mathcal{F}$-measurable.
Example 1. Tossing two dice. We define $$ X = \begin{cases} 1 & \text{get double ($i=j$)} \\ 0 & \text{not double $(i\neq j)$} \end{cases}. $$ Here $\Omega = \{ (i,j) \;\vert\; 1 \leq i, j \leq 6 \}$, and $X(\omega) \in \{0, 1\} \subset \mathbb{R}$ for each outcome $\omega = (i, j)$.
Remark. We are interested in $g(x) = \mathbf{P}\{ \omega \;\vert\; X(\omega) = x \}$, but this does not always work well. Recall the probability triple $\{ \Omega, \mathcal{F}, \mathbf{P} \}$. If $\Omega$ is countable, then $g(x)$ is informative. But if $\Omega$ is uncountable, e.g., an interval, then $\mathbf{P}(X=x)$ may equal zero for every single $x$, so $g$ carries no useful information.
Definition 2. The distribution function of a random variable $X$ is the function $F: \mathbb{R} \to [0, 1]$ given by $F(x) = \mathbf{P}(X \leq x)$.
Example 2. The distribution function of the preceding example is $$ F(x) = \begin{cases} 0 & x < 0, \\ \frac{30}{36} & 0 \leq x < 1, \\ 1 & x \geq 1. \end{cases} $$
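As a quick check (a minimal Python sketch, not part of the original notes), we can recover these values of $F$ by enumerating the 36 equally likely outcomes:

```python
from fractions import Fraction

# Sample space: ordered pairs (i, j) from two fair dice.
omega = [(i, j) for i in range(1, 7) for j in range(1, 7)]

def X(outcome):
    """Indicator of a double, as in Example 1."""
    i, j = outcome
    return 1 if i == j else 0

def F(x):
    """Distribution function F(x) = P(X <= x), by direct enumeration."""
    return Fraction(sum(1 for w in omega if X(w) <= x), len(omega))

print(F(-0.5), F(0), F(0.5), F(1))   # 0, 5/6 (= 30/36), 5/6, 1
```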
Notice that $\{ \omega \;\vert\; X(\omega) \leq x \}$ defines an event: it is an element of the corresponding $\sigma$-field. Denote $A(x) = \{ \omega \;\vert\; X(\omega) \leq x \}$. Along with $A(x)$, we can define \(A^c(x) = \{ \omega \;\vert\; X(\omega) > x \}, \quad A(x, y) = A^c(x) \cap A(y) = \{ \omega \;\vert\; x < X(\omega) \leq y \}.\) Two points are worth noting:
- $F$ must be defined for all $x \in \mathbb{R}$.
- $A(x)$ must belong to $\mathcal{F}$. Otherwise, we cannot speak of the probability $\mathbf{P}(A(x))$, and the definition of the distribution function is meaningless.
Lemma 1. A distribution function $F$ has the following properties:
- $\lim_{x \to -\infty}F(x) = 0$, $\lim_{x\to\infty} F(x) = 1$,
- if $x < y$, then $F(x) \leq F(y)$,
- $F$ is right-continuous, that is, $F(x+h) \to F(x)$ as $h \downarrow 0$. (Left-continuity need not hold.)
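These properties can be seen concretely on the step function of Example 2. The following sketch (our own illustration, not part of the original notes) checks monotonicity and right-continuity at the jump point $x = 1$:

```python
from fractions import Fraction

def F(x):
    """The step CDF from Example 2."""
    if x < 0:
        return Fraction(0)
    if x < 1:
        return Fraction(30, 36)
    return Fraction(1)

# Monotone: F(a) <= F(b) whenever a < b.
xs = [-2, -1, 0, 0.5, 1, 2]
assert all(F(a) <= F(b) for a, b in zip(xs, xs[1:]))

# Right-continuous at the jump x = 1: F(1 + h) -> F(1) = 1 as h -> 0+,
# while the left limit F(1 - h) stays at 5/6 != F(1).
for h in [0.1, 0.01, 0.001]:
    print(h, F(1 + h), F(1 - h))
```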
Example 3. Indicator functions. A particular class of Bernoulli variables is very useful in probability theory. Let $A$ be an event and let $I_A : \Omega \to \mathbb{R}$ be the indicator function of $A$; that is, $$ I_A(\omega) = \begin{cases} 1 & \text{if $\omega \in A$}, \\ 0 & \text{if $\omega \in A^c$}. \end{cases} $$ Then $I_A$ is a Bernoulli random variable taking the values 1 and 0 with probabilities $\mathbf{P}(A)$ and $\mathbf{P}(A^c)$ respectively. Suppose $\{B_i \;\vert\; i \in I\}$ is a family of disjoint events with $A \subseteq \bigcup_{i\in I} B_i$. Then $$ I_A = \sum_{i} I_{A\cap B_i}, $$ an identity which is often useful.
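A small sketch of this identity on a toy sample space (the sets $A$ and $B_i$ below are arbitrary choices for illustration):

```python
# Check I_A = sum_i I_{A ∩ B_i} for disjoint events B_i covering A.
omega = range(10)
A = {1, 3, 5, 7}
B = [{0, 1, 2}, {3, 4, 5}, {6, 7, 8, 9}]   # disjoint, with A ⊆ ∪ B_i

def indicator(S):
    return lambda w: 1 if w in S else 0

I_A = indicator(A)
pieces = [indicator(A & Bi) for Bi in B]
assert all(I_A(w) == sum(p(w) for p in pieces) for w in omega)
print("identity holds at every ω")
```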
Lemma 2. Let $F$ be the distribution function of $X$. Then
- $\mathbf{P}(X>x) = 1-F(x)$,
- $\mathbf{P}(x < X \leq y) = F(y) - F(x)$,
- $\mathbf{P}(X = x) = F(x) - \lim_{y \uparrow x} F(y)$.
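A numerical check of the three identities for the two-dice variable of Example 1 (again our own illustration):

```python
from fractions import Fraction

omega = [(i, j) for i in range(1, 7) for j in range(1, 7)]
X = lambda w: 1 if w[0] == w[1] else 0

def P(event):
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

def F(x):
    return P(lambda w: X(w) <= x)

assert P(lambda w: X(w) > 0) == 1 - F(0)
assert P(lambda w: 0 < X(w) <= 1) == F(1) - F(0)
# The left limit lim_{y↑1} F(y) equals F(y) for any y in [0, 1),
# since the only other jump of F is at 0.
assert P(lambda w: X(w) == 1) == F(1) - F(Fraction(999, 1000))
```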
A random variable $X$ with distribution function $F$ is said to have two “tails” given by \(T_1 (x) = \mathbf{P}(X > x) = 1 - F(x), \quad T_2(x) = \mathbf{P}(X \leq -x) = F(-x),\) where $x$ is large and positive. The rates at which the $T_i$ decay to zero as $x\to\infty$ have a substantial effect on the existence or non-existence of certain associated quantities called the “moments” of the distribution.
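To make the tail/moment connection concrete, here is a hypothetical illustration (our choice, not from the notes) using the Pareto-type density $f(x) = \frac{3}{2} x^{-5/2}$ on $[1, \infty)$: its tail decays like $x^{-3/2}$, slowly enough that the first moment exists but the second does not.

```python
# Pareto-type density f(x) = 1.5 * x**(-2.5) on [1, ∞); tail T1(x) = x**(-1.5).
def trunc_moment(k, M, step=0.01):
    """Riemann-sum approximation of the truncated moment ∫_1^M x^k f(x) dx."""
    return sum((1 + i * step) ** k * 1.5 * (1 + i * step) ** -2.5 * step
               for i in range(int((M - 1) / step)))

for M in [10, 100, 1000]:
    print(M, round(trunc_moment(1, M), 3), round(trunc_moment(2, M), 3))
# The first moment settles near 3 = E[X]; the second grows like 3*sqrt(M),
# so the slow tail decay destroys the second moment.
```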
Different random variables
Discrete random variables
Definition 3. The random variable $X$ is called discrete if it takes values only in some countable subset $\{ x_1, x_2, \dots \}$ of $\mathbb{R}$. The discrete random variable $X$ has (probability) mass function $f : \mathbb{R} \to [0, 1]$ given by $f(x) = \mathbf{P}(X = x)$.
We shall see that the distribution function of a discrete variable has jump discontinuities at the values $x_1 , x_2, \dots$ and is constant in between; such a distribution is called atomic.
Continuous random variables
Definition 4. The random variable $X$ is called continuous if its distribution function can be expressed as $$ F(x) = \int_{-\infty}^x f(u) du \qquad x \in \mathbb{R} $$ for some integrable function $f : \mathbb{R} \to [0, \infty)$ called the **(probability) density function** of $X$.
Some points worth noting (a numerical sketch follows this list):
- The density function $f$ is not unique. We can change $f$ at individual points (indeed, on any set of measure zero) without affecting the integral.
- $F$ must be **absolutely continuous**, which implies $F$ is continuous. It follows that the probability of any single point is zero, i.e., $\mathbf{P}(X=x) = 0$.
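A minimal sketch of Definition 4, assuming the uniform density on $[0, 1]$ as the example (our choice):

```python
# Uniform density on [0, 1]: f(u) = 1 there, 0 elsewhere.
def f(u):
    return 1.0 if 0.0 <= u <= 1.0 else 0.0

def F(x, step=1e-4):
    """F(x) = ∫_{-∞}^x f(u) du, approximated by a Riemann sum."""
    if x <= 0.0:
        return 0.0
    upper = min(x, 1.0)   # f vanishes outside [0, 1]
    return sum(f(i * step) * step for i in range(int(upper / step)))

print(F(0.25), F(0.5), F(2.0))    # ≈ 0.25, 0.5, 1.0
# F is continuous, so P(X = x) = F(x) - lim_{y↑x} F(y) = 0 at every x:
print(F(0.5) - F(0.5 - 1e-6))     # ≈ 0 (up to discretisation error)
```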
There is another sort of random variable, called “singular”, which is neither discrete nor continuous.
Random vectors
Definition
A random vector is a function $\mathbf{X} : \Omega \to \mathbb{R}^n$; for example, $\mathbf{X} = (X, Y)$ for $n=2$. We can also define the distribution function for such an $\mathbf{X}$, but we first need to introduce an ordering on $\mathbb{R}^n$.
Definition 5. By definition, $(x_1, y_1) \leq (x_2, y_2)$ if and only if $x_1 \leq x_2$ **AND** $y_1 \leq y_2$. (Note this is only a partial order on $\mathbb{R}^2$.)
Definition 6. The joint distribution function of a random vector $\mathbf{X} = (X_1, X_2, \dots, X_n)$ on the probability space $\{ \Omega, \mathcal{F}, \mathbf{P} \}$ is the function $F_{\mathbf{X}} : \mathbb{R}^n \to [0, 1]$ given by $F_{\mathbf{X}}(\mathbf{x}) = \mathbf{P}(\mathbf{X} \leq \mathbf{x})$ for $\mathbf{x} \in \mathbb{R}^n$.
Remark. The joint probability $\mathbf{P}(\mathbf{X} \leq \mathbf{x}) = \mathbf{P}(X_1 \leq x_1, \dots, X_n \leq x_n)$. Here $\{\mathbf{X} \leq \mathbf{x} \}$ is an abbreviation for the event $\{\omega \in \Omega \;\vert\; \mathbf{X}(\omega) \leq \mathbf{x}\}$.
Lemma 3. The joint distribution function $F_{X, Y}$ of the random vector $(X, Y)$ has the following properties:
- $\lim_{x, y\to -\infty} F_{X, Y}(x, y) = 0, \lim_{x,y\to\infty} F_{X, Y}(x,y) = 1$,
- if $(x_1, y_1) \leq (x_2, y_2)$, then $F_{X,Y}(x_1, y_1) \leq F_{X, Y}(x_2, y_2)$,
- $F_{X,Y}$ is continuous from above, in that $$ F_{X,Y}(x+u, y+v) \to F_{X,Y}(x,y) \quad \text{as} \quad u, v \downarrow 0. $$
Marginalization
\(\begin{align} \lim_{y\to\infty} F_{X,Y}(x, y) &= F_X(x) = \mathbf{P}(X \leq x), \\ \lim_{x\to\infty} F_{X,Y}(x, y) &= F_Y(y) = \mathbf{P}(Y \leq y). \end{align}\)
The functions $F_X$ and $F_Y$ are called the “marginal” distribution functions of $F_{X,Y}$. The joint distribution $F_{X,Y}$ determines the two marginals $F_X$ and $F_Y$, but the converse is NOT true: the marginals do not determine the joint distribution, as the sketch below illustrates.
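A standard illustration (our choice of example): two fair coins that are independent, versus two that always agree, have identical marginals but different joint distributions.

```python
from fractions import Fraction

h = Fraction(1, 2)

# Two joint mass functions for a pair of fair {0,1}-valued variables (X, Y):
independent = {(x, y): h * h for x in (0, 1) for y in (0, 1)}
always_agree = {(0, 0): h, (1, 1): h}   # X = Y with probability 1

def marginal_X(joint):
    """Marginal mass function of X, by summing the joint over y."""
    out = {0: Fraction(0), 1: Fraction(0)}
    for (x, _y), p in joint.items():
        out[x] += p
    return out

print(marginal_X(independent))    # {0: 1/2, 1: 1/2}
print(marginal_X(always_agree))   # {0: 1/2, 1: 1/2} -- same marginals,
                                  # yet the joint distributions differ.
```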
Discrete and continuous distribution
Definition 7. The random variables $X$ and $Y$ on the probability space $\{ \Omega, \mathcal{F}, \mathbf{P} \}$ are called **(jointly) discrete** if the vector $(X, Y)$ takes values in some **countable** subset of $\mathbb{R}^2$ only. The jointly discrete random variables $X, Y$ have **joint (probability) mass function** $f : \mathbb{R}^2 \to [0,1]$ given by $f(x, y) = \mathbf{P}(X = x, Y = y)$.
Definition 8. The random variables $X$ and $Y$ on the probability space $\{ \Omega, \mathcal{F}, \mathbf{P} \}$ are called **(jointly) continuous** if their joint distribution function can be expressed as $$ F_{X,Y}(x,y) = \int_{u = -\infty}^x \int_{v = -\infty}^y f(u,v)du dv \qquad x, y\in\mathbb{R}, $$ for some **integrable** function $f : \mathbb{R}^2 \to [0, \infty)$ called the **joint (probability) density function** of the pair $(X, Y)$.
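A sketch of Definition 8, assuming the joint density $f(u,v) = e^{-u-v}$ for $u, v \geq 0$ (our choice of example); the double Riemann sum should approach the exact value $(1 - e^{-x})(1 - e^{-y})$:

```python
import math

# Joint density f(u, v) = e^{-u-v} for u, v >= 0, zero elsewhere.
def f(u, v):
    return math.exp(-u - v) if u >= 0 and v >= 0 else 0.0

def F(x, y, step=0.01):
    """Joint CDF via a double Riemann sum over (-∞, x] × (-∞, y]."""
    return sum(f(i * step, j * step) * step * step
               for i in range(int(x / step))
               for j in range(int(y / step)))

exact = (1 - math.exp(-1)) * (1 - math.exp(-2))
print(F(1.0, 2.0), exact)   # the two values agree to about two decimals
```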