Events as sets

Probability studies repeatable (or idealized) experiments. The result of an experiment is called an outcome.

Definition 1. The set of all possible outcomes of an experiment is called the sample space and is denoted by $\Omega$.

Cardinality of sets

The cardinality of a set $\Omega$ refers to the number of elements in this set, and is denoted by $\mathbf{card}(\Omega)$ or $\vert \Omega \vert$.

  • For finite sets, $\mathbf{card}(\Omega)$ is a natural number.
  • For the integer set, $\mathbf{card}(\mathbb{Z}) = \aleph_0$, which is countable.
  • For the real number set, $\mathbf{card}(\mathbb{R}) = 2^{\aleph_0} > \aleph_0$, i.e. $\mathbb{R}$ is uncountable.

The power set of a set $\Omega$ is the set of all subsets, which is denoted by $2^\Omega$.

  • For $\mathbb{Z}$, $\mathbf{card}(2^{\mathbb{Z}}) = 2^{\aleph_0} = \mathbf{card}(\mathbb{R})$.
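
Example. For a finite set $\Omega$ with $\mathbf{card}(\Omega) = n$, the power set satisfies $\mathbf{card}(2^\Omega) = 2^n$; for instance, $\Omega = \{1, 2, 3\}$ has $2^3 = 8$ subsets, from $\varnothing$ up to $\Omega$ itself.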

In practice, the sets arising in probability are finite, countable, or the reals (and variants of these).

When we conduct an experiment, we want to know whether or not a given subset of outcomes occurs. For example, if we pick a number uniformly at random from an interval of $\mathbb{R}$, the probability of obtaining any single prescribed value is 0, because $\mathbb{R}$ is uncountable. Therefore, in probability we are interested in subsets $A \subset \Omega$ rather than individual outcomes, and we work with a whole collection of such subsets.

Events and fields

Definition 2. The events are subsets of the sample space $\Omega$.

Remark. $\varnothing$ is called the impossible event. The set $\Omega$ is called the certain event. Events $A$ and $B$ are called disjoint if their intersection is the empty set.

We do not need all the subsets of $\Omega$ to be events. It suffices to think of the collection of events as a subcollection $\mathcal{F}$ of the set of all subsets of $\Omega$.

Definition 3. Any collection $\mathcal{F}$ of subsets of $\Omega$ which satisfies the following three conditions is called a field:

  1. if $A, B \in \mathcal{F}$, then $A \cup B \in \mathcal{F}$ and $A \cap B \in \mathcal{F}$ (actually $A\cap B \in \mathcal{F}$ is redundant);
  2. if $A \in \mathcal{F}$, then $A^c \in \mathcal{F}$;
  3. the empty set $\varnothing$ belongs to $\mathcal{F}$.

Remark. Some implications of the properties of a field $\mathcal{F}$:

  • if $A_1, \dots, A_n \in \mathcal{F}$, then $\bigcup_{i=1}^n A_i \in \mathcal{F}$;
  • $\varnothing^c = \Omega \in \mathcal{F}$;
  • $A \cap B = (A^c \cup B^c)^c \in \mathcal{F}$.
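
Example. For a single toss of a coin, $\Omega = \{H, T\}$, and the collection $\mathcal{F} = \{\varnothing, \{H\}, \{T\}, \Omega\}$ satisfies conditions 1–3 above, so it is a field (here it coincides with the power set $2^\Omega$).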

What if we have an $\Omega$ such that $\mathbf{card}(\Omega) = \aleph_0$? In this case we may take $\mathcal{F} = 2^\Omega$, and we need the collection of events to be closed under countable (not merely finite) unions; this leads to the notion of a $\sigma$-field.

Definition 4. A collection $\mathcal{F}$ of subsets of $\Omega$ is called a $\sigma$-field if it satisfies the following conditions:

  1. $\varnothing \in \mathcal{F}$;
  2. if $A \in \mathcal{F}$, then $A^c \in \mathcal{F}$;
  3. if $A_1, A_2, \dots \in \mathcal{F}$, then $\bigcup_{i=1}^\infty A_i \in \mathcal{F}$.

Remark. $\sigma$-fields are closed under the operation of taking countable intersections.

Example. The smallest $\sigma$-field associated with $\Omega$ is the collection $\mathcal{F} = \{\varnothing, \Omega \}$.

If $A \subset \Omega$, then $\mathcal{F} = \{\varnothing, A, A^c, \Omega \}$ is a $\sigma$-field.

The power set of $\Omega$ is a $\sigma$-field.

With any experiment we may associate a pair $(\Omega, \mathcal{F})$, where $\Omega$ is the set of all possible outcomes or elementary events and $\mathcal{F}$ is a $\sigma$-field of subsets of $\Omega$ which contains \emph{all the events in whose occurrences we may be interested}; henceforth, to call a set $A$ an event is equivalent to asserting that $A$ belongs to the $\sigma$-field in question.

Probability

Assume we have a “repeatable” experiment and we repeat it a large number $N$ of times. Let $A \subseteq \Omega$ be an event and let $N(A)$ be the number of times $A$ occurs in the $N$ trials. Intuitively, the ratio $N(A)/N$ appears to converge to a constant limit as $N$ increases (a small simulation illustrating this is sketched after the list below). In practice, we have

  • $0 \leq N(A)/ N \leq 1$;
  • if $A$ and $B$ are disjoint, then $N(A \cup B) = N(A) + N(B)$ (finite additivity; Definition 5 below strengthens this to countable additivity).
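
As an illustration of this frequency interpretation, the following is a minimal simulation sketch. It assumes the experiment is a single toss of a fair coin and $A$ is the event “the coin shows heads”; the particular trial counts printed are arbitrary choices.

```python
import random

random.seed(0)  # fix the seed so the run is reproducible

# Experiment: toss a fair coin; event A = "the coin shows heads".
# Track the relative frequency N(A)/N as the number of trials N grows.
n_A = 0
checkpoints = {10, 100, 1_000, 10_000, 100_000}
for n in range(1, 100_001):
    if random.random() < 0.5:  # this trial results in A
        n_A += 1
    if n in checkpoints:
        print(f"N = {n:>6}: N(A)/N = {n_A / n:.4f}")
```

For large $N$ the printed ratio should settle near $1/2$.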

Definition 5. A probability measure $\mathbf{P}$ on $(\Omega, \mathcal{F})$ is a function $\mathbf{P} : \mathcal{F} \to [0,1]$ satisfying

  1. $\mathbf{P}(\varnothing) = 0$;
  2. $\mathbf{P}(\Omega) = 1$;
  3. if $A_1, A_2, \dots$ is a collection of disjoint members of $\mathcal{F}$, in that $A_i \cap A_j = \varnothing$ for all pairs $i, j$ satisfying $i \neq j$, then \begin{equation*} \mathbf{P}\left( \bigcup_{i=1}^\infty A_i \right) = \sum_{i=1}^\infty \mathbf{P} (A_i). \end{equation*}

The triple $(\Omega, \mathcal{F}, \mathbf{P})$, comprising a set $\Omega$, a $\sigma$-field $\mathcal{F}$ of subsets of $\Omega$, and a probability measure $\mathbf{P}$ on $(\Omega, \mathcal{F})$, is called a \textbf{probability space}. We can associate a probability space $(\Omega, \mathcal{F}, \mathbf{P})$ with any experiment, and all questions associated with the experiment can be reformulated in terms of this space.

Remark. A probability measure is a special example of what is called a \emph{measure} on the pair $(\Omega, \mathcal{F})$. A measure is a function $\mu: \mathcal{F} \to [0, \infty)$ satisfying $\mu(\varnothing) = 0$ together with condition 3 above. A measure $\mu$ is a probability measure if $\mu(\Omega) = 1$.
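
Example. If $\Omega$ is finite and $\mathcal{F} = 2^\Omega$, then \begin{equation*} \mathbf{P}(A) = \frac{\mathbf{card}(A)}{\mathbf{card}(\Omega)}, \quad A \subseteq \Omega, \end{equation*} defines a probability measure (the classical, or uniform, probability): it takes values in $[0,1]$, $\mathbf{P}(\Omega) = 1$, and it is additive over disjoint events because cardinality is.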

Lemma. The following properties can be deduced from the definition.

  1. $\mathbf{P}(A^c) = 1 - \mathbf{P}(A)$;
  2. if $A \subseteq B$, then $\mathbf{P}(B) = \mathbf{P}(A) + \mathbf{P}(B\backslash A) \geq \mathbf{P}(A)$;
  3. $\mathbf{P}(A \cup B) = \mathbf{P}(A) + \mathbf{P}(B) - \mathbf{P}(A \cap B)$;
  4. more generally, if $A_1, A_2, \dots, A_n$ are events, then \begin{equation*} \begin{split} \mathbf{P}\left( \bigcup_{i=1}^n A_i \right) =& \sum_{i}\mathbf{P}(A_i) - \sum_{i < j} \mathbf{P}(A_i \cap A_j) + \sum_{i<j<k} \mathbf{P}(A_i \cap A_j \cap A_k) - \cdots \\ & +(-1)^{n+1} \mathbf{P}(A_1 \cap A_2 \cap \cdots \cap A_n) \end{split} \end{equation*}
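
Example. Roll a fair die and let $A = \{2, 4, 6\}$ (even) and $B = \{3, 6\}$ (multiple of 3). Then \begin{equation*} \mathbf{P}(A \cup B) = \mathbf{P}(A) + \mathbf{P}(B) - \mathbf{P}(A \cap B) = \tfrac{1}{2} + \tfrac{1}{3} - \tfrac{1}{6} = \tfrac{2}{3}, \end{equation*} which agrees with counting directly: $A \cup B = \{2, 3, 4, 6\}$ has probability $4/6$.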

Lemma. Let $A_1, A_2, \dots$ be an increasing sequence of events, so that $A_1 \subseteq A_2 \subseteq A_3 \subseteq \cdots$, and write $A$ for their limit: \begin{equation*} A = \bigcup_{i=1}^\infty A_i = \lim_{i \to \infty} A_i. \end{equation*} Then $\mathbf{P}(A) = \lim_{i\to\infty} \mathbf{P}(A_i)$. Similarly, if $B_1, B_2, \dots$ is a decreasing sequence of events, so that $B_1 \supseteq B_2 \supseteq B_3 \supseteq \cdots$, then \begin{equation*} B = \bigcap_{i=1}^\infty B_i = \lim_{i \to \infty} B_i \end{equation*} satisfies $\mathbf{P}(B) = \lim_{i\to\infty} \mathbf{P}(B_i)$.
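
Example. Toss a fair coin repeatedly and let $A_i$ be the event that at least one head appears within the first $i$ tosses, so that $A_1 \subseteq A_2 \subseteq \cdots$ and $A = \bigcup_{i=1}^\infty A_i$ is the event that a head eventually appears. Then $\mathbf{P}(A_i) = 1 - 2^{-i} \to 1$, and the lemma confirms that $\mathbf{P}(A) = 1$.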

Some useful concepts

Conditional probability

What if we only care about how often $A$ occurs among those trials in which $B$ occurs? The relevant ratio is then $N(A \cap B)/N(B)$: the “universe” has changed from $\Omega$ to $B$.

Definition. If $\mathbf{P}(B) > 0$, then the \textbf{conditional probability} that $A$ occurs given that $B$ occurs is defined to be \begin{equation*} \mathbf{P}(A \vert B) = \frac{\mathbf{P}(A \cap B)}{\mathbf{P}(B)}. \end{equation*}
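
Example. Roll a fair die and let $B = \{2, 4, 6\}$ and $A = \{4, 5, 6\}$. Then \begin{equation*} \mathbf{P}(A \vert B) = \frac{\mathbf{P}(A \cap B)}{\mathbf{P}(B)} = \frac{\mathbf{P}(\{4, 6\})}{\mathbf{P}(\{2, 4, 6\})} = \frac{2/6}{3/6} = \frac{2}{3}, \end{equation*} whereas $\mathbf{P}(A) = 1/2$: knowing that the outcome is even raises the probability of getting at least a 4.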

Definition 6. Suppose $B_1, B_2, \dots, B_n$ are events. If the $B_i$ are pairwise disjoint and $\bigcup_{i=1}^n B_i = \Omega$, then $\{B_i\}$ is called a \textbf{partition} of $\Omega$.

Lemma. For any events $A$ and $B$ such that $0 < \mathbf{P}(B) < 1$, \begin{equation*} \mathbf{P}(A) = \mathbf{P}(A \vert B) \mathbf{P}(B) + \mathbf{P}(A \vert B^c) \mathbf{P}(B^c). \end{equation*} More generally, let $B_1, B_2, \dots, B_n$ be a partition of $\Omega$ such that $\mathbf{P}(B_i) > 0$ for all $i$. Then \begin{equation*} \mathbf{P}(A) = \sum_i \mathbf{P}(A \vert B_i) \mathbf{P}(B_i). \end{equation*}
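
Example. As an illustration, toss a fair coin; if it shows heads, draw a ball from a box containing 2 red and 1 blue ball, otherwise from a box containing 1 red and 2 blue. With $B_1 = \{\text{heads}\}$, $B_2 = \{\text{tails}\}$ and $A = \{\text{red}\}$, \begin{equation*} \mathbf{P}(A) = \mathbf{P}(A \vert B_1)\mathbf{P}(B_1) + \mathbf{P}(A \vert B_2)\mathbf{P}(B_2) = \tfrac{2}{3}\cdot\tfrac{1}{2} + \tfrac{1}{3}\cdot\tfrac{1}{2} = \tfrac{1}{2}. \end{equation*}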

Independence

Intuition: the occurrence of $B$ does not affect the probability that $A$ occurs, i.e. $\mathbf{P}(A) = \mathbf{P}(A\vert B)$ (provided $\mathbf{P}(B) > 0$); multiplying both sides by $\mathbf{P}(B)$ gives the symmetric condition $\mathbf{P}(A \cap B) = \mathbf{P}(A)\mathbf{P}(B)$ used in the definition below.

Definition 7. Events $A$ and $B$ are called independent if \begin{equation*} \mathbf{P}(A \cap B) = \mathbf{P}(A) \mathbf{P}(B). \end{equation*} More generally, a family $\{A_i \vert i \in I\}$ is called independent if \begin{equation*} \mathbf{P} \left( \bigcap_{i \in J} A_i \right) = \prod_{i\in J} \mathbf{P}(A_i) \end{equation*} for all finite subsets $J$ of $I$.

Remark. A common student error is to make the fallacious statement that $A$ and $B$ are independent if $A \cap B = \varnothing$. In fact, if $\mathbf{P}(A) > 0$ and $\mathbf{P}(B) > 0$, disjoint events cannot be independent, since then $\mathbf{P}(A \cap B) = 0 \neq \mathbf{P}(A) \mathbf{P}(B)$.

Remark. If the family $\{A_i \vert i \in I\}$ has the property that \begin{equation*} \mathbf{P}(A_i \cap A_j) = \mathbf{P}(A_i) \mathbf{P}(A_j) \quad \text{for all } i \neq j, \end{equation*} then it is called \emph{pairwise independent}. Pairwise-independent families are not necessarily independent.
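
Example. Toss two fair coins and let $A = \{\text{first coin shows heads}\}$, $B = \{\text{second coin shows heads}\}$, $C = \{\text{the two coins show the same face}\}$. Each pair is independent, since $\mathbf{P}(A \cap B) = \mathbf{P}(A \cap C) = \mathbf{P}(B \cap C) = \tfrac{1}{4} = \tfrac{1}{2}\cdot\tfrac{1}{2}$, but the family $\{A, B, C\}$ is not independent: $\mathbf{P}(A \cap B \cap C) = \tfrac{1}{4} \neq \tfrac{1}{8} = \mathbf{P}(A)\mathbf{P}(B)\mathbf{P}(C)$.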

Completeness and product space

Lemma. If $\mathcal{F}$ and $\mathcal{G}$ are two $\sigma$-fields of subsets of $\Omega$, then their intersection $\mathcal{F} \cap \mathcal{G}$ is a $\sigma$-field also. More generally, if $\{ \mathcal{F}_i \vert i \in I \}$ is a family of $\sigma$-fields of subsets of $\Omega$, then $\mathcal{G} = \bigcap_{i\in I} \mathcal{F}_i $ is a $\sigma$-field also.

Completeness: Let $(\Omega, \mathcal{F}, \mathbf{P})$ be a probability space. Any event $A$ which has zero probability, that is $\mathbf{P}(A) = 0$, is called \emph{null}. It may seem reasonable to suppose that any subset $B$ of a null set $A$ will itself be null, but this may be without meaning since $B$ may not be an event, and thus $\mathbf{P}(B)$ may not be defined.

Definition 8. A probability space $(\Omega, \mathcal{F}, \mathbf{P})$ is called complete if all subsets of null sets are events.

Any incomplete space can be completed thus. Let $\mathcal{N}$ be the collection of all subsets of null sets in $\mathcal{F}$ and let $\mathcal{G} = \sigma (\mathcal{F} \cup \mathcal{N})$ be the smallest $\sigma$-field which contains all sets in $\mathcal{F}$ and $\mathcal{N}$. It can be shown that the domain of $\mathbf{P}$ may be extended in an obvious way from $\mathcal{F}$ to $\mathcal{G}$; $(\Omega, \mathcal{G}, \mathbf{P})$ is called the completion of $(\Omega, \mathcal{F}, \mathbf{P})$.

Product space: Suppose two experiments have associated probability spaces $(\Omega_1, \mathcal{F}_1, \mathbf{P}_1)$ and $(\Omega_2, \mathcal{F}_2, \mathbf{P}_2)$ respectively. The sample space of the pair of experiments, considered jointly, is the collection $\Omega_1 \times \Omega_2 = \{ (\omega_1, \omega_2) \vert \omega_1 \in \Omega_1, \omega_2 \in \Omega_2 \}$ of ordered pairs. The appropriate $\sigma$-field of events is \textbf{more complicated} to construct. The family of all product sets, $\mathcal{F}_1 \times \mathcal{F}_2 = \{A_1 \times A_2 \vert A_1 \in \mathcal{F}_1, A_2 \in \mathcal{F}_2 \}$, is \textbf{not} in general a $\sigma$-field.

Remark. There exists a unique smallest $\sigma$-field $\mathcal{G} = \sigma(\mathcal{F}_1 \times \mathcal{F}_2)$ of subsets of $\Omega_1 \times \Omega_2$ which contains $\mathcal{F}_1 \times \mathcal{F}_2$. All we require now is a suitable probability function on $(\Omega_1 \times \Omega_2, \mathcal{G})$. Let $\mathbf{P}_{12}: \mathcal{F}_1 \times \mathcal{F}_2 \to [0, 1]$ be given by \begin{equation*} \mathbf{P}_{12}(A_1 \times A_2) = \mathbf{P}_1(A_1) \, \mathbf{P}_2(A_2) \quad \text{for $A_1 \in \mathcal{F}_1, A_2 \in \mathcal{F}_2$}. \end{equation*} It can be shown that the domain of $\mathbf{P}_{12}$ can be extended from $\mathcal{F}_1 \times \mathcal{F}_2$ to the whole of $\mathcal{G} = \sigma(\mathcal{F}_1 \times \mathcal{F}_2)$.
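
Example. For two independent tosses of a fair coin, $\Omega_1 = \Omega_2 = \{H, T\}$ with $\mathbf{P}_1(\{H\}) = \mathbf{P}_2(\{H\}) = \tfrac{1}{2}$, and the product measure assigns $\mathbf{P}_{12}(\{H\} \times \{H\}) = \mathbf{P}_1(\{H\})\,\mathbf{P}_2(\{H\}) = \tfrac{1}{4}$ to the outcome “both coins show heads”.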

Definition 9. The probability space $(\Omega_1 \times \Omega_2, \mathcal{G}, \mathbf{P}_{12})$ is called the product space of $(\Omega_1, \mathcal{F}_1, \mathbf{P}_1)$ and $(\Omega_2, \mathcal{F}_2, \mathbf{P}_2)$. The measure $\mathbf{P}_{12}$ is sometimes called the `product measure'.