Generating Functions
Generating functions
A sequence $a = \{a_i \mid i = 0, 1, 2, \dots\}$ of real numbers may contain a lot of information. One concise way of storing this information is to wrap the numbers up together in a "generating function". For example, the (ordinary) generating function of the sequence $a$ is the function $G_a$ defined by $$\begin{equation} G_{a}(s)=\sum_{i=0}^{\infty} a_{i} s^{i} \quad \text { for } s \in \mathbb{R} \text { for which the sum converges. } \end{equation}$$ In many circumstances it is easier to work with the generating function $G_a$ than with the original sequence $a$.
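For instance, if $a_i = 1$ for all $i$, then $G_a(s) = \sum_{i=0}^\infty s^i = \frac{1}{1-s}$ for $\left\vert s \right\vert < 1$: the entire sequence is captured by a single closed-form function.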
Theorem. [Abel's theorem] If $a_i \geq 0$ for all $i$ and $G_a(s)$ is finite for $\left\vert s \right\vert < 1$, then $\lim_{s \uparrow 1} G_a (s) = \sum_{i=0}^ \infty a_i$, whether the sum is finite or equals $+\infty$. This standard result is useful when the radius of convergence $R$ satisfies $R = 1$, since then one has no a priori right to take the limit as $s \uparrow 1$.
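As a standard illustration (with $a_0 = 0$ by convention), take $a_i = \frac{1}{i(i+1)}$ for $i \geq 1$. For $\left\vert s \right\vert < 1$, $$\begin{equation} G_a(s) = \sum_{i=1}^{\infty} \frac{s^{i}}{i(i+1)} = 1 + \frac{(1-s)\log(1-s)}{s} \longrightarrow 1 = \sum_{i=1}^{\infty} \frac{1}{i(i+1)} \quad \text{as } s \uparrow 1, \end{equation}$$ in agreement with Abel's theorem, even though the radius of convergence is exactly $1$.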
Moment generating function
Definition. The **moment generating function** of the random variable $X$ is the function $M_X: \mathbb{R} \to [0, \infty)$ given by $M_X(t) = \mathbf{E}\left(e^{tX}\right)$. In terms of the p.d.f. $f_X(x)$, $$\begin{equation} M_X(t) = \mathbf{E} \left( e^{tX} \right) = \int_{-\infty}^\infty e^{tx} f_X(x) \, dx, \end{equation}$$ and in terms of the p.m.f. $p_X(k)$: $$\begin{equation} M_X(t) = \sum_{k} e^{t k} \mathbf{P}(X=k) = \sum_k \sum_{n=0}^{\infty} \frac{(t k)^{n}}{n !} \mathbf{P}(X=k) = \sum_{n=0}^{\infty} \frac{t^{n}}{n !}\left(\sum_k k^{n} \mathbf{P}(X=k)\right) = \sum_{n=0}^{\infty} \frac{t^{n}}{n !} \mathbf{E}\left(X^{n}\right). \end{equation}$$ In other words, $M_X(-t)$ is the bilateral Laplace transform of $f_X(x)$ or $p_X(k)$.
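As a quick check of the defining integral, the following SymPy sketch computes the moment generating function of a standard normal random variable (the choice of $X \sim \mathcal{N}(0,1)$ and the symbol names are illustrative only, not part of the notes):

```python
import sympy as sp

t, x = sp.symbols('t x', real=True)

# Density of X ~ N(0, 1), chosen here purely as an example
f = sp.exp(-x**2 / 2) / sp.sqrt(2 * sp.pi)

# M_X(t) = E(e^{tX}) = integral of e^{tx} f(x) over the real line
M = sp.integrate(sp.exp(t * x) * f, (x, -sp.oo, sp.oo))
print(sp.simplify(M))  # exp(t**2/2)
```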
Under the assumption that $M_X(t)$ is infinitely differentiable at $t=0$, the following statements are true (a symbolic check follows the list).
- $M_X^\prime(0) = \mathbf{E}(X) = \mu$.
- $M_X^{(n)}(0) = \mathbf{E}(X^n)$.
- Using Taylor's theorem, $M_X(t) = \sum_{k=0}^\infty \frac{t^k}{k!} \mathbf{E}(X^k)$.
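For instance, taking the $\mathcal{N}(0,1)$ moment generating function $M_X(t) = e^{t^2/2}$ computed above, repeated differentiation at $t = 0$ recovers the moments. A minimal SymPy sketch:

```python
import sympy as sp

t = sp.symbols('t', real=True)
M = sp.exp(t**2 / 2)  # MGF of X ~ N(0, 1)

# n-th moment E(X^n) = M^{(n)}(0)
moments = [sp.diff(M, t, n).subs(t, 0) for n in range(5)]
print(moments)  # [1, 0, 1, 0, 3]
```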
Theorem. If $X$ and $Y$ are independent, then $M_{X+Y}(t) = M_X(t) M_Y(t)$. Indeed, writing $f_{X+Y}$ for the density of $X+Y$ and substituting $y = z - x$, $$\begin{equation} \begin{split} M_{X+Y}(t) &= \int_{-\infty}^\infty e^{tz} f_{X+Y}(z) \, dz \\ &= \int_{-\infty}^\infty e^{tz} \int_{-\infty}^\infty f_X(x) f_Y(z-x) \, dx \, dz \\ &= \int_{-\infty}^\infty \int_{-\infty}^\infty e^{t(x+y)} f_X(x) f_Y(y) \, dx \, dy \\ &= \int_{-\infty}^\infty e^{tx} f_X(x) \, dx \int_{-\infty}^\infty e^{ty} f_Y(y) \, dy \\ &= M_X(t) M_Y(t). \end{split} \end{equation}$$ ◼
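A quick numerical illustration (the choice of $X \sim \mathcal{N}(0,1)$, $Y \sim \mathrm{Uniform}(0,1)$, and $t = 0.5$ is arbitrary): with independent samples, the empirical $M_{X+Y}(t)$ should match the product of the empirical $M_X(t)$ and $M_Y(t)$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10**6
x = rng.normal(size=n)   # X ~ N(0, 1)
y = rng.uniform(size=n)  # Y ~ Uniform(0, 1), independent of X
t = 0.5

lhs = np.mean(np.exp(t * (x + y)))                     # empirical M_{X+Y}(t)
rhs = np.mean(np.exp(t * x)) * np.mean(np.exp(t * y))  # empirical M_X(t) * M_Y(t)
print(lhs, rhs)  # both close to exp(t**2/2) * (exp(t) - 1)/t ≈ 1.47
```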
Characteristic functions
For some random variables $\mathbf{E}(e^{tX})$ is infinite for every $t \neq 0$ (the Cauchy distribution below is an example), so the moment generating function may not exist. We therefore consider a related transform in the complex domain, which behaves better.
Definition. The **characteristic function** of $X$ is the function $\phi: \mathbb{R} \to \mathbb{C}$ defined by $$\begin{equation} \phi(t)=\mathbf{E}\left(e^{i t X}\right) \quad \text { where } \quad i=\sqrt{-1}. \end{equation}$$ We often write $\phi_X$ for the characteristic function of the random variable $X$. Characteristic functions are related to Fourier transforms.
Theorem. The characteristic function $\phi$ satisfies:
- $\phi(0) = 1$, $\left\vert \phi(t) \right\vert \leq 1$ for all $t$.
- $\phi$ is uniformly continuous on $\mathbb{R}$.
- If $X \sim \mathcal{N}(0,1)$, then $\phi_{X}(t) = e^{-t^2/2}$.
We prove only the first statement, assuming that $X$ has density $f$. First, $\phi(0) = \mathbf{E}(1) = 1$. Next, $$\begin{equation} \begin{split} \left\vert \phi(t) \right\vert &= \left\vert \int_{-\infty}^\infty e^{itx} f(x) \, dx \right\vert \\ &\leq \int_{-\infty}^\infty \left\vert e^{itx} \right\vert f(x) \, dx \quad \text{(triangle inequality for integrals)} \\ &= \int_{-\infty}^\infty f(x) \, dx \quad (\left\vert e^{itx} \right\vert=1) \\ &= 1. \end{split} \end{equation}$$ ◼
Example. [Cauchy distribution] If $f(x) = \frac{1}{\pi(1+x^2)}$, then the corresponding characteristic function is $$\begin{equation} \phi(t)=\frac{1}{\pi} \int_{-\infty}^{\infty} \frac{e^{i t x}}{1+x^{2}} d x = e^{-\left\vert t \right\vert}. \end{equation}$$
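This integral is usually evaluated by contour integration. As an informal numerical check (the sample size and seed are arbitrary), we can average $\cos(tX)$ over standard Cauchy samples, since the imaginary part $\mathbf{E}(\sin(tX))$ vanishes by symmetry:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_cauchy(size=10**6)  # samples with density 1 / (pi * (1 + x^2))

for t in (0.5, 1.0, 2.0):
    est = np.mean(np.cos(t * x))    # Monte Carlo estimate of E(cos(tX)) = phi(t)
    print(t, est, np.exp(-abs(t)))  # estimate vs. the closed form e^{-|t|}
```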
Theorem. The following statements are true (a worked example follows the list).
- If $\phi^{(k)}(0)$ exists, then $$\begin{equation} \begin{cases} \mathbf{E}\left\vert X^{k}\right\vert<\infty & \text{if } k \text{ is even}, \\ \mathbf{E}\left\vert X^{k-1}\right\vert<\infty & \text{if } k \text{ is odd}. \end{cases} \end{equation}$$
- If $\mathbf{E}(\left\vert X^k \right\vert) < \infty$, then $$\begin{equation} \phi(t)=\sum_{j=0}^{k} \frac{\mathbf{E}\left(X^{j}\right)}{j !}(i t)^{j}+\mathrm{o}\left(t^{k}\right) \end{equation}$$ and so $\phi^{(k)}(0)=i^{k} \mathbf{E}\left(X^{k}\right)$.
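For example, for $X \sim \mathcal{N}(0,1)$ with $\phi_X(t) = e^{-t^2/2}$ as stated above, $$\begin{equation} \phi_X(t) = 1 - \frac{t^2}{2} + \frac{t^4}{8} - \cdots, \qquad \text{so} \qquad \phi_X''(0) = -1 = i^2 \, \mathbf{E}(X^2), \quad \phi_X^{(4)}(0) = 3 = i^4 \, \mathbf{E}(X^4), \end{equation}$$ consistent with $\mathbf{E}(X^2) = 1$ and $\mathbf{E}(X^4) = 3$ for the standard normal.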
Theorem. If $X$ and $Y$ are independent then $$\begin{equation} \phi_{X+Y}(t)=\phi_{X}(t) \phi_{Y}(t). \end{equation}$$ Similarly, if $X_1, \dots, X_n$ are independent, then $$\begin{equation} \phi_{X_1 + \cdots + X_n}(t)= \prod_{i=1}^n \phi_{X_i}(t). \end{equation}$$
Theorem. If $a, b \in \mathbb{R}$ and $Y = aX+b$, then $\phi_{Y}(t)=e^{i t b} \phi_{X}(a t)$.
$$\begin{equation} \phi_{Y}(t)=\mathbf{E}\left(e^{i t(a X+b)}\right)=\mathbf{E}\left(e^{i t b} e^{i(a t) X}\right) = e^{i t b} \mathbf{E}\left(e^{i(a t) X}\right)=e^{i t b} \phi_{X}(a t). \end{equation}$$ ◼
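For example, if $X \sim \mathcal{N}(0,1)$ and $Y = \sigma X + \mu$, then $\phi_Y(t) = e^{it\mu} e^{-\sigma^2 t^2 / 2}$, which is the characteristic function of the $\mathcal{N}(\mu, \sigma^2)$ distribution.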
Theorem. Random variables $X$ and $Y$ are independent if and only if $$\begin{equation} \phi_{X, Y}(s, t)=\phi_{X}(s) \phi_{Y}(t) \quad \text { for all } s \text { and } t, \end{equation}$$ where $\phi_{X,Y}(s,t) = \mathbf{E}\left(e^{i s X + i t Y}\right)$ is the joint characteristic function of $X$ and $Y$.
Definition. We say that the sequence $F_1 , F_2, \dots $ of distribution functions converges to the distribution function $F$, written $F_n \to F$, if $F(x) = \lim_{n\to\infty} F_n(x)$ at each point $x$ where $F$ is continuous.
Theorem. [Continuity theorem] Suppose that $F_1 , F_2, \dots $ is a sequence of distribution functions with corresponding characteristic functions $\phi_1 , \phi_2, \dots $.
- If $F_n \to F$ for some distribution function $F$ with characteristic function $\phi$, then $\phi_n(t) \to \phi(t)$ for all $t$.
- Conversely, if $\phi(t) = \lim_{n\to\infty} \phi_n(t)$ exists for all $t$ and $\phi$ is continuous at $t=0$, then $\phi$ is the characteristic function of some distribution function $F$, and $F_n \to F$.
Central limit theorem
Definition. If $X, X_1 , X_2 , \dots$ is a sequence of random variables with respective distribution functions $F, F_1, F_2, \cdots$, we say that $X_n$ converges in distribution to $X$, written $X_{n} \stackrel{\mathrm{D}}{\rightarrow} X$, if $F_n \to F$ as $n \to\infty$.
Theorem. [Central limit theorem] Let $X_1 , X_2, \dots$ be a sequence of independent identically distributed random variables with finite mean $\mu$ and finite nonzero variance $\sigma^2$, and let $S_n = X_1 + X_2 + \cdots + X_n$. Then $$\begin{equation} \frac{S_{n}-n \mu}{\sqrt{n \sigma^{2}}} \stackrel{\mathrm{D}}{\rightarrow} \mathcal{N}(0,1) \quad \text { as } \quad n \rightarrow \infty. \end{equation}$$
First, write $Y_i = \frac{X_i - \mu}{\sigma}$, and let $\phi_Y$ be the characteristic function of the $Y_i$. Since $\mathbf{E}(Y_i) = 0$ and $\mathbf{E}(Y_i^2) = 1$, the moment expansion above gives $$\begin{equation} \phi_{Y}(t)=1-\frac{1}{2} t^{2}+o\left(t^{2}\right). \end{equation}$$ Note that the $Y_i$ are i.i.d., so the characteristic function of $\sum_{i=1}^n Y_i$ is $$\begin{equation} \phi_n(t) = [\phi_{Y}(t)]^n = \left[ 1-\frac{1}{2} t^{2}+o\left(t^{2}\right) \right]^n. \end{equation}$$ Also, the characteristic function $\psi_n$ of $$\begin{equation} U_{n}=\frac{S_{n}-n \mu}{\sqrt{n \sigma^{2}}}=\frac{1}{\sqrt{n}} \sum_{i=1}^{n} Y_{i} \end{equation}$$ satisfies $$\begin{equation} \psi_{n}(t)=\left\{\phi_{Y}(t / \sqrt{n})\right\}^{n}=\left\{1-\frac{t^{2}}{2 n}+o\left(\frac{t^{2}}{n}\right)\right\}^{n} \rightarrow e^{-\frac{1}{2} t^{2}} \quad \text { as } \quad n \rightarrow \infty, \end{equation}$$ where we used $$\begin{equation} \lim_{n\to\infty} \left( 1 + \frac{a}{n} \right)^n = e^a. \end{equation}$$ The last function is the characteristic function of the $\mathcal{N}(0, 1)$ distribution, and an application of the continuity theorem completes the proof. ◼
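A minimal Monte Carlo sanity check (the choice of $\mathrm{Exp}(1)$, $n = 500$, and the number of replications is arbitrary): the empirical distribution of the standardized sums should have probabilities close to those of $\mathcal{N}(0,1)$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 500, 20_000
mu, sigma2 = 1.0, 1.0  # Exp(1) has mean 1 and variance 1

s = rng.exponential(size=(reps, n)).sum(axis=1)  # reps independent copies of S_n
z = (s - n * mu) / np.sqrt(n * sigma2)           # standardized sums

for a in (-1.96, 0.0, 1.96):
    print(a, np.mean(z <= a))  # compare with Phi(a): roughly 0.025, 0.5, 0.975
```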
Corollary. Informally, for large $n$ the sample mean $Q_n = \frac{1}{n}S_n$ is approximately $\mathcal{N} \left(\mu, \frac{\sigma^2}{n} \right)$ and $S_n$ is approximately $\mathcal{N}(n\mu, n \sigma^2)$. The sampling error of $Q_n$ is therefore proportional to $\frac{1}{\sqrt{n}}$.
There are generalizations: the requirement that the $X_i$ be identically distributed can be relaxed, and a central limit theorem still holds for independent variables under additional conditions (for example, the Lindeberg condition).