Generating functions

A sequence $a = \{a_i : i = 0, 1, 2, \dots\}$ of real numbers may contain a lot of information. One concise way of storing this information is to wrap the numbers up together in a "generating function". For example, the (ordinary) generating function of the sequence $a$ is the function $G_a$ defined by $$G_a(s) = \sum_{i=0}^{\infty} a_i s^i$$ for those $s \in \mathbb{R}$ for which the sum converges. In many circumstances it is easier to work with the generating function $G_a$ than with the original sequence $a$.

Theorem. [Abel's theorem] If $a_i \ge 0$ for all $i$ and $G_a(s)$ is finite for $|s| < 1$, then $$\lim_{s \uparrow 1} G_a(s) = \sum_{i=0}^{\infty} a_i,$$ whether the sum is finite or equals $+\infty$. This standard result is useful when the radius of convergence $R$ satisfies $R = 1$, since then one has no a priori right to take the limit as $s \uparrow 1$.
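As a quick numerical illustration (a sketch added here, not part of the notes): take $a_i = (1/2)^{i+1}$, so $\sum_i a_i = 1$ and $G_a(s) = \frac{1}{2-s}$. The truncation at 10,000 terms below is an arbitrary choice.

```python
# Illustration of Abel's theorem with a_i = (1/2)^(i+1):
# the full sum is 1, and G_a(s) should approach 1 as s -> 1 from below.

def G(s, terms=10_000):
    """Truncated partial sum of G_a(s) = sum_i (1/2)^(i+1) * s^i."""
    return sum(0.5 ** (i + 1) * s ** i for i in range(terms))

for s in (0.9, 0.99, 0.999):
    print(s, G(s))  # increases towards 1 as s -> 1
```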

Moment generating function

Definition. The **moment generating function** of the random variable $X$ is the function $M_X : \mathbb{R} \to [0, \infty)$ given by $M_X(t) = E(e^{tX})$. For a continuous $X$ with p.d.f. $f_X$, $$M_X(t) = \int_{-\infty}^{\infty} e^{tx} f_X(x)\,dx,$$ and for a discrete $X$ with p.m.f. $p_X$, $$M_X(t) = \sum_k e^{tk} P(X = k) = \sum_k \sum_{n=0}^{\infty} \frac{(tk)^n}{n!} P(X = k) = \sum_{n=0}^{\infty} \frac{t^n}{n!} \left( \sum_k k^n P(X = k) \right) = \sum_{n=0}^{\infty} \frac{t^n}{n!} E(X^n).$$ Thus $M_X(t)$ is the so-called **bilateral** Laplace transform of $f_X$ (or $p_X$), evaluated at $-t$.

Under the assumption that $M_X(t)$ is infinitely differentiable at $t = 0$, the following statements are true.

  1. $M'(0) = E(X) = \mu$.
  2. $M^{(n)}(0) = E(X^n)$.
  3. Using Taylor's theorem, $M_X(t) = \sum_{k=0}^{\infty} \frac{t^k}{k!} E(X^k)$.
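For instance (a sketch added here, not from the notes): the Exponential(1) distribution has $M(t) = \frac{1}{1-t}$ for $t < 1$, and central finite differences at $t = 0$ recover $E(X) = 1$ and $E(X^2) = 2$. The step size $h$ is an arbitrary choice.

```python
# Recover the first two moments of Exponential(1) by numerically
# differentiating its MGF M(t) = 1 / (1 - t) at t = 0.

def M(t):
    return 1.0 / (1.0 - t)

h = 1e-4
first_moment = (M(h) - M(-h)) / (2 * h)           # ~ M'(0)  = E[X]   = 1
second_moment = (M(h) - 2 * M(0) + M(-h)) / h**2  # ~ M''(0) = E[X^2] = 2

print(first_moment, second_moment)
```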

Theorem. If $X$ and $Y$ are independent, then $$M_{X+Y}(t) = \int e^{tz} f_{X+Y}(z)\,dz = \iint e^{tz} f_X(x) f_Y(z - x)\,dx\,dz = \iint e^{t(x+y)} f_X(x) f_Y(y)\,dx\,dy = M_X(t) M_Y(t).$$
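A Monte Carlo sanity check of this product rule (an illustration assumed here, with $t$ and the sample size chosen arbitrarily): for independent $X, Y \sim N(0,1)$ one has $M_X(t) = e^{t^2/2}$, so $M_{X+Y}(t)$ should equal $e^{t^2}$.

```python
import math
import random

# Estimate M_{X+Y}(t) = E[e^{t(X+Y)}] by simulation for X, Y ~ N(0,1)
# and compare with the product M_X(t) * M_Y(t) = e^{t^2}.
random.seed(0)
t = 0.5
n = 200_000
samples = [math.exp(t * (random.gauss(0, 1) + random.gauss(0, 1))) for _ in range(n)]
mc = sum(samples) / n                 # Monte Carlo estimate of E[e^{t(X+Y)}]
exact = math.exp(t * t / 2) ** 2      # M_X(t) * M_Y(t)
print(mc, exact)
```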

Characteristic functions

Sometimes $E(e^{tX})$ is infinite for every $t \neq 0$ (the Cauchy distribution below is an example). We therefore pass to a transform in the complex domain, which behaves better: since $|e^{itX}| = 1$, it exists for every random variable.

Definition. The **characteristic function** of $X$ is the function $\phi : \mathbb{R} \to \mathbb{C}$ defined by $\phi(t) = E(e^{itX})$, where $i = \sqrt{-1}$. We often write $\phi_X$ for the characteristic function of the random variable $X$. Characteristic functions are closely related to Fourier transforms.

Theorem. The characteristic function $\phi$ satisfies:

  1. $\phi(0) = 1$ and $|\phi(t)| \le 1$ for all $t$.
  2. $\phi$ is uniformly continuous on $\mathbb{R}$.
  3. If $X \sim N(0, 1)$, then $\phi_X(t) = e^{-t^2/2}$.
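Statement 3 can be checked by simulation (a sketch; $t$ and the sample size are arbitrary choices here):

```python
import cmath
import math
import random

# Estimate phi_X(t) = E[e^{itX}] for X ~ N(0,1) by simulation and
# compare with the stated closed form e^{-t^2/2}.
random.seed(1)
t = 1.0
n = 100_000
phi_hat = sum(cmath.exp(1j * t * random.gauss(0, 1)) for _ in range(n)) / n
phi_exact = math.exp(-t * t / 2)
print(phi_hat, phi_exact)  # imaginary part ~ 0 by symmetry
```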

Proof ▸

We only prove the first statement. Clearly $\phi(0) = E(e^{0}) = 1$. For the bound, $$|\phi(t)| = \left| \int e^{itx} f(x)\,dx \right| \le \int |e^{itx}| f(x)\,dx \quad \text{(triangle inequality)}$$ $$= \int f(x)\,dx \quad (|e^{itx}| = 1) \quad = 1.$$

Example. [Cauchy distribution] If $f(x) = \frac{1}{\pi(1 + x^2)}$, then the corresponding characteristic function is $$\phi(t) = \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{e^{itx}}{1 + x^2}\,dx = e^{-|t|}.$$
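The Cauchy density has no finite MGF for $t \neq 0$, yet its characteristic function exists and can be checked by simulation (a sketch: sampling via $\tan(\pi(U - \tfrac{1}{2}))$ for $U \sim \text{Uniform}(0,1)$ is a standard inverse-CDF trick; $t$ and the sample size are arbitrary choices).

```python
import cmath
import math
import random

# Monte Carlo check of the Cauchy characteristic function e^{-|t|}.
random.seed(2)
t = 1.5
n = 200_000
phi_hat = sum(cmath.exp(1j * t * math.tan(math.pi * (random.random() - 0.5)))
              for _ in range(n)) / n
print(phi_hat.real, math.exp(-abs(t)))  # imaginary part ~ 0 by symmetry
```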

Theorem. The following statements are true.

  1. If $\phi^{(k)}(0)$ exists, then $\begin{cases} E|X^k| < \infty & \text{if } k \text{ is even,} \\ E|X^{k-1}| < \infty & \text{if } k \text{ is odd.} \end{cases}$
  2. If $E|X^k| < \infty$, then $\phi(t) = \sum_{j=0}^{k} \frac{E(X^j)}{j!} (it)^j + o(t^k)$, and so $\phi^{(k)}(0) = i^k E(X^k)$.

Theorem. If $X$ and $Y$ are independent, then $\phi_{X+Y}(t) = \phi_X(t) \phi_Y(t)$. Similarly, if $X_1, \dots, X_n$ are independent, then $\phi_{X_1 + \dots + X_n}(t) = \prod_{i=1}^{n} \phi_{X_i}(t)$.

Theorem. If $a, b \in \mathbb{R}$ and $Y = aX + b$, then $\phi_Y(t) = e^{itb} \phi_X(at)$.

Proof ▸

$$\phi_Y(t) = E\left(e^{it(aX+b)}\right) = E\left(e^{itb} e^{i(at)X}\right) = e^{itb} E\left(e^{i(at)X}\right) = e^{itb} \phi_X(at).$$

Theorem. Random variables $X$ and $Y$ are independent if and only if the joint characteristic function $\phi_{X,Y}(s, t) = E(e^{isX + itY})$ factorizes as $\phi_{X,Y}(s, t) = \phi_X(s) \phi_Y(t)$ for all $s$ and $t$.

Definition. We say that the sequence $F_1, F_2, \dots$ of distribution functions converges to the distribution function $F$, written $F_n \to F$, if $F(x) = \lim_{n \to \infty} F_n(x)$ at each point $x$ where $F$ is continuous.

Theorem. [Continuity theorem] Suppose that $F_1, F_2, \dots$ is a sequence of distribution functions with corresponding characteristic functions $\phi_1, \phi_2, \dots$.

  1. If $F_n \to F$ for some distribution function $F$ with characteristic function $\phi$, then $\phi_n(t) \to \phi(t)$ for all $t$.
  2. Conversely, if $\phi(t) = \lim_{n \to \infty} \phi_n(t)$ exists and is continuous at $t = 0$, then $\phi$ is the characteristic function of some distribution function $F$, and $F_n \to F$.

Central limit theorem

Definition. If $X, X_1, X_2, \dots$ is a sequence of random variables with respective distribution functions $F, F_1, F_2, \dots$, we say that $X_n$ converges in distribution to $X$, written $X_n \xrightarrow{D} X$, if $F_n \to F$ as $n \to \infty$.

Theorem. [Central limit theorem] Let $X_1, X_2, \dots$ be a sequence of independent identically distributed random variables with finite mean $\mu$ and finite nonzero variance $\sigma^2$, and let $S_n = X_1 + X_2 + \dots + X_n$. Then $$\frac{S_n - n\mu}{\sqrt{n\sigma^2}} \xrightarrow{D} N(0, 1) \quad \text{as } n \to \infty.$$

Proof ▸

First, write $Y_i = \frac{X_i - \mu}{\sigma}$, and let $\phi_Y$ be the characteristic function of the $Y_i$. Since $E(Y_i) = 0$ and $E(Y_i^2) = 1$, we have $\phi_Y(t) = 1 - \frac{1}{2} t^2 + o(t^2)$. The $Y_i$ are i.i.d., so the characteristic function of $\sum_{i=1}^{n} Y_i$ is $\left[\phi_Y(t)\right]^n$. Hence the characteristic function $\psi_n$ of $$U_n = \frac{S_n - n\mu}{\sqrt{n\sigma^2}} = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} Y_i$$ satisfies $$\psi_n(t) = \left\{ \phi_Y(t/\sqrt{n}) \right\}^n = \left\{ 1 - \frac{t^2}{2n} + o\!\left(\frac{t^2}{n}\right) \right\}^n \to e^{-\frac{1}{2} t^2} \quad \text{as } n \to \infty,$$ where we used $\lim_{n \to \infty} \left(1 + \frac{a_n}{n}\right)^n = e^a$ whenever $a_n \to a$. The last function is the characteristic function of the $N(0,1)$ distribution, and an application of the continuity theorem completes the proof. ◼

Corollary. For large $n$, $Q_n = \frac{1}{n} S_n$ is approximately $N\!\left(\mu, \frac{\sigma^2}{n}\right)$, and $S_n$ is approximately $N(n\mu, n\sigma^2)$. The sampling error is proportional to $\frac{1}{\sqrt{n}}$.
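A simulation sketch of the theorem (assumptions made here: $X_i \sim \text{Uniform}(0,1)$, so $\mu = \frac{1}{2}$ and $\sigma^2 = \frac{1}{12}$; $n$ and the number of replications are arbitrary choices):

```python
import math
import random
import statistics

# CLT sketch: standardized sums of Uniform(0,1) variables should look N(0,1),
# e.g. about 68.3% of them should fall within one standard deviation of 0.
random.seed(3)
n, reps = 200, 20_000
mu, sigma = 0.5, math.sqrt(1 / 12)

def standardized_sum():
    s = sum(random.random() for _ in range(n))
    return (s - n * mu) / (sigma * math.sqrt(n))

u = [standardized_sum() for _ in range(reps)]
within_one_sd = sum(abs(x) <= 1 for x in u) / reps
print(statistics.mean(u), statistics.stdev(u), within_one_sd)
# mean ~ 0, stdev ~ 1, within_one_sd ~ 0.683 for a standard normal
```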

There is a generalization: even if the $X_i$ are independent but not identically distributed, a central limit theorem still holds under suitable conditions (for example, the Lindeberg condition).