Law of large numbers
Note that in this section we are dealing with random variables that are independent and identically distributed, also written as i.i.d. The law of large numbers studies the convergence of the average of a large number of i.i.d. random variables.
We first prove the following important lemma.
Lemma. [Chebyshev Inequality]
Let $X$ be a random variable with $E[X] = \mu$ and $\operatorname{Var}(X) = \sigma^2 < \infty$. Then for any $\epsilon > 0$, we have
$$P(|X - \mu| \geq \epsilon) \leq \frac{\sigma^2}{\epsilon^2}.$$
In other words, we have
$$P(|X - \mu| < \epsilon) \geq 1 - \frac{\sigma^2}{\epsilon^2}.$$
Proof ▸
We assume $X$ is a discrete random variable; the argument extends easily to the case where $X$ is continuous. We write $\mu = E[X]$ and denote by $p(x)$ the p.m.f. of $X$.
We first expand the LHS of the claimed inequality and obtain
$$P(|X - \mu| \geq \epsilon) = \sum_{x:\, |x - \mu| \geq \epsilon} p(x).$$
On the other hand, we have
$$\sigma^2 = \sum_{x} (x - \mu)^2 p(x) \geq \sum_{x:\, |x - \mu| \geq \epsilon} (x - \mu)^2 p(x) \geq \epsilon^2 \sum_{x:\, |x - \mu| \geq \epsilon} p(x).$$
Therefore, we have
$$P(|X - \mu| \geq \epsilon) = \sum_{x:\, |x - \mu| \geq \epsilon} p(x) \leq \frac{\sigma^2}{\epsilon^2}.$$
◼
Chebyshev's inequality is the best possible inequality in the sense that, for any $\epsilon > 0$, it is possible to give an example of a random variable for which Chebyshev's inequality is in fact an equality.
Example.
Suppose we have a random variable $X$ such that, for a given $\epsilon > 0$, $P(X = \epsilon) = P(X = -\epsilon) = \tfrac{1}{2}$. Clearly, $\mu = E[X] = 0$ and $\sigma^2 = \operatorname{Var}(X) = \epsilon^2$. Therefore,
$$P(|X - \mu| \geq \epsilon) = 1.$$
Also note that $\sigma^2 / \epsilon^2 = 1$. The equality sign of Chebyshev's inequality holds; we cannot get a better result in general.
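As a quick numerical illustration (not part of the original notes), the following Python sketch estimates the tail probability $P(|X - \mu| \geq \epsilon)$ by Monte Carlo and compares it with the Chebyshev bound $\sigma^2 / \epsilon^2$. The exponential distribution with rate $1$ is an arbitrary choice with $\mu = \sigma^2 = 1$.

```python
import numpy as np

# Monte Carlo check of Chebyshev's inequality for an exponential(1) variable,
# whose mean and variance are both 1 (an illustrative choice, not from the notes).
rng = np.random.default_rng(0)
samples = rng.exponential(scale=1.0, size=1_000_000)
mu, sigma2 = 1.0, 1.0

for eps in [0.5, 1.0, 2.0, 3.0]:
    empirical = np.mean(np.abs(samples - mu) >= eps)
    bound = sigma2 / eps**2
    print(f"eps={eps}: P(|X - mu| >= eps) ~ {empirical:.4f}  <=  bound {bound:.4f}")
```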
Theorem 1. [Law of large numbers]
Consider a sequence of i.i.d. random variables $X_1, X_2, \dots$ with finite mean and variance. Denote $\mu = E[X_i]$ and $\sigma^2 = \operatorname{Var}(X_i)$. Define
$$\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i,$$
then for any $\epsilon > 0$,
$$\lim_{n \to \infty} P(|\bar{X}_n - \mu| \geq \epsilon) = 0,$$
or
$$\lim_{n \to \infty} P(|\bar{X}_n - \mu| < \epsilon) = 1.$$
This means $\bar{X}_n$ converges to $\mu$ in probability.
Proof ▸
We notice that
$$E[\bar{X}_n] = \frac{1}{n} \sum_{i=1}^{n} E[X_i] = \mu,$$
which shows that the expectation of $\bar{X}_n$ is the same as the expectation of each $X_i$. We also have, by independence,
$$\operatorname{Var}(\bar{X}_n) = \frac{1}{n^2} \sum_{i=1}^{n} \operatorname{Var}(X_i) = \frac{\sigma^2}{n}.$$
Using Chebyshev's inequality, for any $\epsilon > 0$, we have
$$P(|\bar{X}_n - \mu| \geq \epsilon) \leq \frac{\sigma^2}{n \epsilon^2}.$$
Therefore,
$$\lim_{n \to \infty} P(|\bar{X}_n - \mu| \geq \epsilon) \leq \lim_{n \to \infty} \frac{\sigma^2}{n \epsilon^2} = 0.$$
Since the probability is nonnegative, we must have
$$\lim_{n \to \infty} P(|\bar{X}_n - \mu| \geq \epsilon) = 0.$$
This finishes the proof.
◼
This result is significant from the point of view of frequentist statistics. Recall that the probability of an event $A$ is motivated by $P(A) \approx n_A / n$, where $n_A$ and $n$ are the number of occurrences of $A$ and the total number of experiments respectively. Now let $X_i$ be the indicator of the event $A$ in the $i$-th experiment. Since the experiments are independent, we are actually performing a series of Bernoulli trials and each $X_i$ is a simple Bernoulli variable. Then we can write $n_A / n = \frac{1}{n} \sum_{i=1}^{n} X_i = \bar{X}_n$. Now
note that $E[X_i] = P(A)$ and $\operatorname{Var}(X_i) = P(A)(1 - P(A)) < \infty$. Therefore, by the law of large numbers, the relative frequency $n_A / n$ converges to $P(A)$ in probability.
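A minimal simulation of this frequentist picture (the bias $P(A) = 0.3$ and the number of trials are arbitrary choices, not from the notes): the running relative frequency $n_A / n$ of a biased coin approaches $P(A)$ as $n$ grows.

```python
import numpy as np

# Simulate n_A / n for a biased coin with P(A) = 0.3 and watch it approach P(A),
# illustrating the law of large numbers for Bernoulli (indicator) variables.
rng = np.random.default_rng(1)
p = 0.3                                  # hypothetical P(A)
flips = rng.random(100_000) < p          # X_i = indicator of A in the i-th trial
running_mean = np.cumsum(flips) / np.arange(1, flips.size + 1)

for n in [10, 100, 1_000, 10_000, 100_000]:
    print(f"n={n:>6}:  n_A/n = {running_mean[n - 1]:.4f}   (P(A) = {p})")
```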
Example. [Cauchy distribution]
The Cauchy distribution is given by the density
$$f(x) = \frac{1}{\pi (1 + x^2)}, \qquad x \in \mathbb{R},$$
where $1/\pi$ is the normalization constant. Let $X_1, X_2, \dots$ be i.i.d. random variables with the Cauchy distribution. Note that although the Cauchy density looks much like the normal density, $X_i$ does not have a variance. This is because the Cauchy distribution has a heavy tail: $f(x)$ decays only like $1/x^2$ as $|x| \to \infty$, so $\int x^2 f(x)\,dx$ diverges. The density is symmetric about $0$, so $0$ is the natural candidate for a limiting value. So the question is: does $\bar{X}_n$ converge to $0$? The answer is negative; in fact $\bar{X}_n$ has the same Cauchy distribution as $X_1$ for every $n$. This example shows that if the variance is not finite, the law of large numbers may fail.
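The failure is easy to see numerically. The sketch below (sample sizes and the number of repetitions are arbitrary choices) draws several independent Cauchy sample means for increasing $n$; they keep fluctuating wildly instead of settling near $0$.

```python
import numpy as np

# Sample means of i.i.d. standard Cauchy variables do NOT settle down:
# X_bar_n is itself standard Cauchy for every n, so increasing n does not help.
rng = np.random.default_rng(2)
for n in [10, 1_000, 100_000]:
    means = [rng.standard_cauchy(n).mean() for _ in range(5)]
    print(f"n={n:>6}:", np.round(means, 3))
```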
Conditional distributions and conditional expectation
(This section is a supplement to the lecture.)
Definition 1.
The conditional distribution function of $Y$ given $X = x$ is the function $F_{Y|X}(\cdot \mid x)$ given by
$$F_{Y|X}(y \mid x) = \int_{-\infty}^{y} \frac{f(x, v)}{f_X(x)} \, dv$$
for any $x$ such that $f_X(x) > 0$. It is sometimes denoted $P(Y \leq y \mid X = x)$.
Remembering that distribution functions are integrals of density functions, we are led to the following definition.
Definition 2.
The conditional density function of $F_{Y|X}$, written $f_{Y|X}$, is given by
$$f_{Y|X}(y \mid x) = \frac{f(x, y)}{f_X(x)}$$
for any $x$ such that $f_X(x) > 0$.
Theorem 2.
The conditional expectation $\psi(X) = E(Y \mid X)$ satisfies
$$E(\psi(X)) = E(Y).$$
Theorem 3.
The conditional expectation $\psi(X) = E(Y \mid X)$ satisfies
$$E(\psi(X) g(X)) = E(Y g(X))$$
for any function $g$ for which both expectations exist.
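Theorems 2 and 3 are easy to check by Monte Carlo. The following sketch uses a hypothetical pair (not from the notes): $X \sim N(0, 1)$ and, given $X = x$, $Y \sim N(2x + 1, 1)$, so that $\psi(X) = E(Y \mid X) = 2X + 1$, and takes $g(x) = x^2$.

```python
import numpy as np

# Monte Carlo check of Theorems 2 and 3 for a hypothetical pair:
# X ~ N(0, 1) and, given X = x, Y ~ N(2x + 1, 1), so psi(X) = E(Y | X) = 2X + 1.
rng = np.random.default_rng(3)
x = rng.normal(size=1_000_000)
y = rng.normal(loc=2 * x + 1, scale=1.0)
psi = 2 * x + 1

print("E[Y]            ~", y.mean())            # Theorem 2: should match E[psi(X)] = 1
print("E[psi(X)]       ~", psi.mean())
print("E[Y g(X)]       ~", np.mean(y * x**2))   # Theorem 3 with g(x) = x^2; both ~ 1
print("E[psi(X) g(X)]  ~", np.mean(psi * x**2))
```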
Functions of continuous random variables
Let $X$ be a random variable with density function $f$, and let $g : \mathbb{R} \to \mathbb{R}$ be a sufficiently nice
function. Then $Y = g(X)$ is a random variable also. In order to calculate the distribution of $Y$, we proceed thus:
$$P(Y \leq y) = P(g(X) \leq y) = P\big(X \in g^{-1}((-\infty, y])\big).$$
The inverse image $g^{-1}$ is defined as follows. If $A \subseteq \mathbb{R}$ then $g^{-1}(A) = \{x \in \mathbb{R} : g(x) \in A\}$.
Example.
Let $Y = aX + b$ for fixed $a > 0$ and $b \in \mathbb{R}$. Then $Y$ has distribution function
$$P(Y \leq y) = P(aX + b \leq y) = P\!\left(X \leq \frac{y - b}{a}\right) = F_X\!\left(\frac{y - b}{a}\right).$$
Differentiate to obtain $f_Y(y) = \dfrac{1}{a} f_X\!\left(\dfrac{y - b}{a}\right)$.
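A quick empirical check of this density formula (the choices $X \sim N(0,1)$, $a = 2$, $b = 1$ are arbitrary, not from the notes): compare a histogram estimate of the density of $Y = aX + b$ with $\frac{1}{a} f_X\!\left(\frac{y-b}{a}\right)$ at a few points.

```python
import numpy as np
from scipy import stats

# Empirical check of f_Y(y) = (1/a) f_X((y - b) / a) for Y = aX + b,
# with X standard normal and the (arbitrary) choices a = 2, b = 1.
rng = np.random.default_rng(4)
a, b = 2.0, 1.0
x = rng.normal(size=500_000)
y = a * x + b

ys = np.array([-3.0, 0.0, 1.0, 4.0])
formula = stats.norm.pdf((ys - b) / a) / a
# Crude density estimate from a histogram, interpolated at the chosen points.
hist, edges = np.histogram(y, bins=200, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
empirical = np.interp(ys, centers, hist)
print("formula  :", np.round(formula, 4))
print("empirical:", np.round(empirical, 4))
```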
More generally, if $X_1$ and $X_2$ have joint density function $f$, and $y_1, y_2$ are two functions mapping $\mathbb{R}^2 \to \mathbb{R}$, then we can use the Jacobian to find the joint density function of the pair $Y_1 = y_1(X_1, X_2)$, $Y_2 = y_2(X_1, X_2)$.
Let $y_1 = y_1(x_1, x_2)$, $y_2 = y_2(x_1, x_2)$
be a one-one mapping $T$ taking some domain $A \subseteq \mathbb{R}^2$ onto some
range $B \subseteq \mathbb{R}^2$. The transformation can be inverted as $x_1 = x_1(y_1, y_2)$, $x_2 = x_2(y_1, y_2)$; the Jacobian of this inverse is defined to be the determinant
$$J = \begin{vmatrix} \dfrac{\partial x_1}{\partial y_1} & \dfrac{\partial x_1}{\partial y_2} \\[2mm] \dfrac{\partial x_2}{\partial y_1} & \dfrac{\partial x_2}{\partial y_2} \end{vmatrix},$$
which we express as a function $J(y_1, y_2)$. We assume the partial derivatives are continuous.
Theorem 4.
If $g : \mathbb{R}^2 \to \mathbb{R}$, and $T$ maps the set $A \subseteq \mathbb{R}^2$ onto the set $B \subseteq \mathbb{R}^2$, then
$$\iint_{A} g(x_1, x_2) \, dx_1 \, dx_2 = \iint_{B} g\big(x_1(y_1, y_2),\, x_2(y_1, y_2)\big)\, |J(y_1, y_2)| \, dy_1 \, dy_2.$$
Corollary 4.1.
If $X_1, X_2$ have joint density function $f$, then the pair $Y_1, Y_2$ given by $(Y_1, Y_2) = T(X_1, X_2)$ has joint density function
$$f_{Y_1, Y_2}(y_1, y_2) = \begin{cases} f\big(x_1(y_1, y_2),\, x_2(y_1, y_2)\big)\, |J(y_1, y_2)| & \text{if } (y_1, y_2) \in B, \\ 0 & \text{otherwise.} \end{cases}$$
A similar result holds for mappings of $\mathbb{R}^n$ into $\mathbb{R}^n$. This technique is sometimes referred to as the method of change of variables.
Example.
Suppose that
$$X_1 = a Y_1 + b Y_2, \qquad X_2 = c Y_1 + d Y_2,$$
where $ad - bc \neq 0$. Check that
$$f_{Y_1, Y_2}(y_1, y_2) = |ad - bc| \, f_{X_1, X_2}(a y_1 + b y_2,\; c y_1 + d y_2).$$
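The Jacobian factor $ad - bc$ in this example can be confirmed symbolically; the short sketch below computes the determinant of the matrix of partial derivatives of the inverse map using sympy.

```python
import sympy as sp

# Symbolic check of the Jacobian for the linear example: x1 = a*y1 + b*y2,
# x2 = c*y1 + d*y2.  The Jacobian of this inverse map should equal a*d - b*c.
y1, y2, a, b, c, d = sp.symbols('y1 y2 a b c d')
x1 = a * y1 + b * y2
x2 = c * y1 + d * y2
J = sp.Matrix([[sp.diff(x1, y1), sp.diff(x1, y2)],
               [sp.diff(x2, y1), sp.diff(x2, y2)]]).det()
print(J)   # prints a*d - b*c
```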
Multivariate normal distribution
Definition and properties
Definition 3.
The vector $X = (X_1, X_2, \dots, X_n)$ has the **multivariate normal distribution** (or **multinormal distribution**), written $N(\mu, V)$, if its joint density function is
$$f(x) = \frac{1}{\sqrt{(2\pi)^n |V|}} \exp\!\left(-\tfrac{1}{2} (x - \mu) V^{-1} (x - \mu)^{T}\right), \qquad x \in \mathbb{R}^n,$$
where $V$ is a positive definite symmetric matrix.
Theorem 5.
If $X = (X_1, \dots, X_n)$ is $N(\mu, V)$, then
- $E(X) = \mu$, which is to say that $E(X_i) = \mu_i$ for all $i$,
- $V$ is called the covariance matrix, because $V_{ij} = \operatorname{cov}(X_i, X_j)$.
Theorem 6.
If $X$ is $N(\mu, V)$ and $Y = (Y_1, \dots, Y_m)$ is given by $Y = XD$ for some $n \times m$ matrix $D$ of rank $m \leq n$, then $Y$ is $N(\mu D, D^{T} V D)$.
Definition 4.
The vector $X = (X_1, X_2, \dots, X_n)$ of random variables is said to have the
**multivariate normal distribution** whenever, for all $a = (a_1, \dots, a_n) \in \mathbb{R}^n$, the linear combination $X a^{T} = a_1 X_1 + \dots + a_n X_n$ has a normal distribution.
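To make the definition concrete, the following sketch samples from a bivariate normal with a hypothetical mean vector and covariance matrix (arbitrary choices, not from the notes) and checks that the sample mean and sample covariance recover $\mu$ and $V$.

```python
import numpy as np

# Sample from a bivariate normal with a hypothetical mean and covariance matrix,
# then check that the sample mean and sample covariance recover mu and V.
rng = np.random.default_rng(5)
mu = np.array([1.0, -2.0])
V = np.array([[2.0, 0.6],
              [0.6, 1.0]])          # positive definite, symmetric
x = rng.multivariate_normal(mu, V, size=200_000)

print("sample mean      :", np.round(x.mean(axis=0), 3))
print("sample covariance:\n", np.round(np.cov(x, rowvar=False), 3))
```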
Distributions arising from the normal distribution
Suppose that $X_1, X_2, \dots, X_n$ is a collection
of independent $N(\mu, \sigma^2)$ variables for some fixed but unknown values of $\mu$ and $\sigma^2$. We can use them to estimate $\mu$ and $\sigma^2$.
Definition 5.
The **sample mean** of a sequence $X_1, X_2, \dots, X_n$ of random variables is
$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i.$$
It is usually used as a guess at the value of $\mu$.
Definition 6.
The **sample variance** of a sequence $X_1, X_2, \dots, X_n$ of random variables is
$$S^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (X_i - \bar{X})^2.$$
It is usually used as a guess at the value of $\sigma^2$.
Theorem 7.
If $X_1, X_2, \dots, X_n$ are independent $N(\mu, \sigma^2)$ variables, then $\bar{X}$ and $S^2$ are independent. We have that $\bar{X}$ is $N(\mu, \sigma^2 / n)$ and $(n - 1) S^2 / \sigma^2$ is $\chi^2(n - 1)$.
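A minimal simulation of these facts (the values of $\mu$, $\sigma$, $n$ and the number of repetitions are arbitrary choices): repeatedly draw samples of size $n$, compute $\bar{X}$ and $S^2$ for each, and compare their empirical moments with the theorem; the near-zero correlation between $\bar{X}$ and $S^2$ is consistent with their independence.

```python
import numpy as np

# For i.i.d. N(mu, sigma^2) samples, check E[X_bar] = mu, Var(X_bar) = sigma^2 / n,
# and E[S^2] = sigma^2 (values of mu, sigma, n, reps are arbitrary choices).
rng = np.random.default_rng(6)
mu, sigma, n, reps = 3.0, 2.0, 25, 100_000
data = rng.normal(mu, sigma, size=(reps, n))
xbar = data.mean(axis=1)
s2 = data.var(axis=1, ddof=1)        # ddof=1 gives the (n - 1) denominator

print("mean of X_bar :", xbar.mean(), " (expect", mu, ")")
print("var  of X_bar :", xbar.var(), " (expect", sigma**2 / n, ")")
print("mean of S^2   :", s2.mean(), " (expect", sigma**2, ")")
print("corr(X_bar,S2):", np.corrcoef(xbar, s2)[0, 1], " (expect ~ 0)")
```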
Definition 7.
If $Z_1, \dots, Z_k$ are independent standard normal random variables, then the sum of their squares,
$$Q = \sum_{i=1}^{k} Z_i^2,$$
is distributed according to the $\chi^2$ distribution with $k$ **degrees of freedom**. This is usually denoted as
$$Q \sim \chi^2(k).$$
The probability density function (p.d.f.) of the $\chi^2(k)$ distribution is
$$f(x; k) = \frac{1}{2^{k/2} \Gamma(k/2)} \, x^{k/2 - 1} e^{-x/2}, \qquad x > 0,$$
and $f(x; k) = 0$ for $x \leq 0$.
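The definition can be checked directly by simulation (the choice $k = 4$ and the evaluation points are arbitrary): sum the squares of $k$ standard normals and compare the empirical density against the $\chi^2(k)$ p.d.f.

```python
import numpy as np
from scipy import stats

# Sum of squares of k independent standard normals, compared with the chi^2(k)
# density at a few points (k = 4 is an arbitrary choice).
rng = np.random.default_rng(7)
k = 4
q = (rng.normal(size=(500_000, k)) ** 2).sum(axis=1)

print("sample mean of Q:", q.mean(), " (a chi^2(k) variable has mean k =", k, ")")
xs = np.array([1.0, 3.0, 6.0])
hist, edges = np.histogram(q, bins=300, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
print("empirical density:", np.round(np.interp(xs, centers, hist), 4))
print("chi2(k) p.d.f.   :", np.round(stats.chi2.pdf(xs, df=k), 4))
```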
Sampling from a distribution
A basic way of generating a random variable with a given distribution function is to use the following theorem.
Theorem 8. [Inverse transform technique]
Let $F$ be a distribution function, and let $U$ be uniformly distributed on the interval $[0, 1]$.
- If $F$ is a continuous function, the random variable $X = F^{-1}(U)$ has distribution function $F$.
- Let $F$ be the distribution function of a random variable taking non-negative integer values. The random variable $X$ given by
$$X = k \quad \text{if and only if} \quad F(k - 1) < U \leq F(k)$$
has distribution function $F$.
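Both parts of the theorem are easy to implement. The sketch below uses the exponential distribution with rate $1$ for the continuous case (where $F(x) = 1 - e^{-x}$, so $F^{-1}(u) = -\log(1 - u)$) and a hypothetical discrete distribution with $P(X = k) = 0.5^{\,k+1}$ for the integer-valued case; both choices are illustrative, not from the notes.

```python
import numpy as np

# Inverse transform sampling, following Theorem 8.
rng = np.random.default_rng(8)
u = rng.random(200_000)

# Continuous case: F(x) = 1 - exp(-x) (exponential(1)), so F^{-1}(u) = -log(1 - u).
x_exp = -np.log(1.0 - u)
print("exponential(1) sample mean:", x_exp.mean(), " (expect 1)")

# Discrete case: a hypothetical distribution on {0, 1, 2, ...} with
# P(X = k) = 0.5^(k+1).  X = k exactly when F(k - 1) < U <= F(k),
# i.e. X is the smallest k with F(k) >= U.
pk = 0.5 ** (np.arange(20) + 1)
F = np.cumsum(pk)
x_disc = np.searchsorted(F, u)       # smallest k with F[k] >= u
print("P(X = 0) ~", np.mean(x_disc == 0), " (expect 0.5)")
```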