Expectation of discrete random variables
The intuition behind expectation is the average value of an experiment. Suppose we repeat an experiment $N$ times and the outcome $x$ occurs $N(x)$ times. The probability of each possible outcome can be approximated by
$$P(X = x) \approx \frac{N(x)}{N}.$$
Then the average outcome is
$$\frac{1}{N} \sum_x x\, N(x) \approx \sum_x x\, P(X = x).$$
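As a quick illustration of this frequency interpretation (my own sketch, not part of the notes), the empirical average of repeated fair die rolls approaches the expected value $3.5$:

```python
import random

# Empirical average of a fair six-sided die over many repetitions.
# The theoretical expectation is (1 + 2 + ... + 6) / 6 = 3.5.
N = 100_000
outcomes = [random.randint(1, 6) for _ in range(N)]
print(sum(outcomes) / N)  # close to 3.5 for large N
```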
Definition 1.
The mean value, or expectation, or expected value of the random variable $X$ with mass function $f$ is defined to be
$$E(X) = \sum_{x} x f(x),$$
whenever this sum is absolutely convergent.
Lemma.
If $X$ has mass function $f$ and $g : \mathbb{R} \to \mathbb{R}$, then
$$E(g(X)) = \sum_{x} g(x) f(x),$$
whenever this sum is absolutely convergent.
Example.
If $X$ is a random variable with mass function $f$, and $g(x) = x^2$, then
$$E(X^2) = \sum_{x} x^2 f(x).$$
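To make the lemma concrete, here is a small sketch (the mass function is a hypothetical example of mine, not from the notes) computing $E(X)$ and $E(X^2)$ directly from $f$:

```python
# Hypothetical mass function of a discrete random variable X.
f = {-1: 0.2, 0: 0.5, 2: 0.3}

E_X  = sum(x * p for x, p in f.items())     # E(X)   = sum_x x f(x)
E_X2 = sum(x**2 * p for x, p in f.items())  # E(X^2) = sum_x x^2 f(x), by the lemma
print(E_X, E_X2)  # 0.4 1.4
```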
Definition 2.
If $k$ is a positive integer, the $k$th moment $m_k$ of $X$ is defined to be $m_k = E(X^k)$.
The $k$th central moment $\sigma_k$ is defined as $\sigma_k = E\big((X - m_1)^k\big)$.
The two moments of most use are $m_1 = E(X)$ and $\sigma_2 = E\big((X - EX)^2\big)$, called the mean (or expectation) and variance of $X$. These two quantities are measures of the mean and dispersion of $X$; that is, $m_1$ is the average value of $X$, and $\sigma_2$ measures the amount by which $X$ tends to deviate from this average. The mean is often denoted $\mu$, and the variance of $X$ is often denoted $\operatorname{var}(X)$. The positive square root $\sigma = \sqrt{\operatorname{var}(X)}$ is called the standard deviation, and in this notation $\sigma_2 = \sigma^2$.
The central moments $\sigma_k$ can be expressed in terms of the ordinary moments $m_1, m_2, \dots$. For example, $\sigma_1 = 0$, and
$$\sigma_2 = \sum_x (x - m_1)^2 f(x) = \sum_x x^2 f(x) - 2 m_1 \sum_x x f(x) + m_1^2,$$
which may be written as
$$\sigma_2 = m_2 - m_1^2.$$
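A quick numerical check of the identity $\sigma_2 = m_2 - m_1^2$ (an illustrative sketch with an assumed mass function, not from the notes):

```python
# Hypothetical mass function used only to check sigma_2 = m_2 - m_1^2.
f = {-1: 0.2, 0: 0.5, 2: 0.3}

m1 = sum(x * p for x, p in f.items())                # first moment
m2 = sum(x**2 * p for x, p in f.items())             # second moment
sigma2 = sum((x - m1)**2 * p for x, p in f.items())  # second central moment
print(sigma2, m2 - m1**2)  # both 1.24
```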
Example. [Binomial variables]
Let $X$ be a random variable with the binomial distribution $\operatorname{bin}(n, p)$. The p.m.f. is
$$f(k) = \binom{n}{k} p^k q^{n-k}, \qquad k = 0, 1, \dots, n,$$
where $q = 1 - p$. The expectation of $X$ is
$$E(X) = \sum_{k=0}^{n} k \binom{n}{k} p^k q^{n-k}.$$
We use the following algebraic identity (the binomial theorem) to compute $E(X)$:
$$\sum_{k=0}^{n} \binom{n}{k} x^k = (1 + x)^n.$$
Differentiating it and multiplying by $x$, we obtain
$$\sum_{k=0}^{n} k \binom{n}{k} x^k = n x (1 + x)^{n-1}.$$
We substitute $x = p/q$ and multiply through by $q^n$ to obtain $E(X) = np$. A similar argument shows that the variance of $X$ is given by $\operatorname{var}(X) = npq$.
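These closed forms can be checked directly by summing the p.m.f. term by term; the sketch below (my own check, not part of the notes) does exactly that:

```python
from math import comb

def binom_mean_var(n, p):
    """Compute E(X) and var(X) for X ~ bin(n, p) directly from the p.m.f."""
    q = 1 - p
    pmf = [comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]
    mean = sum(k * pk for k, pk in enumerate(pmf))
    var = sum((k - mean)**2 * pk for k, pk in enumerate(pmf))
    return mean, var

print(binom_mean_var(10, 0.3))  # approximately (3.0, 2.1), i.e. np and npq
```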
We can think of the process of calculating expectations as a linear operator on the space of random variables.
Theorem 2.
The expectation operator $E$ has the following properties:
- if $X \ge 0$, then $E(X) \ge 0$,
- if $a, b \in \mathbb{R}$, then $E(aX + bY) = a E(X) + b E(Y)$,
- the random variable $1$, taking the value $1$ always, has expectation $E(1) = 1$.
Proof ▸
We only prove the second property, which is also called the linearity property.
We must use the joint p.m.f. of $X$ and $Y$ to compute the expectation:
$$\begin{aligned}
E(aX + bY) &= \sum_{x, y} (ax + by)\, P(X = x, Y = y) \\
&= a \sum_x x \sum_y P(X = x, Y = y) + b \sum_y y \sum_x P(X = x, Y = y) \\
&= a \sum_x x f_X(x) + b \sum_y y f_Y(y) = a E(X) + b E(Y),
\end{aligned}$$
where $f_X$ and $f_Y$ are the marginal p.m.f.s of $X$ and $Y$ respectively.
◼
Lemma.
If $X$ and $Y$ are independent, then $E(XY) = E(X) E(Y)$.
Proof ▸
If $X$ and $Y$ are independent, $P(X = x, Y = y) = P(X = x)\, P(Y = y)$. Then
$$E(XY) = \sum_{x, y} x y\, P(X = x, Y = y) = \sum_x x\, P(X = x) \sum_y y\, P(Y = y) = E(X) E(Y).$$
◼
Definition 3.
$X$ and $Y$ are called **uncorrelated** if $E(XY) = E(X) E(Y)$.
Theorem 3.
For random variables $X$ and $Y$,
- $\operatorname{var}(aX) = a^2 \operatorname{var}(X)$ for $a \in \mathbb{R}$,
- $\operatorname{var}(X + Y) = \operatorname{var}(X) + \operatorname{var}(Y)$ if and only if $X$ and $Y$ are uncorrelated.
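The sketch below (my own check with assumed marginals, not part of the notes) builds two independent discrete variables from a product joint p.m.f. and verifies both $E(XY) = E(X)E(Y)$ and $\operatorname{var}(X + Y) = \operatorname{var}(X) + \operatorname{var}(Y)$:

```python
# Hypothetical marginal mass functions of two independent variables.
fX = {0: 0.5, 1: 0.5}
fY = {1: 0.3, 2: 0.7}

# For independent X and Y the joint p.m.f. factorizes: f(x, y) = fX(x) * fY(y).
joint = {(x, y): px * py for x, px in fX.items() for y, py in fY.items()}

E = lambda g: sum(g(x, y) * p for (x, y), p in joint.items())
var = lambda g, m: E(lambda x, y: (g(x, y) - m) ** 2)

EX, EY = E(lambda x, y: x), E(lambda x, y: y)
print(E(lambda x, y: x * y), EX * EY)                      # equal: E(XY) = E(X)E(Y)
print(var(lambda x, y: x + y, EX + EY),
      var(lambda x, y: x, EX) + var(lambda x, y: y, EY))   # equal: variances add
```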
Sometimes the sum $\sum_x x f(x)$ does not converge absolutely, which means the mean of the distribution does not exist. Here is an example.
Example. [A distribution without a mean] Let $X$ have mass function
$$f(k) = \frac{C}{k^2}, \qquad k = \pm 1, \pm 2, \dots,$$
where $C$ is chosen so that $\sum_k f(k) = 1$. The sum $\sum_k k f(k) = C \sum_{k \ne 0} 1/k$ doesn't converge absolutely, because both the positive part and the negative part diverge.
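A numerical sketch (not from the notes) of this divergence: the truncated sums $\sum_{0 < |k| \le K} |k| f(k)$ grow like the harmonic series and never settle. Here $C = 3/\pi^2$, since $\sum_{k \ne 0} 1/k^2 = \pi^2/3$:

```python
from math import pi

C = 3 / pi**2  # normalizing constant so that sum over k = ±1, ±2, ... of C / k^2 is 1
for K in (10, 100, 1000, 10000):
    partial = sum(abs(k) * C / k**2 for k in range(-K, K + 1) if k != 0)
    print(K, round(partial, 3))  # grows roughly like 2 C ln K: no absolute convergence
```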
This example is a suitable point at which to note that we can base probability theory upon the expectation operator $E$ rather than upon the probability measure $P$. Roughly speaking, the way we proceed is to postulate axioms, such as (a)-(c) of the above Theorem, for a so-called "expectation operator" $E$ acting on a space of "random variables". The probability of an event $A$ can then be recaptured by defining $P(A) = E(I_A)$.
Recall the indicator function $I_A$ of a set $A$ is defined as
$$I_A(\omega) = \begin{cases} 1 & \text{if } \omega \in A, \\ 0 & \text{if } \omega \notin A. \end{cases}$$
In addition, we have $E(I_A) = 1 \cdot P(A) + 0 \cdot P(A^c) = P(A)$.
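A tiny simulation sketch (my own, not from the notes): the sample mean of an event's indicator approximates that event's probability, in line with $E(I_A) = P(A)$:

```python
import random

# Event A = "a fair die shows 5 or 6", so P(A) = 1/3.
N = 100_000
indicators = [1 if random.randint(1, 6) >= 5 else 0 for _ in range(N)]
print(sum(indicators) / N)  # approximately 1/3
```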
Dependence of discrete random variables
Definition 4.
The **joint distribution function** of $X$ and $Y$, where $X$ and $Y$ are discrete variables, is given by
$$F(x, y) = P(X \le x,\, Y \le y).$$
Their **joint mass function** $f : \mathbb{R}^2 \to [0, 1]$ is given by
$$f(x, y) = P(X = x,\, Y = y).$$
We write $F_{X,Y}$ and $f_{X,Y}$ when we need to stress the roles of $X$ and $Y$. We may think of the joint mass function in the following way. If $A_x = \{X = x\}$ and $B_y = \{Y = y\}$, then $f(x, y) = P(A_x \cap B_y)$.
Lemma.
The discrete random variables $X$ and $Y$ are **independent** if and only if
$$f_{X,Y}(x, y) = f_X(x) f_Y(y) \qquad \text{for all } x, y \in \mathbb{R}.$$
More generally, $X$ and $Y$ are independent if and only if $f_{X,Y}(x, y)$ can be **factorized as the product** $g(x) h(y)$ of a function of $x$ alone and a function of $y$ alone.
Definition 5.
The covariance of $X$ and $Y$ is
$$\operatorname{cov}(X, Y) = E\big((X - EX)(Y - EY)\big).$$
The correlation (coefficient) of $X$ and $Y$ is
$$\rho(X, Y) = \frac{\operatorname{cov}(X, Y)}{\sqrt{\operatorname{var}(X) \operatorname{var}(Y)}},$$
as long as the variances are non-zero.
Covariance itself is not a satisfactory measure of dependence because the scale of values which $\operatorname{cov}(X, Y)$ may take contains no points which are clearly interpretable in terms of the relationship between $X$ and $Y$.
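For concreteness, here is a sketch (with an assumed joint mass function of my own, not from the notes) computing the covariance and correlation directly from the definitions:

```python
from math import sqrt

# Hypothetical joint p.m.f. with positive dependence between X and Y.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

E = lambda g: sum(g(x, y) * p for (x, y), p in joint.items())
EX, EY = E(lambda x, y: x), E(lambda x, y: y)

cov  = E(lambda x, y: (x - EX) * (y - EY))
varX = E(lambda x, y: (x - EX) ** 2)
varY = E(lambda x, y: (y - EY) ** 2)
rho  = cov / sqrt(varX * varY)
print(cov, rho)  # 0.15 0.6
```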
Theorem 4. [Cauchy-Schwarz inequality] For random variables $X$ and $Y$,
$$E(XY)^2 \le E(X^2) E(Y^2),$$
with equality if and only if $P(aX = bY) = 1$ for some real $a$ and $b$, at least one of which is non-zero.
Proof ▸
For $a, b \in \mathbb{R}$, let $Z = aX - bY$. Then
$$0 \le E(Z^2) = a^2 E(X^2) - 2ab\, E(XY) + b^2 E(Y^2).$$
Thus the right-hand side is a quadratic in the variable $a$ with at most one real root. Its discriminant must be non-positive. That is to say, if $b \ne 0$,
$$E(XY)^2 - E(X^2) E(Y^2) \le 0.$$
The discriminant is zero if and only if the quadratic has a real root. This occurs if and only if
$$E\big((aX - bY)^2\big) = 0$$
for some $a$ and $b$.
◼
We define $\bar X = X - EX$ and $\bar Y = Y - EY$. Since all pairs of random variables satisfy the Cauchy-Schwarz inequality, so do $\bar X$ and $\bar Y$. Therefore,
$$\operatorname{cov}(X, Y)^2 = E(\bar X \bar Y)^2 \le E(\bar X^2) E(\bar Y^2) = \operatorname{var}(X) \operatorname{var}(Y).$$
Therefore,
$$\rho(X, Y)^2 \le 1,$$
which gives the following lemma.
Lemma.
The correlation coefficient satisfies $|\rho(X, Y)| \le 1$, with equality if and only if $P(aX + bY = c) = 1$ for some $a, b, c \in \mathbb{R}$.
Expectation of continuous random variables
Idea of translating expectation from discrete to continuous
Suppose we have a continuous random variable $X$ with probability density function $f$. We split the range of $X$ into small intervals $[x_i, x_i + \Delta x)$. Then $P(x_i \le X < x_i + \Delta x) \approx f(x_i)\, \Delta x$; that is, $P(x_i \le X < x_i + \Delta x) / \Delta x$ is an approximation of the probability density function. Therefore, the average value of $X$ is approximately
$$\sum_i x_i f(x_i)\, \Delta x,$$
which is a Riemann sum. We take the limit $\Delta x \to 0$ and get
$$E(X) = \int x f(x)\, dx.$$
Expectation
Definition 6.
The **expectation** of a continuous random variable $X$ with density function $f$ is given by
$$E(X) = \int_{-\infty}^{\infty} x f(x)\, dx,$$
whenever this integral exists.
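As a numerical sketch (not from the notes), the integral can be approximated by the same kind of Riemann sum used above; here for the exponential density $f(x) = \lambda e^{-\lambda x}$ on $[0, \infty)$, whose mean is $1/\lambda$ (the choice of density is mine, purely for illustration):

```python
from math import exp

lam = 2.0
f = lambda x: lam * exp(-lam * x)  # exponential density; E(X) = 1 / lam

dx = 1e-4
xs = (i * dx for i in range(int(50 / dx)))  # truncate the integral at x = 50
E_X = sum(x * f(x) * dx for x in xs)
print(E_X)  # approximately 0.5 = 1 / lam
```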
Theorem 5.
If $X$ and $g(X)$ are continuous random variables, then
$$E(g(X)) = \int_{-\infty}^{\infty} g(x) f_X(x)\, dx.$$
Definition 7.
The $k$th **moment** of a continuous variable $X$ is defined as
$$E(X^k) = \int_{-\infty}^{\infty} x^k f(x)\, dx,$$
whenever the integral converges.
Example. [Cauchy distribution] The random variable $X$ has the Cauchy distribution if it has density function
$$f(x) = \frac{1}{\pi (1 + x^2)}, \qquad x \in \mathbb{R}.$$
This distribution is notable for having no moments.
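A simulation sketch of the consequence (my own, not from the notes): running means of Cauchy samples never settle down, unlike samples from a distribution with a finite mean. A standard Cauchy variable can be generated as $\tan(\pi(U - \tfrac{1}{2}))$ for $U$ uniform on $(0, 1)$:

```python
import random
from math import tan, pi

random.seed(0)
samples = [tan(pi * (random.random() - 0.5)) for _ in range(10**6)]

for n in (10**3, 10**4, 10**5, 10**6):
    print(n, sum(samples[:n]) / n)  # the running mean keeps jumping around
```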
Dependence of continuous random variables
Definition 8.
The **joint distribution function** of $X$ and $Y$ is the function $F : \mathbb{R}^2 \to [0, 1]$ given by
$$F(x, y) = P(X \le x,\, Y \le y).$$
Definition 9.
The random variables $X$ and $Y$ are **(jointly) continuous** with **joint (probability) density function** $f : \mathbb{R}^2 \to [0, \infty)$ if
$$F(x, y) = \int_{-\infty}^{y} \int_{-\infty}^{x} f(u, v)\, du\, dv \qquad \text{for all } x, y \in \mathbb{R}.$$
If $F$ is sufficiently differentiable at the point $(x, y)$, then we usually specify
$$f(x, y) = \frac{\partial^2 F}{\partial x\, \partial y}(x, y).$$
Probabilities:
If $B$ is a sufficiently nice subset of $\mathbb{R}^2$, then
$$P((X, Y) \in B) = \iint_B f(x, y)\, dx\, dy.$$
Marginal distributions: The marginal distribution functions of $X$ and $Y$ are
$$F_X(x) = P(X \le x) = F(x, \infty), \qquad F_Y(y) = P(Y \le y) = F(\infty, y).$$
Marginal density functions of $X$ and $Y$:
$$f_X(x) = \int_{-\infty}^{\infty} f(x, y)\, dy, \qquad f_Y(y) = \int_{-\infty}^{\infty} f(x, y)\, dx.$$
Expectation: If $g : \mathbb{R}^2 \to \mathbb{R}$ is a sufficiently nice function, then
$$E(g(X, Y)) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x, y) f(x, y)\, dx\, dy;$$
in particular, setting $g(x, y) = ax + by$,
$$E(aX + bY) = a E(X) + b E(Y).$$
Independence: The random variables $X$ and $Y$ are independent if and only if
$$F(x, y) = F_X(x) F_Y(y) \qquad \text{for all } x, y \in \mathbb{R},$$
which, for continuous random variables, is equivalent to requiring that
$$f(x, y) = f_X(x) f_Y(y).$$
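A small numerical sketch (my own, not from the notes): for $X$ and $Y$ independent and uniform on $(0, 1)$, the joint density is $f(x, y) = 1$ on the unit square, and a double Riemann sum confirms $E(XY) = E(X)E(Y) = \tfrac{1}{4}$:

```python
# Midpoint Riemann sums over the unit square for independent uniform(0, 1) variables.
d = 1e-2
grid = [(i + 0.5) * d for i in range(int(1 / d))]

E_XY = sum(x * y * d * d for x in grid for y in grid)  # joint density is 1 on the square
E_X = sum(x * d for x in grid)                         # marginal density is 1 on (0, 1)
print(E_XY, E_X * E_X)  # both approximately 0.25
```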
Theorem 6. [Cauchy-Schwarz inequality] For any pair $X$, $Y$ of jointly continuous variables, we have that
$$E(XY)^2 \le E(X^2) E(Y^2),$$
with equality if and only if $P(aX = bY) = 1$ for some real $a$ and $b$, at least one of which is non-zero.