Recursive Least Squares

Recursive least squares (RLS) is an adaptive filtering algorithm that recursively finds the coefficients that minimize a weighted linear least-squares cost function relating to the input signals. Let $x_i \in \mathbb{R}^n$ and $y_i \in \mathbb{R}$ be the $i$-th input and output signals, which satisfy the following linear relationship:

$$y_i = x_i^T w + v_i,$$

where $v_i$ is random noise and $w \in \mathbb{R}^n$ is the parameter that we want to find. Assume we have $m$ signal pairs. We denote $Y_m = [y_1, \ldots, y_m] \in \mathbb{R}^m$, $X_m = [x_1, \ldots, x_m] \in \mathbb{R}^{n \times m}$, and $V_m = [v_1, \ldots, v_m] \in \mathbb{R}^m$. Then, we can write the compact form

$$Y_m = X_m^T w + V_m.$$

Note: some references denote $X_m \in \mathbb{R}^{m \times n}$ so that $Y_m = X_m w + V_m$. We use our notation here.

Our objective is to find an estimate of the parameter $w$ that minimizes the least-squares error over the $m$ signal pairs. Define the error $e_i = y_i - x_i^T w$. The objective is

$$\min_w \sum_{i=1}^{m} \|e_i\|_2^2 = \min_w \|Y_m - X_m^T w\|_2^2.$$

The solution to the above optimization problem is $w_m = (X_m X_m^T)^{-1} X_m Y_m$.
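The following is a minimal NumPy sketch of this batch solution on synthetic data; the dimensions, noise level, and variable names are made up for illustration, and we solve the normal equations rather than form an explicit inverse.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 100                                  # assumed parameter and sample dimensions
w_true = rng.normal(size=n)                    # hypothetical ground-truth parameter

X = rng.normal(size=(n, m))                    # X_m with columns x_1, ..., x_m (our convention)
Y = X.T @ w_true + 0.1 * rng.normal(size=m)    # Y_m = X_m^T w + V_m

# Batch solution w_m = (X_m X_m^T)^{-1} X_m Y_m, computed via a linear solve.
w_batch = np.linalg.solve(X @ X.T, X @ Y)
print(w_batch)                                 # close to w_true when m >> n
```

For ill-conditioned data, `np.linalg.lstsq` on $X_m^T$ is the numerically safer choice than the normal equations.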

Now assume a new signal pair $(x_{m+1}, y_{m+1})$ arrives. It augments the existing data set to $X_{m+1}$ and $Y_{m+1}$. We want to update $w_m$ to minimize the least-squares loss over $X_{m+1}$ and $Y_{m+1}$. We could completely discard $w_m$ and formulate a new problem to find $w_{m+1}$, but that can be computationally expensive. We want to do it quickly and reduce the computational burden. Can we leverage $w_m$ to update the solution? The answer is yes. We know that $w_{m+1} = (X_{m+1} X_{m+1}^T)^{-1} X_{m+1} Y_{m+1}$, where $X_{m+1} = [X_m, x_{m+1}] \in \mathbb{R}^{n \times (m+1)}$ and $Y_{m+1} = [Y_m^T, y_{m+1}]^T \in \mathbb{R}^{m+1}$. Therefore, we have

$$w_{m+1} = (X_m X_m^T + x_{m+1} x_{m+1}^T)^{-1} (X_m Y_m + x_{m+1} y_{m+1}).$$

Using the Matrix Inversion Lemma (the Sherman-Morrison-Woodbury formula)

$$(A + UCV)^{-1} = A^{-1} - A^{-1} U (C^{-1} + V A^{-1} U)^{-1} V A^{-1},$$

we let $A = X_m X_m^T$, $U = x_{m+1}$, $V = x_{m+1}^T$, and $C = 1$ (the $1 \times 1$ identity). We have

$$(X_m X_m^T + x_{m+1} x_{m+1}^T)^{-1} = (X_m X_m^T)^{-1} - (X_m X_m^T)^{-1} x_{m+1} \left(1 + x_{m+1}^T (X_m X_m^T)^{-1} x_{m+1}\right)^{-1} x_{m+1}^T (X_m X_m^T)^{-1}.$$

Let $P_m = (X_m X_m^T)^{-1}$; then we have the update

$$P_{m+1} = P_m - P_m x_{m+1} \left(1 + x_{m+1}^T P_m x_{m+1}\right)^{-1} x_{m+1}^T P_m.$$
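Because the inverted term $1 + x_{m+1}^T P_m x_{m+1}$ is a scalar, this is a cheap rank-one correction. A small numerical check, with made-up dimensions, that it agrees with re-inverting from scratch:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 10
X = rng.normal(size=(n, m))                    # existing data X_m
x_new = rng.normal(size=n)                     # incoming x_{m+1}

P = np.linalg.inv(X @ X.T)                     # P_m
Px = P @ x_new
P_rec = P - np.outer(Px, Px) / (1.0 + x_new @ Px)   # recursive update P_{m+1}

X_aug = np.hstack([X, x_new[:, None]])         # X_{m+1} = [X_m, x_{m+1}]
P_direct = np.linalg.inv(X_aug @ X_aug.T)      # direct inverse for comparison

print(np.allclose(P_rec, P_direct))            # True (up to floating-point error)
```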

Then, we can write

$$w_{m+1} = P_{m+1} \left(P_m^{-1} w_m + x_{m+1} y_{m+1}\right).$$

Note from the definition that $P_m^{-1} = X_m X_m^T$ and $P_{m+1}^{-1} = P_m^{-1} + x_{m+1} x_{m+1}^T$. Substituting $P_m^{-1} w_m = (P_{m+1}^{-1} - x_{m+1} x_{m+1}^T) w_m$ into the expression above, we can further simplify $w_{m+1}$ as

$$w_{m+1} = w_m + P_{m+1} x_{m+1} \left(y_{m+1} - x_{m+1}^T w_m\right).$$

We can initialize $P_0 = I$ and $w_0 = 0$. For $m = 0, 1, 2, \ldots$, we repeat the following steps (a code sketch follows the list):

  • observe $(x_{m+1}, y_{m+1})$,
  • update $P_{m+1}$,
  • update $w_{m+1}$.
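Here is a minimal NumPy sketch of this loop on a synthetic scalar-output stream; the ground-truth parameter and noise level are made up for illustration, and $P_0$ is taken as a large multiple of the identity (see the Initialization section below).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
w_true = np.array([1.0, -2.0, 0.5])            # hypothetical ground-truth parameter

P = 1e6 * np.eye(n)                            # P_0: large multiple of the identity
w = np.zeros(n)                                # w_0

for m in range(500):
    # observe (x_{m+1}, y_{m+1})
    x = rng.normal(size=n)
    y = x @ w_true + 0.1 * rng.normal()
    # update P_{m+1}; the inverted term is the scalar 1 + x^T P_m x
    Px = P @ x
    P = P - np.outer(Px, Px) / (1.0 + x @ Px)
    # update w_{m+1}
    w = w + P @ x * (y - x @ w)

print(w)                                       # close to w_true
```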

Initialization

We should note that RLS can start from an empty data set. We need to set $P_0$ and $w_0$. It is easy to get confused when solving linear systems. Suppose we want to solve $b = Ax$, where $A \in \mathbb{R}^{m \times n}$. When $m < n$, the system is under-determined, i.e., there are more variables than equations. Therefore, infinitely many solutions can exist if $b \in \operatorname{col}(A)$.

In RLS, when there is only one data point, the system is under-determined. So can we solve for $w$ from the optimization with only one data point? The answer is no, because $(x x^T)^{-1}$ does not exist for a single point (a rank-one matrix in $\mathbb{R}^{n \times n}$ is not invertible when $n > 1$). Therefore, at the beginning, when the number of data points is smaller than the number of decision variables ($m < n$), we cannot use the formula $(X_m X_m^T)^{-1} X_m Y_m$ to find the optimal $w$; this formula only holds when $m \ge n$ (and $X_m$ has full row rank). Therefore, we need to assume an invertible $P_0$ and follow the update rules.

In fact, when $m < n$, $Ax = b$ has only two possibilities. If $b \in \operatorname{col}(A)$, there are infinitely many solutions. If $b \notin \operatorname{col}(A)$, there is no solution.

Decayed Error

We can add decay factors to the objective so that past data contribute less to the loss than the current data. The objective becomes $\min_w \sum_{i=1}^m \lambda^{m-i} \|e_i\|_2^2$ for $\lambda \in (0, 1)$. The update rule has the same form, but $\lambda$ now enters it.
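Repeating the derivation with the weighted information matrix $\Phi_{m+1} = \lambda \Phi_m + x_{m+1} x_{m+1}^T$ (where $P_m = \Phi_m^{-1}$) gives the standard forgetting-factor form: $P_{m+1} = \frac{1}{\lambda}\left(P_m - \frac{P_m x_{m+1} x_{m+1}^T P_m}{\lambda + x_{m+1}^T P_m x_{m+1}}\right)$ and $w_{m+1} = w_m + P_{m+1} x_{m+1}(y_{m+1} - x_{m+1}^T w_m)$. A minimal sketch of one such update step (the function name is ours):

```python
import numpy as np

def rls_step_forgetting(P, w, x, y, lam=0.99):
    """One RLS update with forgetting factor lam in (0, 1)."""
    Px = P @ x
    k = Px / (lam + x @ Px)            # gain vector, equal to P_{m+1} x_{m+1}
    w = w + k * (y - x @ w)            # w_{m+1}
    P = (P - np.outer(k, Px)) / lam    # P_{m+1}
    return P, w
```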

Multi-dimensional Signals

The idea naturally extends to $y_i \in \mathbb{R}^p$ for $p > 1$. In this case, the input signal is generally a matrix $x_i \in \mathbb{R}^{n \times p}$, and we have $y_i = x_i^T w$. The formulation does not change.
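Under this convention, the matrix inversion lemma step inverts a $p \times p$ matrix instead of a scalar. A minimal sketch of one update step, with assumed shapes (input $x_i$ of size $n \times p$, output $y_i$ of size $p$):

```python
import numpy as np

def rls_step_multi(P, w, X_i, y_i):
    """One RLS update with a matrix input X_i (n x p) and vector output y_i (p,)."""
    S = np.eye(X_i.shape[1]) + X_i.T @ P @ X_i   # p x p matrix replacing the scalar term
    K = P @ X_i @ np.linalg.inv(S)               # gain, n x p, equal to P_{m+1} x_{m+1}
    w = w + K @ (y_i - X_i.T @ w)                # w_{m+1}
    P = P - K @ X_i.T @ P                        # P_{m+1}
    return P, w
```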

Nonlinear RLS

Previously, we used a linear filter to perform the least-squares estimation. The input-output signals can also follow some nonlinear rule

$$y_i = f(x_i) + v_i.$$

In this case, we use a nonlinear filter $f(x, w)$ parameterized by $w \in \mathbb{R}^p$ to minimize the squared loss

$$\min_w \sum_{i=1}^m \lambda^{m-i} \left\| y_i - f(x_i, w) \right\|_2^2.$$

However, general nonlinear functions can be hard to optimize, so in practice we first linearize $f$ at some $\hat{w}$ and then minimize the linearized loss. We want to perform this incrementally (or recursively) to reduce the computational burden. Assume we already have some $w_{m-1}$. The estimate $w_m$ on the data set $X_m$ and $Y_m$ solves the following linearized problem:

$$w_m = \arg\min_w \sum_{i=1}^m \lambda^{m-i} \left\| y_i - f(x_i, w_{m-1}) - \nabla_w f(x_i, w_{m-1})^T (w - w_{m-1}) \right\|_2^2.$$

Now let $\bar{y}_i(w_m) = y_i - f(x_i, w_m) + \nabla_w f(x_i, w_m)^T w_m$ and $\bar{x}_i(w_m) = \nabla_w f(x_i, w_m)$. We can then form the data matrices $\bar{Y}_m(w_m)$ and $\bar{X}_m(w_m)$. At step $m+1$, we receive $(x_{m+1}, y_{m+1})$ and compute $\bar{y}_{m+1}(w_m)$ and $\bar{x}_{m+1}(w_m)$. Then we can use the same method as in linear RLS to update $w_m$. The difference is that $\bar{Y}_m$ and $\bar{X}_m$ depend on the linearization point, so in principle they must be recomputed at $w_m$ at every step $m$.
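One common way to keep the computation recursive is to linearize only the incoming pair at the current estimate $w_m$ and reuse $P$ from the previous steps, rather than rebuilding $\bar{X}_m$ and $\bar{Y}_m$ over all past data. Note that $\bar{y}_{m+1}(w_m) - \bar{x}_{m+1}(w_m)^T w_m = y_{m+1} - f(x_{m+1}, w_m)$, so the correction uses the ordinary prediction residual. A minimal sketch of this approximation, with a made-up scalar model $f(x, w) = \tanh(x^T w)$:

```python
import numpy as np

def f(x, w):
    return np.tanh(x @ w)

def grad_f(x, w):
    # gradient of f(x, w) with respect to w
    return (1.0 - np.tanh(x @ w) ** 2) * x

rng = np.random.default_rng(0)
n = 3
w_true = rng.normal(size=n)                    # hypothetical ground-truth parameter

P = 1e3 * np.eye(n)                            # P_0
w = np.zeros(n)                                # w_0

for m in range(2000):
    x = rng.normal(size=n)
    y = f(x, w_true) + 0.01 * rng.normal()
    g = grad_f(x, w)                           # \bar{x}_{m+1}(w_m)
    Pg = P @ g
    P = P - np.outer(Pg, Pg) / (1.0 + g @ Pg)  # same rank-one update as linear RLS
    w = w + P @ g * (y - f(x, w))              # residual y_{m+1} - f(x_{m+1}, w_m)
```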

A discussion of the convergence of nonlinear RLS can be found in Bertsekas’ book Nonlinear Programming.
