Contents
- Least Squares Approach
- Error function approximation
- Steepest descent method
- Gauss-Newton method
- Regularization
- Levenberg-Marquardt method
- SVD method
- References
Given:
| (1) |
Let us introduce a weighted error function
|
(2) |
In the ideal case of precise measurements and exact fitting function we have the following set of equations:
| |
(3) |
and the error function is equal to zero.
The number of equations K should be greater than or equal to N for the unique determination of the fitting parameters:
| |
(4) |
The least square approach defines the best fit as the set of parameters minimizing the error function:
|
(5) |
If the error function has derivatives, then the minimum is characterized by the following condition:
|
(6) |
In the vicinity of the point
near the minimum, the error function can be presented as
|
(7a) |
or
|
(7b) |
| where | ![]() |
is the gradient |
| and | ![]() |
is the Hessian |
The above error function approximation (7) can be rewritten in the following vector-matrix form:
| |
(8) |
The gradient vector gives the direction of the steepest ascent on the error surface. The fastest way to reach the minimum is to go in the opposite direction:
| |
(9) |
This approach is called the steepest descent method. This method certainly
converges to the minimum if the step
is sufficiently small. Although the convergence is slow.
For the second order approximation (8) the minimum condition (6) becomes
| |
(10) |
and the solution is
| |
(11) |
Here and below the * superscript at g and H is omitted.
Eq. (11) solves the optimization problem in one step if the error function is the second order surface. Otherwise the procedure (11) should be applied iteratively.
If the Hessian is (almost) degenerate then the procedure (11) becomes unstable. The regularization techniques use a modified operator instead of the original one (the Hessian in this case). Often a scalar matrix is used for the Hessian correction:
| |
(12) |
Instead of Eq. (10) we have
| |
(13) |
At a sufficiently large
the solution of Eq.(13) is stable because it reduces to Eq. (9) with the small
step
=
-1.
Let us apply the Gauss-Newton method to the above error function (2). We need the expressions for the gradient and the Hessian
The gradient is
|
(14) |
and the Hessian is
|
(15) |
When the error function approaches the minimum, the residuals tend to zero:
| |
(16) |
The first term in (15) becomes small and can be ignored:
|
(17) |
Eq. (17) can be rewritten in the following matrix form
|
(18a)
(18b) (18c) |
and Eq. (10) becomes
|
(19a)
(19b) |
Eq. (19) can be solved iteratively. The use of regularization (12) guarantees the convergence.
On the other hand, Eq. (19) is the least squares problem for the following system of linear equations:
| |
(20) |
The system is over-defined because of condition (4) and
(18c) is a rectangular K x N matrix. The matrix
can always be
presented as a product of three matrices:
| |
(21) |
Here U is K x K,
is
K x N and V is N x N matrices.
has
only diagonal non-zero elements and the number of non-zero elements is M
| |
(22) |
Keeping only non-zero elements in
,
the dimensions of matrices in (21) can be reduced to K x M, M
x M and N x M , respectively.
Eq. (21) is called Singular Value Decomposition (SVD), the column vectors of U
and V are called Singular Vectors and the diagonal elements of
are
called Singular Values. It is convenient to order the singular values as
follows:
| |
(23) |
The singular vectors are orthogonal
|
(24) |
The least squares solution of Eq. (20) is
|
(25) |
The regularized solution is
|
(26) |
The advantage of this approach is that change of the regularization factor does not require recalculation of SVD.
Truncation of the sum in Eq. (25) also stabilize (regularize) the solution. This regularization is called truncated SVD.
|
|
Please e-mail me at nikolai@shokhirev.com |
©Nikolai V. Shokhirev, 2001-2002