1. Introduction
Nonlinear least squares problems often arise in solving overdetermined systems of nonlinear equations, in estimating parameters of physical processes from measurement data, in constructing nonlinear regression models for engineering problems, and so on.
The nonlinear least squares problem has the form

$$\min_{x \in \mathbb{R}^n} \frac{1}{2} F(x)^T F(x), \qquad (1)$$

where the residual function $F : \mathbb{R}^n \to \mathbb{R}^m$ ($m \ge n$) is nonlinear in $x$; $F$ is a continuously differentiable function. An effective method for solving nonlinear least squares problems is the Gauss-Newton method [1,2,3]

$$x_{k+1} = x_k - \left(F'(x_k)^T F'(x_k)\right)^{-1} F'(x_k)^T F(x_k), \quad k = 0, 1, 2, \ldots \qquad (2)$$
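For concreteness, one Gauss-Newton iteration can be prototyped in a few lines. The following Python sketch is only an illustration; the names `f`, `jac` and the stopping rule are our choices, not part of the method's statement:

```python
import numpy as np

def gauss_newton(f, jac, x0, tol=1e-10, max_iter=50):
    """Gauss-Newton iteration (2) for min 0.5*||F(x)||^2 (illustrative sketch)."""
    x = np.asarray(x0, float)
    for _ in range(max_iter):
        J, r = jac(x), f(x)
        # Solving the linear least squares subproblem min ||J s + r|| is
        # equivalent to the normal equations J^T J s = -J^T r, but avoids
        # forming J^T J explicitly, which is numerically safer.
        step = np.linalg.lstsq(J, -r, rcond=None)[0]
        x = x + step
        if np.linalg.norm(step) < tol:
            break
    return x
```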
However, in practice, the calculation of derivatives often causes difficulties. In such cases one can use iterative-difference methods, which require no calculation of derivatives and, in terms of convergence rate and number of iterations, perform no worse than the Gauss-Newton method. Moreover, in some cases the nonlinear function consists of a differentiable and a nondifferentiable part, and iterative-difference methods [4,5,6,7] can still be applied:

$$x_{k+1} = x_k - \left(A_k^T A_k\right)^{-1} A_k^T F(x_k), \quad k = 0, 1, 2, \ldots, \qquad (3)$$

where $A_k = F(x_{k-1}, x_k)$ or $A_k = F(2x_k - x_{k-1}, x_{k-1})$ is a divided difference of order one of the function $F$.
It is desirable to build iterative methods that take the structure of the problem into account. In particular, instead of the full Jacobian, which in fact does not exist, one can use only the derivative of the differentiable part of the operator. However, the methods obtained with this approach converge slowly. More efficient methods use, instead of the Jacobian, the sum of the derivative of the differentiable part and a divided difference of the nondifferentiable part of the operator. Such an approach has shown good results in the case of solving nonlinear equations.
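Since divided differences replace Jacobians throughout this paper, a concrete construction helps. The sketch below is our illustration, following the classical componentwise definition used, for example, in [10]; it is reused in the method sketches later in the text:

```python
import numpy as np

def divided_difference(g, x, y):
    """Componentwise divided difference of order one for g: R^n -> R^m.

    Column j is built from points that agree with y in the first j-1
    coordinates and with x in the rest, so that by telescoping
    divided_difference(g, x, y) @ (x - y) == g(x) - g(y).
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = x.size
    m = np.atleast_1d(g(x)).size
    A = np.zeros((m, n))
    for j in range(n):
        if x[j] == y[j]:
            continue  # numerator vanishes as well; leave the column zero
        u = np.concatenate([y[:j], x[j:]])          # coordinate j taken from x
        v = np.concatenate([y[:j + 1], x[j + 1:]])  # coordinate j taken from y
        A[:, j] = (np.atleast_1d(g(u)) - np.atleast_1d(g(v))) / (x[j] - y[j])
    return A
```

By construction, this operator satisfies the secant equation $G(x, y)(x - y) = G(x) - G(y)$, the defining property of a divided difference of order one.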
In this work, we study a combined method for solving the nonlinear least squares problem, based on the Gauss-Newton and Secant methods. We also consider a method that requires only the derivative of the differentiable part of the operator. We prove local convergence and demonstrate efficiency on test cases in comparison with secant-type methods [5,6]. The convergence region of iterative methods is small in general, which limits the choice of initial approximations. It is therefore important to enlarge this region without additional hypotheses. The new approach [8] leads to a larger convergence radius than before [9]. We achieve this by locating a region, at least as small as before, that contains the iterates; the new Lipschitz constants on this region are then at least as tight as the old ones. Moreover, using more precise estimates on the distances involved, under weaker hypotheses and the same computational cost, we provide an analysis of the Gauss-Newton-Secant method with the following advantages over the corresponding results in [9]: a larger convergence region, finer error estimates on the distances involved, and at least as precise information on the location of the solution.
The rest of the paper is organized as follows. Section 2 contains the statement of the problem. In Section 3 and Section 4, we present the local convergence analysis of the first and second method, respectively. In Section 5, we provide numerical examples. The article ends with some conclusions.
2. Description of the Problem
Consider the nonlinear least squares problem

$$\min_{x \in \mathbb{R}^n} \frac{1}{2}\left(F(x) + G(x)\right)^T \left(F(x) + G(x)\right), \qquad (4)$$

where the residual function $F + G : \mathbb{R}^n \to \mathbb{R}^m$ ($m \ge n$) is nonlinear in $x$; $F$ is a continuously differentiable function; $G$ is a continuous function whose differentiability, in general, is not required.
We propose a modification of the Gauss-Newton method to find a solution of problem (4):

$$x_{k+1} = x_k - \left(A_k^T A_k\right)^{-1} A_k^T \left(F(x_k) + G(x_k)\right), \quad A_k = F'(x_k) + G(x_k, x_{k-1}), \quad k = 0, 1, 2, \ldots \qquad (5)$$

Here, $F'(x)$ is the Fréchet derivative of $F$ with respect to $x$; $G(x, y)$ is a divided difference of order one for the function $G$ [10], satisfying $G(x, y)(x - y) = G(x) - G(y)$ for $x \neq y$ and $G(x, x) = G'(x)$, if $G$ is differentiable; the vectors $x_0$ and $x_{-1}$ are given initial approximations. Setting $A_k = F'(x_k)$, that is, omitting the divided difference of the nondifferentiable part, from method (5) we get a Gauss-Newton type iterative method for solving problem (4):

$$x_{k+1} = x_k - \left(F'(x_k)^T F'(x_k)\right)^{-1} F'(x_k)^T \left(F(x_k) + G(x_k)\right), \quad k = 0, 1, 2, \ldots \qquad (6)$$
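Under these definitions, iteration (5) is straightforward to prototype. The sketch below is ours (names and stopping rule are illustrative); `divided_difference` is the helper sketched in the Introduction:

```python
import numpy as np

def gauss_newton_secant(f, f_jac, g, x0, x_prev, tol=1e-10, max_iter=50):
    """Combined Gauss-Newton-Secant iteration (5) for min 0.5*||F(x)+G(x)||^2.

    A_k = F'(x_k) + G(x_k, x_{k-1}): the exact Jacobian of the smooth part
    plus a divided difference of order one of the nonsmooth part.
    """
    x, x_old = np.asarray(x0, float), np.asarray(x_prev, float)
    for _ in range(max_iter):
        A = f_jac(x) + divided_difference(g, x, x_old)
        r = f(x) + g(x)
        # Least-squares solve of A s = -r, i.e., the normal equations
        # (A^T A) s = -A^T r, without forming A^T A explicitly.
        step = np.linalg.lstsq(A, -r, rcond=None)[0]
        x_old, x = x, x + step
        if np.linalg.norm(step) < tol:
            break
    return x
```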
In the case of $m = n$, problem (4) turns into a system of nonlinear equations

$$F(x) + G(x) = 0. \qquad (7)$$

Then, it is well known ([3], p. 267) that techniques for minimizing problem (4) are techniques for finding a solution $x^*$ of Equation (7). In this case, (5) transforms into the Newton-Secant combined method [11,12]

$$x_{k+1} = x_k - \left(F'(x_k) + G(x_k, x_{k-1})\right)^{-1}\left(F(x_k) + G(x_k)\right), \quad k = 0, 1, 2, \ldots \qquad (8)$$

and method (6) into a Newton-type method for solving the nonlinear Equation (7) [13]:

$$x_{k+1} = x_k - F'(x_k)^{-1}\left(F(x_k) + G(x_k)\right), \quad k = 0, 1, 2, \ldots$$
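The reduction from (5) to (8) is a one-line computation: for $m = n$ the matrix $A_k = F'(x_k) + G(x_k, x_{k-1})$ is square, and if it is invertible, then

$$\left(A_k^T A_k\right)^{-1} A_k^T = A_k^{-1}\left(A_k^T\right)^{-1} A_k^T = A_k^{-1},$$

so the Gauss-Newton type step collapses to the Newton-Secant step; the same computation with $A_k = F'(x_k)$ turns (6) into the Newton-type method above.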
We assume from now on that the function $G$ is differentiable at the solution $x^*$ of problem (4).
3. Local Convergence Analysis of Method (5)
Sufficient conditions for convergence and the convergence order of the iterative process (5) are presented in this section. First, however, we need some crucial definitions, which make precise the relationships between the Lipschitz constants appearing in the local convergence analysis.
Definition 1. The Fréchet derivative $F'$ satisfies the center-Lipschitz condition on $D$ if there exists $L_0 > 0$ such that for each $x \in D$

$$\|F'(x) - F'(x^*)\| \le L_0 \|x - x^*\|, \qquad (9)$$

where $x^* \in D$ is a solution of problem (4).

Definition 2. The divided difference $G(x, y)$ satisfies the center-Lipschitz condition on $D$ if there exists $M_0 > 0$ such that for each $x, y \in D$

$$\|G(x, y) - G(x^*, x^*)\| \le M_0\left(\|x - x^*\| + \|y - x^*\|\right). \qquad (10)$$

Let $A_* = F'(x^*) + G(x^*, x^*)$, suppose that the inverse operator $(A_*^T A_*)^{-1}$ exists, and set $\alpha = \|(A_*^T A_*)^{-1}\|$ and $\beta = \|A_*\|$. Define the function $h$ on $[0, +\infty)$ by

$$h(t) = \alpha (L_0 + 2M_0) t \left(2\beta + (L_0 + 2M_0) t\right).$$

Suppose that the equation

$$h(t) = 1 \qquad (11)$$

has at least one positive solution. Denote by $\gamma$ the smallest such solution. Define

$$D_0 = D \cap \bar{B}(x^*, \gamma), \qquad (12)$$

where $B(x^*, r)$ and $\bar{B}(x^*, r)$ denote the open and closed balls of radius $r$ centered at $x^*$.

Definition 3. The Fréchet derivative $F'$ satisfies the restricted Lipschitz condition on $D_0$ if there exists $L > 0$ such that for each $x, y \in D_0$

$$\|F'(x) - F'(y)\| \le L\|x - y\|. \qquad (13)$$

Definition 4. The first order divided difference $G(x, y)$ satisfies the restricted Lipschitz condition on $D_0$ if there exists $M > 0$ such that for each $x, y, u, v \in D_0$

$$\|G(x, y) - G(u, v)\| \le M\left(\|x - u\| + \|y - v\|\right). \qquad (14)$$

Next, we also state the definitions given in [9], so we can compare them to the preceding ones.

Definition 5. The Fréchet derivative $F'$ satisfies the Lipschitz condition on $D$ if there exists $\bar{L} > 0$ such that for each $x, y \in D$

$$\|F'(x) - F'(y)\| \le \bar{L}\|x - y\|. \qquad (15)$$

Definition 6. The first order divided difference $G(x, y)$ satisfies the Lipschitz condition on $D$ if there exists $\bar{M} > 0$ such that for each $x, y, u, v \in D$

$$\|G(x, y) - G(u, v)\| \le \bar{M}\left(\|x - u\| + \|y - v\|\right). \qquad (16)$$

Remark 1. It follows from the preceding definitions that

$$L_0 \le \bar{L}, \qquad (17)$$
$$M_0 \le \bar{M}, \qquad (18)$$
$$L \le \bar{L}, \qquad (19)$$
$$M \le \bar{M}, \qquad (20)$$

since $D_0 \subseteq D$. If any of (17)–(20) are strict inequalities, then the following advantages are obtained over the work in [9] by using $L_0$, $M_0$, $L$, $M$ instead of the constants $\bar{L}$, $\bar{M}$:
- an at least as large convergence domain, leading to at least as many initial choices;
- at least as tight upper bounds on the distances $\|x_k - x^*\|$, so at most as many iterations are needed to obtain a desired error tolerance.
It is always true that $D_0$ is at least as small as $D$ and included in $D$ by (12). Here lies the new idea and the reason for the advantages. Notice that these advantages are obtained under the same computational cost as in [9], since in practice the computation of the constants $\bar{L}$ and $\bar{M}$ requires the computation of the new constants $L_0$, $M_0$, $L$ and $M$ as special cases. This technique of using the center-Lipschitz condition in combination with the restricted convergence region has been used on Newton's, Secant and Newton-like methods [14] and can be used on other methods in order to extend their applicability. The Euclidean norm and the corresponding matrix norm are used in this study, which have the advantage that $\|A^T\| = \|A\|$.
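To see the effect of the restricted region on a concrete function, consider a standard scalar illustration of this technique from the literature (our example; it is not one of the test problems of this paper). For $F(x) = e^x - 1$, $G \equiv 0$, $D = B(0, 1)$ and $x^* = 0$, one computes

$$\bar{L} = e \approx 2.718, \qquad L_0 = e - 1 \approx 1.718, \qquad L = e^{1/(e-1)} \approx 1.789,$$

where $L$ is evaluated on a restricted region of radius $1/L_0 = 1/(e - 1)$. Since $L_0 < \bar{L}$ and $L < \bar{L}$, any convergence radius that decreases in these constants becomes strictly larger when the new constants are used.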
The proof of the next result follows the corresponding one in [9], but there are crucial differences, where we use $L_0$, $M_0$ instead of $\bar{L}$, $\bar{M}$ and $D_0$ instead of $D$.

Theorem 1. Let $F + G$ be continuous on the set $D \subseteq \mathbb{R}^n$, $F$ be continuously differentiable on this set, and $G(x, y)$ be a divided difference of order one. Suppose that problem (4) has a solution $x^* \in D$, the inverse operator $(A_*^T A_*)^{-1}$ exists, $\eta = \|F(x^*) + G(x^*)\|$, conditions (9), (10), (13), (14) hold, and $\gamma$ defined in (11) exists. Moreover,

$$\bar{B}(x^*, r_*) \subseteq D \qquad (21)$$

and

$$\alpha \eta (L_0 + 2M_0) < 1, \qquad (22)$$

where $r_*$ is the unique positive zero of the function $q$, defined by

$$q(t) = \frac{\alpha\left[\left(\beta + (L_0 + 2M_0)t\right)\left(\frac{L}{2} + M\right)t + \eta(L_0 + 2M_0)\right]}{1 - \alpha(L_0 + 2M_0)t\left(2\beta + (L_0 + 2M_0)t\right)} - 1. \qquad (23)$$

Then, for $x_0, x_{-1} \in B(x^*, r_*)$, method (5) is well defined and generates the sequence $\{x_k\}$, which belongs to $B(x^*, r_*)$ and converges to the solution $x^*$. Moreover, the following error bounds hold:

$$\|x_{k+1} - x^*\| \le g_k\left[(\beta + N_k)\left(\frac{L}{2}\|x_k - x^*\| + M\|x_{k-1} - x^*\|\right)\|x_k - x^*\| + \eta N_k\right], \qquad (24)$$

where

$$N_k = L_0\|x_k - x^*\| + M_0\left(\|x_k - x^*\| + \|x_{k-1} - x^*\|\right), \qquad g_k = \frac{\alpha}{1 - \alpha N_k (2\beta + N_k)}. \qquad (25)$$

Proof. According to the intermediate value theorem on $[0, \gamma)$, since $q(t) \to +\infty$ as $t \to \gamma$ and, in view of (22), $q(0) < 0$, the function $q$ has at least one positive zero. Denote by $r_*$ the least such positive zero. Moreover, we have $q(t) < 0$ for $t \in [0, r_*)$. Indeed, since $q$ is increasing on $[0, \gamma)$, this zero is unique on $(0, \gamma)$.

We shall show estimate (24) by first showing that the sequence $\{x_k\}$ is well defined. Let $x_{-1}, x_0 \in B(x^*, r_*)$, and set $A_0 = F'(x_0) + G(x_0, x_{-1})$. We need to show that the linear operator $A_0^T A_0$ is invertible. Using $\|A^T\| = \|A\|$, we obtain the following estimate:

$$\|A_0^T A_0 - A_*^T A_*\| \le \left(\|A_0\| + \|A_*\|\right)\|A_0 - A_*\| \le \left(2\beta + \|A_0 - A_*\|\right)\|A_0 - A_*\|. \qquad (26)$$

By (9) and (10), we have in turn the estimate

$$\|A_0 - A_*\| \le \|F'(x_0) - F'(x^*)\| + \|G(x_0, x_{-1}) - G(x^*, x^*)\| \le N_0 < (L_0 + 2M_0) r_*. \qquad (27)$$

Then, from inequality (26) and the definition of $r_*$ in (23), we get

$$\alpha\|A_0^T A_0 - A_*^T A_*\| \le \alpha(L_0 + 2M_0) r_*\left(2\beta + (L_0 + 2M_0) r_*\right) < 1. \qquad (28)$$

By the Banach lemma on invertible operators [3] and (28), $A_0^T A_0$ is invertible. Then, from (26), (27) and (28), we get in turn that

$$\|(A_0^T A_0)^{-1}\| \le \frac{\alpha}{1 - \alpha N_0 (2\beta + N_0)} = g_0.$$

Hence, the iterate $x_1$ is well defined by method (5) for $k = 0$. Next, we will show that $x_1 \in B(x^*, r_*)$. First of all, since $x^*$ solves (4) and $G$ is differentiable at $x^*$ with $G(x^*, x^*) = G'(x^*)$, the necessary optimality condition gives $A_*^T\left(F(x^*) + G(x^*)\right) = 0$, and we get the representation

$$x_1 - x^* = -(A_0^T A_0)^{-1}\left[A_0^T\left(F(x_0) + G(x_0) - F(x^*) - G(x^*) - A_0(x_0 - x^*)\right) + (A_0 - A_*)^T\left(F(x^*) + G(x^*)\right)\right].$$

Moreover, using (9), (10), (13), (14) and (21), we obtain in turn

$$\|F(x_0) + G(x_0) - F(x^*) - G(x^*) - A_0(x_0 - x^*)\| \le \frac{L}{2}\|x_0 - x^*\|^2 + M\|x_{-1} - x^*\|\,\|x_0 - x^*\|,$$

$$\|(A_0 - A_*)^T\left(F(x^*) + G(x^*)\right)\| \le \eta N_0.$$

Then, by method (5) for $k = 0$ and the preceding estimates, we have in turn that

$$\|x_1 - x^*\| \le g_0\left[(\beta + N_0)\left(\frac{L}{2}\|x_0 - x^*\| + M\|x_{-1} - x^*\|\right)\|x_0 - x^*\| + \eta N_0\right] \le \left(q(t_0) + 1\right)t_0 < t_0 < r_*,$$

where $t_0 = \max\{\|x_0 - x^*\|, \|x_{-1} - x^*\|\}$, since $q(t_0) < 0$. That is, $x_1 \in B(x^*, r_*)$, and estimate (24) holds for $k = 0$.

Suppose that $x_k \in B(x^*, r_*)$ for $k = 0, 1, \ldots, m$ and estimate (24) holds for $k \le m - 1$, where $m \ge 1$ is an integer. We shall show that $x_{m+1} \in B(x^*, r_*)$ and estimate (24) holds for $k = m$. As in the derivation of (28), using (9), (10), (21) and the definition of the function $q$, we get in turn that

$$\alpha\|A_m^T A_m - A_*^T A_*\| \le \alpha(L_0 + 2M_0) r_*\left(2\beta + (L_0 + 2M_0) r_*\right) < 1.$$

Hence, $(A_m^T A_m)^{-1}$ exists and

$$\|(A_m^T A_m)^{-1}\| \le g_m.$$

Therefore, the iterate $x_{m+1}$ is well defined, and, as for $k = 0$, the following estimate holds:

$$\|x_{m+1} - x^*\| \le \left(q(t_m) + 1\right)t_m < t_m \le t_0 < r_*, \qquad t_m = \max\{\|x_m - x^*\|, \|x_{m-1} - x^*\|\}.$$

That proves $x_{m+1} \in B(x^*, r_*)$ and estimate (24) for $k = m$. Thus, method (5) is well defined, $x_k \in B(x^*, r_*)$ for all $k \ge -1$, and estimate (24) holds for all $k \ge 0$. It remains to prove that $x_k \to x^*$ for $k \to \infty$.

Define the functions $a$ and $b$ on $[0, r_*)$ by

$$a(t) = \frac{\alpha\left[\left(\beta + (L_0 + 2M_0)t\right)\frac{L}{2}t + \eta(L_0 + M_0)\right]}{1 - \alpha(L_0 + 2M_0)t\left(2\beta + (L_0 + 2M_0)t\right)} \qquad (29)$$

and

$$b(t) = \frac{\alpha\left[\left(\beta + (L_0 + 2M_0)t\right)Mt + \eta M_0\right]}{1 - \alpha(L_0 + 2M_0)t\left(2\beta + (L_0 + 2M_0)t\right)}. \qquad (30)$$

According to (23), we get

$$a(t) + b(t) = q(t) + 1 < 1 \quad \text{for } t \in [0, r_*). \qquad (31)$$

Using estimate (24), the definition of the constant $t_0$ and the functions $a$ and $b$, for $k \ge 0$ we get the following:

$$\|x_{k+1} - x^*\| \le a(t_0)\|x_k - x^*\| + b(t_0)\|x_{k-1} - x^*\|. \qquad (32)$$

As it was shown in [1], under conditions (29)–(32) the sequence $\{x_k\}$ converges to $x^*$, as $k \to \infty$. □
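The convergence orders claimed in the corollaries below are easy to confirm numerically: given the errors $e_k = \|x_k - x^*\|$ of a run, the computational order of convergence can be estimated by the usual formula $\rho_k = \ln(e_{k+1}/e_k)/\ln(e_k/e_{k-1})$. A minimal sketch (our helper, not part of the analysis above):

```python
import numpy as np

def convergence_order(errors):
    """Computational order of convergence from errors e_k = ||x_k - x*||."""
    e = np.asarray(errors, float)
    # rho_k = ln(e_{k+1}/e_k) / ln(e_k/e_{k-1}), k = 1, ..., len(e) - 2
    return np.log(e[2:] / e[1:-1]) / np.log(e[1:-1] / e[:-2])
```

For a zero-residual problem solved by method (5), these estimates should approach $(1 + \sqrt{5})/2 \approx 1.618$ (Corollary 1), and 2 in the differentiable zero-residual case (Corollary 2).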
Corollary 1. In the case of $F(x^*) + G(x^*) = 0$, we have a nonlinear least squares problem with zero residual. Then, $\eta = 0$, and estimate (24) reduces to

$$\|x_{k+1} - x^*\| \le g_k (\beta + N_k)\left(\frac{L}{2}\|x_k - x^*\| + M\|x_{k-1} - x^*\|\right)\|x_k - x^*\|.$$

That is, method (5) converges with order $(1 + \sqrt{5})/2 \approx 1.618$, the positive root of $t^2 - t - 1 = 0$.

Let $G(x) \equiv 0$ in (4), corresponding to the residual function being differentiable. Then, from Theorem 1, we obtain the following corollary.

Corollary 2. If $G(x) \equiv 0$, then in the conditions of the theorem we set $M_0 = M = 0$, $G(x, y) \equiv 0$ and $x_{-1} = x_0$, and estimate (24) reduces to:

$$\|x_{k+1} - x^*\| \le g_k\left[\left(\beta + L_0\|x_k - x^*\|\right)\frac{L}{2}\|x_k - x^*\| + \eta L_0\right]\|x_k - x^*\|.$$

Hence, in the case of zero residual, method (5) has convergence order two.

Remark 2. If $D_0 = D$, $L_0 = L = \bar{L}$ and $M_0 = M = \bar{M}$, our results specialize to the corresponding ones in [9]. Otherwise, they constitute an improvement, as already noted in Remark 1. As an example, let $\bar{q}$ and $\bar{r}_*$ denote the function and parameter where $L_0$, $M_0$, $L$, $M$ are replaced by $\bar{L}$, $\bar{M}$, respectively. Then, in view of (17)–(20), we have that

$$q(t) \le \bar{q}(t), \qquad (33)$$

so

$$\bar{r}_* \le r_*. \qquad (34)$$

Consequently, the new sufficient convergence criteria are weaker than the ones in [9], unless $L_0 = L = \bar{L}$ and $M_0 = M = \bar{M}$. Moreover, the new error bounds are tighter than the corresponding ones in [9], and the rest of the advantages already mentioned in Remark 1 hold true. The results can be improved even further if (10) and (14) are replaced by

$$\|G(x, y) - G(x^*, x^*)\| \le M_0'\left(\|x - x^*\| + \|y - x^*\|\right) \text{ for each } x, y \in D_1 \qquad (35)$$

and

$$\|G(x, y) - G(u, v)\| \le M'\left(\|x - u\| + \|y - v\|\right) \text{ for each } x, y, u, v \in D_1, \qquad (36)$$

respectively, where $D_1 = D \cap \bar{B}(x^*, r_*)$, since $D_1 \subseteq D_0$, $M_0' \le M_0$ and $M' \le M$. We leave the details to the motivated reader.

4. Local Convergence Analysis of Method (6)
Sufficient conditions for the local convergence of method (6), and its convergence rate, are given in the following theorem.
Theorem 2. Let $F + G$ be continuous on the set $D \subseteq \mathbb{R}^n$, $F$ be continuously differentiable on this set, and $G$ be a continuous function on $D$. Suppose that problem (4) has a solution $x^* \in D$, and that the inverse operator $(A_*^T A_*)^{-1}$, where now $A_* = F'(x^*)$, exists and $A_*^T\left(F(x^*) + G(x^*)\right) = 0$; set $\alpha = \|(A_*^T A_*)^{-1}\|$, $\beta = \|A_*\|$ and $\eta = \|F(x^*) + G(x^*)\|$. The Fréchet derivative $F'$ and the function $G$ satisfy the Lipschitz conditions on the set $D$:

$$\|F'(x) - F'(y)\| \le L\|x - y\|, \qquad \|G(x) - G(y)\| \le M\|x - y\|. \qquad (37)$$

Moreover,

$$\bar{B}(x^*, r_*) \subseteq D \qquad (38)$$

and

$$\alpha\left(\beta M + \eta L\right) < 1, \qquad (39)$$

where $r_*$ is the unique positive zero of the function $q$, defined by

$$q(t) = \frac{\alpha\left[(\beta + Lt)\left(\frac{L}{2}t + M\right) + \eta L\right]}{1 - \alpha L t (2\beta + Lt)} - 1. \qquad (40)$$

Then, for $x_0 \in B(x^*, r_*)$, method (6) is well defined and generates the sequence $\{x_k\}$, which belongs to $B(x^*, r_*)$ and converges to the solution $x^*$. Moreover, the following error bounds hold:

$$\|x_{k+1} - x^*\| \le g_k\left[\left(\beta + L\|x_k - x^*\|\right)\left(\frac{L}{2}\|x_k - x^*\| + M\right) + \eta L\right]\|x_k - x^*\|, \qquad (41)$$

where

$$g_k = \frac{\alpha}{1 - \alpha L\|x_k - x^*\|\left(2\beta + L\|x_k - x^*\|\right)}. \qquad (42)$$

Proof. According to the intermediate value theorem, applied on $[0, t_{\max})$, where $t_{\max}$ is the positive root of the denominator in (40), and in view of (39), which gives $q(0) < 0$, the function $q$ has a least positive zero, denoted by $r_*$, and $q(t) < 0$ for $t \in [0, r_*)$. Indeed, since $q$ is increasing on this interval, the zero is unique; the argument is analogous to the one given in Theorem 1.

Let $x_0 \in B(x^*, r_*)$, and set $A_0 = F'(x_0)$. By analogy to (26) in Theorem 1, we get

$$\|A_0^T A_0 - A_*^T A_*\| \le \left(2\beta + \|A_0 - A_*\|\right)\|A_0 - A_*\|. \qquad (43)$$

Taking into account that

$$\|A_0 - A_*\| = \|F'(x_0) - F'(x^*)\| \le L\|x_0 - x^*\| < L r_*, \qquad (44)$$

from inequality (43) and the definition of $r_*$ given in (40), we get

$$\alpha\|A_0^T A_0 - A_*^T A_*\| \le \alpha L r_*\left(2\beta + L r_*\right) < 1. \qquad (45)$$

From the Banach lemma on invertible operators [3] and (45), $A_0^T A_0$ is invertible. Then, from (43)–(45), we get

$$\|(A_0^T A_0)^{-1}\| \le \frac{\alpha}{1 - \alpha L\|x_0 - x^*\|\left(2\beta + L\|x_0 - x^*\|\right)} = g_0.$$

Hence, the iterate $x_1$ is well defined. Next, we will show that $x_1 \in B(x^*, r_*)$. We have the representation

$$x_1 - x^* = -(A_0^T A_0)^{-1}\left[A_0^T\left(F(x_0) - F(x^*) - F'(x_0)(x_0 - x^*) + G(x_0) - G(x^*)\right) + (A_0 - A_*)^T\left(F(x^*) + G(x^*)\right)\right].$$

In view of the estimates

$$\|F(x_0) - F(x^*) - F'(x_0)(x_0 - x^*)\| \le \frac{L}{2}\|x_0 - x^*\|^2, \qquad \|G(x_0) - G(x^*)\| \le M\|x_0 - x^*\|,$$

we obtain in turn that

$$\|x_1 - x^*\| \le g_0\left[\left(\beta + L\|x_0 - x^*\|\right)\left(\frac{L}{2}\|x_0 - x^*\| + M\right) + \eta L\right]\|x_0 - x^*\| = \left(q(\|x_0 - x^*\|) + 1\right)\|x_0 - x^*\| < \|x_0 - x^*\| < r_*.$$

Hence, $x_1 \in B(x^*, r_*)$ and inequality (41) holds for $k = 0$.

Suppose that $x_k \in B(x^*, r_*)$ for $k = 0, 1, \ldots, m$ and estimate (41) holds for $k \le m - 1$, where $m \ge 1$ is an integer. Next, we show that $x_{m+1} \in B(x^*, r_*)$ and estimate (41) holds for $k = m$. As above, $\alpha\|A_m^T A_m - A_*^T A_*\| < 1$. Hence, $(A_m^T A_m)^{-1}$ exists and

$$\|(A_m^T A_m)^{-1}\| \le g_m.$$

Therefore, the iterate $x_{m+1}$ is well defined, and we get in turn that

$$\|x_{m+1} - x^*\| \le \left(q(\|x_m - x^*\|) + 1\right)\|x_m - x^*\| < \|x_m - x^*\| < r_*.$$

That proves $x_{m+1} \in B(x^*, r_*)$, and estimate (41) for $k = m$. Thus, the iterative process (6) is well defined, $x_k \in B(x^*, r_*)$ for all $k \ge 0$, and estimate (41) holds for all $k \ge 0$.

Define the function $a$ on $[0, r_*]$ by

$$a(t) = \frac{\alpha\left[(\beta + Lt)\left(\frac{L}{2}t + M\right) + \eta L\right]}{1 - \alpha L t (2\beta + Lt)} = q(t) + 1. \qquad (46)$$

Using estimate (41), the definitions of the constants and of the function $a$, for $k \ge 0$ we get the following:

$$\|x_{k+1} - x^*\| \le a\left(\|x_k - x^*\|\right)\|x_k - x^*\|. \qquad (47)$$

For any $t \in (0, r_*)$ and initial point $x_0$ with $\|x_0 - x^*\| \le t$, similarly to the proof that all iterates stay in $B(x^*, r_*)$, we show that all iterates stay in $\bar{B}(x^*, t)$. So, estimate (47) holds if $r_*$ is replaced by $t$. In particular, from (47) for $t_0 = \|x_0 - x^*\|$, we get

$$\|x_{k+1} - x^*\| \le c\|x_k - x^*\| \le c^{k+1}\|x_0 - x^*\|,$$

where $c = a(t_0)$. Obviously, $c \ge 0$ and $q(t_0) < 0$. Therefore, we obtain $c = q(t_0) + 1 < 1$; indeed, $a(t) < 1$ for all $t \in [0, r_*)$. Hence, the sequence $\{x_k\}$ converges to $x^*$ as $k \to \infty$, with a rate of geometric progression. □
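For comparison with the sketch of method (5) given in Section 2, here is an analogous illustrative Python sketch of iteration (6); only the Jacobian of the smooth part is used, so the nonsmooth part $G$ enters through residual evaluations alone:

```python
import numpy as np

def gauss_newton_type(f, f_jac, g, x0, tol=1e-10, max_iter=200):
    """Gauss-Newton type iteration (6) for min 0.5*||F(x)+G(x)||^2.

    Convergence is at best linear (see Remark 3), so the iteration
    budget is larger than for the combined method (5).
    """
    x = np.asarray(x0, float)
    for _ in range(max_iter):
        J = f_jac(x)
        r = f(x) + g(x)
        step = np.linalg.lstsq(J, -r, rcond=None)[0]  # J^T J s = -J^T r
        x = x + step
        if np.linalg.norm(step) < tol:
            break
    return x
```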
The same type of improvements as in Theorem 1 are obtained for Theorem 2 (see Remark 2).
Remark 3. As we can see from estimates (41) and (42), the convergence of method (6) depends on $\alpha$, $\eta$, $L$ and $M$. For problems with weak nonlinearity ($\alpha$, $\eta$, $L$ and $M$ "small"), the convergence rate of the iterative process is linear. In the case of strongly nonlinear problems ($\alpha$, $\eta$, $L$ and/or $M$ "large"), method (6) may not converge at all.

6. Conclusions
Based on the theoretical studies, the numerical experiments, and the comparison of the obtained results, we can argue that the combined differential-difference method (5) converges faster than the Gauss-Newton type method (6) and the Secant type method (48). Moreover, method (5) has the high convergence order $(1 + \sqrt{5})/2 \approx 1.618$ in the case of zero residual and does not require the calculation of derivatives of the nondifferentiable part of the operator. Therefore, the proposed method (5) solves the problem efficiently and fast.