Article

Improved Convergence Analysis of Gauss-Newton-Secant Method for Solving Nonlinear Least Squares Problems

by Ioannis Argyros 1, Stepan Shakhno 2 and Yurii Shunkin 2
1 Department of Mathematical Sciences, Cameron University, Lawton, OK 73505, USA
2 Department of Theory of Optimal Processes, Ivan Franko National University of Lviv, 79000 Lviv, Ukraine
* Author to whom correspondence should be addressed.
Mathematics 2019, 7(1), 99; https://doi.org/10.3390/math7010099
Submission received: 20 October 2018 / Revised: 12 January 2019 / Accepted: 15 January 2019 / Published: 18 January 2019
(This article belongs to the Special Issue Computational Methods in Analysis and Applications)

Abstract

We study an iterative differential-difference method for solving nonlinear least squares problems which uses, instead of the Jacobian, the sum of the derivative of the differentiable part of the operator and a divided difference of the nondifferentiable part. Moreover, we introduce a method that uses only the derivative of the differentiable part instead of the Jacobian. Results that establish the convergence conditions, the radius and the order of convergence of the proposed methods are presented, improving upon the corresponding results in earlier work. Numerical examples illustrate the theoretical results.

1. Introduction

Nonlinear least squares problems often arise when solving overdetermined systems of nonlinear equations, estimating parameters of physical processes from measurement data, constructing nonlinear regression models for engineering problems, etc.
The nonlinear least squares problem has the form
$$\min_{x \in \mathbb{R}^p} \frac{1}{2} F(x)^T F(x),$$
where the residual function $F : \mathbb{R}^p \to \mathbb{R}^m$ ($m \ge p$) is nonlinear in $x$ and continuously differentiable. An effective method for solving nonlinear least squares problems is the Gauss-Newton method [1,2,3]
$$x_{n+1} = x_n - \big[F'(x_n)^T F'(x_n)\big]^{-1} F'(x_n)^T F(x_n), \quad n = 0, 1, \ldots$$
However, in practice there are often difficulties with the calculation of derivatives. In that case one can use iterative-difference methods, which do not require the calculation of derivatives and, in terms of convergence rate and number of iterations, perform no worse than the Gauss-Newton method. In some problems the nonlinear function consists of a differentiable and a nondifferentiable part; then it is possible to use the iterative-difference methods [4,5,6,7]
$$x_{n+1} = x_n - (A_n^T A_n)^{-1} A_n^T F(x_n), \quad n = 0, 1, \ldots,$$
where
$$A_n = F(x_n, x_{n-1}),$$
$$A_n = F(2x_n - x_{n-1}, x_{n-1}),$$
or
$$A_n = F(x_n, x_{n-1}) + F(x_{n-2}, x_n) - F(x_{n-2}, x_{n-1}).$$
It is desirable to build iterative methods that take into account the properties of the problem. In particular, we can use only the derivative of the differentiable part of the operator instead of the full Jacobian, which, in fact, does not exist. The methods obtained with this approach converge slowly. More efficient methods use, instead of the Jacobian, the sum of the derivative of the differentiable part and a divided difference of the nondifferentiable part of the operator. Such an approach gives very good results in the case of solving nonlinear equations.
In this work we study a combined method for solving nonlinear least squares problems, based on the Gauss-Newton and secant methods. We also consider a method requiring only the derivative of the differentiable part of the operator. We prove local convergence and demonstrate efficiency on test cases by comparing with secant-type methods [5,6]. The convergence region of iterative methods is small in general, which limits the choice of initial approximations. It is therefore important to extend this region without requiring additional hypotheses. The new approach [8] leads to a larger convergence radius than before [9]. We achieve this goal by locating a region, at least as small as before, containing the iterates. Then the new Lipschitz constants are at least as tight as the old ones. Moreover, using more precise estimates on the distances involved, under weaker hypotheses and at the same computational cost, we provide an analysis of the Gauss-Newton-Secant method with the following advantages over the corresponding results in [9]: a larger convergence region, finer error estimates on the distances involved, and at least as precise information on the location of the solution.
The rest of the paper is organized as follows. Section 2 contains the statement of the problem; in Section 3 and Section 4 we present the local convergence analysis of the first and the second method, respectively. In Section 5 we provide the numerical examples. The article ends with some conclusions.

2. Description of the Problem

Consider the nonlinear least squares problem
$$\min_{x \in \mathbb{R}^p} \frac{1}{2} (F(x) + G(x))^T (F(x) + G(x)),$$
where the residual function $F + G : \mathbb{R}^p \to \mathbb{R}^m$ ($m \ge p$) is nonlinear in $x$, $F$ is a continuously differentiable function, and $G$ is a continuous function whose differentiability, in general, is not required.
We propose a modification of the Gauss-Newton method to find a solution of problem (4):
$$x_{n+1} = x_n - (A_n^T A_n)^{-1} A_n^T (F(x_n) + G(x_n)), \quad A_n = F'(x_n) + G(x_n, x_{n-1}), \quad n = 0, 1, \ldots$$
Here, $F'(x_n)$ is the Fréchet derivative of $F(x)$, and $G(x_n, x_{n-1})$ is a divided difference of order one of the function $G$ [10], where $x_n, x_{n-1}$ are vectors, $x_0, x_{-1}$ are given initial approximations, and the divided difference satisfies $G(x, y)(x - y) = G(x) - G(y)$ for $x \ne y$ and $G(x, x) = G'(x)$ if $G$ is differentiable. Setting $A_n = F'(x_n)$, from method (5) we get the Gauss-Newton type iterative method for solving problem (4)
$$x_{n+1} = x_n - \big(F'(x_n)^T F'(x_n)\big)^{-1} F'(x_n)^T (F(x_n) + G(x_n)), \quad n = 0, 1, \ldots$$
In the case $m = p$, problem (4) turns into a system of nonlinear equations
$$F(x) + G(x) = 0.$$
Then, it is well known ([3], p. 267) that techniques for minimizing problem (4) are techniques for finding a solution $x^*$ of Equation (7). In this case (5) transforms into the combined Newton-Secant method [11,12]
$$x_{n+1} = x_n - \big(F'(x_n) + G(x_n, x_{n-1})\big)^{-1} (F(x_n) + G(x_n)), \quad n = 0, 1, \ldots,$$
and method (6) into a Newton-type method for solving the nonlinear Equation (7) [13]
$$x_{n+1} = x_n - \big(F'(x_n)\big)^{-1} (F(x_n) + G(x_n)), \quad n = 0, 1, \ldots$$
We assume from now on that the function G is differentiable at $x = x^*$.
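To make the construction concrete, the following sketch (in Python with NumPy, which the paper itself does not use; all names here are illustrative assumptions, not the authors' code) shows one standard componentwise choice of a first-order divided difference satisfying $G(x,y)(x-y) = G(x) - G(y)$, together with the resulting iteration (5); using `Fprime(x)` alone as the operator gives method (6).

```python
import numpy as np

def divided_difference(G, x, y):
    """Componentwise first-order divided difference G(x, y): a matrix J with
    J @ (x - y) == G(x) - G(y).  A sketch; it assumes x[j] != y[j] for all j."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    m, p = G(x).size, x.size
    J = np.zeros((m, p))
    for j in range(p):
        upper = np.concatenate((x[:j + 1], y[j + 1:]))   # x in components 0..j
        lower = np.concatenate((x[:j], y[j:]))           # x in components 0..j-1
        J[:, j] = (G(upper) - G(lower)) / (x[j] - y[j])  # columns telescope
    return J

def gauss_newton_secant(F, G, Fprime, x0, x_minus1, tol=1e-8, max_iter=100):
    """Sketch of method (5) with A_n = F'(x_n) + G(x_n, x_{n-1}); replacing
    A with Fprime(x) alone turns the loop into the Gauss-Newton type method (6)."""
    x_prev, x = np.asarray(x_minus1, float), np.asarray(x0, float)
    for _ in range(max_iter):
        r = F(x) + G(x)                                  # residual F(x_n) + G(x_n)
        A = Fprime(x) + divided_difference(G, x, x_prev)
        step = np.linalg.solve(A.T @ A, A.T @ r)         # normal equations of (5)
        x_prev, x = x, x - step
        if np.linalg.norm(step) <= tol and np.linalg.norm(A.T @ r) <= tol:
            break
    return x
```

In practice the linear step could equally be obtained from `np.linalg.lstsq(A, r)`, which avoids forming $A_n^T A_n$ explicitly; the normal-equations form above simply mirrors (5) literally.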

3. Local Convergence Analysis (5)

Sufficient conditions and the convergence order of the iterative process (5) are presented. First, however, we need some definitions; they make precise the relationship between the Lipschitz constants appearing in the local convergence analysis.
Definition 1.
The Fréchet derivative $F'$ satisfies the center-Lipschitz condition on $D$ if there exists $L_0 > 0$ such that for each $x \in D$
$$\|F'(x) - F'(x^*)\| \le L_0 \|x - x^*\|.$$
Definition 2.
The divided difference $G(x, y)$ satisfies the center-Lipschitz condition on $D \times D$ if there exists $M_0 > 0$ such that for each $x, y \in D$
$$\|G(x, y) - G(x^*, x^*)\| \le M_0 \big( \|x - x^*\| + \|y - x^*\| \big).$$
Let $B > 0$ and $\alpha > 0$. Define the function $\varphi : [0, +\infty) \to [0, +\infty)$ by
$$\varphi(r) = B \big[ 2\alpha + (L_0 + 2M_0) r \big] (L_0 + 2M_0) r.$$
Let $U(x^*, r^*) = \{ x : \|x - x^*\| \le r^* \}$, $r^* > 0$. Suppose that the equation $\varphi(r) = 1$ has at least one positive solution, and denote by $\gamma$ the smallest such solution. Define
$$D_0 = D \cap U(x^*, \gamma).$$
Definition 3.
The Fréchet derivative $F'$ satisfies the restricted Lipschitz condition on $D_0$ if there exists $L > 0$ such that for each $x, y \in D_0$
$$\|F'(x) - F'(y)\| \le L \|x - y\|.$$
Definition 4.
The first-order divided difference $G(x, y)$ satisfies the restricted Lipschitz condition on $D_0 \times D_0$ if there exists $M > 0$ such that for each $x, y, u \in D_0$
$$\|G(x, y) - G(u, x^*)\| \le M \big( \|x - u\| + \|y - x^*\| \big).$$
Next, we also state the definitions given in [9], so that we can compare them with the preceding ones.
Definition 5.
The Fréchet derivative $F'$ satisfies the Lipschitz condition on $D$ if there exists $L_1 > 0$ such that for each $x, y \in D$
$$\|F'(x) - F'(y)\| \le L_1 \|x - y\|.$$
Definition 6.
The first-order divided difference $G(x, y)$ satisfies the Lipschitz condition on $D \times D$ if there exists $M_1 > 0$ such that for each $x, y, u, v \in D$
$$\|G(x, y) - G(u, v)\| \le M_1 \big( \|x - u\| + \|y - v\| \big).$$
Remark 1.
It follows from the preceding definitions that $L = L(L_0, M_0)$, $M = M(L_0, M_0)$,
$$L_0 \le L_1,$$
$$L \le L_1,$$
$$M_0 \le M_1,$$
and
$$M \le M_1,$$
since $D_0 \subseteq D$. If any of (17)-(20) is a strict inequality, then the following advantages are obtained over the work in [9], which uses $L_1$ and $M_1$ instead of the new constants:
$(a_1)$ an at least as large convergence domain, leading to at least as many initial choices;
$(a_2)$ at least as tight upper bounds on the distances $\|x_n - x^*\|$, so at most as many iterations are needed to obtain a desired error tolerance.
It is always the case that $D_0$ is at least as small as $D$ and included in it, by (12). Here lies the new idea and the reason for the advantages. Notice that these advantages are obtained at the same computational cost as in [9], since the new constants $L_0$, $M_0$, $L$ and $M$ are special cases of the constants $L_1$ and $M_1$. This technique of using the center-Lipschitz condition in combination with the restricted convergence region has been used on Newton's, secant and Newton-like methods [14] and can be used on other methods in order to extend their applicability.
The Euclidean vector norm and the corresponding induced matrix norm are used in this study; they have the advantage that $\|A^T\| = \|A\|$.
The proof of the next result follows the corresponding one in [9], but there are crucial differences, where we use $(L_0, L)$ instead of $L_1$ and $(M_0, M)$ instead of $M_1$.
Theorem 1.
Let $F + G : \mathbb{R}^p \to \mathbb{R}^m$ be continuous on a set $D \subseteq \mathbb{R}^p$, let $F$ be continuously differentiable on this set, and let $G(\cdot, \cdot) : D \times D \to L(\mathbb{R}^p, \mathbb{R}^m)$ be a divided difference of order one. Suppose that problem (4) has a solution $x^*$ on the set $D$, that the inverse operator $(A_*^T A_*)^{-1} = \big[(F'(x^*) + G(x^*, x^*))^T (F'(x^*) + G(x^*, x^*))\big]^{-1}$ exists with $\|(A_*^T A_*)^{-1}\| \le B$, that (9), (10), (13), (14) hold, and that $\gamma$ defined in (11) exists. Moreover,
$$\|F(x^*) + G(x^*)\| \le \eta, \qquad \|F'(x^*) + G(x^*, x^*)\| \le \alpha;$$
$$B (L_0 + 2M_0) \eta < 1,$$
and $U(x^*, r^*) \subseteq D$, where $r^*$ is the unique positive zero of the function $q$, defined by
$$q(r) = B \big[ (\alpha + (L_0 + 2M_0) r)(L + 2M) r / 2 + (L_0 + 2M_0) \eta \big] + B \big[ 2\alpha + (L_0 + 2M_0) r \big] (L_0 + 2M_0) r - 1.$$
Then, for $x_0, x_{-1} \in U(x^*, r^*)$, method (5) is well defined and generates the sequence $\{x_n\}$, $n = 0, 1, \ldots$, which remains in $U(x^*, r^*)$ and converges to the solution $x^*$. Moreover, the following error bound holds:
$$\|x_{n+1} - x^*\| \le C_1 \|x_{n-1} - x^*\| + C_2 \|x_n - x^*\| + C_3 \|x_{n-1} - x^*\| \, \|x_n - x^*\| + C_4 \|x_{n-1} - x^*\|^2 \|x_n - x^*\| + C_5 \|x_n - x^*\|^2 + C_6 \|x_{n-1} - x^*\| \, \|x_n - x^*\|^2 + C_7 \|x_n - x^*\|^3,$$
where
$$g(r) = B \big[ 1 - B (2\alpha + (L_0 + 2M_0) r)(L_0 + 2M_0) r \big]^{-1}; \quad C_1 = g(r^*) M_0 \eta; \quad C_2 = g(r^*)(L_0 + M_0) \eta; \quad C_3 = g(r^*) \alpha M; \quad C_4 = g(r^*) M_0 M; \quad C_5 = g(r^*) \frac{\alpha L}{2}; \quad C_6 = g(r^*) \Big( L_0 M + M_0 M + \frac{M_0 L}{2} \Big); \quad C_7 = g(r^*) \frac{L}{2}(L_0 + M_0).$$
Proof. 
By (22), $q(0) = B(L_0 + 2M_0)\eta - 1 < 0$, and $q$ is increasing and unbounded on $[0, +\infty)$; hence, by the intermediate value theorem, $q$ has a positive zero, and this zero is unique. Denote it by $r^*$.
We shall show estimate (24) by first showing that the sequence $\{x_n\}$ is well defined.
Let $A_n = F'(x_n) + G(x_n, x_{n-1})$, and set $n = 0$. We need to show that the linear operator $A_0^T A_0$ is invertible. Since $x_0, x_{-1} \in U(x^*, r^*)$, we obtain the following estimate:
$$\begin{aligned} \|I - (A_*^T A_*)^{-1} A_0^T A_0\| &= \|(A_*^T A_*)^{-1} (A_*^T A_* - A_0^T A_0)\| \\ &= \|(A_*^T A_*)^{-1} \big( A_*^T (A_* - A_0) + (A_*^T - A_0^T)(A_0 - A_*) + (A_*^T - A_0^T) A_* \big)\| \\ &\le \|(A_*^T A_*)^{-1}\| \big( \|A_*^T\| \, \|A_* - A_0\| + \|A_*^T - A_0^T\| \, \|A_0 - A_*\| + \|A_*^T - A_0^T\| \, \|A_*\| \big) \\ &\le B \big( \alpha \|A_* - A_0\| + \|A_*^T - A_0^T\| \, \|A_0 - A_*\| + \alpha \|A_*^T - A_0^T\| \big). \end{aligned}$$
By (9) and (10), we have in turn the estimate
$$\begin{aligned} \|A_0 - A_*\| &= \|(F'(x_0) + G(x_0, x_{-1})) - (F'(x^*) + G(x^*, x^*))\| \\ &\le \|F'(x_0) - F'(x^*)\| + \|G(x_0, x_{-1}) - G(x^*, x^*)\| \\ &\le L_0 \|x_0 - x^*\| + M_0 \big( \|x_0 - x^*\| + \|x_{-1} - x^*\| \big). \end{aligned}$$
Then, from inequality (26) and the definition (23) of $r^*$, we get
$$\begin{aligned} \|I - (A_*^T A_*)^{-1} A_0^T A_0\| &\le B \big[ 2\alpha + L_0 \|x_0 - x^*\| + M_0 (\|x_0 - x^*\| + \|x_{-1} - x^*\|) \big] \big[ L_0 \|x_0 - x^*\| + M_0 (\|x_0 - x^*\| + \|x_{-1} - x^*\|) \big] \\ &\le B \big[ 2\alpha + (L_0 + 2M_0) r^* \big] (L_0 + 2M_0) r^* = \varphi(r^*) < 1. \end{aligned}$$
By the Banach lemma on invertible operators [3] and (28), $A_0^T A_0$ is invertible. Then, from (26)-(28), we get in turn that
$$\begin{aligned} \|(A_0^T A_0)^{-1}\| \le g_0 &= B \big\{ 1 - B \big[ 2\alpha + L_0 \|x_0 - x^*\| + M_0 (\|x_0 - x^*\| + \|x_{-1} - x^*\|) \big] \big( L_0 \|x_0 - x^*\| + M_0 (\|x_0 - x^*\| + \|x_{-1} - x^*\|) \big) \big\}^{-1} \\ &\le g(r^*) = B \big\{ 1 - B \big[ 2\alpha + (L_0 + 2M_0) r^* \big] (L_0 + 2M_0) r^* \big\}^{-1}. \end{aligned}$$
Hence, iterate $x_1$ is well defined by method (5) for $n = 0$. Next, we show that $x_1 \in U(x^*, r^*)$. First, we get the estimate
$$\begin{aligned} \|x_1 - x^*\| &= \Big\| x_0 - x^* - (A_0^T A_0)^{-1} \big( A_0^T (F(x_0) + G(x_0)) - A_*^T (F(x^*) + G(x^*)) \big) \Big\| \\ &\le \|(A_0^T A_0)^{-1}\| \, \Big\| A_0^T \Big( A_0 - \int_0^1 F'(x^* + t (x_0 - x^*)) \, dt - G(x_0, x^*) \Big) (x_0 - x^*) + (A_0^T - A_*^T)(F(x^*) + G(x^*)) \Big\|. \end{aligned}$$
Moreover, using (9), (10), (13), (14) and (21), we obtain in turn
$$\begin{aligned} \Big\| A_0 - \int_0^1 F'(x^* + t (x_0 - x^*)) \, dt - G(x_0, x^*) \Big\| &= \Big\| F'(x_0) - \int_0^1 F'(x^* + t (x_0 - x^*)) \, dt + G(x_0, x_{-1}) - G(x_0, x^*) \Big\| \\ &= \Big\| \int_0^1 \big( F'(x_0) - F'(x^* + t (x_0 - x^*)) \big) \, dt + G(x_0, x_{-1}) - G(x_0, x^*) \Big\| \\ &\le \frac{1}{2} L \|x_0 - x^*\| + M \|x_{-1} - x^*\| = \frac{1}{2} \big( L \|x_0 - x^*\| + 2 M \|x_{-1} - x^*\| \big), \\ \|A_0\| &\le \|A_*\| + \|A_0 - A_*\| \le \alpha + L_0 \|x_0 - x^*\| + M_0 \big( \|x_0 - x^*\| + \|x_{-1} - x^*\| \big). \end{aligned}$$
Then, by method (5) for $n = 0$ and the preceding estimates, we have in turn that
$$\begin{aligned} \|x_1 - x^*\| &\le B \Big\{ \big( \alpha + L_0 \|x_0 - x^*\| + M_0 (\|x_0 - x^*\| + \|x_{-1} - x^*\|) \big) \tfrac{1}{2} \big( L \|x_0 - x^*\| + 2 M \|x_{-1} - x^*\| \big) \|x_0 - x^*\| \\ &\qquad + \eta \big( L_0 \|x_0 - x^*\| + M_0 (\|x_0 - x^*\| + \|x_{-1} - x^*\|) \big) \Big\} \Big/ \Big\{ 1 - B \big[ 2\alpha + L_0 \|x_0 - x^*\| + M_0 (\|x_0 - x^*\| + \|x_{-1} - x^*\|) \big] \\ &\qquad \times \big( L_0 \|x_0 - x^*\| + M_0 (\|x_0 - x^*\| + \|x_{-1} - x^*\|) \big) \Big\} \\ &= g_0 \Big\{ \big( \alpha + L_0 \|x_0 - x^*\| + M_0 (\|x_0 - x^*\| + \|x_{-1} - x^*\|) \big) \tfrac{1}{2} \big( L \|x_0 - x^*\| + 2 M \|x_{-1} - x^*\| \big) \|x_0 - x^*\| \\ &\qquad + \eta \big( L_0 \|x_0 - x^*\| + M_0 (\|x_0 - x^*\| + \|x_{-1} - x^*\|) \big) \Big\} \\ &< g(r^*) \big[ (\alpha + (L_0 + 2M_0) r^*)(L + 2M) r^*/2 + (L_0 + 2M_0) \eta \big] r^* = p(r^*) \, r^* = r^*, \end{aligned}$$
where $p(r) = g(r) \big[ (\alpha + (L_0 + 2M_0) r)(L + 2M) r/2 + (L_0 + 2M_0) \eta \big]$ (note that $q(r^*) = 0$ implies $p(r^*) = 1$). That is, $x_1 \in U(x^*, r^*)$ and estimate (24) holds for $n = 0$.
Suppose that $x_n \in U(x^*, r^*)$ for $n = 0, 1, \ldots, k$ and that estimate (24) holds for $n = 0, 1, \ldots, k-1$, where $k \ge 1$ is an integer. We shall show that $x_{k+1} \in U(x^*, r^*)$ and that estimate (24) holds for $n = k$.
As in the derivation of (28), using (9), (21) and the definition of the function $\varphi$, we get in turn that
$$\begin{aligned} \|I - (A_*^T A_*)^{-1} A_k^T A_k\| &= \|(A_*^T A_*)^{-1} (A_*^T A_* - A_k^T A_k)\| \\ &= \|(A_*^T A_*)^{-1} \big( A_*^T (A_* - A_k) + (A_*^T - A_k^T)(A_k - A_*) + (A_*^T - A_k^T) A_* \big)\| \\ &\le \|(A_*^T A_*)^{-1}\| \big( \|A_*^T\| \, \|A_* - A_k\| + \|A_*^T - A_k^T\| \, \|A_k - A_*\| + \|A_*^T - A_k^T\| \, \|A_*\| \big) \\ &\le B \big( \alpha \|A_* - A_k\| + \|A_*^T - A_k^T\| \, \|A_k - A_*\| + \alpha \|A_*^T - A_k^T\| \big) \\ &\le B \big[ 2\alpha + L_0 \|x_k - x^*\| + M_0 (\|x_k - x^*\| + \|x_{k-1} - x^*\|) \big] \big[ L_0 \|x_k - x^*\| + M_0 (\|x_k - x^*\| + \|x_{k-1} - x^*\|) \big] \\ &\le B \big[ 2\alpha + (L_0 + 2M_0) r^* \big] (L_0 + 2M_0) r^* < 1. \end{aligned}$$
Hence, $(A_k^T A_k)^{-1}$ exists and
$$\|(A_k^T A_k)^{-1}\| \le g_k = B \big\{ 1 - B \big[ 2\alpha + L_0 \|x_k - x^*\| + M_0 (\|x_k - x^*\| + \|x_{k-1} - x^*\|) \big] \big( L_0 \|x_k - x^*\| + M_0 (\|x_k - x^*\| + \|x_{k-1} - x^*\|) \big) \big\}^{-1} \le g(r^*).$$
Therefore, iterate $x_{k+1}$ is well defined, and the following estimate holds:
$$\begin{aligned} \|x_{k+1} - x^*\| &= \Big\| x_k - x^* - (A_k^T A_k)^{-1} \big( A_k^T (F(x_k) + G(x_k)) - A_*^T (F(x^*) + G(x^*)) \big) \Big\| \\ &\le \|(A_k^T A_k)^{-1}\| \, \Big\| A_k^T \Big( A_k - \int_0^1 F'(x^* + t (x_k - x^*)) \, dt - G(x_k, x^*) \Big) (x_k - x^*) + (A_k^T - A_*^T)(F(x^*) + G(x^*)) \Big\| \\ &\le g_k \Big\{ \big( \alpha + L_0 \|x_k - x^*\| + M_0 (\|x_k - x^*\| + \|x_{k-1} - x^*\|) \big) \tfrac{1}{2} \big( L \|x_k - x^*\| + 2 M \|x_{k-1} - x^*\| \big) \|x_k - x^*\| \\ &\qquad + \eta \big( L_0 \|x_k - x^*\| + M_0 (\|x_k - x^*\| + \|x_{k-1} - x^*\|) \big) \Big\} \\ &\le g(r^*) \Big\{ \big( \alpha + L_0 \|x_k - x^*\| + M_0 (\|x_k - x^*\| + \|x_{k-1} - x^*\|) \big) \tfrac{1}{2} \big( L \|x_k - x^*\| + 2 M \|x_{k-1} - x^*\| \big) \|x_k - x^*\| \\ &\qquad + \eta \big( L_0 \|x_k - x^*\| + M_0 (\|x_k - x^*\| + \|x_{k-1} - x^*\|) \big) \Big\} < p(r^*) \, r^* = r^*. \end{aligned}$$
This proves that $x_{k+1} \in U(x^*, r^*)$ and that estimate (24) holds for $n = k$.
Thus, method (5) is well defined, $x_n \in U(x^*, r^*)$ for all $n \ge 0$, and estimate (24) holds for all $n \ge 0$. It remains to prove that $x_n \to x^*$ as $n \to \infty$.
Define the functions $a$ and $b$ on $[0, r^*]$ by
$$a(r) = g(r) \Big( (L_0 + M_0) \eta + \frac{\alpha L r}{2} + \frac{L (L_0 + M_0) r^2}{2} \Big)$$
and
$$b(r) = g(r) \Big( M_0 \eta + \alpha M r + \Big( 2 M_0 M + L_0 M + \frac{M_0 L}{2} \Big) r^2 \Big).$$
By the definition of $r^*$, we get
$$a(r^*) \ge 0, \quad b(r^*) \ge 0, \quad a(r^*) + b(r^*) = 1.$$
Using estimate (24), the definitions of the constants $C_i$, $i = 1, 2, \ldots, 7$, and of the functions $a$ and $b$, for $n \ge 0$ we get
$$\begin{aligned} \|x_{n+1} - x^*\| &\le C_1 \|x_{n-1} - x^*\| + C_2 \|x_n - x^*\| + C_3 r^* \|x_{n-1} - x^*\| + C_4 r^{*2} \|x_{n-1} - x^*\| + C_5 r^* \|x_n - x^*\| \\ &\qquad + C_6 r^{*2} \|x_{n-1} - x^*\| + C_7 r^{*2} \|x_n - x^*\| = a(r^*) \|x_n - x^*\| + b(r^*) \|x_{n-1} - x^*\|. \end{aligned}$$
As was shown in [1], under conditions (29)-(32) the sequence $\{x_n\}$ converges to $x^*$ as $n \to \infty$. □
Corollary 1.
In the case $\eta = 0$, we have a nonlinear least squares problem with zero residual. Then $C_1 = 0$ and $C_2 = 0$, and estimate (24) reduces to
$$\|x_{n+1} - x^*\| \le (C_3 + C_4 r^*) \|x_{n-1} - x^*\| \, \|x_n - x^*\| + (C_5 + C_6 r^* + C_7 r^*) \|x_n - x^*\|^2.$$
That is, method (5) converges with order $\frac{1 + \sqrt{5}}{2}$.
Let $G(x) \equiv 0$ in (4), corresponding to the residual function being differentiable. Then, from Theorem 1, we obtain the following corollary.
Corollary 2.
If $G(x) \equiv 0$, then under the conditions of the theorem we set $M = 0$, $C_3 = 0$, $C_4 = 0$, and estimate (24) reduces to
$$\|x_{n+1} - x^*\| \le (C_5 + C_6 r^* + C_7 r^*) \|x_n - x^*\|^2.$$
Hence, method (5) has convergence order two.
Remark 2.
If $L_0 = L = L_1$ and $M_0 = M = M_1$, our results specialize to the corresponding ones in [9]. Otherwise, they constitute an improvement, as already noted in Remark 1. As an example, let $q_1$, $g_1$, $C_1^1$, $C_2^1$, $C_3^1$, $C_4^1$ and $r_1^*$ denote the functions and parameters obtained when $L_0, L, M_0, M$ are replaced by $L_1, L_1, M_1, M_1$, respectively. Then, in view of (17)-(20), we have
$$q(r) \le q_1(r),$$
$$g(r) \le g_1(r),$$
$$C_1 \le C_1^1,$$
$$C_2 \le C_2^1,$$
$$C_3 \le C_3^1,$$
$$C_4 \le C_4^1,$$
so
$$r_1^* \le r^*,$$
$$B (L_1 + 2M_1) \eta < 1 \;\Rightarrow\; B (L_0 + 2M_0) \eta < 1.$$
Consequently, the new sufficient convergence criteria are weaker than the ones in [9], unless $L_0 = L = L_1$ and $M_0 = M = M_1$. Moreover, the new error bounds are tighter than the corresponding ones in [9], and the rest of the advantages already mentioned in Remark 1 hold true.
The results can be improved even further, if (10) and (14) are replaced by
$$\|G(x, y) - G(x^*, x^*)\| \le K_0 \|x - x^*\| + \overline{K}_0 \|y - x^*\|,$$
and
$$\|G(x, y) - G(u, x^*)\| \le N_0 \|x - u\| + \overline{N}_0 \|y - x^*\|,$$
respectively, since $K_0 \le M_0$, $\overline{K}_0 \le M_0$, $N_0 \le M$ and $\overline{N}_0 \le M$. We leave the details to the motivated reader.
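As a practical aside (not part of the original analysis), the radius $r^*$ of Theorem 1 is the smallest positive zero of the quadratic $q$, and $q(0) = B(L_0 + 2M_0)\eta - 1 < 0$ by condition (22), so the positive zero is unique and can be computed in closed form. A minimal sketch, with illustrative names and NumPy assumed:

```python
import numpy as np

def radius_theorem1(B, alpha, eta, L0, M0, L, M):
    """Smallest positive zero r* of
    q(r) = B[(alpha+(L0+2M0)r)(L+2M)r/2 + (L0+2M0)eta]
         + B[2alpha+(L0+2M0)r](L0+2M0)r - 1  (a sketch)."""
    s = L0 + 2.0 * M0
    c2 = B * (s * (L + 2.0 * M) / 2.0 + s * s)        # coefficient of r^2
    c1 = B * alpha * ((L + 2.0 * M) / 2.0 + 2.0 * s)  # coefficient of r
    c0 = B * s * eta - 1.0                            # q(0), negative by (22)
    roots = np.roots([c2, c1, c0])
    pos = sorted(r.real for r in roots if abs(r.imag) < 1e-12 and r.real > 0)
    return pos[0] if pos else None
```

Because $c_0 < 0$ and $c_2 > 0$, the two roots of the quadratic have opposite signs, so the positive one returned above is indeed the unique $r^*$ of the theorem.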

4. Local Convergence Analysis (6)

Sufficient conditions for, and the rate of, the local convergence of method (6) are given in the following theorem.
Theorem 2.
Let $F + G : \mathbb{R}^p \to \mathbb{R}^m$ be continuous on a set $D \subseteq \mathbb{R}^p$, let $F$ be continuously differentiable on this set, and let $G$ be a function on $D$. Suppose that problem (4) has a solution $x^*$ on the set $D$, that $F(x^*) + G(x^*) = 0$, and that the inverse operator $(A_*^T A_*)^{-1} = \big[F'(x^*)^T F'(x^*)\big]^{-1}$ exists with $\|(A_*^T A_*)^{-1}\| \le B$. Let the Fréchet derivative $F'$ and the function $G$ satisfy the following Lipschitz conditions on the set $D_0$:
$$\|F'(x) - F'(x^*)\| \le L_0 \|x - x^*\|,$$
$$\|F'(x) - F'(y)\| \le L \|x - y\|,$$
$$\|G(x) - G(x^*)\| \le M_0 \|x - x^*\|.$$
Moreover,
$$\|F'(x^*)\| \le \alpha;$$
$$B M_0 \alpha < 1,$$
and $U(x^*, r^*) \subseteq D$, where $r^*$ is the unique positive zero of the function $q$, defined by
$$q(r) = B (\alpha + L_0 r)(L r + 2 M_0)/2 + B (2\alpha + L_0 r) L_0 r - 1.$$
Then, for $x_0 \in U(x^*, r^*)$, method (6) is well defined and generates the sequence $\{x_n\}$, $n = 0, 1, \ldots$, which remains in $U(x^*, r^*)$ and converges to the solution $x^*$. Moreover, the following error bound holds:
$$\|x_{n+1} - x^*\| \le C_1 \|x_n - x^*\| + C_2 \|x_n - x^*\|^2 + C_3 \|x_n - x^*\|^3,$$
where
$$g(r) = B \big[ 1 - B (2\alpha + L_0 r) L_0 r \big]^{-1}; \quad C_1 = g(r^*) M_0 \alpha; \quad C_2 = g(r^*) \Big( L_0 M_0 + \frac{\alpha L}{2} \Big); \quad C_3 = g(r^*) \frac{L_0 L}{2}.$$
Proof. 
In view of (39) we have $q(0) = B M_0 \alpha - 1 < 0$; moreover, $q$ is increasing and unbounded on $[0, +\infty)$, so by the intermediate value theorem $q$ has a unique positive zero, denoted by $r^*$. The rest of the proof is analogous to the one given in Theorem 1.
Let $A_n = F'(x_n)$, and set $n = 0$. By assumption, $x_0 \in U(x^*, r^*)$. By analogy with (26) in Theorem 1, we get
$$\|I - (A_*^T A_*)^{-1} A_0^T A_0\| \le B \big( \alpha \|A_* - A_0\| + \|A_*^T - A_0^T\| \, \|A_0 - A_*\| + \alpha \|A_*^T - A_0^T\| \big).$$
Taking into account that
$$\|A_0 - A_*\| = \|F'(x_0) - F'(x^*)\| \le L_0 \|x_0 - x^*\|,$$
from inequality (43) and the definition of $r^*$ given in (40), we get
$$\|I - (A_*^T A_*)^{-1} A_0^T A_0\| \le B \big[ 2\alpha + L_0 \|x_0 - x^*\| \big] L_0 \|x_0 - x^*\| \le B \big[ 2\alpha + L_0 r^* \big] L_0 r^* < 1.$$
By the Banach lemma on invertible operators [3] and (45), $A_0^T A_0$ is invertible. Then, from (43)-(45), we get
$$\|(A_0^T A_0)^{-1}\| \le g_0 = B \big\{ 1 - B \big[ 2\alpha + L_0 \|x_0 - x^*\| \big] L_0 \|x_0 - x^*\| \big\}^{-1} \le g(r^*) = B \big\{ 1 - B \big[ 2\alpha + L_0 r^* \big] L_0 r^* \big\}^{-1}.$$
Hence, iterate $x_1$ is well defined.
Next, we show that $x_1 \in U(x^*, r^*)$. We have the estimate
$$\begin{aligned} \|x_1 - x^*\| &= \Big\| x_0 - x^* - (A_0^T A_0)^{-1} \big( A_0^T (F(x_0) + G(x_0)) - A_*^T (F(x^*) + G(x^*)) \big) \Big\| \\ &\le \|(A_0^T A_0)^{-1}\| \, \Big\| A_0^T \Big( \Big( A_0 - \int_0^1 F'(x^* + t (x_0 - x^*)) \, dt \Big)(x_0 - x^*) - (G(x_0) - G(x^*)) \Big) + (A_0^T - A_*^T)(F(x^*) + G(x^*)) \Big\|. \end{aligned}$$
In view of the estimates
$$\begin{aligned} \Big\| \Big( A_0 - \int_0^1 F'(x^* + t (x_0 - x^*)) \, dt \Big)(x_0 - x^*) - (G(x_0) - G(x^*)) \Big\| &= \Big\| \int_0^1 \big( F'(x_0) - F'(x^* + t (x_0 - x^*)) \big) \, dt \, (x_0 - x^*) - (G(x_0) - G(x^*)) \Big\| \\ &\le \Big( \frac{1}{2} L \|x_0 - x^*\| + M_0 \Big) \|x_0 - x^*\| = \frac{1}{2} \big( L \|x_0 - x^*\| + 2 M_0 \big) \|x_0 - x^*\|, \\ \|A_0\| &\le \|A_*\| + \|A_0 - A_*\| \le \alpha + L_0 \|x_0 - x^*\|, \end{aligned}$$
we obtain in turn that
$$\begin{aligned} \|x_1 - x^*\| &\le B \big( \alpha + L_0 \|x_0 - x^*\| \big) \tfrac{1}{2} \big( L \|x_0 - x^*\| + 2 M_0 \big) \|x_0 - x^*\| \Big/ \big\{ 1 - B \big[ 2\alpha + L_0 \|x_0 - x^*\| \big] L_0 \|x_0 - x^*\| \big\} \\ &= g_0 \big( \alpha + L_0 \|x_0 - x^*\| \big) \tfrac{1}{2} \big( L \|x_0 - x^*\| + 2 M_0 \big) \|x_0 - x^*\| \\ &\le g(r^*) \big( \alpha + L_0 \|x_0 - x^*\| \big) \tfrac{1}{2} \big( L \|x_0 - x^*\| + 2 M_0 \big) \|x_0 - x^*\| \\ &< g(r^*) \big[ (\alpha + L_0 r^*)(L r^* + 2 M_0)/2 \big] r^* = r^*. \end{aligned}$$
Hence, $x_1 \in U(x^*, r^*)$ and inequality (41) holds for $n = 0$.
Suppose that $x_n \in U(x^*, r^*)$ for $n = 0, 1, \ldots, k$ and that estimate (41) holds for $n = 0, 1, \ldots, k-1$, where $k \ge 1$ is an integer. Next, we show that $x_{k+1} \in U(x^*, r^*)$ and that estimate (41) holds for $n = k$.
Then, we obtain
$$\begin{aligned} \|I - (A_*^T A_*)^{-1} A_k^T A_k\| &= \|(A_*^T A_*)^{-1} (A_*^T A_* - A_k^T A_k)\| \\ &= \|(A_*^T A_*)^{-1} \big( A_*^T (A_* - A_k) + (A_*^T - A_k^T)(A_k - A_*) + (A_*^T - A_k^T) A_* \big)\| \\ &\le \|(A_*^T A_*)^{-1}\| \big( \|A_*^T\| \, \|A_* - A_k\| + \|A_*^T - A_k^T\| \, \|A_k - A_*\| + \|A_*^T - A_k^T\| \, \|A_*\| \big) \\ &\le B \big( \alpha \|A_* - A_k\| + \|A_*^T - A_k^T\| \, \|A_k - A_*\| + \alpha \|A_*^T - A_k^T\| \big) \\ &\le B \big[ 2\alpha + L_0 \|x_k - x^*\| \big] L_0 \|x_k - x^*\| \le B \big[ 2\alpha + L_0 r^* \big] L_0 r^* < 1. \end{aligned}$$
Hence, $(A_k^T A_k)^{-1}$ exists and
$$\|(A_k^T A_k)^{-1}\| \le g_k = B \big\{ 1 - B \big[ 2\alpha + L_0 \|x_k - x^*\| \big] L_0 \|x_k - x^*\| \big\}^{-1} \le g(r^*).$$
Therefore, iterate $x_{k+1}$ is well defined, and we get in turn that
$$\begin{aligned} \|x_{k+1} - x^*\| &= \Big\| x_k - x^* - (A_k^T A_k)^{-1} \big( A_k^T (F(x_k) + G(x_k)) - A_*^T (F(x^*) + G(x^*)) \big) \Big\| \\ &\le \|(A_k^T A_k)^{-1}\| \, \Big\| A_k^T \Big( \Big( A_k - \int_0^1 F'(x^* + t (x_k - x^*)) \, dt \Big)(x_k - x^*) - (G(x_k) - G(x^*)) \Big) + (A_k^T - A_*^T)(F(x^*) + G(x^*)) \Big\| \\ &\le g_k \big( \alpha + L_0 \|x_k - x^*\| \big) \tfrac{1}{2} \big( L \|x_k - x^*\| + 2 M_0 \big) \|x_k - x^*\| \\ &\le g(r^*) \big( \alpha + L_0 \|x_k - x^*\| \big) \tfrac{1}{2} \big( L \|x_k - x^*\| + 2 M_0 \big) \|x_k - x^*\| < r^*. \end{aligned}$$
This proves that $x_{k+1} \in U(x^*, r^*)$ and that estimate (41) holds for $n = k$.
Thus, the iterative process (6) is well defined, $x_n \in U(x^*, r^*)$ for all $n \ge 0$, and estimate (41) holds for all $n \ge 0$.
Define the function $a$ on $[0, r^*]$ by
$$a(r) = g(r) \Big( M_0 \alpha + \Big( \frac{\alpha L}{2} + L_0 M_0 \Big) r + \frac{L_0 L r^2}{2} \Big).$$
Using estimate (41), the definitions of the constants $C_i$, $i = 1, 2, 3$, and of the function $a$, for $n \ge 0$ we get
$$\|x_{n+1} - x^*\| \le C_1 \|x_n - x^*\| + C_2 r^* \|x_n - x^*\| + C_3 r^{*2} \|x_n - x^*\| = a(r^*) \|x_n - x^*\|.$$
For any $r^* > 0$ and initial point $x_0 \in U(x^*, r^*)$, there exists $\tilde r$ with $0 < \tilde r < r^*$ such that $x_0 \in U(x^*, \tilde r)$. Similarly to the proof that all iterates stay in $U(x^*, r^*)$, we can show that all iterates stay in $U(x^*, \tilde r)$. Hence, estimate (47) holds with $r^*$ replaced by $\tilde r$. In particular, from (47), for $n \ge 0$ we get
$$\|x_{n+1} - x^*\| \le \tilde a \|x_n - x^*\|,$$
where $\tilde a = a(\tilde r)$. Clearly $0 \le \tilde a < a(r^*) = 1$. Therefore, we obtain
$$\|x_{n+1} - x^*\| \le \tilde a \|x_n - x^*\| \le \tilde a^{\,n+1} \|x_0 - x^*\|.$$
Since $\tilde a < 1$, we have $\tilde a^{\,n+1} \to 0$ as $n \to \infty$. Hence, the sequence $\{x_n\}$ converges to $x^*$ as $n \to \infty$ at the rate of a geometric progression. □
The same type of improvements as in Theorem 1 are obtained for Theorem 2 (see Remark 2).
Remark 3.
As we can see from estimates (41) and (42), the convergence of method (6) depends on $\alpha$, $L_0$, $L$ and $M_0$. For problems with weak nonlinearity ($\alpha$, $L_0$, $L$ and $M_0$ "small"), the convergence rate of the iterative process is linear. In the case of strongly nonlinear problems ($\alpha$, $L_0$, $L$ and/or $M_0$ "large"), method (6) may not converge at all.
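Remark 3 can be made quantitative (this is an illustrative aside, not part of the original paper): $r^*$ is the positive zero of the quadratic $q$ from Theorem 2, and $a(\|x_0 - x^*\|)$ estimates the linear contraction factor of method (6). A minimal sketch under these assumptions, with hypothetical names:

```python
import numpy as np

def radius_and_rate_theorem2(B, alpha, L0, M0, L, dist0):
    """Sketch: r* from q(r) = B(alpha+L0*r)(L*r+2*M0)/2 + B(2*alpha+L0*r)*L0*r - 1,
    and the contraction factor a(dist0) for dist0 = ||x0 - x*|| < r*."""
    c2 = B * (L0 * L / 2.0 + L0 * L0)
    c1 = B * (alpha * L / 2.0 + L0 * M0 + 2.0 * alpha * L0)
    c0 = B * alpha * M0 - 1.0                       # q(0), negative when B*M0*alpha < 1
    roots = np.roots([c2, c1, c0])
    pos = sorted(r.real for r in roots if abs(r.imag) < 1e-12 and r.real > 0)
    r_star = pos[0] if pos else None
    if r_star is None or dist0 >= r_star:
        return r_star, None
    g = B / (1.0 - B * (2.0 * alpha + L0 * dist0) * L0 * dist0)
    a = g * (M0 * alpha + (alpha * L / 2.0 + L0 * M0) * dist0 + L0 * L * dist0 ** 2 / 2.0)
    return r_star, a                                # a < 1 gives the linear rate
```

Small $\alpha$, $L_0$, $L$, $M_0$ keep the leading coefficients small and $q(0)$ close to $-1$, so $r^*$ is large and the factor $a$ is small; large constants shrink $r^*$ (or violate $B M_0 \alpha < 1$ altogether), in line with the remark.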

5. Numerical Experiments

Let us compare, on some test cases, the convergence rates of the combined method (5), the Gauss-Newton type method (6) and the secant-type method for solving nonlinear least squares problems [5,6]
$$x_{n+1} = x_n - (A_n^T A_n)^{-1} A_n^T (F(x_n) + G(x_n)), \quad A_n = F(x_n, x_{n-1}) + G(x_n, x_{n-1}), \quad n = 0, 1, \ldots$$
Testing is carried out on nonlinear systems with a nondifferentiable operator, with zero and nonzero residual. The classic Gauss-Newton and Newton methods cannot be used for solving such problems. Solutions are sought with accuracy $\varepsilon = 10^{-8}$. Calculations are performed until the following conditions are satisfied:
$$\|x_{n+1} - x_n\| \le \varepsilon \quad \text{and} \quad \|A_n^T (F(x_n) + G(x_n))\| \le \varepsilon,$$
where $f(x) = \frac{1}{2} (F(x) + G(x))^T (F(x) + G(x))$ denotes the objective function.
Example 1.
[11,12].
$$3 x^2 y + y^2 - 1 + |x - 1| = 0, \qquad x^4 + x y^3 - 1 + |y| = 0,$$
$$(x^*, y^*) \approx (0.89465537, 0.32782652), \quad f(x^*) = 0.$$
Example 2.
n = 2 , m = 3 ;
$$3 x^2 y + y^2 - 1 + |x - 1| = 0, \qquad x^4 + x y^3 - 1 + |y| = 0, \qquad |x^2 - y| = 0,$$
$$(x^*, y^*) \approx (0.74862800, 0.43039151), \quad f(x^*) \approx 4.0469349 \cdot 10^{-2}.$$
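For illustration only (the paper reports its results in Table 1; the sketch below uses assumed names and need not reproduce those iteration counts exactly), Example 1 can be solved by method (5) as follows: F collects the differentiable terms, G the absolute-value terms, the divided difference of G is diagonal because each component of G depends on a single variable, and the stopping rule is the one stated above with $\varepsilon = 10^{-8}$; the extra approximation $x_{-1}$ is chosen as in Remark 4 below.

```python
import numpy as np

F = lambda v: np.array([3*v[0]**2*v[1] + v[1]**2 - 1,
                        v[0]**4 + v[0]*v[1]**3 - 1])
G = lambda v: np.array([abs(v[0] - 1), abs(v[1])])
Fprime = lambda v: np.array([[6*v[0]*v[1],         3*v[0]**2 + 2*v[1]],
                             [4*v[0]**3 + v[1]**3, 3*v[0]*v[1]**2    ]])

def dd_G(x, y):
    # G_1 depends only on the first variable and G_2 only on the second,
    # so the divided difference is diagonal (assumes x[i] != y[i]).
    return np.diag([(abs(x[0] - 1) - abs(y[0] - 1)) / (x[0] - y[0]),
                    (abs(x[1]) - abs(y[1])) / (x[1] - y[1])])

eps = 1e-8
x = np.array([1.0, 0.0])            # (x_0, y_0)
x_prev = x - 1e-4                   # (x_{-1}, y_{-1}), as in Remark 4 below
for n in range(100):
    r = F(x) + G(x)
    A = Fprime(x) + dd_G(x, x_prev)                 # A_n of method (5)
    step = np.linalg.solve(A.T @ A, A.T @ r)
    x_prev, x = x, x - step
    if np.linalg.norm(step) <= eps and np.linalg.norm(A.T @ r) <= eps:
        break
print(n + 1, x)                     # x approaches (0.89465537, 0.32782652)
```

Because $m = p = 2$ here, (5) coincides with the Newton-Secant method (8); for Example 2 the third residual $|x^2 - y|$ would be appended to G, and its divided-difference matrix would gain a corresponding row.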
Remark 4.
The results of the numerical experiments are shown in Table 1. In particular, we compare the studied methods with respect to the number of iterations needed to find the solution with the given accuracy. In Example 1, all methods converge to the same solution. In Example 2, the Gauss-Newton type method (6) converges to the point $(x^*, y^*) \approx (0.89465537, 0.32782652)$ with residual $f(x^*) \approx 1.11666739 \cdot 10^{-1}$, with the same number of iterations as in Example 1. Such iteration counts are marked with the symbol * in the table. The other methods find the point $(x^*, y^*) \approx (0.74862800, 0.43039151)$ with the smaller residual $f(x^*) \approx 4.0469349 \cdot 10^{-2}$. The additional initial approximation $(x_{-1}, y_{-1})$ is chosen as
$$(x_{-1}, y_{-1}) = (x_0 - 10^{-4}, y_0 - 10^{-4}).$$

6. Conclusions

Based on the theoretical studies, the numerical experiments and the comparison of the obtained results, we can argue that the combined differential-difference method (5) converges faster than the Gauss-Newton type method (6) and the secant type method (48). Moreover, method (5) has the high convergence order $(1+\sqrt{5})/2$ in the case of zero residual and does not require the calculation of derivatives of the nondifferentiable part of the operator. Therefore, the proposed method (5) solves the problem efficiently and quickly.

Author Contributions

All authors contributed equally and significantly to the writing of this article. All authors read and approved the final manuscript.

Funding

This research received no external funding.

Acknowledgments

The authors would like to express their sincere gratitude to the referees for their valuable comments which have significantly improved the presentation of this paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Argyros, I.K. Convergence and Applications of Newton-Type Iterations; Springer: New York, NY, USA, 2008; 506p. [Google Scholar]
  2. Dennis, J.E.; Schnabel, R.B. Numerical Methods for Unconstrained Optimization and Nonlinear Equations; SIAM: Philadelphia, PA, USA, 1996. [Google Scholar]
  3. Ortega, J.M.; Rheinboldt, W.C. Iterative Solution of Nonlinear Equations in Several Variables; Academic Press: New York, NY, USA, 1970. [Google Scholar]
  4. Argyros, I.K.; Ren, H. A derivative free iterative method for solving least squares problems. Numer. Algorithms 2011, 58, 555–571. [Google Scholar]
  5. Ren, H.; Argyros, I.K. Local convergence of a secant type method for solving least squares problems. Appl. Math. Comput. 2010, 217, 3816–3824. [Google Scholar] [CrossRef]
  6. Shakhno, S.M.; Gnatyshyn, O.P. On an iterative algorithm of order 1.839... for solving the nonlinear least squares problems. Appl. Math. Comput. 2005, 161, 253–264. [Google Scholar] [CrossRef]
  7. Shakhno, S.M.; Gnatyshyn, O.P. Iterative-difference methods for solving nonlinear least-squares problem. In Progress in Industrial Mathematics at ECMI 98; Vieweg + Teubner Verlag: Stuttgart, Germany, 1999; pp. 287–294. [Google Scholar]
  8. Argyros, I.K.; Hilout, S. On an improved convergence analysis of Newton’s method. Appl. Math. Comput. 2013, 225, 372–386. [Google Scholar] [CrossRef]
  9. Shakhno, S.M.; Shunkin, Y.V. One combined method for solving nonlinear least squares problems. Visnyk Lviv Univ. Ser. Appl. Math. Inform. 2017, 25, 38–48. (In Ukrainian) [Google Scholar]
  10. Ulm, S. On generalized divided differences. Proc. Acad. Sci. Estonian SSR Phys. Math. 1967, 16, 13–26. (In Russian) [Google Scholar]
  11. Cătinas, E. On some iterative methods for solving nonlinear equations. Revue d’Analyse Numérique et de Theorie de l’Approximation 1994, 23, 47–53. [Google Scholar]
  12. Shakhno, S.M.; Mel’nyk, I.V.; Yarmola, H.P. Analysis of the Convergence of a Combined Method for the Solution of Nonlinear Equations. J. Math. Sci. 2014, 201, 32–43. [Google Scholar] [CrossRef]
  13. Zabrejko, P.P.; Nguen, D.F. The majorant method in the theory of Newton-Kantorovich approximations and the Pták error estimates. Numer. Funct. Anal. Optim. 1987, 9, 671–686. [Google Scholar] [CrossRef]
  14. Argyros, I.K.; Magreñán, Á.A. A Contemporary Study of Iterative Methods: Convergence, Dynamics and Applications; Academic Press: London, UK, 2018. [Google Scholar]
Table 1. Number of iterations needed to solve the test problems.

Example | (x0, y0)   | Gauss-Newton Type (6) | Secant Type (48) | Combined Method (5)
1       | (1, 0)     | 19                    | 7                | 7
        | (3, 1)     | 22                    | 11               | 10
        | (0.5, 0.5) | 21                    | 18               | 10
2       | (1, 0)     | 19 *                  | 22               | 12
        | (3, 1)     | 22 *                  | 25               | 15
        | (0.5, 0.5) | 21 *                  | 19               | 13
