Article

An Efficient Algorithm for the Separable Nonlinear Least Squares Problem

Department of Mathematics, Western Washington University, Bellingham, WA 98225-9063, USA
* Authors to whom correspondence should be addressed.
Algorithms 2017, 10(3), 78; https://doi.org/10.3390/a10030078
Submission received: 7 June 2017 / Revised: 23 June 2017 / Accepted: 1 July 2017 / Published: 10 July 2017
(This article belongs to the Special Issue Numerical Algorithms for Solving Nonlinear Equations and Systems 2017)

Abstract

The nonlinear least squares problem min_{y,z} ‖A(y)z + b(y)‖, where A(y) is a full-rank (N+ℓ)×N matrix, y ∈ R^n, z ∈ R^N and b(y) ∈ R^{N+ℓ} with ℓ ≥ n, can be solved by first solving a reduced problem min_y ‖f(y)‖ to find the optimal value y* of y, and then solving the resulting linear least squares problem min_z ‖A(y*)z + b(y*)‖ to find the optimal value z* of z. We have previously justified the use of the reduced function f(y) = C^T(y)b(y), where C(y) is a matrix whose columns form an orthonormal basis for the nullspace of A^T(y), and presented a quadratically convergent Gauss–Newton type method for solving min_y ‖C^T(y)b(y)‖ based on the use of QR factorization. In this note, we show how LU factorization can replace the QR factorization in those computations, halving the associated computational cost while also providing opportunities to exploit sparsity and thus further enhance computational efficiency.

1. Introduction

Many applications [1,2,3] lead to the need to find a solution (y*, z*) of the separable nonlinear overdetermined least squares problem
$$ \min_{y,z} \|F(y,z)\| \equiv \min_{y,z} \|A(y)z + b(y)\|, \qquad (1) $$
where the full-rank matrix A(y) ∈ R^{(N+ℓ)×N} and the vector b(y) ∈ R^{N+ℓ} are twice Lipschitz continuously differentiable functions of y ∈ R^n, z ∈ R^N and ℓ ≥ n. Here, we are using the Euclidean norm. The classic Golub–Pereyra variable projection method [4] replaces the problem (1) by the simpler least squares problem
$$ \min_{y} \|f(y)\| \equiv \min_{y} \big\|\,[\,I - A(y)A^{+}(y)\,]\,b(y)\,\big\|. \qquad (2) $$
Once y* has been obtained by solving (2), z* can be found by solving the resulting linear least squares problem
$$ \min_{z} \|A(y^{*})z + b(y^{*})\|; \qquad (3) $$
formally,
$$ z^{*} = -A^{+}(y^{*})\,b(y^{*}), \qquad (4) $$
where A^+(y*) is the pseudo-inverse of A(y*). An alternative method that reduces (1) to a smaller least squares problem min_y ‖f(y)‖ was proposed in [5,6] for the case ℓ = n, and the associated iterative technique was then applied to the case ℓ > n in [7], but without a complete analytical justification. The reduced system is
$$ \min_{y} \|f(y)\| \equiv \min_{y} \|C^{T}(y)\,b(y)\|, \qquad (5) $$
where C(y) is an (N+ℓ)×ℓ matrix whose columns form an orthonormal basis for Null(A^T(y)). In [8], we presented a quadratically convergent Gauss–Newton type iterative method, incorporating second derivative terms, to solve this problem, and provided a complete theoretical justification for our technique. In particular, we showed in [8] that our Gauss–Newton type method, and also the standard Gauss–Newton method [9,10,11], which omits the second derivative terms and is not quadratically convergent, are both invariant with respect to the specific basis matrix C(y) that is used at any particular point in the iteration. This makes it possible to substitute, at each point of the iteration, any easily computed local orthonormal basis for the nullspace of A^T(y). This observation also makes it possible to compute both the first and the second derivatives involved in the computation very cheaply.
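As an illustration of the reduced formulation (5), the following minimal NumPy sketch evaluates f(y) = C^T(y)b(y) from a full QR factorization of A(y). The function name is ours and A, b are assumed to be supplied by the application; this QR-based step is exactly what the LU-based construction of Section 2 is designed to replace.

```python
import numpy as np

def reduced_residual(A, b):
    """Evaluate f = C^T b, where the columns of C form an orthonormal
    basis for Null(A^T) and A is a full-rank (N+l) x N matrix."""
    N = A.shape[1]
    Q, _ = np.linalg.qr(A, mode='complete')  # full QR of A
    C = Q[:, N:]                             # last l columns of Q span Null(A^T)
    return C.T @ b                           # f(y) = C^T(y) b(y), an l-vector
```

Here f is the ℓ-vector whose norm is minimized in (5); A and b would be re-evaluated at each candidate y.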
Many instances of (1), such as those arising from the discretization of linear differential equations with only a few nonlinear parameters, involve N ≫ ℓ, while ℓ and n are fixed and small. In this case, the main computational cost in each step of our quadratically convergent Gauss–Newton type method of [8] is the QR factorization of A(y), costing approximately 4N³/3 flops [12] per iteration. In this note, we show how the relevant computations can instead be performed by using one LU factorization of A(y) per iteration, so that the computational cost of the computation is essentially halved to 2N³/3 flops [12]. We note that, in the case of discretizations of differential equations, the matrix A(y) is typically very sparse and strongly patterned. Since LU factorization more easily exploits and preserves such sparsity and patterns than does QR factorization, this technique also opens up opportunities for further reductions in the computational cost in such applications.
In a previous paper [3], we presented the use of LU factorizations in the context of solving separable systems of nonlinear equations and zero-residual separable nonlinear least squares problems. We mentioned there that extending that approach to nonzero residual problems remained an open question. The current paper resolves that question, while also improving on the technique of [3] in other ways. In particular, our new method does not involve the singular value decomposition, and in contrast to the use of the standard Gauss–Newton method in [3], it retains quadratic convergence even in the case of nonzero residual problems. The latter is achieved by retaining the second derivative terms, and we show here how those terms can be computed very efficiently. This is in contrast to the usual use of the Gauss–Newton method, which sacrifices the quadratic convergence for the convenience of not computing the second derivatives.
In the next section, we present the relevant results from [8] relating to our technique for solving (5) by our Gauss–Newton type method using QR factorization, and show how those results and methods can be modified to use LU factorization. The resulting algorithm and some numerical examples to illustrate the method are given in the following two sections. The numerical results exhibit the quadratic convergence expected from our Gauss–Newton type method.

2. Analysis

We begin by presenting the relevant background results from [8]. Assume that A(y) and b(y) are twice Lipschitz continuously differentiable in a neighborhood of the least squares solution y* of (1). As shown in [5,6,7,8], there exists a smoothly differentiable (N+ℓ)×ℓ matrix C(y) whose columns form an orthonormal basis of Null(A^T(y)) in a neighborhood of y*. Then, finding the least squares solution of (1) can be reduced to finding the least squares solution of (5). Our Gauss–Newton type method for solving (5) takes the form
$$ \Big[\,(f'(y^{(m)}))^{T} f'(y^{(m)}) + \sum_{i=1}^{\ell} f_{i}(y^{(m)})\,H_{i}(y^{(m)})\,\Big]\,(y^{(m+1)} - y^{(m)}) = -(f'(y^{(m)}))^{T} f(y^{(m)}), \qquad (6) $$
where f(y) ≡ C^T(y)b(y) and f'(y), f_i(y) and H_i(y) denote the derivative of f(y), the i-th component of f(y), and the Hessian matrix of f_i(y), respectively.
In [8], we proved that, at any point y^(m) in the iteration (6), the particular matrix C(y^(m)) whose columns form an orthonormal basis for the nullspace of A^T(y^(m)) can be replaced by any other orthonormal basis C̃(y^(m)) for this space, without changing y^(m+1). Thus, instead of using the function f(y) ≡ C^T(y)b(y) in (6), we may use a different function g(y) ≡ C̃^T(y)b(y) at each iteration point y^(m). This freedom to use any orthonormal basis for the nullspace of A^T(y) at the point y^(m) is key to our numerical technique. It also permits us to write simply C(y^(m)) and f(y^(m)) instead of C̃(y^(m)) and g(y^(m)) in the rest of this note, regardless of the particular matrix C(y) being used, and for simplicity we write ȳ instead of y^(m). The analysis below centers on the efficient computation of the terms required to use (6) at the particular point y^(m) = ȳ.
The following results are derived and used in [5,6,7,8] and are required for our analysis below. With the terms as defined above, and writing C_i(y) for the i-th column of the matrix C(y),
$$ C^{T}(y)A(y) = 0, \qquad C^{T}(y)C(y) = I, \qquad (7) $$
$$ f'(y) = C^{T}(y)\,b'(y) + \big[\,[C^{T}(y)]_{1}\,b(y) \;\; \cdots \;\; [C^{T}(y)]_{n}\,b(y)\,\big], \qquad (8) $$
and the (j,k)-th entry of the n×n Hessian matrix H_i(y) is
$$ [C_{i}^{T}(y)]_{j,k}\,b(y) + [C_{i}^{T}(y)]_{j}\,[b(y)]_{k} + [C_{i}^{T}(y)]_{k}\,[b(y)]_{j} + C_{i}^{T}(y)\,[b(y)]_{j,k}. \qquad (9) $$
Here, we use [C(y)]_j and [C(y)]_{j,k} to denote the termwise first derivative w.r.t. y_j and second derivative w.r.t. y_j, y_k, respectively, of C(y), for j, k = 1, ..., n.
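Assembled directly, (8) and (9) translate into a few lines of array code. The sketch below is our own helper; the derivative arrays dC, d2C, db, d2b are assumed to have been computed already (for example via (11) and (12) below and analytic derivatives of b), and the array names are illustrative.

```python
import numpy as np

def assemble_derivatives(C, b, dC, d2C, db, d2b):
    """Form f'(y) from (8) and the Hessians H_i from (9).

    C   : (N+l, l)        orthonormal nullspace basis at y
    b   : (N+l,)          b(y)
    dC  : (n, N+l, l)     dC[j]    = derivative of C w.r.t. y_j
    d2C : (n, n, N+l, l)  d2C[j,k] = second derivative of C w.r.t. y_j, y_k
    db  : (n, N+l)        db[j]    = derivative of b w.r.t. y_j
    d2b : (n, n, N+l)     d2b[j,k] = second derivative of b w.r.t. y_j, y_k
    """
    n = dC.shape[0]
    l = C.shape[1]
    # (8): the j-th column of f'(y) is C^T [b]_j + [C^T]_j b
    fprime = np.stack([C.T @ db[j] + dC[j].T @ b for j in range(n)], axis=1)
    # (9): the (j,k) entry of H_i
    H = np.zeros((l, n, n))
    for i in range(l):
        for j in range(n):
            for k in range(n):
                H[i, j, k] = (d2C[j, k][:, i] @ b + dC[j][:, i] @ db[k]
                              + dC[k][:, i] @ db[j] + C[:, i] @ d2b[j, k])
    return fprime, H
```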
Define the (N+ℓ)×(N+ℓ) nonsingular matrix
$$ M(y) \equiv [\,A(y) \;\; C(y)\,]. \qquad (10) $$
Then, we showed in [8] that
$$ [C^{T}(\bar y)]_{j}\,M(\bar y) = \big[\, -C^{T}(\bar y)\,[A(\bar y)]_{j} \;\;\; 0_{\ell\times\ell}\,\big] \qquad (11) $$
and
$$ [C^{T}(\bar y)]_{j,k}\,M(\bar y) = \big[\, -[C^{T}(\bar y)]_{j}[A(\bar y)]_{k} - [C^{T}(\bar y)]_{k}[A(\bar y)]_{j} - C^{T}(\bar y)[A(\bar y)]_{j,k} \;\;\; -[K(\bar y)]_{j,k}\,\big], \qquad (12) $$
where the (p,q)-entry [K(ȳ)]_{j,k}(p,q) of the upper triangular matrix [K(ȳ)]_{j,k} is
$$ [K(\bar y)]_{j,k}(p,q) = \begin{cases} 0 & \text{if } p > q,\\[2pt] [C_{p}^{T}(\bar y)]_{j}\,[C_{p}(\bar y)]_{k} & \text{if } p = q,\\[2pt] [C_{p}^{T}(\bar y)]_{j}\,[C_{q}(\bar y)]_{k} + [C_{p}^{T}(\bar y)]_{k}\,[C_{q}(\bar y)]_{j} & \text{if } p < q. \end{cases} $$
Throughout the rest of this paper, we assume not only that A ( y ) has full rank N, but also that it is sufficiently well-conditioned that LU factorization with partial pivoting can be performed safely. For separable nonlinear systems with a rank-deficient A ( y ) , the technique of bordered matrices [13,14,15,16] may be applied to produce a full rank matrix.
The following theorem lays the foundation for our approach to using the LU factorization.
Theorem 1.
Let the LU factorization of the rectangular matrix A ( y ¯ ) with some permutation matrix P ( y ¯ ) be
$$ P(\bar y)\,A(\bar y) = L(\bar y)\,U(\bar y), \qquad (13) $$
where A(ȳ) is (N+ℓ)×N, P(ȳ) is an (N+ℓ)×(N+ℓ) permutation matrix, L(ȳ) is an (N+ℓ)×N unit lower triangular matrix and U(ȳ) is an N×N nonsingular upper triangular matrix.
(a) 
The matrix
$$ \bar M(\bar y) \equiv [\,A(\bar y) \;|\; S(\bar y)\,], \qquad (14) $$
where
$$ S(\bar y) \equiv P^{T}(\bar y)\begin{bmatrix} 0 \\ I_{\ell} \end{bmatrix}, $$
is nonsingular.
(b) 
Define
$$ \Psi(\bar y) \equiv [\,\bar M^{T}(\bar y)\,]^{-1}\begin{bmatrix} 0 \\ I_{\ell} \end{bmatrix}. \qquad (15) $$
Then, the thin positive QR factorization
$$ \Psi(\bar y) = C(\bar y)\,R(\bar y) \qquad (16) $$
produces a thin orthonormal (N+ℓ)×ℓ matrix C(ȳ) and a small ℓ×ℓ nonsingular upper triangular matrix R(ȳ), and
$$ C^{T}(\bar y)\,A(\bar y) = 0, \qquad C^{T}(\bar y)\,C(\bar y) = I. \qquad (17) $$
Proof. 
(a) By direct computation, M̄(ȳ) is a product of nonsingular matrices:
$$ \bar M(\bar y) = P^{T}(\bar y)\begin{bmatrix} L(\bar y)\,U(\bar y) & \begin{matrix} 0 \\ I \end{matrix} \end{bmatrix} = P^{T}(\bar y)\begin{bmatrix} L(\bar y) & \begin{matrix} 0 \\ I \end{matrix} \end{bmatrix}\begin{bmatrix} U(\bar y) & 0 \\ 0 & I \end{bmatrix}. \qquad (18) $$
(b) By (15) and (16),
$$ C^{T}(\bar y)\,A(\bar y) = R^{-T}(\bar y)\,\Psi^{T}(\bar y)\,A(\bar y) = R^{-T}(\bar y)\,[\,0 \;\; I\,]\,[\bar M(\bar y)]^{-1}A(\bar y) = R^{-T}(\bar y)\,[\,0 \;\; I\,]\begin{bmatrix} I_{N} \\ 0 \end{bmatrix} = 0. $$ ☐
From (17), we see that the matrix C ( y ¯ ) constructed in the theorem satisfies (7) at y ¯ , and thus, as shown in [8], Equations (11) and (12) hold for this matrix at y ¯ .
To determine the computational cost, recall that we assume that N ≫ ℓ, while ℓ and n are fixed and small. Then, the thin QR factorization in (16) is cheap since the matrix Ψ(ȳ) has only ℓ columns. More specifically, the LU factorization of the (N+ℓ)×N dimensional A(ȳ) in (13) costs (N+ℓ)N² − N³/3 ≈ 2N³/3 flops, with an additional O(N²) comparisons to produce P(ȳ) [12], while computing S(ȳ) only costs O(N) flops and the thin QR factorization of the (N+ℓ)×ℓ dimensional matrix Ψ(ȳ) takes only (N+ℓ)ℓ² − ℓ³/3 = O(N) flops [12].
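As a concrete illustration of Theorem 1, the construction of C(ȳ) from one LU factorization and one thin QR factorization can be sketched as follows. This is a minimal SciPy sketch with an illustrative function name; for clarity it solves with M̄ densely rather than reusing the structured factorization (18), which is what the actual algorithm would do.

```python
import numpy as np
from scipy.linalg import lu, qr, solve

def nullspace_basis_via_lu(A):
    """Theorem 1: build C whose columns are an orthonormal basis of
    Null(A^T), using an LU factorization of A plus a thin QR of an
    (N+l) x l matrix.  A is (N+l) x N with full column rank."""
    Npl, N = A.shape
    l = Npl - N
    P, L, U = lu(A)                    # SciPy returns A = P L U, i.e. P^T A = L U as in (13)
    E = np.vstack([np.zeros((N, l)), np.eye(l)])
    S = P @ E                          # S as in (14): a permuted [0; I] block
    Mbar = np.hstack([A, S])           # Mbar = [A | S], nonsingular by Theorem 1(a)
    Psi = solve(Mbar.T, E)             # Psi = (Mbar^T)^{-1} [0; I] as in (15)
    C, R = qr(Psi, mode='economic')    # thin QR (16): Psi = C R
    return C, S, Mbar
```

A quick check such as np.allclose(C.T @ A, 0) confirms (17) numerically.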
Notice that solving (11) and (12) involves solving several equations of the form u^T M(ȳ) = v^T. However, the matrix M(y) defined in (10), which is used in (11) and (12), differs from the matrix M̄(ȳ) defined in (14), whose LU factors are given in (18). To reduce the cost of solving (11) and (12) from O(N³) to O(N²) operations, we can exploit the factorization (18) of the matrix M̄(ȳ), as below. This result is essentially an application of the Sherman–Morrison–Woodbury formula.
Theorem 2.
Let P(ȳ)A(ȳ) = L(ȳ)U(ȳ) be the LU factorization of A(ȳ) with a permutation matrix P(ȳ) as in (13), and let the matrices M̄(ȳ), S(ȳ) and C(ȳ) be defined as in (14)–(16), while M(y) is defined by (10). Denote
$$ W(\bar y) = I_{N+\ell} - [\,C(\bar y) - S(\bar y)\,]\,C^{T}(\bar y). \qquad (19) $$
Then,
(a) 
$$ M^{-1}(\bar y) = \bar M^{-1}(\bar y)\,W(\bar y); \qquad (20) $$
(b) 
For u, v, w ∈ R^{N+ℓ}, u^T M(ȳ) = v^T if and only if
$$ w^{T}\bar M(\bar y) = v^{T}, \qquad u^{T} = w^{T}W(\bar y). $$
Proof. 
( a ) By (18),
$$ W(\bar y)\,M(\bar y) = W(\bar y)\,[\,A(\bar y) \;\; C(\bar y)\,] = \big[\,\{I_{N+\ell} - [C(\bar y)-S(\bar y)]C^{T}(\bar y)\}A(\bar y) \;\;\;\; \{I_{N+\ell} - [C(\bar y)-S(\bar y)]C^{T}(\bar y)\}C(\bar y)\,\big] = \big[\,A(\bar y) \;\;\; C(\bar y) - [C(\bar y) - S(\bar y)]\,\big] = [\,A(\bar y) \;\; S(\bar y)\,] = \bar M(\bar y). $$
Then,
$$ M^{-1}(\bar y) = \bar M^{-1}(\bar y)\,\bar M(\bar y)\,M^{-1}(\bar y) = \bar M^{-1}(\bar y)\,[\,W(\bar y)\,M(\bar y)\,]\,M^{-1}(\bar y) = \bar M^{-1}(\bar y)\,W(\bar y). $$
Part (b) follows directly from (a). ☐
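In code, Theorem 2 says that a row system u^T M(ȳ) = v^T can be solved entirely with M̄(ȳ). A minimal NumPy sketch follows (the helper name is ours, and a dense solve stands in for reusing the LU factors of M̄ from (18)):

```python
import numpy as np

def solve_row_system(v, C, S, Mbar):
    """Theorem 2(b): solve u^T M = v^T with M = [A | C] by solving
    w^T Mbar = v^T and setting u^T = w^T W, where W = I - (C - S) C^T."""
    w = np.linalg.solve(Mbar.T, v)     # w^T Mbar = v^T  <=>  Mbar^T w = v
    u = w - C @ ((C - S).T @ w)        # u = W^T w, i.e. u^T = w^T W
    return u
```

With the LU factors of M̄ from (18) reused, this solve costs O(N²) rather than O(N³), which is the point of the theorem.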
Finally, we show how the value z * defined by (3) can be computed efficiently using the LU factorization of A ( y * ) .
Theorem 3.
z* = −A^+(y*)b(y*) if and only if
$$ z^{*} = -\,[\,I_{N} \;\; 0_{N\times\ell}\,]\,[\,A(y^{*}) \;\; C(y^{*})\,]^{-1}\,b(y^{*}). \qquad (21) $$
Proof. 
Let the QR factorization of A ( y * ) be
$$ A(y^{*}) = Q\,R = \big[\,(Q_{1})_{(N+\ell)\times N} \;\;\; (Q_{2})_{(N+\ell)\times \ell}\,\big]\begin{bmatrix} R_{1} \\ 0 \end{bmatrix}. $$
Since A ( y * ) is full rank, we see that
A + ( y * ) = [ A T ( y * ) A ( y * ) ] 1 A T ( y * ) = ( R 1 ) T 0 Q T Q R 1 0 1 ( R 1 ) T 0 Q T = R 1 1 ( ( R 1 ) T ) 1 ( R 1 ) T 0 Q T = R 1 1 0 Q T .
Since C(y*) is an orthonormal matrix, there exists an ℓ×ℓ orthogonal matrix Θ such that
$$ [\,I_{N} \;\; 0_{N\times\ell}\,]\,[\,A(y^{*}) \;\; C(y^{*})\,]^{-1} = [\,I_{N} \;\; 0_{N\times\ell}\,]\left(Q\begin{bmatrix} R_{1} & 0 \\ 0 & \Theta_{\ell\times\ell} \end{bmatrix}\right)^{-1} = [\,I_{N} \;\; 0_{N\times\ell}\,]\begin{bmatrix} R_{1}^{-1} & 0 \\ 0 & \Theta_{\ell\times\ell}^{T} \end{bmatrix}Q^{T} = \begin{bmatrix} R_{1}^{-1} & 0 \end{bmatrix}Q^{T}. $$
Hence, the conclusion follows. ☐
Computing z* via (21) involves the matrix M(y*) of (10), but the technique of Theorem 2 may again be used to replace this by the known LU factorization (18) of M̄(y*) obtained from the LU factorization of A(y*). Thus, the cost of this step is also only 2N³/3 + O(N²) flops.
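For illustration, a minimal NumPy sketch of this recovery step is given below (the name is ours; again a dense solve with M̄ stands in for reusing its LU factors):

```python
import numpy as np

def recover_z(b, C, S, Mbar, N):
    """Compute z* from (21), z* = -[I_N 0] [A C]^{-1} b, using
    M^{-1} = Mbar^{-1} W from Theorem 2, so only Mbar is solved with."""
    Wb = b - (C - S) @ (C.T @ b)       # W b, with W = I - (C - S) C^T
    x = np.linalg.solve(Mbar, Wb)      # x = Mbar^{-1} W b = M^{-1} b
    return -x[:N]                      # keep the first N components and negate
```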

3. Algorithm

We now give a detailed Algorithm 1 based on the analysis from the last section.
Algorithm 1: Given the function F(y,z) = A(y)z + b(y) with a matrix A(y) that is full rank in a neighborhood of y*, a small positive real number ϵ, a positive integer m_0, and a point y^(0) near the solution y*. For m = 0, 1, 2, ..., m_0, do steps (a)–(l):
(a)
Compute the LU factorization (13) of A(y^(m)):
$$ P(y^{(m)})\,A(y^{(m)}) = L(y^{(m)})\,U(y^{(m)}); $$
(b)
Form the matrix S(y^(m)) and the matrix M̄(y^(m)) as in (14);
(c)
Use (16) and the factorization (18) to form the matrix C(y^(m));
(d)
Compute f(y^(m)) = C^T(y^(m))b(y^(m));
(e)
Use (11) and (20) to solve for [C(y^(m))]_j, j = 1, ..., n;
(f)
Use (8) to form f'(y^(m));
(g)
Use (12) and (20) to solve for [C^T(y^(m))]_{j,k}, j, k = 1, ..., n;
(h)
Use (9) to find the (j,k)-th entry of H_i(y^(m)), i = 1, ..., ℓ;
(i)
Form the matrix (f'(y^(m)))^T f'(y^(m)) + Σ_{i=1}^{ℓ} f_i(y^(m)) H_i(y^(m)) for the left-hand side of (6);
(j)
Form the vector −(f'(y^(m)))^T f(y^(m)) for the right-hand side of (6);
(k)
Compute y^(m+1) by solving (6);
(l)
If ‖y^(m+1) − y^(m)‖ < ϵ, stop; output y* ≈ y^(m+1) and z* from (21) using (19) and (20). Otherwise, if m < m_0, replace m by m+1 and go to (a); if m = m_0, output that the method fails to obtain y* within the given m_0 and ϵ.
Note that the cost of solving (6) for y^(m+1) − y^(m) is only O(1) since the matrix involved is only n × n and we assume n = O(1). Thus, the overall cost per iteration remains 2N³/3 + O(N²), the primary cost being the LU factorization of A(y^(m)). The standard Gauss–Newton method, which omits the second derivative terms, omits steps (g) and (h), but the resulting method is no longer quadratically convergent.
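The update itself, steps (i)–(k), is just a small dense solve. A minimal NumPy sketch (a helper of ours, with f, fprime and the stack of Hessians H assumed to have been assembled as in steps (d)–(h)) is:

```python
import numpy as np

def algorithm1_step(y, f, fprime, H):
    """Steps (i)-(k) of Algorithm 1: form the n x n system (6) and
    return y^(m+1).  f is the l-vector f(y^(m)), fprime its l x n
    Jacobian, and H an array of l Hessians, each n x n."""
    G = fprime.T @ fprime + sum(f[i] * H[i] for i in range(f.size))
    rhs = -fprime.T @ f
    return y + np.linalg.solve(G, rhs)
```

The stopping test of step (l) then compares ‖y^(m+1) − y^(m)‖ with ϵ.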

4. Examples

We present three examples to illustrate our algorithm for computing the least squares solution of overdetermined separable equations. Our examples have large N and small n, ℓ, approximating the theoretical characteristic that n, ℓ = O(1) as N → ∞. Similar results hold if we use much larger N.
Example 1.
Consider the least squares solution of the overdetermined system
$$ A(y)z + b(y) \equiv \begin{bmatrix} (A_{1}(y))_{(2k+1)\times(2k+1)} + I_{2k+1} \\ (A_{2}(y))_{\ell\times(2k+1)} \end{bmatrix} z + \begin{bmatrix} (b_{1}(y))_{(2k+1)\times 1} \\ (b_{2}(y))_{\ell\times 1} \end{bmatrix} = 0, \qquad (22) $$
where
$$ A_{1}(y) = \begin{bmatrix} 2 & -1 & & & \\ -1 & 2 & -1 & & \\ & \ddots & \ddots & \ddots & \\ & & -1 & 2 & -1 \\ & & & -1 & 2 \end{bmatrix}, \qquad b_{1}(y) = 0_{(2k+1)\times 1}, $$
$$ A_{2}(y) = \begin{bmatrix} e_{k+1}^{T} \\ 0 \end{bmatrix}, \qquad b_{2}(y) = \begin{bmatrix} -1 \\ 0.02\,\alpha(y, y^{*}, \beta) \end{bmatrix}, $$
with
$$ \alpha(y, y^{*}, \beta) = (y - y^{*})^{2} - (y - y^{*})\sin(2(y - y^{*})) - 0.5\cos(2(y - y^{*})) + \beta $$
and y ∈ R (thus n = 1), and where e_{k+1} is the (k+1)-th standard unit vector in R^{2k+1}. The selected value of y* ∈ R is the value of the exact solution (y*, z*). Here, ℓ = 2.
The problem can be regarded as a constrained generalized eigenvalue problem. We choose N = 21, β = 9.5, and y* = 0.25 csc²(π/44) ≈ 49.1228712506303890. For the eigenvalues of matrices of the form of A_1(y), see [17]. Beginning with y^(0) = 48 and using Algorithm 1, we obtain the results in Table 1 for y^(m), m = 1, 2, 3, 4, which show quadratic convergence of the method.
After y^(4) is obtained, we compute z^(4) ≈ z* using (21), where z* ∈ R^21, z*_j = sin(jπ/22), j = 1, ..., 21. We find that ‖z^(4) − z*‖ = 5.2774 × 10^−15, and the 2-norm of the residual of A(y^(4))z^(4) + b(y^(4)) is ≈0.06.
Example 2.
Consider the least squares solution of the overdetermined system
$$ A(y)z + b(y) \equiv \begin{bmatrix} A_{1}(y) \\ A_{2}(y) \end{bmatrix} z + \begin{bmatrix} b_{1}(y) \\ b_{2}(y) \end{bmatrix} = 0, \qquad (23) $$
where
A 1 ( y ) = 1 1 1 1 1 N × N , b 1 ( y ) = 1 y 1 0 0 N × 1 ,
A 2 ( y ) = y 1 0 0 0 0 y 2 y 1 0 0 0 0 y 2 y 1 0 0 3 × N , b 2 ( y ) = 1 y 1 + y 1 2 1 + y 1 y 2 + y 1 y 2 2 y 1 + y 2 y 2 2 .
Here, we use y = (y_1, y_2), thus n = 2, and ℓ = 3.
We choose N = 23. The least squares solution of (23) is (y*)^T = [0, 0]. Beginning with [y^(0)]^T = [0.1, 0.1] and using Algorithm 1, we obtain the results in Table 2 for y^(m), m = 1, 2, 3, 4. These results show the expected quadratic convergence. After y^(4) is obtained, we compute z^(4) ≈ z* using (21). Here, z* ∈ R^23, z*_j = (−1)^{j+1}, j = 1, ..., 23, and the exact residual is A(y*)z* + b(y*) = e_24 + e_25 + 2e_26. We find that ‖z^(4) − z*‖ = 5.0974 × 10^−15, while the 2-norm of the residual of A(y^(4))z^(4) + b(y^(4)) is 2.4494897427831779 ≈ √6.
Our last example arises from a discretization, using the finite element method, of the one-dimensional elliptic interface problem −(βu_x)_x + 2 = 0 for x ∈ (0,1), where β = β^− for 0 < x < α and β = β^+ for α < x < 1. The boundary conditions are u(0) = 0 and u(1) = 1/β^+ + (1/β^− − 1/β^+)α², and the interface conditions are lim_{x→α^−} u(x) = lim_{x→α^+} u(x) and lim_{x→α^−} β^− u_x(x) = lim_{x→α^+} β^+ u_x(x). For more on interface problems, see [18]. In this discretization, the slopes of the basis tent functions on either side of the interface x = α are modified from 1/h and −1/h to s_1, s_2 and −s_1, −s_2 on the respective line segments. In our setting, β^− and s_2 are given, and (y_1, y_2)^T = (s_1, β^+)^T represent the nonlinear variables.
Example 3.
Consider the least squares solution of the overdetermined system
A ( y ) z + b ( y ) = 0 ,
where the (N+3)×N matrix A(y) is defined as
A ( y ) β h F k 1 β h e k 1 0 ( k 1 ) × 1 0 ( k 1 ) × ( N k 1 ) β h e k 1 T β h + β y 1 β y 1 0 1 × ( N k 1 ) 0 1 × ( k 1 ) s 2 y 2 y 2 h + s 2 y 2 y 2 h e 1 T 0 ( N k 2 ) × ( k 1 ) 0 ( N k 2 ) × 1 y 2 h e 1 y 2 h F N k 1 0 1 × ( k 1 ) a N + 1 , k ( y 1 ) 0 0 1 × ( N k 1 ) 0 1 × ( k 1 ) 0 β y 1 s 2 y 2 0 1 × ( N k 1 ) 0 1 × ( k 1 ) 0 0 0 1 × ( N k 1 ) ,
with a_{N+1,k}(y_1) = (α − x_k)y_1 + (x_{k+1} − α)s_2 − 1 and
b ( y ) 2 h E ( k 1 ) × 1 2 h + ( α x k ) y 1 ( α x k ) 2 ( x k + 1 α ) + s 2 ( x k + 1 α ) 2 2 h ( α x k ) + y 1 ( α x k ) 2 + ( x k + 1 α ) s 2 ( x k + 1 α ) 2 2 h E ( N k 1 ) × 1 ( 1 α 2 ) / h + ( α 2 / h ) y 2 / β e N k 1 0 0 ( a N + 1 , k ( y ) ) 2 + ( β 1 y 1 s 2 y 2 ) 2 + γ 3 2 ,
where h = 1 / ( N + 1 ) and x j = j h , x k < α < x k + 1 , and the square matrices F are tridiagonal of the form
$$ \begin{bmatrix} 2 & -1 & & & \\ -1 & 2 & -1 & & \\ & \ddots & \ddots & \ddots & \\ & & -1 & 2 & -1 \\ & & & -1 & 2 \end{bmatrix}, $$
the vectors E have all entries one, e_j are the j-th standard unit vectors, and
$$ \gamma_{1} = \frac{1 - (x_{k+1} - \alpha)s_{2}}{\alpha - x_{k}}, \qquad \gamma_{2} = \frac{[\,1 - (x_{k+1} - \alpha)s_{2}\,]\,\beta^{-}}{(\alpha - x_{k})\,s_{2}}. $$
The exact least squares solution is y * = ( γ 1 , γ 2 ) T .
We use N = 39, k = 26, ℓ = 3, n = 2, β^− = 1, s_2 = 8, α = 2/3 and γ_3 = 4.775π. Using Algorithm 1 with y^(0) = (56.5, 7.2)^T, we obtain the results in Table 3 for y^(m), m = 1, 2, 3, 4, which show quadratic convergence of the method. After y^(4) is obtained, we obtain z^(4) using (21). Here, z*_j = (jh)²/β^−, j = 1, 2, ..., k and z*_j = (jh)²/γ_2 + (1/β^− − 1/γ_2)α², j = k+1, ..., N, and the exact residual is A(y*)z* + b(y*) = γ_3² e_{N+3}. We find that ‖z^(4) − z*‖ = 1.5081 × 10^−15 and the 2-norm of the residual of A(y^(4))z^(4) + b(y^(4)) is 225.03314884758808 ≈ (4.775π)².

5. Conclusions

We have shown how LU factorization can be used to solve separable nonlinear least squares problems with nonzero residuals. Our Gauss–Newton type method retains the second derivative terms and is quadratically convergent. The technique is most efficient for situations in which there are only a few nonlinear terms and variables. In that context, we have shown how the second derivative terms can be computed relatively cheaply. Our numerical examples demonstrate the effectiveness of the technique in some areas of application.

Acknowledgments

Publication costs were provided by Western Washington University, Bellingham, WA 98225, USA.

Author Contributions

Yunqiu Shen conceived the key ideas and drafted the paper; Tjalling Ypma refined the details and the presentation.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Golub, G.H.; Pereyra, V. Separable nonlinear least squares: the variable projection method and its applications. Topic Review. Inverse Probl. 2003, 19, R1–R26.
2. Mullen, K.M.; van Stokkum, I.M. The variable projection algorithm in time-resolved spectroscopy, microscopy and mass spectrometry applications. Numer. Algorithms 2009, 51, 319–340.
3. Shen, Y.-Q.; Ypma, T.J. Solving separable nonlinear equations using LU factorization. ISRN Math. Anal. 2013, 2013, 258072.
4. Golub, G.H.; Pereyra, V. The differentiation of pseudo-inverses and nonlinear least squares problems whose variables separate. SIAM J. Numer. Anal. 1973, 10, 413–432.
5. Shen, Y.-Q.; Ypma, T.J. Solving nonlinear systems of equations with only one nonlinear variable. J. Comput. Appl. Math. 1990, 30, 235–246.
6. Ypma, T.J.; Shen, Y.-Q. Solving N+m nonlinear equations with only m nonlinear variables. Computing 1990, 44, 259–271.
7. Lukeman, G.G. Separable Overdetermined Nonlinear Systems: An Application of the Shen-Ypma Algorithm; VDM Verlag: Saarbrucken, Germany, 2009.
8. Shen, Y.; Ypma, T.J. Solving Separable Least Squares Problems using QR factorization. J. Comput. Appl. Math., submitted.
9. Dennis, J.E., Jr.; Schnabel, R.B. Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Corrected Reprint of the 1983 Original; SIAM: Philadelphia, PA, USA, 1996.
10. Deuflhard, P.; Hohmann, A. Numerical Analysis in Modern Scientific Computing, 2nd ed.; Springer: New York, NY, USA, 2003.
11. Ortega, J.M.; Rheinboldt, W.C. Iterative Solution of Nonlinear Equations in Several Variables; Academic Press: New York, NY, USA, 1970.
12. Golub, G.H.; Van Loan, C.F. Matrix Computations, 3rd ed.; Johns Hopkins: Baltimore, MD, USA, 1996.
13. Shen, Y.-Q.; Ypma, T.J. Newton’s method for singular nonlinear equations using approximate left and right nullspaces of the Jacobian. Appl. Numer. Math. 2005, 54, 256–265.
14. Shen, Y.-Q.; Ypma, T.J. Solving rank-deficient separable nonlinear equations. Appl. Numer. Math. 2007, 57, 609–615.
15. Shen, Y.-Q.; Ypma, T.J. Numerical bifurcation of separable parameterized equations. Elect. Trans. Numer. Anal. 2009, 34, 31–43.
16. Shen, Y.-Q.; Ypma, T.J. Rank deficiency and bifurcation into affine subspaces for separable parametrized equations. Math. Comp. 2016, 85, 271–293.
17. Smith, G.D. Numerical Solution of Partial Differential Equations: Finite Difference Methods, 3rd ed.; Oxford University Press: New York, NY, USA, 1985.
18. Li, Z.; Ito, K. The Immersed Interface Method; SIAM: Philadelphia, PA, USA, 2006.
Table 1. Gauss–Newton type iterations for Example 1.

m    y^(m)                    ‖y^(m) − y*‖
0    4.80000000000000e+001    1.1229e+000
1    4.90312546237768e+001    9.1617e−002
2    4.91243552951534e+001    1.4840e−003
3    4.91228716369764e+001    3.8635e−007
4    4.91228712506304e+001    2.8422e−014
Table 2. Gauss–Newton type iterations for Example 2.

m    y_1^(m) − y_1*    y_2^(m) − y_2*    ‖y^(m) − y*‖
0     1.0000e−001       1.0000e−001      1.4142e−001
1    −3.2975e−002      −1.7299e−002      3.7238e−002
2     8.5227e−004       3.5533e−004      9.2338e−004
3    −1.5227e−008      −6.1721e−009      1.6890e−008
4    −4.3532e−016      −1.8608e−016      4.7342e−016
Table 3. Gauss–Newton type iterations for Example 3.

m    y_1^(m) − y_1*    y_2^(m) − y_2*    ‖y^(m) − y*‖
0     5.0000e−001       2.0000e−001      5.3852e−001
1     1.4160e+000       1.7845e−001      1.4272e+000
2     2.0255e−002       2.5279e−003      2.0412e−002
3    −1.3048e−006      −1.6292e−007      1.3149e−006
4     1.4921e−013       1.8652e−014      1.5038e−013
