## **About the Editors**

**Lorentz Jäntschi** was born in Făgăraș, Romania, in 1973. In 1991, he moved to Cluj-Napoca, Cluj, where he completed his studies. In 1995, he was awarded his B.Sc. and M.Sc. in Informatics (under the supervision of Prof. Militon FRENȚIU); in 1997, his B.Sc. and M.Sc. in Physics and Chemistry (under the supervision of Prof. Theodor HODIȘAN); in 2000, his Ph.D. in Chemistry (under the supervision of Prof. Mircea V. DIUDEA); in 2002, his M.Sc. in Agriculture (under the supervision of Prof. Iustin GHIZDAVU and Prof. Mircea V. DIUDEA); and in 2010, his Ph.D. in Horticulture (under the supervision of Prof. Radu E. SESTRAȘ). In 2013, he conducted a postdoc in Horticulture (with Prof. Radu E. SESTRAȘ) and that same year (2013), he became a Full Professor of Chemistry at the Technical University of Cluj-Napoca and an Associate at Babeș-Bolyai University, where he advises on Ph.D. studies in Chemistry. He currently holds both of these positions. Throughout his career, he has conducted his research and education activities under the auspices of various institutions: the G. Barițiu (1995–1999) and Bălcescu (1999–2001) National Colleges, the Iuliu Hațieganu University of Medicine and Pharmacy (2007–2012), Oradea University (2013–2015), and the University of Agricultural Sciences and Veterinary Medicine of Cluj-Napoca (2011–2016). He serves as Editor for the journals Notulae Scientia Biologicae, Notulae Botanicae Horti Agrobotanici Cluj-Napoca, Open Agriculture, and Symmetry. He served as Editor-in-Chief of the Leonardo Journal of Sciences and the Leonardo Electronic Journal of Practices and Technologies (2002–2018) and as Guest Editor (2019–2020) of Mathematics.

**Daniela Roșca** was born in Cluj-Napoca, Romania, in 1972. In 1995, she was awarded her B.Sc. in Mathematics, and in 1996, her M.Sc. in Mathematics (Numerical and Statistical Calculus). In 2004, she received her Ph.D. in Mathematics with a thesis entitled "Approximation with Wavelets" (defended 9 January 2004) and conducted a postdoc in Computing in 2013 (with Prof. Sergiu NEDEVSCHI). That same year (2013), she became a Full Professor of Mathematics at the Technical University of Cluj-Napoca, where she advises on Ph.D. studies in Mathematics. She was an Invited Professor at the Université Catholique de Louvain, Louvain-la-Neuve, Belgium, on numerous occasions (13–27 January 2011, 10–24 January 2013, and twice for two weeks in each of the academic years 2006–2007, 2007–2008, 2008–2009, and 2009–2010), delivering courses and seminars for the third cycle (doctoral school) on wavelet analysis on the sphere and other manifolds.

## **Preface to "Numerical Methods"**

The Special Issue "Numerical Methods" (2020) was open for submissions in 2019–2020 and welcomed papers from broad interdisciplinary areas, since 'numerical methods' are a specific form of mathematics that involves creating and using algorithms to map out the mathematical core of a practical problem. Numerical methods naturally find application in all fields of engineering, physical sciences, life sciences, social sciences, medicine, business, and even the arts. The common uses of numerical methods include approximation, simulation, and estimation, and there is almost no scientific field in which numerical methods do not find a use.

Some subjects included in 'numerical methods' are IEEE arithmetic, root finding, systems of equations, least squares estimation, maximum likelihood estimation, interpolation, numerical integration, and differentiation—the list goes on and on. Mathematical subject classification for numerical methods includes topics in conformal mapping theory in connection with discrete potential theory and computational methods for stochastic equations, but most of the subjects fall within approximation methods and the numerical treatment of dynamical systems, numerical methods, and numerical analysis. Also included are topics in numerical methods for deformable solids, basic methods in fluid mechanics, basic methods for optics and electromagnetic theory, basic methods for classical thermodynamics and heat transfer, equilibrium statistical mechanics, time-dependent statistical mechanics, and, last but not least, mathematical finance. In short, the topics of interest deal mainly with numerical methods for approximation, simulation, and estimation. Manuscript submissions closed on 30 June 2020.

Considering the importance of numerical methods, two representative examples should be given. First, the Jenkins–Traub method (published as "Algorithm 419: Zeros of a Complex Polynomial" and "Algorithm 493: Zeros of a Real Polynomial"), which practically took the use of computers in numerical problems to another level. Second, the Monte Carlo method (published as "The Monte Carlo Method"), which gave birth to the broad class of computational algorithms found today that rely on repeated random sampling to obtain numerical results. Today, the "numerical method" topic is much more diversified than it was 50 years ago, especially because of technological progress, and this series of collected papers is proof of this fact.

Results communicated here include topics ranging from statistics (Detecting Extreme Values with Order Statistics in Samples from Continuous Distributions, https://www.mdpi.com/2227-7390/8/2/216) and statistical software packages (dCATCH—A Numerical Package for d-Variate near G-Optimal Tchakaloff Regression via Fast NNLS, https://www.mdpi.com/2227-7390/8/7/1122) to new approaches for numerical solutions (Exact Solutions to the Maxmin Problem max‖Ax‖ Subject to ‖Bx‖ ≤ 1, https://www.mdpi.com/2227-7390/8/1/85; On q-Quasi-Newton's Method for Unconstrained Multiobjective Optimization Problems, https://www.mdpi.com/2227-7390/8/4/616; Convergence Analysis and Complex Geometry of an Efficient Derivative-Free Iterative Method, https://www.mdpi.com/2227-7390/7/10/919; On Derivative Free Multiple-Root Finders with Optimal Fourth Order Convergence, https://www.mdpi.com/2227-7390/8/7/1091; Finite Integration Method with Shifted Chebyshev Polynomials for Solving Time-Fractional Burgers' Equations, https://www.mdpi.com/2227-7390/7/12/1201) to the use of wavelets (Orthonormal Wavelet Bases on the 3D Ball Via Volume Preserving Map from the Regular Octahedron, https://www.mdpi.com/2227-7390/8/6/994) and methods for visualization (A Simple Method for Network Visualization, https://www.mdpi.com/2227-7390/8/6/1020).

**Lorentz Jäntschi, Daniela Roșca**

*Editors*

### *Article* **Detecting Extreme Values with Order Statistics in Samples from Continuous Distributions**

### **Lorentz Jäntschi 1,2**


Received: 17 December 2019; Accepted: 4 February 2020; Published: 8 February 2020

**Abstract:** In the subject of statistics for engineering, physics, computer science, chemistry, and earth sciences, one of the sampling challenges is the accuracy, or, in other words, how representative the sample is of the population from which it was drawn. A series of statistics were developed to measure the departure between the population (theoretical) and the sample (observed) distributions. Another connected issue is the presence of extreme values—possible observations that may have been wrongly collected—which do not belong to the population selected for study. By subjecting those two issues to study, we hereby propose a new statistic for assessing the quality of sampling intended to be used for any continuous distribution. Depending on the sample size, the proposed statistic is operational for known distributions (with a known probability density function) and provides the risk of being in error while assuming that a certain sample has been drawn from a population. A strategy for sample analysis, by analyzing the information about quality of the sampling provided by the order statistics in use, is proposed. A case study was conducted assessing the quality of sampling for ten cases, the latter being used to provide a pattern analysis of the statistics.

**Keywords:** probability computing; Monte Carlo simulation; order statistics; extreme values; outliers

**MSC:** 62G30; 62G32; 62H10; 65C60

### **1. Introduction**

Under the assumption that a sample of size *n* was drawn from a certain population (*x*1, ..., *xn* ∈ *X*) with a known distribution (with known probability density function, PDF) but with unknown parameters (*m* in number, {*π*1, ..., *πm*}), there are alternatives available in order to assess the quality of sampling.

One category of alternatives sees the sample as a whole—and in this case, a series of statistics was developed to measure the agreement between a theoretical (in the population) and observed (of the sample) distribution. This approach is actually a reverse engineering of the sampling distribution, providing a likelihood of observing the sample as drawn from the population. To do this for any continuous distribution, the problem is translated into the probability space by the use of a cumulative distribution function (CDF).

Formally, if PDF(*x*;(*πj*)1≤*j*≤*m*) takes values on a domain *D*, then CDF is defined by Equation (1) and {*p*1, ..., *pn*} defined by Equation (2) is the series of cumulative probabilities associated with the drawings from the sample.

$$\text{CDF}(\mathbf{x}; (\pi\_{\mathbf{j}})\_{1 \le j \le m}) = \int\_{\inf(D)}^{\mathbf{x}} \text{PDF}(t; (\pi\_{\mathbf{j}})\_{1 \le j \le m}) dt \tag{1}$$

$$\{p\_1, \ldots, p\_n\} = \text{CDF}(\{\mathbf{x}\_1, \ldots, \mathbf{x}\_n\}; (\pi\_j)\_{1 \le j \le m}).\tag{2}$$

For a continuous distribution, CDF is a bijective (and thus invertible; let InvCDF be its inverse, Equation (3)) function.

$$\mathbf{x} = \text{InvCDF}(p; (\pi\_j)\_{1 \le j \le m}).\tag{3}$$

The series of cumulative probabilities {*p*1, ..., *pn*}, independently of the distribution (PDF) of the population (*X*) subjected to the analysis, have a known domain (0 ≤ *pi* ≤ 1 for all 1 ≤ *i* ≤ *n*) belonging to the continuous uniform distribution (*p*1, ..., *pn* ∈ U(0, 1)). In the sorted cumulative probabilities ({*q*1, ..., *qn*} defined by Equation (4)), sorting defines an order relationship (0 ≤ *q*1 ≤ ... ≤ *qn* ≤ 1).

$$\{q\_1, \ldots, q\_n\} = \text{SORT}(\{p\_1, \ldots, p\_n\}; \text{\textquotedbl{}ascending\textquotedbl{}}).\tag{4}$$

If the order of drawing in sample ({*x*1, ..., *xn*}) and of appearance in the series of associated CDF ({*p*1, ..., *pn*}) is not relevant (e.g., the elements in those sets are indistinguishable), the order relationship defined by Equation (4) makes them ({*q*1, ..., *qn*}) distinguishable (the order being relevant).
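As a concrete illustration, the mapping of Equations (2) and (4) can be sketched in Python (an illustrative sketch, not from the paper; the standard normal CDF is expressed through the error function, and the sample values are arbitrary):

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    # CDF of the Gauss distribution (Equation (1) for the normal case),
    # expressed through the error function
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Equation (2): cumulative probabilities associated with the drawings
sample = [0.3, -1.2, 2.4, 0.1, -0.5]
p = [normal_cdf(x) for x in sample]

# Equation (4): ascending sort defines the order relationship q_1 <= ... <= q_n
q = sorted(p)
```

Whatever the population distribution, the resulting `q` values lie in [0, 1] and are ascending, as stated above.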

A series of order statistics (*OS*) were developed (to operate on ordered cumulative probabilities {*q*1, ..., *qn*}) and they may be used to assess the quality of sampling for the sample taken as a whole (Equations (5)–(10) below): Cramér–von Mises (*CMStatistic* in Equation (5), see [1,2]), Watson U2 (*WUStatistic* in Equation (6), see [3]), Kolmogorov–Smirnov (*KSStatistic* in Equation (7), see [4–6]), Kuiper V (*KVStatistic* in Equation (8), see [7]), Anderson–Darling (*ADStatistic* in Equation (9), see [8,9]), and H1 (*H*1*Statistic* in Equation (10), see [10]).

$$\text{CM}\_{\text{Statistic}} = \frac{1}{12n} + \sum\_{i=1}^{n} \left( \frac{2i - 1}{2n} - q\_i \right)^2 \tag{5}$$

$$\text{WU}\_{\text{Statistic}} = \text{CM}\_{\text{Statistic}} + \left(\frac{1}{2} - \frac{1}{n}\sum\_{i=1}^{n} q\_i\right)^2 \tag{6}$$

$$\text{KS}\_{\text{Statistic}} = \sqrt{n} \cdot \max\_{1 \le i \le n} \left( q\_i - \frac{i-1}{n}, \frac{i}{n} - q\_i \right) \tag{7}$$

$$\text{KV}\_{\text{Statistic}} = \sqrt{n} \cdot \left( \max\_{1 \le i \le n} \left( q\_i - \frac{i-1}{n} \right) + \max\_{1 \le i \le n} \left( \frac{i}{n} - q\_i \right) \right) \tag{8}$$

$$\text{AD}\_{\text{Statistic}} = -n - \frac{1}{n} \sum\_{i=1}^{n} (2i - 1) \ln \left( q\_i (1 - q\_{n+1-i}) \right) \tag{9}$$

$$\text{H1}\_{\text{Statistic}} = -\sum\_{i=1}^{n} q\_i \ln(q\_i) - \sum\_{i=1}^{n} (1 - q\_i) \ln(1 - q\_i) \,. \tag{10}$$

Recent uses of those statistics include [11] (CM), [12] (WU), [13] (KS), [14] (AD), and [15] (H1). Any of the above given test statistics are to be used, providing a risk of being in error for the assumption (or a likelihood to observe) that the sample ({*x*1, ..., *xn*}) was drawn from the population (*X*). Usually this risk of being in error is obtained from Monte Carlo simulations (see [16]) applied on the statistic in question and, in some of the fortunate cases, there is also a closed-form expression (or at least, an analytic expression) for CDF of the statistic available as well. In the less fortunate cases, only 'critical values' (values of the statistic for certain risks of being in error) for the statistic are available.
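The whole-sample statistics above, Equations (5)–(10), translate directly into code; the following is a minimal Python sketch (function names are illustrative; `q` is assumed already sorted ascending, indices follow the 1-based convention of the formulas, and the Anderson–Darling line uses the standard pairing *qi*(1 − *q*<sub>*n*+1−*i*</sub>)):

```python
import math

def cm_statistic(q):
    # Cramer-von Mises, Equation (5)
    n = len(q)
    return 1 / (12 * n) + sum(((2 * i - 1) / (2 * n) - qi) ** 2
                              for i, qi in enumerate(q, start=1))

def wu_statistic(q):
    # Watson U2, Equation (6)
    n = len(q)
    return cm_statistic(q) + (0.5 - sum(q) / n) ** 2

def ks_statistic(q):
    # Kolmogorov-Smirnov, Equation (7)
    n = len(q)
    return math.sqrt(n) * max(max(qi - (i - 1) / n, i / n - qi)
                              for i, qi in enumerate(q, start=1))

def kv_statistic(q):
    # Kuiper V, Equation (8)
    n = len(q)
    return math.sqrt(n) * (max(qi - (i - 1) / n for i, qi in enumerate(q, start=1))
                           + max(i / n - qi for i, qi in enumerate(q, start=1)))

def ad_statistic(q):
    # Anderson-Darling, Equation (9); q[n - i] is q_{n+1-i} in 1-based notation
    n = len(q)
    return -n - sum((2 * i - 1) * math.log(q[i - 1] * (1 - q[n - i]))
                    for i in range(1, n + 1)) / n

def h1_statistic(q):
    # H1, Equation (10)
    return (-sum(qi * math.log(qi) for qi in q)
            - sum((1 - qi) * math.log(1 - qi) for qi in q))
```

For the perfectly spaced series *qi* = (2*i* − 1)/(2*n*), the Cramér–von Mises statistic attains its minimum 1/(12*n*), which gives a quick sanity check of the transcription.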

The other alternative in assessing the quality of sampling refers to an individual observation in the sample, specifically the less likely one (having associated *q*1 or *qn* with the notations given in Equation (4)). The test statistic is *g*1 [15], given in Equation (11).

$$g1\_{\text{Statistic}} = \max\_{1 \le i \le n} |p\_i - 0.5|. \tag{11}$$

It should be noted that 'taken as a whole' refers to the way in which the information contained in the sample is processed in order to provide the outcome. In this scenario ('as a whole'), the entirety of the information contained in the sample is used. As it can be observed in Equations (5)–(10), each formula uses all values of sorted probabilities ({*q*1, ..., *qn*}) associated with the values ({*x*1, ..., *xn*}) contained in the sample, while, as it can be observed in Equation (11), only the extreme value (max({*q*1, ..., *qn*}) or min({*q*1, ..., *qn*})) is used; therefore, one may say that only an individual observation (the extremum portion of the sample) yields the statistical outcome.

The statistic defined by Equation (11) no longer requires the cumulative probabilities to be sorted; one only needs to find the probability most departed from 0.5—see Equation (11)—or, alternatively, to find the smallest (the one having associated *q*1 defined by Equation (4)) and the largest (the one having associated *qn* defined by Equation (4)), and to find which deviates from 0.5 the most (*g*1*Statistic* = max{|*q*1 − 0.5|, |*qn* − 0.5|}).

We hereby propose a hybrid alternative, a test statistic (let us call it *TS*) intended to be used in assessing the quality of sampling for the sample, which is mainly based on the less likely observation in the sample, Equation (12).

$$\text{TS}\_{\text{Statistic}} = \frac{\max\_{1 \le i \le n} |p\_i - 0.5|}{\sum\_{1 \le i \le n} |p\_i - 0.5|}. \tag{12}$$
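The two observation-focused statistics, Equations (11) and (12), can be sketched as follows (an illustrative Python transcription; `p` is the unsorted series of cumulative probabilities):

```python
def g1_statistic(p):
    # Equation (11): the probability most departed from 0.5
    return max(abs(pi - 0.5) for pi in p)

def ts_statistic(p):
    # Equation (12): largest deviation from 0.5 relative to the sum of deviations
    dev = [abs(pi - 0.5) for pi in p]
    return max(dev) / sum(dev)
```

By construction, *TS* lies between 1/*n* (all deviations equal) and 1 (a single nonzero deviation), which is consistent with the domain discussed in Section 3.1.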

The aim of this paper is to characterize the newly proposed test statistic (*TS*) and to analyze its peculiarities. Unlike the test statistics assessing the quality of sampling for the sample taken as a whole (Equations (5)–(10)), and like the test statistic assessing the quality of sampling based on the less likely observation of the sample, Equation (11), the proposed statistic, Equation (12), does not require that the values or their associated probabilities ({*p*1, ..., *pn*}) be sorted (as {*q*1, ..., *qn*}); since (like the *g*1 statistic) it uses the extreme value from the sample, one can still consider it a sort of *OS* [17]. When dealing with extreme values, the newly proposed statistic, Equation (12), is a much more natural construction of a statistic than the ones previously reported in the literature, Equations (5)–(10), since its value is fed mainly from the extreme value in the sample (see the *max* function in Equation (12)). Later, a pattern analysis will be given, revealing that it belongs to a distinct group of statistics that are more sensitive to the presence of extreme values. A strategy of using the pool of *OS* (Equations (5)–(12)) including *TS* in the context of dealing with extreme values is given, and the probability patterns provided by the statistics are analyzed.

The rest of the paper is organized as follows. The general strategy of sampling a CDF from an *OS* and the method of combining probabilities from independent tests are given in Section 2, while the analytical formula for the proposed statistic is given in Section 3.1, and computation issues and proof of fact results are given in Section 3.2. Its approximation with other functions is given in Section 3.3. Combining its calculated risk of being in error with the risks from other statistics is given in Section 3.4, while discussion of the results is continued with a cluster analysis in Section 3.5, and in connection with other approaches in Section 3.6. The paper also includes an appendix of the source codes for two programs and accompanying Supplementary Material.

### **2. Material and Method**

### *2.1. Addressing the Computation of CDF for OS(s)*

A method of constructing the observed distribution of the *g*1 statistic, Equation (11), has already been reported elsewhere [15]. A method of constructing the observed distribution of the Anderson–Darling (*AD*) statistic, Equation (9), has already been reported elsewhere [17]; the method for constructing the observed distribution of any *OS* via Monte Carlo (MC) simulation, Equations (5)–(12), is described here and it is used for *TS*, Equation (12).

Let us take a sample size of *n*. The MC simulation needs to generate a large number of samples (let the number of samples be *m*) drawn from the uniform continuous distribution ({*p*1, ..., *pn*} in Equation (2)). To ensure a good-quality MC simulation, simply using a random number generator is not enough. The next step (Equations (10)–(12) do not require this) is to sort the probabilities to arrive at {*q*1, ..., *qn*} from Equation (4) and to calculate an *OS* (an order statistic) associated with each sample. Finally, this series of sample statistics ({*OS*1, ..., *OSw*} in Figure 1) must be sorted in order to arrive at the emulated population distribution. Then, a series of evenly spaced points (from 0 to 1000 in Figure 1) corresponding to fixed probabilities (from InvCDF0 = 0 to InvCDF1000 = 1 in Figure 1) is used to save the (*OS* statistic, observed CDF probability) pairs (Figure 1).

**Figure 1.** The four steps to arrive at the observed CDF of *OS*.

The main idea is how to generate a good pool of random samples from a uniform U(0, 1) distribution. Imagine a (pseudo) random number generator, *Rand*, is available, which generates numbers from a uniform U(0, 1) distribution, from a [0, 1) interval; such an engine is available in many types of software and in most cases, it is based on Mersenne Twister [18]. What if we have to extract a sample of size *n* = 2? If we split in two the [0, 1) interval (then into [0, 0.5) and [0.5, 1)) then for two values (let us say *v*1 and *v*2), the contingency of the cases is illustrated in Figure 2.


**Figure 2.** Contingency of two consecutive drawings from [0, 1).

According to the design given in Figure 2, for 4 (=2<sup>2</sup>) drawings of two numbers (*v*1 and *v*2) from the [0, 1) interval, a better uniform extraction (*v*1*v*2, 'distinguishable') is ("00") to extract first (*v*1) from [0, 0.5) and second (*v*2) from [0, 0.5), then ("01") to extract first (*v*1) from [0, 0.5) and second (*v*2) from [0.5, 1), then ("10") to extract first (*v*1) from [0.5, 1) and second (*v*2) from [0, 0.5), and finally ("11") to extract first (*v*1) from [0.5, 1) and second (*v*2) from [0.5, 1).

An even better alternative is to do only 3 (=2 + 1) drawings (*v*1 + *v*2, 'indistinguishable'), which is ("0") to extract both from [0, 0.5), then ("1") to extract one (let us say the first) from [0, 0.5) and the other (let us say the second) from [0.5, 1), and finally, ("2") to extract both from [0.5, 1), and to keep a record of their occurrences (1, 2, 1) as well. For *n* numbers (Figure 3), from 0 to *n* of them can be from [0, 0.5), with their occurrences being accounted for.


**Figure 3.** Contingency of *n* consecutive drawings from [0, 1).

According to the formula given in Figure 3, for *n* numbers to be drawn from [0, 1), a multiple of *n* + 1 drawings must be made in order to maintain the uniformity of distribution (*w* from Figure 1 becomes *n* + 1). In each of those drawings, we actually only pick one of *n* (random) numbers (from the [0, 1) interval) as independent. In the (*j* + 1)-th drawing, the first *j* of them are to be from [0, 0.5), while the rest are to be from [0.5, 1). The algorithm implementing this strategy is given as Algorithm 1.

Algorithm 1 is ready to be used to calculate any *OS* (including the *TS* first reported here). For each sample drawn from the U(0, 1) distribution (the array *v* in Algorithm 1), its output (the array *u* and its associated frequencies *n*!/*j*!/(*n* − *j*)!) can be modified to produce less information and fewer operations (Algorithm 2). Calculation of the *OS* (the *OSj* output value in Algorithm 2) can be made to any precision, but for storing the result, a *single* data type (4 bytes) is enough (providing seven significant digits as the precision of the observed CDF of the *OS*). Along with a *byte* data type (the *j* output value in Algorithm 2) to store each sampled *OS*, 5 bytes of memory are required, and the calculation of *n*!/(*n* − *j*)!/*j*! can be made at a later time, or can be tabulated in a separate array, ready to be used when needed.

**Algorithm 1:** Balancing the drawings from uniform U(0, 1) distribution.

```
Input data: n (2 ≤ n, integer)
Steps:
  For i from 1 to n do v[i] ← Rand
  For j from 0 to n do
    For i from 1 to j do u[i] ← v[i]/2
    For i from j+1 to n do u[i] ← v[i]/2+1/2
    occ ← n!/j!/(n-j)!
    Output u[1], ..., u[n], occ
  EndFor
Output data: (n+1) samples (u) of sample size (n) and their occurrences (occ)
```
**Algorithm 2:** Sampling an order statistic (*OS*).

```
Input data: n (2 ≤ n, integer)
Steps:
  For i from 1 to n do v[i] ← Rand
  For j from 0 to n do
    For i from 1 to j do u[i] ← v[i]/2
    For i from j+1 to n do u[i] ← v[i]/2+1/2
    OSj ← any Equations (5)–(12) with p1←u[1], ..., pn ←u[n]
    Output OSj, j
  EndFor
Output data: (n+1) OS and their occurrences
```
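The balanced sampling of Algorithm 2 can be sketched in Python as follows (an illustrative transcription; `os_func` stands for any of Equations (5)–(12) applied to the generated probabilities, and `rng` defaults to the standard library generator):

```python
import math
import random

def sample_os(n, os_func, rng=random.random):
    # One run of Algorithm 2: n random numbers yield n + 1 balanced samples;
    # the (j + 1)-th sample has its first j values in [0, 0.5) and the rest
    # in [0.5, 1), with occurrence n!/j!/(n-j)!
    v = [rng() for _ in range(n)]
    results = []
    for j in range(n + 1):
        u = [v[i] / 2 if i < j else v[i] / 2 + 0.5 for i in range(n)]
        occ = math.comb(n, j)
        results.append((os_func(u), occ))
    return results
```

Running this *r*0 times yields (*n* + 1) · *r*0 statistic values whose occurrences sum to *r*0 · 2<sup>*n*</sup>, matching the accounting in Figure 1.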
As given in Algorithm 2, each use of the algorithm sampling *OS* will produce two associated arrays: *OSj* (*single* data type) and *j* (*byte* data type); each of them with *n* + 1 values. Running the algorithm *r*0 times will require 5 ·(*n* + 1)·*r*0 bytes for storage of the results and will produce (*n* + 1)·*r*0 *OS*s, ready to be sorted (see Figure 1). With a large amount of internal memory (such as 64 GB when running on a 16/24 cores 64 bit computers), a single process can dynamically address very large arrays and thus can provide a good quality, sampled *OS*. To do this, some implementation tricks are needed (see Table 1).

**Table 1.** Software implementation peculiarities of MC simulation.


Depending on the value of the sample size (*n*), the number of repetitions (*r*2) for sampling of *OS*, using Algorithm 2, from *r*0 ← *mem*/(*n* + 1) runs, is *r*2 ← *r*0 · (*n* + 1), while the length (*sts*) of the variable (CDF*st*) storing the dynamic array (*dyst*) from Table 1 is *sts* ← 1 + *r*2/*buf*. After sorting the *OS*s (of *sttype*, see Table 1; total number of *r*2), another trick is to extract a sample series at evenly spaced probabilities from it (from InvCDF0 to InvCDF1000 in Figure 1). For each pair in the sample (*lvli* varying from 0 to *lvl* = 1000 in Table 1), a value of the *OS* is extracted from the CDF*st* array (which contains ordered *OS* values and frequencies indexed from 0 to *r*2 − 1), while the MC-simulated population size is *r*0 · 2*<sup>n</sup>*. A program implementing this strategy is available upon request (*project*\_*OS*.*pas*).

The associated objective (with any statistic) is to obtain its CDF and thus, by evaluating the CDF for the statistical value obtained from the sample, Equations (5)–(12), to associate a likelihood for the sampling. Please note that only in the lucky cases is it possible to do this; in the general case, only critical values (values corresponding to certain risks of being in error) or approximation formulas are available (see for instance [1–3,5,7–9]). When a closed form or an approximation formula is assessed against the observed values from an MC simulation (such as the one given in Table 1), a measure of the departure such as the standard error (*SE*) indicates the degree of agreement between the two. If a series of evenly spaced points (*lvl* + 1 points indexed from 0 to *lvl* in Table 1) is used, then a standard error of the agreement for inner points of it (from 1 to *lvl* − 1, see Equation (13)) is safe to be computed (where *pi* stands for the observed probability while *p*ˆ*<sup>i</sup>* for the estimated one).

$$SE = \sqrt{\frac{SS}{lvl - 1}}, \quad SS = \sum\_{i=1}^{lvl-1} (p\_i - \hat{p}\_i)^2. \tag{13}$$

In the case of *lvl* + 1 evenly spaced points in the interval [0, 1] in the context of an MC simulation (such as the one given in Table 1) providing the values of the OS statistic in those points (see Figure 1), the observed cumulative probability should be (and is) taken as *pi* = *i*/*lvl*, while *p*ˆ*<sup>i</sup>* is (and was) taken from any closed form or approximation formula for the CDF of the statistic (labeled *p*ˆ) as *p*ˆ*<sup>i</sup>* = *p*ˆ(InvCDF*i*), where InvCDF*<sup>i</sup>* are the values collected by the strategy given in Figure 1 operating on the values provided by Algorithm 2. Before giving a closed form for the CDF of *TS* (Equation (12)) and proposing approximation formulas, other theoretical considerations are needed.
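The agreement measure of Equation (13) is straightforward to compute; a minimal sketch (illustrative names; `p_obs` and `p_est` hold the *lvl* + 1 evenly spaced observed and estimated probabilities):

```python
import math

def standard_error(p_obs, p_est, lvl):
    # Equation (13): SE computed over the inner points i = 1 .. lvl - 1
    ss = sum((p_obs[i] - p_est[i]) ** 2 for i in range(1, lvl))
    return math.sqrt(ss / (lvl - 1))
```

The endpoints (i = 0 and i = *lvl*) are excluded, as the text notes that only the inner points are safe to use.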

### *2.2. Further Theoretical Considerations Required for the Study*

When the PDF is known, it does not necessarily follow that its statistical parameters ((*πj*)1≤*j*≤*<sup>m</sup>* in Equations (1)–(3)) are known. Here, a complex problem can be (re)opened: estimating the parameters of the population distribution either from the sample (in which case the estimation uses the same information as the assessment of the quality of sampling) or from something else (in which case it does not). This matter, however, is outside the scope of this paper.

The estimation of the distribution parameters (*πj*)1≤*j*≤*<sup>m</sup>* for the data is, generally, biased by the presence of extreme values in the data; thus, identifying the outliers along with estimating the parameters of the distribution is a difficult task operating on two statistical hypotheses. Under this state of facts, the use of a hybrid statistic, such as the one proposed in Equation (12), seems justified. However, since the practical use of the proposed statistic almost always requires estimation of the population parameters (as in the examples given below), a certain perspective on estimation methods is required.

Assuming that the parameters are obtained using the maximum likelihood estimation method (MLE, Equation (14); see [19]), one could say that the uncertainty accompanying this estimation is propagated to the process of detecting the outliers. With a series of *τ* statistics (*τ* = 6 for Equations (5)–(10) and *τ* = 8 for Equations (5)–(12)) independently assessing the risk of being in error (let *α*1, ..., *ατ* be those risks) under the assumption that the sample was drawn from the population, the unlikeliness of the event (*αFCS* in Equation (15) below) can be ascertained safely by using a modified form of Fisher's "combining probability from independent tests" method (*FCS*, see [10,20,21]; Equation (15)), where CDF*χ*<sup>2</sup> (*x*; *τ*) is the CDF of the *χ*<sup>2</sup> distribution with *τ* degrees of freedom.

$$\max\left(\prod\_{1\le i\le n} \text{PDF}(\mathbf{x}\_i; (\pi\_j)\_{1\le j\le m})\right) \to \min\left(-\sum\_{1\le i\le n} \ln\left(\text{PDF}(\mathbf{x}\_i; (\pi\_j)\_{1\le j\le m})\right)\right) \tag{14}$$

*Mathematics* **2020**, *8*, 216

$$\text{FCS} = -\ln\left(\prod\_{1 \le k \le \tau} \alpha\_k\right), \quad \alpha\_{\text{FCS}} = 1 - \text{CDF}\_{\chi^2}(\text{FCS}; \tau). \tag{15}$$
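Equation (15) can be sketched as follows (an illustrative transcription; the *χ*<sup>2</sup> CDF is written in its closed form for even *τ*, which covers both *τ* = 6 and *τ* = 8 mentioned above; function names are assumptions, not from the paper):

```python
import math

def chi2_cdf_even(x, tau):
    # Closed form of the chi-squared CDF for even degrees of freedom tau
    assert tau > 0 and tau % 2 == 0
    return 1.0 - math.exp(-x / 2) * sum((x / 2) ** k / math.factorial(k)
                                        for k in range(tau // 2))

def alpha_fcs(alphas):
    # Equation (15): FCS = -ln(prod of alpha_k); alpha_FCS = 1 - CDF_chi2(FCS; tau)
    tau = len(alphas)
    fcs = -math.log(math.prod(alphas))
    return 1.0 - chi2_cdf_even(fcs, tau)
```

For odd *τ*, a general regularized incomplete gamma function would be needed instead of the closed form used here.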

Two known symmetrical distributions were used (PDF, see Equation (1)) to express the relative deviation from the observed distribution: Gauss (*G*2 in Equation (16)) and generalized Gauss–Laplace (*GL* in Equation (17)), where (in both Equations (16) and (17)) *z* = (*x* − *μ*)/*σ*.

$$\text{G2}(\mathbf{x}; \mu, \sigma) = (2\pi)^{-1/2} \sigma^{-1} e^{-z^2/2} \tag{16}$$

$$\mathrm{GL}(\mathbf{x};\mu,\sigma,\kappa) = \frac{c\_1}{\sigma} e^{-|c\_0 z|^\kappa}, \quad c\_0 = \left(\frac{\Gamma(3/\kappa)}{\Gamma(1/\kappa)}\right)^{1/2}, \quad c\_1 = \frac{\kappa c\_0}{2\Gamma(1/\kappa)}.\tag{17}$$

The distributions given in Equations (16) and (17) will be later used to approximate the CDF of *TS* as well as in the case studies of using the order statistics. For a sum (*x* ← *p*1+...+*pn* in Equation (18)) of uniformly distributed (*p*1, ..., *pn* ∈ *U*(0, 1)) deviates (as {*p*1, ..., *pn*} in Equation (2)) the literature reports the Irwin–Hall distribution [22,23]. The CDF*IH*(*x*; *n*) is:

$$\text{CDF}\_{\text{IH}}(\mathbf{x};n) = \sum\_{k=0}^{\lfloor \mathbf{x} \rfloor} (-1)^{k} \frac{(\mathbf{x} - k)^{n}}{k!(n-k)!}. \tag{18}$$
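Equation (18) translates directly into code (a sketch valid for moderate *n* only, since the alternating sum loses floating-point precision for large *n*, a point the paper returns to in Section 3.2):

```python
import math

def cdf_irwin_hall(x, n):
    # Equation (18): CDF of the sum of n iid U(0,1) deviates
    if x <= 0:
        return 0.0
    if x >= n:
        return 1.0
    return sum((-1) ** k * (x - k) ** n / (math.factorial(k) * math.factorial(n - k))
               for k in range(math.floor(x) + 1))
```

By the symmetry of the Irwin–Hall distribution about *n*/2, the CDF evaluates to 0.5 at the midpoint, which provides a quick check.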

### **3. Results and Discussion**

### *3.1. The Analytical Formula of* CDF *for TS*

The CDF of *TS* depends (only) on the sample size (*n*), e.g., CDF*TS*(*x*; *n*). As the proposed equation, Equation (12), resembles (as an inverse of) a sum of uniform deviates, we expected that CDF*TS* would also be connected with the Irwin–Hall distribution, Equation (18). Indeed, the conducted study has shown that the inverse (*y* ← 1/*x*) of the variable (*x*) following *TS* follows a distribution (1/*TS*) whose CDF is given in Equation (19). Please note that the similarity between Equations (18) and (19) is not totally coincidental; 1/*TS* (see Equation (12)) is more or less a sum of uniformly distributed deviates divided by the largest one. Also, for any positive arbitrarily generated series, its ascending (*x*) and descending (1/*x*) sorts are complementary. With the proper substitution, CDF1/*TS*(*y*; *n*) can be expressed as a function of CDF*IH*—see Equation (20).

$$\text{CDF}\_{1/\text{TS}}(y;n) = \sum\_{k=0}^{\lfloor n-y \rfloor} (-1)^k \frac{(n-y-k)^{n-1}}{k!(n-1-k)!} \tag{19}$$

$$\text{CDF}\_{1/\text{TS}}(y;n) = \text{CDF}\_{\text{IH}}(n-y;n-1). \tag{20}$$

Unfortunately, the formulas, Equation (18) to Equation (20), are not appropriate for large *n* and *p* (*p* = CDF1/*TS*(*y*; *n*) from Equation (19)), due to the error propagated from a large number of numerical operations (see further Table 2 in Section 3.2). Therefore, for *p* > 0.5, a similar expression providing the value for *α* = 1 − *p* is more suitable. It is possible to use a closed analytical formula for *α* = 1 − CDF1/*TS*(*y*; *n*) as well, Equation (21). Equation (21) resembles the Irwin–Hall distribution even more closely than Equation (20)—see Equation (22).

$$1 - \text{CDF}\_{1/\text{TS}}(y; n) = \sum\_{k=0}^{\lfloor y \rfloor - 1} (-1)^k \frac{(y - 1 - k)^{n-1}}{k!(n - 1 - k)!} \tag{21}$$

$$1 - \text{CDF}\_{1/\text{TS}}(y; n) = \text{CDF}\_{\text{IH}}(y - 1; n - 1). \tag{22}$$

For consistency in the following notations, one should remember the definition of CDF, see Equation (1), and then we mark the connection between notations in terms of the analytical expressions of the functions, Equation (23):

$$\begin{array}{c} \text{CDF}\_{\text{TS}}(\mathbf{x};n) = \text{1-CDF}\_{1/\text{TS}}(1/\mathbf{x};n), \text{CDF}\_{\text{TS}}(1/\mathbf{x};n) = 1 - \text{CDF}\_{1/\text{TS}}(\mathbf{x};n),\\ \quad \text{since } \text{InvCDF}\_{\text{TS}}(p;n) \cdot \text{InvCDF}\_{1/\text{TS}}(p;n) = 1. \end{array} \tag{23}$$

One should notice (Equations (1) and (23)) that the infimum (1) of the domain of 1/*TS* equals the supremum (1) of the domain of *TS*, and the supremum (*n*) of the domain of 1/*TS* corresponds to the infimum (1/*n*) of the domain of *TS*. Also, *TS* has its median (*p* = *α* = 0.5) at 2/(*n* + 1), while 1/*TS* has its median (which is also the mean and the mode) at (*n* + 1)/2. The distribution of 1/*TS* is symmetrical.

For *n* = 2, *p* = CDF<sub>1/*TS*</sub>(*y*; *n*) is linear (*y* + *p* = 2), while for *n* = 3, it is a mixture of two quadratics: 2*p* = (3 − *y*)<sup>2</sup> for *p* ≤ 0.5 (and *y* ≥ 2), and 2(1 − *p*) = (*y* − 1)<sup>2</sup> for *p* ≥ 0.5 (and *y* ≤ 2). With increasing *n*, the number of mixed polynomials of increasing degree defining the expression increases. Therefore, there is no way to provide an analytical expression for the InvCDF of 1/*TS*, not even for certain *p* values (such as 'critical' values) as analytical functions.

The distribution of 1/*TS* can be further characterized by its moments (mean *μ*, variance *σ*<sup>2</sup>, skewness *γ*<sub>1</sub>, and kurtosis *κ* in Equation (24)), which are closely connected with those of the Irwin–Hall distribution.

$$\text{For } 1/\text{TS}(y;n): \ \mu = (n+1)/2; \ \sigma^2 = (n-1)/12; \ \gamma\_1 = 0; \ \kappa = 3 - 6/(5n - 5). \tag{24}$$

### *3.2. Computations for the CDF of TS and Its Analytical Formula*

Before we proceed to the simulation results, some computational issues must be addressed. Any of the formulas provided for the CDF of *TS* (Equations (19) and (21), or Equations (20) and (22), both connected with Equation (18)) will provide almost exact calculations as long as the computations are conducted with an engine or package that performs the operations with rational numbers to infinite precision (such as is available in the Mathematica software [24]) and the value of *y* (*y* ← 1/*x*, of floating point type) is converted to a rounded, rational number. Otherwise, with increasing *n*, the evaluation of the CDF of *TS* using any of Equations (19)–(22) carries huge computational errors (see the alternating sign of the terms in the sums of Equations (18), (19), and (21)). In order to account for those computational errors (and to reduce their magnitude), an alternate formula for the CDF of *TS* is proposed (Algorithm 3), combining the formulas from Equations (19) and (21) and reducing the number of summed terms.

### **Algorithm 3:** Avoiding computational errors for *TS*.

**Input data:** *n* (*n* ≥ 2, integer), *x* (1/*n* ≤ *x* ≤ 1, real number, double precision)

*y* ← 1/*x*; // *p* = *p*<sub>1/*TS*</sub> ← Equation (19), *α* = *α*<sub>1/*TS*</sub> ← Equation (21)

**if** *y* < (*n* + 1)/2 **then** *α* ← ∑<sup>⌊*y*⌋−1</sup><sub>*k*=0</sub> (−1)<sup>*k*</sup> (*y* − 1 − *k*)<sup>*n*−1</sup>/(*k*!(*n* − 1 − *k*)!); *p* ← 1 − *α*

**else if** *y* > (*n* + 1)/2 **then** *p* ← ∑<sup>⌊*n*−*y*⌋</sup><sub>*k*=0</sub> (−1)<sup>*k*</sup> (*n* − *y* − *k*)<sup>*n*−1</sup>/(*k*!(*n* − 1 − *k*)!); *α* ← 1 − *p*

**else** *α* ← 0.5; *p* ← 0.5

**Output data:** *α* = *α*<sub>1/*TS*</sub> = *p*<sub>*TS*</sub> ← CDF<sub>*TS*</sub>(*x*; *n*) and *p* = *p*<sub>1/*TS*</sub> = *α*<sub>*TS*</sub> ← 1 − *p*<sub>*TS*</sub>
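The branch logic of Algorithm 3 can be sketched as follows, with Python's exact rationals standing in for the double-precision variables of the paper (the names `cdf_ts` and `med` are illustrative); whichever of Equations (19) and (21) has the shorter alternating sum is the one evaluated.

```python
from fractions import Fraction
from math import floor, factorial

def cdf_ts(x, n):
    """Algorithm 3 sketch: returns (alpha, p) in the 1/TS convention, where
    alpha = alpha_{1/TS} = CDF_TS(x; n).  Only the short sum is evaluated."""
    y = 1 / Fraction(x)                         # y <- 1/x
    med = Fraction(n + 1, 2)
    if y < med:                                 # Equation (21): few terms
        alpha = sum((-1) ** k * (y - 1 - k) ** (n - 1)
                    / (factorial(k) * factorial(n - 1 - k))
                    for k in range(floor(y)))   # k = 0 .. floor(y) - 1
        p = 1 - alpha
    elif y > med:                               # Equation (19): few terms
        p = sum((-1) ** k * (n - y - k) ** (n - 1)
                / (factorial(k) * factorial(n - 1 - k))
                for k in range(floor(n - y) + 1))
        alpha = 1 - p
    else:
        alpha = p = Fraction(1, 2)
    return alpha, p

alpha, p = cdf_ts(Fraction(5, 6), 2)            # y = 1.2; for n = 2, y + p = 2
assert (alpha, p) == (Fraction(1, 5), Fraction(4, 5))
```

The final check reproduces the linear case *y* + *p* = 2 for *n* = 2 discussed in Section 3.1.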

Table 2 contains the sums of the squared residuals (*SS* = ∑<sup>999</sup><sub>*i*=1</sub>(*p<sub>i</sub>* − *p*ˆ<sub>*i*</sub>)<sup>2</sup> in Equation (13), *lvl* = 1000) of the agreement between the observed CDF of *TS* (*p<sub>i</sub>* = *i*/1000, for *i* from 1 to 999) and the calculated CDF of *TS* (the *p*ˆ<sub>*i*</sub> values are calculated using Algorithm 3 from *x<sub>i</sub>* = InvCDF(*i*/1000; *n*) for *i* from 1 to 999) for some values of the sample size (*n*). To support the previously given statements, Table 2 provides the square sums of residuals computed using three alternate formulas (from Equation (20), from Equation (22), and from Algorithm 3).


**Table 2.** Square sums of residuals calculated in double precision (IEEE 754 binary64, 64 bits).

In red: digits affected by computational errors.

As shown in Table 2, the computational errors from using either Equation (20) (or Equation (19)) or Equation (22) (or Equation (21)) are reasonably low up to *n* = 34, while from *n* = 42 onward, they become significant. As can be seen (red values in Table 2), double precision alone cannot cope with the large number of computations, especially as the terms in the sums constantly change their signs (see (−1)<sup>*k*</sup> in Equations (19) and (21)).
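The cancellation can be reproduced with a small sketch (illustrative code, not the paper's implementation) that evaluates the full alternating sum of Equation (19) once in IEEE 754 double precision and once with exact rationals; for *n* = 50 the individual terms reach roughly 10<sup>19</sup>, so the double-precision sum loses all significant digits.

```python
from fractions import Fraction
from math import floor, factorial

def p_inv_ts(y, n, exact=True):
    """Equation (19) evaluated naively over the FULL alternating sum,
    either with exact rationals or in IEEE 754 double precision."""
    y = Fraction(y) if exact else float(y)
    return sum((-1) ** k * (n - y - k) ** (n - 1)
               / (factorial(k) * factorial(n - 1 - k))
               for k in range(floor(n - y) + 1))

n, y = 50, Fraction(6, 5)            # small y: the sum has ~n huge terms
exact = p_inv_ts(y, n, exact=True)   # a genuine probability
naive = p_inv_ts(y, n, exact=False)  # double precision: cancellation
assert 0 <= exact <= 1
assert abs(naive - float(exact)) > 1e-6
```

The exact result is a probability in [0, 1], while the double-precision value is off by orders of magnitude, which is the behavior summarized by the red entries in Table 2.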

The computational errors using Algorithm 3 are reasonably low for the whole domain of the simulated CDF of *TS* (with *n* from 2 to 55), but the combined formula (Algorithm 3) is expected to lose its precision for large *n* values, and therefore, a solution to safely compute (CDF for *IH*, *TS* and 1/*TS*) is to operate with rational numbers.

One other alternative is to use GNU GMP (the GNU Multiple Precision Arithmetic Library [25]). The calculations are the same (Algorithm 3); the only difference is the way in which the temporary variables are declared (instead of *double*, the variables become *mpf_t*, initialized later with the desired precision).

For convenience, the FreePascal [26] implementation for CDF of the Irwin–Hall distribution (Equation (18), called in the context of evaluating the CDF of *TS* in Equations (20) and (22)) is given as Algorithm 4.



In Algorithm 4, the changes made to a classical code running without the GNU GMP floating point arithmetic functions are written in blue. For convenience, the combined-formula trick of Algorithm 3 for avoiding the computational errors can be implemented with the code given as Algorithm 4 at the call level, Equation (25). If pIH(x:double; n:integer):double returns the value from Algorithm 4, then *pg*1, as given in Equation (25), safely implements the combined formula (Algorithm 3) with (or without) GNU GMP.

$$pg1 \leftarrow \begin{cases} 1 - pIH(n - 1/x;\ n-1), & \text{if } x(n+1) < 2, \\ pIH(1/x - 1;\ n-1), & \text{otherwise.} \end{cases} \tag{25}$$
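Equation (25) can be sketched on top of an exact Irwin–Hall CDF; in the hedged example below, the names `p_ih` and `pg1` mirror the text, and exact rationals stand in for the GMP floats of the paper's FreePascal code. Either branch always evaluates the Irwin–Hall CDF at the smaller of (*n* − *y*) and (*y* − 1), so only a few alternating terms are summed.

```python
from fractions import Fraction
from math import floor, factorial

def p_ih(x, n):
    """CDF of the Irwin-Hall distribution (Equation (18)), exact rationals."""
    x = Fraction(x)
    if x <= 0:
        return Fraction(0)
    if x >= n:
        return Fraction(1)
    return sum((-1) ** k * (x - k) ** n / (factorial(k) * factorial(n - k))
               for k in range(floor(x) + 1))

def pg1(x, n):
    """Equation (25): CDF_TS(x; n) via pIH, choosing the branch whose
    Irwin-Hall argument is small."""
    x = Fraction(x)
    if x * (n + 1) < 2:                  # y = 1/x above the median (n + 1)/2
        return 1 - p_ih(n - 1 / x, n - 1)
    return p_ih(1 / x - 1, n - 1)

assert pg1(Fraction(2, 3), 2) == Fraction(1, 2)    # y = 3/2 is the median
assert pg1(Fraction(11, 20), 2) == Fraction(9, 11)
```

Replacing `Fraction` with a multiple-precision float type reproduces the GMP variant described in the text.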

Regarding the data listed in Table 2 for Algorithm 4, from *n* = 2 to *n* = 55, the residuals were calculated with *double* (64 bits), *extended* (FreePascal, 80 bits), and *mpf_t* (GNU GMP, 128 bits). The sum of the residuals (over all *n* from 2 to 55) differs between *double* and *extended* by less than 10<sup>−11</sup>, and the same holds for *mpf_t* with 128 bits, which safely provides confidence in the results given in Table 2 for the combined formula (last column, Algorithm 4). The deviates for the agreement in the calculation of the CDF of *TS* are statistically characterized by *SE* (Equation (13)), *min*, and *max* in Table 3.

The *SE* of agreement (Table 3) between the expected value and the observed one (Algorithm 4, Equation (12), Table 1) of CDF<sub>1/*TS*</sub>(*x*; *n*) is safely below the resolution of the grid of observation points (*lvl*<sup>−1</sup> = 10<sup>−3</sup> in Table 1; *SE* ≤ 1.2 × 10<sup>−5</sup> in Table 3; two orders of magnitude smaller). Using Algorithm 4, Figures 4–7 depict the shapes of CDF<sub>*TS*</sub>(*x*; *n*), CDF<sub>1/*TS*</sub>(*x*; *n*), InvCDF<sub>*TS*</sub>(*x*; *n*), and InvCDF<sub>1/*TS*</sub>(*x*; *n*) for *n* from 2 to 20.

Finally, for the domain of the simulated CDF of the *TS* population for *n* from 2 to 54, the error at the odd points of the grid (1000·*p* from 1 to 999 with a step of 2) is depicted in Figure 8 (the calculations of the theoretical CDF of *TS* were made with *gmpfloat* at a precision of at least 256 bits). As can be observed in Figure 8, the difference between *p* and *p*ˆ is rarely larger than 10<sup>−5</sup> and never larger than 3 × 10<sup>−5</sup> (the boundary of the representation in Figure 8) for *n* ranging from 2 to 54.



*Mathematics* **2020**, *8*, 216

*minep* = *min*(*pi* − ˆ*pi*), *maxep* = *max*(*pi* − ˆ*pi*).

**Figure 5.** CDF*TS*(*x*; *n*) for *n* = 2 to 20.

**Figure 6.** InvCDF1/*TS*(*x*; *n*) for *n* = 2 to 20.

**Figure 7.** CDF1/*TS*(*x*; *n*) for *n* = 2 to 20.

**Figure 8.** Agreement estimating CDF*TS* for *n* = 2...54 and 1000*p* = 1...999 with a step of 2.

Based on the provided results, one may safely say that Equations (19) and (21) are complements (see also Equation (23)) of the CDF of *TS* given as Equation (12). As long as the calculations (with either Equation (19) or Equation (21)) are conducted using rational numbers, either formula provides the most accurate result. The remaining concern is how large those numbers can become (e.g., with the range of *n*); this is limited only by the amount of available memory. The precision of the calculation is ultimately bounded by the precision of the measured data and, finally, by the resolution of converting (if necessary) the *TS* value given by Equation (12) from floating point to rational. Either way, some applications prefer approximate formulas, which are easier to calculate and are considered common knowledge for interpreting the results. For those reasons, the next section describes approximation formulas.

### *3.3. Approximations of CDF of TS with Known Functions*

Considering, once again, Equation (24): for sufficiently large *n*, the distribution of 1/*TS* is approximately normal (Equation (26); for the normal (Gauss) distribution, see Equation (16)).

$$\text{PDF}\_{1/\text{TS}}(y;n) \xrightarrow{n \to \infty} \text{PDF}\_{\text{Gauss}}(y; (n+1)/2, \sqrt{(n-1)/12}).\tag{26}$$

Even better (than Equation (26)), for large values of *n*, a generalized Gauss–Laplace distribution (see Equation (17)) can be used to approximate the 1/*TS* statistic. Furthermore, for those looking for critical values of the *TS* statistic, the approximation of the 1/*TS* statistic to a generalized Gauss–Laplace distribution may provide safe critical values for large *n*. One way to derive the parameters of the generalized Gauss–Laplace distribution approximating the 1/*TS* statistic is by connecting the kurtosis and skewness of the two (Equation (27)).

$$\operatorname{Ku}(\beta) = \frac{\Gamma(\frac{5}{\beta})\Gamma(\frac{1}{\beta})}{\Gamma(\frac{3}{\beta})^2} \to \beta = \operatorname{Ku}^{-1}\left(3 - \frac{6}{5n - 5}\right), \quad a = \sqrt{\frac{n-1}{12}\frac{\Gamma(1/\beta)}{\Gamma(3/\beta)}}.\tag{27}$$

With *a* and *β* given by Equation (27) and *μ* = (*n* + 1)/2 (Equation (24)), the PDF of the generalized Gauss–Laplace distribution (Equation (17)), which approximates 1/*TS* (for large *n*), is given in Equation (28).

$$\text{PDF}\_{GL}(x; \mu, a, \beta) = \frac{\beta}{2a\Gamma(1/\beta)} e^{-\left(\frac{|x - \mu|}{a}\right)^{\beta}}.\tag{28}$$
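One way to carry out the inversion Ku<sup>−1</sup> in Equation (27) is simple bisection on *β*. The sketch below is illustrative (the names `kurt_gl` and `gl_params` are not from the paper, and the bracket [2, 10] is an assumption based on Ku being decreasing there); since the target kurtosis 3 − 6/(5*n* − 5) is slightly below 3, the solution *β* is slightly above 2.

```python
from math import gamma, sqrt

def kurt_gl(beta):
    """Kurtosis of the generalized Gauss-Laplace distribution, Equation (27)."""
    return gamma(5 / beta) * gamma(1 / beta) / gamma(3 / beta) ** 2

def gl_params(n):
    """Parameters (beta, a) from Equation (27) for sample size n:
    beta solves Ku(beta) = 3 - 6/(5n - 5) by bisection on [2, 10]."""
    target = 3 - 6 / (5 * n - 5)
    lo, hi = 2.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if kurt_gl(mid) > target:
            lo = mid           # Ku still too high: beta lies to the right
        else:
            hi = mid
    beta = (lo + hi) / 2
    a = sqrt((n - 1) / 12 * gamma(1 / beta) / gamma(3 / beta))
    return beta, a

assert abs(kurt_gl(2) - 3) < 1e-9   # beta = 2 recovers the Gaussian kurtosis
beta, a = gl_params(100)
assert beta > 2 and a > 0
```

The first assertion checks the sanity of the kurtosis formula (a generalized Gauss–Laplace with *β* = 2 is the normal distribution, with kurtosis 3).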

The errors of the approximation (Equation (29)) of *p<sub>i</sub>* = CDF<sub>1/*TS*</sub> (from Algorithm 3) with *p*ˆ<sub>*i*</sub> = CDF<sub>*GL*</sub> (from Equations (27) and (28)) are depicted in Figure 9 using a grid of 52 × 999 points for *n* = 50...101 and *p* = 0.001...0.999.

$$SE = \sqrt{\sum\_{i=1}^{999} \frac{(p\_i - \hat{p}\_i)^2}{999}}, \quad p\_i = \frac{i}{10^3}, \quad \hat{p}\_i = \text{CDF}\_{GL}(\text{InvCDF}\_{1/\text{TS}}(p\_i; n); \mu, a, \beta). \tag{29}$$

As can be observed in Figure 9, the quality of the approximation of 1/*TS* with the *GL* distribution increases with the sample size (*n*), but the increase is less than linear: the error tends to decrease approximately linearly when *n* increases exponentially.

**Figure 9.** Standard errors (*SE*) as a function of sample size (*n*) for the approximation of 1/*TS* with *GL* (Equation (29)).

The calculation of the CDF of 1/*TS* is a little tricky, as anticipated previously (see Section 3.2). To avoid computational errors in the calculation of CDF<sub>*TS*</sub>, a combined formula is more appropriate (Algorithms 3 and 4). With *p*<sub>1/*TS*</sub> ← CDF<sub>1/*TS*</sub>(*y*; *n*) and *α*<sub>1/*TS*</sub> ← 1 − CDF<sub>1/*TS*</sub>(*y*; *n*), depending on the value of *y* (*y* ← 1/*x*, where *x* is the sample statistic of *TS*, Equation (12)), only one of *α* and *p* (where *α* + *p* = 1) is suitable for a precise calculation.

An important remark at this point is that (*n* + 1)/2 is the median, mean, and mode for 1/*TS* (see Section 3.1). Indeed, any symbolic calculation with either of the formulas from Equation (19) to Equation (22) will provide that CDF1/*TS*((*n* + 1)/2; *n*) = 0.5, or, expressed with InvCDF, InvCDF1/*TS*(0.5; *n*)=(*n* + 1)/2.
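This symbolic property is easy to confirm with exact rationals; the small illustrative check below evaluates Equation (19) at the median (*n* + 1)/2 for several sample sizes (the function name `cdf_inv_ts` is not from the paper's code).

```python
from fractions import Fraction
from math import floor, factorial

def cdf_inv_ts(y, n):
    """CDF_{1/TS}(y; n) by Equation (19), with exact rational arithmetic."""
    y = Fraction(y)
    return sum((-1) ** k * (n - y - k) ** (n - 1)
               / (factorial(k) * factorial(n - 1 - k))
               for k in range(floor(n - y) + 1))

# CDF_{1/TS}((n + 1)/2; n) = 1/2 exactly, for every sample size n
for n in range(2, 12):
    assert cdf_inv_ts(Fraction(n + 1, 2), n) == Fraction(1, 2)
```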

### *3.4. The Use of CDF for TS to Measure the Departure between an Observed Distribution and a Theoretical One*

With any of Equations (5)–(12), the likelihood of observing a given sample can be ascertained. One may ask which statistic is to be trusted. The answer is, at the same time, none and all, as the problem of fitting the data to a certain distribution involves the estimation of the distribution's parameters (such as by MLE, Equation (14)). In this process of estimation, there is an intrinsic variability that cannot be ascertained by one statistic alone. This is why calculating the risk of being in error from a battery of statistics, Equation (15), is necessary.

Also, one may say that the *g*<sub>1</sub> statistic (Equation (11)) is associated not with the sample but with its extreme value(s), while others may say the opposite. Again, both are right, as in certain cases, samples containing outliers are considered inappropriate for the analysis [27], and in those cases, there are exactly two modes of action: to reject the sample or to remove the outlier(s). Figure 10 gives the proposed strategy for assessing samples using order statistics.

**Figure 10.** Using the order statistics to measure the likelihood of sampling.

As other authors have noted, order statistics, that is, the set of values in a random sample ordered from least to greatest, play a fundamental role in nonparametric problems. 'A considerable amount of new statistical inference theory can be established from order statistics assuming nothing stronger than continuity of the cumulative distribution function of the population', as noted in [28], a statement that remains perfectly valid today.

In the following case studies, the values of the sample statistics were calculated with Equations (5)–(10) (*AD*, *KS*, *CM*, *KV*, *WU*, *H*1; see also Figure 10), while the risks of being in error associated with the values of the sample statistics (*α<sub>Statistic</sub>* for those) were calculated with the program developed and posted online at http://l.academicdirect.org/Statistics/tests. The *g*<sub>1</sub> statistic (Equation (11)) and *α<sub>g1</sub>* were calculated as given in [15], while the *TS* statistic (Equation (12)) was calculated with Algorithm 4. For *FCS* and *α<sub>FCS</sub>*, Equation (15) was used.

### Case study 1.

Data: "Example 1" in [29]; Distribution: Gauss (Equation (16)); Sample size: *n* = 10; Population parameters (MLE, Equation (14)): *μ* = 575.2; *σ* = 8.256; Order statistics analysis is given in Table 4. Conclusion: at *α* = 5% risk of being in error, the sample does not have an outlier (*αg*<sup>1</sup> = 11.2%) but it is a bad drawing from normal (Gauss) distribution, with less than the imposed level (*α* = 5%) likelihood to appear from a random draw (*αFCS* = 4.5%).


**Table 4.** Order statistics analysis for case studies 1 to 10.

### Case study 2.

Data: "Example 3" in [29]; Distribution: Gauss (Equation (16)); Sample size: *n* = 15; Population parameters (MLE, Equation (14)): *μ* = 0.018; *σ* = 0.532; Order statistics analysis is given in Table 4. Conclusion: at *α* = 5% risk of being in error, the sample does not have an outlier (*αg*<sup>1</sup> = 10.9%) and it is a good drawing from normal (Gauss) distribution, with more than the imposed level (*α* = 5%) likelihood to appear from a random draw (*αFCS* = 59.6%).

### Case study 3.

Data: "Example 4" in [29]; Distribution: Gauss (Equation (16)); Sample size: *n* = 10; Population parameters (MLE, Equation (14)): *μ* = 3.406; *σ* = 0.732; Order statistics analysis is given in Table 4. Conclusion: at *α* = 5% risk of being in error, the sample does not have an outlier (*αg*<sup>1</sup> = 45.1%) and it is a good drawing from normal (Gauss) distribution, with more than the imposed level (*α* = 5%) likelihood to appear from a random draw (*αFCS* = 79.7%).

### Case study 4.

Data: "Example 5" in [29]; Distribution: Gauss (Equation (16)); Sample size: *n* = 8; Population parameters (MLE, Equation (14)): *μ* = 4715; *σ* = 140.8; Order statistics analysis is given in Table 4. Conclusion: at *α* = 5% risk of being in error, the sample does not have an outlier (*αg*<sup>1</sup> = 25.5%) and it is a good drawing from normal (Gauss) distribution, with more than the imposed level (*α* = 5%) likelihood to appear from a random draw (*αFCS* = 34.6%).

### Case study 5.

Data: "Table 4" in [15]; Distribution: Gauss (Equation (16)); Sample size: *n* = 206; Population parameters (MLE, Equation (14)): *μ* = 6.481; *σ* = 0.829; Order statistics analysis is given in Table 4. Conclusion: at *α* = 5% risk of being in error, the sample has an outlier (*αg*<sup>1</sup> = 3.4%), but it is a good

drawing from normal (Gauss) distribution, with more than the imposed level (*α* = 5%) likelihood to appear from a random draw (*αFCS* = 66.1%).

### Case study 6.

Data: "Table 1, Column 1" in [30]; Distribution: Gauss (Equation (16)); Sample size: *n* = 166; Population parameters (MLE, Equation (14)): *μ* = −0.348; *σ* = 1.8015; Order statistics analysis is given in Table 4. Conclusion: at *α* = 5% risk of being in error, the sample does not have an outlier (*αg*<sup>1</sup> = 24.7%) and it is a good drawing from normal (Gauss) distribution, with more than the imposed level (*α* = 5%) likelihood to appear from a random draw (*αFCS* = 68.7%).

### Case study 7.

Data: "Table 1, Set BBB" in [31]; Distribution: Gauss (Equation (16)); Sample size: *n* = 105; Population parameters (MLE, Equation (14)): *μ* = −0.094; *σ* = 0.762; Order statistics analysis is given in Table 4. Conclusion: at *α* = 5% risk of being in error, the sample does not have an outlier (*αg*<sup>1</sup> = 72.9%) and it is a good drawing from normal (Gauss) distribution, with more than the imposed level (*α* = 5%) likelihood to appear from a random draw (*αFCS* = 18.8%).

### Case study 8.

Data: "Table 1, Set SASCAII" in [31]; Distribution: Gauss (Equation (16)); Sample size: *n* = 47; Population parameters (MLE, Equation (14)): *μ* = 1.749; *σ* = 0.505; Order statistics analysis is given in Table 4. Conclusion: at *α* = 5% risk of being in error, the sample does not have an outlier (*αg*<sup>1</sup> = 98.0%) and it is a good drawing from normal (Gauss) distribution, with more than the imposed level (*α* = 5%) likelihood to appear from a random draw (*αFCS* = 65.5%).

### Case study 9.

Data: "Table 1, Set TaxoIA" in [31]; Distribution: Gauss (Equation (16)); Sample size: *n* = 63; Population parameters (MLE, Equation (14)): *μ* = 0.744; *σ* = 0.670; Order statistics analysis is given in Table 4. Conclusion: at *α* = 5% risk of being in error, the sample does not have an outlier (*αg*<sup>1</sup> = 74.6%) and it is a good drawing from normal (Gauss) distribution, with more than the imposed level (*α* = 5%) likelihood to appear from a random draw (*αFCS* = 95.2%).

### Case study 10.

Data: "Table 1, Set ERBAT" in [31]; Distribution: Gauss (Equation (16)); Sample size: *n* = 25; Population parameters (MLE, Equation (14)): *μ* = 0.379; *σ* = 1.357; Order statistics analysis is given in Table 4. Conclusion: at *α* = 5% risk of being in error, the sample does not have an outlier (*αg*<sup>1</sup> = 87.9%) and it is a good drawing from normal (Gauss) distribution, with more than the imposed level (*α* = 5%) likelihood to appear from a random draw (*αFCS* = 89.5%).

### *3.5. The Patterns in the Order Statistics*

A cluster analysis of the risks of being in error, provided by the series of order statistics on the case studies considered in this study, may reveal a series of peculiarities (Figures 11 and 12). The analysis given here is based on the series of case studies given above, in order to illustrate similarities (and not to provide a 'gold standard' as in [32] or [33]).

**Figure 11.** Euclidean distances between the risks of being in error provided by the order statistics.

**Figure 12.** Pearson disagreement between the risks of being in error provided by the order statistics.

Both clustering methods illustrated in Figures 11 and 12 reveal two distinct groups of statistics: {AD, CM, KV, WU, KS} and {H1, TS, g1}. The combined test FCS is also attracted (as expected) to the larger group. When looking at single Euclidean distances (Figure 11) within the larger group, two other associations should be noticed, {AD, CM, KS} and {KV, WU}, suggesting that those subgroups carry similar information; but when looking at the Pearson disagreements (Figure 12), the subgroups change to {CM, KV, WU}, {AD}, and {KS}, with no hint of an association with their calculation formulas (Equations (5)–(9)); therefore, their independence should not be dismissed. The second group {H1, TS, g1} is more stable, maintaining the same clustering pattern ({H1, TS}, {g1} in Figure 12).

Taking into account that the g1 test (Equation (11)) was specifically designed to account for outliers, this suggests that the H1 and TS tests are more sensitive to outliers than the other statistics; therefore, when outliers (or just the presence of extreme values) are the main concern in the sampling, the use of those tests is strongly suggested. The H1 statistic is a Shannon entropy formula applied in the probability space of the sample. Accounting for this aspect in the reasoning, the association of H1 with TS suggests that TS is a sort of entropic measure (max-entropy, to be more exact [34], a limit case of the generalized Rényi entropy [35]). Again, the g1 statistic is alone in this entropic group, suggesting that it carries a unique fingerprint of the sample (specifically, of its extreme value; see Equation (11)), while the others account for the context (the rest of the sampled values, Equations (10) and (12)).

Regarding the newly proposed statistic (TS), the fact that, in the given case studies, it belongs to the {H1, TS, g1} group strongly suggests that it is more sensitive to the presence of outliers (like g1, defined purely for this task, and unlike the well-known statistics defined by Equations (5)–(9)).

Moreover, one may ask whether, based on the risks of being in error provided by the statistics in case studies 1 to 10, some peculiarity of TS or of another statistic involved in this study could be revealed. An alternative is to ask whether the values of the risks can be considered as belonging to the same population, and for this, the k-sample Anderson–Darling test can be invoked [36]. With the series of probabilities, there are actually 2<sup>9</sup> − 1 − 9 = 502 tests to be conducted (one for each subgroup of 2, 3, 4, 5, 6, 7, 8, or 9 statistics picked from the nine possible choices), and for each of them, the answer is the same: at the 5% risk of being in error, it cannot be rejected that the groups (of statistics) were selected from identical populations (of statistics); so, overall, all of those statistics perform the same.

The proposed method may find its uses in testing symmetry [37], as a homogeneity test [38] and, of course, in the process of detecting outliers [39].

### *3.6. Other Rank Order Statistics Methods and Other Approaches*

The series of rank order statistics included in this study, Equations (5)–(11), covers the best-known rank order statistics reported to date. However, when considering a new order statistic not included there, its use in the context of the combining method, Equation (15), only increases the degrees of freedom *τ*, while the design of use (Figure 10) changes accordingly.

It should be noted that the proposed approach is intended to be used for small sample sizes, when no statistic alone is capable of high precision and high trueness. With increasing sample size, all statistics should converge to the same risk of being in error, and other alternatives present themselves, such as the superstatistical approach [40]. In the same context, each of the drawings included in the sample is supposed to be independent. In the presence of correlated data (such as data correlated in time), other approaches, such as the one communicated in [41], are again more suited.

### **4. Conclusions**

A new test statistic (*TS*) for measuring the agreement between continuous theoretical distributions and samples drawn from them was proposed. The analytical formula of the *TS* cumulative distribution function was obtained. The comparative study against other order statistics revealed that the newly proposed statistic carries distinct information regarding the quality of the sampling. A combined probability formula from a battery of statistics is suggested as a more accurate measure of the quality of the sampling. Therefore, Equation (15), combining the probabilities (the risks of being in error) from Equations (5) to (12), is recommended whenever extreme values are suspected of being outliers in samples from continuous distributions.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/2227-7390/8/2/216/s1. The source code for sampling order statistics (file named OS.pas) and the source code for the evaluation of the CDF of *TS* with Algorithm 4 (file named TS.pas) are available upon request. The k-sample Anderson–Darling test(s) on the risks of being in error from case studies 1 to 10 are given as a supplementary file.

**Funding:** This research received no external funding.

**Acknowledgments:** The following software was used during the research and the writing of the paper: Lazarus (freeware) was used to compile the 64-bit executable for Monte Carlo sampling (using the parametrization given in Table 1); the executable was compiled for a 64 GB multi-core workstation and was used as such. Mathcad (v.14, licensed) was used to check the validity of some of the given equations (Equations (19)–(22), (24), (26), and (27)) and to obtain the MLE estimates (implementing Equation (14) with first-order derivatives; results given in Section 3.4 as case studies 1 to 10). Matlab (v.8.5.0, licensed) was used to obtain Figures 4–8. Wolfram Mathematica (v.12.0, licensed) was used to check (iteratively) the formulas given for 1/*TS* (Equations (19) and (21)) and to provide the data for Figure 8. FreePascal (with GNU GMP, freeware) was used to numerically assess the agreement for the *TS* statistic (Tables 2 and 3, Figure 8). StatSoft Statistica (v.7, licensed) was used to obtain Figures 11 and 12.

**Conflicts of Interest:** The author declares no conflict of interest.

### **References**


© 2020 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **dCATCH—A Numerical Package for d-Variate near G-Optimal Tchakaloff Regression via Fast NNLS**

### **Monica Dessole, Fabio Marcuzzi and Marco Vianello \***

Department of Mathematics "Tullio Levi Civita", University of Padova, Via Trieste 63, 35131 Padova, Italy; mdessole@math.unipd.it (M.D.); marcuzzi@math.unipd.it (F.M.)

**\*** Correspondence: marcov@math.unipd.it

Received: 11 June 2020; Accepted: 7 July 2020; Published: 9 July 2020

**Abstract:** We provide a numerical package for the computation of a *d*-variate near G-optimal polynomial regression design of degree *m* on a finite design space *X* ⊂ R<sup>*d*</sup>, by a few iterations of a basic multiplicative algorithm followed by Tchakaloff-like compression of the discrete measure keeping the reached G-efficiency, via an accelerated version of the Lawson-Hanson algorithm for Non-Negative Least Squares (NNLS) problems. This package can solve on a personal computer large-scale problems where *card*(*X*) × dim(P<sup>*d*</sup><sub>2*m*</sub>) is up to 10<sup>8</sup>–10<sup>9</sup>, with dim(P<sup>*d*</sup><sub>2*m*</sub>) = $\binom{2m+d}{d} = \binom{2m+d}{2m}$. Several numerical tests are presented on complex shapes in *d* = 3 and on hypercubes in *d* > 3.

**Keywords:** multivariate polynomial regression designs; G-optimality; D-optimality; multiplicative algorithms; G-efficiency; Caratheodory-Tchakaloff discrete measure compression; Non-Negative Least Squares; accelerated Lawson-Hanson solver

### **1. Introduction**

In this paper we present the numerical software package *dCATCH* [1] for the computation of a *<sup>d</sup>*-variate near G-optimal polynomial regression design of degree *<sup>m</sup>* on a finite design space *<sup>X</sup>* <sup>⊂</sup> <sup>R</sup>*d*. In particular, it is the first software package for general-purpose Tchakaloff-like compression of d-variate designs via Non-Negative Least Squares (NNLS), freely available on the Internet. The code is an evolution of the codes in Reference [2] (limited to *d* = 2, 3), with a number of features tailored to higher dimension and large-scale computations. The key ingredients are:


Before giving a more detailed description of the algorithm, it is worth recalling in brief some basic notions of optimal design theory. Such a theory has its roots and main applications within statistics, but also strong connections with approximation theory. In statistics, a design is a probability measure *μ* supported on a (discrete or continuous) compact set Ω ⊂ R<sup>*d*</sup>. The search for designs that optimize some properties of statistical estimators (optimal designs) dates back to at least one century ago; the relevant literature is wide and still actively growing, and monographs and survey papers abound. For readers interested in the evolution and state of the art of this research field, we may quote, for example, two classical treatises such as References [4,5], the recent monograph [6] and the algorithmic survey [7], as well as References [8–10] and references therein. On the approximation theory side we may quote, for example, References [11,12].

The present paper is organized as follows. In Section 2 we briefly recall some basic concepts from the theory of optimal designs, for the reader's convenience, with special attention to the deterministic and approximation-theoretic aspects. In Section 3 we present in detail our computational approach to near G-optimal *d*-variate designs via Caratheodory-Tchakaloff compression; all the routines of the *dCATCH* software package presented here are described. In Section 4 we show several numerical results with dimensions in the range 3–10, and a Conclusions section follows.

For the reader's convenience we also display Tables 1 and 2, describing the acronyms used in this paper and the content (subroutine names) of the *dCATCH* software package.



**Table 2.** *dCATCH* package content.


#### **2. G-Optimal Designs**

Let P<sup>*d*</sup><sub>*m*</sub>(Ω) denote the space of *d*-variate real polynomials of total degree not greater than *m*, restricted to a (discrete or continuous) compact set Ω ⊂ R<sup>*d*</sup>, and let *μ* be a design, that is, a probability measure, with *supp*(*μ*) ⊆ Ω. In what follows we assume that *supp*(*μ*) is *determining* for P<sup>*d*</sup><sub>*m*</sub>(Ω) [13], that is, polynomials in P<sup>*d*</sup><sub>*m*</sub>(Ω) vanishing on *supp*(*μ*) vanish everywhere on Ω.

In the theory of optimal designs, a key role is played by the diagonal of the reproducing kernel for *μ* in P*<sup>d</sup> <sup>m</sup>*(Ω) (also called the Christoffel polynomial of degree *m* for *μ*)

$$K\_{m}^{\mu}(\mathbf{x},\mathbf{x}) = \sum\_{j=1}^{N\_{m}} p\_{j}^{2}(\mathbf{x}) \; , \quad N\_{m} = \dim(\mathbb{P}\_{m}^{d}(\Omega)) \; , \tag{1}$$

where $\{p_j\}$ is any $\mu$-orthonormal basis of $\mathbb{P}^d_m(\Omega)$. Recall that $K^\mu_m(\mathbf{x},\mathbf{x})$ can be proved to be independent of the choice of the orthonormal basis. A relevant property is the following estimate of the $L^\infty$-norm in terms of the $L^2_\mu$-norm of polynomials

$$\|p\|_{L^{\infty}(\Omega)} \le \sqrt{\max_{\mathbf{x} \in \Omega} K_m^{\mu}(\mathbf{x}, \mathbf{x})} \; \|p\|_{L^2_{\mu}(\Omega)} \quad \forall p \in \mathbb{P}_m^d(\Omega) \,. \tag{2}$$

Now, by (1) and *μ*-orthonormality of the basis we get

$$\int_{\Omega} K_m^{\mu}(\mathbf{x}, \mathbf{x}) \, d\mu = \sum_{j=1}^{N_m} \int_{\Omega} p_j^2(\mathbf{x}) \, d\mu = N_m \,, \tag{3}$$

which entails that $\max_{\mathbf{x}\in\Omega} K^\mu_m(\mathbf{x}, \mathbf{x}) \ge N_m$.
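As a quick discrete sanity check of (1) and (3), one can build a $\mu$-orthonormal basis by a weighted QR factorization and verify the integral identity numerically. The following Python sketch (illustrative only, not part of the *dCATCH* package) uses a uniform design on equispaced points in $[-1,1]$:

```python
import numpy as np

# Discrete check of (1) and (3): on a finite set X with a uniform design u,
# build a u-orthonormal basis via a weighted QR factorization, then verify
# that the Christoffel polynomial K_m^u(x, x) = sum_j p_j(x)^2 integrates
# to N_m, so that its maximum is at least N_m.

m, M = 3, 50
X = np.linspace(-1.0, 1.0, M)              # 1-d design space for simplicity
u = np.full(M, 1.0 / M)                    # uniform probability weights
N_m = m + 1                                # dim of univariate P_m

V = np.vander(X, N_m, increasing=True)     # monomial Vandermonde matrix
Q, _ = np.linalg.qr(np.sqrt(u)[:, None] * V)
U = Q / np.sqrt(u)[:, None]                # u-orthonormal basis at the points

K = np.sum(U**2, axis=1)                   # K_m^u(x_i, x_i)
print(np.isclose(u @ K, N_m))              # integral identity (3): True
print(K.max() >= N_m)                      # hence max_x K >= N_m: True
```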

A probability measure $\mu_* = \mu_*(\Omega)$ is then called a G-optimal design for polynomial regression of degree $m$ on $\Omega$ if

$$\min\_{\mu} \max\_{\mathbf{x} \in \Omega} K\_m^{\mu}(\mathbf{x}, \mathbf{x}) = \max\_{\mathbf{x} \in \Omega} K\_m^{\mu\_\*}(\mathbf{x}, \mathbf{x}) = N\_m \,. \tag{4}$$


Observe that, since $\int_\Omega K^\mu_m(\mathbf{x}, \mathbf{x}) \, d\mu = N_m$ for every $\mu$, an optimal design also has the following property: $K^{\mu_*}_m(\mathbf{x}, \mathbf{x}) = N_m$, $\mu_*$-a.e. in $\Omega$.

Now, the well-known Kiefer-Wolfowitz General Equivalence Theorem [14] (a cornerstone of optimal design theory) asserts that the difficult min-max problem (4) is equivalent to the much simpler maximization problem

$$\max\_{\mu} \det(\mathbf{G}\_m^{\mu}) \; , \; \mathbf{G}\_m^{\mu} = \left( \int\_{\Omega} \phi\_i(\mathbf{x}) \phi\_j(\mathbf{x}) \, d\mu \right)\_{1 \le i,j \le N\_m}$$

where $G^\mu_m$ is the Gram matrix (or information matrix in statistics) of $\mu$ in a fixed polynomial basis $\{\phi_i\}$ of $\mathbb{P}^d_m(\Omega)$. Such optimality is called D-optimality, and it ensures that an optimal measure always exists, since the set of Gram matrices of probability measures is compact and convex; see, for example, References [5,12] for a general proof of these results, valid for continuous as well as discrete compact sets.

Notice that an optimal measure is neither unique nor necessarily discrete (unless $\Omega$ is discrete itself). Nevertheless, the celebrated Tchakaloff Theorem ensures the existence of a positive quadrature formula for integration in $d\mu_*$ on $\Omega$, with cardinality not exceeding $N_{2m} = \dim(\mathbb{P}^d_{2m}(\Omega))$, which is exact for all polynomials in $\mathbb{P}^d_{2m}(\Omega)$. Such a formula is then a design itself, and it generates the same orthogonal polynomials, and hence the same Christoffel polynomial, as $\mu_*$, preserving G-optimality (see Reference [15] for a proof of Tchakaloff Theorem with general measures).

We recall that G-optimality has two important interpretations in terms of statistical and deterministic polynomial regression.

From a statistical viewpoint, it is the probability measure on $\Omega$ that minimizes the maximum prediction variance of polynomial regression of degree $m$; cf., for example, Reference [5].

On the other hand, from an approximation theory viewpoint, if we call $\mathcal{L}^{\mu_*}_m$ the corresponding weighted least squares projection operator $L^\infty(\Omega) \to \mathbb{P}^d_m(\Omega)$, namely

$$\|f - \mathcal{L}_m^{\mu_*} f\|_{L^2_{\mu_*}(\Omega)} = \min_{p \in \mathbb{P}^d_m(\Omega)} \|f - p\|_{L^2_{\mu_*}(\Omega)}\,,\tag{5}$$

by (2) we can write, for every $f \in L^\infty(\Omega)$,

$$\|\mathcal{L}\_{m}^{\mu\_{\ast}}f\|\_{L^{\infty}(\Omega)} \leq \sqrt{\max\_{\mathbf{x}\in\Omega} K\_{m}^{\mu\_{\ast}}(\mathbf{x},\mathbf{x})} \|\mathcal{L}\_{m}^{\mu\_{\ast}}f\|\_{L^{2}\_{\mu\_{\ast}}(\Omega)} = \sqrt{N\_{m}} \,\|\mathcal{L}\_{m}^{\mu\_{\ast}}f\|\_{L^{2}\_{\mu\_{\ast}}(\Omega)}$$

$$\leq \sqrt{N\_{m}} \,\|f\|\_{L^{2}\_{\mu\_{\ast}}(\Omega)} \leq \sqrt{N\_{m}} \,\|f\|\_{L^{\infty}(\Omega)}$$

(where the second inequality comes from *μ*∗-orthogonality of the projection), which gives

$$\|\mathcal{L}_m^{\mu_*}\| = \sup_{f \neq 0} \frac{\|\mathcal{L}_m^{\mu_*} f\|_{L^\infty(\Omega)}}{\|f\|_{L^\infty(\Omega)}} \le \sqrt{N_m} \,,\tag{6}$$

that is, a G-optimal measure minimizes (the estimate of) the uniform operator norm of weighted least squares projection.

We stress that in this paper we are interested in the fully discrete case of a finite design space Ω = *X*, so that any design *μ* is identified by a set of positive weights (masses) summing up to 1 and integrals are weighted sums.

### **3. Computing near G-Optimal Compressed Designs**

Since in the present context we have a finite design space $\Omega = X = \{x_1, \dots, x_M\} \subset \mathbb{R}^d$, we may think of a design $\mu$ as a vector of non-negative weights $u = (u_1, \dots, u_M)$ attached to the points, such that $\|u\|_1 = 1$ (the support of $\mu$ being identified by the positive weights). Then, a G-optimal (or D-optimal) design $\mu_*$ is represented by the corresponding non-negative vector $u_*$. We write $K^u_m(\mathbf{x}, \mathbf{x}) = K^\mu_m(\mathbf{x}, \mathbf{x})$ for the Christoffel polynomial, and similarly for other objects (spaces, operators, matrices) corresponding to a discrete design. At the same time, $L^\infty(\Omega) = \ell^\infty(X)$ and $L^2_\mu(\Omega) = \ell^2_u(X)$ (a weighted $\ell^2$ functional space on $X$), with $\|f\|_{\ell^2_u(X)} = \left(\sum_{i=1}^M u_i f^2(x_i)\right)^{1/2}$.

In order to compute an approximation of the desired *u*∗, we resort to the basic multiplicative algorithm proposed by Titterington in the '70s (cf. Reference [16]), namely

$$u_i(k+1) = u_i(k)\,\frac{K_m^{u(k)}(x_i, x_i)}{N_m}\,, \quad 1 \le i \le M, \ k = 0, 1, 2, \dots,\tag{7}$$


with initialization $u(0) = (1/M, \dots, 1/M)^T$. Such an algorithm is known to converge sublinearly to a D-optimal (hence G-optimal, by the Kiefer-Wolfowitz Equivalence Theorem) design, with an increasing sequence of Gram determinants

$$\det(G_m^{u(k)}) = \det(V^T \operatorname{diag}(u(k))\, V)\,,$$

where $V$ is a Vandermonde-like matrix in any fixed polynomial basis of $\mathbb{P}^d_m(X)$; cf., for example, References [7,10]. Observe that $u(k+1)$ is indeed a vector of positive probability weights if such is $u(k)$. In fact, the Christoffel polynomial $K^{u(k)}_m$ is positive on $X$, and calling $\mu_k$ the probability measure on $X$ associated with the weights $u(k)$, we immediately get $\sum_i u_i(k+1) = \frac{1}{N_m} \sum_i u_i(k)\, K^{u(k)}_m(x_i, x_i) = \frac{1}{N_m} \int_X K^{u(k)}_m(\mathbf{x}, \mathbf{x}) \, d\mu_k = 1$ by (3) in the discrete case $\Omega = X$. Our implementation of (7) is based on the functions dCHEBVAND and dORTHVAND, described below.


The function dCHEBVAND computes the $d$-variate Chebyshev-Vandermonde matrix $C = (\phi_j(x_i)) \in \mathbb{R}^{M\times N_n}$, where $\{\phi_j(\mathbf{x})\} = \{T_{\nu_1}(\alpha_1 x_1 + \beta_1) \cdots T_{\nu_d}(\alpha_d x_d + \beta_d)\}$, $0 \le \nu_i \le n$, $\nu_1 + \cdots + \nu_d \le n$, is a suitably ordered total-degree product Chebyshev basis of the minimal box $[a_1, b_1] \times \cdots \times [a_d, b_d]$ containing $X$, with $\alpha_i = 2/(b_i - a_i)$, $\beta_i = -(b_i + a_i)/(b_i - a_i)$. Here we have resorted to the codes in Reference [17] for the construction and enumeration of the required "monomial" degrees. Though the initial basis is later orthogonalized, the choice of the Chebyshev basis is dictated by the need to control the conditioning of the matrix, which would otherwise be extremely large with the standard monomial basis, already at moderate regression degrees, preventing a successful orthogonalization.
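A simplified Python sketch of such a construction (with brute-force enumeration of the exponents, unlike the dedicated codes of Reference [17] used by dCHEBVAND) reads:

```python
import numpy as np
from itertools import product

# Simplified sketch of a dCHEBVAND-like construction: evaluate the
# total-degree product Chebyshev basis of the minimal bounding box of X
# at the points of X. Illustrative only.

def cheb_vandermonde(n, X):
    M, d = X.shape
    a, b = X.min(axis=0), X.max(axis=0)    # minimal box [a_i, b_i]
    T = 2.0 * (X - a) / (b - a) - 1.0      # affine map onto [-1, 1]^d
    nus = [nu for nu in product(range(n + 1), repeat=d) if sum(nu) <= n]
    cols = []
    for nu in nus:                         # T_k(t) = cos(k arccos t)
        c = np.ones(M)
        for i, k in enumerate(nu):
            c *= np.cos(k * np.arccos(np.clip(T[:, i], -1.0, 1.0)))
        cols.append(c)
    return np.column_stack(cols)

rng = np.random.default_rng(0)
X = rng.random((300, 3))                   # M = 300 points in d = 3
C = cheb_vandermonde(2, X)
print(C.shape)                             # (300, 10): N_n = C(2+3, 3) = 10
```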

Indeed, the second function dORTHVAND computes a Vandermonde-like matrix in a *u*-orthogonal polynomial basis on *X*, where *u* is the probability weight array. This is accomplished essentially by numerical rank evaluation for *C* = dCHEBVAND(*n*, *X*) and QR factorization

$$\operatorname{diag}(\sqrt{u})\, C_0 = QR \; , \quad U = C_0 \, R^{-1} \; , \tag{8}$$

(with $Q$ orthogonal rectangular and $R$ square invertible), where $\sqrt{u} = (\sqrt{u_1}, \dots, \sqrt{u_M})$. The matrix $C_0$ has full rank and corresponds to a selection of the columns of $C$ (i.e., of the original basis polynomials) via QR with column pivoting, in such a way that these form a basis of $\mathbb{P}^d_n(X)$, since $\mathrm{rank}(C) = \dim(\mathbb{P}^d_n(X))$. A possible alternative, not yet implemented, is the direct use of a rank-revealing QR factorization. The in-out parameter "jvec" allows one to pass directly the column index vector corresponding to a polynomial basis after a previous call to dORTHVAND with the same degree $n$, avoiding the numerical rank computation and allowing a simple "economy size" QR factorization of $\operatorname{diag}(\sqrt{u})\, C_0 = \operatorname{diag}(\sqrt{u})\, C(:, jvec)$.

Summarizing, $U$ is a Vandermonde-like matrix for degree $n$ on $X$ in the required $u$-orthogonal basis of $\mathbb{P}^d_n(X)$, that is

$$\left[p_1(\mathbf{x}), \dots, p_{N_n}(\mathbf{x})\right] = \left[\phi_{j_1}(\mathbf{x}), \dots, \phi_{j_{N_n}}(\mathbf{x})\right] R^{-1},\tag{9}$$

where $jvec = (j_1, \dots, j_{N_n})$ is the multi-index resulting from pivoting. Indeed, by (8) we can write the scalar product $(p_h, p_k)_{\ell^2_u(X)}$ as

$$(p_h, p_k)_{\ell^2_u(X)} = \sum_{i=1}^M u_i \, p_h(\mathbf{x}_i) \, p_k(\mathbf{x}_i) = (U^T \operatorname{diag}(u) \, U)_{hk} = (Q^T Q)_{hk} = \delta_{hk}\,,$$

for $1 \le h, k \le N_n$, which shows the orthonormality of the polynomial basis in (9).
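The steps (8)–(9) can be sketched in Python as follows (illustrative only: we use scipy's pivoted QR and a monomial initial basis instead of the Chebyshev basis of dCHEBVAND):

```python
import numpy as np
from scipy.linalg import qr

# Illustrative sketch of the dORTHVAND step (8)-(9): select a polynomial
# basis by QR with column pivoting on diag(sqrt(u)) C, then form
# U = C0 R^{-1}, whose columns evaluate a u-orthonormal basis on X.

def orthvand(C, u):
    B = np.sqrt(u)[:, None] * C
    r = np.linalg.matrix_rank(B)               # numerical rank evaluation
    _, _, piv = qr(B, pivoting=True)           # QR with column pivoting
    jvec = piv[:r]                             # indices of a polynomial basis
    Q, R = np.linalg.qr(B[:, jvec])            # economy-size QR of B0
    U = np.linalg.solve(R.T, C[:, jvec].T).T   # U = C0 R^{-1}
    return U, jvec

rng = np.random.default_rng(0)
M = 100
X = rng.random((M, 2))
C = np.column_stack([X[:, 0]**i * X[:, 1]**j
                     for i in range(3) for j in range(3 - i)])  # degree 2
u = np.full(M, 1.0 / M)
U, jvec = orthvand(C, u)
Gram = U.T @ (u[:, None] * U)            # (U^T diag(u) U)_{hk} = delta_{hk}
print(np.allclose(Gram, np.eye(len(jvec)), atol=1e-8))
```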

We stress that $\mathrm{rank}(C) = \dim(\mathbb{P}^d_n(X))$ could be strictly smaller than $\dim(\mathbb{P}^d_n) = \binom{n+d}{d}$, when there are polynomials in $\mathbb{P}^d_n$ vanishing on $X$ that do not vanish everywhere; in other words, when $X$ lies on a lower-dimensional algebraic variety (technically, one says that $X$ is not $\mathbb{P}^d_n$-determining [13]). This certainly happens when $\mathrm{card}(X)$ is too small, namely $\mathrm{card}(X) < \dim(\mathbb{P}^d_n)$, but think also, for example, of the case when $d = 3$ and $X$ lies on the 2-sphere $S^2$ (independently of its cardinality): then we have $\dim(\mathbb{P}^3_n(X)) \le \dim(\mathbb{P}^3_n(S^2)) = (n+1)^2 < \dim(\mathbb{P}^3_n) = (n+1)(n+2)(n+3)/6$.
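This rank deficiency is easy to observe numerically; the following Python sketch (illustrative only) builds a degree-2 monomial Vandermonde matrix at random points on $S^2$ and checks that its rank is $(n+1)^2 = 9$ rather than $\dim(\mathbb{P}^3_2) = 10$:

```python
import numpy as np

# Numerical illustration of the rank deficiency: for X on the unit sphere
# S^2 the relation x^2 + y^2 + z^2 = 1 lowers the rank of the degree-n
# Vandermonde matrix from dim(P^3_n) to (n+1)^2.

rng = np.random.default_rng(0)
P = rng.standard_normal((500, 3))
X = P / np.linalg.norm(P, axis=1, keepdims=True)   # 500 points on S^2

n = 2
cols = [X[:, 0]**i * X[:, 1]**j * X[:, 2]**k       # monomials of degree <= n
        for i in range(n + 1) for j in range(n + 1) for k in range(n + 1)
        if i + j + k <= n]
C = np.column_stack(cols)
print(C.shape[1])                   # 10 = dim(P^3_2)
print(np.linalg.matrix_rank(C))     # 9 = (n+1)^2: X is not P^3_2-determining
```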

Iteration (7) is implemented within the third function dNORD whose name stands for *d*-dimensional Near G-Optimal Regression Designs, which calls dORTHVAND with *n* = *m*. Near optimality is here twofold, namely it concerns both the concept of G-efficiency of the design and the sparsity of the design support.

We recall that G-efficiency is the percentage of G-optimality reached by a (discrete) design, measured by the ratio

$$G_m(u) = \frac{N_m}{\max_{\mathbf{x} \in X} K_m^u(\mathbf{x}, \mathbf{x})}\,,$$

knowing that $G_m(u) \le 1$ by (3) in the discrete case $\Omega = X$. Notice that $G_m(u)$ can be easily computed after the construction of the $u$-orthogonal Vandermonde-like matrix $U$ by dORTHVAND, as $G_m(u) = N_m / (\max_i \|row_i(U)\|_2^2)$.

In the multiplicative algorithm (7), we then stop iterating when a given threshold of G-efficiency (the input parameter "gtol" in the call to dNORD) is reached by $u(k)$, since $G_m(u(k)) \to 1$ as $k \to \infty$, say for example $G_m(u(k)) \ge 95\%$ or $G_m(u(k)) \ge 99\%$. Since convergence is sublinear, and in practice we see that $1 - G_m(u(k)) = \mathcal{O}(1/k)$, for a 90% G-efficiency the number of iterations is typically in the tens, whereas it is in the hundreds for 99% and in the thousands for 99.9%. When a G-efficiency very close to 1 is needed, one could resort to more sophisticated multiplicative algorithms; see, for example, References [9,10].
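The iteration (7) with this stopping rule can be sketched as follows in Python (a hypothetical standalone version of what dNORD does; the function name `titterington` and the monomial basis are ours, not part of the package):

```python
import numpy as np

# Sketch of the multiplicative iteration (7) with the G-efficiency
# stopping rule G_m(u(k)) >= gtol. Illustrative only.

def titterington(V, gtol=0.95, maxit=100000):
    M, N_m = V.shape                  # V: full-rank Vandermonde-like matrix
    u = np.full(M, 1.0 / M)           # u(0) = (1/M, ..., 1/M)^T
    for k in range(maxit):
        Q, _ = np.linalg.qr(np.sqrt(u)[:, None] * V)
        K = np.sum((Q / np.sqrt(u)[:, None])**2, axis=1)  # Christoffel values
        G = N_m / K.max()             # G-efficiency of the current design
        if G >= gtol:
            break
        u = u * K / N_m               # multiplicative update, mass stays 1
    return u, G, k

X = np.linspace(-1.0, 1.0, 200)
V = np.vander(X, 6, increasing=True)       # degree m = 5, N_m = 6
u, G, k = titterington(V, gtol=0.95)
print(G >= 0.95, abs(u.sum() - 1.0) < 1e-12)
```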

In many applications, however, a G-efficiency of 90–95% may be sufficient (then we may speak of near G-optimality of the design). Though in principle the multiplicative algorithm converges to an optimal design $\mu_*$ on $X$ with weights $u_*$ and cardinality $N_m \le \mathrm{card}(\mathrm{supp}(\mu_*)) \le N_{2m}$, such sparsity is far from being reached after the iterations that guarantee near G-optimality, in the sense that a large percentage of the weights in the near optimal design weight vector is still non-negligible, say

$$u(\overline{k}) \ \text{ such that } \ G_m(u(\overline{k})) \ge gtol \,.\tag{10}$$

Following References [18,19], we can however effectively compute a design which has the same G-efficiency as $u(\overline{k})$ but a support with cardinality not exceeding $N_{2m} = \dim(\mathbb{P}^d_{2m}(X))$, where in many applications $N_{2m} \ll \mathrm{card}(X)$, obtaining a remarkable compression of the near optimal design.

The theoretical foundation is a generalized version [15] of Tchakaloff Theorem [20] on positive quadratures, which asserts that for every measure on a compact set $\Omega \subset \mathbb{R}^d$ there exists an algebraic quadrature formula exact on $\mathbb{P}^d_n(\Omega)$, with positive weights, nodes in $\Omega$, and cardinality not exceeding $N_n = \dim(\mathbb{P}^d_n(\Omega))$.

In the present discrete case, that is, where the designs are defined on $\Omega = X$, this theorem implies that for every design $\mu$ on $X$ there exists a design $\nu$, whose support is a subset of $X$, which is exact for integration in $d\mu$ on $\mathbb{P}^d_n(X)$. In other words, the design $\nu$ has the same basis moments (indeed, for any basis of $\mathbb{P}^d_n(X)$)

$$\int_X p_j(\mathbf{x}) \, d\mu = \sum_{i=1}^M u_i \, p_j(\mathbf{x}_i) = \int_X p_j(\mathbf{x}) \, d\nu = \sum_{\ell=1}^L w_\ell \, p_j(\xi_\ell) \, , \quad 1 \le j \le N_n \,,$$

where $L \le N_n \le M$, $\{u_i\}$ are the weights of $\mu$, $\mathrm{supp}(\nu) = \{\xi_\ell\} \subseteq X$ and $\{w_\ell\}$ are the positive weights of $\nu$. For $L < M$, which certainly holds if $N_n < M$, this represents a compression of the design $\mu$ into the design $\nu$, which is particularly useful when $N_n \ll M$.

In matrix terms, this can be seen as the fact that the underdetermined $\{p_j\}$-moment system

$$U_n^T v = U_n^T u \tag{11}$$

has a non-negative solution $v = (v_1, \dots, v_M)^T$ whose positive components, say $w_\ell = v_{i_\ell}$, $1 \le \ell \le L \le N_n$, determine the support points $\{\xi_\ell\} \subseteq X$ (for clarity we indicate here by $U_n$ the matrix $U$ computed by dORTHVAND at degree $n$). This fact is indeed a consequence of the celebrated Caratheodory Theorem on conic combinations [21], asserting that a linear combination with non-negative coefficients of $M$ vectors in $\mathbb{R}^N$ with $M > N$ can be rewritten as a positive linear combination of at most $N$ of them. So, we get the discrete version of Tchakaloff Theorem by applying Caratheodory Theorem to the columns of $U_n^T$ in the system (11), ensuring the existence of a non-negative solution $v$ with at most $N_n$ nonzero components.

In order to compute such a solution to (11), we choose the strategy based on Quadratic Programming introduced in Reference [22], namely the sparse solution of the Non-Negative Least Squares (NNLS) problem

$$v = \operatorname{argmin}_{z \in \mathbb{R}^{M}, \, z \ge 0} \|U_n^T z - U_n^T u\|_2^2$$

by a new accelerated version of the classical Lawson-Hanson active-set method, proposed in Reference [3] in the framework of design optimization in $d = 2, 3$ and implemented by the function LHDM (Lawson-Hanson with Deviation Maximization), which we tune in the present package for very large-scale $d$-variate problems (see the next subsection for a brief description and discussion). We observe that working with an orthogonal polynomial basis of $\mathbb{P}^d_n(X)$ allows us to deal with the well-conditioned matrix $U_n$ in the Lawson-Hanson algorithm.
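Using scipy's implementation of the classical Lawson-Hanson method in place of LHDM, the compression step can be sketched as follows (illustrative Python, univariate and with a uniform design for brevity):

```python
import numpy as np
from scipy.optimize import nnls

# Sketch of Caratheodory-Tchakaloff compression (11): solve the NNLS
# problem with scipy's classical Lawson-Hanson solver (dCATCH uses the
# accelerated LHDM variant instead).

M, m = 2000, 4
X = np.linspace(-1.0, 1.0, M)
u = np.full(M, 1.0 / M)                        # design to be compressed
V = np.vander(X, 2 * m + 1, increasing=True)   # degree-2m basis, N_2m = 9
Q, _ = np.linalg.qr(np.sqrt(u)[:, None] * V)
U = Q / np.sqrt(u)[:, None]                    # u-orthogonal basis on X

b = U.T @ u                                    # moments of u
v, _ = nnls(U.T, b)                            # sparse non-negative solution
support = np.flatnonzero(v)
print(len(support) <= U.shape[1])              # at most N_2m support points
print(np.linalg.norm(U.T @ v - b) < 1e-8)      # moments are matched
```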

The overall computational procedure is implemented by the function

• [*pts*, *w*, *momerr*] = dCATCH(*n*, *X*, *u*),

where dCATCH stands for $d$-variate CAratheodory-TCHakaloff discrete measure compression. It works for any discrete measure on a discrete set $X$; indeed, it could be used not only for design compression but also, to give an example, for the compression of $d$-variate quadrature formulas. The output parameter $pts = \{\xi_\ell\} \subset X$ is the array of support points of the compressed measure, while $w = \{w_\ell\} = \{v_{i_\ell} > 0\}$ is the corresponding positive weight array (which we may call a $d$-variate near G-optimal Tchakaloff design) and $momerr = \|U_n^T v - U_n^T u\|_2$ is the moment residual. This function calls LHDM.

In the present framework we call dCATCH with $n = 2m$ and $u = u(\overline{k})$, cf. (10), that is, we solve

$$v = \operatorname{argmin}_{z \in \mathbb{R}^M, \, z \ge 0} \|U_{2m}^T z - U_{2m}^T u(\overline{k})\|_2^2 \,. \tag{12}$$

In such a way, the compressed design generates the same scalar product as $u(\overline{k})$ in $\mathbb{P}^d_m(X)$, and hence the same orthogonal polynomials and the same Christoffel function on $X$, thus keeping the G-efficiency invariant:

$$\mathbb{P}_{2m}^d(X) \ni K_m^{v}(\mathbf{x}, \mathbf{x}) = K_m^{u(\overline{k})}(\mathbf{x}, \mathbf{x}) \ \ \forall \mathbf{x} \in X \implies G_m(v) = G_m(u(\overline{k})) \ge gtol \tag{13}$$

with a (much) smaller support.

From a deterministic regression viewpoint (approximation theory), let us denote by $p_m^{opt}$ the polynomial in $\mathbb{P}^d_m(X)$ of best uniform approximation for $f$ on $X$, where we assume $f \in C(D)$ with $X \subset D \subset \mathbb{R}^d$, $D$ being a compact domain (or even a lower-dimensional manifold), and by $E_m(f; X) = \inf_{p \in \mathbb{P}^d_m(X)} \|f - p\|_{\ell^\infty(X)} = \|f - p_m^{opt}\|_{\ell^\infty(X)}$ and $E_m(f; D) = \inf_{p \in \mathbb{P}^d_m(D)} \|f - p\|_{L^\infty(D)}$ the best uniform polynomial approximation errors on $X$ and $D$.

Then, denoting by $\mathcal{L}^{u(\overline{k})}_m$ and $\mathcal{L}^w_m f = \mathcal{L}^v_m f$ the weighted least squares polynomial approximations of $f$ (cf. (5)) by the near G-optimal weights $u(\overline{k})$ and $w$, respectively, with the same reasoning used to obtain (6), and by (13), we can write the operator norm estimates

$$\|\mathcal{L}_m^{u(\overline{k})}\| \, , \ \|\mathcal{L}_m^{w}\| \le \sqrt{\tilde{N}_m} \le \sqrt{\frac{N_m}{gtol}} \ , \quad \tilde{N}_m = \frac{N_m}{G_m(u(\overline{k}))} = \frac{N_m}{G_m(v)} \,,$$

Moreover, since $\mathcal{L}^w_m p = p$ for any $p \in \mathbb{P}^d_m(X)$, we can write the near optimal estimate

$$\|f - \mathcal{L}\_m^w f\|\_{\ell^\infty(X)} \le \|f - p\_m^{opt}\|\_{\ell^\infty(X)} + \|p\_m^{opt} - \mathcal{L}\_m^w p\_m^{opt}\|\_{\ell^\infty(X)} + \|\mathcal{L}\_m^w p\_m^{opt} - \mathcal{L}\_m^w f\|\_{\ell^\infty(X)}$$

$$= \|f - p\_m^{opt}\|\_{\ell^\infty(X)} + \|\mathcal{L}\_m^w p\_m^{opt} - \mathcal{L}\_m^w f\|\_{\ell^\infty(X)} \le \left(1 + \|\mathcal{L}\_m^w\|\right) E\_m(f; X)$$

$$\le \left(1 + \sqrt{\frac{N_m}{gtol}}\right) E_m(f; X) \le \left(1 + \sqrt{\frac{N_m}{gtol}}\right) E_m(f; D) \approx \left(1 + \sqrt{N_m}\right) E_m(f; D) \,.$$

Notice that $\mathcal{L}^w_m f$ is constructed by sampling $f$ only at the compressed support $\{\xi_\ell\} \subset X$. The error depends on the regularity of $f$ on $D \supset X$, with a rate that can be estimated whenever $D$ admits a multivariate Jackson-like inequality; cf. Reference [23].

*Accelerating the Lawson-Hanson Algorithm by Deviation Maximization (LHDM)*

Let $A \in \mathbb{R}^{N\times M}$ and $b \in \mathbb{R}^N$. The NNLS problem consists of seeking $x \in \mathbb{R}^M$ that solves

$$\mathbf{x} = \operatorname\*{argmin}\_{z \ge 0} \|Az - b\|\_2^2. \tag{14}$$

This is a convex optimization problem with linear inequality constraints that define the *feasible region*, that is, the positive orthant $\{x \in \mathbb{R}^M : x_i \ge 0\}$. The very first algorithm dedicated to problem (14) is due to Lawson and Hanson [24], and it is still one of the most often used. It was originally derived for solving overdetermined linear systems, with $N \gg M$. However, in the case of underdetermined linear systems, with $N \ll M$, this method succeeds in sparse recovery.

Recall that, for a given point $x$ in the feasible region, the index set $\{1, \dots, M\}$ can be partitioned into two sets: the active set $Z$, containing the indices of the active constraints $x_i = 0$, and the passive set $P$, containing the remaining indices, those of the inactive constraints $x_i > 0$. If we denote by $P^*$ and $Z^*$ the passive and active sets corresponding to an optimal solution $x^*$ of (14), then $x^*$ also solves the following unconstrained least squares subproblem

$$x_{P^*}^* = \operatorname{argmin}_y \|A_{P^*}\, y - b\|_2^2 \,, \tag{15}$$

where $A_{P^*}$ is the submatrix containing the columns of $A$ with index in $P^*$, and similarly $x^*_{P^*}$ is the subvector made of the entries of $x^*$ whose index is in $P^*$. The remaining entries of $x^*$, namely those whose index is in $Z^*$, are null.

The Lawson-Hanson algorithm, starting from the null initial guess $x = 0$ (which is feasible), incrementally builds an optimal solution by moving indices from the active set $Z$ to the passive set $P$ and vice versa, while keeping the iterates within the feasible region. More precisely, at each iteration first-order information is used to detect a column of the matrix $A$ such that the corresponding entry in the new solution vector will be strictly positive; the index of such a column is moved from the active set $Z$ to the passive set $P$. Since there is no guarantee that the other entries corresponding to indices in the former passive set will stay positive, an inner loop brings the new solution vector back into the feasible region, by moving from the passive set $P$ to the active set $Z$ all those indices corresponding to violated constraints. At each iteration a new iterate is computed by solving a least squares problem of type (15); this can be done, for example, by computing a QR decomposition, which is computationally expensive. The algorithm terminates in a finite number of steps, since the possible combinations of passive/active sets are finite and the sequence of objective function values is strictly decreasing; cf. Reference [24].

The *deviation maximization* (DM) technique is based on the idea of adding a whole set of indices $T$ to the passive set at each outer iteration of the Lawson-Hanson algorithm. This corresponds to selecting a block of new columns to insert in the matrix $A_P$, while keeping the current solution vector within the feasible region, in such a way that sparse recovery is possible when dealing with non-strictly convex problems. In this way, the number of total iterations and the resulting computational cost decrease. The set $T$ is initialized to the index chosen by the standard Lawson-Hanson (LH) algorithm, and it is then extended, within the same iteration, using a set of candidate indices $C$ chosen in such a way that the corresponding entries are likely positive in the new iterate. The elements of $T$ are then chosen carefully within $C$: note that if the columns corresponding to the chosen indices are linearly dependent, the submatrix of the least squares problem (15) will be rank deficient, leading to numerical difficulties. We add at most $k$ new indices, where $k$ is an integer parameter to tune on the problem size, in such a way that, at the end, for every pair of indices in the set $T$, the corresponding column vectors form an angle whose cosine is below a given threshold $thres$ in absolute value. The whole procedure is implemented in the function

• [*x*, *resnorm*, *exitflag*] = LHDM(*A*, *b*, *options*).

The input variable *options* is a structure containing the user parameters for the LHDM algorithm; for example, the aforementioned *k* and *thres*. The output parameter *x* is the least squares solution, *resnorm* is the squared 2-norm of the residual and *exitflag* is set to 0 if the LHDM algorithm has reached the maximum number of iterations without converging and 1 otherwise.
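The column selection criterion just described can be sketched as follows (an illustrative Python version of the selection step only, with a hypothetical function name `dm_select`; the actual LHDM routine integrates this within the active-set loop):

```python
import numpy as np

# Illustrative sketch of the deviation-maximization selection criterion:
# starting from the index the standard LH rule would pick, extend T with
# candidate columns whose pairwise angle cosines stay below thres, so that
# the block added to A_P is well conditioned.

def dm_select(A, candidates, k, thres):
    T = [candidates[0]]                 # index chosen by the standard LH rule
    cols = A / np.linalg.norm(A, axis=0)
    for j in candidates[1:]:
        if len(T) >= k:
            break
        cosines = np.abs(cols[:, T].T @ cols[:, j])
        if np.all(cosines < thres):     # nearly orthogonal to those chosen
            T.append(j)
    return T

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 200))
T = dm_select(A, candidates=list(range(200)), k=8, thres=0.2)
C = A[:, T] / np.linalg.norm(A[:, T], axis=0)
off = np.abs(C.T @ C - np.eye(len(T)))  # pairwise cosines of chosen columns
print(len(T) <= 8, off.max() < 0.2)
```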

In the literature, an accelerating technique was introduced by Van Benthem and Keenan [25], who presented a different NNLS solution algorithm, namely "fast combinatorial NNLS", designed for the specific case of a large number of right-hand sides. The authors exploited a clever reorganization of computations in order to take advantage of the combinatorial nature of the problems treated (multivariate curve resolution) and introduced a nontrivial initialization of the algorithm by means of unconstrained least squares solution. In the following section we are going to compare such an approach, briefly named LHI, and the standard LH algorithm with the LHDM procedure just summarized.

### **4. Numerical Examples**

In this section, we perform several tests on the computation of $d$-variate near G-optimal Tchakaloff designs, from low to moderate dimension $d$. In practice, we are able to treat, on a personal computer, large-scale problems where $\mathrm{card}(X) \times \dim(\mathbb{P}^d_{2m})$ is up to $10^8$–$10^9$, with $\dim(\mathbb{P}^d_{2m}) = \binom{2m+d}{d} = \binom{2m+d}{2m}$. Recall that the main memory requirement is given by the $N_{2m} \times M$ matrix $U^T$ in the compression process solved by the LHDM algorithm, where $M = \mathrm{card}(X)$ and $N_{2m} = \dim(\mathbb{P}^d_{2m}(X)) \le \dim(\mathbb{P}^d_{2m})$.

Given the dimension $d > 1$ and the polynomial degree $m$, the routine LHDM empirically sets the parameter $k$ as $k = \binom{2m+d}{d} / (m(d-1))$, while the threshold is $thres = \cos(\pi/2 - \theta)$, $\theta \approx 0.22$. All the tests are performed on a workstation with 32 GB RAM and an Intel Core i7-8700 CPU @ 3.20 GHz.

### *4.1. Complex 3d Shapes*

To show the flexibility of the package *dCATCH*, we compute near G-optimal designs on a "multibubble" *<sup>D</sup>* <sup>⊂</sup> <sup>R</sup><sup>3</sup> (i.e., the union of a finite number of non-disjoint balls), which can have a very complex shape with a boundary surface very difficult to describe analytically. Indeed, we are able to implement near optimal regression on quite complex solids, arising from finite union, intersection and set difference of simpler pieces, possibly multiply-connected, where for each piece we have available the indicator function via inequalities. Grid-points or low-discrepancy points, for example, Halton points, of a surrounding box, could be conveniently used to discretize the solid. Similarly, thanks to the adaptation of the method to the actual dimension of the polynomial spaces, we can treat near optimal regression on the surfaces of such complex solids, as soon as we are able to discretize the surface of each piece by point sets with good covering properties (for example, we could work on the surface of a multibubble by discretizing each sphere via one of the popular spherical point configurations, cf. Reference [26]).

We perform a test at regression degree $m = 10$ on the 5-bubble shown in Figure 1b. The initial support $X$ consists of the $M = 18{,}915$ points, among 64,000 low-discrepancy Halton points, falling in the closure of the multibubble. Results are shown in Figure 1 and Table 3.


**Figure 1.** Multibubble test case, regression degree *m* = 10. (**a**) The evolution of the cardinality of the passive set *P* along the iterations of the three LH algorithms. (**b**) Multibubble with 1763 compressed Tchakaloff points, extracted from 18,915 original points.

**Table 3.** Results for the multibubble numerical test: *compr* = *M*/*mean*(*cpts*) is the mean compression ratio obtained by the three methods listed; *tLH*/*tTitt* is the ratio between the execution time of LH and that of the Titterington algorithm; *tLH*/*tLHDM* (*tLHI*/*tLHDM*) is the ratio between the execution time of LH (LHI) and that of LHDM; *cpts* is the number of compressed Tchakaloff points and *momerr* is the final moment residual.


#### *4.2. Hypercubes: Chebyshev Grids*

In a recent paper [19], a connection has been studied between the statistical notion of G-optimal design and the approximation theoretic notion of admissible mesh for multivariate polynomial approximation, deeply studied in the last decade after Reference [13] (see, e.g., References [27,28] and the references therein). In particular, it has been shown that near G-optimal designs on admissible meshes of suitable cardinality have a G-efficiency on the whole $d$-cube that can be made convergent to 1. For example, it has been proved, by the notion of Dubiner distance and suitable multivariate polynomial inequalities, that a design with G-efficiency $\gamma$ on a grid $X$ of $(2km)^d$ Chebyshev points (the zeros of $T_{2km}(t) = \cos(2km \arccos(t))$, $t \in [-1, 1]$) is a design for $[-1, 1]^d$ with G-efficiency $\gamma(1 - \pi^2/(8k^2))$. For example, taking $k = 3$, a near G-optimal Tchakaloff design with $\gamma = 0.99$ on a Chebyshev grid of $(6m)^d$ points is near G-optimal on $[-1, 1]^d$ with G-efficiency approximately $0.99 \cdot 0.86 \approx 0.85$, and taking $k = 4$ (i.e., a Chebyshev grid of $(8m)^d$ points) the corresponding near G-optimal Tchakaloff design has G-efficiency approximately $0.99 \cdot 0.92 \approx 0.91$ on $[-1, 1]^d$ (in any dimension $d$).
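Such a Chebyshev grid is straightforward to generate; for instance (illustrative Python, with small $d$, $m$, $k$ for brevity):

```python
import numpy as np
from itertools import product

# Sketch of the design space used in this subsection: the grid of (2km)^d
# Chebyshev points, the zeros of T_{2km}(t) = cos(2km * arccos(t)).

def chebyshev_grid(d, m, k):
    q = 2 * k * m
    t = np.cos((2 * np.arange(1, q + 1) - 1) * np.pi / (2 * q))  # zeros of T_q
    return np.array(list(product(t, repeat=d)))

X = chebyshev_grid(d=3, m=2, k=1)
print(X.shape)                                 # (64, 3): (2*1*2)^3 = 64
# the grid coordinates are indeed zeros of T_4:
T4 = np.polynomial.chebyshev.chebval(np.unique(X[:, 0]), [0, 0, 0, 0, 1])
print(np.allclose(T4, 0.0, atol=1e-12))
```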

We perform three tests in different dimension spaces and at different regression degrees. Results are shown in Figure 2 and Table 4, using the same notation above.

**Figure 2.** The evolution of the cardinality of the passive set *P* along the iterations of the three LH algorithms for the Chebyshev nodes tests: (**a**) *d* = 3, *n* = 6, *M* = 110,592; (**b**) *d* = 4, *n* = 3, *M* = 331,776; (**c**) *d* = 5, *n* = 2, *M* = 1,048,576.

**Table 4.** Results of numerical tests on *M* = (2*km*)*<sup>d</sup>* Chebyshev nodes, with *k* = 4, with different dimensions and degrees: *compr* = *M*/*mean*(*cpts*) is the mean compression ratio obtained by the three methods listed; *tLH*/*tTitt* is the ratio between the execution time of LH and that of the Titterington algorithm; *tLH*/*tLHDM* (*tLHI*/*tLHDM*) is the ratio between the execution time of LH (LHI) and that of LHDM; *cpts* is the number of compressed Tchakaloff points and *momerr* is the final moment residual.


### *4.3. Hypercubes: Low-Discrepancy Points*

The direct connection of Chebyshev grids with near G-optimal designs discussed in the previous subsection rapidly suffers from the curse of dimensionality, so only regression at low degree in relatively low dimension can be treated. On the other hand, in sampling theory a number of discretization nets with good space-filling properties on hypercubes have been proposed, and these allow one to increase the dimension *d*. We refer in particular to Latin hypercube sampling or low-discrepancy points (Sobol, Halton, and other popular sequences); see, for example, Reference [29]. These families of points give a discrete model of hypercubes that can be used in many different deterministic and statistical applications.

Here we consider a discretization made via Halton points. We present in particular two examples, where we take as finite design space *X* a set of $M = 10^5$ Halton points, in $d = 4$ with regression degree $m = 5$, and in $d = 10$ with $m = 2$. In both examples $2m + d = 14$, so that $\dim(\mathbb{P}^d_{2m}) = \binom{2m+d}{d} = \binom{2m+d}{2m} = \binom{14}{4} = 1001$, and the largest matrix involved in the construction is the $1001 \times 100{,}000$ Chebyshev-Vandermonde matrix $C$ for degree $2m$ on $X$, constructed at the beginning of the compression process (by dORTHVAND within dCATCH to compute $U_{2m}$ in (12)).
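The coincidence of the two polynomial space dimensions can be checked directly from the binomial formula (a sketch of ours, not part of dCATCH):

```python
from math import comb

# dim(P^d_{2m}) = C(2m + d, d) = C(2m + d, 2m); both test settings have 2m + d = 14
assert comb(2*5 + 4, 4) == 1001    # d = 4,  m = 5
assert comb(2*2 + 10, 10) == 1001  # d = 10, m = 2
```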

Results are shown in Figure 3 and Table 5, using the same notation as above.

**Remark 1.** *The computational complexity of dCATCH mainly depends on the QR decompositions, which clearly limit the maximum size of the problem and mainly determine the execution time. Indeed, the computational complexity of a QR factorization of a matrix of size $n_r \times n_c$, with $n_c \le n_r$, is high, namely $2(n_c^2 n_r - n_c^3/3) \approx 2 n_c^2 n_r$ (see, e.g., Reference [30]).*

*The Titterington algorithm performs a QR factorization of an $M \times N_m$ matrix at each iteration, with the following overall computational complexity*

$$\mathcal{C}_{Titt} \approx 2\,\bar{k}\,M\,N_m^2,$$

*where $\bar{k}$ is the number of iterations necessary for convergence, which depends on the desired G-efficiency.*

*On the other hand, the computational cost of one iteration of the Lawson-Hanson algorithm, for a fixed passive set $P$, is given by the solution of an LS problem of type (15), which is approximately $2 N_{2m} |P|^2$, that is, the cost of a QR decomposition of a matrix of size $N_{2m} \times |P|$. However, as the experimental results confirm, the evolution of the set $P$ along the execution of the algorithm may vary significantly depending on the experiment settings, so that the exact overall complexity is hard to estimate. Lower and upper bounds are available, but may lead to heavy under- and over-estimations, respectively; cf. Reference [31] for a discussion on complexity issues.*
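The flop counts of Remark 1 can be sketched numerically as follows (the helper functions and the example sizes are ours, purely illustrative, not part of dCATCH):

```python
def qr_flops(nr, nc):
    """Householder QR cost for an nr x nc matrix with nc <= nr:
    2*(nc^2*nr - nc^3/3), roughly 2*nc^2*nr when nc << nr."""
    return 2.0 * (nc**2 * nr - nc**3 / 3)

def titterington_flops(k_bar, M, Nm):
    """k_bar iterations, each requiring a QR of the full M x Nm matrix."""
    return k_bar * qr_flops(M, Nm)

def lh_iteration_flops(N2m, p):
    """One Lawson-Hanson iteration with passive set of size p <= N2m:
    a QR of an N2m x p matrix, about 2*N2m*p^2."""
    return qr_flops(N2m, p)

# hypothetical example sizes: M = 10**5 points, N_m = 126, N_2m = 1001
print(f"{titterington_flops(50, 10**5, 126):.2e} flops (Titterington, 50 iterations)")
print(f"{lh_iteration_flops(1001, 200):.2e} flops (one LH iteration, |P| = 200)")
```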

**Figure 3.** The evolution of the cardinality of the passive set *P* along the iterations of the three LH algorithms for the Halton points tests: (**a**) *d* = 10, *m* = 2, *M* = 10,000; (**b**) *d* = 10, *m* = 2, *M* = 100,000; (**c**) *d* = 4, *m* = 5, *M* = 10,000; (**d**) *d* = 4, *m* = 5, *M* = 100,000.

**Table 5.** Results of numerical tests on Halton points: *compr* = *M*/*mean*(*cpts*) is the mean compression ratio obtained by the three methods listed; *tLH*/*tTitt* is the ratio between the execution time of LH and that of the Titterington algorithm; *tLH*/*tLHDM* (*tLHI*/*tLHDM*) is the ratio between the execution time of LH (LHI) and that of LHDM; *cpts* is the number of compressed Tchakaloff points and *momerr* is the final moment residual.


### **5. Conclusions**

In this paper, we have presented *dCATCH* [1], a numerical software package for the computation of a *<sup>d</sup>*-variate near G-optimal polynomial regression design of degree *<sup>m</sup>* on a finite design space *<sup>X</sup>* <sup>⊂</sup> <sup>R</sup>*d*. The mathematical foundation is discussed, connecting statistical design theoretic and approximation theoretic aspects, with special emphasis on deterministic regression (Weighted Least Squares). The package takes advantage of an accelerated version of the classical NNLS Lawson-Hanson solver developed by the authors and applied to design compression.

As a few examples of use cases of this package we have shown the results on a complex shape (multibubble) in three dimensions, and on hypercubes discretized with Chebyshev grids and with Halton points, testing different combinations of dimensions and degrees which generate large-scale problems for a personal computer.

The present package, *dCATCH*, works for any discrete measure on a discrete set *X*. Indeed, beyond design compression, it could also be used for the compression of *d*-variate quadrature formulas, even on lower-dimensional manifolds, to give an example.

We may observe that with this approach we can compute a *d*-variate compressed design starting from a high-cardinality sampling set *X* that discretizes a continuous compact set (see Sections 4.2 and 4.3). This design allows an *m*-th degree near optimal polynomial regression of a function on the whole *X*, by sampling on a small design support. We stress that the compressed design is function-independent and thus can be constructed "once and for all" in a pre-processing stage. This approach is potentially useful, for example, for the solution of *d*-variate parameter estimation problems, where one may model a nonlinear cost function by near optimal polynomial regression on a discrete *d*-variate parameter space *X*; cf., for example, References [32,33] for instances of parameter estimation problems from mechatronics applications (*Digital Twins* of controlled systems) and references on the subject. Minimization of the polynomial model could then be accomplished by popular methods developed in the growing research field of Polynomial Optimization, such as Lasserre's SOS (Sum of Squares) and measure-based hierarchies, and other recent methods; cf., for example, References [34–36] with the references therein.

From a computational viewpoint, the results in Tables 3–5 show substantial speed-ups in the compression stage, with respect to the standard Lawson-Hanson algorithm, in terms of the number of iterations required and of computing time within the Matlab scripting language. In order to further decrease the execution times and to tackle larger design problems, we would like in the near future to enrich the package *dCATCH* with an efficient C implementation of its algorithms and, possibly, a CUDA acceleration on GPUs.

**Author Contributions:** Investigation, M.D., F.M. and M.V. All authors have read and agreed to the published version of the manuscript.

**Funding:** Work partially supported by the DOR funds and the biennial project Project BIRD192932 of the University of Padova, and by the GNCS-INdAM. This research has been accomplished within the RITA "Research ITalian network on Approximation".

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

#### *Article* **Exact Solutions to the Maxmin Problem max ‖Ax‖ Subject to ‖Bx‖ ≤ 1**

### **Soledad Moreno-Pulido 1,†, Francisco Javier Garcia-Pacheco 1,†, Clemente Cobos-Sanchez 2,† and Alberto Sanchez-Alzola 3,\*,†**


Received: 15 October 2019; Accepted: 30 December 2019; Published: 4 January 2020

**Abstract:** In this manuscript we provide an exact solution to the maxmin problem max ‖*Ax*‖ subject to ‖*Bx*‖ ≤ 1, where *A* and *B* are real matrices. This problem comes from a remodeling of max ‖*Ax*‖ subject to min ‖*Bx*‖, because the latter problem has no solution. Our mathematical method comes from Abstract Operator Theory, whose strong machinery allows us to reduce the first problem to max ‖*Cx*‖ subject to ‖*x*‖ ≤ 1, which can be solved exactly by relying on supporting vectors. Finally, as appendices, we provide two applications of our solution: first, we construct a truly optimal minimum stored-energy Transcranial Magnetic Stimulation (TMS) coil, and second, we find an optimal geolocation involving statistical variables.

**Keywords:** maxmin; supporting vector; matrix norm; TMS coil; optimal geolocation

**MSC:** 47L05, 47L90, 49J30, 90B50

### **1. Introduction**

### *1.1. Scope*

Different scientific fields, such as Physics, Statistics, Economics, or Engineering, deal with real-life problems that are usually modelled by the experts in those fields using matrices and their norms (see [1–6]). A typical model is the following original maxmin problem:

$$\left\{ \begin{array}{l} \max \|Ax\| \\ \min \|Bx\| \end{array} \right\}$$

One of the most iconic results in this manuscript (Theorem 2) shows that the previous problem, regarded strictly as a multiple optimization problem, has no solutions. To overcome this obstacle we provide a different model:

$$\begin{cases} \max \|Ax\| \\ \|Bx\| \le 1. \end{cases}$$

Here in this article we justify the remodelling of the original maxmin problem and we solve it by making use of supporting vectors. This concept comes from the Theory of Banach Spaces and Operator Theory. Given a matrix *A*, a supporting vector is a unit vector *x* such that *A* attains its norm at *x*, that is, *x* is a solution of the following single optimization problem:

$$\begin{cases} \max \|Ax\| \\ \|x\| = 1. \end{cases}$$

The geometric and topological structure of supporting vectors can be consulted in [7–9]. On the other hand, generalized supporting vectors are defined and studied in [7,8]. The generalized supporting vectors of a finite sequence of matrices $A_1, \dots, A_n$, for the Euclidean norm $\|\cdot\|_2$, are the solutions of

$$\begin{cases} \max \|A\_1 x\|\_2^2 + \dots + \|A\_n x\|\_2^2\\ \|x\|\_2 = 1. \end{cases}$$

This optimization problem clearly generalizes the previous one.

Supporting vectors were originally applied in [10] to truly optimally design a TMS coil, because until that moment TMS coils had only been designed by means of heuristic methods, which were never proved to be convergent. In [10] a three-component TMS coil problem was posed, but only the one-component case was solved. The three-component case was stated and solved by means of the generalized supporting vectors in [8]. In this manuscript, we model a TMS coil with a maxmin problem and solve it exactly with our method.

A second application of supporting vectors was given in [8], where an optimal location situation using Principal Component Analysis (PCA) was solved. In this manuscript, we model a more complex PCA problem as an optimal maxmin geolocation involving statistical variables.

For other perspectives on supporting vectors and generalized supporting vectors, we refer the reader to [9].

### *1.2. Background*

In the first place, we refer the reader to [8] (Preliminaries) for a general review of multiobjective optimization problems and their reformulations to avoid the lack of solutions (generally caused by the existence of many objective functions).

The original maxmin optimization problem has the form

$$M := \begin{cases} \max g(x) \\ \min f(x) \end{cases}$$

where *f* , *g* : *X* → (0, ∞) are real-valued functions and *X* is a nonempty set. Notice that

$$\text{sol}(M) = \arg\max g(x) \cap \arg\min f(x).$$

Many real-life problems can be mathematically modelled as a maxmin. However, this kind of multiobjective optimization problem may have the inconvenience of lacking a solution. If this occurs, then we need to remodel the real-life problem with another mathematical optimization problem that has a solution and still models the real-life problem very accurately.

According to [10] (Theorem 5.1), one can realize that, in case sol(*M*) = ∅, the following optimization problems are good alternatives to keep modeling the real-life problem accurately:

• $$\begin{cases} \max g(x) \\ \min f(x) \end{cases} \xrightarrow{\text{reform}} \begin{cases} \max g(x) \\ f(x) \le 1. \end{cases}$$

• $$\begin{cases} \max g(x) \\ \min f(x) \end{cases} \xrightarrow{\text{reform}} \begin{cases} \max \frac{g(x)}{f(x)} \\ f(x) \neq 0. \end{cases}$$

• $$\begin{cases} \max g(x) \\ \min f(x) \end{cases} \xrightarrow{\text{reform}} \begin{cases} \min \frac{f(x)}{g(x)} \\ g(x) \neq 0. \end{cases}$$

• $$\begin{cases} \max g(x) \\ \min f(x) \end{cases} \xrightarrow{\text{reform}} \begin{cases} \min f(x) \\ g(x) \ge 1. \end{cases}$$


We will prove in the third section that all four previous reformulations are equivalent for the original maxmin problem max ‖*Ax*‖, min ‖*Bx*‖. In the fourth section, we will solve the reformulation max ‖*Ax*‖ subject to ‖*Bx*‖ ≤ 1.

### **2. Characterizations of Operators with Null Kernel**

Kernels will play a fundamental role towards solving the general reformulated maxmin (2) as shown in the next section. This is why we first study the operators with null kernel.

Throughout this section, all monoid actions considered will be left, all rings will be associative, all rings will be unitary rngs, all absolute semi-values and all semi-norms will be non-zero, all modules over rings will be unital, all normed spaces will be real or complex and all algebras will be unitary and complex.

Given a rng *R* and an element *s* ∈ *R*, we will denote by ℓ*d*(*s*) the set of left divisors of *s*, that is,

$$\ell d(s) := \{ r \in R : \exists\, t \in R \setminus \{0\} \text{ with } rt = s \}.$$

Similarly, *rd*(*s*) stands for the set of right divisors of *s*. If *R* is a ring, then the set of its invertibles is usually denoted by U(*R*). Notice that ℓ*d*(1) (*rd*(1)) is precisely the subset of elements of *R* which are right-(left) invertible. As a consequence, U(*R*) = ℓ*d*(1) ∩ *rd*(1). Observe also that ℓ*d*(0) ∩ *rd*(1) = ∅ = *rd*(0) ∩ ℓ*d*(1). In general, however, we have that ℓ*d*(0) ∩ ℓ*d*(1) ≠ ∅ and *rd*(0) ∩ *rd*(1) ≠ ∅. Later on, in Example 1, we will provide an example of a ring where *rd*(0) ∩ *rd*(1) ≠ ∅.

Recall that an element *p* of a monoid is called involutive if *p*<sup>2</sup> = 1. Given a rng *R*, an involution is an additive, antimultiplicative, composition-involutive map ∗ : *R* → *R*. A ∗-rng is a rng endowed with an involution.

The categorical concept of monomorphism will play an important role in this manuscript. A morphism *<sup>f</sup>* ∈ hom<sup>C</sup> (*A*, *<sup>B</sup>*) between objects *<sup>A</sup>* and *<sup>B</sup>* in a category C is said to be a monomorphism provided that *<sup>f</sup>* ◦ *<sup>g</sup>* = *<sup>f</sup>* ◦ *<sup>h</sup>* implies *<sup>g</sup>* = *<sup>h</sup>* for all *<sup>C</sup>* ∈ ob(C) and all *<sup>g</sup>*, *<sup>h</sup>* ∈ hom<sup>C</sup> (*C*, *<sup>A</sup>*). One can check that if *<sup>f</sup>* ∈ hom<sup>C</sup> (*A*, *<sup>B</sup>*) and there exist *<sup>C</sup>*<sup>0</sup> ∈ ob(C) and *<sup>g</sup>*<sup>0</sup> ∈ hom<sup>C</sup> (*B*, *<sup>C</sup>*0) such that *<sup>g</sup>*<sup>0</sup> ◦ *<sup>f</sup>* is a monomorphism, then *<sup>f</sup>* is also a monomorphism. In particular, if *<sup>f</sup>* ∈ hom<sup>C</sup> (*A*, *<sup>B</sup>*) is a section, that is, there exists *<sup>g</sup>* ∈ hom<sup>C</sup> (*B*, *<sup>A</sup>*) such that *<sup>g</sup>* ◦ *<sup>f</sup>* = *IA*, then *<sup>f</sup>* is a monomorphism. As a consequence, the elements of hom<sup>C</sup> (*A*, *<sup>A</sup>*) that have a left inverse are monomorphisms. In some categories, the last condition suffices to characterize monomorphisms. This is the case, for instance, of the category of vector spaces over a division ring.

Recall that CL(*X*,*Y*) denotes the space of continuous linear operators from a topological vector space *X* to another topological vector space *Y*.

**Proposition 1.** *A continuous linear operator T* : *X* → *Y between locally convex Hausdorff topological vector spaces X and Y verifies that* ker(*T*) ≠ {0} *if and only if there exists S* ∈ CL(*Y*, *X*) \ {0} *with T* ◦ *S* = 0*. In particular, if X* = *Y, then* ker(*T*) ≠ {0} *if and only if T* ∈ ℓ*d*(0) *in* CL(*X*)*.*

**Proof.** Let *S* ∈ CL(*Y*, *X*) \ {0} be such that *T* ◦ *S* = 0. Fix any *y* ∈ *Y* \ ker(*S*); then *S*(*y*) ≠ 0 and *T*(*S*(*y*)) = 0, so *S*(*y*) ∈ ker(*T*) \ {0}. Conversely, if ker(*T*) ≠ {0}, then fix $x_0 \in \ker(T)\setminus\{0\}$ and $y_0^* \in Y^*\setminus\{0\}$ (the existence of $y_0^*$ is guaranteed by the Hahn-Banach Theorem on the Hausdorff locally convex topological vector space *Y*). Next, consider

$$\begin{array}{rcl} S: & \mathcal{Y} & \to & X\\ y & \mapsto & S(y) := y\_0^\*(y)x\_0. \end{array}$$

Notice that *S* ∈ CL(*Y*, *X*) \ {0} and *T* ◦ *S* = 0.

**Theorem 1.** *Let T* : *X* → *Y be a continuous linear operator between locally convex Hausdorff topological vector spaces X and Y. Then:*


### **Proof.**


We will finalize this section with a trivial example of a matrix $A \in \mathbb{R}^{3\times 2}$ such that $A \in rd(I) \cap rd(0)$.

**Example 1.** *Consider*

$$A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{pmatrix}.$$

*It is not hard to check that* ker(*A*) = {(0, 0)}*, thus A is left-invertible by Theorem 1(2), and so A* ∈ *rd*(*I*)*. In fact,*

$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$

*Finally,*

$$\begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.$$

#### **3. Remodeling the Original Maxmin Problem max ‖T(x)‖ Subject to min ‖S(x)‖**

*3.1. The Original Maxmin Problem Has No Solutions*

This subsection begins with the following theorem:

**Theorem 2.** *Let T*, *S* : *X* → *Y be nonzero continuous linear operators between Banach spaces X and Y. Then the original maxmin problem*

$$\begin{cases} \max \|T(\mathbf{x})\| \\ \min \|S(\mathbf{x})\| \end{cases} \tag{1}$$

*has trivially no solution.*

**Proof.** Observe that arg min ‖*S*(*x*)‖ = ker(*S*) and arg max ‖*T*(*x*)‖ = ∅ because *T* ≠ 0 (the supremum of ‖*T*(*x*)‖ over *X* is ∞, hence never attained). Then the set of solutions of Problem (1) is

$$\arg\min \|S(x)\| \cap \arg\max \|T(x)\| = \ker(S) \cap \emptyset = \emptyset.$$

As a consequence, Problem (1) must be reformulated or remodeled.

### *3.2. Equivalent Reformulations for the Original Maxmin Problem*

According to the Background section, we begin with the following reformulation:

$$\begin{cases} \max \|T(\mathbf{x})\| \\ \|S(\mathbf{x})\| \le 1 \end{cases} \tag{2}$$

Please note that $\arg\max_{\|S(x)\|\le 1} \|T(x)\|$ is a $\mathbb{K}$-symmetric set, where $\mathbb{K} := \mathbb{R}$ or $\mathbb{C}$; in other words, if $\lambda \in \mathbb{K}$ and $|\lambda| = 1$, then $\lambda x \in \arg\max_{\|S(x)\|\le 1} \|T(x)\|$ for every $x \in \arg\max_{\|S(x)\|\le 1} \|T(x)\|$. The finite dimensional version of the previous reformulation is

$$\begin{cases} \max \|Ax\|\\ \|Bx\| \le 1 \end{cases} \tag{3}$$

where $A, B \in \mathbb{R}^{m \times n}$.

Recall that B(*X*,*Y*) denotes the space of bounded operators from *X* to *Y*.

**Lemma 1.** *Let T*, *S* ∈ B(*X*,*Y*) *where X and Y are Banach spaces. If the general reformulated maxmin problem*

$$\begin{cases} \max \|T(x)\| \\ \|S(x)\| \le 1 \end{cases}$$

*has a solution, then* ker(*S*) ⊆ ker(*T*)*.*

**Proof.** If $\ker(S) \setminus \ker(T) \neq \emptyset$, then it suffices to consider the sequence $(n x_0)_{n\in\mathbb{N}}$ for $x_0 \in \ker(S)\setminus\ker(T)$, since $\|S(nx_0)\| = 0 \le 1$ for all $n \in \mathbb{N}$ and $\|T(nx_0)\| = n\|T(x_0)\| \to \infty$ as $n \to \infty$.

The general maxmin (1) can also be reformulated as

$$\begin{cases} \max \|T(x)\| \\ \min \|S(x)\| \end{cases} \xrightarrow{\text{reform}} \begin{cases} \max \frac{\|T(x)\|}{\|S(x)\|} \\ \|S(x)\| \neq 0. \end{cases}$$

**Lemma 2.** *Let T*, *S* ∈ B(*X*,*Y*) *where X and Y are Banach spaces. If the second general reformulated maxmin problem*

$$\begin{cases} \max \frac{\|T(x)\|}{\|S(x)\|}\\ \|S(x)\| \neq 0 \end{cases}$$

*has a solution, then* ker(*S*) ⊆ ker(*T*)*.*

**Proof.** Suppose there exists *x*<sup>0</sup> ∈ ker(*S*) \ ker(*T*). Then fix an arbitrary *x*<sup>1</sup> ∈ *X* \ ker(*S*). Notice that

$$\frac{||T(n\mathbf{x}\_0 + \mathbf{x}\_1)||}{||S(n\mathbf{x}\_0 + \mathbf{x}\_1)||} \ge \frac{n||T(\mathbf{x}\_0)|| - ||T(\mathbf{x}\_1)||}{||S(\mathbf{x}\_1)||} \to \infty$$

as *n* → ∞.

*Mathematics* **2020**, *8*, 85

The next theorem shows that the previous two reformulations are in fact equivalent.

**Theorem 3.** *Let T*, *S* ∈ B(*X*,*Y*) *where X and Y are Banach spaces. Then*

$$\bigcup_{t>0} t \arg\max_{\|S(x)\|\le 1} \|T(x)\| = \arg\max_{\|S(x)\|\neq 0} \frac{\|T(x)\|}{\|S(x)\|}.$$

**Proof.** Let $x_0 \in \arg\max_{\|S(x)\|\le 1}\|T(x)\|$ and $t > 0$. Fix an arbitrary $y \in X \setminus \ker(S)$. Notice that $x_0 \notin \ker(S)$ in virtue of Theorem 1. Then

$$\|T(x_0)\| \ge \left\| T\left(\frac{y}{\|S(y)\|}\right) \right\|,$$

therefore

$$\frac{\|T(tx_0)\|}{\|S(tx_0)\|} = \frac{\|T(x_0)\|}{\|S(x_0)\|} \ge \|T(x_0)\| \ge \left\| T\left(\frac{y}{\|S(y)\|}\right) \right\| = \frac{\|T(y)\|}{\|S(y)\|}.$$

Conversely, let $x_0 \in \arg\max_{\|S(x)\|\neq 0} \frac{\|T(x)\|}{\|S(x)\|}$. Fix an arbitrary $y \in X$ with $\|S(y)\| \le 1$. Then

$$\left\| T\left(\frac{x_0}{\|S(x_0)\|}\right) \right\| = \frac{\|T(x_0)\|}{\|S(x_0)\|} \ge \frac{\|T(y)\|}{\|S(y)\|} \ge \|T(y)\|,$$

which means that

$$\frac{x_0}{\|S(x_0)\|} \in \arg\max_{\|S(x)\|\le 1} \|T(x)\|$$

and thus

$$x_0 \in \|S(x_0)\| \arg\max_{\|S(x)\|\le 1} \|T(x)\| \subseteq \bigcup_{t>0} t \arg\max_{\|S(x)\|\le 1} \|T(x)\|.$$

The reformulation

$$\begin{cases} \min \frac{\|S(x)\|}{\|T(x)\|}\\ \|T(x)\| \neq 0 \end{cases}$$

is slightly different from the previous two reformulations. In fact, if $\ker(S)\setminus\ker(T) \neq \emptyset$, then $\arg\min_{\|T(x)\|\neq 0} \frac{\|S(x)\|}{\|T(x)\|} = \ker(S)\setminus\ker(T)$. The previous reformulation is equivalent to the following one, as shown in the next theorem:

$$\begin{cases} \min \|S(x)\| \\ \|T(x)\| \ge 1. \end{cases}$$

**Theorem 4.** *Let T*, *S* ∈ B(*X*,*Y*) *where X and Y are Banach spaces. Then*

$$\bigcup_{t>0} t \arg\min_{\|T(x)\|\ge 1} \|S(x)\| = \arg\min_{\|T(x)\|\neq 0} \frac{\|S(x)\|}{\|T(x)\|}.$$

We spare the reader the details of the proof of the previous theorem. Notice that if $\ker(S)\setminus\ker(T) \neq \emptyset$, then $\arg\min_{\|T(x)\|\ge 1}\|S(x)\| = \ker(S)\setminus\{x \in X : \|T(x)\| < 1\}$. However, if $\ker(S) \subseteq \ker(T)$, then all four reformulations are equivalent, as shown in the next theorem, whose proof's details we again spare to the reader.

**Theorem 5.** *Let T*, *S* ∈ B(*X*,*Y*) *where X and Y are Banach spaces. If* ker(*S*) ⊆ ker(*T*)*, then*

$$\arg\max\_{\|S(\mathfrak{x})\|\neq 0} \frac{\|T(\mathfrak{x})\|}{\|S(\mathfrak{x})\|} = \arg\min\_{\|T(\mathfrak{x})\|\neq 0} \frac{\|S(\mathfrak{x})\|}{\|T(\mathfrak{x})\|}.$$

#### **4. Solving the Maxmin Problem max ‖T(x)‖ Subject to ‖S(x)‖ ≤ 1**

We will distinguish between two cases.

### *4.1. First Case: S Is an Isomorphism Over Its Image*

By bearing in mind Theorem 5, we can focus on the first reformulation proposed at the beginning of the previous section:

$$\begin{cases} \max \|T(x)\| \\ \min \|S(x)\| \end{cases} \xrightarrow{\text{reform}} \begin{cases} \max \|T(x)\| \\ \|S(x)\| \le 1. \end{cases}$$

The idea we propose to solve the previous reformulation is to make use of supporting vectors (see [7–10]). Recall that if *R* : *X* → *Y* is a continuous linear operator between Banach spaces, then the set of supporting vectors of *R* is defined by

$$\text{supp}\mathbf{v}(R) := \arg\max\_{||\mathbf{x}|| \le 1} ||R(\mathbf{x})||.$$

The idea of using supporting vectors is that the optimization problem

$$\begin{cases} \max \|\mathcal{R}(\mathbf{x})\| \\ \|\mathbf{x}\| \le 1 \end{cases}$$

whose solutions are by definition the supporting vectors of *R*, can be easily solved theoretically and computationally (see [8]).
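For the Euclidean norm, this supporting-vector problem is solved by the singular value decomposition: the maximizers are the unit right singular vectors associated with the largest singular value. The NumPy sketch below is ours (the paper's computational treatment is the one in [8]):

```python
import numpy as np

rng = np.random.default_rng(1)
R = rng.standard_normal((5, 3))  # an arbitrary example matrix

# suppv(R) = argmax_{||x||_2 <= 1} ||R x||_2: top right singular vector(s) of R
_, s, Vt = np.linalg.svd(R)
x = Vt[0]  # unit vector attaining ||R x||_2 = s[0] = ||R||_2

assert np.isclose(np.linalg.norm(R @ x), s[0])
# no other unit vector does better than the supporting vector
z = rng.standard_normal(3)
z /= np.linalg.norm(z)
assert np.linalg.norm(R @ z) <= s[0] + 1e-12
```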

Our first result towards this direction considers the case where *S* is an isomorphism over its image.

**Theorem 6.** *Let T*, *S* ∈ B(*X*,*Y*)*, where X and Y are Banach spaces. Suppose that S is an isomorphism over its image and <sup>S</sup>*−<sup>1</sup> : *<sup>S</sup>*(*X*) <sup>→</sup> *<sup>X</sup> denotes its inverse. Suppose also that <sup>S</sup>*(*X*) *is complemented in Y, with p* : *Y* → *Y a continuous linear projection onto S*(*X*)*. Then*

$$\mathcal{S}^{-1}\left(\mathcal{S}(X)\cap\arg\max\_{\|y\|\le 1}\left\|\left(T\circ S^{-1}\circ p\right)(y)\right\|\right)\subseteq\arg\max\_{\|S(x)\|\le 1}\|T(x)\|.$$

*If, in addition,* $\|p\| = 1$*, then*

$$\arg\max\_{\|S(\mathbf{x})\|\le 1} \|T(\mathbf{x})\| = \mathcal{S}^{-1}\left(\mathcal{S}(X) \cap \arg\max\_{\|y\|\le 1} \left\| \left(T \circ \mathcal{S}^{-1} \circ p\right)(y) \right\|\right).$$

**Proof.** We will show first that

$$S(X) \cap \arg\max_{\|y\|\le 1} \left\| \left(T \circ S^{-1} \circ p\right)(y) \right\| \subseteq S\left(\arg\max_{\|S(x)\|\le 1} \|T(x)\|\right).$$

Let $y_0 = S(x_0) \in \arg\max_{\|y\|\le 1} \left\|\left(T\circ S^{-1}\circ p\right)(y)\right\|$. We will show that $x_0 \in \arg\max_{\|S(x)\|\le 1} \|T(x)\|$. Indeed, let $x \in X$ with $\|S(x)\| \le 1$. Since $\|S(x_0)\| = \|y_0\| \le 1$, by assumption we obtain

$$\begin{aligned} \|T(x)\| &= \left\|\left(T \circ S^{-1} \circ p\right)(S(x))\right\| \\ &\le \left\|\left(T \circ S^{-1} \circ p\right)(y_0)\right\| \\ &= \left\|\left(T \circ S^{-1} \circ p\right)(S(x_0))\right\| \\ &= \|T(x_0)\|. \end{aligned}$$

Now assume that $\|p\| = 1$. We will show that

$$S\left(\arg\max\_{\|S(x)\|\le 1} \|T(x)\|\right) \subseteq S(X) \cap \arg\max\_{\|y\|\le 1} \left\| \left(T \circ S^{-1} \circ p\right)(y) \right\|.$$

Let $x_0 \in \arg\max_{\|S(x)\|\le 1} \|T(x)\|$; we will show that $S(x_0) \in \arg\max_{\|y\|\le 1} \left\|\left(T\circ S^{-1}\circ p\right)(y)\right\|$. Indeed, let $y \in B_Y$. Observe that

$$\left\| \mathcal{S} \left( \mathcal{S}^{-1} (p(y)) \right) \right\| = \left\| p(y) \right\| \le \left\| y \right\| \le 1$$

so by assumption

$$\begin{aligned} \left\|\left(T \circ S^{-1} \circ p\right)(y)\right\| &= \left\| T\left(S^{-1}(p(y))\right) \right\| \\ &\le \|T(x_0)\| \\ &= \left\| T\left(S^{-1}(p(S(x_0)))\right) \right\| \\ &= \left\|\left(T \circ S^{-1} \circ p\right)(S(x_0))\right\|. \end{aligned}$$

Notice that, in the settings of Theorem 6, $S^{-1}\circ p$ is a left-inverse of *S*; in other words, *S* is a section, as in Theorem 1(2).

Taking into consideration that every closed subspace of a Hilbert space is 1-complemented (see [11,12] to realize that this fact characterizes Hilbert spaces of dimension ≥ 3), we directly obtain the following corollary.

**Corollary 1.** *Let T*, *S* ∈ B(*X*,*Y*)*, where X is a Banach space and Y a Hilbert space. Suppose that S is an isomorphism over its image and let $S^{-1} : S(X) \to X$ be its inverse. Then*

$$\begin{aligned} \arg\max_{\|S(x)\|\le 1} \|T(x)\| &= S^{-1}\left(S(X) \cap \arg\max_{\|y\|\le 1} \left\|\left(T\circ S^{-1}\circ p\right)(y)\right\|\right) \\ &= S^{-1}\left(S(X) \cap \operatorname{suppv}\left(T\circ S^{-1}\circ p\right)\right), \end{aligned}$$

*where p* : *Y* → *Y is the orthogonal projection on S*(*X*)*.*

### *4.2. The Moore–Penrose Inverse*

If $B \in \mathbb{K}^{m\times n}$, then the Moore–Penrose inverse of *B*, denoted by $B^+$, is the unique matrix $B^+ \in \mathbb{K}^{n\times m}$ which verifies the following four conditions: $BB^+B = B$, $B^+BB^+ = B^+$, $(BB^+)^* = BB^+$, and $(B^+B)^* = B^+B$.


If ker(*B*) = {0}, then $B^+$ is a left-inverse of *B*. Even more, $BB^+$ is the orthogonal projection onto the range of *B*; thus we have the following result from Corollary 1.

**Corollary 2.** *Let A*, *<sup>B</sup>* <sup>∈</sup> <sup>R</sup>*m*×*<sup>n</sup> such that* ker(*B*) = {0}*. Then*

$$\begin{aligned} B\left(\arg\max_{\|Bx\|_2 \le 1} \|Ax\|_2\right) &= B\mathbb{R}^n \cap \arg\max_{\|y\|_2 \le 1} \|AB^+y\|_2 \\ &= B\mathbb{R}^n \cap \operatorname{suppv}\left(AB^+\right). \end{aligned}$$

According to the previous Corollary, in its settings, if $y_0 \in \arg\max_{\|y\|_2\le 1} \|AB^+y\|_2$ and there exists $x_0 \in \mathbb{R}^n$ such that $y_0 = Bx_0$, then $x_0 \in \arg\max_{\|Bx\|_2\le 1} \|Ax\|_2$, and $x_0$ can be computed as

$$\mathbf{x}\_0 = B^+ B \mathbf{x}\_0 = B^+ y\_0.$$
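In the finite dimensional Euclidean case this gives a simple recipe: form $C = AB^+$, take a top right singular vector $y_0$ of $C$, check that $y_0 \in B\mathbb{R}^n$, and set $x_0 = B^+y_0$. A NumPy sketch of ours, under the simplifying assumption that $B$ is square and invertible (so $B\mathbb{R}^n = \mathbb{R}^n$ and the membership check is automatic; the matrices are random illustrations):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((3, n))
B = rng.standard_normal((n, n)) + 5.0 * np.eye(n)  # shifted to ensure ker(B) = {0}

Bp = np.linalg.pinv(B)  # Moore-Penrose inverse (= B^{-1} for invertible B)
C = A @ Bp

_, s, Vt = np.linalg.svd(C)
y0 = Vt[0]    # argmax_{||y||_2 <= 1} ||C y||_2, with ||y0||_2 = 1
x0 = Bp @ y0  # solution of max ||A x||_2 subject to ||B x||_2 <= 1

assert np.isclose(np.linalg.norm(B @ x0), 1.0)   # the constraint is active
assert np.isclose(np.linalg.norm(A @ x0), s[0])  # optimal value is ||A B^+||_2
```

When $B$ is rectangular with trivial kernel, the same computation applies, but one must additionally verify that $y_0$ lies in the range of $B$, as the Corollary requires.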

### *4.3. Second Case: S Is Not an Isomorphism Over Its Image*

What happens if *S* is not an isomorphism over its image? Next theorem answers this question.

**Theorem 7.** *Let T*, *S* ∈ B(*X*,*Y*) *where X and Y are Banach spaces. Suppose that* ker(*S*) ⊆ ker(*T*)*. If*

$$\begin{array}{rcl} \pi : & X & \to & X/\ker(S) \\ & x & \mapsto & \pi(x) := x + \ker(S) \end{array}$$

*denotes the quotient map, then*

$$\arg\max_{\|S(x)\|\le 1} \|T(x)\| = \pi^{-1} \left( \arg\max_{\|\overline{S}(\pi(x))\|\le 1} \|\overline{T}(\pi(x))\| \right),$$

*where*

$$\begin{array}{rcl}\overline{T}: & X/\ker(S) & \to & Y \\ & \pi(x) & \mapsto & \overline{T}(\pi(x)) := T(x)\end{array}$$

*and*

$$\begin{array}{rcl} \overline{S}: & X/\ker(S) & \to & Y \\ & \pi(x) & \mapsto & \overline{S}(\pi(x)) := S(x). \end{array}$$

**Proof.** Let $x_0 \in \arg\max_{\|S(x)\|\le 1} \|T(x)\|$. Fix an arbitrary $y \in X$ with $\|\overline{S}(\pi(y))\| \le 1$. Then $\|S(y)\| = \|\overline{S}(\pi(y))\| \le 1$, therefore

$$\|\overline{T}(\pi(x_0))\| = \|T(x_0)\| \ge \|T(y)\| = \|\overline{T}(\pi(y))\|.$$

This shows that $\pi(x_0) \in \arg\max_{\|\overline{S}(\pi(x))\|\le 1} \|\overline{T}(\pi(x))\|$. Conversely, let

$$\pi(x_0) \in \arg\max_{\|\overline{S}(\pi(x))\|\le 1} \|\overline{T}(\pi(x))\|.$$

Fix an arbitrary $y \in X$ with $\|S(y)\| \le 1$. Then $\|\overline{S}(\pi(y))\| = \|S(y)\| \le 1$, therefore

$$\|T(x_0)\| = \|\overline{T}(\pi(x_0))\| \ge \|\overline{T}(\pi(y))\| = \|T(y)\|.$$

This shows that $x_0 \in \arg\max_{\|S(x)\|\le 1} \|T(x)\|$.

Please note that in the settings of Theorem 7, if $S(X)$ is closed in $Y$, then $\overline{S}$ is an isomorphism over its image $S(X)$, and thus in this case Theorem 7 reduces the reformulated maxmin to Theorem 6.

### *4.4. Characterizing When the Finite Dimensional Reformulated Maxmin Has a Solution*

The final part of this section is aimed at characterizing when the finite dimensional reformulated maxmin has a solution.

**Lemma 3.** *Let $S : X \to Y$ be a bounded operator between finite dimensional Banach spaces $X$ and $Y$. If $(x_n)_{n\in\mathbb{N}}$ is a sequence in $\{x \in X : \|S(x)\| \le 1\}$, then there is a sequence $(z_n)_{n\in\mathbb{N}}$ in $\ker(S)$ so that $(x_n + z_n)_{n\in\mathbb{N}}$ is bounded.*

**Proof.** Consider the linear operator

$$\begin{array}{rcl} \overline{S}: & X/\ker(S) & \to & Y \\ & x + \ker(S) & \mapsto & \overline{S}(x + \ker(S)) := S(x). \end{array}$$

Please note that

$$\left\| \overline{S}(x_n + \ker(S)) \right\| = \left\| S(x_n) \right\| \le 1$$

for all $n \in \mathbb{N}$, therefore the sequence $(x_n + \ker(S))_{n\in\mathbb{N}}$ is bounded in $X/\ker(S)$, because $X/\ker(S)$ is finite dimensional and $\overline{S}$ has null kernel, so its inverse is continuous. Finally, choose $z_n \in \ker(S)$ such that $\|x_n + z_n\| < \|x_n + \ker(S)\| + \frac{1}{n}$ for all $n \in \mathbb{N}$.

**Lemma 4.** *Let $A, B \in \mathbb{R}^{m\times n}$. If $\ker(B) \subseteq \ker(A)$, then $x \mapsto \|Ax\|$ is bounded on $\{x \in \mathbb{R}^n : \|Bx\| \le 1\}$ and attains its maximum on that set.*

**Proof.** Let $(x_n)_{n\in\mathbb{N}}$ be a sequence in $\{x \in \mathbb{R}^n : \|Bx\| \le 1\}$. In accordance with Lemma 3, there exists a sequence $(z_n)_{n\in\mathbb{N}}$ in $\ker(B)$ such that $(x_n + z_n)_{n\in\mathbb{N}}$ is bounded. Since $Ax_n = A(x_n + z_n)$ by hypothesis (recall that $\ker(B) \subseteq \ker(A)$), we conclude that $A$ is bounded on $\{x \in \mathbb{R}^n : \|Bx\| \le 1\}$. Finally, let $(x_n)_{n\in\mathbb{N}}$ be a sequence in $\{x \in \mathbb{R}^n : \|Bx\| \le 1\}$ such that $\|Ax_n\| \to \max_{\|Bx\|\le 1} \|Ax\|$ as $n \to \infty$. Please note that $\overline{A}(x_n + \ker(B)) = Ax_n$ for all $n \in \mathbb{N}$, so $\left(\overline{A}(x_n + \ker(B))\right)_{n\in\mathbb{N}}$ is bounded in $\mathbb{R}^m$, and so is $(x_n + \ker(B))_{n\in\mathbb{N}}$ in $\mathbb{R}^n/\ker(B)$. Fix $b_n \in \ker(B)$ such that $\|x_n + b_n\| < \|x_n + \ker(B)\| + \frac{1}{n}$ for all $n \in \mathbb{N}$. This means that $(x_n + b_n)_{n\in\mathbb{N}}$ is a bounded sequence in $\mathbb{R}^n$, so we can extract a subsequence $\left(x_{n_k} + b_{n_k}\right)_{k\in\mathbb{N}}$ convergent to some $x_0 \in \mathbb{R}^n$. At this stage, notice that $B\left(x_{n_k} + b_{n_k}\right) = Bx_{n_k}$ with $\left\|Bx_{n_k}\right\| \le 1$ for all $k \in \mathbb{N}$, and $\left(B\left(x_{n_k} + b_{n_k}\right)\right)_{k\in\mathbb{N}}$ converges to $Bx_0$, so $\|Bx_0\| \le 1$. Note also that, since $\ker(B) \subseteq \ker(A)$, $\left(Ax_{n_k}\right)_{k\in\mathbb{N}}$ converges to $Ax_0$, which implies that

$$x_0 \in \arg\max_{\|Bx\| \le 1} \|Ax\|.$$

**Theorem 8.** *Let $A, B \in \mathbb{R}^{m\times n}$. The reformulated maxmin problem*

$$\left\{ \begin{array}{l} \max \|Ax\| \\ \|Bx\| \le 1 \end{array} \right\}$$

*has a solution if and only if* ker(*B*) ⊆ ker(*A*)*.*

**Proof.** If $\ker(B) \subseteq \ker(A)$, then we just need to call on Lemma 4. Conversely, if $\ker(B) \setminus \ker(A) \neq \emptyset$, then it suffices to consider the sequence $(nx_0)_{n\in\mathbb{N}}$ for $x_0 \in \ker(B) \setminus \ker(A)$, since $\|B(nx_0)\| = 0 \le 1$ for all $n \in \mathbb{N}$ and $\|A(nx_0)\| = n\|A(x_0)\| \to \infty$ as $n \to \infty$.
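The criterion of Theorem 8 is easy to test numerically. The sketch below is our NumPy illustration (similar in spirit to the `existence_sol` routine of Appendix C, but not the authors' code): it checks $\ker(B) \subseteq \ker(A)$ by verifying that $A$ annihilates an orthonormal basis of $\ker(B)$ obtained from an SVD.

```python
import numpy as np

def kernel_included(A, B, tol=1e-10):
    """Return True when ker(B) is contained in ker(A)."""
    _, s, Vt = np.linalg.svd(B)
    r = int(np.sum(s > tol))          # numerical rank of B
    null_B = Vt[r:].T                 # orthonormal basis of ker(B)
    if null_B.size == 0:
        return True                   # ker(B) = {0} is trivially included
    return bool(np.max(np.abs(A @ null_B)) < tol)

A = np.array([[1.0, 1.0, 0.0]])       # ker(A) contains e3
B = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])       # ker(B) = span{e3}
assert kernel_included(A, B)                                # the maxmin has a solution
assert not kernel_included(np.array([[0.0, 0.0, 1.0]]), B)  # here it has none
```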

### *4.5. Matrices on Quotient Spaces*

Consider the maxmin

$$\begin{cases} \max \|T(\mathbf{x})\| \\ \|S(\mathbf{x})\| \le 1 \end{cases}$$

where $X$ and $Y$ are Banach spaces and $T, S \in \mathcal{B}(X,Y)$ with $\ker(S) \subseteq \ker(T)$. Notice that if $(e_i)_{i\in I}$ is a Hamel basis of $X$, then $(e_i + \ker(S))_{i\in I}$ is a generating system of $X/\ker(S)$. By making use of Zorn's Lemma, it can be shown that $(e_i + \ker(S))_{i\in I}$ contains a Hamel basis of $X/\ker(S)$. Observe that a subset $C$ of $X/\ker(S)$ is linearly independent if and only if $\overline{S}(C)$ is a linearly independent subset of $Y$.

In the finite dimensional case, we have

$$\begin{array}{rcl} \overline{B} : & \mathbb{R}^n/\ker(B) & \to & \mathbb{R}^m \\ & x + \ker(B) & \mapsto & \overline{B}(x + \ker(B)) := Bx \end{array}$$

and

$$\begin{array}{rcl}\overline{A}: & \mathbb{R}^n/\ker(B) & \to & \mathbb{R}^m \\ & x + \ker(B) & \mapsto & \overline{A}(x + \ker(B)) := Ax.\end{array}$$

If $\{e_1, \ldots, e_n\}$ denotes the canonical basis of $\mathbb{R}^n$, then $\{e_1 + \ker(B), \ldots, e_n + \ker(B)\}$ is a generating system of $\mathbb{R}^n/\ker(B)$. This generating system contains a basis of $\mathbb{R}^n/\ker(B)$, so let $\{e_{j_1} + \ker(B), \ldots, e_{j_l} + \ker(B)\}$ be a basis of $\mathbb{R}^n/\ker(B)$. Please note that $\overline{A}\left(e_{j_k} + \ker(B)\right) = Ae_{j_k}$ and $\overline{B}\left(e_{j_k} + \ker(B)\right) = Be_{j_k}$ for every $k \in \{1, \ldots, l\}$. Therefore, the matrix associated with the linear map defined by $\overline{B}$ can be obtained from the matrix $B$ by removing the columns corresponding to the indices $\{1, \ldots, n\}\setminus\{j_1, \ldots, j_l\}$; in other words, the matrix associated with $\overline{B}$ is $\left[Be_{j_1}|\cdots|Be_{j_l}\right]$. Similarly, the matrix associated with the linear map defined by $\overline{A}$ is $\left[Ae_{j_1}|\cdots|Ae_{j_l}\right]$. As we mentioned above, recall that a subset $C$ of $\mathbb{R}^n/\ker(B)$ is linearly independent if and only if $\overline{B}(C)$ is a linearly independent subset of $\mathbb{R}^m$. As a consequence, in order to obtain the basis $\{e_{j_1} + \ker(B), \ldots, e_{j_l} + \ker(B)\}$, it suffices to look at the rank of $B$ and consider the columns of $B$ that realize that rank, which automatically gives us the matrix associated with $\overline{B}$, that is, $\left[Be_{j_1}|\cdots|Be_{j_l}\right]$.
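The column-selection step just described can be sketched as a greedy scan keeping each column that enlarges the rank (an illustrative NumPy version of ours; the authors' Appendix C uses a pivoted QR factorization instead):

```python
import numpy as np

def independent_columns(D, tol=1e-10):
    """Indices of a maximal linearly independent set of columns of D (greedy)."""
    indices = []
    for j in range(D.shape[1]):
        if np.linalg.matrix_rank(D[:, indices + [j]], tol) == len(indices) + 1:
            indices.append(j)
    return indices

# Hypothetical example: the second column is twice the first, so it is skipped.
D = np.array([[1.0, 2.0, 0.0],
              [0.0, 0.0, 1.0]])
assert independent_columns(D) == [0, 2]
```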

Finally, let

$$\begin{array}{rcl} \pi : & \mathbb{R}^n & \to & \mathbb{R}^n/\ker(B)\\ & x & \mapsto & \pi(x) := x + \ker(B) \end{array}$$

denote the quotient map. Let $l := \operatorname{rank}(B) = \dim\left(\mathbb{R}^n/\ker(B)\right)$. If $x = (x_1, \ldots, x_l) \in \mathbb{R}^l$, then $\sum_{k=1}^{l} x_k \left(e_{j_k} + \ker(B)\right) \in \mathbb{R}^n/\ker(B)$. The vector $z \in \mathbb{R}^n$ defined by

$$z_p := \left\{ \begin{array}{ll} x_k & p = j_k \\ 0 & p \notin \{j_1, \ldots, j_l\} \end{array} \right.$$

verifies that

$$\pi(z) = \sum_{k=1}^{l} x_k \left( e_{j_k} + \ker(B) \right).$$

To simplify the notation, we can define the map

$$\begin{array}{rcl} \alpha : & \mathbb{R}^{l} & \to & \mathbb{R}^{n} \\ & x & \mapsto & \alpha(x) := z \end{array}$$

where *z* is the vector described right above.
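In coordinates, $\alpha$ simply scatters the $l$ entries of $x$ into an $n$-vector supported on the indices $j_1, \ldots, j_l$; a minimal sketch (our notation, with 0-based NumPy indices):

```python
import numpy as np

def alpha(x, indices, n):
    """Embed x in R^l into R^n, placing x_k at position j_k and zeros elsewhere."""
    z = np.zeros(n)
    z[indices] = x
    return z

# Hypothetical example with n = 5 and (0-based) indices j = (0, 2, 3).
z = alpha(np.array([7.0, 8.0, 9.0]), [0, 2, 3], 5)
assert np.allclose(z, [7.0, 0.0, 8.0, 9.0, 0.0])
```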

### **5. Discussion**

Here we compile all the results from the previous subsections and define the structure of the algorithm that solves the maxmin (3).

Let $A, B \in \mathbb{R}^{m\times n}$ with $\ker(B) \subseteq \ker(A)$. Then

$$\left\{ \begin{array}{l} \max \|Ax\|_2 \\ \min \|Bx\|_2 \end{array} \right\} \xrightarrow{\text{reformulation}} \left\{ \begin{array}{l} \max \|Ax\|_2 \\ \|Bx\|_2 \le 1 \end{array} \right\}$$

Case 1: $\ker(B) = \{0\}$. Here $B^+$ denotes the Moore–Penrose inverse of $B$.

$$\left\{ \begin{array}{l} \max \|Ax\|_2 \\ \|Bx\|_2 \leq 1 \end{array} \right\} \xrightarrow{B^+} \left\{ \begin{array}{l} \max \|AB^{+}y\|_2 \\ \|y\|_2 \leq 1 \end{array} \right\} \xrightarrow{\text{solution}} \left\{ \begin{array}{l} y_0 \in \arg\max_{\|y\|_2\le 1} \|AB^{+}y\|_2 \\ \operatorname{rank}(B) = \operatorname{rank}([B|y_0]) \end{array} \right\} \xrightarrow{\text{final sol.}} x_0 := B^{+}y_0$$

Case 2: $\ker(B) \neq \{0\}$. Here $\overline{B} = \left[ Be_{j_1} | \cdots | Be_{j_l} \right]$, where $\operatorname{rank}(B) = l = \operatorname{rank}\left(\overline{B}\right)$, and $\overline{A} = \left[ Ae_{j_1} | \cdots | Ae_{j_l} \right]$.

$$\left\{ \begin{array}{l} \max \|Ax\|_2 \\ \|Bx\|_2 \le 1 \end{array} \right\} \xrightarrow{\text{case 1}} \left\{ \begin{array}{l} \max \|\overline{A}y\|_2 \\ \|\overline{B}y\|_2 \le 1 \end{array} \right\} \xrightarrow{\text{solution}} y_0 \xrightarrow{\text{final sol.}} x_0 := \alpha(y_0)$$
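Case 1 can be sketched in a few lines of NumPy (an illustrative translation of ours in which the call to `sol_1` is replaced by a direct symmetric eigendecomposition; this is not the authors' MATLAB code from Appendix C):

```python
import numpy as np

def maxmin_case1(A, B, tol=1e-10):
    """Sketch of case 1: solve max ||Ax||_2 s.t. ||Bx||_2 <= 1 when ker(B) = {0}."""
    B_plus = np.linalg.pinv(B)
    M = A @ B_plus                     # M = A B^+
    w, V = np.linalg.eigh(M.T @ M)     # eigenvalues in ascending order
    y0 = V[:, -1]                      # unit eigenvector for lambda_max
    # Keep y0 only if it lies in the range of B: rank(B) = rank([B | y0]).
    if np.linalg.matrix_rank(np.column_stack([B, y0]), tol) != np.linalg.matrix_rank(B, tol):
        return None
    return B_plus @ y0                 # final solution x0 = B^+ y0

# Hypothetical example: with B = I the constraint is ||x||_2 <= 1,
# so the solution is the top right singular direction of A.
A = np.diag([2.0, 1.0])
B = np.eye(2)
x0 = maxmin_case1(A, B)
assert x0 is not None
assert abs(abs(x0[0]) - 1.0) < 1e-8 and abs(x0[1]) < 1e-8
```

For simplicity the sketch returns a single top eigenvector; when $\lambda_{\max}$ has multiplicity greater than one, the whole eigenspace should be scanned as the MATLAB code of Appendix C does.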

If a real-life problem is modeled as a maxmin involving more operators, we proceed as the following remark establishes, in accordance with the preliminaries of this manuscript (reducing the number of objective functions to avoid the lack of solutions):

**Remark 1.** *Let $(T_n)_{n\in\mathbb{N}}$ and $(S_n)_{n\in\mathbb{N}}$ be sequences of continuous linear operators between Banach spaces $X$ and $Y$. The maxmin*

$$\begin{cases} \max \|T_n(x)\|,\ n \in \mathbb{N} \\ \min \|S_n(x)\|,\ n \in \mathbb{N} \end{cases} \tag{4}$$

*can be reformulated as (recall the second typical reformulation)*

$$\begin{cases} \max \sum_{n=1}^{\infty} \|T_n(x)\|^2 \\ \min \sum_{n=1}^{\infty} \|S_n(x)\|^2 \end{cases} \tag{5}$$

*which can be transformed into a regular maxmin as in* (1) *by considering the operators*

$$\begin{array}{rcl} T: & X & \to & \ell_2(Y)\\ & x & \mapsto & T(x) := (T_n(x))_{n \in \mathbb{N}} \end{array}$$

*and*

$$\begin{array}{rcl} S : & X & \to & \ell_2(Y) \\ & x & \mapsto & S(x) := (S_n(x))_{n \in \mathbb{N}} \end{array}$$

*obtaining then*

$$\begin{cases} \max \|T(x)\|^2 \\ \min \|S(x)\|^2 \end{cases}$$

*which is equivalent to*

$$\begin{cases} \max \|T(x)\| \\ \min \|S(x)\| \end{cases}$$

*Observe that for the operators $T$ and $S$ to be well defined it is sufficient that $(\|T_n\|)_{n\in\mathbb{N}}$ and $(\|S_n\|)_{n\in\mathbb{N}}$ be in $\ell_2$.*
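For finitely many matrices, the construction of Remark 1 amounts to stacking the operators into one block matrix, since $\|T(x)\|^2 = \sum_n \|T_n(x)\|^2$. A quick numerical check of this identity (with arbitrary illustrative matrices of ours):

```python
import numpy as np

T1 = np.array([[1.0, 2.0],
               [0.0, 1.0]])
T2 = np.array([[3.0, 0.0]])
x = np.array([1.0, -1.0])

T = np.vstack([T1, T2])   # stacked operator: T(x) = (T1(x), T2(x))

lhs = np.linalg.norm(T @ x) ** 2
rhs = np.linalg.norm(T1 @ x) ** 2 + np.linalg.norm(T2 @ x) ** 2
assert np.isclose(lhs, rhs)           # ||T(x)||^2 = sum ||T_n(x)||^2
```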

### **6. Materials and Methods**

The initial methodology employed in this research is the mathematical modelling of real-life problems. The subsequent methodology is the axiomatic-deductive method framed in first-order mathematical language. Within this framework, we rely on category theory (the main category involved is that of Banach spaces with bounded operators). The final methodology is the implementation of our mathematical results in the MATLAB programming language.

### **7. Conclusions**

We finally enumerate the novelties provided in this work, which serve as conclusions for our research:

1. We prove that the original maxmin problem

$$\begin{cases} \max \|Ax\| \\ \min \|Bx\| \end{cases} \tag{6}$$

has no solution (Theorem 2).

2. We then rewrite (6) as

$$\begin{cases} \max \|Ax\| \\ \|Bx\| \le 1 \end{cases} \tag{7}$$

which still models the real-life problem very accurately and has a solution if and only if ker(*B*) ⊆ ker(*A*) (Theorem 8).


**Author Contributions:** Conceptualization, S.M.-P., F.J.G.-P., C.C.-S. and A.S.-A.; methodology, S.M.-P., F.J.G.-P., C.C.-S. and A.S.-A.; software, S.M.-P., F.J.G.-P., C.C.-S. and A.S.-A.; validation, S.M.-P., F.J.G.-P., C.C.-S. and A.S.-A.; formal analysis, S.M.-P., F.J.G.-P., C.C.-S. and A.S.-A.; investigation, S.M.-P., F.J.G.-P., C.C.-S. and A.S.-A.; resources, S.M.-P., F.J.G.-P., C.C.-S. and A.S.-A.; data curation, S.M.-P., F.J.G.-P., C.C.-S. and A.S.-A.; writing—original draft preparation, S.M.-P., F.J.G.-P., C.C.-S. and A.S.-A.; writing—review and editing, S.M.-P., F.J.G.-P., C.C.-S. and A.S.-A.; visualization, S.M.-P., F.J.G.-P., C.C.-S. and A.S.-A.; supervision, S.M.-P., F.J.G.-P., C.C.-S. and A.S.-A.; project administration, S.M.-P., F.J.G.-P., C.C.-S. and A.S.-A.; funding acquisition, S.M.-P., F.J.G.-P., C.C.-S. and A.S.-A. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Research Grant PGC-101514-B-100 awarded by the Spanish Ministry of Science, Innovation and Universities and partially funded by FEDER.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

### **Appendix A. Applications to Optimal TMS Coils**

### *Appendix A.1. Introduction to TMS Coils*

Transcranial Magnetic Stimulation (TMS) is a non-invasive technique to stimulate the brain. We refer the reader to [8,10,13–23] for a description of the development of TMS coil design as an optimization problem.

An important safety issue in TMS is the minimization of the stimulation of non-target areas. Therefore, the development of TMS as a medical tool would benefit from the design of TMS stimulators capable of inducing a maximum electric field in the region of interest while minimizing the undesired stimulation in other prescribed regions.

### *Appendix A.2. Minimum Stored-Energy TMS Coil*

In this section, in order to illustrate an application of the theoretical model developed in this manuscript, we tackle the design of a minimum stored-energy hemispherical TMS coil of radius 9 cm, constructed to stimulate only one cerebral hemisphere. To this end, the coil must produce an E-field which is maximum in a spherical region of interest (ROI) and minimum in a second region (ROI2). Each volume of interest has a 1 cm radius and is formed by 400 points; ROI is shifted by 5 cm in the positive *z*-direction and by 2 cm in the positive *y*-direction, and ROI2 is shifted by 5 cm in the positive *z*-direction and by 2 cm in the negative *y*-direction, as shown in Figure A1a. Figure A1b shows a simple human head model made of two compartments, scalp and brain, used to evaluate the performance of the designed stimulator.

**Figure A1.** (**a**) Description of the hemispherical surface where the optimal *ψ* must be found, along with the spherical regions of interest ROI and ROI2 where the electric field must be maximized and minimized, respectively. (**b**) Description of the two-compartment scalp-brain model.

By using the formalism presented in [10], this TMS coil design problem can be posed as the following optimization problem:

$$\begin{cases} \max \|E_{x_1} \psi\|_2 \\ \min \|E_{x_2} \psi\|_2 \\ \min \psi^T L \psi \end{cases} \tag{A1}$$

where $\psi$ is the stream function (the optimization variable), $M = 400$ is the number of points in ROI and ROI2, $N = 2122$ the number of mesh nodes, $L \in \mathbb{R}^{N\times N}$ is the inductance matrix, and $E_{x_1} \in \mathbb{R}^{M\times N}$ and $E_{x_2} \in \mathbb{R}^{M\times N}$ are the $E$-field matrices in the prescribed $x$-direction.

**Figure A2.** (**a**) Wirepaths with 18 turns of the TMS coil solution (red wires indicate reversed current flow with respect to blue). (**b**) E-field modulus induced at the surface of the brain by the designed TMS coil.

Figure A2a shows the coil solution of the problem in Equation (A1) computed by using the theoretical model proposed in this manuscript (see Section 5 and Appendix A.3); as expected, the wire arrangement is remarkably concentrated over the region of stimulation.

To evaluate the stimulation of the coil, we resort to the direct BEM [24], which permits the computation of the electric field induced by the coils in conducting systems. As can be seen in Figure A2b, the TMS coil fulfils the initial requirement of stimulating only one hemisphere of the brain (the one where ROI is found), whereas the electric field induced in the other cerebral hemisphere (where ROI2 is found) is minimal.

### *Appendix A.3. Reformulation of Problem* (A1) *to Turn it into a Maxmin*

Now it is time to reformulate the multiobjective optimization problem given in (A1), because it has no solution by virtue of Theorem 2. We transform it into a maxmin problem as in (7) so that we can apply the theoretical model described in Section 5:

$$\begin{cases} \max \left\| E_{x_1} \psi \right\|_2 \\ \min \left\| E_{x_2} \psi \right\|_2 \\ \min \psi^T L \psi \end{cases}$$

Since squaring is a strictly increasing function on $[0, \infty)$, the previous problem is trivially equivalent to the following one:

$$\begin{cases} \max \|E_{x_1} \psi\|_2^2 \\ \min \|E_{x_2} \psi\|_2^2 \\ \min \psi^T L \psi \end{cases} \tag{A2}$$

Next, we apply the Cholesky decomposition to $L$ to obtain $L = C^TC$, so that $\psi^T L \psi = (C\psi)^T(C\psi) = \|C\psi\|_2^2$, and we obtain

$$\begin{cases} \max \left\| E_{x_1} \psi \right\|_2^2 \\ \min \left\| E_{x_2} \psi \right\|_2^2 \\ \min \left\| C \psi \right\|_2^2 \end{cases} \tag{A3}$$

Since $C$ is an invertible square matrix, $\arg\min \|C\psi\|_2^2 = \{0\}$, so the previous multiobjective optimization problem has no solution. Therefore, it must be reformulated. We then call on Remark 1 to obtain:

$$\begin{cases} \max \left\| E_{x_1} \psi \right\|_2^2 \\ \min \left\| E_{x_2} \psi \right\|_2^2 + \|C\psi\|_2^2 \end{cases} \tag{A4}$$

which in essence is

$$\begin{cases} \max \|E_{x_1}\psi\|_2 \\ \min \|D\psi\|_2 \end{cases} \tag{A5}$$

where $D$ is the block matrix obtained by stacking $E_{x_2}$ on top of $C$, that is, $D := \begin{bmatrix} E_{x_2} \\ C \end{bmatrix}$. The matrix $D$ in this specific case has null kernel. In accordance with the previous sections, Problem (A5) is remodeled as

$$\begin{cases} \max \|E_{x_1} \psi\|_2 \\ \|D\psi\|_2 \le 1 \end{cases} \tag{A6}$$

Finally, we can refer to Section 5 to solve the latter problem.
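The whole chain (Cholesky factorization, stacking $E_{x_2}$ with $C$, and the case-1 recipe of Section 5) can be sketched in NumPy on tiny hypothetical stand-ins for $E_{x_1}$, $E_{x_2}$ and $L$ (an illustration of the procedure only, not the TMS data of this appendix):

```python
import numpy as np

# Tiny hypothetical stand-ins for E_x1, E_x2 and the inductance matrix L.
Ex1 = np.array([[2.0, 0.0]])
Ex2 = np.array([[0.0, 1.0]])
Lm = np.diag([2.0, 3.0])              # symmetric positive definite

C = np.linalg.cholesky(Lm).T          # upper factor with Lm = C^T C
assert np.allclose(C.T @ C, Lm)       # hence psi^T L psi = ||C psi||_2^2

D = np.vstack([Ex2, C])               # D := [E_x2; C]; here ker(D) = {0}
D_plus = np.linalg.pinv(D)
M = Ex1 @ D_plus                      # case 1 applied with A = E_x1, B = D
w, V = np.linalg.eigh(M.T @ M)
y0 = V[:, -1]                         # top eigenvector; for this data it lies in the range of D
assert np.linalg.matrix_rank(np.column_stack([D, y0])) == np.linalg.matrix_rank(D)
psi0 = D_plus @ y0                    # candidate stream function
assert np.isclose(np.linalg.norm(D @ psi0), 1.0)   # sits on the constraint
```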

### **Appendix B. Applications to Optimal Geolocation**

Several studies involving optimal geolocation [25], multivariate statistics [26,27], and multiobjective problems [28–30] have been carried out recently. To show another application of maxmin multiobjective problems, we consider in this work the optimal location of a rural tourism inn considering several measured climate variables. Locations with a low maximum temperature $m_1$, radiation $m_2$, and evapotranspiration $m_3$ in summer time and high values in winter time are sites with climatic characteristics desirable for potential visitors. To solve this problem, we chose 11 locations on the Andalusian coastline and 5 inland, near the mountains. We collected the data from the official *Andalusian government* webpage [31], evaluating the mean values of these variables over the years 2013–2019. The referred months of the study were January and July.

**Table A1.** Mean values of maximum temperature (T) in degrees Celsius, radiation (R) in MJ/m², and evapotranspiration (E) in mm/day, measured in January (winter time) and July (summer time) between 2013 and 2018.


To find the optimal location, let us evaluate the site where the variables' mean values are maximum in January and minimum in July. Here we have a typical multiobjective problem with two data matrices that can be formulated as follows:

$$\begin{cases} \max \|Ax\|_2 \\ \min \|Bx\|_2 \\ \min \|x\|_2 \end{cases} \tag{A7}$$

where $A$ and $B$ are real $16 \times 3$ matrices containing the values of the three variables considered ($m_1$ maximum temperature, $m_2$ radiation, and $m_3$ evapotranspiration) in January and July, respectively. To avoid unit effects, we standardized the variables ($\mu = 0$ and $\sigma = 1$). The vector $x$ is the solution of the multiobjective problem.

Since (A7) lacks any solution in view of Theorem 2, we reformulate it as we showed in Remark 1 by the following:

$$\begin{cases} \max \|Ax\|_2 \\ \min \|Dx\|_2 \end{cases} \tag{A8}$$

with matrix $D := \begin{bmatrix} B \\ I_n \end{bmatrix}$, the block matrix obtained by stacking $B$ on top of the identity matrix $I_n$ with $n = 3$. Notice that it also verifies $\ker(D) = \{0\}$. Observe that, according to the previous sections, (A8) can be remodeled into

$$\begin{cases} \max \|Ax\|_2 \\ \|Dx\|_2 \le 1 \end{cases} \tag{A9}$$

and solved accordingly.

**Figure A3.** Geographic distribution of the sites considered in the study. Eleven places are on the coastline of the region and five are inland.

**Figure A4.** Locations considering the $Ax$ and $Bx$ axes. The group named *A* represents the best places for the rural tourism inn, near Costa Tropical (Granada province). Sites in *B* are also on the coastline of the region. Sites in *C*, situated inland, are the worst locations considering the multiobjective problem.

**Figure A5.** (**left**) Sites considering $Ax$ and $Bx$ and the line $y = -x$. The places with high values of $Ax$ (max) and low values of $Bx$ (min) are the best locations for the solution of the multiobjective problem (circled). (**right**) Multiobjective score values obtained for each site by projecting the point onto the line $y = -x$. High values of this score indicate better places to locate the rural tourism inn.

**Figure A6.** Distribution of the three areas described in Figure A4. Areas A and B are on the coastline and C is inland.

The solution of (A9) allows us to draw the sites in a 2D plot with $Ax$ on the $X$ axis and $Bx$ on the $Y$ axis. We observe that better places have high values of $Ax$ and low values of $Bx$. Hence, we can sort the sites in order to achieve the objectives in a similar way as factorial analysis works (two factors, the maximum and the minimum, instead of $m$ variables).

### **Appendix C. Algorithms**

To solve the real problems posed in this work, the algorithms were developed in MATLAB. As pointed out in Section 5, our method relies on finding the generalized supporting vectors. Thus, we refer the reader to [8] (Appendix A.1) for the MATLAB code "sol_1.m" to compute a basis of generalized supporting vectors of a finite number of matrices $A_1, \ldots, A_k$, in other words, a solution of Problem (A10), which was originally posed and solved in [7]:

$$\begin{cases} \max \sum_{i=1}^{k} \|A_i x\|_2^2 \\ \|x\|_2 = 1 \end{cases} \tag{A10}$$

The solution of the previous problem (see [7] (Theorem 3.3)) is given by

$$\max_{\|x\|_2=1} \sum_{i=1}^{k} \|A_i x\|_2^2 = \lambda_{\max} \left(\sum_{i=1}^{k} A_i^T A_i\right)$$

and

$$\arg\max_{\|x\|_2=1} \sum_{i=1}^{k} \|A_i x\|_2^2 = V\left(\lambda_{\max}\left(\sum_{i=1}^{k} A_i^T A_i\right)\right) \cap \mathbb{S}_{\ell_2^n}$$

where $\lambda_{\max}$ denotes the greatest eigenvalue and $V$ denotes the associated eigenvector space. We refer the reader to [8] (Theorem 4.2) for a generalization of [7] (Theorem 3.3) to an infinite number of operators on an infinite dimensional Hilbert space.
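In computational terms, [7] (Theorem 3.3) reduces Problem (A10) to a symmetric eigenvalue problem; the following NumPy sketch (with arbitrary illustrative matrices of ours) checks the identity:

```python
import numpy as np

# Arbitrary illustrative matrices A_1, A_2 (k = 2).
A1 = np.array([[3.0, 0.0]])
A2 = np.array([[0.0, 1.0],
               [1.0, 0.0]])

S = A1.T @ A1 + A2.T @ A2             # sum of A_i^T A_i (symmetric)
w, V = np.linalg.eigh(S)              # eigenvalues in ascending order
lam_max, x = w[-1], V[:, -1]          # lambda_max and a unit eigenvector

assert np.isclose(np.linalg.norm(x), 1.0)
value = sum(np.linalg.norm(Ai @ x) ** 2 for Ai in (A1, A2))
assert np.isclose(value, lam_max)     # the maximum of the objective equals lambda_max
```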

As we pointed out in Theorem 8, the solution of the problem

$$\left\{ \begin{array}{l} \max \|Ax\| \\ \|Bx\| \le 1 \end{array} \right\}$$

exists if and only if ker(*B*) ⊆ ker(*A*). Here is a simple code to check this.

```
function p=existence_sol(A,B)
%%%%
%%%% This function checks the existence of the solution of the
%%%% problem
%%%%
%%%% max ||Ax||
%%%% ||Bx||<=1
%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%
%%%% INPUT:
%%%%
%%%% A, B - the matrices involved in the problem
%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%
%%%% OUTPUT:
%%%%
%%%% p - true if the problem has solution or false on the contrary
%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
KerB = null(B);
dimKerB = size(KerB,2);
KerA = null(A);
dimKerA = size(KerA,2);
if (dimKerB<=dimKerA) & (rank([KerB KerA])==dimKerA)
   p = true;
else
    p = false;
end
end
```
Now we present the code to solve the first case of the previous maxmin problem, that is, the case where ker(*B*) = {0}. We refer the reader to Section 5 on which this code is based.

```
function x = case_1(A, B)
   %%%%
   %%%% This function computes the solution of the problem
   %%%%
   %%%% max ||Ax||_2
   %%%% ||Bx||_2<=1
   %%%%
   %%%% in the case KerB={0}.
   %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
   %%%%
   %%%% INPUT:
   %%%%
   %%%% A, B - the matrices involved in the problem
   %%%%
   %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
   %%%%
   %%%% OUTPUT:
   %%%%
   %%%% x - basis of unit eigenvectors associated to lambda_max
   %%%%
   %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
   %%%%
   KerB = null(B);
   dimKerB = size(KerB,2);
   if (dimKerB ~= 0)
       display('KerB~={0}')
       x=[];
   else % KerB={0}
           M = A*pinv(B); % M = A*B^+
                                    % B^+ is the pseudoinverse matrix
            [lambda_max, y] = sol_1({M}); % where sol_1 is the algorithm in [8] (Appendix A.1)
           [nrows_y ncols_y] = size(y);
           r_B = rank(B);
           counter = 0;
           for i=1:ncols_y
              r = rank([B y(:,i)]);
              if (abs(r_B - r)<1e-12) % Here we check if rank(B) = rank ([B y0]).
                             % A tolerance of 1e-12 is needed in
                             % order to compare these two ranks.
                 counter = counter +1;
                 y0(:,counter) = y(:,i);
              end
           end
           x = pinv(B)*y0; % This is a basis of solutions of our problem
end
```
Next, we can compute the global solution of the maxmin problem by means of the following code. Again, we refer the reader to Section 5 on which this code is based.

*Mathematics* **2020**, *8*, 85

```
function x = sol_2(A, B)
%%%%
%%%% This function computes the solution of the problem
%%%%
%%%% max ||Ax||_2
%%%% ||Bx||_2<=1
%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%
%%%% INPUT:
%%%%
%%%% A, B - the matrices involved in the problem
%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%
%%%% OUTPUT:
%%%%
%%%% x - Supporting vector which is the solution of the problem
%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%
p=existence_sol(A,B);
if p==true
   n = size(B,2);
   KerB = null(B);
   dimKerB = size(KerB,2);
   if (dimKerB == 0) % KerB = {0} This is the case 1
       x = case_1(A,B); % x is the solution of our problem
   else % KerB~={0}
       [Br indices] = colsindep(B); %%% First we extract the
                                    %%% independent columns in B
        Ar = A(:,indices); %%% We extract the same columns of A
            %%% Now, Ker(Br)={0} so this is the case 1 treated above:
       xr = case_1(Ar,Br);
       [nrows_xr,ncols_xr] = size(xr);
       %%% Now we compute the matrix solutions x of the problem
       counter = 0;
       for j = 1:ncols_xr
           for i=1:n
              if ismember(i,indices)==1 %%% i is an index of the ones
                                   %%% defined above
                  counter = counter + 1;
                  x(i,j) = xr(counter,j);
              else
                  x(i,j) = 0;
              end
           end
       end
   end
else
   display('This problem has no solution');
   x=[];
end
end
```

Notice that we use the case_1 function described above and a new function named colsindep. We include the code to implement this new function below.

```
function [Dcolsind, indices]=colsindep(D)
   %%%%
   %%%% This function extracts r = rank(D) independent columns of the
   %%%% matrix D and the indices of the columns in D which are independent
   %%%%
   %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
   %%%%
   %%%% INPUT:
   %%%%
   %%%% D - a matrix with rank r
   %%%%
   %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
   %%%%
   %%%% OUTPUT:
   %%%%
   %%%% Dcolsind - r independent columns in D
   %%%% indices - the indices of independent columns extracted from D
   %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
   r=rank(D); %%% Compute the rank
   [Q R p]=qr(D,0); %%% p is a permutation vector such that D(:,p)=Q*R
   indices=sort(p(1:r)); %%% The first r elements in p are the indices of the
                       %%% columns linearly independent in D
   Dcolsind=D(:,indices);%%% Extract these columns
end
```
The MATLAB code to compute the solution of the TMS coil problem (A6):

$$\begin{cases} \max \|E_{x_1}\psi\|_2 \\ \|D\psi\|_2 \le 1 \end{cases}$$

with the matrix $D := \begin{bmatrix} E_{x_2} \\ C \end{bmatrix}$, the block matrix stacking $E_{x_2}$ on top of $C$, where $C$ is the Cholesky factor of $L$; in this case it verifies that $\ker(D) = \{0\}$. Recall that (A6) comes from (A1):

$$\begin{cases} \max \left\| E_{x_1} \psi \right\|_2 \\ \min \left\| E_{x_2} \psi \right\|_2 \\ \min \psi^T L \psi \end{cases}$$

```
function psi = sol2_psi(Ex1, Ex2, L)
    C = chol(L); % Cholesky decomposition of the matrix L = C'*C
    A = Ex1;
    B = [Ex2; C];
    psi = case_1(A,B); % We apply the algorithm to obtain the solutions
end
```
Finally, we provide the code to compute the solution of the optimal geolocation problem (A9):

$$\begin{cases} \max \|Ax\|_2 \\ \|Dx\|_2 \le 1 \end{cases}$$

with matrix $D := \begin{bmatrix} B \\ I_3 \end{bmatrix}$, the block matrix stacking $B$ on top of the identity matrix $I_3$. Notice that it also verifies that $\ker(D) = \{0\}$ and that $A$ and $B$ are composed of standardized variables. Recall that (A9) comes from (A7):

$$\begin{cases} \max \|Ax\|\_2\\ \min \|Bx\|\_2\\ \min \|x\|\_2 \end{cases}$$

```
function x = sol_2_geoloc(A, B)
   [~, cols] = size(A);  % only the number of columns of A is needed
   D = [B; eye(cols)];   % D := [B; I], with I the cols-by-cols identity
   x = case_1(A,D); % We apply the algorithm to obtain the solutions
end
```
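Both (A6) and (A9) share the form max ‖*Ax*‖₂ subject to ‖*Dx*‖₂ ≤ 1 with ker(*D*) = {0}, which is what the case\_1 routine described above solves. As a rough, self-contained illustration of that underlying problem (not a reproduction of case\_1; the function name `max_ratio_direction` is ours), note that the constraint is active at the optimum, so a maximizer solves the generalized eigenproblem $A^TAx = \lambda D^TDx$ for the largest $\lambda$:

```python
import numpy as np

def max_ratio_direction(A, D):
    """Maximize ||A x||_2 subject to ||D x||_2 <= 1, assuming ker(D) = {0}.
    The constraint is active at the optimum, so x solves the generalized
    eigenproblem A^T A x = lambda D^T D x for the largest lambda."""
    M = np.linalg.solve(D.T @ D, A.T @ A)   # (D^T D)^{-1} A^T A, well defined
    vals, vecs = np.linalg.eig(M)
    x = np.real(vecs[:, np.argmax(np.real(vals))])
    return x / np.linalg.norm(D @ x)        # rescale so that ||D x||_2 = 1

A = np.array([[2.0, 0.0],
              [0.0, 1.0]])
D = np.eye(2)
x = max_ratio_direction(A, D)
# With D = I this is max ||A x|| on the unit sphere: x = +/- e_1, ||A x|| = 2
```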

### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **On** *q***-Quasi-Newton's Method for Unconstrained Multiobjective Optimization Problems**

**Kin Keung Lai 1,\*,†, Shashi Kant Mishra 2,† and Bhagwat Ram 3,†**


Received: 01 April 2020; Accepted: 13 April 2020; Published: 17 April 2020

**Abstract:** A parameter-free optimization technique is applied in Quasi-Newton's method for solving unconstrained multiobjective optimization problems. The components of the Hessian matrix are constructed using the *q*-derivative and are positive definite at every iteration. The step-length is computed by an Armijo-like rule which, owing to the *q*-derivative, helps the iterates escape from a local minimum toward a global minimum. Further, the rate of convergence is proved to be superlinear in a local neighborhood of a minimum point based on the *q*-derivative. Finally, numerical experiments show the better performance of the proposed method.

**Keywords:** multiobjective programming; methods of quasi-Newton type; Pareto optimality; *q*-calculus; rate of convergence

**MSC:** 90C29; 90C53; 58E17; 05A30; 41A25

### **1. Introduction**

Multiobjective optimization is the method of optimizing two or more real-valued objective functions at the same time. There is no ideal minimizer that minimizes all objective functions at once, thus the optimality concept is replaced by the idea of Pareto optimality/efficiency. A point is called Pareto optimal or efficient if there does not exist an alternative point with equivalent or smaller objective function values such that there is a decrease in at least one objective function value. In many applications, such as engineering [1,2], economic theory [3], management science [4], machine learning [5,6], and space exploration [7], several multiobjective optimization techniques are used to make the desired decision. One of the basic approaches is the weighting method [8], where a single objective optimization problem is created by weighting the several objective functions. Another approach is the ε-constraint method [9], where we minimize only the chosen objective function and keep the other objectives as constraints. Some multiobjective algorithms require a lexicographic method, where all objective functions are optimized in their order of priority [10,11]. First, the most preferred function is optimized; then that objective function is transformed into a constraint and a second-priority objective function is optimized. This approach is repeated until the last objective function is optimized. The user needs to choose the sequence of objectives. Two distinct lexicographic optimizations with distinct sequences of objective functions do not produce the same solution. The disadvantages of such approaches are the choice of weights, constraints, and importance of the functions, respectively, which are not known in advance and have to be specified from the beginning. Some other techniques [12–14] that do not need any prior information are developed for solving unconstrained

multiobjective optimization problems (UMOP) with at most linear convergence rate. Other methods like heuristic approaches or evolutionary approaches [15] provide an approximate Pareto front but do not guarantee the convergence property.

Newton's method [16] that solves the single-objective optimization problems is extended for solving (UMOP), which is based on an a priori parameter-free optimization method [17]. In this case, the objective functions are twice continuously differentiable, no other parameter or ordering of the functions is needed, and each objective function is replaced with a quadratic model. The rate of convergence is observed as superlinear, and it is quadratic if the second-order derivative is Lipschitz continuous. Newton's method is also studied under the assumptions of Banach and Hilbert spaces for finding the efficient solutions of (UMOP) [18]. A new type of Quasi-Newton algorithm is developed to solve the nonsmooth multiobjective optimization problems, where the directional derivative of every objective function exists [19].

A necessary condition for finding the vector critical point of (UMOP) is introduced in the steepest descent algorithm [12], where neither weighting factors nor ordering information for the different objective functions are assumed to be known. The relationship between critical points and efficient points is discussed in [17]. If the domain of (UMOP) is a convex set and the objective functions are convex component-wise, then every critical point is a weakly efficient point, and if the objective functions are strictly convex component-wise, then every critical point is an efficient point. The new classes of vector invex and pseudoinvex functions for (UMOP) are also characterized in terms of critical points and (weak) efficient points [20] by using Fritz John (FJ) optimality conditions and Karush–Kuhn–Tucker (KKT) conditions. Our focus is on Newton's direction for a standard scalar optimization problem which is implicitly induced by weighting the several objective functions. The weighting values are a priori unknown, non-negative KKT multipliers; that is, they need not be fixed in advance. Every new point generated by the Newton algorithm [17] initiates such weights in the form of KKT multipliers.

Quantum calculus or *q*-calculus is also called calculus without limits. The *q*-analogues of mathematical objects are recaptured as *q* → 1. The history of quantum calculus can be traced back to Euler (1707–1783), who first introduced the quantum parameter *q* in Newton's infinite series. In recent years, many researchers have shown considerable interest in examining and exploring quantum calculus; therefore, it has emerged as an interdisciplinary subject. Of course, quantum analysis is very useful in numerous fields, such as signal processing [21], operator theory [22], fractional integrals and derivatives [23], integral inequalities [24], variational calculus [25], transform calculus [26], sampling theory [27], etc. Quantum calculus is seen as a bridge between mathematics and physics. To study some recent developments in quantum calculus, interested researchers should refer to [28–31].

The *q*-calculus was first studied in the area of optimization [32], where the *q*-gradient is used in steepest descent method to optimize objective functions. Further, global optimum was searched using *q*-steepest descent method and *q*-conjugate gradient method where a descent scheme is presented using *q*-calculus with the stochastic approach which does not focus on the order of convergence of the scheme [33]. The *q*-calculus is applied in Newton's method to solve unconstrained single objective optimization [34]. Further, this idea is extended to solve (UMOP) within the context of the *q*-calculus [35].

In this paper, we present the *q*-calculus in Quasi-Newton's method for solving (UMOP). We approximate the second *q*-derivative matrices instead of evaluating them. Using *q*-calculus, we show that the convergence rate is superlinear.

The rest of this paper is organized as follows. Section 2 recalls the problem, notation, and preliminaries. Section 3 derives a *q*-Quasi-Newton direction search method solved by (KKT) conditions. Section 4 establishes the algorithms for convergence analysis. The numerical results are given in Section 5 and the conclusion is in the last section.

### **2. Preliminaries**

Denote by $\mathbb{R}$ the set of real numbers, by $\mathbb{N}$ the set of positive integers, and by $\mathbb{R}\_+$ (respectively, $\mathbb{R}\_-$) the set of strictly positive (respectively, strictly negative) real numbers. If a function is continuous on any interval excluding zero, then the function is called continuously *q*-differentiable. For a function $f : \mathbb{R} \to \mathbb{R}$, the *q*-derivative of *f* [36], denoted $D\_{q,x} f$, is given as

$$D\_{q,x} f(x) = \begin{cases} \dfrac{f(x) - f(qx)}{(1 - q)x}, & x \neq 0, \ q \neq 1, \\ f'(x), & x = 0. \end{cases} \tag{1}$$
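Definition (1) is easy to check numerically; a short Python sketch (the helper name `q_derivative` is ours): for $f(x) = x^3$, the *q*-derivative is the *q*-analogue $(1 + q + q^2)x^2$.

```python
def q_derivative(f, x, q):
    """Scalar q-derivative of Eq. (1); classical derivative at x = 0."""
    if x == 0:
        h = 1e-7                              # central-difference fallback at 0
        return (f(h) - f(-h)) / (2 * h)
    return (f(x) - f(q * x)) / ((1 - q) * x)

q, x = 0.5, 2.0
val = q_derivative(lambda t: t ** 3, x, q)
expected = (1 + q + q ** 2) * x ** 2   # D_q x^3 = [3]_q x^2 = 1.75 * 4 = 7.0
```

As *q* → 1 the quotient in (1) tends to the classical derivative, consistent with the "calculus without limits" reading above.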

Suppose $f : \mathbb{R}^n \to \mathbb{R}$, whose partial derivatives exist. For $x \in \mathbb{R}^n$, consider the operator $\varepsilon\_{q,i}$ acting on *f* as

$$(\varepsilon\_{q,i} f)(x) = f(x\_1, x\_2, \dots, qx\_i, x\_{i+1}, \dots, x\_n). \tag{2}$$

The *q*-partial derivative of *f* at *x* with respect to *xi*, indicated by *Dq*,*xi f* , is [23]:

$$D\_{q,x\_i} f(x) = \begin{cases} \dfrac{f(x) - (\varepsilon\_{q,i} f)(x)}{(1 - q)x\_i}, & x\_i \neq 0, \ q \neq 1,\\ \dfrac{\partial f}{\partial x\_i}, & x\_i = 0. \end{cases} \tag{3}$$

We are interested to solve the following (UMOP):

$$\begin{array}{ll}\text{minimize} & F(x) \\ \text{subject to} & x \in X, \end{array} \tag{4}$$

where $X \subseteq \mathbb{R}^n$ is a feasible region and $F : X \to \mathbb{R}^m$. Note that the function $F = (f\_1, f\_2, \dots, f\_m)$ is a vector function whose components are real-valued functions $f\_j : X \to \mathbb{R}$, where $j = 1, \dots, m$. In general, *n* and *m* are independent. For $x, y \in \mathbb{R}^n$, we define the vector inequalities as:

$$\begin{aligned} x &= y \iff x\_i = y\_i \ \forall\, i = 1, \dots, n,\\ x &\geqq y \iff x\_i \ge y\_i \ \forall\, i = 1, \dots, n,\\ x &\ge y \iff x\_i \ge y\_i \ \forall\, i = 1, \dots, n \ \text{and} \ x \ne y,\\ x &> y \iff x\_i > y\_i \ \forall\, i = 1, \dots, n. \end{aligned}$$

A point $x^\* \in X$ is called a Pareto optimal point if there is no point $x \in X$ for which $F(x) \le F(x^\*)$ and $F(x) \ne F(x^\*)$. A point $x^\* \in X$ is called a weakly Pareto optimal point if there is no $x \in X$ for which $F(x) < F(x^\*)$. Similarly, a point $x^\*$ is a local Pareto optimal point if there exists a neighborhood $Y \subseteq X$ of $x^\*$ such that $x^\*$ is Pareto optimal for *F* restricted to *Y*, and a local weak Pareto optimal point if there exists a neighborhood $Y \subseteq X$ of $x^\*$ such that $x^\*$ is weakly Pareto optimal for *F* restricted to *Y*. The matrix $JF(x) \in \mathbb{R}^{m\times n}$ is the Jacobian matrix of *F* at *x*, i.e., the *j*-th row of $JF(x)$ is $\nabla\_q f\_j(x)$ (the *q*-gradient) for all $j = 1, \dots, m$. Let $W f\_j(x)$ be the Hessian matrix of $f\_j$ at *x* for all $j = 1, \dots, m$. Note that every Pareto optimal point is a weakly Pareto optimal point [37]. The directional derivative of $f\_j$ at *x* in the descent direction $d\_q$ is given as:

$$f\_j'(x, d\_q) = \lim\_{\alpha \to 0} \frac{f\_j(x + \alpha d\_q) - f\_j(x)}{\alpha}. \tag{5}$$
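The componentwise order and the Pareto optimality notion defined above can be checked mechanically; a small illustrative Python sketch (the function names are ours, not from the paper):

```python
def dominates(Fx, Fy):
    """True if objective vector Fx Pareto-dominates Fy:
    Fx <= Fy componentwise and Fx != Fy."""
    return all(a <= b for a, b in zip(Fx, Fy)) and any(a < b for a, b in zip(Fx, Fy))

def pareto_front(points):
    """Indices of the non-dominated (Pareto optimal) objective vectors."""
    return [i for i, p in enumerate(points)
            if not any(dominates(other, p)
                       for j, other in enumerate(points) if j != i)]

pts = [(1, 4), (2, 2), (4, 1), (3, 3)]
front = pareto_front(pts)   # (3, 3) is dominated by (2, 2); the rest are not
```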

The necessary condition to obtain a critical point for multiobjective optimization problems is given in [17]. For any $x \in \mathbb{R}^n$, $\|x\|$ denotes the Euclidean norm in $\mathbb{R}^n$. Let $K(x^0, r) = \{x : \|x - x^0\| \le r\}$ be the ball with center $x^0 \in \mathbb{R}^n$ and radius $r \in \mathbb{R}\_+$. The norm of a matrix $A \in \mathbb{R}^{n\times n}$ is $\|A\| = \max\_{x \in \mathbb{R}^n,\, x \neq 0} \frac{\|Ax\|}{\|x\|}$. The following proposition indicates that when $f(x)$ is a linear function, the *q*-gradient coincides with the classical gradient.

**Proposition 1** ([33])**.** *If <sup>f</sup>*(*x*) = *<sup>a</sup>* <sup>+</sup> *<sup>p</sup>Tx, where <sup>a</sup>* <sup>∈</sup> <sup>R</sup> *and <sup>p</sup>* <sup>∈</sup> <sup>R</sup>*n, then for any <sup>x</sup>* <sup>∈</sup> <sup>R</sup>*n, and <sup>q</sup>* <sup>∈</sup> (0, 1)*, we have* ∇*<sup>q</sup> f*(*x*) = ∇ *f*(*x*) = *p.*

All the quasi-Newton methods approximate the Hessian of the function *f* by a matrix $\mathcal{W}^k \in \mathbb{R}^{n\times n}$ and update the new approximation based on the previous one [38]. Line search methods are important methods for (UMOP): a search direction is first computed, and then a step-length is chosen along this direction. The entire process is iterative.

### **3. The** *q***-Quasi-Newton Direction for Multiobjective**

The most well-known quasi-Newton method for a single objective function is the BFGS (Broyden, Fletcher, Goldfarb, and Shanno) method. This is a line search method along a descent direction $d\_q^k$ within the context of the *q*-derivative, given as:

$$d\_q^k = -\left(\mathcal{W}^k\right)^{-1} \nabla\_{\mathcal{\boldsymbol{q}}} f(\mathbf{x}^k),\tag{6}$$

where *<sup>f</sup>* is a continuously *<sup>q</sup>*-differentiable function, and *<sup>W</sup><sup>k</sup>* <sup>∈</sup> <sup>R</sup>*n*×*<sup>n</sup>* is a positive definite matrix that is updated at every iteration. The new point is:

$$
\mathbf{x}^{k+1} = \mathbf{x}^k + \alpha\_k d\_q^k. \tag{7}
$$

In the case of the steepest descent method and Newton's method, $\mathcal{W}^k$ is taken to be the identity matrix and the exact Hessian of *f*, respectively. The quasi-Newton BFGS scheme generates the next $\mathcal{W}^{k+1}$ as

$$\mathcal{W}^{k+1} = \mathcal{W}^k - \frac{\mathcal{W}^k s^k (s^k)^T \mathcal{W}^k}{(s^k)^T \mathcal{W}^k s^k} + \frac{y^k (y^k)^T}{(s^k)^T y^k}, \tag{8}$$

where $s^k = x^{k+1} - x^k = \alpha\_k d\_q^k$ and $y^k = \nabla\_q f(x^{k+1}) - \nabla\_q f(x^k)$. In Newton's method, second-order differentiability of the function is required. While calculating $\mathcal{W}^k$, we use the *q*-derivative, which behaves like the Hessian matrix of $f(x)$. $\mathcal{W}^{k+1}$ may not be positive definite, in which case it can be modified into a positive definite matrix through a symmetric indefinite factorization [39]. The *q*-Quasi-Newton direction $d\_q(x)$ is an optimal solution of the following modified problem [40]:

$$\min\_{d\_q \in \mathbb{R}^n} \max\_{j=1,\ldots,m} \nabla\_q f\_j(x)^T d\_q + \frac{1}{2} d\_q^T \mathcal{W}\_j(x) d\_q, \tag{9}$$

where *Wj*(*x*) is computed as (8). The solution and optimal value of (9) are:

$$\psi(x) = \min\_{d\_q \in \mathbb{R}^n} \max\_{j=1,\ldots,m} \nabla\_q f\_j(x)^T d\_q + \frac{1}{2} d\_q^T \mathcal{W}\_j(x) d\_q, \tag{10}$$

and

$$d\_q(x) = \arg\min\_{d\_q \in \mathbb{R}^n} \max\_{j=1,\ldots,m} \nabla\_q f\_j(x)^T d\_q + \frac{1}{2} d\_q^T \mathcal{W}\_j(x) d\_q. \tag{11}$$
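Update (8) is the standard BFGS formula, so the new matrix satisfies the secant equation $\mathcal{W}^{k+1} s^k = y^k$ and stays symmetric; a quick NumPy check (illustrative, with random test vectors of our choosing):

```python
import numpy as np

def bfgs_update(W, s, y):
    """BFGS update (8): the result satisfies the secant equation W_new @ s = y."""
    Ws = W @ s
    return W - np.outer(Ws, Ws) / (s @ Ws) + np.outer(y, y) / (s @ y)

rng = np.random.default_rng(0)
W = np.eye(3)                           # symmetric positive definite start
s = rng.standard_normal(3)
y = s + 0.1 * rng.standard_normal(3)    # s.T y > 0 keeps W_new positive definite
W_new = bfgs_update(W, s, y)
```

The secant property follows algebraically: the second term cancels $\mathcal{W}^k s^k$ and the third reproduces $y^k$, regardless of the data, as long as the two denominators are nonzero.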

The problem (9) becomes a convex quadratic optimization problem (CQOP) as follows:

$$\begin{aligned} \text{minimize} \quad & h(t, d\_q) = t, \\ \text{subject to} \quad & \nabla\_{q} f\_{j}(x)^{T} d\_{q} + \frac{1}{2} d\_{q}^{T} \mathcal{W}\_{j}(x) d\_{q} - t \le 0, \ j = 1, \ldots, m, \end{aligned} \tag{12}$$
 
$$\text{where} \quad (t, d\_{q}) \in \mathbb{R} \times \mathbb{R}^{n}.$$

The Lagrangian function of (CQOP) is:

$$L((t, d\_q), \lambda) = t + \sum\_{j=1}^{m} \lambda\_j \left(\nabla\_q f\_j(\mathbf{x})^T d\_q + \frac{1}{2} d\_q^T \mathcal{W}\_j(\mathbf{x}) d\_q - t\right). \tag{13}$$

For *λ* = (*λ*1, *λ*2,..., *λm*)*T*, we obtain the following (KKT) conditions [40]:

$$\sum\_{j=1}^{m} \lambda\_j \left(\nabla\_q f\_j(\mathbf{x}) + \mathcal{W}\_j(\mathbf{x}) d\_q\right) = 0,\tag{14}$$

$$
\lambda\_j \ge 0, \ j = 1, \dots, m,\tag{15}
$$

$$\sum\_{j=1}^{m} \lambda\_j = 1,\tag{16}$$

$$\nabla\_q f\_j(x)^T d\_q + \frac{1}{2} d\_q^T \mathcal{W}\_j(x) d\_q \le t, \; j = 1, \ldots, m,\tag{17}$$

$$\lambda\_j \left( \nabla\_q f\_j(x)^T d\_q + \frac{1}{2} d\_q^T \mathcal{W}\_j(x) d\_q - t \right) = 0, \ j = 1, \ldots, m. \tag{18}$$

The solution $(d\_q(x), \psi(x))$ is unique; set $\lambda\_j = \lambda\_j(x)$ for all $j = 1, \dots, m$, with $d\_q = d\_q(x)$ and $t = \psi(x)$ satisfying (14)–(18). From (14), we obtain

$$d\_q(\mathbf{x}) = -\left(\sum\_{j=1}^{m} \lambda\_j(\mathbf{x}) \mathcal{W}\_j(\mathbf{x})\right)^{-1} \sum\_{j=1}^{m} \lambda\_j(\mathbf{x}) \nabla\_q f\_j(\mathbf{x}).\tag{19}$$

This is the so-called *q*-Quasi-Newton direction for solving (UMOP). We now present the basic result relating the stationarity condition at a given point *x* to its *q*-Quasi-Newton direction $d\_q(x)$ and the function *ψ*.

**Proposition 2.** *Let $\psi : X \to \mathbb{R}$ and $d\_q : X \to \mathbb{R}^n$ be given by (10) and (11), respectively, and let $\mathcal{W}\_j(x)$ be positive definite for all $x \in X$. Then, the following statements are equivalent:*

	- *(a) The point x is non-stationary.*
	- *(b) $d\_q(x) \neq 0$.*
	- *(c) $\psi(x) < 0$.*
	- *(d) $d\_q(x)$ is a descent direction.*

**Proof.** Since $d\_q = 0$ is feasible in (10), we have

$$\psi(x) \le \max\_{j=1,\dots,m} \nabla\_q f\_j(x)^T 0 + \frac{1}{2}\, 0^T \mathcal{W}\_j(x)\, 0 = 0,$$

thus $\psi(x) \le 0$. If $\psi(x) < 0$, it means that $JF(x) d\_q(x) \in \mathbb{R}^m\_-$; thus, the given point $x \in \mathbb{R}^n$ is non-stationary. Since $\mathcal{W}\_j(x)$ is positive definite, from (10) and (11) we have

$$\nabla\_q f\_j(x)^T d\_q(x) < \nabla\_q f\_j(x)^T d\_q(x) + \frac{1}{2} d\_q(x)^T \mathcal{W}\_j(x)\, d\_q(x) \le \psi(x) \le 0.$$

Since $\psi(x)$ is the optimal value of (CQOP) and it is negative, the solution of (CQOP) can never be $d\_q(x) = 0$. It remains to show the continuity [41] of *ψ* on a compact set $Y \subset X$. Since $\psi(x) \le 0$, then

$$
\nabla\_q f\_j(x)^T d\_q(x) \leq -\frac{1}{2} d\_q(x)^T \mathcal{W}\_j(x)\, d\_q(x), \tag{20}
$$

for all $j = 1, \dots, m$, and the matrices $\mathcal{W}\_j(x)$, $j = 1, \dots, m$, are positive definite for all $x \in Y$. Thus, the eigenvalues of the Hessian matrices $\mathcal{W}\_j(x)$, $j = 1, \dots, m$, are uniformly bounded away from zero on *Y*, so there exist $R, S \in \mathbb{R}\_+$ such that

$$R = \max\_{x \in Y,\, j=1,\dots,m} \|\nabla\_q f\_j(x)\|, \tag{21}$$

and

$$S = \min\_{x \in Y,\, \|e\|=1,\, j=1,\dots,m} e^T \mathcal{W}\_j(x)\, e. \tag{22}$$

From (20) and using Cauchy–Schwarz inequality, we get

$$\frac{1}{2} S\, \|d\_q(x)\|^2 \le \|\nabla\_q f\_j(x)\|\, \|d\_q(x)\| \le R\, \|d\_q(x)\|,$$

that is,

$$\|d\_q(x)\| \le \frac{2R}{S},$$

for all $x \in Y$; that is, the *q*-Quasi-Newton direction is uniformly bounded on *Y*. We present the family of functions $\{\psi\_{x,j}\}\_{x\in Y,\, j=1,\dots,m}$, where

$$\psi\_{x,j} : Y \to \mathbb{R},$$

and

$$z \mapsto \nabla\_q f\_j(z)^T d\_q(x) + \frac{1}{2} d\_q(x)^T \mathcal{W}\_j(z)\, d\_q(x).$$

We shall prove that this family of functions is uniformly equicontinuous. For a small value $\varepsilon\_z \in \mathbb{R}\_+$ there exists $\delta\_z \in \mathbb{R}\_+$ such that, for $y \in K(z, \delta\_z)$, we have

$$\|\mathcal{W}\_j(y) - \nabla\_q^2 f\_j(z)\| < \frac{\varepsilon\_z}{2},$$

and

$$\|\nabla\_q^2 f\_j(y) - \nabla\_q^2 f\_j(z)\| < \frac{\varepsilon\_z}{2},$$

for all $j = 1, \dots, m$; the second inequality holds because of the *q*-continuity of the Hessian matrices. Since *Y* is a compact space, there exists a finite sub-cover. We have

$$
\psi\_{x,j}(z) = \nabla\_q f\_j(z)^T d\_q(x) + \frac{1}{2} d\_q(x)^T \mathcal{W}\_j(z)\, d\_q(x),
$$

that is

$$\psi\_{x,j}(z) = \nabla\_q f\_j(z)^T d\_q(x) + \frac{1}{2} d\_q(x)^T \nabla\_q^2 f\_j(z)\, d\_q(x) + \frac{1}{2} d\_q(x)^T \left(\mathcal{W}\_j(z) - \nabla\_q^2 f\_j(z)\right) d\_q(x).$$

To show the *q*-continuity of the last term, take $y\_1, y\_2 \in Y$ such that $\|y\_1 - y\_2\| < \delta$ for a small $\delta \in \mathbb{R}\_+$; then

$$\begin{split} \Big\| \frac{1}{2} d\_q(x)^T \big(\mathcal{W}\_j(y\_1) - \nabla\_q^2 f\_j(y\_1)\big) d\_q(x) - \frac{1}{2} d\_q(x)^T \big(\mathcal{W}\_j(y\_2) - \nabla\_q^2 f\_j(y\_2)\big) d\_q(x) \Big\| \\ \leq \frac{1}{2} \|d\_q(x)\|^2 \left( \|\mathcal{W}\_j(y\_1) - \nabla\_q^2 f\_j(y\_1)\| + \|\mathcal{W}\_j(y\_2) - \nabla\_q^2 f\_j(y\_2)\| \right) \\ \leq \frac{1}{2} \|d\_q(x)\|^2 (\varepsilon\_{z\_1} + \varepsilon\_{z\_2}). \end{split}$$

Hence, $\psi\_{x,j}$ is uniformly continuous [40] for all $x \in Y$ and for all $j = 1, \dots, m$: there exists $\delta \in \mathbb{R}\_+$ such that, for all $y, z \in Y$, $\|y - z\| < \delta$ implies $|\psi\_{x,j}(y) - \psi\_{x,j}(z)| < \varepsilon$ for all $x \in Y$. Thus, for $\|y - z\| < \delta$,

$$\begin{aligned} \psi(z) &\leq \max\_{j=1,\dots,m} \nabla\_q f\_j(z)^T d\_q(y) + \frac{1}{2} d\_q(y)^T \mathcal{W}\_j(z)\, d\_q(y) = \phi\_y(z) \\ &\leq \phi\_y(y) + |\phi\_y(z) - \phi\_y(y)| < \psi(y) + \varepsilon. \end{aligned}$$

Thus, $\psi(z) - \psi(y) < \varepsilon$. If we interchange *y* and *z*, then $|\psi(z) - \psi(y)| < \varepsilon$. This proves the continuity of *ψ*.

The following modified lemma is due to [17,42].

**Lemma 1.** *Let $F : \mathbb{R}^n \to \mathbb{R}^m$ be continuously q-differentiable and let $x^\* \in X$ not be a critical point, so that $\nabla\_q F(x)\, d\_q < 0$, where $d\_q \in \mathbb{R}^n$, $\sigma \in (0, 1]$, and $\varepsilon > 0$. Then,*

$$
x + \alpha d\_q(x) \in X \ \text{ and } \ F(x + \alpha d\_q(x)) < F(x) + \alpha\gamma\psi(x),
$$

*for any α* ∈ (0, *σ*] *and γ* ∈ (0,*ε*].

**Proof.** Since $x^\*$ is not a critical point, $\psi(x) < 0$. Let $r > 0$ be such that $K(x, r) \subset X$ and $\alpha \in (0, \sigma]$. Therefore,

$$F(x + \alpha d\_q(x)) - F(x) = \alpha \nabla\_q F(x)\, d\_q(x) + o(\alpha \|d\_q(x)\|).$$

Since $\nabla\_q F(x)\, d\_q(x) < \psi(x)$, for $\alpha \in (0, \sigma]$ we get

$$F(x + \alpha d\_q(x)) - F(x) < \alpha \gamma \psi(x) + \alpha (1 - \gamma) \psi(x) + o(\alpha \|d\_q(x)\|).$$

The last term on the right-hand side of the above inequality is non-positive because $\psi(x) \le \frac{\psi(x^\*)}{2} < 0$, for $\alpha \in [0, \sigma]$.

#### **4. Algorithm and Convergence Analysis**

We first present the following Algorithm 1 [43] to find the gradient of the function using *q*-calculus. The higher-order *q*-derivative of *f* can be found in [44].

**Algorithm 1** *q*-Gradient Algorithm

```
1: Input q ∈ (0, 1), f(x), x ∈ R, z.
2: if x = 0 then
3:    Set g ← lim_{z→0} ( f(z) − f(q·z) ) / ( z − q·z ).
4: else
5:    Set g ← ( f(x) − f(q·x) ) / ( x − q·x ).
6: Print ∇_q f(x) ← g.
```

**Example 1.** Given $f: \mathbb{R}^2 \to \mathbb{R}$ defined by $f(x\_1, x\_2) = x\_2^2 + 3x\_1^3$. Then
$$\nabla\_q f(x) = \begin{bmatrix} 3x\_1^2(1+q+q^2) \\ x\_2(1+q) \end{bmatrix}.$$
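Algorithm 1 and Example 1 can be reproduced numerically; a short Python sketch of the *q*-gradient (the helper names `q_partial` and `q_gradient` are ours), evaluated at $x = (2, 3)$ with $q = 0.5$:

```python
def q_partial(f, x, i, q):
    """q-partial derivative (3) of f at x with respect to x[i]."""
    if x[i] == 0:
        h = 1e-7                       # classical partial derivative at zero
        xp = list(x)
        xp[i] += h
        return (f(xp) - f(x)) / h
    xq = list(x)
    xq[i] = q * x[i]
    return (f(x) - f(xq)) / ((1 - q) * x[i])

def q_gradient(f, x, q):
    """q-gradient: the vector of q-partial derivatives, as in Algorithm 1."""
    return [q_partial(f, x, i, q) for i in range(len(x))]

f = lambda x: x[1] ** 2 + 3 * x[0] ** 3
q = 0.5
g = q_gradient(f, [2.0, 3.0], q)
# Closed form from Example 1: [3*x1^2*(1+q+q^2), x2*(1+q)] = [21.0, 4.5]
```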

We are now prepared to write the unconstrained *q*-Quasi-Newton's Algorithm 2 for solving (UMOP). At each step, we solve the (CQOP) to find the *q*-Quasi-Newton direction. Then, we obtain the step length using the Armijo line search method. In every iteration, the new point and Hessian approximation are generated based on historical values.

### **Algorithm 2** *q*-Quasi-Newton's Algorithm for Unconstrained Multiobjective (*q*-QNUM)

```
 1: Choose q ∈ (0, 1), x^0 ∈ X, a symmetric positive definite matrix W^0 ∈ R^{n×n},
    c ∈ (0, 1), and a small tolerance value ε > 0.
 2: for k = 0, 1, 2, . . . do
 3:    Solve (CQOP).
 4:    Compute d_q^k and ψ^k.
 5:    if ψ^k > −ε then
 6:       Stop.
 7:    else
 8:       Choose α_k as the α ∈ (0, 1] such that x^k + α d_q^k ∈ X and
          F(x^k + α d_q^k) ≤ F(x^k) + c α ψ^k.
 9:       Update x^{k+1} ← x^k + α_k d_q^k.
10:       Update W_j^{k+1}, j = 1, . . . , m, using (8).
```

We now show that every sequence produced by the proposed method converges to a weakly efficient point, no matter how poorly the initial point is guessed. We assume that the method does not stop and produces an infinite sequence of iterates. We now present the modified sufficient conditions for superlinear convergence [17,40] within the context of *q*-calculus.
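Before stating the convergence result, the following minimal Python sketch illustrates Algorithm 2 in the single-objective case $m = 1$, where (CQOP) reduces to the step $d\_q = -\mathcal{W}^{-1}\nabla\_q f$ and $\psi = \nabla\_q f^T d\_q + \frac{1}{2} d\_q^T \mathcal{W} d\_q$. All names and the test function are ours, and the backtracking cap in the Armijo loop is a practical safeguard not part of the algorithm:

```python
import numpy as np

def q_grad(f, x, q):
    """q-gradient (3); falls back to a difference quotient at a zero coordinate."""
    g = np.empty_like(x)
    for i in range(len(x)):
        if x[i] == 0.0:
            e = np.zeros_like(x)
            e[i] = 1e-7
            g[i] = (f(x + e) - f(x)) / 1e-7
        else:
            xq = x.copy()
            xq[i] = q * x[i]
            g[i] = (f(x) - f(xq)) / ((1.0 - q) * x[i])
    return g

def q_quasi_newton(f, x0, q=0.999, c=1e-4, tol=1e-8, max_iter=100):
    """Sketch of Algorithm 2 for a single objective (m = 1)."""
    x = np.asarray(x0, dtype=float)
    W = np.eye(len(x))                    # W^0: symmetric positive definite
    for _ in range(max_iter):
        g = q_grad(f, x, q)
        d = -np.linalg.solve(W, g)        # direction (19) with m = 1
        psi = g @ d + 0.5 * d @ W @ d
        if psi > -tol:                    # stopping rule, step 5
            break
        alpha = 1.0                       # Armijo-like backtracking, step 8
        while alpha > 1e-10 and f(x + alpha * d) > f(x) + c * alpha * psi:
            alpha *= 0.5                  # capped: a safeguard, not in Algorithm 2
        x_new = x + alpha * d
        s, y = x_new - x, q_grad(f, x_new, q) - g
        if s @ y > 1e-12:                 # BFGS update (8) keeps W positive definite
            Ws = W @ s
            W = W - np.outer(Ws, Ws) / (s @ Ws) + np.outer(y, y) / (s @ y)
        x = x_new
    return x

x_star = q_quasi_newton(lambda v: (v[0] - 1.0) ** 2 + 2.0 * (v[1] + 3.0) ** 2,
                        [5.0, 5.0])
# x_star approaches the minimizer (1, -3) of the test function as q -> 1
```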

**Theorem 1.** *Let* $\{x^k\}$ *be a sequence generated by (q-QNUM), and let* $Y \subset X$ *be a convex set. Also, let* $\gamma \in (0, 1)$ *and* $r, a, b, \delta, \varepsilon > 0$, *and*

*(d)* $\frac{\varepsilon}{a} \le 1 - \gamma$,

*(e)* $K(x^{k\_0}, r) \subset X$,

*(f)* $\|d\_q(x^{k\_0})\| < \min\{\delta,\, r(1 - \frac{\varepsilon}{a})\}$.

*Then, for all k* ≥ *k*0*, we have that*

$$1. \quad \left\| x^k - x^{k\_0} \right\| \le \left\| d\_q(x^{k\_0}) \right\| \frac{1 - \left(\frac{\varepsilon}{a}\right)^{k - k\_0}}{1 - \frac{\varepsilon}{a}},$$

*2. α<sup>k</sup>* = 1*,*

*3.* $\|d\_q(x^k)\| \le \|d\_q(x^{k\_0})\| \left(\frac{\varepsilon}{a}\right)^{k-k\_0}$,

$$4. \quad \|d\_q(x^{k+1})\| \le \|d\_q(x^k)\|\, \frac{\varepsilon}{a}.$$

*Then, the sequence* $\{x^k\}$ *converges to a local Pareto point* $x^\* \in \mathbb{R}^n$*, and the convergence rate is superlinear.*

**Proof.** From parts 1 and 3 of this theorem and the triangle inequality,

$$\|x^k + d\_q(x^k) - x^{k\_0}\| \le \frac{1 - \left(\frac{\varepsilon}{a}\right)^{k+1}}{1 - \frac{\varepsilon}{a}} \|d\_q(x^{k\_0})\|.$$

From (d) and (f), it follows that $x^k, x^k + d\_q(x^k) \in K(x^{k\_0}, r)$ and $\|x^k + d\_q(x^k) - x^k\| < \delta$. We also have

$$f\_j(x^k + d\_q(x^k)) \le f\_j(x^k) + \nabla\_q f\_j(x^k)^T d\_q(x^k) + \frac{1}{2} d\_q(x^k)^T \nabla\_q^2 f\_j(x^k)\, d\_q(x^k) + \frac{\varepsilon}{2} \|d\_q(x^k)\|^2,$$

*Mathematics* **2020**, *8*, 616

that is,

$$\begin{split} f\_j(x^k + d\_q(x^k)) &\leq f\_j(x^k) + \psi(x^k) + \frac{\varepsilon}{2} \| d\_q(x^k) \|^{2} \\ &= f\_j(x^k) + \gamma \psi(x^k) + (1 - \gamma) \psi(x^k) + \frac{\varepsilon}{2} \| d\_q(x^k) \|^{2}. \end{split}$$

Since $\psi \leq 0$ and $(1 - \gamma)\psi(x^k) + \frac{\varepsilon}{2} \| d\_q(x^k) \|^{2} \leq (\varepsilon - a(1 - \gamma)) \frac{\| d\_q(x^k) \|^{2}}{2} \leq 0$, we get

$$f\_j(x^k + d\_q(x^k)) \leq f\_j(x^k) + \gamma \psi(x^k),$$

for all $j = 1, \dots, m$. Hence, the Armijo condition holds for $\alpha\_k = 1$, and part 2 of this theorem holds. We now have $x^k, x^{k+1} \in K(x^{k\_0}, r)$ and $\|x^{k+1} - x^k\| < \delta$; thus, $x^{k+1} = x^k + d\_q(x^k)$. We now define $v(x^{k+1}) = \sum\_{j=1}^m \lambda\_j^k \nabla\_q f\_j(x^{k+1})$. Therefore,

$$|\psi(x^{k+1})| \le \frac{1}{2a} \|v(x^{k+1})\|^2.$$

We now estimate $v(x^{k+1})$. For $x \in X$, we define

$$G^k(x) := \sum\_{j=1}^m \lambda\_j^k f\_j(x),$$

and

$$H^k = \sum\_{j=1}^m \lambda\_j^k W\_j(\mathbf{x}^k),$$

where $\lambda_j^k \ge 0$, for all $j = 1, \ldots, m$, are the KKT multipliers. We obtain the following:

$$\nabla_q G^k(x) = \sum_{j=1}^m \lambda_j^k \nabla_q f_j(x)$$

and

$$
\nabla\_q^2 G^k(x) = \sum\_{j=1}^m \lambda\_j^k \nabla\_q^2 f\_j(x).
$$

Then, $v(x^{k+1}) = \nabla_q G^k(x^{k+1})$. We get

$$d_q(x^k) = -(H^k)^{-1}\nabla_q G^k(x^k).$$

From assumptions (b) and (c) of this theorem,

$$\|\nabla\_q^2 G^k(y) - \nabla\_q^2 G^k(x^k)\| < \frac{\varepsilon}{2},$$

$$\left\|\left(H^k - \nabla_q^2 G^k(x^k)\right)(y - x^k)\right\| < \frac{\varepsilon}{2}\|y - x^k\|$$

hold for all $x, y \in Y$ with $\|y - x\| < \delta$ and $k \ge k_0$. We have

$$\left\|\nabla_q G^k(x^k + d_q(x^k)) - \left(\nabla_q G^k(x^k) + H^k d_q(x^k)\right)\right\| < \varepsilon\|d_q(x^k)\|.$$

Since $\nabla_q G^k(x^k) + H^k d_q(x^k) = 0$, then

$$\|v(x^{k+1})\| = \|\nabla_q G^k(x^{k+1})\| < \varepsilon\|d_q(x^k)\|$$


and

$$|\psi(x^{k+1})| \le \frac{1}{2a}\|v(x^{k+1})\|^2 < \frac{\varepsilon^2}{2a}\|d_q(x^k)\|^2.$$

We have

$$\frac{a}{2}\|d_q(x^{k+1})\|^2 < \frac{\varepsilon^2}{2a}\|d_q(x^k)\|^2.$$

Thus,

$$\|d_q(x^{k+1})\| \le \frac{\varepsilon}{a}\|d_q(x^k)\|.$$

Thus, part 4 is proved. We finally prove the superlinear convergence of $\{x^k\}$. First, we define

$$r^k = \|d_q(x^{k_0})\|\,\frac{\left(\frac{\varepsilon}{a}\right)^{k - k_0}}{1 - \frac{\varepsilon}{a}}$$

and

$$\delta^k = \|d_q(x^{k_0})\|\left(\frac{\varepsilon}{a}\right)^{k - k_0}.$$

From the triangle inequality, assumptions (e) and (f), and part 1, we have $K(x^k, r^k) \subset K(x^{k_0}, r) \subset V$. Choose any $\tau \in \mathbb{R}_+$ and define

$$
\bar{\varepsilon} = \min \{ a \frac{\tau}{1 + 2\tau}, \varepsilon \}.
$$

For $k \ge k_0$, the inequalities

$$\|\nabla_q^2 f_j(y) - \nabla_q^2 f_j(x)\| < \frac{\bar{\varepsilon}}{2}$$

for all $x, y \in K(x^k, r^k)$ with $\|y - x\| < \delta^k$, and

$$\left\|\left(W_j(x^l) - \nabla_q^2 f_j(x^l)\right)(y - x^l)\right\| \le \frac{\bar{\varepsilon}}{2}\|y - x^l\|$$

for all $y \in K(x^k, r^k)$ and $l \ge k$ hold for each $j = 1, \ldots, m$. Assumptions (a)–(f) are thus satisfied with $\bar{\varepsilon}$, $r^k$, $\delta^k$, and $x^k$ in place of $\varepsilon$, $r$, $\delta$, and $x^{k_0}$, respectively. We have

$$\|x^l - x^k\| \le \|d_q(x^k)\|\,\frac{1 - \left(\frac{\bar{\varepsilon}}{a}\right)^{l-k}}{1 - \frac{\bar{\varepsilon}}{a}}.$$

Letting $l \to \infty$, we get $\|x^* - x^k\| \le \|d_q(x^k)\|\,\frac{1}{1 - \frac{\bar{\varepsilon}}{a}}$. Using the last inequality and part 4, we have

$$\|x^* - x^{k+1}\| \le \|d_q(x^{k+1})\|\,\frac{1}{1 - \frac{\bar{\varepsilon}}{a}} \le \|d_q(x^k)\|\,\frac{\frac{\bar{\varepsilon}}{a}}{1 - \frac{\bar{\varepsilon}}{a}}.$$

From the above and the triangle inequality, we have

$$\|x^* - x^k\| \ge \|x^{k+1} - x^k\| - \|x^* - x^{k+1}\|,$$

that is,

$$\|x^* - x^k\| \ge \|d_q(x^k)\| - \|d_q(x^k)\|\,\frac{\frac{\bar{\varepsilon}}{a}}{1 - \frac{\bar{\varepsilon}}{a}} = \|d_q(x^k)\|\,\frac{1 - 2\frac{\bar{\varepsilon}}{a}}{1 - \frac{\bar{\varepsilon}}{a}}.\tag{23}$$

Since $1 - 2\frac{\bar{\varepsilon}}{a} > 0$ and, by the choice of $\bar{\varepsilon}$, $\frac{\bar{\varepsilon}}{a - 2\bar{\varepsilon}} \le \tau$, combining (23) with the preceding bound, we get

$$\|x^* - x^{k+1}\| \le \tau\|x^* - x^k\|,$$

where $\tau \in \mathbb{R}_+$ was chosen arbitrarily. Thus, the sequence $\{x^k\}$ converges superlinearly to $x^*$.
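For completeness, the last step can be written out explicitly; this is merely a restatement of the two preceding bounds under the choice of $\bar{\varepsilon}$:

$$\frac{\|x^* - x^{k+1}\|}{\|x^* - x^k\|} \le \frac{\frac{\bar{\varepsilon}}{a}\big/\left(1 - \frac{\bar{\varepsilon}}{a}\right)}{\left(1 - 2\frac{\bar{\varepsilon}}{a}\right)\big/\left(1 - \frac{\bar{\varepsilon}}{a}\right)} = \frac{\bar{\varepsilon}}{a - 2\bar{\varepsilon}} \le \tau,$$

since $\bar{\varepsilon} \le a\tau/(1 + 2\tau)$ gives $\bar{\varepsilon}(1 + 2\tau) \le a\tau$, i.e., $\bar{\varepsilon} \le \tau(a - 2\bar{\varepsilon})$.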

### **5. Numerical Results**

The proposed algorithm (*q*-QNUM), i.e., Algorithm 2 presented in Section 4, is implemented in MATLAB (2017a) and tested on test problems known from the literature. All tests were run under the same conditions. Box constraints of the form *lb* ≤ *x* ≤ *ub* are used for each test problem. These constraints are imposed on the direction search problem (CQOP) so that the newly generated point always lies in the same box, that is, $lb \le x + d_q \le ub$ holds. We use the stopping criterion at $x^k$: $\psi(x^k) > -\epsilon$, where $\epsilon \in \mathbb{R}_+$. All test problems given in Table 1 are solved 100 times. The starting points are randomly chosen from a uniform distribution between *lb* and *ub*. The first column of Table 1 gives the name of the test problem; we use the abbreviation of the authors' names and the number of the problem in the corresponding paper. The second column indicates the source paper. The third column gives the lower and upper bounds. We compare the results of (*q*-QNUM) with (QNMO) of [40] in terms of the number of iterations (*iter*), the number of objective function evaluations (*obj*), and the number of gradient evaluations (*grad*). From Table 1, we can conclude that our algorithm shows better performance.
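As an illustration of this setup, the following sketch (with hypothetical helper names, not the authors' MATLAB code) shows the stopping test $\psi(x^k) > -\epsilon$ and the box-feasibility check for a candidate step:

```python
import numpy as np

def within_box(x, lb, ub):
    """Componentwise check of the box constraint lb <= x <= ub."""
    return bool(np.all(lb <= x) and np.all(x <= ub))

def should_stop(psi_xk, eps=1e-6):
    """Stopping criterion of Section 5: stop once psi(x^k) > -eps.

    psi(x^k) <= 0 always holds, and psi(x^k) = 0 characterizes a
    critical point, so a value close to zero signals approximate
    stationarity."""
    return psi_xk > -eps

# A newly generated point x + d must stay inside the same box.
lb, ub = np.array([-3.0, -3.0]), np.array([10.0, 10.0])
x, d = np.array([0.0, 0.0]), np.array([0.5, -0.5])
print(within_box(x + d, lb, ub))  # True: the step stays feasible
print(should_stop(-1e-9))         # True: psi is close enough to 0
```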

**Example 2.** *Find the approximate Pareto front using (q-QNUM) and (QNMO) for the given (UMOP) [45]:*

$$\begin{aligned} \text{Minimize } f_1(x_1, x_2) &= (x_1 - 1)^2 + (x_1 - x_2)^2, \\ \text{Minimize } f_2(x_1, x_2) &= (x_2 - 3)^2 + (x_1 - x_2)^2, \end{aligned}$$

*where* −3 ≤ *x*1, *x*<sup>2</sup> ≤ 10*.*
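Because both objectives are convex quadratics, an approximate Pareto front for this example can also be traced by weighted-sum scalarization, each scalarized problem reducing to a 2×2 linear system. This is an independent cross-check of the example, not the paper's (*q*-QNUM) algorithm:

```python
import numpy as np

def f(x):
    """The two objectives of the (UMOP) above."""
    x1, x2 = x
    return np.array([(x1 - 1)**2 + (x1 - x2)**2,
                     (x2 - 3)**2 + (x1 - x2)**2])

def weighted_sum_min(w):
    """Minimizer of w*f1 + (1-w)*f2.  Setting the gradient of this
    convex quadratic to zero yields the 2x2 linear system below."""
    A = np.array([[2*w + 2, -2.0], [-2.0, 4 - 2*w]])
    b = np.array([2.0*w, 6.0*(1 - w)])
    return np.linalg.solve(A, b)

# Sweep the weight to trace an approximate Pareto front.
front = [f(weighted_sum_min(w)) for w in np.linspace(0.01, 0.99, 50)]
print(weighted_sum_min(1.0))  # [1. 1.], the minimizer of f1
print(weighted_sum_min(0.0))  # [3. 3.], the minimizer of f2
```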

The Pareto points generated by (*q*-QNUM) with Algorithm 1 and by (QNMO) are shown in Figure 1. One can observe that *iter* = 200 iterations for (*q*-QNUM) and *iter* = 525 for (QNMO) were required to generate the approximate Pareto front of the above (UMOP).

**Figure 1.** Approximate Pareto front of Example 2.


**Table 1.** Numerical Results of Test Problems.

### **6. Conclusions**

The *q*-quasi-Newton method converges superlinearly to the solution of (UMOP) if all objective functions are strongly convex in the context of the *q*-derivative. In a neighborhood of this solution, the algorithm accepts the full Armijo step length $\alpha^k = 1$. The numerical results indicate that the proposed algorithm performs better than the existing quasi-Newton method (QNMO).

**Author Contributions:** K.K.L. gave reasonable suggestions for this manuscript; S.K.M. gave the research direction of this paper; B.R. revised and completed this manuscript. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was supported by the Science and Engineering Research Board (Grant No. DST-SERB-MTR-2018/000121) and the University Grants Commission (IN) (Grant No. UGC-2015-UTT–59235).

**Acknowledgments:** The authors are grateful to the anonymous reviewers and the editor for the valuable comments and suggestions to improve the presentation of this paper.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **Convergence Analysis and Complex Geometry of an Efficient Derivative-Free Iterative Method**

**Deepak Kumar 1,2\*,†, Janak Raj Sharma 1,\*,† and Lorentz Jäntschi 3,4,\***


Received: 12 September 2019; Accepted: 29 September 2019; Published: 2 October 2019

**Abstract:** To locate a locally-unique solution of a nonlinear equation, the local convergence analysis of a derivative-free fifth order method is studied in Banach space. This approach provides a radius of convergence and error bounds under hypotheses based on the first Fréchet-derivative only. Such estimates were not provided in earlier procedures, which employ Taylor expansions of higher derivatives that may not exist or may be expensive to compute. The convergence domain of the method is also shown by a visual approach, namely basins of attraction. Theoretical results are endorsed via numerical experiments that show cases where the earlier results are not applicable.

**Keywords:** local convergence; nonlinear equations; Banach space; Fréchet-derivative

**MSC:** 49M15; 47H17; 65H10

### **1. Introduction**

Banach [1] or complete normed vector spaces constantly bring new solving strategies for real problems in domains dealing with numerical methods (see for example [2–5]). In this context, development of new methods [6] and their convergence analysis [7] are of growing interest.

Let *B*1, *B*<sup>2</sup> be Banach spaces and Ω ⊆ *B*<sup>1</sup> be closed and convex. In this study, we locate a solution *x*∗ of the nonlinear equation

$$F(\mathbf{x}) = \mathbf{0},\tag{1}$$

where *F* : Ω ⊆ *B*<sup>1</sup> → *B*<sup>2</sup> is a Fréchet-differentiable operator. In the computational sciences, many problems can be transformed into the form (1); for example, see References [8–11]. The solution of such nonlinear equations is rarely attainable in closed form, so most methods for solving them are iterative. An important issue for an iterative method is its domain of convergence, since it indicates how difficult it is to obtain suitable initial points. This domain is generally small; thus, it is desirable to enlarge the domain of convergence without any additional hypotheses. Another important problem related to the convergence analysis of an iterative method is to find precise error estimates on $\|x_{n+1} - x_n\|$ or $\|x_n - x^*\|$.

A good reference for the general principles of functional analysis is [12]. Recurrence relations for rational cubic methods are revised in [13] (for the Halley method) and in [14] (for the Chebyshev method). A new iterative modification of Newton's method for solving nonlinear scalar equations was proposed in [15], while a modification of a variant of it with accelerated third order convergence was proposed in [16]. An ample collection of iterative methods is found in [9]. Recurrence relations for Chebyshev-type methods accelerating the classical Newton iteration were introduced in [17], and recurrence relations in a third-order family of Newton-like methods for approximating a solution of a nonlinear equation in Banach spaces were studied in [18]. In the context of the Kantorovich assumptions for the semilocal convergence of a Chebyshev method, the convergence conditions were significantly reduced in [19]. The computational efficiency and the domain of uniqueness of the solution were readdressed in [20]. The point of attraction of two fourth-order iterative Newton-type methods was studied in [21], while the convergence ball and error analysis of Newton-type methods with cubic convergence were studied in [22,23]. Weaker conditions for the convergence of Newton's method are given in [24], with further analytical improvements in two particular cases as well as numerical analysis in the general case given in [25]; the local convergence of three-step Newton–Gauss methods in Banach spaces was recently analyzed in [26]. Recently, researchers have also constructed some higher order methods; see, for example, [27–31] and the references cited therein.

One of the basic methods for approximating a simple solution *x*∗ of Equation (1) is the quadratically convergent derivative-free Traub–Steffensen's method, which is given by

$$\mathbf{x}\_{n+1} = \mathbf{M}\_{2,1}(\mathbf{x}\_n) = \mathbf{x}\_n - [u\_n, \mathbf{x}\_n; F]^{-1} F(\mathbf{x}\_n), \text{ for each } n = 0, 1, 2, \dots, \tag{2}$$

where $u_n = x_n + \beta F(x_n)$ and $\beta \in \mathbb{R} \setminus \{0\}$. Based on (2), Sharma et al. [32] have recently proposed a derivative-free method with fifth order convergence for approximating a solution of $F(x) = 0$ using the weight-function scheme defined for each $n = 0, 1, \ldots$ by

$$y_n = M_{2,1}(x_n),$$

$$z_n = y_n - [u_n, x_n; F]^{-1}F(y_n),$$

$$x_{n+1} = z_n - H(x_n)[u_n, x_n; F]^{-1}F(z_n),\tag{3}$$

wherein $H(x_n) = 2I - [u_n, x_n; F]^{-1}[z_n, y_n; F]$. The computational efficiency of this method was discussed in detail and its performance was favorably compared with existing methods in [32]. To prove the local convergence order, the authors used Taylor expansions with hypotheses based on Fréchet-derivatives up to fifth order. It is quite clear that these hypotheses restrict the applicability of the method to problems involving functions that are at least five times Fréchet-differentiable. For example, let us define a function $g$ on $\Omega = \left[-\frac{1}{2}, \frac{5}{2}\right]$ by

$$g(t) = \begin{cases} t^3 \ln t^2 + t^5 - t^4, & t \neq 0, \\ 0, & t = 0. \end{cases} \tag{4}$$

We have that

$$\begin{aligned} g'(t) &= 3t^2 \ln t^2 + 5t^4 - 4t^3 + 2t^2, \\ g''(t) &= 6t \ln t^2 + 20t^3 - 12t^2 + 10t, \end{aligned}$$

and

$$g'''(t) = 6\ln t^2 + 60t^2 - 24t + 22.$$

Then, $g'''$ is unbounded on Ω. Notice also that the earlier proofs of convergence use Taylor expansions involving these higher derivatives.
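A quick numerical check confirms the blow-up: the $6\ln t^2$ term dominates near $t = 0$ (an illustrative sketch, not part of the paper):

```python
import math

def g3(t):
    """Third derivative of g(t) = t^3 ln t^2 + t^5 - t^4 (t != 0):
    g'''(t) = 6 ln t^2 + 60 t^2 - 24 t + 22."""
    return 6.0 * math.log(t * t) + 60.0 * t * t - 24.0 * t + 22.0

# The logarithmic term makes g''' diverge to -infinity as t -> 0,
# so g''' is unbounded on any interval containing 0.
for t in (1e-1, 1e-3, 1e-6):
    print(t, g3(t))
```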

In this work, we study the local convergence of method (3) using hypotheses on the first Fréchet-derivative only, taking advantage of its Lipschitz continuity. Moreover, our results are presented in the more general setting of a Banach space. We summarize the contents of the paper. In Section 2, the local convergence analysis of method (3) is presented. In Section 3, numerical examples are performed to verify the theoretical results. Basins of attraction showing the convergence domain are drawn in Section 4. Concluding remarks are reported in Section 5.
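For intuition, here is a scalar (ℝ → ℝ) sketch of scheme (3): divided differences replace all derivative evaluations, and the single difference $[u_n, x_n; F]$ is reused across the three sub-steps. This is an illustrative transcription, not the general Banach-space operator version:

```python
import math

def fifth_order(F, x0, beta=0.01, tol=1e-12, max_iter=50):
    """Scalar version of the derivative-free scheme (3):
    y = x - F(x)/[u,x;F], z = y - F(y)/[u,x;F],
    x_next = z - H * F(z)/[u,x;F], with u = x + beta*F(x) and
    H = 2 - [z,y;F]/[u,x;F] (scalar form of 2I - [u,x;F]^{-1}[z,y;F])."""
    x = x0
    for _ in range(max_iter):
        Fx = F(x)
        if abs(Fx) < tol:
            return x
        u = x + beta * Fx
        dd1 = (F(u) - Fx) / (u - x)   # divided difference [u, x; F]
        y = x - Fx / dd1              # Traub-Steffensen step (2)
        Fy = F(y)
        if abs(Fy) < tol:
            return y
        z = y - Fy / dd1
        Fz = F(z)
        if abs(Fz) < tol:
            return z
        dd2 = (Fz - Fy) / (z - y)     # divided difference [z, y; F]
        H = 2.0 - dd2 / dd1
        x = z - H * Fz / dd1
    return x

# Root of e^t - 1 = 0 (x* = 0), starting from x0 = 0.5.
root = fifth_order(lambda t: math.exp(t) - 1.0, 0.5)
print(abs(root) < 1e-10)  # True
```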

*Mathematics* **2019**, *7*, 919

### **2. Local Convergence Analysis**

We study the local convergence of method (3). Let *p* ≥ 0 and *M* ≥ 0 be parameters and *<sup>w</sup>*<sup>0</sup> : [0, <sup>+</sup>∞)<sup>2</sup> <sup>→</sup> [0, <sup>+</sup>∞) be a continuous and nondecreasing function with *<sup>w</sup>*0(0, 0) = 0. Let the parameter *r* be defined by

$$r = \sup\left\{ t \ge 0 \; ; \; w\_0(pt, t) < 1 \right\}.\tag{5}$$

Consider the functions *<sup>w</sup>*<sup>1</sup> : [0,*r*)<sup>2</sup> <sup>→</sup> [0, <sup>+</sup>∞) and *<sup>v</sup>*<sup>0</sup> : [0,*r*) <sup>→</sup> [0, <sup>+</sup>∞) as continuous and nondecreasing. Furthermore, define functions *g*<sup>1</sup> and *h*<sup>1</sup> on the interval [0,*r*) as

$$\mathbf{g}\_1(t) = \frac{w\_1(\beta v\_0(t)t, t)}{1 - w\_0(pt, t)}$$

and

$$h_1(t) = g_1(t) - 1.$$

Suppose that

$$w\_1(0,0) < 1.\tag{6}$$

From (6), we obtain that

$$h\_1(0) = \frac{w\_1(0,0)}{1 - w\_0(0,0)} - 1 < 0$$

and, by (5), *h*1(*t*) → +∞ as *t* → *r*−. Then, it follows from the intermediate value theorem [33] that equation *h*1(*t*) = 0 has solutions in (0,*r*). Denote by *r*<sup>1</sup> the smallest such solution.

Furthermore, define functions *g*<sup>2</sup> and *h*<sup>2</sup> on the interval [0,*r*1) by

$$\mathcal{g}\_2(t) = \left(1 + \frac{M}{1 - w\_0(pt, t)}\right) \mathcal{g}\_1(t)$$

and

$$h_2(t) = g_2(t) - 1.$$

Then, we have that *h*2(0) = −1 < 0 and *h*2(*t*) → +∞ as *t* → *r*<sup>−</sup> <sup>1</sup> . Let *r*<sup>2</sup> be the smallest zero of function *h*<sup>2</sup> on the interval (0,*r*1).

Finally, define the functions *g*¯, *g*<sup>3</sup> and *h*<sup>3</sup> on the interval [0,*r*2) by

$$\bar{g}(t) = \frac{1}{1 - w_0(pt, t)}\left(1 + \frac{w_0(pt, t) + w_0(g_1(t)t,\, g_2(t)t)}{1 - w_0(pt, t)}\right),$$

$$\mathcal{g}\_3(t) = \left(1 + M\mathcal{g}(t)\right)\mathcal{g}\_2(t)$$

and

*h*3(*t*) = *g*3(*t*) − 1.

It follows that $h_3(0) = -1 < 0$ and $h_3(t) \to +\infty$ as $t \to r_2^-$. Denote by $r_3$ the smallest zero of function $h_3$ on the interval $(0, r_2)$. Finally, define the radius of convergence (say, $r^*$) by

$$r^\* = \min\{r\_i\}, \ i = 1, 2, 3. \tag{7}$$

Then, for each $t \in [0, r^*)$, we have that

$$0 \le g_i(t) < 1, \ i = 1, 2, 3. \tag{8}$$

Denote by $U(\nu, \varepsilon) = \{x \in B_1 : \|x - \nu\| < \varepsilon\}$ the open ball with center $\nu \in B_1$ and radius $\varepsilon > 0$. Moreover, $\bar{U}(\nu, \varepsilon)$ denotes the closure of $U(\nu, \varepsilon)$.
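Since the radii $r_1, r_2, r_3$ are defined implicitly as smallest zeros of the functions $h_i$, in practice they can be computed by a scan-and-bisection routine. Below is a sketch on illustrative functions (the $w_0$ and $p$ here are chosen only for the demo and are not from the examples of Section 3):

```python
def smallest_zero(h, hi, tol=1e-10):
    """Smallest zero of a continuous h on (0, hi), assuming h(0) < 0
    and h(t) -> +infinity as t -> hi^- (the shape established above
    for h_1, h_2, h_3): scan for a sign change, then bisect."""
    lo, step = 0.0, hi / 1000.0
    t = step
    while t < hi and h(t) < 0:
        lo, t = t, t + step
    a, b = lo, min(t, hi * (1 - 1e-12))
    while b - a > tol:
        m = 0.5 * (a + b)
        if h(m) < 0:
            a = m
        else:
            b = m
    return 0.5 * (a + b)

# Illustrative data: w0(s, t) = (s+t)/16, p = 25/16, and a
# simplified g1(t) = t / (1 - w0(p t, t)), so h1(t) = g1(t) - 1.
p = 25.0 / 16.0
w0 = lambda s, t: (s + t) / 16.0
h1 = lambda t: t / (1.0 - w0(p * t, t)) - 1.0
r1 = smallest_zero(h1, hi=16.0 / (p + 1.0))  # w0(p t, t) < 1 on (0, hi)
print(round(r1, 6))  # 0.861953 (= 256/297 for this h1)
```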


We will study the local convergence of method (3) in a Banach space setting under the following hypotheses (collectively called (A)):


$$\|F'(x^*)^{-1}([x, y; F] - F'(x^*))\| \le w_0(\|x - x^*\|, \|y - x^*\|).$$

(a4) Let Ω<sup>0</sup> = Ω ∩ *U*(*x*∗,*r*), where *r* has been defined before. There exists continuous and nondecreasing function *<sup>v</sup>*<sup>0</sup> : [0,*r*) <sup>→</sup> <sup>R</sup><sup>+</sup> ∪ {0} such that, for each *<sup>x</sup>*, *<sup>y</sup>* <sup>∈</sup> <sup>Ω</sup>0,

$$\|\beta[x, x^*; F]\| \le v_0(\|x - x^*\|),$$

$$U(x^*, r) \subseteq \Omega,$$

$$\|I + \beta[x, x^*; F]\| \le p.$$


**Theorem 1.** *Suppose that the hypotheses* (*A*) *hold. Then, the sequence* {*xn*} *generated by method (3) for x*<sup>0</sup> ∈ *U*(*x*∗,*r*3) − {*x*∗} *is well defined in U*(*x*∗,*r*3)*, remains in U*(*x*∗,*r*3) *and converges to x*∗*. Moreover, the following conditions hold:*

$$\|y_n - x^*\| \le g_1(\|x_n - x^*\|)\|x_n - x^*\| \le \|x_n - x^*\| < r,\tag{9}$$

$$\|z_n - x^*\| \le g_2(\|x_n - x^*\|)\|x_n - x^*\| \le \|x_n - x^*\|\tag{10}$$

*and*

$$\|\mathbf{x}\_{n+1} - \mathbf{x}^\*\| \le \mathcal{g}\_3(\|\mathbf{x}\_n - \mathbf{x}^\*\|) \|\mathbf{x}\_n - \mathbf{x}^\*\| \le \|\mathbf{x}\_n - \mathbf{x}^\*\|,\tag{11}$$

*where the functions gi, i* = 1, 2, 3 *are defined as above. Furthermore, the vector x*<sup>∗</sup> *is the only solution of F*(*x*) = 0 *in* Ω1*.*

**Proof.** We shall show estimates (9)–(11) using mathematical induction. By hypothesis (a3) and for *x* ∈ *U*(*x*∗,*r*3), we have that

$$\begin{split} \|F'(x^*)^{-1}([u_0, x_0; F] - F'(x^*))\| &\le w_0(\|u_0 - x^*\|, \|x_0 - x^*\|) \\ &\le w_0(\|x_0 - x^* + \beta F(x_0)\|, \|x_0 - x^*\|) \\ &\le w_0(\|I + \beta[x_0, x^*; F]\|\,\|x_0 - x^*\|, \|x_0 - x^*\|) \\ &\le w_0(p\|x_0 - x^*\|, \|x_0 - x^*\|) \\ &\le w_0(pr, r) < 1. \end{split} \tag{12}$$

By (12) and the Banach lemma on invertible operators [9], we have that $[u_0, x_0; F]^{-1} \in \mathcal{L}(B_2, B_1)$ and

$$\|[u_0, x_0; F]^{-1}F'(x^*)\| \le \frac{1}{1 - w_0(p\|x_0 - x^*\|, \|x_0 - x^*\|)}.\tag{13}$$

We show that *yn* is well defined by the method (3) for *n* = 0. We have

$$\begin{aligned} y_0 - x^* &= x_0 - x^* - [u_0, x_0; F]^{-1}F(x_0) \\ &= [u_0, x_0; F]^{-1}F'(x^*)\, F'(x^*)^{-1}\left([u_0, x_0; F] - [x_0, x^*; F]\right)(x_0 - x^*). \end{aligned}\tag{14}$$

Then, using (8) (for *i* = 1), the conditions (a4) and (13), we have in turn that

$$\begin{split} \|y_0 - x^*\| &= \left\|[u_0, x_0; F]^{-1}F'(x^*)\, F'(x^*)^{-1}\big([u_0, x_0; F] - [x_0, x^*; F]\big)(x_0 - x^*)\right\| \\ &\le \frac{w_1(\|u_0 - x_0\|, \|x_0 - x^*\|)\,\|x_0 - x^*\|}{1 - w_0(p\|x_0 - x^*\|, \|x_0 - x^*\|)} \\ &\le \frac{w_1(\|\beta F(x_0)\|, \|x_0 - x^*\|)\,\|x_0 - x^*\|}{1 - w_0(p\|x_0 - x^*\|, \|x_0 - x^*\|)} \\ &\le \frac{w_1(\|\beta[x_0, x^*; F](x_0 - x^*)\|, \|x_0 - x^*\|)}{1 - w_0(p\|x_0 - x^*\|, \|x_0 - x^*\|)}\,\|x_0 - x^*\| \\ &\le \frac{w_1(\beta v_0(\|x_0 - x^*\|)\|x_0 - x^*\|, \|x_0 - x^*\|)}{1 - w_0(p\|x_0 - x^*\|, \|x_0 - x^*\|)}\,\|x_0 - x^*\| \\ &\le g_1(\|x_0 - x^*\|)\|x_0 - x^*\| < \|x_0 - x^*\| < r, \end{split}\tag{15}$$

which implies (9) for *n* = 0 and *y*<sup>0</sup> ∈ *U*(*x*∗,*r*3).

Note that, for each $\theta \in [0, 1]$, $\|x^* + \theta(x_0 - x^*) - x^*\| = \theta\|x_0 - x^*\| < r$, that is, $x^* + \theta(x_0 - x^*) \in U(x^*, r_3)$. Writing

$$F(\mathbf{x}\_0) = F(\mathbf{x}\_0) - F(\mathbf{x}^\*) = \int\_0^1 F'(\mathbf{x}^\* + \theta(\mathbf{x}\_0 - \mathbf{x}^\*))(\mathbf{x}\_0 - \mathbf{x}^\*)d\theta. \tag{16}$$

Then, using (a5), we get that

$$\begin{aligned} \|F'(\mathbf{x}^\*)^{-1}F(\mathbf{x}\_0)\| &= \left\| \int\_0^1 F'(\mathbf{x}^\*)^{-1}F'(\mathbf{x}^\* + \theta(\mathbf{x}\_0 - \mathbf{x}^\*))(\mathbf{x}\_0 - \mathbf{x}^\*)d\theta \right\| \\ &\le M\|\mathbf{x}\_0 - \mathbf{x}^\*\|. \end{aligned} \tag{17}$$

Similarly, we obtain

$$\|F'(x^*)^{-1}F(y_0)\| \le M\|y_0 - x^*\|,\tag{18}$$

$$\|F'(x^*)^{-1}F(z_0)\| \le M\|z_0 - x^*\|.\tag{19}$$

From the second sub-step of method (3), (13), (15) and (18), we obtain that

$$\begin{split} \|z_0 - x^*\| &\le \|y_0 - x^*\| + \left\|[u_0, x_0; F]^{-1}F'(x^*)\right\|\,\|F'(x^*)^{-1}F(y_0)\| \\ &\le \|y_0 - x^*\| + \frac{M\|y_0 - x^*\|}{1 - w_0(p\|x_0 - x^*\|, \|x_0 - x^*\|)} \\ &= \left(1 + \frac{M}{1 - w_0(p\|x_0 - x^*\|, \|x_0 - x^*\|)}\right)\|y_0 - x^*\| \\ &\le \left(1 + \frac{M}{1 - w_0(p\|x_0 - x^*\|, \|x_0 - x^*\|)}\right)g_1(\|x_0 - x^*\|)\|x_0 - x^*\| \\ &= g_2(\|x_0 - x^*\|)\|x_0 - x^*\| \le \|x_0 - x^*\|,\end{split}\tag{20}$$

which proves (10) for *n* = 0 and *z*<sup>0</sup> ∈ *U*(*x*∗,*r*3).

Let $\psi(x_n, y_n) = \left(2I - [u_n, x_n; F]^{-1}[y_n, z_n; F]\right)[u_n, x_n; F]^{-1}$ and notice that, since $x_0, y_0, z_0 \in U(x^*, r_3)$, we have that

$$\begin{split} \|\psi(x_0, y_0)F'(x^*)\| &= \left\|\left(2I - [u_0, x_0; F]^{-1}[y_0, z_0; F]\right)[u_0, x_0; F]^{-1}F'(x^*)\right\| \\ &\le \left(1 + \left\|[u_0, x_0; F]^{-1}\left([u_0, x_0; F] - [y_0, z_0; F]\right)\right\|\right)\left\|[u_0, x_0; F]^{-1}F'(x^*)\right\| \\ &\le \Big(1 + \left\|[u_0, x_0; F]^{-1}F'(x^*)\right\|\big(\|F'(x^*)^{-1}([u_0, x_0; F] - F'(x^*))\| \\ &\qquad + \|F'(x^*)^{-1}(F'(x^*) - [y_0, z_0; F])\|\big)\Big) \left\|[u_0, x_0; F]^{-1}F'(x^*)\right\| \\ &\le \left(1 + \frac{w_0(p\|x_0 - x^*\|, \|x_0 - x^*\|) + w_0(\|y_0 - x^*\|, \|z_0 - x^*\|)}{1 - w_0(p\|x_0 - x^*\|, \|x_0 - x^*\|)}\right) \times \frac{1}{1 - w_0(p\|x_0 - x^*\|, \|x_0 - x^*\|)} \\ &\le \left(1 + \frac{w_0(p\|x_0 - x^*\|, \|x_0 - x^*\|) + w_0\big(g_1(\|x_0 - x^*\|)\|x_0 - x^*\|,\, g_2(\|x_0 - x^*\|)\|x_0 - x^*\|\big)}{1 - w_0(p\|x_0 - x^*\|, \|x_0 - x^*\|)}\right) \\ &\qquad \times \frac{1}{1 - w_0(p\|x_0 - x^*\|, \|x_0 - x^*\|)} \\ &\le \bar{g}(\|x_0 - x^*\|). \end{split}\tag{21}$$

Then, using Equation (8) (for *i* = 3), (19), (20) and (21), we obtain

$$\begin{split} \|x_1 - x^*\| &= \|z_0 - x^* - \psi(x_0, y_0)F(z_0)\| \\ &\le \|z_0 - x^*\| + \|\psi(x_0, y_0)F'(x^*)\|\,\|F'(x^*)^{-1}F(z_0)\| \\ &\le \|z_0 - x^*\| + \bar{g}(\|x_0 - x^*\|)M\|z_0 - x^*\| \\ &= \left(1 + M\bar{g}(\|x_0 - x^*\|)\right)\|z_0 - x^*\| \\ &\le \left(1 + M\bar{g}(\|x_0 - x^*\|)\right)g_2(\|x_0 - x^*\|)\|x_0 - x^*\| \\ &= g_3(\|x_0 - x^*\|)\|x_0 - x^*\| \le \|x_0 - x^*\|, \end{split}$$

which proves (11) for *n* = 0 and *x*<sup>1</sup> ∈ *U*(*x*∗,*r*3).

Replace $x_0, y_0, z_0, x_1$ by $x_n, y_n, z_n, x_{n+1}$ in the preceding estimates to obtain (9)–(11). Then, from the estimate $\|x_{n+1} - x^*\| \le c\|x_n - x^*\| < r_3$, where $c = g_3(\|x_0 - x^*\|) \in [0, 1)$, we deduce that $\lim_{n\to\infty} x_n = x^*$ and $x_{n+1} \in U(x^*, r_3)$.

Next, we show the uniqueness part using conditions (a3) and (a6). Define the operator $P$ by $P = \int_0^1 F'(x^{**} + \theta(x^* - x^{**}))\,d\theta$ for some $x^{**} \in \Omega_1$ with $F(x^{**}) = 0$. Then, we have that

$$\begin{aligned} \|F'(x^*)^{-1}(P - F'(x^*))\| &\le \int_0^1 w_0(\theta\|x^* - x^{**}\|)\,d\theta \\ &\le \int_0^1 w_0(\theta r^*)\,d\theta < 1, \end{aligned}$$

so *<sup>P</sup>*−<sup>1</sup> ∈ L(*B*2, *<sup>B</sup>*1). Then, from the identity

$$0 = F(x^*) - F(x^{**}) = P(x^* - x^{**}),$$

which implies that $x^* = x^{**}$.

### **3. Numerical Examples**

We illustrate the theoretical results shown in Theorem 1. For the computation of the divided difference, let us choose $[x, y; F] = \int_0^1 F'(y + \theta(x - y))\,d\theta$. Consider the following three numerical examples:

**Example 1.** *Assume that the motion of a particle in three dimensions is governed by a system of differential equations:*

$$\begin{aligned} f\_1'(x) - f\_1(x) - 1 &= 0, \\ f\_2'(y) - (e - 1)y - 1 &= 0, \\ f\_3'(z) - 1 &= 0, \end{aligned}$$

*with <sup>x</sup>*, *<sup>y</sup>*, *<sup>z</sup>* <sup>∈</sup> <sup>Ω</sup> *for <sup>f</sup>*1(0) = *<sup>f</sup>*2(0) = *<sup>f</sup>*3(0) = <sup>0</sup>*. A solution of the system is given for <sup>u</sup>* = (*x*, *<sup>y</sup>*, *<sup>z</sup>*)*<sup>T</sup> by function F* := (*f*1, *<sup>f</sup>*2, *<sup>f</sup>*3) : <sup>Ω</sup> <sup>→</sup> <sup>R</sup><sup>3</sup> *defined by*

$$F(u) = \left(e^x - 1, \frac{e - 1}{2}y^2 + y, z\right)^T.$$

*Its Fréchet-derivative* $F'(u)$ *is given by*

$$F'(u) = \begin{bmatrix} e^x & 0 & 0 \\ 0 & (e-1)y+1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$

*Then, for x*<sup>∗</sup> = (0, 0, 0)*T, we deduce that w*0(*s*, *t*) = *w*1(*s*, *t*) = *<sup>L</sup>*<sup>0</sup> <sup>2</sup> (*<sup>s</sup>* <sup>+</sup> *<sup>t</sup>*) *and <sup>v</sup>*0(*t*) = <sup>1</sup> <sup>2</sup> (1 + *e* 1 *<sup>L</sup>*<sup>0</sup> )*, p* = 1 + <sup>1</sup> <sup>2</sup> (1 + *e* 1 *<sup>L</sup>*<sup>0</sup> )*, β* = <sup>1</sup> <sup>100</sup> *, where L*<sup>0</sup> = *e* − 1 *and M* = 2*. Then, using a definition of parameters, the calculated values are displayed as*

*r*<sup>∗</sup> = min{*r*1,*r*2,*r*3} = min{0.313084, 0.165881, 0.0715631} = 0.0715631.
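As a sanity check (a generic check, not part of the paper's computation), the Fréchet-derivative of this example can be compared against central finite differences:

```python
import math

def F(u):
    """F(u) = (e^x - 1, (e-1)/2 y^2 + y, z) from Example 1."""
    x, y, z = u
    e = math.e
    return [math.exp(x) - 1.0, (e - 1.0) / 2.0 * y * y + y, z]

def Fprime(u):
    """Exact (diagonal) Frechet-derivative of F at u."""
    x, y, _ = u
    e = math.e
    return [[math.exp(x), 0.0, 0.0],
            [0.0, (e - 1.0) * y + 1.0, 0.0],
            [0.0, 0.0, 1.0]]

def numerical_jacobian(F, u, h=1e-6):
    """Central finite differences, accurate to O(h^2)."""
    n = len(u)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        up, um = list(u), list(u)
        up[j] += h
        um[j] -= h
        Fp, Fm = F(up), F(um)
        for i in range(n):
            J[i][j] = (Fp[i] - Fm[i]) / (2.0 * h)
    return J

u = [0.3, -0.2, 0.7]
J_exact, J_num = Fprime(u), numerical_jacobian(F, u)
err = max(abs(J_exact[i][j] - J_num[i][j]) for i in range(3) for j in range(3))
print(err < 1e-6)  # True
```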

**Example 2.** *Let X* = *C*[0, 1]*,* Ω = *U*¯ (*x*∗, 1)*. We consider the integral equation of the mixed Hammerstein-type [9] given by*

$$\mathbf{x}(\mathbf{s}) = \int\_0^1 k(\mathbf{s}, t) \frac{\mathbf{x}(t)^2}{2} dt,$$

*wherein the kernel* $k$ *is the Green's function on the interval* [0, 1] × [0, 1] *defined by*

$$k(s, t) = \begin{cases} (1 - s)t, & t \le s, \\ s(1 - t), & s \le t. \end{cases}$$

*Solution* $x^*(s) = 0$ *is the same as the solution of equation* $F(x) = 0$*, where* $F : C[0, 1] \to C[0, 1]$ *is given by*

$$F(\mathbf{x})(s) = \mathbf{x}(s) - \int\_0^1 k(s, t) \frac{\mathbf{x}(t)^2}{2} dt.$$

*Observe that*

$$\left\|\int_0^1 k(s, t)\,dt\right\| \le \frac{1}{8}.$$
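This bound can be verified directly: integrating the kernel in $t$ gives $s(1 - s)/2$, whose maximum on $[0, 1]$ is $1/8$ (a quick check, independent of the paper's computations):

```python
def kernel_integral(s):
    """Integral of the Green's function k(s, .) over [0, 1]:
    int_0^s (1-s) t dt + int_s^1 s (1-t) dt = s(1-s)/2."""
    return (1.0 - s) * s * s / 2.0 + s * (1.0 - s) ** 2 / 2.0

# The maximum over s in [0, 1] is attained at s = 1/2 and equals 1/8.
vals = [kernel_integral(i / 1000.0) for i in range(1001)]
print(max(vals))  # 0.125
```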

*Then, we have that*

$$F'(x)y(s) = y(s) - \int_0^1 k(s, t)\,x(t)\,y(t)\,dt$$

*and* $F'(x^*(s)) = I$*. We can choose* $w_0(s, t) = w_1(s, t) = \frac{s + t}{16}$*,* $v_0(t) = \frac{9}{16}$*,* $p = \frac{25}{16}$*,* $\beta = \frac{1}{100}$ *and* $M = 2$*. Then, using the definitions of the parameters, the calculated values are displayed as*

$$r^* = \min\{r_1, r_2, r_3\} = \min\{4.4841, 2.3541, 1.0090\} = 1.0090.$$

**Example 3.** *Let B*<sup>1</sup> = *B*<sup>2</sup> = *C*[0, 1] *be the spaces of continuous functions defined on the interval* [0, 1]*. Define function F on* Ω = *U*¯ (0, 1) *by*

$$F(\varphi)(x) = \varphi(x) - 10\int_0^1 x\theta\,\varphi(\theta)^3\,d\theta.$$

*It follows that*

$$F'(\varphi)(\xi)(x) = \xi(x) - 30\int_0^1 x\theta\,\varphi(\theta)^2\,\xi(\theta)\,d\theta, \text{ for each } \xi \in \Omega.$$

*Then, for* $x^* = 0$*, we have that* $w_0(s, t) = w_1(s, t) = L_0(s + t)$ *and* $v_0(t) = 2$*,* $p = 3$*,* $\beta = \frac{1}{100}$*, where* $L_0 = 15$ *and* $M = 2$*. The parameters are displayed as*

*r*<sup>∗</sup> = min{*r*1,*r*2,*r*3} = min{0.013280, 0.0076012, 0.0034654} = 0.0034654.

### **4. Basins of Attraction**

The basin of attraction is a useful geometrical tool for assessing the convergence regions of iterative methods. These basins show all the starting points that converge to a given root when an iterative method is applied, so we can see in a visual way which points are good choices as starting points and which are not. We take the initial point as $z_0 \in R$, where $R$ is a rectangular region in $\mathbb{C}$ containing all the roots of a polynomial equation $p(z) = 0$. An iterative method starting at a point $z_0$ in the rectangle can converge to a zero of the function $p(z)$ or eventually diverge. In order to analyze the basins, we consider a convergence tolerance of $10^{-3}$ up to a maximum of 25 iterations. If this tolerance is not attained in 25 iterations, the process is stopped, with the conclusion that the iterative method starting at $z_0$ does not converge to any root. The following strategy is adopted: a color is assigned to each starting point $z_0$ in the basin of attraction of a zero. If the iteration starting from the initial point $z_0$ converges, then the point is painted with the color assigned to the corresponding root; if it fails to converge in 25 iterations, then it is painted black.
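The coloring procedure above can be sketched as follows. For brevity this demo iterates Newton's map on $p_2(z) = z^3 - z$ rather than method (3), but the grid, tolerance ($10^{-3}$), iteration cap (25), and black-for-divergence logic are the same:

```python
import numpy as np

def basin_grid(roots, step, n=200, box=3.0, max_iter=25, tol=1e-3):
    """Label each grid point of [-box, box]^2 by the index of the
    root its iteration reaches within tol in max_iter steps, or -1
    (painted black in the figures) if it reaches none."""
    xs = np.linspace(-box, box, n)
    grid = np.full((n, n), -1, dtype=int)
    for i, yv in enumerate(xs):
        for j, xv in enumerate(xs):
            z = complex(xv, yv)
            for _ in range(max_iter):
                z = step(z)
                hit = [k for k, r in enumerate(roots) if abs(z - r) < tol]
                if hit:
                    grid[i, j] = hit[0]
                    break
    return grid

# Newton's map for p2(z) = z^3 - z (stand-in for method (3)).
newton = lambda z: z - (z**3 - z) / (3 * z**2 - 1)
g = basin_grid([-1.0, 0.0, 1.0], newton, n=50)
print(sorted(set(g.flatten().tolist())))
```

In the actual experiments, the grid labels would be mapped to the colors described above and rendered as an image.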

We analyze the basins of attraction on the following two problems:

**Test problem 1.** Consider the polynomial $p\_1(z) = z^4 - 6z^2 + 8$, which has four simple zeros $\{\pm 2, \pm\sqrt{2} \approx \pm 1.414\}$. We use a grid of $400 \times 400$ points in a rectangle $R \subset \mathbb{C}$ of size $[-3, 3] \times [-3, 3]$ and allocate the colors red, blue, green, and yellow to the basins of attraction of these four zeros. Basins obtained for the method (3) are shown in Figure 1(i)–(iii), corresponding to $\beta = 10^{-2}, 10^{-4}, 10^{-9}$. Observing the behavior of the method, we see that the divergent zones (black zones) become smaller as the value of $\beta$ decreases.

**Figure 1.** Basins of attraction of method for polynomial *p*1(*z*).

**Test problem 2.** Let us take the polynomial $p\_2(z) = z^3 - z$, which has the zeros $\{0, \pm 1\}$. In this case, we again consider a rectangle $R = [-3, 3] \times [-3, 3] \subset \mathbb{C}$ with $400 \times 400$ grid points and allocate the colors red, green, and blue to the points in the basins of attraction of $-1$, $0$, and $1$, respectively. Basins obtained for the method (3) are displayed in Figure 2(i)–(iii) for the parameter values $\beta = 10^{-2}, 10^{-4}, 10^{-9}$. Notice that the divergent zones become smaller in size as the parameter $\beta$ assumes smaller values.

**Figure 2.** Basins of attraction of method for polynomial *p*2(*z*).

#### **5. Conclusions**

In this paper, the local convergence analysis of a derivative-free fifth order method has been studied in Banach space. Unlike other techniques that rely on higher order derivatives and Taylor series, our approach uses only the derivative of order one. In this way, we have extended the applicability of the considered method, since it can now be applied to a wider class of functions. Further advantages of the local convergence analysis are the computation of a convergence ball, the uniqueness of the solution in that ball, and the estimation of error bounds. The theoretical results of convergence thus achieved are confirmed through testing on some practical problems.

The basins of attraction have been analyzed by applying the method to some polynomials. From these graphics, one can easily visualize the behavior and suitability of a method. If we choose an initial guess $x\_0$ in a domain where different basins of the roots meet each other, it is difficult to predict which root will be reached by the iterative method starting from $x\_0$; thus, such a domain is not a suitable choice for the initial guess. In addition, the black zones and the zones of a different color are not suitable places to take the initial guess $x\_0$ when we want to achieve a particular root. The most attractive pictures appear when the boundaries of the basins are very intricate; such pictures belong to the cases where the method is more demanding with respect to the initial point.

**Author Contributions:** Methodology, D.K.; writing, review and editing, J.R.S.; investigation, J.R.S.; data curation, D.K.; conceptualization, L.J.; formal analysis, L.J.

**Funding:** This research received no external funding.

**Acknowledgments:** We would like to express our gratitude to the anonymous reviewers for their valuable comments and suggestions which have greatly improved the presentation of this paper.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **On Derivative Free Multiple-Root Finders with Optimal Fourth Order Convergence**

**Janak Raj Sharma 1,\*, Sunil Kumar <sup>1</sup> and Lorentz Jäntschi 2,3,\***


Received: 14 June 2020; Accepted: 2 July 2020; Published: 3 July 2020

**Abstract:** A number of optimal order multiple root techniques that require derivative evaluations in their formulas have been proposed in the literature. However, derivative-free optimal techniques for multiple roots are seldom obtained. Taking this as motivation, here we present a class of optimal fourth order methods for computing multiple roots without using derivatives in the iteration. The iterative formula consists of two steps, of which the first is the well-known Traub–Steffensen scheme, whereas the second is a Traub–Steffensen-like scheme. Effectiveness is validated on different problems, which show the robust convergent behavior of the proposed methods. It has been proven that the new derivative-free methods are good competitors to their existing counterparts that need derivative information.

**Keywords:** multiple root solvers; composite method; weight-function; derivative-free method; optimal convergence

**MSC:** 65H05; 41A25; 49M15

### **1. Introduction**

Finding a root of a nonlinear equation $\psi(u) = 0$ is a very important and interesting problem in many branches of science and engineering. In this work, we examine derivative-free numerical methods to find a multiple root (say, $\alpha$) with multiplicity $\mu$ of the equation $\psi(u) = 0$, which means that $\psi^{(j)}(\alpha) = 0$, $j = 0, 1, 2, \ldots, \mu - 1$ and $\psi^{(\mu)}(\alpha) \neq 0$. Newton's method [1] is the most widely used basic method for finding multiple roots, which is given by

$$u\_{k+1} = u\_k - \mu \frac{\psi(u\_k)}{\psi'(u\_k)}, \ k = 0, 1, 2, \dots, \ \mu = 2, 3, 4, \dots \tag{1}$$
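As a minimal illustration of iteration (1) (the test function, its root, and the starting point below are our own assumptions, not taken from the paper):

```python
def modified_newton(psi, dpsi, u, mu, itmax=50, tol=1e-12):
    """Modified Newton iteration (1): u <- u - mu * psi(u) / psi'(u)."""
    for _ in range(itmax):
        d = dpsi(u)
        if d == 0:
            break
        step = mu * psi(u) / d
        u -= step
        if abs(step) < tol:
            break
    return u

# Assumed example: psi(u) = (u - 1)^2 (u + 2) has a double root (mu = 2) at u = 1.
psi = lambda u: (u - 1) ** 2 * (u + 2)
dpsi = lambda u: 2 * (u - 1) * (u + 2) + (u - 1) ** 2
root = modified_newton(psi, dpsi, u=1.5, mu=2)
```

With the multiplicity factor $\mu$, quadratic convergence at the multiple root is retained, which a plain Newton step would lose.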

A number of modified methods, with or without the base of Newton's method, have been elaborated and analyzed in the literature; see [2–14]. These methods use derivatives of either first order or both first and second order in the iterative scheme. In contrast, higher order methods without derivatives for computing multiple roots are yet to be examined. Such methods are very useful in problems where the derivative $\psi'$ is cumbersome to evaluate or is costly to compute. The derivative-free counterpart of the classical Newton method (1) is the Traub–Steffensen method [15]. The method uses the approximation

$$
\psi'(u\_k) \simeq \frac{\psi(u\_k + \beta \psi(u\_k)) - \psi(u\_k)}{\beta \psi(u\_k)}, \quad \beta \in \mathbb{R} - \{0\},
$$

or

$$
\psi'(u\_k) \simeq \psi[v\_k, u\_k],
$$

for the derivative $\psi'$ in the Newton method (1). Here, $v\_k = u\_k + \beta\psi(u\_k)$ and $\psi[v\_k, u\_k] = \frac{\psi(v\_k) - \psi(u\_k)}{v\_k - u\_k}$ is a first order divided difference. Thereby, the method (1) takes the form of the Traub–Steffensen scheme defined as

$$
u\_{k+1} = u\_k - \mu \frac{\psi(u\_k)}{\psi\left[v\_k, u\_k\right]}. \tag{2}
$$

The Traub–Steffensen method (2) is a prominent improvement of the Newton method because it maintains the quadratic convergence without adding any derivative.
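A short sketch of the Traub–Steffensen scheme (2); the test function, the value $\beta = 0.01$, and the stopping rules are our illustrative assumptions:

```python
import math

def traub_steffensen(psi, u, mu, beta=0.01, itmax=50, tol=1e-12):
    """Scheme (2): derivative-free analogue of the modified Newton method."""
    for _ in range(itmax):
        fu = psi(u)
        v = u + beta * fu
        if v == u:            # beta*psi(u) underflowed: u is numerically a root
            return u
        dd = (psi(v) - fu) / (v - u)   # divided difference psi[v_k, u_k]
        step = mu * fu / dd
        u -= step
        if abs(step) < tol:
            break
    return u

# Assumed example: psi(u) = (u - 2)^2 e^u has a double root (mu = 2) at u = 2.
psi = lambda u: (u - 2.0) ** 2 * math.exp(u)
root = traub_steffensen(psi, u=2.5, mu=2)
```

No derivative of $\psi$ appears, yet the convergence near the double root remains quadratic.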

Unlike Newton-like methods, Traub–Steffensen-like methods are difficult to construct. Recently, a family of two-step Traub–Steffensen-like methods with fourth order convergence was proposed in [16]. In terms of computational cost, the methods of [16] use three function evaluations per iteration and thus possess optimal fourth order convergence according to the Kung–Traub conjecture (see [17]). This hypothesis states that multi-point methods without memory requiring $m$ functional evaluations can attain at most the convergence order $2^{m-1}$, called the optimal order. Such methods are usually known as optimal methods. Our aim in this work is to develop derivative-free multiple root methods of good computational efficiency, that is, methods of higher convergence order with as little computational work as possible. Consequently, we introduce a class of Traub–Steffensen-like derivative-free fourth order methods that require three new pieces of information of the function $\psi$ per iteration and therefore have optimal fourth order convergence according to the Kung–Traub conjecture. The iterative formula consists of two steps, with the Traub–Steffensen iteration (2) in the first step and a Traub–Steffensen-like iteration in the second step. Performance is tested numerically on many problems of different kinds. Moreover, comparison with existing modified Newton-like methods verifies the robust and efficient nature of the proposed methods.

We summarize the contents of the paper. In Section 2, the scheme of fourth order iteration is formulated and its convergence order is studied separately for different cases. The main result, showing the unification of the different cases, is presented in Section 3. Section 4 contains the basins of attraction drawn to assess the convergence domains of the new methods. In Section 5, numerical experiments are performed on different problems to demonstrate the accuracy and efficiency of the methods. Concluding remarks are reported in Section 6.

### **2. Development of a Novel Scheme**

Researchers have used different approaches to develop higher order iterative methods for solving nonlinear equations, among them the interpolation approach, the sampling approach, the composition approach, the geometrical approach, the Adomian approach, and the weight-function approach. Of these, the weight-function approach has been the most popular in recent times; see, for example, Refs. [10,13,14,18,19] and the references therein. Using this approach, we consider the following two-step iterative scheme for finding a multiple root with multiplicity $\mu \geq 2$:

$$\begin{aligned} z\_k &= u\_k - \mu \frac{\psi(u\_k)}{\psi[v\_k, u\_k]}, \\ u\_{k+1} &= z\_k - G(h) \left( 1 + \frac{1}{y\_k} \right) \frac{\psi(u\_k)}{\psi[v\_k, u\_k]}, \end{aligned} \tag{3}$$

where $h = \frac{x\_k}{1+x\_k}$, $x\_k = \sqrt[\mu]{\frac{\psi(z\_k)}{\psi(u\_k)}}$, $y\_k = \sqrt[\mu]{\frac{\psi(v\_k)}{\psi(u\_k)}}$, and $G : \mathbb{C} \to \mathbb{C}$ is analytic in a neighborhood of zero. This iterative scheme is weighted by the factors $G(h)$ and $1 + \frac{1}{y\_k}$, hence the name weight-factor or weight-function technique.

Note that $x\_k$ and $y\_k$ are one-to-$\mu$ multi-valued functions, so we consider their principal analytic branches [18]. Hence, it is convenient to treat them as the principal root. For example, let us consider the case of $x\_k$. The principal root is given by $x\_k = \exp\left(\frac{1}{\mu}\operatorname{Log}\frac{\psi(z\_k)}{\psi(u\_k)}\right)$, with $\operatorname{Log}\frac{\psi(z\_k)}{\psi(u\_k)} = \operatorname{Log}\left|\frac{\psi(z\_k)}{\psi(u\_k)}\right| + i\operatorname{Arg}\frac{\psi(z\_k)}{\psi(u\_k)}$ for $-\pi < \operatorname{Arg}\frac{\psi(z\_k)}{\psi(u\_k)} \leq \pi$; this convention of $\operatorname{Arg}(p)$ for $p \in \mathbb{C}$ agrees with that of the Log[*p*] command of Mathematica [20], to be employed later in the sections on basins of attraction and numerical experiments. We treat $y\_k$ similarly.
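In a language with built-in complex arithmetic this branch convention needs no special handling; for instance, Python's `cmath.log` already uses the principal branch with $-\pi < \operatorname{Arg}(p) \leq \pi$ (the same convention as Mathematica's `Log[p]`), so a sketch of the principal $\mu$-th root is simply:

```python
import cmath

def principal_root(p, mu):
    """Principal mu-th root: exp((1/mu) Log p), with -pi < Arg(p) <= pi."""
    return cmath.exp(cmath.log(p) / mu)
```

For example, `principal_root(-8, 3)` gives $2e^{i\pi/3} = 1 + \sqrt{3}\,i$, not the real cube root $-2$.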

In the sequel, we prove fourth order of convergence of the proposed iterative scheme (3). For simplicity, the results are obtained separately for the cases depending upon the multiplicity *μ*. Firstly, we consider the case *μ* = 2.

**Theorem 1.** *Assume that $u = \alpha$ is a zero with multiplicity $\mu = 2$ of the function $\psi(u)$, where $\psi : \mathbb{C} \to \mathbb{C}$ is sufficiently differentiable in a domain containing $\alpha$. Suppose that the initial point $u\_0$ is sufficiently close to $\alpha$; then, the order of convergence of the scheme* (3) *is at least four, provided that the weight function $G(h)$ satisfies the conditions $G(0) = 0$, $G'(0) = 1$, $G''(0) = 6$ and $|G'''(0)| < \infty$.*

**Proof.** Assume that $\varepsilon\_k = u\_k - \alpha$ is the error at the $k$-th stage. Expanding $\psi(u\_k)$ about $\alpha$ using the Taylor series, keeping in mind that $\psi(\alpha) = 0$, $\psi'(\alpha) = 0$ and $\psi^{(2)}(\alpha) \neq 0$, we have that

$$\psi(u\_k) = \frac{\psi^{(2)}(\alpha)}{2!} \varepsilon\_k^2 \left( 1 + A\_1 \varepsilon\_k + A\_2 \varepsilon\_k^2 + A\_3 \varepsilon\_k^3 + A\_4 \varepsilon\_k^4 + \cdots \right), \tag{4}$$

where $A\_m = \frac{2!}{(2+m)!}\,\frac{\psi^{(2+m)}(\alpha)}{\psi^{(2)}(\alpha)}$ for $m \in \mathbb{N}$.

Similarly, Taylor series expansion of *ψ*(*vk*) is

$$\psi(v\_k) = \frac{\psi^{(2)}(\alpha)}{2!} \varepsilon\_{v\_k}^2 \left( 1 + A\_1 \varepsilon\_{v\_k} + A\_2 \varepsilon\_{v\_k}^2 + A\_3 \varepsilon\_{v\_k}^3 + A\_4 \varepsilon\_{v\_k}^4 + \cdots \right), \tag{5}$$


where $\varepsilon\_{v\_k} = v\_k - \alpha = \varepsilon\_k + \frac{\beta\psi^{(2)}(\alpha)}{2!}\varepsilon\_k^2\left(1 + A\_1\varepsilon\_k + A\_2\varepsilon\_k^2 + A\_3\varepsilon\_k^3 + A\_4\varepsilon\_k^4 + \cdots\right)$. By using (4) and (5) in the first step of (3), we obtain

$$\begin{split} \varepsilon\_{z\_k} &= z\_k - \alpha \\ &= \frac{1}{2}\left(\frac{\beta\psi^{(2)}(\alpha)}{2} + A\_1\right)\varepsilon\_k^2 - \frac{1}{16}\left((\beta\psi^{(2)}(\alpha))^2 - 8\beta\psi^{(2)}(\alpha)A\_1 + 12A\_1^2 - 16A\_2\right)\varepsilon\_k^3 \\ &\quad + \frac{1}{64}\left((\beta\psi^{(2)}(\alpha))^3 - 20\beta\psi^{(2)}(\alpha)A\_1^2 + 72A\_1^3 + 64\beta\psi^{(2)}(\alpha)A\_2 - 10A\_1\left((\beta\psi^{(2)}(\alpha))^2 + 16A\_2\right) + 96A\_3\right)\varepsilon\_k^4 + O(\varepsilon\_k^5). \end{split} \tag{6}$$

In addition, we have that

$$
\psi(z\_k) = \frac{\psi^{(2)}(\alpha)}{2!} \varepsilon\_{z\_k}^2 \left( 1 + A\_1 \varepsilon\_{z\_k} + A\_2 \varepsilon\_{z\_k}^2 + \cdots \right). \tag{7}
$$

Using (4), (5) and (7), we further obtain

$$\begin{split} x\_k &= \frac{1}{2}\left(\frac{\beta\psi^{(2)}(\alpha)}{2} + A\_1\right)\varepsilon\_k - \frac{1}{16}\left((\beta\psi^{(2)}(\alpha))^2 - 6\beta\psi^{(2)}(\alpha)A\_1 + 16(A\_1^2 - A\_2)\right)\varepsilon\_k^2 \\ &\quad + \frac{1}{64}\left((\beta\psi^{(2)}(\alpha))^3 - 22\beta\psi^{(2)}(\alpha)A\_1^2 + 4\left(29A\_1^3 + 14\beta\psi^{(2)}(\alpha)A\_2\right) - 2A\_1\left(3(\beta\psi^{(2)}(\alpha))^2 + 104A\_2\right) + 96A\_3\right)\varepsilon\_k^3 \\ &\quad + \frac{1}{256}\left(212\beta\psi^{(2)}(\alpha)A\_1^3 - 800A\_1^4 + 2A\_1^2\left(-7(\beta\psi^{(2)}(\alpha))^2 + 1040A\_2\right) + 2A\_1\left(3(\beta\psi^{(2)}(\alpha))^3 - 232\beta\psi^{(2)}(\alpha)A\_2 - 576A\_3\right)\right. \\ &\quad \left. - \left((\beta\psi^{(2)}(\alpha))^4 + 8(\beta\psi^{(2)}(\alpha))^2A\_2 + 640A\_2^2 - 416\beta\psi^{(2)}(\alpha)A\_3 - 512A\_4\right)\right)\varepsilon\_k^4 + O(\varepsilon\_k^5) \end{split} \tag{8}$$

and

$$y\_k = 1 + \frac{\beta\psi^{(2)}(\alpha)}{2} \varepsilon\_k \left( 1 + \frac{3}{2} A\_1 \varepsilon\_k + \frac{1}{4} \left( \beta\psi^{(2)}(\alpha) A\_1 + 8 A\_2 \right) \varepsilon\_k^2 + \frac{1}{16} \left( 3\beta\psi^{(2)}(\alpha) A\_1^2 + 12\beta\psi^{(2)}(\alpha) A\_2 + 40 A\_3 \right) \varepsilon\_k^3 \right) + O(\varepsilon\_k^4). \tag{9}$$

Using (8), we have

$$\begin{split} h &= \frac{1}{2}\left(\frac{\beta\psi^{(2)}(\alpha)}{2} + A\_1\right)\varepsilon\_k - \frac{1}{8}\left((\beta\psi^{(2)}(\alpha))^2 - \beta\psi^{(2)}(\alpha)A\_1 - 2(4A\_2 - 5A\_1^2)\right)\varepsilon\_k^2 \\ &\quad + \frac{1}{32}\left(-\beta\psi^{(2)}(\alpha)A\_1^2 + 94A\_1^3 - 4A\_1\left((\beta\psi^{(2)}(\alpha))^2 + 34A\_2\right) + 2\left((\beta\psi^{(2)}(\alpha))^3 + 6\beta\psi^{(2)}(\alpha)A\_2 + 24A\_3\right)\right)\varepsilon\_k^3 \\ &\quad + \frac{1}{128}\left(54\beta\psi^{(2)}(\alpha)A\_1^3 - 864A\_1^4 + A\_1^2\left(1808A\_2 - 13(\beta\psi^{(2)}(\alpha))^2\right) + 2A\_1\left(6(\beta\psi^{(2)}(\alpha))^3 - 68\beta\psi^{(2)}(\alpha)A\_2 - 384A\_3\right)\right. \\ &\quad \left. - 4\left((\beta\psi^{(2)}(\alpha))^4 + 5(\beta\psi^{(2)}(\alpha))^2A\_2 + 112A\_2^2 - 28\beta\psi^{(2)}(\alpha)A\_3 - 64A\_4\right)\right)\varepsilon\_k^4 + O(\varepsilon\_k^5). \end{split} \tag{10}$$

Taylor expansion of the weight function *G*(*h*) in the neighborhood of origin up to third-order terms is given by

$$G(h) \approx G(0) + hG'(0) + \frac{1}{2}h^2 G''(0) + \frac{1}{6}h^3 G'''(0). \tag{11}$$

Using (4)–(11) in the last step of (3), we have

$$\varepsilon\_{k+1} = -G(0)\varepsilon\_k + \frac{1}{4} \left( \beta\psi^{(2)}(\alpha)(1 + 2G(0) - G'(0)) + 2(1 + G(0) - G'(0))A\_1 \right) \varepsilon\_k^2 + \sum\_{n=1}^2 \varphi\_n \varepsilon\_k^{n+2} + O(\varepsilon\_k^5), \tag{12}$$

where $\varphi\_n = \varphi\_n(\beta, A\_1, A\_2, A\_3, G(0), G'(0), G''(0), G'''(0))$, $n = 1, 2$. The expressions of $\varphi\_1$ and $\varphi\_2$, being very lengthy, are not reproduced explicitly.

We can obtain at least fourth order convergence if we set the coefficients of $\varepsilon\_k$, $\varepsilon\_k^2$ and $\varepsilon\_k^3$ simultaneously equal to zero. Then, some simple calculations yield

$$G(0) = 0, \ G'(0) = 1, \ G''(0) = 6. \tag{13}$$

Using (13) in (12), we will obtain final error equation

$$\begin{split} \varepsilon\_{k+1} &= -\frac{1}{192} \left( \frac{\beta\psi^{(2)}(\alpha)}{2} + A\_1 \right) \left( (G'''(0) - 42)(\beta\psi^{(2)}(\alpha))^2 + 4(G'''(0) - 45)\beta\psi^{(2)}(\alpha)A\_1 + 4(G'''(0) - 63)A\_1^2 \right. \\ &\quad \left. + 48A\_2 \right) \varepsilon\_k^4 + O(\varepsilon\_k^5). \end{split} \tag{14}$$

Thus, the theorem is proved.

Next, we prove the following theorem for case *μ* = 3.

**Theorem 2.** *Under the assumptions of Theorem 1, the convergence order of scheme* (3) *for the case $\mu = 3$ is at least* 4*, if $G(0) = 0$, $G'(0) = \frac{3}{2}$, $G''(0) = 9$ and $|G'''(0)| < \infty$.*

**Proof.** Taking into account that $\psi(\alpha) = 0$, $\psi'(\alpha) = 0$, $\psi''(\alpha) = 0$ and $\psi^{(3)}(\alpha) \neq 0$, the Taylor series development of $\psi(u\_k)$ about $\alpha$ gives

$$\psi(u\_k) = \frac{\psi^{(3)}(\alpha)}{3!} \varepsilon\_k^3 \left( 1 + B\_1 \varepsilon\_k + B\_2 \varepsilon\_k^2 + B\_3 \varepsilon\_k^3 + B\_4 \varepsilon\_k^4 + \cdots \right), \tag{15}$$

where $B\_m = \frac{3!}{(3+m)!}\,\frac{\psi^{(3+m)}(\alpha)}{\psi^{(3)}(\alpha)}$ for $m \in \mathbb{N}$. Expanding $\psi(v\_k)$ about $\alpha$,

$$\psi(v\_k) = \frac{\psi^{(3)}(\alpha)}{3!} \varepsilon\_{v\_k}^3 \left( 1 + B\_1 \varepsilon\_{v\_k} + B\_2 \varepsilon\_{v\_k}^2 + B\_3 \varepsilon\_{v\_k}^3 + B\_4 \varepsilon\_{v\_k}^4 + \cdots \right), \tag{16}$$

where $\varepsilon\_{v\_k} = v\_k - \alpha = \varepsilon\_k + \frac{\beta\psi^{(3)}(\alpha)}{3!}\varepsilon\_k^3\left(1 + B\_1\varepsilon\_k + B\_2\varepsilon\_k^2 + B\_3\varepsilon\_k^3 + B\_4\varepsilon\_k^4 + \cdots\right)$. Then, using (15) and (16) in the first step of (3), we obtain

$$\begin{split} \varepsilon\_{z\_k} &= z\_k - \alpha \\ &= \frac{B\_1}{3}\varepsilon\_k^2 + \frac{1}{18}\left(3\beta\psi^{(3)}(\alpha) - 8B\_1^2 + 12B\_2\right)\varepsilon\_k^3 + \frac{1}{27}\left(16B\_1^3 + 3B\_1\left(2\beta\psi^{(3)}(\alpha) - 13B\_2\right) + 27B\_3\right)\varepsilon\_k^4 + O(\varepsilon\_k^5). \end{split} \tag{17}$$

Expansion of *ψ*(*zk*) about *α* yields

$$
\psi(z\_k) = \frac{\psi^{(3)}(\alpha)}{3!} \varepsilon\_{z\_k}^3 \left( 1 + B\_1 \varepsilon\_{z\_k} + B\_2 \varepsilon\_{z\_k}^2 + B\_3 \varepsilon\_{z\_k}^3 + B\_4 \varepsilon\_{z\_k}^4 + \cdots \right). \tag{18}
$$

Then, from (15), (16), and (18), it follows that

$$\begin{split} x\_k &= \frac{B\_1}{3}\varepsilon\_k + \frac{1}{18}\left(3\beta\psi^{(3)}(\alpha) - 10B\_1^2 + 12B\_2\right)\varepsilon\_k^2 + \frac{1}{54}\left(46B\_1^3 + 3B\_1\left(3\beta\psi^{(3)}(\alpha) - 32B\_2\right) + 54B\_3\right)\varepsilon\_k^3 \\ &\quad - \frac{1}{486}\left(610B\_1^4 - B\_1^2\left(1818B\_2 - 27\beta\psi^{(3)}(\alpha)\right) + 1188B\_1B\_3 + 9\left((\beta\psi^{(3)}(\alpha))^2 - 15\beta\psi^{(3)}(\alpha)B\_2 + 72B\_2^2 - 72B\_4\right)\right)\varepsilon\_k^4 + O(\varepsilon\_k^5) \end{split} \tag{19}$$

and

$$y\_k = 1 + \frac{\beta\psi^{(3)}(\alpha)}{3!} \varepsilon\_k^2 \left( 1 + \frac{4}{3} B\_1 \varepsilon\_k + \frac{5}{3} B\_2 \varepsilon\_k^2 + \frac{1}{18} \left( \beta\psi^{(3)}(\alpha) B\_1 + 36 B\_3 \right) \varepsilon\_k^3 + O(\varepsilon\_k^4) \right). \tag{20}$$

Using (19), we have

$$\begin{split} h &= \frac{B\_1}{3}\varepsilon\_k + \frac{1}{6}\left(\beta\psi^{(3)}(\alpha) - 4B\_1^2 + 4B\_2\right)\varepsilon\_k^2 + \frac{1}{54}\left(68B\_1^3 + 3B\_1\left(\beta\psi^{(3)}(\alpha) - 40B\_2\right) + 54B\_3\right)\varepsilon\_k^3 \\ &\quad - \frac{1}{2916}\left(6792B\_1^4 - 108B\_1^2\left(159B\_2 + 2\beta\psi^{(3)}(\alpha)\right) + 9072B\_1B\_3 - 27\left(-5(\beta\psi^{(3)}(\alpha))^2 + 6\beta\psi^{(3)}(\alpha)B\_2 - 192B\_2^2 + 144B\_4\right)\right)\varepsilon\_k^4 \\ &\quad + O(\varepsilon\_k^5). \end{split} \tag{21}$$

Developing weight function *G*(*h*) about origin by the Taylor series expansion,

$$G(h) \approx G(0) + hG'(0) + \frac{1}{2}h^2 G''(0) + \frac{1}{6}h^3 G'''(0). \tag{22}$$

By using (15)–(22) in the last step of (3), we have

$$\varepsilon\_{k+1} = -\frac{2G(0)}{3}\varepsilon\_k + \frac{1}{9}(3 + 2G(0) - 2G'(0))B\_1\varepsilon\_k^2 + \sum\_{n=1}^2 \phi\_n \varepsilon\_k^{n+2} + O(\varepsilon\_k^5),\tag{23}$$

where $\phi\_n = \phi\_n(\beta, B\_1, B\_2, B\_3, G(0), G'(0), G''(0), G'''(0))$, $n = 1, 2$.

To obtain fourth order convergence, it is sufficient to set the coefficients of $\varepsilon\_k$, $\varepsilon\_k^2$, and $\varepsilon\_k^3$ simultaneously equal to zero. This process yields

$$G(0) = 0, \; G'(0) = \frac{3}{2}, \; G''(0) = 9. \tag{24}$$

Then, error equation (23) is given by

$$\varepsilon\_{k+1} = -\frac{B\_1}{972} \left( 27 \beta \psi^{(3)}(\alpha) + 4 \left( G'''(0) - 99 \right) B\_1^2 + 108 B\_2 \right) \varepsilon\_k^4 + O(\varepsilon\_k^5). \tag{25}$$

Hence, the result is proved.

**Remark 1.** *We can observe from the above results that the number of conditions on $G(h)$ is 3 for each of the cases $\mu = 2, 3$ in order to attain fourth order convergence of the method* (3)*. These cases satisfy the common conditions $G(0) = 0$, $G'(0) = \frac{\mu}{2}$, $G''(0) = 3\mu$. Their error equations also contain terms involving the parameter $\beta$. However, for the cases $\mu \geq 4$, it has been seen that the error equation in each such case does not contain a $\beta$ term. We shall prove this fact in the next section.*
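For illustration (our own example, not one prescribed by the paper), the simplest polynomial weight function satisfying the common conditions above for every $\mu$ is

```latex
G(h) = \frac{\mu}{2}\,h + \frac{3\mu}{2}\,h^{2},
\qquad G(0) = 0,\quad G'(0) = \frac{\mu}{2},\quad G''(0) = 3\mu,\quad G'''(0) = 0.
```

Since $G'''(0) = 0$ for this choice, the $G'''(0)$ terms in the error equations vanish as well.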

### **3. Main Result**

We shall prove the convergence order of scheme (3) for the multiplicity *μ* ≥ 4 by the following theorem:

**Theorem 3.** *Under the assumptions of Theorem 1, the convergence order of scheme* (3) *for $\mu \geq 4$ is at least four, provided that $G(0) = 0$, $G'(0) = \frac{\mu}{2}$, $G''(0) = 3\mu$ and $|G'''(0)| < \infty$. Moreover, the error in the scheme is given by*

$$\varepsilon\_{k+1} = \frac{1}{6\mu^4} \left( (3\mu(19+\mu) - 2G^{\prime\prime\prime}(0))F\_1^3 - 6\mu^2 F\_1 F\_2 \right) \varepsilon\_k^4 + O(\varepsilon\_k^5),$$

*where $F\_m = \frac{\mu!}{(\mu+m)!}\,\frac{\psi^{(\mu+m)}(\alpha)}{\psi^{(\mu)}(\alpha)}$ for $m \in \mathbb{N}$.*

**Proof.** Taking into account that $\psi^{(i)}(\alpha) = 0$, $i = 0, 1, 2, \ldots, \mu - 1$ and $\psi^{(\mu)}(\alpha) \neq 0$, the Taylor series expansion of $\psi(u\_k)$ about $\alpha$ is

$$\psi(u\_k) = \frac{\psi^{(\mu)}(\alpha)}{\mu!} \varepsilon\_k^{\mu} \left( 1 + F\_1 \varepsilon\_k + F\_2 \varepsilon\_k^2 + F\_3 \varepsilon\_k^3 + F\_4 \varepsilon\_k^4 + \cdots \right). \tag{26}$$

Taylor expansion of *ψ*(*vk*) about *α* yields

$$\psi(v\_k) = \frac{\psi^{(\mu)}(\alpha)}{\mu!} \varepsilon\_{v\_k}^{\mu} \left( 1 + F\_1 \varepsilon\_{v\_k} + F\_2 \varepsilon\_{v\_k}^2 + F\_3 \varepsilon\_{v\_k}^3 + F\_4 \varepsilon\_{v\_k}^4 + \cdots \right), \tag{27}$$


where $\varepsilon\_{v\_k} = v\_k - \alpha = \varepsilon\_k + \frac{\beta\psi^{(\mu)}(\alpha)}{\mu!}\varepsilon\_k^{\mu}\left(1 + F\_1\varepsilon\_k + F\_2\varepsilon\_k^2 + F\_3\varepsilon\_k^3 + F\_4\varepsilon\_k^4 + \cdots\right)$. Using (26) and (27) in the first step of (3), we obtain

$$\varepsilon\_{z\_k} = \begin{cases} \frac{F\_1}{4} \varepsilon\_k^2 + \frac{1}{16} \left( 8F\_2 - 5F\_1^2 \right) \varepsilon\_k^3 + \frac{1}{64} \left( 4\beta\psi^{(4)}(\alpha) + 25F\_1^3 - 64F\_1F\_2 + 48F\_3 \right) \varepsilon\_k^4 + O(\varepsilon\_k^5), & \text{if } \mu = 4, \\ \frac{F\_1}{\mu} \varepsilon\_k^2 + \frac{1}{\mu^2} \left( 2\mu F\_2 - (1+\mu)F\_1^2 \right) \varepsilon\_k^3 + \frac{1}{\mu^3} \left( (1+\mu)^2 F\_1^3 - \mu(4+3\mu)F\_1F\_2 + 3\mu^2 F\_3 \right) \varepsilon\_k^4 + O(\varepsilon\_k^5), & \text{if } \mu \geq 5, \end{cases} \tag{28}$$

where *εzk* = *zk* − *α*.

Expansion of *ψ*(*zk*) around *α* yields

$$\psi(z\_k) = \frac{\psi^{(\mu)}(\alpha)}{\mu!} \varepsilon\_{z\_k}^{\mu} \left( 1 + F\_1 \varepsilon\_{z\_k} + F\_2 \varepsilon\_{z\_k}^2 + F\_3 \varepsilon\_{z\_k}^3 + F\_4 \varepsilon\_{z\_k}^4 + \cdots \right). \tag{29}$$

Using (26), (27) and (29), we have that

$$x\_k = \begin{cases} \frac{F\_1}{4}\varepsilon\_k + \frac{1}{8}\left(4F\_2 - 3F\_1^2\right)\varepsilon\_k^2 + \frac{1}{128}\left(8\beta\psi^{(4)}(\alpha) + 67F\_1^3 - 152F\_1F\_2 + 96F\_3\right)\varepsilon\_k^3 + \frac{1}{768}\left(-543F\_1^4 + 1740F\_1^2F\_2 + 4F\_1\left(11\beta\psi^{(4)}(\alpha) - 312F\_3\right) + 96\left(-7F\_2^2 + 8F\_4\right)\right)\varepsilon\_k^4 + O(\varepsilon\_k^5), & \text{if } \mu = 4, \\ \frac{F\_1}{5}\varepsilon\_k + \frac{1}{25}\left(10F\_2 - 7F\_1^2\right)\varepsilon\_k^2 + \frac{1}{125}\left(46F\_1^3 - 110F\_1F\_2 + 75F\_3\right)\varepsilon\_k^3 + \left(\frac{\beta\psi^{(5)}(\alpha)}{60} - \frac{294}{625}F\_1^4 + \frac{197}{125}F\_1^2F\_2 - \frac{16}{25}F\_2^2 - \frac{6}{5}F\_1F\_3 + \frac{4}{5}F\_4\right)\varepsilon\_k^4 + O(\varepsilon\_k^5), & \text{if } \mu = 5, \\ \frac{F\_1}{\mu}\varepsilon\_k + \frac{1}{\mu^2}\left(2\mu F\_2 - (2+\mu)F\_1^2\right)\varepsilon\_k^2 + \frac{1}{2\mu^3}\left((7+7\mu+2\mu^2)F\_1^3 - 2\mu(7+3\mu)F\_1F\_2 + 6\mu^2F\_3\right)\varepsilon\_k^3 - \frac{1}{6\mu^4}\left((34+51\mu+29\mu^2+6\mu^3)F\_1^4 - 6\mu(17+16\mu+4\mu^2)F\_1^2F\_2 + 12\mu^2(3+\mu)F\_2^2 + 12\mu^2(5+2\mu)F\_1F\_3\right)\varepsilon\_k^4 + O(\varepsilon\_k^5), & \text{if } \mu \geq 6, \end{cases} \tag{30}$$

and

$$y\_k = 1 + \frac{\beta\psi^{(\mu)}(\alpha)}{\mu!} \varepsilon\_k^{\mu - 1} \left( 1 + \frac{(\mu + 1)F\_1}{\mu} \varepsilon\_k + \frac{(\mu + 2)F\_2}{\mu} \varepsilon\_k^2 + \frac{(\mu + 3)F\_3}{\mu} \varepsilon\_k^3 + \frac{(\mu + 4)F\_4}{\mu} \varepsilon\_k^4 + O(\varepsilon\_k^5) \right). \tag{31}$$

Using (30), we obtain that

$$h = \begin{cases} \frac{F\_1}{4}\varepsilon\_k + \frac{1}{16}\left(8F\_2 - 7F\_1^2\right)\varepsilon\_k^2 + \frac{1}{128}\left(8\beta\psi^{(4)}(\alpha) + 93F\_1^3 - 184F\_1F\_2 + 96F\_3\right)\varepsilon\_k^3 + \left(-\frac{303}{256}F\_1^4 + \frac{213}{64}F\_1^2F\_2 - \frac{9}{8}F\_2^2 + F\_1\left(\frac{5}{192}\beta\psi^{(4)}(\alpha) - 2F\_3\right) + F\_4\right)\varepsilon\_k^4 + O(\varepsilon\_k^5), & \text{if } \mu = 4, \\ \frac{F\_1}{5}\varepsilon\_k + \frac{1}{25}\left(10F\_2 - 8F\_1^2\right)\varepsilon\_k^2 + \frac{1}{125}\left(61F\_1^3 - 130F\_1F\_2 + 75F\_3\right)\varepsilon\_k^3 + \left(-\frac{457}{625}F\_1^4 + \frac{11}{5}F\_1^2F\_2 - \frac{36}{25}F\_1F\_3 + \frac{1}{60}\left(\beta\psi^{(5)}(\alpha) - 48F\_2^2 + 48F\_4\right)\right)\varepsilon\_k^4 + O(\varepsilon\_k^5), & \text{if } \mu = 5, \\ \frac{F\_1}{\mu}\varepsilon\_k + \frac{1}{\mu^2}\left(2\mu F\_2 - (3+\mu)F\_1^2\right)\varepsilon\_k^2 + \frac{1}{2\mu^3}\left((17+11\mu+2\mu^2)F\_1^3 - 2\mu(11+3\mu)F\_1F\_2 + 6\mu^2F\_3\right)\varepsilon\_k^3 - \frac{1}{6\mu^4}\left((142+135\mu+47\mu^2+6\mu^3)F\_1^4 - 6\mu(45+26\mu+4\mu^2)F\_1^2F\_2 + 12\mu^2(5+\mu)F\_2^2 + 24\mu^2(4+\mu)F\_1F\_3\right)\varepsilon\_k^4 + O(\varepsilon\_k^5), & \text{if } \mu \geq 6. \end{cases} \tag{32}$$

Expanding the weight function $G(h)$ about the origin by a Taylor series, we have

$$G(h) \approx G(0) + hG'(0) + \frac{1}{2}h^2 G''(0) + \frac{1}{6}h^3 G'''(0). \tag{33}$$

Using (26)–(33) in the last step of (3), we get

$$\varepsilon\_{k+1} = -\frac{2G(0)}{\mu}\varepsilon\_k + \frac{1}{\mu^2} \left( (2G(0) - 2G'(0) + \mu)F\_1 \right) \varepsilon\_k^2 + \sum\_{n=1}^2 \chi\_n \varepsilon\_k^{n+2} + O(\varepsilon\_k^5),\tag{34}$$

where $\chi_n = \chi_n(\beta, F_1, F_2, F_3, G(0), G'(0), G''(0), G'''(0))$ when $\mu = 4, 5$ and $\chi_n = \chi_n(F_1, F_2, F_3, G(0), G'(0), G''(0), G'''(0))$ when $\mu \geq 6$, for $n = 1, 2$.

The fourth order convergence can be attained if we set the coefficients of $\varepsilon_k$, $\varepsilon_k^2$, and $\varepsilon_k^3$ simultaneously equal to zero. Then, the resulting equations yield

$$G(0) = 0, \; G'(0) = \frac{\mu}{2}, \; G''(0) = 3\mu. \tag{35}$$

As a result, the error equation is given by

$$\varepsilon\_{k+1} = \frac{1}{6\mu^4} \left( (3\mu(19+\mu) - 2G'''(0))F\_1^3 - 6\mu^2 F\_1 F\_2 \right) \varepsilon\_k^4 + O(\varepsilon\_k^5). \tag{36}$$

This proves the result.

**Remark 2.** *The proposed scheme* (3) *achieves fourth-order convergence under the conditions on the weight function G(h) shown in Theorems 1–3. This convergence rate is attained by using only three functional evaluations, viz. $\psi(u_k)$, $\psi(v_k)$, and $\psi(z_k)$, per iteration. Therefore, the iterative scheme* (3) *is optimal according to the Kung–Traub conjecture [17].*

**Remark 3.** *Note that the parameter $\beta$, which is used in $v_k$, appears only in the error equations of the cases $\mu = 2, 3$ but not for $\mu \geq 4$ (see Equation* (36)*). However, for $\mu \geq 4$, we have observed that this parameter appears in the terms of $\varepsilon_k^5$ and higher order. Such terms are difficult to compute in general; however, we do not need them in order to show the required fourth order of convergence. Note also that Theorems 1–3 are presented to show the difference in the error expressions. Nevertheless, the weight function $G(h)$ satisfies the common conditions $G(0) = 0$, $G'(0) = \frac{\mu}{2}$, $G''(0) = 3\mu$ for every $\mu \geq 2$.*

### *Some Special Cases*

Based on various forms of function *G*(*h*) that satisfy the conditions of Theorem 3, numerous special cases of the family (3) can be explored. The following are some simple forms:

$$\begin{aligned} \text{(1)} \quad &G(h) = \frac{\mu h(1 + 3h)}{2}, \qquad &\text{(2)} \quad &G(h) = \frac{\mu h}{2 - 6h}, \\ \text{(3)} \quad &G(h) = \frac{\mu h(\mu - 2h)}{2(\mu - (2 + 3\mu)h + 2\mu h^2)}, \qquad &\text{(4)} \quad &G(h) = \frac{\mu h(3 - h)}{6 - 20h}. \end{aligned}$$
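Each of these forms can be checked against the conditions of Theorem 3 numerically; the following sketch (the value μ = 5 and the finite-difference step are arbitrary choices for the check) verifies G(0) = 0, G'(0) = μ/2, and G''(0) = 3μ by central differences:

```python
# Verify G(0) = 0, G'(0) = mu/2, G''(0) = 3*mu for the four weight functions
# via central finite differences (mu = 5 is an arbitrary choice).
mu = 5.0

G_forms = [
    lambda h: mu * h * (1 + 3 * h) / 2,                                               # (1)
    lambda h: mu * h / (2 - 6 * h),                                                   # (2)
    lambda h: mu * h * (mu - 2 * h) / (2 * (mu - (2 + 3 * mu) * h + 2 * mu * h**2)),  # (3)
    lambda h: mu * h * (3 - h) / (6 - 20 * h),                                        # (4)
]

d = 1e-5  # finite-difference step
for G in G_forms:
    G0 = G(0.0)
    G1 = (G(d) - G(-d)) / (2 * d)            # central first derivative
    G2 = (G(d) - 2 * G0 + G(-d)) / d**2      # central second derivative
    assert abs(G0) < 1e-12
    assert abs(G1 - mu / 2) < 1e-6
    assert abs(G2 - 3 * mu) < 1e-4
print("all four forms satisfy G(0)=0, G'(0)=mu/2, G''(0)=3*mu")
```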

The method corresponding to each of the above forms can be expressed as follows. Method 1 (M1):

$$u_{k+1} = z_k - \frac{\mu h(1+3h)}{2} \left(1 + \frac{1}{y_k}\right) \frac{\psi(u_k)}{\psi[v_k, u_k]}.$$

Method 2 (M2) :

$$
u_{k+1} = z_k - \frac{\mu h}{2 - 6h} \left( 1 + \frac{1}{y_k} \right) \frac{\psi(u_k)}{\psi[v_k, u_k]}.
$$

Method 3 (M3) :

$$
u_{k+1} = z_k - \frac{\mu h(\mu - 2h)}{2(\mu - (2 + 3\mu)h + 2\mu h^2)} \left( 1 + \frac{1}{y_k} \right) \frac{\psi(u_k)}{\psi[v_k, u_k]}.
$$

Method 4 (M4) :

$$
u_{k+1} = z_k - \frac{\mu h(3 - h)}{6 - 20h} \left( 1 + \frac{1}{y_k} \right) \frac{\psi(u_k)}{\psi[v_k, u_k]}.
$$

Note that, in all the above cases, *zk* has the following form:

$$z_k = u_k - \mu \frac{\psi(u_k)}{\psi[v_k, u_k]}.$$

### **4. Basins of Attraction**

In this section, we investigate the complex geometry of the methods considered above with a graphical tool, namely the basin of attraction, by applying the methods to some complex polynomials $\psi(z)$. The basin of attraction of a root is an important geometrical tool for comparing the convergence regions of iterative methods [21–23]. To begin, let us recall some basic ideas concerning this tool.

Let $R : \widehat{\mathbb{C}} \to \widehat{\mathbb{C}}$ be a rational map on the Riemann sphere. We define the orbit of a point $z_0 \in \widehat{\mathbb{C}}$ as the set $\{z_0, R(z_0), R^2(z_0), \ldots, R^n(z_0), \ldots\}$. A point $z_0 \in \widehat{\mathbb{C}}$ is a fixed point of the rational function $R$ if it satisfies the equation $R(z_0) = z_0$. A point $z_0$ is said to be periodic with period $m > 1$ if $R^m(z_0) = z_0$, where $m$ is the smallest such integer. A fixed point $z_0$ is called attracting if $|R'(z_0)| < 1$, repelling if $|R'(z_0)| > 1$, neutral if $|R'(z_0)| = 1$, and super-attracting if $|R'(z_0)| = 0$. Assume that $z_\psi^*$ is an attracting fixed point of the rational map $R$. Then, the basin of attraction of $z_\psi^*$ is defined as

$$A(z_\psi^*) = \{ z_0 \in \mathbb{C} : R^n(z_0) \to z_\psi^*, \; n \to \infty \}.$$

The set of points whose orbits tend to an attracting fixed point $z_\psi^*$ is called the Fatou set. The complementary set, called the Julia set, is the closure of the set of repelling fixed points; it establishes the boundaries between the basins of the roots. Attraction basins allow us to assess which starting points converge to a given root of a polynomial when we apply an iterative method, so we can visualize which points are good choices as starting points and which are not.

We select $z_0$ as the initial point belonging to $D$, where $D$ is a rectangular region in $\mathbb{C}$ containing all the roots of the equation $\psi(z) = 0$. An iterative method starting from a point $z_0 \in D$ may converge to a zero of the function $\psi(z)$ or may diverge. To assess the basins, we consider $10^{-3}$ as the stopping tolerance for convergence, restricted to 25 iterations. If this tolerance is not achieved within the allowed iterations, the procedure is terminated and the iteration started from $z_0$ is declared divergent. While drawing the basins, the following criterion is adopted: a color is allotted to every initial guess $z_0$ in the attraction basin of a zero. If the iterative formula that begins at point $z_0$ converges, then it forms the basin of attraction with that assigned color; if the formula fails to converge in the required number of iterations, then the point is painted black.
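The basin-drawing procedure just described can be sketched in a few lines. The sketch below is a simplified stand-in: it uses the classical modified Newton step $z - \mu\psi(z)/\psi'(z)$ instead of methods M1–M4 (whose full definitions involve quantities of scheme (3) not repeated here), on the test polynomial $\psi_1(z) = (z^2 + z + 1)^2$ over a coarse grid; the grid size is an arbitrary choice:

```python
# Classify a grid of starting points by the root they converge to, as in the
# basin-of-attraction figures. Stand-in iteration: modified Newton for
# multiple roots, z <- z - mu*psi(z)/psi'(z), on psi_1(z) = (z^2 + z + 1)^2.
mu = 2
roots = [complex(-0.5, -0.866025), complex(-0.5, 0.866025)]

def psi(z):  return (z * z + z + 1) ** 2
def dpsi(z): return 2 * (z * z + z + 1) * (2 * z + 1)

def classify(z, max_iter=25, tol=1e-3):
    """Return the index of the root reached from z, or -1 on divergence."""
    for _ in range(max_iter):
        d = dpsi(z)
        if d == 0:
            return -1
        z_new = z - mu * psi(z) / d
        if abs(z_new - z) < tol:   # stopping tolerance from the text
            z = z_new
            break
        z = z_new
    for i, r in enumerate(roots):
        if abs(z - r) < 1e-2:
            return i
    return -1                      # would be painted black in the figures

n = 21  # 21 x 21 grid over the square [-2, 2] x [-2, 2]
counts = {-1: 0, 0: 0, 1: 0}
for p in range(n):
    for q in range(n):
        z0 = complex(-2 + 4 * p / (n - 1), -2 + 4 * q / (n - 1))
        counts[classify(z0)] += 1
print(counts)  # only the real-axis starting points fail to converge
```

For this polynomial the divergent (black) set is exactly the real axis, since a real starting point can never reach either complex root.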

To view the complex dynamics, the proposed methods are applied to the following three problems:

**Test problem 1**. Consider the polynomial $\psi_1(z) = (z^2 + z + 1)^2$ having two zeros $\{-0.5 - 0.866025i,\; -0.5 + 0.866025i\}$, each of multiplicity $\mu = 2$. The attraction basins for this polynomial are shown in Figures 1–3, corresponding to the choices $\beta = 0.01, 10^{-4}, 10^{-6}$. A color is assigned to each basin of attraction of a zero; in particular, red and green have been allocated to the basins of attraction of the zeros $-0.5 - 0.866025i$ and $-0.5 + 0.866025i$, respectively.

**Figure 3.** Basins of attraction by M-1–M-4 (*β* = 10−6) for polynomial *ψ*1(*z*).

**Test problem 2**. Consider the polynomial $\psi_2(z) = \left(z^3 + \frac{1}{4}z\right)^3$, which has three zeros $\{-\frac{i}{2}, \frac{i}{2}, 0\}$, each of multiplicity $\mu = 3$. Basins of attraction assessed by the methods for this polynomial are drawn in Figures 4–6, corresponding to the choices $\beta = 0.01, 10^{-4}, 10^{-6}$. The basin of each zero is identified by the color assigned to it; for example, green, red, and blue have been assigned to $-\frac{i}{2}$, $\frac{i}{2}$, and $0$, respectively.

**Figure 4.** Basins of attraction by M-1–M-4 (*β* = 0.01) for polynomial *ψ*2(*z*).










**Figure 5.** Basins of attraction by M-1–M-4 (*β* = 10−4) for polynomial *ψ*2(*z*).

**Figure 6.** Basins of attraction by methods M-1–M-4 (*β* = 10−6) for polynomial *ψ*2(*z*).

**Test problem 3**. Next, let us consider the polynomial $\psi_3(z) = \left(z^4 + 1\right)^4$, which has four zeros $\{-0.707107 + 0.707107i,\; -0.707107 - 0.707107i,\; 0.707107 + 0.707107i,\; 0.707107 - 0.707107i\}$, each of multiplicity $\mu = 4$. The basins of attraction of the zeros are shown in Figures 7–9 for the choices of the parameter $\beta = 0.01, 10^{-4}, 10^{-6}$. A color is assigned to each basin of attraction of a zero; in particular, we assign yellow, blue, red, and green to $-0.707107 + 0.707107i$, $-0.707107 - 0.707107i$, $0.707107 + 0.707107i$, and $0.707107 - 0.707107i$, respectively.

**Figure 9.** Basins of attraction by M-1–M-4 (*β* = 10−6) for polynomial *ψ*3(*z*).

The choice of $\beta$ plays an important role in selecting those members of family (3) which possess good convergence behavior; this is also the reason why different values of $\beta$ have been chosen to assess the basins. The above graphics clearly indicate that the basins become wider for smaller values of the parameter $\beta$. Moreover, the black zones (used to indicate divergence) also diminish as $\beta$ assumes smaller values. Thus, we conclude this section with the remark that the convergence of the proposed methods improves for smaller values of the parameter $\beta$.

### **5. Numerical Results**

In order to validate the theoretical results shown in the previous sections, the new methods M1, M2, M3, and M4 are tested numerically by implementing them on some nonlinear equations. Moreover, they are compared with some existing optimal fourth order Newton-like methods. In particular, we consider the methods by Li–Liao–Cheng [7], Li–Cheng–Neta [8], Sharma–Sharma [9], Zhou–Chen–Song [10], Soleymani–Babajee–Lotfi [12], and Kansal–Kanwar–Bhatia [14]. The methods are expressed as follows:

Li–Liao–Cheng method (LLCM):

$$\begin{aligned} z_k &= u_k - \frac{2\mu}{\mu + 2} \frac{\psi(u_k)}{\psi'(u_k)},\\ u_{k+1} &= u_k - \frac{\mu(\mu - 2) \left(\frac{\mu}{\mu + 2}\right)^{-\mu} \psi'(z_k) - \mu^2 \psi'(u_k)}{\psi'(u_k) - \left(\frac{\mu}{\mu + 2}\right)^{-\mu} \psi'(z_k)} \, \frac{\psi(u_k)}{2\psi'(u_k)}. \end{aligned}$$

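As an illustration, the LLCM step can be implemented directly from the two formulas above; the test function $(x-2)^3(x+1)$, the multiplicity $\mu = 3$, and the starting point are arbitrary choices for this sketch:

```python
# One iteration of the Li-Liao-Cheng method (LLCM), transcribed from the
# formulas above. Test function (x - 2)^3 (x + 1) has the root 2 with
# multiplicity mu = 3; the starting point 2.1 is an arbitrary choice.
def llcm_step(u, f, df, mu):
    z = u - (2 * mu / (mu + 2)) * f(u) / df(u)
    t = (mu / (mu + 2)) ** (-mu)
    num = mu * (mu - 2) * t * df(z) - mu ** 2 * df(u)
    den = df(u) - t * df(z)
    return u - (num / den) * f(u) / (2 * df(u))

f  = lambda x: (x - 2) ** 3 * (x + 1)
df = lambda x: (x - 2) ** 2 * (4 * x + 1)   # derivative of f

u, mu = 2.1, 3
for _ in range(10):
    if f(u) == 0:      # already at the root to machine precision
        break
    u = llcm_step(u, f, df, mu)
print(u)  # converges to the multiple root 2
assert abs(u - 2) < 1e-8
```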

Li–Cheng–Neta method (LCNM):

$$\begin{aligned} z_k &= u_k - \frac{2\mu}{\mu + 2} \frac{\psi(u_k)}{\psi'(u_k)}, \\ u_{k+1} &= u_k - \alpha_1 \frac{\psi(u_k)}{\psi'(z_k)} - \frac{\psi(u_k)}{\alpha_2 \psi'(u_k) + \alpha_3 \psi'(z_k)}, \end{aligned}$$

where

$$\begin{aligned} \alpha_1 &= -\frac{1}{2} \frac{\left(\frac{\mu}{\mu+2}\right)^{\mu} \mu (\mu^4 + 4\mu^3 - 16\mu - 16)}{\mu^3 - 4\mu + 8}, \\ \alpha_2 &= -\frac{(\mu^3 - 4\mu + 8)^2}{\mu (\mu^4 + 4\mu^3 - 4\mu^2 - 16\mu + 16)(\mu^2 + 2\mu - 4)}, \\ \alpha_3 &= \frac{\mu^2 (\mu^3 - 4\mu + 8)}{\left(\frac{\mu}{\mu+2}\right)^{\mu} (\mu^4 + 4\mu^3 - 4\mu^2 - 16\mu + 16)(\mu^2 + 2\mu - 4)}. \end{aligned}$$

Sharma–Sharma method (SSM):

$$\begin{split} z_{k} &= u_{k} - \frac{2\mu}{\mu+2} \frac{\psi(u_{k})}{\psi'(u_{k})}, \\ u_{k+1} &= u_{k} - \frac{\mu}{8} \left[ (\mu^{3} - 4\mu + 8) - (\mu + 2)^{2} \left(\frac{\mu}{\mu+2}\right)^{\mu} \frac{\psi'(u_{k})}{\psi'(z_{k})} \right. \\ &\quad \left. \times \left( 2(\mu - 1) - (\mu + 2) \left(\frac{\mu}{\mu+2}\right)^{\mu} \frac{\psi'(u_{k})}{\psi'(z_{k})} \right) \right] \frac{\psi(u_{k})}{\psi'(u_{k})}. \end{split}$$

*Mathematics* **2020**, *8*, 1091

Zhou–Chen–Song method (ZCSM):

$$\begin{split} z\_{k} &= u\_{k} - \frac{2\mu}{\mu+2} \frac{\psi(u\_{k})}{\psi'(u\_{k})}, \\ u\_{k+1} &= u\_{k} - \frac{\mu}{8} \Big[\mu^{3} \left(\frac{\mu+2}{\mu}\right)^{2\mu} \left(\frac{\psi'(z\_{k})}{\psi'(u\_{k})}\right)^{2} - 2\mu^{2} (\mu+3) \left(\frac{\mu+2}{\mu}\right)^{\mu} \frac{\psi'(z\_{k})}{\psi'(u\_{k})} \\ &\quad + (\mu^{3} + 6\mu^{2} + 8\mu + 8) \Big] \frac{\psi(u\_{k})}{\psi'(u\_{k})}. \end{split}$$

Soleymani–Babajee–Lotfi method (SBLM):

$$\begin{aligned} z_k &= u_k - \frac{2\mu}{\mu + 2} \frac{\psi(u_k)}{\psi'(u_k)}, \\ u_{k+1} &= u_k - \frac{\psi'(z_k)\psi(u_k)}{q_1(\psi'(z_k))^2 + q_2\psi'(z_k)\psi'(u_k) + q_3(\psi'(u_k))^2}, \end{aligned}$$

where $q_1 = \frac{1}{\mu^{3-\mu}(\mu+2)^{\mu}}$, $q_2 = \frac{8^{-\mu}(\mu+2)(\mu^2-2)}{\mu}$, $q_3 = \frac{1}{(\mu-2)\mu^{\mu-1}(\mu+2)^{3-\mu}}$.

Kansal–Kanwar–Bhatia method (KKBM):

$$\begin{aligned} z_k &= u_k - \frac{2\mu}{\mu + 2} \frac{\psi(u_k)}{\psi'(u_k)},\\ u_{k+1} &= u_k - \frac{\mu}{4} \psi(u_k) \left( 1 + \frac{\mu^4 p^{-2\mu} \left( p^{\mu - 1} - \frac{\psi'(z_k)}{\psi'(u_k)} \right)^2 (p^{\mu} - 1)}{8(2p^{\mu} + \mu(p^{\mu} - 1))} \right) \\ &\quad \times \left( \frac{4 - 2\mu + \mu^2 (p^{-\mu} - 1)}{\psi'(u_k)} - \frac{p^{-\mu} (2p^{\mu} + \mu (p^{\mu} - 1))^2}{\psi'(u_k) - \psi'(z_k)} \right), \end{aligned}$$

where $p = \frac{\mu}{\mu+2}$.

Computations are performed in the software package Mathematica [20] on a PC with the following specifications: Intel(R) Pentium(R) CPU B960 @ 2.20 GHz (32-bit operating system), Microsoft Windows 7 Professional, and 4 GB RAM. Numerical tests are performed by choosing the value −0.01 for the parameter $\beta$ in the new methods. The tabulated results of the methods displayed in Table 1 include: (i) the number of iterations ($k$) required to obtain the desired solution satisfying the condition $|u_{k+1} - u_k| + |\psi(u_k)| < 10^{-100}$, (ii) the estimated error $|u_{k+1} - u_k|$ in the first three consecutive iterations, (iii) the calculated convergence order (CCO), and (iv) the time consumed (CPU time in seconds) in the execution of a program, measured by the command "TimeUsed[ ]". The calculated convergence order (CCO) is computed by the well-known formula (see [24])

$$\text{CCO} = \frac{\log| (u\_{k+2} - a) / (u\_{k+1} - a) |}{\log| (u\_{k+1} - a) / (u\_k - a) |}, \quad \text{for each } k = 1, 2, \dots \tag{37}$$

where $a$ denotes the exact root.
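The use of (37) can be illustrated on a sequence produced by the classical modified Newton method $u - \mu\psi(u)/\psi'(u)$, a second-order scheme used here only because its iterates are easy to reproduce; the test function and starting point are arbitrary choices:

```python
import math

# Calculated convergence order (CCO) from three consecutive iterates, Eq. (37).
def cco(u1, u2, u3, a):
    return math.log(abs((u3 - a) / (u2 - a))) / math.log(abs((u2 - a) / (u1 - a)))

# Sample sequence: modified Newton u <- u - mu*f(u)/f'(u) on f = (x-1)^3 (x+2),
# root a = 1 with multiplicity mu = 3; this scheme is second order.
f  = lambda x: (x - 1) ** 3 * (x + 2)
df = lambda x: (x - 1) ** 2 * (4 * x + 5)   # derivative of f
u, mu, a = 2.0, 3, 1.0

iterates = []
for _ in range(3):
    u = u - mu * f(u) / df(u)
    iterates.append(u)

order = cco(iterates[0], iterates[1], iterates[2], a)
print(round(order, 2))  # close to the theoretical order 2
assert 1.6 < order < 2.4
```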


**Table 1.** Comparison of numerical results.

The problems considered for numerical testing are shown in Table 2.


**Table 2.** Test functions.

From the computed results in Table 1, we can observe the good convergence behavior of the proposed methods. The reason for this is the increase in accuracy of the successive approximations, as is evident from the values of the differences $|u_{k+1} - u_k|$; this also indicates the stable nature of the methods. Moreover, the approximations to the solutions computed by the proposed methods have accuracy greater than or equal to that of the existing counterparts. The value 0 of $|u_{k+1} - u_k|$ indicates that the stopping criterion $|u_{k+1} - u_k| + |\psi(u_k)| < 10^{-100}$ has been satisfied at this stage. From the calculated convergence order shown in the second-to-last column of each table, we have verified the theoretical fourth order of convergence. The robustness of the new algorithms can also be judged from the fact that their CPU time is less than that of the existing techniques. This conclusion is also confirmed by similar numerical experiments on many other problems.

### **6. Conclusions**

We have proposed a family of fourth order derivative-free numerical methods for obtaining multiple roots of nonlinear equations. Analysis of the convergence has been carried out under standard assumptions, which proves the convergence order four. The important feature of our designed scheme is its optimal order of convergence, which is rare to achieve in derivative-free methods. Some special cases of the family have been explored and employed to solve some nonlinear equations. Their performance is compared with existing techniques of a similar nature. The numerical results have shown the presented derivative-free methods to be good competitors to the established optimal fourth order techniques that use derivative information in the algorithm. We conclude this work with the remark that the proposed derivative-free methods can be a better alternative to existing Newton-type methods when derivatives are costly to evaluate.

**Author Contributions:** Methodology, J.R.S.; Writing—review & editing, J.R.S.; Investigation, S.K.; Data Curation, S.K.; Conceptualization, L.J.; Formal analysis, L.J. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **Finite Integration Method with Shifted Chebyshev Polynomials for Solving Time-Fractional Burgers' Equations**

### **Ampol Duangpan 1, Ratinan Boonklurb 1,\* and Tawikan Treeyaprasert <sup>2</sup>**


Received: 21 October 2019; Accepted: 3 December 2019; Published: 7 December 2019

**Abstract:** The Burgers' equation is one of the nonlinear partial differential equations that has been studied by many researchers, especially in terms of fractional derivatives. In this article, numerical algorithms are devised to obtain approximate solutions of time-fractional Burgers' equations in both one and two dimensions, as well as of time-fractional coupled Burgers' equations, whose fractional derivatives are described in the Caputo sense. The proposed algorithms are constructed by applying the finite integration method combined with shifted Chebyshev polynomials to handle the spatial discretizations, and the forward difference quotient to handle the temporal discretizations. Moreover, numerical examples demonstrate the ability of the proposed method to produce accurate approximate solutions. The rate of convergence and computational cost for each example are also presented.

**Keywords:** finite integration method; shifted Chebyshev polynomial; Caputo fractional derivative; Burgers' equation; coupled Burgers' equation

### **1. Introduction**

Fractional calculus has received much attention because several real-world phenomena can be modeled successfully using it. More specifically, fractional differential equations (FDEs) are the generalized form of integer order differential equations. Applications of FDEs have been emerging in many fields of science and engineering, such as diffusion processes [1], thermal conductivity [2], oscillating dynamical systems [3], rheological models [4], quantum models [5], etc. One of the interesting problems among the FDEs is the fractional Burgers' equation. It appears in many areas of applied mathematics and can describe various kinds of phenomena, such as mathematical models of turbulence and shock wave traveling, the formation and decay of nonplanar shock waves at the velocity fluctuation of sound, and the physical process of unidirectional propagation of weakly nonlinear acoustic waves through a gas-filled pipe; see [6–8]. In order to understand these phenomena, as well as to apply them in practice, it is important to find their solutions. Some powerful numerical methods have been developed for solving the fractional Burgers' equation, such as finite difference methods (FDM) [9], the Adomian decomposition method [10], and the finite volume method [11]. Moreover, in 2015, Esen and Tasbozan [12] gave a numerical solution of the time-fractional Burgers' equation by assuming that the solution *u*(*x*, *t*) can be approximated by a linear combination of products of two functions, one of which involves only *x* and the other only *t*. Recently, Yokus and Kaya [13] used the FDM to find a numerical solution of the time-fractional Burgers' equation; however, their results were less accurate. In 2017,

Cao et al. [14] studied the solution of the two-dimensional time-fractional Burgers' equation with high and low Reynolds numbers using the discontinuous Galerkin method; however, that method involves triangulations of the domain, which usually causes difficulty in devising a computational program. There are further numerical studies on time- and/or space-fractional Burgers' equations in the literature.

In this article, we present a numerical technique based on the finite integration method (FIM) for solving time-fractional Burgers' equations and time-fractional coupled Burgers' equations. The FIM is one of the interesting numerical methods for solving partial differential equations (PDEs). The idea of the FIM is to transform the given PDE into an equivalent integral equation and then apply numerical integration to solve that integral equation. It is known that numerical integration is very insensitive to round-off errors, while numerical differentiation is very sensitive to them, because numerical differentiation involves division by a small step-size, whereas numerical integration involves multiplication by a small step-size.
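This sensitivity contrast is easy to observe numerically; the sketch below (with an arbitrary test function and step sizes) compares a forward-difference derivative, whose error is dominated by round-off for very small steps, with a composite trapezoidal integral, which shows no comparable blow-up:

```python
import math

# Numerical differentiation: forward difference of sin at x = 1.
exact_deriv = math.cos(1.0)
err = lambda h: abs((math.sin(1.0 + h) - math.sin(1.0)) / h - exact_deriv)

# h = 1e-6 is near-optimal; at h = 1e-12 round-off dominates and the error
# is orders of magnitude larger despite the "smaller" step.
assert err(1e-6) < 1e-5
assert err(1e-12) > 1e-8

# Numerical integration: composite trapezoidal rule for int_0^1 sin(x) dx.
def trapezoid(f, a, b, n):
    h = (b - a) / n
    return h * (0.5 * f(a) + 0.5 * f(b) + sum(f(a + i * h) for i in range(1, n)))

exact_int = 1.0 - math.cos(1.0)
err_int = abs(trapezoid(math.sin, 0.0, 1.0, 1000) - exact_int)
assert err_int < 1e-6   # small step-size causes no comparable blow-up
print(err(1e-6), err(1e-12), err_int)
```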

The FIM was first proposed by Wen et al. [15], who constructed integration matrices based on the trapezoidal rule and radial basis functions for solving one-dimensional linear PDEs; Li et al. [16] then developed it further to handle two-dimensional problems. After that, the FIM was improved using three numerical quadratures (Simpson's rule, Newton–Cotes, and Lagrange interpolation), as presented by Li et al. [17]. The FIM has been successfully applied to various kinds of PDEs, and comparisons with several existing methods have verified that it offers a very stable, highly accurate, and efficient approach; see [18–20]. In 2018, Boonklurb et al. [21] modified the original FIM via Chebyshev polynomials for solving linear PDEs, which provided much higher accuracy than the FDM and the traditional FIMs. However, the modified FIM of [21] has never been applied to Burgers' equations and coupled Burgers' equations involving fractional-order derivatives with respect to time. This is the major motivation for the present work.

In this paper, we improve the modified FIM of [21] by using shifted Chebyshev polynomials (FIM-SCP) to devise numerical algorithms for finding accurate approximate solutions of time-fractional Burgers' equations in both one- and two-dimensional domains, as well as of time-fractional coupled Burgers' equations. Their time-fractional derivative terms are described in the Caputo sense. We note here that the FIM of [21] is applicable to linear differential equations; with the improvement made in this paper, we propose numerical methods that are applicable to time-fractional Burgers' equations. It is well known that Chebyshev polynomials have the orthogonality property, which plays an important role in approximation theory. The roots of a Chebyshev polynomial can be found explicitly, and when equidistant nodes behave badly, the problem can be overcome by using Chebyshev nodes: if we sample our function at the Chebyshev nodes, we obtain a near-best approximation under the maximum norm; see [22] for more details. With these advantages, our improved FIM-SCP is constructed by approximating the solution expressed in terms of a shifted Chebyshev expansion, and we use the zeros of the Chebyshev polynomial of a certain degree to interpolate the approximate solution. In this work, we obtain the shifted Chebyshev integration matrices in one- and two-dimensional spaces, which are used to handle the spatial discretizations. The temporal discretizations are approximated by the forward difference quotient.

The rest of this paper is organized as follows. In Section 2, we provide the basic definitions and the necessary notations used throughout this paper. In Section 3, the improved FIM-SCP of constructing the shifted Chebyshev integration matrices, both for one and two dimensions are discussed. In Section 4, we derive the numerical algorithms for solving one-dimensional time-fractional Burgers' equations, two-dimensional time-fractional Burgers' equations, and time-fractional coupled Burgers' equations. The numerical results are presented, which are also shown to be more computationally efficient and accurate than the other methods with CPU time(s) and rate of convergence. The conclusion and some discussion for the future work are provided in Section 5.

### **2. Preliminaries**

Before embarking on the details of the FIM-SCP for solving time-fractional differential equations, we provide in this section the basic definitions of fractional derivatives and shifted Chebyshev polynomials. The necessary notations and some important facts used throughout this paper are also given. More details on the basic results of fractional calculus can be found in [23], and further details on Chebyshev polynomials can be found in [22].

**Definition 1.** *Let p*, *μ*, *and t be real numbers such that t* > 0*, and*

$$\mathcal{C}_{\mu} = \left\{ u(t) \mid u(t) = t^p u_1(t), \text{ where } u_1(t) \in C[0, \infty) \text{ and } p > \mu \right\}.$$

*If an integrable function $u(t) \in \mathcal{C}_\mu$, we define the Riemann–Liouville fractional integral operator of order $\alpha \geq 0$ as*

$$I^{\alpha}u(t) = \begin{cases} \frac{1}{\Gamma(\alpha)} \int_0^t \frac{u(s)}{(t-s)^{1-\alpha}}\, ds & \text{for } \alpha > 0, \\ u(t) & \text{for } \alpha = 0, \end{cases}$$

*where* Γ(·) *is the well-known Gamma function.*

**Definition 2.** *The Caputo fractional derivative $D^{\alpha}$ of $u(t) \in \mathcal{C}_{-1}^{m}$, where $u(t) \in \mathcal{C}_{\mu}^{m}$ if and only if $u^{(m)} \in \mathcal{C}_{\mu}$, is defined by*

$$D^{\alpha} u(t) = I^{m-\alpha} D^m u(t) = \begin{cases} \frac{1}{\Gamma(m-\alpha)} \int_0^t \frac{u^{(m)}(s)}{(t-s)^{1-m+\alpha}}\, ds & \text{for } \alpha \in (m-1, m), \\ u^{(m)}(t) & \text{for } \alpha = m, \end{cases}$$

*where m* <sup>∈</sup> <sup>N</sup> *and t* <sup>&</sup>gt; <sup>0</sup>*.*
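Definitions 1 and 2 can be checked against the closed forms $I^{\alpha} t^{p} = \frac{\Gamma(p+1)}{\Gamma(p+1+\alpha)} t^{p+\alpha}$ and $D^{\alpha} t^{p} = \frac{\Gamma(p+1)}{\Gamma(p+1-\alpha)} t^{p-\alpha}$. The sketch below evaluates both integrals for $\alpha = 1/2$ after the substitution $v = \sqrt{t-s}$, which removes the kernel singularity; the test function $u(t) = t^2$ and the node count are arbitrary choices:

```python
import math

def trapezoid(f, a, b, n=2000):
    h = (b - a) / n
    return h * (0.5 * f(a) + 0.5 * f(b) + sum(f(a + i * h) for i in range(1, n)))

alpha, t = 0.5, 1.0
u  = lambda s: s ** 2      # test function (arbitrary choice)
du = lambda s: 2 * s       # its first derivative (m = 1 since alpha in (0,1))

# Substitution v = sqrt(t - s) turns the singular kernel (t - s)^(-1/2) into a
# smooth integrand: int_0^t g(s)(t-s)^(-1/2) ds = 2 int_0^sqrt(t) g(t - v^2) dv.
weak = lambda g: 2 * trapezoid(lambda v: g(t - v * v), 0.0, math.sqrt(t))

I_half = weak(u)  / math.gamma(alpha)        # Riemann-Liouville I^(1/2) u(t)
D_half = weak(du) / math.gamma(1 - alpha)    # Caputo D^(1/2) u(t), m = 1

# Closed forms for u(t) = t^2 at t = 1
I_exact = math.gamma(3) / math.gamma(3 + alpha) * t ** (2 + alpha)
D_exact = math.gamma(3) / math.gamma(3 - alpha) * t ** (2 - alpha)

assert abs(I_half - I_exact) < 1e-6
assert abs(D_half - D_exact) < 1e-6
print(I_half, D_half)
```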

**Definition 3.** *The shifted Chebyshev polynomial of degree n* <sup>≥</sup> <sup>0</sup> *for L* <sup>∈</sup> <sup>R</sup><sup>+</sup> *is defined by*

$$T\_n^\*(\mathbf{x}) = \cos\left(n \arccos\left(\frac{2\mathbf{x}}{L} - 1\right)\right) \text{ for } \mathbf{x} \in [0, L]. \tag{1}$$

**Lemma 1.** (i) *For $n \in \mathbb{N}$, the zeros of the shifted Chebyshev polynomial $T_n^*(x)$ are*

$$\mathbf{x}\_k = \frac{L}{2} \left[ \cos \left( \frac{2k - 1}{2n} \pi \right) + 1 \right], k \in \{1, 2, 3, \dots, n\}. \tag{2}$$

(ii) *For $x \in [0, L]$, the single-layer integrations of the shifted Chebyshev polynomial $T_n^*(x)$ are*

$$\begin{aligned} \overline{T}_0^*(x) &= \int_0^x T_0^*(\xi) \, d\xi = x, \\ \overline{T}_1^*(x) &= \int_0^x T_1^*(\xi) \, d\xi = \frac{x^2}{L} - x, \\ \overline{T}_n^*(x) &= \int_0^x T_n^*(\xi) \, d\xi = \frac{L}{4} \left[ \frac{T_{n+1}^*(x)}{n+1} - \frac{T_{n-1}^*(x)}{n-1} - \frac{2(-1)^n}{n^2 - 1} \right], \quad n \in \{2, 3, 4, \ldots\}. \end{aligned}$$

(iii) *Let $\{x_k\}_{k=1}^{n}$ be the set of zeros of $T_n^*(x)$ defined in (2), and define the shifted Chebyshev matrix $\mathbf{T}$ by*

$$\mathbf{T} = \begin{bmatrix} T_0^*(x_1) & T_1^*(x_1) & \cdots & T_{n-1}^*(x_1) \\ T_0^*(x_2) & T_1^*(x_2) & \cdots & T_{n-1}^*(x_2) \\ \vdots & \vdots & \ddots & \vdots \\ T_0^*(x_n) & T_1^*(x_n) & \cdots & T_{n-1}^*(x_n) \end{bmatrix}.$$

*Then, it has the multiplicative inverse* $\mathbf{T}^{-1} = \frac{1}{n}\,\mathrm{diag}(1, 2, 2, \ldots, 2)\,\mathbf{T}^{\top}$.
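Parts (i) and (iii) of Lemma 1 are straightforward to verify numerically; the sketch below builds $\mathbf{T}$ at the zeros $x_k$ and checks that $\frac{1}{n}\mathrm{diag}(1,2,\ldots,2)\mathbf{T}^{\top}$ really inverts it ($n = 8$ and $L = 2$ are arbitrary choices):

```python
import math

n, L = 8, 2.0  # arbitrary degree and interval length for the check

def T_star(j, x):
    """Shifted Chebyshev polynomial T*_j on [0, L], Eq. (1)."""
    return math.cos(j * math.acos(2 * x / L - 1))

# Zeros of T*_n from Eq. (2)
xs = [L / 2 * (math.cos((2 * k - 1) / (2 * n) * math.pi) + 1) for k in range(1, n + 1)]
for x in xs:
    assert abs(T_star(n, x)) < 1e-12          # each node is a zero of T*_n

# Matrix T with entries T*_j(x_k), j = 0..n-1, and the claimed inverse
T = [[T_star(j, x) for j in range(n)] for x in xs]
d = [1.0] + [2.0] * (n - 1)
Tinv = [[d[i] / n * T[k][i] for k in range(n)] for i in range(n)]  # (1/n) diag(1,2,..,2) T^T

# Check that T * Tinv is the identity
for i in range(n):
    for j in range(n):
        s = sum(T[i][k] * Tinv[k][j] for k in range(n))
        assert abs(s - (1.0 if i == j else 0.0)) < 1e-10
print("T^{-1} = (1/n) diag(1,2,...,2) T^T verified")
```

The identity is the discrete orthogonality of Chebyshev polynomials at their own zeros, which is what makes the inverse available in closed form rather than by numerical factorization.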

### **3. Improved FIM-SCP**

In this section, we improve the technique of Boonklurb et al. [21] to construct the first and higher order integration matrices in one and two dimensions. We note here that Boonklurb et al. used Chebyshev polynomials to construct the integration matrices and obtained numerical algorithms for solving linear differential equations, whereas in this work, we use the shifted Chebyshev polynomials to construct first and higher order shifted Chebyshev integration matrices to obtain numerical algorithms that are applicable to solve time-fractional Burgers' equations on any domain [0, *L*] rather than [−1, 1].

### *3.1. One-Dimensional Shifted Chebyshev Integration Matrices*

Let *<sup>M</sup>* <sup>∈</sup> <sup>N</sup> and *<sup>L</sup>* <sup>∈</sup> <sup>R</sup>+. Define an approximate solution *<sup>u</sup>*(*x*) of a certain PDE by the linear combination of shifted Chebyshev polynomials (1), i.e.,

$$u(\mathbf{x}) = \sum\_{n=0}^{M-1} c\_n T\_n^\*(\mathbf{x}) \text{ for } \mathbf{x} \in [0, L]. \tag{3}$$

Let $x_k$, $k \in \{1, 2, 3, \ldots, M\}$, be the grid points generated by the zeros of the shifted Chebyshev polynomial $T_M^*(x)$ defined in (2). Substituting each $x_k$ into (3), we can express (3) as

$$
\begin{bmatrix}
u(\mathbf{x}_{1})\\ u(\mathbf{x}_{2})\\ \vdots\\ u(\mathbf{x}_{M})
\end{bmatrix} = \begin{bmatrix}
T_{0}^{*}(\mathbf{x}_{1}) & T_{1}^{*}(\mathbf{x}_{1}) & \cdots & T_{M-1}^{*}(\mathbf{x}_{1})\\
T_{0}^{*}(\mathbf{x}_{2}) & T_{1}^{*}(\mathbf{x}_{2}) & \cdots & T_{M-1}^{*}(\mathbf{x}_{2})\\
\vdots & \vdots & \ddots & \vdots\\
T_{0}^{*}(\mathbf{x}_{M}) & T_{1}^{*}(\mathbf{x}_{M}) & \cdots & T_{M-1}^{*}(\mathbf{x}_{M})
\end{bmatrix} \begin{bmatrix}
c_{0}\\ c_{1}\\ \vdots\\ c_{M-1}
\end{bmatrix},
$$

and we denote it by $\mathbf{u} = \mathbf{T}\mathbf{c}$. The coefficients $\{c_n\}_{n=0}^{M-1}$ can be obtained by computing $\mathbf{c} = \mathbf{T}^{-1}\mathbf{u}$. Let $U^{(1)}(x_k)$ denote the single-layer integration of $u$ from 0 to $x_k$. Then,

$$U^{(1)}(x_k) = \int_0^{x_k} u(\xi) \, d\xi = \sum_{n=0}^{M-1} c_n \int_0^{x_k} T_n^*(\xi) \, d\xi = \sum_{n=0}^{M-1} c_n \overline{T}_n^*(x_k)$$

for *k* ∈ {1, 2, 3, ..., *M*} or in matrix form:

$$
\begin{bmatrix}
U^{(1)}(\mathbf{x}_{1}) \\
U^{(1)}(\mathbf{x}_{2}) \\
\vdots \\
U^{(1)}(\mathbf{x}_{M})
\end{bmatrix} = \begin{bmatrix}
\overline{T}_{0}^{*}(\mathbf{x}_{1}) & \overline{T}_{1}^{*}(\mathbf{x}_{1}) & \cdots & \overline{T}_{M-1}^{*}(\mathbf{x}_{1}) \\
\overline{T}_{0}^{*}(\mathbf{x}_{2}) & \overline{T}_{1}^{*}(\mathbf{x}_{2}) & \cdots & \overline{T}_{M-1}^{*}(\mathbf{x}_{2}) \\
\vdots & \vdots & \ddots & \vdots \\
\overline{T}_{0}^{*}(\mathbf{x}_{M}) & \overline{T}_{1}^{*}(\mathbf{x}_{M}) & \cdots & \overline{T}_{M-1}^{*}(\mathbf{x}_{M})
\end{bmatrix} \begin{bmatrix}
c_{0} \\
c_{1} \\
\vdots \\
c_{M-1}
\end{bmatrix}.
$$

We denote the above equation by $\mathbf{U}^{(1)} = \overline{\mathbf{T}}\mathbf{c} = \overline{\mathbf{T}}\mathbf{T}^{-1}\mathbf{u} := \mathbf{A}\mathbf{u}$, where $\mathbf{A} = \overline{\mathbf{T}}\mathbf{T}^{-1} := [a_{ki}]_{M \times M}$ is called the "shifted Chebyshev integration matrix" for the improved FIM-SCP in one dimension. Next, let us consider the double-layer integration of $u$ from 0 to $x_k$, denoted by $U^{(2)}(x_k)$. We have

$$U^{(2)}(x_k) = \int_0^{x_k} \int_0^{\xi_2} u(\xi_1) \, d\xi_1 d\xi_2 = \sum_{i=1}^M a_{ki} \int_0^{x_i} u(\xi_1) \, d\xi_1 = \sum_{i=1}^M \sum_{j=1}^M a_{ki} a_{ij} u(x_j)$$

for $k \in \{1, 2, 3, ..., M\}$; in matrix form, $\mathbf{U}^{(2)} = \mathbf{A}^2\mathbf{u}$. The $m$th-layer integration of $u$ from 0 to $x_k$, denoted by $U^{(m)}(x_k)$, can be obtained in a similar manner, that is,

$$U^{(m)}(x_k) = \int_0^{x_k} \cdots \int_0^{\xi_2} u(\xi_1) \, d\xi_1 \cdots d\xi_m = \sum_{i_m=1}^M \cdots \sum_{j=1}^M a_{k i_m} \cdots a_{i_1 j} u(x_j)$$

for $k \in \{1, 2, 3, ..., M\}$, or written in matrix form as $\mathbf{U}^{(m)} = \mathbf{A}^m\mathbf{u}$.
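The construction above can be sketched numerically. The following Python snippet (a sketch of our own; the paper's algorithms were implemented in MATLAB, and the function name is hypothetical) builds the matrices $\mathbf{T}$, $\overline{\mathbf{T}}$, and $\mathbf{A} = \overline{\mathbf{T}}\mathbf{T}^{-1}$ with NumPy's Chebyshev utilities:

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def cheb_matrices(M, L):
    # Nodes: zeros of the shifted Chebyshev polynomial T*_M on [0, L]
    k = np.arange(1, M + 1)
    x = (L / 2) * (np.cos((2 * k - 1) * np.pi / (2 * M)) + 1)
    s = 2 * x / L - 1                       # map [0, L] onto [-1, 1]
    T = np.zeros((M, M))
    Tbar = np.zeros((M, M))
    for n in range(M):
        e = np.zeros(n + 1); e[n] = 1.0     # coefficient vector of T_n
        T[:, n] = C.chebval(s, e)           # T*_n(x_k)
        # Tbar*_n(x_k) = int_0^{x_k} T*_n = (L/2) int_{-1}^{s_k} T_n
        Tbar[:, n] = (L / 2) * C.chebval(s, C.chebint(e, lbnd=-1))
    A = Tbar @ np.linalg.inv(T)             # integration matrix A = Tbar T^{-1}
    return x, T, Tbar, A
```

For a low-degree polynomial such as $u(x) = x^2$, $\mathbf{A}\mathbf{u}$ reproduces the single-layer integral $x^3/3$ exactly at the nodes, and $\mathbf{A}^2\mathbf{u}$ the double-layer integral $x^4/12$.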

### *3.2. Two-Dimensional Shifted Chebyshev Integration Matrices*

Let $M, N \in \mathbb{N}$ and $L_1, L_2 \in \mathbb{R}^+$. Divide the domain $[0, L_1] \times [0, L_2]$ into a mesh with $M$ nodes by $N$ nodes along the horizontal and vertical directions, respectively. Let $x_k$, $k \in \{1, 2, 3, ..., M\}$, be the grid points generated by the shifted Chebyshev nodes of $T_M^*(x)$, and let $y_s$, $s \in \{1, 2, 3, ..., N\}$, be the grid points generated by the shifted Chebyshev nodes of $T_N^*(y)$. Thus, there are $M \times N$ grid points in total. For computation, we index the grid points along the $x$-direction by the global numbering system (Figure 1a) and along the $y$-direction by the local numbering system (Figure 1b).

Let $U_x^{(1)}$ and $U_y^{(1)}$ be the single-layer integrations with respect to the variables $x$ and $y$, respectively. For each fixed $y$, we have $U_x^{(1)}(x_k, y)$ in the global numbering system as

$$U_x^{(1)}(x_k, y) = \int_0^{x_k} u(\xi, y) \, d\xi = \sum_{i=1}^M a_{ki} u(x_i, y). \tag{4}$$

For $k \in \{1, 2, 3, ..., M\}$, (4) can be expressed as $\mathbf{U}_x^{(1)}(\cdot, y) = \mathbf{A}_M \mathbf{u}(\cdot, y)$, where $\mathbf{A}_M = \overline{\mathbf{T}}\mathbf{T}^{-1}$ is an $M \times M$ matrix. Thus, for each $y \in \{y_1, y_2, y_3, ..., y_N\}$,

$$
\begin{bmatrix} \mathbf{U}_x^{(1)}(\cdot, y_1) \\ \mathbf{U}_x^{(1)}(\cdot, y_2) \\ \vdots \\ \mathbf{U}_x^{(1)}(\cdot, y_N) \end{bmatrix} = \underbrace{\begin{bmatrix} \mathbf{A}_M & 0 & \cdots & 0 \\ 0 & \mathbf{A}_M & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & \mathbf{A}_M \end{bmatrix}}_{N \text{ blocks}} \begin{bmatrix} \mathbf{u}(\cdot, y_1) \\ \mathbf{u}(\cdot, y_2) \\ \vdots \\ \mathbf{u}(\cdot, y_N) \end{bmatrix},
$$

which we denote by $\mathbf{U}_x^{(1)} = \mathbf{A}_x\mathbf{u}$, where $\mathbf{A}_x = \mathbf{I}_N \otimes \mathbf{A}_M$ is the shifted Chebyshev integration matrix with respect to the $x$-axis and $\otimes$ is the Kronecker product defined in [24]. Similarly, for each fixed $x$, $U_y^{(1)}(x, y_s)$ can be expressed in the local numbering system as

$$
U_y^{(1)}(x, y_s) = \int_0^{y_s} u(x, \eta) \, d\eta = \sum_{j=1}^N a_{sj} u(x, y_j). \tag{5}
$$

For $s \in \{1, 2, 3, ..., N\}$, (5) can be written as $\mathbf{U}_y^{(1)}(x, \cdot) = \mathbf{A}_N \mathbf{u}(x, \cdot)$, where $\mathbf{A}_N = \overline{\mathbf{T}}\mathbf{T}^{-1}$ is an $N \times N$ matrix. Therefore, for each $x \in \{x_1, x_2, x_3, ..., x_M\}$,

$$
\begin{bmatrix} \mathbf{U}_y^{(1)}(x_1, \cdot) \\ \mathbf{U}_y^{(1)}(x_2, \cdot) \\ \vdots \\ \mathbf{U}_y^{(1)}(x_M, \cdot) \end{bmatrix} = \underbrace{\begin{bmatrix} \mathbf{A}_N & 0 & \cdots & 0 \\ 0 & \mathbf{A}_N & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & \mathbf{A}_N \end{bmatrix}}_{M \text{ blocks}} \begin{bmatrix} \mathbf{u}(x_1, \cdot) \\ \mathbf{u}(x_2, \cdot) \\ \vdots \\ \mathbf{u}(x_M, \cdot) \end{bmatrix}.
$$

We shall denote the above matrix equation by $\widetilde{\mathbf{U}}_y^{(1)} = \widetilde{\mathbf{A}}_y\widetilde{\mathbf{u}}$, where $\widetilde{\mathbf{A}}_y = \mathbf{I}_M \otimes \mathbf{A}_N$ and the tilde marks quantities in the local numbering system. We notice that the elements of $\widetilde{\mathbf{u}}$ and $\mathbf{u}$ are the same but occupy different positions in the two numbering systems. Thus, we can transform $\widetilde{\mathbf{U}}_y^{(1)}$ and $\widetilde{\mathbf{u}}$ from the local numbering system to the global numbering system by using the permutation matrix $\mathbf{P} = [p_{ij}]_{MN \times MN}$, where each $p_{ij}$ is defined by

$$p_{ij} = \begin{cases} 1 & ; \ i = (s-1)M + k \ \text{ and } \ j = (k-1)N + s, \\ 0 & ; \ \text{otherwise,} \end{cases} \tag{6}$$

for all $k \in \{1, 2, 3, ..., M\}$ and $s \in \{1, 2, 3, ..., N\}$. We obtain that $\mathbf{U}_y^{(1)} = \mathbf{P}\widetilde{\mathbf{U}}_y^{(1)}$ and $\mathbf{u} = \mathbf{P}\widetilde{\mathbf{u}}$. Therefore, we have $\mathbf{U}_y^{(1)} = \mathbf{A}_y\mathbf{u}$, where $\mathbf{A}_y = \mathbf{P}\widetilde{\mathbf{A}}_y\mathbf{P}^{-1} = \mathbf{P}(\mathbf{I}_M \otimes \mathbf{A}_N)\mathbf{P}^{\top}$ is the shifted Chebyshev integration matrix with respect to the $y$-axis in the global numbering system.
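As a small illustration (a sketch of our own, not from the paper; the function name is hypothetical), the permutation matrix of (6) can be assembled directly from its index rule:

```python
import numpy as np

def permutation_P(M, N):
    # Eq. (6): p_ij = 1 when i = (s-1)M + k (global) and j = (k-1)N + s (local)
    P = np.zeros((M * N, M * N))
    for k in range(1, M + 1):
        for s in range(1, N + 1):
            P[(s - 1) * M + k - 1, (k - 1) * N + s - 1] = 1.0
    return P
```

Since $\mathbf{P}$ is a permutation matrix, $\mathbf{P}^{-1} = \mathbf{P}^{\top}$, which is why the $y$-direction operator can be written as $\mathbf{P}(\mathbf{I}_M \otimes \mathbf{A}_N)\mathbf{P}^{\top}$.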

**Remark 1** ([21])**.** *Let $m, n \in \mathbb{N}$; the multi-layer integrations in the global numbering system can be represented in matrix form as follows:*


**Figure 1.** Global and local grid points.

### **4. The Numerical Algorithms for Time-Fractional Burgers' Equations**

In this section, we derive the numerical algorithms based on our improved FIM-SCP for solving time-fractional Burgers' equations in both one and two dimensions. A numerical algorithm for solving time-fractional coupled Burgers' equations is also proposed. To demonstrate the effectiveness and efficiency of our algorithms, some numerical examples are given. Moreover, we report the time convergence rate and CPU time (s) of each example in order to assess the computational cost. We note here that we implemented our numerical algorithms in MATLAB R2016a on a machine with an Intel(R) Core(TM) i7-6700 CPU @ 3.40 GHz. Finally, the numerical solutions of each example are also depicted graphically.

### *4.1. Algorithm for One-Dimensional Time-Fractional Burgers' Equation*

Let $L$ and $T$ be positive real numbers and $\alpha \in (0, 1]$. Consider the time-fractional Burgers' equation with viscosity parameter $\nu > 0$:

$$
\frac{\partial^\alpha u}{\partial t^\alpha} + u \frac{\partial u}{\partial x} - \nu \frac{\partial^2 u}{\partial x^2} = f(x, t), \quad x \in (0, L), \ t \in (0, T], \tag{7}
$$

*Mathematics* **2019**, *7*, 1201

subject to the initial condition

$$
u(x, 0) = \phi(x), \quad x \in [0, L], \tag{8}
$$

and the boundary conditions

$$
u(0, t) = \psi_1(t) \ \text{ and } \ u(L, t) = \psi_2(t), \quad t \in (0, T], \tag{9}
$$

where $f(x, t)$, $\phi(x)$, $\psi_1(t)$, and $\psi_2(t)$ are given functions. Let us first linearize (7) by determining the iteration at time $t_m = m\Delta t$, where $\Delta t$ is the time step and $m \in \mathbb{N}$. Then, we have

$$\frac{\partial^\alpha u}{\partial t^\alpha}\Big|_{t=t_m} + u^{m-1}\frac{\partial u^m}{\partial x} - \nu \frac{\partial^2 u^m}{\partial x^2} = f(x, t_m), \tag{10}$$

where *u<sup>m</sup>* = *u*(*x*, *tm*) is the numerical solution at the *m*th iteration. For the Caputo time-fractional derivative term defined in Definition 2, we have

$$\frac{\partial^\alpha u}{\partial t^\alpha}\Big|_{t=t_m} = \frac{1}{\Gamma(1-\alpha)} \int_0^{t_m} \frac{u_s(x, s)}{(t_m - s)^\alpha} \, ds = \frac{1}{\Gamma(1-\alpha)} \sum_{i=0}^{m-1} \int_{t_i}^{t_{i+1}} \frac{u_s(x, s)}{(t_m - s)^\alpha} \, ds. \tag{11}$$

Using the first-order forward difference quotient to approximate the derivative term in (11), we get

$$\begin{split} \frac{\partial^\alpha u}{\partial t^\alpha}\Big|_{t=t_m} &\approx \frac{1}{\Gamma(1-\alpha)}\sum_{i=0}^{m-1}\int_{t_i}^{t_{i+1}}(t_m-s)^{-\alpha}\left(\frac{u^{i+1}-u^i}{\Delta t}\right)ds \\ &= \frac{1}{\Gamma(1-\alpha)}\sum_{i=0}^{m-1}\left(\frac{u^{i+1}-u^i}{\Delta t}\right)\left[\frac{(t_m-t_i)^{1-\alpha}-(t_m-t_{i+1})^{1-\alpha}}{1-\alpha}\right] \\ &= \frac{1}{\Gamma(2-\alpha)}\sum_{i=0}^{m-1}\left(\frac{u^{i+1}-u^i}{\Delta t}\right)\left[(m-i)^{1-\alpha}-(m-i-1)^{1-\alpha}\right](\Delta t)^{1-\alpha} \\ &= \frac{(\Delta t)^{-\alpha}}{\Gamma(2-\alpha)}\sum_{j=0}^{m-1}(u^{m-j}-u^{m-j-1})\left[(j+1)^{1-\alpha}-j^{1-\alpha}\right] \\ &= \sum_{j=0}^{m-1}w_j(u^{m-j}-u^{m-j-1}), \end{split} \tag{12}$$

where $w_j = \frac{(\Delta t)^{-\alpha}}{\Gamma(2-\alpha)}\left[(j+1)^{1-\alpha} - j^{1-\alpha}\right]$. Thus, (10) becomes

$$w_0(u^m - u^{m-1}) + \sum_{j=1}^{m-1} w_j(u^{m-j} - u^{m-j-1}) + u^{m-1}\frac{\partial u^m}{\partial x} - \nu \frac{\partial^2 u^m}{\partial x^2} = f(x, t_m). \tag{13}$$
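The L1 discretization (12) of the Caputo derivative can be sketched directly (in Python; a sketch of our own, with hypothetical function names, not the authors' MATLAB code):

```python
import numpy as np
from math import gamma

def l1_weights(alpha, dt, m):
    # w_j = (dt)^{-alpha} / Gamma(2 - alpha) * [(j+1)^{1-alpha} - j^{1-alpha}]
    j = np.arange(m)
    return dt**(-alpha) / gamma(2 - alpha) * ((j + 1)**(1 - alpha) - j**(1 - alpha))

def caputo_l1(u_hist, alpha, dt):
    # L1 approximation (12) at t_m, given the history [u^0, u^1, ..., u^m]
    m = len(u_hist) - 1
    w = l1_weights(alpha, dt, m)
    diffs = np.array([u_hist[m - j] - u_hist[m - j - 1] for j in range(m)])
    return np.dot(w, diffs)
```

For $u(t) = t$ the scheme is exact: its Caputo derivative is $t^{1-\alpha}/\Gamma(2-\alpha)$, and the telescoping sum recovers this value to machine precision.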

In order to eliminate the derivative terms in (13), we apply the modified FIM by taking the double layer integration. Then, for each shifted Chebyshev node *xk*, *k* ∈ {1, 2, 3, ..., *M*}, we obtain

$$w\_0 \int\_0^{x\_k} \int\_0^{\eta} (u^m - u^{m-1}) d\xi d\eta + \sum\_{j=1}^{m-1} w\_j \int\_0^{x\_k} \int\_0^{\eta} (u^{m-j} - u^{m-j-1}) d\xi d\eta$$

$$+ \int\_0^{x\_k} \int\_0^{\eta} \left( u^{m-1} \frac{\partial u^m}{\partial \xi} \right) d\xi d\eta - \nu u^m + d\_1 x\_k + d\_2 = \int\_0^{x\_k} \int\_0^{\eta} f(\xi, t\_m) d\xi d\eta,\tag{14}$$

where *d*<sup>1</sup> and *d*<sup>2</sup> are the arbitrary constants of integration. Next, we consider the nonlinear term in (14). By using the technique of integration by parts, we have

$$\begin{split} q(x_k) &:= \int_0^{x_k} \int_0^{\eta} \left( u^{m-1} \frac{\partial u^m}{\partial \xi} \right) d\xi d\eta \\ &= \int_0^{x_k} u^{m-1}(\eta) u^m(\eta) \, d\eta - \int_0^{x_k} \int_0^{\eta} \frac{\partial u^{m-1}(\xi)}{\partial \xi} u^m(\xi) \, d\xi d\eta \\ &= \int_0^{x_k} u^{m-1}(\eta) u^m(\eta) \, d\eta - \int_0^{x_k} \int_0^{\eta} \sum_{n=0}^{M-1} c_n^{m-1} \frac{dT_n^*(\xi)}{d\xi} u^m(\xi) \, d\xi d\eta \\ &= \int_0^{x_k} u^{m-1}(\eta) u^m(\eta) \, d\eta - \int_0^{x_k} \int_0^{\eta} \mathbf{T}'(\xi) \mathbf{T}^{-1} \mathbf{u}^{m-1} u^m(\xi) \, d\xi d\eta, \end{split} \tag{15}$$

where $\mathbf{T}'(\xi) = \left[\frac{dT_0^*(\xi)}{d\xi}, \frac{dT_1^*(\xi)}{d\xi}, \frac{dT_2^*(\xi)}{d\xi}, ..., \frac{dT_{M-1}^*(\xi)}{d\xi}\right]$. Thus, for $k \in \{1, 2, 3, ..., M\}$, (15) can be expressed in matrix form as

$$\begin{bmatrix} q(x_1) \\ q(x_2) \\ \vdots \\ q(x_M) \end{bmatrix} = \mathbf{A} \begin{bmatrix} u^{m-1}(x_1)u^m(x_1) \\ u^{m-1}(x_2)u^m(x_2) \\ \vdots \\ u^{m-1}(x_M)u^m(x_M) \end{bmatrix} - \mathbf{A}^2 \begin{bmatrix} \mathbf{T}'(x_1)\mathbf{T}^{-1}\mathbf{u}^{m-1}u^m(x_1) \\ \mathbf{T}'(x_2)\mathbf{T}^{-1}\mathbf{u}^{m-1}u^m(x_2) \\ \vdots \\ \mathbf{T}'(x_M)\mathbf{T}^{-1}\mathbf{u}^{m-1}u^m(x_M) \end{bmatrix}.$$

For computational convenience, we reduce the above equation into the matrix form:

$$\mathbf{q} = \mathbf{A} \text{diag}\left(\mathbf{u}^{m-1}\right) \mathbf{u}^{m} - \mathbf{A}^{2} \text{diag}\left(\mathbf{T}' \mathbf{T}^{-1} \mathbf{u}^{m-1}\right) \mathbf{u}^{m} := \mathbf{Q} \mathbf{u}^{m},\tag{16}$$

where $\mathbf{q} = [q(x_1), q(x_2), q(x_3), ..., q(x_M)]^{\top}$, $\mathbf{Q} = \mathbf{A}\,\mathrm{diag}(\mathbf{u}^{m-1}) - \mathbf{A}^2\,\mathrm{diag}(\mathbf{T}'\mathbf{T}^{-1}\mathbf{u}^{m-1})$, and

$$\mathbf{T}' = \begin{bmatrix} \mathbf{T}'(\mathbf{x}\_1) \\ \mathbf{T}'(\mathbf{x}\_2) \\ \vdots \\ \mathbf{T}'(\mathbf{x}\_M) \end{bmatrix} = \begin{bmatrix} \frac{d T\_0^\*(\boldsymbol{\xi})}{d\boldsymbol{\xi}}|\_{\boldsymbol{x}\_1} & \frac{d T\_1^\*(\boldsymbol{\xi})}{d\boldsymbol{\xi}}|\_{\boldsymbol{x}\_1} & \cdots & \frac{d T\_{M-1}^\*(\boldsymbol{\xi})}{d\boldsymbol{\xi}}|\_{\boldsymbol{x}\_1} \\ \frac{d T\_0^\*(\boldsymbol{\xi})}{d\boldsymbol{\xi}}|\_{\boldsymbol{x}\_2} & \frac{d T\_1^\*(\boldsymbol{\xi})}{d\boldsymbol{\xi}}|\_{\boldsymbol{x}\_2} & \cdots & \frac{d T\_{M-1}^\*(\boldsymbol{\xi})}{d\boldsymbol{\xi}}|\_{\boldsymbol{x}\_2} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{d T\_0^\*(\boldsymbol{\xi})}{d\boldsymbol{\xi}}|\_{\boldsymbol{x}\_M} & \frac{d T\_1^\*(\boldsymbol{\xi})}{d\boldsymbol{\xi}}|\_{\boldsymbol{x}\_M} & \cdots & \frac{d T\_{M-1}^\*(\boldsymbol{\xi})}{d\boldsymbol{\xi}}|\_{\boldsymbol{x}\_M} \end{bmatrix}. \tag{17}$$
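The derivative matrix $\mathbf{T}'$ of (17) can also be sketched with NumPy's Chebyshev utilities (a sketch of our own, with a hypothetical function name; the chain rule for the map $x \mapsto 2x/L - 1$ contributes the factor $2/L$):

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def cheb_derivative_matrix(x, M, L):
    # Row k holds dT*_n/dx evaluated at x_k, n = 0, ..., M-1
    s = 2 * np.asarray(x) / L - 1
    Tp = np.zeros((len(s), M))
    for n in range(M):
        e = np.zeros(n + 1); e[n] = 1.0     # coefficients of T_n
        Tp[:, n] = (2 / L) * C.chebval(s, C.chebder(e))
    return Tp
```

For instance, with $L = 1$ the column $n = 1$ is the constant derivative of $T_1^*(x) = 2x - 1$, and column $n = 2$ equals $8(2x - 1)$, the derivative of $T_2^*(x) = 2(2x-1)^2 - 1$.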

Consequently, for $k \in \{1, 2, 3, ..., M\}$, by employing (16) and the idea of Boonklurb et al. [21], we can convert (14) into the matrix form

$$w_0\mathbf{A}^2(\mathbf{u}^m-\mathbf{u}^{m-1})+\sum_{j=1}^{m-1}w_j\mathbf{A}^2(\mathbf{u}^{m-j}-\mathbf{u}^{m-j-1})+\mathbf{Q}\mathbf{u}^m-\nu\mathbf{u}^m+d_1\mathbf{x}+d_2\mathbf{i}=\mathbf{A}^2\mathbf{f}^m,$$

which can be rearranged as

$$\left[w_0\mathbf{A}^2+\mathbf{Q}-\nu\mathbf{I}\right]\mathbf{u}^m+d_1\mathbf{x}+d_2\mathbf{i}=\mathbf{A}^2\mathbf{f}^m+w_0\mathbf{A}^2\mathbf{u}^{m-1}-\sum_{j=1}^{m-1}w_j\mathbf{A}^2(\mathbf{u}^{m-j}-\mathbf{u}^{m-j-1}),\tag{18}$$

where $\mathbf{I}$ is the $M \times M$ identity matrix, $\mathbf{i} = [1, 1, 1, ..., 1]^{\top}$, $\mathbf{u}^m = [u(x_1, t_m), u(x_2, t_m), ..., u(x_M, t_m)]^{\top}$, $\mathbf{x} = [x_1, x_2, x_3, ..., x_M]^{\top}$, $\mathbf{f}^m = [f(x_1, t_m), f(x_2, t_m), ..., f(x_M, t_m)]^{\top}$, and $\mathbf{A} = \overline{\mathbf{T}}\mathbf{T}^{-1}$. For the boundary conditions (9), we can change them into vector form by using the linear combination of the shifted Chebyshev polynomials at the $m$th iteration as follows:

$$u(0, t_m) = \sum_{n=0}^{M-1} c_n^m T_n^*(0) = \sum_{n=0}^{M-1} c_n^m (-1)^n := \mathbf{t}_l \mathbf{c}^m = \mathbf{t}_l \mathbf{T}^{-1} \mathbf{u}^m = \psi_1(t_m), \tag{19}$$

$$u(L, t_m) = \sum_{n=0}^{M-1} c_n^m T_n^*(L) = \sum_{n=0}^{M-1} c_n^m (1)^n := \mathbf{t}_r \mathbf{c}^m = \mathbf{t}_r \mathbf{T}^{-1} \mathbf{u}^m = \psi_2(t_m), \tag{20}$$

where $\mathbf{t}_l = [1, -1, 1, ..., (-1)^{M-1}]$ and $\mathbf{t}_r = [1, 1, 1, ..., 1]$.
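These boundary rows rely on the endpoint identities $T_n^*(0) = (-1)^n$ and $T_n^*(L) = 1$. A short self-contained check (a sketch of our own in Python): interpolate $u(x) = x^3$ at the shifted Chebyshev nodes and recover its boundary values via $\mathbf{t}_l\mathbf{T}^{-1}\mathbf{u}$ and $\mathbf{t}_r\mathbf{T}^{-1}\mathbf{u}$.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

M, L = 5, 1.0
k = np.arange(1, M + 1)
x = (L / 2) * (np.cos((2 * k - 1) * np.pi / (2 * M)) + 1)   # shifted Chebyshev nodes
s = 2 * x / L - 1
T = np.column_stack([C.chebval(s, np.eye(M)[n]) for n in range(M)])

t_l = np.array([(-1.0)**n for n in range(M)])   # T*_n(0) = (-1)^n
t_r = np.ones(M)                                # T*_n(L) = 1

u = x**3                                        # a polynomial sampled at the nodes
c = np.linalg.solve(T, u)                       # spectral coefficients c = T^{-1} u
left, right = t_l @ c, t_r @ c                  # recovered u(0) and u(L)
```

Since $x^3$ has degree below $M$, the interpolation is exact and the recovered boundary values are $u(0) = 0$ and $u(L) = L^3$ up to rounding.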

From (18)–(20), we can construct the following system of iterative linear equations, which contains $M + 2$ unknowns:

$$
\begin{bmatrix}
w_0 \mathbf{A}^2 + \mathbf{Q} - \nu \mathbf{I} & \mathbf{x} & \mathbf{i} \\
\mathbf{t}_l \mathbf{T}^{-1} & 0 & 0 \\
\mathbf{t}_r \mathbf{T}^{-1} & 0 & 0
\end{bmatrix}
\begin{bmatrix}
\mathbf{u}^m \\ d_1 \\ d_2
\end{bmatrix} = \begin{bmatrix}
\mathbf{A}^2 \mathbf{f}^m + w_0 \mathbf{A}^2 \mathbf{u}^{m-1} - \mathbf{s} \\
\psi_1(t_m) \\
\psi_2(t_m)
\end{bmatrix}, \tag{21}
$$

where $\mathbf{s} = \sum_{j=1}^{m-1} w_j \mathbf{A}^2(\mathbf{u}^{m-j} - \mathbf{u}^{m-j-1})$ for $m > 1$, and $\mathbf{s} = \mathbf{0}$ if $m = 1$. Thus, starting from the initial condition $\mathbf{u}^0 = [\phi(x_1), \phi(x_2), \phi(x_3), ..., \phi(x_M)]^{\top}$, the approximate solution $\mathbf{u}^m$ can be obtained by solving the system (21). We note here that, for any fixed $t \in (0, T]$, the approximate solution $u(x, t)$ at an arbitrary $x \in [0, L]$ can be computed from

$$u(x, t) = \sum_{n=0}^{M-1} c_n^m T_n^*(x) = \mathbf{t}_x \mathbf{c}^m = \mathbf{t}_x \mathbf{T}^{-1} \mathbf{u}^m,$$

where $\mathbf{t}_x = [T_0^*(x), T_1^*(x), T_2^*(x), ..., T_{M-1}^*(x)]$ and $\mathbf{u}^m$ is the final iterative solution of (21).
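This off-grid evaluation is spectrally accurate for smooth solutions. A minimal sketch (our own Python illustration, not from the paper): interpolate $e^x$ at the nodes and evaluate $\mathbf{t}_x\mathbf{T}^{-1}\mathbf{u}$ at an arbitrary point.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

M, L = 12, 1.0
k = np.arange(1, M + 1)
xk = (L / 2) * (np.cos((2 * k - 1) * np.pi / (2 * M)) + 1)   # nodes
sk = 2 * xk / L - 1
T = np.column_stack([C.chebval(sk, np.eye(M)[n]) for n in range(M)])

u = np.exp(xk)                                  # nodal values of a smooth function
c = np.linalg.solve(T, u)                       # c = T^{-1} u

xa = 0.37                                       # an arbitrary off-grid point
t_x = np.array([C.chebval(2 * xa / L - 1, np.eye(M)[n]) for n in range(M)])
ux = t_x @ c                                    # u(x, t) ~ t_x T^{-1} u^m
```

With $M = 12$ nodes on $[0, 1]$, the interpolant of $e^x$ already matches the true value at $x = 0.37$ to near machine precision.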

**Example 1.** *Consider the time-fractional Burgers' Equation (7) for x* ∈ (0, 1) *and t* ∈ (0, 1] *with*

$$f(\mathbf{x},t) = \frac{2t^{2-\alpha}e^{\mathbf{x}}}{\Gamma(3-\alpha)} + t^4e^{2\mathbf{x}} - \nu t^2 e^{\mathbf{x}}.$$

*subject to the initial condition*

$$u(\mathbf{x},0) = 0, \ \mathbf{x} \in [0,1]$$

*and the boundary conditions*

$$
u(0, t) = t^2, \; u(1, t) = et^2, \; t \in (0, 1].
$$

*The exact solution given by Esen and Tasbozan [12] is $u^*(x, t) = t^2 e^x$. In the numerical test, we choose the kinematic viscosity $\nu = 1$, $\alpha = 0.5$, and $\Delta t = 0.00025$. Table 1 presents the exact solution $u^*(x, 1)$, the numerical solution $u(x, 1)$ obtained by our FIM-SCP in Algorithm 1, and the solution obtained by the quadratic B-spline finite element Galerkin method (QBS-FEM) proposed by Esen and Tasbozan [12]. The comparison between the absolute errors $E_a$ (the difference in absolute value between the approximate and exact solutions) of the two methods shows that our FIM-SCP is more accurate than QBS-FEM for $M = 10$ and of similar accuracy for other $M$. Algorithm 1 achieves a significant improvement in accuracy with fewer computational nodal points $M$, regardless of the time step $\Delta t$ and the fractional order $\alpha$. With $\alpha = 0.5$ and $M = 40$, Table 2 shows the comparison between the exact solution $u^*(x, 1)$ and the numerical solution $u(x, 1)$ using Algorithm 1 for various values of $\Delta t \in \{0.05, 0.01, 0.005, 0.001\}$. Table 3 illustrates the comparison between the exact solution $u^*(x, 1)$ and the numerical solution $u(x, 1)$ by our method for $\Delta t = 0.001$, $M = 40$, and $\alpha \in \{0.1, 0.25, 0.75, 0.9\}$. Moreover, the convergence rates are estimated by using our FIM-SCP with the discretization points $M = 20$ and step sizes $\Delta t = 2^{-k}$ for $k \in \{4, 5, 6, 7, 8\}$. In Table 4, we observe that these time convergence rates in the $\ell_\infty$ norm are indeed almost $O(\Delta t)$ for the different $\alpha \in (0, 1)$. We also report the computational cost in terms of CPU time (s) in Table 4. Finally, the graph of our approximate solutions $u(x, t)$ for different times $t$ and the surface plot of the solution under the parameters $\nu = 1$, $M = 40$, and $\Delta t = 0.001$ are provided in Figure 2.*

**Algorithm 1** The numerical algorithm for solving one-dimensional time-fractional Burgers' equation

**Input:** $\alpha$, $\nu$, $x$, $L$, $T$, $M$, $\Delta t$, $\phi(x)$, $\psi_1(t)$, $\psi_2(t)$, and $f(x, t)$.
**Output:** An approximate solution $u(x, T)$.
1: Set $x_k = \frac{L}{2}\left[\cos\left(\frac{2k-1}{2M}\pi\right) + 1\right]$ for $k \in \{1, 2, 3, ..., M\}$.
2: Compute $\mathbf{x}$, $\mathbf{i}$, $\mathbf{A}$, $\mathbf{t}_l$, $\mathbf{t}_r$, $\mathbf{t}_x$, $\mathbf{I}$, $\mathbf{T}$, $\mathbf{T}'$, $\mathbf{T}^{-1}$, and $\mathbf{u}^0$.
3: Set $t_0 = 0$ and $m = 0$.
4: **while** $t_m \leq T$ **do**
5: Set $m = m + 1$.
6: Set $t_m = m\Delta t$.
7: Set $\mathbf{s} = \mathbf{0}$.
8: **for** $j = 1$ to $m - 1$ **do**
9: Compute $w_j = \frac{(\Delta t)^{-\alpha}}{\Gamma(2-\alpha)}\left[(j+1)^{1-\alpha} - j^{1-\alpha}\right]$.
10: Compute $\mathbf{s} = \mathbf{s} + w_j\mathbf{A}^2(\mathbf{u}^{m-j} - \mathbf{u}^{m-j-1})$.
11: **end for**
12: Compute $\mathbf{f}^m = [f(x_1, t_m), f(x_2, t_m), f(x_3, t_m), ..., f(x_M, t_m)]^{\top}$.
13: Find $\mathbf{u}^m$ by solving the iterative linear system (21).
14: **end while**

15: **return** $u(x, T) = \mathbf{t}_x\mathbf{T}^{-1}\mathbf{u}^m$.
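The steps of Algorithm 1 can be sketched compactly as follows. This is our own Python sketch (the authors implemented in MATLAB); the function name and signature are hypothetical, and it assembles system (21) exactly as derived above.

```python
import numpy as np
from numpy.polynomial import chebyshev as C
from math import gamma

def solve_tfburgers_1d(M, L, T_end, dt, alpha, nu, f, phi, psi1, psi2):
    # Steps 1-2: shifted Chebyshev nodes and matrices on [0, L]
    k = np.arange(1, M + 1)
    x = (L / 2) * (np.cos((2 * k - 1) * np.pi / (2 * M)) + 1)
    s = 2 * x / L - 1
    E = np.eye(M)
    T = np.column_stack([C.chebval(s, E[n]) for n in range(M)])
    Tbar = np.column_stack([(L / 2) * C.chebval(s, C.chebint(E[n], lbnd=-1))
                            for n in range(M)])
    Tprime = np.column_stack([(2 / L) * C.chebval(s, C.chebder(E[n]))
                              for n in range(M)])
    Tinv = np.linalg.inv(T)
    A = Tbar @ Tinv                                       # integration matrix
    A2 = A @ A
    tl = np.array([(-1.0)**n for n in range(M)]) @ Tinv   # row t_l T^{-1}
    tr = np.ones(M) @ Tinv                                # row t_r T^{-1}

    w = lambda j: dt**(-alpha) / gamma(2 - alpha) * ((j + 1)**(1 - alpha) - j**(1 - alpha))

    hist = [phi(x)]                                       # u^0 (initial condition)
    for m in range(1, int(round(T_end / dt)) + 1):
        t = m * dt
        um1 = hist[-1]
        ss = np.zeros(M)                                  # history term s (steps 7-11)
        for j in range(1, m):
            ss += w(j) * (A2 @ (hist[m - j] - hist[m - j - 1]))
        Q = A @ np.diag(um1) - A2 @ np.diag(Tprime @ Tinv @ um1)
        K = np.zeros((M + 2, M + 2))                      # system (21)
        K[:M, :M] = w(0) * A2 + Q - nu * np.eye(M)
        K[:M, M], K[:M, M + 1] = x, 1.0
        K[M, :M], K[M + 1, :M] = tl, tr
        rhs = np.concatenate([A2 @ f(x, t) + w(0) * (A2 @ um1) - ss,
                              [psi1(t)], [psi2(t)]])
        hist.append(np.linalg.solve(K, rhs)[:M])          # step 13
    return x, hist[-1]
```

Running this sketch on the data of Example 1 ($\nu = 1$, $\alpha = 0.5$) tracks the exact solution $u^*(x, t) = t^2 e^x$ at the Chebyshev nodes to within the expected first-order-in-time accuracy.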

**Table 1.** Comparison of absolute errors *Ea* between QBS-FEM and FIM-SCP for Example 1.


**Table 2.** Absolute errors *Ea* at different Δ*t* for Example 1 by FIM-SCP with *α* = 0.5 and *M* = 40.



**Table 3.** Absolute errors *Ea* at different *α* for Example 1 by FIM-SCP with Δ*t* = 0.001 and *M* = 40.

**Table 4.** Time convergence rates and CPU time(s) for Example 1 by FIM-SCP with *M* = 20.


**Figure 2.** The graphical results of Example 1 for *ν* = 1, *M* = 40, and Δ*t* = 0.001.

**Example 2.** *Consider the time-fractional Burgers' Equation (7) over* (0, 1) × (0, 1] *with f*(*x*, *t*) = 0*, subject to the initial condition*

$$
u(x, 0) = \left[ -1 + 5 \cosh\left(\frac{x}{2}\right) - 5 \sinh\left(\frac{x}{2}\right) \right]^{-1}, \ x \in [0, 1],
$$

*and the boundary conditions*

$$u(0,t) = \left[5e^{-\frac{t^a}{4\Gamma(1+a)}} - 1\right]^{-1} \\
\text{and } u(1,t) = \left[5e^{-\left(\frac{1}{2} + \frac{t^a}{4\Gamma(1+a)}\right)} - 1\right]^{-1}, t \in (0,1].$$

*The exact solution given by Yokus and Kaya [13] is $u^*(x, t) = \left[5e^{-\left(\frac{x}{2} + \frac{t^\alpha}{4\Gamma(1+\alpha)}\right)} - 1\right]^{-1}$. In our numerical test, we choose the kinematic viscosity $\nu = 1$, $\alpha = 0.8$, $M = 50$, and $\Delta t = 0.001$. Table 5 presents the exact solution $u^*(x, 0.02)$, the numerical solution $u(x, 0.02)$ obtained by our FIM-SCP in Algorithm 1, and the solution obtained by the expansion method with the Cole–Hopf transformation (EPM-CHT) proposed by Yokus and Kaya [13]. The error norms $L_2$ and $L_\infty$ of this problem for our FIM-SCP and EPM-CHT with $\alpha = 0.8$, for the various numbers of nodal grid points $M \in \{5, 10, 20, 25, 50\}$ and step size $\Delta t = 1/M$, are illustrated in Table 6. We see that our Algorithm 1 achieves improved accuracy with less computational cost. Furthermore, we estimate the time convergence rates of this problem by using our FIM-SCP with the discretization nodes $M = 20$ and step sizes $\Delta t = 2^{-k}$ for $k \in \{4, 5, 6, 7, 8\}$, as tabulated in Table 7. We observe that these convergence rates in the $\ell_\infty$ norm are indeed almost linear, $O(\Delta t)$, for the different values $\alpha \in (0, 1)$. We also report the computational cost in terms of CPU time (s) in Table 7. Figure 3a,b depict the numerical solutions $u(x, t)$ at different times $t$ and the surface plot of $u(x, t)$, respectively.*
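A quick consistency check of the exact solution as reconstructed above (an assumption on our part, since the printed formula is garbled in the source): at $t = 0$ it must reduce to the stated initial condition, because $\cosh(x/2) - \sinh(x/2) = e^{-x/2}$, and at $x = 0$ it must reduce to the stated left boundary condition.

```python
import numpy as np
from math import gamma

alpha = 0.8
# Reconstructed exact solution of Example 2 (assumed form)
ustar = lambda x, t: 1.0 / (5 * np.exp(-(x / 2 + t**alpha / (4 * gamma(1 + alpha)))) - 1)

x = np.linspace(0, 1, 11)
phi = 1.0 / (-1 + 5 * np.cosh(x / 2) - 5 * np.sinh(x / 2))                    # given u(x, 0)
psi1 = lambda t: 1.0 / (5 * np.exp(-t**alpha / (4 * gamma(1 + alpha))) - 1)   # given u(0, t)
```

Both identities hold exactly, which supports the reconstructed sign pattern in the exponent.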


**Table 5.** Comparison of the exact and numerical solutions for Example 2 for *α* = 0.8 and *M* = 50.

**Table 6.** Comparison of the error norms *L*<sup>2</sup> and *L*<sup>∞</sup> for Example 2 with *α* = 0.8 and Δ*t* = 1/*M*.


**Table 7.** Time convergence rates and CPU time(s) for Example 2 by FIM-SCP with *M* = 20.


**Figure 3.** The graphical solutions of Example 2 for *ν* = 1, *M* = 40, and Δ*t* = 0.001.

### *4.2. Algorithm for Two-Dimensional Time-Fractional Burgers' Equation*

Let *L*<sup>1</sup> and *L*<sup>2</sup> be positive real numbers, Ω = (0, *L*1) × (0, *L*2), and *α* ∈ (0, 1]. Consider the two-dimensional time-fractional Burgers' equation with a viscosity *ν* > 0,

$$\frac{\partial^\alpha u}{\partial t^\alpha} + u\left(\frac{\partial u}{\partial x} + \frac{\partial u}{\partial y}\right) - \nu\left(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}\right) = f(x, y, t), \quad (x, y) \in \Omega, \ t \in (0, T], \tag{22}$$

subject to the initial condition

$$
u(x, y, 0) = \phi(x, y), \ (x, y) \in \Omega, \tag{23}
$$

and the boundary conditions

$$\begin{aligned} u(0, y, t) &= \psi\_1(y, t), \ u(L\_1, y, t) = \psi\_2(y, t), \ y \in [0, L\_2], \ t \in (0, T], \\ u(x, 0, t) &= \psi\_3(x, t), \ u(x, L\_2, t) = \psi\_4(x, t), \ x \in [0, L\_1], \ t \in (0, T], \end{aligned} \tag{24}$$

where $f$, $\phi$, $\psi_1$, $\psi_2$, $\psi_3$, and $\psi_4$ are given functions. Since $\frac{\partial}{\partial x}\left(\frac{u^2}{2}\right) = u\frac{\partial u}{\partial x}$ and $\frac{\partial}{\partial y}\left(\frac{u^2}{2}\right) = u\frac{\partial u}{\partial y}$, we can transform (22) into

$$
\frac{\partial^\alpha u}{\partial t^\alpha} + \frac{\partial}{\partial x}\left(\frac{u^2}{2}\right) + \frac{\partial}{\partial y}\left(\frac{u^2}{2}\right) - \nu\left(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}\right) = f(x, y, t). \tag{25}
$$

Let us linearize (25) by imposing the iteration at time $t_m = m\Delta t$, where $m \in \mathbb{N}$ and $\Delta t$ is an arbitrary time step. Thus, we have

$$\frac{\partial^\alpha u}{\partial t^\alpha}\Big|_{t=t_m} + \frac{\partial}{\partial x}\left(\frac{u^{m-1}}{2}u^m\right) + \frac{\partial}{\partial y}\left(\frac{u^{m-1}}{2}u^m\right) - \nu\left(\frac{\partial^2 u^m}{\partial x^2} + \frac{\partial^2 u^m}{\partial y^2}\right) = f^m, \tag{26}$$

where *f <sup>m</sup>* = *f*(*x*, *y*, *tm*) and *u<sup>m</sup>* = *u*(*x*, *y*, *tm*) is the numerical solution at the *m*th iteration. Next, consider the fractional order derivative in the Caputo sense as defined in Definition 2, by using (12), then (26) becomes

$$\sum_{j=0}^{m-1} w_j (u^{m-j} - u^{m-j-1}) + \frac{\partial}{\partial x}\left(\frac{u^{m-1}}{2}u^m\right) + \frac{\partial}{\partial y}\left(\frac{u^{m-1}}{2}u^m\right) - \nu\left(\frac{\partial^2 u^m}{\partial x^2} + \frac{\partial^2 u^m}{\partial y^2}\right) = f^m,$$

where $w_j = \frac{(\Delta t)^{-\alpha}}{\Gamma(2-\alpha)}\left[(j+1)^{1-\alpha} - j^{1-\alpha}\right]$. The above equation can be transformed into an integral equation by integrating twice over both $x$ and $y$; we have

$$\begin{split} \sum_{j=0}^{m-1} w_j \int_0^{y} \int_0^{\eta_2} \int_0^{x} \int_0^{\xi_2} (u^{m-j} - u^{m-j-1}) \, d\xi_1 d\xi_2 d\eta_1 d\eta_2 + \frac{1}{2} \int_0^{y} \int_0^{\eta_2} \int_0^{x} (u^{m-1} u^m) \, d\xi_2 d\eta_1 d\eta_2 \\ + \frac{1}{2} \int_0^{y} \int_0^{x} \int_0^{\xi_2} (u^{m-1} u^m) \, d\xi_1 d\xi_2 d\eta_2 - \nu \int_0^{y} \int_0^{\eta_2} u^m \, d\eta_1 d\eta_2 - \nu \int_0^{x} \int_0^{\xi_2} u^m \, d\xi_1 d\xi_2 \\ + \, x g_1(y) + g_2(y) + y h_1(x) + h_2(x) = \int_0^{y} \int_0^{\eta_2} \int_0^{x} \int_0^{\xi_2} f(\xi_1, \eta_1, t_m) \, d\xi_1 d\xi_2 d\eta_1 d\eta_2, \end{split} \tag{27}$$

where $g_1(y)$, $g_2(y)$, $h_1(x)$, and $h_2(x)$ are the arbitrary functions that emerge in the process of integration, which can be approximated by shifted Chebyshev polynomial interpolation. For $r \in \{1, 2\}$, define

$$h_r(x) = \sum_{i=0}^{M-1} h_r^{(i)} T_i^*(x) \ \text{ and } \ g_r(y) = \sum_{j=0}^{N-1} g_r^{(j)} T_j^*(y), \tag{28}$$

where $h_r^{(i)}$ and $g_r^{(j)}$, for $i \in \{0, 1, 2, ..., M-1\}$ and $j \in \{0, 1, 2, ..., N-1\}$, are the unknown values at these interpolated points. Next, we divide the domain $\Omega$ into a mesh with $M$ nodes by $N$ nodes along the $x$- and $y$-directions, respectively. We denote the nodes along the $x$-direction by $\mathbf{x} = \{x_1, x_2, x_3, ..., x_M\}$ and the nodes along the $y$-direction by $\mathbf{y} = \{y_1, y_2, y_3, ..., y_N\}$. These nodes are the zeros of the shifted Chebyshev polynomials $T_M^*(x)$ and $T_N^*(y)$, respectively. Thus, the total number of grid points in the system is $P = M \times N$, where each point is an entry of the Cartesian product $\mathbf{x} \times \mathbf{y}$ ordered by the global numbering system, i.e., $(x_i, y_i) \in \mathbf{x} \times \mathbf{y}$ for $i \in \{1, 2, 3, ..., P\}$. By substituting each node into (27) and employing $\mathbf{A}_x$ and $\mathbf{A}_y$ from Section 3.2, we can change (27) into the matrix form

$$\sum_{j=0}^{m-1} w_j \mathbf{A}_x^2 \mathbf{A}_y^2 (\mathbf{u}^{m-j} - \mathbf{u}^{m-j-1}) + \frac{1}{2} \mathbf{A}_x \mathbf{A}_y^2 \,\mathrm{diag}(\mathbf{u}^{m-1}) \mathbf{u}^m + \frac{1}{2} \mathbf{A}_x^2 \mathbf{A}_y \,\mathrm{diag}(\mathbf{u}^{m-1}) \mathbf{u}^m$$

$$- \nu \mathbf{A}_y^2 \mathbf{u}^m - \nu \mathbf{A}_x^2 \mathbf{u}^m + \mathbf{X} \mathbf{\Phi}_y \mathbf{g}_1 + \mathbf{\Phi}_y \mathbf{g}_2 + \mathbf{Y} \mathbf{\Phi}_x \mathbf{h}_1 + \mathbf{\Phi}_x \mathbf{h}_2 = \mathbf{A}_x^2 \mathbf{A}_y^2 \mathbf{f}^m.$$

Simplifying the above equation yields

$$\mathbf{K}\mathbf{u}^m + \mathbf{X}\mathbf{\Phi}_y\mathbf{g}_1 + \mathbf{\Phi}_y\mathbf{g}_2 + \mathbf{Y}\mathbf{\Phi}_x\mathbf{h}_1 + \mathbf{\Phi}_x\mathbf{h}_2 = \mathbf{A}_x^2 \mathbf{A}_y^2 \mathbf{f}^m + w_0 \mathbf{A}_x^2 \mathbf{A}_y^2 \mathbf{u}^{m-1} - \mathbf{s}, \tag{29}$$

where each parameter contained in (29) is defined as follows:

$$\begin{aligned} \mathbf{K} &= w_0\mathbf{A}_x^2\mathbf{A}_y^2 + \tfrac{1}{2}\mathbf{A}_x\mathbf{A}_y^2\,\mathrm{diag}(\mathbf{u}^{m-1}) + \tfrac{1}{2}\mathbf{A}_x^2\mathbf{A}_y\,\mathrm{diag}(\mathbf{u}^{m-1}) - \nu\mathbf{A}_y^2 - \nu\mathbf{A}_x^2, \\ \mathbf{s} &= \sum_{j=1}^{m-1} w_j\mathbf{A}_x^2\mathbf{A}_y^2(\mathbf{u}^{m-j} - \mathbf{u}^{m-j-1}), \\ \mathbf{X} &= \mathrm{diag}(x_1, x_2, x_3, ..., x_P), \quad \mathbf{Y} = \mathrm{diag}(y_1, y_2, y_3, ..., y_P), \\ \mathbf{h}_r &= [h_r^{(0)}, h_r^{(1)}, h_r^{(2)}, ..., h_r^{(M-1)}]^{\top} \ \text{for } r \in \{1, 2\}, \\ \mathbf{g}_r &= [g_r^{(0)}, g_r^{(1)}, g_r^{(2)}, ..., g_r^{(N-1)}]^{\top} \ \text{for } r \in \{1, 2\}, \\ \mathbf{f}^m &= [f(x_1, y_1, t_m), f(x_2, y_2, t_m), f(x_3, y_3, t_m), ..., f(x_P, y_P, t_m)]^{\top}, \\ \mathbf{u}^m &= [u(x_1, y_1, t_m), u(x_2, y_2, t_m), u(x_3, y_3, t_m), ..., u(x_P, y_P, t_m)]^{\top}. \end{aligned}$$
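The Kronecker-product operators entering $\mathbf{K}$ can be assembled and sanity-checked in a few lines. This is our own Python sketch (the helper name is hypothetical): it builds $\mathbf{A}_x = \mathbf{I}_N \otimes \mathbf{A}_M$ and $\mathbf{A}_y = \mathbf{P}(\mathbf{I}_M \otimes \mathbf{A}_N)\mathbf{P}^{\top}$ and verifies that they integrate the fields $u = x$ and $u = y$ on the global grid.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def integ_matrix(M, L):
    # 1D shifted Chebyshev integration matrix A = Tbar T^{-1} on [0, L]
    k = np.arange(1, M + 1)
    x = (L / 2) * (np.cos((2 * k - 1) * np.pi / (2 * M)) + 1)
    s = 2 * x / L - 1
    E = np.eye(M)
    T = np.column_stack([C.chebval(s, E[n]) for n in range(M)])
    Tbar = np.column_stack([(L / 2) * C.chebval(s, C.chebint(E[n], lbnd=-1))
                            for n in range(M)])
    return x, Tbar @ np.linalg.inv(T)

M, N, L1, L2 = 6, 5, 1.0, 2.0
x, AM = integ_matrix(M, L1)
y, AN = integ_matrix(N, L2)

# Permutation of Eq. (6): global index i = (s-1)M + k, local index j = (k-1)N + s
P = np.zeros((M * N, M * N))
for k in range(1, M + 1):
    for s in range(1, N + 1):
        P[(s - 1) * M + k - 1, (k - 1) * N + s - 1] = 1.0

Ax = np.kron(np.eye(N), AM)             # A_x = I_N (x) A_M
Ay = P @ np.kron(np.eye(M), AN) @ P.T   # A_y = P (I_M (x) A_N) P^T

X, Y = np.meshgrid(x, y)                # global ordering: x-index runs fastest
u_x, u_y = X.ravel(), Y.ravel()         # the fields u = x and u = y
```

Since both fields are degree-1 polynomials, $\mathbf{A}_x\mathbf{u}$ and $\mathbf{A}_y\mathbf{u}$ reproduce $x^2/2$ and $y^2/2$ exactly at every grid point; squaring and multiplying these operators then yields the blocks of $\mathbf{K}$.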

From (28), we obtain **Φ***<sup>x</sup>* and **Φ***y*, where

$$
\boldsymbol{\Phi}\_{\boldsymbol{x}} = \begin{bmatrix} T\_0^\*(\mathbf{x}\_1) & T\_1^\*(\mathbf{x}\_1) & \cdots & T\_{M-1}^\*(\mathbf{x}\_1) \\ T\_0^\*(\mathbf{x}\_2) & T\_1^\*(\mathbf{x}\_2) & \cdots & T\_{M-1}^\*(\mathbf{x}\_2) \\ \vdots & \vdots & \ddots & \vdots \\ T\_0^\*(\mathbf{x}\_P) & T\_1^\*(\mathbf{x}\_P) & \cdots & T\_{M-1}^\*(\mathbf{x}\_P) \end{bmatrix} \quad \text{and} \quad \boldsymbol{\Phi}\_{\boldsymbol{y}} = \begin{bmatrix} T\_0^\*(\boldsymbol{y}\_1) & T\_1^\*(\boldsymbol{y}\_1) & \cdots & T\_{N-1}^\*(\boldsymbol{y}\_1) \\ T\_0^\*(\boldsymbol{y}\_2) & T\_1^\*(\boldsymbol{y}\_2) & \cdots & T\_{N-1}^\*(\boldsymbol{y}\_2) \\ \vdots & \vdots & \ddots & \vdots \\ T\_0^\*(\boldsymbol{y}\_P) & T\_1^\*(\boldsymbol{y}\_P) & \cdots & T\_{N-1}^\*(\boldsymbol{y}\_P) \end{bmatrix}.
$$

For the boundary conditions (24), we can transform them into matrix form, similarly to the idea in [21], by employing a linear combination of the shifted Chebyshev polynomials as follows.

• Left & Right boundary conditions: for each fixed *y* ∈ {*y*1, *y*2, *y*3, ..., *yN*}, we have

$$u(0, y, t\_m) = \sum\_{n=0}^{M-1} c\_n^m T\_n^\*(0) := \mathbf{t}\_l \mathbf{T}\_M^{-1} \mathbf{u}^m(\cdot, y) = \boldsymbol{\uppsi}\_1(y, t\_m) \quad \Rightarrow \quad (\mathbf{I}\_N \otimes \mathbf{t}\_l \mathbf{T}\_M^{-1}) \mathbf{u}^m = \mathbf{Y}\_1 \quad \text{(30)}$$

$$u(L\_1, y, t\_m) = \sum\_{n=0}^{M-1} c\_n^m T\_n^\*(L\_1) := \mathbf{t}\_r \mathbf{T}\_M^{-1} \mathbf{u}^m(\cdot, y) = \boldsymbol{\uppsi}\_2(y, t\_m) \quad \Rightarrow \quad (\mathbf{I}\_N \otimes \mathbf{t}\_r \mathbf{T}\_M^{-1}) \mathbf{u}^m = \mathbf{Y}\_2 \quad \text{(31)}$$

• Bottom & Top boundary conditions: for each fixed *x* ∈ {*x*1, *x*2, *x*3, ..., *xM*}, we have

$$u(x, 0, t\_m) = \sum\_{n=0}^{N-1} c\_n^m T\_n^\*(0) := \mathbf{t}\_b \mathbf{T}\_N^{-1} \mathbf{u}^m(x, \cdot) = \boldsymbol{\uppsi}\_3(x, t\_m) \quad \Rightarrow \quad (\mathbf{I}\_M \otimes \mathbf{t}\_b \mathbf{T}\_N^{-1}) \mathbf{P}^{-1} \mathbf{u}^m = \mathbf{Y}\_3 \tag{32}$$

$$u(x, L\_2, t\_m) = \sum\_{n=0}^{N-1} c\_n^m T\_n^\*(L\_2) := \mathbf{t}\_t \mathbf{T}\_N^{-1} \mathbf{u}^m(x, \cdot) = \boldsymbol{\uppsi}\_4(x, t\_m) \quad \Rightarrow \quad (\mathbf{I}\_M \otimes \mathbf{t}\_t \mathbf{T}\_N^{-1}) \mathbf{P}^{-1} \mathbf{u}^m = \mathbf{Y}\_4 \tag{33}$$

where $\mathbf{I}\_M$ and $\mathbf{I}\_N$ are, respectively, the $M \times M$ and $N \times N$ identity matrices, $\mathbf{T}\_M^{-1}$ and $\mathbf{T}\_N^{-1}$ are, respectively, the $M \times M$ and $N \times N$ matrices defined in Lemma 1, $\mathbf{P}$ is defined in (6), and the other parameters are

$$\begin{aligned}
\mathbf{t}\_r &= [1, 1, 1, \ldots, 1] \ (M \text{ entries}), \\
\mathbf{t}\_t &= [1, 1, 1, \ldots, 1] \ (N \text{ entries}), \\
\mathbf{t}\_l &= [1, -1, 1, \ldots, (-1)^{M-1}], \\
\mathbf{t}\_b &= [1, -1, 1, \ldots, (-1)^{N-1}], \\
\mathbf{Y}\_i &= [\psi\_i(y\_1, t\_m), \psi\_i(y\_2, t\_m), \psi\_i(y\_3, t\_m), \ldots, \psi\_i(y\_N, t\_m)]^{\top} \text{ for } i \in \{1, 2\}, \\
\mathbf{Y}\_j &= [\psi\_j(x\_1, t\_m), \psi\_j(x\_2, t\_m), \psi\_j(x\_3, t\_m), \ldots, \psi\_j(x\_M, t\_m)]^{\top} \text{ for } j \in \{3, 4\}.
\end{aligned}$$
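As a quick numerical sanity check on these boundary vectors (a sketch, assuming the shifted Chebyshev polynomials are $T\_n^\*(x) = T\_n(2x/L - 1)$ on $[0, L]$, so that $T\_n^\*(0) = (-1)^n$ and $T\_n^\*(L) = 1$; the helper name below is illustrative):

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def shifted_cheb_row(x, L, M):
    """Row [T_0^*(x), ..., T_{M-1}^*(x)] with T_n^*(x) = T_n(2x/L - 1)."""
    return C.chebvander(2.0 * x / L - 1.0, M - 1).ravel()

L1, M = 1.0, 6
t_l = shifted_cheb_row(0.0, L1, M)   # left boundary x = 0: entries (-1)^n
t_r = shifted_cheb_row(L1, L1, M)    # right boundary x = L1: entries all 1

assert np.allclose(t_l, [(-1) ** n for n in range(M)])
assert np.allclose(t_r, np.ones(M))
```

The same evaluation at the endpoints of $[0, L\_2]$ with $N$ modes reproduces $\mathbf{t}\_b$ and $\mathbf{t}\_t$.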

Finally, we can construct the system of iterative linear equations from Equations (29)–(33) for a total of *P* + 2(*M* + *N*) unknowns, including **u***m*, **g**1, **g**2, **h**<sup>1</sup> and **h**2, as follows,
$$\begin{bmatrix} \mathbf{K} & \mathbf{X}\boldsymbol{\Phi}\_y & \boldsymbol{\Phi}\_y & \mathbf{Y}\boldsymbol{\Phi}\_x & \boldsymbol{\Phi}\_x \\ \mathbf{I}\_N \otimes \mathbf{t}\_l \mathbf{T}\_M^{-1} & \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{0} \\ \mathbf{I}\_N \otimes \mathbf{t}\_r \mathbf{T}\_M^{-1} & \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{0} \\ (\mathbf{I}\_M \otimes \mathbf{t}\_b \mathbf{T}\_N^{-1})\mathbf{P}^{-1} & \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{0} \\ (\mathbf{I}\_M \otimes \mathbf{t}\_t \mathbf{T}\_N^{-1})\mathbf{P}^{-1} & \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{0} \end{bmatrix} \begin{bmatrix} \mathbf{u}^m \\ \mathbf{g}\_1 \\ \mathbf{g}\_2 \\ \mathbf{h}\_1 \\ \mathbf{h}\_2 \end{bmatrix} = \begin{bmatrix} \mathbf{A}\_x^2 \mathbf{A}\_y^2 \mathbf{f}^m + w\_0 \mathbf{A}\_x^2 \mathbf{A}\_y^2 \mathbf{u}^{m-1} - \mathbf{s} \\ \mathbf{Y}\_1 \\ \mathbf{Y}\_2 \\ \mathbf{Y}\_3 \\ \mathbf{Y}\_4 \end{bmatrix}.\tag{34}$$


Thus, the approximate solutions **u***<sup>m</sup>* can be reached by solving (34) in conjunction with the initial condition (23), that is, $\mathbf{u}^0 = [\varphi(x\_1, y\_1), \varphi(x\_2, y\_2), \ldots, \varphi(x\_P, y\_P)]^{\top}$, where $(x\_i, y\_i) \in \mathbf{x} \times \mathbf{y}$ for all *i*. Therefore, the solution *u*(*x*, *y*, *t*) at any fixed time *t* can be estimated from

$$u(x, y, t) = \mathbf{t}\_y \mathbf{T}\_N^{-1} (\mathbf{I}\_N \otimes \mathbf{t}\_x \mathbf{T}\_M^{-1}) \mathbf{u}^m,$$

where $\mathbf{t}\_x = [T\_0^\*(x), T\_1^\*(x), T\_2^\*(x), \ldots, T\_{M-1}^\*(x)]$ and $\mathbf{t}\_y = [T\_0^\*(y), T\_1^\*(y), T\_2^\*(y), \ldots, T\_{N-1}^\*(y)]$.
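The Kronecker evaluation formula above can be exercised on a small tensor grid. This is a sketch under stated assumptions: $\mathbf{T}\_M$ and $\mathbf{T}\_N$ are taken to be the matrices of shifted Chebyshev values at the nodal points (as in Lemma 1), the nodal vector is stacked *y*-major (all *x*-nodes for *y*<sub>1</sub>, then *y*<sub>2</sub>, ...), and all helper names are illustrative:

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def shifted_nodes(L, M):
    """Shifted Chebyshev nodes x_k = (L/2)[cos((2k-1)pi/(2M)) + 1], k = 1..M."""
    k = np.arange(1, M + 1)
    return (L / 2.0) * (np.cos((2 * k - 1) * np.pi / (2 * M)) + 1.0)

def cheb_row(x, L, M):
    """Row [T_0^*(x), ..., T_{M-1}^*(x)] of shifted Chebyshev values."""
    return C.chebvander(2.0 * x / L - 1.0, M - 1).ravel()

M, N, L1, L2 = 5, 4, 1.0, 1.0
x, y = shifted_nodes(L1, M), shifted_nodes(L2, N)

# nodal values of a test function, stacked y-major
u = np.array([xi ** 2 * yj for yj in y for xi in x])

TM = C.chebvander(2 * x / L1 - 1, M - 1)   # T_M: entries T_n^*(x_k)
TN = C.chebvander(2 * y / L2 - 1, N - 1)   # T_N: entries T_n^*(y_k)

xe, ye = 0.3, 0.7
row_x = cheb_row(xe, L1, M) @ np.linalg.inv(TM)   # t_x T_M^{-1}
row_y = cheb_row(ye, L2, N) @ np.linalg.inv(TN)   # t_y T_N^{-1}
val = row_y @ (np.kron(np.eye(N), row_x) @ u)     # t_y T_N^{-1} (I_N (x) t_x T_M^{-1}) u
assert abs(val - xe ** 2 * ye) < 1e-10            # exact for this low-degree test function
```

The inner Kronecker factor interpolates each *y*-block in *x*; the outer row then interpolates the resulting *N* values in *y*.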

**Example 3.** *Consider the 2D time-fractional Burgers' Equation (22) for* (*x*, *y*) ∈ Ω = (0, 1) × (0, 1) *and t* ∈ (0, 1] *with the forcing term*

$$f(x, y, t) = (x^2 - x)(y^2 - y) \left[ \frac{2t^{1-\alpha}}{\Gamma(2-\alpha)} + t^2(x + y - 1)(2xy - x - y) \right] - 2\nu t(x^2 + y^2 - x - y),$$

*subject to homogeneous initial and boundary conditions. The analytical solution of this problem is u*∗(*x*, *y*, *t*) = *t*(*x*<sup>2</sup> − *x*)(*y*<sup>2</sup> − *y*)*. For the numerical test, we pick ν* = 100*, α* = 0.5*,* Δ*t* = 0.01*, and M* = *N* = 10*. In Table 8, the solutions approximated by our FIM-SCP Algorithm 2 are presented over the space domain* Ω *for various times t. We test the accuracy of our method by measuring the absolute error Ea. In addition, we estimate the convergence rates via the* ℓ<sup>∞</sup> *norm of our Algorithm 2 with the nodal points M* = *N* = 10 *and different step sizes* Δ*t* = 2<sup>−*k*</sup> *for k* ∈ {4, 5, 6, 7, 8}*. We found that these convergence rates approach the linear convergence O*(Δ*t*)*, as shown in Table 9 together with the CPU times (s). The graphical numerical solutions are provided in Figure 4.*
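The temporal convergence rates reported in Table 9 are the standard observed orders: given errors *e<sub>k</sub>* at step sizes halved between refinements, the rate is log<sub>2</sub>(*e<sub>k</sub>*/*e*<sub>*k*+1</sub>). A minimal sketch (the error values below are illustrative, not the paper's):

```python
import math

def observed_orders(errors):
    """Observed convergence orders log2(e_k / e_{k+1}) when the step size is halved each time."""
    return [math.log2(e0 / e1) for e0, e1 in zip(errors, errors[1:])]

# illustrative errors that halve with the step size (first-order behaviour)
errs = [3.2e-3, 1.6e-3, 8.0e-4, 4.0e-4]
print(observed_orders(errs))  # -> [1.0, 1.0, 1.0]
```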

**Table 8.** Exact and numerical solutions of Example 3 for *α* = 0.5, *M* = *N* = 10 and Δ*t* = 0.01.


**Algorithm 2** The numerical algorithm for solving two-dimensional time-fractional Burgers' equation

**Input:** *α*, *ν*, *x*, *y*, *T*, *M*, *L*1, *L*2, Δ*t*, *φ*(*x*, *y*), *ψ*1(*y*, *t*), *ψ*2(*y*, *t*), *ψ*3(*x*, *t*), *ψ*4(*x*, *t*) and *f*(*x*, *y*, *t*).

**Output:** An approximate solution *u*(*x*, *y*, *T*).


12: Compute $w\_j = \frac{(\Delta t)^{-\alpha}}{\Gamma(2-\alpha)}\left[(j+1)^{1-\alpha} - j^{1-\alpha}\right]$.

$$\text{13:} \qquad \text{Compute } \mathbf{s} = \mathbf{s} + w\_j \mathbf{A}\_x^2 \mathbf{A}\_y^2 (\mathbf{u}^{m-j} - \mathbf{u}^{m-j-1}) .$$


18: **return** $u(x, y, T) = \mathbf{t}\_y \mathbf{T}\_N^{-1} (\mathbf{I}\_N \otimes \mathbf{t}\_x \mathbf{T}\_M^{-1}) \mathbf{u}^m$.

**Table 9.** Time convergence rates and CPU time(s) for Example 3 by FIM-SCP with *M* = *N* = 10.


**Figure 4.** The graphical solutions of Example 3 for *ν* = 100, *M* = *N* = 15, and Δ*t* = 0.01.

**Example 4.** *Consider the 2D time-fractional Burgers' Equation (22) for* (*x*, *y*) ∈ Ω = (0, 1) × (0, 1) *and t* ∈ (0, 1] *with the homogeneous initial condition and the forcing term*

$$\begin{aligned} f(x,y,t) &= \frac{6t^{3-\alpha}(1-x^2)^2(1-y^2)^2}{\Gamma(4-\alpha)} + 4t^6(1-x^2)^3(1-y^2)^3(x^2y+xy^2-x-y) \\ &- 0.4t^3\left[ (y^2-1)^2(3x^2-1) + (x^2-1)^2(3y^2-1) \right], \end{aligned}$$

*subject to the boundary conditions corresponding to the analytical solution given by Cao et al. [14], u*∗(*x*, *y*, *t*) = *t*<sup>3</sup>(1 − *x*<sup>2</sup>)<sup>2</sup>(1 − *y*<sup>2</sup>)<sup>2</sup>*. Picking the parameters ν* = 0.1*, α* = 0.5*, and M* = *N* = 10*, the comparison of the L*<sup>2</sup> *error norms between our FIM-SCP via Algorithm 2 and the discontinuous Galerkin method combined with a finite difference scheme (DGM-FDS) presented by Cao et al. [14] is displayed in Table 10 at time t* = 0.1*. We can see that our method gives higher accuracy than the DGM-FDS at the same step size* Δ*t. Next, we provide the CPU times (s) and the time convergence rates based on the* ℓ<sup>∞</sup> *norm of our algorithm for this problem in Table 11; they converge at the linear rate O*(Δ*t*)*. Finally, the graphical solutions of Example 4 are provided in Figure 5.*

**Table 10.** Error norms *L*<sup>2</sup> between DGM-FDS and FIM-SCP of Example 4 for *M* = *N* = 10.



**Table 11.** Time convergence rates and CPU time(s) for Example 4 by FIM-SCP with *M* = *N* = 10.

**Figure 5.** The graphical solutions of Example 4 for *ν* = 0.1, *M* = *N* = 15, and Δ*t* = 0.01.

### *4.3. Algorithm for Time-Fractional Coupled Burgers' Equation*

Consider the following coupled Burgers' equations with fractional time derivatives of orders *α*, *β* ∈ (0, 1]:

$$\begin{aligned} \frac{\partial^{\alpha}u}{\partial t^{\alpha}} &= \frac{\partial^{2}u}{\partial x^{2}} + 2u\frac{\partial u}{\partial x} - \frac{\partial(uv)}{\partial x} + f(x, t), \ x \in (0, L), \ t \in (0, T], \\ \frac{\partial^{\beta}v}{\partial t^{\beta}} &= \frac{\partial^{2}v}{\partial x^{2}} + 2v\frac{\partial v}{\partial x} - \frac{\partial(uv)}{\partial x} + g(x, t), \ x \in (0, L), \ t \in (0, T], \end{aligned} \tag{35}$$

subject to the initial conditions

$$\begin{aligned} u(x,0) &= \phi\_1(x), \ x \in [0,L], \\ v(x,0) &= \phi\_2(x), \ x \in [0,L], \end{aligned} \tag{36}$$

and the boundary conditions

$$\begin{aligned} u(0,t) &= \psi\_1(t), \; u(L,t) = \psi\_2(t), \; t \in (0,T], \\ v(0,t) &= \psi\_3(t), \; v(L,t) = \psi\_4(t), \; t \in (0,T], \end{aligned} \tag{37}$$

where *f*(*x*, *t*), *g*(*x*, *t*), *φ*1(*x*), *φ*2(*x*), *ψ*1(*t*), *ψ*2(*t*), *ψ*3(*t*), and *ψ*4(*t*) are given functions. The procedures for finding *u* and *v* with our FIM are similar, so we only discuss the details of finding the approximate solution *u*.

We begin by linearizing the system (35) at the time iteration *tm* = *m*Δ*t* for *m* ∈ ℕ, where Δ*t* is the time step. We obtain

$$\frac{\partial^{\alpha}u}{\partial t^{\alpha}}\Big|\_{t=t\_{m}} = \frac{\partial^{2}u^{m}}{\partial x^{2}} + 2u^{m-1}\frac{\partial u^{m}}{\partial x} - \frac{\partial(v^{m-1}u^{m})}{\partial x} + f(x, t\_{m}),$$

$$\frac{\partial^{\beta}v}{\partial t^{\beta}}\Big|\_{t=t\_{m}} = \frac{\partial^{2}v^{m}}{\partial x^{2}} + 2v^{m-1}\frac{\partial v^{m}}{\partial x} - \frac{\partial(u^{m-1}v^{m})}{\partial x} + g(x, t\_{m}),$$

where *u<sup>m</sup>* = *u*(*x*, *tm*) and *v<sup>m</sup>* = *v*(*x*, *tm*) are the numerical solutions of *u* and *v* at the *m*th iteration, respectively. Next, we treat the fractional time derivatives for *α*, *β* ∈ (0, 1] in the Caputo sense by the same procedure as in (12); taking the double-layer integration on both sides, we obtain

$$\sum\_{j=0}^{m-1} w\_j^{\alpha} \int\_0^{x\_k} \int\_0^{\eta} (u^{m-j} - u^{m-j-1})\, d\xi\, d\eta = u^m(x\_k) + 2 \int\_0^{x\_k} \int\_0^{\eta} \left( u^{m-1} \frac{\partial u^m}{\partial \xi} \right) d\xi\, d\eta$$

$$- \int\_0^{x\_k} (v^{m-1} u^m)\, d\eta + \int\_0^{x\_k} \int\_0^{\eta} f(\xi, t\_m)\, d\xi\, d\eta + d\_1 x\_k + d\_2, \tag{38}$$

$$\sum\_{j=0}^{m-1} w\_j^{\beta} \int\_0^{x\_k} \int\_0^{\eta} (v^{m-j} - v^{m-j-1})\, d\xi\, d\eta = v^m(x\_k) + 2 \int\_0^{x\_k} \int\_0^{\eta} \left( v^{m-1} \frac{\partial v^m}{\partial \xi} \right) d\xi\, d\eta$$

$$- \int\_0^{x\_k} (u^{m-1} v^m)\, d\eta + \int\_0^{x\_k} \int\_0^{\eta} g(\xi, t\_m)\, d\xi\, d\eta + d\_3 x\_k + d\_4, \tag{39}$$

where $w\_j^{\gamma} = \frac{(\Delta t)^{-\gamma}}{\Gamma(2-\gamma)}\left[(j+1)^{1-\gamma} - j^{1-\gamma}\right]$ for *γ* ∈ {*α*, *β*}, and *d*1, *d*2, *d*3, and *d*4 are arbitrary constants of integration. For the nonlinear terms in (38) and (39), using the same process as in (15), we let
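The discrete Caputo weights above are straightforward to tabulate; a minimal sketch (the function name is illustrative):

```python
import math

def caputo_weights(gamma, dt, m):
    """w_j = dt^(-gamma)/Gamma(2-gamma) * [(j+1)^(1-gamma) - j^(1-gamma)], j = 0..m-1."""
    c = dt ** (-gamma) / math.gamma(2.0 - gamma)
    return [c * ((j + 1) ** (1.0 - gamma) - j ** (1.0 - gamma)) for j in range(m)]

w = caputo_weights(0.5, 0.01, 5)
assert abs(w[0] - 0.01 ** -0.5 / math.gamma(1.5)) < 1e-9  # w_0 = dt^{-1/2} / Gamma(3/2)
assert all(w[j] > w[j + 1] > 0 for j in range(len(w) - 1))  # weights decay monotonically
```

The monotone decay of the weights is what makes the history sums **s**<sub>1</sub>, **s**<sub>2</sub> below well behaved.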

$$q\_1(\mathbf{x}\_k) := \int\_0^{\chi\_k} \int\_0^{\eta} \left(\mathbf{u}^{m-1} \frac{\partial \mathbf{u}^m}{\partial \xi}\right) d\xi d\eta = \int\_0^{\chi\_k} \mathbf{u}^{m-1} \mathbf{u}^m d\eta - \int\_0^{\chi\_k} \int\_0^{\eta} \mathbf{T}'(\xi) \mathbf{T}^{-1} \mathbf{u}^{m-1} \mathbf{u}^m d\xi d\eta,$$

$$q\_2(\mathbf{x}\_k) := \int\_0^{\chi\_k} \int\_0^{\eta} \left(\mathbf{v}^{m-1} \frac{\partial \mathbf{v}^m}{\partial \xi}\right) d\xi d\eta = \int\_0^{\chi\_k} \mathbf{v}^{m-1} \mathbf{v}^m d\eta - \int\_0^{\chi\_k} \int\_0^{\eta} \mathbf{T}'(\xi) \mathbf{T}^{-1} \mathbf{v}^{m-1} \mathbf{v}^m d\xi d\eta.$$

For computational convenience, we express *q*1(*xk*) and *q*2(*xk*) into matrix forms as

$$\mathbf{q}\_1 = \mathbf{A} \text{diag}(\mathbf{u}^{m-1}) \mathbf{u}^m - \mathbf{A}^2 \text{diag}(\mathbf{T}' \mathbf{T}^{-1} \mathbf{u}^{m-1}) \mathbf{u}^m := \mathbf{Q}\_1 \mathbf{u}^m,\tag{40}$$

$$\mathbf{q}\_2 = \mathbf{A} \text{diag}(\mathbf{v}^{m-1}) \mathbf{v}^m - \mathbf{A}^2 \text{diag}(\mathbf{T}' \mathbf{T}^{-1} \mathbf{v}^{m-1}) \mathbf{v}^m := \mathbf{Q}\_2 \mathbf{v}^m,\tag{41}$$

where **T** is defined in (17) and the other parameters appearing in (40) and (41) are

$$\begin{aligned}
\mathbf{Q}\_1 &= \mathbf{A}\,\mathrm{diag}(\mathbf{u}^{m-1}) - \mathbf{A}^2\,\mathrm{diag}(\mathbf{T}'\mathbf{T}^{-1}\mathbf{u}^{m-1}), \\
\mathbf{Q}\_2 &= \mathbf{A}\,\mathrm{diag}(\mathbf{v}^{m-1}) - \mathbf{A}^2\,\mathrm{diag}(\mathbf{T}'\mathbf{T}^{-1}\mathbf{v}^{m-1}), \\
\mathbf{u}^m &= [u(x\_1, t\_m), u(x\_2, t\_m), u(x\_3, t\_m), \ldots, u(x\_M, t\_m)]^{\top}, \\
\mathbf{v}^m &= [v(x\_1, t\_m), v(x\_2, t\_m), v(x\_3, t\_m), \ldots, v(x\_M, t\_m)]^{\top}, \\
\mathbf{q}\_i &= [q\_i(x\_1), q\_i(x\_2), q\_i(x\_3), \ldots, q\_i(x\_M)]^{\top} \text{ for } i \in \{1, 2\}.
\end{aligned}$$
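The assembly of **Q**<sub>1</sub> and **Q**<sub>2</sub> is a one-liner once the operators are available. The sketch below uses random placeholder matrices of the right shapes (they are *not* the paper's actual **A**, **T**′, **T**<sup>−1</sup>) purely to check the structure:

```python
import numpy as np

def assemble_Q(A, Tp, Tinv, w_prev):
    """Q = A diag(w^{m-1}) - A^2 diag(T' T^{-1} w^{m-1}), as in (40) and (41)."""
    return A @ np.diag(w_prev) - A @ A @ np.diag(Tp @ Tinv @ w_prev)

M = 6
rng = np.random.default_rng(0)
A, Tp, Tinv = rng.random((M, M)), rng.random((M, M)), rng.random((M, M))
u_prev = rng.random(M)

Q1 = assemble_Q(A, Tp, Tinv, u_prev)
assert Q1.shape == (M, M)
assert np.allclose(assemble_Q(A, Tp, Tinv, 2 * u_prev), 2 * Q1)  # Q is linear in u^{m-1}
```

The linearity check reflects the lagged (Picard-type) linearization: the nonlinear terms enter only through the previous iterate.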

Consequently, using (40), (41), and the procedure in Section 3.1, we can convert both (38) and (39) into the matrix forms as

$$\sum\_{j=0}^{m-1} w\_j^{\alpha} \mathbf{A}^2 (\mathbf{u}^{m-j} - \mathbf{u}^{m-j-1}) = \mathbf{u}^m + 2\mathbf{Q}\_1 \mathbf{u}^m - \mathbf{A}\,\mathrm{diag}(\mathbf{v}^{m-1}) \mathbf{u}^m + \mathbf{A}^2 \mathbf{f}^m + d\_1 \mathbf{x} + d\_2 \mathbf{i},$$

$$\sum\_{j=0}^{m-1} w\_j^{\beta} \mathbf{A}^2 (\mathbf{v}^{m-j} - \mathbf{v}^{m-j-1}) = \mathbf{v}^m + 2\mathbf{Q}\_2 \mathbf{v}^m - \mathbf{A}\,\mathrm{diag}(\mathbf{u}^{m-1}) \mathbf{v}^m + \mathbf{A}^2 \mathbf{g}^m + d\_3 \mathbf{x} + d\_4 \mathbf{i}.$$

Rearranging the above system yields

$$\left[\mathbf{I} + 2\mathbf{Q}\_1 - \mathbf{A}\,\mathrm{diag}(\mathbf{v}^{m-1}) - w\_0^{\alpha} \mathbf{A}^2\right] \mathbf{u}^m + d\_1 \mathbf{x} + d\_2 \mathbf{i} = \mathbf{s}\_1 - w\_0^{\alpha} \mathbf{A}^2 \mathbf{u}^{m-1} - \mathbf{A}^2 \mathbf{f}^m,\tag{42}$$

$$\left[\mathbf{I} + 2\mathbf{Q}\_2 - \mathbf{A}\mathrm{diag}(\mathbf{u}^{m-1}) - w\_0^\beta \mathbf{A}^2\right] \mathbf{v}^m + d\_3 \mathbf{x} + d\_4 \mathbf{i} = \mathbf{s}\_2 - w\_0^\beta \mathbf{A}^2 \mathbf{v}^{m-1} - \mathbf{A}^2 \mathbf{g}^m,\tag{43}$$

where **I** is the *M* × *M* identity matrix and other parameters are defined by

$$\begin{aligned}
\mathbf{s}\_1 &= \sum\_{j=1}^{m-1} w\_j^{\alpha} \mathbf{A}^2 (\mathbf{u}^{m-j} - \mathbf{u}^{m-j-1}), \\
\mathbf{s}\_2 &= \sum\_{j=1}^{m-1} w\_j^{\beta} \mathbf{A}^2 (\mathbf{v}^{m-j} - \mathbf{v}^{m-j-1}), \\
\mathbf{f}^m &= [f(x\_1, t\_m), f(x\_2, t\_m), f(x\_3, t\_m), \ldots, f(x\_M, t\_m)]^{\top}, \\
\mathbf{g}^m &= [g(x\_1, t\_m), g(x\_2, t\_m), g(x\_3, t\_m), \ldots, g(x\_M, t\_m)]^{\top}.
\end{aligned}$$

The boundary conditions (37) are transformed into the vector forms by using the same process as in (19) and (20), that is,

$$\mathbf{t}\_l \mathbf{T}^{-1} \mathbf{u}^m = \psi\_1(t\_m) \text{ and } \mathbf{t}\_r \mathbf{T}^{-1} \mathbf{u}^m = \psi\_2(t\_m), \tag{44}$$

$$\mathbf{t}\_l \mathbf{T}^{-1} \mathbf{v}^m = \psi\_3(t\_m) \text{ and } \mathbf{t}\_r \mathbf{T}^{-1} \mathbf{v}^m = \psi\_4(t\_m), \tag{45}$$

where $\mathbf{t}\_l = [1, -1, 1, \ldots, (-1)^{M-1}]$ and $\mathbf{t}\_r = [1, 1, 1, \ldots, 1]$. Finally, starting from the initial guesses

$$\mathbf{u}^{0} = \left[\phi\_{1}(\mathbf{x}\_{1}), \phi\_{1}(\mathbf{x}\_{2}), \phi\_{1}(\mathbf{x}\_{3}), \dots, \phi\_{1}(\mathbf{x}\_{M})\right]^{\top} \text{ and } \mathbf{v}^{0} = \left[\phi\_{2}(\mathbf{x}\_{1}), \phi\_{2}(\mathbf{x}\_{2}), \phi\_{2}(\mathbf{x}\_{3}), \dots, \phi\_{2}(\mathbf{x}\_{M})\right]^{\top}.$$

We can construct the system of the *m*th iterative linear equations for finding the numerical solutions. The approximate solution of *u* can be obtained from (42) and (44), while the approximate solution of *v* can be reached using (43) and (45):

$$\begin{bmatrix} \mathbf{I} + 2\mathbf{Q}\_1 - \mathbf{A}\,\mathrm{diag}(\mathbf{v}^{m-1}) - w\_0^{\alpha} \mathbf{A}^2 & \mathbf{x} & \mathbf{i} \\ \mathbf{t}\_l \mathbf{T}^{-1} & 0 & 0 \\ \mathbf{t}\_r \mathbf{T}^{-1} & 0 & 0 \end{bmatrix} \begin{bmatrix} \mathbf{u}^m \\ d\_1 \\ d\_2 \end{bmatrix} = \begin{bmatrix} \mathbf{s}\_1 - w\_0^{\alpha} \mathbf{A}^2 \mathbf{u}^{m-1} - \mathbf{A}^2 \mathbf{f}^m \\ \psi\_1(t\_m) \\ \psi\_2(t\_m) \end{bmatrix},\tag{46}$$

and

$$\begin{bmatrix} \mathbf{I} + 2\mathbf{Q}\_2 - \mathbf{A}\,\mathrm{diag}(\mathbf{u}^{m-1}) - w\_0^{\beta} \mathbf{A}^2 & \mathbf{x} & \mathbf{i} \\ \mathbf{t}\_l \mathbf{T}^{-1} & 0 & 0 \\ \mathbf{t}\_r \mathbf{T}^{-1} & 0 & 0 \end{bmatrix} \begin{bmatrix} \mathbf{v}^m \\ d\_3 \\ d\_4 \end{bmatrix} = \begin{bmatrix} \mathbf{s}\_2 - w\_0^{\beta} \mathbf{A}^2 \mathbf{v}^{m-1} - \mathbf{A}^2 \mathbf{g}^m \\ \psi\_3(t\_m) \\ \psi\_4(t\_m) \end{bmatrix}. \tag{47}$$

For any fixed *t*, the approximate solutions *u*(*x*, *t*) and *v*(*x*, *t*) on the space domain can be obtained by computing $u(x, t) = \mathbf{t}\_x \mathbf{T}^{-1} \mathbf{u}^m$ and $v(x, t) = \mathbf{t}\_x \mathbf{T}^{-1} \mathbf{v}^m$, where $\mathbf{t}\_x = [T\_0^\*(x), T\_1^\*(x), T\_2^\*(x), \ldots, T\_{M-1}^\*(x)]$.
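Each time step of (46) and (47) solves a bordered system: an *M* × *M* operator block augmented with the columns **x**, **i** and the two boundary rows. The sketch below assembles and solves such a system with a random placeholder operator block (not the paper's actual matrices), checking that the boundary rows are honored:

```python
import numpy as np

def solve_bordered(B, x, tl_row, tr_row, rhs, bc_left, bc_right):
    """Solve [[B, x, 1], [tl_row, 0, 0], [tr_row, 0, 0]] [w; d1; d2] = [rhs; bc_left; bc_right]."""
    M = B.shape[0]
    ones = np.ones((M, 1))
    K = np.block([
        [B, x.reshape(-1, 1), ones],
        [tl_row.reshape(1, -1), np.zeros((1, 2))],
        [tr_row.reshape(1, -1), np.zeros((1, 2))],
    ])
    sol = np.linalg.solve(K, np.concatenate([rhs, [bc_left, bc_right]]))
    return sol[:M], sol[M], sol[M + 1]   # nodal solution and the constants d1, d2

M = 5
rng = np.random.default_rng(1)
B = rng.random((M, M)) + M * np.eye(M)   # placeholder operator block
x, tl, tr = rng.random(M), rng.random(M), rng.random(M)

u, d1, d2 = solve_bordered(B, x, tl, tr, rng.random(M), 0.3, -0.7)
assert abs(tl @ u - 0.3) < 1e-9 and abs(tr @ u + 0.7) < 1e-9  # boundary rows satisfied
```

Solving the bordered system directly determines the integration constants *d*<sub>1</sub>, *d*<sub>2</sub> together with the nodal values, which is the design choice behind (46) and (47).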

**Example 5.** *Consider the time-fractional coupled Burgers' Equation (35) for x* ∈ (0, 1) *and t* ∈ (0, 1] *with the forcing terms*

$$f(x,t) = \frac{6xt^{3-\alpha}}{\Gamma(4-\alpha)} \text{ and } g(x,t) = \frac{6xt^{3-\beta}}{\Gamma(4-\beta)}$$

*subject to the homogeneous initial conditions and the boundary conditions corresponding to the analytical solution given by Albuohimad and Adibi [25], u*∗(*x*, *t*) = *v*∗(*x*, *t*) = *xt*<sup>3</sup>*. For the numerical test, we choose the kinematic viscosity ν* = 1*, α* = *β* = 0.5*, and M* = 40*. Table 12 presents the exact solution u*∗(*x*, 1) *and the numerical solutions u*(*x*, 1) *together with v*(*x*, 1) *obtained by our FIM-SCP through Algorithm 3. The accuracy is measured by the absolute error Ea. Table 13 compares the error norms L*∞ *of our approximate solutions with those obtained by the collocation method with FDM (CM-FDM) introduced by Albuohimad and Adibi in [25]; as can be seen from Table 13, our FIM-SCP Algorithm 3 is more accurate. Next, the time convergence rates based on the* ℓ<sup>∞</sup> *norm and the CPU times (s) for this problem, solved by Algorithm 3, are demonstrated in Table 14. Since the approximate solutions u and v coincide, we only present the graphical solution of u in Figure 6.*

### **Algorithm 3** The numerical algorithm for solving 1D time-fractional coupled Burgers' equation

**Input:** *α*, *β*, **x**, *L*, *T*, *M*, Δ*t*, *φ*1(*x*), *φ*2(*x*), *ψ*1(*t*), *ψ*2(*t*), *ψ*3(*t*), *ψ*4(*t*), *f*(*x*, *t*) and *g*(*x*, *t*).

**Output:** The approximate solutions *u*(*x*, *T*) and *v*(*x*, *T*).

1: Set $x\_k = \frac{L}{2}\left[\cos\left(\frac{2k-1}{2M}\pi\right) + 1\right]$ for $k \in \{1, 2, 3, \ldots, M\}$.
2: Compute $\mathbf{x}$, $\mathbf{i}$, $\mathbf{t}\_l$, $\mathbf{t}\_r$, $\mathbf{t}\_x$, $\mathbf{A}$, $\mathbf{I}$, $\mathbf{T}$, $\mathbf{T}'$, $\mathbf{T}^\*$, $\mathbf{T}^{-1}$, $\mathbf{u}^0$ and $\mathbf{v}^0$.
3: Set $t\_0 = 0$ and $m = 0$.
4: **while** $t\_m \le T$ **do**
5: Set $m = m + 1$.
6: Set $t\_m = m\Delta t$.
7: Set $\mathbf{s}\_1 = \mathbf{0}$ and $\mathbf{s}\_2 = \mathbf{0}$.
8: **for** $j = 1$ to $m - 1$ **do**
9: Compute $w\_j^{\alpha} = \frac{(\Delta t)^{-\alpha}}{\Gamma(2-\alpha)}\left[(j+1)^{1-\alpha} - j^{1-\alpha}\right]$.
10: Compute $w\_j^{\beta} = \frac{(\Delta t)^{-\beta}}{\Gamma(2-\beta)}\left[(j+1)^{1-\beta} - j^{1-\beta}\right]$.
11: Compute $\mathbf{s}\_1 = \mathbf{s}\_1 + w\_j^{\alpha} \mathbf{A}^2(\mathbf{u}^{m-j} - \mathbf{u}^{m-j-1})$.
12: Compute $\mathbf{s}\_2 = \mathbf{s}\_2 + w\_j^{\beta} \mathbf{A}^2(\mathbf{v}^{m-j} - \mathbf{v}^{m-j-1})$.
13: **end for**
14: Calculate $\mathbf{Q}\_1$, $\mathbf{Q}\_2$, $\mathbf{f}^m$ and $\mathbf{g}^m$.
15: Find $\mathbf{u}^m$ by solving the iterative linear system (46).
16: Find $\mathbf{v}^m$ by solving the iterative linear system (47).
17: **end while**
18: **return** $u(x, T) = \mathbf{t}\_x(\mathbf{T}^\*)^{-1}\mathbf{u}^m$ and $v(x, T) = \mathbf{t}\_x(\mathbf{T}^\*)^{-1}\mathbf{v}^m$.


**Table 12.** Comparison of exact and numerical solutions of Example 5 for *α* = *β* = 0.5, *M* = 40.

**Table 13.** Comparison of error norms *L*∞ of Example 5 for *α* = *β* = 0.5, *M* = 5 and *t* = 1.


**Table 14.** Time convergence rates and CPU time(s) for Example 5 by FIM-SCP with *M* = 20.


**Figure 6.** The graphical solutions of Example 5 for *α* = *β* = 0.5, *M* = 40, and Δ*t* = 0.001.

**Example 6.** *Consider the time-fractional coupled Burgers' Equation (35) for x* ∈ (0, 1) *and t* ∈ (0, 1] *with the forcing terms*

$$f(x,t) = \left[\frac{\Gamma(4)t^{-\alpha}}{\Gamma(4-\alpha)} + 1\right]t^3 \sin(x) \text{ and } g(x,t) = \left[\frac{\Gamma(4)t^{-\beta}}{\Gamma(4-\beta)} + 1\right]t^3 \sin(x)$$

*subject to the homogeneous initial conditions and the boundary conditions corresponding to the analytical solution given by Albuohimad and Adibi [25], u*∗(*x*, *t*) = *v*∗(*x*, *t*) = *t*<sup>3</sup> sin(*x*)*. For the numerical test, we choose the viscosity ν* = 1*, α* = *β* = 0.5*, and M* = 5*. Table 15 compares the error norms L*<sup>∞</sup> *between our FIM-SCP and the CM-FDM in [25] for various values of* Δ*t and M, showing that our method is more accurate. Moreover, Table 16 illustrates the rates of convergence and the CPU times (s) for M* = 20*. Figure 7a,b show the numerical solutions u*(*x*, *t*) *at different times t and the surface plot of u*(*x*, *t*)*, respectively. Note that we only show the graphical solution of u*(*x*, *t*) *since the approximate solutions u*(*x*, *t*) *and v*(*x*, *t*) *are the same.*


**Table 15.** Comparison of error norms *L*∞ between CM-FDM and FIM-SCP for Example 6.

**Table 16.** Time convergence rates and CPU time(s) for Example 6 by FIM-SCP with *M* = 20.


**Figure 7.** The graphical results of Example 6 for *α* = *β* = 0.5, *M* = 40 and Δ*t* = 0.001.

### **5. Conclusions and Discussion**

In this paper, we applied our improved FIM-SCP to develop accurate numerical algorithms for finding approximate solutions of time-fractional Burgers' equations in one- and two-dimensional spatial domains, as well as time-fractional coupled Burgers' equations. The fractional-order time derivatives were described in the Caputo sense and

estimated by a forward difference quotient. According to Example 1, even though we obtain similar accuracy, our method does not require the solution to be separable in the spatial and temporal variables. For Example 2, the results confirm that, even for nonlinear FDEs, the FIM-SCP provides better accuracy than the FDM. In two dimensions, Example 4 shows that even with a small kinematic viscosity *ν*, our method can handle a shock-wave solution, which is not globally continuously differentiable, as for the classical Burgers' equation under the same small kinematic viscosity *ν*. We can also see from Examples 5 and 6 that our proposed method extends to the time-fractional coupled Burgers' equations, and we expect that it will also work credibly for other systems of time-fractional nonlinear equations. We notice that our method provides good accuracy even with a small number of nodal points and, evidently, furnishes more accurate results as the time step decreases. We also illustrated the time convergence rate of our method based on the ℓ<sup>∞</sup> norm and observed that it approaches linear convergence *O*(Δ*t*). Finally, we showed the computational cost in terms of CPU time (s) for each example. An interesting direction for future work is to extend our technique to space-fractional Burgers' equations and other nonlinear FDEs.

**Author Contributions:** Conceptualization, A.D., R.B., and T.T.; methodology, R.B.; software, A.D.; validation, A.D., R.B. and T.T.; formal analysis, R.B.; investigation, A.D.; writing—original draft preparation, A.D.; writing—review and editing, R.B. and T.T.; visualization, A.D.; supervision, R.B. and T.T.; project administration, R.B.; funding acquisition, A.D.

**Funding:** This research was funded by The 100th Anniversary Chulalongkorn University Fund for Doctoral Scholarship.

**Conflicts of Interest:** The authors declare no conflicts of interest.

### **Abbreviations**

The following abbreviations are used in this manuscript.


### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **Orthonormal Wavelet Bases on the 3D Ball via Volume Preserving Map from the Regular Octahedron**

### **Adrian Holho¸s and Daniela Ro¸sca \***

Department of Mathematics, Technical University of Cluj-Napoca, str. Memorandumului 28, RO-400114 Cluj-Napoca, Romania; Adrian.Holhos@math.utcluj.ro

**\*** Correspondence: Daniela.Rosca@math.utcluj.ro

Received: 13 May 2020; Accepted: 12 June 2020; Published: 17 June 2020

**Abstract:** We construct a new volume preserving map from the unit ball B<sup>3</sup> to the regular 3D octahedron, both centered at the origin, and its inverse. This map will help us construct refinable grids of the 3D ball consisting of diameter-bounded cells having the same volume. On this uniform 3D grid, we construct a multiresolution analysis and orthonormal wavelet bases of *L*<sup>2</sup>(B<sup>3</sup>), consisting of piecewise constant functions with small local support.

**Keywords:** wavelets on 3D ball; uniform 3D grid; volume preserving map

### **1. Introduction**

Spherical 3D signals occur in a wide range of fields, including computer graphics and medical imaging (e.g., 3D reconstruction of medical images [1]), crystallography (texture analysis of crystals) [2,3], and geoscience [4–6]. Therefore, we need suitable efficient techniques for manipulating such signals, and one of the most efficient consists in using wavelets on the 3D ball (see e.g., [4–10] and the references therein). In this paper we propose to construct an orthonormal basis of wavelets with small support, defined on the 3D ball B<sup>3</sup>, starting from a multiresolution analysis. Our wavelets will be piecewise constant functions on the cells of a uniform and refinable grid of B<sup>3</sup>. By a refinable (or hierarchical) grid we mean that the cells can be divided successively into a given number of smaller cells of the same volume. By a uniform grid we mean that all the cells at a certain level of subdivision have the same volume. These two very important properties of our grid derive from the fact that it is constructed by mapping a uniform and refinable grid of the regular 3D octahedron onto B<sup>3</sup>, using a volume preserving map. Compared to the wavelets on the 3D ball constructed in [8,10], which have localized support, our wavelets have local support, and this is very important when dealing with data consisting of big jumps on small portions, as shown in [11]. Another construction of piecewise constant wavelets on the 3D ball was realized in [7], starting from a similar construction on the 2D sphere. The author claims that his wavelets are the first Haar wavelets on the 3D ball that are orthogonal and symmetric, even though we do not see any symmetry, neither in the cells nor in the decomposition matrix. Moreover, his 8 × 8 decomposition matrices change in each step of the refinement, with entries depending on the volumes of the cells, which are, in our opinion, difficult to evaluate; for this reason they are not calculated explicitly in [7].
Another advantage of our construction is that our cells are diameter bounded, unlike the cells in [7] containing the origin, which become long and thin after some steps of refinement.

The paper is structured as follows. In Section 2 we introduce some notations used for the construction of the volume preserving map. In Section 3 we construct the volume preserving maps between the regular 3D octahedron and the 3D ball B3. In Section 4 we construct a uniform refinable grid of the regular octahedron followed by implementation issues, and its projection onto B<sup>3</sup> . Finally, in Section 5 we construct a multiresolution analysis and piecewise constant wavelet bases of *L*2(B3).

### **2. Preliminaries**

Consider the ball of radius *r* centered at the origin *O*, defined as

$$\mathbb{B}^3 = \left\{ (x, y, z) \in \mathbb{R}^3, x^2 + y^2 + z^2 \le r^2 \right\}$$

and the regular octahedron K of the same volume, centered at *O* and with vertices on the coordinate axes

$$\mathbb{K} = \left\{ (x, y, z) \in \mathbb{R}^3, |x| + |y| + |z| \le a \right\}.$$

Since the volume of the regular octahedron is $4a^3/3$, we have

$$a = r \sqrt[3]{\pi}.\tag{1}$$

The parametric equations of the ball are

$$\begin{aligned} x &= \rho \cos \theta \sin \varphi, \\ y &= \rho \sin \theta \sin \varphi, \\ z &= \rho \cos \varphi, \end{aligned} \tag{2}$$

where $\varphi \in [0, \pi]$ is the colatitude, $\theta \in [0, 2\pi)$ is the longitude and $\rho \in [0, r]$ is the distance to the origin. A simple calculation shows that the volume element of the ball is

$$dV = \rho^2 \sin\varphi \, d\rho \, d\theta \, d\varphi. \tag{3}$$

The ball and the octahedron can be split into eight congruent parts (see Figure 1), each part being situated in one of the eight octants $I_i^\pm$, $i = 0, 1, 2, 3$,

$$\begin{aligned} I_0^+ &= \{(x,y,z),\ x \ge 0,\ y \ge 0,\ z \ge 0\}, \quad I_0^- = \{(x,y,z),\ x \ge 0,\ y \ge 0,\ z \le 0\}, \\ I_1^+ &= \{(x,y,z),\ x \le 0,\ y \ge 0,\ z \ge 0\}, \quad I_1^- = \{(x,y,z),\ x \le 0,\ y \ge 0,\ z \le 0\}, \\ I_2^+ &= \{(x,y,z),\ x \le 0,\ y \le 0,\ z \ge 0\}, \quad I_2^- = \{(x,y,z),\ x \le 0,\ y \le 0,\ z \le 0\}, \\ I_3^+ &= \{(x,y,z),\ x \ge 0,\ y \le 0,\ z \ge 0\}, \quad I_3^- = \{(x,y,z),\ x \ge 0,\ y \le 0,\ z \le 0\}. \end{aligned}$$

Let $\mathbb{B}_i^s$ and $\mathbb{K}_i^s$ be the regions of $\mathbb{B}^3$ and $\mathbb{K}$ situated in $I_i^s$, respectively.

**Figure 1.** The eight spherical zones obtained as intersections of the coordinate planes with the ball $\mathbb{B}^3$.

Next we will construct a map $\mathcal{U} : \mathbb{B}^3 \to \mathbb{K}$ which preserves the volume, i.e., $\mathcal{U}$ satisfies

$$\text{vol}(D) = \text{vol}(\mathcal{U}(D)), \qquad \text{for all } D \subseteq \mathbb{B}^3,\tag{4}$$

where vol$(D)$ denotes the volume of a domain $D$. For an arbitrary point $(x, y, z) \in \mathbb{B}^3$ we denote

$$(X, Y, Z) = \mathcal{U}(x, y, z) \in \mathbb{K}.\tag{5}$$

### **3. Construction of the Volume Preserving Map** U **and Its Inverse**

We focus on the region $\mathbb{B}_0^+ \subset I_0^+$, where we consider the points $A = (r, 0, 0)$, $B = (0, r, 0)$, $C = (0, 0, r)$ and the vertical plane of equation $y = x \tan\alpha$ with $\alpha \in (0, \pi/2)$ (see Figure 2 (left)). We denote by $M$ its intersection with the great arc $AB$ of the sphere of radius $r$; more precisely, $M = (r\cos\alpha, r\sin\alpha, 0)$. The volume of the spherical region $OAMC$ equals $r^3\alpha/3$.

**Figure 2.** The spherical region $OAMC$ and its image $OA'M'C' = \mathcal{U}(OAMC)$ on the octahedron.

Now we intersect the region $\mathbb{K}_0^+$ of the octahedron with the vertical plane of equation $y = x\tan\beta$ and denote by $M'(m, n, 0)$ its intersection with the edge $A'B'$, where $A'(a, 0, 0)$, $B'(0, a, 0)$ (see Figure 2 (right)). Then $m + n = a$ and from $n = m\tan\beta$ we find

$$m = a \cdot \frac{1}{1 + \tan \beta}, \qquad n = a \cdot \frac{\tan \beta}{1 + \tan \beta}.$$

The volume of $OA'M'C'$ is

$$\mathcal{V}(OA'M'C') = \frac{OC' \cdot \mathcal{A}(OA'M')}{3} = \frac{a}{3} \cdot \frac{OA' \cdot n}{2} = \frac{a^3 \tan \beta}{6(1 + \tan \beta)},$$

where $\mathcal{A}(OA'M')$ denotes the area of the triangle $OA'M'$.

If we want the volume of the region $OAMC$ of the ball to be equal to the volume of $OA'M'C'$, we obtain

$$
\alpha = \frac{\pi}{2} \cdot \frac{\tan \beta}{1 + \tan \beta}, \text{ whence } \tan \beta = \frac{2\alpha}{\pi - 2\alpha}.
$$

This gives us a first relation between $(x, y, z)$ and $(X, Y, Z)$:

$$\frac{Y}{X} = \frac{2 \arctan \frac{y}{x}}{\pi - 2 \arctan \frac{y}{x}}.$$

Using the spherical coordinates (2) we obtain

$$Y = \frac{2\theta}{\pi - 2\theta} \cdot X. \tag{6}$$


In order to obtain a second relation between $(x, y, z)$ and $(X, Y, Z)$, we impose that, for an arbitrary $\tilde\rho \in (0, r]$, the region

$$\left\{ (x, y, z) \in \mathbb{R}^3,\ x^2 + y^2 + z^2 \le \tilde{\rho}^2,\ x, y, z \ge 0 \right\} \text{ of volume } \frac{\pi \tilde{\rho}^3}{6}$$

is mapped by U onto

$$\left\{ (X, Y, Z) \in \mathbb{R}^3,\ X + Y + Z \le \ell,\ X, Y, Z \ge 0 \right\} \text{ of volume } \frac{\ell^3}{6}.$$

Then, the volume preserving condition (4) implies $\ell = a\tilde\rho/r$, with $a$ satisfying (1). Thus,

$$X + Y + Z = \frac{a}{r}\sqrt{x^2 + y^2 + z^2}$$

and in spherical coordinates this can be written as

$$X + Y + Z = \frac{a\rho}{r}.\tag{7}$$

In order to have a volume preserving map, the modulus of the Jacobian *J*(U) of U must be 1, or, equivalently, taking into account the volume element (3), we must have

$$J(\mathcal{U}) = \begin{vmatrix} X'_\rho & X'_\varphi & X'_\theta \\ Y'_\rho & Y'_\varphi & Y'_\theta \\ Z'_\rho & Z'_\varphi & Z'_\theta \end{vmatrix} = \rho^2 \sin \varphi. \tag{8}$$

Taking into account formula (7), we have

$$J(\mathcal{U}) = \begin{vmatrix} X'_\rho & X'_\varphi & X'_\theta \\ Y'_\rho & Y'_\varphi & Y'_\theta \\ a/r - X'_\rho - Y'_\rho & -X'_\varphi - Y'_\varphi & -X'_\theta - Y'_\theta \end{vmatrix} = \begin{vmatrix} X'_\rho & X'_\varphi & X'_\theta \\ Y'_\rho & Y'_\varphi & Y'_\theta \\ a/r & 0 & 0 \end{vmatrix} = \frac{a}{r}\begin{vmatrix} X'_\varphi & X'_\theta \\ Y'_\varphi & Y'_\theta \end{vmatrix}.$$

Further, using relation (6) we get

$$J(\mathcal{U}) = \frac{a}{r}\begin{vmatrix} X'_\varphi & X'_\theta \\ \frac{2\theta}{\pi-2\theta}\, X'_\varphi & \frac{2\theta}{\pi-2\theta}\, X'_\theta + \frac{2\pi}{(\pi-2\theta)^2}\, X \end{vmatrix} = \frac{a}{r}\begin{vmatrix} X'_\varphi & X'_\theta \\ 0 & \frac{2\pi}{(\pi-2\theta)^2}\, X \end{vmatrix} = \frac{2\pi a}{r(\pi-2\theta)^2}\, X X'_\varphi.$$

For the second equality, we multiplied the first row by $-2\theta/(\pi - 2\theta)$ and added it to the second row. Then, using the expression for $J(\mathcal{U})$ required in (8), we obtain the differential equation

$$2X\_{\varphi}^{\prime} \cdot X = \frac{r\rho^2}{\pi a} (\pi - 2\theta)^2 \sin \varphi.$$

The integration with respect to $\varphi$ gives

$$X^2 = -\frac{r(\pi - 2\theta)^2}{\pi a}\,\rho^2 \cos\varphi + \mathcal{C}(\theta, \rho),$$

and further, for finding $\mathcal{C}(\theta, \rho)$ we use the fact that for $\varphi = \pi/2$ we must obtain $Z = 0$. Thus, for $\varphi = \pi/2$ we have

$$X^2 = \mathcal{C}(\theta, \rho), \text{ so } Y = \frac{2\theta}{\pi - 2\theta}\sqrt{\mathcal{C}(\theta, \rho)}, \text{ and}$$

$$Z = \frac{a\rho}{r} - X - Y = \frac{a\rho}{r} - \frac{\pi}{\pi - 2\theta}\sqrt{\mathcal{C}(\theta, \rho)}.$$

Thus, *Z* = 0 is obtained for

$$\mathcal{C}(\theta,\rho) = \frac{a^2 \rho^2}{\pi^2 r^2} (\pi - 2\theta)^2$$

and finally, the map $\mathcal{U}$ restricted to the region $I_0^+$ is

$$X = \frac{\sqrt{2}}{\pi^{2/3}} \cdot \rho(\pi - 2\theta) \sin \frac{\varphi}{2},\tag{9}$$

$$Y = \frac{\sqrt{2}}{\pi^{2/3}} \cdot \rho \cdot 2\theta \sin\frac{\varphi}{2},\tag{10}$$

$$Z = \pi^{1/3} \rho \left( 1 - \sqrt{2} \sin \frac{\varphi}{2} \right). \tag{11}$$
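To make the formulas concrete, here is a small numerical sketch of (9)–(11) (an illustration of ours, not part of the paper; the function name and the default choice $r = 1$ are assumptions). The identity $X + Y + Z = a\rho/r$ from (7) is easy to check on the output:

```python
import math

def U_first_octant(x, y, z):
    """Volume preserving map of Equations (9)-(11) on the octant
    x, y, z >= 0, for the unit ball r = 1 (illustrative sketch)."""
    rho = math.sqrt(x*x + y*y + z*z)
    if rho == 0.0:
        return (0.0, 0.0, 0.0)
    theta = math.atan2(y, x)           # longitude, in [0, pi/2] on this octant
    phi = math.acos(z / rho)           # colatitude, in [0, pi/2]
    s = math.sqrt(2.0) / math.pi**(2.0/3.0) * rho * math.sin(phi / 2.0)
    X = s * (math.pi - 2.0*theta)      # Equation (9)
    Y = s * 2.0*theta                  # Equation (10)
    Z = math.pi**(1.0/3.0) * rho * (1.0 - math.sqrt(2.0)*math.sin(phi / 2.0))
    return (X, Y, Z)
```

By (7), the image of the sphere of radius $\rho$ lies on the plane $X + Y + Z = \pi^{1/3}\rho$, so spheres are mapped onto planar faces of nested octahedra.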

In the other seven octants, the map $\mathcal{U}$ can be obtained by symmetry as follows. A point $(x, y, z) \in \mathbb{B}^3$ can be written as

$$(x, y, z) = (\operatorname{sgn}(x) \cdot |x|,\ \operatorname{sgn}(y) \cdot |y|,\ \operatorname{sgn}(z) \cdot |z|), \quad \text{with } (|x|, |y|, |z|) \in I_0^+.$$

Therefore, if we denote $(\overline{X}, \overline{Y}, \overline{Z}) = \mathcal{U}(|x|, |y|, |z|)$, then we can define $\mathcal{U}(x, y, z)$ as

$$\mathcal{U}(x, y, z) = (\operatorname{sgn}(x) \cdot \overline{X},\ \operatorname{sgn}(y) \cdot \overline{Y},\ \operatorname{sgn}(z) \cdot \overline{Z}).\tag{12}$$

Next we deduce the formulas for the inverse of U. First, from (6) we obtain

$$\theta = \frac{\pi Y}{2(X+Y)},$$

and from (7) we have

$$
\rho = \frac{r}{a}(X + Y + Z) = \pi^{-1/3}(X + Y + Z).
$$

Adding (9) and (10), after some more calculations we obtain

$$\sin\frac{\varphi}{2} = \frac{X+Y}{\sqrt{2}(X+Y+Z)}$$

and further

$$\cos \varphi = \frac{Z(2X + 2Y + Z)}{(X + Y + Z)^2}, \quad \sin \varphi = \frac{X + Y}{X + Y + Z} \sqrt{2 - \left(\frac{X + Y}{X + Y + Z}\right)^2}.$$

Finally, the inverse $\mathcal{U}^{-1} : \mathbb{K} \to \mathbb{B}^3$ is defined by

$$x = \pi^{-1/3} (X+Y) \sqrt{2 - \left(\frac{X+Y}{X+Y+Z}\right)^2} \cos \frac{\pi Y}{2(X+Y)},\tag{13}$$

$$y = \pi^{-1/3} (X+Y) \sqrt{2 - \left(\frac{X+Y}{X+Y+Z}\right)^2} \sin \frac{\pi Y}{2(X+Y)}\tag{14}$$

$$z = \pi^{-1/3} \frac{Z(2X + 2Y + Z)}{(X + Y + Z)}.\tag{15}$$

for $(X, Y, Z) \in \mathbb{K}_0^+$; for the other seven octants the formulas can be obtained as in (12).
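A matching sketch of the inverse (13)–(15), again for the unit ball $r = 1$ (the function name is an assumption of ours). A quick consistency check: for any $(X, Y, Z)$ in the first octant of $\mathbb{K}$, the image satisfies $x^2 + y^2 + z^2 = \pi^{-2/3}(X + Y + Z)^2$, which is relation (7) read backwards:

```python
import math

def U_inv_first_octant(X, Y, Z):
    """Inverse map of Equations (13)-(15) on the first octant of K,
    for the unit ball r = 1 (illustrative sketch)."""
    S = X + Y + Z
    T = X + Y
    c = math.pi**(-1.0/3.0)
    if S == 0.0:
        return (0.0, 0.0, 0.0)
    if T == 0.0:                       # points on the Z axis map to the z axis
        return (0.0, 0.0, c * Z)
    w = math.sqrt(2.0 - (T / S)**2)
    x = c * T * w * math.cos(math.pi * Y / (2.0 * T))   # Equation (13)
    y = c * T * w * math.sin(math.pi * Y / (2.0 * T))   # Equation (14)
    z = c * Z * (2.0*X + 2.0*Y + Z) / S                 # Equation (15)
    return (x, y, z)
```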

### **4. Uniform and Refinable Grids of the Regular Octahedron and of the Ball**

In this section we construct a uniform refinement of the regular octahedron $\mathbb{K}$ of volume vol$(\mathbb{K})$, more precisely a subdivision of $\mathbb{K}$ into 64 cells of two shapes, each of them having the volume vol$(\mathbb{K})/64$. This subdivision can be repeated for each of the 64 small cells, the resulting $64^2$ cells of volume vol$(\mathbb{K})/64^2$ being of one of the two types from the first refinement. Next, the volume preserving map $\mathcal{U}$ will allow us to construct uniform and refinable grids of the 3D ball $\mathbb{B}^3$ by transporting the octahedral uniform refinable 3D grids, and further, to construct orthonormal piecewise constant wavelets on the 3D ball.

### *4.1. Refinement of the Octahedron*

The initial octahedron $\mathbb{K}$ consists of four congruent cells, each situated in one of the octants $I_i^+ \cup I_i^-$, $i = 0, 1, 2, 3$ (see Figure 3). We will say that this type of cell is $\mathbf{T}_0$, the index 0 of $\mathbf{T}_0$ indicating the coarsest level of the refinement. To simplify the writing, we denote by $\mathbb{N}_0$ the set of positive natural numbers and $\mathbb{N}_n = \{1, 2, \ldots, n\}$, for $n \in \mathbb{N}_0$.

**Figure 3.** Left: one of the four cells of type **T**<sup>0</sup> constituting the octahedron. Right: each cell of type **T**<sup>0</sup> can be subdivided into six cells of type **T**<sup>1</sup> and two cells of type **M**1.

### 4.1.1. First Step of Refinement

The cell $\mathbf{T}_0 = (ABCD) \subset I_0^+ \cup I_0^-$, with $A(a, 0, 0)$, $B(0, a, 0)$, $C(0, 0, a)$, $D(0, 0, -a)$ (see Figure 3), will be subdivided into eight smaller cells having the same volume, as follows: we take the mid-points $M, N, P, Q, R$ of the edges $AC, BC, AB, AD, BD$, respectively. Thus, one obtains $t_1 = 6$ cells of type $\mathbf{T}_1$ ($MQOP$, $MQAP$, $NROP$, $NRBP$, $ODQR$ and $COMN$), and $m_1 = 2$ other cells, $OMNP$ and $OPQR$, of another type, say $\mathbf{M}_1$. The cells of type $\mathbf{T}_1$ have the same shape as the cells $\mathbf{T}_0$. Their volumes are

$$\text{vol}(\mathbf{T}\_1) = \text{vol}(\mathbf{M}\_1) = \frac{\text{vol}(\mathbf{T}\_0)}{8}.$$

Figures 4 and 5 also show the eight cells at the first step of refinement.

Similarly we refine the other three cells situated in $I_i^+ \cup I_i^-$, $i = 1, 2, 3$; therefore the total number of cells after the first step of refinement is 32, more precisely 24 of type $\mathbf{T}_1$ and 8 of type $\mathbf{M}_1$.

**Figure 4.** The subdivision of a **T** cell.

**Figure 5.** The first step of the refinement: the cell **T**<sup>0</sup> is divided into two cells of type **M**<sup>1</sup> (yellow) and six cells of type **T**1: two red, two blue and two green.

### 4.1.2. Second Step of Refinement

A cell of type **T**<sup>1</sup> will be subdivided in the same way as a cell of type **T**0, i.e., into six cells of type **T**<sup>2</sup> and two cells of type **M**2. Their volumes will be

$$\text{vol}(\mathbf{T}\_2) = \text{vol}(\mathbf{M}\_2) = \frac{\text{vol}(\mathbf{T}\_0)}{8^2}.$$

Therefore, from the subdivision of the 6 cells of type **T**<sup>1</sup> we have 36 cells of type **T**<sup>2</sup> and 12 cells of type **M**2.

For a cell ($OMNP$) of type $\mathbf{M}_1$, which is a regular tetrahedron of edge $\ell_1 = a\sqrt{2}/2$, we take the mid-points of the six edges (see Figures 6 and 7). This gives four cells of type $\mathbf{T}_2$ in the middle and four cells of type $\mathbf{M}_2$, i.e., regular tetrahedra of edge $\ell_2 = a\sqrt{2}/2^2$. From the subdivision of the two cells of type $\mathbf{M}_1$ we thus obtain 8 cells of type $\mathbf{T}_2$ and 8 cells of type $\mathbf{M}_2$.

**Figure 6.** The four cells of type **T** of the subdivision of a cell of type **M**.

**Figure 7.** The subdivision of a cell of type **M**<sup>1</sup> into four cells of type **M**: the four tetrahedrons at the corners and four cells of type **T** in the middle, forming an octahedron.

In conclusion, the second step of subdivision yields, in $I_0^+ \cup I_0^-$, $t_2 = 44$ cells of type $\mathbf{T}_2$ and $m_2 = 20$ cells of type $\mathbf{M}_2$, each having the volume vol$(\mathbf{T}_0)/64$; therefore the total number of cells after the second refinement will be $4 \cdot 8^2$, more precisely 176 of type $\mathbf{T}_2$ and 80 of type $\mathbf{M}_2$.

### 4.1.3. The General Step of Refinement

Let $m_j$ and $t_j$ denote the numbers of cells of type $\mathbf{M}_j$ and $\mathbf{T}_j$, respectively, resulting at step $j$ of the subdivision, starting from one cell of type $\mathbf{T}_0$. At this step, each of the $t_{j-1}$ cells of type $\mathbf{T}_{j-1}$ is subdivided into 6 cells of type $\mathbf{T}_j$ and 2 cells of type $\mathbf{M}_j$, and each of the $m_{j-1}$ cells of type $\mathbf{M}_{j-1}$ is subdivided into 4 cells of type $\mathbf{T}_j$ and 4 cells of type $\mathbf{M}_j$. This implies

$$\begin{aligned} t_j &= 6t_{j-1} + 4m_{j-1}, \\ m_j &= 2t_{j-1} + 4m_{j-1}, \end{aligned}$$

or

$$
\begin{pmatrix} t_j \\ m_j \end{pmatrix} = A \begin{pmatrix} t_{j-1} \\ m_{j-1} \end{pmatrix} = A^2 \begin{pmatrix} t_{j-2} \\ m_{j-2} \end{pmatrix} = \dots = A^j \begin{pmatrix} t_0 \\ m_0 \end{pmatrix},
$$

with $t_0 = 1$, $m_0 = 0$ and $A = \begin{pmatrix} 6 & 4 \\ 2 & 4 \end{pmatrix}$. After some calculations we obtain

$$A^j = \frac{1}{3} \begin{pmatrix} 2^j(2^{2j+1}+1) & 2^{j+1}(2^{2j}-1) \\ 2^j(2^{2j}-1) & 2^j(2^{2j}+2) \end{pmatrix}, \text{ whence}$$

$$t_j = \frac{2^j}{3}(2^{2j+1}+1), \qquad m_j = \frac{2^j}{3}(2^{2j}-1),$$

the total number of cells of $\mathbb{K}_0^+ \cup \mathbb{K}_0^-$ at step $j$ being $t_j + m_j = 8^j$, and $4 \cdot 8^j$ for the whole octahedron $\mathbb{K}$. Each of the cells of type $\mathbf{T}_j$ and $\mathbf{M}_j$ has the volume vol$(\mathbf{T}_0)/8^j$.
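The closed forms for $t_j$ and $m_j$ can be checked directly against the recurrence; the short sketch below (an illustration of ours) iterates the recurrence and compares:

```python
# Check of the cell counts: the recurrence
#   t_j = 6 t_{j-1} + 4 m_{j-1},  m_j = 2 t_{j-1} + 4 m_{j-1}
# against the closed forms derived above.
t, m = 1, 0                        # start: one cell of type T0, no M cells
for j in range(1, 11):
    t, m = 6 * t + 4 * m, 2 * t + 4 * m
    assert t == 2**j * (2**(2*j + 1) + 1) // 3
    assert m == 2**j * (2**(2*j) - 1) // 3
    assert t + m == 8**j           # 8^j cells per initial T0 cell
```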

### *4.2. Implementation Issues*

Every cell of the polyhedron is identified by the coordinates of its four vertices. We have two types of cells, which will be denoted by **T** and **M**.

A cell of type **T** has the same coordinates *x* and *y* for the first two vertices. The *z* coordinate of the first vertex is greater than the *z* coordinate of the second vertex and the mean value of these *z* coordinates gives the value of the *z* coordinate of the third and fourth vertices of **T**.

A cell of type **M** has two pairs of vertices at the same altitude (the same value of the *z* coordinate).

At every step of refinement, every cell of type **T** is divided into six cells of type **T** and two cells of type **M**. Suppose [**p**$_1$, **p**$_2$, **p**$_3$, **p**$_4$] is the array giving the coordinates of the four vertices of a **T** cell. The coordinates of the vertices of the next level cells are computed as follows.
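The original code listing is not preserved in this copy. The sketch below is our reconstruction of the subdivision of a **T** cell, based on the description of $\mathbf{T}_0 = (ABCD)$ in Section 4.1.1; the function names and the exact ordering of the children are assumptions:

```python
def midpoint(u, v):
    return tuple((a + b) / 2.0 for a, b in zip(u, v))

def tet_volume(cell):
    """Volume of a tetrahedron given as a 4-tuple of vertices."""
    a, b, c, d = cell
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    w = [d[i] - a[i] for i in range(3)]
    det = (u[0]*(v[1]*w[2] - v[2]*w[1])
           - u[1]*(v[0]*w[2] - v[2]*w[0])
           + u[2]*(v[0]*w[1] - v[1]*w[0]))
    return abs(det) / 6.0

def subdivide_T(p1, p2, p3, p4):
    """Split a T cell into 6 T cells and 2 M cells (our reconstruction).
    Convention as in the text: p1, p2 share x and y with p1 above p2,
    and p3, p4 lie at the mean altitude."""
    O = midpoint(p1, p2)                       # centre of the vertical edge
    M, N = midpoint(p1, p3), midpoint(p1, p4)
    Q, R = midpoint(p2, p3), midpoint(p2, p4)
    P = midpoint(p3, p4)
    T_cells = [(M, Q, O, P), (M, Q, p3, P), (N, R, O, P),
               (N, R, p4, P), (O, p2, Q, R), (p1, O, M, N)]
    M_cells = [(O, M, N, P), (O, P, Q, R)]
    return T_cells, M_cells
```

Applied to $\mathbf{T}_0$ with $a = 1$, i.e., to the vertices $C(0,0,1)$, $D(0,0,-1)$, $A(1,0,0)$, $B(0,1,0)$, each of the eight children has exactly one eighth of the parent volume, in agreement with Section 4.1.1.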


The cells 1–6 are of type **T** and the cells 7 and 8 are of type **M** (see Figure 4).

Every cell **M** is divided into 4 cells of type **T** and 4 cells of type **M**. Suppose [**p**$_1$, **p**$_2$, **p**$_3$, **p**$_4$] is the array giving the coordinates of the four vertices of the cell **M** and let $\mathbf{p}_k = (p_{kx}, p_{ky}, p_{kz})$, $k = 1, 2, 3, 4$. We rearrange these four vertices in ascending order with respect to the $z$ coordinate. Let [**q**$_1$, **q**$_2$, **q**$_3$, **q**$_4$] be the vector [**p**$_1$, **p**$_2$, **p**$_3$, **p**$_4$] sorted ascendingly with respect to the $z$ coordinate of the vertices, i.e., $q_{1z} \le q_{2z} \le q_{3z} \le q_{4z}$. Similarly, let [**r**$_1$, **r**$_2$, **r**$_3$, **r**$_4$] be the rearrangement of the vertices $\mathbf{p}_1, \ldots, \mathbf{p}_4$ such that $r_{1x} \le r_{2x} \le r_{3x} \le r_{4x}$. Let, also, [**s**$_1$, **s**$_2$, **s**$_3$, **s**$_4$] be the array of vertices rearranged in ascending order with respect to the $y$ coordinate. The coordinates of the vertices of the cells at the next level are computed as follows.
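Here too the original listing is lost; the following sketch is our reconstruction of the **M** cell subdivision (midpoint subdivision of a tetrahedron: four corner tetrahedra plus a central octahedron split into four **T** cells along its vertical diagonal). It assumes the vertices are already sorted by altitude:

```python
def midpoint(u, v):
    return tuple((a + b) / 2.0 for a, b in zip(u, v))

def tet_volume(cell):
    """Volume of a tetrahedron given as a 4-tuple of vertices."""
    a, b, c, d = cell
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    w = [d[i] - a[i] for i in range(3)]
    det = (u[0]*(v[1]*w[2] - v[2]*w[1])
           - u[1]*(v[0]*w[2] - v[2]*w[0])
           + u[2]*(v[0]*w[1] - v[1]*w[0]))
    return abs(det) / 6.0

def subdivide_M(q1, q2, q3, q4):
    """Split an M cell into 4 T cells and 4 M cells (our reconstruction).
    Input is assumed sorted by altitude: q1, q2 at the lower z level,
    q3, q4 at the upper one."""
    bot = midpoint(q1, q2)              # lower end of the vertical diagonal
    top = midpoint(q3, q4)              # upper end of the vertical diagonal
    e1, e2 = midpoint(q1, q3), midpoint(q1, q4)   # equatorial square of the
    e3, e4 = midpoint(q2, q4), midpoint(q2, q3)   # central octahedron
    # central octahedron, split along the diagonal (top, bot):
    T_cells = [(top, bot, e1, e2), (top, bot, e2, e3),
               (top, bot, e3, e4), (top, bot, e4, e1)]
    # corner tetrahedra, again of type M:
    M_cells = [(q1, bot, e1, e2), (q2, bot, e3, e4),
               (q3, top, e1, e4), (q4, top, e2, e3)]
    return T_cells, M_cells
```

Tried on the cell $OMNP$ of Section 4.1.1 (with $a = 1$), each of the eight children again has one eighth of the parent volume.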


To verify whether a point **p** = (*px*, *py*, *pz*) is inside a cell with vertices [**p**1, **p**2, **p**3, **p**4], we compute the following numbers:

$$d_1 = \operatorname{sgn}\begin{vmatrix} p_x & p_{2x} & p_{3x} & p_{4x} \\ p_y & p_{2y} & p_{3y} & p_{4y} \\ p_z & p_{2z} & p_{3z} & p_{4z} \\ 1 & 1 & 1 & 1 \end{vmatrix}, \quad \dots, \quad d_4 = \operatorname{sgn}\begin{vmatrix} p_{1x} & p_{2x} & p_{3x} & p_x \\ p_{1y} & p_{2y} & p_{3y} & p_y \\ p_{1z} & p_{2z} & p_{3z} & p_z \\ 1 & 1 & 1 & 1 \end{vmatrix}, \quad d_5 = \operatorname{sgn}\begin{vmatrix} p_{1x} & p_{2x} & p_{3x} & p_{4x} \\ p_{1y} & p_{2y} & p_{3y} & p_{4y} \\ p_{1z} & p_{2z} & p_{3z} & p_{4z} \\ 1 & 1 & 1 & 1 \end{vmatrix},$$

where, for $k = 1, \dots, 4$, the determinant of $d_k$ is obtained from the one of $d_5$ by replacing its $k$-th column with $(p_x, p_y, p_z, 1)^T$.

We calculate $v = |d_1| + |d_2| + |d_3| + |d_4| + |d_5|$. If $|d_1 + d_2 + d_3 + d_4 + d_5| = v$, then for $v = 5$ the point $\mathbf{p}$ is in the interior of the cell, for $v = 4$ the point $\mathbf{p}$ is on one of the faces of the cell, for $v = 3$ the point $\mathbf{p}$ is situated on one of the edges of the cell, and for $v = 2$ the point $\mathbf{p}$ is one of the vertices of the cell. If $|d_1 + d_2 + d_3 + d_4 + d_5| \ne v$, the point $\mathbf{p}$ is located outside the cell. Since the vertices $\mathbf{p}_k$ are distinct, we have $v \ge 2$.
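This sign test can be transcribed directly; the sketch below (ours, with hypothetical function names) reduces each homogeneous 4 × 4 determinant to a 3 × 3 determinant of edge vectors:

```python
def side(a, b, c, d):
    """Sign of the homogeneous 4x4 determinant of four points, computed as
    the 3x3 determinant of the edge vectors b-a, c-a, d-a."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    w = [d[i] - a[i] for i in range(3)]
    det = (u[0]*(v[1]*w[2] - v[2]*w[1])
           - u[1]*(v[0]*w[2] - v[2]*w[0])
           + u[2]*(v[0]*w[1] - v[1]*w[0]))
    return (det > 0) - (det < 0)

def locate(p, cell):
    """Classify p against the tetrahedral cell (p1, p2, p3, p4) following
    the sign test in the text."""
    p1, p2, p3, p4 = cell
    d = [side(p,  p2, p3, p4), side(p1, p,  p3, p4),
         side(p1, p2, p,  p4), side(p1, p2, p3, p ),
         side(p1, p2, p3, p4)]
    v = sum(abs(x) for x in d)
    if abs(sum(d)) != v:
        return 'outside'
    return {5: 'inside', 4: 'face', 3: 'edge', 2: 'vertex'}[v]
```

For example, against the standard tetrahedron with vertices $(0,0,0)$, $(1,0,0)$, $(0,1,0)$, $(0,0,1)$, the point $(0.1, 0.1, 0.1)$ is classified as interior, while $(1, 1, 1)$ falls outside.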

### *4.3. Uniform and Refinable Grids of the Ball* $\mathbb{B}^3$

If we transport the uniform and refinable grid on $\mathbb{K}$ onto the ball $\mathbb{B}^3$ using the volume preserving map $\mathcal{U}^{-1}$, we obtain a uniform and refinable grid of $\mathbb{B}^3$. Figures 8–10 show the images on $\mathbb{B}^3$ of different cells of $\mathbb{K}$.

Besides the multiresolution analysis and wavelet bases, which will be constructed in Section 5, another useful application is the construction of a uniform sampling of the rotation group $SO(3)$, by calculations similar to the ones in [3]. This will be the subject of a future paper.

**Figure 8.** Left: a cell of type **M** in red and a cell of type **T** in green from the octahedron. Middle and right: the corresponding cells of the ball.

**Figure 9.** Left: the image on the ball of the positive octant; Right: the same image rotated.

**Figure 10.** The image on the ball of the cells of the octahedron corresponding to Figure 7.

### **5. Multiresolution Analysis and Piecewise Constant Orthonormal Wavelet Bases of** $L^2(\mathbb{K})$ **and** $L^2(\mathbb{B}^3)$

Let $\mathcal{D} = \mathcal{D}^0 = \{D_1, D_2, D_3, D_4\}$ be the decomposition of the domain $\mathbb{K}$ considered in Section 4.1, consisting of four congruent domains (cells) of type $\mathbf{T}_0$. For $D \in \mathcal{D}$, let $\mathcal{R}_D$ denote the set of the eight refined domains constructed in Section 4.1.1. The set $\mathcal{D}^1 = \cup_{D \in \mathcal{D}^0} \mathcal{R}_D$ is a refinement of $\mathcal{D}^0$, consisting of $4 \cdot 8$ congruent cells. Continuing the refinement process as described in Section 4, we obtain a decomposition $\mathcal{D}^j$ of $\mathbb{K}$, for $j \in \mathbb{N}_0$, with $|\mathcal{D}^j| = 4 \cdot 8^j$.

For a fixed $j \in \mathbb{N}_0$ we assign to each domain $D_k^j \in \mathcal{D}^j$, $k \in \mathcal{N}_j := \mathbb{N}_{4 \cdot 8^j}$, the function $\varphi_{D_k^j} : \mathbb{K} \to \mathbb{R}$,

$$\varphi_{D_k^j} = (2\sqrt{2})^j \frac{2}{\sqrt{\text{vol}(\mathbb{K})}}\, \chi_{D_k^j},$$

where $\chi_{D_k^j}$ is the characteristic function of the domain $D_k^j$. Then we define the spaces of functions $V^j = \text{span}\{\varphi_{D_k^j},\ k \in \mathcal{N}_j\}$ of dimension $4 \cdot 8^j$, consisting of piecewise constant functions on the domains of $\mathcal{D}^j$. Moreover, we have $\|\varphi_{D_k^j}\|_{L^2(\mathbb{K})} = 1$, the norm being the usual 2-norm of the space $L^2(\mathbb{K})$. For $A^j \in \mathcal{D}^j = \{D_k^j,\ k \in \mathcal{N}_j\}$, let $A_k^{j+1}$, $k \in \mathbb{N}_8$, be the refined subdomains obtained from $A^j$. One has

$$\varphi_{A^j} = \frac{1}{2\sqrt{2}} \left( \varphi_{A_1^{j+1}} + \varphi_{A_2^{j+1}} + \dots + \varphi_{A_8^{j+1}} \right)$$

in $L^2(\mathbb{K})$, an equality which implies the inclusion $V^j \subseteq V^{j+1}$, for all $j \in \mathbb{N}_0$. With respect to the usual inner product $\langle \cdot, \cdot \rangle_{L^2(\mathbb{K})}$, the spaces $V^j$ are Hilbert spaces, with the corresponding usual 2-norm $\|\cdot\|_{L^2(\mathbb{K})}$. In conclusion, the sequence of subspaces $V^j$ has the following properties: $V^j \subseteq V^{j+1}$ for all $j \in \mathbb{N}_0$; the closure of $\cup_{j \in \mathbb{N}_0} V^j$ is $L^2(\mathbb{K})$; and, for each $j$, the set $\{\varphi_{D_k^j},\ k \in \mathcal{N}_j\}$ is an orthonormal basis of $V^j$;


i.e., the sequence $\{V^j,\ j \in \mathbb{N}_0\}$ constitutes a *multiresolution analysis* of the space $L^2(\mathbb{K})$. Let $W^j$ denote the orthogonal complement of the coarse space $V^j$ in the fine space $V^{j+1}$, so that

$$V^{j+1} = V^j \oplus W^j.$$

The dimension of $W^j$ is $\dim W^j = 28 \cdot 8^j$. The spaces $W^j$ are called *wavelet spaces* and their elements are called *wavelets*. In the following we construct an orthonormal basis of $W^j$. To each domain $A^j \in \mathcal{D}^j$, seven wavelets supported on $A^j$ will be associated in the following way:

$$
\psi_{A^j}^\ell = a_{\ell 1} \varphi_{A_1^{j+1}} + a_{\ell 2} \varphi_{A_2^{j+1}} + \dots + a_{\ell 8} \varphi_{A_8^{j+1}}, \text{ for } \ell \in \mathbb{N}_7,
$$

with $a_{\ell k} \in \mathbb{R}$, $\ell \in \mathbb{N}_7$, $k \in \mathbb{N}_8$. We have to find conditions on the coefficients $a_{\ell k}$ which ensure that the set $\{\psi_{A^j}^\ell,\ \ell \in \mathbb{N}_7,\ A^j \in \mathcal{D}^j\}$ is an orthonormal basis of $W^j$. First we must have

$$
\langle \psi_{A^j}^\ell, \varphi_{S^j} \rangle = 0, \text{ for } \ell \in \mathbb{N}_7 \text{ and } A^j, S^j \in \mathcal{D}^j.\tag{16}
$$

If $A^j \ne S^j$, the equality is immediate, since $\operatorname{supp} \psi_{A^j}^\ell \subseteq \operatorname{supp} \varphi_{A^j}$ and $\operatorname{supp} \varphi_{A^j} \cap \operatorname{supp} \varphi_{S^j}$ is either empty or a set of zero measure (a common face or edge). If $A^j = S^j$, evaluating the inner product (16) we obtain

$$\begin{aligned} \langle \psi_{A^j}^\ell, \varphi_{A^j} \rangle &= \langle a_{\ell 1} \varphi_{A_1^{j+1}} + a_{\ell 2} \varphi_{A_2^{j+1}} + \dots + a_{\ell 8} \varphi_{A_8^{j+1}}, \varphi_{A^j} \rangle \\ &= \frac{1}{2\sqrt{2}} (a_{\ell 1} + a_{\ell 2} + \dots + a_{\ell 8}). \end{aligned}$$

Then, each of the orthogonality conditions

$$
\langle \psi_{A^j}^\ell, \psi_{A^j}^{\ell'} \rangle = \delta_{\ell\ell'}, \text{ for all } A^j \in \mathcal{D}^j,
$$

is equivalent to $a_{\ell 1}a_{\ell' 1} + a_{\ell 2}a_{\ell' 2} + \dots + a_{\ell 8}a_{\ell' 8} = \delta_{\ell\ell'}$, $\ell, \ell' \in \mathbb{N}_7$. In fact, one requires the orthogonality of the $8 \times 8$ matrix $M = (a_{ij})_{i,j}$ with the entries of the first row equal to $1/(2\sqrt{2})$.

A particular case was considered in [12], where the authors divide a tetrahedron into eight small tetrahedra of the same volume using Bey's method, and for the construction of the orthonormal wavelet basis they take the Haar matrix (with each row normalized so that the matrix is orthogonal)

$$
\begin{pmatrix}
\frac{1}{2\sqrt{2}} & \frac{1}{2\sqrt{2}} & \frac{1}{2\sqrt{2}} & \frac{1}{2\sqrt{2}} & \frac{1}{2\sqrt{2}} & \frac{1}{2\sqrt{2}} & \frac{1}{2\sqrt{2}} & \frac{1}{2\sqrt{2}} \\
\frac{1}{2\sqrt{2}} & \frac{1}{2\sqrt{2}} & \frac{1}{2\sqrt{2}} & \frac{1}{2\sqrt{2}} & -\frac{1}{2\sqrt{2}} & -\frac{1}{2\sqrt{2}} & -\frac{1}{2\sqrt{2}} & -\frac{1}{2\sqrt{2}} \\
\frac{1}{2} & \frac{1}{2} & -\frac{1}{2} & -\frac{1}{2} & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & \frac{1}{2} & \frac{1}{2} & -\frac{1}{2} & -\frac{1}{2} \\
\frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}}
\end{pmatrix}.
$$

Alternatively, we can consider the symmetric orthogonal matrix

$$
\begin{pmatrix}
 c & c & c & c & c & c & c & c \\
 c & a & b & b & b & b & b & b \\
 c & b & a & b & b & b & b & b \\
 c & b & b & a & b & b & b & b \\
 c & b & b & b & a & b & b & b \\
 c & b & b & b & b & a & b & b \\
 c & b & b & b & b & b & a & b \\
 c & b & b & b & b & b & b & a
\end{pmatrix}
$$

with

$$a = \frac{\pm 24 - \sqrt{2}}{28}, \quad b = \frac{\mp 4 - \sqrt{2}}{28}, \quad c = \frac{1}{2\sqrt{2}},$$

or the tensor product *H* ⊗ *H* ⊗ *H* of the matrix

$$H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix},$$

which is the orthogonal matrix of the one-dimensional Haar transform,


or, more generally, we can generate *all* orthogonal 8 × 8 matrices with the entries of the first row equal to $1/(2\sqrt{2})$ using the method described in [13], where we start with the well-known Euler formula for the general form of a 3 × 3 rotation matrix. It is also possible to use different orthogonal matrices for the wavelets associated to the decomposition of the cells of type **T** and **M**.

Next, following the ideas in [14], we show how one can transport the above multiresolution analysis and wavelet bases onto the 3D ball $\mathbb{B}^3$, using the volume preserving map $\mathcal{U} : \mathbb{B}^3 \to \mathbb{K}$ constructed in Section 3.

Consider the ball $\mathbb{B}^3$ given by the parametric equations

$$\xi = \xi(X, Y, Z) = \mathcal{U}^{-1}(X, Y, Z) = (x(X, Y, Z),\ y(X, Y, Z),\ z(X, Y, Z)),$$

with $(X, Y, Z) \in \mathbb{K}$. Since $\mathcal{U}$ and its inverse preserve the volume, the volume element $d\omega(\xi)$ of $\mathbb{B}^3$ equals the volume element $dX\, dY\, dZ = d\mathbf{x}$ of $\mathbb{K}$ (and $\mathbb{R}^3$). Therefore, for all $\tilde{f}, \tilde{g} \in L^2(\mathbb{B}^3)$ we have

$$\begin{aligned} \langle \tilde{f}, \tilde{g} \rangle_{L^2(\mathbb{B}^3)} &= \int_{\mathbb{B}^3} \overline{\tilde{f}(\xi)}\, \tilde{g}(\xi)\, d\omega(\xi) \\ &= \int_{\mathcal{U}(\mathbb{B}^3)} \overline{\tilde{f}(\mathcal{U}^{-1}(X, Y, Z))}\, \tilde{g}(\mathcal{U}^{-1}(X, Y, Z))\, dX\, dY\, dZ \\ &= \langle \tilde{f} \circ \mathcal{U}^{-1}, \tilde{g} \circ \mathcal{U}^{-1} \rangle_{L^2(\mathbb{K})} \end{aligned}$$

and similarly, for all $f, g \in L^2(\mathbb{K})$ we have

$$
\langle f, g \rangle_{L^2(\mathbb{K})} = \langle f \circ \mathcal{U}, g \circ \mathcal{U} \rangle_{L^2(\mathbb{B}^3)}.\tag{17}
$$

If we consider the map $\Pi : L^2(\mathbb{B}^3) \to L^2(\mathbb{K})$ induced by $\mathcal{U}$, defined by

$$(\Pi \tilde{f})(X, Y, Z) = \tilde{f}\left(\mathcal{U}^{-1}(X, Y, Z)\right), \text{ for all } \tilde{f} \in L^2(\mathbb{B}^3),$$

and its inverse $\Pi^{-1} : L^2(\mathbb{K}) \to L^2(\mathbb{B}^3)$,

$$(\Pi^{-1}f)(\xi) = f(\mathcal{U}(\xi)), \text{ for all } f \in L^2(\mathbb{K}),$$

then Π is a unitary map, that is

$$
\langle \Pi \tilde{f}, \Pi \tilde{g} \rangle_{L^2(\mathbb{K})} = \langle \tilde{f}, \tilde{g} \rangle_{L^2(\mathbb{B}^3)}, \tag{18}
$$

$$
\langle \Pi^{-1} f, \Pi^{-1} g \rangle\_{L^2(\mathbb{B}^3)} = \langle f, g \rangle\_{L^2(\mathbb{K})}.\tag{19}
$$

Equality (17) suggests the construction of orthonormal scaling functions and wavelets defined on $\mathbb{B}^3$. The scaling functions $\widetilde{\varphi_{D_k^j}} : \mathbb{B}^3 \to \mathbb{R}$ will be

$$\widetilde{\varphi_{D_k^j}} = \varphi_{D_k^j} \circ \mathcal{U} = \begin{cases} (2\sqrt{2})^j \dfrac{2}{\sqrt{\text{vol}(\mathbb{K})}}, & \text{on } \mathcal{U}^{-1}(D_k^j), \\ 0, & \text{elsewhere,} \end{cases} \tag{20}$$

and the wavelets will be defined similarly,

$$
\widetilde{\psi^\ell\_{A^j}} = \psi^\ell\_{A^j} \circ \mathcal{U}.
$$

From equality (17) we can conclude that the spaces

$$\widetilde{V^j} := \text{span}\left\{ \widetilde{\varphi_{D_k^j}},\ k \in \mathcal{N}_j \right\}$$

constitute a multiresolution analysis of $L^2(\mathbb{B}^3)$, each of the sets $\{\widetilde{\varphi_{D_k^j}},\ k \in \mathcal{N}_j\}$ being an orthonormal basis for the space $\widetilde{V^j}$. Moreover, the set

$$\{\widetilde{\psi_{A^j}^\ell},\ \ell \in \mathbb{N}_7,\ A^j \in \mathcal{D}^j\}$$

is an orthonormal basis of $\widetilde{W^j}$.

### **6. Conclusions and Future Works**

The 3D uniform hierarchical grid constructed here can find applications in the texture analysis of crystals, by constructing a grid in the space of 3D rotations, using the technique in [3]. A comparison of these grids is the subject of a future paper.

Another interesting topic which we are going to approach in the future is to compare our wavelets with other 3D wavelets on the ball, listed in the introduction.

**Author Contributions:** Conceptualization, writing, visualization, A.H. and D.R. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **A Simple Method for Network Visualization**

### **Jintae Park, Sungha Yoon, Chaeyoung Lee and Junseok Kim \***

Department of Mathematics, Korea University, Seoul 02841, Korea; jintae2002@korea.ac.kr (J.P.); there122@korea.ac.kr (S.Y.); chae1228@korea.ac.kr (C.L.)

**\*** Correspondence: cfdkim@korea.ac.kr

Received: 16 May 2020; Accepted: 19 June 2020; Published: 22 June 2020

**Abstract:** In this article, we present a simple method for network visualization. The proposed method is based on distmesh [P.O. Persson and G. Strang, A simple mesh generator in MATLAB, SIAM Review 46 (2004) pp. 329–345], which is a simple unstructured triangular mesh generator for geometries represented by a signed distance function. We demonstrate a good performance of the proposed algorithm through several network visualization examples.

**Keywords:** Network; graph drawing; planar visualizations

### **1. Introduction**

Since the formation of society, the relationships between its components have been significant. These relationships have become more complex as society has progressed, and the components of society have also diversified. In sociology, a bundle of relationships is referred to as a network, a notion that became central to the field in the 1970s. In the modern information society, information about networks has been transformed into concrete data. With this vast amount of information, information visualization has been used to analyze networks and is gaining popularity. Techniques for information visualization have evolved, and they vary depending on the type of data [1–4]. Among these methods, visualization using graphs is one of the most helpful for understanding data and their relationships. The authors in [5] showed various graphs used in information visualization, including tree layouts, H-tree layouts, balloon layouts, radial layouts, hyperbolic trees, fisheye graphs, and animated radial layouts (see Figure 1 for an example of a network plot). Furthermore, toolkits for information visualization such as Prefuse, Protovis, and GUESS have been developed and are widely used [1,6–8]. In several studies, nodes represent subjects, such as people and businesses, whereas edges represent relationships, such as friendships and partnerships. The scope of a network is not limited to people and institutions: anything in an interactive relationship can be called a network, and networks can also be identified graphically from data. Network visualization is therefore used in a variety of fields; for example, analyses of social and personal networks [9], pharmacological networks [10], biological networks [11,12], financial networks [13], and street networks [14] have been actively conducted.

**Figure 1.** Example of a circular network. Reprinted from Salanti et al. [15] with permission from PLoS ONE.

Automatically drawing a network diagram requires algorithms. One such algorithm is the classical force-directed algorithm, which employs straight edges and treats edges as springs [16]. This algorithm turns the graph representation problem into a mathematical optimization problem: by reducing the energy generated by the spring system, we can find the equilibrium of the graph. The force-directed method has advantages such as ease of use and a good theoretical basis; as a result, many new methods of graph representation have been developed based on it. As a typical example, Kamada and Kawai introduced an ideal distance for drawing graphs [17]. Let $\{\mathbf{X}\_1, \mathbf{X}\_2, \ldots, \mathbf{X}\_n\}$ be $n$ vertices and assume that they are spring-connected. The total energy of the spring system is then expressed as follows:

$$\mathcal{E} = \sum\_{i=1}^{n-1} \sum\_{j=i+1}^{n} \frac{k\_{ij}}{2} \left( |\mathbf{X}\_i - \mathbf{X}\_j| - l\_{ij} \right)^2,$$

where $l\_{ij}$ is the desirable length of the spring between $\mathbf{X}\_i$ and $\mathbf{X}\_j$, $k\_{ij}$ is a parameter representing the strength of this spring, and $|\cdot|$ is the Euclidean norm. The desirable length represents the final length after executing the algorithm, and the strength of the spring refers to the tension of the spring keeping a certain distance. The best graph is determined by minimizing $\mathcal{E}$. Please refer to [17] for more details about the algorithm and the parameter definitions. Another approach for automatically drawing a network diagram is based on the algorithm presented by Hall [18]. The main idea of this algorithm is to find the positions of the nodes $\{\mathbf{X}\_1, \mathbf{X}\_2, \ldots, \mathbf{X}\_N\}$ which minimize

$$\mathcal{E} = \sum\_{i<j}^{N} a\_{ij} |\mathbf{X}\_i - \mathbf{X}\_j|^2, \tag{1}$$

where $a\_{ij} \ge 0$ is the connection weight between $\mathbf{X}\_i$ and $\mathbf{X}\_j$. This algorithm is suitable for structured data such as polyhedra [19]; however, it may not work well on actual data [20]. Rücker and Schwarzer [20] introduced a method of automatically drawing network diagrams using graph theory and studied network meta-analysis; the algorithm was applied to a variety of examples from the literature. Another representative method for drawing network diagrams is stress majorization [21]. The objective function is defined as follows:

$$\mathcal{E} = \sum\_{i \neq j}^{N} w\_{ij} (|\mathbf{X}\_i - \mathbf{X}\_j| - d\_{ij})^2,\tag{2}$$

where *wij* is the weight between **X***<sup>i</sup>* and **X***j*, and *dij* is an ideal distance. For additional details about the algorithm, please refer to [21]. This algorithm was applied to real networks related to diseases and implemented by using the function *netgraph* in the R package *netmeta* [20].
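The stress objective in Equation (2) is easy to state in code. The following Python sketch (our own illustration, not code from the cited works; the function and variable names are ours) evaluates it directly:

```python
import numpy as np

def stress(X, w, d):
    """Stress objective: sum over i != j of w_ij * (|X_i - X_j| - d_ij)^2.

    X : (N, 2) array of node positions; w, d : (N, N) symmetric arrays of
    weights and ideal distances (illustrative names, not from the paper).
    """
    N = len(X)
    total = 0.0
    for i in range(N):
        for j in range(N):
            if i != j:
                total += w[i, j] * (np.linalg.norm(X[i] - X[j]) - d[i, j]) ** 2
    return total

# Two nodes placed exactly at their ideal distance contribute zero stress.
X = np.array([[0.0, 0.0], [1.0, 0.0]])
w = np.array([[0.0, 1.0], [1.0, 0.0]])
d = np.array([[0.0, 1.0], [1.0, 0.0]])
print(stress(X, w, d))  # 0.0
```

Minimizing this quantity over the positions `X` is what the layout algorithms discussed here do, each by a different update rule.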

In this paper, we propose a simple algorithm for network visualization based on the distmesh algorithm [22]. The proposed method employs a distance $d\_{ij}$ given by a reciprocal power of the weight $w\_{ij}$, so the computation is essentially simple. Furthermore, the node positions are updated in proportion to the net force, which is based on the gradient, so one obtains a diagram adapted to the given data. A two-step stopping criterion is applied to further improve the visual quality of the network diagram. Compared with other gradient-based methods that optimize the total amount of movement, such as the force-directed method and the stress majorization method, the proposed algorithm is simple to implement.

The contents of this article are organized as follows. In Section 2, the proposed algorithm is described. In Section 3, specific examples of network visualization are presented. Conclusions are presented in Section 4.

### **2. Numerical Algorithm**

### *2.1. Distmesh Algorithm*

This section gives a brief introduction to the distmesh algorithm [22], which is employed to generate a triangular mesh in a domain with a level set representation. We define the level set representation in a two-dimensional domain, in which the interface is treated as the zero-level set. The function $\psi(x, y) = \sqrt{x^2 + y^2} - 1$ is adopted as a sample level set description. Figure 2 depicts the overall process of the distmesh algorithm in detail.


$$\mathbf{X}\_{i}^{n+1} = \mathbf{X}\_{i}^{n+1/2} - \chi(\mathbf{X}\_{i}^{n+1/2}) \frac{\psi(\mathbf{X}\_{i}^{n+1/2})}{|\nabla \psi(\mathbf{X}\_{i}^{n+1/2})|^{2}} \nabla \psi(\mathbf{X}\_{i}^{n+1/2}), \tag{3}$$

where $\chi(\mathbf{X}\_i^{n+1/2})$ equals 1 if $\mathbf{X}\_i^{n+1/2}$ is placed outside the boundary and 0 otherwise.

**Step 6.** Repeat **Step 3–5** until the level of the total movement of nodes is less than a given tolerance.

**Figure 2.** Schematic illustration of generating the distmesh. (**a**) Generated random nodes in the domain. (**b**) Signed distance function *ψ* in bounding box. The boundary of domain is regarded as the zero-level set. (**c**) Net force **F** in current triangulation. (**d**) Arrangement of nodes via Δ*t***F**. (**e**) Projection of the nodes located outside *ψ* > 0 into the boundary *ψ* ≈ 0 using Equation (3). (**f**) Final result of unstructured mesh by using the distmesh algorithm.

Using the distmesh algorithm, triangular meshes can be generated nonuniformly on domains of various shapes. Figure 3 shows an example of such a generated mesh.

**Figure 3.** Example of nonuniformly generated mesh: the airfoil.
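The projection step of Equation (3) can be illustrated with the sample level set $\psi(x, y) = \sqrt{x^2 + y^2} - 1$. The Python sketch below (our own illustration under our own naming, not the authors' code) moves a node that has drifted outside the domain back onto the zero-level set:

```python
import numpy as np

def psi(p):
    # Signed distance to the unit circle: negative inside, positive outside.
    return np.linalg.norm(p) - 1.0

def grad_psi(p):
    return p / np.linalg.norm(p)

def project(p):
    """Equation (3)-style step: project a node back onto the zero-level set,
    applied only when the node lies outside the domain (psi > 0)."""
    if psi(p) > 0:
        g = grad_psi(p)
        p = p - psi(p) * g / np.dot(g, g)
    return p

q = project(np.array([3.0, 4.0]))   # outside: distance 5 from the origin
print(q, psi(q))                    # lands on the unit circle, psi ~ 0
```

For a true signed distance function $|\nabla\psi| = 1$, so the step simply walks the node back along the gradient by exactly its distance to the boundary.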

### *2.2. Proposed Algorithm for Network Visualization*

The proposed algorithm for network visualization seeks to find {**X**1, **X**2, ... , **X***N*} that minimize the objective function

$$\mathcal{E} = \sum\_{i<j}^{N} w\_{ij} (|\mathbf{X}\_i - \mathbf{X}\_j| - d\_{ij})^2, \tag{4}$$

where $w\_{ij}$ and $d\_{ij}$ are the weighting value and the desired distance between nodes $\mathbf{X}\_i$ and $\mathbf{X}\_j$, respectively. The proposed algorithm is based on distmesh [22], which is a simple unstructured triangular mesh generator for geometries represented by a signed distance function. Let $\{\mathbf{X}\_1^n, \mathbf{X}\_2^n, \ldots, \mathbf{X}\_N^n\}$ be the given node positions at iteration $n$. For simplicity of exposition, we assume $0 \le w\_{ij} \le 1$. We then propose the following distance function:

$$d\_{ij} = d(w\_{ij}) = \frac{1}{w\_{ij}^p}, \quad \text{for } w\_{ij} > 0, \tag{5}$$

where *p* is a constant. Let minW be the minimum positive value of *wij*, i.e.,

$$\text{minW} = \min\_{\substack{1 \le i, j \le N \\ w\_{ij} > 0}} w\_{ij}.$$

As shown in Figure 4, by setting the minimum distance minD = 1 when $w\_{ij} = 1$ and the maximum distance maxD when $w\_{ij} = \text{minW}$, we obtain

$$p = -\frac{\log(\text{maxD})}{\log(\text{minW})}. \tag{6}$$

**Figure 4.** Illustration of distance function *dij* related to weighting value *wij*. minD = 1 and maxD are set to appear when *wij* = 1 and *wij* = minW, respectively.
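The relation between the distance function (5) and the exponent $p$ can be checked numerically; the formula $p = -\log(\text{maxD})/\log(\text{minW})$ below matches the MATLAB code in Appendix A, and the parameter values are those of the three-node example later in this section:

```python
import math

# Parameter values from the paper's three-node example.
minD, maxD, minW = 1.0, 2.0, 0.25
p = -math.log(maxD) / math.log(minW)   # exponent in d(w) = 1 / w**p
d = lambda w: 1.0 / w ** p             # Equation (5)

print(p)          # 0.5
print(d(1.0))     # 1.0 -> minD at the largest weight
print(d(minW))    # 2.0 -> maxD at the smallest positive weight
```

Strongly connected pairs are thus pulled close (distance minD), while the weakest connections are stretched to maxD.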

Figure 5a,b show the repulsive and attractive forces at nodes $\mathbf{X}\_i^n$ and $\mathbf{X}\_j^n$ when $|\mathbf{X}\_i^n - \mathbf{X}\_j^n| < d\_{ij}$ and $|\mathbf{X}\_i^n - \mathbf{X}\_j^n| > d\_{ij}$, respectively.

**Figure 5.** Two possible forces at nodes $\mathbf{X}\_i^n$ and $\mathbf{X}\_j^n$: (**a**) repulsive force and (**b**) attractive force.

We loop over all the line segments connecting two nodes and compute the net force vector $\mathbf{F}\_i^n$ at each node point $\mathbf{X}\_i^n$:

$$\mathbf{F}\_{i}^{n} = \sum\_{\substack{j=1,\ j \neq i \\ w\_{ij} > 0}}^{N} (|\mathbf{X}\_{i}^{n} - \mathbf{X}\_{j}^{n}| - d\_{ij}) \frac{\mathbf{X}\_{j}^{n} - \mathbf{X}\_{i}^{n}}{|\mathbf{X}\_{j}^{n} - \mathbf{X}\_{i}^{n}|}.$$

Then, we update the position of the node points as

$$\mathbf{X}\_{i}^{n+1} = \mathbf{X}\_{i}^{n} + \Delta t \mathbf{F}\_{i}^{n}, \text{ for } 1 \le i \le N,\tag{7}$$

where Δ*t* is an artificial time step. Upon updating the position of the node points, the network diagram is drawn automatically. The iterative algorithm has reached an equilibrium state if

$$\sqrt{\frac{1}{N} \sum\_{i=1}^{N} |\mathbf{F}\_i^k|^2} < tol\_1 \tag{8}$$

after *k* iterations.
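The first step of the algorithm (net force accumulation, update (7), and stopping criterion (8)) can be sketched in Python as follows. This is a translation of the idea for illustration, not the authors' MATLAB code (which appears in Appendix A); the function name and loop cap are ours:

```python
import numpy as np

def first_step(xy, W, d, dt=0.01, tol1=0.01, max_iter=100000):
    """Relax node positions: accumulate pairwise spring-like forces,
    update xy by dt * F (Equation (7)), stop when the RMS force drops
    below tol1 (Equation (8))."""
    N = len(xy)
    for _ in range(max_iter):
        F = np.zeros_like(xy)
        for i in range(N):
            for j in range(i + 1, N):
                if W[i, j] > 0:
                    vt = xy[j] - xy[i]
                    f = (np.linalg.norm(vt) - d[i, j]) * vt / np.linalg.norm(vt)
                    F[i] += f          # attracted/repelled toward d_ij
                    F[j] -= f          # equal and opposite force
        xy = xy + dt * F
        if np.sqrt(np.mean(np.sum(F ** 2, axis=1))) < tol1:
            break
    return xy

# Two connected nodes, started at distance 1, relax toward d_12 = 2.
W = np.array([[0.0, 1.0], [1.0, 0.0]])
d = np.array([[0.0, 2.0], [2.0, 0.0]])
xy = first_step(np.array([[0.0, 0.0], [1.0, 0.0]]), W, d, dt=0.3)
print(np.linalg.norm(xy[0] - xy[1]))   # close to 2 (within the tolerance)
```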

As a concrete example, we consider three points **X**1, **X**2, and **X**3. Assume that the weighting matrix between **X***<sup>i</sup>* and **X***<sup>j</sup>* is given as

$$\mathbf{W} = \begin{pmatrix} 0 & 2 & 4 \\ 2 & 0 & 1 \\ 4 & 1 & 0 \end{pmatrix}.$$

We scale the matrix **W** by dividing the elements by the maximum value among elements and redefine **W** as

$$\mathbf{W} = \begin{pmatrix} 0 & 0.5 & 1 \\ 0.5 & 0 & 0.25 \\ 1 & 0.25 & 0 \end{pmatrix}.$$

Let $\mathbf{X}\_1^0 = \left(\frac{3}{4}, \frac{3\sqrt{3}}{4}\right)$, $\mathbf{X}\_2^0 = (0, 0)$, and $\mathbf{X}\_3^0 = \left(\frac{3}{2}, 0\right)$, where the superscript 0 denotes the starting index. Here, we use $\Delta t = 0.3$, minD = 1, maxD = 2, minW = 0.25, and $tol\_1 = 0.01$. Consequently, we get $p = 0.5$ and

$$\begin{pmatrix} d\_{12} & d\_{13} \\ d\_{21} & d\_{23} \\ d\_{31} & d\_{32} \end{pmatrix} = \begin{pmatrix} \sqrt{2} & 1 \\ \sqrt{2} & 2 \\ 1 & 2 \end{pmatrix}.$$
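These distances can be reproduced directly from the scaled weight matrix; a short Python check (our own illustration, using the $p$ formula from the appendix code):

```python
import numpy as np

# Raw weight matrix of the three-node example from the text.
W = np.array([[0, 2, 4], [2, 0, 1], [4, 1, 0]], dtype=float)
W /= W.max()                        # scale so the largest weight is 1
minW = W[W > 0].min()               # 0.25
maxD = 2.0
p = -np.log(maxD) / np.log(minW)    # 0.5

d = np.zeros_like(W)
d[W > 0] = 1.0 / W[W > 0] ** p      # Equation (5)
print(d[0, 1], d[0, 2], d[1, 2])    # sqrt(2), 1.0, 2.0
```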

Figure 6a indicates the positions of the three points with red markers, and the non-zero elements of **W** are represented by gray lines; the value of each element is expressed by the thickness of the line. The red arrows are the net force vectors $\mathbf{F}\_1^0$, $\mathbf{F}\_2^0$, and $\mathbf{F}\_3^0$. Using these net force vectors, we update the positions as

$$\mathbf{X}\_1^1 = \mathbf{X}\_1^0 + \Delta t \mathbf{F}\_1^0, \quad \mathbf{X}\_2^1 = \mathbf{X}\_2^0 + \Delta t \mathbf{F}\_2^0, \quad \mathbf{X}\_3^1 = \mathbf{X}\_3^0 + \Delta t \mathbf{F}\_3^0,$$

which are shown in Figure 6b. Figure 6c–e show the network diagrams after 2, 3, and 6 iterations, respectively. The equilibrium state of the network diagram is obtained after 10 iterations as shown in Figure 6f. Even though the nodes are initially arranged in an equilateral triangle with sides of length 1.5, the network diagram in equilibrium is drawn according to the given weights.

**Figure 6.** Schematic of the proposed algorithm. (**a**) initial condition, (**b**) after 1 iteration, (**c**) after 2 iterations, (**d**) after 3 iterations, (**e**) after 6 iterations, and (**f**) equilibrium state after 10 iterations.

### **3. Numerical Results**

In this section, we present the generation of a network diagram with more data to confirm the efficiency and robustness of the proposed method. Specifically, we select 19 nodes and a 19 × 19 matrix **W**, which are given in Appendix A. The matrix is created based on the dialogue between the characters in William Shakespeare's play 'The Merchant of Venice'. Each element $w\_{ij}$ of the matrix is the cumulative number of conversations between person $i$ and person $j$. The parameters used are $\Delta t = 0.01$, minD = 1, maxD = 2, and $tol\_1 = 0.01$. The value of $p$ is then approximately 0.1879. Figure 7 shows the process of network visualization by the proposed method. The equilibrium state of the network diagram appears after 1985 iterations.

After 1985 iterations, each node is appropriately located according to the weights between the nodes in the network. This means that even if the nodes are initially randomly arranged, the network diagram is drawn well by our distance function. While the network plot is drawn, we can observe that the objective function $\mathcal{E}$ decreases. As shown in Figure 8, $\mathcal{E}$ decreases and converges as time goes by. This shows that our proposed method has a mathematical basis for drawing the network diagram.

**Figure 7.** Snapshots of the network visualization process for 'The Merchant of Venice': (**a**) initial condition, (**b**) after 20 iterations, (**c**) after 40 iterations, and (**d**) equilibrium state after 1985 iterations.

**Figure 8.** $\mathcal{E}$ decreases and converges while the nodes move.

However, the equilibrium-state network diagram is not visually good. This is due to the nodes (9, 13, 15, 16, 17, 18) that have only one connection. Therefore, we further update the locations of the nodes that have only one connection while fixing the other nodes. Let $\Omega\_s$ be the index set of the nodes having only one connection. We compute the net force vector $\mathbf{F}\_i^n$ at each node point $i \in \Omega\_s$ as follows:

$$\mathbf{F}\_i^n = \sum\_{\substack{j=1,\ j \neq i \\ w\_{ij} > 0}}^N \frac{\mathbf{X}\_i^n - \mathbf{X}\_j^n}{|\mathbf{X}\_i^n - \mathbf{X}\_j^n|}.$$

Then, we temporarily update the node points as

$$\mathbf{X}\_{i}^{\*} = \mathbf{X}\_{i}^{n} + \Delta t \mathbf{F}\_{i}^{n}, \text{ for } i \in \Omega\_{s}, \tag{9}$$

where Δ*t* = 10 is used. Finally, we set

$$\mathbf{X}\_{i}^{n+1} = \mathbf{X}\_{j}^{n} + d\_{ij} \frac{\mathbf{X}\_{i}^{\*} - \mathbf{X}\_{j}^{n}}{|\mathbf{X}\_{i}^{\*} - \mathbf{X}\_{j}^{n}|} \text{ for } i \in \Omega\_{s}, \text{ and } \mathbf{X}\_{i}^{n+1} = \mathbf{X}\_{i}^{n} \text{ for } i \notin \Omega\_{s}, \tag{10}$$

where $\mathbf{X}\_j^n$ is the unique node connected to $\mathbf{X}\_i^{\*}$. We define that the equilibrium state of the second step has been attained if

$$\sqrt{\frac{1}{|\Omega\_{s}|} \sum\_{i \in \Omega\_{s}} |\mathbf{X}\_{i}^{k+1} - \mathbf{X}\_{i}^{k}|^{2}} < tol\_{2} \tag{11}$$

after $k$ iterations, where $|\Omega\_s|$ denotes the number of elements of $\Omega\_s$. Here, $tol\_2 = 0.002$ is used. The second step therefore effectively rotates each node that has only one connection around its connecting node so that the overall distribution of the nodes is spread out.
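One update of the second step (Equations (9) and (10)) for a single-connection node can be sketched as follows. The Python below is our own illustration (Appendix A contains the authors' MATLAB); like the appendix code, it normalizes the net force before applying the time step:

```python
import numpy as np

def second_step_update(xy, leaf, W, d, dt=10.0):
    """One second-step update for a single-connection node `leaf`:
    push it away from all other nodes (Equation (9)), then place it at
    distance d_ij from its unique neighbor (Equation (10))."""
    v = np.zeros(2)
    for j in range(len(xy)):
        vt = xy[leaf] - xy[j]
        if np.linalg.norm(vt) > 0:
            v += vt / np.linalg.norm(vt)
    x_star = xy[leaf] + dt * v / np.linalg.norm(v)         # Equation (9)
    j = int(np.flatnonzero(W[leaf] > 0)[0])                # unique neighbor
    u = x_star - xy[j]
    xy[leaf] = xy[j] + d[leaf, j] * u / np.linalg.norm(u)  # Equation (10)
    return xy

W = np.array([[0.0, 1.0], [1.0, 0.0]])
d = np.array([[0.0, 1.5], [1.5, 0.0]])
xy = second_step_update(np.array([[1.0, 0.0], [0.0, 0.0]]), 0, W, d)
print(np.linalg.norm(xy[0] - xy[1]))  # 1.5: the leaf sits exactly at d_01
```

Because of the snap-back in Equation (10), the leaf node only ever moves on the circle of radius $d\_{ij}$ around its neighbor, which is exactly the rotation described above.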

Figure 9 illustrates the process of updating the positions of the nodes (red markers) that have only one connection. Figure 9a–d show the network in the equilibrium state of the first step, after 1 iteration of the second step, after 2 iterations of the second step, and in the equilibrium state of the second step after 75 iterations, respectively.

**Figure 9.** Updating the position of nodes with only one connection: (**a**) Equilibrium state of the first step, (**b**) after 1 iteration, (**c**) after 2 iterations, and (**d**) equilibrium state of the second step after 75 iterations.

Next, we consider another example, 'Romeo and Juliet', a play written by William Shakespeare. The matrix **W** is defined by counting the number of conversations between 27 characters. The parameters used are minD = 1, maxD = 3, and $tol\_1 = tol\_2 = 0.002$, and the value of $p$ is then approximately 0.2493. In particular, the time steps $\Delta t = 0.2$ and $\Delta t = 10$ are used in the first and second steps, respectively. Figure 10a–c illustrate the character network at the initial condition, after the first step, and after the second step, respectively. From the results, we can identify the main characters and the relatively minor ones.

**Figure 10.** Snapshots of network visualization for 'Romeo and Juliet': (**a**) the initial condition, (**b**) after 230 iterations of the first step, and (**c**) after 20 iterations of the second step.

#### **4. Conclusions**

In this paper, we have proposed a simple method based on distmesh for network visualization. We have demonstrated the good performance of the proposed algorithm through network visualization examples. We can provide the MATLAB source code of this method for the interested readers. In future work, we plan to investigate effective network diagrams for character networks from novels and movies. We may further speed up the computation of the proposed method by using a Gauss–Newton–secant type method [23].

**Author Contributions:** All authors contributed equally to this work; J.P., S.Y., C.L. and J.K. critically reviewed the manuscript. All authors have read and agree to the published version of the manuscript.

**Funding:** The corresponding author (J. Kim) expresses thanks for the support from the BK21 PLUS program.

**Acknowledgments:** The authors thank the editor and the reviewers for their constructive and helpful comments on the revision of this article.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **Appendix A**

In this appendix, we provide the MATLAB source codes for network visualization. The following code is for 'The Merchant of Venice'. The code for 'Romeo and Juliet' is available on the following website:

http://elie.korea.ac.kr/~cfdkim/codes/

Listing A1: Matlab Code for the network visualization.

```
% The first step
clear;
W=[ 0 21 24 16  0  0  0  2  0  7  4  5  0  0  0  0  0  0  1
   21  0 27 32  0  0  0  0  0  2  0 11  2  3  2  0  0  0  0
   24 27  0 40  0  0  7  0 12  6  0  2  0  5  0  0  0  0  2
   16 32 40  0 36  3  0  7  0  0  0 10  0  0  0  3 13  2  0
    0  0  0 36  0  2  0  0  0  0  0  4  0  0  0  0  0  0  0
    0  0  0  3  2  0 15  0  0  0  0  4  0  0  0  0  0  0  0
    0  0  7  0  0 15  0  0  0  0  0  0  0  3  0  0  0  0  0
    2  0  0  7  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
    0  0 12  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
    7  2  6  0  0  0  0  0  0  0  9  0  0  0  0  0  0  0  0
    4  0  0  0  0  0  0  0  0  9  0  0  0  0  0  0  0  0  0
    5 11  2 10  4  4  0  0  0  0  0  0  0  0  0  0  0  0  0
    0  2  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
    0  3  5  0  0  0  3  0  0  0  0  0  0  0  0  0  0  0  0
    0  2  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
    0  0  0  3  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
    0  0  0 13  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
    0  0  0  2  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
    1  0  2  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0];
N=size(W,1); rand('seed',3773); t=rand(N,1);
xy=[cos(2*pi*t), sin(2*pi*t)]; W=W/max(max(W));
minW=min(min(W(W>0))); minD=1; maxD=2; p=-log(maxD)/log(minW);
for i=1:N
    for j=1:N
        if W(i,j)>0
            d(i,j)=1/W(i,j)^p;
        end
    end
end
dt=0.01; tol=0.01; n=0; error=2*tol;
while error >= tol
    n=n+1; F=zeros(N,2);
    for i=1:N
        for j=i+1:N
            if W(i,j)>0
                vt=xy(j,:)-xy(i,:);
                F(i,:)=F(i,:)+(norm(vt)-d(i,j))*vt/norm(vt);
                F(j,:)=F(j,:)-(norm(vt)-d(i,j))*vt/norm(vt);
            end
        end
    end
    xy=xy+dt*F; error=norm(F)/sqrt(N);
    if n==1 || mod(n,10)==0 || error<tol
        figure(1); DrawNetwork(xy,W); pause(0.1)
    end
end
% The second step
z=find(sum(W>0)==1); M=length(z);
for k=1:M
    s(k)=find(W(z(k),:)>0);
end
xy0=xy; n=0; dt=10.0; tol=0.002; error=2*tol;
while error >= tol
    n=n+1; F=zeros(N,2);
    for k=1:M
        v=[0 0];
        for j=1:N
            vt=xy(z(k),:)-xy(j,:);
            if norm(vt)>0
                v=v+vt/norm(vt);
            end
        end
        F(z(k),:)=v/norm(v);
    end
    xy=xy+dt*F;
    error=0;
    for k=1:M
        v=xy(z(k),:)-xy(s(k),:);
        xy(z(k),:)=xy(s(k),:)+d(z(k),s(k))*v/norm(v);
        error=error+norm(xy(z(k),:)-xy0(z(k),:))^2;
    end
    error=sqrt(error/M); xy0=xy;
    figure(2); DrawNetwork(xy,W); pause(0.1)
end
```
Listing A2: Function code for DrawNetwork.

```
function DrawNetwork(xy,W)
N=length(xy); clf; hold on
for i=1:N
    for j=i+1:N
        if W(i,j)>0
            plot(xy([i,j],1),xy([i,j],2),'b','LineWidth',15*W(i,j)^2+1);
        end
    end
end
scatter(xy(:,1),xy(:,2),400,'g','filled');
for i=1:N
    text(xy(i,1)-0.04,xy(i,2),num2str(i));
end
axis off; axis image;
end
```
### **References**



### *Article* **Numerical Solution of the Cauchy-Type Singular Integral Equation with a Highly Oscillatory Kernel Function**

#### **SAIRA <sup>1,†</sup>, Shuhuang Xiang <sup>1,\*,†</sup> and Guidong Liu <sup>2</sup>**


Received: 16 August 2019; Accepted: 18 September 2019; Published: 20 September 2019

**Abstract:** This paper aims to present a Clenshaw–Curtis–Filon quadrature to approximate the solution of various cases of Cauchy-type singular integral equations (CSIEs) of the second kind with a highly oscillatory kernel function. We show that, for the zero-oscillation case (*k* = 0), the proposed method gives more accurate results than the schemes introduced in Dezhbord et al. (2016) and Eshkuvatov et al. (2009) for small values of *N*. Finally, this paper illustrates some error analyses and numerical results for CSIEs.

**Keywords:** Clenshaw–Curtis–Filon; high oscillation; singular integral equations; boundary singularities

### **1. Introduction**

Integral equations have broad roots in branches of science and engineering [1–6]. Cauchy-type singular integral equations (CSIEs) of the second kind occur in electromagnetic scattering and quantum mechanics [7] and are defined as:

$$a\,u(x) + \frac{b}{\pi} \int\_{-1}^{1} \frac{u(y)K(x, y)}{y - x} dy = f(x), \quad x \in (-1, 1). \tag{1}$$

A singular integral equation with a Cauchy principal value is a generalized form of the airfoil equation [8]. Here $a$ and $b$ are constants such that $a^2 + b^2 = 1$, $b \neq 0$, and $K(x, y) = e^{ik(y-x)}$ is the highly oscillatory kernel function. The function $f(x)$ is Hölder continuous, whereas $u(x)$ is an unknown function. The solution of Equation (1) contains the boundary singularities $w(x) = (1 + x)^{\alpha}(1 - x)^{\beta}$, i.e., $u(x) = w(x)g(x)$, where $g(x)$ is a smooth function [9,10]. Equation (1) then transforms into:

$$a\,w(x)g(x) + \frac{b}{\pi} \int\_{-1}^{1} \frac{w(y)g(y)e^{ik(y-x)}}{y - x} dy = f(x), \quad x \in (-1, 1), \tag{2}$$

where *α*, *β* ∈ (−1, 1) depend on *a* and *b*, such that:

$$\alpha = \frac{1}{2\pi i} \log\left(\frac{a - ib}{a + ib}\right) - N, \quad \beta = \frac{-1}{2\pi i} \log\left(\frac{a - ib}{a + ib}\right) - M, \tag{3}$$

$$\kappa = -(\alpha + \beta) = M + N.$$

Here $M$ and $N$ are integers in $\{-1, 0, 1\}$, whereas $\kappa$ is called the index of the integral equation, analogous to a class of functions wherein the solution is to be sought. It is pertinent to mention that, to produce integrable singularities in the solution, the index $\kappa$ is restricted to the three values $\{-1, 0, 1\}$; the present paper considers only the two cases with $\kappa \le 0$. The value of the index $\kappa$ depends on the values of $M$ and $N$ [11–13]. A great number of real-life problems that can be reduced to Cauchy singular integral equations, e.g., for $\kappa = -1$ the so-called notched half-plane problem and the problem of a crack parallel to the free boundary of an isotropic semi-infinite plane, are addressed in [14–17]. Writing Equation (2) in operator form, we get [18]:

$$Hg = f, \tag{4}$$

where:

$$Hg = a\,w(x)g(x) + \frac{b}{\pi} \int\_{-1}^{1} \frac{w(y)g(y)e^{ik(y-x)}}{y - x} dy.$$

Let us define another operator:

$$H'f = a\,w^{\*}(x)f(x) - \frac{b}{\pi} \int\_{-1}^{1} \frac{w^{\*}(y)f(y)e^{ik(y-x)}}{y - x} dy. \tag{5}$$

Furthermore:

$$\begin{aligned} HH' &= I & &\text{if } \kappa > 0, \\ HH' &= H'H = I & &\text{if } \kappa = 0, \\ H'H &= I & &\text{if } \kappa < 0, \end{aligned} \tag{6}$$

where *w*<sup>∗</sup> (*x*) = (1 + *x*) −*α* (1 − *x*) −*β* .
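The endpoint exponents of Equation (3) can be verified numerically. The Python sketch below (our own illustration, not the paper's code) computes $\alpha$ and $\beta$ for a sample pair $(a, b)$ with $a^2 + b^2 = 1$ and checks that $\kappa = -(\alpha + \beta) = M + N$:

```python
import cmath
import math

def alpha_beta(a, b, N=0, M=0):
    """Endpoint exponents from Equation (3), with a**2 + b**2 = 1, b != 0.
    Writing a + ib = exp(i*theta), the log ratio equals -2*i*theta, so
    alpha = -theta/pi - N and beta = theta/pi - M."""
    g = cmath.log((a - 1j * b) / (a + 1j * b))
    alpha = (g / (2j * math.pi)).real - N
    beta = (-g / (2j * math.pi)).real - M
    return alpha, beta

a, b = math.cos(math.pi / 4), math.sin(math.pi / 4)
alpha, beta = alpha_beta(a, b)    # sample values; N = M = 0 assumed here
kappa = -(alpha + beta)
print(alpha, beta, kappa)         # -0.25, 0.25, kappa = 0 = M + N
```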

It is worth mentioning that the solution of the CSIE exists but unfortunately is not unique, as the CSIE has three solution cases for different values of $\kappa$. The following theorem concerns the existence of the solution of the CSIE for the case $\kappa = 0$.

**Theorem 1.** *[13,15] (Existence of CSIEs) Let the singular integral Equation (2) be equivalent to a Fredholm integral equation, which implies that every solution of a Fredholm integral equation is the solution of a singular integral equation and vice versa.*

**Proof.** Based on Equations (4)–(6), the SIE (2) can be transformed into:

$$g = H'f.$$

Furthermore, it can be written as a Fredholm integral equation:

$$g(y) + \int\_{-1}^{1} N(y, \tau) g(\tau) d\tau = F(y), \tag{7}$$

where:

$$F(y) = \frac{b}{\pi} w(y) \int\_{-1}^{1} \frac{w^{\*}(x)f(x)}{y - x} dx,$$

and:

$$N(y, \tau) = a\,K(y, \tau)w^{\*}(y) - \frac{b}{\pi} w(y) \int\_{-1}^{1} \frac{w^{\*}(x)K(x, \tau)}{y - x} dx.$$

Thus the claimed theorem is proven.

Moreover, for Equation (1) we have three cases for *κ*:

$$\kappa = \begin{cases} 1, & -1 < \alpha, \beta < 0, & \alpha \neq \beta, \\ -1, & 0 < \alpha, \beta < 1, & \alpha \neq \beta, \\ 0, & \alpha = -\beta, & |\beta| \neq \frac{1}{2}. \end{cases} \tag{8}$$

Similarly, solution cases of the CSIE of the second type depending on values of *κ* are:

• 1: The solution *u*(*x*) for *κ* = 1 is unbounded at both end points *x* = ±1:

$$u(x) = af(x) - \frac{bw(x)}{\pi} e^{-ikx} \oint\_{-1}^{1} \frac{w^{\*}(y)f(y)e^{iky}}{y - x} dy + C\,w(x), \tag{9}$$

where *C* is an arbitrary constant such that:

$$\int\_{-1}^{1} u(y)e^{i\mathbf{k}y} dy = \text{C.}\tag{10}$$

Equation (2) has infinitely many solutions, but the solution is unique under the above condition.

• 2: For *κ* = 0, the solution *u*(*x*) is bounded at one end point *x* = ±1 and unbounded at the other *x* = ∓1:

$$u(x) = af(x) - \frac{bw(x)}{\pi} e^{-ikx} \oint\_{-1}^{1} \frac{w^{\*}(y)f(y)e^{iky}}{y - x} dy, \tag{11}$$

Equation (2) has a unique solution.

• 3: The solution *u*(*x*) is bounded at both end points *x* = ±1 for *κ* = −1:

$$u(x) = af(x) - \frac{bw(x)}{\pi} e^{-ikx} \int\_{-1}^{1} \frac{w^{\*}(y)f(y)e^{iky}}{y - x} dy. \tag{12}$$

Equation (2) has no solution unless it satisfies the following condition:

$$\int\_{-1}^{1} \frac{f(y)e^{iky}}{w(y)} dy = 0. \tag{13}$$

For many decades, researchers have struggled to find efficient methods to obtain these solutions. The Galerkin method, the polynomial collocation method, the Clenshaw–Curtis–Filon method, and the steepest descent method are some of the eminent methods, among many others, for the solution of SIEs [19–24]. Moreover, for a linear function $f(x)$, Chakrabarti and Berge [25] gave an approximate method based on polynomial approximation at Chebyshev points. Eshkuvatov et al. [10] introduced a method employing Chebyshev polynomials of all four kinds for the four different solution cases of the CSIE. A reproducing kernel Hilbert space (RKHS) method has been proposed by Dezhbord et al. [26], in which the solution $u(x)$ is represented as a series in reproducing kernel spaces.

This research work introduces the Clenshaw–Curtis–Filon quadrature to approximate the solution for various cases of a Cauchy singular integral equation of the second kind, Equation (1), at equally spaced points *xi*. So the integral equation takes the form:

$$u\_N(x\_i) = af(x\_i) - \frac{bw(x\_i)}{\pi} e^{-ikx\_i} \int\_{-1}^{1} \frac{w^{\*}(y)f\_N(y)e^{iky}}{y - x\_i} dy, \tag{14}$$

depending on $\kappa$. Furthermore, the results of a numerical example are compared with [10,26] for $k = 0$; the comparison reveals that the addressed method gives a more accurate approximation than these methods, as illustrated in Section 4. The rest of the paper is organised as follows. Section 2 describes the numerical evaluation of the Cauchy integral in the CSIE and the approximation of the solution at equally spaced points $x\_i$. Section 3 presents some error analysis for the CSIE. Section 4 concludes the paper with numerical results.

### **2. Description of the Method**

The presented Clenshaw–Curtis–Filon quadrature for the integral term $I(\alpha, \beta, k, x) = \oint\_{-1}^{1} \frac{w(y)f(y)e^{iky}}{y - x} dy$ consists of replacing the function $f(y)$ by its interpolation polynomial $P\_N(y)$ at the Clenshaw–Curtis point set $y\_j = \cos\frac{j\pi}{N}$, $j = 0, 1, \ldots, N$. Rewriting the interpolation in terms of the Chebyshev series:

$$f(y) \approx P\_N(y) = \sum\_{n=0}^{N}{}'' c\_n T\_n(y). \tag{15}$$

Here $T\_n(y)$ is the Chebyshev polynomial of the first kind of degree $n$. The double prime denotes a summation whose first and last terms are halved. The FFT is used for efficient calculation of the coefficients $c\_n$ [27,28], defined as:

$$c\_n = \frac{2}{N} \sum\_{j=0}^{N} \, ''f(y\_j) T\_n(y\_j).$$
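The coefficients $c\_n$ can be computed directly from this formula (in practice the FFT is used, as noted above). The Python sketch below (our own illustration, using a plain sum rather than the FFT) recovers the coefficients of a low-degree polynomial:

```python
import numpy as np

def cheb_coeffs(f, N):
    """Coefficients c_n of the Chebyshev interpolant at Clenshaw-Curtis
    points y_j = cos(j*pi/N): c_n = (2/N) * sum'' f(y_j) T_n(y_j), where
    the double prime halves the first and last terms (weights on j = 0, N).
    Uses T_n(cos t) = cos(n t) to evaluate the Chebyshev polynomials."""
    j = np.arange(N + 1)
    w = np.ones(N + 1)
    w[0] = w[-1] = 0.5
    y = np.cos(j * np.pi / N)
    return np.array([(2.0 / N) * np.sum(w * f(y) * np.cos(n * j * np.pi / N))
                     for n in range(N + 1)])

# f(y) = 2*T_2(y) + T_1(y): the coefficients should be c = [0, 1, 2, 0, ...]
f = lambda y: 2 * (2 * y ** 2 - 1) + y
c = cheb_coeffs(f, 8)
print(np.round(c, 12))
```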

Suppose that for any fixed $x$ we can choose $N$ such that $x \notin \{y\_j\}$; then the interpolation polynomial can be rewritten in the form of a Chebyshev series as:

$$P\_{N+1}(y) = \sum\_{n=0}^{N+1} a\_n T\_n(y),$$

where *an* can be computed in *O*(*N*) operations once *cn* are calculated [27,29]. The Clenshaw–Curtis–Filon quadrature rule for integral *I*(*α*, *β*, *k*, *x*) is defined as:

$$I(\alpha, \beta, k, x) = \oint\_{-1}^{1} \frac{w(y)f(y)e^{iky}}{y - x} dy \approx \oint\_{-1}^{1} \frac{w(y)P\_{N+1}(y)e^{iky}}{y - x} dy = \sum\_{n=0}^{N+1} a\_n M\_n(\alpha, \beta, k, x), \tag{16}$$

where $M\_n(\alpha, \beta, k, x) = \oint\_{-1}^{1} \frac{w(y)T\_n(y)e^{iky}}{y - x} dy$ are the modified moments. The following subsection defines a method to compute the moments $M\_n(\alpha, \beta, k, x)$ efficiently.

### *Computation of Moments*

A well-known property of $T_n(y)$ is [30]:

$$\frac{T_n(y) - T_n(x)}{y - x} = 2\sum_{j=0}^{n-1}{}'\, U_{n-1-j}(y)\, T_j(x) = 2\sum_{j=0}^{n-1}{}'\, U_{n-1-j}(x)\, T_j(y), \tag{17}$$

where the prime indicates a summation whose first term is halved and $U_n(y)$ is the Chebyshev polynomial of the second kind. Applying (17) to the moments yields:

$$\begin{aligned} M_n(\alpha,\beta,k,x) &= \oint_{-1}^{1} \frac{w(y)T_n(y)e^{iky}}{y-x}\,dy \\ &= \oint_{-1}^{1} \frac{w(y)\left(T_n(y)-T_n(x)+T_n(x)\right)e^{iky}}{y-x}\,dy \\ &= \int_{-1}^{1} \frac{w(y)\left(T_n(y)-T_n(x)\right)e^{iky}}{y-x}\,dy + T_n(x)\oint_{-1}^{1} \frac{w(y)e^{iky}}{y-x}\,dy \\ &= \int_{-1}^{1} w(y)\Big(2\sum_{j=0}^{n-1}{}'\, U_{n-1-j}(x)T_j(y)\Big)e^{iky}\,dy + T_n(x)\oint_{-1}^{1} \frac{w(y)e^{iky}}{y-x}\,dy \\ &= 2\sum_{j=0}^{n-1}{}'\, U_{n-1-j}(x)\int_{-1}^{1} w(y)T_j(y)e^{iky}\,dy + T_n(x)\oint_{-1}^{1} \frac{w(y)e^{iky}}{y-x}\,dy \end{aligned} \tag{18}$$
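The Chebyshev identity (17) that drives this reduction is easy to verify numerically. A small Python check; the helper names `T`, `U`, `lhs`, `rhs` are illustrative:

```python
import numpy as np

def T(n, t):
    """Chebyshev polynomial of the first kind, |t| <= 1."""
    return np.cos(n * np.arccos(t))

def U(n, t):
    """Chebyshev polynomial of the second kind, |t| < 1."""
    th = np.arccos(t)
    return np.sin((n + 1) * th) / np.sin(th)

def lhs(n, x, y):
    # left-hand side of (17): divided difference of T_n
    return (T(n, y) - T(n, x)) / (y - x)

def rhs(n, x, y):
    # primed sum: the j = 0 term is halved
    s = 0.5 * U(n - 1, x) * T(0, y)
    s += sum(U(n - 1 - j, x) * T(j, y) for j in range(1, n))
    return 2 * s
```

Both sides agree to machine precision for any $x \neq y$ in $(-1, 1)$.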

Piessens and Branders [31] derived a fourth-order homogeneous recurrence relation for the non-singular integrals $\overline{M}_n(\alpha,\beta,k) = \int_{-1}^{1} w(y)\,T_n(y)\,e^{iky}\,dy$:

$$ik\overline{M}_{n+2} + 2(n+\alpha+\beta+2)\overline{M}_{n+1} - 2(2\alpha-2\beta+ik)\overline{M}_n - 2(n-\alpha-\beta-2)\overline{M}_{n-1} + ik\overline{M}_{n-2} = 0, \quad n \ge 2, \tag{19}$$

along with four initial values:

$$\begin{aligned} \overline{M}_0 &= 2^{\alpha+\beta+1} e^{-ik}\, \frac{\Gamma(\alpha+1)\Gamma(\beta+1)}{\Gamma(\alpha+\beta+2)}\; {}_1F_1(\alpha+1;\,\alpha+\beta+2;\,2ik), \\ \overline{M}_1 &= \overline{M}_0(\alpha+1,\beta,k) - \overline{M}_0(\alpha,\beta,k), \\ \overline{M}_2 &= \frac{i}{k}\left[2(\alpha+\beta+2)\overline{M}_1 - (2\alpha-2\beta+ik)\overline{M}_0\right], \\ \overline{M}_3 &= \frac{i}{k}\left[2(\alpha+\beta+3)\overline{M}_2 - (4\alpha-4\beta+ik)\overline{M}_1 + 2(\alpha+\beta+1)\overline{M}_0\right], \end{aligned} \tag{20}$$

where ${}_1F_1(\alpha+1;\,\alpha+\beta+2;\,2ik)$ is the confluent hypergeometric function of the first kind. Unfortunately, this recurrence relation for the moments $\overline{M}_n(\alpha,\beta,k)$ is numerically unstable in the forward direction for $n > k$; in that regime the modified moments can be computed efficiently by Oliver's algorithm [31,32].
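Oliver's idea is to recast an unstable forward recurrence as a boundary value problem: impose known values at both ends and solve the resulting banded linear system, which suppresses the unwanted dominant solution. A toy Python illustration on the three-term recurrence $y_{n+1} - 3y_n + 2y_{n-1} = 0$, whose minimal solution $y_n \equiv 1$ is swamped by the dominant $2^n$ under forward recursion; this is a simplified stand-in for the pentadiagonal system arising from (19), not the paper's code:

```python
import numpy as np
from scipy.linalg import solve_banded

N = 30
y0, yN = 1.0, 1.0           # boundary values of the minimal solution y_n = 1

# Forward recursion: a 1e-10 perturbation is amplified like 2^n.
a, b = y0, y0 + 1e-10
for _ in range(2, N + 1):
    a, b = b, 3 * b - 2 * a
fwd_err = abs(b - 1.0)      # roughly 0.1 by n = 30

# Oliver-style solve: equations 2*y_{n-1} - 3*y_n + y_{n+1} = 0 for
# n = 1..N-1, with the known y_0 and y_N moved to the right-hand side.
M = N - 1
ab = np.zeros((3, M))
ab[0, 1:] = 1.0             # superdiagonal (coefficient of y_{n+1})
ab[1, :] = -3.0             # diagonal      (coefficient of y_n)
ab[2, :-1] = 2.0            # subdiagonal   (coefficient of y_{n-1})
rhs = np.zeros(M)
rhs[0] = -2.0 * y0
rhs[-1] = -1.0 * yN
y = solve_banded((1, 1), ab, rhs)
```

The banded solve recovers $y_n = 1$ to machine precision, while forward recursion has lost several digits by $n = 30$.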

The integral $\oint_{-1}^{1} \frac{w(y)e^{iky}}{y-x}\,dy$ is computed by the steepest descent method; the original idea is due to Huybrechs and Vandewalle [33] for sufficiently highly oscillatory integrals.

**Proposition 1.** *The Cauchy singular integral* $\oint_{-1}^{1} \frac{w(y)e^{iky}}{y-x}\,dy$ *can be transformed into:*

$$\oint_{-1}^{1} \frac{w(y)e^{iky}}{y-x}\,dy = S_{-1} - S_1 + i\pi\, w(x)e^{ikx}, \tag{21}$$

*where:*

$$\begin{aligned} S_{-1} &= i^{\alpha+1} e^{-ik} \int_0^\infty \frac{y^\alpha (2-iy)^\beta}{-1+iy-x}\, e^{-ky}\,dy, \\ S_1 &= (-i)^{\beta+1} e^{ik} \int_0^\infty \frac{y^\beta (2+iy)^\alpha}{1+iy-x}\, e^{-ky}\,dy. \end{aligned} \tag{22}$$

**Proof.** Readers are referred to [34] for more details.

*Mathematics* **2019**, *7*, 872

The generalized Gauss–Laguerre quadrature rule can be used to evaluate the integrals $S_{-1}$ and $S_1$ above, e.g., via the command `lagpts` in Chebfun [35]. Let $\{y_j^\alpha, w_j^\alpha\}_{j=1}^{k}$ be the nodes and weights for the weight function $y^\alpha e^{-y}$, and $\{y_j^\beta, w_j^\beta\}_{j=1}^{k}$ the nodes and weights for the weight function $y^\beta e^{-y}$, in accordance with the generalized Gauss–Laguerre quadrature rule. Then these integrals can be approximated by:

$$\begin{aligned} S_{-1} &\approx \left(\frac{i}{k}\right)^{\alpha+1} e^{-ik} \sum_{j=1}^{k} w_j^\alpha\, \frac{\left(2-(i/k)\,y_j^\alpha\right)^{\beta}}{-1+(i/k)\,y_j^\alpha-x}, \\ S_1 &\approx \left(\frac{-i}{k}\right)^{\beta+1} e^{ik} \sum_{j=1}^{k} w_j^\beta\, \frac{\left(2+(i/k)\,y_j^\beta\right)^{\alpha}}{1+(i/k)\,y_j^\beta-x}. \end{aligned} \tag{23}$$
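In environments without Chebfun's `lagpts`, the same nodes and weights are available from SciPy. A sketch of the $S_{-1}$-type integral under illustrative parameter values $\alpha = \beta = 0.25$, $x = 0.5$, $k = 10$ (chosen for this demonstration, not taken from the paper), cross-checked against adaptive quadrature:

```python
import numpy as np
from scipy.special import roots_genlaguerre
from scipy.integrate import quad

alpha, beta, x, k = 0.25, 0.25, 0.5, 10.0   # illustrative parameters

# Nodes/weights for the weight y^alpha * e^{-y} (generalized Gauss-Laguerre).
yj, wj = roots_genlaguerre(30, alpha)

# T = integral_0^inf y^a (2 - iy)^b / (-1 + iy - x) e^{-ky} dy, via y -> y/k.
g = lambda t: (2 - 1j * t / k) ** beta / (-1 + 1j * t / k - x)
T_glag = k ** (-(alpha + 1)) * np.sum(wj * g(yj))

# Reference value by adaptive quadrature on the real and imaginary parts.
f = lambda y: y ** alpha * (2 - 1j * y) ** beta / (-1 + 1j * y - x) * np.exp(-k * y)
T_ref = (quad(lambda y: f(y).real, 0, np.inf)[0]
         + 1j * quad(lambda y: f(y).imag, 0, np.inf)[0])

# S_{-1} itself then carries the prefactor i^(alpha+1) e^{-ik} from (22).
S_minus1 = 1j ** (alpha + 1) * np.exp(-1j * k) * T_glag
```

Thirty nodes already match the adaptive reference to well below single-precision accuracy, since the transformed integrand is analytic along the positive real axis.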

*Mn*(*α*, *β*, *k*, *x*) is obtained by substituting Equations (19) and (21) into the last equality of Equation (18). Finally, together with Equations (16) and (14), the approximate solution:

$$u_N(x_i) = af(x_i) - \frac{b\,w(x_i)}{\pi}\, e^{-ikx_i} \sum_{n=0}^{N+1} a_n\, M_n(\alpha,\beta,k,x_i), \tag{24}$$

for CSIE (1) is derived for different solution cases at equally spaced points.

### **3. Error Analysis**

**Lemma 1.** *[36,37] Let f*(*x*) *be a Lipschitz continuous function on [*−*1, 1] and PN*[*f*] *be the interpolation polynomial of f*(*x*) *at N* + 1 *Clenshaw–Curtis points. Then it follows that:*

$$\lim_{N \to \infty} \|f - P_N[f]\|_{\infty} = 0. \tag{25}$$

*In particular,*

*• (i) if f*(*x*) *is analytic with* ∣*f*(*x*)∣ ≤ *M in an ellipse ερ (Bernstein ellipse) with foci* ±<sup>1</sup> *and major and minor semiaxis lengths summing to ρ* > 1*, then:*

$$\|f - P_N[f]\|_{\infty} \le \frac{4M}{\rho^N(\rho - 1)}. \tag{26}$$

*• (ii) if f*(*x*) *has an absolutely continuous* (*κ*<sup>0</sup> −1)*st derivative and a κ*0*th derivative f* (*κ*0) *of bounded variation Vκ*<sup>0</sup> *on [*−*1,1] for some κ*<sup>0</sup> ≥ 1*, then for N* ≥ *κ*<sup>0</sup> + 1*:*

$$\|f - P_N[f]\|_{\infty} \le \frac{4V_{\kappa_0}}{\kappa_0 \pi N (N-1) \cdots (N-\kappa_0+1)}. \tag{27}$$
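Case (i) of Lemma 1 predicts geometric decay of the interpolation error for analytic $f$, which is easy to observe numerically. A short Python experiment with Clenshaw–Curtis nodes and the entire function $f(x) = e^x$ (an illustrative choice, so every $\rho > 1$ is admissible); the helper `cc_interp_error` is not from the paper:

```python
import numpy as np
from scipy.interpolate import BarycentricInterpolator

def cc_interp_error(f, N):
    """Max |f - P_N[f]| on [-1, 1] for interpolation at the N + 1
    Clenshaw-Curtis points y_j = cos(j*pi/N)."""
    nodes = np.cos(np.pi * np.arange(N + 1) / N)
    p = BarycentricInterpolator(nodes, f(nodes))
    t = np.linspace(-1, 1, 2001)
    return np.max(np.abs(p(t) - f(t)))

# Error should drop geometrically as N doubles, until rounding dominates.
errs = [cc_interp_error(np.exp, N) for N in (4, 8, 16)]
```

Each doubling of $N$ squares the $\rho^{-N}$ factor in the bound (26), so the observed errors collapse rapidly toward machine precision.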

**Proposition 2.** *[29] Suppose that $f(y) \in C^{R+2}[-1,1]$ with $R = \lceil \min\{\alpha,\beta\} \rceil$; then the error of the Clenshaw–Curtis–Filon quadrature rule for the integral $I[f]$ satisfies:*

$$E_N = |I(\alpha,\beta,k,x) - I_N(\alpha,\beta,k,x)| = O\!\left(k^{-2-\min\{\alpha,\beta\}}\right), \qquad k \to \infty. \tag{28}$$

**Theorem 2.** *Suppose that uN*(*x*) *is the approximate solution of u*(*x*) *of CSIE for case κ* ≤ 0*, then for error* ∣*u*(*x*) − *uN*(*x*)∣*, x* ∈ (−1, 1)*, the Clenshaw–Curtis–Filon quadrature is convergent, i.e.:*

$$\lim\_{N \to \infty} |u(\mathbf{x}) - u\_N(\mathbf{x})| = 0. \tag{29}$$

**Proof.** Suppose that $x \notin Y_{N+1}$, $f \in C^2[-1,1]$, and let

$$Q(y) = \begin{cases} \frac{f(y) - f(x)}{y - x}, & y \neq x \\ f'(x), & y = x. \end{cases}$$

It follows that $Q(y) \in C^1[-1,1]$ with $\|Q'\|_\infty \le \frac{3}{2}\|f''\|_\infty$; in addition, $R(y) = \frac{P_{N+1}(y)-f(x)}{y-x}$ is a polynomial of degree at most $N$. For the cases $\kappa \le 0$, the solutions $u(x)$ and $u_N(x)$ of the CSIE are:

$$\begin{aligned} u(x) &= af(x) - \frac{b}{\pi}\, e^{-ikx}\, w(x) \oint_{-1}^{1} \frac{w^\star(y)\, f(y)\, e^{iky}}{y-x}\,dy, \\ u_N(x) &= af(x) - \frac{b}{\pi}\, e^{-ikx}\, w(x) \oint_{-1}^{1} \frac{w^\star(y)\, P_{N+1}(y)\, e^{iky}}{y-x}\,dy. \end{aligned}$$

Then:

$$\begin{aligned} |u(x) - u_N(x)| &= \left| a\left(f(x)-f(x)\right) - \frac{b}{\pi}\, e^{-ikx}\, w(x) \int_{-1}^{1} w^\star(y)\left(Q(y)-R(y)\right)e^{iky}\,dy \right| \\ &\le \frac{b}{\pi}\, w(x) \int_{-1}^{1} w^\star(y)\,dy\; \|Q(y)-R(y)\|_{\infty} \\ &= D\, \|Q(y)-R(y)\|_{\infty}, \end{aligned}$$

where $D = \frac{b\, w(x)\, 2^{\alpha+\beta+1}\, \Gamma(\alpha+1)\Gamma(\beta+1)}{\pi\, \Gamma(\alpha+\beta+2)}$. Since $R$ interpolates $Q$ at the points $Y_{N+1}$, Lemma 1 gives $\|Q-R\|_\infty \to 0$ as $N \to \infty$, and the claim follows. □

### **4. Numerical Examples**

**Example 1.** *Let us consider the CSIE of the second kind:*

$$\frac{u(x)}{\sqrt{2}} + \frac{1}{\sqrt{2}\,\pi}\, e^{-ikx} \oint_{-1}^{1} \frac{u(y)\, e^{iky}}{y-x}\,dy = \frac{f(x)}{\sqrt{2}}, \tag{30}$$

*where $f(x) = \cos(x)$. For $x = 0.5$ and $a = b = \frac{1}{\sqrt{2}}$, we obtain $\alpha = 0.25$ and $\beta = -0.25$ from Equation (3) for $\kappa = 0$. The absolute error for $u(x)$ is presented in Tables 1 and 2 below.*

**Table 1.** Absolute error for *κ* = 0, bounded at *x* = 1.


**Table 2.** Absolute error for *κ* = 0, bounded at *x* = −1.


**Figure 1.** The mixed boundary value problem.

**Example 2.** *Consider the mixed boundary value problem shown in Figure 1. Taken from [18], it has the analytic solution $\phi(x,t) = \frac{2}{\pi}\arctan\frac{2t}{1-x^2-t^2}$. It can further be reduced to the following integral equation for $\kappa = -1$ and $\alpha = \beta = \frac{1}{2}$:*

$$\frac{-1}{\pi} \oint_{-1}^{1} \frac{u(y)}{y-x}\,dy = C_1 + \frac{1}{\pi}\left[\frac{1-x}{2}\log(1-x) + \frac{1+x}{2}\log(1+x) - \log(2+x) - 1\right] \tag{31}$$

*Here $C_1$ is a constant, $C_1 = 0.4192007182789807$. Furthermore, once $u(x)$ is known, the solution of the above boundary value problem can be recovered as:*

$$\phi(\mu, \nu) = \frac{1}{\pi} \int\_{-\infty}^{\infty} \frac{\nu u(y, 0)}{(y - \mu)^2 + \nu^2} dy$$

*where:*

$$u(y,0) = \begin{cases} u(y) + (1-y)/2, & |y| \le 1, \\ 1, & y \in (-2,-1), \\ 0, & \text{otherwise}. \end{cases} \tag{32}$$

*Here, for simplicity, we solve only for u*(*x*)*. Figure 2 illustrates the absolute error for u*(*x*)*.*

**Figure 2.** The absolute error for *u*(*x*), for *x* = 0.6.

*Figure 2 shows that the absolute error for u*(*x*) *decreases as N increases.*

**Example 3.** *[10,26] For CSIE with k* = 0*:*

$$\oint_{-1}^{1} \frac{u(y)}{y - x}\,dy = x^4 + 5x^3 + 2x^2 + x - \frac{11}{8}, \tag{33}$$

*in the case a* = 0 *and b* = 1*, where α and β are derived from Equation (3). The exact solutions u*(*y*) *for the cases κ* ≤ 0*, bounded at x* = −1*, x* = 1*, and x* = ±1 *respectively, are given as:*

$$\begin{aligned} u(y) &= \frac{1}{\pi} \sqrt{\frac{1+y}{1-y}} \left[ y^4 + 4y^3 - \frac{5}{2}y^2 + y - \frac{7}{2} \right], \\ u(y) &= \frac{-1}{\pi} \sqrt{\frac{1-y}{1+y}} \left[ y^4 + 6y^3 + \frac{15}{2}y^2 + 6y + \frac{7}{2} \right], \\ u(y) &= \frac{-1}{\pi} \sqrt{1-y^2} \left[ y^3 + 5y^2 + \frac{5}{2}y + \frac{7}{2} \right]. \end{aligned} \tag{34}$$

*Table 3 presents the absolute error for the above three cases.*

**Table 3.** Absolute error for case *κ* ≤ 0, *k* = 0.


*Clearly, Table 3 shows that the obtained absolute errors are remarkably small even for a value of N as small as N* = 5*, an accuracy not reached in [10,26]. The exact value of u*(*x*) *in the above examples is obtained with Mathematica* 11*, while the approximated results are calculated using Matlab R2018a on a 4 GHz personal laptop with 8 GB of RAM. For Example 2, the Matlab code and Mathematica command are provided as supplementary material.*

### **5. Conclusions**

In the presented research work, the Clenshaw–Curtis–Filon quadrature is used to obtain higher-order accuracy. Absolute errors for solutions of highly oscillatory CSIEs with *κ* = 0 are presented in Tables 1 and 2. Figure 2 shows the absolute error for *u*(*x*) for the mixed boundary value problem at larger values of *N*, whereas for frequency *k* = 0, Table 3 shows that the proposed quadrature possesses higher accuracy than the schemes of [10,26]. Overall, the quadrature rule agrees closely with the exact solutions.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/2227-7390/7/10/872/s1, for Example 2, Figure 2: The absolute error for *u*(*x*), for *x* = 0.6.

**Author Contributions:** Conceptualization, SAIRA, S.X. and G.L.; Methodology, SAIRA; Supervision, S.X.; Writing (original draft), SAIRA; Writing (review and editing), SAIRA, S.X. and G.L.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

MDPI St. Alban-Anlage 66 4052 Basel Switzerland Tel. +41 61 683 77 34 Fax +41 61 302 89 18 www.mdpi.com

*Mathematics* Editorial Office E-mail: mathematics@mdpi.com www.mdpi.com/journal/mathematics
