**Proof.**


From (3) and (4), it follows that $1/q_k^{T_1}(u) = C_{1\to k-1}^{T_1}(u)/C_{1\to k}^{T_1}(u)$ is the last component on the diagonal of $(T_1 - uI)^{-1}$. Then, we have

$$\begin{aligned} 1/q_k^{T_1}(u) &= e_k^{T}(T_1 - uI)^{-1}e_k \\ &= e_k^T Q (D_1 - uI)^{-1} Q^T e_k \\ &= \sum_{j=1}^k v_{jk}^2 \frac{1}{s_j^{T_1} - u} \end{aligned} \tag{13}$$

where $v_j$ is the $j$th eigenvector of $T_1$ and $v_{jk}$ denotes its last ($k$th) component.
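As a quick numerical sanity check of (13) (our own illustration in Python/NumPy on an arbitrary small symmetric tridiagonal matrix, not the paper's code), the last diagonal entry of $(T_1 - uI)^{-1}$ matches the eigenvector sum:

```python
import numpy as np

# An arbitrary small symmetric tridiagonal T1 for illustration.
k = 6
rng = np.random.default_rng(0)
T1 = np.diag(rng.standard_normal(k)) \
    + np.diag(np.ones(k - 1), 1) + np.diag(np.ones(k - 1), -1)

u = 0.123                      # a shift that is not an eigenvalue
s, V = np.linalg.eigh(T1)      # s_j and eigenvectors v_j (columns of V)

# Left-hand side of (13): last diagonal entry of (T1 - u I)^{-1},
# i.e. the ratio C_{1->k-1}(u) / C_{1->k}(u).
lhs = np.linalg.inv(T1 - u * np.eye(k))[-1, -1]

# Right-hand side of (13): sum over j of v_{jk}^2 / (s_j - u).
rhs = np.sum(V[-1, :] ** 2 / (s - u))

assert np.isclose(lhs, rhs)
```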

As $C_{1\to k}^{T_1}(u)$ is the determinant of $T_1 - uI$, $q_k^{T_1}$ should be close to zero when $u \to s_i$. However, in IEEE double-precision arithmetic, this does not hold if $v_{ik}^2$ is also small compared with $s_i - u$. (13) can be expressed as

$$1/q\_k^{T\_1}(u) = \frac{v\_{ik}^2}{s\_i - u} + \sum\_{j=1,\, j \neq i}^k \frac{v\_{jk}^2}{s\_j - u} = v\_{ik}^2/(s\_i - u) + R\_i, \tag{14}$$

where apparently (recall that $g = \min_{j \neq i} |s_j^{T_1} - u|$)

$$|R\_i| \in \left[0, \frac{1}{g}\right). \tag{15}$$

Given that $u$ is the previous computation result, we have $|s_i - u| \leq tol$. When $q_k^{T_1}(u)(s_i - u) > 0$, (14) and (15) can be combined as

$$\begin{aligned} \left| v\_{ik}^2/(s\_i - u) \right| &< 1/g + \left| 1/q\_k^{T\_1} \right| \\ \Rightarrow |v\_{ik}| &< \sqrt{\left(1/q\_k^{T\_1} + 1/g\right) tol}. \end{aligned} \tag{16}$$

In addition, we have

$$|v\_{ik}| < \sqrt{\left(1/q\_k^{T\_1} - 1/g\right) tol} \tag{17}$$

similarly when $q_k^{T_1}(u)(s_i - u) < 0$.

The condition of Theorem 4b then gives $|b_k v_{ik}| < tol$ according to (17). Reviewing (12) and Lemma 2 completes the proof.

Theorem 4 is satisfying because the $q_i$ values of $T_1$ and $p_i$ values of $T_2$ happen to be accompanying products of Algorithm 2, which serves as the basic iteration of the 'fzero' scheme. The condition of Theorem 4b is sufficient but not necessary, as there are many other possibilities that make $|v_{ik}| < tol$, even when $(C_{1\to k-1}^{T_1}(u)/C_{1\to k}^{T_1}(u))(s_i - u) \geq 0$. A trivial plan is to calculate and check $v_{ik}$ once an $s_i$ is solved and the accompanying $|1/q_i^{T_1}|$ is suspiciously small. Although this idea already saves a large number of unnecessary computations compared with the DC algorithm, we are still concerned that calling the Inverse Iteration algorithm here is too expensive.

Our scheme is to mark the suspiciously small $|1/q_i^{T_1}|$ values by a rough discriminant, for example $|1/q_i^{T_1}| < 1$, and then to substitute the corresponding $\bar{s}_i \pm tol$ values into Algorithm 1 to check whether deflation is available. We have found in our numerical experiments that it is difficult to cover all the deflation situations by this method, even if we set the discriminant quite loosely. Even filtering directly by $|v_{ik}|$, as in the DC algorithm, still leaves some out. We applied these methods to 20 randomly generated 2001 × 2001 matrices, where $T_1$ and $T_2$ are both 1000 × 1000 matrices; the averages were calculated and are shown in Table 1. We collected the hit rate of the DC algorithm by checking how many $\bar{s}_i$ values with negligible corresponding $v_{ik}$ values were really close to $\lambda_i$ values. In Table 1, plan 1 refers to "rough discriminant + Inverse Iteration algorithm" and plan 2 refers to "rough discriminant + Algorithm 1"; their hit rates were collected similarly. It can be seen that the hit rate and accuracy of our method are acceptable, or at least no worse than those of the DC algorithm. The errors in Table 1 are the differences between the $\bar{s}_i$ values selected during deflation and the $\bar{\lambda}_i$ values obtained by the Bisection method. The data were collected on an Intel Core i5-4590 3.3 GHz CPU and 16 GB RAM machine. All codes were written in Matlab2017b and executed in IEEE double precision. The machine precision is $eps \approx 2.2 \times 10^{-16}$.
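The scheme can be sketched as follows. This is our illustrative reading in Python, not the paper's Matlab code: a Sturm pivot count stands in for Algorithm 1's deflation check, and the function names are ours:

```python
import numpy as np

def sturm_count(a, b2, u):
    """Number of eigenvalues of the symmetric tridiagonal matrix
    (diagonal a, squared off-diagonals b2) smaller than u, counted
    as the number of negative pivots of T - u*I."""
    count, q = 0, 1.0
    for i in range(len(a)):
        q = (a[i] - u) - (b2[i - 1] / q if i else 0.0)
        if q == 0.0:
            q = np.finfo(float).tiny  # guard against exact breakdown
        if q < 0.0:
            count += 1
    return count

def deflation_candidates(a, b2, s_bar, inv_q, tol):
    """Mark sub-eigenvalues with a suspiciously small |1/q_i| (the
    rough discriminant from the text), then keep those for which the
    count confirms exactly one eigenvalue of the full matrix lies in
    (s - tol, s + tol)."""
    accepted = []
    for s, iq in zip(s_bar, inv_q):
        if abs(iq) < 1.0:  # rough discriminant
            if sturm_count(a, b2, s + tol) - sturm_count(a, b2, s - tol) == 1:
                accepted.append(s)
    return accepted
```

Here a sub-eigenvalue survives only if exactly one eigenvalue of the full matrix lies within $\pm tol$ of it, which is the situation in which deflation is safe.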


**Table 1.** Comparison of deflation detecting methods (average of 20 2001 × 2001 matrices).

We give the Divisional Bisection method for all eigenvalues by Algorithm 5 and the following subroutine Algorithm 6.


**Algorithm 6:** Fzero by Determinant

**Input:** *a*, *b*2, *n*, *V*, *tol* // search one root in an isolating interval *V*

**Output:** *x*, *qn*

**1 call** Algorithm 2 ⇐ *a*, *b*2, *n*;

**2 call** 'fzero' function in Matlab ⇐ Algorithm 2, *V*, *tol*;

**3 then get** *x*;

**4 save** *qn* of the last iteration.
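A rough Python counterpart of Algorithm 6, with `scipy.optimize.brentq` standing in for Matlab's 'fzero' and with the pivot recurrence below as our reading of Algorithm 2 (the returned $q_n$ is the last diagonal pivot of the $LDL^T$ factorization of $T - uI$):

```python
import numpy as np
from scipy.optimize import brentq

def last_pivot(a, b2, u):
    """q_n(u): last diagonal pivot of the LDL^T factorization of
    T - u*I, where T has diagonal a and squared off-diagonals b2.
    This recurrence is our reading of what Algorithm 2 produces."""
    q = a[0] - u
    for i in range(1, len(a)):
        if q == 0.0:
            q = np.finfo(float).tiny  # guard against exact pivot breakdown
        q = (a[i] - u) - b2[i - 1] / q
    return q

def fzero_by_determinant(a, b2, V, tol=1e-12):
    """Algorithm 6 sketch: root-find q_n(u) on the isolating interval
    V, which must contain exactly one eigenvalue and no pole of q_n."""
    x = brentq(lambda u: last_pivot(a, b2, u), V[0], V[1], xtol=tol)
    return x, last_pivot(a, b2, x)
```

For the order-5 Toeplitz matrix [1,2,1] (diagonal 2, off-diagonals 1), the interval (0.4, 1.3) isolates the eigenvalue $2 - 2\cos(\pi/3) = 1$ and contains no pole of $q_n$, so the sketch recovers it.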

#### **5. Accuracy Analysis and Numerical Results**

*5.1. Accuracy Analysis*

After the eigenvalues of the original submatrices are calculated by the QR Algorithm, as shown by line 3 in Algorithm 5, it is not safe to take $(\bar{s}_i + \bar{s}_{i-1})/2$ as a $\lambda_i$ whenever $\bar{s}_i - \bar{s}_{i-1} \leq tol$, because the QR algorithm is not always as accurate as the Bisection method or the 'fzero' scheme. So, in practice, we perform an extra check on the selected $\bar{s}_i$ values by Theorem 4a when checking deflation from the results of the QR Algorithm. Suppose $m$ sub-eigenvalues (denoted by $s_1, \ldots, s_m$) cluster in the interval $[x, y]$, where $y - x \leq tol$; the process is shown as Algorithm 7.


In Algorithm 7, $10\,tol$ is a pessimistic estimate of the QR algorithm error, i.e., ten times the Bisection error. The data in Table 2, presented in a later paragraph, support this choice. Line 2 in Algorithm 7 costs 2 Bisection iterations for $w - 1$ $\lambda$ values and line 10 costs 3 to 4 per $\lambda$, compared with about 7.5 iterations per $\lambda$ in Algorithm 6 and 53 iterations per $\lambda$ in the Bisection algorithm.

When arithmetic approximations $\bar{s}_i$ are treated as the boundaries of isolating intervals at the next level, they do not affect the accuracy, because Algorithm 6 fails whenever the number of $\lambda$'s in an interval is not one. The troublesome number could be 0, 2, or 3, but certainly not bigger. When there are 4 or more $\lambda$'s in an interval, there are clustering $\bar{s}_i$'s in the previous results, which would have been perceived during the deflation check. For example, if 4 $\lambda$'s lie in $[\bar{s}_j, \bar{s}_{j+1}]$ as

$$\bar{s}\_j < \lambda\_{j-1} < s\_{j-1} < \lambda\_j < s\_j < \lambda\_{j+1} < s\_{j+1} < \lambda\_{j+2} < \bar{s}\_{j+1}, \tag{18}$$

we have $s_{j-1} - \bar{s}_j < \epsilon$, where $\epsilon$ is the previous computation error. (18) then shows that $\bar{s}_{j-1}$ and $\bar{s}_j$ both lie in $(s_{j-1} - \epsilon, s_{j-1}]$, which could not happen because we do the deflation check previously.

We regard this as a beneficial situation. It can be seen in (18) that the troublesome number arises only when $\bar{s}_j < \lambda_j$ (or $\bar{s}_{j+1} > \lambda_{j+1}$), contrary to Theorem 3. As the accurate $s_j > \lambda_j$ and $s_j - \bar{s}_j \leq \epsilon$, we have $\lambda_j - \bar{s}_j \leq \epsilon$ and can then speed up the calculation. Finally, the accuracy of Algorithm 5 is as good as that of the Bisection algorithm.

We checked the accuracy of Algorithm 5 by computing the eigenvalues of a 2001 × 2001 Toeplitz ST matrix, which has all 2's on its diagonal and all −1's on its sub-diagonals. The results of each method were then compared with the exact values, i.e., $\lambda_i = 2 - 2\cos(i\pi/2002)$, and are shown in Table 2. In addition, all eigenvalues of 20 randomly generated matrices were calculated to test the efficiency on serial machines, and the averages over the 20 runs are shown in Table 3. We set $p = 2$ in Algorithm 5 for the serial execution.

**Table 2.** Accuracy Result.


**Table 3.** Time Cost Result.


Tables 2 and 3 demonstrate that our method substantially improves the speed of the Bisection method without losing accuracy. In addition, Table 3 confirms that Algorithm 5 is $O(n^2)$, as its iteration is based on Algorithm 2. In the following subsections, we illustrate more test results for several different types of matrices. All results in Section 5 were collected on an Intel Core i5-4590 3.3-GHz CPU and 16-GB RAM machine, except for the last figure, which will be introduced in Section 5.4 specifically. All codes were written in Matlab2017b and executed in IEEE double precision. The machine precision is $eps \approx 2.2 \times 10^{-16}$.

#### *5.2. Matrices Introduction and Accuracy Test*

In the following subsections, we present a numerical comparison between the Divisional Bisection algorithm and four other algorithms for solving the ST eigenvalue problem:


We use the following sets of test *n* × *n* matrices:

1. Matrix A:

$$\text{Matrix A} = \text{tridiagonal}\begin{bmatrix} 1 & 1 & \cdots & 1 \\ 2 & 2 & \cdots & 2 \\ 1 & 1 & \cdots & 1 \end{bmatrix},$$

i.e., the Toeplitz matrix [1,2,1], to test the accuracy and efficiency, which has $\lambda_i = 2 - 2\cos(i\pi/(n+1))$;

2. Matrix T1:

$$\text{Matrix T1} = \text{tridiagonal}\begin{bmatrix} 1 & 1 & \cdots & 1 \\ 1 & 0 & \cdots & 0 \\ 1 & 1 & \cdots & 1 \end{bmatrix},$$

to test the accuracy and efficiency, which has $\lambda_i = -2\cos(2i\pi/(2n+1))$. Matrix T1 is from [45], as are the following Matrix T2 and T3;

3. Matrix T2 [45]:

$$\text{Matrix T2} = \text{tridiagonal}\begin{bmatrix} 1 & 1 & \cdots & 1 & \\ 1 & 0 & \cdots & 0 & 1 \\ 1 & 1 & \cdots & 1 & \end{bmatrix},$$

to test the accuracy and efficiency, which has $\lambda_i = -2\cos(i\pi/n)$;

4. Matrix T3 [45]:

$$\text{Matrix T3} = \text{tridiagonal}\begin{bmatrix} 1 & 1 & \cdots & 1 & \\ 1 & 0 & \cdots & 0 & -1 \\ 1 & 1 & \cdots & 1 & \end{bmatrix},$$

to test the accuracy and efficiency, which has $\lambda_i = 2\cos((2i-1)\pi/(2n))$.
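The four test matrices and the stated eigenvalue formulas can be cross-checked numerically. In this sketch (ours, in Python/NumPy), the corner entries of T1–T3 follow our reading of the displays above, and each assertion confirms the corresponding formula:

```python
import numpy as np

def tridiag(diag, off):
    """Dense symmetric tridiagonal matrix from its diagonal and off-diagonal."""
    return np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)

n = 50
i = np.arange(1, n + 1)
off = np.ones(n - 1)

# Matrix A: the Toeplitz matrix [1,2,1].
A = tridiag(np.full(n, 2.0), off)
assert np.allclose(np.linalg.eigvalsh(A),
                   np.sort(2 - 2 * np.cos(i * np.pi / (n + 1))))

# Matrix T1: zero diagonal except a leading 1.
d = np.zeros(n); d[0] = 1.0
T1 = tridiag(d, off)
assert np.allclose(np.linalg.eigvalsh(T1),
                   np.sort(-2 * np.cos(2 * i * np.pi / (2 * n + 1))))

# Matrix T2: zero diagonal except 1 in the first and last entries.
d = np.zeros(n); d[0] = d[-1] = 1.0
T2 = tridiag(d, off)
assert np.allclose(np.linalg.eigvalsh(T2),
                   np.sort(-2 * np.cos(i * np.pi / n)))

# Matrix T3: zero diagonal except 1 first and -1 last.
d = np.zeros(n); d[0] = 1.0; d[-1] = -1.0
T3 = tridiag(d, off)
assert np.allclose(np.linalg.eigvalsh(T3),
                   np.sort(2 * np.cos((2 * i - 1) * np.pi / (2 * n))))
```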


Figures 1–4 present the test results for accuracy, where the Average Errors denote the means of the errors of all calculated eigenvalues and the Maximal Errors denote the maximum. Seven different sizes are used, from 800 × 800 to 3200 × 3200. All errors have been divided by the machine precision *eps* for clarity. It can be seen that the new Divisional Bisection algorithm matches the Bisection method for the best accuracy, considerably higher than that of the others.


**Figure 1.** Results of Matrix A: (**a**) the Average Errors; (**b**) the Maximal Errors.

**Figure 2.** Results of Matrix T1: (**a**) the Average Errors; (**b**) the Maximal Errors.

**Figure 3.** Results of Matrix T2: (**a**) the Average Errors; (**b**) the Maximal Errors.


**Figure 4.** Results of Matrix T3: (**a**) the Average Errors; (**b**) the Maximal Errors.

#### *5.3. Efficiency Test for Computing all the Eigenvalues*

Figure 5 presents the test results of time cost. Seven different sizes are used, from 800 × 800 to 3200 × 3200. Note that the results of the Random Matrix of each size are the mean data of 20 tests. Therefore, we use the plural form in the figures.

When the eigenvalues cluster, as in Matrix W, the Divisional Bisection method improves on the Bisection method by about 70%. Similarly good results can be seen for Matrix T1 and Matrix T3. However, the improvement is less than 50% for Matrix A and Matrix T2. The reason is that their submatrices have eigenvalues close to the global ones but not equal in finite-precision arithmetic. For example, the sub-eigenvalues may give Algorithm 6 an interval whose upper or lower bound lies within 5 × 10<sup>−14</sup> of *λi*. The 'fzero' scheme uses linear interpolation to accelerate convergence, and such a bound produces poor slopes during the interpolation process. As a consequence, more iterations are needed to guarantee convergence, which finally results in the efficiency loss of the Divisional Bisection method. Recall that Algorithm 7 checks for similar situations; however, a distance of 5 × 10<sup>−14</sup> cannot be detected, because it does not meet the conditions of Theorem 4.

Nevertheless, we are not pessimistic about the Divisional Bisection method. First, it still improves by more than 35% in such cases and performs well for Random Matrices. Secondly, the 'fzero' scheme is not a prerequisite of our method and could be modified or replaced by a more powerful competitor in future follow-up studies.

**Figure 5.** Time cost for: (**a**) Matrix A; (**b**) Matrix T1; (**c**) Matrix T2; (**d**) Matrix T3; (**e**) Matrix W; (**f**) Random Matrices.

#### *5.4. Efficiency Test for Computing a Part of the Eigenvalues*

All along, the Bisection method has undertaken the task of computing a part of the eigenvalues, especially when the size of the matrix is large. Once Algorithm 5 obtains all the sub-eigenvalues, as shown in lines 2–11 of Algorithm 5, it is an easy task to calculate any part of the *λi*'s. For example, if the eigenvalues in a certain interval are wanted, we can drop the sub-eigenvalues which lie outside and substitute ±*F*, in Algorithm 5 lines 2 and 15, with the lower and upper bounds of the given interval. If the *r*1th∼*r*2th eigenvalues are wanted, we need to drop the sub-eigenvalues whose order is lower than *r*1 − 1 or higher than *r*2. With *sr*1−1 and *sr*2 as the substitutions of ±*F*, the problem can be solved.
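The recipe above amounts to filtering the sub-eigenvalue list and replacing the outer bounds; a minimal sketch (the function name is ours):

```python
def restrict_subeigs(s_bar, lo, hi):
    """Keep only the sub-eigenvalues inside (lo, hi); lo and hi then
    replace -F and +F as the outer bounds, per the recipe above."""
    inner = [s for s in s_bar if lo < s < hi]
    return [lo] + inner + [hi]
```

The returned list then plays the role of the full sub-eigenvalue list in Algorithm 5, with *lo* and *hi* in place of ±*F*.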

Figure 6 shows the time cost for Random Matrices of four relatively large sizes, i.e., 5000 × 5000, 10,000 × 10,000, 15,000 × 15,000, and 20,000 × 20,000. We calculated 1%, 10%, 30%, and 50% of the *λi*'s for each size. Note that the results are mean data of 40 tests: 20 for computing *λi*'s in a certain interval and 20 for computing *λi*'s of a certain order. Given that there is no evident difference between the two kinds of test results, we mixed them for averaging.

**Figure 6.** Time cost for: (**a**) 1% *λ*'s; (**b**) 10% *λ*'s; (**c**) 30% *λ*'s; (**d**) 50% *λ*'s.

The results show that the Divisional Bisection method is not suitable for computing a small group of eigenvalues, even when the matrix is relatively large. We consider 10% an applicable threshold. Although we could replace the QR method with the Bisection method in Algorithm 5 line 3, which would avoid calculating all the sub-eigenvalues, the result is even worse: as the matrix size increases, the efficiency disadvantage of the Bisection method becomes increasingly severe, and the replacement pays off only when the number of wanted *λi*'s is quite small, for example, 0.1%. In this case, the 'fzero' loops (lines 12 to 24 in Algorithm 5) become a heavy burden for the Divisional Bisection method. Therefore, we insist on using the PWK version of the QR method in Algorithm 5.

We now consider the situation of calculating one *λ* in parallel. The problem also arises when the number of wanted *λ*'s is less than the number of CPU cores or not divisible by it. Algorithm 4 solves the problem and makes computing with any number of CPU cores available. Of course, the need to compute a single eigenvalue in parallel arises only for very large matrices. Therefore, we use three Random Matrices with sizes of 10<sup>6</sup> × 10<sup>6</sup>, 10<sup>7</sup> × 10<sup>7</sup>, and 10<sup>8</sup> × 10<sup>8</sup> for the test of parallel efficiency. The results, presented in Figure 7, were collected on an Intel Xeon(R) E5-2687 3.1-GHz CPU and 256-GB RAM machine, which has 20 CPU cores. Note that the results are mean data of 20 tests.

**Figure 7.** Computing one *λ* in parallel.

The three purple horizontal lines in Figure 7 denote the time cost of the serial Bisection algorithm. Specifically, the top one denotes the time cost for the 10<sup>8</sup> × 10<sup>8</sup> Random Matrices, the middle for 10<sup>7</sup> × 10<sup>7</sup>, and the bottom for 10<sup>6</sup> × 10<sup>6</sup>. The parallel efficiency is unsatisfactory, especially for the 10<sup>7</sup> × 10<sup>7</sup> and 10<sup>6</sup> × 10<sup>6</sup> Random Matrices, which are even worse than the serial Bisection algorithm. The reason is that Matlab does not support multi-threaded computation here; instead, we run the codes as multiple processes, and copying the inputs and distributing them to the processes takes up the vast majority of the time. The script time-consumption analysis tool in Matlab confirms this point, showing that at least 75% of the time was consumed by copying and distributing. Therefore, we will focus on a C or Fortran version of the Divisional Bisection algorithm in future follow-up studies. Nevertheless, Figure 7 verifies the feasibility of Algorithm 4, which to our knowledge is the only algorithm that works in parallel for computing any single ST eigenvalue. This paper, however, focuses on the serial version.

## **6. Conclusions**

In this paper, a novel $O(n^2)$ Divisional Bisection method is given for the ST eigenvalue problem by Algorithms 4 and 5. When computing all eigenvalues, the results show that the time cost is reduced by 35–70% on serial machines compared with the Bisection algorithm. In addition,


5. Combining Algorithms 4 and 5, it is practicable to calculate eigenvalues in any interval or of any orders in parallel.

The Divisional Bisection method offers a novel idea for solving the ST eigenvalue problem and a new choice, especially for readers who value good parallelization, flexibility, and warranted accuracy.

**Author Contributions:** Formal analysis, W.C., Y.Z. and H.Y.; investigation, W.C. and Y.Z.; writing—original draft, W.C.; writing—review and editing, H.Y. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research is funded by the Talent Team Project of Zhangjiang City in 2021 and the R & D and industrialization project of the offshore aquaculture cage nets system of Guangdong Province of China (grant No. 2021E05034). Huazhong University of Science and Technology funds the APC.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** The authors would like to thank the editors and reviewers for their constructive comments, which will improve the manuscript.

**Conflicts of Interest:** The authors declare no conflict of interest.
