**A Switched-Element System Based Direction of Arrival (DOA) Estimation Method for Un-Cooperative Wideband Orthogonal Frequency Division Multi Linear Frequency Modulation (OFDM-LFM) Radar Signals**

**Yifei Liu 1,\* , Yuan Zhao <sup>1</sup> , Jun Zhu 1, Jun Wang <sup>2</sup> and Bin Tang <sup>1</sup>**


Received: 4 December 2018; Accepted: 28 December 2018; Published: 2 January 2019

**Abstract:** This paper proposes a switched-element direction finding (SEDF) system based Direction of Arrival (DOA) estimation method for un-cooperative wideband Orthogonal Frequency Division Multi Linear Frequency Modulation (OFDM-LFM) radar signals. This method is designed to improve the problem that most DOA algorithms occupy numbers of channel and computational resources to handle the direction finding for wideband signals. Then, an iterative spatial parameter estimator is designed through deriving the analytical steering vector of the intercepted OFDM-LFM signal by the SEDF system, which can remarkably mitigate the dispersion effect that is caused by high chirp rate. Finally, the algorithm flow and numerical simulations are given to corroborate the feasibility and validity of our proposed DOA method.

**Keywords:** wideband OFDM-LFM signal; switched-element system; fractional autocorrelation; DOA estimation

### **1. Introduction**

As a novel synthetic aperture radar (SAR) system, the multiple-input multiple-output SAR (MIMO-SAR) utilizes multiple antennas to emit mutually orthogonal waveforms, and employs multiple receiving channels to process the echo signals simultaneously [1–3]. Subject to current technical conditions, wideband Orthogonal Frequency Division Multi Linear Frequency Modulation (OFDM-LFM) modulated waveforms are commonly employed in modern MIMO-SAR systems [1,4], which brings challenge to the passive direction of arrival (DOA) estimation techniques.

Passive DOA estimation techniques have been implemented in electronic warfare equipment. In particular, a review of the most commonly used techniques can be found in literatures [5–7]. However, most of them are derived for narrowband signals, which cannot handle the wideband signal scenario, i.e., the OFDM-LFM signals. In this paper, we focus on the DOA estimation method for un-cooperative wideband OFDM-LFM radar signals. Overview of existing DOA algorithms [8–13] for wideband signals, the common approach is to sample the signals in the frequency domain through the array sensors, then, consider each frequency component into a narrowband signal for processing individually. The broadband beamforming approaches in H. L. Van Trees book [14] utilize arrays with non-uniform element spacing and a time-shift operator to complete decoupling of broadband signals. Although the mentioned methods can function well, they still suffer from huge cost of hardware and computational resources. Therefore, we exploit the switched-element direction finding (SEDF) system to solve the DOA estimation problem for wideband signals without much cost.

The block diagram of the modified SEDF and the target MIMO radar system are drawn in Figure 1. Its primary advantages include reducing the hardware and storage costs, simplifying the channel calibration process and decreasing the computation load [15–17]. Moreover, SEDF is also suitable for dealing with long-pulse signals, because there is no need to store the entire pulse in each channel. As shown in Figure 1, we consider a SEDF system with two receiving channels whose name are the reference channel (RC) and the switched channel (SC) respectively. When a signal of interest (SOI) is intercepted, the SC starts to switch in a constant period from antenna #1 to antenna #*K*. Thus, the signal pulse is split into multiple sub-pulses in the SC. Meanwhile, the data are collected via the RC. In formulating the DOA estimation problem for wideband OFDM-LFM signal on this SEDF system, we found that the steering vector is turned into a discrete time LFM-like vector. Hence, we proposed a modified approach to solve this estimation problem, which is inspired by a recently developed parameter estimation algorithm called Fast Iterative Interpolated Digital Fraction Fourier Transform (FII-DFrFT) [18].

**Figure 1.** Block diagram of the Switched-Element Direction Finding System (RF is short for the Radio Frequency, ADCs is short for Analog to Digital converters) and Multiple-input multiple-output radar system.

The rest of this paper is organized as follows. In Section 2, we introduce the signal model and the formula derivation for DOA estimation problem. In Section 3, the proposed FII-DFrFT estimator is illustrated in detail. Numerical simulation results are shown in Section 4. Finally, in Section 5 some conclusions are drawn.

### **2. Problem Formulations**

Consider an adversary MIMO-SAR with *M* transmitters. This radar employs wideband OFDM-LFM waveforms, which were first introduced into the design of an MIMO radar system by F. Cheng [19]. Afterwards, the signal of the *m*th transmitter is given as:

$$s\_m(t) = u\_m(t) \, e^{j2\pi f\_0 t}, 0 \le m \le M - 1 \tag{1}$$

$$u\_{\rm mf}(t) = e^{j2\pi \left(mf\_{\Lambda}t + \frac{1}{2}\gamma\_0 t^2\right)}\tag{2}$$

where *f*<sup>0</sup> denotes the carrier frequency; *f*<sup>Δ</sup> is the frequency step between two adjacent transmitters; *<sup>γ</sup>*<sup>0</sup> stands for the chirp rate. Besides, the bandwidth *<sup>B</sup>* of the OFDM-LFM signal is defined as *<sup>B</sup>* <sup>Δ</sup> = (*M* − 1) *f*<sup>Δ</sup> + *γ*0*T*p, where *T*<sup>p</sup> represents the pulse width of *sm* (*t*).

On the contrary, there are *K* + 1 antennas allocated in the SEDF system with interspace *d*R, as shown in Figure 1. Here, we set the intercepted signal via RC as *y*RC (*t*) = *s*(*t* − *t*0) + *n*RC (*t*), where *t*<sup>0</sup> represents the propagation time, and *n*RC (*t*) is the additive Gaussian white noise in RC. Since this paper focuses on the DOA, without loss of generality, it is reasonable to set *t*<sup>0</sup> = 0 for the sake of simplicity of derivations. Meanwhile, to avoid redundancy introductions of other scholars' existing work, we assume that the estimation for inner pulse parameters and the radio frequency demodulation have already been accomplished by the techniques and algorithms in References [18,20–22], while using the collected data in the RC. Moreover, we also assume the incident direction *θ* and the power of the SOI is stable during the switch period *T*s. Therefore, the OFDM-LFM signal intercepted via the SC can be written as:

$$\begin{split} y\_{\text{SC}}\left(t\right) &= A \sum\_{m=0}^{M-1} \sum\_{k=1}^{K} s\_{m} \left(t - \tau\_{k}\right) \text{rect}\left(\frac{t - (k-1)T\_{s}}{T\_{s}}\right) + n\_{\text{SC}}\left(t\right) \\ &= A \sum\_{k=1}^{K} \sum\_{m=0}^{M-1} \exp\left[j2\pi\left(f\_{m}t + \frac{1}{2}\gamma\_{0}t^{2}\right) + j2\pi\left(-f\_{m}\tau\_{k} - \gamma\_{0}\tau\_{k}t + \frac{1}{2}\gamma\_{0}\tau\_{k}^{2}\right)\right] \text{rect}\left(\frac{t - (k-1)T\_{s}}{T\_{s}}\right) + n\_{\text{SC}}\left(t\right) \\ &= A \sum\_{k=1}^{K} \sum\_{m=0}^{M-1} u\_{m}\left(t\right) e^{j\varphi\_{\text{s}}\left(\gamma\_{s}t\right)} \text{rect}\left(\frac{t - (k-1)T\_{s}}{T\_{s}}\right) + n\_{\text{SC}}\left(t\right) \end{split} \tag{3}$$

where *τ<sup>k</sup>* = [*k*dR sin (*θ*)] /*c* is the propagation delay between the #*k* and #0 antenna, with *c* represents the speed of light; *T*<sup>s</sup> is the duration for each switch; *n*SC(*t*) is the thermal noise in SC; *fm* = *f*<sup>0</sup> + *m f*Δ; the phase shift *ϕ<sup>m</sup>* (*τk*, *t*) is recast to:

$$\varphi\_m(\tau\_k, t) = 2\pi \left( -f\_m \tau\_k - \gamma\_0 \tau\_k t + \frac{1}{2} \gamma\_0 \tau\_k^2 \right) \tag{4}$$

which is time related.

Let us consider a common LFM, whose chirp rate has the quantity of 1012Hz/s, while *τ<sup>k</sup>* has the quantity of 10−<sup>9</sup> s. This means that the third term ( <sup>1</sup> 2*γ*0*τ*<sup>2</sup> *<sup>k</sup>* ) in Equation (4) is almost 0. Thus, we discard this term in the following derivations. Then, ignoring the noise term (its effect will be analyzed in the performance evaluations Section), we can obtain the instantaneous cross correlation between the SC and RC by:

$$\begin{split} r\left(t\right) &= y\_{\text{SC}}\left(t\right) \cdot y\_{\text{RC}}^{\*}\left(t\right) = A^2 \sum\_{k=1}^{K} \sum\_{m=0}^{M-1} u\_m\left(t\right) \exp\left[j\rho\_m\left(\pi\_k, t\right)\right] \sum\_{m'=0}^{M-1} u\_{m'}^{\*}\left(t\right) \\ &= A^2 \sum\_{k=1}^{K} \left[ \sum\_{m=0}^{M-1} \sum\_{m'=0}^{M-1} \exp\left[j2\pi\left(m-m'\right)f\_{\Delta}t + j\rho\_m\left(\pi\_{k'}, t\right)\right] \text{rect}\left(\frac{t - (k-1)T\_s}{T\_s}\right) \right] \end{split} \tag{5}$$

The above equation reveals that the interested phase shift terms (exp [*jϕ<sup>m</sup>* (*τk*, *t*)]) are mixed with the cross terms (exp [*j*2*π* (*m* − *m* ) *f*Δ*t*]), which are caused by the multi-component of the intercepted signal. In order to extract the phase shift term, a low-pass filter *h* (*t*) is designed [23] to filter out the cross terms, which ranges from ± exp [±*j*2*π f*Δ*t*] to exp [±*j*2*π* (*M* − 1) *f*Δ*t*]. Therefore, we can obtain a new baseband signal *x* (*t*) after cross correlation and low-pass filter processing:

$$\text{tr}\left(t\right) = \left\{ y\_{\text{SC}}\left(t\right) \cdot y\_{\text{RC}}^{\*}\left(t\right) \right\} \otimes h\left(t\right) \approx A^2 \sum\_{m=0}^{M-1} \sum\_{k=1}^{K} \exp\left[j\varphi\_{m}\left(\tau\_{k}, t\right)\right] \tag{6}$$

Afterwards, we collect the samples of *x* (*t*) every time when the SC switches the antenna, i.e., at *t* = 0, *T*s, ··· ,(*K* − 1) *T*s. Therefore, the sampled data is given by:

$$\mathbf{x} = \begin{bmatrix} \mathbf{x} \ (0) \ge \begin{pmatrix} T\_{\mathbf{s}} \end{pmatrix} \cdot \cdots \cdot \mathbf{x} \left( \begin{pmatrix} K - 1 \end{pmatrix} T\_{\mathbf{s}} \right) \end{pmatrix} \mathbf{\overset{T}{K}}{\in} \mathbf{a} \begin{pmatrix} \theta \end{pmatrix} \tag{7}$$

where the steering vector **a** (*θ*) is expressed as:

$$\mathbf{a}\left(\theta\right) = \begin{bmatrix} \sum\_{m=0}^{M-1} \exp\left[-j2\pi\left(f\_m \frac{\mathrm{d}\_\mathrm{R}\sin(\theta)}{c}\right)\right] \\ \sum\_{m=0}^{M-1} \exp\left[-j2\pi\left(\left(f\_m + \gamma\_0 T\_s\right)\frac{2\mathrm{d}\_\mathrm{R}\sin(\theta)}{c}\right)\right] \\ \vdots \\ \sum\_{m=0}^{M-1} \exp\left[-j2\pi\left(\left(f\_m + (K-1)\,\gamma\_0 T\_s\right)\frac{\mathrm{K}\mathrm{d}\_\mathrm{R}\sin(\theta)}{c}\right)\right] \end{bmatrix} \tag{8}$$

For the simplicity of derivations, we define *<sup>v</sup>* <sup>Δ</sup> = *d*<sup>R</sup> sin (*θ*) /*c*. Then, the *k*th entry of **x** can be further denoted by:

$$\exp\left[\mathbf{x}\left[k\right] = A^2 \sum\_{m=0}^{M-1} \exp\left[-j2\pi\left(\left(f\_m - \gamma\_0 T\_s\right)\upsilon k + \gamma\_0 T\_s \upsilon k^2\right)\right] \tag{9}$$

It is interesting to find out that comparing with the traditional narrow band representation, the steering vector of OFDM-LFM signal by SEDF system is also a chirp modulated signal, with respect to *k*2. Thus, this spatial signal model brings failure to the regular DOA estimation algorithms such as MUSIC and ESPRIT. Concerning on this, we approach our DOA estimation problem to the parameter estimation for OFDM-LFM signals. Therefore, we define the spatial chirp rate (*μ*0) and spatial frequency (*ω*) as *<sup>μ</sup>*<sup>0</sup> <sup>Δ</sup> = 2*vγ*0*T*<sup>s</sup> and *ω<sup>m</sup>* = (*fm* − *T*s*γ*0) *v* respectively. Then, Equation (9) can be simplified as:

$$\exp\left[k\right] = A^2 \sum\_{m=0}^{M-1} \exp\left[-j2\pi\left(\omega\_m k + \frac{\mu\_0}{2}k^2\right)\right] \tag{10}$$

To solve this estimation problem, we introduce the fast digital algorithm of FrFT [24] as:

$$X\_{\mathfrak{A}}\left(\frac{\mathcal{U}}{2\Delta\mathbf{x}}\right) = \frac{B\_{\mathfrak{A}}}{2\Delta\mathbf{x}}e^{j\pi\tan\left(\frac{\mathfrak{A}}{2}\right)\left(\frac{\mathcal{U}}{2\Delta\mathbf{x}}\right)^{2}}\sum\_{k=-K}^{K}e^{j\pi\csc a\left(\frac{\mathcal{U}-k}{2\Delta\mathbf{x}}\right)^{2}}e^{j\pi\tan\left(\frac{\mathfrak{A}}{2}\right)\left(\frac{k}{2\Delta\mathbf{x}}\right)^{2}}\mathcal{X}\left(\frac{k}{2\Delta\mathbf{x}}\right)\tag{11}$$

where <sup>Δ</sup>*<sup>x</sup>* <sup>=</sup> <sup>√</sup>*<sup>K</sup>* and *<sup>B</sup><sup>α</sup>* <sup>=</sup> (<sup>1</sup> <sup>−</sup> j cot *<sup>α</sup>*).

Substituting Equation (7) into Equation (11) we can obtain:

$$\begin{split} &X\_{\text{at}}\left(\frac{\text{J}}{2\text{\AA}x}\right) = \frac{B\_{\text{a}}}{2\text{\AA}x} \sum\_{m=0}^{M-1} \sum\_{k=-K}^{K} \exp\left[j\pi \frac{\left(\cot \Lambda l^{2} - 2\csc \text{all} k + \cot \text{at}^{2}\right)}{\left(2\text{\AA}x\right)^{2}} - j2\pi \left(\omega\_{m}\frac{k}{2} + \frac{\mu\_{0}}{2}\left(\frac{k}{2}\right)^{2}\right)\right] \\ &= \frac{B\_{\text{a}}}{2\text{\AA}x} \exp\left[j\pi \cot \alpha \left(\frac{\text{J}}{2\text{\AA}x}\right)^{2}\right] \sum\_{m=0}^{M-1} \sum\_{k=-K}^{K} \exp\left[j\pi \left(-\omega\_{m} - \frac{2\text{J}\cos \alpha}{\left(2\text{\AA}x\right)^{2}}\right)k + j\pi \left(-\frac{\mu\_{0}}{4} + \frac{\cot \alpha}{\left(2\text{\AA}x\right)^{2}}{\left(2\text{\AA}x\right)^{2}}\right)k^{2}\right] \end{split} \tag{12}$$

From Equation (12), we can see that **x** can be reformulated into multiple (precisely say *M*) impulses only for a particular *α*0(cot *α*<sup>0</sup> = −*Kμ*0) in the FrFT domain when *K* → ∞. After peak

searching, the peak coordinates (*α*B, *U*B*m*) in the FrFT domain can be utilized as an estimator for spatial frequency *v* (*θ*) and DOA *θ* as:

$$\begin{cases} \begin{array}{l} \begin{array}{l} \psi = -\frac{1}{M-1} \sum\_{m=2}^{M} \left( \mathcal{U}\_{\text{Bm}} - \mathcal{U}\_{\text{Bm}-1} \right) \frac{\csc a\_{\text{B}}}{2\mathcal{K}f\_{\text{A}}}\\ \theta = \arcsin \left( \frac{\csc \theta}{\mathcal{d}\_{\text{R}}} \right) \end{array} \end{cases} \tag{13}$$

However, since the number of antennas *K* is a limited value, there always some residual terms between the quasi peaks (*α*B, *U*B*m*) and real peaks (*α*0, *Um*). In this paper, we define these residual terms as *φ*<sup>0</sup> and *εm*, where *α*<sup>0</sup> = *α*<sup>B</sup> + *φ*<sup>0</sup> and *Um* = *U*B*<sup>m</sup>* + *εm*. Concerning on the influence of these residual terms to the estimation precision of DOA, we propose an iterative high-accuracy method to solve this problem.

### **3. Proposed Method**

### *3.1. Estimation of Spatial Chirp Rate*

As the analytical formulation of |*Xα*<sup>B</sup> (*U*B*m*)| involves Fresnel integral formula [25], it is difficult to directly construct the estimator for *φ*0. Thus, we consider utilizing the Fractional Autocorrelation (FA) spectrum of *x* (*t*) to form this estimator, which is defined as [26]:

$$\begin{split} \chi\_{a}\left(\tau\right) &= \int \mathbf{x}\left(t + \frac{\tau}{2}\sin a\right) \mathbf{x}^\*\left(t - \frac{\tau}{2}\sin a\right) e^{2j\pi t\tau} \cos a \, dt \\ &= \int \mathbf{r}\mathbf{c}\left(\frac{t}{T\_{\mathbf{K}}}\right) e^{j2\pi t\tau\left(\mu\_{0}\sin a + \cos a\right)} \sum\_{m\_{i}=0}^{M-1} \sum\_{m\_{j}=0}^{M-1} e^{-j\pi\tau v(\theta)\sin a\left(m\_{i}-m\_{j}\right)} f\_{\Lambda} e^{-j2\pi v(\theta)\left(m\_{i}-m\_{j}\right)} f\_{\Lambda} t \, dt \end{split} \tag{14}$$

where *T*<sup>K</sup> Δ = *KT*s.

Afterwards, we can calculate the detection statistic [26] interpreted as:

$$L\left(\boldsymbol{a}\right) = \int\_{-\infty}^{\infty} |\chi\_{\boldsymbol{a}}\left(\boldsymbol{\tau}\right)| d\boldsymbol{\tau} \tag{15}$$

Substituting Equation (14) into Equation(15) yields

$$L\left(\mathfrak{a}\right) = \left|\Gamma\left(\mathfrak{a}\right)\right| \int\_{-\infty}^{\infty} \left|T\_{\mathbb{K}} \text{Sinc}\left[2\pi T\_{\mathbb{K}} \left(\mu\_0 \sin a + \cos a\right)\tau\right]\right| d\tau \tag{16}$$

where

$$\Gamma\left(a\right) = \int\_{-\infty}^{\infty} \int\_{0}^{T\_{\rm K}} \sum\_{m\_i=0}^{M-1} \sum\_{m\_j=0}^{M-1} e^{-j\pi \tau v f\_\Delta \sin a \left(m\_i - m\_j\right)} e^{-j2\pi t v f\_\Delta \left(m\_i - m\_j\right)} dt d\tau \tag{17}$$

We can ignore the Γ (*α*) in the following derivation as this term does not involve *μ*0. Therefore, we can estimate *μ*<sup>0</sup> by locating the peak of *L* (*α*), namely:

$$
\hat{\mu}\_0 = -\cot \pi |\_{a=a\_0} \tag{18}
$$

where the coordination of the peak is given by *α*<sup>0</sup> = arg max {*L* (*α*)}.

However, the estimation performance is affected by the grid size of searching, say Δ*α*, as is demonstrated in Figure 2. To be specific, the actual residual term *φ*<sup>0</sup> between the *α*<sup>0</sup> and *α*<sup>B</sup> is also defined by Δ*α*, which is given by:

$$
\alpha\_0 = \alpha\_B + \phi\_0 = \alpha\_B + \delta\_0 \Delta \alpha \tag{19}
$$

where *δ*<sup>0</sup> ∈ [−0.5, 0.5]. Therefore, the fine estimation is now equivalent to obtain an estimate of *δ*0. Plugging in Equations (18) and (19), after some trigonometric derivation, we can define the FA coefficient as:

$$L\_P = L\left(a\_B + P\Delta a\right) = \int\_{-\infty}^{\infty} \left|T\_\mathbf{K} \text{Sinc}\left[2\pi T\_\mathbf{K} \csc a\_0 \sin\left(\left(P - \delta\_0\right)\Delta a\right)\text{\textpi}\right]\right|d\tau\tag{20}$$

where *Lp* (*P* = ±0.5) calculates the interpolation coefficient at the both edges of *α*B. Afterwards, we introduce the error mapping formulation through Algorithm 1 of [27] (see Table I in [7] for more information), which is defined as:

$$h\_1\left(\delta\right) = \text{Re}\left\{\frac{L\_{0.5} + L\_{-0.5}}{L\_{0.5} - L\_{-0.5}}\right\} \approx \frac{1}{2\delta\_0} \tag{21}$$

**Figure 2.** Demonstraction on the effect of the off-grid.

It is worth noting that Equation (21) needs a small enough Δ*α*, then the following approximations can be utilized: sin (*δ*Δ*α*) ≈ *δ*Δ*α* and sin [*T*<sup>K</sup> (0.5 − *δ*) Δ*απ* csc *ατ*˜ ] ≈ sin [*T*<sup>K</sup> (0.5 + *δ*) Δ*απ* csc *ατ*˜ ]. Thus, we can construct the estimator ˆ *δ*<sup>0</sup> = <sup>1</sup> <sup>2</sup>*h*1(*δ*0) for *<sup>δ</sup>*0. Then, an iterative process can be combined to improve the estimation accuracy by updating *α*<sup>B</sup> after each iteration, which will be shown in Section 3.3.

### *3.2. Estimation of Spatial Frequency*

Firstly, following Equation (12), we consider one component, say *m*, of the OFDM-LFM signal with a well estimated spatial chirp rate − cot *α*ˆ <sup>0</sup> ≈ *Kμ*0. Thus, Equation (12) can be rewritten as:

$$X\_{\hbar\_0} \left(\frac{\mathcal{U}}{2\Delta x}\right) = \frac{B\_{\hbar\_0}}{2\Delta x} e^{j\pi \cot \hat{a}\_0 \left(\frac{\mathcal{U}}{2\Delta x}\right)^2} \sum\_{k=-K}^{K} e^{j\pi k \left(-\omega\_m - \frac{2\mathcal{U}\csc \hat{a}\_0}{\left(2\Delta x\right)^2}\right)}\tag{22}$$

*Sensors* **2019**, *19*, 132

As we analyze in Section 2, the coordination estimated from the discrete searching is bias from the actual value with the finite *K*. Hence, at the quasi peak (*α*ˆ 0, *U*B*m*), *Xα*<sup>ˆ</sup> <sup>0</sup> *<sup>U</sup>*B*<sup>m</sup>* 2Δ*x* equals:

$$X\_{\hbar\_0} \left( \frac{\mathcal{U}\_{\text{Bm}}}{2\Delta x} \right) = \frac{B\_{\hbar\_0}}{2\Delta x} e^{j\pi \cot \hat{a}\_0 \left( \frac{\mathcal{U}\_{\text{Bm}}}{2\Delta x} \right)^2} \sum\_{k=-K}^{K} e^{j\pi k \left( -\omega\_m - \frac{\mathcal{U}\_{\text{Bm}} \csc \Lambda\_0}{2K} \right)} \tag{23}$$

Substituting the real value *<sup>ω</sup><sup>m</sup>* <sup>=</sup> <sup>−</sup>csc *<sup>α</sup>*<sup>ˆ</sup> <sup>0</sup> <sup>2</sup>*<sup>K</sup> Um* and *Um* = *U*B*<sup>m</sup>* + *ε<sup>m</sup>* into Equation (23), we can rewrite it as:

$$X\_{\hbar\_0} \left(\frac{lI\_{\rm Bm}}{2\Delta x}\right) = \frac{B\_{\hbar\_0}}{2\Delta x} e^{j\pi \cot \hbar\_0 \left(\frac{lI\_{\rm Bm}}{2\Delta x}\right)^2} \sum\_{k=-K}^{K} e^{j\pi k \frac{\epsilon\_{\rm m} \csc \hbar\_0}{2K}} \tag{24}$$

Similar to the approach in Section 3.1, we can obtain *Xα*<sup>ˆ</sup> <sup>0</sup> *<sup>U</sup>*B*m*±*<sup>P</sup>* 2Δ*x* as:

$$\mathrm{X\_{4}}\left(\frac{\mathrm{lL\_{Bm}}\pm P}{2\Delta x}\right) = \Gamma'\left(\hat{\mathrm{a\_{0}}}\,\mathrm{lL\_{Bm}}\pm P\right)\left[\frac{e^{-j\pi\frac{\left(\mathrm{a\_{0}}\mp P\right)\csc\theta\_{0}}{2}}\left(1-e^{j\pi\left(\varepsilon\_{0}\mp P\right)\csc\theta\_{0}}\right)}{1-e^{j\pi\frac{\left(\mathrm{a\_{0}}\mp P\right)\csc\theta\_{0}}{2N}}}\right] \tag{25}$$

where <sup>Γ</sup> (*α*<sup>ˆ</sup> 0, *<sup>U</sup>*B*<sup>m</sup>* <sup>±</sup> *<sup>P</sup>*) <sup>=</sup> *<sup>B</sup>α*<sup>ˆ</sup> <sup>0</sup> <sup>2</sup>Δ*<sup>x</sup> e jπ* cot *α*ˆ <sup>0</sup> *<sup>U</sup>*B*m*±*<sup>P</sup>* 2Δ*x* 2 .

When (*ε<sup>m</sup>* ∓ *P*)  *N*, we can approximate Equation (25) by using the first order Taylor expansion at *<sup>x</sup>* = 0 of 1 − *<sup>e</sup><sup>x</sup>* ≈ *<sup>x</sup>*. Then, similarly to Section 3.1, we could also construct the error mapping through this approximation as:

$$X\_{\mathbf{k}\_{\rm b}}\left(\frac{lI\_{\rm Bm}\pm P}{2\Delta x}\right) = \Gamma'\left(\pounds\_{\rm O} \, lI\_{\rm Bm}\pm P\right) \left[\frac{e^{-j\pi\frac{\left(\nu\_{0}\mp P\right)\csc\theta\_{0}}{2}}\left(1-e^{j\pi\left(\nu\_{0}\mp P\right)\csc\theta\_{0}}\right)}{1-e^{j\pi\frac{\left(\nu\_{0}\mp P\right)\csc\theta\_{0}}{2N}}}\right] \tag{26}$$

Hence, we can similarly obtain an estimator ˆ *δ<sup>m</sup>* = 0.5*h*<sup>2</sup> (*δm*) for the residual term *ε*ˆ*m*, and combine an iterative process to improve its accuracy.

### *3.3. Iterative DOA Estimation for OFDM-LFM*

In this subsection, the estimators of spatial chirp rate and spatial frequency are combined to estimate the DOA for OFDM-LFM signals. Due to the fact that the FrFT is characterized by linear transformations [28], the major estimation bias between multi-component and mono-component signals through the FrFT based algorithm is caused by the energy leakage from the multi-component. To adapt the above process to the multi-component scenario, we introduce the CLEAN algorithm [27]. Firstly, the noise-free actual fractional coefficient *<sup>X</sup>*˜ *<sup>α</sup>*<sup>ˆ</sup> 0,*<sup>m</sup> U*<sup>ˆ</sup> *<sup>m</sup>* <sup>±</sup> *<sup>P</sup>* /2Δ*x* of the *m*th OFDM component is defined as:

$$\begin{split} \mathcal{X}\_{\mathbb{A}\_{0},m} \left( \frac{\mathbb{\hat{\mathcal{U}}\_{m} \pm P}}{\Sigma \Delta x} \right) &= DFRFT\_{\left(\mathbb{A}\_{0}, \mathbb{\hat{\mathcal{U}}\_{m} \pm P\right)} \left( \text{x} \left[ \text{k} \right] \right)}\\ \mathcal{X} &= X\_{\mathbb{A}\_{0},m} \left( \frac{\mathbb{\hat{\mathcal{U}}\_{m} \pm P}}{\Sigma \Delta x} \right) + \sum\_{l=1, \left( l \neq m \right)}^{M} \breve{X}\_{\hat{\mathbf{a}}\_{0},l} \left( \frac{\mathbb{\hat{\mathcal{U}}\_{m} \pm P}}{\Sigma \Delta x} \right) \end{split} \tag{27}$$

where *Xα*<sup>ˆ</sup> 0,*<sup>l</sup> U*<sup>ˆ</sup> *<sup>m</sup>* <sup>±</sup> *<sup>P</sup>* /2Δ*x* denote the energy leakage from the other *M* − 1 OFDM components, which can be calculated by:

$$\breve{X}\_{\mathbb{A}\_0, l} \left( \frac{\hat{\mathcal{U}}\_{\mathbb{m}} \pm P}{2 \Delta \mathbf{x}} \right) = A\_l \text{DFrFT}\_{\left( \mathbb{A}\_0 \hat{\mathcal{U}}\_{\mathbb{m}} \pm p \right)} \left( \mathbb{s}\_l \left[ n \right] \right) = A\_l \frac{B\_{\mathbb{A}\_0}}{2 \Delta \mathbf{x}} e^{j \pi \cot \mathbb{A}\_0 \left( \frac{\hat{\mathcal{U}}\_{\mathbb{m}} \pm P}{2 \Delta \mathbf{x}} \right)^2} \sum\_{n=-N}^{N} e^{j \pi n \left( \frac{(\hat{\mathcal{U}}\_{\mathbb{m}} - \hat{\mathcal{U}}\_{\mathbb{m}} \mp P) \cos \mathbb{A}\_0}{2 \Delta} \right)} \tag{28}$$

where *Al* is the amplitude of the *l*th (*l* = 1, ..., *M*) component.

*Sensors* **2019**, *19*, 132

Then, the target fractional coefficient *<sup>X</sup>*<sup>ˆ</sup> *<sup>α</sup>*<sup>ˆ</sup> 0,*<sup>m</sup>* can be separated from the mixed term *<sup>X</sup>*˜ *<sup>α</sup>*<sup>ˆ</sup> 0,*<sup>m</sup>* by subtracting the leakages as:

$$\mathcal{X}\_{\mathbb{A}\_{0},m}\left(\frac{\bullet\_{m}\pm P}{2\Delta x}\right) = \mathcal{X}\_{\mathbb{A}\_{0},m}\left(\frac{\bullet\_{m}\pm P}{2\Delta x}\right) - \sum\_{l=1,\left(l\neq m\right)}^{M} \breve{\mathcal{X}}\_{\mathbb{A}\_{0},l}\left(\frac{\bullet\_{m}\pm P}{2\Delta x}\right) \tag{29}$$

According to the above derivation, an iteration-based method to accomplish the DOA estimation for OFDM-LFM signal is demonstrated in Algorithm 1.

**Algorithm 1:** Proposed FII-DFrFT DOA Estimation Method. **Initialization:** Set ˆ *<sup>δ</sup>*<sup>0</sup> <sup>=</sup> 0, *<sup>ε</sup>*ˆ*<sup>m</sup>* <sup>=</sup> 0, *<sup>A</sup>*ˆ*<sup>m</sup>* <sup>=</sup> 0 and *<sup>P</sup>* <sup>=</sup> 0.5, where *<sup>m</sup>* <sup>=</sup> <sup>1</sup> ··· *<sup>M</sup>*; Find *<sup>α</sup>*<sup>B</sup> <sup>=</sup> arg Max *<sup>α</sup>* {*<sup>L</sup>* (*α*)}, where *<sup>α</sup>* <sup>∈</sup> [0 : <sup>Δ</sup>*<sup>α</sup>* : *<sup>π</sup>*] and *<sup>u</sup>* <sup>∈</sup> [0:1: *<sup>N</sup>* <sup>−</sup> <sup>1</sup>]. **Estimation of Spatial Chirp Rate: 1.** Calculate the detection statistic *L* (*α*) using Equations (14) and (15), find *<sup>α</sup>*<sup>B</sup> <sup>=</sup> arg Max *<sup>α</sup>* {*<sup>L</sup>* (*α*)}, where *<sup>α</sup>* <sup>∈</sup> [0 : <sup>Δ</sup>*<sup>α</sup>* : *<sup>π</sup>*]; **2.for** *q* = 1:1: *Q* **do 2.1** Calculate *L* (*α*<sup>B</sup> ± 0.5Δ*α*) and *β* using Equations (13) and (16). **2.2** Calculate ˆ *δ<sup>q</sup>* = 1/ (2*h*1), renew *α*<sup>B</sup> = *α*<sup>B</sup> + ˆ *δq*Δ*α*; **end 3.** Finally, let *α*ˆ <sup>0</sup> = *α*B. **Estimation of Spatial Frequency: 1.** Calculate the *Xα*<sup>ˆ</sup> <sup>0</sup> [*u*] = DFrFT(*α*<sup>ˆ</sup> 0,*u*) (*x* [*k*]) using Equation (12). **2.for** *m* = 1:1: *M* **do 2.1 if** *q* = 1 **then** *<sup>X</sup>*˜ *<sup>α</sup>*<sup>ˆ</sup> <sup>0</sup> [*u*] <sup>=</sup> *<sup>X</sup>α*<sup>ˆ</sup> <sup>0</sup> [*u*] <sup>−</sup> *<sup>M</sup>* ∑ *l*=1,*l*=*m A*ˆ *<sup>l</sup>*DFrFT(*α*<sup>ˆ</sup> 0,*u*) (*s*ˆ*<sup>l</sup>* [*k*]), *<sup>U</sup>*<sup>ˆ</sup> <sup>B</sup>*<sup>m</sup>* <sup>=</sup> arg Max *<sup>u</sup>* ' '*X*˜ *<sup>α</sup>*<sup>ˆ</sup> <sup>0</sup> [*u*] ' '2 ; **end 2.2** Calculate *<sup>X</sup>*<sup>ˆ</sup> *<sup>m</sup>*,*α*<sup>ˆ</sup> <sup>0</sup> *<sup>U</sup>*<sup>ˆ</sup> *<sup>m</sup>* <sup>±</sup> *<sup>P</sup>* and *h*<sup>2</sup> (*δm*) using Equations (26)–(29); **2.3** Calculate *ε*ˆ*<sup>m</sup>* = 0.5*h*<sup>2</sup> (*δm*), renew *U*ˆ <sup>B</sup>*<sup>m</sup>* = *U*ˆ <sup>B</sup>*<sup>m</sup>* + *ε*ˆ*m*; **2.4** Calculate *A*ˆ*<sup>m</sup>* = ' ' ' ' ' *Xα*<sup>ˆ</sup> <sup>0</sup> , *U*ˆ *<sup>m</sup>* - <sup>−</sup> *<sup>M</sup>* ∑ *l*=1,*l*=*m Xα*<sup>ˆ</sup> 0,*<sup>l</sup> <sup>U</sup>*<sup>ˆ</sup> *<sup>m</sup>* 2Δ*x* ' ' ' ' ' / Δ*x* ' ' ' *Bα*ˆ 0 ' ' ' **end 3.** Calculate *U*ˆ *<sup>m</sup>* = *U*ˆ <sup>B</sup>*<sup>m</sup>* + *ε*ˆ*m*. **Output:** ⎧ ⎪⎨ ⎪⎩ *<sup>v</sup>*<sup>ˆ</sup> <sup>=</sup> <sup>−</sup> <sup>1</sup> *M*−1 *M* ∑ *m*=2 *<sup>U</sup>*<sup>ˆ</sup> *<sup>m</sup>* <sup>−</sup> *<sup>U</sup>*<sup>ˆ</sup> *<sup>m</sup>*−<sup>1</sup> csc *α*ˆ <sup>0</sup> 2*K f*Δ ˆ *<sup>θ</sup>* <sup>=</sup> arcsin *cv*<sup>ˆ</sup> *d*R 

### **4. Performance Evaluation**

In this section, we report our numerical evaluation through a Monte-Carlo simulation. Since the DOA estimation performance is mainly dependent on three factors, which are the signal-to-noise-ratio (SNR), the incidence angle (*θ*) and the component number (*M*), we evaluate the estimation performance with respect to these factors in a realistic case.

Consider a coherent MIMO radar (e.g. MIMO-SAR) which employs wideband OFDM-LFM signal. The simulation parameters of this MIMO radar and our SEDF system are listed in Table 1. It is worth noting that we assume the pulse width (*T*P) of the OFDM-Signal is greater than *KT*s, thus our method can function well. Moreover, we assume the far field sources whose initial phase is uniformly distributed within [0, 2*π*), and we take the thermal noise into consideration, which is modeled as zero-mean Gaussian with variance *σ*<sup>2</sup> <sup>n</sup> = 1. Additionally, in all simulations, 1000 independent runs

are conducted to calculate the Normalized Root Mean Square Error (NRMSE) and Root Mean Square Error (RMSE).

### (a) *DOA Estimation versus SNR*

In this simulation, we evaluate the DOA estimation performance with respect to the SNR, while the DOA *θ* is set as 30 deg. For the sake of comparison, we also simulate the following approaches, Incoherent Signal-subspace Method Conventional Beam Forming (ISM-CBF) [29], Coherent Signal-subspace Method Linearly Constrained Minimum Variance (CSM-LCMV) [11], Rotational Signal Subspace Sparse Asymptotic Minimum Variance (RSS-SAMV) [12] and Sparse Iterative Covariance-based Estimation (SPICE) [9] As these existing approaches are designed for single wideband LFM signal, here, we consider the intercepted signal that received by our switched-element system a mono-component wideband LFM signal (*M* = 1). Then, the above approaches and FII-FrFT are utilized to process the output signal and obtain the DOA estimation results, respectively. These results are collected and organized to NRMSE curves, which are shown in Figure 3. These curves reveal that our FII-FrFT method outperforms most mentioned approaches when SNR is beyond −8 dB. However, the NRMSE curve of FII-FrFT remains stable when SNR is beyond 12 dB and suffers a stable estimation bias, which is caused by the approximations that we employed in the theoretical derivations of Section 3.1 and the off-grid effect. On the other side, although the RSS-SAMV performs best in this simulation, its implementation will consume much more hardware resource (*K* receiving channels) and computational resource [12].

### **Table 1.** Parameter Settings.


**Figure 3.** Normalized root mean square error (NRMSE) of DOA versus the signal-to-noise ratio (SNR). Cramer Rao, FII-FrFT, ISM-CBF, CSM-LCMV, RSS-SAMV and SPICE.

### (b) *DOA Estimation versus Real Incident Angle and Component Number M*

In this simulation, we focus on the DOA estimation performance as the function of the real direction *θ* within [10, 70] degree by the FII-FrFT. We also consider the intercepted OFDM-LFM signals consist different component numbers *M* = [2, 3, 4]. For intuitional comparison with different OFDM-LFM signals, we define a different SNR in this subsection as *ρ* = 10 lg *MA*2/*σ*<sup>2</sup> n . The root mean square error (RMSE) of DOA estimation results at SNR = 10 dB are given in Figure 4. Firstly, we can see from Figure 4 that our proposed method can handle the OFDM-LFM radar signal well, while its component number affects the RMSE slightly. Secondly, the periodic variation of RMSE curves in Figure 4 reflects the off-grid effect in the fixed searching interval on estimation performance, which is in coincidence with our theoretical analysis in Section 3.2 and the simulation results in Reference [15]. This bias can be reduced by decreasing , i.e., using a denser grid, but it will also lead to the expensive price of computational load. Therefore, our DOA estimation method has to reach a compromise between accuracy and cost.

**Figure 4.** Root mean square error (RMSE) of DOA versus the incident angle.

### **5. Conclusions**

In this paper, a FII-DFrFT based SEDF system was introduced to improve the DOA estimation performance considering the wideband OFDM-LFM signals. The steering vector was reformulated followed by the iterative interpolation in both FA and DFrFT spectrum. Numerical simulations illustrated the validity and superiority of our algorithm compared with some other wideband DOA estimation approaches like ISM-CBF, CSM-LCMV, RSS-SAMV and SPICE. On the other hand, in the practice scenario, the modulated parameters of un-cooperative MIMO radar are generally unknown. This will cause the DOA estimation to be possibly ambiguous. Fortunately, taking advantage of a flexible switching interval, we can design a multi-interval SEDF system to resolve this ambiguity. Finally, the estimation bias caused by the off-grid effect and approximation are also of interest and will be the subject of our further investigation.

**Author Contributions:** Y.L. designed and wrote this paper under the supervision of B.T. Y.Z. proposed the main idea and assisted with the methodology with formal analysis. J.Z. and J.W. provided the support of the entire study. Y.Z. and B.T. reviewed and edited the manuscript.

**Funding:** This research was funded by the National Natural Science Foundation of China (Grant No. 61571088).

**Acknowledgments:** Thanks to the University of Electronic Science and Technology of China. Thanks to my supervisor Bin Tang. Thank you to my dear love Silei Gui and all the people who support us.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **Improved Bound Fit Algorithm for Fine Delay Scheduling in a Multi-Group Scan of Ultrasonic Phased Arrays**

### **Yuzhong Li 1,2 , Wenming Tang <sup>1</sup> and Guixiong Liu 1,\***


Received: 19 December 2018; Accepted: 16 February 2019; Published: 21 February 2019

**Abstract:** Multi-group scanning of ultrasonic phased arrays (UPAs) is a research field in distributed sensor technology. Interpolation filters intended for fine delay modules can provide high-accuracy time delays during the multi-group scanning of large-number-array elements in UPA instruments. However, increasing focus precision requires a large increase in the number of fine delay modules. In this paper, an architecture with fine delay modules for time division scheduling is explained in detail. An improved bound fit (IBF) algorithm is proposed, and an analysis of its mathematical model and time complexity is provided. The IBF algorithm was verified by experiment, wherein the performances of list, longest processing time, bound fit, and IBF algorithms were compared in terms of frame data scheduling in the multi-group scan. The experimental results prove that the scheduling algorithm decreased the makespan by 8.76–21.48%, and achieved the frame rate at 78 fps. The architecture reduced resource consumption by 30–40%. Therefore, the proposed architecture, model, and algorithm can reduce makespan, improve real-time performance, and decrease resource consumption.

**Keywords:** ultrasonic phased array; scheduling algorithm; multi-group sensors; FPGA

### **1. Introduction**

Ultrasonic phased array (UPA) technology is an important nondestructive testing method that is widely used in aerospace, shipbuilding, port machinery, and nuclear energy. With its multiple-group scanning functionality and a large number of other elements, the multi-group scan UPA system can provide extended scanning flexibility and image contrast, increased focal law diversification, and high signal-to-noise ratio (SNR). Within the system, a number of filters in a given module determine the precision of fine delay. The higher the precision, the better the image resolution. Classical all-parallel fine delay modules require a lot of hardware resources, i.e., a multiplier, look-up table (LUT), register (Reg), and an in field programmable gate array (FPGA). Synchronization and integration difficulty need to be considered in the use of multi-chip schemes, while hardware resources are limited in single chip schemes. Therefore, an architecture with time-division multiplexing is used to schedule frame tasks between fine delay modules in a single chip. This method can significantly improve resource utilization and reduce the number of resources used. However, when the sampling depth or the value of the focal law is large, the frame rate (frames per second, fps) decreases, leading to worse real-time performance of the distributed UPA instrument and a greatly reduced application scope. Therefore, it is necessary to coordinate fine modules and frame tasks for multi-group scanning through algorithm schedules, minimize idle time slots of resources in the fine delay modules, and reduce the makespan of all frame tasks to improve time performance.

In order to reduce the trade-off between the real-time processing of big data and system complexity, many studies have been conducted on high-performance hardware architecture and corresponding algorithms. Thus, various resource optimizations have been proposed. Holmes et al. [1] proposed a UPA system called the full matrix capture and total focus method (FMC-TFM), which requires the processing of large focus and delay data, and has been the mainstream architecture for research in recent years. Njiki el al. [2] proposed a hardware architecture for big data processing based on FMC and a large-scale phased array instrument, applied in the M2M NDT (nondestructive testing) (Eddyfi Technologies, Québec, QC, Canada) UPA system. The proposed FMC-TFM architecture can achieve a frame rate of 73.6 fps and 128 × 128 pixels in the region of interest. Shao and Yuan [3] proposed a method based on the compute unified device architecture (CUDA) interface, a parallel graphics processing unit (GPU), and whole parallel echo signal processing, wherein parallel GPUs accelerate the method 2–3-fold compared to MATLAB (Mathworks, Co., Ltd., Natick, MA, USA), while the multithreading CPU provides four times higher acceleration than a single thread. Guo et al. [4] improved a TFM imaging system and proposed an algorithm based on read-only memory (ROM). Zhang et al. [5] used a state machine, operation unit, and a large data storage unit to form the TFM algorithm imaging system, which achieved good performance. Tang et al. [6] proposed a data transmission algorithm for UPAs, but it does not work with delay and focus. Liu et al. [7] proposed an improved 8× interpolation cascaded integrator-comb (CIC) filter parallel algorithm, which reduced 12.5% of addition and 29.2% of multiplication and yielded a time delay accuracy of 1 ns at 125 MHz. Su et al. [8] proposed a parallel delay multiply and sum beamforming (PDMAS) algorithm, based on a graphics processing unit (GPU) that improved the parallelism and stability of the beamformer with a frame rate of 83 fps. However, these papers only focus on the performance of the delay and focus module, and not the multi-group scan and its frame task scheduling.

Although a multi-core CPU with single instruction multiple data (SIMD) and the GPU programmed by CUDA also realizes the beamform function (delay and focus), Asano et al. [9] found that a GPU was slower than a CPU for complex algorithms. Furthermore, they also found that a GPU only has potential for naïve computation methods, due to its small local memory and the memory access limitation in the architecture. The performance of a quad CPU is 1/12 to 1/7 that of a field programmable gate array (FPGA). The performance of an FPGA is only limited by its size and bandwidth. FPGA is the mainstream solution for portable UPA instruments, and is supported by manufacturers. Moreover, it is convenient for design and verification of the UPA system's integrated circuits. Therefore, this paper uses FPGA to implement the algorithm and architecture of the multi-group scan UPA system.

The fine delay scheduling problem in the multi-group scanning of UPA systems, which we address here, can be considered as a parallel machine scheduling problem. The aim is to decrease the makespan, which can be represented as Pm||Cmax. It is a non-deterministic polynomial-time hard (NP-hard) problem [10], which cannot be solved using polynomial algorithms. Heuristic algorithms are a simple and effective method used to address NP-hard problems at present.

The most commonly used heuristic algorithms are the longest processing time algorithm (LPT) [11] and the MULTIFIT algorithm. The MULTIFIT algorithm proposed by Coffman et al. [12] is based on the first fit decreasing (FFD) iteration algorithm, which is used in bin-packing problems. However, the MULTIFIT algorithm has much better performance than the LPT algorithm. Freisen et al. [13] studied the absolute performance and time complexity of the MULTIFIT algorithm. Lee et al. [14] used a combination of the LPT and MULTIFIT algorithms. Kang et al. [15] simplified the MULTIFIT algorithm and combined it with the prepare algorithm (PA), in order to form the bound fit (BF) algorithm. Li et al. [16] proposed the QUICKFIT algorithm, which is an improved BF algorithm in the iteration stage. Based on the advantages of the LPT and BF algorithms, the improved bound fit (IBF) algorithm is proposed here.

In this paper, a fine delay scheduling architecture was also analyzed considering multi-group-scan echo data diversity, using a non-preempt model for the scheduling problem and proposing the IBF algorithm for optimization.

The paper is organized as follows. In Section 2, the architecture of the fine delay module scheduling for the multi-group scanning of UPA systems is presented, and the multi-group scan problem is explained. In Section 3, the IBF algorithm is proposed and an analysis of its performance and time complexity is provided. LIST, LPT, BF, and IBF algorithms are compared in Section 4. Finally, a conclusion is provided in Section 5.

### **2. Fine Delay Module for Multi-Group Scanning of UPAs**

### *2.1. Fine Delay Scheduling Principle*

The delay method and focus scheduling based on different UPA instrument focal parameters (e.g., number of apertures, sending and receiving time, and data amount), which control the pulse repetition frequency (PRF) and frame formation, are used for scheduling in multi-group scans. The delay precision is 1.25 ns. Due to the limitation of the resources of the FPGA in our experiments, the system architecture is designed as four groups and two fine delay modules. Each group has eight channels, and each channel has 10-bit analog-digital converter (ADC). Sampling depth is 2–8 K, the number of focal law ≤128, and read parameter length is 1024 in each group. The design frame rate is not less than 24 fps, which meets the requirements of real-time display.

A diagram of the for mutli-group scanning is shown in Figure 1, labels -<sup>1</sup> –-<sup>5</sup> in Figure 1 are described below.

**Figure 1.** Diagram of the fine delay module for multi-group scanning.

The presented block diagram includes the following parts:


The fine-delay module used in this study contains the multi-level half-band filter that was proposed by Liu and Tang [17]. A diagram of the multi-level half-band fine delay filter is presented in Figure 2, whereas its simulation diagram created in ModelSim (Mentor Co., Ltd., Wilsonville, OR, USA) is shown in Figure 3.

**Figure 2.** Diagram of fine delay module.

**Figure 3.** ModelSim simulation diagram of the multi-level half-band fine delay filter.

The multi-level half-band fine delay filter uses the interpolation method with eight time intervals to design a half-band filter. The implementation of synthetic technology in the multi-level half-band interpolation filter results in filter decomposition into eight sub-filters. Simultaneously, interpolation with poly-phase decomposition is achieved. The eight filters delay the original signal for 0, 1.25, 2.5, 3.75, 5, 6.25, 7.5, and 8.75 ns. The data samples have a 10-bit length, and thus two 9-bit multipliers are needed for multiplications. However, the multi-level half-band filter uses six 9-bit multipliers. In addition, each channel has eight fine delay channels, so there are 96 (i.e., 6 × 2 × 8 = 96) 9-bit multipliers. If all parallel delay is used in a 256-element UPA system, then 24,576 multipliers would be needed. Given such large resource consumption, the integration of a single FPGA in the multi-group scan module of a UPA system would be difficult.


control for scheduling Mux and Demux based on the above parameters. A fine delay scheduling model diagram in the multi-scan group is presented in Figure 4.

**Figure 4.** Fine delay scheduling model diagram in the multi-scan group.

### *2.2. Fine Delay Scheduling Problem in Multi-Group Scanning*

The parameters of the fine delay module for multi-group scanning of UPAs are presented in Table 1. Here, we represent the symbols used in the scheduling problems with brackets.

**Table 1.** Parameters of the fine delay module for multi-group scanning of a ultrasonic phased array (UPA) system.


<sup>1</sup> Symbols in brackets are those used in the scheduling problem.

Fine-delay scheduling for multi-group scanning of UPAs must satisfy four conditions:


Condition (1) avoids timing confusion, condition (2) avoids interruption of the fine delay signal processing, and condition (3) compacts the frame task for scheduling and decreases the time slot waste. Condition (4) ensures that the fine delay processing will not exceed its abilities, leading to echo data overlap.

Before a description of the fine delay scheduling problem is presented, some parameters must be defined:

### **Definition 1.** *Frame task.*

If it is assumed that the *i*th scan has focal law frame *N<sup>i</sup>* FocalLaw and sample depth *<sup>D</sup><sup>i</sup>* Sample, then the frame task is the time needed to complete all beamforms (or focal laws) of the image.

### **Definition 2.** *Frame task deadline.*

The frame task deadline represents the time the system needs to generate a complete image for all groups, and it must be less than 1/24 s for real-time applications.

Schematic diagrams of the frame task and frame task deadline are presented in Figure 5a,b, respectively.

**Figure 5.** Schematic diagram of: (**a**) Frame task and (**b**) Frame task and frame task deadline.

The time parameters used in the proposed algorithm are defined as follows. Start time, *t i <sup>s</sup>*, is defined by:

$$t\_s^i = 0 \quad i = 1, 2, \dots, N\_{\text{Group}} \tag{1}$$

Processing time, *t i <sup>p</sup>*, is defined by:

$$t\_p^i = (D\_{\text{sample}}^i \times T\_{\text{clock}-\text{cycle}} + T\_{\text{RP}}^i) \times N\_{\text{Focal,aw}}^i \quad i = 1, 2, \dots, N\_{\text{Group}} \tag{2}$$

End time, *t i <sup>d</sup>*, is defined by:

$$t\_d^i = 1/24 \text{ s} \quad i = 1, 2, \dots, N\_{\text{Group}} \tag{3}$$

Therefore, the question can be set as *Pm*||*C*max , and the scheduling model is defined by:

$$\text{Min } \mathbf{z} \; = \text{Max}(\sum\_{j=1}^{n} t\_p^j \mathbf{x}\_{ij}) \quad i = 1, 2, \dots, m \tag{4}$$

subject to:

$$\sum\_{j=1}^{n} t\_p^j \mathbf{x}\_{ij} \le t\_d y\_i \quad i = 1, 2, \dots, m \quad j = 1, 2, \dots, n \tag{5}$$

$$\sum\_{i=1}^{m} x\_{ij} = 1 \quad i = 1, 2, \dots, m \quad j = 1, 2, \dots, n \tag{6}$$

$$\mathbf{x}\_{ij} \in \{0, 1\} \tag{7}$$

$$t\_d \le 1/24\tag{8}$$

Equation (4) refers to the scheduling goal of minimizing the project's maximum completion time, which represents the time needed for the completion of all project tasks. In this paper, we consider the frame task as the job or task of the scheduling problem. According to Equation (5), the time allocation of each fine delay module cannot be greater than *td*. Equations (6) and (7) show that any task can be assigned only to one processor, and *xij* is an assigned variable that is equal to zero or one. Equation (8) represents all fame tasks that must be finished before the frame task deadline.

### **3. IBF Algorithm**

Since there is no dependency between tasks, the fine delay scheduling problem in multi-group scanning can be considered as an independent, parallel processor scheduling task.

The IBF algorithm parameters are defined as follows. Input is the set of tasks *T* = {*ti*, *i* = 1,2, ... ,*n*}, the number of fine-delay modules is *m*, and the number of tasks is *n*. Output is the maximal processing time, *C*IBF max.

The IBF algorithm steps are as follows:

Step 1. Sort tasks *T* in descending order according to the task processing time: *pi*, *i* = 1,2, . . . ,*n*;

Step 2. Assume that *A* = <sup>1</sup> *m n* ∑ *i*=1 *pi* and *Lj*, *j* = 1, 2, ... , *m* are the focus and delay module pointers, respectively;

Step 3. Use the LPT algorithm to obtain the maximal processing time *C*LPT max. Let *l* = 1 and *B*(1) = *C*LPT max;

Step 4. If *A* < max(*Lj*) < *B*(*l*), go to step 5; otherwise, go to step 8;

Step 5. Let *l* = *l* + 1, *i* = 1, *B*(*l*) = min(max(*Lj*), *B*(*l* − 1) − 1);

Step 6. If there is at least one *j* that satisfies the condition *Lj* + *pi* ≤ *B*(*l*), then allocate task *ti* to the focus and delay module, which satisfies condition *Lj* + *pi* ≤ *B*(*l*). Otherwise, allocate the task to the focus and delay module, which provides the minimal value of *Lj* + *pi*;

Step 7. Set *i* = *i* + 1, and if *i* ≤ *n*, go back to step 6; otherwise, go back to step 4;

Step 8. *C*IBF max = min(*B*(1), *B*(2), *B*(*l* − 1)).

In step 3, the LPT algorithm is used to calculate the initial processing time in order to better approximate the initial conditions. Steps 4–8 represent the prepare algorithm (PA). Thus, the IBF algorithm is a combination of LPT and PA that improves the boundary and convergence of iteration, and achieves better performance in terms of local search and iterative progression. The IBF flowchart is shown in Figure 6.

The IBF algorithm analysis is obtained for *B*(1) = *C*LPT max. In the case the iteration stops at *l* = 2, then the output algorithm result will be *C*IBF max = *C*LPT max. If the iteration stops at *l* = 3, then the output result will be *C*IBF max <sup>=</sup> *<sup>C</sup>*PA(B(0)) max , and that wil be the makespan. If the iteration stops at *<sup>l</sup>* > 3, then *C*IBF max <sup>=</sup> *<sup>C</sup>*PA(B(*l*−1)) max .

From *B*(*l*) = min(max(*Lj*), *B*(*l* − 1) − 1), we obtain *B*(*l*) ≤ *B*(*l* − 1) − 1. Thus, *B*(2) ≤ *B*(1) − 1, *B*(3) ≤ *B*(2) − 1 ≤ (*B*(1) − 1) − 1 = *B*(1) − 2.

After induction *B*(*l*) ≤ *B*(1) − (*l* − 1). Therefore, the absolute performance of the IBF algorithm is defined by:

$$\mathcal{C}\_{\text{max}}^{\text{IBF}} \le \left(\frac{4}{3} - \frac{1}{3m}\right) \mathcal{C}\_{\text{max}}^{\text{OPT}} - (l - 1) \tag{9}$$

If the iteration number is equal to one, the IBF time complexity is defined by:

$$O(n\log n + nl\log m)\tag{10}$$

If the number of iterations is greater than one, IBF employs the PA, which represents the FFD algorithm used in the bin-packing problem.

**Figure 6.** Improved bound fit algorithm (IBF) flowchart. LPT: longest processing time algorithm.

### **4. Experimental Results**

### *4.1. Time Performance*

In order to determine the real-time performance of the IBF algorithm, a randomly generated set of tasks was used. The set and real-time deadline were used to simulate a UPA multi-group fine delay scheduling problem. The specific task generation process was as follows. First, *m* time blocks were generated. The length of each time block was as long as the deadline *td*. Then, each task block was divided into *h* = *n*/*m* + 1 parts, and thus *h* × *m* tasks were obtained in *m* time blocks. Afterward, *n* tasks from *h* × *m* tasks that were generated from the previous step were chosen to create a set of tasks, and all task lengths were multiplied by 0.99. Thus, a random generation of a set of tasks was produced. The whole experiment ran in I7-4850HQ (Intel Corporation, Santa Clara, CA, USA) 8 GB RAM with MATLAB 2016a.

This process was conducted to ensure that the processing time of each generated task was not greater than the real-time deadline. All generated tasks did not exceed the calculating ability of the fine-delay module. In other words, a feasible solution always existed for a given scheduling in terms of the number of modules that satisfied the required conditions. The generated set was subjected to a random uniform distribution, and a variety of large scopes were covered.

Five tests were conducted with the following parameters: the number of fine-delay modules *m*, the ratio of number of tasks and fine delay modules *k* = *n*/*m*, the real-time deadline *d*, the number of iterations *K*, and makespan *Cmax*. Each test was generated 100 times, and the average result was calculated. The LIST, LPT, BF, and IBF algorithms were compared.

Test 1 compared LPT, BF, and IBF algorithms in terms of makespan. In Figure 7a, the parameter settings were: *m* = 4, *k* = 2–10, and *d* = 1000. Note that each curve had a peak value at *k* = 3, because when *k* = 3, the method generating the problem reduced the number of tasks and increased the length. Under this condition, the problem was difficult to schedule. With gradually increasing *k*, all curves gradually declined. IBF had the smallest makespan at *k* < 8, and when *k* ≥ 8, IBF and BF almost had the same makespan performance. This is because with the increase in *k*, the problem produced more tasks and the length decreased. That is, the smaller the granularity of the tasks, the greater the role of the scheduling algorithm. In Figure 7b, the parameter settings were: *m* = 2–10, *k* = 4, and *d* = 1000. We can see that the IBF algorithm still had the smallest makespan, but with the increase in *m*, the gap between BF and IBF continued to narrow. Although *k* was unchanged, the larger the value of *m*, the greater

the permutations and combinations of the scheduling algorithm were. In makespan comparisons, IBF always had the best performance, but, as parameters *k* and *m* increased, the performance of BF and IBF gradually approached each other.

**Figure 7.** Comparison of LPT, bound fit (BF), and IBF in terms of makespan with (**a**) variable *k* (ratio of the number of tasks *n* and the number of fine delay modules *m*) and (**b**) variable number of fine delay modules *m*.

Test 2 compared LPT, BF, and IBF in terms of the missed deadline rate (MDR) with variables *k* and *m*. The parameter settings in Figure 8a were the same as in Figure 7a, and those in Figure 7b were applied to Figure 8b. The MDR is defined as the number of times a deadline was missed when a scheduling problem was generated randomly 100 times. Figure 8a shows that all curves had a peak value at *k* = 3, and then gradually decreased with increasing *k*. The reason is similar to test 1. Note that in Figure 8b, IBF had the smallest makespan, but when *m* > 9, the values of BF and IBF were basically the same. IBF was still the best in MDR performance, and with the increase in *k*, the scheduling performance improved as well. When *k* > 8, IBF was not significantly superior to BF.

Test 3 compared LPT, BF, and IBF using statistical plots. Parameter settings were *m* = 4, *k* = 4, and calculation was run 100 times to obtain the makespan. Figure 9a shows the box plot. Note that the IBF algorithm had the lowest median and upper limits and the narrowest interquartile range (IQR). This shows that IBF scheduling had the best overall performance and the most centralized data. In the 95% confidence interval (CI) plot in Figure 9b, IBF had the lowest mean and the narrowest 95% CI. The IBF algorithm outperformed the BF and LPT algorithms in terms of statistical performance.

**Figure 8.** Comparison of LPT, BF, and IBF in terms of missed deadline rate (MDR) with (**a**) *k* (the ratio of the number of tasks *n* and the number of fine-delay modules *m*) and (**b**) variable number of fine delay modules *m*.

**Figure 9.** Comparison of LIST, LPT, and IBF algorithms in (**a**) boxplot and (**b**) 95% confidence interval (CI) plot.

Test 4 compared the performance of LIST, LPT, BF, and IBF algorithms (Table 2). The test parameter settings were *m* = 4, *k* = 4, *d* = 1000, and the average of 100 runs was taken. The LIST algorithm had the worst performance, which affected the display of the figures. In order to clearly compare BF and IBF, which was not mentioned in the previous experiments, *R*IBF/LIST was defined as follows:

$$R\_{\rm IBF/LIST} = \frac{\overline{C\_{\rm max}} - \overline{C\_{\rm max}^{\rm IBF}}}{\overline{C\_{\rm max}^{\rm IST}}} \times 100\% \tag{11}$$

where *C*LIST max , *C*LPT max, *C*BF max, and *C*IBF max represent the average makespans of LIST, LPT, BF, and IBF obtained from 100 runs, respectively. In addition, *K*BF and *K*IBF represent the average number of iterations for BF and IBF. As shown in Table 2, IBF had the lowest average makespan, but its average number of iterations was slightly greater than that of the BF algorithm. This was also reflected in the elapsed time. In the worst case of our experiment, the average elapsed times at *m* = 10, *k* = 4 for LIST, LPT, BF, and IBF algorithms were 2.70, 2.63, 40.61, and 55.21 ms, respectively. The elapsed time of IBF was greater than BF by about 35.95%. However, as shown in the last column of Table 2, IBF improved performance by 8.76–21.48% compared to the LIST algorithm.


**Table 2.** Comparison of LIST, LPT, BF, and IBF performance.

Test 5 was used to examine the relationship of IBF with the number of iterations. In Figure 10a, all curves had a peak value at *k* = 3–5, and then slowly declined. This occurred because when *k* = 35, the generated tasks had large granularity, which facilitated iteration without satisfying the conditions, so the number of iterations was greater. The number of iterations with larger *m* was greater than that with smaller *m*, because a large *m* leads to more permutations and combinations. When *k* > 8, the number of iterations decreased gradually and tended to be the same. Due to the small size of the task, the initial LPT algorithm was more effective, so the number of iterations decreased. In Figure 10b, except for the case of *k* = 2, the other curves increased gradually, and the larger the value of *k*, the smaller the number of iterations. Therefore, the greater the task granularity, the greater the value of *m* and the greater the number of iterations.

**Figure 10.** IBF number of iterations for: (**a**) *k* = 2–10 and (**b**) *m* = 2–10.

### *4.2. Resource Consumption*

In the experiment, an Altera Cyclone VI EP4CE115F29C8 and Quartus II 13.0 (Intel Corporation, Santa Clara, CA, USA) were used to compare all parallel and 1/2 scheduling for 32-channel and 64-channel architectures. Then, the TimeQuest Timing Analyzer in Quartus II was used to determine the maximal clock frequency for the listed architectures. The clock frequency was set to 100 MHz. The obtained resource consumption and maximal frequencies of all architectures are presented in Table 3, wherein "number of groups" represents the number of scan groups in the multi-group UPA system; "number of modules" represents the number of fine delay modules in the system; "Total LUT" (LUT: look up table), "Total Reg.", and "Total 9-bit Mult." refer to the consumption of total logic unit, total register, and total 9-bit multiplier, respectively; and Fmax represents the maximum clock frequency. Percentages with brackets in the Total LUT and Total 9-bit Mult. columns represent their share of all the same resources in the entire FPGA.


**Table 3.** Resource consumption and max frequency of all parallel and 1/2 scheduling for 32-channel and 64-channel architectures.

<sup>1</sup> Due to resource limitations, the total 9-bit multiplier in the FPGA was 532.

Table 3 shows that all parallel architectures demand more resources and have lower maximal frequencies than 1/2 scheduling architectures. The 1/2 scheduling architecture could save about 57.06–58.84% in LUT and 30–40% in 9-bit multipliers. Table 3 also demonstrates that maximum frequency decreased as the number of channels increased. The bold text in column Fmax are the best Fmax in same number of channels, respectively. Therefore, based on the premise of guaranteeing real-time performance, the proposed architecture and IBF algorithm can reduce resource consumption, shorten timing, and increase the maximum clock frequency.

### *4.3. Real-Time Verification*

Figure 11 displays the results of the pre-synthesis simulation in four groups of two fine delay modules, using ModelSim 10.2 SE electronics design automation tools (Mentor Co., Ltd., Wilsonville, OR, USA). The other experimental conditions are described in the previous section, and the experimental parameters are shown in Table 4. The delay caused by fine-delay filters with

eight clock-cycles has been taken into account and combined into time of read parameter. Units are clock cycles of the FPGA in Table 4 columns 2–4.


**Figure 11.** Four groups scheduled in two fine delay modules' simulation by ModelSim.


**Table 4.** Four groups of two fine delay modules simulation parameters.

<sup>1</sup> Time of read parameters *T<sup>i</sup>* RP = 1024.

In Figure 11, the tasks were T0–T3, corresponding to frame tasks of Group 0–3, and FD0 and FD1 are fine delay modules. The upper FD0 and FD1 were scheduled by LIST, and the lower FD0 and FD1 were scheduled by IBF. In the case of maximum 8 K sampling depth, 128 focal laws (Group 3), the makespan of LIST was 13.86 ms, whereas the makespan of IBF was 11.82 ms, so IBF is superior to LIST. At a waiting time of more than 1 ms between frames, the frame periods of LIST and IBF were 14.86 and 12.82 ms, respectively, which correspond to frame rates of 67 and 78 fps, respectively. Therefore, the IBF algorithm generally reduced the makespan of the frame tasks, increased the frame rate, and improved real-time performance of the multi-group scan UPA instrument.

### **5. Conclusions**

In this paper, a fine delay scheduling architecture in the multi-group scanning of a UPA system was presented. The diversity of echo data in multi-group scanning and the number of focal laws were considered, and the multi-group scan problem was modelled by a linear equation. The IBF algorithm was proposed, and its time complexity and absolute performance were analyzed. The experimental results showed that compared to LIST, LPT, and BF algorithms, the IBF algorithm decreased the makespan by 8.76–21.48%, while the frame rate reached 78 fps, and the architecture reduced FPGA resources by 30–40%. The IBF algorithm was superior to BF in terms of its small task-to-module ratio. The proposed algorithm and mathematical model was applied to a UPA. uUsing the proposed architectures effectively improved integration, increased maximum frequency, improved real-time performance, and finally, decreased resource consumption. Therefore, the instrument's flexibility and performance was improved. The next step is to study another processing module scheduling and multi-FPGA situation, integrated in a distributed environment.

**Author Contributions:** Y.L., W.T. and G.L. conceived the idea of the paper; Y.L. performed the experiments, and Y.L. and W.T. carried out the system model; Y.L. wrote the paper.

**Funding:** This work was financially supported by the National Key Foundation for Exploring Scientific Instrument (2013YQ230575) and Guangzhou Science and Technology Plan Project (201509010008).

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

*Letter*
