**Mathematics and Digital Signal Processing**

Editor **Pavel Lyakhov**

MDPI • Basel • Beijing • Wuhan • Barcelona • Belgrade • Manchester • Tokyo • Cluj • Tianjin

*Editor* Pavel Lyakhov Mathematical Modeling North-Caucasus Federal University Stavropol Russia

*Editorial Office* MDPI St. Alban-Anlage 66 4052 Basel, Switzerland

This is a reprint of articles from the Special Issue published online in the open access journal *Applied Sciences* (ISSN 2076-3417) (available at: www.mdpi.com/journal/applsci/special issues/ Mathematics Digital Signal Processing).

For citation purposes, cite each article independently as indicated on the article page online and as indicated below:

LastName, A.A.; LastName, B.B.; LastName, C.C. Article Title. *Journal Name* **Year**, *Volume Number*, Page Range.

**ISBN 978-3-0365-1476-5 (Hbk) ISBN 978-3-0365-1475-8 (PDF)**

© 2021 by the authors. Articles in this book are Open Access and distributed under the Creative Commons Attribution (CC BY) license, which allows users to download, copy and build upon published articles, as long as the author and publisher are properly credited, which ensures maximum dissemination and a wider impact of our publications.

The book as a whole is distributed by MDPI under the terms and conditions of the Creative Commons license CC BY-NC-ND.

### **Contents**


Reprinted from: *Applied Sciences* **2020**, *10*, 7488, doi:10.3390/app10217488 **141**

**Nikolay Chervyakov, Pavel Lyakhov, Mikhail Babenko, Irina Lavrinenko, Maxim Deryabin, Anton Lavrinenko, Anton Nazarov, Maria Valueva, Alexander Voznesensky and Dmitry Kaplun**

A Division Algorithm in a Redundant Residue Number System Using Fractions Reprinted from: *Applied Sciences* **2020**, *10*, 695, doi:10.3390/app10020695 **155**

### **About the Editor**

#### **Pavel Lyakhov**

Pavel Lyakhov is currently the Head of the Department of Mathematical Modeling, North-Caucasus Federal University. He graduated in mathematics from Stavropol State University in 2009, where he also received a Ph.D. degree in mathematics in 2012. He has been working with North-Caucasus Federal University since 2012. He currently leads the research projects RFBR grant 19-07-00130-A, "Effective tools for intellectual analysis of visual information based on convolutional neural networks", and Russian Federation President grant MK-3918.2021.1.6, "Performance digital medical imaging circuits based on parallel mathematics". His research interests include high-performance computing, residue number systems, digital signal processing, image processing, and medical imaging.

### *Article* **Analysis of the Quantization Noise in Discrete Wavelet Transform Filters for 3D Medical Imaging**

**Nikolay Chervyakov, Pavel Lyakhov and Nikolay Nagornov \***

Department of Applied Mathematics and Mathematical Modeling, North-Caucasus Federal University, Stavropol 355017, Russia; k-fmf-primath@stavsu.ru (N.C.); ljahov@mail.ru (P.L.)

**\*** Correspondence: sparta1392@mail.ru; Tel.: +7-962-451-3247

Received: 14 January 2020; Accepted: 8 February 2020; Published: 11 February 2020

**Abstract:** Denoising and compression of 2D and 3D images are important problems in modern medical imaging systems, and the discrete wavelet transform (DWT) is used to solve them in practice. In this paper, we analyze the effect of quantization noise in the coefficients of DWT filters for 3D medical imaging. We propose a method for quantizing wavelet filter coefficients that minimizes hardware resources by simplifying the rounding operations. We develop a method for estimating the maximum error of the DWT of 3D grayscale and color images with various bits per color (BPC). We reveal how the peak signal-to-noise ratio (PSNR) of the processing result depends on the wavelet used, the effective bit-width of the filter coefficients and the BPC. We derive formulas for determining the minimum bit-width of the wavelet filter coefficients that provides high (PSNR ≥ 40 dB for images with 8 BPC, for example) and maximum (PSNR = ∞) quality of 3D medical imaging by DWT, depending on the wavelet used. Experiments on the processing of 3D tomographic images confirmed the accuracy of the theoretical analysis. In the proposed method of 3D medical image DWT, all data are represented in fixed-point format. This makes possible an implementation of image denoising and compression that is efficient in terms of hardware and time resources on modern devices such as field-programmable gate arrays and application-specific integrated circuits.

**Keywords:** discrete wavelet transform; medical imaging; 3D image processing; quantization noise

#### **1. Introduction**

Medical imaging uses many different methods such as magnetic resonance (MR) imaging [1–8], radiography [4,9–11], radionuclide [8,12], optical [11,13,14], ultrasound [1,15] and medical robotics [16,17]. The typical medical imaging system consists of three components (Figure 1): data acquisition, data consolidation and data processing. The data acquisition card, which filters incoming data, is the most cost-sensitive system card. Usually, a diagnostic imaging system consists of multiple data acquisition cards. Once the data is compensated and filtered in the scanners, it is sent to the data consolidation card for buffering and data alignment. Once the data has been collected, it is sent to the image processing cards [18]. These cards perform heavy-duty filtering and the most algorithm-intensive image reconstruction. Modern field-programmable gate array (FPGA) devices are widely used in data consolidation and image processing for the implementation of sophisticated application algorithms, including pattern recognition, image enhancement and data compression [19,20].

Denoising of 2D and 3D medical images is an important problem in modern medical imaging systems. A noisy pattern is not always harmful in medical images, but in most cases it is a problem. MR images are inherently noisy, and thus filtering methods are required to improve the data quality [5]. Rheological methods of increasing MR elastography resolution determine viscoelastic properties through wave inversion, which is highly ill-posed and sensitive to noise [1]. In radiology using computed tomography (CT) or related morphological imaging modalities, noise affects the analysis of anatomical structures and thus impedes diagnostic applications [11]. Low-dose radiation exposure for patient safety leads to noisy and low-contrast fluoroscopic sequences [11]. The reconstruction process of positron emission tomography images includes inherent multiplicative noise, which prevents the analysis of visual data [12]. In optical CT for retinal imaging, as another example use case, noise limits the measurement of structural features in the human eye, e.g., retinal layer properties [11]. Denoising also facilitates the interpretation of visual data from echocardiography [15].

**Figure 1.** The typical medical imaging system.

Medical imaging systems produce increasingly accurate images with improved quality using higher spatial resolutions and bit-depths with advances in scanning technology and digital devices. Such improvements increase the amount of information that needs to be processed, transmitted and stored. This is especially true when using 3D scanning technology [4]. For example, four sets of positron emission tomography medical images of one patient may require more than 4 GB of storage space [21]. Video recording of a relatively short retinal peeling procedure may require over 40 GB of memory storage [14]. The capacity of hard drives is on average 1–2 TB with the current level of storage technology development. Thus, the compression of 3D medical images is also an important problem in modern medical imaging systems.

Various transforms are used in practice to solve the problems of 2D and 3D medical image denoising and compression. The most common of them are the discrete Fourier transform (DFT) [3,7,14,22] and the discrete wavelet transform (DWT) [1,9,11,14]. The DFT is widely used for frequency-domain analysis, but the time-domain characteristics disappear after it: we cannot determine the time position and the degree of intensity after the DFT of a signal, and it is not possible to describe the local time-domain properties of the image. The DWT solves these problems because it allows obtaining both frequency and time information about a signal [23,24]. The DWT of 2D and 3D images is performed by convolution with a pair of lowpass and highpass wavelet filters of a filter bank, which extract the main and detailed information, respectively. Denoising and compression of images are performed by manipulating the detailed information in modern algorithms such as set partitioning in hierarchical trees (SPIHT) [25] and embedded zerotrees of wavelet transforms (EZW) [26]. The convolution operation has high computational complexity. Hardware implementation on modern microelectronic devices such as field-programmable gate arrays (FPGAs) and application-specific integrated circuits (ASICs) working with fixed-point numbers is one of the ways to improve its characteristics [27–29]. Quantization noise occurs when converting the wavelet filter coefficients into this format, due to which the convolution is performed with an error. The question arises about the accuracy of wavelet filter coefficient representation in the device's memory that is efficient in terms of resources and sufficient to achieve the required quality of image processing. A novel area-efficient high-throughput 3D DWT architecture for real-time medical imaging based on distributed arithmetic is proposed in [30]. The design and implementation of a 3D Haar wavelet transform with transpose-based computation and dynamic partial reconfiguration for 3D medical image compression are presented in [31]. An implementation of positron emission tomography using DWT on FPGA is proposed in [32]. Paper [33] describes an architecture based on the use of DWT for biomedical signal compression. The design and implementation of context-based adaptive variable-length coding and a comparative analysis of the trade-offs offered by DWT for 3D medical image compression systems are described in [34]. In [35], the design and implementation of a 3D DWT with a transpose-based method for medical image compression on FPGA are presented. Experimental results from [36] showed that a 1D DWT system based on FPGA can filter the noise and extract the electroencephalogram (EEG) signal well. The design and implementation on FPGA of a 3D DWT using Daubechies wavelets with a transpose-based method for medical image compression are presented in [37]. The design and implementation of distributed arithmetic architectures of a 3D DWT with a hybrid method for medical image compression are presented in [38]. The authors of [39] presented an FPGA-based embedded system design using DWT and its evaluation for a pre-processing stage of EEG signal analysis.
A detailed review of FPGA and ASIC architectures for DWT implementation in biomedical and intelligent applications, which can be designed either for higher accuracy or for low power consumption, is provided in [29]. In [40], the authors showed that DWT along with Gaussian filtering gives better results in removing noise and smoothing electrocardiogram signals. The authors of [41] described the design and implementation of a complete hardware model based on DWT for EEG data compression and reconstruction on FPGA. A framework based on DWT using linear and non-linear classifiers for detecting an epileptic seizure from EEG data recorded from normal subjects and an epileptic patient is offered in [42]. There are no references to the selected bit-width of wavelet filter coefficients in the studied materials on the hardware implementation of medical image DWT on FPGA and ASIC [29–32,34–42]. The authors of [33] quantized the wavelet filter coefficients to 16 bits, but there is no rationale for this choice. The problem of analyzing the quantization noise effect in wavelet filter coefficients for the DWT of 2D grayscale and color images with 8 bits per color (BPC) was solved in [43].

The purpose of this work is to analyze how the quality of the 3D medical image DWT result depends on the noise arising from quantizing the filter coefficients of wavelets with compact support. Particular attention is paid to determining the minimum bit-width of the wavelet filter coefficients at which this noise does not have a significant impact on the 3D medical image DWT result (*PSNR* ≥ 40 dB for images with 8 BPC, for example), or does not affect it at all (*PSNR* = ∞). A value of *PSNR* ≥ 40 dB means that the difference between two images with 8 BPC is almost imperceptible to human eyes [44,45]. The value *PSNR* = ∞ corresponds to identical images.

#### **2. Materials and Methods**

DWT is a signal transform using a filter bank: a convolution of the input data with wavelet filters that translates the data from a time representation into the time-frequency domain. The wavelet filters $F$ of the filter bank consist of coefficients $f_{F,i}$, where $i = 1, \ldots, k$ and $k$ is the number of coefficients. The coefficients of the lowpass and highpass wavelet filters of decomposition ($LD$, $HD$) and reconstruction ($LR$, $HR$) are related by the equations [27]

$$f_{HD,i} = (-1)^{i+1} f_{LD,k-1-i}, \quad f_{LR,i} = f_{LD,k-1-i}, \quad f_{HR,i} = (-1)^i f_{LD,i}. \tag{1}$$

We shall consider only wavelets with compact support [46]. Daubechies wavelets *db*(*k*/2) (where *db*1 with *k* = 2 is the Haar wavelet), symlets *sym*(*k*/2) and coiflets *coif*(*k*/6) are the most common ones.
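As an illustration, the relations in Equation (1) can be checked numerically. The sketch below is an assumption-laden reading of those relations with 0-based coefficient positions (the paper's indexing mixes conventions); it derives the highpass decomposition filter and both reconstruction filters from the Haar lowpass filter:

```python
import math

def decomposition_highpass(f_ld):
    """f_HD[i] = (-1)^(i+1) * f_LD[k-1-i], read with 0-based indices (an assumption)."""
    k = len(f_ld)
    return [(-1) ** (i + 1) * f_ld[k - 1 - i] for i in range(k)]

def reconstruction_filters(f_ld):
    """f_LR[i] = f_LD[k-1-i] (time reversal) and f_HR[i] = (-1)^i * f_LD[i]."""
    k = len(f_ld)
    f_lr = [f_ld[k - 1 - i] for i in range(k)]
    f_hr = [(-1) ** i * f_ld[i] for i in range(k)]
    return f_lr, f_hr

# Haar (db1) lowpass decomposition filter: both coefficients equal 1/sqrt(2)
f_ld = [1 / math.sqrt(2), 1 / math.sqrt(2)]
f_hd = decomposition_highpass(f_ld)
f_lr, f_hr = reconstruction_filters(f_ld)
```

Whatever sign convention is chosen, the derived highpass coefficients sum to zero and the lowpass coefficients sum to √2, which are the two facts the error analysis below relies on.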

Consider a 3D digital medical image *I* of *X* rows, *Y* columns and *Z* frames as a function *I*(*x*, *y*, *z*), where 0 ≤ *x* ≤ *X* − 1, 0 ≤ *y* ≤ *Y* − 1 and 0 ≤ *z* ≤ *Z* − 1 are the spatial coordinates of *I*. Thus, voxel values (the 3D analogue of 2D pixels) are represented as *I*(*x*, *y*, *z*) for grayscale images and as *I*(*x*, *y*, *z*, *c*) for color images, where *c* is the color number (for example, *c* = 1, 2, 3 for the red, green and blue colors of RGB images, respectively). Hereinafter, we assume that all image voxels are isotropic [47].

Convolution of a 3D image with wavelet filters is performed by the formulas

$$I'(x, y, z) = \sum_{i=1}^{k} I(x - i, y, z) \cdot f_{F,i}, \quad I''(x, y, z) = \sum_{i=1}^{k} I'(x, y - i, z) \cdot f_{F,i},$$

$$I'''(x, y, z) = \sum_{i=1}^{k} I''(x, y, z - i) \cdot f_{F,i},$$

where $I'$, $I''$ and $I'''$ are the convolution results by rows, columns and frames, respectively. The 3D image DWT is performed by sequential convolution with the wavelet filters (Figure 2) in the steps below.
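A minimal sketch of one separable pass of these convolutions, assuming 0-based filter shifts and zero padding outside the image (the paper does not specify boundary handling); the column and frame passes are obtained by shifting along *y* and *z* instead of *x*:

```python
import math

def convolve_rows(image, filt):
    """One separable pass: I'(x, y, z) = sum_i I(x - i, y, z) * f_i.
    Zero padding outside the image boundary is an assumption of this sketch."""
    X, Y, Z = len(image), len(image[0]), len(image[0][0])
    k = len(filt)
    out = [[[0.0] * Z for _ in range(Y)] for _ in range(X)]
    for x in range(X):
        for y in range(Y):
            for z in range(Z):
                s = 0.0
                for i in range(k):  # shift along the row (x) axis only
                    if 0 <= x - i < X:
                        s += image[x - i][y][z] * filt[i]
                out[x][y][z] = s
    return out

# 2x2x2 image of constant voxels convolved with the Haar lowpass filter
a = 1 / math.sqrt(2)
img = [[[1.0, 1.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]]
rows = convolve_rows(img, [a, a])
```

Applying three such passes with the lowpass/highpass filter pair in every combination yields the eight coefficient sets discussed next.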


**Figure 2.** The scheme of 3D image discrete wavelet transform (DWT).

We get 8 sets of coefficients, *LLL*, *LLH*, *LHL*, *LHH*, *HLL*, *HLH*, *HHL* and *HHH*, of image decomposition as a result of the analysis of the original image *I*. These sets can be divided into approximating (*LLL*) and detailing (*LLH*, *LHL*, *LHH*, *HLL*, *HLH*, *HHL* and *HHH*) coefficients. Approximating coefficients correspond to the lowpass part of the signal and contain the main information about the image *I*. Detailing coefficients correspond to the highpass part of the signal and contain detailed information about the image *I*. 3D image denoising and compression are carried out by manipulating the detailing coefficients (*LLH*, *LHL*, *LHH*, *HLL*, *HLH*, *HHL* and *HHH*) of the image decomposition.

We get the reconstructed image $\widetilde{I}$ as a result of the synthesis of the image decomposition coefficients. Theoretically, the original image should be fully reconstructed, since the scheme in Figure 2 has the perfect reconstruction property [48]. However, in practice quantization noise occurs due to the digital representation of the wavelet filter coefficients. Quantization noise distorts all image decomposition coefficients *LLL*, *LLH*, *LHL*, *LHH*, *HLL*, *HLH*, *HHL* and *HHH*, as well as the reconstructed image $\widetilde{I}$. The image DWT result may have a quality unacceptable for the task, depending on the magnitude of the quantization noise.
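The perfect reconstruction property can be illustrated with a one-level Haar transform in 1D; this is a sketch, not the authors' 3D pipeline. With full-precision coefficients the synthesis returns the input exactly, which is precisely what quantized coefficients break:

```python
import math

a = 1 / math.sqrt(2)  # Haar lowpass coefficient value

def haar_analysis(signal):
    """One-level Haar DWT: pairwise sums (L) and differences (H), scaled by 1/sqrt(2)."""
    low = [(signal[2 * i] + signal[2 * i + 1]) * a for i in range(len(signal) // 2)]
    high = [(signal[2 * i] - signal[2 * i + 1]) * a for i in range(len(signal) // 2)]
    return low, high

def haar_synthesis(low, high):
    """Inverse transform; exact when the coefficients are kept in full precision."""
    out = []
    for l, h in zip(low, high):
        out.append((l + h) * a)
        out.append((l - h) * a)
    return out

x = [5.0, 3.0, 2.0, 8.0]
low, high = haar_analysis(x)
y = haar_synthesis(low, high)  # reconstructs x up to floating-point rounding
```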

The question arises about the minimum bit-width of the wavelet filter coefficients $f_{F,i}$ that is necessary for efficient software and hardware implementation of 3D image DWT on modern devices and sufficient for high-quality image processing. On modern devices, operations with fixed-point numbers are faster than operations with floating-point numbers. This can be used to develop 3D medical imaging devices. Therefore, in the proposed method the wavelet filter coefficients are quantized and converted into a fixed-point format by scaling by $2^n$ and rounding up

$$f_{F,i}^* = \left\lceil 2^n f_{F,i} \right\rceil. \tag{2}$$

The bit-width $r$ of the quantized wavelet filter coefficients $f_{F,i}^*$ can be determined by the formula $r = n + 1$ in this case. The digital image $I^*$ is processed according to the scheme in Figure 2 using the quantized wavelet filter coefficients $f_{F,i}^*$. The voxel values of the image $I^*$ should be normalized by scaling by $2^{-6n}$ ($2^{-n}$ for each convolution, according to the scheme in Figure 2) and rounding down

$$
\widetilde{I} = \left\lfloor 2^{-6n} I^\* \right\rfloor. \tag{3}
$$

The DWT of images with unquantized coefficients yields only integers. The quantization error of the wavelet filter coefficients rounded up is strictly one of excess: the quantized coefficients never underestimate the exact values. Rounding the DWT results down minimizes this error and cannot cause an error by itself. Since the two roundings act in opposite directions, their errors have different signs and partially compensate each other. Rounding up and rounding down are performed by discarding the fractional part of the number, with the addition of one in the case of rounding up a non-integer. Rounding operations in this order require fewer resources for hardware implementation than rounding to the nearest integer. This is because the wavelet filter coefficients are known a priori, so their quantization with rounding up can be performed in advance. Thus, the wavelet filter coefficients are used as constants in the software and hardware. The convolution is performed using arithmetic logic devices, and its result is rounded down by simply discarding the fractional part, which requires no additional hardware or time costs.
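A minimal sketch of this scheme for a single convolution step, assuming the Haar lowpass filter and $n = 8$ (so $r = 9$): the coefficients are scaled and rounded up in advance, the convolution runs on integers, and the result is rounded down by a right shift:

```python
import math

def quantize_up(coeffs, n):
    """Equation (2): f*_i = ceil(2^n * f_i); the bit-width is r = n + 1."""
    return [math.ceil((2 ** n) * f) for f in coeffs]

n = 8
f_ld = [1 / math.sqrt(2), 1 / math.sqrt(2)]  # Haar lowpass filter
f_q = quantize_up(f_ld, n)                   # integer filter coefficients
r = n + 1                                    # bit-width of the quantized coefficients

# One integer convolution step followed by floor normalization.  Equation (3)
# uses 2^(-6n) because a full 3D analysis/synthesis applies six convolutions;
# a single convolution is normalized by 2^(-n).
sample = [7, 7]                              # two equal voxel values
acc = sum(s * f for s, f in zip(sample, f_q))
approx = acc >> n                            # floor division by 2^n via right shift
```

The right shift at the end is exactly the "discard the fractional part" operation the text describes: no extra adder is needed for the downward rounding.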

We used the peak signal-to-noise ratio (*PSNR*) between two images (the original image $I$ and the processed image $\widetilde{I}$) to quantify the image processing quality. The logarithmic nature of the *PSNR* makes it possible to clearly interpret results that differ only slightly from each other, whereas other metrics usually reveal only large differences. This characteristic is measured in decibels (dB) and is calculated by the following formula [49]

$$PSNR = 10\log_{10}\left(\frac{\left(2^B - 1\right)^2}{MSE}\right) = 10\log_{10}\left(\frac{M^2}{MSE}\right),$$

where $B$ is the image BPC; $M$ is the maximum brightness of the image voxels (for example, $B = 8$ and $M = 2^8 - 1 = 255$ for an 8-bit grayscale image and a 24-bit RGB image); $MSE$ is the mean square error of brightness, which is calculated for grayscale ($MSE_{grayscale}$) [50] and color ($MSE_{color}$) [51] 3D images by the formulas

$$MSE_{grayscale} = \sum_{x=0}^{X-1} \sum_{y=0}^{Y-1} \sum_{z=0}^{Z-1} \frac{\left(I(x, y, z) - \widetilde{I}(x, y, z)\right)^2}{X \cdot Y \cdot Z},$$

$$MSE_{color} = \frac{1}{C} \sum_{c=1}^{C} \sum_{x=0}^{X-1} \sum_{y=0}^{Y-1} \sum_{z=0}^{Z-1} \frac{\left(I(x, y, z, c) - \widetilde{I}(x, y, z, c)\right)^2}{X \cdot Y \cdot Z}.$$

The value *PSNR* = ∞ corresponds to identical images. The image processing quality is considered high if *PSNR* ≥ *Q*, where *Q* is the threshold at which the difference between two images becomes almost imperceptible to human eyes. *Q* = 40 dB for images with 8 BPC [44,45]. We propose to generalize *Q* to the case of images with 12 and 16 BPC using the formula

$$Q = 5B. \tag{4}$$


Thus, *Q* is equal to 40 dB, 60 dB and 80 dB for images with 8, 12 and 16 BPC, respectively.
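A sketch of the PSNR computation for 3D grayscale images and of the threshold (4); nested Python lists stand in for a real volume format here:

```python
import math

def psnr_grayscale(orig, proc, bpc):
    """PSNR between two 3D grayscale images stored as nested lists; M = 2^B - 1."""
    X, Y, Z = len(orig), len(orig[0]), len(orig[0][0])
    mse = sum(
        (orig[x][y][z] - proc[x][y][z]) ** 2
        for x in range(X) for y in range(Y) for z in range(Z)
    ) / (X * Y * Z)
    if mse == 0:
        return math.inf  # identical images
    m = 2 ** bpc - 1
    return 10 * math.log10(m ** 2 / mse)

def quality_threshold(bpc):
    """Formula (4): Q = 5B, i.e. 40/60/80 dB for 8/12/16 BPC."""
    return 5 * bpc

img  = [[[100, 100], [100, 100]], [[100, 100], [100, 100]]]
same = [[[100, 100], [100, 100]], [[100, 100], [100, 100]]]
diff = [[[100, 100], [100, 100]], [[100, 100], [100, 101]]]  # one voxel off by 1
```

A single off-by-one voxel in a 2×2×2 8-BPC volume already gives a PSNR far above the 40 dB threshold, illustrating why small quantization errors can remain invisible.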

#### **3. Results**

#### *3.1. Theoretical Analysis of the Maximum Error of the 3D Medical Images DWT*

The error of the 3D medical image DWT occurs as a result of the wavelet filter coefficients conversion (quantization noise) by Formula (2). Convolutions, upsampling and the summing of convolution results cause an increase in this error. Rounding down the normalized voxel values of the restored image also has an effect. Note the following important facts: the sums of the lowpass and highpass wavelet filter coefficients are

$$\sum_{i=1}^{k} f_{L,i} = \sqrt{2}, \qquad \sum_{i=1}^{k} f_{H,i} = 0,$$

respectively [27].

We introduce the following notation.


The errors $a$ of all image decomposition coefficients *LLL*, *LLH*, *LHL*, *LHH*, *HLL*, *HLH*, *HHL* and *HHH* are separated into two groups $a_\varepsilon$ ($\varepsilon = 1, 2$) as a result of the upsampling ↑2. Figure 3 shows an example of the error separation $a_\varepsilon$ ($\varepsilon = 1, 2, 3, 4$) for upsampling by frames and columns, where $Y^* = (Y + k)/2 - 1$ and $Z^* = (Z + k)/2 - 1$. The situation is similar for upsampling by rows. Upsampling is applied three times during image reconstruction, so we get eight groups of errors $a_\varepsilon$ ($\varepsilon = 1, 2, \ldots, 8$) as a result. Thus, we add an additional index $\varepsilon$ to the introduced notations, which denotes calculation by the corresponding spatial group of wavelet filter coefficients.

**Figure 3.** The scheme of the errors separation with upsampling by frames and columns.

Next, we carry out the analytical calculations for estimating the maximum error of the 3D medical image DWT.

**Stage 1. Wavelet filter coefficients quantization.** Let us calculate the exact values of the coefficient sums $S_F$, $S_{F,\varepsilon}$ and the errors $E_{1,F}$, $E_{1,F,\varepsilon}$ of rounding up the scaled coefficients of filters $L$ and $H$.

$$S_L = \sum_{i=1}^{k} 2^n f_{L,i} = 2^n \sum_{i=1}^{k} f_{L,i} = 2^n \cdot \sqrt{2} = 2^{n+\frac{1}{2}}, \quad S_H = \sum_{i=1}^{k} 2^n f_{H,i} = 2^n \sum_{i=1}^{k} f_{H,i} = 2^n \cdot 0 = 0,$$

$$S_{L,1} = \sum_{i=1}^{k/2} 2^n f_{L,2(i-1)}, \quad S_{L,2} = \sum_{i=1}^{k/2} 2^n f_{L,2i-1}, \quad S_{H,1} = \sum_{i=1}^{k/2} 2^n f_{H,2(i-1)}, \quad S_{H,2} = \sum_{i=1}^{k/2} 2^n f_{H,2i-1},$$

$$E_{1,L} = \sum_{i=1}^{k} \left( \left\lceil 2^n f_{L,i} \right\rceil - 2^n f_{L,i} \right), \quad E_{1,H} = \sum_{i=1}^{k} \left( \left\lceil 2^n f_{H,i} \right\rceil - 2^n f_{H,i} \right),$$

$$E_{1,L,1} = \sum_{i=1}^{k/2} \left( \left\lceil 2^n f_{L,2(i-1)} \right\rceil - 2^n f_{L,2(i-1)} \right), \quad E_{1,L,2} = \sum_{i=1}^{k/2} \left( \left\lceil 2^n f_{L,2i-1} \right\rceil - 2^n f_{L,2i-1} \right),$$

$$E_{1,H,1} = \sum_{i=1}^{k/2} \left( \left\lceil 2^n f_{H,2(i-1)} \right\rceil - 2^n f_{H,2(i-1)} \right), \quad E_{1,H,2} = \sum_{i=1}^{k/2} \left( \left\lceil 2^n f_{H,2i-1} \right\rceil - 2^n f_{H,2i-1} \right).$$
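The Stage 1 quantities can be computed directly. This sketch uses the Haar filters and $n = 8$, and omits the even/odd split sums $S_{F,\varepsilon}$ and errors $E_{1,F,\varepsilon}$ for brevity:

```python
import math

def stage1_quantities(f_l, f_h, n):
    """Stage 1 of the maximum-error analysis: the exact scaled-coefficient sums
    S_L, S_H and the total round-up errors E_1,L and E_1,H."""
    scale = 2 ** n
    s_l = scale * sum(f_l)   # should equal 2^(n + 1/2)
    s_h = scale * sum(f_h)   # should equal 0
    e_l = sum(math.ceil(scale * f) - scale * f for f in f_l)
    e_h = sum(math.ceil(scale * f) - scale * f for f in f_h)
    return s_l, s_h, e_l, e_h

a = 1 / math.sqrt(2)
s_l, s_h, e_l, e_h = stage1_quantities([a, a], [a, -a], n=8)
```

Each ceiling error lies in [0, 1), so for a filter with *k* coefficients the total round-up error is bounded by *k*; the later stages propagate these bounds through the convolutions.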

**Stage 2. Row decomposition.** Let us calculate the exact values $T_{2,F}$ and errors $E_{2,F}$ of row decomposition with filters $L$ and $H$.

$$T_{2,L} = S_L \cdot M, \quad E_{2,L} = E_{1,L} \cdot M, \quad E_{2,H} = E_{1,H} \cdot M.$$

All convolution results $T_{j,F}$ with filter $H$ are zero, since the voxel values are equal for all voxels and $\sum_{i=1}^{k} f_{H,i} = 0$ [27].

**Stage 3. Column decomposition.** Let us calculate the exact values $T_{3,F}$ and errors $E_{3,F}$ of column decomposition with filters $L$ and $H$.

$$T_{3,LL} = T_{2,L} \cdot S_L, \quad E_{3,LL} = (T_{2,L} + E_{2,L})(S_L + E_{1,L}) - T_{3,LL},$$

$$E_{3,LH} = (T_{2,L} + E_{2,L}) E_{1,H}, \quad E_{3,HL} = E_{2,H}(S_L + E_{1,L}), \quad E_{3,HH} = E_{2,H} E_{1,H}.$$

**Stage 4. Frame decomposition.** Let us calculate the exact values $T_{4,F}$ and errors $E_{4,F}$ of frame decomposition with filters $L$ and $H$.

$$T_{4,LLL} = T_{3,LL} \cdot S_L, \quad E_{4,LLL} = (T_{3,LL} + E_{3,LL})(S_L + E_{1,L}) - T_{4,LLL},$$

$$E_{4,LLH} = (T_{3,LL} + E_{3,LL}) E_{1,H}, \quad E_{4,LHL} = E_{3,LH}(S_L + E_{1,L}),$$

$$E_{4,LHH} = E_{3,LH} \cdot E_{1,H}, \quad E_{4,HLL} = E_{3,HL}(S_L + E_{1,L}), \quad E_{4,HLH} = E_{3,HL} \cdot E_{1,H},$$

$$E_{4,HHL} = E_{3,HH}(S_L + E_{1,L}), \quad E_{4,HHH} = E_{3,HH} \cdot E_{1,H}.$$

**Stage 5. Frame reconstruction.** Let us calculate the exact values $T_{5,F,\varepsilon}$ and errors $E_{5,F,\varepsilon}$ of frame reconstruction with filters $L$ and $H$, $\varepsilon = 1, 2$.

$$T_{5,LLLL,\varepsilon} = T_{4,LLL} \cdot S_{L,\varepsilon}, \quad E_{5,LLLL,\varepsilon} = (T_{4,LLL} + E_{4,LLL})(S_{L,\varepsilon} + E_{1,L,\varepsilon}) - T_{5,LLLL,\varepsilon},$$

$$E_{5,LLHH,\varepsilon} = E_{4,LLH}(S_{H,\varepsilon} + E_{1,H,\varepsilon}), \quad E_{5,LHLL,\varepsilon} = E_{4,LHL}(S_{L,\varepsilon} + E_{1,L,\varepsilon}),$$

$$E_{5,LHHH,\varepsilon} = E_{4,LHH}(S_{H,\varepsilon} + E_{1,H,\varepsilon}), \quad E_{5,HLLL,\varepsilon} = E_{4,HLL}(S_{L,\varepsilon} + E_{1,L,\varepsilon}),$$

$$E_{5,HLHH,\varepsilon} = E_{4,HLH}(S_{H,\varepsilon} + E_{1,H,\varepsilon}), \quad E_{5,HHLL,\varepsilon} = E_{4,HHL}(S_{L,\varepsilon} + E_{1,L,\varepsilon}),$$

$$E_{5,HHHH,\varepsilon} = E_{4,HHH}(S_{H,\varepsilon} + E_{1,H,\varepsilon}).$$

**Stage 6. Frame summation.** Let us calculate the errors $E_{6,F,\varepsilon}$ of the sums of $E_{5,F,\varepsilon}$, $\varepsilon = 1, 2$.

$$E_{6,LL,\varepsilon} = E_{5,LLLL,\varepsilon} + E_{5,LLHH,\varepsilon}, \quad E_{6,LH,\varepsilon} = E_{5,LHLL,\varepsilon} + E_{5,LHHH,\varepsilon},$$

$$E_{6,HL,\varepsilon} = E_{5,HLLL,\varepsilon} + E_{5,HLHH,\varepsilon}, \quad E_{6,HH,\varepsilon} = E_{5,HHLL,\varepsilon} + E_{5,HHHH,\varepsilon}.$$

**Stage 7. Column reconstruction.** Let us calculate the exact values $T_{7,F,\varepsilon}$ and errors $E_{7,F,\varepsilon}$ of column reconstruction with filters $L$ and $H$.

$$T_{7,LL,1} = T_{5,LLLL,1} \cdot S_{L,1}, \quad T_{7,LL,2} = T_{5,LLLL,2} \cdot S_{L,1}, \quad T_{7,LL,3} = T_{5,LLLL,1} \cdot S_{L,2}, \quad T_{7,LL,4} = T_{5,LLLL,2} \cdot S_{L,2},$$

$$E_{7,LL,1} = (T_{5,LLLL,1} + E_{6,LL,1})(S_{L,1} + E_{1,L,1}) - T_{7,LL,1}, \quad E_{7,LL,2} = (T_{5,LLLL,2} + E_{6,LL,2})(S_{L,1} + E_{1,L,1}) - T_{7,LL,2},$$

$$E_{7,LL,3} = (T_{5,LLLL,1} + E_{6,LL,1})(S_{L,2} + E_{1,L,2}) - T_{7,LL,3}, \quad E_{7,LL,4} = (T_{5,LLLL,2} + E_{6,LL,2})(S_{L,2} + E_{1,L,2}) - T_{7,LL,4},$$

$$E_{7,LH,1} = E_{6,LH,1}(S_{H,1} + E_{1,H,1}), \quad E_{7,LH,2} = E_{6,LH,2}(S_{H,1} + E_{1,H,1}), \quad E_{7,LH,3} = E_{6,LH,1}(S_{H,2} + E_{1,H,2}), \quad E_{7,LH,4} = E_{6,LH,2}(S_{H,2} + E_{1,H,2}),$$

$$E_{7,HL,1} = E_{6,HL,1}(S_{L,1} + E_{1,L,1}), \quad E_{7,HL,2} = E_{6,HL,2}(S_{L,1} + E_{1,L,1}), \quad E_{7,HL,3} = E_{6,HL,1}(S_{L,2} + E_{1,L,2}), \quad E_{7,HL,4} = E_{6,HL,2}(S_{L,2} + E_{1,L,2}),$$

$$E_{7,HH,1} = E_{6,HH,1}(S_{H,1} + E_{1,H,1}), \quad E_{7,HH,2} = E_{6,HH,2}(S_{H,1} + E_{1,H,1}), \quad E_{7,HH,3} = E_{6,HH,1}(S_{H,2} + E_{1,H,2}), \quad E_{7,HH,4} = E_{6,HH,2}(S_{H,2} + E_{1,H,2}).$$

**Stage 8. Column summation.** Let us calculate the errors $E_{8,F,\varepsilon}$ of the sums of $E_{7,F,\varepsilon}$, $\varepsilon = 1, 2, 3, 4$.

$$E_{8,L,\varepsilon} = E_{7,LL,\varepsilon} + E_{7,LH,\varepsilon}, \quad E_{8,H,\varepsilon} = E_{7,HL,\varepsilon} + E_{7,HH,\varepsilon}.$$

**Stage 9. Row reconstruction.** Let us calculate the exact values $T_{9,\varepsilon}$ and errors $E_{9,F,\varepsilon}$ of row reconstruction with filters $L$ and $H$.

$$T_{9,1} = T_{7,LL,1} \cdot S_{L,1}, \quad T_{9,2} = T_{7,LL,2} \cdot S_{L,1}, \quad T_{9,3} = T_{7,LL,3} \cdot S_{L,1}, \quad T_{9,4} = T_{7,LL,4} \cdot S_{L,1},$$

$$T_{9,5} = T_{7,LL,1} \cdot S_{L,2}, \quad T_{9,6} = T_{7,LL,2} \cdot S_{L,2}, \quad T_{9,7} = T_{7,LL,3} \cdot S_{L,2}, \quad T_{9,8} = T_{7,LL,4} \cdot S_{L,2},$$

$$E_{9,L,1} = (T_{7,LL,1} + E_{8,L,1})(S_{L,1} + E_{1,L,1}) - T_{9,1}, \quad E_{9,L,2} = (T_{7,LL,2} + E_{8,L,2})(S_{L,1} + E_{1,L,1}) - T_{9,2},$$

$$E_{9,L,3} = (T_{7,LL,3} + E_{8,L,3})(S_{L,1} + E_{1,L,1}) - T_{9,3}, \quad E_{9,L,4} = (T_{7,LL,4} + E_{8,L,4})(S_{L,1} + E_{1,L,1}) - T_{9,4},$$

$$E_{9,L,5} = (T_{7,LL,1} + E_{8,L,1})(S_{L,2} + E_{1,L,2}) - T_{9,5}, \quad E_{9,L,6} = (T_{7,LL,2} + E_{8,L,2})(S_{L,2} + E_{1,L,2}) - T_{9,6},$$

$$E_{9,L,7} = (T_{7,LL,3} + E_{8,L,3})(S_{L,2} + E_{1,L,2}) - T_{9,7}, \quad E_{9,L,8} = (T_{7,LL,4} + E_{8,L,4})(S_{L,2} + E_{1,L,2}) - T_{9,8},$$

$$E_{9,H,1} = E_{8,H,1}(S_{H,1} + E_{1,H,1}), \quad E_{9,H,2} = E_{8,H,2}(S_{H,1} + E_{1,H,1}), \quad E_{9,H,3} = E_{8,H,3}(S_{H,1} + E_{1,H,1}),$$

$$E_{9,H,4} = E_{8,H,4}(S_{H,1} + E_{1,H,1}), \quad E_{9,H,5} = E_{8,H,1}(S_{H,2} + E_{1,H,2}), \quad E_{9,H,6} = E_{8,H,2}(S_{H,2} + E_{1,H,2}),$$

$$E_{9,H,7} = E_{8,H,3}(S_{H,2} + E_{1,H,2}), \quad E_{9,H,8} = E_{8,H,4}(S_{H,2} + E_{1,H,2}).$$

**Stage 10. Row summation.** Let us calculate the errors $E_{10,\varepsilon}$ of the sums of $E_{9,F,\varepsilon}$, $\varepsilon = 1, 2, \ldots, 8$.

$$E\_{10,\varepsilon} = E\_{9,L,\varepsilon} + E\_{9,H,\varepsilon}.$$

**Stage 11. Normalizing.** Let us calculate the errors $E_{11,\varepsilon}$ of rounding down the values $E_{10,\varepsilon}$ scaled by $2^{-6n}$, $\varepsilon = 1, 2, \ldots, 8$.

$$E_{11,\varepsilon} = \left\lfloor 2^{-6n} E_{10,\varepsilon} \right\rfloor.$$

The obtained values $E_{11,\varepsilon}$ ($\varepsilon = 1, 2, \ldots, 8$) represent the resulting error of the method and allow the calculation of the *PSNR*

$$PSNR = 10\log\_{10}\left(8M^2 / \sum\_{\varepsilon=1}^{8} E\_{11,\varepsilon}^2\right) \tag{5}$$

where $MSE_{grayscale} = MSE_{color} = \frac{1}{8} \sum_{\varepsilon=1}^{8} E_{11,\varepsilon}^2$.

Formula (5) allows determining the minimum quality of the 3D image $\widetilde{I}$ obtained as a result of the DWT of the original image $I$, depending on the maximum brightness $M$ and the selected bit-width $r = n + 1$ of the wavelet filter coefficients $f_{F,i}$.

Calculation results (*PSNR*, dB) obtained by using our method of wavelet filter coefficients quantizing and the final Formula (5) for the DWT of 3D medical grayscale and color images with various BPC, various bit-widths *r* and numbers *k* = 2, 4, 6, . . . , 20 of Daubechies wavelet *db*(*k*/2) filter coefficients are presented in Tables 1–3. The cells in bold correspond to the minimum bit-widths of the filter coefficients at which the processing quality reaches a high level according to Formula (4).

**Table 1.** Calculation results (*PSNR*, dB) of 3D medical images (with 8 BPC) DWT by using bit-width *r* of Daubechies wavelets filters coefficients.


**Table 2.** Calculation results (*PSNR*, dB) of 3D medical images (with 12 BPC) DWT by using bit-width *r* of Daubechies wavelets filters coefficients.



**Table 3.** Calculation results (*PSNR*, dB) of 3D medical images (with 16 BPC) DWT by using bit-width *r* of Daubechies wavelets filters coefficients.

Calculation results (*PSNR*, dB) obtained by using our method of wavelet filter coefficients quantizing and the final Formula (5) for the DWT of 3D medical grayscale and color images with various BPC, various bit-widths *r* and numbers *k* = 2, 4, 6, . . . , 20 of symlet *sym*(*k*/2) filter coefficients are presented in Tables 4–6.

**Table 4.** Calculation results (*PSNR*, dB) of the DWT of 3D medical images (8 BPC) using bit-width *r* of the symlet filter coefficients.


**Table 5.** Calculation results (*PSNR*, dB) of the DWT of 3D medical images (12 BPC) using bit-width *r* of the symlet filter coefficients.


**Table 6.** Calculation results (*PSNR*, dB) of the DWT of 3D medical images (16 BPC) using bit-width *r* of the symlet filter coefficients.


Calculation results (*PSNR*, dB) obtained by applying our wavelet filter coefficient quantization method and the final Formula (5) to the DWT of 3D medical grayscale and color images with various BPC, various bit-widths *r* and numbers *k* = 6, 12, 18, 24, 30 of *coif*(*k*/6) wavelet filter coefficients are presented in Tables 7–9.


**Table 7.** Calculation results (*PSNR*, dB) of the DWT of 3D medical images (8 BPC) using bit-width *r* of the coiflet filter coefficients.

**Table 8.** Calculation results (*PSNR*, dB) of the DWT of 3D medical images (12 BPC) using bit-width *r* of the coiflet filter coefficients.


**Table 9.** Calculation results (*PSNR*, dB) of the DWT of 3D medical images (16 BPC) using bit-width *r* of the coiflet filter coefficients.


Let us compile Tables 10–12, based on Tables 1–9, with the minimum values of *r* at which the result of the DWT of 3D medical images with Daubechies wavelets, symlets and coiflets reaches high and maximum quality. For example, the result of the DWT of 3D medical images (with 8 BPC) with the Daubechies wavelet *db*8 reaches high quality at *r* = 13 (*PSNR* = 43.36 dB) and maximum quality at *r* = 15 (*PSNR* = ∞), according to Table 1. The remaining cells are filled in the same way.


**Table 10.** Minimum values of *r* at which the result of the DWT of 3D medical images with Daubechies wavelets reaches high and maximum quality.

**Table 11.** Minimum values of *r* at which the result of the DWT of 3D medical images with symlets reaches high and maximum quality.


**Table 12.** Minimum values of *r* at which the result of the DWT of 3D medical images with coiflets reaches high and maximum quality.


We can draw the following conclusions from the calculation results presented in Tables 10–12.

1. The minimum bit-width *r* of the wavelet filter coefficients at which the result of the DWT of 3D medical images with 8 BPC does not contain visible distortions (*PSNR* ≥ 40 dB) can be determined by the formula

$$r = 11 + \left\lfloor \sqrt{\frac{k}{2}} \right\rfloor. \tag{6}$$

where *k* is the number of wavelet filter coefficients.

2. The minimum bit-width *r* of the wavelet filter coefficients at which the result of the DWT of 3D medical images with 12 BPC does not contain visible distortions (*PSNR* ≥ 60 dB) can be determined by the formula

$$r = 15 + \left\lfloor \sqrt{\frac{k}{4}} \right\rfloor. \tag{7}$$

3. The minimum bit-width *r* of the wavelet filter coefficients at which the result of the DWT of 3D medical images with 16 BPC does not contain visible distortions (*PSNR* ≥ 80 dB) can be determined by the formula

$$r = 18 + \left\lfloor \sqrt{\frac{k}{3}} \right\rfloor. \tag{8}$$

4. The minimum bit-width *r* of the wavelet filter coefficients at which the result of the DWT of 3D medical images does not contain any distortions (*PSNR* = ∞) can be determined by the formula

$$r = 5 + B + \left\lfloor \sqrt{\frac{k}{2} - 1} \right\rfloor \tag{9}$$

where *B* is the image BPC.

Formulas (6)–(9) are approximate, since the values of *r* obtained from them are sometimes redundant, that is, they exceed the values presented in Tables 10–12. However, in most cases they allow one to accurately calculate the non-redundant bit-width of the quantized wavelet filter coefficients. These formulas are applicable to both grayscale and color images.
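The four closed-form rules above are easy to mechanize. A minimal sketch (the function name and interface are ours, not the paper's):

```python
import math

def min_bitwidth(k, bpc, lossless=False):
    """Minimum wavelet-filter coefficient bit-width r.
    Formulas (6)-(8) target visually lossless quality
    (PSNR >= 40/60/80 dB for 8/12/16 BPC); Formula (9)
    targets exact reconstruction (PSNR = inf)."""
    if lossless:                                   # Formula (9)
        return 5 + bpc + int(math.sqrt(k / 2 - 1))
    base, div = {8: (11, 2), 12: (15, 4), 16: (18, 3)}[bpc]
    return base + int(math.sqrt(k / div))          # Formulas (6)-(8)

# db8 has k = 16 coefficients; for an 8-bit image:
print(min_bitwidth(16, 8))                 # Formula (6): 11 + floor(sqrt(8)) = 13
print(min_bitwidth(16, 8, lossless=True))  # Formula (9): 5 + 8 + floor(sqrt(7)) = 15
```

The printed values match the *db*8 example above (high quality at *r* = 13, maximum quality at *r* = 15 for 8 BPC).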

#### *3.2. Experiments of the 3D Medical Tomographic Images DWT*

The experiments were conducted in MATLAB R2018b for three 3D medical tomographic grayscale images: "wmri" is an 8-bit image of size 128 × 128 × 27; "Trufi\_COR" is a 12-bit image of size 320 × 320 × 30; and "Body\_1.0" is a 16-bit image of size 512 × 512 × 507. Their histograms are shown in Figure 4. The larger the image bit depth, the lower the ratio of its average voxel brightness to the maximum allowed value. We show the influence of this factor on the image processing quality below.

The DWT of the images was performed as follows: the filter coefficients *f*<sub>*F*,*i*</sub> of the Daubechies wavelets *db*(*k*/2) (*k* = 2, 4, 6, . . . , 20), symlets *sym*(*k*/2) (*k* = 2, 4, 6, . . . , 20) and coiflets *coif*(*k*/6) (*k* = 6, 12, 18, 24, 30) were obtained, quantized by multiplying by 2<sup>*n*</sup> (*n* = 1, 2, 3, . . . , 25) with rounding up according to Formula (2), and converted to fixed-point format; the DWT of the 3D images was implemented; the voxel brightness values of the restored images were scaled by dividing by 2<sup>6*n*</sup> with rounding down according to Formula (3), and converted to fixed-point format.
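The quantization step can be illustrated with the *db*2 filter (*k* = 4), whose coefficients have a closed form. This is a sketch under our assumptions: Formula (2) is not reproduced in this excerpt, so "rounding up" is taken here to be a plain ceiling:

```python
import math

# Orthonormal Daubechies db2 low-pass filter (k = 4 coefficients)
SQRT3 = math.sqrt(3.0)
DB2 = [(1 + SQRT3) / (4 * math.sqrt(2)), (3 + SQRT3) / (4 * math.sqrt(2)),
       (3 - SQRT3) / (4 * math.sqrt(2)), (1 - SQRT3) / (4 * math.sqrt(2))]

def quantize(coeffs, n):
    """Scale by 2^n and round up (our reading of Formula (2));
    the result is an integer fixed-point coefficient set."""
    return [math.ceil(c * 2 ** n) for c in coeffs]

n = 10                                   # fractional bits; r = n + 1 total
q = quantize(DB2, n)
print(q)                                 # [495, 857, 230, -132]
err = max(abs(qi / 2 ** n - ci) for qi, ci in zip(q, DB2))
print(err < 2 ** -n)                     # quantization error bounded by 2^-n
```

The per-coefficient quantization error is bounded by 2<sup>−*n*</sup>, which is what drives the worst-case error analysis behind Formula (5).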

An example of the DWT of the 3D tomographic images "wmri", "Trufi\_COR" and "Body\_1.0" with wavelet *db*8 is shown in Figures 5–7, respectively. The frames in Figures 6 and 7 were selected to illustrate the effect of the error on the image processing result. The figures show a gradual improvement in processing quality as the bit-width *r* increases: Figures 5b, 6b and 7b contain visible distortion (Figure 5b is darkened in places, while Figures 6b and 7b are lightened); in Figures 5c, 6c and 7c the processed images are indistinguishable by eye from the originals; in Figures 5d, 6d and 7d the processed images are identical to the corresponding originals. The experimental results are of higher quality than the calculated results. The values *PSNR* = 47.11 dB and *PSNR* = ∞ at *r* = 12 and *r* = 15, respectively (Figure 5), obtained after the DWT of the 8-bit image "wmri" with wavelet *db*8, exceed the corresponding calculated values *PSNR* = 37.82 dB and *PSNR* = ∞ at *r* = 12 and *r* = 15 (Table 1). The values *PSNR* = 64.57 dB and *PSNR* = ∞ at *r* = 12 and *r* = 17, respectively (Figure 6), obtained after the DWT of the 12-bit image "Trufi\_COR" with wavelet *db*8, exceed the corresponding calculated values *PSNR* = 36.67 dB and *PSNR* = 67.30 dB at *r* = 12 and *r* = 17 (Table 2). The same holds for "Body\_1.0".

**Figure 4.** Histograms of used images: (**a**) "wmri", average brightness 63.276; (**b**) "Trufi\_COR", average brightness 129.796 and (**c**) "Body\_1.0", average brightness 21.053.

**Figure 5.** Example of 3D tomographic 8-bit image "wmri" DWT by *db*8 wavelet: (**a**) original image; processed image: (**b**) *r* = 9, *PSNR* = 27.62 dB; (**c**) *r* = 12, *PSNR* = 47.11 dB and (**d**) *r* = 15, *PSNR* = ∞.

**Figure 6.** Example of 3D tomographic 12-bit image "Trufi\_COR" (15th frame) DWT by *db*8 wavelet: (**a**) original image; processed image: (**b**) *r* = 7, *PSNR* = 30.27 dB; (**c**) *r* = 12, *PSNR* = 64.57 dB and (**d**) *r* = 17, *PSNR* = ∞.

**Figure 7.** Example of 3D tomographic 16-bit image "Body\_1.0" (1st frame) DWT by *db*8 wavelet: (**a**) original image; processed image: (**b**) *r* = 7, *PSNR* = 64.05 dB; (**c**) *r* = 10, *PSNR* = 85.30 dB and (**d**) *r* = 17, *PSNR* = ∞.

The image processing results were analyzed using the *PSNR* and the structural similarity index (*SSIM*) [52], calculated by the formula

$$SSIM(I,\widetilde{I}) = \frac{\left(2\mu\_I\mu\_{\widetilde{I}} + c\_1\right)\left(2\sigma\_{I\widetilde{I}} + c\_2\right)}{\left(\mu\_I^2 + \mu\_{\widetilde{I}}^2 + c\_1\right)\left(\sigma\_I^2 + \sigma\_{\widetilde{I}}^2 + c\_2\right)}.$$

where µ<sub>*I*</sub> is the average of *I*; µ<sub>*Ĩ*</sub> is the average of *Ĩ*; σ<sup>2</sup><sub>*I*</sub> is the variance of *I*; σ<sup>2</sup><sub>*Ĩ*</sub> is the variance of *Ĩ*; σ<sub>*IĨ*</sub> is the covariance of *I* and *Ĩ*; *c*<sub>1</sub> = (0.01 · *M*)<sup>2</sup>; *c*<sub>2</sub> = (0.03 · *M*)<sup>2</sup> and *M* is the maximum brightness of the image voxels. Experimental results (*PSNR*, dB; *SSIM*) of the DWT of the 3D tomographic grayscale images "wmri" (8-bit), "Trufi\_COR" (12-bit) and "Body\_1.0" (16-bit) for various bit-widths *r* and numbers *k* = 2, 4, 6, . . . , 20 of *db*(*k*/2) wavelet filter coefficients are presented in Tables 13–18. The cells in bold correspond to the minimum bit-widths of the filter coefficients at which the processing quality reaches a high level according to Formula (4).
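A direct transcription of the SSIM formula can be sketched as follows; this is a single-window, global version over a flattened voxel list, whereas [52] computes SSIM over local windows:

```python
def ssim_global(img, ref, M):
    """Global single-window SSIM per the formula above, with
    c1 = (0.01*M)^2, c2 = (0.03*M)^2 and M the maximum brightness."""
    n = len(img)
    mu_x, mu_y = sum(img) / n, sum(ref) / n
    var_x = sum((p - mu_x) ** 2 for p in img) / n
    var_y = sum((p - mu_y) ** 2 for p in ref) / n
    cov = sum((p - mu_x) * (q - mu_y) for p, q in zip(img, ref)) / n
    c1, c2 = (0.01 * M) ** 2, (0.03 * M) ** 2
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

print(ssim_global([10, 50, 200], [10, 50, 200], 255))  # identical images -> 1.0
```

For identical inputs the numerator and denominator coincide, so the metric attains its maximum value of 1, which is the behavior reported for the error-free cases in the tables below.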

**Table 13.** Experimental results (*PSNR*, dB) of the DWT of the 3D tomographic 8-bit image "wmri" using bit-width *r* of the Daubechies wavelet filter coefficients.


**Table 14.** Experimental results (*SSIM*) of the DWT of the 3D tomographic 8-bit image "wmri" using bit-width *r* of the Daubechies wavelet filter coefficients.


**Table 15.** Experimental results (*PSNR*, dB) of the DWT of the 3D tomographic 12-bit image "Trufi\_COR" using bit-width *r* of the Daubechies wavelet filter coefficients.



**Table 16.** Experimental results (*SSIM*) of the DWT of the 3D tomographic 12-bit image "Trufi\_COR" using bit-width *r* of the Daubechies wavelet filter coefficients.

**Table 17.** Experimental results (*PSNR*, dB) of the DWT of the 3D tomographic 16-bit image "Body\_1.0" using bit-width *r* of the Daubechies wavelet filter coefficients.


**Table 18.** Experimental results (*SSIM*) of the DWT of the 3D tomographic 16-bit image "Body\_1.0" using bit-width *r* of the Daubechies wavelet filter coefficients.


Experimental results (*PSNR*, dB; *SSIM*) of the DWT of the 3D tomographic grayscale images "wmri" (8-bit), "Trufi\_COR" (12-bit) and "Body\_1.0" (16-bit) for various bit-widths *r* and numbers *k* = 2, 4, 6, . . . , 20 of *sym*(*k*/2) wavelet filter coefficients are presented in Tables 19–24.

**Table 19.** Experimental results (*PSNR*, dB) of the DWT of the 3D tomographic 8-bit image "wmri" using bit-width *r* of the symlet filter coefficients.



**Table 20.** Experimental results (*SSIM*) of the DWT of the 3D tomographic 8-bit image "wmri" using bit-width *r* of the symlet filter coefficients.

**Table 21.** Experimental results (*PSNR*, dB) of the DWT of the 3D tomographic 12-bit image "Trufi\_COR" using bit-width *r* of the symlet filter coefficients.


**Table 22.** Experimental results (*SSIM*) of the DWT of the 3D tomographic 12-bit image "Trufi\_COR" using bit-width *r* of the symlet filter coefficients.


**Table 23.** Experimental results (*PSNR*, dB) of the DWT of the 3D tomographic 16-bit image "Body\_1.0" using bit-width *r* of the symlet filter coefficients.


**Table 24.** Experimental results (*SSIM*) of the DWT of the 3D tomographic 16-bit image "Body\_1.0" using bit-width *r* of the symlet filter coefficients.


Experimental results (*PSNR*, dB; *SSIM*) of the DWT of the 3D tomographic grayscale images "wmri" (8-bit), "Trufi\_COR" (12-bit) and "Body\_1.0" (16-bit) for various bit-widths *r* and numbers *k* = 6, 12, 18, 24, 30 of *coif*(*k*/6) wavelet filter coefficients are presented in Tables 25–30.


**Table 25.** Experimental results (*PSNR*, dB) of the DWT of the 3D tomographic 8-bit image "wmri" using bit-width *r* of the coiflet filter coefficients.

**Table 26.** Experimental results (*SSIM*) of the DWT of the 3D tomographic 8-bit image "wmri" using bit-width *r* of the coiflet filter coefficients.


**Table 27.** Experimental results (*PSNR*, dB) of the DWT of the 3D tomographic 12-bit image "Trufi\_COR" using bit-width *r* of the coiflet filter coefficients.


**Table 28.** Experimental results (*SSIM*) of the DWT of the 3D tomographic 12-bit image "Trufi\_COR" using bit-width *r* of the coiflet filter coefficients.



**Table 29.** Experimental results (*PSNR*, dB) of the DWT of the 3D tomographic 16-bit image "Body\_1.0" using bit-width *r* of the coiflet filter coefficients.

**Table 30.** Experimental results (*SSIM*) of the DWT of the 3D tomographic 16-bit image "Body\_1.0" using bit-width *r* of the coiflet filter coefficients.


The calculation results from Tables 10–12, supplemented by the experimental results from Tables 13–30 and the difference between them, are presented in Tables 31–33.

**Table 31.** Minimum values of *r*, at which the result of 3D tomographic images DWT by Daubechies wavelets reaches high and maximum quality.



**Table 32.** Minimum values of *r*, at which the result of 3D tomographic images DWT by symlets reaches high and maximum quality.


**Table 33.** Minimum values of *r*, at which the result of 3D tomographic images DWT by coiflets reaches high and maximum quality.

Experimental results (*PSNR*, dB) of the DWT of various 3D tomographic 12-bit grayscale images by wavelet *db*4 with bit-width *r* = 11 of the filter coefficients are presented in Table 34 and Figure 8.


**Table 34.** Experimental results (*PSNR*, dB) of the DWT of 3D tomographic images by wavelet *db*4 with bit-width *r* = 11 of the filter coefficients.

**Figure 8.** Experimental results of the DWT of 3D tomographic 12-bit images by wavelet *db*4 with bit-width *r* = 11 of the filter coefficients.

The nonlinear hyperbolic regression [53] curve for the data from Table 34 is plotted in Figure 8 and has the equation *PSNR* = 58.98 + 328.78/*A*, where *A* is the average brightness of the image voxels. The observed *F*-test value [54] for the constructed nonlinear hyperbolic regression curve is *F* = 42.24. The critical *F*-test value [55] for a false-rejection probability of 0.001 with degrees of freedom *k*<sub>1</sub> = *p* − 1 = 2 − 1 = 1 and *k*<sub>2</sub> = *m* − *p* = 17 − 2 = 15 is *F*<sub>0.001;1,15</sub> = 16.59, where *p* is the number of estimated parameters of the regression equation and *m* is the number of images in Table 34. Since *F* > *F*<sub>0.001;1,15</sub>, the resulting regression equation is significant at a false-rejection probability of 0.001. The asymptote of the equation exceeds the corresponding theoretically calculated values.
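The hyperbolic fit and the observed *F* value can be reproduced as follows; this sketch uses hypothetical data lying exactly on a hyperbola (the paper's per-image data are not reproduced here), and the function names are ours:

```python
import math

def hyperbolic_fit(A, psnr):
    """Least-squares fit of PSNR = a + b/A: hyperbolic regression is
    linear in the transformed regressor z = 1/A."""
    n = len(A)
    z = [1.0 / ai for ai in A]
    mz, my = sum(z) / n, sum(psnr) / n
    b = sum((zi - mz) * (yi - my) for zi, yi in zip(z, psnr)) / \
        sum((zi - mz) ** 2 for zi in z)
    return my - b * mz, b

def f_statistic(A, psnr, a, b):
    """Observed F value: explained variance over k1 = p - 1 = 1 degree
    of freedom divided by residual variance over k2 = m - p."""
    m, my = len(A), sum(psnr) / len(A)
    fit = [a + b / ai for ai in A]
    ess = sum((fi - my) ** 2 for fi in fit)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(psnr, fit))
    return math.inf if rss == 0 else ess / (rss / (m - 2))

# Hypothetical data lying exactly on PSNR = 59 + 330/A
A = [10.0, 20.0, 50.0, 100.0, 200.0]
psnr = [59 + 330 / ai for ai in A]
a, b = hyperbolic_fit(A, psnr)
print(round(a, 6), round(b, 6))   # 59.0 330.0
```

With the paper's 17 images the same procedure yields *F* = 42.24, which is then compared against the tabulated critical value *F*<sub>0.001;1,15</sub> = 16.59.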

#### **4. Discussion**

The experimental results, the main of which are presented in Tables 31–33, show that all *PSNR* values obtained by calculation were no higher than the corresponding experimental *PSNR* values. This confirms the accuracy of the theoretical analysis. Thus, the derived Formulas (6)–(9) can be used to determine the minimum bit-width of the wavelet filter coefficients at which the result of the DWT of 3D medical tomographic images reaches high (*PSNR* ≥ 40 dB for images with 8 BPC, *PSNR* ≥ 60 dB for images with 12 BPC and *PSNR* ≥ 80 dB for images with 16 BPC, according to Formula (4)) and maximum (*PSNR* = ∞) quality, respectively. Tables 13–30 show that the *SSIM* values became equal to one (to the four decimal places used in the simulation) for 8-, 12- and 16-bit images when the *PSNR* reached approximately 45, 65 and 80 dB, respectively. Thus, both the *PSNR* and *SSIM* metrics confirm high-quality image processing. The experiment on the DWT of the 3D 8-bit medical tomographic image required 1–2 bits less for the wavelet filter coefficients than the calculations require for high-quality processing, since the theoretical analysis predicted the worst case. An even greater decrease in the bit-width of the wavelet filter coefficients leads to even greater savings in hardware resources. The difference between the obtained theoretical and experimental values increased significantly for the 12-bit and 16-bit images. The 12-bit tomographic image required 4–5 bits and 1–2 bits less for the wavelet filter coefficients to achieve high and maximum processing quality, respectively. This difference increased to 9–10 and 5–6 bits, respectively, for the 16-bit image. This is because the range of voxel brightness values increased significantly in the 12- and 16-bit images, while the average brightness of the image voxels varied insignificantly (it remained within the 8-bit range), since the high-order bits were rarely used. Thus, the ratio of the average voxel brightness to the maximum allowable value *M* decreased with increasing image BPC, as demonstrated by the histograms in Figure 4. This led to high and maximum quality being achieved much faster than predicted by the theoretical analysis.

The darkening and lightening in Figures 5–7 were due to the low accuracy of the wavelet filter coefficient quantization used for image processing. The excess quantization error led to an increase in the voxel brightness values of the processed images. Figures 6b and 7b turned out to be lightened, since the 12- and 16-bit images had a brightness margin, as shown by the histograms in Figure 4b,c. However, the brightness range of the 8-bit image was fully utilized (Figure 4a), and the quantization error led to computational range overflow. As a result, the voxel brightness values that exceeded the range went to zero.

Table 34 and Figure 8 show the dependence of the processing quality of 12-bit medical tomographic images on their average voxel brightness. This dependence has the form of a nonlinear hyperbolic regression. The asymptote of the equation exceeds the corresponding theoretically calculated values. The processing quality by the *PSNR* metric decreased (from 74.57 to 58.39 dB) as the average voxel brightness increased (from 16.89 to 187.42). The difference in image processing quality between the minimum and maximum values of the average brightness in Table 34 was more than 15 dB. This is commensurate with the difference in the processing quality of the same image by the same wavelet with filter coefficient bit-widths differing by two, according to Tables 15, 21 and 27. That is, high-quality processing of a 12-bit image with an average brightness of 16.89 would require 2 bits less for the wavelet filter coefficients than processing a 12-bit image with an average brightness of 187.42. The average voxel brightness of a medical image can vary over different ranges depending on many factors: the medical imaging modality, the type of analyzing device, the specific device settings, the analyzed organ or group of organs, etc. Thus, the requirements for the bit-width of the wavelet filter coefficients can be relaxed, depending on the ability to take into account the many factors related to the nature of the images obtained from medical tests. Summarizing, the quality of the DWT of 3D medical tomographic images depends primarily on their bits per color, the average voxel brightness and the number of wavelet filter coefficients, and to a lesser extent on the type of wavelet.

The minimum bit-width *r* of the wavelet filter coefficients for the DWT of 3D medical tomographic images is determined as follows: determine the BPC of the images (for example, 8, 12 or 16 BPC); select a quality threshold for image processing (for example, *PSNR* = 40 dB, *PSNR* = 60 dB, *PSNR* = 80 dB or *PSNR* = ∞); choose the wavelet with the number of coefficients *k*; and calculate the bit-width *r* of the wavelet filter coefficients by Formulas (5)–(9), depending on the selected quality threshold.

#### **5. Conclusions**

The problem of analyzing the effect of quantization noise in the coefficients of DWT filters for 3D medical imaging was solved. A method was proposed for quantizing wavelet filter coefficients that minimizes resources in a hardware implementation. A method was developed for estimating the maximum error of the DWT of 3D grayscale and color images with various BPC. The derived Formula (5) allows determining the minimum quality of the DWT of 3D medical images depending on the wavelet used, the bit-width of the wavelet filter coefficients and the BPC. We showed that Formulas (6)–(9) can be used to determine the minimum bit-width of the wavelet filter coefficients at which the result of the DWT of 3D images reaches high (*PSNR* ≥ 40 dB for images with 8 BPC, *PSNR* ≥ 60 dB for images with 12 BPC and *PSNR* ≥ 80 dB for images with 16 BPC) and maximum (*PSNR* = ∞) quality, respectively, depending on the wavelet used. The experiments on the DWT of 3D tomographic images showed that the bit-width of the wavelet filter coefficients could be significantly reduced for high-quality medical imaging compared to the theoretical analysis results. In the proposed method of 3D image DWT, all data are represented in fixed-point format and the rounding operations are simplified.

The proposed DWT method could be used in a wide range of applications for denoising and compression of 3D medical images. Given the need to improve the efficiency of medical visual data processing methods, further research can be expected in this direction.

**Author Contributions:** Conceptualization, P.L.; Data curation, P.L.; Formal analysis, N.C.; Investigation, N.N. and P.L.; Methodology, N.C.; Project administration, N.C.; Resources, N.C.; Software, N.N.; Supervision, N.C.; Validation, N.C.; Visualization, N.N.; Writing—original draft, N.N. and P.L.; Writing, review & editing, N.N., P.L. and N.C. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Russian Foundation for Basic Research (RFBR), grant numbers 19-07-00130 A and 18-37-20059 mol-a-ved, and the Council on grants of the President of the Russian Federation, grant number SP-2245.2018.5.

**Acknowledgments:** We are thankful to the Stavropol Regional Clinical Advisory and Diagnostic Center for providing tomographic images.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

#### *Article*

### **Maximum Correntropy Criterion Based** *l***1-Iterative Wiener Filter for Sparse Channel Estimation Robust to Impulsive Noise**

#### **Junseok Lim**

Department of Electrical Engineering, Sejong University, Seoul 143-747, Korea; jslim@sejong.ac.kr; Tel.: +82-2-3408-3299

Received: 20 December 2019; Accepted: 20 January 2020; Published: 21 January 2020

**Abstract:** In this paper, we propose a new sparse channel estimator robust to impulsive noise environments. For this kind of estimator, the convex regularized recursive maximum correntropy (CR-RMC) algorithm has been proposed. However, this method requires information about the true sparse channel to find the regularization coefficient for the convex regularization penalty term. In addition, CR-RMC suffers from numerical instability in finite-precision cases, which is linked to the inversion of the auto-covariance matrix. We propose a new method for sparse channel estimation robust to impulsive noise environments using an iterative Wiener filter. The proposed algorithm does not need information about the true sparse channel to obtain the regularization coefficient for the convex regularization penalty term. It is also numerically more robust, because it does not require the inverse of the auto-covariance matrix.

**Keywords:** mathematical models of digital signal processing; digital filtering; maximum correntropy; impulsive noise; sparse channel estimation

#### **1. Introduction**

In many signal processing applications [1–4], we find various sparse channels in which most of the impulse responses are close to zero and only some of them are large. In recent years, many kinds of sparse adaptive filtering algorithms have been proposed for sparse system estimation, including recursive least squares (RLS)-based [5–9] and least mean square (LMS)-based algorithms [10–14]. It is generally known that RLS-based algorithms have faster convergence and less error after convergence than LMS-based algorithms [15]. However, there are fewer RLS-based than LMS-based algorithms. Among these, the convex regularized recursive least squares (CR-RLS) algorithm proposed by Eksioglu [6] is a fully recursive convex-regularized RLS, like a typical RLS.

While the aforementioned algorithms typically show good performance in a Gaussian noise environment, their performance deteriorates in a non-Gaussian noise environment, such as an impulsive noise environment. Recently, the maximum correntropy criterion (MCC) [16–19] has been successfully applied to various adaptive algorithms robust to impulsive noise. Current studies in robust sparse adaptive methods have resulted in the development of CR-RLS-based algorithms with MCC [20,21], which showed strong robustness under impulsive noise. However, the CR-RLS used in [20,21] is not practical when determining the regularization coefficient for the sparse regularization term, because CR-RLS [6] needs information about the true channel when calculating the regularization coefficients. In addition, the MCC CR-RLS algorithms (so-called convex regularized recursive maximum correntropy (CR-RMC)) [20,21] include the inversion of the auto-covariance matrix, which is linked to numerical instability in finite-precision environments [15].

The recursive inverse (RI) algorithm [22,23] and the iterative Wiener filter (IWF) algorithm [24] have recently been proposed. RI and IWF have the same structure apart from the step-size calculation. They perform similarly to the conventional RLS algorithm in terms of convergence and mean squared error, without using the inverse of the auto-covariance matrix. Therefore, RI [22,23] and IWF [24] can be considered algorithms without the numerical instability of RLS.

This paper proposes a sparse channel estimation algorithm robust to impulsive noise using the IWF and the maximum correntropy criterion with *l*1-norm regularization. The proposed algorithm includes a new regularization coefficient calculation method for *l*1-norm regularization that does not require information about the true channel. In addition, the proposed algorithm is numerically stable because it does not include an inverse matrix calculation.

In Section 2 of this paper, we derive the new algorithm using IWF. In Section 3, we provide simulation results that show the performance of the proposed algorithm. In Section 4, we note our conclusions.

#### **2. MCC** *l1***-IWF Formulation**

In the channel estimation problem, we assume that at time instant *n* the observed signal *y*(*n*) is the result of the input signal sequence *x*(*k*) passing through the system $\mathbf{w}^{o} = [w\_0, \cdots, w\_{M-1}]^T$ in M-dimensional finite impulse response (FIR) format. In particular, in the sparse channel estimation problem, we assume that the system response $\mathbf{w}^{o}$ is sparse.

In adaptive channel estimation, we apply an M-dimensional channel estimate $\mathbf{w}(k)$ to the signal vector $\mathbf{x}(k)$ of the same dimension, estimate an output $\hat{y}(k) = \mathbf{x}^T(k)\mathbf{w}(k)$, and calculate the error signal $e(k) = y(k) + n(k) - \hat{y}(k) = \widetilde{y}(k) - \hat{y}(k)$, where $y(k)$ is the output of the actual system, $\hat{y}(k)$ is the estimated output, and $n(k)$ is the measurement noise. In particular, the measurement noise is non-Gaussian.
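To make this setup concrete, the sparse channel and the impulsive (non-Gaussian) measurement noise can be sketched as follows. A Gaussian mixture is a common model of impulsive noise; all sizes, probabilities and variances here are illustrative, not the paper's simulation settings:

```python
import random

def make_sparse_channel(M, n_active, seed=0):
    """Sparse FIR system w^o: most taps are zero, only a few are large."""
    rng = random.Random(seed)
    w = [0.0] * M
    for i in rng.sample(range(M), n_active):
        w[i] = rng.gauss(0.0, 1.0)
    return w

def observe(w, x, rng, p_impulse=0.05, sigma=0.01, sigma_imp=10.0):
    """y~(k) = y(k) + n(k): clean output plus Gaussian-mixture noise
    whose rare large-variance component models the impulses."""
    y = sum(wi * xi for wi, xi in zip(w, x))
    scale = sigma_imp if rng.random() < p_impulse else sigma
    return y + rng.gauss(0.0, scale)

rng = random.Random(42)
w_o = make_sparse_channel(M=64, n_active=4)
x = [rng.gauss(0.0, 1.0) for _ in range(64)]
print(sum(1 for v in w_o if v != 0.0))   # 4 nonzero taps
```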

To estimate the channel in non-Gaussian noise, we define an MCC cost function with exponential forgetting factor λ, shown in (1) [20,21], and minimize it adaptively.

$$\underset{\hat{\mathbf{w}}(n)}{\text{minimize}} \left\{ \sum\_{m=0}^{n} \lambda^{n-m} \exp \left( -\frac{\left(y(m) - \hat{\mathbf{w}}(n)^{T}\mathbf{x}(m)\right)^{2}}{2\sigma^{2}} \right) \text{ s.t. } \left\|\hat{\mathbf{w}}(n)\right\|\_{1} \leq c \right\} \tag{1}$$

where $\hat{\mathbf{w}}(n) = [\hat{w}\_0(n), \cdots, \hat{w}\_{M-1}(n)]^T$, $\mathbf{x}(m) = [x(m), x(m-1), \cdots, x(m-M+1)]^T$, $\lambda$ is a forgetting factor, and $\left\|\hat{\mathbf{w}}(n)\right\|\_1 \triangleq \sum\_{k=0}^{M-1} \left|\hat{w}\_k(n)\right|$. The Lagrangian for (1) becomes

$$J(\hat{\mathbf{w}}(n), \gamma(n)) = \zeta(\hat{\mathbf{w}}(n)) + \gamma(n)\left(\left\|\hat{\mathbf{w}}(n)\right\|\_1 - c\right) \tag{2}$$

where $\zeta(\hat{\mathbf{w}}(n)) = \sum\_{m=0}^{n} \lambda^{n-m} \exp\left(-\frac{\left(y(m) - \hat{\mathbf{w}}(n)^T\mathbf{x}(m)\right)^2}{2\sigma^2}\right)$, and $\gamma(n)$ is a real-valued Lagrange multiplier. We minimize the regularized cost function to find the optimal vector in the same way that IWF was derived [24].

The regularized cost function is convex and nondifferentiable; therefore, subgradient analysis replaces the gradient. Denoting a subgradient vector of $f$ at $\hat{\mathbf{w}}$ by $\nabla^{s} f(\hat{\mathbf{w}})$, the subgradient vector of $J(\hat{\mathbf{w}}(n), \gamma(n))$ with respect to $\hat{\mathbf{w}}(n)$ can be written as follows:

$$\nabla^{s} J(\hat{\mathbf{w}}(n), \gamma(n)) = \nabla \zeta(\hat{\mathbf{w}}(n)) + \gamma(n) \nabla^{s}\left\{\left\|\hat{\mathbf{w}}(n)\right\|\_1\right\}. \tag{3}$$

Hence, for the optimal $\hat{\mathbf{w}}(n)$ minimizing $J(\hat{\mathbf{w}}(n), \gamma(n))$, we set the subgradient of $J(\hat{\mathbf{w}}(n), \gamma(n))$ to 0 at the optimal point. Evaluating the gradient $\nabla\zeta(\hat{\mathbf{w}}(n))$, we can derive the subgradient vector as (4).

$$\nabla^{s} J(\hat{\mathbf{w}}(n), \gamma(n)) = \frac{1}{\sigma^2}\left(\mathbf{\Phi}(n)\hat{\mathbf{w}}(n) - \mathbf{r}(n)\right) + \gamma(n)\operatorname{sgn}(\hat{\mathbf{w}}(n)) = \mathbf{g}\_n, \tag{4}$$

where $e(n) = y(n) - \hat{\mathbf{w}}(n)^T\mathbf{x}(n)$, $\mathbf{\Phi}(n) = \sum\_{m=0}^{n} \lambda^{n-m} \exp\left(-\frac{e(m)^2}{2\sigma^2}\right)\mathbf{x}(m)\mathbf{x}(m)^T = \lambda\mathbf{\Phi}(n-1) + \exp\left(-\frac{e(n)^2}{2\sigma^2}\right)\mathbf{x}(n)\mathbf{x}(n)^T$, $\mathbf{r}(n) = \sum\_{m=0}^{n} \lambda^{n-m} \exp\left(-\frac{e(m)^2}{2\sigma^2}\right)y(m)\mathbf{x}(m) = \lambda\mathbf{r}(n-1) + \exp\left(-\frac{e(n)^2}{2\sigma^2}\right)y(n)\mathbf{x}(n)$, and $\nabla^{s}\left(\left\|\hat{\mathbf{w}}(n)\right\|\_1\right) = \operatorname{sgn}(\hat{\mathbf{w}}(n))$ [6]. Using Equation (4), we can obtain the update expression for $\hat{\mathbf{w}}(n)$ as (5).

$$\hat{\mathbf{w}}(n+1) = \hat{\mathbf{w}}(n) - \mu\_n \nabla^{s} J(\hat{\mathbf{w}}(n), \gamma(n)) = \hat{\mathbf{w}}(n) - \mu\_n \mathbf{g}\_n \tag{5}$$

To obtain the step size $\mu\_n$, we find the $\mu\_n$ that minimizes the exponentially averaged *a posteriori* error energy $J(\hat{\mathbf{w}}(n+1), \gamma(n))$, where the *a posteriori* error is $e(n) = y(n) - \hat{\mathbf{w}}(n+1)^T\mathbf{x}(n)$.

$$\begin{aligned} \nabla\_{\mu} J(\hat{\mathbf{w}}(n+1), \gamma(n)) &= -\frac{1}{\sigma^{2}}\hat{\mathbf{w}}(n+1)^{T}\mathbf{\Phi}(n)\mathbf{g}\_{n} + \frac{1}{\sigma^{2}}\mathbf{r}(n)^{T}\mathbf{g}\_{n} - \gamma(n)\nabla^{s}\left(\left\|\hat{\mathbf{w}}(n+1)\right\|\_{1}\right)^{T}\mathbf{g}\_{n} \\ &\cong -\frac{1}{\sigma^{2}}\hat{\mathbf{w}}(n+1)^{T}\mathbf{\Phi}(n)\mathbf{g}\_{n} + \frac{1}{\sigma^{2}}\mathbf{r}(n)^{T}\mathbf{g}\_{n} - \gamma(n)\nabla^{s}\left(\left\|\hat{\mathbf{w}}(n)\right\|\_{1}\right)^{T}\mathbf{g}\_{n} \\ &= -\frac{1}{\sigma^{2}}\hat{\mathbf{w}}(n+1)^{T}\mathbf{\Phi}(n)\mathbf{g}\_{n} + \frac{1}{\sigma^{2}}\mathbf{r}(n)^{T}\mathbf{g}\_{n} - \gamma(n)\operatorname{sgn}(\hat{\mathbf{w}}(n))^{T}\mathbf{g}\_{n}. \end{aligned} \tag{6}$$

Substituting Equation (5) into Equation (6), we get

$$\nabla\_{\mu} J(\hat{\mathbf{w}}(n+1), \gamma(n)) = -\frac{1}{\sigma^{2}}\hat{\mathbf{w}}(n)^{T}\mathbf{\Phi}(n)\mathbf{g}\_{n} + \frac{1}{\sigma^{2}}\mu\_{n}\mathbf{g}\_{n}^{T}\mathbf{\Phi}(n)\mathbf{g}\_{n} + \frac{1}{\sigma^{2}}\mathbf{r}(n)^{T}\mathbf{g}\_{n} - \gamma(n)\operatorname{sgn}(\hat{\mathbf{w}}(n))^{T}\mathbf{g}\_{n}. \tag{7}$$

To find $\mu\_n$, we set $\nabla\_{\mu} J(\hat{\mathbf{w}}(n+1), \gamma(n)) = 0$, and

$$\mu\_n = \frac{\frac{1}{\sigma^2} \Big(\hat{\mathbf{w}}(n)^T \boldsymbol{\Phi}(n) - \mathbf{r}(n)^T\Big) \mathbf{g}\_n + \gamma(n) \text{sgn}(\hat{\mathbf{w}}(n))^T \mathbf{g}\_n}{\frac{1}{\sigma^2} \mathbf{g}\_n^T \boldsymbol{\Phi}(n) \mathbf{g}\_n} = \sigma^2 \frac{\mathbf{g}\_n^T \mathbf{g}\_n}{\mathbf{g}\_n^T \boldsymbol{\Phi}(n) \mathbf{g}\_n} \tag{8}$$
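Equation (8) reduces the line search for the step size to two inner products. A sketch in plain-list linear algebra (names are ours):

```python
def step_size(g, Phi, sigma2):
    """Equation (8): mu_n = sigma^2 * (g^T g) / (g^T Phi(n) g)."""
    Phig = [sum(Phi[i][j] * g[j] for j in range(len(g)))
            for i in range(len(g))]
    return sigma2 * sum(gi * gi for gi in g) / \
        sum(gi * pgi for gi, pgi in zip(g, Phig))

# With Phi = I the step size collapses to sigma^2
print(step_size([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0]], 0.5))  # 0.5
```

Note that no matrix inverse appears here; this is exactly what gives the IWF its numerical robustness relative to CR-RMC.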

We have to derive the regularization coefficient $\gamma(n)$ such that $\left\|\hat{\mathbf{w}}(n+1)\right\|\_1 = c$, i.e., the *l*1-norm of the vector $\hat{\mathbf{w}}(n+1)$ is preserved at all time steps *n*. This can be represented by a flow equation in the continuous time domain [25].

$$\frac{\partial \left\| \hat{\mathbf{w}}(t) \right\|\_{1}}{\partial t} = \left( \frac{\partial \left\| \hat{\mathbf{w}}(t) \right\|\_{1}}{\partial \hat{\mathbf{w}}} \right)^{T} \frac{\partial \hat{\mathbf{w}}}{\partial t} = \left( \nabla^{S} \left\| \hat{\mathbf{w}}(t) \right\|\_{1} \right)^{T} \frac{\partial \hat{\mathbf{w}}}{\partial t} = 0. \tag{9}$$

Using a sufficiently small interval δ, the time derivative in (9) can be approximated as

$$\left( \nabla^{S} \left\| \hat{\mathbf{w}}(t) \right\|\_{1} \right)^{T} \frac{\partial \hat{\mathbf{w}}}{\partial t} \cong \left( \nabla^{S} \left\| \hat{\mathbf{w}}(n) \right\|\_{1} \right)^{T} \frac{\hat{\mathbf{w}}(n+1) - \hat{\mathbf{w}}(n)}{\delta} = 0. \tag{10}$$

Using (5) and (6), (10) becomes

$$\text{sgn}(\hat{\mathbf{w}}(n))^T(\hat{\mathbf{w}}(n+1) - \hat{\mathbf{w}}(n)) \ = \text{sgn}(\hat{\mathbf{w}}(n))^T(-\mu\_n \mathbf{g}\_n) \ = \text{0.} \tag{11}$$

and

$$\operatorname{sgn}(\hat{\mathbf{w}}(n))^T \left( \frac{1}{\sigma^2} \boldsymbol{\Phi}(n) \hat{\mathbf{w}}(n) - \frac{1}{\sigma^2} \mathbf{r}(n) + \gamma(n) \operatorname{sgn}(\hat{\mathbf{w}}(n)) \right) = 0. \tag{12}$$

The regularization coefficient γ(*n*) obtained from Equation (12) is as follows.

$$\gamma(n) = -\frac{\operatorname{sgn}(\hat{\mathbf{w}}(n))^T \left(\boldsymbol{\Phi}(n)\hat{\mathbf{w}}(n) - \mathbf{r}(n)\right)}{\sigma^2 \operatorname{sgn}(\hat{\mathbf{w}}(n))^T \operatorname{sgn}(\hat{\mathbf{w}}(n))}.\tag{13}$$

In contrast, the CR-RMC algorithm in [20] uses the same regularization coefficient as that in [6], shown in (14)

$$\hat{\gamma}(n) = 2\, \frac{\frac{\operatorname{tr}\{\boldsymbol{\Phi}^{-1}(n)\}}{M} \left(\left\|\hat{\mathbf{w}}(n)\right\|\_{1} - \rho\right) + \operatorname{sgn}(\hat{\mathbf{w}}(n))^T \boldsymbol{\Phi}^{-1}(n) \boldsymbol{\varepsilon}(n)}{\sigma^2 \left\|\boldsymbol{\Phi}^{-1}(n) \operatorname{sgn}(\hat{\mathbf{w}}(n))\right\|\_{2}^2},\tag{14}$$

where ε(*n*) = **w̃**(*n*) − **w**ˆ (*n*) and **w̃**(*n*) is the solution to the normal equation **Φ**(*n*)**w̃**(*n*) = **r**(*n*). In (14), the regularization coefficient has the parameter ρ. In [6] and [20], the parameter was set as ρ = *f*(**w***true*) = ‖**w***true*‖<sub>1</sub>, with **w***true* indicating the impulse response of the true channel. There was no further discussion about how to set ρ. We summarize the algorithm in Table 1.
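By contrast, the proposed coefficient of Equation (13) can be computed from quantities the recursion already maintains. The following NumPy sketch (the function name and signature are our own, not from the paper) illustrates this; note that no true-channel knowledge such as ρ is required:

```python
import numpy as np

def regularization_coefficient(w_hat, Phi, r, sigma2):
    """Regularization coefficient gamma(n) of Eq. (13).

    Needs only the current estimate w_hat(n), the weighted autocorrelation
    matrix Phi(n), the cross-correlation vector r(n) and the kernel
    bandwidth sigma^2 -- no knowledge of the true channel at all.
    """
    s = np.sign(w_hat)                 # subgradient of the l1-norm
    num = s @ (Phi @ w_hat - r)        # sgn(w)^T (Phi(n) w - r(n))
    den = sigma2 * (s @ s)             # sigma^2 * ||sgn(w)||_2^2
    return -num / den
```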



#### **3. Simulation Results**

In this section, we compare the sparse channel estimation performance of the proposed algorithm and the convex regularized recursive maximum correntropy (CR-RMC) algorithm [20]. In addition, the numerical robustness of the proposed algorithm is compared with that of CR-RMC in finite-precision environments.

#### *3.1. Estimation of Sparse Channels*

In this experiment, we show the sparse system estimation results. The simulation was performed under the same experimental conditions as in [6]. The true system parameter **w***<sup>o</sup>* had an order of M = 64. Out of the 64 coefficients, there were S nonzero coefficients. The nonzero coefficients were placed randomly, and their values were drawn from a *N*(0, 1/S) distribution. The impulsive noise was generated according to the Gaussian mixture model [26]

$$p\_v = (1 - p\_r) \mathcal{N}(0, \sigma\_1^2) + p\_r \mathcal{N}(0, \sigma\_2^2) \tag{15}$$

where *N*(0, σ<sub>*i*</sub><sup>2</sup>) (*i* = 1, 2) denotes the zero-mean Gaussian distribution with variance σ<sub>*i*</sub><sup>2</sup>. The component with variance σ<sub>1</sub><sup>2</sup> generates the background noise, while the component with variance σ<sub>2</sub><sup>2</sup> (usually σ<sub>2</sub><sup>2</sup> ≫ σ<sub>1</sub><sup>2</sup>) generates the impulsive noise, which occurs with probability *p<sub>r</sub>*. In this experiment, we set σ<sub>1</sub><sup>2</sup> = 0.01 and generated the input signal so that the SNR remained at 20 dB. The other parameters were set as σ<sub>2</sub><sup>2</sup> = 500 and *p<sub>r</sub>* = 0.01.
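The mixture of Equation (15) is straightforward to sample. The following NumPy sketch is a hypothetical helper (not part of the paper), with the experiment's parameter values as defaults:

```python
import numpy as np

def impulsive_noise(n, sigma1_sq=0.01, sigma2_sq=500.0, p_r=0.01, seed=None):
    """Sample n points from the Gaussian mixture of Eq. (15):
    N(0, sigma1^2) with probability 1 - p_r (background noise) and
    N(0, sigma2^2) with probability p_r (impulses)."""
    rng = np.random.default_rng(seed)
    impulsive = rng.random(n) < p_r                    # Bernoulli(p_r) mask
    std = np.where(impulsive, np.sqrt(sigma2_sq), np.sqrt(sigma1_sq))
    return std * rng.standard_normal(n)
```

With the default settings, the mixture variance is (1 − *p<sub>r</sub>*)σ<sub>1</sub><sup>2</sup> + *p<sub>r</sub>*σ<sub>2</sub><sup>2</sup> ≈ 5.01, dominated by the rare impulses.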

We compared CR-RMC [20], whose regularization factor uses the true system response information, with the proposed algorithm, whose regularization factor does not. We also included the results of MCC-RLS [27] and the conventional RLS, which considers neither impulsive noise nor sparsity. For the performance evaluation, we simulated the algorithms on sparse impulse responses with S = 4, 8, 16, 32.

Figure 1 illustrates the mean square deviation (MSD) curves. The results show that the estimation performance of the proposed algorithm is similar to that of CR-RMC, whose regularization factor refers to the true system impulse response. As expected, the conventional RLS produced the worst MSD in all cases.

**Figure 1.** Steady state MSD for S <sup>=</sup> 4, 8, 16, 32 (--: the proposed algorithm, -◦-: convex regularized recursive maximum correntropy (CR-RMC), -⋄-: maximum correntropy criterion (MCC)- recursive least squares (RLS), solid line: conventional RLS without considering impulsive noise and sparsity): (**a**) S = 4, (**b**) S = 8, (**c**) S = 16, (**d**) S = 32.

Figure 1 confirms that, without *a priori* information about the true system impulse response, the proposed regularization factor works similarly to that of the regularization factor in CR-RMC using the true system impulse response information.

#### *3.2. Numerical Robustness Experiment*

In this experiment, we show that the proposed algorithm is numerically more robust than CR-RMC in finite-precision environments. We performed channel estimation with finite precision by quantization to show the numerical robustness [28]. The round-off error from quantization with finite bits accumulates and propagates through the inverse matrix operation of **Φ**(*n*) until, finally, explosive divergence occurs [15,28]. To illustrate this, we repeated the numerical stability experiments, decreasing the quantization from 32 bits in 1-bit steps to find the number of bits at which numerical instability began in each algorithm, comparing and verifying the performance for the cases of S = 4 and S = 16. The rest of the experimental setup was the same as in Experiment 3.1.

Figure 2 shows the results of comparing the performance of the proposed algorithm and CR-RMC in terms of MSD with different numbers of quantization bits. Figure 2a,b shows the results when quantized to 32 bits. In this case, both the proposed algorithm and CR-RMC converge normally, as in Figure 1a,b. Figure 2c,d shows the quantization results for 16 bits, at which CR-RMC became numerically unstable. Compared with the results of Figure 2a,b, it can be observed that quantized CR-RMC diverges due to the cumulative effect of the quantization error. Figure 2e,f shows the quantization results for 11 bits, at which the proposed algorithm also became numerically unstable. If we express the level of quantization error as a signal-to-quantization-noise ratio (SQNR), SQNR (dB) = 1.76 + 6.02 × bits [29], CR-RMC is stable above 98.08 dB SQNR, whereas the proposed algorithm is stable above 67.98 dB SQNR. In other words, the proposed algorithm has a 30.1 dB gain in numerical stability compared to CR-RMC.
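The stability margins quoted above follow directly from the SQNR rule of thumb [29]; a minimal Python sketch:

```python
def sqnr_db(bits):
    """SQNR of an ideal uniform quantizer: 1.76 + 6.02 * bits (dB) [29]."""
    return 1.76 + 6.02 * bits

# Stability margins observed in the experiment:
cr_rmc_margin = sqnr_db(16)                       # CR-RMC: 98.08 dB
proposed_margin = sqnr_db(11)                     # proposed: 67.98 dB
stability_gain = cr_rmc_margin - proposed_margin  # 30.1 dB
```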

**Figure 2.** Results of numerical robustness experiment (--: the proposed algorithm, -x-: CR-RMC): (**a**) S = 4 case quantized by 32 bits; (**b**) S = 16 case quantized by 32 bits; (**c**) S = 4 case quantized by 16 bits; (**d**) S = 16 case quantized by 16 bits; (**e**) S = 4 case quantized by 11 bits; and (**f**) S = 16 case quantized by 11 bits.

The experimental results confirm that the proposed algorithm is numerically more robust than CR-RMC.

#### **4. Conclusions**

In this paper, we have proposed a sparse channel estimation algorithm robust to impulsive noise using the IWF and MCC with *l1*-norm regularization. The proposed algorithm includes a regularization factor calculation that requires no *a priori* knowledge about the true system response. The simulation results show that the proposed algorithm performs similarly to the CR-RMC algorithm, whose regularization factor refers to the true system response information. In addition, the simulation results show that the proposed algorithm is more robust against numerical error than the CR-RMC algorithm.

**Funding:** This research received no external funding.

**Acknowledgments:** This paper was supported by the Agency for Defense Development (ADD) in Korea (UD190005DD).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

*Article*

### **Development of Classification Algorithms for the Detection of Postures Using Non-Marker-Based Motion Capture Systems**

**Tatiana Klishkovskaia 1,\*, Andrey Aksenov 1,2 , Aleksandr Sinitca <sup>3</sup> , Anna Zamansky <sup>4</sup> , Oleg A. Markelov <sup>5</sup> and Dmitry Kaplun <sup>3</sup>**


Received: 14 May 2020; Accepted: 7 June 2020; Published: 10 June 2020

**Abstract:** The rapid development of algorithms for skeletal postural detection with relatively inexpensive contactless systems and cameras opens up the possibility of monitoring and assessing the health and wellbeing of humans. However, the evaluation and confirmation of posture classifications are still needed. The purpose of this study was therefore to develop a simple algorithm for the automatic classification of human posture detection. The most affordable solution for this project was through using a Kinect V2, enabling the identification of 25 joints, so as to record movements and postures for data analysis. A total of 10 subjects volunteered for this study. Three algorithms were developed for the classification of different postures in Matlab. These were based on a total error of vector lengths, a total error of angles, multiplication of these two parameters and the simultaneous analysis of the first and second parameters. A base of 13 exercises was then created to test the recognition of postures by the algorithm and analyze subject performance. The best results for posture classification were shown by the second algorithm, with an accuracy of 94.9%. The average degree of correctness of the exercises among the 10 participants was 94.2% (SD 1.8%). It was shown that the proposed algorithms provide the same accuracy as that obtained from machine learning-based algorithms and algorithms with neural networks, but have less computational complexity and do not need resources for training. The algorithms developed and evaluated in this study have demonstrated a reasonable level of accuracy, and could potentially form the basis for developing a low-cost system for the remote monitoring of humans.

**Keywords:** posture classification; skeleton detection; motion capture; exercise classification; virtual rehabilitation

#### **1. Introduction**

Demographic ageing in humans means that, to date, 12% of the global population are aged over 60 years, and this number is likely to double within a few decades [1]. Ageing leads to a higher prevalence of complications that may benefit from exercise therapy. Such an increase in ageing will require the rapid development of science and medicine, as well as the introduction of new technologies and methodologies utilized by health systems. Increased knowledge has been gained regarding new treatment regimes for a growing number of chronic diseases and traumas, but with consequential increases in social and economic costs [2]. It is well-known that rehabilitation forms an important part of a typical overall treatment plan, which can be delivered, for instance, by utilizing therapeutic exercise (physiotherapy). The performance of physical activity has many advantages in older people with dementia, and can positively affect the preservation of cognitive abilities [3]. Stroke patients may also benefit from physical activities, which can result in improved recovery rates.

However, the success of rehabilitation largely depends on keeping the patient interested and motivated in the continuation of treatment. Factors influencing adherence to the continuation of physical education depend on whether people continue to receive professional assistance and counselling after the completion of the initial training [4]. Among the main reasons for the termination of continued professional assistance and counselling are forgetfulness, a lack of further supervision and motivation, and time restraints (for example: attending the rehabilitation center).

The use of exercise therapy delivered remotely using posture recognition and interactive content may have a positive impact on enabling patients to perform exercise, as well as their willingness to continue training and rehabilitation programs [5].

Events such as the recent Covid-19 pandemic reinforce the need for remote exercise therapy with feedback from a doctor, which would be very beneficial for many patients with different disabilities.

Traditionally, exercise therapy consists of demonstrating exercises, observation and evaluation by a health professional, which in turn requires special training and significant face-to-face contact with a patient. However, modern computer and sensor technologies could be utilized to augment (or, where appropriate, replace) direct intervention by health professionals. Technologies that can capture specific postures will be able to determine whether or not the exercise regimes provided to the patient are producing beneficial postural changes over time, with reference to those obtained from healthy adults. The capabilities of motion capture systems have advanced significantly in recent years; having become more accessible and effective, they allow the kinematics of the human body to be measured and recorded with sufficient accuracy in real time, even using web cameras.

Two main types of motion capture systems are widely used: those which use markers, and those which estimate joint and limb segment parameters based on neural network training from marker systems. The first requires use of a special suit, or a removable system of sensors (active or passive markers) attached to the human body. The second type, such as those provided by Microsoft Kinect, Intel RealSense, Structure Core and others, use color and depth data, as well as image recognition algorithms, to retrieve the data. These systems can record kinematic data and perform analysis of the human body's movements in real time.

In addition, the development and availability of these sensors opens more opportunities, as it makes it possible to create bespoke courses of rehabilitation, and to monitor their implementation [6–11]. Similar applications have been developed for different patient groups, but the most widely represented software has been designed for post-stroke patients [12–16]. Software has also been designed for people with neurological diseases [17], including cerebral palsy [18], multiple sclerosis [19] and traumatic brain injuries [20].

However, the algorithms used by these systems to estimate the accuracy of execution of movements by such patients are not fully described in the literature. Two of those algorithms can, however, be distinguished by their differing mode of operation. The first is based on the use of dynamic time warping (DTW), along with fuzzy logic [7], and the other is based on the recognition of different body segment postures and trajectories [21]. However, the use of a home-based system, using virtual rehabilitation and offering the possibility of communication with a doctor, is more convenient for the patient, and also allows the course of rehabilitation to be altered by adding new exercises, if necessary. DTW is, however, difficult to apply when compared to posture estimation algorithms. Anton et al. utilized the recognition of postures together with trajectories, which resulted in an accuracy of posture estimation of 91.9%, and detection of movements of 95.16% [21].

Recent advances in machine learning have led to the use of machine learning algorithms in many studies, including posture classification [22,23]. The objective of these studies is to classify sitting postures via conventional algorithms and deep learning-based algorithms using body pressure distribution data from pressure sensors [22]. After classifying the sitting postures using several classifiers, average and maximum classification rates of 97.20% and 97.94%, respectively, were obtained from nine subjects with a support vector machine using the radial basis function kernel. Through a comparison of the convolutional neural network (CNN) with conventional machine learning algorithms, the effectiveness of an approach [23] applying the CNN algorithm was shown (average accuracy = 0.953). However, machine learning-based algorithms have a computational complexity that prevents real-time implementation (in reference [22], the authors stressed this point) and require resources for training.

These examples of previous research in the use of posture recognition algorithms provide strong arguments for the continued research and development of such algorithms.

The aim of this research was to develop simpler and more efficient identification algorithms for posture and exercise classification within healthy participants, as well as to evaluate these using Kinect V2. The main contributions of our work can be summarized as follows. Three algorithms for the classification of different postures were developed and evaluated; these were based on the total error of vector lengths, the total error of angles, and the multiplication of these two parameters, and their effectiveness was demonstrated. To compare the effectiveness of the classification algorithms, a database was created from the descriptions of the 573 known postures, as well as 903 postures which were not related to them. The algorithms presented in this study were shown to be reasonably accurate, and could potentially form the basis for developing a simple system for the remote monitoring of rehabilitation involving exercise therapy.

The remainder of this paper is organized as follows. In Section 2, we describe the Microsoft Kinect V2-based approach to the automatic classification of human exercise movement and present three algorithms for posture classifications. In Section 3, we compare the effectiveness of the three developed classification algorithms by means of a database that was created from the descriptions of the 573 known postures and 903 postures which were not correctly performed. In Section 4, we discuss the results and how they can be interpreted from the perspective of previous studies, and of the working hypotheses. Future research directions also are highlighted. Finally, we present the conclusions in Section 5.

#### **2. Materials and Methods**

#### *2.1. Participants*

Ten healthy young adults (mean ± standard deviation age: 23.4 ± 4.1 years; six males with body mass: 72.7 ± 4.7 kg and height: 179.7 ± 4.2 cm; four females with body mass: 51.5 ± 2.6 kg and height: 163.3 ± 2.8 cm) participated in forming the exercise database. A healthy male (age 35, weight 75 kg and height 184 cm) and a healthy female (age 23, weight 50 kg and height 165 cm) were used to form the independent reference posture database. This research was completed as part of the state project of the Ministry of Health of Russia and was approved by the Ethics Committee of the Ilizarov Scientific Center for Restorative Traumatology and Orthopaedics (17 May 2018, protocol No.2(57)). All participants read the information sheet before the experiment. Written informed consent was obtained from all the participants.

#### *2.2. Posture Description*


A 3D Sensor (Microsoft Kinect V2) was used to record movement, as it is able to recognize different subjects, track their movement and create a skeleton comprising 25 points (Figure 1), which may be described by three-dimensional coordinates (i.e., by using X, Y and Z planes of motion).

**Figure 1.** Diagram of connection of points received from the sensor.

Any movement consists of a series of postures. Eighteen joints were used to describe a posture in a series of volunteer subjects. It was decided to exclude joints such as those numbered 16, 20, 21, 22, 23, 24 and 25 (Figure 1) from algorithms, as they demonstrated high inconsistency in tracking accuracy. A total of 40 parameters were therefore calculated, based on 18 points: 17 were vector lengths (Table 1) and 23 were angles. However, each algorithm used a different number of parameters, as described in Section 2.3.

**Table 1.** Vector lengths used for the algorithm, where numbers represent the joint as shown in Figure 1.


The vector lengths were calculated relative to a position on the centerline of the torso (see point "2", Figure 1), as it had minimal errors in tracking. As each subject had a different body shape, the lengths between joints were not consistent, and it was therefore decided to normalize them by the participants' heights using the following formula [24]

$$D\_{\text{vector}} = \sqrt{\frac{\left(x - x\_0\right)^2 + \left(y - y\_0\right)^2 + \left(z - z\_0\right)^2}{\text{height}}},\tag{1}$$

where *x*0, *y*<sup>0</sup> and *z*<sup>0</sup> represent coordinates of the midpoint of the back, and *x*, *y*, *z* are the coordinates of the point for which the distance is calculated.
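A direct transcription of Equation (1) as printed (with the height normalization inside the square root) might look as follows; the study itself used Matlab, so this NumPy helper is illustrative only:

```python
import numpy as np

def vector_length(point, midpoint, height):
    """Height-normalized joint-to-torso distance, Eq. (1) as printed:
    point is the (x, y, z) of the tracked joint, midpoint the mid-torso
    reference (point "2" in Figure 1), height the subject's height."""
    d2 = np.sum((np.asarray(point, float) - np.asarray(midpoint, float)) ** 2)
    return np.sqrt(d2 / height)
```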

Eleven angles were used in algorithms to describe postures and movements, as shown in Figure 2 and Table 2. For all 11 joints, the angles were between two vectors in 3D space. However, for the shoulder, hip and knee, the angles were calculated in the frontal and sagittal planes only.

**Figure 2.** Angles used in describing poses.


**Table 2.** Angles used to describe postures.

The angles were calculated as the angle between two 3D vectors

$$D\_{\text{angle}} = \arccos\left(\frac{x\_1x\_2 + y\_1y\_2 + z\_1z\_2}{\sqrt{x\_1^2 + y\_1^2 + z\_1^2}\sqrt{x\_2^2 + y\_2^2 + z\_2^2}}\right) \tag{2}$$

where *xn*, *y<sup>n</sup>* and *z<sup>n</sup>* are the coordinates of vectors obtained by the differences between points, according to Table 1.
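Equation (2) is the standard angle between two 3-D vectors. A small Python sketch (illustrative, since the study used Matlab), with a clamp added to guard against floating-point round-off at the domain boundary of arccos:

```python
import math

def angle_between(v1, v2):
    """Angle in radians between two 3-D vectors, Eq. (2)."""
    dot = sum(a * b for a, b in zip(v1, v2))
    n1 = math.sqrt(sum(a * a for a in v1))
    n2 = math.sqrt(sum(b * b for b in v2))
    # clamp the cosine to [-1, 1] to guard against round-off errors
    return math.acos(max(-1.0, min(1.0, dot / (n1 * n2))))
```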

#### *2.3. Experimental Protocol*

A database of 12 postures, containing the postures and exercise movements performed by ten subjects, was created to validate the algorithms (Table 3, Figures 3 and 4). Each subject was asked to perform 13 exercises and repeat each one at least 25 times. Subjects were allowed to rest if they felt fatigued. On average, it took around four hours to record the 13 exercise movements for each participant. The order of the exercise movements was randomized for each subject.

**Table 3.** Reference database of postures for the two people recorded and used for the classification of other participants.

**Figure 3.** Postures: (**a**) hands outstretched; (**b**) hands down; (**c**) hands on the waist; (**d**) left hand up; (**e**) right hand up; and (**f**) both hands up.

**Figure 4.** Postures: (**a**) hands forward; (**b**) left knee up; (**c**) right knee up; (**d**) both hands to the head; (**e**) left hand to the side; and (**f**) right hand to the side.

The movement exercises were described as a sequence of postures. The simplest movement was described by the start and the end position. In some cases, however, there were more complex sequences of movements where the middle phase movement comprised a combination of several postures. A total of thirteen different exercise test movements were eventually used in the study, as shown in Table 4.



#### *2.4. Accuracy Evaluation of Postures and Movement Exercises*

The accuracy, specificity and sensitivity were calculated based on formulas described in the article [25].
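Assuming the standard confusion-matrix definitions (cf. [25]), the three metrics can be sketched as follows; this is a hypothetical helper, not the authors' Matlab code:

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics (cf. [25])."""
    sensitivity = tp / (tp + fn)                # true-positive rate
    specificity = tn / (tn + fp)                # true-negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)  # fraction classified correctly
    return sensitivity, specificity, accuracy
```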

The classification of postures was made by comparing the recorded posture descriptors (*D<sup>i</sup>*) with a reference database (*D<sup>j</sup>*). The distance *Err<sup>i</sup>* for each pose *i* between the reference and recorded posture was calculated as:

$$Err\_{i} = dist(D\_{i}, D\_{j}), \tag{3}$$

A descriptor is composed of two parameters (angles and vectors), and thus two types of errors were calculated: the total error of the length of vectors and the total error of angles.

The first was calculated using absolute differences between them

$$ErVec\_i = \sum\_{k=1}^{17} \left| D\_i(k) - D\_j(k) \right| \tag{4}$$

where *D<sup>i</sup>(k)*, *k* = 1, . . . , 17, are the parameters responsible for the lengths of the vectors. The total error of angles for posture *i* was calculated using the formula


$$ErAngle\_{i} = \sum\_{k=18}^{40} \left| D\_{i}(k) - D\_{j}(k) \right|, \tag{5}$$

where *D<sup>i</sup>(k)*, *k* = 18, . . . , 40, are the parameters responsible for the values of the angles.
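Equations (4) and (5) split one 40-element descriptor into its vector-length and angle parts. A NumPy sketch (an illustrative helper with 0-based indexing; the study used Matlab):

```python
import numpy as np

def descriptor_errors(D_i, D_j):
    """Total errors of Eqs. (4) and (5) between a recorded descriptor D_i
    and a reference descriptor D_j (40-element vectors: entries 1-17 are
    vector lengths, entries 18-40 are angles; 0-based slicing below)."""
    diff = np.abs(np.asarray(D_i, float) - np.asarray(D_j, float))
    return diff[:17].sum(), diff[17:].sum()   # (ErVec_i, ErAngle_i)
```

A posture is then accepted when these errors (or their combination, depending on the algorithm) fall below the chosen threshold.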

Based on these types of errors, three algorithms for posture classification were developed. To classify a posture, the results should be equal or almost equal to the reference database, so that the algorithm can identify the correct posture classification from the collected data set. This was achieved by setting a threshold for each of the three algorithms:


To evaluate the most accurate algorithm for posture detection, the classification database was made using the descriptions of either "correct" or "incorrect" postures. In our study, all subjects were young and healthy, therefore it was enough to use two people for the posture reference database. However, the reference database would be more complex if participants had some disabilities and varied in age group.

To assess the accuracy of exercise movement classification, a database with sets of postures sequenced in the correct order was made, as shown in the examples in Figure 5.

**Figure 5.** Example of a movement exercise: (**a**) combination of two postures; and (**b**) a more complex movement exercise with a set of postures in sequential order.

Matlab was used for data collection and analysis.

#### **3. Results**

#### *3.1. Classification Algorithms*

To compare the effectiveness of the different classification algorithms, a database was created from the descriptions of the 573 known postures, as shown in Table 3, and 903 postures which were not correct. Using this database, the sensitivity, specificity and accuracy of the three algorithms were evaluated (Figures 6 and 7).

**Figure 6.** Relationship between specificity, sensitivity, accuracy and threshold for: (**a**) Algorithm 1; and (**b**) Algorithm 2.

**Figure 7.** Relationship between specificity, sensitivity, accuracy and threshold for Algorithm 3.

The mean sensitivity for the first algorithm was 92.5%, while for the second it was 98.95% and for the third it was 96.5%. Table 5 demonstrates detailed statistical results for three algorithms. Figure 8 shows receiver operator characteristic (ROC) curve results for three algorithms.



The mean intersection of sensitivity and specificity for the first algorithm was 75.7%, while for the second it was 94.1% and for the third it was 87.7%. The mean accuracy for the first algorithm was 76.6%, while for the second it was 94.9% and for the third it was 89.3%. The area under the ROC curves for the first algorithm was 0.862, while for the second it was 0.986 and for the third it was 0.966.

**Figure 8.** Relationship between false and true positive rates between three different algorithms.

#### *3.2. Number of Exercises Performed by Participants*

Each participant performed at least 390 exercises in total. Table 6 demonstrates detailed information on the number of exercises performed by each participant.


**Table 6.** Number of exercises performed by each participant.

The highest accuracy for movement exercises was demonstrated by the second algorithm, with 94.3% (SD 1.7%), as shown in Figure 9.

**Figure 9.** The accuracy of exercise movements performed by ten subjects for the second algorithm.

Figure 10 shows the percentage ratio of the identification of 13 exercises.

**Figure 10.** The identification ratio for 13 exercises with the implementation of Algorithm 2.

The average identification ratio of correct movement classification among participants was 94.3% (SD 1.7%). The average identification of correct exercises was 94.2% (SD 1.8%).

#### **4. Discussion**

The aim of this study was to determine accurate posture and exercise classification algorithms with low-cost sensors such as Microsoft Kinect, which has also led to the development of different virtual rehabilitation programs [13,26]. The use of such sensors can have many advantages. Firstly, they highlight interactivity and motivation, and they can also be used at home. This is important for people who live in remote areas, where there may not be experts who are locally available. In addition, the technique can be adapted to the needs of any patient group [27], or animals [28–31].

The comparison of this sensor with a professional optical motion capture system has demonstrated that it has sufficient accuracy for the tasks, and the data generation capability, needed by specialists in the field of rehabilitation [8].

However, the question of how to evaluate the correctness of an exercise is still open, as the literature contains only a limited number of articles [7,21]. Previous research has demonstrated a best posture classification accuracy of 91.9%, and a best movement classification accuracy of 95.16% [21]. This study demonstrated a slight increase in accuracy by using three different algorithms and by setting a threshold level for: the total error of vector lengths; the total error of angles; and the multiplication of vector errors by angle errors (as in [21]). By calculating sensitivity and specificity, the classification accuracy of the algorithms was obtained, with the best result shown by the algorithm using the total error of angles (94.9%). This algorithm showed better results when compared with previous research based on a multiplication of the total errors algorithm. The new algorithm also requires considerably fewer parameters for the classification of postures and exercise movements: the previous study, which showed the best accuracy for posture classification, used 30 variables of the posture descriptor, such as angles and vector lengths [21], whereas the second algorithm in this research used only 17 variables, which significantly improved the efficiency of the method.

In our study, when evaluating the classification accuracy of the exercises, we used results for the average accuracy of each participant and the average accuracy of the exercises, which were 94.3% (SD 1.7%) and 94.2% (SD 1.8%), respectively. Those results are practically the same as those of the previous research [21], but our algorithm, as mentioned above, requires considerably fewer parameters for the classification of postures and exercise movements. More advanced marker-based motion capture systems can also be used to improve the classification accuracy of algorithms. Previous research [32] has demonstrated that the static error of tracking passive markers with Oqus (Qualisys) cameras was 0.15 mm and a dynamic 0.26 mm, with much higher tracking frequencies than those used by the Kinect V2 sensor.

The definition of human posture can be applied not only to the creation of applications for rehabilitation, but also to monitoring the lives of older people, such as in the detection of a sudden fall. According to statistics, 28–35% of people over 65 years of age experience a fall [33], after which they often need a period of rehabilitation. Such a monitoring system could detect a person's posture and alert relatives, neighbors or close friends in cases where the person's positional data indicate the possibility of a heart attack, stroke or other complication; such a posture, for example, could be lying on the floor. The time factor in attending to such situations is crucial, being directly correlated with the person's recovery.

More studies are required to develop classification algorithms for the various medical applications mentioned, as this study had a number of limitations, outlined below.


Future research is planned to use the Qualisys system to improve the algorithm and reduce the number of limitations.

Video analysis is widely applied in the context of human movement detection, and real-time implementation using reliable algorithms based on the postural recognition of healthy persons should provide postural data that can be used to assess the effectiveness of clinically prescribed exercise regimes for patients, as well as allow for variations in exercise regime, dependent on the data collected. Such data would be useful in optimized treatment by exercise therapy.

The advantages of such an approach could also be extended to veterinary applications. Very few studies address automatic video-based analysis of animals—for example, canine behavior as a means of monitoring animal health and wellbeing [28–30]—with some of these studies using a 3D Kinect camera to detect joint position. In [28], the authors present a system capable of identifying static postures for canines that does not rely on hand-labeled data at any point, although the system can only identify the "standing," "sitting" and "lying" postures, with approximately 70%, 69% and 94% accuracy, respectively. Paper [29] presents a depth-based tracking system for the automatic detection of animals' postures and body segments, as well as an exhaustive evaluation of the performance of several classification algorithms, based on both a supervised and a knowledge-based approach. Furthermore, Barnard et al. addressed the problem of automatic behavioral analysis of kenneled dogs using 3D video monitoring [30]. Dog body segment detection was performed using standard Structural Support Vector Machine classifiers, and automatic tracking of the dog was also implemented. However, this tool leaves a considerable margin for improvement.

A number of studies were also found in the literature describing wide-ranging applications in the biomechanics of animals, as well as in prosthetics: preventing injuries, monitoring rehabilitation after surgical operations, choosing appropriate orthopedic devices and prostheses, training and others [34–36]. Therefore, the posture classification algorithm can be useful not only in human medicine, but also in veterinary applications, informing veterinary intervention through exercise regimes, as well as monitoring animals' health and behavior. Further studies are planned using the Qualisys system and a neural network trained to recognize a dog's skeleton from cost-effective video cameras; so far, such work has only been carried out for humans.

#### **5. Conclusions**

Virtual or home rehabilitation using modern technologies can improve the health and quality of life of many people and animals. The algorithms for posture and movement classification used in this study demonstrated good results using an optical sensor. These algorithms can also be used with other motion capture systems as a simpler and less resource-intensive alternative to machine learning and neural network algorithms, while increasing accuracy.

The posture and movement classification algorithm may also be used to monitor incidental falls in the elderly population that can be associated with heart failure or a stroke, and initiate a call for help.

As for animals, this technique may also be applied for measuring the time budget of animals, indicating the amount or proportion of time that animals spend in different behaviors as a measure for common ethological and welfare parameters [37].

**Author Contributions:** Conceptualisation, A.A. and T.K.; methodology, A.S.; software, T.K. and A.S.; validation, A.A., T.K. and D.K.; formal analysis, A.S.; investigation, A.A.; resources, A.S.; data curation, A.S.; writing—original draft preparation, A.A. and D.K.; writing—review and editing, D.K. and A.A.; visualization, A.A. and T.K.; supervision, A.Z. and O.A.M.; project administration, D.K. and A.Z.; and funding acquisition, O.A.M., D.K. and A.Z. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was supported by a grant from the Ministry of Science & Technology of Israel and by RFBR according to the research project N 19-57-06007.

**Acknowledgments:** The authors would like to express their sincere gratitude to the Ilizarov Scientific Center for Restorative Traumatology and Orthopaedics for supporting the project.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### **Three-Dimensional (3D) Model-Based Lower Limb Stump Automatic Orientation**

**Dmitry Kaplun <sup>1,2</sup>, Mikhail Golovin <sup>3</sup>, Alisa Sufelfa <sup>1,3,\*</sup>, Oskar Sachenkov <sup>4</sup>, Konstantin Shcherbina <sup>3</sup>, Vladimir Yankovskiy <sup>3</sup>, Eugeniy Skrebenkov <sup>3</sup>, Oleg A. Markelov <sup>2</sup> and Mikhail I. Bogachev <sup>2</sup>**


Received: 28 March 2020; Accepted: 30 April 2020; Published: 7 May 2020

**Abstract:** Modern prosthetics largely relies upon visual data processing and implementation technologies such as 3D scanning, mathematical modeling, computer-aided design (CAD) tools, and 3D-printing during all stages from design to fabrication. Despite the intensive advancement of these technologies, once the prosthetic socket model is obtained by 3D scanning, its appropriate orientation and positioning remain largely the responsibility of an expert requiring substantial manual effort. In this paper, an automated orientation algorithm based on the adjustment of the 3D-model virtual anatomical axis of the tibia along with the vertical axis of the rectangular coordinates in three-dimensional space is proposed. The suggested algorithm is implemented, tested for performance and experimentally validated by explicit comparisons against an expert assessment.

**Keywords:** 3D model; prosthetic design; orientation; positioning; reconstruction

#### **1. Introduction**

According to World Health Organization (WHO) statistics, up to one billion people in the world, constituting about 15% of the total population, live with some form of disability, including approximately 200 million experiencing considerable difficulties in functioning that limit their participation in family and social life. Cases are dominated by the upper age group of ≥60 years, where nearly every second person experiences moderate disabilities and about 10% experience severe disabilities [1]. Severe disabilities are often associated with lower limb amputations: approximately 30,000 to 40,000 amputations are performed each year in the U.S. alone, where more than 1.5 million people live with a lost limb, the majority requiring access to lower limb prosthetics [2]. The total number of lower limb prostheses provided annually in Russia increased from around 60,000 in 2012–2014 to above 80,000 in 2016–2018 [3].

Although the reasons for amputation vary considerably between regions and age groups, ranging from severe trauma, wounds and burns caused by road traffic and occupational injuries, violence and humanitarian crises, to tumors and inflammatory diseases potentially leading to sepsis and multiple organ failure, on average trauma and disease each account for roughly one half of amputation cases [1,4].

Modern prosthetics largely relies upon technological support to improve the accuracy of design and, ultimately, to enhance the quality of rehabilitation, taking into account the individual characteristics of the patient. Current technologies allow for highly automated prosthetic design procedures based on the geometry and properties of human organs obtained by 3D scanning [5] or computed tomography (CT) scanning [6], followed by prosthetic model reconstruction. In addition to the design itself, adequate 3D models allow for running biomechanical simulations, leading to optimization strategies based on objective visual data sources and numerical mathematics [7]. In some cases, appropriate model design requires a combination of different technologies: in contrast to 3D scanning, CT and magnetic resonance imaging (MRI) are also capable of visualizing soft tissues [8,9]. Thus, modern scanners, combined with digital models and mathematical simulations, can predict the behavior of a patient's organs at both micro [10] and macro [11] scales, as well as over different time spans [12].

Modern approach to the design of receiving sockets for lower limb prostheses typically consists of the following stages:


3D scanning involves recording the coordinates of surface points [13] using a 3D scanner or recording device, followed by a reconstruction procedure used to obtain a 3D-model of a limb segment. One of the key stages in 3D-model processing is the orientation of the model in 3D Cartesian coordinates.

Among recent approaches to the automation of the 3D-model orientation problem, the use of various mathematical filters for the reconstruction of a 3D-model from 2D X-ray images can be mentioned [14]. A rotation matrix is used for the automatic orientation of the model in 3D space. However, this approach is limited to certain specific 3D-model rotation angles.

Another approach, based on a virtual environment that is used to automatically select the position of the 3D-model in three-dimensional space, is considered in [15]. The key disadvantage of this particular method is that it limits possible 3D-model orientations to a set of specified templates.

In another recent work [16], the orientation of a 3D image of the scapula is based on preliminarily fixed markers that are further compared against a control model. The disadvantage of this method is that the expert needs to place the markers in the 3D-model manually.

A commonly applied solution is the web-based program Rapid Plaster (PVA Med, New York, NY, USA), developed for the design of prosthetic sockets. It allows users to work with a 3D-model in a conventional browser and convert it using standard tools for working with 3D models, although it again lacks a component for automated orientation in 3D space [17].

Current classification of socket types, their common advantages and limitations from multiple viewpoints, as well as an overview of the keynote parameters affecting the stump–socket interface and influencing the comfort and stability of the limb prostheses such as displacement, stress, volume and temperature fluctuations for different positioning and orientation scenarios are reviewed in [18].

A physically motivated reduced model of the dynamic interactions between the residual limb and the prosthetic socket proposed in [19] is capable of the quantitative assessment and simulation of the stress distribution for different variants of the socket orientation and positioning as well as its possible alterations depending on the corresponding variation in the friction coefficients.

Another recent knowledge-based approach to the design and orientation of lower limb prostheses, with a particular focus on 3D modeling of the socket, is based on the acquisition and formalization of knowledge related to the prosthesis manufacturing process, as well as on the architecture of a dedicated knowledge-based engineering framework detailing the key design steps. A computer-aided module named socket modeling assistant (SMA) represents a virtual laboratory where the socket prototype is created and positioned based on the digital model of a patient's residual limb. The SMA software acts as an interactive tool to guide and support the expert socket designer during each step from socket design to its positioning and orientation, either in an automatic or in a semi-automatic fashion [20].

The authors of [21] focus on untangling the complexity of the transtibial prosthetic socket fit by selecting the key characteristics that guarantee its successful fitting, as well as finding criteria for the optimized selection of a particular prosthetic socket type for different positioning and orientation scenarios. Based on the analysis of the activity levels and reported satisfaction of active persons, especially among younger persons mainly with a traumatic cause of amputation, they conclude that total surface-bearing sockets generally outperform patellar tendon-bearing sockets, as they can be better adjusted to the lower limb stump in order to withstand the dynamic loads typically associated with physical activity.

Socket biomechanics, socket pressure measurement, friction-related phenomena and associated properties, with corresponding computational models describing the limb tissue responses to the external mechanical loads and other physical conditions for different socket positioning and orientation scenarios are the key focus of [22]. Further advancement of this research is associated with the embedded sensors enabling direct measurements of the physical stresses applied to the socket [23]. The results of this study indicate that the direction and the angle of rotation of the stump could be obtained by decoding the magnetic field signals obtained by magnetic sensors embedded into a prosthetic socket. This pilot study provides important guidelines for the development of a practical interface between the residual bone rotation and the prosthesis for control of prosthetic rotation.

The conventional approach to the prosthetic socket orientation includes splitting the stump 3D-model into sections according to a discrete grid in the horizontal plane followed by obtaining the resulting dimensions from the cross-section perimeters. Accordingly, registration of the cross-section perimeters while the 3D model is positioned at the wrong angle to the horizontal plane leads to incorrect measurements. Thus, it is essential to create an algorithm that provides at least preliminary automated orientation of the 3D-model based on objective measurement data. Due to high variability in the stump characteristics and the individual shape of each stump, especially in each congenital malformation case, in certain scenarios only preliminary conclusions can be made automatically, and expert attention will, nevertheless, be required. However, even in this scenario, an automatic decision support system (DSS) could save an expert's time and provide at least preliminary positioning.

In this work, we propose an automated orientation algorithm based on the adjustment of the 3D-model virtual anatomical axis of the tibia along with the vertical axis of the rectangular coordinates in three-dimensional space. The suggested algorithm is implemented, tested for performance and experimentally validated by explicit comparisons against an expert assessment. We believe that the proposed solution could be useful as a DSS for the prosthetic expert support.

The automated system development is based on the algorithm of orientation of the digital geometric model of the lower limb stump in an automatic mode, and consists of the following steps:


#### **2. Materials and Methods**

The anonymous experimental data for the study were obtained from the Federal Scientific Center of Rehabilitation of the Disabled named after G.A. Albrecht. The study was performed in accordance with the ethical standards presented in the Declaration of Helsinki. The study protocol was reviewed and approved by the local expert collegiate council before the beginning of the study. All patients provided their written informed consent prior to their inclusion in the study.

The study focused on the patients with unilateral, bilateral and multiple amputation defects with various causes of amputation defects, with one lower limb having one defect at the lower leg level. The following patients were excluded from enrollment in the study: those with skin defects such as non-healing wounds; those with trophic ulcers; those exhibiting tremor of the extremities due to various diseases; those with protrusion of bone sawdust under the skin that prevents prosthetics; those suffering from either acute myocardial infarction or acute cerebrovascular accident within 4 months prior to the study; those exhibiting mental illness in the acute stage; those with contagious infectious diseases; and those with complication of the main and/or concomitant disease with the appointment of bed rest.

Among seven patients selected for detailed investigation, four were male and three female, aged between 45 and 58 years old.

Mathematical analysis and functional programming tools (Python 3.3.7 with the Mesh library, www.python.org) were used for visual data processing, model design and computer simulations. The loop optimization method was used to select the points with maximum displacement [24]. A multidimensional regression method was used for 3D-model cross-section fitting [25]. For 3D-model orientation, quaternions [26] and rotation matrices [27] were used.

The developed algorithm (see Algorithm 1) performs an automatic orientation of the 3D-model. The key steps of the algorithm are as follows:

#### **Algorithm 1**


In the first step, a digital representation obtained by a 3D scanner is imported as an array of coordinates containing the points $r_i = (x_i, y_i, z_i)$. Once imported, all entries are sorted according to their coordinates along the Z axis. Figure 1 illustrates the visual representation of the input model, where x, y, z are the Cartesian axes, while a is the virtual anatomical axis. The spatial orientation of the model is determined by the angles between the projections of the anatomical axis a on the xy, xz and zy planes.

Next, the points $r_i = (x_i, y_i, z_i)$ characterized by the maximum and minimum Z coordinates (step 2 of the algorithm) are selected as:

$$
\vec{r}_{\max} = \max_{z} \vec{r}_i \tag{1}
$$

$$
\vec{r}_{\min} = \min_{z} \vec{r}_i \tag{2}
$$

To enable the rotation of the digital 3D model, we employed a set of transformations following the rotation matrices approach. In 3D space rotation around the Z-axis is described by the matrix transformation.

$$R_z(\theta) = \begin{pmatrix} \cos(\theta) & \sin(\theta) & 0 \\ -\sin(\theta) & \cos(\theta) & 0 \\ 0 & 0 & 1 \end{pmatrix} \tag{3}$$

**Figure 1.** Visual representation of the input model.

Similarly, rotation around the X-axis has the form:

$$R_x(\varphi) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos(\varphi) & \sin(\varphi) \\ 0 & -\sin(\varphi) & \cos(\varphi) \end{pmatrix} \tag{4}$$

Rotation around the Y-axis is given by:

$$R_y(\omega) = \begin{pmatrix} \cos(\omega) & 0 & -\sin(\omega) \\ 0 & 1 & 0 \\ \sin(\omega) & 0 & \cos(\omega) \end{pmatrix} \tag{5}$$

Next, the coordinates of the transformed point $r_i^*$ can be written as a series of consecutive transformations:

$$r_i^* = R_z(\theta) \cdot R_x(\varphi) \cdot R_y(\omega) \cdot r_i^T \tag{6}$$

The resulting transformation matrix can be interpreted as a matrix of the direction cosines between the old and new coordinate systems.

$$r_i^* = R_z(\theta) \cdot R_x(\varphi) \cdot R_y(\omega) \cdot r_i^T = M \cdot r_i^T \tag{7}$$

The described approach has been used both for the primary and for the secondary orientation of the model (steps 3 and 8 of the algorithm).
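The rotation matrices (3)–(5) and their composition (6)–(7) can be sketched in Python; NumPy here stands in for whatever numerical tooling was actually used, and the angle values are arbitrary:

```python
import numpy as np

def rot_z(theta):
    """Rotation about the Z axis, matching the sign convention of Equation (3)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]])

def rot_x(phi):
    """Rotation about the X axis, Equation (4)."""
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, s], [0.0, -s, c]])

def rot_y(omega):
    """Rotation about the Y axis, Equation (5)."""
    c, s = np.cos(omega), np.sin(omega)
    return np.array([[c, 0.0, -s], [0.0, 1.0, 0.0], [s, 0.0, c]])

# Combined direction-cosine matrix M of Equation (7), applied to a point r_i
M = rot_z(0.1) @ rot_x(0.2) @ rot_y(0.3)
r = np.array([1.0, 2.0, 3.0])
r_star = M @ r

# A rotation preserves vector length
print(np.allclose(np.linalg.norm(r_star), np.linalg.norm(r)))  # True
```

Applying `M` to every row of the imported point array orients the whole model in one matrix product.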

The normalized vector $\vec{r}_{\text{dir}}$ indicating the direction that connects these points relative to the global coordinate system can be expressed as:

$$
\vec{r}_{\text{dir}} = \frac{\vec{r}_{\max} - \vec{r}_{\min}}{\|\vec{r}_{\max} - \vec{r}_{\min}\|} \tag{8}
$$
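Equations (1), (2) and (8) amount to selecting the extreme points along Z and normalizing their difference. A minimal sketch with made-up points:

```python
import numpy as np

# Hypothetical point cloud standing in for a scanned stump model: rows are (x, y, z)
points = np.array([[0.2, 0.1, 0.0],
                   [0.4, 0.3, 5.0],
                   [0.1, 0.2, 9.0]])

# Equations (1) and (2): points with maximum and minimum Z coordinate
r_max = points[np.argmax(points[:, 2])]
r_min = points[np.argmin(points[:, 2])]

# Equation (8): normalized direction vector between the extreme points
r_dir = (r_max - r_min) / np.linalg.norm(r_max - r_min)
print(np.isclose(np.linalg.norm(r_dir), 1.0))  # True
```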

Figure 2 shows the cross-sections of the 3D-model before (2A) and after (2B) arrangement.

**Figure 2.** Cross-sections: (**A**) points before arrangement, (**B**) points after arrangement.

Following the arrangement of the points constituting the cross-section, a closed contour is selected (step 5 of the algorithm). The creation of the closed contour starts from a reference point and proceeds following the oriented direction obtained at each j-th step from:

$$D_j = \frac{\|r_0 - r_i\| \cdot \|r_0\| \cdot \|r_i\|}{(r_0, r_i)} \tag{9}$$

The arrangement of this array according to the variable $D_j$ defines an ordered closed set of points, as exemplified in Figure 2B, where r is the array of coordinates containing the points $r_i = (x_i, y_i, z_i)$.
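One simple way to realize this ordering step is to sort the cross-section points by their oriented angle around the section center. Whether this matches the $D_j$ criterion of (9) exactly is an assumption, but it produces the same kind of closed contour:

```python
import numpy as np

# Hypothetical unordered cross-section points in the XY plane
pts = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0], [0.7, 0.7]])

# Order points by the oriented angle around the cross-section center,
# yielding a closed contour traversed in one direction
center = pts.mean(axis=0)
angles = np.arctan2(pts[:, 1] - center[1], pts[:, 0] - center[0])
contour = pts[np.argsort(angles)]
```

Closing the contour then only requires appending the first point after the last one.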

At step 6, the central (pivot) points are obtained by averaging the coordinates of the points in each of the cross-sections specified by the operator. Next, at step 7, a straight line through the central points (obtained at step 6) is fitted using the multidimensional regression method. In particular, the error ε is minimized in the model:

$$\mathbf{Y} = \mathbf{Z}\mathbf{b} + \mathbf{d} + \varepsilon \tag{10}$$

where Y contains the z-coordinates from the dataset r, Z contains the x- and y-coordinates from the dataset r, b and d are free model parameters, and ε is the random error. Problem (10) can be resolved using the least mean squares (LMS) method:

$$\mathbf{S} = \sum \left( \mathbf{Y} - \mathbf{Z}\mathbf{b} - \mathbf{d} \right)^{2} \to \min \tag{11}$$

thus yielding:

$$\begin{cases} \sum (\mathbf{Y} - \mathbf{Z}\mathbf{b} - \mathbf{d})Z = 0\\ \sum (\mathbf{Y} - \mathbf{Z}\mathbf{b} - \mathbf{d}) = 0 \end{cases} \tag{12}$$
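The regression model (10)–(12) reduces to an ordinary least-squares problem. A sketch with synthetic pivot points whose z-coordinate is exactly linear in x and y (so the fit is exact by construction):

```python
import numpy as np

# Hypothetical cross-section pivot points: z = 2x + 3y + 1 by construction
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 0.5, 2.0, 1.5, 3.0])
z = 2.0 * x + 3.0 * y + 1.0

# Model (10): Y = Z b + d, solved in the least-squares sense of (11)
A = np.column_stack([x, y, np.ones_like(x)])
coef, *_ = np.linalg.lstsq(A, z, rcond=None)
print(np.allclose(coef, [2.0, 3.0, 1.0]))  # True
```

With noisy real pivot points the same call returns the parameters minimizing the squared residual S of Equation (11).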

Finally, the cross-section levels of the lower limb stump in the 3D-model are obtained, and the entire 3D-model is aligned along the Z-axis (see Figure 3) before being exported.

**Figure 3.** Three-dimensional (3D) model with cross-sections and fitting line that passed through its centers.

Figure 4 shows the model appearance after the orientation using the designed algorithm.

**Figure 4.** Visual representation of the oriented 3D model.

#### **3. Results**

The validation of the proposed algorithm was performed using experimental data obtained by 3D scanning followed by 3D model reconstruction for seven different patient cases, as summarized in Table 1. To test the algorithm under different initial conditions, each 3D model was randomly oriented prior to the test. Since there is no "gold standard" for an automated algorithm in this field, the results of the automatic orientation algorithm were compared against manual positioning by an experienced prosthetic expert.

**Table 1.** Summary of validation results based on experimental measurements from seven different patients compared against prosthetics expert assessment.


For the quantitative comparison between the automated and the expert manual positioning, two pivot points, one on the distal and one on the proximal planes, were selected, and a straight line connecting these two points was fitted. Once the model orientation was performed independently by the automated algorithm and manually by the expert, the two spatially oriented models were matched by the point in the distal plane. Three projections of the straight line on the xy, xz and zy planes were obtained for both automated and manually oriented models, and angles between these projections in each plane were used for the quantitative comparison between the two approaches, as summarized in Table 1. In the above procedure, the exact locations of the pivot points are not critical for the accuracy of the comparison, since they are used for measuring relative orientations only.
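The comparison of projection angles can be sketched as follows; the helper function and the axis directions are illustrative, not the authors' implementation:

```python
import numpy as np

def projection_angle(u, v, plane):
    """Angle (degrees) between the projections of directions u and v onto a plane."""
    idx = {"xy": [0, 1], "xz": [0, 2], "zy": [2, 1]}[plane]
    pu, pv = np.asarray(u)[idx], np.asarray(v)[idx]
    cosang = np.dot(pu, pv) / (np.linalg.norm(pu) * np.linalg.norm(pv))
    return float(np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0))))

# Hypothetical fitted axis directions: automatic algorithm vs expert orientation
auto = np.array([0.05, 0.02, 1.0])
expert = np.array([0.00, 0.00, 1.0])
angle_xz = projection_angle(auto, expert, "xz")  # roughly 2.9 degrees here
```

Only relative orientations enter the angle, which is why the exact pivot point locations do not affect the comparison.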

Although in this work we focused on the lower limb stump orientation problem, the proposed algorithm appears to be a universal tool for 3D-model orientation. We have implemented the algorithm using the Python programming language and the Mesh library (the source code is available as Supplementary Material). The developed software module follows the algorithm represented below in pseudocode (see Algorithm 2). The algorithm contains two key verification conditions: the first checks whether the virtual anatomical axis of the tibia coincides with the vertical Z-axis, while the second triggers if the model is positioned with the stump end up after orientation. Since the construction of the cross-sections depends explicitly on the anatomical characteristics of the lower limb stump, the range is specified by the operator individually for each 3D-model.

The axis of the model should coincide with the Z-axis, since this orientation is required for more intuitive manual positioning by the expert, who is accustomed to the traditional technology in which the stump cast is adjusted while installed vertically. After the manual adjustment is performed, the entire model can be re-positioned and re-oriented as required to match the actual biomechanical axis.

The angle between the virtual axis along the tibia and the vertical Z-axis is affected by the angle of the 3D-model slice after scanning. Figure 5 compares the models obtained after orientation, where (a) is an expert-oriented 3D-model, while (b)–(f) are variants of the 3D-model oriented by the proposed algorithm, with the femoral end above the knee gradually shortened. The data obtained show that the shorter the femoral part of the 3D-model, the smaller the deviation of the virtual anatomical axis of the tibia from the vertical Z-axis.

**Figure 5.** Comparison of models obtained after orientation: (**a**) an expert oriented 3D-model, (**b**–**f**) 3D-models oriented by the proposed algorithm.

#### **Algorithm 2**


Consequently, the input requirements for the algorithm are: (i) the 3D-model scan length above the knee and (ii) the angle of inclination of the cross-section plane of the scan above the knee to the anatomical axis of the femur.

For comparison of the inclination angles, the input models can be marked with reference points in order to analyze the deviation of the virtual anatomical axis of the tibia from the Z-axis after orientation by the algorithm and by the expert. In Figure 6, a straight line drawn through the reference points allows the results to be compared visually.

**Figure 6.** Inclination angle calculation.

The performance of the algorithm depends explicitly on the number of polygons. The dependence of the analysis time on the number of polygons in the model is shown in Figure 7.

**Figure 7.** Performance as a function of polygon number in the 3D model.

The presented dependence can be reasonably fit by a second-order polynomial. It was found that a significant simplification of the original model, leading to a six-fold reduction of the number of polygons from 120,000 to 20,000, reduces the algorithm's operation time by approximately 1.5 orders of magnitude. Thus, the calculation time can be reduced by any measures that decrease the polygon count, such as smoothing of the 3D-model surfaces.
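The quadratic dependence can be reproduced with a standard polynomial fit. The timing values below are synthetic, chosen to lie exactly on a second-order curve, not the measurements behind Figure 7:

```python
import numpy as np

# Hypothetical (polygon count in thousands, analysis time in seconds) measurements,
# generated to lie exactly on t = 0.005 * p^2
polygons_k = np.array([20.0, 40.0, 60.0, 80.0, 100.0, 120.0])
time_s = 0.005 * polygons_k**2  # [2, 8, 18, 32, 50, 72]

# Fit a second-order polynomial, as the dependence in Figure 7 suggests
coeffs = np.polyfit(polygons_k, time_s, deg=2)
predict = np.poly1d(coeffs)
```

Evaluating `predict` at intermediate polygon counts then interpolates the expected analysis time.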

#### **4. Discussion**

Based on the results obtained, the developed algorithm provides relatively good agreement with the expert assessment, typically within 10 degrees of the vertical axis in any direction, as revealed by comparison of the fitting line obtained by the automatic algorithm with a similar line fitted for the model oriented manually by the expert. The most pronounced deviations were observed in the third section for the fifth patient, which could be attributed to the specific anatomical features of the stump. As indicated in Table 1, no relevant results were obtained for the second patient, as the attempted analysis resulted in an inverted positioning of the 3D model. A possible reason is that the angle of inclination of the 3D-model cross-section plane above the knee to the virtual anatomical axis of the femur was untypically sharp, so that the input requirements were not fulfilled.

The limitations of the proposed approach include limited automation, as the expert has to define the position of the proximal edge of the stump to determine the location of the upper section. We also note that the algorithm in its present implementation relies upon the assumption that the virtual anatomical axis is orthogonal to the sagittal plane, which is only approximately fulfilled, especially for longer models; see Figure 6 for visual reference.

In our opinion, the above algorithm, taking into account all its inherent limitations, can nevertheless be used as a decision support tool for the prosthetic expert and 3D designer, considerably reducing the workload of qualified personnel. Among relevant analogues, the work [28] should be mentioned, where 2D cross-sections of upper and lower limb 3D-models were used to create a "bone structure". The sections were created by placing any number of parallel planes inside the 3D-model and calculating the corresponding intersections between the 3D-model and each plane. This "bone structure" can be used to determine the orientation of the scanned model in 3D space in order to create additional cross-section images, as well as to obtain basic information about the shape of the hand or foot or missing fingers, which can be used to create a 3D-model of the prosthesis. However, obtaining accurate dimensional information about the scanned 3D-model requires additional input data from the user to determine the model's scale. In our opinion, this is a limitation of that approach, as preprocessing of the 3D-model by an expert is required, increasing the workload and potentially reducing the objectivity of the initial model orientation.

As an outlook, we believe that further optimization could make the dependence of performance on the number of polygons in the 3D-model less pronounced. For that, an algorithm module that orients the 3D-model in a horizontal plane using the quaternion method for rotating the array of 3D-model points appears to be a promising solution. Since recent technological advancements have enabled the estimation and simulation of the distribution of stresses applied to the residual limb tissues, as well as of the volume fluctuations affecting the stump over time and the temperature variations influencing the residual tissues for different variants of socket design, a more automated functional socket design may be developed in the near future. Moreover, recent advancements in prosthetic technologies suggest that future directions could be associated with advanced socket designs capable of self-adaptation to the complex interplay of factors affecting the stump, under both static and dynamic loads, as a replacement for the current fixed socket-orientation scenario [18].
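The quaternion rotation mentioned as an outlook can be sketched directly; the helper below is a hand-rolled illustration of the standard formula p' = q p q*, not any particular library's API:

```python
import numpy as np

def quat_rotate(points, axis, angle):
    """Rotate an array of 3D points about a unit axis by angle (radians)
    using the quaternion sandwich product p' = q p q*."""
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    w = np.cos(angle / 2.0)          # scalar part of the unit quaternion
    xyz = np.sin(angle / 2.0) * axis  # vector part
    # q p q* expanded via the double cross-product identity
    t = 2.0 * np.cross(xyz, points)
    return points + w * t + np.cross(xyz, t)

pts = np.array([[1.0, 0.0, 0.0]])
# Rotating (1, 0, 0) by 90 degrees about the Z axis yields (0, 1, 0)
out = quat_rotate(pts, [0.0, 0.0, 1.0], np.pi / 2)
print(np.allclose(out, [[0.0, 1.0, 0.0]]))  # True
```

Because the whole point array is rotated in a few vectorized operations, this formulation avoids building and multiplying per-axis rotation matrices, which is the efficiency argument behind the outlook above.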

#### **5. Conclusions**

To summarize, we proposed an automated orientation algorithm based on aligning the virtual anatomical axis of the tibia in the 3D model with the vertical axis of the rectangular coordinate system in three-dimensional space. The suggested algorithm was implemented, tested for performance, and experimentally validated by explicit comparison against expert assessment. Based on the results obtained, the developed algorithm agrees relatively well with the expert assessment, typically within 10 degrees around the vertical axis in any direction, provided the input requirements are met. In our opinion, the algorithm, despite its inherent limitations, can be used as a decision-support tool for the prosthetic expert and 3D designer, considerably reducing the workload of qualified personnel.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/2076-3417/10/9/3253/s1: Python Software Module S1: orientation script.

**Author Contributions:** Conceptualization, M.G. and V.Y.; methodology, O.S.; software, A.S.; validation, E.S., M.G. and K.S.; formal analysis, V.Y., E.S.; investigation, A.S., M.G.; resources, K.S. and O.A.M.; data curation, O.S.; writing—original draft preparation, A.S., D.K.; writing—review and editing, D.K., M.I.B.; visualization, A.S. and O.A.M.; supervision, D.K.; project administration, M.I.B.; funding acquisition, M.I.B. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Ministry of Science and Higher Education of the Russian Federation under assignment No. 0788-2020-0002. The APC was funded by the Ministry of Science and Higher Education of the Russian Federation under assignment No. 0788-2020-0002.

**Acknowledgments:** The authors would like to thank the staff of the Federal Scientific Center of Rehabilitation of the Disabled named after G.A. Albrecht and the Regional Scientific and Educational Mathematical Center at Kazan Federal University for assistance.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **Improving Calculation Accuracy of Digital Filters Based on Finite Field Algebra**

**Dmitry Kaplun <sup>1</sup> , Sergey Aryashev <sup>2</sup> , Alexander Veligosha <sup>3</sup> , Elena Doynikova <sup>4</sup> , Pavel Lyakhov 1,5 and Denis Butusov 1,6,\***


Received: 18 November 2019; Accepted: 17 December 2019; Published: 19 December 2019

**Featured Application: Nowadays, digital filters are widely used in receivers of different software-defined radio (SDR) communication systems. The main factor affecting the development of SDR is the characteristics of analog-to-digital and digital-to-analog converters. SDR technology allows us to replace existing and newly developed designs of heterodyne receivers and transceivers with a limited number of hardware units controlled by specialized software. This helps to simplify the designs, make them cheaper, improve their performance, and support any modulation type. Furthermore, such an approach can be fruitful in signal reception and demodulation for different types of digital modulation such as DPSK, QAM, GMSK, etc. The main operations in the SDR receiver are heterodyning and filtering, which are performed digitally. In this case, digital filtering determines almost all parameters of the output channel of such a receiver. Therefore, the parameters of digital filters in these receivers have to meet stricter requirements, including the accuracy of calculations and hardware costs. The use of finite field algebra can significantly increase the accuracy of calculations in the digital filters of such receivers and reduce hardware costs.**

**Abstract:** Applications of digital filters based on finite field algebra codes require their conjugation with positional computing structures. This raises the task of developing algorithms and structures for converting positional notation codes to finite field algebra codes. The paper proposes a code conversion method that possesses several advantages over existing methods. The possibilities and benefits of optimizing the structure of the computational channel of a digital filter operating on finite field algebra codes are shown. A modified structure of the computational channel is introduced; it differs from the traditional structure in that it contains no explicit code converter. The main principle is that "reference" values of input samples, free from the error of the analog-to-digital converter, are used as input samples. The proposed approach allows achieving a higher quality of signal processing in advanced digital filters.

**Keywords:** digital filter; finite field algebra; conversion device; module; memory device; residue

#### **1. Introduction**

For the effective implementation of digital signal processing (DSP) algorithms, and digital filters (DF) in particular, number-theoretic methods based on prime numbers [1–3] are of great importance. Many of these methods allow parallel computing; thus, research on the theory and application of number systems with a parallel structure is of particular interest for DSP, image processing systems, cryptographic systems, quantum automata, neurocomputers, massively concurrent operations, cloud computing, etc. [4–9]. Such systems are well suited for parallel computing. One of the most fruitful research areas here is finite field algebra, which provides DSP systems with an impressive level of internal parallelism [10].

Recent studies in designing DSP computing devices based on finite field algebra (FFA) have shown that the FFA has superior potential for improving the performance and reliability of numerical information processing compared with the traditional positional numeral system (PNS) [11].

Since the FFA is an integer numeral system, it is possible to represent the processed data in DSP devices with arbitrary accuracy. From [11,12], it is known that the FFA advantages appear most clearly when tabular schemes are used. Therefore, as integrated-circuit technology improves (e.g., the production of storage devices with high information density), and with it the technical basis of the tabular computational method, the efficiency of using FFA codes steadily increases.

Most popular designs of computing devices operating in FFA codes, despite their different specializations, are focused on implementing computational processes of the same type: sequences of arithmetic operations (addition and multiplication) on integers. This motivates research into FFA implementations of highly efficient digital filtering algorithms. The following advantages of the FFA can be outlined from [11–13]:

1. Independent formation of a number's digits: each digit carries information about the entire original number rather than about an intermediate result produced while forming lower-order digits. This makes independent parallel processing of the digits possible.

2. The low bit width of the residues representing a number, which follows from the small number of possible code combinations. This allows tabular arithmetic, where typical arithmetic operations are reduced to simply selecting the result of calculations from a table (memory device).

3. The FFA has natural corrective abilities: FFA codes allow efficient detection and correction of errors during signal transmission and arithmetic operations [14].

Considering the above, we can conclude that it is advisable to use the FFA for the synthesis of DF with the required quality indicators. The advantages of the FFA over the PNS make it possible to provide the required frequency and accuracy characteristics of filters and real-time digital signal processing [13]. Modern digital signal processing devices (for example, digital receivers) impose strict requirements on signal processing quality. The analysis carried out in [12,13] showed that using the FFA allows one to achieve the required signal processing quality. Let us give some preliminaries first.

#### **2. The Basics of Operations on Numbers in the FFA**

The theoretical basis of the FFA is the theory of congruences. Two integers, *A*<sup>1</sup> and *A*2, that leave the same residue when divided by the modulus *p* are called congruent modulo *p*, and the congruence relation takes the following form:

$$A\_1 \equiv A\_2 \pmod{p} \tag{1}$$

From the standpoint of congruence, only the residue α is used; it is obtained by dividing the number *A* by the modulus *p*. Thus, the following congruence is true:

$$
\alpha \equiv A \bmod p
\tag{2}
$$

Finding the residue is the transformation of the number *A* modulo *p*. The operation of determining the residue is performed by the following rule [11]:

$$\forall A \in \mathbb{Z}: \ \left| A \right|\_{p}^{+} = A - \left\lfloor A/p \right\rfloor p \tag{3}$$

The residues of the number *A* modulo *p* belong to the set α ∈ {0, 1, 2, . . . , *p* − 1}. In calculations, a congruence that establishes a relationship between integer classes having the same residue can always be replaced with an equality involving this residue. For example, if

$$A + B \equiv C \pmod{p} \tag{4}$$

Then congruence (4) can be written as follows:

$$(A+B)\bmod p = C \bmod p\tag{5}$$

Calculations with residues are rather simple, since residues never take values greater than *p* − 1 [11,12]. Therefore,

$$\begin{array}{l}(A \pm B) \bmod p = ((A \bmod p) \pm (B \bmod p)) \bmod p\\(A \cdot B) \bmod p = ((A \bmod p) \cdot (B \bmod p)) \bmod p\end{array} \tag{6}$$

Therefore, in any sequence of multiplications, additions, and subtractions, the result of each step can be replaced by its residue. A number in the FFA is represented by its smallest non-negative residues α*<sup>i</sup>* with respect to a system of pairwise coprime moduli *p<sup>i</sup>* in the following form:

$$A(\alpha\_1, \alpha\_2, \dots, \alpha\_n), \quad \alpha\_i = |A|\_{p\_i}^{+}, \; \forall i \in [1, n]. \tag{7}$$

Addition, subtraction, and multiplication of two numbers *A* and *B* can be performed by adding, subtracting, or multiplying the residues α*<sup>i</sup>* and β*<sup>i</sup>* for each modulus *p<sup>i</sup>* independently. If the value *P* is chosen as the product of the moduli *p<sup>i</sup>*, then operations on large numbers can be performed in such a system using a large number of small moduli *p<sup>i</sup>*. The value *P* determines the complete range of number representation in the FFA code.
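The channel-wise rules of Equation (6) can be illustrated with a short sketch; the base system (5, 9, 13) matches the example used later in Section 4, and all function names are illustrative:

```python
# Residue arithmetic sketch: with pairwise coprime moduli, addition and
# multiplication act independently on each residue channel (Equation (6)).
MODULI = (5, 9, 13)          # example base system, P = 5 * 9 * 13 = 585

def to_ffa(a):
    """Map an integer to its tuple of residues (the FFA code)."""
    return tuple(a % p for p in MODULI)

def add_ffa(x, y):
    """Channel-wise modular addition of two FFA codes."""
    return tuple((xi + yi) % p for xi, yi, p in zip(x, y, MODULI))

def mul_ffa(x, y):
    """Channel-wise modular multiplication of two FFA codes."""
    return tuple((xi * yi) % p for xi, yi, p in zip(x, y, MODULI))

a, b = 123, 404
# Channel-wise results agree with the positional results reduced mod p_i.
assert add_ffa(to_ffa(a), to_ffa(b)) == to_ffa(a + b)
assert mul_ffa(to_ffa(a), to_ffa(b)) == to_ffa(a * b)
```

Each channel touches only numbers smaller than its modulus, which is exactly what makes the table-lookup implementation discussed below feasible.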

The following identity can be written [11]:

$$\forall A \in [0, P\_n): \quad A = \left| \sum\_{i=1}^{n} |A|\_{p\_i}^{+} \, m\_i \, P\_{in} \right|\_{P\_n} \tag{8}$$

where *P<sub>in</sub>* = *P<sub>n</sub>*/*p<sub>i</sub>* is the product of all moduli except *p<sub>i</sub>*, and *m<sub>i</sub>* is the multiplicative inverse of *P<sub>in</sub>* modulo *p<sub>i</sub>*.

The identity (8) is the basis for generating the finite field arithmetic code. If a fixed series of positive integers *p*1, *p*2, . . . , *p<sup>n</sup>* is taken as the moduli, then finite field arithmetic (the system of residual classes) is such a nonpositional number system in which any positive integer is represented as the set of residues obtained by dividing that number by the selected bases of the system, as follows:

$$A(\alpha\_1, \alpha\_2, \dots, \alpha\_n) \tag{9}$$

where α1, α2, . . . , α*n* are the smallest non-negative residues of the number modulo *p*1, *p*<sup>2</sup> . . . , *pn*, respectively. The residues α*<sup>i</sup>* for the selected moduli are formed as follows:

$$\alpha\_{i} = \operatorname{rest} A \ (\operatorname{mod} p\_{i}) = A - \left\lfloor \frac{A}{p\_{i}} \right\rfloor p\_{i} \quad (\forall i \in [1, n]) \tag{10}$$

where ⌊*A*/*pi*⌋ is the integer quotient and the bases *pi* are pairwise coprime numbers.

In number theory, it is proved [11,12] that if the moduli are pairwise coprime, i.e., (*p<sup>i</sup>* , *p<sup>j</sup>*) = 1 for *i* ≠ *j*, then the representation (10) is unique for 0 ≤ *A* < *Pn*, where *P<sup>n</sup>* = *p*<sup>1</sup> · *p*<sup>2</sup> · . . . · *pn* is the number range; that is, there is exactly one number *A* for which

$$\begin{cases} \quad A \equiv \alpha\_1(\text{mod } p\_1) \\ \quad A \equiv \alpha\_2(\text{mod } p\_2) \\ \quad \cdots \\ \quad A \equiv \alpha\_n(\text{mod } p\_n) \end{cases} \tag{11}$$
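The system of congruences (11) is solved by the Chinese remainder theorem, which underlies identity (8); a minimal sketch, with function names of our own (the `pow(x, -1, p)` modular inverse requires Python 3.8+):

```python
def crt(residues, moduli):
    """Solve A = alpha_i (mod p_i) for pairwise coprime moduli (system (11)).
    Returns the unique A in [0, P) with P = p_1 * ... * p_n."""
    P = 1
    for p in moduli:
        P *= p
    a = 0
    for alpha, p in zip(residues, moduli):
        Pi = P // p                  # product of the other moduli
        inv = pow(Pi, -1, p)         # multiplicative inverse of Pi modulo p
        a = (a + alpha * Pi * inv) % P
    return a

# 100 has residues (0, 1, 9) for the moduli (5, 9, 13); CRT recovers 100.
A = crt((100 % 5, 100 % 9, 100 % 13), (5, 9, 13))
```

This reverse conversion is exactly the step that, as noted below, dominates the calculation time of modular computing devices, which motivates avoiding it where possible.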

Thus, it can be concluded that it is advisable to use methods of organizing calculations based on representing the processed data in FFA codes in digital filtering algorithms. It should be noted that this task is solved on the basis of a systematic approach. Namely, for the synthesis of DF operating on the basis of FFA codes, a number of tasks should be solved, including:


Currently, there are numerous studies devoted to solving the abovementioned tasks and other problems in DSP. Their solution will allow one to fully utilize the advantages of the FFA and ensure efficient signal processing in the digital filters [12,13]. In Figure 1, the variant of a simplified structural diagram of DF operating in the FFA codes is represented.

**Figure 1.** The variant of simplified structural diagram of digital filters (DF) operating in the finite field algebra (FFA) codes.

#### **3. Digital Filters in the FFA**

Here we describe an implementation of the converter from positional code to the FFA code and consider the possibility of excluding the influence of analog-to-digital converter errors on the filter output sample.

Analysis of modern implementations of computational algorithms in the FFA codes shows that the reverse conversion to positional representation takes more than 50% of the total calculation time [11,12].



The insufficient development of the theoretical foundations for constructing code converters and, consequently, of methods and means for their implementation becomes a bottleneck in the entire cycle of development and implementation of DF computing devices operating on the basis of the FFA, leading to the loss of its advantages.

In [13,16,17], the operation algorithms and hardware implementations of devices for interfacing positional and FFA computing devices are reviewed. The considered devices determine the residue using the cyclical nature of residues. Following Equation (1),

$$|\alpha\_i|\_{p\_i}^+ \in \{0 \dots p\_i - 1\},\tag{12}$$

the residue α*<sup>i</sup>* repeats *d* times within the range of convertible numbers, with a repetition cycle that depends on the modulus *p<sup>i</sup>*. In other words, the value *d* can be determined from Equations (13) and (14).

$$d = \lfloor \frac{2^s}{p\_i} \rfloor, \text{ for } \alpha\_i \in \{0, \dots, p\_i - 1\} \tag{13}$$

$$d = \lfloor \frac{2^s}{p\_i} \rfloor + 1, \text{ for } \alpha\_i \in \{0, \dots, g\} \tag{14}$$

where *g* is the residue of dividing the number 2*<sup>s</sup>* by *p<sup>i</sup>*, and *s* is the bit width of the number being converted to the FFA code. In accordance with Equations (13) and (14), the residue α*<sup>i</sup>* corresponds to *d* numbers from the range of 2*<sup>s</sup>*. To calculate the residue α*<sup>i</sup>* of the initial number *A*, one must uniquely determine to which of these *d* numbers *A* belongs, i.e.,

$$A \in d \text{ for } d = \lfloor \frac{2^s}{p\_i} \rfloor, \text{ for } \alpha\_i \in \{0, \dots, p\_i - 1\}$$

$$A \in d \text{ for } d = \lfloor \frac{2^s}{p\_i} \rfloor + 1, \text{ for } \alpha\_i \in \{0, \dots, g\}$$

To this end, the range of binary numbers represented in the FFA code can be divided into subranges. The number of subranges and the values of the numbers in them are determined by the value *E* = 2<sup>*s*/2</sup>. The values in the subranges form intervals of consecutive numbers with a step equal to one, determined by the following expressions:

$$\begin{array}{c} \text{E}\_1 = 0, 1, \dots, 2^{s/2} - 1; \\ \text{E}\_2 = 2^{s/2}, 2^{s/2} + 1, \dots, 2^{\frac{s}{2} + 1} - 1; \\ \dots \\ \text{E}\_k = 2^{\left(\frac{s}{2}\right) + \left(\frac{s}{2} - 1\right)}, 2^{\left(\frac{s}{2}\right) + \left(\frac{s}{2} - 1\right)} + 1, \dots, \quad 2^s - 1. \end{array} \tag{15}$$

From expression (15), it follows that the upper *s*/2 bits of a number unambiguously determine the subrange *E<sup>i</sup>* containing it, while the lower *s*/2 bits of the number *A* determine its index within the subrange. Thus, based on the cyclicity of residues modulo *p<sup>i</sup>*, finding the residue of the number *A* reduces to determining the subrange number and reading the residue from the memory device of that subrange. Considering Equations (13)–(15), the residue-finding algorithm based on subrange determination includes the following procedures:


The block diagram of the proposed interfacing device includes a register holding the initial number *A*, 2<sup>*s*/2</sup> comparison schemes, and 2<sup>*s*/2</sup> memory devices. The proposed method forms the residue in two operating cycles of the converter. The residue formation time does not grow with the bit width of the converted source numbers.
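The subrange scheme of Equations (13)–(15) amounts to a table lookup addressed by the two halves of the number; the following is a behavioral sketch of that idea, not the hardware itself (all names are ours):

```python
# Table-lookup conversion sketch (Equations (13)-(15)): the upper s/2 bits of A
# select a subrange memory device; the lower s/2 bits address the residue in it.
def build_tables(p, s):
    half = s // 2
    # One table per subrange E_k, each holding 2^(s/2) precomputed residues.
    return [[((hi << half) | lo) % p for lo in range(1 << half)]
            for hi in range(1 << half)]

def residue_lookup(a, tables, s):
    half = s // 2
    hi, lo = a >> half, a & ((1 << half) - 1)  # subrange number, index inside it
    return tables[hi][lo]

s, p = 16, 17
tables = build_tables(p, s)
# Every 16-bit number maps to the same residue a plain mod operation gives.
assert residue_lookup(32015, tables, s) == 32015 % 17
```

The cost of this behavioral model mirrors the hardware criticism that follows: 2<sup>*s*/2</sup> tables are needed, and many residue values are stored repeatedly.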

The disadvantage of the proposed algorithm is that the hardware costs required for its implementation are high. In addition, the residue values repeat in the memory devices. This indicates an incomplete use of modular code ring properties.

While researching the proposed algorithm for forming a residue modulo *p<sup>i</sup>*, it was found that the first residue values in the subranges of expression (15) for moduli *p<sup>i</sup>* < 2<sup>*s*/2</sup> change by the value

$$C\_{s} = 2^{s/2} - p\_{i} \tag{16}$$

while for moduli *p<sup>i</sup>* > 2<sup>*s*/2</sup>, these values are constant and determined as

$$C\_{s} = 2^{s/2}.\tag{17}$$

Using these properties, the algorithm of residue determination can be described as follows:

$$\alpha\_{i} = |R + C\_{s}|\_{p\_i} \tag{18}$$

where *R* is the number determined by the *s*/2 lower bits of the number *A*, and *Cs* is the value calculated using Equation (16) and written to the memory device (MD) at the address given by the upper *s*/2 bits of the number *A*. The block diagram of the proposed converting device is provided in Figure 2.
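One way to read Equations (16)–(18) in software form: split *A* into its upper and lower halves, store a precomputed correction *Cs* for each possible upper half, and add it to the lower half *R*. The sketch below is a behavioral model under that assumption, with names of our own:

```python
# Behavioral sketch of the improved converter (Equations (16)-(18)):
# A = H * 2^(s/2) + R, so |A|_p = |R + C_s|_p with C_s = |H * 2^(s/2)|_p
# stored in a single small memory device addressed by the upper s/2 bits.
def build_correction_md(p, s):
    half = s // 2
    return [(hi << half) % p for hi in range(1 << half)]  # one entry per address

def residue_with_correction(a, md, p, s):
    half = s // 2
    hi, r = a >> half, a & ((1 << half) - 1)
    return (r + md[hi]) % p      # Equation (18): alpha = |R + C_s|_p

s, p = 16, 17
md = build_correction_md(p, s)
assert residue_with_correction(32015, md, p, s) == 32015 % 17
```

Compared with the subrange tables above, only one 2<sup>*s*/2</sup>-entry memory device is needed per modulus, which is consistent with the reduced LUT counts reported in Section 4.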



The developed algorithm forms the residue in three operating cycles of the interfacing device. Thus, we can conclude that the developed conversion method provides significantly better performance than the existing conversion methods considered in [10].

It should be noted that the algorithm supports pipelined operation, i.e., it matches the rate at which input data arrive at the DF computing device with the rate of calculating its output sample on the basis of the FFA.

**Figure 2.** The block diagram of the proposed converting device.

The abovementioned method of data representation in the FFA codes meets the performance requirements of real-time signal processing devices. However, the conversion device for data in the FFA codes complicates the overall filter structure and requires additional costs to synchronize its elements in pipelined calculation mode. This raises the question of whether the conversion device can be excluded from the structure of the DF computation channel to improve the accuracy of output sample calculation.


In [15,16] the models of calculation accuracy for output samples in a positional digital filter and in the filter operating on the basis of FFA are provided. The model of calculation accuracy in the positional DF considering that while calculating the output sample the intermediate results will be rounded off, can be represented as follows:

$$e\_{er\,com}(nT) = Q\_{ADC}(nT) + e\_{q\,coef}(nT) + e\_{q\,is}(nT) + e\_{ro\,out}(nT) + e\_{add}(nT),\tag{19}$$

where *eer com*(*nT*) is the common calculation error for the output sample; *QADC*(*nT*) is the error of the analog-to-digital converter (ADC); *eq coef*(*nT*) is the error of coefficient quantization when the coefficients are represented in the computational digital filtering algorithm; *eq is*(*nT*) is the error of input sample quantization when the samples are represented in the computational digital filtering algorithm; *ero out*(*nT*) is the error introduced by rounding off intermediate results; and *eadd*(*nT*) is the error arising because the input of each subsequent stage receives an intermediate sample that already contains an error, which accumulates as the intermediate sample "passes" through the stages of the filter.

Moreover, the common model of calculation accuracy for the output sample in a positional DF, considering that the intermediate results are truncated, can be represented as follows:

$$e\_{er\,com}(nT) = Q\_{ADC}(nT) + e\_{q\,coef}(nT) + e\_{q\,is}(nT) + e\_{tr\,out}(nT) + e\_{add}(nT) \tag{20}$$

where *etr out*(*nT*) is the error introduced by truncating intermediate calculation results in the filter links.

The computing device of a DF operating on the basis of the FFA has no such disadvantages: there are no truncation (rounding) operations on intermediate results, no additional errors, and no quantization errors of input data and filter coefficients. Therefore, no error accumulates in the filter when calculating the output sample.

Then the accuracy model of output sample calculation for the DF operating on the basis of FFA can be represented as follows [14]:

$$e\_{er\,com}(nT) = Q\_{ADC}(nT). \tag{21}$$

Analysis of Equations (19)–(21) allows concluding that the accuracy of a DF operating on the basis of the FFA is significantly higher than that of positional filters. Analysis of the errors that occur when calculating the output sample of the FFA filter during signal processing in radio channels showed that the signal-to-noise ratio (SNR) is about 68–70 dB.

If the amplitude of the input signal is less than half of the full-scale voltage of the ADC, there is an additional attenuation of about −20 dB. In this case, the SNR value is calculated as follows: SNR = 74 dB − 20 dB = 54 dB. However, since radio channels are subject to significant fading, it can be argued that the SNR value will be 40–50 dB.

Thus, it can be concluded that if the ADC error is excluded from the signal processing path, for example, from the digital receiver path (Figures 3 and 4), the SNR value can be increased.

Since, when building a digital receiver, the range of data processed in the digital signal processing device (Figures 3 and 4) is known, namely, the bit width of the ADC used, the input data can be represented in the filters' computational channel in the FFA codes without the ADC and the conversion device. Considering Equation (12), the residue values for any modulus of the selected base system do not exceed the modulus value, and their count equals the modulus value. Then a memory device recording the residue values for the selected modulus can be used as the source of input data for the DF's computational channel.
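The converter-free idea can be modeled as a residue table precomputed over all possible ADC output codes and addressed directly by the ADC; an illustrative sketch with parameter values of our own choosing:

```python
# Sketch of the converter-free channel: since the ADC bit width is fixed, the
# residue of every possible ADC output code can be precomputed at design time,
# and the ADC code merely addresses the table; no run-time conversion is done.
ADC_BITS = 12
MODULUS = 9

# Precomputed once, at design time, for every possible input sample.
RESIDUE_MD = [code % MODULUS for code in range(1 << ADC_BITS)]

def channel_input(adc_code):
    """The ADC acts only as an address generator for the residue memory."""
    return RESIDUE_MD[adc_code]

assert channel_input(585) == 585 % 9
```

One such table per modulus replaces the entire conversion device of Figure 2, which is the structural simplification shown in Figure 5.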

Therefore, the FFA ring property allows one to significantly simplify the hardware implementation of the filter's computing device, increase its performance, and eliminate the output sample calculation error. Thus,

$$e\_{\text{er com}}(nT) = 0.\tag{22}$$


In the case of such calculations, the error introduced by analog-to-digital conversion of input data is excluded from the channel of the FFA digital filter, which further increases the accuracy of calculating the filter output sample. The structure of the computational channel of a DF operating on the basis of the FFA is shown in Figure 5.


**Figure 3.** The structure of digital receiver by radio frequency.

**Figure 4.** The structure of digital receiver by intermediate frequency.

**Figure 5.** The structure of the computational channel of DF operating based on the FFA.

Thus, the data converter from positional representation to the FFA code can be excluded from the DF structure. The ADC can be used as a control device for the selection of the required residue from the memory device of the output sample. In this case, one ADC can be used for all computational channels of the FFA DF.

#### **4. Testing of DF in the FFA**

Let us evaluate the proposed solutions experimentally. The efficiency of the proposed conversion method is evaluated by comparing its conversion time with that of existing methods, measured in the number of operation cycles of the compared devices. The results are provided in Table 1.


**Table 1.** Comparative evaluation of performance of existing and developed conversion methods.

The data provided in Table 1 allow us to conclude that the proposed data conversion method is efficient and can be used to build DF operating in the FFA codes.

The existing methods for data conversion from the positional representation to the FFA code are based on the sequential bitwise conversion of the source number. To get the modulo residue the arithmetic operations with the number bits are conducted. The type and the number of arithmetic operations depend on the conversion method. It determines the number of operation cycles for the conversion device (see Table 1).

Let the base system be given as *p*<sup>1</sup> = 5, *p*<sup>2</sup> = 9, *p*<sup>3</sup> = 13. The range of the processed data in this case is *P* = 585. In this implementation, the DF structure includes three computational channels (see Figure 1). Each computational channel, in addition to the computing device implementing the filtering algorithm, includes a data conversion device (see Figure 2).

In the considered case, when the conversion device is excluded from the calculation structure (Figure 5), the memory device for the input sample, for example, modulo *p*<sup>2</sup> = 9, stores the residue values for the range *P* = 585 in the form of Table 2.


**Table 2.** The residue values for the range *P* = 585 modulo *p*<sup>2</sup> = 9.

Let us consider an example of calculating the residue of the number *X* = 32,015 modulo *p* = 17 using the method "Lowering the bit width with the parallel adder" (Table 1). Representing the number *X* = 32,015 in binary code and dividing it into blocks of 4 bits, we obtain

$$X = 32015 = 0111\ 1101\ 0000\ 1111\_2$$

i.e., we have 4 numbers *a*<sup>3</sup> = 7, *a*<sup>2</sup> = 13, *a*<sup>1</sup> = 0, *a*<sup>0</sup> = 15. Then we can write

$$\begin{aligned} \vert 32015 \vert\_{17}^{+} &= \left\vert \vert 7 \cdot 2^{12} \vert\_{17}^{+} + \vert 13 \cdot 2^{8} \vert\_{17}^{+} + \vert 0 \cdot 2^{4} \vert\_{17}^{+} + \vert 15 \cdot 2^{0} \vert\_{17}^{+} \right\vert\_{17}^{+} = \\ &= \left\vert \vert 7 \cdot 16 \vert\_{17}^{+} + \vert 13 \cdot 1 \vert\_{17}^{+} + \vert 0 \cdot 16 \vert\_{17}^{+} + \vert 15 \cdot 1 \vert\_{17}^{+} \right\vert\_{17}^{+} = \vert 10 + 13 + 0 + 15 \vert\_{17}^{+} = 4 \end{aligned}$$

In the case of parallel implementation of this algorithm, the time of residue calculation is

$$T\_{FC} = t\_{LUT} + t\_{LUT} \cdot \log\_2\left\lceil \frac{n}{k} \right\rceil \tag{23}$$

where *k* is the block size (in the example, *k* = 4); *n* = 16 is the bit width of the input number X; and *tLUT* is the LUT-table access time (taken equal to three cycles).

For the considered example, the residue calculation time according to Equation (23) is nine cycles, which corresponds to Table 1.
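The worked block-decomposition example and the cycle count of Equation (23) can be checked numerically with a short verification sketch:

```python
# Verifying the worked example: |32015|_17 computed from 4-bit blocks,
# plus the cycle count of Equation (23) for k = 4, n = 16, t_LUT = 3.
import math

X, p, k, n = 32015, 17, 4, 16
blocks = [(X >> (k * i)) & 0xF for i in range(n // k)]  # a0..a3 = 15, 0, 13, 7
partial = sum(b * pow(2, k * i, p) for i, b in enumerate(blocks)) % p
assert blocks == [15, 0, 13, 7]
assert partial == 4 == X % p            # matches the result in the text

t_lut = 3
cycles = t_lut + t_lut * math.ceil(math.log2(n / k))  # 3 + 3 * 2 = 9 cycles
assert cycles == 9
```

Each `pow(2, k*i, p)` term plays the role of one LUT access in the parallel-adder scheme.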

To implement the considered converter, eleven LUT tables are required (hardware costs are expressed as the number of LUT tables). For the suggested method (Figure 2), three LUT tables are required. For the computational channel without a conversion device (Figure 5), one LUT table is required.

Following Equation (21), the error introduced into the processed signal by a twelve-bit ADC is ∆ = 0.00024414. As an example, a non-recursive DF of the forty-fifth order is taken [13]. For the evaluation, impulse response samples with the values *N*<sup>1</sup> = −0.000105023, *N*<sup>2</sup> = −0.000125856, *N*<sup>17</sup> = 0.0364568, and *N*<sup>18</sup> = 0.0328505 are selected. For simplicity, the sign of the impulse response samples is not considered, and it is assumed that the ADC error changes the values of the selected samples.

While calculating the filter output sample and representing the impulse response samples in binary code with the ADC error taken into account, their values can be written as follows: *N*<sup>1</sup> = 0.000139117, *N*<sup>2</sup> = 0.0001185284, *N*<sup>17</sup> = 0.03621266, *N*<sup>18</sup> = 0.03260636.

The values of samples of impulse response in the FFA code for the module *p*<sup>1</sup> = 5 without error (in decimal notation and in FFA code) will be written as follows:

*N*<sup>1</sup> = (105023)<sup>10</sup> = (3)FFA, *N*<sup>2</sup> = (125856)<sup>10</sup> = (1)FFA, *N*<sup>17</sup> = (36456800)<sup>10</sup> = (0)FFA, *N*<sup>18</sup> = (32850500)<sup>10</sup> = (0)FFA.

Considering the error introduced by the ADC, the values of samples of impulse response in the FFA code for the module *p*<sup>1</sup> = 5 will be written (in decimal notation and in FFA code) as follows:

*N*<sup>1</sup> = (139117)<sup>10</sup> = (2)FFA, *N*<sup>2</sup> = (1185284)<sup>10</sup> = (4)FFA, *N*<sup>17</sup> = (362126600)<sup>10</sup> = (0)FFA, *N*<sup>18</sup> = (326063600)<sup>10</sup> = (0)FFA.

As follows from the conducted evaluation, the values of the first and second samples in the FFA code changed due to the influence of the ADC error. This error will change the filter frequency response during calculation of its output sample. Considering that the ADC error can affect a significant number of impulse response samples, the distortion of the frequency response can be significant.
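The residue shifts described above can be reproduced directly from the listed integer sample values; a verification sketch:

```python
# How the ADC error moves impulse-response samples between residue classes
# modulo p1 = 5: integer sample values before and after the error, as listed.
p1 = 5
exact = {"N1": 105023, "N2": 125856, "N17": 36456800, "N18": 32850500}
with_err = {"N1": 139117, "N2": 1185284, "N17": 362126600, "N18": 326063600}

shifted = [k for k in exact if exact[k] % p1 != with_err[k] % p1]
assert [exact[k] % p1 for k in exact] == [3, 1, 0, 0]
assert [with_err[k] % p1 for k in with_err] == [2, 4, 0, 0]
assert shifted == ["N1", "N2"]   # only the first two samples change class
```

This confirms the claim in the text: the ADC error relocates N1 and N2 to different residue classes, while N17 and N18 stay in class 0.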

#### **5. Discussion**

The known methods [18–21] for data conversion from the positional representation to the FFA code are based on sequential bitwise conversion of the source number. To obtain the modulo residue, arithmetic operations on the number's bits are performed; their type and number depend on the conversion method and determine the number of operation cycles of the conversion device. Our study has shown that the data converter from positional representation to the FFA code can be excluded from the DF structure. In this case, the ADC can be used as a control device for selecting the required residue from the memory device of the output sample. We described the structure of the computational channel and provided several practical tests aimed both at studying speed characteristics and at error estimation. An optimization technique was given and experimentally tested.

The obtained results open the possibility of efficient and compact hardware implementation of digital filters on modern devices (DSP, FPGA, ASIC, etc.) for processing signals in various areas, such as radio communication, hydroacoustics, radar and echolocation systems, as well as in industry, defense, law enforcement, and other fields of science and technology [22]. Further research will be devoted to comparing the proposed approach with existing techniques for implementing low-precision digital filters [23] and control systems [24] based on an alternative discrete operator technique [25,26] and a Gaussian approximation approach [27]. We will also try to build efficient adaptive DFs [28] based on the proposed approach.

#### **6. Conclusions**

Thus, it can be concluded that building the computational channel of a DF operating in FFA codes without the converter, and without the ADC influencing the input data values, increases the accuracy of the filter output sample calculation. Similar estimates are valid not only for the impulse response samples of the filter but also for the input signals. In the first case, the impulse response of the filter is changed; in the second case, the output sample is distorted. For example, if an adaptive DF is implemented, the error of the output sample will require readjusting the filter coefficients, which, in turn, incurs a high time cost.

The actual choice of construction and organization of calculations in the computational channel of the DF will depend on the requirements for the performance and accuracy of the output sample calculation.

Finally, we can conclude that the properties of finite field algebra ensure the construction of efficient computational algorithms and structures of DF with high performance and accuracy of the output sample calculation.

**Author Contributions:** Conceptualization, D.K. and S.A.; data curation, A.V. and D.B.; formal analysis, E.D. and D.B.; funding acquisition, D.K.; investigation, S.A. and D.B.; methodology, A.V. and P.L.; project administration, D.K., S.A., and P.L.; resources, S.A. and E.D.; software, A.V. and P.L.; supervision, D.K. and S.A.; validation, P.L. and D.B.; visualization, A.V. and E.D.; writing—original draft, D.K., A.V., and D.B.; writing—review and editing, E.D. and D.B. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the grant of the Russian Science Foundation (Project №19-19-00566).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

*Article*

### **Multiresolution Speech Enhancement Based on Proposed Circular Nested Microphone Array in Combination with Sub-Band Affine Projection Algorithm**

#### **Ali Dehghan Firoozabadi 1,\* , Pablo Irarrazaval 2,3,4 , Pablo Adasme <sup>5</sup> , David Zabala-Blanco 6,\* , Hugo Durney <sup>1</sup> , Miguel Sanhueza <sup>1</sup> , Pablo Palacios-Játiva <sup>7</sup> and Cesar Azurdia-Meza <sup>7</sup>**


Received: 6 April 2020; Accepted: 4 June 2020; Published: 6 June 2020

**Abstract:** Speech enhancement is one of the most important fields in audio and speech signal processing. The speech enhancement methods are divided into the single and multi-channel algorithms. The multi-channel methods increase the speech enhancement performance by providing more information with the use of more microphones. In addition, spatial aliasing is one of the destructive factors in speech enhancement strategies. In this article, we first propose a uniform circular nested microphone array (CNMA) for data recording. The microphone array increases the accuracy of the speech processing methods by increasing the information. Moreover, the proposed nested structure eliminates the spatial aliasing between microphone signals. The circular shape in the proposed nested microphone array implements the speech enhancement algorithm with the same probability for the speakers in all directions. In addition, the speech signal information is different in frequency bands, where the sub-band processing is proposed by the use of the analysis filter bank. The frequency resolution is increased in low frequency components by implementing the proposed filter bank. Then, the affine projection algorithm (APA) is implemented as an adaptive filter on sub-bands that were obtained by the proposed nested microphone array and analysis filter bank. This algorithm adaptively enhances the noisy speech signal. Next, the synthesis filters are implemented for reconstructing the enhanced speech signal. 
The proposed circular nested microphone array in combination with the sub-band affine projection algorithm (CNMA-SBAPA) is compared with the least mean square (LMS), recursive least square (RLS), traditional APA, distributed multichannel Wiener filter (DB-MWF), and multichannel nonnegative matrix factorization-minimum variance distortionless response (MNMF-MVDR) methods in terms of the segmental signal-to-noise ratio (SegSNR), perceptual evaluation of speech quality (PESQ), mean opinion score (MOS), short-time objective intelligibility (STOI), and speed of convergence on real and simulated data for white and colored noises. In all scenarios, the proposed method shows high accuracy at different noise levels and noise types with lower distortion in comparison with other works; furthermore, its speed of convergence is higher than that of the compared methods.

**Keywords:** speech enhancement; adaptive filter; microphone array; sub-band processing; filter bank

#### **1. Introduction**

In the current century, smartphones and other communication devices have become an important part of human life, to the point that it is impossible to have social communications without them [1,2]. One of the principal parts of these smartphones is the signal processing platform, which plays an important role in telecommunication and audio signal processing. Denoising and dereverberation are two main sections of signal processing and enhancement platforms, and the aim of this article is to increase the performance of speech enhancement algorithms [3]. Increasing the number of sensors improves the accuracy of denoising algorithms due to the spatial spectrum extension obtained by providing more information. Accuracy in enhancement algorithms means how close the enhanced signal is to the original signal, with a high level of noise elimination and little distortion. Therefore, speech enhancement is a principal part of applications such as hearing aid systems, mobile communication, speaker localization and tracking, speech recognition, voice activity detection (VAD), speaker identification, etc. Denoising algorithms should be implemented so as to keep the speech intelligibility in an acceptable range while removing a high level of noise and reverberation. Hence, the signal-to-noise ratio (SNR) cannot be the only factor for comparing speech enhancement methods. Qualitative criteria such as the perceptual evaluation of speech quality (PESQ) [4], mean opinion score (MOS) [5], and short-time objective intelligibility (STOI) [6] are very useful to show the performance of denoising methods in comparison with previous works, along with quantitative criteria such as the overall SNR and segmental SNR (SegSNR) [7]. The performance of denoising algorithms is evaluated by considering qualitative and quantitative criteria at the same time, which are the proper measurements for comparison with other previous works.

In recent years, many single- and multi-channel methods have been proposed for speech enhancement. The single-channel methods are still challenging strategies for speech enhancement due to the limited information. Traditional speech enhancement methods such as the Wiener filter (WF) and distributed multichannel WF (DB-MWF) [8,9], spectral subtraction [10,11], and statistical-model-based methods [12,13] have superior performance in stationary noisy environments, but the stability and accuracy of these methods decrease strongly in non-stationary noisy conditions. Noise estimation methods such as minima-controlled recursive averaging [14,15] and minimum statistics [11,16] track the stationary noise energy; however, they do not have the ability to track non-stationary noise energy. For example, the method proposed in [16] estimates the power spectral density (PSD) of a non-stationary noise signal. This method can be combined with any speech enhancement algorithm that requires noise PSD estimation. The presented method follows the spectral minima in each frequency band by minimizing a conditional mean square error (MSE) criterion in each time frame, which yields the optimal smoothing parameter for recursive smoothing of the PSD of the noisy speech signal. An unbiased noise estimator is then presented based on the optimally smoothed PSD estimate and the analysis of the statistics of spectral minima. The noise estimation accuracy of such methods [15,16] is therefore affected when the noise is non-stationary. A group of speech enhancement methods is based on a priori information about speech signals, such as the auto-regressive hidden Markov model (ARHMM) [17–19]. The noise and speech signals are modeled as an auto-regressive (AR) process in these methods. In addition, the hidden Markov model (HMM) is used for modeling the prior information of speech and noise features.
For example, the methods in [18,19] are considered for modeling the speech and noise spectrum shape. Therefore, the spectrum gain is calculated instead of

the whole spectrum for the speech and noise signals. The noise-spectrum gain estimation is adapted by the fast variations of the signal energy, which is known as non-stationary noise.

Masoud and Sina [20] proposed a novel method based on the normalized fractional of the two-channel least mean square (LMS) algorithm for enhancing the speech signal. The presented algorithm is known as fractional LMS, which is obtained by considering the fractional terms in the calculation of filter coefficients of the standard LMS algorithm. The normalization is a proper strategy to improve the performance of the LMS algorithm. Therefore, a normalization step is implemented on the fractional LMS in order to promote the performance of the enhancement method. The proposed two-channel method has a higher performance in terms of the MSE criteria in comparison with other works. Pagula and Kishore [21] proposed a recursive least square (RLS)-based adaptive filter for the application of speech enhancement. The segmentation step is considered for the microphone signals to provide a better stationary of the speech signals. In the following, the adaptive filter coefficients are calculated based on the modified version of the RLS method. The filter coefficients are calculated in a way to have the least distortion in the enhanced speech signals. The presented method has a high performance in the presence of white noise for a different range of SNRs. Qi et al. [22] proposed a method for estimation of the short-time linear prediction parameters of the Wiener filter. In the presented work, a speech signal spectrum modeling is proposed based on the prior information of the speech linear prediction in order to model the noise as same as the speech signal. The difference between the proposed method with other previous works is the use of multiplicative update rule for better estimation of the coefficients. Tavakoli et al. [23] introduced a framework for the speech enhancement based on an ad-hoc microphone array. A subarray is considered for coherence calculation in the speech signal. 
A coherence measurement is proposed based on the speech quality in the entrance of the array in order to select the subarrays in the local speech enhancements, when more than one subarray is used. The proposed method is evaluated based on quantitative and qualitative criteria such as: array gains, speech distortion ratio, PESQ, and STOI to show the superiority of the algorithm. Shimada et al. [24] proposed an unsupervised speech enhancement method based on the non-negative matrix factorization and sub-band beamforming for robust speech recognition against the noise. In the recent years, the minimum variance distortionless response (MVDR) beamforming is widely used to achieve the speech enhancement because this method properly works when there are steering vectors for the speech signal and spatial covariance matrix for the noise. In the presented algorithm, an unsupervised method decomposes each time-frequency bin to the sum of the noise and signal by implementing the multi-channel non-negative matrix factorization (MNMF). The presented method estimates the spatial covariance matrix (SCM) for the signal and noise by the use of spectral noise and speech features. In this paper, the online MVDR beamforming is proposed via an adaptive update for the MNMF parameters. Kavalekalam et al. [25] proposed a speech enhancement model-based method to increase the speech perception for auditory earphones applications. In the proposed method, a binaural speech enhancement framework is introduced, which is implemented by a speech production approach. The proposed speech enhancement framework is based on a Kalman filter, which is presented to use the speech production dynamic in the procedure of the speech enhancement. The Kalman filter needs to have an estimation from the short time predictor (STP) of clean speech, noise, and the pitch estimation of the clean speech. 
A binaural method for STP parameter estimation is proposed in this paper, with a directional pitch predictor based on the harmonic model and maximum likelihood (ML) criteria for pitch feature estimation. These parameters are calculated from just two microphone signals, equivalent to the human ears. Botinhao et al. [26] proposed a simultaneous noise-reverberation enhancement method for text-to-speech (TTS) systems. Voices recorded in noisy-reverberant environments affect the quality of TTS systems. A simple remedy is to increase the quality of the prerecorded speech signals for TTS training by speech enhancement methods such as noise suppression and dereverberation algorithms. A recurrent neural network is therefore considered in this paper for speech enhancement. The neural network is trained with parallel data of clean speech and low-quality recorded speech. The low-quality speech signal is obtained

by the addition of environmental noise and convolution of the room impulse response with the clean speech. Separate neural networks are trained on only-noise, only-reverberation, and noisy-reverberant data. The quality of the low-quality training speech signals is highly improved by this neural network. Wang et al. [27] proposed a model-based method for speech enhancement in the modulation domain using a Kalman filter. The proposed predictor dynamically models the estimated amplitude spectra of the speech and noise to calculate the minimum mean square error (MMSE) estimate of the speech amplitude spectrum, taking into account that the noise and speech are additive in the complex plane. A stationary Gaussian model is proposed to treat the dynamic noise amplitude in the same way as the dynamic speech amplitude, as a mixture of Gaussian models whose centers are located in the complex plane.

In our article, a multi-channel speech enhancement method is introduced based on the proposed circular nested microphone array in combination with the sub-band affine projection algorithm (CNMA-SBAPA). A nested microphone array increases the accuracy of the speech enhancement methods by increasing the information. Nevertheless, spatial aliasing is one of the challenges when microphone arrays are used. Firstly, a uniform circular nested microphone array (CNMA) is proposed for eliminating the spatial aliasing. Additionally, the array dimensions are designed in a way to be applicable in the real conditions. The speech components are variable in frequency bands. Therefore, a sub-band processing method is considered for speech signals. This method provides the high frequency resolution in low speech frequency components. Finally, the affine projection algorithm (APA), as an adaptive method for the speech enhancement, is implemented on sub-band signals from the circular nested microphone array (NMA). Since each APA block is implemented on a sub-band with specific information, the accuracy and speed of convergence are increased in this condition. In the last step, the synthesis filters are used to generate the enhanced speech signal. The proposed system with sub-band APA is compared by the quantitative (segmental SNR), qualitative (PESQ, MOS, and STOI) criteria, and speed of convergence with the least mean square (LMS), traditional APA, recursive least square (RLS), distributed multichannel Wiener filter (DB-MWF), and multichannel nonnegative matrix factorization-minimum variance distortionless response (MNMF-MVDR) algorithms on real and simulated data under white and colored noisy conditions. The results show the superiority of the proposed system in comparison with other previous works in all environmental conditions.

Section 2 shows the microphone signal model and the proposed uniform circular nested microphone array. Section 3 includes the proposed sub-band algorithm with analysis and synthesis filter banks in combination with the sub-band APA. The results on real and simulated data are discussed in Section 4. Section 5 includes some conclusions.

#### **2. The Microphone Model and Proposed Nested Microphone Array**

In this section, the microphone signal model is presented, which is used to produce the simulated data. In addition, the uniform CNMA is proposed for eliminating spatial aliasing. The nested subarrays and microphone combinations are also introduced.

#### *2.1. Microphone Signal Model*

The microphone signal modeling is an important part in the implementation of speech processing algorithms such as: speech enhancement, speaker tracking, speech recognition, etc. Two models are usually considered in this processing: ideal and real models [28]. In the ideal model, which is known as an open-space model, the received signal in a microphone place is a weakened and delayed version of the transmitted signal from the source location. The ideal model for microphone signals is expressed as:

$$x\_m[n] = \frac{1}{r\_m}s[n - \tau\_m] + v\_m[n].\tag{1}$$

where *xm*[*n*] is the received signal at the *m*-th microphone, *s*[*n*] is the speech source signal (transmitted signal), *rm* is the distance between the source and the *m*-th microphone, τ*m* is the time delay between the source and the *m*-th microphone, and *vm*[*n*] is the additive noise at the *m*-th microphone position. This model cannot represent real environments and closed-space conditions because the reverberation effect is discarded. Therefore, the real model is introduced for microphone signal simulation to provide realistic environmental conditions for evaluating speech enhancement algorithms. The real model simulates the microphone signal similarly to the environmental conditions. The expression for the real model is:

$$\mathbf{x}\_{m}[n] = s[n] \* \boldsymbol{\gamma}\_{m}[r\_{m}, n] + \boldsymbol{\upsilon}\_{m}[n],\tag{2}$$

where the source signal is convolved with the room impulse response to model the real environment. In this equation, γ*m*[*rm*, *n*] is the impulse response between the source and the *m*-th microphone, which contains the attenuation factor and the whole reverberation effect of the real conditions, and \* denotes the convolution operator. By considering this mathematical model, the simulated signals are similar to real conditions.
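Both models can be sketched numerically. The fragment below is an illustrative sketch only: the source term *s*[*n*], the distance, the delay, and the synthetic room impulse response are assumptions for the demonstration, not values from the article.

```python
import numpy as np

rng = np.random.default_rng(0)

fs = 16000                      # sampling frequency used later in the article
n = np.arange(fs // 10)         # 100 ms of signal
s = np.sin(2 * np.pi * 440 * n / fs)      # assumed source signal s[n]

# Ideal (open-space) model, Eq. (1): attenuated, delayed source plus noise.
r_m = 2.0                                  # assumed source-microphone distance (m)
tau_m = 23                                 # assumed delay in samples
v = 0.01 * rng.standard_normal(len(n))     # additive noise v_m[n]
s_delayed = np.concatenate([np.zeros(tau_m), s[:-tau_m]])
x_ideal = s_delayed / r_m + v

# Real model, Eq. (2): source convolved with a room impulse response.
# A decaying random tail stands in for gamma_m[r_m, n] (reverberation).
gamma = np.zeros(400)
gamma[tau_m] = 1.0 / r_m                   # direct path
tail = 400 - tau_m - 1
gamma[tau_m + 1:] = 0.05 * rng.standard_normal(tail) * np.exp(-np.arange(tail) / 80.0)
x_real = np.convolve(s, gamma)[:len(n)] + v
```

Under the ideal model the first `tau_m` samples contain only noise, while the real model additionally smears the source over the reverberation tail of `gamma`.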

#### *2.2. The Proposed Uniform Circular Nested Microphone Array*

A microphone array increases the accuracy of speech enhancement algorithms by increasing the available information. However, spatial aliasing, which depends on the inter-microphone distances, corrupts the recorded speech signals and, consequently, the performance of the speech enhancement algorithms. A nested microphone array has the capability to eliminate spatial aliasing [29]. In this section, a uniform CNMA is proposed whose symmetrical shape provides the same probability for all speakers around the array, so the quality of the enhanced signals does not depend on the positions of the speakers. Additionally, its small structure makes it applicable in most conditions in comparison with larger arrays. Figure 1 shows the block diagram of the proposed speech enhancement algorithm, where the NMA part with its analysis filters and down-sampler blocks is shown on the left side.

**Figure 1.** The block diagram of the proposed circular nested microphone array in combination with the sub-band affine projection algorithm (CNMA-SBAPA) for the speech enhancement.

The speech signal has a frequency range of [0–8000] Hz with a sampling frequency of *F<sup>s</sup>* = 16,000 Hz. The proposed CNMA is designed for the frequency range [50–7800] Hz, which covers the wideband speech spectrum. The CNMA is structured as four subarrays. The first subarray is designed for the range B1 = [3900–7800] Hz, with a central frequency *fc*<sup>1</sup> = 5850 Hz. The inter-microphone distance (*d*lim) should satisfy *d*lim < λ/2 (where λ is the wavelength of the highest frequency component in the related sub-band) to avoid spatial aliasing; this gives *d*lim(1) < 2.2 cm for the first subarray. The second subarray covers the frequency range B2 = [1950–3900] Hz with a central frequency of *fc*<sup>2</sup> = 2925 Hz; therefore, *d*lim(2) = 2*d*<sup>1</sup> < 4.4 cm. The third subarray is defined for the frequency range B3 = [975–1950] Hz with a central frequency of *fc*<sup>3</sup> = 1462 Hz and *d*lim(3) = 4*d*<sup>1</sup> < 8.8 cm. Finally, the fourth subarray is designed for the frequency range B4 = [50–975] Hz with a central frequency of *fc*<sup>4</sup> = 512 Hz and an inter-microphone distance of *d*lim(4) = 8*d*<sup>1</sup> < 17.6 cm. For a more complex system, a higher number of microphones could be used to design a larger nested microphone array. Table 1 summarizes the information used to design the uniform CNMA.
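The *d*lim values follow directly from the half-wavelength rule. The small check below (assuming a speed of sound of c = 343 m/s, which is an assumption of this sketch rather than a value stated in the article) reproduces the spacing limit of each subarray:

```python
# Anti-aliasing spacing d_lim < lambda/2 = c / (2 * f_max) for each subarray.
c = 343.0  # assumed speed of sound in air, m/s

subarray_f_max = {1: 7800.0, 2: 3900.0, 3: 1950.0, 4: 975.0}  # Hz, upper band edges

for k, f_max in subarray_f_max.items():
    d_lim_cm = 100.0 * c / (2.0 * f_max)   # half-wavelength limit in cm
    print(f"subarray {k}: d_lim < {d_lim_cm:.1f} cm")
```

This prints limits of 2.2, 4.4, 8.8, and 17.6 cm, matching the values derived in the text.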


**Table 1.** The information to design the proposed uniform CNMA.

The microphone array was structured to have the closest microphone distance *dsim*(1) = 2.2 cm (for the simulated data) based on the designed CNMA. Therefore, the first subarray included the microphone pairs {1,2}, {2,3}, {3,4}, {4,5}, {5,6}, {6,7}, {7,8}, and {8,1}. The microphone pairs {1,3}, {3,5}, {5,7}, {7,1}, {2,4}, {4,6}, {6,8}, and {8,2} were selected for the second subarray with an inter-microphone distance of *dsim*(2) = 4.2 cm. The third subarray has an inter-microphone distance of *dsim*(3) = 5.6 cm; the microphone pairs {1,4}, {2,5}, {3,6}, {4,7}, {5,8}, {6,1}, {7,2}, and {8,3} were considered for this subarray. For the last subarray, the inter-microphone distance is *dsim*(4) = 6 cm and the microphone pairs {1,5}, {2,6}, {3,7}, and {4,8} were selected for the implementation. Given our actual microphone array, the minimum inter-microphone distance that we could have was 2.7 cm (for the real data). For this reason, we did two evaluations: one for simulated data with *dsim*(1) = 2.2 cm, as dictated by the theory, and one for real data with *dreal*(1) = 2.7 cm, to match our hardware. All subarrays are shown in Figure 2, which shows the designed CNMA with its small shape.

**Figure 2.** The proposed uniform CNMA and allocated microphones for each subarray.

Each subarray needs an analysis filter bank to avoid spatial aliasing and imaging. Figure 1 (left and right sides) shows the analysis and synthesis filter banks along with the up-sampler and down-sampler blocks. Multirate sampling, using up-samplers and down-samplers, is implemented to provide the frequency bands. As shown in Figure 3a, the analysis filter bank *Hi*(*z*) and down-sampler *D<sup>i</sup>* are realized as a multi-level tree structure. Each stage of the tree requires a high-pass filter (HPF) *HPi*(*z*), a low-pass filter (LPF) *LPi*(*z*), and a down-sampler *D<sup>i</sup>* (for the analysis filter bank) or an up-sampler (for the synthesis filter bank). The relation between the analysis filter bank *Hi*(*z*), the LPFs, and the HPFs in the tree structure is expressed as:

$$\begin{array}{l}H\_{1}(z) = HP\_{1}(z) \\ H\_{2}(z) = LP\_{1}(z)HP\_{2}(z^{2}) \\ H\_{3}(z) = LP\_{1}(z)LP\_{2}(z^{2})HP\_{3}(z^{4}) \\ H\_{4}(z) = LP\_{1}(z)LP\_{2}(z^{2})LP\_{3}(z^{4}). \end{array} \tag{3}$$
In each level of the tree, a 52-tap finite impulse response (FIR) LPF and HPF are implemented with the Remez exchange method. The filters have a stop-band attenuation of 50 dB and a transition band of 0.0575. Figure 4 shows the frequency response of the analysis filter banks.
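As a hedged illustration of the Remez design step and the tree composition of Equation (3), the sketch below builds a prototype LPF/HPF pair with SciPy and composes H2(*z*) = LP1(*z*)HP2(*z*²). The crossover frequency and the odd filter length (53 taps, so the high-pass branch is realizable as a Type I FIR; the article's filters are 52-tap) are assumptions of this sketch, not the authors' exact specification.

```python
import numpy as np
from scipy.signal import remez

numtaps = 53           # odd length assumed here for a realizable high-pass
edge = 0.25            # assumed normalized crossover frequency (fs = 1)
trans = 0.0575         # transition bandwidth from the text

bands = [0, edge - trans / 2, edge + trans / 2, 0.5]
lp = remez(numtaps, bands, [1, 0])   # prototype low-pass LP(z)
hp = remez(numtaps, bands, [0, 1])   # prototype high-pass HP(z)

def upsample_coeffs(h, factor):
    """Replace z by z**factor, i.e., insert zeros between coefficients."""
    out = np.zeros((len(h) - 1) * factor + 1)
    out[::factor] = h
    return out

# H2(z) = LP1(z) * HP2(z^2): a cascade of FIR stages is the convolution
# of their impulse responses, with the second stage upsampled by 2.
h2 = np.convolve(lp, upsample_coeffs(hp, 2))
```

The same pattern (convolving with progressively upsampled LP/HP responses) yields H3 and H4 of Equation (3).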

**Figure 4.** The frequency response for the analysis filter banks.

#### **3. The Proposed Multiresolution Sub-band-APA for the Speech Enhancement**

Speech is a wideband, non-stationary signal in which each frequency band carries different information. This feature makes it possible to evaluate the speech spectrum components with different frequency resolutions. For example, speech information is concentrated in the lower part of the spectrum; therefore, the accuracy of the speech enhancement algorithm is increased by focusing on low-frequency components. In this article, a specific sub-band processing scheme along with a filter bank is proposed to pay more attention to lower frequencies through filters with narrower bandwidths. Table 2 shows the information used to design and implement this analysis filter bank. There is still no definite rule for selecting the number of frequency bands. Narrower band filters at low frequencies give more frequency resolution, but the concern is the computational complexity: each additional filter means more microphone pairs and more calculations. Based on our experiments, this number of frequency bands provides sufficient performance with an acceptable level of complexity.

**Table 2.** The required information to design the analysis filter bank for sub-band processing in the proposed CNMA-SBAPA algorithm.


As seen, the filter bandwidth is smaller at low frequencies than at high frequencies. This property increases the frequency resolution at low frequencies. The most important benefit of sub-band processing is noise estimation from the silent parts of the speech signal in each sub-band, since the proposed denoising method requires a noise estimate as an input to the enhancement algorithm. Therefore, a more accurate and more stationary noise estimate is obtained by sub-band processing of the speech signal, which increases the performance of the denoising algorithm. If *xm*[*n*] is considered as the input signal of the *m*-th microphone, the analysis filter output for the CNMA is expressed as:

$$\mathbf{x}\_{m,i}[n] = \mathbf{x}\_m[n] \* h\_i[n] \text{ where } \{m = 1, \dots, 8 \text{ and } i = 1, \dots, 4\}, \tag{4}$$

where *xm*,*<sup>i</sup>*[*n*] is the analysis filter output and *h<sup>i</sup>*[*n*] is the impulse response of this filter. Thus, the spatial aliasing is eliminated from each microphone pair of the CNMA by the analysis filters, which are designed specifically for each subarray. Next, the microphone signals are fed into the proposed analysis filter bank for sub-band processing. As shown in Table 2, each microphone signal is divided into 10 sub-bands. This number of sub-bands was selected based on our experiments to provide good efficiency with low computational complexity while preparing a high frequency resolution at low frequencies. Therefore, the output of the proposed analysis filter bank is expressed as:

$$y\_{m,i,j}[n] = \mathbf{x}\_{m,i}[n] \* F\_j[n] \text{ where } \{j = 1, \dots, 10, \, i = 1, \dots, 4, \, m = 1, \dots, 8\}, \tag{5}$$

where *F<sup>j</sup>*[*n*] is the impulse response of each sub-band filter in the analysis filter bank and *ym*,*i*,*<sup>j</sup>*[*n*] is the output of the analysis filter bank for the *j*-th sub-band and *m*-th microphone. The signals *ym*,*i*,*<sup>j</sup>*[*n*] are the sub-band microphone signals for the proposed sub-band-APA algorithm. In the following, the sub-band-APA (SBAPA) algorithm along with the circular nested microphone array (CNMA-SBAPA) is proposed for speech enhancement. Adaptive filters, an important tool in digital signal processing, have been used for many years in such applications as speech signal enhancement, system identification, localization and tracking, etc. In adaptive filters, the coefficients change periodically so as to adapt to the time-varying features of the noise, and this property increases the performance of the denoising system in comparison with fixed methods. In addition, these filters are non-linear and time-varying, since their behavior depends on the input signal. Adaptive filters have the following advantages: low delay and better tracking in non-stationary conditions [30]. These advantages are very important in dereverberation, denoising, time delay estimation, channel equalization, and speaker tracking applications, where low delay and robustness against non-stationary noisy and reverberant conditions are important for improving system performance. The reference signal, which is hidden in the filter coefficient estimation, defines the system performance. Figure 5 shows the general structure of the adaptive filter in denoising applications.
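Equations (4) and (5) are simply two FIR convolution stages in cascade. A minimal sketch with stand-in impulse responses (random coefficients here, not the designed filters of the article):

```python
import numpy as np

rng = np.random.default_rng(2)

x_m = rng.standard_normal(256)       # microphone signal x_m[n]
h_i = rng.standard_normal(16) / 16   # stand-in subarray analysis filter h_i[n]
F_j = rng.standard_normal(16) / 16   # stand-in sub-band filter F_j[n]

x_mi = np.convolve(x_m, h_i)         # Eq. (4): anti-aliasing stage
y_mij = np.convolve(x_mi, F_j)       # Eq. (5): sub-band decomposition

# Because convolution is associative, the two stages are equivalent to a
# single filtering with the combined response h_i * F_j.
combined = np.convolve(h_i, F_j)
```

The equivalence of the cascade and the combined filter is a standard property of LTI systems and holds for the designed filters as well.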

**Figure 5.** The general structure of the adaptive filter for denoising applications.

We change the notation of the adaptive filter input signal from *ym*,*i*,*<sup>j</sup>*[*n*] to *y*[*n*] to simplify the mathematical expressions. An adaptive filter is expressed as follows [31]:

$$z[n] = w\_L[n] \* y[n], \tag{6}$$

where *n* is the time index, *z*[*n*] is the adaptive filter output, and *wL*[*n*] contains the adaptive filter coefficients with length *L*. The update algorithm in Figure 5 is the principal part of an adaptive filter, which in this article is the APA. The main idea of an adaptive filter is to minimize the error signal *e*[*n*] so as to make the output of the filter as similar as possible to the desired signal.

The input signal *y*[*n*] of the adaptive filter is considered as the sum of the noise *v*[*n*] and the desired signal *d*[*n*], which is described as:


$$y[n] = d[n] + v[n]. \tag{7}$$

The adaptive filter has an FIR structure; that is, the filter is designed based on a limited number of coefficients in the time domain. For a filter of order *L*, the filter coefficients are defined as:

$$w\_L[n] = [w[0], w[1], \dots, w[L-1]]. \tag{8}$$

The error signal, or cost function, is defined as the difference between the desired and estimated signals, namely:

$$e[n] = d[n] - z[n].\tag{9}$$


As shown in Equation (6), the output of the adaptive filter *z*[*n*] is defined as the convolution between the filter coefficients *wL*[*n*] and the input signal *y*[*n*], where *y*[*n*] is considered as the input of the adaptive filter, namely:

$$y[n] = [y[n], y[n-1], \dots, y[n-L+1]].\tag{10}$$

In addition, the adaptive filter coefficients change during the time, which is written as:

$$w\_L[n] = w\_L[n-1] + \Delta w\_L[n] \tag{11}$$

where ∆*wL*[*n*] is the correction factor for the filter coefficients. The adaptive filter produces the correction factor based on the input and error signals. In Figure 5, several algorithms can be considered for updating the filter coefficients; the APA is one of the fastest and most efficient methods for this purpose. The AP algorithms were introduced to improve the speed of convergence of the gradient-based algorithms, especially when the input signal has a non-stationary spectrum, since gradient-based methods converge slowly for non-stationary and spectrally constrained inputs [30].
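The generic loop of Equations (6)-(11) can be sketched as follows. As a minimal illustration, the correction factor ∆*wL*[*n*] is taken to be the NLMS rule, the simplest instance of the update block in Figure 5 (the paper's update is the APA, introduced below); function and parameter names are illustrative:

```python
import numpy as np

def adaptive_filter_nlms(y, d, L=16, mu=0.5, eps=1e-8):
    """Generic adaptive FIR filter loop (Eqs. (6)-(11)).

    y : input signal, d : desired signal. The correction term
    Delta w_L[n] here is the NLMS rule (a placeholder for the APA).
    """
    w = np.zeros(L)                  # w_L[n], Eq. (8)
    z = np.zeros(len(y))             # filter output, Eq. (6)
    e = np.zeros(len(y))             # error signal, Eq. (9)
    for n in range(L - 1, len(y)):
        y_vec = y[n - L + 1:n + 1][::-1]   # [y[n], ..., y[n-L+1]], Eq. (10)
        z[n] = w @ y_vec                   # z[n] = w_L[n] * y[n], Eq. (6)
        e[n] = d[n] - z[n]                 # Eq. (9)
        # Eq. (11): w_L[n] = w_L[n-1] + Delta w_L[n] (NLMS correction)
        w = w + mu * e[n] * y_vec / (y_vec @ y_vec + eps)
    return z, e, w
```

For example, feeding the filter the output of a known FIR system as the desired signal makes `w` converge to that system's impulse response.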

The filter update equation is one of the most important features of the AP algorithms: it uses *N* vectors of the input data to update the filter coefficients, instead of a single input vector as in the normalized least mean square (NLMS) algorithm. Therefore, more information over time is used to update the filter coefficients accurately. Thus, the AP algorithm can be seen as an improved and extended version of the NLMS method; mathematically, it is expressed as a constrained minimization problem, as follows.

The variation for *L* filter coefficients during the two consecutive times is given by:

$$
\Delta w\_L[n] = w\_L[n] - w\_L[n-1].\tag{12}
$$

We minimized Equation (13) under the *N* constraints shown in Equation (14) to extend the adaptive filter algorithm.

$$\left\|\Delta w\_{L}[n]\right\|^{2} = \Delta w\_{L}^{T}[n]\Delta w\_{L}[n] \tag{13}$$

where *N* constraints are defined as follows:

$$w\_{L}^{T}[n]y[n-k] = d[n-k] \text{ for } k = 0, \dots, N-1,\tag{14}$$

where *y*[*n* − *k*] is the vector of the *L* most recent input samples at time *n* − *k* and *d*[*n*] is the desired signal, see Figure 5. The solution of this problem formulates the update algorithm for AP, which is expressed as:

$$w\_L[n] = w\_L[n-1] + A^T[n] \left( A[n]A^T[n] \right)^{-1} e\_N[n] \tag{15}$$

where:

$$A[n] = \left( y\_L[n], y\_L[n-1], y\_L[n-2], \dots, y\_L[n-N+1] \right)^T,\tag{16}$$

and *eN*[*n*] is a vector of size *N* × 1, which is written as:

$$e\_N[n] = d\_N[n] - A[n]w\_L[n-1].\tag{17}$$

The vector *dN*[*n*] is the desired signal with size *N* × 1, namely:

$$d\_N[n] = \left(d[n], d[n-1], \dots, d[n-N+1]\right)^T. \tag{18}$$

The general format for AP algorithm is obtained by rewriting Equation (15) as:

$$w\_L[n] = w\_L[n-1-\alpha(N-1)] + \mu A\_\tau^T[n] \left( A\_\tau[n] A\_\tau^T[n] + \delta I \right)^{-1} e\_{N\tau}[n].\tag{19}$$

If *eN*τ[*n*] is considered as *eN*τ[*n*] = *dN*τ[*n*] − *A*τ[*n*]*wL*[*n* − 1 − α(*N* − 1)], then:

$$A\_{\tau}[n] = \left( y\_L[n], y\_L[n-\tau], \dots, y\_L[n-(N-1)\tau] \right)^T,\tag{20}$$

and the signal *dN*τ[*n*] is expressed as:

$$d\_{N\tau}^T[n] = (d[n], d[n-\tau], \dots, d[n-(N-1)\tau]).\tag{21}$$


As shown in Equation (19), the *N* vectors required to update the adaptive filter are not necessarily the last data vectors. Therefore, several versions of the AP algorithm are defined based on how the input data and the parameters in Equation (19) are selected. Algorithms developed from these parameter selections include the NLMS with orthogonal correction factor (OCF-NLMS) [32], the partial rank affine projection algorithm (PRAPA) [33], and the standard APA [34], whose parameters are α = 0, δ = 0, τ = 1. If the parameter δ differs from 0, the APA is extended to the APA with regularization (R-APA) [35], where the update equation for the filter coefficients is a specific case of the Levenberg-Marquardt regularized APA (LMR-APA) algorithm [36].
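One standard-APA update step, i.e., Equation (19) with α = 0 and τ = 1 plus a small regularization δ as in R-APA, might be sketched as follows; the function name and parameters are illustrative:

```python
import numpy as np

def apa_update(w, y, d, n, L, N, mu=1.0, delta=1e-6):
    """One standard-APA step (Eq. (19) with alpha = 0, tau = 1).

    Builds A[n] from the N most recent length-L input vectors
    (Eq. (16)) and the error vector e_N[n] (Eq. (17)), then applies
    the regularized update. `w` is w_L[n-1]; returns w_L[n].
    """
    # Row k of A[n] is y_L[n-k] = [y[n-k], ..., y[n-k-L+1]], Eq. (16)
    A = np.array([y[n - k - L + 1:n - k + 1][::-1] for k in range(N)])
    d_N = np.array([d[n - k] for k in range(N)])     # Eq. (18)
    e_N = d_N - A @ w                                # Eq. (17)
    G = A @ A.T + delta * np.eye(N)                  # N x N, delta*I = R-APA term
    return w + mu * A.T @ np.linalg.solve(G, e_N)    # Eqs. (15)/(19)
```

Iterating this step over a signal identifies an unknown FIR system, the same task as in system identification applications of adaptive filters.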

The AP algorithm introduced so far contains one input signal. Since pairs of microphones are used in the proposed CNMA, the AP algorithm is generalized to a two-microphone version [37]. Firstly, the two-microphone structure is defined, where each microphone receives a mixture of the speech and noise signals, which is expressed as (see Figure 6a):

$$q\_{m}[n] = \sum\_{i=1}^{2} \sum\_{r=1}^{L-1} \mathbf{p}\_{im}[r] s\_{i}[n-r], \quad m = 1,2,\tag{22}$$

where *si*[*n*] represents the source signals, *qm*[*n*] the microphone signals, *L* is the impulse response length, and **p***im*[*r*] are the impulse responses between the microphones and the sources. These impulse responses are considered as linear time-invariant (LTI) systems. The two source signals *si*[*n*] are the speech signal *s*[*n*] and the noise signal *b*[*n*]. It is assumed that the speech and noise signals are independent, i.e., *E*{*s*[*n*]*b*[*n* − *m*]} = 0, ∀*m*, where *E* denotes the expected value; hence, the noise and speech signals are uncorrelated. Based on the general structure shown in Figure 6a, the microphone signals *q*1[*n*] and *q*2[*n*] are expressed as follows:

$$q\_1[n] = s[n] \* \mathbf{p}\_{11} + b[n] \* \mathbf{p}\_{21} \tag{23}$$

$$q\_2[n] = s[n] \* \mathbf{p}\_{12} + b[n] \* \mathbf{p}\_{22}.\tag{24}$$

**Figure 6.** (**a**) The general structure of the proposed denoising system, (**b**) the simplified presented model for the two-microphone system, and (**c**) the affine projection algorithm (APA) structure for the two-microphone system.

In addition, **p**<sup>11</sup> and **p**<sup>22</sup> represent the impulse responses for direct path, and **p**<sup>12</sup> and **p**<sup>21</sup> are cross-coupling for the channels between the sources and microphones. The presented model is simplified by considering **p**<sup>11</sup> = **p**<sup>22</sup> = δ[*n*], which is shown in Figure 6b as:

$$q\_1[n] = s[n] + b[n] \* \mathbf{p}\_{21}, \tag{25}$$

$$q\_2[n] = s[n] \* \mathbf{p}\_{12} + b[n]. \tag{26}$$
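The simplified mixing model of Equations (25) and (26), with direct paths **p**11 = **p**22 = δ[*n*], can be sketched as follows (names illustrative; finite impulse responses assumed):

```python
import numpy as np

def mix_two_mics(s, b, p12, p21):
    """Simplified two-microphone mixing model (Eqs. (25)-(26)),
    with direct paths p11 = p22 = delta[n].

    s : speech source, b : noise source,
    p12, p21 : cross-coupling impulse responses (FIR).
    """
    n = len(s)
    q1 = s + np.convolve(b, p21)[:n]   # Eq. (25)
    q2 = np.convolve(s, p12)[:n] + b   # Eq. (26)
    return q1, q2
```

With zero cross-coupling filters, each microphone receives only its own source, which is a quick sanity check of the model.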

Therefore, the microphone signals are generated from the impulse responses between the sources and microphones, the noise, and the speech signals. The structure in Figure 6c was proposed to retrieve the source signal from the received noisy signals *q*1[*n*] and *q*2[*n*]. The proposed structure provides the conditions to retrieve the original signal by the use of the adaptive filters **w**12 and **w**21. The signals *u*1[*n*] and *u*2[*n*] for the two-microphone structure are defined as follows:

$$u\_1[n] = q\_1[n] - q\_2[n] \* \mathbf{w}\_{21}[n],\tag{27}$$

$$u\_2[n] = q\_2[n] - q\_1[n] \* \mathbf{w}\_{12}[n],\tag{28}$$

where in Equations (27) and (28), **w**12[*n*] and **w**21[*n*] are the adaptive filters for eliminating the noise of microphone signal *q*<sup>1</sup> [*n*] and the speech of microphone signal *q*2[*n*], respectively. Signals *u*<sup>1</sup> [*n*] and *u*2[*n*] are rewritten by replacing Equations (25) and (26) to Equations (27) and (28) as:

$$u\_1[n] = s[n] \* [\delta[n] - \mathbf{p}\_{12} \* \mathbf{w}\_{21}[n]] + b[n] \* [\mathbf{p}\_{21} - \mathbf{w}\_{21}[n]],\tag{29}$$

$$u\_2[n] = b[n] \* [\delta[n] - \mathbf{p}\_{21} \* \mathbf{w}\_{12}[n]] + s[n] \* [\mathbf{p}\_{12} - \mathbf{w}\_{12}[n]].\tag{30}$$

Two adaptive filters **w**12[*n*] and **w**21[*n*] are required to retrieve the original speech signal from the noisy signals *u*1[*n*] and *u*2[*n*]. The unique choice of the adaptive filters that retrieves the enhanced speech from these signals is **w**12[*n*] = **p**12 and **w**21[*n*] = **p**21. This structure requires a VAD to prepare the noise estimation from the silent parts of the recorded signals.

The AP algorithm is generalized to a two-microphone structure based on Equation (19) for updating the filter coefficients. The AP algorithm is the generalized version of the two-microphone NLMS [38], which is shown in Figure 6c for the adaptive speech enhancement algorithm. Therefore, the adaptive filter coefficients **w**12[*n*] and **w**21[*n*] for the two-microphone APA are expressed as:

$$\mathbf{w}\_{12}[n] = \mathbf{w}\_{12}[n-1] + \frac{\mu\_{12}}{\mathbf{q}\_{1}[n]\mathbf{q}\_{1}[n]^{T} + \delta I} \mathbf{q}\_{1}[n]\mathbf{u}\_{2}[n] \tag{31}$$

$$\mathbf{w}\_{21}[n] = \mathbf{w}\_{21}[n-1] + \frac{\mu\_{21}}{\mathbf{q}\_{2}[n]\mathbf{q}\_{2}[n]^{T} + \delta I} \mathbf{q}\_{2}[n]\mathbf{u}\_{1}[n] \tag{32}$$

where **q**<sup>1</sup> [*n*] and **q**<sup>2</sup> [*n*] are defined as **q**<sup>1</sup> [*n*] = [*q*<sup>1</sup> [*n*], *q*<sup>1</sup> [*n* − 1], . . . , *q*<sup>1</sup> [*n* − *N* + 1]] and **q**<sup>2</sup> [*n*] = [*q*2[*n*], *q*2[*n* − 1], . . . , *q*2[*n* − *N* + 1]]. The matrices of the two-microphone signals *q*<sup>1</sup> [*n*] and *q*2[*n*] have dimensions *L*× *N*, where *L* is the adaptive filter length and *N* is the projection order. The two parameters µ<sup>12</sup> and µ<sup>21</sup> are the step sizes, which control the convergence of adaptive filters **w**12[*n*] and **w**21[*n*]. These parameters should be selected in the range [0,2] to assure the convergence of AP algorithm. If *N* is selected as 1, the AP algorithm is converted to the NLMS method.
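For projection order *N* = 1, where the text notes the AP algorithm reduces to NLMS, one update step of Equations (27), (28), (31), and (32) can be sketched as follows (function and parameter names are illustrative):

```python
import numpy as np

def two_mic_nlms_step(w12, w21, q1, q2, n, L, mu=1.0, eps=1e-8):
    """One step of the two-microphone update (Eqs. (27), (28), (31),
    (32)) for projection order N = 1, i.e. the NLMS special case."""
    q1v = q1[n - L + 1:n + 1][::-1]   # last L samples of q1
    q2v = q2[n - L + 1:n + 1][::-1]   # last L samples of q2
    u1 = q1[n] - w21 @ q2v            # Eq. (27)
    u2 = q2[n] - w12 @ q1v            # Eq. (28)
    # Eqs. (31)-(32) with N = 1: normalized updates with step sizes mu
    w12 = w12 + mu * q1v * u2 / (q1v @ q1v + eps)
    w21 = w21 + mu * q2v * u1 / (q2v @ q2v + eps)
    return w12, w21, u1, u2
```

In the special case *b*[*n*] = 0 and **p**21 = 0 (so *q*1 = *s* and *q*2 = *s* ∗ **p**12), the update of **w**12 is an exact system identification of **p**12, which illustrates the convergence target **w**12[*n*] → **p**12 stated above.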

The proposed sub-band APA not only increased the accuracy of the speech enhancement algorithm but also improved the speed of convergence (Table 6 in the results section) in the implementations, because the noise was estimated separately for each sub-band and was stationary within narrow bandwidths. The SBAPA was implemented on the sub-bands generated by the analysis filters in Figure 3. As shown in Figure 1, a symmetrical synthesis filter bank with synthesis filters matched to the nested microphone array was implemented for the reconstruction of the final enhanced signal. The synthesis filters, like the analysis filters, were implemented based on the tree structure in Figure 3b. Finally, all sub-band signals were summed to generate the final enhanced signal. In the next section, the performance of the proposed CNMA-SBAPA is compared with previous works.
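As a toy illustration of this analysis/process/synthesis chain (not the paper's tree-structured filter bank of Figure 3), a two-band split with a complementary filter pair reconstructs the input exactly when the per-band processing is the identity:

```python
import numpy as np

def subband_process(x, h_lp, process):
    """Two-band analysis/synthesis sketch with complementary filters.

    Choosing h_hp = delta - h_lp guarantees exact reconstruction when
    the per-band processing is the identity, since
    x*h_lp + x*h_hp = x*(h_lp + h_hp) = x*delta = x.
    `process(band, k)` stands in for the per-band SBAPA stage.
    """
    delta = np.zeros(len(h_lp))
    delta[0] = 1.0
    h_hp = delta - h_lp                       # complementary high-pass
    bands = [np.convolve(x, h)[:len(x)] for h in (h_lp, h_hp)]
    bands = [process(band, k) for k, band in enumerate(bands)]
    return np.sum(bands, axis=0)              # synthesis: sum of sub-bands
```

The exact-reconstruction property of the complementary pair is the design choice mirrored here; the real filter bank instead uses a tree of half-band analysis/synthesis filters.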

#### **4. Results and Discussion**

The experiments to evaluate the performance of the proposed method were implemented on real and simulated data. The TIMIT dataset was considered for the simulated data, where four continuous sentences (SX139, SX229, SX319, and SX409) from speaker MDAB0 were selected as the male speech material in the simulations [39]. This dataset includes short sentences for testing and training the algorithms. The tones and frequency components are two different parameters in the speech signal: there are pitch and speech spectrum components for the speakers. The choice of male or female signals is important for algorithms that work with the pitch parameter, since this parameter changes strongly with gender. Since we consider the speech spectrum, the use of male or female speakers does not change the results. Therefore, a 12.5 s male-speech signal was used for the implementations and experiments. A voice activity detector was implemented to detect the silent parts of the speech signal [40], and the noise spectrum was estimated from these parts for the proposed SBAPA. Figure 7 shows the simulated room with the locations of the speakers and the microphone array. The inter-microphone distance *dsim* = 2.2 cm for the simulated data was selected based on the designed array. A speaker and a steered noise source were considered in the simulations. The room dimensions, speaker location, and noise source location were selected as 475 × 592 × 420 cm, (374, 146, 110) cm, and (362, 412, 120) cm, respectively. These dimensions and locations were the same as in the real room recording conditions. In addition, the proposed algorithm was implemented on real data to evaluate the real effect of the noise and reverberation on the performance. For this purpose, the real speech signal was recorded in the speech processing laboratory at Fondazione Bruno Kessler (FBK), Trento, Italy. Figure 8 shows a view of the recording room at FBK.
Two electronic speakers were used instead of the human and noise sources in the data recording process. In addition, Figure 8 shows the position of the circular NMA in the center of the room. The minimum inter-microphone distance we could realize in the real conditions with our setup (see Figure 8) was *dreal* = 2.7 cm because of the microphone dimensions, the electronic board, and the microphone shield. Additionally, each microphone had a cross-section of about 0.7 cm in the real conditions, which makes it hard to measure the exact distance between two microphones without some error. Since the whole cross-section of a microphone is an area for sound recording, these limitations forced us to use this inter-microphone distance for the real data implementation, even though it differs by a few millimeters from the mathematical calculations. The differences between the results of our proposed method on the real and simulated data were partly due to this issue; in real conditions, there are always some inaccuracy factors in the measurements. We found the center of the room, and the microphones were placed on the table based on the primary measurements. All microphones were connected to the sound recording system, which uses parallel acquisition for all microphone channels. All channel acquisitions were synchronized, and there was no delay between the signals recorded by the different microphones or channels. The phase error due to the recording conditions was very low, close to zero, thanks to the audio recording system. In the real room, the table did not create any direct reflection: all waves reflected from the table first reach the walls and ceiling, and since these were covered with curtains and sound absorption panels, the indirect reflections to the microphones were very weak.
Both speakers were connected to two computers for playing the speech and noise with a sampling frequency of *F<sup>s</sup>* = 16,000 Hz. The microphone, sound source, and noise source positions in the simulations matched the real conditions exactly, so that the results would be comparable between the two conditions. Figure 9 shows the time-domain signal and spectrum of the male speech signal.

**Figure 7.** A view of the simulated room with the positions of speakers, noise source, and microphones.

**Figure 8.** The real recording room in the speech processing laboratory, FBK, Trento, Italy.

**Figure 9.** The time-domain and spectrum for the male speech signal.

The reverberation effect was considered in the experiments to make the simulation conditions similar to the real scenarios. The image model was implemented in the simulations to produce a reverberation effect similar to the real conditions [41]. The image model produces the room impulse response between a source and a microphone by considering the speaker position, microphone location, sampling frequency, room dimensions, room reflection coefficients, impulse response length, and reverberation time. The signal received at the microphone was simulated by the convolution between the impulse response generated by the image method and the source signal. The impulse responses were generated for both the noise and speech sources because both undergo the same room reverberation. In addition, the noise was additive with the speech signal at the microphone positions. The room reverberation time was selected as *RT*<sup>60</sup> = 350 ms, which corresponds to a room with a low level of reverberation, the same as the real conditions. To generate the noisy signal, five types of noise were considered for the simulated and real data: white noise, babble noise, train noise, car noise, and restaurant noise. Figure 10 shows the time-domain signals and spectra for these noisy signals at an SNR of 0 dB. The noise signal duration was 12.5 s, the same as the speech signal.
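The described construction of a microphone signal, convolving each source with its room impulse response and adding the noise at a target SNR, can be sketched as follows; the short impulse responses stand in for the image-method RIRs, and all names are illustrative:

```python
import numpy as np

def simulate_mic(speech, noise_src, h_speech, h_noise, snr_db=0.0):
    """Build one microphone signal as in the simulation setup:
    each source is convolved with its room impulse response
    (placeholders for the image-method RIRs), and the reverberant
    noise is scaled to the requested SNR before being added."""
    n = len(speech)
    s = np.convolve(speech, h_speech)[:n]    # reverberant speech
    v = np.convolve(noise_src, h_noise)[:n]  # reverberant noise
    # gain so that 10*log10(power(s) / power(gain*v)) = snr_db
    gain = np.sqrt(np.sum(s ** 2) / (np.sum(v ** 2) * 10 ** (snr_db / 10)))
    return s + gain * v
```

With `snr_db=0.0`, the reverberant speech and the scaled noise have equal power, matching the SNR = 0 dB condition of Figure 10.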

**Figure 10.** *Cont*.

**Figure 10.** The time-domain and spectrum for a noisy speech signal with (**a**) white noise, (**b**) babble noise, (**c**) train noise, (**d**) car noise, and (**e**) restaurant noise according to a signal-to-noise ratio (SNR) = 0 dB.

The Hamming window with a length of 30 ms was selected for signal blocking to preserve the short-time stationarity of the signal. The projection order was set to *N* = 4 to keep the computational complexity in an acceptable range while maintaining proper accuracy. Additionally, the step sizes were chosen as µ<sup>12</sup> = 1 and µ<sup>21</sup> = 1 to provide fast convergence for the proposed SBAPA in real-time implementations. The evaluations in this article were implemented in MATLAB R2019b on a PC with an Intel Core i7-7700K processor at 4.20 GHz and 32 GB RAM, in order to implement the proposed algorithm in real-time conditions.
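The 30 ms Hamming-window blocking at *F<sup>s</sup>* = 16,000 Hz (480 samples per frame) can be sketched as follows; the hop size is assumed equal to one frame length, since the overlap is not stated in the text:

```python
import numpy as np

def frame_signal(x, fs=16000, frame_ms=30):
    """Block the signal into Hamming-windowed frames.

    30 ms at fs = 16 kHz gives 480 samples per frame; a hop of one
    full frame (no overlap) is assumed here for simplicity.
    """
    flen = int(fs * frame_ms / 1000)          # samples per frame
    nframes = len(x) // flen                  # drop the trailing remainder
    win = np.hamming(flen)
    return np.array([win * x[k * flen:(k + 1) * flen] for k in range(nframes)])
```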

The proposed SBAPA in combination with the proposed circular nested microphone array (CNMA-SBAPA) was compared with the LMS [20], traditional APA [31], RLS [21], DB-MWF [9], and MNMF-MVDR [24] algorithms. These methods were compared because all of them are based on adaptive filters and multi-channel beamforming as the main category for comparison. There are many methods that could be compared with the proposed algorithm, but the comparison should be based on a common implementation theme; therefore, the adaptive filter-based algorithms were selected. Qualitative and quantitative criteria were considered to show the superiority of the proposed method in comparison with previous works. For this purpose, the SegSNR [7], PESQ [4], MOS [5], and STOI [6] criteria were selected. The SegSNR is a quantitative criterion, which quantifies the improvement in the enhanced signal in terms of the noise power eliminated from the noisy signal, namely:

$$\text{SegSNR}\_{(dB)} = \frac{1}{R} \sum\_{i=1}^{R} 10 \log\_{10} \left( \frac{\sum\_{j=0}^{Q-1} \left| S\_j[n] \right|^2}{\sum\_{j=0}^{Q-1} \left| S\_j[n] - Z\_j[n] \right|^2} VAD\_i \right) \tag{33}$$

where *S*[*n*] and *Z*[*n*] are the clean and enhanced speech signals, respectively. The variable *Q* is the number of samples per frame over which the SNR is averaged. The variable *R* is the number of only-speech frames, and *VAD* is a speech detector, which is 1 for only-speech frames and 0 for only-noise frames. Therefore, the SegSNR is appropriate for showing speech enhancement performance. Many speech enhancement algorithms eliminate some part of the speech signal in addition to the noise, which decreases the perceptual quality of the enhanced signal. Hence, three well-known qualitative criteria are also considered in the evaluations. The first one is the PESQ, which is defined in the standard ITU-T P.862 for qualitative evaluation of speech signals in mobile stations [4,42]. In fact, the PESQ criterion is used as a numerical representation of the qualitative evaluation of enhanced speech signals. The defined range for this criterion is [−0.5, 4.5], where −0.5 and 4.5 denote the lowest and highest quality of the enhanced speech, respectively. Additionally, the results were compared with the MOS score, a qualitative criterion in telecommunication systems that represents the clarity, perception, and intelligibility of the enhanced signal. The MOS criterion is defined in the standard ITU-T P.800 [5,43]. The MOS evaluation was performed by volunteers listening to the enhanced signal, where 1 and 5 are the lowest and highest scores, respectively. Table 3 shows the defined scores for the MOS criteria in the evaluations.

**Table 3.** The numerical scores for the mean opinion score (MOS) criteria in the evaluation process.


Finally, the last qualitative criterion for the evaluations is the STOI, which predicts human speech intelligibility. Speech intelligibility measurement relies on a series of pre-assumptions, and if the noisy signal is processed by time-frequency weighting, the results of conventional measures are not reliable. The STOI is an objective intelligibility measure based on the correlation between the noisy and the weighted time-frequency noisy signals. The lowest and highest scores for the STOI criterion are 0 and 1, which represent the worst and the best enhancement performance, respectively.
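The SegSNR of Equation (33) can be sketched as follows, averaging the frame-wise SNR over the frames that the VAD marks as speech (function and parameter names are illustrative):

```python
import numpy as np

def seg_snr(clean, enhanced, vad, frame_len):
    """Segmental SNR (Eq. (33)): frame-wise SNR in dB, averaged over
    the R frames the VAD marks as speech (vad[i] in {0, 1})."""
    vals = []
    for i, active in enumerate(vad):
        if not active:                 # VAD_i = 0: only-noise frame, skipped
            continue
        s = clean[i * frame_len:(i + 1) * frame_len]
        z = enhanced[i * frame_len:(i + 1) * frame_len]
        # 10*log10( sum|S|^2 / sum|S - Z|^2 ) for this frame
        vals.append(10 * np.log10(np.sum(s ** 2) / np.sum((s - z) ** 2)))
    return np.mean(vals)
```

For instance, an enhanced signal whose residual error carries one tenth of the clean signal's power in every speech frame yields a SegSNR of 10 dB.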

Firstly, the proposed method was evaluated on white noise, and then the other colored noises were considered in the experiments. The proposed CNMA-SBAPA was evaluated on real and simulated data in comparison with the LMS, traditional APA, RLS, DB-MWF, and MNMF-MVDR algorithms. Figure 11 shows the time-domain and spectrum representations of the noisy and enhanced signals in the presence of white noise for SNR = 0 dB. As seen in these figures, the proposed CNMA-SBAPA method removed more of the noise with less distortion in comparison with the other works. However, numerical values are necessary for comparison; in the following, the experiments are evaluated with quantitative and qualitative criteria.

**Figure 11.** *Cont*.


**Figure 11.** The time-domain and spectrum representation for (**a**) the white noisy signal and the enhanced signal by the (**b**) least mean square (LMS), (**c**) APA, (**d**) recursive least square (RLS), (**e**) distributed multichannel Wiener filter (DB-MWF), (**f**) multichannel nonnegative matrix factorization-minimum variance distortionless response (MNMF-MVDR), and (**g**) proposed CNMA-SBAPA for SNR = 0 dB.

In the following, the proposed method was compared by numerical criteria with other previous works. Figure 12 shows the SegSNR results at SNRs of [−10, −5, 0, 5, 10, 15] dB for the proposed CNMA-SBAPA in comparison with the LMS, traditional APA, RLS, DB-MWF, and MNMF-MVDR for real and simulated data in the presence of white noise. As seen, the proposed method had superior performance over the different ranges of SNRs in comparison with the rest of the works, i.e., better noise elimination was reached by the proposed algorithm. For example, the proposed method enhanced the noisy speech signal with SNR = −10 dB to SegSNR = 1.35 dB, in comparison with SegSNR = −4.58 dB for LMS, SegSNR = −3.21 dB for APA, SegSNR = −1.57 dB for RLS, SegSNR = −1.68 dB for DB-MWF, and SegSNR = −0.94 dB for MNMF-MVDR. Nevertheless, quantitative criteria are not enough to properly evaluate a method, and both quantitative and qualitative criteria should be considered in the evaluations.

**Figure 12.** The segmental signal-to-noise ratio (SegSNR) comparison between the proposed CNMA-SBAPA, LMS, traditional APA, RLS, DB-MWF, and MNMF-MVDR methods on (**a**) simulated and (**b**) real data for white noise.

In addition, the proposed method was compared with previous works by qualitative criteria such as the PESQ, MOS, and STOI. We used 20 volunteers, who first listened through a headset to the clean signal, as a reference for an excellent signal with a rating of 5 on the MOS scale, and to the noisy signal (before enhancement), the worst option on the MOS scale with a rating of 1. Then, the enhanced signals over a range of SNRs were played for them, and they were asked to select a rating between 1 and 5 based on Table 3. Figure 13 shows the PESQ, STOI, and averaged MOS criteria for the enhanced signal by the proposed method in comparison with previous works on real and simulated data over different ranges of SNRs in the presence of white noise. As seen, the proposed method had the best performance in comparison with previous works. For example, the PESQ score was 3.41 for the proposed method in comparison to 1.82 for LMS, 2.51 for APA, 2.73 for RLS, 2.93 for DB-MWF, and 3.1 for MNMF-MVDR at SNR = 15 dB for the simulated data. In addition, the STOI criterion was 0.89 for the proposed method in comparison to 0.73 for LMS, 0.77 for APA, 0.81 for RLS, 0.83 for DB-MWF, and 0.85 for MNMF-MVDR at SNR = 15 dB. The other criterion for comparison was the average MOS rating, which was 3.5 for the proposed method in comparison to 2.5 for LMS, 2.7 for APA, 2.9 for RLS, 3.0 for DB-MWF, and 3.0 for MNMF-MVDR at SNR = 15 dB. Therefore, the proposed method was superior for enhancing the noisy signals considering both the quantitative (Figure 12) and qualitative (Figure 13) criteria in comparison to previous works in the presence of white noise. In addition, the proposed method was implemented on colored noises to show the reliability of the results. For this purpose, the proposed method was evaluated on babble, train, car, and restaurant noises for the real and simulated data over the SNR range [−10, −5, 0, 5, 10, 15] dB. Tables 4 and 5 show the results on the simulated and real data, respectively.
As seen from the numbers in these tables, the proposed method had better results in most cases in comparison with the traditional methods, which demonstrates the reliability of the proposed method in colored noisy conditions. Some of the methods had slightly better results in specific cases, for example at SNR = 15 dB, which cannot be generalized to all cases. In addition, the SegSNR values are shown in these tables to present a better comparison alongside the qualitative criteria.

Finally, Table 6 presents the speed of convergence for the proposed method in comparison with other previous works for all white and colored noises, in seconds (the time required for convergence on the configuration of the used PC), on the real data. As shown, the proposed method has a higher speed of convergence than the other algorithms. The main reason for this high speed of convergence is the sub-band processing, because this multiresolution processing provides stationary noise in each frequency band, which is an important factor for the speed of convergence: when the noise is closer to stationary conditions, the speed of convergence of adaptive filter-based algorithms is increased. As clearly shown in this table, the speed of convergence in white noisy conditions was higher than in the colored noisy scenarios. Therefore, the proposed CNMA-SBAPA method was superior for speech enhancement in comparison with the LMS, traditional APA, RLS, DB-MWF, and MNMF-MVDR algorithms based on the quantitative SegSNR and qualitative PESQ, MOS, and STOI criteria, as well as the speed of convergence.

**Figure 13.** The perceptual evaluation of speech quality (PESQ), short-time objective intelligibility (STOI), and averaged mean opinion score (MOS) comparison between the proposed CNMA-SBAPA and the LMS, traditional APA, RLS, DB-MWF, and MNMF-MVDR methods for (**a**) simulated and (**b**) real data by considering the white noise.

**Table 4.** The comparison of PESQ, MOS, STOI, and SegSNR for the proposed CNMA-SBAPA against the LMS, traditional APA, RLS, DB-MWF, and MNMF-MVDR methods on the simulated data for colored noises (train, babble, car, and restaurant noises) over different ranges of SNRs (the bold numbers are the best results).


**Table 5.** The comparison of PESQ, MOS, STOI, and SegSNR for the proposed CNMA-SBAPA against the LMS, traditional APA, RLS, DB-MWF, and MNMF-MVDR methods on the real data for colored noises (train, babble, car, and restaurant noises) over different ranges of SNRs (the bold numbers are the best results).




#### **5. Conclusions**

Speech enhancement is an important application of signal processing for smart meeting rooms. The aim of speech enhancement is denoising, dereverberation, or both at the same time. Speech enhancement is implemented as a pre-processing step to produce a proper signal for applications such as speaker localization, tracking, speech recognition, text-to-speech, estimating the number of speakers, etc. Speech enhancement algorithms are divided into single- and multi-channel methods. The single-channel algorithms are challenging in the speech enhancement process because of the lack of suitable information for the denoising procedure. In contrast, the multi-channel algorithms increase the enhancement accuracy due to having more information, but the computational complexity is increased. In this article, a multi-channel speech enhancement method was proposed based on a microphone array. The microphone array increased the accuracy of the enhancement algorithms by providing more information, but spatial aliasing decreased the efficiency because of the inter-microphone distances. In this article, a uniform circular nested microphone array was proposed for the speech enhancement algorithms. This nested array was designed so that the microphones were located at specific distances to eliminate spatial aliasing, in combination with analysis filters, to provide the proper information for the speech enhancement algorithms. In addition, the speech information differs across the various frequency bands. Therefore, a specific sub-band processing was proposed to pay special attention to the speech spectrum components. The frequency bands were designed to have the maximum resolution in the low-frequency components. Then, the APA was implemented on all frequency bands obtained by the sub-band processing and the circular nested microphone array. The projection factor (*N* = 4) was considered for the CNMA-SBAPA in order to keep the computational complexity in an acceptable range along with superior accuracy. Finally, the synthesis filter bank was implemented on the sub-band signals, and the enhanced signal was generated by the summation over all sub-bands. The proposed algorithm was compared with the LMS, traditional APA, RLS, DB-MWF, and MNMF-MVDR methods on the real and simulated data for white and colored noises over the SNR range [−10, −5, 0, 5, 10, 15] dB. In all conditions, the proposed method had superior accuracy in comparison with previous works. In addition, the proposed method was compared with previous works based on the speed of convergence, and it was the fastest among all the algorithms, since the proposed enhancement algorithm operated on near-stationary sub-band signals, which increased the speed of convergence of the adaptive filters.

One of the future works is reducing the size of the array and decreasing the number of microphones (without strongly affecting the quality) so that it becomes applicable to smartphone applications. Even the type of microphone is important: in this article, we used high-quality microphones, which provide signals with proper amplitude from the environment. The use of the ordinary microphones found in smartphones is another challenge, which could be an area for future work. Another area for future work is to find the best number of sub-bands that provides the maximum performance and the lowest computational complexity, where the number of sub-bands would not be fixed but adaptive to the speech components.

**Author Contributions:** Conceptualization, A.D.F. and P.A. and D.Z.-B.; methodology, A.D.F. and P.A.; software, A.D.F., P.I., P.A. and H.D.; validation, M.S., P.P., D.Z.-B. and C.A.-M.; formal analysis, A.D.F. and P.A.; investigation, A.D.F. and P.A.; resources, A.D.F., P.A., D.Z.-B. and P.I.; data curation, A.D.F.; writing—original draft preparation, A.D.F., P.A. and D.Z.-B.; writing—review and editing, P.P.-J., C.A. and D.Z.-B.; supervision, P.I.; project administration, P.A., H.D., M.S. and D.Z.-B.; funding acquisition, P.A. and A.D.F. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by FONDECYT Postdoctorado No. 3190147, FONDECYT No. 11180107 and ANID PFCHA/Beca de Doctorado Nacional/2019 21190489.

**Acknowledgments:** This work was supported by the Vicerrectoría de Investigación y Postgrado of the Universidad Tecnológica Metropolitana, the Vicerrectoría de Investigación y Postgrado, and Faculty of Engineering Science of the Universidad Católica del Maule.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**

The following abbreviations are used in this manuscript:



#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

*Article*

### **Classification of Hydroacoustic Signals Based on Harmonic Wavelets and a Deep Learning Artificial Intelligence System**

**Dmitry Kaplun <sup>1</sup> , Alexander Voznesensky <sup>1</sup> , Sergei Romanov <sup>1</sup> , Valery Andreev <sup>2</sup> and Denis Butusov 3,\***


Received: 10 April 2020; Accepted: 26 April 2020; Published: 29 April 2020

**Abstract:** This paper considers two approaches to hydroacoustic signal classification, taking the sounds made by whales as an example: a method based on harmonic wavelets and a technique involving deep learning neural networks. The study deals with the classification of hydroacoustic signals using coefficients of the harmonic wavelet transform (fast computation), short-time Fourier transform (spectrogram) and Fourier transform using a kNN-algorithm. Classification quality metrics (precision, recall and accuracy) are given for different signal-to-noise ratios. ROC curves were also obtained. The use of the deep neural network for classification of whales' sounds is considered. The effectiveness of using harmonic wavelets for the classification of complex non-stationary signals is proved. A technique to reduce the feature space dimension using a 'modulo N reduction' method is proposed. A classification of 26 individual whales from the Whale FM Project dataset is presented. It is shown that the deep-learning-based approach provides the best result for the Whale FM Project dataset both for whale types and individuals.

**Keywords:** harmonic wavelets; classification; kNN-algorithm; deep neural networks; machine learning; Fourier transform; short-time Fourier transform; wavelet transform; spectrogram; confusion matrix; ROC curve

#### **1. Introduction**

The whale was one of the main commercial animals in the past. Whalers were attracted by the animal's huge carcass: from one whale they could get much more fat and meat than from any other marine animal. Today, many whale species have almost been driven to extinction. For this reason, they are listed in the IUCN Red List of Threatened Species [1]. Currently, the main threat to whales is the anthropogenic factor, expressed in the disruption of their usual way of life and pollution of the seas. To ensure the safety of rare animals, the number of individuals must be monitored. Within the framework of environmental monitoring programs approved by governments and public organizations of different countries, cetacean monitoring activities are carried out year-round using all of the modern achievements in data processing [2]. Monitoring includes work at sea and post-processing of the collected data: determining the coordinates of whale encounters, establishing the composition of the group, and photographing the animals for subsequent observation of individually recognizable individuals.

Systematic observation of the animals gives scientists the opportunity to learn how the mammals share the water area among themselves and to collect data on age and sex composition [3]. An important task is to find out where the whales come from and where they go in winter, i.e., to track their migration routes. It is also necessary to determine which population the whales belong to.

Sounds made by cetaceans for communication are called "whale songs". The word "songs" is used to emphasize the repeating and melodic nature of these sounds, reminiscent of human singing. The use of sound as the main communication channel is due to the fact that, in an aquatic environment, visibility can be limited and smells spread much more slowly than in air [4]. It is believed that the most complex songs of humpback whales and some other baleen whales are used in mating games. Simpler signals are used all year round and perhaps serve for day-to-day communication and navigation. Toothed whales (including killer whales) use emitted sounds for echolocation. In addition, it has been found that whales that have lived in captivity for a long time can mimic human speech. All these signals are transmitted over different distances, under different water conditions and in the presence of a variety of noises. Additionally, stable pods have their own dialects, i.e., there is wide variability in the sounds made by whales, both within a population and between populations. Thus, sounds can be used to classify both whale species and individuals. The task of classifying whales by sound has been addressed by many researchers for different types of whales in different parts of the world, using various methods and approaches, the most popular being signal processing algorithms [5,6] and algorithms based on neural networks [2,7–10]. Neural-network-based approaches employ different architectures, models and learning methods. In [2], the authors developed and empirically studied a variety of deep neural networks to detect the vocalizations of endangered North Atlantic right whales. In [7], an effective data-driven approach based on pre-trained convolutional neural networks (CNN) using multi-scale waveforms and time-frequency feature representations was developed in order to classify whale calls from a large open-source dataset recorded by sensors carried by whales.
The authors of [8] constructed an ensembled deep learning CNN model to classify beluga detections. The applicability of basic CNN models is also being explored for the bio-acoustic task of whale call detection, such as with respect to North Atlantic right whale calls [9] and humpback whale calls [10].

This paper considers two approaches to hydroacoustic classification, taking the sounds made by whales as an example: one based on harmonic wavelets and one based on deep learning neural networks. The main contributions of our work can be summarized as follows. The effectiveness of using harmonic wavelets for the classification of hydroacoustic signals was demonstrated. A technique to reduce the feature space dimension using a 'modulo N reduction' method was developed. A classification of 26 individual whales is presented for the dataset. It was shown that the deep-learning-based approach provides the best result for the dataset both for whale types and individuals.

The remainder of this paper is organized as follows. In Section 2, we briefly describe hydroacoustic signal processing and review related works on it. In Section 3, we introduce details of the harmonic wavelets and their application to the processing of hydroacoustic signals. In Section 4, we review the kNN algorithm for classification based on harmonic wavelets and present experimental results to verify the proposed approach. In Section 5, experimental results are presented to verify the approach for classification based on neural networks and machine learning. In Section 6, we discuss the results and how they can be interpreted from the perspective of previous studies and of the working hypotheses. Future research directions also are highlighted. Finally, we present the conclusions in Section 7.

#### **2. Hydroacoustic Signal Processing**

Before classifying hydroacoustic signals, which are sounds made by whales in an aquatic environment, they must be pre-processed, as the quality of the classification depends on the quality of the input signals. Hydroacoustic signal processing includes data preparation, as well as the use of further algorithms allowing the extraction of useful signals from certain directions. Preliminary processing includes de-noising, estimation of the degree of randomness, extraction of short-term local features, pre-filtering, etc. Preprocessing affects the process of further analysis within a hydroacoustic monitoring system [11–13]. Even though the preprocessing of hydroacoustic signals has been studied for a long time, several problems remain unresolved: working under a priori uncertainty of signal parameters; processing complex non-stationary hydroacoustic signals with multiple local features; and analyzing multicomponent signals. A further set of problems concerns the effective preliminary visual processing of hydroacoustic signals and the need for a mathematical apparatus for signal preprocessing tasks.

Current advances in applied mathematics and digital signal processing along with the development of high-performance hardware allow the effective application of numerous mathematical techniques, including continuous and discrete wavelet transforms. Wavelets are an effective tool for signal preprocessing, due to their adaptability, the availability of fast computational algorithms and the diversity of wavelet bases.

Using wavelets for hydroacoustic signal analysis provides the following possibilities [14,15]:


Classification is an important task of modern signal processing. The quality of the classification depends on the noise level, training size and testing datasets, and the algorithm. It is also important to choose classification features and determine the size of the feature space. The classification feature is the feature or characteristic of the object used for classification. If we classify real non-stationary signals, it is important to have informative classification features. Among such features are wavelet coefficients.

#### **3. Harmonic Wavelets**

The wavelet transform uses wavelets as basis functions. The whole basis can be obtained from a single function (the "mother" wavelet) by translations and dilations in the time domain. The wavelet transform is commonly used for analyzing non-stationary (seismic, biological, hydroacoustic, etc.) signals, usually together with various spectral analysis algorithms [16,17].

Consider the basis of harmonic wavelets, whose spectra are rectangular in a given frequency band [15,16]. Harmonic wavelets are usually represented in the frequency domain. The wavelet function (mother wavelet) can be written as:

$$\Psi(\omega) = \begin{cases} \frac{1}{2\pi}, & 2\pi \le \omega < 4\pi \\ 0, & \omega < 2\pi,\ \omega \ge 4\pi \end{cases} \quad \Leftrightarrow \quad \psi(x) = \int\_{-\infty}^{\infty} \Psi(\omega) e^{i\omega x} d\omega = \frac{e^{i4\pi x} - e^{i2\pi x}}{i2\pi x} \tag{1}$$

There are techniques that allow us to decompose input signals using different basis functions: wavelets, sine waves, damped sine waves, polynomials, etc. These functions form the atom dictionary (of basis functions), and each function is localized in the time and frequency domains. Often the dictionary of atoms is full (all types of functions are used) and redundant (the functions are not mutually independent). One of the main problems in these techniques is the selection of basis functions and dictionary optimization to achieve optimal decomposition levels [17]. Decomposition levels for wavelets can be defined as:

$$\begin{aligned} \Psi\_{jk}(\omega) &= \begin{cases} \frac{1}{2\pi} 2^{-j} e^{-\frac{i\omega k}{2^j}}, & 2\pi 2^j \le \omega < 4\pi 2^j \\ 0, & \omega < 2\pi 2^j,\ \omega \ge 4\pi 2^j \end{cases} \\ \psi\_{jk}(x) &= \psi(2^j x - k) = \int\_{-\infty}^{\infty} \Psi\_{jk}(\omega) e^{i\omega x} d\omega = \frac{e^{i4\pi(2^j x - k)} - e^{i2\pi(2^j x - k)}}{i2\pi(2^j x - k)} \end{aligned} \tag{2}$$

where *j* is the decomposition level and *k* is the translation (shift).

Wavelets are often chosen as basis functions because of their useful properties [14] and their ability to represent signals in the time-frequency domain. The Fourier transform of the scaling function can be written as:

$$\Phi(\omega) = \begin{cases} \frac{1}{2\pi}, & 0 \le \omega < 2\pi \\ 0, & \omega < 0, \omega \ge 2\pi \end{cases} \quad \Leftrightarrow \quad \phi(\mathbf{x}) = \int\_{-\infty}^{\infty} \Phi(\omega) e^{i\omega \mathbf{x}} d\omega = \frac{e^{i2\pi \mathbf{x}} - 1}{i2\pi \mathbf{x}} \tag{3}$$

We can formulate the following properties of harmonic wavelets, which relate them with other classes of wavelets:


The drawback of harmonic wavelets is their weak localization in the time domain in comparison with other types of wavelets. The rectangular spectrum leads to a decay in the time domain as 1/*x*, which is not sufficient for extracting short-term singularities in a signal.

#### *Wavelet Transform in the Basis of Harmonic Wavelets*

The detail coefficients $a\_{jk}$, $\widetilde{a}\_{jk}$ and approximation coefficients $a\_{\phi k}$, $\widetilde{a}\_{\phi k}$ are:

$$\begin{aligned} a\_{jk} &= 2^j \int\_{-\infty}^{\infty} f(x)\, \overline{\psi}(2^j x - k)\, dx & \widetilde{a}\_{jk} &= 2^j \int\_{-\infty}^{\infty} f(x)\, \psi(2^j x - k)\, dx \\ a\_{\phi k} &= \int\_{-\infty}^{\infty} f(x)\, \overline{\phi}(x - k)\, dx & \widetilde{a}\_{\phi k} &= \int\_{-\infty}^{\infty} f(x)\, \phi(x - k)\, dx \end{aligned} \tag{4}$$

where *j* is the decomposition level and *k* is the translation (shift).

If *f(x)* is a real-valued function, then $\widetilde{a}\_{jk} = \overline{a}\_{jk}$ and $\widetilde{a}\_{\phi k} = \overline{a}\_{\phi k}$. Wavelet decomposition [14]:

$$f(\mathbf{x}) = \sum\_{j=-\infty}^{\infty} \sum\_{k=-\infty}^{\infty} a\_{jk} \psi(2^j \mathbf{x} - k) = \sum\_{k=-\infty}^{\infty} a\_{\phi k} \phi(\mathbf{x} - k) + \sum\_{j=0}^{\infty} \sum\_{k=-\infty}^{\infty} a\_{jk} \psi(2^j \mathbf{x} - k) \tag{5}$$

Wavelet decomposition using harmonic wavelets [18]:

$$f(x) = \sum\_{j=-\infty}^{\infty} \sum\_{k=-\infty}^{\infty} \left[ a\_{jk} \psi(2^j x - k) + \widetilde{a}\_{jk} \overline{\psi}(2^j x - k) \right]$$

$$= \sum\_{k=-\infty}^{\infty} \left[ a\_{\phi k} \phi(x - k) + \widetilde{a}\_{\phi k} \overline{\phi}(x - k) \right] + \sum\_{j=0}^{\infty} \sum\_{k=-\infty}^{\infty} \left[ a\_{jk} \psi(2^j x - k) + \widetilde{a}\_{jk} \overline{\psi}(2^j x - k) \right] \tag{6}$$

$$a\_{jk} = 2^j \int\_{-\infty}^{\infty} f(\mathbf{x}) \overline{\psi}(2^j \mathbf{x} - k) d\mathbf{x}$$

Calculations with the last two formulae are inefficient.

Fast decomposition can be implemented in the following way:

$$a\_{jk} = 2^j \int\_{-\infty}^{\infty} F(\omega) \frac{1}{2\pi} 2^{-j} e^{\frac{i\omega k}{2^j}} d\omega = \frac{1}{2\pi} \int\_{2\pi 2^j}^{4\pi 2^j} F(\omega) e^{\frac{i\omega k}{2^j}} d\omega \tag{7}$$

The substitution is of the following form:

$$n = 2^j + s, \qquad F\_{2^j+s} = 2\pi F\left[\omega = 2\pi(2^j + s)\right] \tag{8}$$

We can show that:

$$a\_{jk} = \sum\_{s=0}^{2^j - 1} F\_{2^j + s}\, e^{\frac{i2\pi sk}{2^j}}, \quad k = 0 \dots 2^j - 1;\ j = 0 \dots n - 1. \tag{9}$$

$$\widetilde{a}\_{jk} = \sum\_{s=0}^{2^j - 1} F\_{N - (2^j + s)}\, e^{-\frac{i2\pi sk}{2^j}}, \quad k = 0 \dots 2^j - 1;\ j = 0 \dots n - 1. \tag{10}$$

$$\widetilde{a}\_{jk} = \overline{a}\_{jk}$$

Thus, the algorithm for computing wavelet coefficients of the octave harmonic wavelet transform [19] of a continuous-time function *f(x)* can be written in the following way:
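In a minimal sketch, this amounts to taking the FFT of the sampled signal, grouping the spectrum into octave bands, and inverse-transforming each band (Equations (7)–(10)); the normalization below is one possible convention (they vary between implementations), and the mean and Nyquist terms are omitted for brevity:

```python
import numpy as np

def harmonic_wavelet_transform(f):
    """Octave harmonic wavelet coefficients via the FFT.

    Assumes len(f) is a power of two. Level j holds 2**j complex
    coefficients a[j][k], obtained by inverse-transforming the FFT
    bins in the octave band [2**j, 2**(j+1)).
    """
    N = len(f)
    n = int(np.log2(N))
    F = np.fft.fft(f) / N                    # discrete spectrum
    coeffs = []
    for j in range(n - 1):
        band = F[2**j : 2**(j + 1)]          # octave band of 2**j bins
        a_jk = np.fft.ifft(band) * 2**j      # level-j coefficients
        coeffs.append(a_jk)
    return coeffs
```

Because each level is just an inverse FFT of one octave band, the whole transform inherits the O(N log N) cost of the FFT.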



**Table 1.** Distribution of wavelet coefficients among decomposition levels.

Further, consider two approaches to classifying bio-acoustic signals. We have used real hydroacoustic signals of whales from the database [20].

#### **4. Classification Using the kNN-Algorithm**

The classification was based on 14,822 records of whales of two types: 'killer' (4673 records) and 'pilot' (10,149 records). Data for processing was taken from [20]. Research has been conducted for the following signal-to-noise ratios (SNR): 100, 3, 0 and −3 dB. Training of the classifier was based on 85% of records of each class, and testing was based on 15% of records of each class. The following attributes have been used for comparison: the harmonic wavelet transform (HWT) coefficients, the short-time Fourier transform (STFT) coefficients and the discrete Fourier transform (DFT) coefficients.

All records had different numbers of samples (8064–900,771) and different sampling rates. To perform classification, we had to bring the records to a common length equal to a power of two. To reduce the feature space dimension, we employed the approach based on modulo *N* reduction [21]. Such an approach allows us to reduce the data dimension when calculating the *N*-point DFT if *N* < *L* (*L* is the signal length). The final signal matrix size (*N* = 4096) was 14,822 × 4096.
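The modulo-*N* reduction step can be illustrated as follows (a minimal sketch, with a function name of our own choosing): the signal is folded into *N* bins, so that the *N*-point DFT of the folded signal equals samples of the full *L*-point DFT.

```python
import numpy as np

def modulo_n_reduction(x, N):
    """Fold a length-L signal into N bins: y[n] = sum_m x[n + m*N].

    The N-point DFT of y equals the L-point DFT of x sampled at every
    (L/N)-th bin, so the feature dimension drops from L to N without
    discarding the retained spectral samples.
    """
    L = len(x)
    pad = (-L) % N                        # zero-pad x to a multiple of N
    x = np.concatenate([x, np.zeros(pad)])
    return x.reshape(-1, N).sum(axis=0)
```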

To reduce the feature space dimension further, we also exploited the symmetry of the coefficients of the harmonic wavelet transform and the DFT: only 50% of the coefficients were used (matrix: 14,822 × 2048). In the case of the short-time Fourier transform (Hamming window of size 256, 50% overlap), the final signal matrix size was 14,822 × 3999.

Below we can see the classification results (Tables 2–13, Figure 1) using the kNN-algorithm [22] for different features and different SNR values.
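For reference, the kNN decision rule can be sketched as follows (a toy implementation with Euclidean distance and majority voting; the distance metric and *k* used in the experiments are not restated here):

```python
import numpy as np

def knn_predict(train_X, train_y, test_X, k=5):
    """Minimal kNN: label each test vector by majority vote among
    its k nearest training vectors (Euclidean distance)."""
    preds = []
    for x in test_X:
        d = np.linalg.norm(train_X - x, axis=1)       # distances to all training points
        nearest = train_y[np.argsort(d)[:k]]          # labels of the k nearest
        labels, counts = np.unique(nearest, return_counts=True)
        preds.append(labels[np.argmax(counts)])
    return np.array(preds)
```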

**Figure 1.** ROC curve of the classification: HWT, SNR = 100 dB.

The classification problem is to attribute vectors to different classes. We have two classes: positive and negative. In this case, we can have four different situations at the output of a classifier:


We have calculated the following classification quality metrics: precision, recall and accuracy.

$$Precision = \frac{TP}{TP + FP}; \text{ Recall} = \frac{TP}{TP + FN}; \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{11}$$
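Equation (11) maps directly to code; as a check, the test-set counts from Section 5.1 (TP = 1577, FP = 33, FN = 86, TN = 633) reproduce the reported precision of 0.98, recall of 0.95 and accuracy of 0.95:

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall and accuracy from confusion-matrix counts (Equation (11))."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return precision, recall, accuracy

# Counts from the test-set confusion matrix in Section 5.1:
p, r, a = classification_metrics(tp=1577, fp=33, fn=86, tn=633)
# p ≈ 0.98, r ≈ 0.95, a ≈ 0.95, matching the reported values
```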

Tables 14–16 contain precision, recall and accuracy for different classification features and different signal-to-noise ratios. Additionally, we can find the average final efficiency score characterizing the use of different classification features.


**Table 2.** Classification results: HWT, SNR = 100 dB.

**Table 3.** Classification results: STFT, SNR = 100 dB.






**Table 6.** Classification results: STFT, SNR = 3 dB.


**Table 7.** Classification results: DFT, SNR = 3 dB.


**Table 8.** Classification results: HWT, SNR = 0 dB.



**Table 9.** Classification results: STFT, SNR = 0 dB.

**Table 10.** Classification results: DFT, SNR = 0 dB.




**Table 12.** Classification results: STFT, SNR = −3 dB.


**Table 13.** Classification results: DFT, SNR = −3 dB.


**Table 14.** Classification results: HWT.


\* score of a particular metric for each SNR. The "averaged score for three metrics" means that we estimated the average score for three metrics with the same SNR. Then, the final score for each feature (HWT, STFT, DFT) with different SNRs was chosen. We can see that using HWT as features gives the best result.


**Table 15.** Classification results: STFT.

\* score of a particular metric for each SNR. The "averaged score for three metrics" means that we estimated the average score for three metrics with the same SNR. Then, the final score for each feature (HWT, STFT, DFT) with different SNRs was chosen. We can see that using HWT as features gives the best result.


**Table 16.** Classification results: DFT.

\* score of a particular metric for each SNR. The "averaged score for three metrics" means that we estimated the average score for three metrics with the same SNR. Then, the final score for each feature (HWT, STFT, DFT) with different SNRs was chosen. We can see that using HWT as features gives the best result.

#### **5. Classification Using a Deep Neural Network**

The classification was based on 14,822 records of whales of two types: 'killer' (4673 records) and 'pilot' (10,149 records). Data for processing were taken from [20], containing sound recordings of 26 whales of two types: killer whale (15 individuals) and pilot whale (11 individuals).

In [23], two classifiers based on the kNN-algorithm were constructed for this dataset. In the first case, the sounds were classified as pilot whale or killer whale sounds. For training, 800 whale sounds of each class were used; for testing, 400 of each were used. A classification accuracy of 92% was obtained. In the second experiment, 18 individual whales were separated from each other. For training, 80 records were taken; for testing, 20. The classification accuracy was 51%.

In this work, records less than 960 ms long were removed from the dataset. After that, 14,810 records with an average duration of 4 s remained: 10,149 records of pilot whales and 4661 records of killer whales.

The classifier for both tasks was based on the VGGish model [24], which is a modified deep neural network VGG [25] pre-trained on the YouTube-8M dataset [26]. Cross entropy was used as the loss function. The audio files were pre-processed in accordance with the procedure presented in [24]. Each record is divided into non-overlapping 960 ms frames, and each frame inherits the label of its parent record. Log-mel spectrogram patches of 96 × 64 bins are then calculated for each frame. These form the set of inputs to the classifier. The label for an entire audio recording was obtained by maximum likelihood over all classes across its segments. As features, the output of the penultimate layer of dimension 128 was taken. More details can be found in [23].
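The per-record decision can be sketched as follows (our interpretation of the maximum-likelihood aggregation over 960 ms segments; the authors' exact implementation may differ):

```python
import numpy as np

def record_label(frame_logprobs):
    """Aggregate per-frame class scores into one label for the whole record.

    frame_logprobs : (num_frames, num_classes) array of per-frame class
    log-probabilities (e.g., from the VGGish-based classifier). Summing
    log-probabilities over frames picks the maximum-likelihood class
    for the entire recording.
    """
    return int(np.argmax(frame_logprobs.sum(axis=0)))
```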

#### *5.1. Experiment 1—Classification by Type*

For the first task, we divided the dataset into training and test data in the proportion 85:15; the training and test samples contain no sounds from the same whales. The killer whale was designated 0 and the pilot whale 1. Statistics on the training set: 8486 records of class 1 and 3995 of class 0. Statistics on the test set: 1663 records of class 1 and 666 of class 0.

The following results were obtained. On the training set, the confusion matrix was:

$$
\begin{pmatrix} 3994 & 1 \\ 27 & 8459 \end{pmatrix}
$$

1—FP, 27—FN.

On the test set, the confusion matrix was:

$$
\begin{pmatrix} 633 & 33 \\ 86 & 1577 \end{pmatrix}
$$

33—FP, 86—FN.

Recall = 0.95, precision = 0.98, accuracy = 0.95, AUC = 0.99. Figure 2 shows the ROC curve for the test set.

**Figure 2.** ROC curve for the test set.

#### *5.2. Experiment 2—Classification by Individual*

The data was divided into training and test sets in the ratio of 85:15, maintaining the proportions of the classes. As Figure 3 shows, the classes are very unbalanced. Thus, in the training set, only 15 and 12 files are available for classes 26 and 5, respectively, while 3684 files are available for class 20 (see Figure 3).

**Figure 3.** The number of files available for each class (individual). For training, from each class we took 900 files (augmented).

The confusion matrix for the training set is given in Figure 4.


**Figure 4.** Confusion matrix for the training set.

The confusion matrix for the test set is presented in Figure 5.


**Figure 5.** Confusion matrix for the test set.

The accuracy of the classification of individuals in percent on the test sample is presented in Figure 6. Blue lines indicate the true-positive values; orange lines indicate the false-positive values.

**Figure 6.** Classification accuracy for 26 whales.

As can be seen, the 25th class (whale ID 26) is never predicted. Only for the 9th (whale ID 10), 14th (whale ID 15) and 24th (whale ID 25) classes was the classification accuracy below 60%; for all the others it was higher. For some classes, the classification accuracy is higher than 95%.

#### **6. Discussion**

Classification of whale sounds is a challenging problem that has been studied for a long time. Despite great achievements in feature engineering, signal processing and machine learning techniques, there still remain some major problems to be solved. In this paper, we used harmonic wavelets and deep neural networks. The results of the classification of whale types and individuals by means of deep neural networks are better than in previous works [23] with this dataset, but accuracy in the classification of types using harmonic wavelets as features and in the classification of individuals using deep neural networks should be increased. In further studies, we will use a Hilbert–Huang transform [27] and adaptive signal processing algorithms [28] to generate features.

For improvement of individual classification, two approaches can be suggested. The first combines data augmentation with other architectures of the neural network, but this will lead to large computational costs. The second approach is to use technology for simple and non-iterative improvements of multilayer and deep learning neural networks and artificial intelligence systems, which was proposed some years ago [29,30]. Our further research in the classification of hydroacoustic signals will be related to these two approaches. We also intend to test these approaches by adding noises at different SNRs, as we have done for harmonic wavelets.

#### **7. Conclusions**

In our paper, we considered the harmonic wavelet transform and its application to classifying hydroacoustic signals from whales of two types. We have provided a detailed presentation of the mathematical tools, including fast computation of the harmonic wavelet transform coefficients. The analysis of the classification results allows us to draw conclusions about the reasonableness of using harmonic wavelets when analyzing complex data. We have established that the smallest classification error is provided by the kNN-algorithm based on the harmonic wavelet transform coefficients.

The analysis (Table 17, Figures 5 and 6) illustrates the superiority of the neural network for the Whale FM Project dataset in comparison with the known work [23] and with a kNN-classifier for this classification problem [31]. However, it is worth noting that the implementation of a neural network with such a complicated structure requires significant computational resources.

**Table 17.** Analysis of different approaches to bioacoustic signal classification of whale types (pilot and killer).


Classification of 26 individual whales from the Whale FM Project dataset was proposed, and better results in comparison with previous works were achieved [23].

The proposed approach can be used in the study of the fauna of the oceans by research institutes, environmental organizations, and enterprises producing equipment for sonar monitoring. In addition, the study showed that the same methods can be used for speech processing and classification of underwater bioacoustic signals, which will subsequently allow the creation of effective medical devices based on these methods.

**Author Contributions:** Conceptualization, D.K.; data curation, S.R.; formal analysis, S.R. and D.B.; investigation, A.V. and S.R.; methodology, V.A. and D.B.; project administration, D.K.; resources, D.K. and V.A.; software, A.V. and V.A.; supervision, D.K.; validation, A.V., S.R. and D.B.; Visualization, A.V.; writing—original draft, D.K. and D.B.; writing—review & editing, V.A. and D.B. All authors have read and agreed to the published version of the manuscript.

**Funding:** The research and the present paper are supported by the Russian Science Foundation (Project NO. 17-71-20077).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

*Article*

### **Quantification of the Feedback Regulation by Digital Signal Analysis Methods: Application to Blood Pressure Control Efficacy**

**Nikita S. Pyko <sup>1</sup> , Svetlana A. Pyko 1,2, Oleg A. Markelov 1,\* , Oleg V. Mamontov 2,3 and Mikhail I. Bogachev 1,\***


Received: 7 November 2019; Accepted: 18 December 2019; Published: 26 December 2019

#### **Featured Application: Analysis of blood pressure and heart rate mutual synchronization provides complementary information about the physiological mechanisms and efficacy of feedback control that keeps blood pressure levels within the physiologically desirable range.**

**Abstract:** Six different metrics of the mutual coupling of simultaneously registered signals representing blood pressure and pulse interval dynamics have been considered. Stress test responses, represented by the reaction of the recorded signals to an external input (tilting the body into the upright position), have been studied. In addition to conventional metrics such as the joint signal coherence (Coher) and the sensitivity of the pulse interval response to blood pressure changes, the baroreflex sensitivity (BRS), alternative indicators have also been determined: the synchronization coefficient (Sync) and the time delay stability estimate (TDS), which represent the temporal fractions of the analyzed signal records exhibiting rather synchronous dynamics. In contrast to BRS, which characterizes the intensity of the pulse interval response to blood pressure changes during observed feedback responses, both Sync and TDS likely indicate how often such responses are activated in the first place. The results indicate that in most cases BRS is typically reciprocal to both Sync and TDS, suggesting that a low intensity of the feedback responses, characterized by low BRS, is compensated by their more frequent activation, indicated by higher Sync and TDS. The proposed additional indicators could be complementary for the differential diagnostics of blood pressure regulation efficacy and could lead to a deeper insight into the involved concomitant factors, thereby also aiming at the improvement of mathematical models representing the underlying feedback control mechanisms.

**Keywords:** feedback regulation; digital signal analysis; control efficacy

#### **1. Introduction**

Blood pressure levels are simultaneously affected by multiple internal and external factors, which requires the continuous activity of several regulatory mechanisms to keep blood pressure within a certain homeostatic range. Among the multiple mechanisms involved in the regulation of blood pressure, the arterial baroreceptor reflex, or simply baroreflex, appears to be one of the key mechanisms governing short-term feedback responses to various physical stresses such as exercise, adaptation to changes in body position, reactions to drugs, changes in the subject's mental condition and so on. An increase in blood pressure is sensed by baroreceptors located in blood vessels, which in turn invoke a response by the autonomous nervous system. The response is generally twofold, including a decrease in heart rate and a reduction in vascular resistance, both leading to a consequent drop in blood pressure. Thus, the efficacy of this feedback mechanism largely determines timely and adequate responses to changes in blood pressure.

For a long time, quantitative assessment of the efficacy of short-term blood pressure—heart rate feedback regulation was limited to the analysis of the baroreflex sensitivity (BRS), defined as the measure of the relative change of the pulse interval (in ms) in response to the change in systolic blood pressure (in mmHg), and thus measured in ms/mmHg. Historically, BRS was first measured as the increase in the pulse interval in response to a pharmacologically induced increase in blood pressure, which guaranteed the activation of the feedback mechanism for a certain time fragment, with the BRS being quantified by the linear regression coefficient of pulse intervals on blood pressure over this fragment. In the last three decades, several methods have been suggested to measure BRS from simultaneously recorded spontaneous fluctuations of both pulse intervals and blood pressure without applying any external stimuli. The most common time-domain approach, often termed the sequence method, focuses on finding time fragments where both pulse intervals and blood pressure either increase or decrease consecutively and monotonically over several heartbeat cycles, sometimes referred to as baroreflex sequences. For each such fragment, a linear regression coefficient of pulse intervals on blood pressure is calculated, representing the local BRS estimate [1,2]. To improve the accuracy, averaging over several baroreflex sequences is usually performed, at the cost of lower temporal resolution. In contrast, spectral-based methods do not require the selection of certain time fragments, instead focusing on the blood pressure—pulse interval transfer function analysis in a frequency band where the coherence between the signals exceeds a certain threshold [3–5]. To overcome the common drawback of both methods, in particular their limited performance under non-stationary conditions such as stress tests, modified methods based on first-difference analysis have been suggested [6]. Combined with advances in non-invasive blood pressure measurement techniques, these methods have made BRS one of the routinely measured parameters in both clinical and ambulatory settings [7].

BRS has been reported as a highly informative prognostic marker widely applicable in both ambulatory and clinical investigations. In particular, BRS appeared highly predictive of cardiac mortality in post-infarction patients with both reduced ejection fraction [8] as well as preserved left ventricular function [9] including those receiving β-blocker treatment [10] as well as in patients with life-threatening arrhythmias [11], see also [12]. BRS impairment has been also reported as an early indicator of autonomic dysfunction and autonomic failure [13]. In earlier studies, BRS has been shown to exhibit significant changes under exercise as well as postural and other physical stresses [14–17], while more recent data indicate that these effects are temporary and after a certain adaptation period the baroreflex exhibits a resetting around new absolute blood pressure and pulse interval values [18–20]. However, successful baroreflex resetting has notable exceptions with one such observed recently in diabetes patients with certain complications, particularly with obesity who exhibited significant baroreflex impairment [21].

While the BRS explicitly quantifies the response of the heart rate to changes in blood pressure, it does not contain any information on whether such a response occurred for every significant variation of blood pressure. Time-domain methods simply disregard time intervals without significant changes in heart rate, irrespective of whether there were blood pressure variations, while spectral methods provide characteristics that are averaged over a given frequency band within the entire analysis window. In turn, another alternative method [22] measures the average heart rate acceleration or deceleration without considering whether it occurred in response to significant blood pressure variations. Thus, none of these approaches can guarantee that all blood pressure variations were adequately responded to. However, a timely response to changes in blood pressure is essential for homeostasis, since missing or delayed responses lead to increased blood pressure variability. In line with this, there is recent evidence that baroreflex activation therapy using implantable devices that stimulate carotid baroreceptors significantly improves blood pressure control efficacy [23]. Accordingly, in addition to measuring the BRS itself, it is also important to quantify the activation of the feedback mechanism in response to blood pressure variations.

In this paper, we suggest a series of complementary indicators of the blood pressure—heart rate feedback regulation based on their mutual synchronization patterns. Investigation of the mutual synchronization of physiological signals is widely used in chronobiological studies, appearing essential for a deeper understanding of the mechanisms related to the influence of various external factors such as geomagnetic field variations, solar cycles, jetlag or night shifts [24–26]. Recent examples include the synchronization analysis of heart rate and respiration during different sleep phases [27–29]. While in most cases the relation between rhythms with a certain quasi-periodic structure is studied, such as breathing cycles modulating heartbeat cycles, we go beyond that and, while following a conceptually similar methodology, modify the synchronization analysis to suit blood pressure—pulse interval data, both of which exhibit rather stochastic behavior, as indicated below. The particular mutual synchronization metrics used here follow a recent study where their performance was validated using simulated datasets with non-periodic structure and correlation patterns reminiscent of those typically observed in physiological signals [30].

#### **2. Materials and Methods**

#### *2.1. Subjects and Clinical Investigation Protocol*

All recordings were obtained at the Almazov National Medical Research Centre in accordance with the ethical standards presented in the Declaration of Helsinki. The study protocol was reviewed and approved by the Ethics Committee of the Almazov National Medical Research Centre (Ref. No. 110, approval date 12 June 2010) before the beginning of the study. All patients and volunteers provided their informed consent in written form prior to their participation in the study.

The study included 95 subjects subdivided into three groups:



**Table 1.** Detailed clinical characteristics of the studied patients' groups.

Prior to the tilt-test all patients and volunteers underwent standard functional autonomic tests. A comprehensive assessment of autonomic regulation of blood circulation included the following tests:


Next, all 95 subjects and patients underwent a head-up tilt-test (table tilt 70°, duration of the orthostatic position up to 30 min unless stopped earlier due to a syncope response) [33] (see also [34]), which was performed under identical conditions between 10 a.m. and 1 p.m. The recording in the initial supine position and the initial fragment of the orthostatic phase recording (both around 10 min in duration) were used in further analysis.

Hemodynamic parameters were measured continuously using the Finometer-Pro blood pressure monitor (Finapres Medical Systems, Enschede, The Netherlands) with parallel electrocardiogram (ECG) recording. Forearm blood flow was measured by venous occlusion plethysmography using a Dohn air-filled cuff.

The overall study design is summarized in Figure 1.

**Figure 1.** Clinical study design.

Detailed characteristics of the studied patients' groups are summarized in Table 1.

#### *2.2. Data Acquisition and Preliminary Processing*

Since the instantaneous phase calculation procedure by Hilbert transform employed in the signal mutual synchronization analysis is rather sensitive to random measurement errors, an adaptive recursive filtering procedure has been applied to the original measurement sequences $\{s_i\}$, where $s$ denotes either systolic blood pressure (SBP) or pulse intervals (PI), aiming at the elimination of anomalous measurements. This procedure is based on the analysis of the first differences of the pulse interval and systolic blood pressure values. The threshold value for the exclusion of outliers is based on the analysis of their empirical distribution functions for the processed data series. The elimination procedure consists of two consecutive steps that are repeated iteratively: (i) marking of the potential outliers as candidates for future elimination and (ii) removing the marked outliers. In the first step, one calculates the first differences $s'_i = s_{i+1} - s_i$ from the initial dataset $\{s_i\}$ (recalculated iteratively each time after a single outlier is eliminated). To mark the $i$th element of the dataset $\{s_i\}$ as a candidate for being an outlier ("OUT"), three conditions should be met simultaneously:


Those elements for which all three conditions are met are then marked as "OUT". Of the marked data, the element with the largest normalized standard deviation is eliminated. After the elimination of a single outlier, the above procedure is repeated iteratively until no values are marked as outliers for the chosen elimination depth. For further details on the filtering algorithm, we refer to [35].

Since the sequence of pulse intervals is non-equidistant in time due to its inherent variability, we next used cubic interpolation and resampling at the desired sampling frequency, 5 Hz in our case. As a result, both analysed datasets were represented by sequences equidistant in time and taken at the same time points.
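As an illustrative sketch of this pre-processing step, the interpolation and resampling can be performed as follows (Python with SciPy is used here purely for illustration; the original processing was implemented in Matlab, and the function name and example values are ours):

```python
import numpy as np
from scipy.interpolate import CubicSpline

def resample_beats(beat_times, values, fs=5.0):
    """Resample a beat-to-beat series (non-equidistant in time) onto a
    uniform time grid via cubic interpolation.

    beat_times : occurrence time of each beat (s)
    values     : series sampled at those beats (PI in ms or SBP in mmHg)
    fs         : target sampling frequency (Hz), 5 Hz in the study
    """
    t_uniform = np.arange(beat_times[0], beat_times[-1], 1.0 / fs)
    return t_uniform, CubicSpline(beat_times, values)(t_uniform)

# Usage: the pulse intervals themselves define the (irregular) time axis.
pi_ms = np.array([800.0, 820.0, 810.0, 790.0, 805.0, 815.0])
beat_t = np.cumsum(pi_ms) / 1000.0   # beat occurrence times in seconds
t, pi_resampled = resample_beats(beat_t, pi_ms, fs=5.0)
```

Both SBP and PI are resampled onto the same grid, so subsequent phase comparisons operate on samples taken at identical time points.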

#### *2.3. BRS Estimation*

To calculate BRS from blood pressure and pulse interval recordings during tilt tests, we followed a recently suggested methodology that is particularly suited for dealing with non-stationary data [6]. The first differences of the SBP and PI values were taken, and the BRS was estimated as the linear regression coefficient of ∆PI on ∆SBP in those quadrants where the signs of ∆PI and ∆SBP were identical. To disregard uncertain as well as anomalous variations, beat-to-beat changes of less than 1 mmHg in SBP or less than 3 ms in PI, as well as of more than 20 mmHg in SBP or more than 100 ms in PI, were ignored.
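The first-difference BRS estimation described above can be sketched as follows (an illustrative Python implementation; since the text does not state whether the regression includes an intercept, an ordinary least-squares fit with intercept is assumed here):

```python
import numpy as np

def brs_first_differences(sbp, pi):
    """Estimate BRS (ms/mmHg) from beat-to-beat SBP (mmHg) and PI (ms).

    Keep only beat pairs where dSBP and dPI share the same sign and fall
    within the plausibility windows (1-20 mmHg, 3-100 ms), then take the
    linear regression coefficient of dPI on dSBP.
    """
    d_sbp = np.diff(np.asarray(sbp, float))
    d_pi = np.diff(np.asarray(pi, float))
    valid = ((np.sign(d_sbp) == np.sign(d_pi))
             & (np.abs(d_sbp) >= 1.0) & (np.abs(d_sbp) <= 20.0)
             & (np.abs(d_pi) >= 3.0) & (np.abs(d_pi) <= 100.0))
    if valid.sum() < 2:
        return float("nan")      # not enough concordant beat pairs
    slope, _ = np.polyfit(d_sbp[valid], d_pi[valid], 1)
    return float(slope)

# Toy example: PI changes exactly 8 ms per mmHg of SBP change.
sbp = [120.0, 125.0, 120.0, 126.0, 120.0]
pi = [800.0, 840.0, 800.0, 848.0, 800.0]
```

For this toy series the estimator recovers the built-in 8 ms/mmHg slope.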

#### *2.4. The Method of PI and BRS Phase Synchronization Measurement*

To quantitatively estimate the mutual synchronization behaviour of the two signals, we used a method based on the comparison of their phases [36]. Instantaneous phase values were determined by the Hilbert transform, which is widely used in mathematics, physics and signal analysis. For the overall estimation algorithm design, see Figure 2.

The Hilbert transform produces the complex function $\dot{s}(t)$ from the original real signal $s(t)$ (which stands for either SBP or PI) by adding the imaginary component $s_\perp(t)$, defined as:

$$s\_{\perp}(t) = \frac{1}{\pi} \int\_{-\infty}^{\infty} \frac{s(\tau)}{t - \tau} d\tau. \tag{1}$$

The resulting complex function is known as the complex analytical signal $\dot{s}(t) = s(t) + j s_\perp(t)$.

The real and imaginary parts of the analytical signal allowed us to determine the envelope *S*(*t*) as the absolute value of the analytical signal, characterizing the laws governing its amplitude modulation, and the phase Φ(*t*) as the argument of the analytical signal, characterizing the laws governing its angular modulation. Accordingly,

$$S(t) = \sqrt{s^2(t) + s\_\perp^2(t)}, \ \Phi(t) = \text{arctg}\frac{s\_\perp(t)}{s(t)}.\tag{2}$$

Of note, the signal phase has a clear physical interpretation only in the case of harmonic or narrowband oscillations, while the above formalism is not restricted to these assumptions and thus allows one to calculate phase values for arbitrary data sequences. Next, we determined the phase difference $\Phi_{PI} - \Phi_{SBP}$. Following [27,28], we then applied moving average filtering in a gliding window of size τ and calculated the standard deviation of the phase differences. Consecutive phase points where the standard deviation remained below a given threshold, equal to 2π/δ, were treated as belonging to synchronization episodes, once their duration exceeded *T* seconds. This procedure was applied to the entire record in a gliding window while counting episodes of synchronous behaviour. As a result, the quantitative measure of phase synchronization was the synchronization coefficient Sync, defined as the percentage of the total duration of synchronous behaviour episodes within the total duration of the analysis window.
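A simplified sketch of the Sync computation may look as follows (illustrative Python code; the moving-average pre-filtering of the phase difference is folded into a plain sliding-window standard deviation here, so this approximates the published algorithm rather than reimplementing it exactly):

```python
import numpy as np
from scipy.signal import hilbert

def sync_coefficient(x, y, fs=5.0, tau=3.0, T=0.4, delta=300):
    """Percentage of a record where the x-y Hilbert phase difference is
    'stable': the sliding-window (tau seconds) standard deviation of the
    phase difference stays below 2*pi/delta for longer than T seconds.
    Parameter values (tau = 3 s, delta = 300, T = 0.4 s) follow the text."""
    phase = lambda s: np.unwrap(np.angle(hilbert(s - np.mean(s))))
    dphi = phase(np.asarray(x, float)) - phase(np.asarray(y, float))
    w = max(2, int(tau * fs))                    # window length in samples
    # sliding-window standard deviation of the phase difference
    stds = np.array([np.std(dphi[i:i + w]) for i in range(len(dphi) - w + 1)])
    below = stds < 2 * np.pi / delta
    # keep only runs of sub-threshold points longer than T seconds
    min_run = int(T * fs)
    sync = np.zeros_like(below)
    i = 0
    while i < len(below):
        if below[i]:
            j = i
            while j < len(below) and below[j]:
                j += 1
            if j - i >= min_run:
                sync[i:j] = True
            i = j
        else:
            i += 1
    return 100.0 * sync.mean()
```

Two identical signals yield Sync = 100%, while unrelated signals yield correspondingly lower values.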

#### *2.5. Adjustment of Synchronization Analysis Algorithm Parameters*

For the initial adjustment of the methods and for finding appropriate parameters of the synchronization analysis algorithm that fit the typical blood pressure and heart rate variability characteristics, another set of 150 stationary records was used. These records were obtained independently from this study, had already been used in a previously reported analysis [37], and came from subjects with various autonomic status under supine resting conditions (typical record duration around 10 min).

First, optimization of the synchronization algorithm parameters τ, *T* and δ was performed. The appropriate choice of these values depended on the specific experimental conditions and was not always universal for the given type of data. The parameters may be adjusted to optimize the sensitivity of the algorithm by avoiding saturation at either very low or very high synchronization coefficients. The gliding window duration τ determined the number of phase points in the window. It was connected with the parameter δ used to calculate the threshold for the standard deviation of the first differences of the instantaneous phase, since the standard deviation calculated for a finite data sample depended on the sample size.

To choose the appropriate window size τ, the boundary conditions specifying its possible range were first determined. One of the common hypotheses on the origin of Mayer waves is their baroreflex-loop-based nature, suggesting that their period of about 10 s corresponds to the full feedback loop cycle [38]. Accordingly, any internally or externally induced changes in blood pressure are followed by characteristic regulatory oscillations with the Mayer wave period. Thus, choosing a gliding window size comparable to or above this 10 s period would eliminate these short-term regulatory oscillations by averaging. At the same time, to ensure that the observed variations in both blood pressure and pulse intervals are proper measurements, not caused by single faulty measurements, and do not appear as artifacts of the preliminary filtering procedures, one has to guarantee that at least several actual measurements were performed in each window. Taking into account typical heart rate values, a 3 s gliding window will typically contain 3–4 pulse measurements under resting conditions and even more under stress conditions with an increased heart rate. Accordingly, one has to restrict τ to values above 3 s and, preferably, not longer than half the Mayer oscillation period, that is, 5 s. For a better temporal resolution of the analysis, choosing the lower bound of the appropriate range, that is, a 3 s gliding window, seems plausible. Next, the threshold for the standard deviation of the phase difference was chosen empirically as 2π/300, with *T* = 0.4 s, the latter adjusted empirically to avoid saturation and thereby increase the dynamic range of the Sync index.

The graphic insets in Figure 2 exemplify the analysis results for a single tilt-test record. In addition to the pre-processed data sequences and their Hilbert phases, the figure displays their first differences as well as the results of their standard deviation analysis in a gliding window (in the lower right panel). The dashed lines denote the 2π/δ threshold for the standard deviation of the phase difference. The bold solid curve denotes $\Phi_{PI} - \Phi_{SBP}$, and the curve fragments highlighted in red denote the synchronization episodes where the standard deviation of the phase difference remains below the 2π/δ threshold. Finally, the fraction of such episodes within the total record duration determines the synchronization coefficient Sync.

**Figure 2.** Systolic blood pressure vs. pulse intervals synchronization analysis algorithm design.

The figure shows that, while for the entire analysed fragment the synchronization coefficient Sync was around 50%, it exhibited considerable variations along the record. While the first part of the record was characterized by prolonged episodes of synchronous behaviour interrupted by a few short-term asynchronous fragments, the second part exhibited more frequent asynchronous episodes interrupted by rather few coupling patterns. While the particular reasons for each onset and breakdown of this coupling can hardly be determined, there was a clear discrepancy between the first and the second part of the record in terms of their synchronization patterns. Such changes in the synchronization behaviour of blood pressure and pulse intervals could have been triggered by some physical or mental stress such as a change in body position, and so on. Accordingly, reactions to various stress patterns imposed during functional tests can be studied in terms of the changes in the blood pressure—pulse interval synchronization coefficient, Sync. This requires that the recordings be analysed separately for the different test phases. For example, for head-up tilt table testing, the supine and the orthostatic test phases could be analysed separately, allowing us to evaluate how the synchronization pattern changes in response to orthostatic stress. An additional advantage of the proposed methodology is that the evaluation of the degree of phase synchronization between SBP and PI may be useful in the study of regulatory functions in the human body during various functional tests, since it does not require data stationarity.

While the BRS value characterizes the intensity of the heart rate reaction to the blood pressure changes during observed feedback responses, the Sync value likely indicates how often such responses are activated in the first place. Accordingly, low intensity of the reaction characterized by low BRS that should result in higher than normal blood pressure variability, theoretically, could be at least partially compensated by its more frequent activation characterized by higher Sync.

#### *2.6. Alternative Mutual Information Metrics*

While, according to the results of a recent study [30], Sync appears to be the most sensitive of the various mutual synchronization indices, it has its own drawbacks, including high sensitivity to any (including random) variations in the analyzed signals. This calls for more robust alternatives that would respond more specifically only to significant variations, although typically at the cost of lower sensitivity.

Among them, the third approach, the time delay stability (TDS) estimate, is based on the analysis of the relative shift of the maximum of the cross-correlation function of the two studied series, as originally proposed in [28]. In this approach, the average delay in 50% overlapping windows of fixed length is calculated, and a time delay stability episode is detected once, within at least five consecutive windows, the shift of the maximum of the cross-correlation function remains below a given threshold. As in the previous method, to estimate the time delay stability coefficient for an entire record, the TDS value is determined as the fraction of time delay stability episodes in the total record duration. Similarly, the starting set of parameters used in this study follows the results of a recent investigation based on simulated data [30].
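A minimal sketch of the TDS computation under these definitions might look as follows (illustrative Python code; the window length and lag-stability threshold are placeholder assumptions, since the study takes its parameters from [30]):

```python
import numpy as np

def tds_coefficient(x, y, fs=5.0, win_s=10.0, max_shift=1, n_stable=5):
    """Time delay stability (TDS, %): in 50%-overlapping windows the lag of
    the cross-correlation maximum is found; a TDS episode is declared when
    the lag varies by no more than `max_shift` samples across at least
    `n_stable` consecutive windows. `win_s` and `max_shift` are
    illustrative assumptions."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    w = int(win_s * fs)
    step = w // 2                                  # 50% overlap
    lags = []
    for i in range(0, len(x) - w + 1, step):
        xs = x[i:i + w] - np.mean(x[i:i + w])
        ys = y[i:i + w] - np.mean(y[i:i + w])
        cc = np.correlate(xs, ys, mode="full")
        lags.append(np.argmax(cc) - (w - 1))       # lag of the cc maximum
    lags = np.array(lags)
    stable = np.zeros(len(lags), dtype=bool)
    for i in range(len(lags) - n_stable + 1):
        seg = lags[i:i + n_stable]
        if seg.max() - seg.min() <= max_shift:     # lag stable across run
            stable[i:i + n_stable] = True
    return 100.0 * stable.mean() if len(lags) else 0.0
```

For two identical signals the lag is constant at zero, so TDS evaluates to 100%.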

One more quantity utilized here is a certain combination of the two previous approaches and is based on the analysis of the correlation time of the phase differences of the studied data series. As in the first method, the phase differences are determined using the Hilbert transform. Next, the observational data series are divided into 50% overlapping time windows of duration *T*, and in each time window the correlation time of the phase differences is calculated from their correlation function $K_{\Phi i}(\tau)$ as:

$$\tau\_i = \frac{\int \left| K\_{\Phi i}(\tau) \right| d\tau}{K\_{\Phi i}(0)}.\tag{3}$$

To obtain the overall statistics for a given data series, the averaged correlation time over all studied time windows is calculated. Once the window size is well above observed correlation times, this method becomes parameter-free.
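Under these definitions, the per-window correlation time of Equation (3) can be sketched as follows (illustrative Python code; a biased sample autocorrelation estimate over non-negative lags is assumed):

```python
import numpy as np

def phase_diff_correlation_time(dphi, fs=5.0):
    """Correlation time of a phase-difference series within one window,
    cf. Equation (3): the integral of |K(tau)| normalized by K(0),
    approximated by a discrete sum over the sample autocorrelation."""
    d = np.asarray(dphi, float) - np.mean(dphi)
    # biased sample autocorrelation K(tau) for tau >= 0
    acf = np.correlate(d, d, mode="full")[len(d) - 1:] / len(d)
    return float(np.sum(np.abs(acf)) / fs / acf[0])
```

Strongly correlated series (e.g., a random walk) yield a much larger correlation time than white noise, which is the property the averaged index exploits.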

The fifth approach utilizes the mean coherence of the two data series [3]. In contrast to the above-described methods, which are all obtained in the time domain, the coherence function is obtained in the frequency domain and is calculated as:

$$\mathcal{C}\_{xy}(f) = \frac{\left|P\_{xy}(f)\right|^2}{P\_x(f)P\_y(f)},\tag{4}$$

where $P_x(f)$ and $P_y(f)$ are the individual spectral densities and $P_{xy}(f)$ is the cross-spectral density of the analyzed datasets.
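The band-averaged coherence can be computed, for example, with SciPy's Welch-based estimator (illustrative code; the 0.04–0.15 Hz averaging band is our assumption for illustration, as the text does not restate the band used):

```python
import numpy as np
from scipy.signal import coherence

def mean_coherence(x, y, fs=5.0, band=(0.04, 0.15)):
    """Mean magnitude-squared coherence C_xy(f) = |P_xy|^2 / (P_x * P_y),
    cf. Equation (4), averaged over a frequency band (Welch estimates)."""
    f, cxy = coherence(x, y, fs=fs, nperseg=min(256, len(x)))
    sel = (f >= band[0]) & (f <= band[1])
    return float(np.mean(cxy[sel]))
```

A signal compared with itself gives unit coherence at every frequency, so the band mean equals 1; independent signals give values closer to 0.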

The sixth method is based on the calculation of the cross-conditional entropy of the two series *x*(*t*) and *y*(*t*) according to the approach described in [18]. First, the analyzed data series are normalized. Then, patterns comprising *L* − 1 samples are selected from the series *x*(*t*). The cross-conditional entropy is then defined as:

$$CE(L) = -\sum\_{k=1}^{M} p(fx\_k) \sum\_{i=1}^{N} p(y(i)/fx\_k) \log p(y(i)/fx\_k),\tag{5}$$

where *fx* is a data fragment of size *L* − 1 selected from the first data series *x*(*i*); *p*(*fx*) is the probability of observing *fx*; *M* is the number of such fragments; and *p*(*y*(*i*)/*fx*) is the conditional probability of a particular sample *y*(*i*) being observed within the series *y*(*t*) following the *fx* pattern.

The above index represents the amount of information carried by the sample *y*(*i*) when the pattern *fx* is assigned. Thus, the coefficient *CE* depends on the pattern size *L*. It reaches zero when a sufficient number of samples of *y*(*t*) carries the entire information on the behavior of *x*(*t*), remains high and constant if the processes *x*(*t*) and *y*(*t*) are independent, and yields intermediate values when knowledge of *y*(*t*) allows for a limited prediction of the behavior of *x*(*t*).
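A didactic sketch of the cross-conditional entropy computation might look as follows (illustrative Python code; the coarse-graining of the normalized series into discrete levels and the quantization depth are assumptions of this sketch, since the exact scheme of [18] is not restated here):

```python
import numpy as np
from collections import Counter, defaultdict

def cross_conditional_entropy(x, y, L=3, n_bins=6):
    """Cross-conditional entropy CE(L) of y given length-(L-1) patterns of x,
    cf. Equation (5), after quantizing both normalized series into n_bins
    levels (a didactic simplification)."""
    q = lambda s: np.digitize((s - s.mean()) / s.std(),
                              np.linspace(-2, 2, n_bins - 1))
    xq = q(np.asarray(x, float))
    yq = q(np.asarray(y, float))
    joint = Counter()                 # counts of (x-pattern, next y sample)
    for i in range(len(xq) - L + 1):
        joint[(tuple(xq[i:i + L - 1]), yq[i + L - 1])] += 1
    total = sum(joint.values())
    marg = defaultdict(int)           # marginal counts per x-pattern
    for (fx, _), c in joint.items():
        marg[fx] += c
    ce = 0.0
    for (fx, _), c in joint.items():
        ce -= (c / total) * np.log(c / marg[fx])
    return ce
```

When the next sample is fully determined by the preceding pattern (e.g., a strictly periodic series compared with itself), CE reaches zero; for independent noise series it stays well above zero.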

Figure 3 exemplifies the entire analysis procedure for a single tilt-test record. After preliminary data preparation both SBP and PI sequences are subjected to the Hilbert transform and phase detection as indicated above. The entire algorithm was implemented using the Matlab software package.

**Figure 3.** Typical dynamics of blood pressure and pulse intervals during a head-up tilt test, vertical red lines denote the test phases.

#### *2.7. Statistical Analysis*

Since our preliminary studies [35–37] indicated that the studied blood pressure—pulse intervals coupling metrics are not normally distributed, we used the methods of non-parametric statistics to process our results. To quantify the statistical significance of our results, we next applied the non-parametric Mann–Whitney U-test for independent samples [39]. To calculate the correlations between the synchronization coefficient, the standard deviations of pulse intervals and systolic blood pressure and the baroreflex sensitivity we used the non-parametric Spearman's correlation coefficients. All statistical analysis was performed using the IBM SPSS Statistics software package.

#### **3. Results**

Remarkably, while the orthostatic hypotension patients (Group 1) were characterized by a smaller number of reduced autonomic test results (5.0 ± 1.2 vs. 2.5 ± 1.1), suggesting rather moderate autonomic dysfunction, in particular less severe than that in the diabetes patients (Group 2), they exhibited a considerably more pronounced blood pressure reduction during the tilt test (34 ± 13 vs. 17 ± 14 mmHg for systolic and 17 ± 7 vs. 2 ± 16 mmHg for diastolic blood pressure, respectively).

Analysis of both the entire tilt test records and each of their phases revealed that, as expected, the BRS differs significantly between healthy individuals and both patient groups according to the non-parametric Mann–Whitney U-test (*p* < 0.05 for the first group with orthostatic hypotension and *p* < 0.001 for the second group of diabetes patients with autonomic neuropathy). Remarkably, in addition to the well-established BRS index, nearly all studied mutual information indices exhibited significant differences between the patients' groups and the healthy volunteers (*p* < 0.05). In marked contrast, when it comes to differential diagnostics, only a few of the studied metrics showed significant discrepancies between the first and the second patients' groups. In particular, besides the well-established BRS index (*p* < 0.05), the Sync (*p* < 0.05) and TDS (*p* < 0.005) mutual information metrics also showed significant differences (see Figures 4–6 for a visual illustration).

**Figure 4.** The overall average of the six studied mutual information metrics: (**a**) Sync, (**b**) TDS, (**c**) *TAU*, (**d**) *CE*, (**e**) *Coher* and (**f**) BRS averaged for the overall duration of the tilt test records. The horizontal axis denotes group number (also denoted by color): 1 (blue)—orthostatic hypotension; 2 (red)—diabetes mellitus with autonomic neuropathy; 3 (green)—healthy volunteers (control group).

**Figure 5.** The values of the six studied mutual information metrics: (**a**) Sync, (**b**) TDS, (**c**) *TAU*, (**d**) *CE*, (**e**) *Coher* and (**f**) BRS obtained during the supine phase of the tilt test. The horizontal axis denotes group number (also denoted by color): 1 (blue)—orthostatic hypotension; 2 (red)—diabetes mellitus with autonomic neuropathy; 3 (green)—healthy volunteers (control group).

**Figure 6.** The values of the six studied mutual information metrics: (**a**) Sync, (**b**) TDS, (**c**) *TAU*, (**d**) *CE*, (**e**) *Coher* and (**f**) BRS obtained during the orthostatic phase of the tilt test. The horizontal axis denotes group number (also denoted by color): 1 (blue)—orthostatic hypotension; 2 (red)—diabetes mellitus with autonomic neuropathy; 3 (green)—healthy volunteers (control group).

For a better representation of the overall response patterns quantified simultaneously by several indicators, Figure 7 also depicts the star-style diagrams summarizing our results after appropriate normalization. Those indices that were not normalized by definition have been rescaled by dividing all values by the corresponding observed maxima.

Significant changes between the supine and the orthostatic phases of the tilt test could be observed for nearly all studied metrics except CE in the diabetes patients' group (*p* < 0.05). However, only BRS and Coher metrics (*p* < 0.01) significantly differed between the supine and the orthostatic test phases both in the control group and in the first patient group with orthostatic hypotension, while no significant differences between the test phases could be observed in other studied indices.

In particular, although BRS reduced significantly in the orthostatic position for the patients with non-diabetic orthostatic hypotension (Group 1), given that their initial *BRS* values (9.59 ± 4.65) (medians ± interquartile ranges are given here and below) in the supine position were only slightly below those in the control group (see Figure 5), they appeared only moderately reduced also in the orthostatic position (5.16 ± 4.12, see also Figure 6). In contrast, in diabetic patients (Group 2) characterized by already low BRS values in the supine position (4.07 ± 2.98, see also Figure 5), its comparable relative reduction in the orthostatic position led to much lower absolute *BRS* values (2.73 ± 1.03, see also Figure 6).

According to these data, one could expect higher blood pressure variability in Group 2, as their autonomic nervous system is less sensitive to the changes in blood pressure, and their feedback response to blood pressure variations is weaker than in Group 1. Surprisingly, patients in Groups 1 and 2 demonstrated rather comparable blood pressure variability (SBP standard deviations were 7.83 ± 4.13 vs. 5.58 ± 3.08 mmHg in the supine and 7.69 ± 4.71 vs. 6.15 ± 1.84 in the orthostatic position, see also Figures 5 and 6, respectively).

**Figure 7.** The star-style diagrams representing the overall responses to the tilt test based on a series of studies mutual dynamics indicators averaged for the overall duration of the tilt test records: (**a**) for the orthostatic hypotension patients (group 1), (**b**) for the diabetes patients with autonomic neuropathy (group 2) and (**c**) for the healthy subjects (control group); also for the supine tilt test phase: (**d**) for the orthostatic hypotension patients (group 1), (**e**) for the diabetes patients with autonomic neuropathy (group 2) and (**f**) for the healthy subjects (control group); as well as for the orthostatic tilt test phase: (**g**) for the orthostatic hypotension patients (group 1), (**h**) for the diabetes patients with autonomic neuropathy (group 2) and (**i**) for the healthy subjects (control group). Numbers above the diagrams also indicate the patient/subject groups (also denoted by different colors). The filled areas represent the interquartile ranges (IQR), while small stars within the filled areas represent the medians.

Moreover, the synchronization coefficient Sync demonstrated significant negative correlations with the standard deviation of the systolic blood pressure during the orthostatic phase of the tilt test. This seems to be a rather universal phenomenon, as it is reproduced well not only qualitatively but also quantitatively in all studied groups (Spearman's correlation coefficient ρ ≈ −0.5, *p* < 0.05). The above indicates that, in addition to the established BRS, synchronization patterns between blood pressure and pulse intervals also appear to be important markers of an adequate cardiovascular response to orthostatic stress.

#### **4. Discussion**

While key indicators of cardiovascular feedback regulation are characterized by well-known measures such as BRS, some details cannot be revealed by this single indicator, especially when it comes to differential diagnostics. Despite the pronounced discrepancies between the BRS values in patients from Groups 1 and 2 (approximately twofold between their median values for both the supine and orthostatic tilt-test phases), blood pressure variability did not exhibit any significant discrepancies between the two patient groups, indicating that there might be another contributing factor that also plays an important role.

Such a factor could be revealed and, furthermore, quantified by considering the synchronization coefficient Sync and the time delay stability TDS metrics, both indicating the (normalized) total duration of the time fragments during which rather synchronous dynamics of blood pressure and pulse intervals could be observed. Our results indicate that in all studied groups under normal conditions both Sync and TDS demonstrate behavior that appears rather *reciprocal* to BRS. This indicates that in patients with reduced BRS the feedback mechanisms are likely *activated more frequently*, thereby compensating for their lower sensitivity and intensity.

This compensation seems to be a rather universal phenomenon, as the reciprocal character of Sync and TDS vs. BRS can be clearly observed in the comparison between the studied groups in both supine and orthostatic positions (see boxplots in Figures 4–6). The compensation hypothesis is further supported by the fact that during *all* tilt test phases, negative (although not in all cases statistically significant) correlations between BRS and Sync have been observed both in the patients from Group 1 and in healthy subjects. Although, qualitatively, a similar effect could also be observed when considering TDS instead of Sync, it appears less pronounced, which could likely be attributed to the lower sensitivity of TDS compared to Sync, as revealed by a recent computer-simulation-based study [30].

A prominent exception from the above reciprocal relationship, as well as from the corresponding negative correlation pattern, is observed only in diabetes patients, likely indicating a breakdown of the above compensatory mechanism in at least some of the patients. Notably, this is generally in line with recent studies of baroreflex control, in which a significant dependence of orthostatic baroreflex performance on concomitant conditions such as obesity has been reported [21]. Moreover, in the same study, a significantly higher number of baroreflex sequences per given time window (which could also serve as another possible substitute for the Sync and/or TDS mutual synchronization metrics), likely indicating more frequent activation of the baroreflex loop, was observed in diabetes patients with concomitant obesity compared to control subjects, although differences between diabetes patients and weight-matched control subjects appeared insignificant. Since we have observed a similar effect in our study using the suggested Sync and TDS indicators, concomitant conditions such as obesity presumably play a key role in the impairment of baroreflex control, not only in terms of its low sensitivity but also in terms of its timely activation in response to blood pressure variations. More detailed investigations are required to further elucidate the key factors behind this possible compensation breakdown.

Therefore, we believe that the proposed additional indicators could be useful for improving the differential diagnostics of blood pressure regulation efficacy, could provide deeper insight into the involved concomitant factors, and could thereby also help improve the mathematical models representing the underlying feedback control mechanisms.

Finally, the proposed complementary mutual behavior indicators might also prove useful for the analysis of other physiological signals, as well as for the quantification of stress test responses other than the tilt test, for example, in stress detection studies in daily-life scenarios (for recent examples of relevant investigations, see, e.g., [40,41]). However, further practical utilization of these complementary indicators in other differential diagnostic scenarios requires the design of dedicated prediction tools based on, e.g., star-style diagram pattern recognition and/or shape analysis with anomaly detection [42], or multivariate regression models with a decision-making procedure based on the analysis of an appropriately weighted combination of several complementary indices (see, e.g., [43] for a recent example).

#### **5. Conclusions**

To summarize, six different metrics of mutual coupling between simultaneously registered signals representing blood pressure and pulse interval dynamics have been considered in this study. Stress test response patterns, represented by the reaction of the recorded signals to tilting the body into the upright position, have been analyzed. While nearly *all* studied metrics differed significantly between patients and healthy subjects, only a few of them proved informative for the differential diagnostics of patients with autonomic disorders of different etiology and severity. Besides the widely used BRS index, the Sync and TDS mutual information metrics, representing the temporal fractions of the analyzed signal records exhibiting rather synchronous dynamics, also exhibited significant differences. While BRS characterizes the *intensity* of the pulse interval response to blood pressure changes during observed feedback responses, both Sync and TDS likely indicate how often such responses are *activated* in the first place. Our results indicate that, in most cases, BRS is *reciprocal* to both Sync and TDS, suggesting that the *low intensity* of feedback responses characterized by low BRS is compensated by their *more frequent activation*, indicated by higher Sync and TDS. A notable exception is diabetes patients with autonomic neuropathy, in whom a likely breakdown of this compensation could be observed.

**Author Contributions:** Conceptualization, M.I.B. and O.V.M.; formal analysis, N.S.P.; investigation, S.A.P. and M.I.B.; project administration, O.A.M. and M.I.B.; resources, O.V.M.; software, N.S.P.; supervision, M.I.B.; writing—original draft, M.I.B.; writing—review and editing, O.A.M., O.V.M. and M.I.B. All authors have read and agreed to the published version of the manuscript.

**Funding:** We would like to acknowledge the financial support of this work by the Ministry of Science and Education of the Russian Federation in the framework of the basic state assignment No. 2.5475.2017/6.7.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **Wood Defect Detection Based on Depth Extreme Learning Machine**

#### **Yutu Yang <sup>1</sup> , Xiaolin Zhou <sup>2</sup> , Ying Liu 1,\*, Zhongkang Hu <sup>3</sup> and Fenglong Ding <sup>1</sup>**


Received: 18 September 2020; Accepted: 22 October 2020; Published: 24 October 2020

**Abstract:** The deep learning feature extraction method and extreme learning machine (ELM) classification method are combined to establish a depth extreme learning machine model for wood image defect detection. The convolution neural network (CNN) algorithm alone tends to provide inaccurate defect locations, incomplete defect contour and boundary information, and inaccurate recognition of defect types. The nonsubsampled shearlet transform (NSST) is used here to preprocess the wood images, which reduces the complexity and computation of the image processing. CNN is then applied to manage the deep algorithm design of the wood images. The simple linear iterative clustering algorithm is used to improve the initial model; the obtained image features are used as ELM classification inputs. ELM has faster training speed and stronger generalization ability than other similar neural networks, but the random selection of input weights and thresholds degrades the classification accuracy. A genetic algorithm is used here to optimize the initial parameters of the ELM to stabilize the network classification performance. The depth extreme learning machine can extract high-level abstract information from the data, does not require iterative adjustment of the network weights, has high calculation efficiency, and allows CNN to effectively extract the wood defect contour. The distributed input data feature is automatically expressed in layer form by deep learning pre-training. The wood defect recognition accuracy reached 96.72% in a test time of only 187 ms.

**Keywords:** wood defect; CNN; ELM; genetic algorithm; detection

#### **1. Introduction**

Today's wood products are manufactured under increasingly stringent requirements for surface processing. In countries with well-developed forest resources, such as Sweden and Finland, the comprehensive use rate of wood is as high as 90%. In sharp contrast, the comprehensive use rate of wood in China is less than 60%, causing a serious waste of resources. With China's rapid economic development, people are increasingly pursuing a high quality of life, which will inevitably increase the demand for wood and wood products; China's consumption of solid wood panels, wood-based panels, paper, and cardboard is already among the highest in the world. The existing wood storage capacity and processing level make it difficult to meet this rapidly growing demand. The lack of wood supply and the low use rate have limited the development of China's wood industry. Therefore, it is necessary to comprehensively inspect the processing quality of logs and boards to improve the use rate of wood and the quality of wood products.

Nondestructive testing of wood allows accurate and rapid judgments of the physical properties and growth defects of wood, and enables automated wood inspection. In recent years, the combined application of computer technology with detection and control theory has made great progress in the detection of wood defects. In the nondestructive testing of wood surfaces, commonly used traditional methods include laser testing [1,2], ultrasonic testing [3–5], acoustic emission technology [6,7], etc. Computer-aided techniques are a common approach to surface processing, as they are efficient and have a generally high recognition rate [8,9]. Deep learning was first proposed by Hinton in 2006; in 2012, scholars adopted the deep-learning-based AlexNet network to achieve computer vision recognition accuracy of up to 84.7%. Deep learning avoids the curse of dimensionality through layer-wise initialization and represents a revolutionary development in the field of machine learning [10,11]. More and more scholars are applying deep learning networks in wood nondestructive testing. He et al. [12] used a linear-array CCD camera to obtain wood surface images and proposed a hybrid fully convolutional neural network (Mix-FCN) for the recognition and location of wood defects; however, the network was too deep and required too much computation. Hu [13] and Shi [14] used the Mask R-CNN algorithm for wood defect recognition, but they combined multiple feature extraction methods, which resulted in a very complex model. Current deep learning algorithms still suffer from problems such as inaccurate defect location and incomplete defect contour and boundary information in the wood defect detection process. To solve these problems and effectively meet the needs of wood processing enterprises for wood testing, we carried out the research reported in this article.

The innovations of this article are: (1) simple preprocessing of wood images using the nonsubsampled shearlet transform (NSST), which reduces the complexity and computational cost of image processing, before they are input to the convolutional neural network; (2) application of the simple linear iterative clustering (SLIC) algorithm to enhance the convolutional neural network and obtain super-pixel images with more complete boundary contours; (3) use of a genetic algorithm to improve the extreme learning machine, which classifies the obtained image features. Through these methods, the accuracy of defect detection is improved and the recognition time is shortened, establishing an innovative machine-vision-based wood testing technique.

#### **2. Materials and Methods**

#### *2.1. Wood Defect Original Image Dataset*

According to their processes and causes, solid wood board defects are divided into biohazard defects, growth defects, and processing defects. Growth defects and biohazard defects are natural defects, which have certain shape and structure characteristics and are also an important basis for wood grade classification. Generally speaking, solid wood board growth and biohazard defects can be divided into dead knots, live knots, worm holes, decay, etc. The original dataset used in the experiments in this article is derived from the wood sampling images of the 948 project of the State Forestry Administration (the introduction of laser profile and color integrated scanning technology for solid wood panels). When scanning to obtain wood images, the scanning speed of the scanner is 170–5000 Hz, the Z-direction resolution is 0.055–0.200 mm, the X-direction resolution is 0.2755–0.550 mm, and the color pixel resolution can reach 1 mm × 0.5 mm. The dataset includes 5000 defect maps of pine, fir, and ash. The bit depth of each image is 24, and the specified size is 100 × 100 pixels. Some of the defect images are shown in Figure 1.

**Figure 1.** Common defects of solid wood such as dead-knot, live-knot and decay.

#### *2.2. Optimized Convolution Neural Network*

This paper proposes an optimized algorithm which uses NSST to preprocess the images followed by the CNN to extract defect features from wood images as a preliminary CNN model. The simple linear iterative clustering (SLIC) super-pixel segmentation algorithm is used to analyze the wood images by super-pixel clustering, which allows the defects in wood images and local information regarding defects and cracks to be efficiently located. The obtained information is fed back to the initial model, which enhances the original CNN.

#### 2.2.1. Structure and Characteristics of Convolution Neural Networks

The CNN is an artificial neural network algorithm with multi-layer trainable architecture [15]. It generally consists of an input layer, excitation layer, pool layer, convolution layer, and full connection layer. CNNs have many advantages in terms of image processing applications. (1) Feature extraction and classification can be combined into the same network structure and synchronized training can be achieved, and the algorithm is fully adaptive. (2) When the image size is larger, the deep feature information can be extracted better. (3) Its unique network structure has strong adaptability to the local deformation, image rotation, image translation, and other changes in the input image. In this study, each pixel in the wood image was convoluted and the defect feature was extracted by exploiting these CNN characteristics. The CNN network skeleton used in this article is shown in Figure 2.
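The convolution layers mentioned above slide small kernels over the image and respond to local patterns such as defect edges. A minimal single-channel sketch of this operation might look as follows; it is implemented as cross-correlation (no kernel flip), as deep learning frameworks do, and the edge-detecting kernel is only an illustrative choice.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Single-channel 2-D 'valid' convolution (stride 1, no padding) --
    the core operation of a CNN convolution layer. Implemented as
    cross-correlation, as in deep learning frameworks."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A horizontal-gradient (Sobel-like) kernel responds strongly where
# intensity changes from left to right, e.g. at a defect boundary.
sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
```

A real convolution layer stacks many such kernels, adds a bias, and applies a nonlinearity; the learned kernels play the role that `sobel_x` plays here.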

**Figure 2.** CNN network skeleton.

#### 2.2.2. Non-Subsampled Shearlet Transform (NSST)

The NSST can represent signals sparsely and optimally, and it also has strong direction sensitivity [16–18]. Therefore, using the NSST to preprocess wood images preserves the defect features of the wood images. Redundancy in the wood image information is reduced, along with the complexity and computation of image processing with the deep learning method.

#### 2.2.3. Simple Linear Iterative Clustering (SLIC)

The CNN uses a matrix form to represent an image to be processed, so the spatial organization relationship between pixels is not considered—this affects the image segmentation and obscures the boundary of the defective region of the wood image. The SLIC algorithm can generate relatively compact super-pixel image blocks after processing a gray or color image. The generated super-pixel image is compact between pixels, and the edge contours of the image are clear. To this effect, SLIC extracts a relatively accurate contour to supplement the feature contour. SLIC also works with relatively few initial parameters: only the number of super-pixels needed to segment the image must be set. The algorithm is simple in principle, has a small calculation range, and runs rapidly. By 2015, the parallel execution speed had reached 250 FPS; it is currently the fastest super-pixel segmentation method available [19].
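A greatly simplified sketch of SLIC-style super-pixel labeling is given below: pixels are clustered by k-means on joint (intensity, x, y) features with grid-initialized centers. The real SLIC additionally restricts each center's search to a local 2S × 2S window; this global variant, with an arbitrary `compactness` weight, is for illustration only.

```python
import numpy as np

def superpixels(img, n_seg=16, compactness=0.1, n_iter=5):
    """Simplified SLIC-style super-pixel labeling: k-means on
    (intensity, x, y) features with grid-initialized centers."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    feats = np.stack([img.ravel(),
                      compactness * xs.ravel(),
                      compactness * ys.ravel()], axis=1)
    # initialize cluster centers on a regular grid, as SLIC does
    side = int(np.sqrt(n_seg))
    cy = np.linspace(0, h - 1, side).astype(int)
    cx = np.linspace(0, w - 1, side).astype(int)
    centers = np.array([[img[y, x], compactness * x, compactness * y]
                        for y in cy for x in cx], float)
    for _ in range(n_iter):
        # assign every pixel to its nearest center in feature space
        d = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for k in range(len(centers)):
            m = labels == k
            if m.any():
                centers[k] = feats[m].mean(0)   # recompute centers
    return labels.reshape(h, w)
```

The `compactness` weight trades off spatial regularity against adherence to intensity edges, which is the same trade-off the real SLIC parameter controls.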

#### 2.2.4. Feature Extraction

The optimized CNN model proposed in this paper was designed for wood surface feature extraction. Knots were used as example wood defects (Figure 3a) to test feature extraction via the following operations. The input image Figure 3a was directly processed by CNN algorithm to obtain image Figure 3b, which presented local irregularity and nonsmooth edges in the contour after enlargement [20]. The SLIC algorithm was used to process the input image (Figure 3a) followed by longitudinal convolution (Figure 3d). The image shown in Figure 3h was obtained after edge removal and fusion processing. The defect contour features of Figure 3h are substantially clearer compared to Figure 3b because the segmentation of wood images using CNN, which is expressed in pixels as a matrix without considering the spatial organization relationship between pixels, affects the end image segmentation results. The SLIC algorithm instead extracts the wood defect boundary and contour information from the original image and feeds back the information to the initial segmentation results of the CNN model.

The above process reduces the redundancy of local image information in addition to the complexity and computation of the image processing. The pixel-level CNN model method does not accurately reveal the boundary of the defective region of wood image, but instead indicates only its general position. SLIC can extract a relatively accurate contour to supplement it and optimize the initial CNN model. To this effect, the proposed SLIC algorithm-based method improves the defect feature extraction of wood images over CNN alone.

The input image (Figure 3a) was processed by the NSST algorithm to obtain the image shown in Figure 3e; Figure 3f was then obtained using the SLIC algorithm. Vertical convolution produced the image shown in Figure 3g, and Figure 3i was obtained after edge removal and fusion processing. Comparing Figure 3i with Figure 3h, although the difference in wood defect contour extraction is not obvious, preprocessing the image with the NSST reduces environmental interference and training depth, markedly decreasing the computation and complexity of the image processing.

**Figure 3.** Contrast diagram of optimized CNN feature extraction effects.

Based on the above analysis, this paper decided to use the wood image processing frame shown in Figure 4 to obtain the wood defect feature map.

**Figure 4.** Wood image processing frame.

#### *2.3. Extreme Learning Machine (ELM)*

Depth learning is commonly used in target recognition and defect detection applications due to its excellent feature extraction capability. The CNN is highly time-consuming due to the necessity of iterative pre-training and fine-tuning stages; the hardware requirements for more complex engineering applications are also high. The deep CNN structure has a large quantity of adjustable free parameters, which makes its construction highly flexible. On the other hand, it lacks theoretical guidance and is overly reliant on experience, so its generalization performance is dubious. In this study, we integrated the ELM into a depth extreme learning machine (Figure 5) to improve the training efficiency of the deep convolution network. The proposed method extracts wood defects by using an optimized CNN and ELM classifier to exploit the excellent feature extraction ability of the deep network and fast training of ELM simultaneously.

The ELM algorithm differs from traditional feedforward neural network training. Its hidden layer does not need to be iterated: the input weights and hidden layer node biases are set randomly, and the output weights of the hidden layer are determined analytically by the algorithm to minimize the training error [21–23]. The extreme learning machine is based on proven approximation and interpolation theorems, under which, when the hidden layer activation function of a single-hidden-layer feedforward neural network is infinitely differentiable, its learning ability is independent of the hidden layer parameters and is related only to the current network structure. When the input weights and hidden layer node offsets are randomly assigned to obtain an appropriate network structure, the ELM has universal approximation capability: it can approximate any continuous function. Under the premise that the hidden layer activation function is infinitely differentiable, the output weights of the network can be calculated via the least squares method. A network model approximating the target function can thus be established, and the corresponding neural network functions such as classification, regression, and fitting can be realized.
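The one-shot training described above can be sketched in a few lines: the input weights and biases are drawn randomly, and only the output weights are computed, via the Moore–Penrose pseudo-inverse. This is a generic minimal ELM, not the authors' exact configuration.

```python
import numpy as np

def train_elm(X, y, n_hidden=64, seed=0):
    """Minimal ELM classifier: random input weights and biases, sigmoid
    hidden layer, output weights solved in a single least-squares step
    (Moore-Penrose pseudo-inverse) -- no iterative training at all."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))   # random, never adjusted
    b = rng.normal(size=n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))        # hidden-layer activations
    T = np.eye(int(y.max()) + 1)[y]               # one-hot targets
    beta = np.linalg.pinv(H) @ T                  # analytic output weights
    return W, b, beta

def predict_elm(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return (H @ beta).argmax(1)
```

Because `beta` is obtained in closed form, training cost is dominated by one pseudo-inverse rather than many gradient iterations, which is the source of the speed advantage discussed above.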

**Figure 5.** Structure of the depth extreme learning machine.

This paper mainly centers on the classification function of the ELM, using a relatively simple single-hidden-layer neural network as the classifier. Traditional neural network algorithms need many iterations and parameters, learn slowly, have poor extensibility, and require intensive manual intervention. The ELM used here requires no iterations, learns relatively quickly, generates the input weights and biases randomly so that no subsequent tuning is needed, and requires relatively little manual intervention. On large sample databases, the recognition rate of the ELM is better than that of the support vector machine (SVM). For these reasons, we use ELMs as classifiers to enhance recognition efficiency and performance [24,25].

The ELM algorithm introduced above is the main classification method for wood defect feature recognition in this paper. However, in the ELM network structure, the input weights and the thresholds of the hidden layer nodes are assigned randomly. For ELM structures with the same number of hidden layer neurons, network performance can therefore vary widely, which makes the classification performance unstable. The genetic algorithm (GA) simulates Darwinian evolution to optimize the initial weights and thresholds of the ELM by eliminating less-fit candidates.
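A toy version of this GA-ELM scheme is sketched below: each genome encodes one candidate set of ELM input weights and biases, fitness is the resulting training accuracy, and truncation selection with uniform crossover and Gaussian mutation evolves the population. All data and hyperparameters here are illustrative, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(1)

# toy two-class data standing in for the wood-defect features
X = np.vstack([rng.normal(0, 1, (40, 2)), rng.normal(2, 1, (40, 2))])
y = np.r_[np.zeros(40, int), np.ones(40, int)]
N_HID = 10                                  # hidden nodes in the toy ELM

def elm_fitness(genome):
    """Fitness of one genome = training accuracy of the ELM whose input
    weights and biases it encodes (output weights solved by least squares)."""
    W = genome[:2 * N_HID].reshape(2, N_HID)
    b = genome[2 * N_HID:]
    H = np.tanh(X @ W + b)
    beta = np.linalg.pinv(H) @ np.eye(2)[y]
    return ((H @ beta).argmax(1) == y).mean()

def ga_optimize(pop_size=20, n_gen=15, mut=0.1):
    dim = 3 * N_HID
    pop = rng.normal(size=(pop_size, dim))
    for _ in range(n_gen):
        fit = np.array([elm_fitness(g) for g in pop])
        parents = pop[np.argsort(fit)[::-1][:pop_size // 2]]  # keep fittest half
        kids = []
        for _ in range(pop_size - len(parents)):
            pa, pb = parents[rng.integers(len(parents), size=2)]
            child = np.where(rng.random(dim) < 0.5, pa, pb)   # uniform crossover
            kids.append(child + mut * rng.normal(size=dim))   # Gaussian mutation
        pop = np.vstack([parents, kids])
    fit = np.array([elm_fitness(g) for g in pop])
    return pop[fit.argmax()], fit.max()

best, best_fit = ga_optimize()
```

The GA thus replaces the single random draw of ELM input parameters with a search over many draws, which is what stabilizes the classification performance.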

Figure 6a–c show the variation curves of the GA-ELM population fitness function under the Radbas, Hardlim, and Sigmoid excitation functions, respectively. Smaller fitness values indicate higher accuracy; the Sigmoid excitation function yields the best network performance.

**Figure 6.** Variation curves of driving function population fitness; (**a**) Radbas driving function; (**b**) Hardlim driving function; (**c**) Sigmoid driving function.

The classification accuracies of GA-ELM and ELM under different excitation functions are also shown in Table 1. The classification accuracies under the Sigmoid and Radbas excitation functions were similar, while the Hardlim excitation function was the exception. The accuracies of ELM and GA-ELM were highest when the Sigmoid function was used as the activation function. The accuracy of GA-ELM reached 95.93%, markedly better than that of the unoptimized ELM. The GA-optimized ELM network also required fewer hidden layer nodes and showed higher test accuracy.


**Table 1.** Classification accuracy of GA-ELM and ELM under different excitation functions.

In summary, an improved depth extreme learning machine was constructed in this study by combining the optimized GA-ELM classifier with the optimized CNN feature extraction. It is referred to from here on as "D-ELM".

#### **3. Experimentation**

#### *3.1. Experimental Parameters*

Table 2 shows the computer-related parameters and software platform used by the experimental system, including CPU model, main frequency and memory size.


**Table 2.** Experimental parameters.

#### *3.2. Empirical Method and Result*

The specific experimental process is shown in Figure 7. First, we preprocessed the 5000 original images via the NSST and randomly selected 4000 images for training. Second, for each pixel in each image, the neighborhood subgraph was taken as the input of the CNN; a total of 40,000,000 samples were thus obtained as the training set for the network model. The remaining 10,000,000 samples were used as test data to evaluate the algorithm. The features extracted from the test samples were input into the ELM classifier with 100 hidden layer nodes, and the accuracy and stability of the feature extraction method were analyzed statistically. We found that when the number of iterations exceeds 3500, the loss function settles around 0.2 and the convergence behavior is acceptable.

**Figure 7.** Flowchart of wood defect feature extraction and classification process.

Figure 8a shows the relationship between the training loss value and the number of iterations. Although the training loss value fluctuates a little during the iteration process, it shows a downward trend as a whole. When the iteration is completed, the training loss value was around 0.2. Figure 8b shows a graph of accuracy. When the number of iterations was 1500, the accuracy of the proposed algorithm reached 90%. Accuracy continued to increase as iteration quantity increased until reaching a maximum of about 98%.


**Figure 8.** Relationship between loss function, accuracy, iteration number: (**a**) Relationship between loss function and iteration; (**b**) accuracy graph.

Figure 9 shows our final recognition results on the test set. The identified wood defects are surrounded by rectangular borders of different colors.

**Figure 9.** Recognition results based on deep learning.

#### **4. Discussion**

This paper proposes an ELM classifier based on a depth structure. Choosing an appropriate number of hidden nodes in the D-ELM structure provides enhanced stability and generalization ability. To ensure accurate tests and prevent node redundancy, the number of hidden nodes was set to 100, for which the test accuracy of D-ELM remained relatively stable over repeated tests, as shown in Figure 10. D-ELM significantly outperformed ELM, with smaller fluctuations in accuracy, robustness to the number of test runs, and higher network stability.

**Figure 10.** Accuracy of D-ELM and ELM after multiple tests.

Table 3 shows the results of our accuracy tests, as mentioned above. D-ELM has a higher average accuracy and a lower standard deviation than ELM; the D-ELM network is thus both more accurate and more stable, improving the overall classifier performance.

**Table 3.** D-ELM versus ELM stability.


We added an SVM classifier to the experiment to further assess the depth extreme learning machine. Table 4 shows the accuracy and timing of D-ELM, ELM, and SVM on all samples; D-ELM has the highest accuracy in both training and testing. Although D-ELM has more network layers, its training and test times are shorter than those of the other algorithms we tested, and its accuracy is much higher. The overall performance of D-ELM is better than that of ELM and SVM.


**Table 4.** Defect recognition of D-ELM, ELM and SVM on wood images.

#### **5. System Interface**

We constructed a network model and classification optimizer based on the proposed algorithm by integrating Anaconda 3.5 and TensorFlow. We then constructed a real wood plate defect identification system in the C# development language on the Microsoft Visual Studio 2017 open platform. The system can identify defects in solid wood plate images and provide their position and size information. We also used a Microsoft SQL server 2012 database to store the information before and after processing.

Our experiment on deep network feature learning mainly involved the implementation of the network framework and the training model for wood recognition. The system is based on the network training model discussed in this paper; it can be used to detect defects in single images from the wood image database and display the X and Y coordinates of the defects, as shown in Figure 11. On the left side, the scanned wood images are displayed with defects marked by green boxes. The top right side of the interface shows the coordinates of each defect, the cutting position of the plank, and the type of each defect. Below the table are the total number of defects identified by the machine and the recognition rate.


**Figure 11.** System interface diagram based on depth learning.

#### **6. Conclusions**

The depth extreme learning machine proposed in this paper has reasonable dimensions, effectively manages heterogeneous data, and works within an acceptable run time. Our results suggest that it is a promising new solution to problems such as obtaining labeled samples, constructing features, and training; it has excellent feature extraction ability and a fast training time. The NSST transform is used to preprocess the original image (i.e., to reduce its complexity and dimensionality while minimizing down-sampling in the CNN), then SLIC is applied to optimize the CNN model training process. This method effectively reduces the redundancy of local image information and extracts relatively accurate supplementary feature contours. The optimized CNN is then used to extract wood image features. These features are input to the ELM classifier, and the parameters of the related neural network are optimized: the GA selects the initial weights and thresholds of the ELM to improve the prediction accuracy and stability of the network model. Finally, the image data to be tested are input to the trained network model and the final test results are obtained.

We also compared the stability of D-ELM and ELM network models. The standard deviation of D-ELM was only 0.0967 and the accuracy of D-ELM improved by about 3% compared to ELM; the stability of the D-ELM network was also found to be higher and less affected by test quantity than ELM. We also found that D-ELM has an accuracy of up to 96.72% and a shorter test time than ELM or SVM at only 187 ms. The D-ELM network model is capable of highly accurate wood defect recognition within a very short training and detection time.

**Author Contributions:** Conceptualization, Y.Y. and Y.L.; methodology, Y.Y.; software, X.Z.; validation, Y.Y., Z.H. and F.D.; resources, X.Z. and Z.H.; writing—original draft preparation, Y.Y.; writing—review and editing, Y.L. and X.Z.; project administration, Y.L.; funding acquisition, Y.L. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the 2019 Jiangsu Province Key Research and Development Plan of the Jiangsu Province Science and Technology Department under grant BE2019112, by the Jiangsu Province International Science and Technology Cooperation Project under grant BZ2016028, and by the 948 Import Program on Internationally Advanced Forestry Science and Technology of the State Forestry Bureau under grant 2014-4-48.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### **A Division Algorithm in a Redundant Residue Number System Using Fractions**

**Nikolay Chervyakov <sup>1</sup> , Pavel Lyakhov 1,2,\* , Mikhail Babenko <sup>1</sup> , Irina Lavrinenko <sup>3</sup> , Maxim Deryabin <sup>1</sup> , Anton Lavrinenko <sup>3</sup> , Anton Nazarov <sup>1</sup> , Maria Valueva <sup>1</sup> , Alexander Voznesensky <sup>2</sup> and Dmitry Kaplun <sup>2</sup>**


Received: 3 December 2019; Accepted: 15 January 2020; Published: 19 January 2020

**Abstract:** The residue number system (RNS) is widely used for data processing. However, division in the RNS is a rather complicated arithmetic operation, since it requires expensive and complex operators at each iteration, which requires a lot of hardware and time. In this paper, we propose a new modular division algorithm based on the Chinese remainder theorem (CRT) with fractional numbers, which allows using only one shift operation by one digit and subtraction in each iteration of the RNS division. The proposed approach makes it possible to replace such expensive operations as reverse conversion based on CRT, mixed radix conversion, and base extension by subtraction. Besides, we optimized the operation of determining the most significant bit of divider with a single shift operation of the modular divider. The proposed enhancements make the algorithm simpler and faster in comparison with currently known algorithms. The experimental simulation using Kintex-7 showed that the proposed method is up to 7.6 times faster than the CRT-based approach and is up to 10.1 times faster than the mixed radix conversion approach.

**Keywords:** residue number system; redundant residue number system; modular division; fraction; algorithm

#### **1. Introduction**

The residue number system (RNS) has attracted many researchers as a basis for computing, and interest in it has increased dramatically over the last decade, as evidenced by the large number of papers focusing on the practical application of the RNS in digital signal processing, image processing systems, cryptographic systems, quantum automated machines, neurocomputer systems, massively concurrent operations, cloud computing, etc. [1–7].

Compared to other scales of notation, the RNS offers the advantage of rapid addition and multiplication, which makes it attractive in areas requiring large amounts of computation. However, some operations, such as comparison and division of numbers, are very complicated in the RNS. Faster division algorithms would open up promising new areas of application for the RNS.


The known RNS division algorithms [8–23] can be divided into two classes: based on the comparison of numbers, and based on the subtraction.

The algorithm for integer division operates similarly to a conventional binary division proposed in [1,2]. This algorithm and its modifications have a major drawback, namely that each iteration requires a comparison of numbers.

The algorithm without these drawbacks, as proposed in [1,2], is based on replacing the divisor by an approximate value, which may be the product of one or several RNS modules. The algorithm provides a correct result under the condition $b \le \hat{b} < 2b$, where $b$ is the actual divisor and $\hat{b}$ is the approximation of $b$. It is easy to see that this condition cannot be satisfied for all moduli sets (for example: $p_1 = 9$, $p_2 = 11$, $b = 4$).

The main disadvantages of this algorithm are the need for mixed radix conversion (MRC) and scaling operations, and for special logic and tables to determine the approximate divisor. Several algorithms have been proposed for solving the division problem based on the comparison of numbers and methods of sign determination, which can be classified as follows: [8,10,15] use MRC, [9] formulates the problem in terms of determining even numbers, and [11] uses the base extension operation in iterations. All the proposed algorithms, however, suffer from long computation time and high hardware cost due to the use of MRC, the Chinese remainder theorem (CRT), and other costly operations.

In [12–14,16], a high-speed division algorithm is presented that compares the higher powers of the dividend and divisor instead of using the MRC and CRT for the division of modular numbers. The time complexity and hardware costs of these algorithms are smaller than those of other algorithms, although they contain redundant stages. To speed up the calculation of the current quotients, Hung and Parhami suggested a division algorithm based on parity checking, in which the quotient calculation is two times faster than in the algorithms of [14,16]. However, the calculation of the higher powers of two, carried out in each cycle, is time-consuming in the RNS.

A known algorithm for division in the RNS format uses, in addition to the RNS moduli set, an auxiliary moduli set that preserves the residues of the dividend and divisor: the dividend and divisor presented in the RNS are converted into a variety of RNS representations with the various modules of the system [18]. Using two RNS moduli sets leads to large redundancy and requires direct and reverse conversion between the main and auxiliary moduli sets for the division operation, which drastically reduces its speed. A fast division algorithm based on the index (discrete logarithm) transformation over the Galois field $GF(p)$ is proposed in [18]; it is simply implemented using LUTs (look-up tables). However, this algorithm is effective only when processing data of no more than 6–10 bits and when the modulus is a prime number. Thus, it is not efficient for large RNS ranges.

Most of the known iterative algorithms contain a large number of operations in each iteration. According to the authors, the algorithm based on the CRT with fractions considered in [11,17,23] is the best; it has time complexity $O(nb)$, where $n$ is the number of RNS modules and $b$ is the number of bits in each module, assuming that the values of the modules are roughly the same. The disadvantage of this algorithm is the set of operations performed in each iteration: addition, multiplication, comparison, and parity checking. Furthermore, the execution of the algorithm requires the conversion of the quotient from the digit set {−1, 0, 1} into the set {0, 1}, which adds to the runtime of the modular division procedure.

In this paper, we propose an algorithm for division in the general case in the RNS using only register shifts and summations. The improved algorithm has the following properties: it is very fast compared to the currently available algorithms; it has no restrictions on the dividend and divisor (except that the divisor must be nonzero); it does not use a preliminary estimate of the quotient; it does not use the inverse of the divisor; and it does not use the base extension operation. In [20], a similar approach based on the mixed radix number system (MRNS) is proposed; however, in addition to the original RNS moduli set, it also uses an auxiliary moduli set, which requires additional calculations for data conversion and significantly slows down the calculation of the division result. The proposed algorithm increases the performance of division by using the CRTf method. In [14,16], the idea of the most significant bits of a quotient was proposed for an RNS with the special moduli sets $\{2^k, 2^k - 1, 2^{k-1} - 1\}$ and $\{2^k + 1, 2^k, 2^k - 1\}$, while in the proposed work this approach is extended to the case of a general moduli set.

The main difference between this paper and [22] is that here a division algorithm for the redundant RNS is proposed. The redundant RNS is intended for the organization of fault-tolerant calculations: its modules are separated into information modules, by which data are encoded, and redundant modules, necessary to restore information in case of errors. This separation simplifies calculations by taking into account the information and redundant ranges of the system.

The known algorithms for dividing numbers represented in the RNS are based on the absolute values of the dividend and the divisor. In this paper, we use not the absolute values but their relative values, which reduces the computational complexity of the division algorithm.

The rest of the paper is organized as follows: Section 2 describes the basics of the RNS (Section 2.1) and the approximate method for determining the position of a number in it (Section 2.2). The proposed RNS division algorithm is presented in Section 2.3. Results and discussion are presented in Section 3.

#### **2. An Approximate Method for Determining the Positional Feature of the Modular Number**

#### *2.1. Residue Number System*

In the RNS, a positive integer is represented as a set of residues with respect to selected co-prime bases. This approach replaces operations on large numbers with operations on small numbers, namely the residues of the large numbers modulo the previously selected pairwise co-prime modules $p_1, p_2, \ldots, p_n$. Let

$$A \equiv \alpha\_1(\text{mod}p\_1), \ A \equiv \alpha\_2(\text{mod}p\_2), \ \dots, \ A \equiv \alpha\_n(\text{mod}p\_n). \tag{1}$$

Then, an integer $A$ can be associated with the set $(\alpha_1, \alpha_2, \ldots, \alpha_n)$ of its least non-negative residues. According to the CRT, this correspondence is one-to-one as long as $A < p_1 p_2 \ldots p_n$. The set $(\alpha_1, \alpha_2, \ldots, \alpha_n)$ can thus be considered as one of the ways of representing the integer $A$ in a computer, i.e., the modular representation, or representation in the RNS.
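As a quick illustration (not part of the original paper), the forward conversion of an integer into its RNS representation can be sketched in a few lines of Python; `to_rns` is a hypothetical helper name:

```python
def to_rns(a, moduli):
    """Encode a non-negative integer a < p_1 * ... * p_n as its least non-negative residues."""
    return tuple(a % p for p in moduli)

# The moduli set (2, 3, 5, 7) is used in Example 1 later in the paper.
print(to_rns(97, (2, 3, 5, 7)))  # -> (1, 1, 2, 6)
```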

The main advantage of this representation is the fact that the addition, subtraction, and multiplication operations are implemented very simply by the formulas:

$$A \pm B = (\alpha_1, \alpha_2, \ldots, \alpha_n) \pm (\beta_1, \beta_2, \ldots, \beta_n) = ((\alpha_1 \pm \beta_1)\bmod p_1, \ (\alpha_2 \pm \beta_2)\bmod p_2, \ \ldots, \ (\alpha_n \pm \beta_n)\bmod p_n) \tag{2}$$

$$A \times B = (\alpha_1, \alpha_2, \ldots, \alpha_n) \times (\beta_1, \beta_2, \ldots, \beta_n) = ((\alpha_1 \times \beta_1)\bmod p_1, \ (\alpha_2 \times \beta_2)\bmod p_2, \ \ldots, \ (\alpha_n \times \beta_n)\bmod p_n) \tag{3}$$

These operations are called modular, since their execution in the RNS takes only one cycle of processing of the numerical values. In addition, this processing occurs in parallel, and the value in each modulo channel does not depend on the other modulo channels.
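The componentwise operations of Equations (2) and (3) can be sketched as follows; `rns_add` and `rns_mul` are hypothetical helper names used only for illustration:

```python
def rns_add(x, y, moduli):
    """Componentwise modular addition, Equation (2)."""
    return tuple((a + b) % p for a, b, p in zip(x, y, moduli))

def rns_mul(x, y, moduli):
    """Componentwise modular multiplication, Equation (3)."""
    return tuple((a * b) % p for a, b, p in zip(x, y, moduli))

moduli = (2, 3, 5, 7)
a = (1, 1, 2, 6)   # 97 in this RNS
b = (0, 2, 3, 1)   # 8 in this RNS
print(rns_add(a, b, moduli))  # -> (1, 0, 0, 0), i.e., 97 + 8 = 105
print(rns_mul(a, b, moduli))  # -> (0, 2, 1, 6), i.e., 97 * 8 = 776
```

Note that each channel can be computed independently, which is exactly the parallelism described above.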

Thus, there are three main advantages of the RNS [1].

1. Addition, subtraction, and multiplication are carried out on small residues without carries between channels, which makes these operations fast.
2. The modulo channels operate in parallel and independently, on numbers of small bit width.
3. The RNS is a non-positional system with independent arithmetic units; therefore, an error in one channel does not propagate to the others. Thus, the processes of error detection and error correction are simplified.

However, operations such as sign detection, comparison, division, and some others are time-consuming and expensive in the RNS [4].

#### *2.2. Approximate Method*

An analysis of difficult (non-modular) operations has shown that they can be represented either exactly or approximately, so the methods for calculating positional characteristics can be divided into two groups:

1. methods for the exact calculation of positional characteristics;
2. methods for the approximate calculation of positional characteristics.

The methods for the exact calculation of positional characteristics are discussed in [1–3]. In this paper, we investigate the approximate method for calculating positional characteristics, which can significantly reduce hardware and time costs because the operations are performed on positional codes of reduced capacity. The approximate method is applicable to a number of non-modular procedures for which the exact value is not needed: determining the interval of a number, the sign of a number, comparing numbers, and determining the difference between numbers.

The approximate method for calculating the positional characteristics of a modular number is based on employing the values of the analyzed numbers relative to the full range, as defined by the CRT, which connects the positional number $a$ with its residue representation $(\alpha_1, \alpha_2, \ldots, \alpha_n)$, where $\alpha_i$ are the smallest non-negative residues of the number with respect to the RNS modules $p_1, p_2, \ldots, p_n$, by the following expression:

$$a = \left| \sum_{i=1}^{n} \frac{P}{p_i} \left| P_i^{-1} \right|_{p_i} \alpha_i \right|_P \tag{4}$$

where $p_i$ are the RNS modules, $P = \prod_{i=1}^{n} p_i$ is the range of the RNS, $P_i = \frac{P}{p_i} = p_1 p_2 \ldots p_{i-1} p_{i+1} \ldots p_n$, and $\left| P_i^{-1} \right|_{p_i}$ is the multiplicative inverse of $P_i$ modulo $p_i$.
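The exact reconstruction of Equation (4) can be sketched directly in Python; `crt_reconstruct` is a hypothetical helper name, and `pow(x, -1, m)` (Python 3.8+) computes the multiplicative inverse:

```python
def crt_reconstruct(residues, moduli):
    """Exact reconstruction by Equation (4): a = | sum_i (P/p_i) * |P_i^{-1}|_{p_i} * alpha_i |_P."""
    P = 1
    for p in moduli:
        P *= p
    total = 0
    for alpha, p in zip(residues, moduli):
        P_i = P // p
        total += P_i * pow(P_i, -1, p) * alpha  # pow(x, -1, m): modular inverse (Python 3.8+)
    return total % P

print(crt_reconstruct((1, 1, 2, 6), (2, 3, 5, 7)))  # -> 97
```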

If we divide the left and right parts of Expression (4) by the constant *P*, corresponding to the range of numbers, we will get the approximate value

$$\left| \frac{a}{P} \right|\_1 = \left| \sum\_{i=1}^n \frac{|P\_i^{-1}|\_{p\_i}}{p\_i} \alpha\_i \right|\_1 \approx \left| \sum\_{i=1}^n k\_i \alpha\_i \right|\_1 \tag{5}$$

where $|\ast|_1$ denotes the fractional part of $\ast$ (or the modulo 1 operation) [24], $k_i = \frac{\left| P_i^{-1} \right|_{p_i}}{p_i}$ are constants of the chosen system, and $\alpha_i$ are the digits of the number represented in the RNS for the modules $p_i$, $i = 1, 2, \ldots, n$; the value of Expression (5) lies in the range $[0, 1)$. The result is found by summation, discarding the integer part, and keeping the fractional part of the sum.

The fractional value $F(a) = \left| \frac{a}{P} \right|_1 \in [0, 1)$ contains information both on the magnitude of the number and on its sign. If $\left| \frac{a}{P} \right|_1 \in \left[ 0, \frac{1}{2} \right)$, then the number $a$ is positive and $F(a)$ equals $a$ divided by $P$. Otherwise, $a$ is a negative number, and $1 - F(a)$ indicates its relative value. Rounding $F(a)$ to $t$ bits (precision $2^{-t}$) is denoted as $[F(a)]_{2^{-t}}$. The exact value of $F(a)$ is bounded by the inequalities $[F(a)]_{2^{-t}} < F(a) < [F(a)]_{2^{-t}} + 2^{-t}$.

The integer part, obtained while summing the constants $k_i$, is the rank of the number, i.e., a non-positional characteristic that shows how many times the range $P$ of the system was exceeded while passing from the residue representation of the number to its positional representation. If necessary, the rank can be determined directly through the summation of the constants $k_i$. The fractional part can also be written as $A \bmod 1$ because $A = \lfloor A \rfloor + A \bmod 1$. The number of digits in the fractional part is determined by the maximum potential difference between adjacent numbers. For an exact comparison, which is widely used in the division of numbers, one needs to calculate a value that is equivalent to the conversion from the RNS into positional notation.
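The fractional characteristic of Equation (5) can be sketched as follows. This is an illustration, not the paper's implementation: exact `Fraction` arithmetic stands in for the fixed-point hardware values, and `fractional_F` is a hypothetical helper name; the optional parameter `t` truncates the constants $k_i$ to $t$ bits, mimicking $[F(a)]_{2^{-t}}$:

```python
from fractions import Fraction

def fractional_F(residues, moduli, t=None):
    """F(a) = | sum_i k_i * alpha_i |_1 from Equation (5), with k_i = |P_i^{-1}|_{p_i} / p_i.
    If t is given, each k_i is first truncated to t bits after the binary point."""
    P = 1
    for p in moduli:
        P *= p
    total = Fraction(0)
    for alpha, p in zip(residues, moduli):
        k = Fraction(pow(P // p, -1, p), p)  # exact constant k_i
        if t is not None:
            # floor k to t fractional bits
            k = Fraction((k.numerator << t) // k.denominator, 1 << t)
        total += k * alpha
    return total % 1  # keep only the fractional part

moduli = (2, 3, 5, 7)
print(fractional_F((1, 1, 2, 6), moduli))       # 97  -> Fraction(97, 210) = 97/P
print(fractional_F((1, 2, 4, 5), moduli, t=7))  # 89  -> Fraction(49, 128) = 0.0110001 in binary
```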

Rounding the value $F(a)$ inevitably results in an error. Let us denote $\rho = -n + \sum_{i=1}^{n} p_i$. Work [22] shows that it is necessary to use $N = \left\lceil \log_2(P\rho) \right\rceil$ bits after the binary point when rounding the value $F(a)$, so that the resulting error has no effect on the accuracy of calculations. In other words, a one-to-one correspondence is established between the set of numbers represented in the RNS and the set of values $[F(a)]_{2^{-N}}$. Using the values $[F(a)]_{2^{-N}}$ in calculations is, in terms of algorithmic complexity, equivalent to applying the inverse transformation from the RNS into positional notation using the CRT. This is slow, and therefore, in practice, calculating with the values $[F(a)]_{2^{-N}}$ is not rational. In [22], it is shown that it is possible to use the values $[F(a)]_{2^{-\widetilde{N}}}$, where $\widetilde{N} < N$, for determining the sign of a number in the RNS. This approach is based on the fact that, when determining the sign, there is no need to know the exact value of the number; it is enough to know the range within which the tested number falls.

The algorithm for determining the sign of a number serves as the basis for number comparison algorithms. Determining the sign of a number in the RNS using the values $[F(a)]_{2^{-t}}$ takes the following operations:

1. «Rough estimate»: compute $[F(a)]_{2^{-\widetilde{N}}}$ and check whether it falls into the interval of positive numbers or into the interval of negative numbers.
2. «Clarification»: if $[F(a)]_{2^{-\widetilde{N}}}$ falls into an ambiguity area, compute $[F(a)]_{2^{-N}}$ and determine the sign exactly.
The speed of the algorithm at the «rough estimate» stage depends on how much smaller the value $\widetilde{N}$ is than $N$. However, if $\widetilde{N}$ is taken too small, the intervals in Step 1 may become so narrow that, for numerous numbers in the RNS, the algorithm would require the «clarification» stage, and the benefit of using a small capacity at the «rough estimate» stage would be lost completely. For example, in [13], it is proposed to use $\widetilde{N} = 4$, which is usually too small for practice. Instead, we suggest using the estimation from [22], which shows that the optimal speed of the algorithm is achieved with $\widetilde{N} \approx \log_2(N\rho \ln 4)$. Below is a comparison of the capacities $\widetilde{N}$ and $N$ for RNSs implementing ranges of 16, 32, and 64 bits.

1. Sixteen bits. RNS modules 7, 17, 19, 29.

$$P = 65569; \ \rho = 68; \ N = 23; \ \widetilde{N} = 11.$$

2. Thirty-two bits. RNS modules 2, 3, 5, 11, 13, 19, 23, 29, 79.

$$P = 4295006430; \ \rho = 175; \ N = 40; \ \widetilde{N} = 13.$$

3. Sixty-four bits. RNS modules 2, 11, 17, 19, 23, 31, 41, 53, 59, 61, 71, 79, 83.

$$P = 18446748995286100082; \ \rho = 537; \ N = 74; \ \widetilde{N} = 16.$$
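The precisions above can be recomputed with a small sketch (not from the paper); `bit_parameters` is a hypothetical helper name, and note that the paper floors or rounds the $\widetilde{N}$ estimate depending on context, so here we round to the nearest integer:

```python
import math

def bit_parameters(moduli):
    """Return P, rho, the exact precision N = ceil(log2(P * rho)), and the
    «rough estimate» precision N~ ~= log2(N * rho * ln 4) from [22]."""
    P = math.prod(moduli)                 # Python 3.8+
    rho = sum(moduli) - len(moduli)       # rho = -n + sum(p_i)
    N = math.ceil(math.log2(P * rho))
    N_tilde = round(math.log2(N * rho * math.log(4)))
    return P, rho, N, N_tilde

print(bit_parameters((7, 17, 19, 29)))    # -> (65569, 68, 23, 11), the 16-bit example
```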

Thus, for an RNS with a 16-bit range, the «rough estimate» is done using values with a capacity of $\widetilde{N} = 11$ bits, while the «clarification» takes place at $N = 23$ bits of precision; the speed of the rough estimate goes up by 2.09 times. For an RNS with a 32-bit range, the «rough estimate» employs a capacity of $\widetilde{N} = 13$ bits, while the «clarification» requires $N = 40$ bits; the speed of the rough estimate increases 3.08 times. For a 64-bit RNS, the «rough estimate» uses a capacity of $\widetilde{N} = 16$ bits, while the «clarification» requires $N = 74$ bits; the speed of the rough estimate increases 4.62 times. These results show that, for large ranges, the capacity $\widetilde{N}$ employed for the rough estimate is significantly lower than the accurate calculation capacity $N$, which allows significant gains in speed when performing non-modular operations.

Figure 1 shows the location of the mentioned intervals for positive and negative numbers in the RNS, and the location of the ambiguity areas, where the sign may be determined incorrectly. For the redundant RNS, the numerical range contains a redundancy zone. This allows reducing the number of checked conditions, since the set of admissible positive numbers and the areas of erroneous sign determination no longer intersect (Figure 2). Thus, for a redundant RNS, determining the sign is reduced to the following tasks.


**Figure 1.** Position of ambiguity areas when determining the sign and the intervals for positive and negative numbers for irredundant residue number system (RNS).

**Figure 2.** Position of ambiguity areas when determining the sign and the intervals for positive and negative numbers for redundant RNS.

Let us illustrate the use of the approximate method by comparing numbers in the RNS.

**Example 1.** *We have a system of bases p*<sup>1</sup> = 2*, p*<sup>2</sup> = 3*, p*<sup>3</sup> = 5*, p*<sup>4</sup> = 7.

Then

$$P = 2 \cdot 3 \cdot 5 \cdot 7 = 210, \ \rho = 2 + 3 + 5 + 7 - 4 = 13, \ P_1 = \frac{P}{p_1} = 105, \ P_2 = \frac{P}{p_2} = 70, \ P_3 = \frac{P}{p_3} = 42, \ P_4 = \frac{P}{p_4} = 30.$$

The constants *k<sup>i</sup>* used for computing the relative values are:

$$k_1 = \frac{\left| 105^{-1} \right|_2}{2} = \frac{1}{2}; \ k_2 = \frac{\left| 70^{-1} \right|_3}{3} = \frac{1}{3}; \ k_3 = \frac{\left| 42^{-1} \right|_5}{5} = \frac{3}{5}; \ k_4 = \frac{\left| 30^{-1} \right|_7}{7} = \frac{4}{7}.$$

For precise operations with the relative sizes of numbers in the RNS, it is necessary to use $N = \left\lceil \log_2(P\rho) \right\rceil = 12$ digits after the binary point. For a quick «rough estimate», we will use $\widetilde{N} = \left\lfloor \log_2(N\rho \ln 4) \right\rfloor = 7$ digits. The constants $k_i$, rounded to 7 and 12 bits after the binary point, are, respectively:

Seven bits: $k_1 = 0.1000000$; $k_2 = 0.0101010$; $k_3 = 0.1001100$; $k_4 = 0.1001001$;

Twelve bits: $k_1 = 0.100000000000$; $k_2 = 0.010101010101$; $k_3 = 0.100110011001$; $k_4 = 0.100100100100$.

The «rough estimate» consists in checking the conditions $0 < [F(a)]_{2^{-7}} < \frac{1}{2} - 2^{-7}\rho$ and $\frac{1}{2} < [F(a)]_{2^{-7}} < 1 - 2^{-7}\rho$ (Step 1 of the algorithm), which in binary form appear as

1.1. If $0 < [F(a)]_{2^{-7}} < 0.0110011$, then the number $a$ is positive.

1.2. If $0.1 < [F(a)]_{2^{-7}} < 0.1110011$, then the number $a$ is negative.

Let us compare the two numbers $a = 97$ and $b = 8$ presented in the RNS with the bases $p_1$, $p_2$, $p_3$, and $p_4$. In the RNS, $a = (1, 1, 2, 6)$ and $b = (0, 2, 3, 1)$. The difference is $a - b = (1, 1, 2, 6) - (0, 2, 3, 1) = (1, 2, 4, 5)$. Let us determine the sign of $a - b$. For the «rough estimate», we find $[F(a - b)]_{2^{-7}} = 0.0110001$. The value found meets the condition of Step 1 of the algorithm, that is, $0 < 0.0110001 < 0.0110011$, so we conclude that $a - b > 0$, i.e., $a > b$.

Now, let us compare the two numbers $a = 97$ and $b = 96$ presented in the RNS with the bases $p_1$, $p_2$, $p_3$, and $p_4$: $a = (1, 1, 2, 6)$, $b = (0, 0, 1, 5)$. The difference is $a - b = (1, 1, 2, 6) - (0, 0, 1, 5) = (1, 1, 1, 1)$. Let us determine the sign of $a - b$. For the «rough estimate», $[F(a - b)]_{2^{-7}} = 0.1111111$. Neither condition is met by this value, so the clarification stage of the algorithm is needed. For the «accurate estimation», we find $[F(a - b)]_{2^{-12}} = 0.000000010010$. This value satisfies the condition of Step 2 of the algorithm, so we conclude that $a - b > 0$, i.e., $a > b$.
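The two-stage sign test of Example 1 can be sketched in Python. This is a minimal illustration for this particular moduli set, with exact `Fraction` values standing in for fixed-point registers; `F_trunc` and `is_positive` are hypothetical helper names:

```python
from fractions import Fraction

MODULI = (2, 3, 5, 7)   # Example 1: P = 210, rho = 13, N = 12, N~ = 7
P, RHO = 210, 13

def F_trunc(residues, t):
    """[F(a)]_{2^-t}: Equation (5) with each constant k_i truncated to t bits."""
    total = 0
    for alpha, p in zip(residues, MODULI):
        k = Fraction(pow(P // p, -1, p), p)                     # exact k_i
        total += ((k.numerator << t) // k.denominator) * alpha  # t-bit truncation
    return Fraction(total % (1 << t), 1 << t)

def is_positive(residues):
    """Two-stage sign test: 7-bit «rough estimate», then 12-bit «clarification»."""
    rough = F_trunc(residues, 7)
    if Fraction(0) < rough < Fraction(1, 2) - Fraction(RHO, 2**7):
        return True                                   # Step 1.1: positive
    if Fraction(1, 2) < rough < 1 - Fraction(RHO, 2**7):
        return False                                  # Step 1.2: negative
    return F_trunc(residues, 12) < Fraction(1, 2)     # Step 2: clarification

# a - b = 97 - 96 = (1, 1, 1, 1): the rough value 0.1111111 is ambiguous,
# and the 12-bit clarification 0.000000010010 shows the difference is positive.
print(is_positive((1, 1, 1, 1)))   # -> True, hence 97 > 96
```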

The example above illustrates the use of the approximate method for computing in the RNS. It has been shown how to take into account the error that occurs when using a small $\widetilde{N}$. In practice, in most cases a «rough estimate» is sufficient, and it operates with numbers whose capacity is close to the logarithm of the full range capacity. Therefore, the complexity of the «rough estimate» is $O(\log_2 n)$, while the complexity of the «clarification» stage tends to $O(n)$.

#### *2.3. Division Algorithm in the RNS*

The algorithm for the integer division $\left\lfloor \frac{a}{b} \right\rfloor$ can be described by an iterative scheme performed in two stages. The first stage is a search for the highest power $2^i$ in the approximation of the quotient by a binary sequence. The second stage is the clarification of the approximating series. To get a range greater than $P$, one could select a value $P' = P \cdot p_{n+1}$; however, this requires expanding the RNS base by an extra module. To avoid this base extension, which is a computationally complex operation, we compare not the dividend with the interim divisors but the current results of iteration $i$ with the values of the previous iteration $i - 1$. We repeat the process of doubling the divisor until the intermediate divisor at iteration $i$ becomes smaller than that at iteration $i - 1$, which signals range overflow. This allows meeting the condition $0 < b < P - 1$.

The division algorithm can be described with the following rules.

A certain rule $\varphi$ is constructed which, for each pair of positive integers $a$ and $b$, assigns a certain positive number $q_i$, where $i$ is the number of the iteration, so that $a - bq_i = r_i > 0$, i.e., $a > bq_i$. Then, the division of $a$ by $b$ follows the rule: based on the operation $\varphi$, each pair $a$ and $b$ is assigned a corresponding number $q_1$, so that $a - bq_1 = r_1 \geq 0$, i.e., $a \geq bq_1$. We take the values $2^i$ as $q_i$ and place them into the memory as the constants $c_i = \left( 2^i \bmod p_1, \ 2^i \bmod p_2, \ \ldots, \ 2^i \bmod p_n \right)$. Since the $(i+1)$-th operation does not depend on the $i$-th operation, the iterations can be performed in parallel. Furthermore, in each iteration, only two operations are performed: multiplication of the constant divisor by $2^i$, and comparison of the obtained value with the dividend.

If $r_1 < b$, then the division is complete; if $r_1 \geq b$, then, following the rule $\varphi$, the pair of numbers $(r_1, b)$ is assigned a $q_2$, so that $a - bq_2 = r_2 \geq 0$, i.e., $a \geq bq_2$. If $r_2 < b$, then the division is completed, and if $r_2 \geq b$, then, following the rule $\varphi$, the pair $(r_2, b)$ is assigned a $q_3$, so that $a - bq_3 = r_3 \geq 0$, etc. Since the consistent application of the operation $\varphi$ leads to a decreasing sequence of integers $a > r_1 > r_2 > \ldots \geq 0$, the algorithm terminates in a finite number of steps. Let us assume that at step $m$ the stopping condition is recorded, which means the end of the division operation. Then, we finally obtain $a = (q_1 + q_2 + \ldots + q_m)b + r_m$, where the sequence $q_1 + q_2 + \ldots + q_m$ is the approximation of the quotient, which may contain some extra $q_i$. Next, the resulting approximating series needs clarification. In [14,16], the idea of the most significant bits of the quotient was introduced for RNSs with the specialized moduli sets $\{2^k + 1, 2^k, 2^k - 1\}$ and $\{2^k, 2^k - 1, 2^{k-1} - 1\}$, while the approach proposed in this paper is extended to the general case.

The clarification starts with the highest member $q_m$. If $a > bq_m$, then $q_m$ is a member of the approximating series of the resulting quotient. Further, we take $(q_m + q_{m-1})$: if $a > b(q_m + q_{m-1})$, then $q_{m-1}$ is added to the series; otherwise, if $a < b(q_m + q_{m-1})$, then $q_{m-1}$ is excluded from the series, etc. After checking all the $q_i$, the quotient is determined by the remaining members of the series. The desired quotient is then given by the expression $a = (q_m + q_{m-1} + \ldots + q_i + \ldots + q_1)b + r_m$, where

$$q_i = \begin{cases} 1, & \text{if } (q_m + q_{m-1} + \ldots + q_i)b < a; \\ 0, & \text{otherwise.} \end{cases}$$
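The approximate-then-clarify scheme above can be sketched in the ordinary binary (positional) domain. This is an illustration, not the paper's modular implementation: `divide` is a hypothetical helper name, and `<=` is used so that an exact multiple keeps its series member:

```python
def divide(a, b):
    """Two-stage integer division: first find the highest power 2^m with b * 2^(m+1) > a,
    then clarify the approximating series from 2^m down to 2^0."""
    assert b > 0
    if a < b:
        return 0, a
    m = 0
    while b << (m + 1) <= a:      # stage 1: highest power of the series
        m += 1
    q, r = 0, a
    for j in range(m, -1, -1):    # stage 2: clarification, highest member first
        if (b << j) <= r:
            q += 1 << j           # keep 2^j in the series
            r -= b << j
    return q, r

print(divide(97, 8))   # -> (12, 1)
```

Each clarification step is one shift and one subtraction, which is exactly the property the modular version exploits.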

This algorithm is easy to modify into a modular form in which the absolute values of the variables are replaced with their relative values. The structure of the proposed algorithm is based on employing the approximate method for comparing numbers, which is performed using subtraction.

The known algorithms determine the quotient on the basis of the iteration $A' = A - QD$, where $A$ and $A'$ are, respectively, the current and the next dividend, $D$ is the divisor, and $Q$ is the quotient generated at each iteration over the full range of the RNS, i.e., not chosen from a small set of constants. In the proposed algorithm, the quotient is determined from the iteration $r_i = A - b2^i$, where $A$ is the current dividend, $b$ is the divisor, and $2^i$ is a member of the quotient's approximating series.

A comparison of the algorithms shows that in our algorithm the dividend does not change between iterations, while the divisor is multiplied by a constant, which significantly reduces the computational complexity. In the iterative process of division in positional notation, in order to find the highest power of the quotient's approximating series and to clarify the series, the dividend is compared to the doubled divisors or to the sum of the members of the series. A direct application of this principle to the RNS can lead to incorrect operation of the algorithm: in case of dynamic range overflow for the intermediate divisor, the reconstructed number may go beyond the operating range because the RNS is cyclic. The cyclic RNS value will then appear to be below the dividend, which is not true, since in fact the number exceeds the range $P$, and the algorithm enters a «loop» mode. For example, if the RNS modules are $p_1 = 2$, $p_2 = 3$, $p_3 = 5$, and $p_4 = 7$, then the range is $P = 2 \cdot 3 \cdot 5 \cdot 7 = 210$. Suppose an intermediate result produced the number $A = 220$. In the RNS, $A = 220 = (0, 1, 0, 3)$, i.e., $A = 220$ and $A' = 10$ have the same representation in the RNS. This ambiguity can lead to a failure of the algorithm. To overcome this difficulty, one needs to compare, in the RNS, the results of the current iteration with those of the previous one, which allows correct determination of the larger or smaller number. Thus, the fact of dynamic range overflow in the RNS can be used for the «more-less» decision. At the first iteration, the dividend is compared with the divisor, while the remaining iterations compare the doubled values of the divisors, $q_i b < q_{i+1} b$. Each new iteration compares the current value with the previous one.

Consistent application of these iterations forms the chain of inequalities $bq_1 < bq_2 < \ldots < bq_m > bq_{m+1}$, which determines the required number of iterations depending on the values of the dividend and the divisor. Thus, the algorithm is implemented in a finite number of iterations. Suppose that at iteration $m + 1$ the increasing sequence closes, $bq_m > bq_{m+1}$, which corresponds to overflow of the RNS range, i.e., $bq_{m+1} > P$ and $a < bq_{m+1}$. This ends the process of approximating the quotient by a binary sequence, i.e., by a set of constants in the RNS. Thus, the quotient approximation can be carried out by comparing the neighboring approximate divisors.

Below, we provide a detailed description of the improved algorithm for the division of modular numbers in a redundant RNS.

#### *2.4. Determination of the Quotient Sign*

Step 1. Calculate the approximate values of the dividend $F(a)$ and the divisor $F(b)$. We determine the signs of the numbers in two stages: the «rough estimate» and, if necessary, the «clarification», as described in Section 2.2.


Step 2. If the numbers $a$ and $b$ have different signs, then the quotient is negative. If they have the same sign, then the quotient is positive. In further calculations, we use the absolute values of the dividend $a$ and the divisor $b$; for convenience, we denote them, too, as $a$ and $b$.

#### Approximation of the Quotient

Step 3. Calculate the approximate values of the dividend $F(a)$ and the divisor $F(b)$ and compare them. If $F(a) \leq F(b)$, then the division process ends and the quotient $\left\lfloor \frac{a}{b} \right\rfloor$ is, respectively, equal to 0 or 1. If $F(a) > F(b)$, then a search is made for the highest power $2^{-N}$ in the approximation of the quotient by the binary code, where $-N$ is the least significant bit of the binary fraction.

Let us show how to find the highest power in the binary fraction.

Step 4. Shift [*F*(*b*)]<sub>2<sup>−*N*</sup></sub> to the left until the first bit after the decimal point changes. The number of shifts determines the highest power *j*, which is recorded by the pulse counter connected to the memory *V*.

This completes the approximation of the quotient. To refine the approximating sequence of the quotient, we perform the following steps.

#### *2.5. Clarification of the Quotient's Approximating Sequence*

Step 5. From the memory, select the constant 2<sup>*j*</sup> (the highest power of the series) and multiply it by the divisor. The value 2<sup>*j*</sup>*F*(*b*) is compared with the dividend *F*(*a*) using the approximate method of number comparison in the RNS.

The constants 2<sup>*j*</sup>, 1 ≤ *j* ≤ log<sub>2</sub> *P*, are placed in the memory *V* in advance; the counter *j* and the quotient *Q* are set to «0». The outputs of the counter serve as the address inputs of the memory *V*.

Step 6. Calculate ∆<sub>1</sub> = *F*(*a*) − 2<sup>*j*</sup>*F*(*b*). If the sign bit of ∆<sub>1</sub> is «1», then the corresponding power of the series is discarded; if it is «0», then the values of the sequence members with the same power, i.e., 2<sup>*j*</sup> mod *p<sub>i</sub>*, 1 ≤ *i* ≤ *n*, 0 ≤ *j* ≤ *N*, are added to the quotient adder.

Step 7. Check the sequence member of power 2<sup>*j*−1</sup> through a shift to the right and a comparison. Compare ∆<sub>1</sub> and 2<sup>*j*−1</sup>*b*. If ∆<sub>1</sub> < 2<sup>*j*−1</sup>*b*, then the corresponding power of the series is discarded; if ∆<sub>1</sub> > 2<sup>*j*−1</sup>*b*, then the values of the sequence members with the same power, i.e., 2<sup>*j*−1</sup> mod *p<sub>i</sub>*, are added to the quotient adder, and ∆<sub>2</sub> = ∆<sub>1</sub> − 2<sup>*j*−1</sup>*b* is computed.

Step 8. Similarly, check all the remaining sequence members down to power zero. The last difference ∆<sub>*i*</sub> = *R* = ∆<sub>*i*−1</sub> − *F*<sub>*i*−1</sub>, with 0 ≤ *R* < *b*, is the remainder of dividing *a* by *b***.** The quotient *Q* is the sum of all the powers 2<sup>*j*</sup> retained while developing the quotient, accumulated in the adder, with the sign defined at the second step. The algorithm terminates.
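Steps 3–8 amount to a shift-and-subtract division. The following plain-integer sketch mirrors the control flow (the paper performs every comparison on the approximate fractions [*F*(·)] and accumulates *Q* residue-wise in the RNS; `divide` is an illustrative name):

```python
def divide(a, b):
    """Return (quotient, remainder) for a >= 0, b > 0 via shift/subtract."""
    if a < b:                      # Step 3: the quotient is 0
        return 0, a
    j = 0
    while (b << (j + 1)) <= a:     # Step 4: highest power j with 2^j*b <= a
        j += 1
    q, delta = 0, a
    for k in range(j, -1, -1):     # Steps 5-8: test 2^k*b against the residue
        if (b << k) <= delta:
            delta -= b << k        # keep 2^k in the approximation sequence
            q += 1 << k
    return q, delta                # 0 <= delta < b is the remainder R

print(divide(97, 8))               # the paper's Example 2: (12, 1)
```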

The operation of the modified algorithm is illustrated by the example below.

**Example 2.** *Find the quotient Q* = ⌊*a*/*b*⌋ *of dividing a* = 97 *by b* = −8 *in an RNS with bases p*<sub>1</sub> = 2*, p*<sub>2</sub> = 3*, p*<sub>3</sub> = 5*, p*<sub>4</sub> = 7*. Then P* = 2 · 3 · 5 · 7 = 210, ρ = 2 + 3 + 5 + 7 − 4 = 13, *P*<sub>1</sub> = *P*/*p*<sub>1</sub> = 105, *P*<sub>2</sub> = *P*/*p*<sub>2</sub> = 70, *P*<sub>3</sub> = *P*/*p*<sub>3</sub> = 42, *and P*<sub>4</sub> = *P*/*p*<sub>4</sub> = 30.

The constants *k<sub>i</sub>* used for calculating the relative values are:

$$k\_1 = \frac{|\frac{1}{105}|\_2}{2} = \frac{1}{2};\ k\_2 = \frac{|\frac{1}{70}|\_3}{3} = \frac{1}{3};\ k\_3 = \frac{|\frac{1}{42}|\_5}{5} = \frac{3}{5};\ k\_4 = \frac{|\frac{1}{30}|\_7}{7} = \frac{4}{7}.$$

For a quick «rough estimate», we will use *N*<sub>e</sub> = ⌊log<sub>2</sub>(*N*ρ ln 4)⌋ = 7 bits after the decimal point. The constants *k<sub>i</sub>* rounded to 7 bits after the decimal point are:

Seven bits: *k*<sub>1</sub> = 0.1000000; *k*<sub>2</sub> = 0.0101010; *k*<sub>3</sub> = 0.1001100; *k*<sub>4</sub> = 0.1001001.

Precise operations with the relative values of numbers in the RNS take *N* = ⌈log<sub>2</sub>(*P*ρ)⌉ = 12 bits after the decimal point. The constants *k<sub>i</sub>* rounded to 12 binary bits after the decimal point are:

> *k*<sub>1</sub> = 0.100000000000; *k*<sub>2</sub> = 0.010101010101; *k*<sub>3</sub> = 0.100110011001; *k*<sub>4</sub> = 0.100100100100.
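The listed constants can be reproduced by computing *k<sub>i</sub>* = |*P<sub>i</sub>*<sup>−1</sup>|<sub>*p<sub>i</sub>*</sub>/*p<sub>i</sub>* and truncating to *N* binary places (a sketch; variable names are illustrative):

```python
moduli = [2, 3, 5, 7]
P = 2 * 3 * 5 * 7                 # dynamic range, 210
N = 12                            # bits kept after the binary point

nums = []                         # numerators of k_i over 2**N
for p in moduli:
    Pi = P // p
    inv = pow(Pi, -1, p)          # |P_i^{-1}| modulo p_i
    nums.append((inv << N) // p)  # k_i = inv / p, truncated to N bits
    print(f"p = {p}: k = 0.{nums[-1]:0{N}b}")
```

Running the same loop with N = 7 yields the 7-bit values quoted earlier.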

Now, we shall represent the numbers *a* and *b* in the RNS:

$$a\_{10} = 97 \to (1, \ 1, \ 2, \ 6)\_{RNS},$$

$$b\_{10} = -8 \to (0, \ 1, \ 2, \ 6)\_{RNS}.$$

Determine the signs of the numbers *a* and *b*.

A «rough estimate» (binary):

[*F*(*a*)]<sub>2<sup>−7</sup></sub> = |1 · 0.1000000 + 1 · 0.0101010 + 10 · 0.1001100 + 110 · 0.1001001|<sub>1</sub> = 0.0111000.

Since [*F*(*a*)]<sub>2<sup>−7</sup></sub> misses both of the intervals (0; 0.0110011) and (0.1; 0.1110011) set forth in Example 1, a clarifying iteration will be required:

[*F*(*b*)]<sub>2<sup>−7</sup></sub> = |0 · 0.1000000 + 1 · 0.0101010 + 10 · 0.1001100 + 110 · 0.1001001|<sub>1</sub> = 0.1111000.

[*F*(*b*)]<sub>2<sup>−7</sup></sub>, too, misses both of the intervals (0; 0.0110011) and (0.1; 0.1110011) set forth in Example 1, so another clarifying iteration is required.

«Clarification»:

[*F*(*a*)]<sub>2<sup>−12</sup></sub> = |1 · 0.100000000000 + 1 · 0.010101010101 + 10 · 0.100110011001 + 110 · 0.100100100100|<sub>1</sub> = 0.011101011111.

Since 0 < [*F*(*a*)]<sub>2<sup>−12</sup></sub> < 0.1, the number *a* is positive.

[*F*(*b*)]<sub>2<sup>−12</sup></sub> = |0 · 0.100000000000 + 1 · 0.010101010101 + 10 · 0.100110011001 + 110 · 0.100100100100|<sub>1</sub> = 0.111101011111.

Since 0.1 < [*F*(*b*)]<sub>2<sup>−12</sup></sub> < 1, the number *b* is negative.
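The two sign tests above can be checked numerically: with the 12-bit constants, [*F*(*X*)] is the *N*-bit fraction |Σ *x<sub>i</sub>k<sub>i</sub>*|<sub>1</sub>, and a value below 1/2 (binary 0.1) marks a positive number (a sketch; `F` is an illustrative name):

```python
K = [2048, 1365, 2457, 2340]           # 12-bit numerators of k_i over 4096
N = 12

def F(residues):
    """[F(X)]_{2^-N} as an integer numerator over 2**N."""
    return sum(x * k for x, k in zip(residues, K)) % (1 << N)

a = (1, 1, 2, 6)                       # 97 in the RNS (2, 3, 5, 7)
b = (0, 1, 2, 6)                       # -8, i.e., 202

print(f"{F(a):012b}")                  # 011101011111 -> below 0.1: positive
print(f"{F(b):012b}")                  # 111101011111 -> above 0.1: negative
```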

The numbers *a* and *b* have opposite signs, so the quotient sign will be negative. To find the absolute value of the quotient, we divide *a* by −*b* = (0, 0, 0, 0) − (0, 1, 2, 6) = (0, 2, 3, 1), following the algorithm specified above.

The relative values of the dividend *a* and the divisor −*b* with full accuracy of the calculations *N* are:

$$[F(a)]\_{2^{-12}} = 0.011101011111; \; [F(-b)]\_{2^{-12}} = 0.000010011001.$$

Shifting the fractional part of the divisor −*b* to the left, step by step, we determine that the first bit after the decimal point changes at the fourth shift. Thus, the approximation series can include only the values 2<sup>0</sup>, 2<sup>1</sup>, 2<sup>2</sup>, and 2<sup>3</sup>, which, in the RNS, have the following representation:

$$q\_0 = \mathbf{2}^0 = (1, \ 1, \ 1, \ 1);\ q\_1 = \mathbf{2}^1 = (0, \ 2, \ 2, \ 2);\ q\_2 = \mathbf{2}^2 = (0, \ 1, \ 4, \ 4);\ q\_3 = \mathbf{2}^3 = (0, \ 2, \ 3, \ 1).$$
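The shift count above is easy to verify: shifting the 12-bit numerator of [*F*(−*b*)] left until the leading fractional bit becomes 1 takes exactly four shifts (an illustrative sketch):

```python
N = 12
Fnb = 0b000010011001          # numerator of [F(-b)]_{2^-12} = 153/4096

shifts, v = 0, Fnb
while v < (1 << (N - 1)):     # first fractional bit still 0
    v <<= 1
    shifts += 1

print(shifts)                 # 4 -> the series may contain 2^0 .. 2^3
```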

These values form the approximation sequence of the quotient, which is to be refined later on. To refine the approximation sequence, we subtract from the fraction [*F*(*a*)]<sub>2<sup>−12</sup></sub> of the dividend the fraction [*F*(−*b*)]<sub>2<sup>−12</sup></sub> of the divisor shifted three positions to the left (i.e., multiplied by 2<sup>3</sup>):

$$\Delta\_1 = [F(a)]\_{2^{-12}} - 2^3 \cdot [F(-b)]\_{2^{-12}} = 0.011101011111 - 0.010011001000 = 0.001010010111.$$

Since ∆<sub>1</sub> > 0, we leave 2<sup>3</sup> in the approximation sequence, and the value ∆<sub>1</sub> is used in further calculations.

We subtract from ∆<sub>1</sub> the fraction [*F*(−*b*)]<sub>2<sup>−12</sup></sub> of the divisor shifted two positions to the left:

$$\Delta\_2 = \Delta\_1 - 2^2 \cdot [F(-b)]\_{2^{-12}} = 0.001010010111 - 0.001001100100 = 0.000000110011.$$

Since ∆<sub>2</sub> > 0, we leave 2<sup>2</sup> in the approximation sequence, and the value ∆<sub>2</sub> is used in further calculations.

We subtract from ∆<sub>2</sub> the fraction [*F*(−*b*)]<sub>2<sup>−12</sup></sub> of the divisor shifted one position to the left:

$$
\Delta\_3 = \Delta\_2 - 2^1 \cdot [F(-b)]\_{2^{-12}} = 0.000000110011 - 0.000100110010 = 1.111100000001.
$$

The appearance of 1 in the sign bit indicates that ∆<sub>3</sub> < 0; therefore, 2<sup>1</sup> is excluded from the approximation sequence, and ∆<sub>3</sub> is not used further (we continue with ∆<sub>2</sub>).

We subtract from ∆<sub>2</sub> the fraction [*F*(−*b*)]<sub>2<sup>−12</sup></sub> of the divisor (no shift applied):

$$
\Delta\_4 = \Delta\_2 - 2^0 \cdot [F(-b)]\_{2^{-12}} = 0.000000110011 - 0.000010011001 = 1.111110011010.
$$

The appearance of 1 in the sign bit indicates that ∆<sub>4</sub> < 0, so 2<sup>0</sup> is excluded from the approximation sequence.

Here, the process of refining the approximation sequence comes to an end. To determine the quotient, we add the remaining members of the approximation sequence; in this example, these are *q*<sub>3</sub> = 2<sup>3</sup> = (0, 2, 3, 1) and *q*<sub>2</sub> = 2<sup>2</sup> = (0, 1, 4, 4). The absolute value of the quotient is then determined by summing the members of the sequence:

$$\left| \left\lfloor \frac{a}{b} \right\rfloor \right| = (0, \ 2, \ 3, \ 1) + (0, \ 1, \ 4, \ 4) = (0, \ 0, \ 2, \ 5) = 12.$$

In view of the sign, we finally obtain ⌊*a*/*b*⌋ = −12.
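The final summation can be checked in the RNS itself, decoding the result through the CRT (a sketch; function names are illustrative):

```python
moduli = [2, 3, 5, 7]
P = 2 * 3 * 5 * 7                         # 210

def rns_add(u, v):
    """Residue-wise addition in the RNS."""
    return tuple((x + y) % p for x, y, p in zip(u, v, moduli))

def to_int(residues):
    """CRT decode: the unique 0 <= X < P with X mod p_i = x_i."""
    return sum(x * (P // p) * pow(P // p, -1, p)
               for x, p in zip(residues, moduli)) % P

q3 = (0, 2, 3, 1)                         # 2^3
q2 = (0, 1, 4, 4)                         # 2^2
s = rns_add(q3, q2)
print(s, to_int(s))                       # (0, 0, 2, 5) 12
```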

Figure 3 demonstrates the scheme of positional characteristics calculation based on CRTf for a number *X* = {*x*<sub>1</sub>, *x*<sub>2</sub>, . . . , *x<sub>n</sub>*}. The bit width of each value *x<sub>i</sub>* equals ⌈log<sub>2</sub> *p<sub>i</sub>*⌉, *i* = 1, 2, . . . , *n*. The initial residues generate the partial products |*x<sub>i</sub>* · *k<sub>i</sub>*|<sub>2<sup>*N*</sup></sub>, *i* = 1, 2, . . . , *n*, of constant multiplication. These are then summed by a carry-save-adder tree (CSA tree) modulo 2<sup>*N*</sup>. The obtained results are summed by a Kogge–Stone adder [25] modulo 2<sup>*N*</sup>, yielding [*F*(*X*)]<sub>2<sup>−*N*</sup></sub>. In the next section, we will demonstrate the advantages of the proposed method compared to known analogs based on the CRT and MRC.

**Figure 3.** The scheme of positional characteristics calculation based on CRTf.

#### **3. Simulation of the Proposed Algorithm**

It follows from the analysis of the modular division scheme that the comparison and sign detection unit is the main component determining its computational complexity. This unit can be implemented based on the CRT, the MRNS, or the CRT with fractions. We have considered models of all three types.

The experimental simulation was performed using ISE Design Suite 14.7. A Kintex-7 KC705 XC7K70T-2FBG676 without DSP48E1 blocks was chosen as the compilation target. This FPGA contains 10,250 slices and 300 input–output blocks. During the simulation, we varied the digit capacity of the moduli under a fixed number of bases. For each type of model, the same prime bases of a given capacity were selected; in particular, four bases with moduli of 5, 9, 13, 17, 21, and 25 bits. The bit width of the dynamic range of the system is approximately the product of the number of bases and their digit capacity. Only the bottleneck of the RNS division algorithm was implemented in hardware; the remaining parts of the division algorithm are very similar to the division operation in the standard IEEE library "ieee.numeric\_std.all" and require approximately the same amount of resources in a hardware implementation. Figure 4 shows the resource usage of this FPGA for moduli of different capacities for each type of model. Table 1 shows the detailed resource utilization for all the approaches considered.

**Figure 4.** FPGA Kintex-7 KC705 XC7K70T-2FBG676 resource usage by selected calculation basis: (**a**) number of occupied Slices, (**b**) number of LUTs.


**Table 1.** Resources utilization and total delay.

All the considered algorithms were implemented on the corresponding FPGA. The architectures for the CRT and MRC from [4] were simulated for comparison. In addition, results for the division algorithm in the weighted number system (WNS) are also presented. For division in the WNS, an algorithm from the standard IEEE library "ieee.numeric\_std.all" was implemented; the pipeline in the binary division is implemented as in [26]. These results clearly show that the additional circuit logic for the proposed method using the CRT with fractions does not exceed 25% of the WNS division algorithm costs.

We define the scheme latency as the maximum time spent by an arbitrary signal to traverse the whole scheme from a certain input to a certain output. The latency estimate describes the performance of the suggested algorithm, including the working frequency of the scheme. For each type of model, Figure 5 presents the working frequency of the scheme for base systems with different digit capacities of the moduli.

**Figure 5.** Frequency as a function of dynamic range bit count.

As an example, consider a 64-bit capacity as the most widespread in modern computer systems. To write numbers in this system, it suffices to represent each of the four moduli as a 16- or 17-bit number. Let the set of moduli be {65537, 65539, 65543, 65551}. The range of this set forms a 65-bit number, which covers 64-bit capacity. Here, the approximate method requires only 689 slices, whereas the orthogonal basis method and the improved MRNS scheme require 1457 and 865 slices, respectively. On the other hand, the working frequency of the approximate method reaches 62.5 MHz, which is 7.6 times faster than the CRT-based restoration and 10.1 times faster than the improved MRNS method. Note that the advantages of the approximate method over these approaches remain in force for higher digit capacities, too.
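The dynamic range claim for the quoted moduli set can be sanity-checked (range only; the primality and pairwise coprimality of the moduli are taken from the text):

```python
from math import prod

moduli64 = [65537, 65539, 65543, 65551]
P64 = prod(moduli64)

print(P64.bit_length())       # 65-bit dynamic range
print(P64 > (1 << 64))        # True: covers the full 64-bit capacity
```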

#### **4. Conclusions**

The new algorithm described in this paper speeds up the modular division procedure in the RNS representation in comparison with the well-known analogs. This fact can be explained by the rather simple structure of the algorithm containing uncomplicated operations, namely, addition and shift (for quotient approximation), as well as shift and subtraction (for quotient refinement). Owing to CRT usage with fractions, the new algorithm does not include such operations as modular remainder calculation and number conversion into the mixed radix number system (MRNS) representation. The simulation of the algorithm on FPGA Kintex-7 has demonstrated a considerable reduction in hardware costs and an appreciable gain in speed as against the algorithms based on the CRT and MRNS representations.

Currently, this is the best hardware implementation of general modular division. In comparison with the well-known algorithms, the suggested algorithm guarantees smaller hardware and time costs owing to the close connection between the arithmetic design and the hardware implementation. As a result, the computational complexity of modular division has been substantially decreased. The new algorithm is notable for its easy implementation, requiring fewer calculations than its well-known analogs.

A promising direction of further research is to find fast algorithms for several problem-causing operations in the RNS, namely, RNS-MRNS conversion and the optimal choice of RNS moduli within different ranges for specific applications. Each of the directions would promote the development of this field of computational mathematics owing to new RNS applications.

**Author Contributions:** Conceptualization, D.K., P.L., and N.C.; data curation, M.B. and I.L.; formal analysis, M.D.; funding acquisition, D.K.; investigation, A.L., A.N., M.V., and A.V.; methodology, P.L.; project administration, N.C.; resources, M.B. and I.L.; software, A.N., M.V., and A.V.; supervision, D.K. and P.L.; validation, M.D. and A.L.; visualization, A.V., M.V., and A.N.; writing—original draft preparation, N.C., P.L., and M.B.; writing—review and editing, D.K. and P.L. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the grant of the Russian Science Foundation (Project №19-19-00566).

**Conflicts of Interest:** The authors declare no conflicts of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
