**About the Editors**

#### **Guang Wang**

Guang Wang, Ph.D., is currently an Associate Professor with the Automation Department, North China Electric Power University, Baoding, China. He received a B.E. degree in automation from Harbin Engineering University, Harbin, China, in 2010, and his M.Sc. and Ph.D. degrees in control science and engineering from the Harbin Institute of Technology, Harbin, in 2012 and 2016, respectively. He has led and been involved in more than six research projects, including projects funded by the National Natural Science Foundation of China and the Natural Science Foundation of Beijing. His research interests include fault diagnosis and performance monitoring, and their applications in energy storage systems.

#### **Jiale Xie**

Jiale Xie, Ph.D., is currently a lecturer with the Department of Automation, North China Electric Power University, Baoding, China. He received his B.Eng. degree in automation from Harbin Engineering University, Harbin, China, in 2010, and M.Sc. and Ph.D. degrees in control science and engineering from Harbin Institute of Technology, Harbin, in 2012 and 2018, respectively. He has led and been involved in more than four research projects, including projects funded by the National Natural Science Foundation of China and the Fundamental Research Funds for the Central Universities. His research interests focus on intelligent management and safety control of Li-ion-battery-based energy storage systems.

#### **Shunli Wang**

Prof. Dr. Shunli Wang is a leading expert on new energy research, and is the head of DTlab and the leader of the new energy measurement and control research team. He researches modeling and state estimation strategies for high-power lithium-ion batteries. He has undertaken more than 40 projects and registered more than 30 patents, publishing over 100 research papers and obtaining 20 awards, including the Young Scholar Award, the Science and Technology Progress Award, and the University & Enterprise Innovation Talent Team Award. He has developed multiple generations of systems, improving the reliability of battery systems and expanding their application fields with significant social and economic benefits.

## *Editorial* **Application of Artificial Intelligence in Power System Monitoring and Fault Diagnosis**

**Guang Wang 1, Jiale Xie 1,\* and Shunli Wang <sup>2</sup>**


#### **1. Introduction**

Emerging technologies such as artificial intelligence (AI), big data analytics, and deep learning have gained widespread attention in recent years and have demonstrated great potential for application in many industrial fields. In power systems, AI and other technologies are also being used as new and powerful tools to replace traditional techniques for feature modeling, performance control, and fault diagnosis in order to obtain superior results. This Special Issue, "Application of Artificial Intelligence in Power System Monitoring and Fault Diagnosis", aims to introduce the latest advances in this field and discusses the application of AI technology in power system modeling and control, state estimation, performance diagnosis, and prognosis, among other fields.

The scope of this Special Issue includes, but is not limited to, the following:


From a total of 24 submissions, 10 research papers were published in this Special Issue, with 14 rejected.

#### **2. Highlights of Published Papers**

This section provides a summary of this Special Issue of *Energies*, covering published articles [1–10] which address several topics related to AI technologies in power system performance monitoring.

In [1], Barnabei et al. designed a Supervisory Control and Data Acquisition (SCADA)-based framework for the unsupervised anomaly detection of district heating (DH) network generating units. The framework relies on a multivariate machine learning regression model and then uses a sliding threshold approach for the subsequent processing of the model residuals generated during the testing phase. The system was tested against major failures occurring in gas-fired generating units at the DH plant in Aosta, Italy, and the results showed that the framework can detect anomalies successfully.

**Citation:** Wang, G.; Xie, J.; Wang, S. Application of Artificial Intelligence in Power System Monitoring and Fault Diagnosis. *Energies* **2023**, *16*, 5477. https://doi.org/10.3390/ en16145477

Received: 17 July 2023 Accepted: 18 July 2023 Published: 19 July 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).


In [2], Lin et al. proposed a new method for shunt capacitor monitoring. The method monitors the shunt capacitor bank via its synchronous voltage and branch current, calculates the capacitance parameters of the ungrounded star-connected capacitor bank using a parameter-symmetry-based calculation method, and identifies abnormal capacitor states using a statistical method. A simulation established in PSCAD verified that the relay protection device could effectively monitor early abnormal conditions of the capacitor bank.

In [3], Jawad et al. proposed a fault diagnosis method based on probabilistic generative models to remedy the shortcomings of existing fault detection methods for high-voltage direct current (HVDC) transmission systems. The method uses wavelet transform based on ant colony optimization and artificial neural network to detect different types of faults in HVDC transmission lines. The experimental results showed that the proposed method has higher accuracy and stronger robustness in the fault diagnosis of HVDC transmission systems compared with existing methods, such as support vector machines and decision trees.

In [4], Pujana proposed a hybrid model-based method for developing a digital twin (DT) model for wind power conversion systems. The method combines the advantages of physical models with advanced data analysis techniques to obtain knowledge from actual operational data while preserving physical relationships, thereby generating synthetic data from non-occurring events to detect and classify faults. Compared with existing DT methods, the method proposed in this paper has significant advantages in accuracy and interpretability.

In [5], Xia et al. proposed a multi-model fusion ensemble learning algorithm based on stacked structures to detect power theft. To overcome the inability of existing methods to further improve the accuracy of electricity theft detection, a heterogeneous ensemble learning approach is used: several powerful individual learners are superposed and integrated in a stacked structure to build a heterogeneous ensemble model that achieves the accurate detection and identification of electricity theft.

For identifying different types of partial discharges (PDs) in gas-insulated switchgear (GIS), Zheng et al. proposed an improved feature fusion convolutional neural network (IFCNN) method in [6], which solves the problem of traditional methods requiring a large quantity of statistical discharge data. By fusing time-frequency features, the method can uncover more local features of potential discharge pulses and increase the recognition accuracy to 95.8%.

In [7], Luo et al. designed an automatic machine learning-based lifetime prediction model (AutoML) for accurately estimating and predicting the capacity and lifetime of Li-ion batteries. The features of the constant-current (CC) and constant-voltage (CV) phases are extracted using optimized incremental capacity (IC) curves, and the noise is removed using the Kalman filtering algorithm. They then built AutoML, which can automatically generate the appropriate processing flow, addressing the issues of information redundancy and high computational cost. Validation on the NASA dataset demonstrated a significant improvement in the model's ability to predict battery life on small-scale datasets.

In [8], Bai et al. proposed an HOG-SVM-based power system equipment identification method. First, wavelet transform is performed on the sound signals of power system equipment collected from the field to obtain wavelet coefficient-time maps. Then, the HOG features of the images are selected, and the selected features are classified using an SVM classifier. Moreover, the method also combines sound signal and image processing to effectively take advantage of image processing and avoid the limitations of sound signal processing. Finally, simulation experiments demonstrated that the proposed method can accurately identify and classify power system equipment.

In [9], Chen et al. proposed a deep-learning-based method for the intelligent modeling of the incineration process in waste-to-energy plants. The output variables are selected with regard to safety, stability, and economy. The input variables are determined by eliminating invalid and redundant variables using the Lasso (least absolute shrinkage and selection operator) algorithm, and a multi-input multi-output CNN-BiLSTM model based on the selected features is established. The results showed that the model can fully exploit the data features under multi-dimensional input feature parameters, and that it has higher accuracy and applicability than traditional models.

Finally, in [10], Zhang et al. constructed a short-term wind speed prediction model based on variable support segments (VSS). First, the method decomposes the historical wind speed series into several components using the variational mode decomposition method. Then, an improved Transformer model is used to forecast the future values of each component, and these forecasts are summed to obtain the future wind speed prediction. Experimental results showed that the prediction accuracy of the improved Transformer model is significantly higher than that of other prediction models.

**Author Contributions:** Investigation, G.W. and J.X.; Writing—original draft, G.W.; Writing—review and editing, J.X. and S.W. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported in part by the National Natural Science Foundation of China under Grants 61973117 and 52207235, the Beijing Municipal Natural Science Foundation under Grant 4192056, and the Hebei Natural Science Foundation under Grant F2019502185.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

## *Article* **Variable Support Segment-Based Short-Term Wind Speed Forecasting**

**Ke Zhang \*, Xiao Li and Jie Su**

School of Control and Computer Engineering, North China Electric Power University, Baoding 071003, China; 17331250396@163.com (X.L.); sjzjn@126.com (J.S.)

**\*** Correspondence: zhangke\_1998@163.com

**Abstract:** Accurate short-term wind speed forecasting plays an important role in the development of wind energy. However, the inertia of airflow means that wind speed has the properties of time variance and inertia, which pose a challenge in the task of wind speed forecasting. We employ the variable support segment method to describe these two properties. We then propose a variable support segment-based short-term wind speed forecasting model to improve wind speed forecasting accuracy. The core idea is to adaptively determine the variable support segment of the future wind speed by a self-attention mechanism. Historical wind speed series are first decomposed into several components by variational mode decomposition (VMD). Then, the future values of each component are forecast using a modified Transformer model. Finally, the forecasting values of these components are summed to obtain the future wind speed forecasting values. Wind speed data collected from a wind farm were employed to validate the performance of the proposed model. The mean absolute error of the proposed model in spring, summer, autumn, and winter is 0.25, 0.33, 0.31, and 0.29, respectively. Experimental results show that the proposed model achieves significant accuracy and that the modified Transformer model has good performance.

**Keywords:** wind speed forecasting; variable support segment; VMD; Transformer; attention mechanism

#### **1. Introduction**

Wind energy has become the most promising clean energy due to its large reserves [1] and good foundation for development. The Global Wind Energy Council has indicated that installed global wind power capacity could provide up to 20% of global electricity by 2030 [2]. The development and utilization of wind energy are critical to alleviating the pressure generated by traditional energy sources such as fossil fuels. The conversion and management of wind energy is closely related to wind speed. Accurate short-term wind speed forecasting, which estimates the wind speed 30 minutes to 6 hours ahead [3], is essential for optimizing power grid scheduling, reducing system rotating reserve capacity, and guaranteeing stable grid operation [4,5]. However, the accuracy and reliability of wind speed forecasting are affected by the stochastic nature and nonlinear characteristics of wind speed. Various models for improving wind speed forecasting accuracy have been proposed [6–9], which can be divided into the categories of single models and combined models based on their structure. The most widely used single models include the backpropagation (BP) neural network [10], extreme learning machine (ELM), Kalman filtering, the autoregressive moving average (ARMA) [11], and support vector regression (SVR) [12] models.

A single model is unable to achieve satisfactory forecasting accuracy due to the intermittency of wind speed. Thus, combined models consisting of multiple single models are widely applied. Extensive studies have shown that combined models have better performance [13,14]. There are two sorts of combined models. The first weights the forecasting values of different models to obtain the final forecasting values. In [15], the weight coefficients of three different models were determined via modified support vector regression. In [16], the partial least squares algorithm was used to optimize the weight

**Citation:** Zhang, K.; Li, X.; Su, J. Variable Support Segment-Based Short-Term Wind Speed Forecasting. *Energies* **2022**, *15*, 4067. https:// doi.org/10.3390/en15114067

Academic Editors: Galih Bangga

Received: 23 April 2022 Accepted: 30 May 2022 Published: 1 June 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

coefficients. Wang et al. [17] proposed a combined model in which the coefficients of four artificial neural networks' forecasting results are determined using the multi-objective bat algorithm (MOBA).

However, the original wind speed series often appears as a broadband signal in the frequency domain, which is difficult to forecast directly. Therefore, a second sort of combined model has been presented to solve this issue. First, a historical wind speed series is broken into narrowband components using the signal decomposition method. Then, each narrowband component's future values are forecast separately by the forecasting models. The final forecasting values are the sum of each component's forecasting values. The most commonly used signal decomposition methods include wavelet transform (WT) [18], empirical mode decomposition (EMD) [19] and its variants, and variational mode decomposition (VMD) [20]. In [21], WT was employed to reduce wind speed fluctuation characteristics. Naik et al. [22] utilized EMD to preprocess wind speed data. In [23], VMD was used to overcome the intermittency of the wind and eliminate noise signals. WT requires the wavelet function and the decomposition layers to be selected artificially, which is non-adaptive. Although EMD and its variants are adaptive, they have limitations such as mode mixing and endpoint effects. VMD is an adaptive signal decomposition method with good noise robustness. Here, we employ VMD as the signal decomposition method.

Forecasting models are another key component of combined models; research [24,25] has shown that deep learning models have better performance in extracting and learning complex quantitative relationships hidden in wind speed data. Altan et al. [26] used the long short-term memory (LSTM) model for the forecasting of narrowband components, which showed good performance. In [27], the bidirectional LSTM model was utilized to forecast the sub-series. In [28], a combined model which incorporated VMD, differential evolution (DE), and echo state network (ESN) was proposed. In [29], the significant spatiotemporal characteristics in wind speed data were extracted by a graph deep learning model.

The Transformer model [30] is a deep learning model based on the self-attention mechanism which is good at capturing dependencies in long sequences and is not affected by distance. The Transformer model outperforms other deep learning models in processing sequence data; hence, we employ it here as the forecasting model. However, the Transformer model cannot be employed for time series forecasting tasks directly due to its particular structure. Therefore, the structure of the Transformer model is modified in this paper. According to the above analysis, we first use VMD to obtain the narrowband components decomposed from historical wind speed series, then utilize the modified Transformer model to obtain each component's forecasting values. The final forecasting values are the sum of each component's forecasting values. The following are this paper's major contributions:


The structure of this paper is as follows: Section 2 provides the mathematical description of wind speed forecasting; Section 3 briefly introduces VMD and the Transformer model; Section 4 presents the modified Transformer model and the proposed model; Section 5 analyzes the forecasting results of different models; and the final section contains our conclusions.

#### **2. Mathematical Description of Wind Speed Forecasting**

At present, most wind speed forecasting models assume that the future wind speed in the short term is only related to the historical wind speed:

$$\mathbf{x}\_{\rm N} = f(\mathbf{x}\_{\rm M}) \tag{1}$$

where $\mathbf{x}_N = [x_i, \ldots, x_{i+N-1}]$ denotes the future wind speed series and $\mathbf{x}_M = [x_{i-M}, \ldots, x_{i-1}]$ denotes the historical wind speed series (i.e., the support segment of $\mathbf{x}_N$); $f : \mathbb{R}^M \to \mathbb{R}^N$ is the function that describes the mapping relationship between $\mathbf{x}_M$ and $\mathbf{x}_N$. Thus, the wind speed forecasting task can be achieved by constructing a model to approximate the function $f$. Wind speed series often appear as broadband signals in the frequency domain, while narrowband signals are generally assumed to have a stable future trend and are easier to forecast. As a result, one feasible approach is to forecast the future values based on the narrowband components of historical wind speed series. The wind speed forecasting process based on signal decomposition can be formulated as

$$\mathbf{x}\_{N} = \sum\_{k} \mathbf{x}\_{N}^{k} = \sum\_{k} f\_{k}(\mathbf{x}\_{M}^{k}) \tag{2}$$

where $\mathbf{x}_M = \sum_k \mathbf{x}^k_M$, and $\mathbf{x}^k_M$ represents a narrowband component of the historical wind speed series, i.e., the support segment of $\mathbf{x}^k_N$. Therefore, the function $f_k : \mathbb{R}^M \to \mathbb{R}^N$ describes the quantitative relationship between $\mathbf{x}^k_M$ and $\mathbf{x}^k_N$.

The inertia of airflow means that the wind speed shows time-varying and inertial properties, which influences the accuracy of wind speed forecasting. As Equation (2) fails to describe these two properties of wind speed effectively, there is room for improvement. Hence, the parameter *τ*, which is related to delay, can be introduced to the mathematical description of wind speed forecasting, and the parameter *p*, which denotes the length of the support segment, is set as a time variable. As a result, the mathematical description of wind speed forecasting can be formulated as

$$\mathbf{x}_{N} = \sum_{k} \mathbf{x}_{N}^{k} = \sum_{k} f_{k}(S_{p_k, \tau_k}^{k}) \tag{3}$$

where $S^k_{p_k,\tau_k} = [x^k_{i-p_k-\tau_k}, \ldots, x^k_{i-1-\tau_k}]$ is the variable support segment of $\mathbf{x}^k_N$. In Formula (3), the parameters $\tau$ and $p$ vary with the historical wind speed series; thus, the inertia property of wind speed is described by the parameter $\tau$, while the time-varying property of wind speed is described by the parameters $\tau$ and $p$ jointly. When $N = 1$, Equation (3) corresponds to the one-step wind speed forecasting problem, which can be reformulated as

$$x_i = \sum_k f_k(S^k_{p_k, \tau_k}) \tag{4}$$

Unless otherwise specified, the remainder of this paper concentrates on the issue of one-step wind speed forecasting.

Figure 1 shows the schematic diagram of the variable support segment; $[x^1_2, x^1_3, x^1_4, x^1_5]$, which contributes to the formation of $x^1_{11}$, is the variable support segment of $x^1_{11}$, that is, $p_1 = 4$ and $\tau_1 = 5$. Similarly, the variable support segment of $x^2_{11}$ is $[x^2_3, \ldots, x^2_7]$; $p_2 = 5$ and $\tau_2 = 3$.

According to Equation (4), we can forecast the future wind speed via the following steps.


**Figure 1.** Schematic diagram of the variable support segment.

Approximating the variable support segment accurately is the key to reducing the errors in wind speed forecasting. Existing forecasting models struggle with adaptively approximating the variable support segment. In our approach, the variable support segment is approximated using the self-attention mechanism, the specific process of which is introduced in Section 4.1.

#### **3. VMD and Transformer**

For the purposes of this paper, VMD was selected as the signal decomposition method and the Transformer model was selected as the forecasting model; this section briefly introduces them.

#### *3.1. VMD*

VMD decomposes an input signal into a number of intrinsic mode functions which are band-limited. It includes two main parts, variational problem construction and variational problem solving.

VMD takes as its premise that the input signal $g(t)$ equals the sum of all the modes, and it seeks $K$ mode functions $u_k(t)$ that minimize the sum of the estimated bandwidths of the modes. Thus, the constrained variational problem can be formulated as

$$\begin{cases} \min_{\{u_k\},\{\omega_k\}} \left\{ \sum_k \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \\ \text{s.t.} \quad \sum_k u_k = g(t) \end{cases} \tag{5}$$

where $u_k$ is the mode function, $\omega_k$ is the mode center frequency, $K$ is the number of modes, $\delta$ is the Dirac distribution, $*$ denotes convolution, and $g(t)$ is the input signal.

By introducing the quadratic penalty term *α* and the Lagrangian multiplier *λ*(*t*), the constrained variational problem of Equation (5) becomes an unconstrained variational problem:

$$L(\{u_k\}, \{\omega_k\}, \lambda) = \alpha \sum_k \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| g(t) - \sum_k u_k(t) \right\|_2^2 + \left\langle \lambda(t), g(t) - \sum_k u_k(t) \right\rangle \tag{6}$$

In order to solve the unconstrained variational problem, VMD alternately updates $u_k^{n+1}(t)$, $\omega_k^{n+1}$, and $\lambda^{n+1}(t)$ to find the "saddle point" of the extended Lagrangian expression. The iterative formulas, expressed in terms of the Fourier transforms of $u_k(t)$, $\omega_k$, and $\lambda(t)$, are

$$\hat{u}_k^{n+1}(\omega) \leftarrow \frac{\hat{g}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \frac{\hat{\lambda}(\omega)}{2}}{1 + 2\alpha(\omega - \omega_k)^2} \tag{7}$$

$$\omega_k^{n+1} \leftarrow \frac{\int_0^\infty \omega |\hat{u}_k(\omega)|^2 \, d\omega}{\int_0^\infty |\hat{u}_k(\omega)|^2 \, d\omega} \tag{8}$$

$$\hat{\lambda}^{n+1}(\omega) \leftarrow \hat{\lambda}^n(\omega) + \eta \left[ \hat{g}(\omega) - \sum_k \hat{u}_k^{n+1}(\omega) \right] \tag{9}$$

where *η* is an update factor.
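For readers who want to see the update rules in code, the following is a simplified Python/NumPy sketch of Equations (7)–(9). It is illustrative only: it omits the boundary mirroring used in full VMD implementations, and the default values of `alpha`, `eta`, `n_iter`, and `tol`, as well as the initialization of the center frequencies, are assumptions rather than the paper's settings.

```python
import numpy as np

def vmd_core(g, K, alpha=2000.0, eta=0.1, n_iter=500, tol=1e-7):
    """Simplified sketch of the VMD updates in Eqs. (7)-(9); no boundary mirroring."""
    T = len(g)
    freqs = np.fft.fftshift(np.fft.fftfreq(T))   # centered frequency axis (cycles/sample)
    g_hat = np.fft.fftshift(np.fft.fft(g))       # spectrum of the input signal
    u_hat = np.zeros((K, T), dtype=complex)      # spectra of the K modes
    omega = np.linspace(0.0, 0.5, K, endpoint=False) + 0.25 / K  # assumed initial centers
    lam = np.zeros(T, dtype=complex)             # spectrum of the Lagrangian multiplier
    for _ in range(n_iter):
        u_prev = u_hat.copy()
        for k in range(K):
            others = u_hat.sum(axis=0) - u_hat[k]
            # Eq. (7): Wiener-filter-like update of mode k
            u_hat[k] = (g_hat - others + lam / 2) / (1 + 2 * alpha * (freqs - omega[k]) ** 2)
            # Eq. (8): center of gravity of the mode's power spectrum (positive frequencies)
            pos = freqs > 0
            power = np.abs(u_hat[k, pos]) ** 2
            omega[k] = np.sum(freqs[pos] * power) / np.sum(power)
        # Eq. (9): dual ascent on the Lagrangian multiplier with update factor eta
        lam = lam + eta * (g_hat - u_hat.sum(axis=0))
        if np.sum(np.abs(u_hat - u_prev) ** 2) < tol * (np.sum(np.abs(u_prev) ** 2) + 1e-12):
            break
    modes = np.real(np.fft.ifft(np.fft.ifftshift(u_hat, axes=-1), axis=-1))
    return modes, omega
```

In practice a dedicated VMD package would normally be used instead of this sketch; the point here is only to show how the three update rules interact.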

#### *3.2. The Transformer Model*

The Transformer model [30] is based on an "encoder–decoder" structure, as shown in Figure 2. The model consists of an input layer, encoder stack, decoder stack, and output layer.

**Figure 2.** The structure of the Transformer model.

The word embedding module and positional encoding module, which correspond to "Input Embedding" and "Positional Encoding" in Figure 2, respectively, make up the input layer. The word embedding module is utilized to convert input words into computable vectors, as words cannot be directly input into the model. The positional encoding module embeds positional information into the input sequence, as the Transformer model abandons the traditional recurrent neural network structure and is therefore unable to directly receive the position information of the input sequence. The encoder stack, which is responsible for encoding the input information and generating intermediate vectors as the input of the decoder stack, is composed of several encoders. Each encoder contains two modules, the multi-head attention mechanism module and the feed-forward neural network module, corresponding to "Multi-Head Attention" and "Feed Forward" in Figure 2, respectively. Here, we use ReLU as the activation function in the feed-forward neural network module. Residual connections are used between each module, followed by normalization, which is indicated by the "Add & Norm" part in Figure 2.

The multi-head attention mechanism module calculates the attention based on a self-attention mechanism, which can deeply explore the internal relationships of input sequences, focus on important information, and filter out unimportant information. The self-attention mechanism first maps the input matrix **X** into the query matrix **Q**, the key matrix **K**, and the value matrix **V**, then calculates the attention distribution by scaled dot product, and finally performs a weighted summation of the value matrix according to the attention distribution. Specifically, this is shown in Equations (10)–(13):

$$\mathbf{Q} = \mathbf{W}\_{\mathbf{Q}} \mathbf{X} \tag{10}$$

$$\mathbf{K} = \mathbf{W}\_K \mathbf{X} \tag{11}$$

$$\mathbf{V} = \mathbf{W}\_V \mathbf{X} \tag{12}$$

$$Attention(\mathbf{Q}, \mathbf{K}, \mathbf{V}) = softmax(\frac{\mathbf{Q}\mathbf{K}^T}{\sqrt{d}})\mathbf{V} \tag{13}$$

where $\mathbf{W}_Q$, $\mathbf{W}_K$, and $\mathbf{W}_V$ are the weight matrices corresponding to $\mathbf{Q}$, $\mathbf{K}$, and $\mathbf{V}$, respectively, and $\sqrt{d}$ is a scale factor.

The information learned by a single self-attention mechanism is relatively simple. In order to fully mine the correlation information between input sequences, the Transformer model further adopts the multi-head attention mechanism in order to learn information from different subspaces, then splices the outputs of different subspaces to obtain the final output, as shown in detail in Equations (14) and (15):

$$Multihead(\mathbf{Q}, \mathbf{K}, \mathbf{V}) = Concat(Head\_1, \dots, Head\_H) \mathbf{W}^O \tag{14}$$

$$Head\_i = Attention(\mathbf{XW}\_i^Q, \mathbf{XW}\_i^K, \mathbf{XW}\_i^V) \tag{15}$$

where $\mathbf{W}_i^Q$, $\mathbf{W}_i^K$, and $\mathbf{W}_i^V$ are the weight matrices corresponding to $\mathbf{Q}$, $\mathbf{K}$, and $\mathbf{V}$ in $Head_i$, $Concat$ concatenates the output of each $Head$, and $\mathbf{W}^O$ is the projection matrix used to project the concatenated result.
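A small NumPy sketch of Equations (10)–(15) may help make the mechanism concrete. Rows of the input matrix are treated as time steps here, so the projections are applied on the right (the equations above write them as left multiplications); all dimensions and the random weights are illustrative assumptions.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Eqs. (10)-(13): project the input into Q, K, V, then scaled dot-product attention
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    W = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))   # attention distribution
    return W @ V, W

def multi_head(X, heads, Wo):
    # Eqs. (14)-(15): run each head, concatenate the outputs, project with W^O
    outputs = [self_attention(X, Wq, Wk, Wv)[0] for Wq, Wk, Wv in heads]
    return np.concatenate(outputs, axis=-1) @ Wo

# toy usage with random weights; every dimension here is an illustrative assumption
rng = np.random.default_rng(0)
L, d_model, H = 10, 512, 8
d_head = d_model // H
X = rng.standard_normal((L, d_model))
heads = [tuple(rng.standard_normal((d_model, d_head)) for _ in range(3)) for _ in range(H)]
Wo = rng.standard_normal((H * d_head, d_model))
print(multi_head(X, heads, Wo).shape)   # (10, 512)
```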

The decoder stack, which is responsible for decoding the input information, is composed of several decoders. Compared to the encoder, the decoder includes an additional mask multi-head attention mechanism module to prevent information leakage. Residual connections between the modules of the decoder are used and normalized.

The output layer includes the Linear module and the Softmax module, which are used to convert the vector output by the decoder stack into a probability and then output the word corresponding to the highest probability.

#### **4. The Wind Speed Forecasting Model**

#### *4.1. The Modified Transformer Model*

The structure of the original Transformer model is not suitable for time series forecasting tasks; therefore, we conducted several specific modifications:

(1) The word-embedding module was replaced by a fully-connected neural network (FCNN) to allow the wind speed series to be input directly into the model;


For convenience, the modified Transformer model, shown in Figure 3, is called M-Transformer in this paper.

**Figure 3.** The structure of the M-Transformer model.

Drawing on a large body of previous experimental results, the number of $Head$s was set to 8 and the input length of the narrowband component to 10. Without loss of generality, the historical wind speed components are represented as $[x_1, \ldots, x_{10}]$. It should be noted that $x_1 \sim x_{10}$ were fed into the FCNN. As shown in Figure 4, $x_i$ is mapped into a row vector of length $d_s = 512$ by the FCNN. The matrix $\mathbf{X}$ is concatenated from the ten row vectors generated from the narrowband modes of the historical wind speed and is then fed into $Head_1 \sim Head_8$ in order to separately calculate the attention distribution.
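As a concrete illustration of this input layer, the following PyTorch sketch maps each scalar sample to a 512-dimensional row vector with a single fully connected layer; PyTorch itself is an assumption, since the paper only states that Python 3.7 was used.

```python
import torch
import torch.nn as nn

class ScalarEmbedding(nn.Module):
    # FCNN replacing word embedding: each scalar wind speed sample x_i is mapped
    # to a d_s-dimensional row vector (d_s = 512), and the ten row vectors form X
    def __init__(self, d_s=512):
        super().__init__()
        self.fc = nn.Linear(1, d_s)

    def forward(self, x):                    # x: (batch, 10) component samples
        return self.fc(x.unsqueeze(-1))      # -> (batch, 10, d_s), i.e., the matrix X

emb = ScalarEmbedding()
print(emb(torch.randn(4, 10)).shape)         # torch.Size([4, 10, 512])
```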

Using $Head_1$ as an example, the matrix $\mathbf{X}$ is multiplied by $\mathbf{W}_1^Q$, $\mathbf{W}_1^K$, and $\mathbf{W}_1^V$ to generate $\mathbf{Q}_1$, $\mathbf{K}_1$, and $\mathbf{V}_1$. The attention distribution of $Head_1$ (i.e., the weight matrix $\mathbf{W}_1$ in Figure 4) is calculated based on Equation (16):

$$\mathbf{W}\_1 = \operatorname{softmax}(\frac{\mathbf{Q}\_1 \mathbf{K}\_1^T}{\sqrt{d}}) \tag{16}$$

after which we multiply $\mathbf{W}_1$ and the value matrix $\mathbf{V}_1$ to obtain the output $\mathbf{Z}_1$ of $Head_1$:

$$\mathbf{Z}_1 = \mathbf{W}_1 \mathbf{V}_1 \tag{17}$$

**Figure 4.** The schematic diagram of the multi-head attention mechanism.

The $i$th row of $\mathbf{Z}_1$ can be considered as the weighted sum of all rows of the matrix $\mathbf{V}_1$, where the weight of each row is the value of the corresponding element in the $i$th row of $\mathbf{W}_1$. The $j$th row of $\mathbf{V}_1$ is determined by the unique historical wind speed component sample value $x_j$; thus, the weight matrix $\mathbf{W}_1$ determines which sample values in the narrowband components of the historical wind speed series contribute to the output $\mathbf{Z}_1$ of $Head_1$. Thus, the weight matrices $\{\mathbf{W}_1, \ldots, \mathbf{W}_8\}$ of all $Head$s of the first encoder in the encoder stack together determine the variable support segment of the narrowband modes of the historical wind speed series, which can be expressed as

$$S_{p,\tau} = \bigcup_{h} \bigcup_{\max(\mathbf{W}_h^j) > 0} x_j \tag{18}$$

where $\mathbf{W}_h^j$ represents the $j$th column of the weight matrix $\mathbf{W}_h$ of the $h$th $Head$, and $\max(\mathbf{W}_h^j)$ denotes the maximum element value of $\mathbf{W}_h^j$.
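In code, extracting the variable support segment from the heads' weight matrices according to Equation (18) amounts to collecting the column indices whose maximum is above zero; the tolerance `eps` below is an assumed numerical stand-in for the strict "> 0" condition.

```python
import numpy as np

def variable_support_segment(weight_mats, eps=1e-6):
    # Eq. (18): union over heads of the sample indices j whose column in W_h
    # has a maximum element greater than zero
    support = set()
    for W in weight_mats:                          # one L x L matrix per Head
        col_max = np.asarray(W).max(axis=0)
        support |= set(np.flatnonzero(col_max > eps).tolist())
    return sorted(support)

# toy check: a head attending only to columns 3..8 (cf. Figure 5)
W1 = np.zeros((10, 10))
W1[:, 3:9] = 1.0 / 6.0
print(variable_support_segment([W1]))            # [3, 4, 5, 6, 7, 8]
```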

Figure 5 shows a pseudo-color plot of the weight matrix $\mathbf{W}_1$ of the first $Head$ of the first encoder in the encoder stack. It can be seen that the non-zero elements are concentrated in certain columns of $\mathbf{W}_1$, which is to say that the only elements of this component of the historical wind speed series that contribute to the output of $Head_1$ are $[x_3, \ldots, x_8]$.

**Figure 5.** The attention distribution of *Head*1.

#### *4.2. Proposed Model*

According to the wind speed forecasting task steps in Section 2, several narrowband components decomposed from the historical wind speed series are input into the M-Transformer model to separately obtain the forecasting value. The wind speed forecasting result is the sum of the forecasting value of each narrowband component. A flow chart for the proposed method is shown in Figure 6, abbreviated as VMD-TF for convenience.

**Figure 6.** The flowchart of the proposed method.
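The overall flow in Figure 6 can be summarized by the short sketch below, where `decompose` and the entries of `forecasters` are assumed callables standing in for VMD and the trained per-component M-Transformer models, respectively.

```python
def vmd_tf_forecast(history, decompose, forecasters, input_len=10):
    # VMD-TF flow: decompose the historical series into K narrowband components,
    # forecast each component with its own trained model, and sum the forecasts
    components = decompose(history)                   # shape: (K, len(history))
    return sum(float(f(c[-input_len:])) for f, c in zip(forecasters, components))
```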

According to Figure 6, before decomposing the wind speed series with VMD, the parameter $K$ (i.e., the number of narrowband components) needs to be determined. In our approach, it is determined by judging whether the center frequencies of adjacent components overlap; the specific process is shown in Figure 7. The quadratic penalty factor also influences the decomposition results; when the quadratic penalty factor is set to 2000, VMD has certain adaptability and can avoid mode mixing.

**Figure 7.** Flowchart for determining *K*.
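A possible rendering of the Figure 7 procedure is sketched below; it reuses the `vmd_core` sketch from Section 3.1, and both `K_max` and the overlap tolerance are assumed values, not the authors' settings.

```python
import numpy as np

def choose_K(g, K_max=12, overlap_tol=0.02):
    # increase K until the center frequencies of adjacent modes overlap,
    # then keep the previous K; quadratic penalty factor fixed at 2000
    for K in range(2, K_max + 1):
        _, omega = vmd_core(g, K, alpha=2000.0)
        if np.any(np.diff(np.sort(omega)) < overlap_tol):
            return K - 1
    return K_max
```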

#### **5. Experiment and Analysis**

#### *5.1. Wind Speed Data*

The data were obtained from a wind farm in Hebei. The sampling interval used in collecting the data was 1 h. Hebei has a temperate monsoon climate, and the characteristics of the data consequently vary from season to season. Figure 8 shows the statistics related to the wind speed data in different seasons.

As can be seen in Figure 8, the maximum and average wind speed in summer is higher than in other seasons, indicating abundant wind energy resources. In addition, the wind speed in summer varies greatly and has strong randomness, with the highest standard deviation.

Figure 9 shows the decomposition result of the wind speed series from April 14th to May 18th, that is, in summer, in which $C_1$–$C_7$ are narrowband components. It can be clearly seen that the trend of each component is more regular than the original wind speed series. $C_8$ is the residual component. Although it contains noise, it may contain part of the information of the original wind speed series as well. Therefore, permutation entropy was utilized to assess the signal's randomness and determine whether the residual component could be considered a component of the original wind speed series.
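For reference, a minimal permutation entropy routine is sketched below; the embedding dimension and delay are assumed values (the paper does not report them), and values close to 1 indicate a highly random, noise-like series.

```python
import numpy as np
from math import factorial, log

def permutation_entropy(x, m=3, delay=1):
    # normalized permutation entropy of the series x with embedding dimension m
    x = np.asarray(x)
    counts = {}
    for i in range(len(x) - (m - 1) * delay):
        pattern = tuple(np.argsort(x[i:i + m * delay:delay]))
        counts[pattern] = counts.get(pattern, 0) + 1
    p = np.array(list(counts.values()), dtype=float)
    p /= p.sum()
    return float(-np.sum(p * np.log(p)) / log(factorial(m)))
```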

**Figure 8.** Statistical data for wind speed in different seasons.

**Figure 9.** Wind speed series decomposition results.

#### *5.2. Accuracy Assessment*

In this paper, the mean absolute error (MAE) and the root mean square error (RMSE) were selected as the evaluation indicators:

$$\text{MAE} = \frac{1}{N} \sum_{i=1}^{N} |x_i^a - x_i^f| \tag{19}$$

$$\text{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i^a - x_i^f)^2} \tag{20}$$

where $N$ is the length of the forecast wind speed series, $x_i^a$ denotes the true value, and $x_i^f$ represents the forecast value.
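Equations (19) and (20) translate directly into code:

```python
import numpy as np

def mae(actual, forecast):
    # Eq. (19)
    return float(np.mean(np.abs(np.asarray(actual) - np.asarray(forecast))))

def rmse(actual, forecast):
    # Eq. (20)
    return float(np.sqrt(np.mean((np.asarray(actual) - np.asarray(forecast)) ** 2)))
```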

#### *5.3. Results and Analysis*

#### 5.3.1. Forecasting Result

In each quarter, we randomly selected a week of wind speed data as the test set and used the four weeks of data before the test set as the training set; the specific division is shown in Table 1.

The parameters used for the M-Transformer were as follows: the encoder stack consisted of four encoders, the decoder stack contained four decoders, the dropout rate was equal to 0.1, the learning rate was set to 0.002, the batch size was set to 72 and 1 for the training and testing processes, respectively, the optimizer was Adam, and the loss function was the mean square error (MSE). Both the training and testing processes were implemented on the Python 3.7 platform.
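For orientation, the listed hyperparameters correspond to a setup along the following lines; PyTorch and its stock `nn.Transformer` are assumptions used as a stand-in for the authors' M-Transformer (the paper only states that Python 3.7 was used).

```python
import torch
import torch.nn as nn

# hyperparameters taken from the description above
model = nn.Transformer(d_model=512, nhead=8,
                       num_encoder_layers=4, num_decoder_layers=4,
                       dropout=0.1, batch_first=True)
optimizer = torch.optim.Adam(model.parameters(), lr=0.002)
criterion = nn.MSELoss()

src = torch.randn(72, 10, 512)    # one training batch: batch size 72, input length 10
tgt = torch.randn(72, 1, 512)
loss = criterion(model(src, tgt), tgt)
loss.backward()
optimizer.step()
```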

**Table 1.** Division of the training and testing sets.


Figure 10 is a scatter diagram of the forecasting results for each season. The abscissa and ordinate are the forecasting and the true wind speed, respectively. The closer the data points are to the 45° line, the better the forecasting results. In Figure 10, the data points are closely distributed on the 45° line and on both sides, indicating that the proposed model achieves good performance.

**Figure 10.** The forecasting results of the proposed model.

#### 5.3.2. Comparative Experiments

In this paper, three single models and three combined models were selected as comparison models to further verify the superiority of VMD-TF. The single models were ARIMA, BP, and LSSVM, and the combined models were EAW [31], WEE [32], and RWA [33]. The forecasting results of each model were evaluated by MAE and RMSE respectively, and the specifics are shown in Figure 11.

Equation (21) was utilized as the evaluation indicator to compare the improvement of VMD-TF to the other models:

$$I\_{index} = \frac{E\_p - E\_c}{E\_c} \times 100\% \tag{21}$$

Here, $I_{index}$ denotes the performance improvement index, and $E_p$ and $E_c$ are the errors of VMD-TF and the comparison model, respectively. Table 2 shows the specific results.
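Equation (21) in code, with a toy example:

```python
def improvement_index(E_p, E_c):
    # Eq. (21): negative values mean the error of VMD-TF (E_p) is lower than
    # that of the comparison model (E_c)
    return (E_p - E_c) / E_c * 100.0

print(f"{improvement_index(0.33, 0.40):.1f}%")   # -17.5% (illustrative error values)
```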


**Table 2.** The performance improvements achieved by the proposed model.

The results of the comparative experiment show the following.


#### 5.3.3. Effectiveness of VMD

We employed EMD and EEMD as comparison methods to demonstrate that VMD could effectively reduce the influence of wind speed non-stationarity. The model combining M-Transformer with EMD is referred to as EMD-TF, while the model combining M-Transformer with EEMD is referred to as EEMD-TF. We used M-Transformer to forecast the wind speed directly without decomposition, in which case it is referred to as TF. In analyzing the capability of these models, the summer testing set was used. Figure 12 exhibits the comparisons between the forecasting values and the true values, while Table 3 shows the forecasting errors for each model.

**Figure 12.** Comparison of the forecast and true wind speed for summer testing data.

According to Figure 12, even when the wind speed changes greatly, VMD-TF is able to track and forecast well, while TF, EMD-TF, and EEMD-TF cannot respond as well to such mutations. According to Table 3, the forecasting result of VMD-TF is the best, while that of TF is the worst. Thus, we are able to conclude that signal decomposition methods can greatly enhance wind speed forecasting accuracy, and that of the methods investigated here, VMD shows the best performance.

**Table 3.** The forecasting errors of each model with the summer testing data.


#### 5.3.4. Effectiveness of M-Transformer

In order to illustrate that the M-Transformer model has good forecasting ability, we selected ARIMA, BP, the deep belief network (DBN), and LSTM as comparisons. These models, each composed of VMD and a single model, are referred to as VMD-ARIMA, VMD-BP, VMD-DBN, and VMD-LSTM, respectively. To assess the performance of these

combined models, the winter testing set was used. Figure 13 compares the forecasting values and true values, while Table 4 shows the forecasting errors for each combined model.

**Figure 13.** Comparison of forecast and real wind speed with winter testing data.

In Figure 13, all forecast wind speed curves appear to be relatively close to the true wind speed curve. According to Table 4, however, the MAE and RMSE of VMD-TF are the smallest. Taking MAE as an example, the error of VMD-TF is 33%, 17%, 15%, and 9% lower, respectively, than that of the other four models, which shows that M-Transformer has superior performance.



#### **6. Conclusions**

In this paper, we have proposed a variable support segment-based short-term wind speed forecasting model. Several conclusions can be drawn based on our experiments and analysis.


Although VMD-TF shows significant performance achievements, it neglects the impact of meteorological factors, which limits its ability to deal with sudden changes in wind speed. In future work, we intend to develop a model that is able to take into account both historical wind speed data and prevailing meteorological factors that influence wind speed.

**Author Contributions:** Conceptualization, K.Z.; methodology, K.Z. and X.L.; software, K.Z.; validation, X.L.; formal analysis, J.S.; investigation, X.L.; resources, J.S.; data curation, K.Z.; writing original draft preparation, K.Z.; writing—review and editing, X.L. and J.S.; visualization, X.L.; supervision, J.S.; project administration, J.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Intelligent Modeling of the Incineration Process in Waste Incineration Power Plant Based on Deep Learning**

**Lianhong Chen 1, Chao Wang 1, Rigang Zhong 1, Jin Wang <sup>2</sup> and Zheng Zhao 2,\***


**Abstract:** The incineration process in waste-to-energy plants is characterized by high levels of inertia, large delays, strong coupling, and nonlinearity, which makes accurate modeling difficult. Therefore, an intelligent modeling method for the incineration process in waste-to-energy plants based on deep learning is proposed. First, the output variables were selected from the three aspects of safety, stability and economy. The initial variables related to the output variables were determined by mechanism analysis and the input variables were finally determined by removing invalid and redundant variables through the Lasso algorithm. Secondly, each delay time was calculated, and a multi-input and multi-output model was established on the basis of deep learning. Finally, the deep learning model was compared and verified with traditional models, including LSSVM, CNN, and LSTM. The simulation results show that the intelligent model of the incineration process in the waste-to-energy plant based on deep learning is more accurate and effective than the traditional LSSVM, CNN and LSTM models.

**Keywords:** waste-to-energy; deep learning; variable selection; intelligent modeling

#### **1. Introduction**

At present, the treatment methods for domestic waste usually include landfill, compost and incineration [1,2]. According to statistics on the volume of domestic waste removal and transportation, the proportions of landfill treatment, compost treatment and incineration treatment are 58.30%, 2.10% and 36.20%, respectively, and the remaining 3.40% is treated by simple landfill and stacking [3]. However, due to problems such as the large amount of land required and environmental pollution, the proportion of landfill treatment and compost treatment is decreasing year by year. The waste incineration process reduces the content of harmful substances in the waste by pyrolysis and oxidation under high temperature and high pressure. The volume of waste after incineration is reduced by more than 85% and the weight is reduced by more than 75%. The waste incineration process greatly eliminates the germs and harmful components in the waste, thus achieving the efficient treatment of the waste. Additionally, the energy generated by incineration can be used to generate electricity to realize a major goal of waste recycling [4]. It has been suggested that waste incineration power generation technology has the advantages of "reduction, recycling, and harmlessness", and that it is currently the best way to deal with domestic waste [5]. However, due to the complex composition of waste, the large fluctuations in waste calorific value, and the fact that the incinerator is a multi-input and multi-output (MIMO) object distinguished by high levels of inertia, large delays, strong coupling, and nonlinearity, it is difficult to meet the needs of the subsequent combustion optimization. Therefore, establishing an accurate and reliable intelligent model of the incineration process of waste-to-energy plants is the key to subsequent incineration optimization [6,7].

Elisa et al. [8] used the mechanism modeling method to model the incineration process of waste-to-energy power plants, but there were problems related to the complicated

**Citation:** Chen, L.; Wang, C.; Zhong, R.; Wang, J.; Zhao, Z. Intelligent Modeling of the Incineration Process in Waste Incineration Power Plant Based on Deep Learning. *Energies* **2022**, *15*, 4285. https://doi.org/ 10.3390/en15124285

Academic Editor: Surender Reddy Salkuti

Received: 1 May 2022 Accepted: 10 June 2022 Published: 10 June 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

<sup>1</sup> Shenzhen Energy Environment Engineering Co., Ltd., Shenzhen 518048, China; chenlianhong@sec.com.cn (L.C.); wangchaob@sec.com.cn (C.W.); zhongrigang@sec.com.cn (R.Z.)

derivation process and low precision. Therefore, data-driven modeling has been widely used in combustion process modeling [9,10]. Peng et al. [11] established a multi-input and single-output model for boiler combustion oxygen content based on big data and a neural network, and enhanced the neural network through a Bayesian algorithm, which solved the classical neural network's problems of slow learning speed and of converging to optima only within a small range. Based on the operating data for a boiler in a thermal power plant, Song et al. [12] used a radial basis neural network to establish a model with the flue gas oxygen content, furnace negative pressure and steam pressure as outputs. Compared with a back propagation (BP) neural network, a radial basis function (RBF) neural network has better categorization capability, approximation ability and learning speed, but it has poor resistance to noise in the sample data. Zhong et al. [13] used the particle swarm algorithm and support vector machine to establish a boiler exhaust gas temperature model of a 660 MW unit, which provided guidance for the operation of the boiler. However, for large amounts of sample data, support vector machines are prone to overfitting and lack modeling accuracy. Due to the structural limitations of the algorithms themselves, these algorithms cannot mine the deep information in the sample data [14].

With the rapid development of artificial intelligence, modeling methods based on deep learning have attracted increasing scholarly attention. Hu et al. [15] established a boiler combustion efficiency model using a convolutional neural network for a 600 MW supercritical unit boiler in Henan. Yu et al. [16] used a deep CNN and a support vector machine to extract and analyze the deep features of flame images, and realized the modeling of the NOx concentration of a 4.2 MW heavy oil combustion boiler. Zhang [17] established a deep neural network model combining a stacked denoising autoencoder and an LSTM network, considering the characteristics of ultra-supercritical units such as high inertia, large delays and noise in the actual data. However, the existing modeling methods generally have defects such as too few input and output variables, which are far removed from the actual operation of the unit, and an inability to express the dynamic characteristics of the model. Therefore, in the process of model establishment, in addition to the choice of modeling method, the selection of input variables will also affect the modeling accuracy. Wang et al. [18] utilized principal component analysis to lower the dimension of the multi-dimensional input variables of wind turbines. Although the feature dimension was reduced, the original data were changed and the interpretability of the model was reduced.

The data for modeling the incinerator incineration process are characterized by large, complex samples, nonlinear time series, etc. Compared with the methods described above, the deep learning approach can use the complex relationships within the data to automatically build the model and adjust the model parameters so as to establish the optimal nonlinear model between input and output. That is, this method is able to use the characteristics of the time series or other complex relationships between historical data to model through deep learning networks. In summary, an intelligent model of the incineration process of waste-to-energy plants based on deep learning is proposed. First, the output variables are selected from the three aspects of safety, stability and economy. The initial variables related to the output variables are determined. The input variables are finally determined by removing the invalid and redundant variables through the Lasso algorithm. Secondly, each delay time is calculated, and a multi-input and multi-output model based on deep learning is established. Finally, the model proposed in this paper is compared with the traditional models to verify the improvement in its accuracy.

#### **2. Basic Method Principle**

#### *2.1. Waste-to-Energy Treatment Technology*

After the garbage is transported to the incineration plant, it is fermented in a garbage storage tank for 3–5 days to increase the calorific value. The calorific value of the waste after fermentation is about 1800–2100 kcal/kg. Garbage fermentation mainly relies on the role of microorganisms in the garbage. At the same time, during the storage of the garbage, the water in the garbage is continuously leached out. After storage and fermentation, the

garbage is moved to the feeding hopper by a hanging garbage grab, and is transported to the incinerator through the feeding grate. The incinerator is composed of a multi-stage mechanical grate, of which the first and second stages of the incineration grate are the waste drying area, the third and fourth stages of the incineration grate are the waste gasification area (main combustion area), and the fifth stage is the burning ember area. At the same time, each section of the combustion grate includes a fixed grate, sliding grate and turning grate. The sliding grates slowly push the garbage layer forward on the grate, and the function of the turning grate is to drive the overturning grate pieces to turn up and down through the reciprocating rotation of the overturning shaft, so as to support the garbage bed in a local position, destroy the original garbage bed, cause the previously formed bed to dislocate, and break the hard-shell surface and molten layer caused by burning. The primary combustion air is blown into the interior of the garbage bed (like stoking a fire) so that the garbage can be completely burned. The fan system consists of a primary air system, a secondary air system and a furnace-wall-cooling air system, of which the first two affect the combustion process. The primary air is extracted from the garbage storage tank, heated to about 180 °C by the air preheater, and sent to the bottom of the incinerator through the gap between the grate pieces of the incinerator. Then it penetrates the garbage bed and enters the incinerator chamber, where it is mixed with the garbage and burns. At the same time, a negative pressure in the garbage pond is established to prevent the overflow of garbage odor, so as to achieve effective management of odor. The secondary air mainly adjusts the amount of oxygen to ensure better combustion conditions. In view of the characteristics of China's garbage, which has high moisture content, high non-combustible content and low calorific value, the design of the rear arch of the furnace is adapted, as shown in Figure 1, to form a good aerodynamic field and help combustion [19].

**Figure 1.** Incinerator structure diagram.

#### *2.2. Lasso Algorithm*

Least absolute shrinkage and selection operator (Lasso) is a penalty-based variable selection method first proposed by the famous statistician Robert Tibshirani in 1996 [20,21]. The specific principle is as follows.

The following is an example of a typical linear regression model:

$$y\_i = \beta\_0 + \sum\_{j=1}^p x\_{ij}\beta\_j + e\_i \quad i = 1, 2, \cdots, n \tag{1}$$

There are $n$ sets of observations, and each set consists of a response variable $y_i$ and $p$ correlated predictor variables $x_i = (x_{i1}, x_{i2}, \cdots, x_{ip})^T$.

The traditional method is to minimize the least squares objective function:

$$\min_{\beta_0, \beta} \sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \right)^2 \tag{2}$$

In the general formula, the least squares estimation of *β* is not 0, and if *n* < *p*, there are countless solutions that make the objective function 0, hence the result of the least squares estimation is not unique. Therefore, this process needs to introduce a penalty function, that is, regularization. The Lasso algorithm is based on the least squares estimation to introduce a penalty factor to constrain the norm of *β*, as shown in the formula.

$$\begin{aligned} \hat{\beta} &= \underset{\beta_0,\beta}{\operatorname{argmin}} \left\{ \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^2 + \lambda \sum_{j=1}^{p} |\beta_j| \right\} \\ &= \underset{\beta_0,\beta}{\operatorname{argmin}} \; \left\| y - \beta_0 - \mathbf{x}\beta \right\|_2^2 \quad \text{s.t.} \quad \|\beta\|_1 \le t \end{aligned} \tag{3}$$

In the formula, $\lambda \ge 0$ is the hyperparameter, $\lambda \sum_{j=1}^{p} |\beta_j|$ is the compression penalty, $t$ is the adjustment parameter, and the inequality $\|\beta\|_1 \le t$ effectively restricts the parameter space and realizes feature selection.
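In practice, this kind of Lasso-based variable selection can be carried out with scikit-learn, as in the sketch below: candidate variables whose coefficients are shrunk to zero are treated as invalid or redundant and dropped. The data and the value of `alpha` are synthetic and assumed, purely for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 20))                      # 20 candidate input variables
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + 0.1 * rng.standard_normal(500)

model = Lasso(alpha=0.05).fit(X, y)                     # alpha plays the role of lambda
selected = np.flatnonzero(model.coef_)                  # variables kept by the penalty
print("selected variable indices:", selected)           # expected to keep indices 0 and 3
```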

#### *2.3. Model Building Based on Deep Learning*

#### 2.3.1. Convolutional Neural Network

CNN is a feedforward network that was first used in image processing and has excellent performance [22]. CNN has the characteristics of weight sharing, local connection, and dimensionality reduction sampling, and can fully mine the local characteristics of the data itself.

A CNN generally contains three basic layers: a convolutional layer, a pooling layer and a fully connected layer. In the convolutional layer, the pixels in a local area of the input image are weighted by the coefficients of the convolution kernel to complete feature extraction, and the activation function introduces nonlinearity into the network model. The pooling layer performs dimension-reduction sampling on the output of the convolutional layer; at the same time, the pooling operation gives the CNN translation invariance. In the fully connected layer, each node is linked to all the nodes in the previous layer, and this layer is used to synthesize the features extracted earlier, as depicted in Figure 2.

**Figure 2.** Construction of CNN.

This paper uses CNN to fully mine the features of the data, and the feature data processed by the convolution operation is sent to the Bi-LSTM network for further operations.

#### 2.3.2. Bi-LSTM Model

The cyclic unit construction of LSTM is exhibited in Figure 3.

**Figure 3.** Cyclic unit construction of the LSTM network.

$$\begin{bmatrix} \tilde{c}_t \\ o_t \\ i_t \\ f_t \end{bmatrix} = \begin{bmatrix} \tanh \\ \sigma \\ \sigma \\ \sigma \end{bmatrix} \left( P \begin{bmatrix} x_t \\ h_{t-1} \end{bmatrix} + b \right) \tag{4}$$

$$i_t = \sigma(P_i x_t + Q_i h_{t-1} + b_i) \tag{5}$$

$$f_t = \sigma(P_f x_t + Q_f h_{t-1} + b_f) \tag{6}$$

$$o_t = \sigma(P_o x_t + Q_o h_{t-1} + b_o) \tag{7}$$

$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \tag{8}$$

$$\tilde{c}_t = \tanh(P_c x_t + Q_c h_{t-1} + b_c) \tag{9}$$

$$h_t = o_t \odot \tanh(c_t) \tag{10}$$

First, the input *xt* at the current moment and the external state *ht*−1 at the previous moment are used to calculate *ft*, *it*, *ot* and *c̃t*. Then, *ft* and *it* are used to update the memory cell *ct*, and finally, the internal state information is passed to the external state *ht* in combination with *ot* [23].
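A minimal NumPy sketch of one LSTM step following Equations (5)–(10) is given below; the weight shapes, random initialization and dimensions are illustrative assumptions, not the configuration used in this paper.

```python
# One LSTM cell step implementing Equations (5)-(10).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, P, Q, b):
    """P, Q, b hold the input, recurrent and bias parameters of the
    i, f, o and candidate-cell (c) gates."""
    i_t = sigmoid(P["i"] @ x_t + Q["i"] @ h_prev + b["i"])      # Eq. (5)
    f_t = sigmoid(P["f"] @ x_t + Q["f"] @ h_prev + b["f"])      # Eq. (6)
    o_t = sigmoid(P["o"] @ x_t + Q["o"] @ h_prev + b["o"])      # Eq. (7)
    c_tilde = np.tanh(P["c"] @ x_t + Q["c"] @ h_prev + b["c"])  # Eq. (9)
    c_t = f_t * c_prev + i_t * c_tilde                          # Eq. (8)
    h_t = o_t * np.tanh(c_t)                                    # Eq. (10)
    return h_t, c_t

n, m = 4, 3                                   # assumed input and hidden sizes
rng = np.random.default_rng(0)
P = {k: rng.normal(size=(m, n)) for k in "ifoc"}
Q = {k: rng.normal(size=(m, m)) for k in "ifoc"}
b = {k: np.zeros(m) for k in "ifoc"}
h, c = np.zeros(m), np.zeros(m)
h, c = lstm_step(rng.normal(size=n), h, c, P, Q, b)
```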

LSTM can only extract forward sequence information, while Bi-LSTM (Bidirectional Long Short-Term Memory) extracts sequence information in both directions to obtain more data features [24]. Bi-LSTM consists of two layers of LSTM networks with the same input and different information transfer directions. The Bi-LSTM structure is shown in Figure 4.

**Figure 4.** Two-way cyclic neural network expanded by time.

$$h\_t^{(1)} = f(U^{(1)} h\_{t-1}^{(1)} + W^{(1)} x\_t + b^{(1)}) \tag{11}$$

$$h\_t^{(2)} = f(U^{(2)} h\_{t+1}^{(2)} + W^{(2)} x\_t + b^{(2)}) \tag{12}$$

$$h\_t = h\_t^{(1)} \oplus h\_t^{(2)} \tag{13}$$

where ⊕ denotes vector concatenation.

#### 2.3.3. Intelligent Model of Incineration Process Based on Deep Learning

The combined CNN-BiLSTM model couples the feedforward mechanism of the CNN with the feedback mechanism of the RNN; the CNN's feature extraction greatly reduces the computational cost, and combining it with the BiLSTM further improves the model accuracy.

Unlike steady-state modeling, dynamic models take into account the effects of time [25]. In the dynamic model of the CNN-BiLSTM incineration process, the output is not only related to the current data of each auxiliary variable, but also related to the delay time of the input and output variables [26].

In describing the dynamic characteristics of the waste incinerator combustion process, due to the existence of the delay time d, the sampling point at time t can be represented as {*x*(*t* − *d*), *y*(*t*)}. By discretizing the dynamic model of the incineration process, the difference equation form of the dynamic model is obtained as:

$$y(t) = f[y(t-1), \dots, y(t-n), x(t-d)] \tag{14}$$

The above formula expresses the intelligent model of the waste-to-energy incineration process, which is a MIMO model. From the equation, the model output at the current moment is given as a function of the output values of the *n* past moments and the input value delayed by *d* moments.
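A minimal sketch of how samples for the difference-equation model of Equation (14) could be assembled from sampled series is shown below; the synthetic series, the order *n* and the delay *d* are illustrative assumptions.

```python
# Build (feature, target) pairs of the form
# y(t) = f[y(t-1), ..., y(t-n), x(t-d)].
import numpy as np

def build_dynamic_samples(x, y, n, d):
    """Each feature row is [y(t-1), ..., y(t-n), x(t-d)]; the target is y(t)."""
    start = max(n, d)
    feats, targets = [], []
    for t in range(start, len(y)):
        feats.append(np.concatenate([y[t - n:t][::-1], [x[t - d]]]))
        targets.append(y[t])
    return np.array(feats), np.array(targets)

x = np.arange(20, dtype=float)
y = np.sin(0.3 * np.arange(20))
F, T = build_dynamic_samples(x, y, n=3, d=5)
print(F.shape, T.shape)   # (15, 4) (15,)
```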

#### **3. Intelligent Model of Waste-to-Energy Plant Incineration Process Based on Deep Learning**

*3.1. Variable Selection Based on the Lasso Algorithm*

Before building an incineration model, the input variables and output variables of the model should be determined. In this paper, the selection of the output variables of the intelligent model of the waste-to-energy plant incineration process took the three aspects of safety, stability and economy into consideration. The output variables are the oxygen content at the boiler flue gas outlet (*C*O2), the steam flow (Q), the furnace temperature of the incinerator (T1), and the temperature of the ember section (T2). The oxygen content is related to the load and is used as a precursor of load changes. When the oxygen feedback value is higher than the set value, it means that the air volume is excessive or the garbage calorific value is insufficient; at this time, the boiler load decreases. When the oxygen feedback value is lower than the set value, the boiler load will increase; at this time, the boiler load should be reduced. If the oxygen content is lower than a certain level, it indicates an abnormal situation such as an explosion in the furnace; at this time, the feeding should be stopped to prevent the garbage in the furnace from deflagrating and causing danger. A stable steam flow ensures that the steam turbine and generator work at the rated load and that the equipment performs well. Ensuring that the furnace flue gas stays above 850 °C for 2 s can prevent the generation of harmful dioxins in the flue gas. Maintaining the temperature of the ember section within a certain range ensures that the garbage is fully burned and improves the combustion efficiency.

Mechanism analysis was used to select the input variables that have an impact on the output variables. A total of 19 variables related to the output variables were screened out from the variables collected by the power plant, namely, primary air flow, the temperature of primary air, unit 1–5 primary air flow, secondary air flow, the temperature of secondary air, unit 1–5 material layer thickness, the transmission speed of the sliding grate unit 1, the transmission speed of the sliding grate unit 2, the transmission speed of the sliding grate unit 3, the transmission speed of the sliding grate unit 4, and the transmission speed of the sliding grate unit 5. Since the secondary air temperature basically does not fluctuate, and there is a coefficient relationship between the transmission speeds of the five units of the sliding grate, 19 variables were screened and 14 initial variables were obtained.

The 14 initial variables were selected by the Lasso algorithm, and 8 input variables were finally obtained, namely, primary air flow, unit 1 primary air flow, unit 4 primary air flow, unit 5 primary air flow, secondary air flow, unit 1 material layer thickness, material layer thickness of unit 5, and conveying speed of sliding grate unit 1. Then, 25,200 sets of data were selected from a northern waste power plant from 16:00 on 6 August 2019 to 10:00 on 8 August 2019, and the sampling time was 6 s. The unit and variation range of each input variable are shown in Table 1. The local trend diagrams of the input variables are shown in Figure 5.

**Figure 5.** The local trend diagrams of the input variables.


**Table 1.** Unit and variation range of input variables.

#### *3.2. Calculation of Delay Time*

Waste-to-energy generating units typically have large delays, and there are time delays between the various data collected by the power plant. When an input variable changes, it takes a while for the output variable to react to the change. In order to ensure the consistency of each input variable and output variable in the time sequence, a time-delay compensation algorithm based on mutual information is proposed. Mutual information quantifies the statistical dependence between two groups of samples. By computing the mutual information between the input variables and the output variables at different lags, the delay time between the input variables and the output variables can be obtained (the specific process can be found in [26]). Taking the steam flow as the output as an example, Table 2 shows the maximum mutual information of each input variable with respect to the steam flow and the corresponding time delay.
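A minimal sketch of such a mutual-information-based delay search is given below, assuming scikit-learn's nearest-neighbor mutual information estimator; the signals, the true delay and the maximum lag are illustrative assumptions rather than the plant data.

```python
# Estimate the input-output delay as the lag that maximizes mutual information.
import numpy as np
from sklearn.feature_selection import mutual_info_regression

def estimate_delay(x, y, max_lag):
    """Return the lag (in samples) that maximizes MI between x(t-lag) and y(t)."""
    scores = []
    for lag in range(max_lag + 1):
        x_lag = x[:len(x) - lag] if lag else x
        y_cut = y[lag:]
        mi = mutual_info_regression(x_lag.reshape(-1, 1), y_cut, random_state=0)[0]
        scores.append(mi)
    return int(np.argmax(scores)), scores

rng = np.random.default_rng(0)
x = rng.normal(size=2000)
y = np.roll(x, 30) + 0.1 * rng.normal(size=2000)   # y lags x by 30 samples
best_lag, mi_scores = estimate_delay(x, y, max_lag=60)
print("estimated delay:", best_lag)
```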

**Table 2.** Delay time and maximum mutual information of auxiliary variables.


#### *3.3. Model Establishment*

The steps to build an intelligent model of the incineration process of a waste-to-energy plant are as follows:


The CNN-BiLSTM model structure includes: 1 convolutional layer, 1 max-pooling layer, 1 dropout layer, 2 BiLSTM layers, and 1 fully connected layer. The parameters of the final training model are: a batch size of 128, 50 epochs, and the Adam optimization algorithm.
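A minimal Keras sketch of a network with these layer counts and training settings is shown below; the time-window length, filter sizes, numbers of units and the random placeholder data are illustrative assumptions, not the authors' exact configuration.

```python
# CNN-BiLSTM sketch: 1 conv layer, 1 max-pooling, 1 dropout, 2 BiLSTM, 1 dense.
import numpy as np
from tensorflow.keras import layers, models

window, n_inputs, n_outputs = 20, 8, 4      # assumed window, 8 inputs, 4 outputs

model = models.Sequential([
    layers.Input(shape=(window, n_inputs)),
    layers.Conv1D(32, kernel_size=3, activation="relu"),            # convolutional layer
    layers.MaxPooling1D(pool_size=2),                                # max-pooling layer
    layers.Dropout(0.2),                                             # dropout layer
    layers.Bidirectional(layers.LSTM(32, return_sequences=True)),   # BiLSTM layer 1
    layers.Bidirectional(layers.LSTM(32)),                           # BiLSTM layer 2
    layers.Dense(n_outputs),                                         # fully connected output
])
model.compile(optimizer="adam", loss="mse")

X = np.random.rand(1000, window, n_inputs).astype("float32")
Y = np.random.rand(1000, n_outputs).astype("float32")
model.fit(X, Y, batch_size=128, epochs=50, validation_split=0.1, verbose=0)
```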

In Figure 6, *u* is the initial variable, *x* is the input variable after variable selection, *dij* is the delay time between the input *xi* and the output *yj*, **Y** is the actual value, and **Ŷ** is the model output value.

**Figure 6.** Modeling process diagram.

The model building process is shown in Figure 6.

#### **4. Model Establishment and Result Analysis**

#### *4.1. Model Evaluation Indicators*

The model evaluation indicators used in this paper are MAE, MAPE, and RMSE. The closer the value is to 0, the more accurate the output of the model. The calculation formula is as follows:

$$MAE = \frac{1}{n} \sum\_{i=1}^{n} |y\_i - \hat{y}\_i| \tag{15}$$

$$MAPE = \frac{1}{n} \sum\_{i=1}^{n} \left| \frac{y\_i - \hat{y}\_i}{y\_i} \right| \times 100\% \tag{16}$$

$$RMSE = \sqrt{\frac{\sum\_{i=1}^{n} \left(y\_i - \hat{y}\_i\right)^2}{n}} \tag{17}$$

where *yi* is the actual sample value, *ŷi* is the model output value, and *n* is the number of data points.

MAE reflects the magnitude of the deviation of the measured value from the true value; MAPE reflects the degree to which the sample output value deviates from the measured value; and RMSE reflects the sample standard deviation of the bias between the model output value and the measured value, reflecting the degree of dispersion of the sample. The combination of the three can better represent the precision of the model.
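For reference, a minimal sketch of the three indicators in Equations (15)–(17) is given below; the sample vectors are placeholders.

```python
# Evaluation indicators MAE, MAPE and RMSE.
import numpy as np

def mae(y, y_hat):
    return np.mean(np.abs(y - y_hat))

def mape(y, y_hat):
    return np.mean(np.abs((y - y_hat) / y)) * 100.0

def rmse(y, y_hat):
    return np.sqrt(np.mean((y - y_hat) ** 2))

y = np.array([10.0, 12.0, 11.5, 13.0])
y_hat = np.array([9.8, 12.3, 11.1, 13.4])
print(mae(y, y_hat), mape(y, y_hat), rmse(y, y_hat))
```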

#### *4.2. Model Establishment Result Analysis*

To test and verify that the output of the model is accurate, the first 90% of the 25,200 sets of data were used as the training set and the last 10% were used as the test set, in chronological order. Before the data are entered into the model, they should be normalized to eliminate the influence of the different magnitudes of the input variables on the modeling results and to speed up model training.

#### 4.2.1. The Influence of Variable Selection on Modeling Results

There were still invalid and redundant variables among the variables obtained from the mechanism analysis. To address the issue of model overfitting and further simplify the model, the Lasso algorithm was used for variable selection. Experiments were executed on the 14 initial variables obtained from the mechanism analysis and on the 8 input variables selected by the Lasso algorithm to verify that the output of the model was accurate. The final evaluation indexes are shown in Table 3.

**Table 3.** Evaluation index of the influence of variable selection on modeling results.


It can be seen from Table 3 that after the selection of the initial variables, the test set error decreased. This phenomenon shows that since there are invalid variables and redundant variables in the initial variables, if these variables continue to be retained in the input variables, the generalization ability of the model will be reduced. Therefore, it is more efficient to select the variables before building a model as this not only simplifies the model and reduces the computing time during modeling, but also prevents the model from overfitting and improves the model accuracy.

#### 4.2.2. The Influence of Different Models on the Modeling Results

Three classic models, LSSVM, CNN, and LSTM were selected for comparison to verify the effectiveness of the CNN-BiLSTM model. When other conditions were kept the same, the modeling accuracy is as shown in Table 4, and a comparison of the modeling results of the CNN-BiLSTM model is presented in Figure 7.

**Figure 7.** Modeling result diagram. (**a**) Comparison of output results of steam flow model; (**b**) comparison of the output results of the flue gas oxygen content model; (**c**) comparison of output results of the furnace temperature model; (**d**) comparison of the output results of the temperature model in the ember stage.


**Table 4.** Influence of different models on modeling accuracy.

Table 4 shows that, taking the steam flow as an example, the mean absolute error of LSSVM is 0.178, the mean absolute error of CNN is 0.292, the mean absolute error of LSTM is 0.066, and the mean absolute error of CNN-BiLSTM is 0.051. So, the order of model accuracy from low to high is CNN, LSSVM, LSTM, CNN-BiLSTM. It can be seen that the intelligent model based on deep learning can effectively improve the utility value of the traditional model, and it has stronger generalization ability and modeling accuracy.

#### **5. Conclusions**

A combined model based on feature selection and CNN-BiLSTM was constructed in this paper. First, the Lasso algorithm was used to remove invalid and redundant variables from the initial variables to determine the input variables, and the effective feature information was extracted through the CNN network. Finally, the BiLSTM network was used to train the model. Historical operation data from a waste-to-energy plant in the north was used for simulation analysis. The main conclusions are as follows:


**Author Contributions:** Conceptualization, L.C., C.W. and R.Z.; methodology, C.W., R.Z. and J.W.; software, L.C., R.Z. and J.W.; validation, R.Z., C.W. and J.W.; formal analysis, L.C., C.W. and R.Z.; investigation, L.C., R.Z. and J.W.; resources C.W., J.W. and Z.Z.; data curation, L.C., C.W. and R.Z.; writing—original draft preparation, C.W., R.Z. and J.W.; writing—review and editing, L.C., C.W., R.Z., J.W. and Z.Z.; visualization, C.W., R.Z. and J.W.; supervision, L.C., C.W. and R.Z.; project administration, C.W., R.Z. and J.W.; funding acquisition, R.Z. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Shenzhen Special Sustainable Development Science and Technology Project, grant number KCXFZ20201221173402007.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


**Kang Bai 1, Yong Zhou 2, Zhibo Cui 1,\*, Weiwei Bao 2, Nan Zhang <sup>2</sup> and Yongjie Zhai <sup>1</sup>**


**Abstract:** In this paper, a method of power system equipment recognition based on image processing is proposed. Firstly, we carry out a wavelet transform on the sound signals of power system equipment collected on site and obtain the wavelet coefficient–time diagram. Then, the similarity of the wavelet coefficient–time images of different equipment, and of the same equipment in different periods, is calculated and used as the basis for the feasibility of image-based recognition. Finally, we extract the HOG features of the images and classify the selected features using an SVM classifier. The method proposed in this paper can accurately identify and classify power system equipment through sound signals, and differs from the traditional method of classifying sound signals directly. Through image processing, the advantages of image recognition can be effectively utilized while avoiding the limitations of sound signal processing.

**Keywords:** electric power equipment; voice recognition; HOG feature extraction; SVM classifier; image processing

#### **1. Introduction**

With the gradual development of large-scale, integrated, highly automated and intelligent power system equipment, not only are rapid economic benefits introduced, but also the risk of great loss caused by sudden equipment failure is increased. Therefore, the comprehensive, timely and accurate monitoring of the power system equipment health status ensures the stable operation of equipment, reduces the accidental shutdown rate and has a high investment–income ratio. To this end, researchers carried out systematic research on temperature, vibration, image and other aspects of various power system equipment, and obtained effective information characteristics [1–3]. In addition, artificial intelligence [4], deep learning [5] and neural network [6] have been used to realize fault monitoring of equipment.

According to Kafeel et al. [7], current, sound and vibration are the most commonly monitored parameters. In Ribeiro et al. [8], a hydro-generator current-monitoring system is proposed in which the fast Fourier transform (FFT) is applied to the Park transform of the current. Song et al. [9] used the bin method, a method based on the multivariate normal distribution, and the Copula method to compare three Bayesian diagnosis models based on SCADA (Supervisory Control And Data Acquisition) data. Li et al. [10], aiming at the problems of high-speed, long-distance transmission and greatly increasing data storage requirements, proposed a method based on tunable Q-factor wavelet transform morphological component analysis, combined with a sparse Bayesian iterative algorithm and a step-pulse dictionary. Yu et al. [11] build a rough set from feature relationships, then use a distribution reduction algorithm to remove unnecessary features and feed the remaining features into a flexible naive Bayes classifier for fault diagnosis.

**Citation:** Bai, K.; Zhou, Y.; Cui, Z.; Bao, W.; Zhang, N.; Zhai, Y. HOG-SVM-Based Image Feature Classification Method for Sound Recognition of Power Equipments. *Energies* **2022**, *15*, 4449. https:// doi.org/10.3390/en15124449

Academic Editors: Abu-Siada Ahmed and Jong Hoon Kim

Received: 18 May 2022 Accepted: 16 June 2022 Published: 18 June 2022


In Herp et al. [12], a method is proposed to establish a fault-diagnosis model by learning fault samples, assuming that the fault features extracted from SCADA (Supervisory Control And Data Acquisition) data follow a Gaussian distribution in the feature space. Wang [13] presents a method for improving wavelet filtering by combining infograms and Bayesian inference to determine the best wavelet parameters and applies it to fault diagnosis. In Li et al. [14], in the process of fault feature extraction, the importance of different signals is optimized by particle swarm optimization. Yu et al. [15] propose a fault-feature extraction method based on Mean Multi-Granulation Decision-Theoretic Rough Sets (MMGDTRS) and a Non-Naive Bayes Classifier (NNBC). Li et al. [16] present a new first-order Bayesian control method for predicting early failures of gear-shaft systems with partially observable degradation and random failures, in which a multivariate Bayesian control strategy based on a Hidden Semi-Markov Model (HSMM) is proposed. In Liu et al. [17], a state-monitoring method for rolling bearings based on a hybrid generalized HMM is introduced, which uses interval-valued features to effectively identify and classify machine states. In Gan and Jiao [18], a wavelet-transform-based gearbox fault-diagnosis method using a random forest classifier improved by a genetic algorithm is proposed. Li et al. [19] introduced a gearbox fault-diagnosis method based on deep random forest fusion of acoustic and vibration signals. Han and Jiang [20] use VMD to obtain feature vectors and feed them into a random forest (RF) for fault diagnosis. Qin [21] combined Ensemble Empirical Mode Decomposition (EEMD) and RF for fault diagnosis. Verellen et al. [22], aiming at the detection of bearing faults in rotating machinery, propose a non-invasive acoustic signal-monitoring system based on a sparse microphone array. Traditional vibration analysis uses accelerometers, which are contact sensors that need to be attached to the component under investigation. Smieja et al. [23] proposed an interesting non-contact vibration-monitoring method based on image processing. Cao et al. [24] proposed a pipeline-robot fault-diagnosis system based on sound-signal recognition, which transmits the sound signal collected by the sound sensor to the upper computer for fault diagnosis; the tests achieved good results. Suman et al. [25] proposed an acoustic-signal mode-determination algorithm based on adaptive Kalman filtering and MFCC, which can effectively detect vehicle health status by using acoustic signals to detect vehicle mechanical faults. Rakesh Kumar et al. [26] established a rainforest-species audio-signal recognition model based on the combination of long short-term memory (LSTM) and convolutional neural network (CNN) models, achieving a high-accuracy, low-loss detection method. Zhuo et al. [27] proposed a scheme for the online diagnosis of steel truss structures using sound signals, together with an improved offline database-guided steered response power–phase transform method; experiments show that this method can achieve accurate localization in strong-noise environments with a smaller amount of computation.

In this paper, the audio-signal monitoring of power equipment is studied in depth. At present, most sound-signal-processing technologies are based on the frequency range received by the human ear. As a result, many high- and low-frequency sound signals beyond the range of the human ear are not effectively utilized, and a large amount of useful signal data is lost. However, even if a whole-frequency-band signal-extraction method is adopted, the characteristics of the signals are difficult to separate from one another, and extraction remains difficult. The essential reason for these problems is that the coverage of sound signals is extremely wide, which greatly increases the difficulty of recognition [28]. It can be seen that traditional sound-signal-processing technology has considerable limitations. To solve this problem, we took a different analytical approach: instead of the traditional method, the audio-processing problem is transferred to the field of image processing. Accordingly, this paper proposes a wavelet-transform-based sound-fault identification and analysis method for power equipment, in which the discrete wavelet transform (DWT) is applied to the audio signal to extract its wavelet coefficients. The time–frequency diagram and wavelet coefficient diagram of the sound signal are output, and machine learning [29] is applied to analyze the sound information from the perspective of image texture. In this method, the whole frequency band of the sound signal is extracted without any filtering, and the sound signal is then translated into image processing, which effectively avoids the loss of information and makes use of the advantages of image recognition for classification.

#### **2. Audio Signal Analysis Based on Wavelet Transform**

The overall structure of the research idea is shown in Figure 1. This paper studies the feature-extraction method for the sounds of six kinds of power equipment collected by a 96-channel handheld acoustic imager. Firstly, the audio is pre-processed based on the wavelet transform and a Hamming window; the wavelet coefficient images of the different devices are then obtained; finally, based on this result, the HOG + SVM method is used to classify and predict the different devices, and it is found to have a high recognition rate.


**Figure 1.** The overall idea of the experimental process. The figure includes the power equipment sound-field-acquisition module, the sound signal preprocessing module, the wavelet transform output image module and the image-processing module.

#### *2.1. Sound Signal Preprocessing*

The sound signals collected by the acoustic imager may suffer from problems such as aliasing, high-order harmonic distortion and high-frequency interference. Before analyzing the sound signals of field equipment, we carry out pre-emphasis, framing, windowing and other preprocessing operations so that the signals obtained by subsequent processing are as consistent and smooth as possible, providing high-quality parameters for signal parameter extraction and improving further sound-signal processing. The specific steps of sound signal preprocessing are as follows:


• Pre-emphasis. The sound signal is passed through a first-order high-pass filter to emphasize its high-frequency components; the transfer function is:

$$H(z) = 1 - \mu z^{-1} \tag{1}$$

where 0.9 < *μ* < 1.0; *μ* is taken as 0.97 in this paper.

• Normalization. Normalize the spectrum of the preprocessed sound signal to reduce the difference in the frequency range of different types of sound:

$$X = \frac{X - \min(X)}{\max(X) - \min(X)} \tag{2}$$

• Framing and windowing. The sound signal is stationary over short intervals, generally 10–30 ms. To facilitate feature analysis, the sound signal needs to be processed in frames. To ensure a smooth transition between two adjacent frames, adjacent frames are overlapped, and each frame is then multiplied by a window function of a certain length for windowing and filtering. In this paper, the Hamming window is adopted, and the window function is given in Formula (3):

$$w(n) = 0.54 - 0.46\cos\left(\frac{2\pi n}{N - 1}\right), \quad 0 \le n \le N - 1 \tag{3}$$
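A minimal sketch of these preprocessing steps (pre-emphasis of Equation (1), min–max normalization of Equation (2), framing with the Hamming window of Equation (3)) is shown below; the frame and hop lengths and the synthetic test tone are illustrative assumptions.

```python
# Pre-emphasis, normalization, and framing with a Hamming window.
import numpy as np

def preprocess(signal, fs, frame_ms=25, hop_ms=10, mu=0.97):
    emphasized = np.append(signal[0], signal[1:] - mu * signal[:-1])  # H(z) = 1 - mu*z^-1
    normalized = (emphasized - emphasized.min()) / (emphasized.max() - emphasized.min())
    frame_len = int(fs * frame_ms / 1000)
    hop_len = int(fs * hop_ms / 1000)
    window = np.hamming(frame_len)            # 0.54 - 0.46*cos(2*pi*n/(N-1))
    frames = [normalized[s:s + frame_len] * window
              for s in range(0, len(normalized) - frame_len + 1, hop_len)]
    return np.array(frames)

fs = 16000
t = np.arange(fs) / fs
signal = np.sin(2 * np.pi * 440 * t)          # 1 s synthetic tone
frames = preprocess(signal, fs)
print(frames.shape)                           # (n_frames, frame_len)
```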

#### *2.2. Feature Extraction of Audio Signal Based on Wavelet Transform*

Wavelet transform is an important time-frequency analysis approach that combines the time-domain characteristics and frequency-domain characteristics of signals.

#### 2.2.1. Definition of Wavelet Function

Signal and image compression is an important application of wavelet analysis. It offers a high compression ratio and fast compression speed; after compression, it not only keeps the characteristics of the signal and image unchanged, but also resists interference during transmission. The wavelet transform is defined as follows:

$$W\_f(a,b) = \frac{1}{\sqrt{a}} \int\_{-\infty}^{+\infty} f(x)\, \phi\left(\frac{x - b}{a}\right) dx \tag{4}$$

The basic wavelet function *φ*(*x*) is shifted by the displacement *b*, and its inner product with the signal *f*(*x*) to be analyzed is taken at different scales *a*; as *a* and *b* vary, the wavelet transform exhibits multi-resolution characteristics.

#### 2.2.2. Wavelet Sequence

If *ψ*(*t*) ∈ *L*2(*R*), *ψ*(*t*) is called a basic wavelet or mother wavelet, where *L*2(*R*) denotes the space of square-integrable functions. The wavelet must satisfy:

$$\int\_{-\infty}^{+\infty} \psi(t)\, dt = 0 \tag{5}$$

This is also the meaning of "wavelet". After scaling and translating the generating function, the wavelet sequence can be obtained:

$$
\psi(a,b)(t) = \frac{1}{\sqrt{a}} \psi\left(\frac{t-b}{a}\right) \tag{6}
$$

where *a*, *b* ∈ *R* and *a* ≠ 0; *a* is the expansion (scale) factor and *b* is the translation factor.
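As a minimal sketch of producing a wavelet coefficient–time image from a sound segment, assuming the PyWavelets package; the Morlet wavelet, scale range and synthetic signal are illustrative choices, not necessarily those used by the authors.

```python
# Continuous wavelet transform of a 1 s sound segment.
import numpy as np
import pywt

fs = 16000
t = np.arange(fs) / fs
signal = np.sin(2 * np.pi * 300 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)

scales = np.arange(1, 128)
coeffs, freqs = pywt.cwt(signal, scales, "morl", sampling_period=1.0 / fs)
print(coeffs.shape)     # (n_scales, n_samples): the wavelet coefficient-time map
```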

#### *2.3. SSIM-Based Image Processing Method*

#### 2.3.1. Definition

Natural images are highly structured; in particular, spatially adjacent pixels are strongly correlated. These correlations carry crucial information about the structure of objects in the visual scene. The aim here is to find a more direct way to compare the structure of a distorted image with that of a reference image.

Structural similarity (SSIM) is a measure of how similar two images are. The SSIM value lies between 0 and 1, and the larger the value, the smaller the difference between the images. SSIM is defined in Equation (7). From the standpoint of image formation, structural information is defined as those attributes of the reflected scene that are independent of brightness and contrast, and the image is modeled by three factors: brightness, contrast and structure.

Function definition:

$$SSIM(\mathbf{x}, \mathbf{y}) = [l(\mathbf{x}, \mathbf{y})]^{\alpha} [c(\mathbf{x}, \mathbf{y})]^{\beta} [s(\mathbf{x}, \mathbf{y})]^{\gamma} \tag{7}$$

where *α*, *β*, *γ* > 0.

The measure of similarity can be realized by the SSIM measuring system, which can be constituted of three comparison elements of brightness, contrast and structure. Next, we define three contrast functions:

Brightness contrast function:

$$l(x, y) = \frac{2\mu\_x \mu\_y + c\_1}{\mu\_x^2 + \mu\_y^2 + c\_1} \tag{8}$$

Contrast function:

$$c(x, y) = \frac{2\sigma\_x \sigma\_y + c\_2}{\sigma\_x^2 + \sigma\_y^2 + c\_2} \tag{9}$$

Structural contrast function:

$$s(x, y) = \frac{\sigma\_{xy} + c\_3}{\sigma\_x \sigma\_y + c\_3} \tag{10}$$

In the formulas above, *μx*, *μy* stand for the mean pixel values of the two images; *σx*, *σy* stand for the standard deviations of the pixel values; *σxy* stands for the covariance of *x* and *y*; and *c*1, *c*2, *c*3 are constants introduced to avoid instability when the denominator approaches 0. In practical application, *α* = *β* = *γ* = 1 and *c*3 = 0.5*c*2.

#### 2.3.2. Application of SSIM

In image quality evaluation, computing the SSIM index locally is better than computing it globally. First, the statistical features of an image are generally unevenly distributed in space; second, image distortion also varies with spatial position; third, at a typical viewing distance, people can only focus on one area of the image at a time, so processing local regions is more consistent with human vision; fourth, local quality assessment yields a spatial map of image quality changes, and the results can be used for other applications.

Therefore, in the formulas above, *μx*, *σx* and *σxy* are computed within an 8 × 8 square window that traverses the whole image pixel by pixel. At each step of the computation, *μx*, *σx*, *σxy* and the SSIM value are based on the pixels inside the window. Finally, an SSIM index mapping matrix composed of the local SSIM indexes is obtained. However, a plain square window leads to a severe "blocking" effect in the mapping matrix. To resolve this problem, we use an 11 × 11 circularly symmetric Gaussian weighting function *W* = {*wi* | *i* = 1, 2, ···, *N*} as the weighting window, with a standard deviation of 1.5, and

$$\sum\_{i=1}^{N} w\_i = 1\tag{11}$$

Then the estimates of *μx*, *σx*, *σxy* are expressed as:

$$
\mu\_x = \sum\_{i=1}^N w\_i \mathbf{x}\_i \tag{12}
$$

$$\sigma\_{\mathbf{x}} = \left(\sum\_{i=1}^{N} w\_i (\mathbf{x}\_i - \mu\_{\mathbf{x}})^2\right)^{\frac{1}{2}} \tag{13}$$

$$\sigma\_{xy} = \sum\_{i=1}^{N} w\_i (\mathbf{x}\_i - \boldsymbol{\mu}\_x) \left( y\_i - \boldsymbol{\mu}\_y \right) \tag{14}$$

With this windowing approach, the mapping matrix exhibits a locally isotropic property, and the mean SSIM index is then used as the quality evaluation of the entire image:

$$MSSIM(\mathbf{x}, \mathbf{y}) = \frac{1}{MN} \sum\_{1}^{M} \sum\_{1}^{N} SSIM(\mathbf{x}\_{i}, \mathbf{y}\_{i}) \tag{15}$$

In the above, *x*, *y* are the two images, *xi*, *yi* are the image contents at the *i*-th local window of the mapping, and *M*, *N* are the numbers of local windows.
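A minimal sketch of this windowed SSIM/MSSIM comparison, assuming scikit-image is available; the images here are synthetic placeholders rather than wavelet coefficient–time maps.

```python
# Mean SSIM with an 11x11 Gaussian weighting window (sigma = 1.5).
import numpy as np
from skimage.metrics import structural_similarity as ssim

rng = np.random.default_rng(0)
img_a = rng.random((128, 128))
img_b = img_a + 0.05 * rng.random((128, 128))   # slightly perturbed copy

mssim, ssim_map = ssim(
    img_a, img_b,
    gaussian_weights=True, sigma=1.5,            # Gaussian window instead of a plain square
    use_sample_covariance=False,
    data_range=1.0,                              # images are roughly in [0, 1]
    full=True,                                   # also return the local SSIM index map
)
print(mssim, ssim_map.shape)
```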

#### *2.4. HOG Feature Extraction Algorithm*

Histogram of Oriented Gradient (HOG) features are descriptors that use computer vision and image-processing techniques to detect object features. Image features are extracted by computing histograms of the gradient directions in local regions of the image. The combination of HOG feature extraction and an SVM classifier has been widely applied in the field of image recognition.

#### Feature Extraction Process

(1) Detection window: HOG partitions the image into windows and blocks, and the pixel values of each region are processed mathematically in units of cells. Here, we first introduce the concepts of window, block and cell and the relationships between them.


(2) Image normalization: normalization includes gamma and color-space normalization. Normalizing the whole image can effectively reduce the influence of lighting conditions. Normalization also prevents local exposure from contributing disproportionately to the image texture intensity. The standard gamma compression formula is:

$$l(\mathbf{x}, \mathbf{y}) = l(\mathbf{x}, \mathbf{y})^{\gamma} \tag{16}$$

*γ* takes values based on the effect.

(3) Calculated gradient: Firstly, the gradient value in the horizontal and vertical coordinate orientation is calculated, and the gradient orientation is calculated according to the calculated gradient value. The formula is as follows:

$$G\_x(\mathbf{x}, \mathbf{y}) = H(\mathbf{x} + \mathbf{1}, \mathbf{y}) - H(\mathbf{x} - \mathbf{1}, \mathbf{y}) \tag{17}$$

$$G\_{\mathbf{y}}(\mathbf{x}, \mathbf{y}) = H(\mathbf{x}, \mathbf{y} + 1) - H(\mathbf{x}, \mathbf{y} - 1) \tag{18}$$

In the two formulas, *Gx*(*x*, *y*), *Gy*(*x*, *y*) and *H*(*x*, *y*) denote the horizontal gradient, the vertical gradient and the pixel value at a given pixel of the collected image, respectively. The gradient magnitude and gradient orientation at pixel (*x*, *y*) are:

$$G(\mathbf{x}, \mathbf{y}) = \sqrt{G\_{\mathbf{x}}(\mathbf{x}, \mathbf{y})^2 + G\_{\mathbf{y}}(\mathbf{x}, \mathbf{y})^2} \tag{19}$$

$$\alpha(\mathbf{x}, \mathbf{y}) = \tan^{-1} \left( \frac{G\_{\mathbf{y}}(\mathbf{x}, \mathbf{y})}{G\_{\mathbf{x}}(\mathbf{x}, \mathbf{y})} \right) \tag{20}$$

(4) Constructing the gradient histogram: the orientation division is determined by the number of bins. Generally, 9 bins are used, and the gradient orientation is divided into 9 intervals.

(5) Cell-normalized gradient histogram in the block: the increasing range of gradient intensity is greatly affected by local illumination and foreground–background contrast, so normalization is needed.

(6) Generate hog feature vector: finally, combine all blocks to generate feature vector.

#### *2.5. Support Vector Machines (SVM)*

Support vector machines (SVMs) are supervised learning models with associated learning algorithms that are widely used in machine learning for data classification and regression analysis. Given a set of training samples, each labeled as belonging to one of two classes, the SVM training algorithm builds a model that assigns new samples to one class or the other, forming a non-probabilistic binary linear classifier. The SVM model represents the samples as points in space, mapped so that the samples of the different classes are separated by as wide a gap as possible. New samples are then mapped into the same space and their classes are predicted according to the side of the gap on which they fall.
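A minimal sketch of the HOG + SVM classification stage is given below, assuming scikit-image and scikit-learn; the placeholder images, labels and HOG parameters are illustrative and stand in for the wavelet coefficient–time images used in this paper.

```python
# HOG feature extraction followed by SVM classification.
import numpy as np
from skimage.feature import hog
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Placeholder "wavelet coefficient-time images" for two device classes.
images = np.concatenate([rng.random((10, 64, 64)), rng.random((10, 64, 64)) + 0.5])
labels = np.array([0] * 10 + [1] * 10)

features = np.array([
    hog(img, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
    for img in images
])

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(features, labels)
print("training accuracy:", clf.score(features, labels))
```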

#### **3. Experimental Result**

Firstly, the working sounds of six types of equipment under a fixed operating condition were collected from the power plant; the sampling frequency is 16,000 Hz, and the signals are segmented into fixed 1 s periods. The sound sample dataset information of the six types of equipment is shown in Table 1:



After segmentation, the 40 s audio signal of one of the six devices is selected for wavelet transform to obtain the time wavelet coefficient diagram, as shown in the following figures.

From the above image results in Figure 2, it can be seen that there are great differences in time wavelet coefficient images between different devices, and the image features are obvious. Based on this result, we intercept the other four 40 s sound data of each device and output their time wavelet coefficient diagrams. According to the obtained images, we found that the similarity of wavelet coefficient images of a device in different periods is very high, but the feature distinction between different devices is still obvious. Therefore, we took out three images of each device for intra-class and inter-class similarity comparison, and the results are shown below.

**Figure 2.** Sample image. The abscissa represents time and the ordinate represents wavelet coefficients.

It can be seen from Figure 3 that the signal similarity of the same equipment in different periods is generally higher than that between different equipment. Based on the above similarity-matching results, the time wavelet coefficient graphs obtained for each device over five different time periods are divided into two groups: one group of four graphs serves as the training set and the remaining graph as the test set. In this way, a total of 24 training samples and 6 test samples across the 6 equipment types are obtained. The test samples are predicted and classified using the HOG feature-extraction algorithm and SVM multi-class training. The results are shown in Table 2 and Figure 4 below.

**Figure 3.** Scatter diagram of image similarity. The abscissa represents the number of sample groups, and the ordinate represents the sample similarity.


**Table 2.** Classification accuracy of raw data.

**Figure 4.** Original sound classification of equipment. The abscissa represents time, and the ordinate represents wavelet coefficients.

In the field of power production, it is difficult to completely eliminate noise interference when extracting the sound of power equipment. Therefore, we add Gaussian white noise to the original power equipment sound signal as interference to verify the accuracy and feasibility of this method. Through experiments, we find that when 10 dB Gaussian white noise is added, the characteristics of the time wavelet coefficient diagram of each piece of equipment are not obvious, so it is difficult to distinguish the equipment. When 20 dB Gaussian white noise is added, the characteristics of each piece of equipment in the time wavelet coefficient diagram appear again. Therefore, we process and classify the sound signal with 20 dB Gaussian white noise added. The results are shown in Table 3 and Figure 5 below.

**Table 3.** Add white noise data classification accuracy.


It can be seen from the experimental results that when white Gaussian noise is added to the sound signal of the equipment, the image features of some equipment become more difficult to distinguish and the recognition accuracy decreases slightly, but the overall recognition accuracy remains high and the classification effect is obvious. By adding white Gaussian noise of different decibel levels, it is not difficult to find that different noise levels influence the equipment sound signal to different degrees, which is intuitively reflected in the wavelet coefficient–time diagram and makes it more difficult to distinguish image features and to identify and classify the equipment. Compared with traditional power equipment sound-recognition methods, the advantages of the image-processing-based method proposed in this paper lie in the use of the full frequency range of the sound signal and the more detailed feature expression. For example, a sound-recognition algorithm for substation equipment based on harmonic characteristics and vector quantization was proposed by Dong et al. [30]: the sampled sound signal of the power equipment takes the 27th harmonic within 0–1300 Hz as the feature vector, so a large amount of the sound data, much of it containing noise, is not used, and the sound features are difficult to express in detail and comprehensively, which has a certain impact on the accuracy of the results.

**Figure 5.** Equipment classification after adding 20 dB Gaussian white noise. The abscissa represents time, and the ordinate represents wavelet coefficients.

#### **4. Conclusions**

In this paper, for the sounds of six types of power system equipment collected on site at a thermal power plant, the wavelet coefficient–time map of each piece of equipment is obtained through the wavelet transform, and the audio signal is thereby translated into an image-processing problem. The SSIM algorithm is used to compute the image similarity of the same equipment at different times and of different pieces of equipment; the resulting similarities show a clear difference in image characteristics, which can be used for classification. Based on this, the obtained images were classified by the HOG + SVM fusion method, and 10 dB and 20 dB Gaussian white noise were added to the audio signal, respectively. It was found that noise of different decibel levels influences the equipment sound signal to different degrees and increases the difficulty of distinguishing the features of the wavelet coefficient–time graph. Under 10 dB noise, the features of the equipment's wavelet coefficient–time diagrams are not obvious and are difficult to distinguish; under 20 dB noise, distinguishing the features is harder than for the clean signals, but the classification effect is still good. The experimental results show that this recognition method, which translates sound into image processing and differs from traditional sound-recognition methods, has better practical feasibility. The limitations of this paper are that the number of available audio samples is limited, so there is not enough data for training, and that only images obtained by the wavelet transform are considered; whether images obtained by other methods have better feature distinguishability has not been studied in depth. In the future, more methods of expressing the characteristic images of sound signals can be explored, and the optimal image-processing-based sound-recognition method can be studied further.

**Author Contributions:** Conceptualization, K.B.; methodology, K.B. and Z.C.; software, Z.C.; validation, Y.Z. (Yong Zhou); formal analysis, Y.Z. (Yong Zhou); resources, W.B.; data curation, W.B.; writing—original draft prepara-tion, Z.C.; writing—review and editing, K.B.; visualization, N.Z.; supervision, Y.Z. (Yongjie Zhai); project adminis-tration, Y.Z. (Yongjie Zhai); funding acquisition, Y.Z. (Yongjie Zhai). All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


**Chenqiang Luo 1,\*, Zhendong Zhang 1,\*, Dongdong Qiao 2, Xin Lai 1, Yongying Li <sup>1</sup> and Shunli Wang 3,4**

<sup>1</sup> College of Mechanical Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China; laixin@usst.edu.cn (X.L.); 15901865905@163.com (Y.L.)


**\*** Correspondence: 713luochq@163.com (C.L.); usstzzd@usst.edu.cn (Z.Z.)

**Abstract:** Accurate online capacity estimation and life prediction of lithium-ion batteries (LIBs) are crucial to the large-scale commercial use of electric vehicles. The data-driven method has lately drawn great attention in this field due to efficient machine learning, but feature extraction related to battery lifespan remains an ongoing challenge. Some studies focus only on the features of the battery constant current (CC) charging phase, regardless of the joint impact of the constant voltage (CV) charging phase on battery aging, which can lead to estimation deviation. In this study, we analyze the features of the CC and CV phases using the optimized incremental capacity (IC) curve, showing the strong relevance between the IC curve in the CC phase, as well as the charging capacity in the CV phase, and battery lifespan. Then, a life prediction model based on automated machine learning (AutoML) is established, which can automatically generate a suitable pipeline with less human intervention, overcoming the problems of redundant model information and high computational cost. The proposed method is verified on NASA's LIB cycle-life datasets, with the MAE improved by 52.8% and the RMSE improved by 48.3% compared with other methods using the same datasets and training method, accomplishing an obvious enhancement in online life prediction with small-scale datasets.

**Keywords:** lithium-ion battery; incremental capacity; automated machine learning; life prediction

#### **1. Introduction**

With the increasing depletion of fossil energy reserves and severe air pollution, considerable attention has been paid to electric vehicles (EVs) in recent years as energy-saving and environmentally friendly solutions, whereas the traditional automobile industry is a major energy consumer that causes serious exhaust emissions [1,2].

Lithium-ion batteries (LIBs) are the ideal energy storage device for EVs, and their safe and feasible application as a power source contributes value over their entire lifespan, promoting secondary utilization and material recycling and reducing the carbon footprint of the battery production and recycling stages [3]. Hence, the safety and reliability of LIBs in EVs have been spotlighted. However, unlike in laboratory cycling, the battery performance and available capacity degrade erratically due to random operation during driving, which can cause underuse or overuse of battery cells, leading to resource waste as well as potential disasters without accurate remaining useful life (RUL) prediction. Consequently, life prediction and capacity estimation of LIBs are facing challenges, and it is worthwhile devoting much effort to elucidating the battery degradation trend over the lifespan, thus avoiding premature replacement and excessive use.

The typical methods of LIB RUL prediction are usually divided into two categories: model-driven and data-driven methods. The model-driven method exploits the intrinsic aging mechanism and introduces complex equations to reflect the reaction processes.

**Citation:** Luo, C.; Zhang, Z.; Qiao, D.; Lai, X.; Li, Y.; Wang, S. Life Prediction under Charging Process of Lithium-Ion Batteries Based on AutoML. *Energies* **2022**, *15*, 4594. https://doi.org/10.3390/en15134594

Academic Editor: Quanqing Yu

Received: 26 May 2022 Accepted: 21 June 2022 Published: 23 June 2022


These models are often established in a theoretical derivation process using mechanism knowledge. Additionally, model parameters are identified through empirical assumptions and mathematical algorithms, such as the extended Kalman filter (EKF) [4], expectation maximization (EM) [5], unscented particle filter (UPF) [6], and autoregressive moving average (ARMA) [7]. Nonetheless, it is difficult to predict precisely because of the complex nonlinear processes and the discrepancy between the data distribution and the model hypothesis.

The data-driven method has recently received significant attention in LIB RUL prediction because of easy access to data and the development of machine learning. The raw data measured during operation can be fed into a learning model that bridges the implicit relationship between the input and output data. More importantly, the successful application of advanced learning algorithms in machine translation, speech recognition, and computer vision provides a remarkable reference for applying them to state estimation and RUL prediction in LIBs.

Data-driven methods for RUL prediction are cataloged as machine learning (ML), artificial neural network (ANN), and deep learning (DL). ML: Richardson et al. [8] proposed a regression of the Gaussian process (GP) algorithm for LIBs RUL prediction, with a good performance in long-term forecasting. Yun et al. [9] explored a hybrid prognosis approach for RUL estimation, combining the variational mode decomposition (VMD), autoregressive integrated moving average (ARIMA), and gray model (GM) models for RUL prediction. ANN: Zhang et al. [10] suggested a novel method based on ANN with four layers for state of health (SOH) estimation and RUL prediction using the incremental capacity curve during the constant current discharge phase. Sun et al. [11] proposed a cloud-edge collaborative strategy with state of health (SOH) for capacity estimation and back propagation neural network (BPNN) optimized by a genetic algorithm for capacity prediction. DL: Dong et al. [12] applied the long short-term memory (LSTM) for the RUL prediction, which can solve the gradient exploding problem during iterating. Zraibi et al. [13] pointed out a CNN-LSTM-DNN algorithm for RUL prediction, in which the three hybrid methods respectively play a critical role. Wang et al. [14] proposed a hybrid method combined with a BiLSTM-AM model and a support vector regression (SVR) model for online life prediction, and the collected initial data are updated by SVR, and BiLSTM-AM is used to predict cycle life. Tang et al. [15] decomposed the original data into high- and low-frequency parts precisely through an ensemble empirical mode decomposition, and the parts separately are predicted by DNN and a self-designed LSTM network, named IRes2Net-BiGRU-FC, which showed a high robustness of RUL prediction in both the CC and CV stages.

Currently, most research on RUL prediction focuses on the application of deep learning and its variants with intricate network architectures, which can overcome vanishing and exploding gradients, over-fitting in training, and distortion in long-term dependence through architecture optimization and hyperparameter tuning, and which has produced outstanding results. However, some problems also occur: for example, it is difficult to achieve a good training speed and effect for small-scale data in a short time due to an unmatched model structure, and human experts must be deeply involved in every segment of model design because of its knowledge- and labor-intensive nature. A model with less human intervention and an adjustable structure can broaden the exploration of RUL prediction based on the data-driven method.

In recent years, automated machine learning (AutoML) has emerged as a new subarea of machine learning, aiming at automatically tailoring every segment of the machine learning model as a pipeline without requiring human assistance, a combination of automation and ML as defined in Ref. [16]. It has been applied in predicting the COVID-19 pandemic [17], computer vision [18], natural language processing [19], video analytics [20], etc. However, existing AutoML research on RUL prediction is just beginning and challenges do emerge, as dealing with the long multivariate time-series problem requires extensive data pre-processing and feasible feature extraction to ensure that useful information can be accumulated and transmitted.

The three published articles [21–23] all proposed AutoML-based RUL prediction for aircraft engines using the simulated turbofan engine degradation open-source dataset from NASA PCoE [24]. Kefalas et al. [21] used the mature architecture TPOT [25] to develop and optimize machine learning pipelines in an automatic manner, introducing expanding windows to extract statistical features that evaluate the degradation accumulated in the early life of the system. Tornede et al. [22] proposed a cooperative coevolutionary algorithm, which enlarges the number of pipelines explored in a single run by coevolving two populations in sub-spaces obtained by partitioning the search space into feature extraction and regression methods. Tornede et al. [23] proposed an adaptation of the AutoML tool ML-Plan to automated RUL prediction, integrating an automated feature engineering process that transforms time-series data into a standard feature representation so that the prediction can be handled as a standard process. RUL prediction for batteries is more challenging, since the equipment above is instrumented with many sensors that monitor its real-time state and therefore provides more model input data than a battery does.

This study aims to develop a life prediction approach based on AutoML using the incremental capacity (IC) curve. The main contributions of this study are summarized as follows:


The remainder of this paper is prepared as follows: Section 2 introduces the IC analysis and HI extraction. In Section 3, the model based on AutoML is proposed to predict battery RUL. Section 4 presents the experimental results of the proposed method and the valid comparison with other methods. Finally, a conclusion is given in Section 5.

#### **2. IC Analysis and HIs Extraction**

#### *2.1. Experimental Dataset*

The experimental dataset in this paper is obtained from the NASA Ames Prognostics Center of Excellence and consists of aging data for 18650-type LIBs [26]. The four LIBs B05, B06, B07, and B18 were tested with standard charging and discharging processes at 25 °C. The experimental steps of a cycle are as follows and are shown in Figure 1a:


**Figure 1.** The experimental steps and capacity degradation profiles: (**a**) The voltage and current in a test cycle; (**b**) Capacity aging trends of the four batteries.

The experiments were halted when the battery capacity decayed to 70%. The tested charge life cycle number of B05, B06, B07, and B18 batteries are 168, 168, 168, and 132 cycles, respectively. The degradation tendencies of battery capacity under different cycle numbers are described in Figure 1b. In this paper, the charging process is selected to study the aging law of LIBs. Figure 2 illustrates the variations of voltage and current of B05 in the CC-CV charging as battery aging. In the CC charging phase, the duration shortens, and the voltage curve moves leftward as battery cycling, which shows the charging capacity in this phase is decreasing. In the CV charging phase, it presents an increasing duration, and the charging capacity showed an opposite trend as in the CC phase.

**Figure 2.** The variations of voltage and current of B05 in the CC-CV charging: (**a**) Charge voltage; (**b**) Charge current.

#### *2.2. IC Curve and Smoothing Method*

The IC curve indicates the change rate of capacity over the voltage evolution during the charging process as an efficient tool to study the variation in the electrochemical properties of LIBs. It has been proved that the batteries with different aging levels have a slight shift in charging voltage or current curve due to the big voltage plateaus during the low-rate cycle [27]. By calculating the derivative of the charging capacity to battery voltage, the IC curve analysis can convert the voltage plateaus into the intuitive and recognizable fluctuation on the IC curve, to detect a gentle change accurately during battery aging [28,29].

The intensity of reactions between electrodes is affected by battery aging during the charging process, where the difference is implicit in the voltage curve but can be reflected in the IC curve as inflection points or even peaks [30]. We can track the battery state and even predict the battery aging trajectory from the inflection points vanishing, decreasing, and shifting, since the slight capacity aging caused by battery degradation can be identified quantitatively by the IC curve [31].

Because the charging capacity is divided by the terminal voltage change within an equal time interval (ETI) and equal voltage interval (EVI) [32], the IC curve can be obtained as shown in Equations (1) and (2).

$$\text{IC}\_{\text{ETI}} = \frac{dQ\_{\text{C}}}{dV\_{\text{C}}} \approx \frac{\Delta Q\_{\text{C}}}{\Delta V\_{\text{C}}} = \frac{i\_{\text{C}}\Delta t}{V\_{\text{C,2}} - V\_{\text{C,1}}} \tag{1}$$

$$\text{IC}\_{\text{EVI}} = \frac{dQ\_{\text{C}}}{dV\_{\text{C}}} \approx \frac{\Delta Q\_{\text{C}}}{\Delta V\_{\text{C}}} = \frac{Q\_{\text{C},2} - Q\_{\text{C},1}}{\Delta V\_{\text{C}}} \tag{2}$$

where *QC* and *VC* are the battery charging capacity and battery terminal voltage, respectively. *iC* and *t* are the current and the time in the CC charging phase, which can calculate the charging capacity in the ETI method. *QC*,2 − *QC*,1 is the charging capacity in the CV charging phase.
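A minimal sketch of computing the incremental capacity dQ/dV from CC-phase charging data, following Equation (1), is shown below; the synthetic voltage profile, current and sampling interval are illustrative assumptions rather than the NASA data.

```python
# IC_ETI = dQ/dV evaluated at equal time intervals during CC charging.
import numpy as np

dt = 1.0                                   # sampling interval in seconds
i_c = 1.5                                  # constant charging current in A
t = np.arange(0, 3000, dt)
v = 3.0 + 1.2 * (1 - np.exp(-t / 1500))    # synthetic CC-phase terminal voltage

q = np.cumsum(i_c * dt * np.ones_like(t)) / 3600.0   # charged capacity in Ah
dq = np.diff(q)
dv = np.diff(v)
ic_curve = dq / dv                         # incremental capacity samples
print(ic_curve[:5])
```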

As shown by the green line in Figure 3, the sample curve calculated by Equation (1) is polluted by measurement noise and is affected by the selected interval. If the interval is small, the IC curve will be noisy, and if the interval is large, the IC curve features will become indistinct [32]. It is challenging to capture useful shape features because the peak characteristic is not clear. In this study, the Kalman filter (KF) is applied as a suitable filtering algorithm to improve the curve smoothing.

**Figure 3.** Smoothing results of IC curve.

Firstly, the evolution of *x =* Δ*Q/*Δ*V* can be modeled as a random walk with additive Gaussian process noise *ω* and measurement noise *υ*, then the state equation and observation equation are as follows:

$$\begin{cases} x\_k = A\_k x\_{k-1} + B\_k u\_k + \omega\_k \\ y\_k = C\_k x\_k + D\_k u\_k + \upsilon\_k \end{cases} \tag{3}$$

where *yk* represents the noise-polluted measurement of *xk*, *uk* is the external input of the system, *ωk* represents the process noise, and *υk* represents the measurement noise. *Q* and *R* are defined as the covariances of the process noise and measurement noise, *Kk* is the Kalman gain, and *Pk* is the covariance of the estimate. Then, the filtering algorithm based on the nominal model of Equation (3) can be formulated as:

State and error covariance,

$$\begin{cases} \hat{x}\_0^+ = E(x\_0) \\ P\_0^+ = E[(x\_0 - \hat{x}\_0^+)(x\_0 - \hat{x}\_0^+)^T] \end{cases} \tag{4}$$

Process and measurement noise,

$$\begin{cases} \mathbf{Q} = E(\omega\_k \omega\_k^T) \\ \mathbf{R} = E(\upsilon\_k \upsilon\_k^T) \end{cases} \tag{5}$$

State and error covariance time update,

$$\begin{cases} \hat{x}\_k^- = A\_k \hat{x}\_{k-1}^+ + B\_k u\_k \\ P\_k^- = A\_k P\_{k-1}^+ A\_k^T + Q \end{cases} \tag{6}$$

Kalman gain,

$$K\_k = P\_k^- C\_k^T \left( C\_k P\_k^- C\_k^T + R \right)^{-1} \tag{7}$$

State and error covariance measurement update,

$$\begin{cases} \hat{x}\_k^+ = \hat{x}\_k^- + K\_k(y\_k - C\_k \hat{x}\_k^- - D\_k u\_k) \\ P\_k^+ = (I - K\_k C\_k)P\_k^- \end{cases} \tag{8}$$

where *uk* is set to zero since the system has no external input; the system state is *xk* = (d*Q*/d*V*)*k* and the system output *yk* is the raw, noise-polluted IC value computed from Equation (1). We initialize with *x*<sup>0</sup> = 0 and *P*<sup>0</sup> = 1. The red curve in Figure 3 is the filtered IC curve, on which the inflection points can be clearly identified. This smoothing method provides the basis for the subsequent HI extraction.
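
A minimal sketch of the random-walk KF of Equations (3)-(8) with *A* = *C* = 1 and no input (*B* = *D* = 0) is shown below; the noise covariances *q* and *r* are assumed tuning values, not figures reported in this study.

```python
import numpy as np

# Scalar Kalman filter for smoothing raw IC samples y (random-walk state model).
def kf_smooth_ic(y, q=1e-5, r=1e-2, x0=0.0, p0=1.0):
    x_hat, p = x0, p0
    x_filt = np.empty(len(y), dtype=float)
    for k, yk in enumerate(y):
        # time update (Equation (6) with A = 1, B = 0)
        x_minus, p_minus = x_hat, p + q
        # Kalman gain and measurement update (Equations (7) and (8) with C = 1)
        kk = p_minus / (p_minus + r)
        x_hat = x_minus + kk * (yk - x_minus)
        p = (1.0 - kk) * p_minus
        x_filt[k] = x_hat
    return x_filt
```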

#### *2.3. HIs Extraction and Correlation Analysis*

2.3.1. Aging Mechanism Based on IC Analysis

Battery aging is inevitable: the internal material structure of the battery is corroded and consumed by electrochemical and side reactions during cycling and storage [33]. According to Ref. [34], the aging mechanisms of LIBs can be categorized into two main degradation phases: a linear degradation phase and an accelerated degradation phase. In the linear degradation phase, the battery capacity declines along a linear trend, mainly caused by the loss of lithium inventory (LLI), including the formation of the SEI film on the surface of the negative electrode, the depletion of electrolyte through side reactions, lithium deposition, and other typical aging mechanisms [35]. In the accelerated degradation phase, the capacity decline worsens, and the loss of active material (LAM) emerges as the major factor. The active material is physically damaged and decomposed by chemical reactions, which weakens the electrochemical reactions and hinders the transport of lithium ions between the electrodes. Moreover, the products generated by the LLI aging mode, together with the polymers formed by electrolyte decomposition and lithium deposition, can accumulate on the active material, isolating it from the lithium ions and causing the material to crack [36,37].

These aging modes can also be distinguished on the IC curve. As depicted in Figure 4a, the IC curves of B05 (taken as the sample) in different charging cycles shift clearly rightward and downward, with two obvious inflection points (IPs) on the curve, named IP A and IP B. Owing to discrepancies in the internal characteristics of LIBs, the intensity of the inflection varies: the IC curves of some batteries show only a slight inflection, while others develop a distinct peak. According to previous research [32,38] on degradation modes based on IC curve features, under both LLI and LAM the intensity of IP A and IP B decreases and the points move toward the higher-voltage section during cycling, as in Figure 4a. In addition, the two points respond differently to the two modes: the intensity of IP A is influenced more by the LAM mode than by the LLI mode, whereas the intensity of IP B depends mainly on the LLI mode. Furthermore, in the EV driving scenario the battery works in the linear degradation phase, since the EOL is most commonly defined in the range of 70% to 80% of the initial capacity, so IP B is recommended over IP A as the indicator carrying aging information.

**Figure 4.** IC curve and HIs of B05 in different charging cycles: (**a**) IC curve; (**b**) Height and position of IP B as two HIs.

#### 2.3.2. Extraction of HIs

In the driving scenario, the battery is not discharged under the constant-current conditions used in the laboratory, because discharge follows the unpredictable load demand during driving. This makes it difficult to calculate the capacity with the Ampere-hour integral method or to capture characteristic aging parameters from the discharging curves. Conversely, the charging process is consistent owing to the regular charging strategy, so slight shifts can be identified there. In this study, the IC curve is calculated with the ETI method for the CC charging process as described in Section 2.2, using the charging voltage and the CC charging current data. The height of IP B, a shape feature of the IC curve, and the corresponding voltage, which marks the position of the inflection point, are selected as two HIs for battery RUL prediction, named F1 and F2. The evolution of the two HIs over the cycles, again with B05 as the sample, is illustrated in Figure 4b.
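
As an illustrative sketch of how F1 and F2 could be read off the smoothed IC curve, the snippet below locates the maximum of the curve inside a voltage window; the window bounds are assumptions chosen for illustration and are not values given in this study.

```python
import numpy as np

# Illustrative extraction of F1 (height of IP B) and F2 (its voltage position)
# from the smoothed IC curve; the voltage search window is an assumption.
def extract_f1_f2(v_grid, ic_smooth, v_window=(3.8, 4.1)):
    mask = (v_grid >= v_window[0]) & (v_grid <= v_window[1])
    idx = np.argmax(ic_smooth[mask])
    f1 = ic_smooth[mask][idx]   # height of IP B
    f2 = v_grid[mask][idx]      # corresponding voltage (position of IP B)
    return f1, f2
```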

Compared with laboratory conditions, charging with commercial chargers is usually incomplete because of driver habits, but the charging curve still retains the shape characteristics of the CC-CV profile, in particular the complete curve shape of the CV charging phase, although a different depth of discharge (DOD) in each cycle may influence the charging. For these reasons, the HIs of the CV phase can be extracted directly, without conversion by the IC method, which keeps the characteristics undistorted in practice.

Similar to the CC phase, regular shifts that indicate battery aging also appear in the CV phase, as depicted in Figure 2. According to the voltage balance in Equation (9), the polarization voltage *UP* and the ohmic internal resistance increase during degradation, so *UT* reaches the cut-off voltage earlier and the charging process switches to the CV phase sooner, which leads to a different charging time and charging capacity in every cycle, as shown in Figure 5.

$$U\_T = U\_{\text{OCV}} + U\_P + U\_R \tag{9}$$

where *UT* is terminal voltage, *UOCV* is open-circuit voltage, *UP* is polarization voltage, and *UR* is ohmic internal resistance voltage.

**Figure 5.** The charging time and capacity profiles of B05 in the CV phase in every cycle.

Hence, the charging capacity of the CV charging phase is chosen as another HI, F3, to characterize the capacity degradation; it can be formulated with the Ampere-hour integral method as:

$$Q\_{\text{CV}} = \int\_{t\_1}^{t\_2} I\_{\text{CV}} \, dt \tag{10}$$

where *ICV* is the current in the CV charging phase, and *t*<sup>1</sup> and *t*<sup>2</sup> are the start and end times of the CV charging phase.
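
A short sketch of Equation (10) is given below; the trapezoidal integration over sampled data is an assumed numerical choice for this example.

```python
import numpy as np

# F3: Ampere-hour integral of the CV-phase current (t_cv in s, i_cv in A).
def cv_phase_capacity(t_cv, i_cv):
    increments = 0.5 * (i_cv[1:] + i_cv[:-1]) * np.diff(t_cv)  # trapezoidal rule
    return np.sum(increments) / 3600.0                         # capacity in Ah
```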

In summary, the height of IP B on the IC curve and the corresponding voltage marking the position of the inflection point, extracted from the CC charging phase as F1 and F2, together with the charging capacity of the CV charging phase as F3, are selected as the HIs for battery RUL prediction. F1 and F2 highlight the slight shifts of the voltage plateau during charging, and F3 represents the charging condition and polarization of the battery. Together, the HIs reflect the characteristics of the entire battery charging process.

#### 2.3.3. Correlation Analysis of HIs

To further explore the relationship among the three HIs and to verify whether all of them express the change in battery capacity, the interaction between the HIs and the capacity is analyzed with the Pearson and Spearman correlation coefficients. The results are displayed in Table 1 and Figure 6. The correlation coefficients of F1 (height of IP B) and F3 (charging capacity of the CV phase) are close to 1, and the absolute value for F2 (position of IP B) also exceeds 0.94, so the correlation of the model input parameters with the capacity is highly significant.

**Table 1.** Results of correlation analysis of three HIs.

**Figure 6.** Relationship of HIs and the reference capacity of B05.
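
This correlation analysis can be reproduced with a short SciPy sketch; the per-battery HI and capacity arrays are assumed inputs prepared as described above.

```python
from scipy import stats

# Pearson and Spearman correlation between one HI series and the reference
# capacity series (equal-length arrays for a given battery).
def correlate_hi(hi, capacity):
    pearson, _ = stats.pearsonr(hi, capacity)
    spearman, _ = stats.spearmanr(hi, capacity)
    return pearson, spearman
```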

#### **3. Online Estimation Based on AutoML**

#### *3.1. Description of AutoML Model*

We established a novel model based on AutoML that automatically customizes the forecasting pipeline, consisting of data pre-processing, feature engineering, and automatic modeling, with little manual intervention and trial-and-error; it covers all steps from processing the input data to deploying the model.

In each step of data pre-processing, including data cleaning, data augmentation, and data coding, the configuration space can be searched automatically by an optimization algorithm such as reinforcement learning or grid search. Pre-processing yields input data free of noise pollution, which helps avoid over-fitting during model training and enhances model robustness. Data pre-processing also involves normalization, which removes the effect of the different value ranges of the features during the learning phase. We use min-max normalization to map the range of feature *S* into [*a*, *b*] as follows:

$$\mathbf{x}' = a + \frac{(\mathbf{x} - \min(\mathbf{S}))(b - a)}{\max(\mathbf{S}) - \min(\mathbf{S})} \tag{11}$$

where *x* and *x'* are the value and the transformed value of the feature *S*.
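
A direct implementation of Equation (11) is sketched below for reference.

```python
import numpy as np

# Min-max normalization: map feature S into the target range [a, b].
def min_max_scale(s, a=0.0, b=1.0):
    s = np.asarray(s, dtype=float)
    return a + (s - s.min()) * (b - a) / (s.max() - s.min())
```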

Feature engineering automatically constructs features from the data so that the subsequent learning models perform well; it comprises feature extraction, feature selection, and feature enhancement. It ensures that redundant variables are excluded and that the features are extracted with an appropriate dimensionality of the feature space.

AutoML aims to solve the so-called CASH problem, short for combined algorithm selection and hyperparameter optimization [39]. Essentially, this is the task of automatically selecting or combining the model appropriate for the dataset at hand, with the various hyperparameters of every segment of the pipeline tuned properly.

Model performance depends largely on the set of hyperparameters that configure the algorithm. Hyperparameters are tuned specifically to the dataset, using techniques such as regression trees and Gaussian processes [40]. Bayesian optimization has proved a successful candidate for hyperparameter tuning: it fits a probabilistic model that captures the relationship between hyperparameter settings and their measured performance, then selects and evaluates the most promising setting and updates the model with the result, completing one iteration [41].

The meta-learning approach complements Bayesian optimization for optimizing a model architecture and is employed to obtain promising configurations to warm-start Bayesian optimization. Every model trained on data contributes to the configuration space of hyperparameters across datasets, even if the model performed poorly. The field of meta-learning [42] follows the common strategy of human experts, who screen known models by reasoning about the performance of learning algorithms and searching the configuration space. Meta-learning is therefore applied to select, from the knowledge of previous tasks, instantiations of the given model framework that are likely to perform well on a new dataset [43]. To characterize the discrepancies among dataset repositories explicitly, meta-features are introduced as search targets and data descriptors, covering dimensions such as statistical meta-features, PCA meta-features, and information-theoretic meta-features [43]. They comprise the attributes of each dataset and the parameters of the computing model, such as the training weights of a neural network. More precisely, the meta-learning approach works as follows:

Step 1: For each machine learning dataset in a dataset repository, we evaluate a set of meta-features and use Bayesian optimization to determine and store an instantiation of the given ML framework with strong empirical performance for that dataset.

Step 2: For a given new dataset, we compute its meta-features, rank the stored datasets by a distance metric in meta-feature space, and select the stored ML architecture instantiations of the nearest datasets for evaluation before starting Bayesian optimization from their results (a minimal sketch of this warm-start step is given below).
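
The sketch below illustrates the warm-start step under simplified assumptions; the data structures, the L1 distance, and the choice of *k* are illustrative and not taken from this study.

```python
import numpy as np

# Rank stored datasets by L1 distance in meta-feature space and return the
# configurations of the k nearest ones to warm-start Bayesian optimization.
def warmstart_configs(meta_new, meta_repo, stored_configs, k=25):
    dists = [np.sum(np.abs(meta_new - m)) for m in meta_repo]
    nearest = np.argsort(dists)[:k]
    return [stored_configs[i] for i in nearest]
```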

In this study, we used the Auto-sklearn architecture, which combines the meta-learning approach with Bayesian optimization. It is a robust AutoML system built on scikit-learn, comprising 15 classifiers, 14 feature pre-processing methods, and 4 data pre-processing methods, giving rise to a structured hypothesis space with 110 hyperparameters. The architecture improves on existing AutoML methods by automatically taking into account past performance on similar datasets and by constructing ensembles from the models evaluated during the optimization [44]. Early stopping is used as a callback to prevent over-fitting: training stops when the loss no longer decreases between epochs, and the network weights of the current epoch are taken as the final training result.
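
A minimal usage sketch of Auto-sklearn for the capacity regression task is given below; the parameter names follow the 0.x API (they may differ between versions), the time budgets are assumptions, and the random arrays are placeholders for the HI matrix and the reference capacity vector.

```python
import numpy as np
from autosklearn.regression import AutoSklearnRegressor

# Placeholder data: each row holds the three HIs (F1, F2, F3) for one cycle.
X_train, y_train = np.random.rand(100, 3), np.random.rand(100)
X_valid = np.random.rand(20, 3)

automl = AutoSklearnRegressor(
    time_left_for_this_task=3600,                # total search budget [s], assumed
    per_run_time_limit=300,                      # per-pipeline budget [s], assumed
    initial_configurations_via_metalearning=25,  # meta-learning warm start
)
automl.fit(X_train, y_train)
y_pred = automl.predict(X_valid)
```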

The flow chart of the pipeline-based AutoML is shown in Figure 7. The configuration space Λ is built from the algorithm repository *A* of the Auto-sklearn architecture and comprises the hyperparameters controlling each algorithm. Meta-learning searches the dataset repository for existing datasets *Di* similar to the new dataset *Dnew*, where similarity is defined by a distance between the meta-features of two datasets, and initializes the search with the meta-features *FDi*. For each dataset, the meta-features are computed only on the training set. In contrast to human domain experts, Bayesian optimization does not use knowledge from previous runs on different datasets; it starts from the matched *FDi* to obtain promising configurations *λ*, and the model is then evaluated on the new dataset characterized by *FDnew*. Bayesian optimization with meta-learning thus completes the pipeline, including data pre-processing, feature engineering, and classifier selection.

**Figure 7.** Flow chart for pipeline-based AutoML combined with meta-learning and Bayesian optimization.

#### *3.2. Framework of RUL Prediction Method*

The integrated framework of the life prediction approach is described in Figure 8 and divided into offline training and online prediction. In the offline stage, the IC curve is first calculated from the CC charging phase, and the two HIs, the inflection point height and position, are extracted with the curve smoothing method derived from the KF algorithm. Combined with the charging capacity of the CV phase in every cycle, the three HIs effectively characterize battery aging over the entire charging process. Then, a novel AutoML model is established with the Auto-sklearn architecture to realize the automatic design of the pipeline; the model hyperparameters are tuned and the search is optimized in the training stage. In the online stage, the three extracted features are applied directly to predict the battery RUL with the trained AutoML model.

**Figure 8.** Framework of RUL prediction based on AutoML.

#### **4. Results and Discussion**

#### *4.1. Evaluation Criteria*

In this study, the root mean square error (RMSE) and mean absolute error (MAE) indexes are applied to evaluate the performance of the prediction method. They are calculated as follows:

$$\text{RMSE} = \sqrt{\frac{1}{n}\sum\_{i=1}^{n} \left(C\_{real}(i) - C\_{prd}(i)\right)^2} \tag{12}$$

$$\text{MAE} = \frac{1}{n} \sum\_{i=1}^{n} \left| C\_{real}(i) - C\_{prd}(i) \right| \tag{13}$$

where *n* is the number of cycles, *Creal* is the real capacity, and *Cprd* is the predicted capacity.
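
The two indexes of Equations (12) and (13) can be computed directly as sketched below.

```python
import numpy as np

# RMSE and MAE between the real and predicted capacity sequences.
def rmse_mae(c_real, c_pred):
    err = np.asarray(c_real) - np.asarray(c_pred)
    return np.sqrt(np.mean(err ** 2)), np.mean(np.abs(err))
```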

In most experimental settings for data-driven methods, the training set and validation set are divided on the same single battery, which allows online prediction. In the driving scenario, however, unlike in a bench test, the BMS cannot predict the capacity online without the first 60% to 80% of the battery's running data, so online prediction over the entire battery aging cycle is hard to achieve. To tackle this, we use the leave-one-out evaluation applied by Chen et al. [45]: one battery is sampled for validation and the other three batteries are used for training. A total of four trials are conducted, from which the model hyperparameters and the evaluation indexes averaged over all batteries are determined.
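
The leave-one-out protocol can be sketched as below; the random arrays stand in for the per-battery HI matrices and capacity vectors, and the time budget is an assumption.

```python
import numpy as np
from autosklearn.regression import AutoSklearnRegressor

# Placeholder per-battery data: (HI matrix, reference capacity vector).
datasets = {b: (np.random.rand(160, 3), np.random.rand(160))
            for b in ["B05", "B06", "B07", "B18"]}

results = {}
for held_out in datasets:
    # Hold one battery out for validation, train on the other three.
    X_tr = np.vstack([X for b, (X, _) in datasets.items() if b != held_out])
    y_tr = np.hstack([y for b, (_, y) in datasets.items() if b != held_out])
    X_va, y_va = datasets[held_out]
    model = AutoSklearnRegressor(time_left_for_this_task=600)
    model.fit(X_tr, y_tr)
    err = y_va - model.predict(X_va)
    results[held_out] = (np.sqrt(np.mean(err ** 2)), np.mean(np.abs(err)))  # RMSE, MAE
```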

#### *4.2. Prediction Results and Analysis*

As shown in the framework in Figure 8, the HIs F1, F2, and F3, calculated from the raw voltage and current data, are used as the input of the AutoML model, and the training and validation sets are divided by leave-one-out evaluation.

Through searching and evaluation, three methods, poly, rbf, and sigmoid, are used for feature pre-processing, and five classifiers are selected in the AutoML model; the classifier types and their ensemble weights are listed in Table 2. Table 3 shows the hyperparameters used in this study.


**Table 2.** The classifier types and ensemble weights in the AutoML model.

**Table 3.** Hyperparameters used in verification.


The capacity predictions and relative errors of B05, B06, B07, and B18 are depicted in Figure 9. The red predicted capacity curve closely follows the yellow real capacity curve, with most prediction errors within 7% in the validation of the four batteries, indicating high accuracy and robustness.

**Figure 9.** Capacity prediction results and errors of (**a**) B05, (**b**) B06, (**c**) B07, and (**d**) B18.

Comparing the RMSE values in Table 4, B18 shows the lowest prediction accuracy among the four batteries. Although not as good as that of the other batteries, the MAE of B18 is 0.0479, corresponding to a test accuracy above 95%. According to the development requirements of batteries in EVs, the fault threshold is set to 80% of the initial capacity, so the end-of-life cycle can also be predicted; the prediction error is 1 cycle for B06 and 8 cycles for B18. All of the above illustrates that the proposed online prediction method has high accuracy and reliability.

**Table 4.** Prediction results of four batteries.
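
For reference, the end-of-life cycle used in this comparison can be located as sketched below; the helper is illustrative and the 80% threshold follows the setting stated above.

```python
import numpy as np

# First cycle at which capacity falls to 80% of the initial value (EOL cycle);
# the RUL prediction error is the difference between predicted and real EOL.
def eol_cycle(capacity, threshold=0.8):
    capacity = np.asarray(capacity, dtype=float)
    below = np.where(capacity <= threshold * capacity[0])[0]
    return int(below[0]) if below.size else None

# cycle_error = abs(eol_cycle(c_pred) - eol_cycle(c_real))
```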


We conducted the same training and validation on NASA's data, with the evaluation indexes averaged as in Ref. [45], enabling a valid benchmark comparison. Table 5 summarizes the RUL prediction results of the various methods reported in Ref. [45]; the proposed method achieves better performance than the others. With the same dataset and training procedure, the AutoML-based prediction is more accurate, with the MAE and RMSE improved by 52.8% and 48.3%, respectively, compared with DeTransformer.

**Table 5.** Comparison of prediction results of AutoML with other ML methods.


#### **5. Conclusions**

In this study, to the best of our knowledge, we are the first to apply an AutoML model to the RUL prediction of LIBs, with the HIs extracted by IC analysis. An IC-curve smoothing method based on the KF algorithm is employed for HI extraction, and three HIs are verified to characterize the aging phenomena over the entire charging process, including the CC and CV phases. We propose a prediction method based on AutoML running on the Auto-sklearn architecture, which customizes the pipeline for a specific dataset automatically, overcoming the problems of redundant model information and high computational cost. Experiments on NASA's LIB cycle-life dataset verify its accuracy and robustness. As a next step, we plan to study neural networks within the AutoML model, using neural architecture search to improve the pipeline, and to investigate effective dimensionality optimization techniques for HI extraction by IC analysis.

**Author Contributions:** Conceptualization, C.L. and Z.Z.; methodology, C.L. and D.Q.; software, C.L. and Y.L.; validation, C.L. and Y.L.; formal analysis, C.L. and D.Q.; investigation, C.L.; resources, C.L.; data curation, C.L. and D.Q.; writing—original draft preparation, C.L.; writing—review and editing, C.L.; visualization, C.L.; supervision, Z.Z. and X.L.; project administration, C.L. and S.W.; funding acquisition, Z.Z. and X.L. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the National Natural Science Foundation of China, grant number 51977131.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**

