Artificial Intelligence and Sustainable Energy Systems

Volume I

Edited by Luis Hernández-Callejo, Sergio Nesmachnow and Sara Gallardo Saavedra

www.mdpi.com/topics

## **Artificial Intelligence and Sustainable Energy Systems**


**Volume I**

Editors

**Luis Hernández-Callejo, Sergio Nesmachnow and Sara Gallardo Saavedra**

MDPI • Basel • Beijing • Wuhan • Barcelona • Belgrade • Manchester • Tokyo • Cluj • Tianjin

*Editors*

Luis Hernández-Callejo, Universidad de Valladolid, Spain

Sergio Nesmachnow, Universidad de la República, Uruguay

Sara Gallardo Saavedra, Universidad de Valladolid, Spain

*Editorial Office*

MDPI, St. Alban-Anlage 66, 4052 Basel, Switzerland

This is a reprint of a Topic published online in the open access journals *Applied Sciences* (ISSN 2076-3417), *Entropy* (ISSN 1099-4300), *Sustainability* (ISSN 2071-1050), *Electronics* (ISSN 2079-9292), and *Energies* (ISSN 1996-1073) (available at: https://www.mdpi.com/topics/Artificial Intelligence Energy Systems).

For citation purposes, cite each article independently as indicated on the article page online and as indicated below:

LastName, A.A.; LastName, B.B.; LastName, C.C. Article Title. *Journal Name* **Year**, *Volume Number*, Page Range.

**Volume I ISBN 978-3-0365-7644-2 (Hbk) ISBN 978-3-0365-7645-9 (PDF)**

**Volume I-III ISBN 978-3-0365-7642-8 (Hbk) ISBN 978-3-0365-7643-5 (PDF)**

© 2023 by the authors. Articles in this book are Open Access and distributed under the Creative Commons Attribution (CC BY) license, which allows users to download, copy and build upon published articles, as long as the author and publisher are properly credited, which ensures maximum dissemination and a wider impact of our publications.

The book as a whole is distributed by MDPI under the terms and conditions of the Creative Commons license CC BY-NC-ND.

## **Contents**


Reprinted from: *Electronics* **2022**, *11*, 4228, doi:10.3390/electronics11244228 ............. **185**


#### **Qingyuan Wang, Longnv Huang, Jiehui Huang, Qiaoan Liu, Limin Chen, Yin Liang, Peter X. Liu and Chunquan Li**

A Hybrid Generative Adversarial Network Model for Ultra Short-Term Wind Speed Prediction
Reprinted from: *Sustainability* **2022**, *14*, 9021, doi:10.3390/su14159021 ................ **423**

### **About the Editors**

#### **Luis Hernández-Callejo**

Luis Hernández-Callejo is an electrical engineer from the Universidad Nacional de Educación a Distancia (UNED, Spain), a computer engineer from UNED, and holds a PhD from the Universidad de Valladolid (Spain), where he is a professor and researcher. His areas of interest are renewable energy, microgrids, photovoltaic energy, wind energy, smart cities, and artificial intelligence. He has participated in numerous research projects, directed many doctoral theses, and is the author of hundreds of scientific articles.

#### **Sergio Nesmachnow**

Sergio Nesmachnow is a full professor at the Faculty of Engineering, Universidad de la República, Uruguay. He is a Level III researcher (the maximum level) in the National System of Researchers in Uruguay and a visiting professor at renowned universities and research centers in America and Europe. He has more than 400 publications in scientific journals and international conferences and is responsible for more than 50 research projects.

#### **Sara Gallardo Saavedra**

Sara Gallardo Saavedra is a professor and researcher at the Campus Duques de Soria of the Universidad de Valladolid, Spain. Her research focuses on the detection, characterization, and classification of defects in photovoltaic (PV) modules through the use of thermography, electroluminescence, IV curves, and visual analysis. She has participated in numerous national and international R+D+I projects and actively disseminates the results, with regular scientific production that includes publications in high-impact-factor journals, book chapters, and contributions to congresses on advanced maintenance in PV. She has completed predoctoral and postdoctoral stays in the Unit of Solar PV Energy of the Energy Department at the Research Centre for Energy, Environment and Technology (CIEMAT) in Madrid, and has collaborated with institutions such as the University of Gävle in Sweden, the Universidad del Valle in Colombia, the National Polytechnic Institute of Mexico, and the University of Cuenca in Ecuador.

## **Preface to "Artificial Intelligence and Sustainable Energy Systems"**

The problems that affect humanity are numerous and occur in different areas. Energy sustainability, climate change, and the effects derived from pollutants and viruses are some of the most relevant problems. The main objective of researchers is to provide solutions to these and other problems.

In recent years, the use of artificial intelligence has increased considerably. Artificial intelligence is applied in different areas: energy, sustainability, medicine, health, mobility, industry, etc. Therefore, it is necessary to continue advancing in the application of artificial intelligence to the aforementioned problems. Energy is a precious commodity, and it is increasingly difficult to obtain it in a sustainable way. In this sense, renewable energy sources are essential, although the use of conventional energy cannot be forgotten. Therefore, sustainable energy systems, integrating renewable and non-renewable energy sources, smart systems, and new business models, are crucial.

Accordingly, this book presents the best articles accepted and published under the topic "Artificial Intelligence and Sustainable Energy Systems". All articles refer to the themes indicated above.

> **Luis Hernández-Callejo, Sergio Nesmachnow, and Sara Gallardo Saavedra**, *Editors*

### *Article* **Synthetic Dataset of Electroluminescence Images of Photovoltaic Cells by Deep Convolutional Generative Adversarial Networks**

**Héctor Felipe Mateo Romero 1,\*, Luis Hernández-Callejo 2,\*, Miguel Ángel González Rebollo 1, Valentín Cardeñoso-Payo 3, Victor Alonso Gómez 4, Hugo Jose Bello 5, Ranganai Tawanda Moyo <sup>6</sup> and Jose Ignacio Morales Aragonés <sup>4</sup>**


**Abstract:** Affordable and clean energy is one of the Sustainable Development Goals (SDG). SDG compliance and economic crises have boosted investment in solar energy as an important source of renewable generation. Nevertheless, the complex maintenance of solar plants is behind the increasing trend to use advanced artificial intelligence techniques, which critically depend on large amounts of data. In this work, a model based on Deep Convolutional Generative Adversarial Networks (DCGANs) was trained in order to generate a synthetic dataset of 10,000 electroluminescence images of photovoltaic cells, which extends a smaller dataset of experimentally acquired images. The energy output of the virtual cells associated with the synthetic dataset is predicted using a Random Forest regression model trained on real IV curves measured on real cells during the image acquisition process. The assessment of the resulting synthetic dataset gives an Inception Score of 2.3 and a Fréchet Inception Distance of 15.8 with respect to the real original images, which confirms the high quality of the generated images. The final dataset can thus be used later to improve machine learning algorithms or to analyze patterns of solar cell defects.

**Keywords:** generative adversarial neural networks; photovoltaics; artificial intelligence; synthetic data; electroluminescence

#### **1. Introduction**

A number of factors (the energy crisis, wars, climate change, etc.) are causing a rise in the use of renewable energies. Solar energy can be easily and affordably converted either into thermal energy, by means of thermal panels, or into electrical energy, using photovoltaic (PV) panels [1]. Industrial plants generating electricity from solar energy, commonly known as solar farms, are generally composed of a large number of PV panels made of PV cells. As the number of installations increases, the maintenance of solar farms becomes a nontrivial problem [2]. The energy produced depends on different conditions, such as the state of the panels, the climate, or the time of year. Solar panels are also vulnerable to phenomena that can reduce or nullify their performance. These issues make a system to control and optimize production necessary, since manual human labor is not enough as the number of panels grows.

Artificial intelligence is usually applied to solve difficult control and optimization problems. When applied to PV systems, different AI methods such as fuzzy logic, metaheuristics, or neural networks have been used to solve problems [3]. The most important problems include [4] maximum power point tracking, output power forecasting, parameter estimation, and defect detection.

**Citation:** Romero, H.F.M.; Hernández-Callejo, L.; Rebollo, M.Á.G.; Cardeñoso-Payo, V.; Gómez, V.A.; Bello, H.J.; Moyo, R.T.; Aragonés, J.I.M. Synthetic Dataset of Electroluminescence Images of Photovoltaic Cells by Deep Convolutional Generative Adversarial Networks. *Sustainability* **2023**, *15*, 7175. https://doi.org/10.3390/su15097175

Academic Editors: Jack Barkenbus and Firoz Alam

Received: 24 February 2023; Revised: 23 April 2023; Accepted: 24 April 2023; Published: 25 April 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Defect detection is one of the most interesting and researched problems in PV systems [5,6]. It is usually tackled with different kinds of neural networks, since they offer great performance, although, as an important drawback, they need large amounts of data to perform better than other machine learning methods [7]. This is a major obstacle for problems where it is difficult to gather new data. Most of the approaches to detecting the state of PV panels use electroluminescence images of the cells as input, which is an invasive method that makes it difficult to carry out measurements and gather data.

Data augmentation is the most common method to deal with image data scarcity by means of the introduction of slight modifications (rotations, flips, and minor deformations) to the original images in order to create new images [8–10]. More recent papers promote the use of more complex AI techniques, such as Generative Adversarial Networks (GANs), to generate synthetic images [11,12]. GANs are state-of-the-art algorithms for data generation [13]. They have also been applied to PV systems for solving different problems [14].
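As an illustrative sketch (not code from the paper), classical augmentation of this kind needs only a few NumPy operations; the `augment` helper and the toy array below are hypothetical:

```python
import numpy as np

def augment(image):
    """Create simple augmented copies of a 2-D image: the three
    90-degree rotations plus horizontal and vertical flips."""
    return [np.rot90(image, k) for k in (1, 2, 3)] + [np.fliplr(image), np.flipud(image)]

cell = np.arange(9).reshape(3, 3)  # tiny stand-in for a 200x200 EL image
variants = augment(cell)
print(len(variants))  # 5 new images from a single original
```

Such transformations preserve the label of the cell, which is why they are a safe but limited way to enlarge a dataset compared with generative approaches.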

The generation of synthetic EL images of PV cells using GANs has also been proposed in other works [11,12]. These works present synthetic datasets created with different GAN architectures trained with EL images of cells with different kinds of defects. Although the datasets in these works have not been made public, it can be deduced from them that the synthetic images are labeled only from visual inspection in order to train standard defect/normal classifiers, ignoring the energy output performance of the synthetic cells, since it cannot be measured.

In this paper, we present a new approach to deal with the image data scarcity problem. Starting from a small set of electroluminescence images of PV cells obtained under the experimental conditions described below, a synthetic dataset of images was created using Generative Adversarial Networks (GANs). Each synthetic image is associated with a scalar value that represents the performance of its energy production, obtained using machine learning techniques trained on the energy production of the original PV cells. This work is also a continuation of the paper presented in [15]: we augmented and improved the dataset and applied new metrics to ensure the appropriate quality of the generated data.

To ensure reproducibility, the resulting dataset was made publicly available, so it can be used to improve the performance of AI models, to analyze the characteristics, properties, and defects of the cells, or to compare with other methods of generation of synthetic images.

This paper is structured as follows: Section 2 briefly reviews Generative Adversarial Networks, Section 3 presents the methodology that was followed to generate the dataset, Section 4 describes the synthetic dataset, and finally, Section 5 presents the conclusions of the paper.

#### **2. Generative Adversarial Networks**

Generative Adversarial Networks (GANs) [16] are one of the most important and popular technologies nowadays and have been applied to many different fields [17]. They can be applied to semisupervised and unsupervised learning. A GAN is usually defined as a pair of neural networks that compete against each other. The network known as the Generator is designed to create realistic new data in order to deceive the other network, known as the Discriminator, which has to decide whether the data it receives are forged.

The Generator does not have access to the real data, which is an important feature of these algorithms: it has to learn how to create the data based on the feedback from the Discriminator. The Discriminator has access to both kinds of data, but it does not know whether a given image is real or forged before carrying out its prediction. The networks change their weights depending on the results of the deception; the Generator uses them to improve the forgeries, while the Discriminator tries to improve its recognition of forgeries. Figure 1 shows a diagram of the behavior of the algorithm.

**Figure 1.** Diagram of a GAN.

The basic principle of operation of a GAN can be expressed as a two-player minimax game played between D and G, with a value function *VGAN*(*G*, *D*) given by the following mathematical expression [16], a binary cross-entropy function, commonly used in binary classification problems:

$$\min_{G} \max_{D} V_{GAN}(G, D) = \mathbb{E}_{\mathbf{x} \sim p_{data}(\mathbf{x})} [\log D(\mathbf{x})] + \mathbb{E}_{\mathbf{z} \sim p_z(\mathbf{z})} [\log(1 - D(G(\mathbf{z})))]. \tag{1}$$

The first term in (1) represents the expected value of the entropy given by the Discriminator over the real data, and the second is the entropy given by the Discriminator over the fake data produced by the Generator.
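As a hedged illustration (not code from the paper), Eq. (1) can be estimated from batches of Discriminator outputs; the `v_gan` helper and its toy inputs are hypothetical:

```python
import numpy as np

def v_gan(d_real, d_fake):
    """Monte Carlo estimate of Eq. (1): mean log D(x) over real samples
    plus mean log(1 - D(G(z))) over generated samples."""
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

# A completely fooled Discriminator outputs 0.5 everywhere,
# giving the well-known equilibrium value 2*log(0.5) = -log(4).
print(v_gan([0.5, 0.5], [0.5, 0.5]))
```

A confident Discriminator (D(x) near 1, D(G(z)) near 0) pushes the value toward its maximum of 0, which is exactly what the Generator's updates work against.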

GANs are usually composed of deep feed-forward networks, but more complex architectures can be used in order to improve the generative capacities of the algorithm and the quality of the forged data. One of the most used architectures is the Convolutional Neural Network.

#### *Deep Convolutional GAN*

Research on GANs has led to new interesting architectures which substantially improve the performance of the networks and the quality of the forged data [18]. For this paper, we implemented the architecture known as Deep Convolutional GAN (DCGAN) [19]. This architecture is based on convolutional layers, but it also provides a set of constraints in order to provide more stabilized training and better quality in the output. The most important guidelines, as stated in the original DCGAN paper [19], are the following:

- Replace pooling layers with strided convolutions in the Discriminator and with transposed (fractionally strided) convolutions in the Generator.
- Use batch normalization in both the Generator and the Discriminator.
- Remove fully connected hidden layers.
- Use ReLU activations in the Generator and LeakyReLU activations in the Discriminator.

#### **3. Methodology**

The correct preparation of the real and synthetic image datasets is a complex process which requires several steps to complete. In this section, we present the methodology that was followed in this research, which implies four relevant stages: manual acquisition of real EL images of PV cells beside their electrical characteristics (IV curves); data preprocessing to prepare for synthetic image generation using GAN; maximum power output assignment to synthetic cells from regression models trained with real images; and model and result validation.

#### *3.1. Real Images Acquisition*

Data availability is one of the most critical issues when using deep learning techniques. For precise PV cell characterization, we need two different kinds of data: the electroluminescence (EL) images of the photovoltaic cells and their corresponding IV curves. For the former, there are some public datasets available in the literature [20], but they do not include information about the IV curve, since it is not easy to measure the individual curve of a single PV cell. To solve this problem, we had to obtain the data using a specific technique developed previously [21], following a manual process in the laboratory.

The EL images were captured using a Hamamatsu InGaAs camera C12741-03. The cell was isolated from external light in order to avoid interference from other light sources. Figure 2a presents the camera used for taking the pictures. EL was chosen as the image capture technique, since it is widely used for detecting defects on PV modules [22] and is the most used technique in related works on PV systems. In order to test the different levels of luminous emission of the cell, various values of current were used to power the LED array when obtaining the IV curve and to power the PV cell when capturing the EL image.

To measure the IV curve, we used a device specifically designed to measure the values of current and voltage of a single cell [21]. This device (Figure 2b) provided voltage and current values to build the IV curve of the cells. The IV curves can be used to calculate the max power point and the performance of the cells. Figure 2c shows the setting used for taking the IV curves. The original paper for this device validates the accuracy of the measurements.

**Figure 2.** Devices used to obtain the data: (**a**) InGaAs Camera; (**b**) IV curve tracer; and (**c**) setting to measure the IV curves.

The number of damaged cells was extremely limited in quantity and variety. The solution for this issue was to create artificial shadows in order to improve the amount and variety of images. The different shadows were created taking into account the most important defects found in solar farms. Figure 3 presents a representative image of each kind of artificial shadow that was used.

The final acquired dataset was composed of 602 different EL images with their corresponding IV curves. In order to allow the repeatability of the experiments, the dataset was made publicly available (it can be downloaded from https://github.com/hectorfelipe98/Synthetic-PV-cell-dataset; the repository includes the original images, the synthetic dataset, and a CSV file that maps each image to its relative power and its class).

**Figure 3.** Shadows and defects used to create the dataset: (**a**) shadow/defect 0: original image; (**b**) shadow/defect 1: long horizontal line; (**c**) shadow/defect 2: long vertical line; (**d**) shadow/defect 3: big central circle; (**e**) shadow/defect 4: small central circle; (**f**) shadow/defect 5: two small circles in the corners; (**g**) shadow/defect 6: big circle in one corner; and (**h**) shadow/defect 7: other defects.

#### *3.2. Image Preprocessing*

The captured EL images were not suitable to be directly used with machine learning algorithms, since there were factors affecting image quality to be addressed first. For this reason, some algorithms were used to improve both the quality and normalization of the images. This section explains how the different problems found in image quality were solved.


**Figure 4.** Image before and after applying color normalization: (**a**) original image; (**b**) original histogram; (**c**) image after the min–max normalization; and (**d**) histogram after the normalization.

**Figure 5.** Cell after removing the surrounding image contour corresponding to wall portions.

#### *3.3. Generation of the Synthetic Images*

As announced, a Deep Convolutional Generative Adversarial Network was chosen to generate extra images from the original recorded set. The model is composed of two different networks: the Generator and the Discriminator. This section explains the architecture and parameters of both networks and the process of training.

#### 3.3.1. Generative Network

The generative network was created following the principles of the DCGAN. It has 3 different Convolutional Transpose (deconvolutional) layers in order to generate patterns. The use of these layers in combination with batch normalization [23] improves the generative capacities of the network and the stability of the training. The network uses LeakyReLU [24] as the activation function, since it usually outperforms the standard ReLU. The architecture of this network is represented in Figure 6. The input to the network (Figure 7a) is a random noise array drawn from a normal distribution. The output is a 200 × 200 image (Figure 7b). This size was chosen in order to reduce the computational cost of the algorithm while still obtaining images with a large amount of information. Other important hyperparameters can be found in Table 1.
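The output size of each transposed convolution follows out = (in − 1)·stride − 2·padding + kernel. As a hypothetical illustration (the paper does not list its exact layer parameters), a chain of three stride-2 layers can reach the 200 × 200 output:

```python
def deconv_out(size, kernel, stride, padding, output_padding=0):
    """Spatial output size of a 2-D transposed convolution."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding

size = 25  # hypothetical starting spatial size after reshaping the noise vector
for _ in range(3):  # three deconvolutional layers, as in the Generator
    size = deconv_out(size, kernel=4, stride=2, padding=1)
print(size)  # 200
```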

**Figure 6.** Architecture of the generative network.

**Figure 7.** GAN's output images before and after training. (**a**) Output image before training. (**b**) Output image after training.

**Table 1.** Hyperparameters for both networks.


3.3.2. Discriminator Network

The Discriminator network also followed the principles of the DCGAN. It uses several convolutional layers to find patterns in the images. The typical feed-forward part of the network was removed, with only the output layer remaining. The use of dropout layers and batch normalization improves the generalization capacity of the network and stabilizes the training. The architecture is represented in Figure 8. The input corresponds to a 200 × 200 image and the output to a binary value that determines whether the image is a real cell or a forgery. The other important hyperparameters were the same as for the Generator, so they can also be found in Table 1.

**Figure 8.** Architecture of the Discriminator network.

#### 3.3.3. Training the GAN

The training was performed simultaneously on both networks with all the available samples (602). The training loop starts with the creation of synthetic images by the Generator using random seeds. After that, the Discriminator is provided with a mix of real and synthetic images. These input data are used to train the network, and the loss is computed for each of the networks based on the results of the Discriminator. Figure 9 shows the evolution of the loss for both networks. In the first epochs, the loss of the Discriminator network is quite high: it has not yet learned the patterns of the original images, so it cannot identify which images are real and which are forged, even when the forged images are extremely similar to noise. After a while, the Discriminator reduces its loss since it can differentiate the real images; this provokes an increase in the loss of the Generator. After that, the loss of the Generator steadily decreases as it starts to learn the patterns needed to create images similar to the original ones. It continues to learn until epoch 400, where its loss reaches its lowest value; at the same time, the Discriminator becomes unable to identify which images are real, driving its loss to its maximum value. After epoch 400, the values barely change, so it can be concluded that the training should end at this point.
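One iteration of this loop can be sketched as follows. This is a minimal illustration with hypothetical stand-in networks (the real model uses the convolutional architectures of Figures 6 and 8), showing only how the two losses are formed:

```python
import numpy as np

rng = np.random.default_rng(0)

def bce(pred, target):
    """Binary cross-entropy, the loss behind Eq. (1)."""
    pred = np.clip(pred, 1e-7, 1 - 1e-7)
    return -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))

# Hypothetical stand-ins for the two networks:
generator = lambda z: np.tanh(z)
discriminator = lambda x: 1.0 / (1.0 + np.exp(-x.mean(axis=1)))

real = rng.normal(1.0, 0.1, size=(8, 4))       # batch of real samples
fake = generator(rng.normal(size=(8, 4)))      # batch created from random seeds

# Discriminator loss: real images labeled 1, forgeries labeled 0.
d_loss = bce(discriminator(real), np.ones(8)) + bce(discriminator(fake), np.zeros(8))
# Generator loss: it wants the Discriminator to label its forgeries as real.
g_loss = bce(discriminator(fake), np.ones(8))
print(d_loss > 0.0, g_loss > 0.0)
```

In an actual training step, each loss would then be backpropagated through its own network only, which is what produces the adversarial dynamics described above.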

**Figure 9.** Evolution of the Generator and Discriminator loss.

The training was performed with a CPU AMD Ryzen 7 5800H, 16 GB of RAM, and a GPU Nvidia Geforce GTX 1650. It took 2 h and 41 min to complete the training.

After completing the training, the Generator network was used to generate the synthetic dataset. A total of 10,000 different images were created using 10,000 different random seeds. Figure 10 presents a selection of some generated images.

**Figure 10.** Samples of the synthetic images generated by our GAN-based method.

#### *3.4. Assigning Maximum Power to Cells*

The labeling of the images was not a trivial process, since it required associating each image (real or synthetic) with an output maximum power value, which represents the performance of the cell. This section explains how this association was made both for original and GAN-generated images.

#### 3.4.1. Original Images

As explained before, each EL image has an associated IV curve. The IV curve provides information about the performance of the cell, which can only be obtained after taking the following steps:

1. Compute the maximum power point (MPP) of the cell from its IV curve.
2. Normalize the MPP with respect to the maximum value of the cells of the same group.

The resulting value measures the relative performance of the cell, normalizing its MPP with the maximum value of the cells of the same group. This maximum is taken as the mean of the 5 highest values, since this reduces the effect of potential incorrect measurements. Values of relative power near 1 correspond to cells in good condition, with few or no defects. Low values correspond to underperforming cells, mostly due to their defects.
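This computation can be sketched as follows; the IV curve and the group MPP values below are toy numbers, not measurements from the paper:

```python
import numpy as np

def relative_power(voltage, current, group_mpps):
    """Relative performance of a cell: its maximum power point (MPP)
    from the IV curve, normalized by the mean of the 5 highest MPPs
    measured in the same group of cells."""
    mpp = np.max(np.asarray(voltage) * np.asarray(current))
    reference = np.mean(np.sort(np.asarray(group_mpps))[-5:])
    return mpp / reference

v = np.linspace(0.0, 0.6, 7)                      # toy IV curve: voltages
i = np.array([5.0, 4.9, 4.8, 4.6, 4.2, 3.0, 0.0])  # toy IV curve: currents
group = [2.0, 2.1, 2.05, 1.9, 2.15, 1.2]          # hypothetical group MPPs
print(round(relative_power(v, i, group), 3))      # 0.824
```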

#### 3.4.2. Synthetic Images

The labeling of the synthetic dataset was a completely different problem. As explained, the values of the original images were calculated based on their IV curves. This process cannot be replicated for the synthetic images, since they do not correspond to real cells and thus cannot be measured.

To solve this problem, we formulated it as a regression problem that can be solved using a machine learning model. The model is trained with the full dataset of original images, together with their normalized power (MPP) calculated from IV curves, as explained in the previous subsection (602 samples). The chosen model is Random Forest [25], since it provided a low error in the original dataset and showed excellent generalization power to associate the MPP to the synthetic images. The implementation of the algorithm in the Sklearn library [26] was used for this work. The tuning of the hyperparameters of the RF model was carried out using the Grid Search method in the Sklearn library (GridSearchCV), obtaining the optimal values shown in Table 2.
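A minimal sketch of this tuning step with scikit-learn, using synthetic stand-in features and a small hypothetical grid (the paper's actual grid and the optimal values found are in Table 2):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.random((60, 4))               # stand-in image features
y = 0.7 * X[:, 0] + 0.3 * X[:, 1]     # stand-in relative power target

# Hypothetical grid; GridSearchCV tries every combination with 3-fold CV.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}
search = GridSearchCV(RandomForestRegressor(random_state=0), param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```

`search.best_estimator_` is then the fitted model used to predict the relative power of unseen (here, synthetic) samples.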


**Table 2.** Estimation of Random Forest hyperparameters using GridSearchCV.

Since Random Forest is not suitable for working directly on raw images, some features were extracted from the images. The features are based on typical statistics (mean, standard deviation, etc.) and other characteristics extracted directly from the histogram (number of peaks, peak widths, peak heights, number of gray levels, etc.). A complete list can be found in Table 3. Feature selection (FS) is an important step in the preparation of machine learning models; we used correlation-based FS. As depicted in Figure 11, the cross-correlation between all the original features shows that almost no feature is highly correlated with the others, except for the standard deviation and the variance, which are completely dependent on each other; one of them can thus be safely removed from the final set of features.
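An illustrative subset of such features can be computed as below; the `histogram_features` helper is a hypothetical sketch, not the full Table 3 pipeline:

```python
import numpy as np

def histogram_features(image, bins=32):
    """A few statistics from an image and its gray-level histogram:
    mean, standard deviation, variance, and a crude peak count."""
    hist, _ = np.histogram(image, bins=bins, range=(0, 255))
    # crude peak count: bins strictly higher than both neighbors
    peaks = int(np.sum((hist[1:-1] > hist[:-2]) & (hist[1:-1] > hist[2:])))
    return {"mean": float(image.mean()), "std": float(image.std()),
            "var": float(image.var()), "n_peaks": peaks}

img = np.random.default_rng(1).integers(0, 256, size=(200, 200))
f = histogram_features(img)
# var = std**2, so the two features are perfectly correlated and one is dropped:
print(abs(f["var"] - f["std"] ** 2) < 1e-6)  # True
```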

The dataset was split into two sets: training (67%) and validation (33%). We decided to use only two sets due to our limited data. The target variable was the relative power of each cell, normalized between 0 and 1.

**Table 3.** Features for Random Forest Regressor.


The model obtained a Mean Absolute Error (MAE) of 0.041 and a Mean Squared Error (MSE) of 0.0038 on the validation dataset. The distribution of the predictions of the model can be found in Figure 12; the low error and the similarity of the distributions confirm the validity of the model. The distribution of the predictions for the synthetic dataset can also be observed. Finally, the images were divided into two groups according to their predicted power (class 0: power > 0.8; class 1: power ≤ 0.8). In total, 6963 images were classified as class 0 and 3037 as class 1.
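The validation metrics and the threshold-based split can be sketched as follows (toy numbers, not the paper's actual predictions):

```python
import numpy as np

def evaluate_and_split(y_true, y_pred, threshold=0.8):
    """MAE and MSE on a validation set, plus the class split used here:
    class 0 if predicted relative power > threshold, otherwise class 1."""
    err = np.asarray(y_true) - np.asarray(y_pred)
    mae = float(np.mean(np.abs(err)))
    mse = float(np.mean(err ** 2))
    classes = np.where(np.asarray(y_pred) > threshold, 0, 1)
    return mae, mse, classes

mae, mse, classes = evaluate_and_split([0.90, 0.50, 0.85], [0.88, 0.55, 0.79])
print(classes.tolist())  # [0, 1, 1]
```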

**Figure 11.** Correlation heatmap of the initial set of features.

**Figure 12.** Histograms of real and predicted normalized power of the original and generated dataset.

#### **4. Results**

The resulting dataset was divided into two different folders, one for each class: class 0 (6963 samples, Figure 13a) contains the images whose relative power is greater than 0.8, which can be considered functional PV cells; class 1 (3037 samples, Figure 13b) contains the images with a relative power of 0.8 or less, which can be considered underperforming PV cells.

**Figure 13.** Sample of images of both classes. (**a**) Sample of images of class 0. (**b**) Sample of images of class 1.

#### *4.1. Visual Analysis*

To ensure the quality and similarity of the images, we propose two different methods: in this section, an analysis based on visual characteristics and histograms, and in the next section, an analysis based on different metrics.

As can be observed in Figure 13, the generated images present a similar structure while presenting new patterns of shadows different from the original ones (Figure 3). This is an interesting feature produced by the generative capacity of the GAN, since it can combine the different kinds of shadows presented in the original images in order to create new kinds of patterns. This improves the variety of shadows presented in the dataset.

Figure 14 presents the distribution of the features previously selected for labeling (Table 3). For each feature, the relationship between the values of the feature and the relative power of the cell is presented; synthetic images are represented with orange dots and original images with blue dots. For most features, the original dataset images appear as a subset of the synthetic dataset, with some exceptions caused mostly by underrepresented cases. This means that the synthetic images not only present the characteristics of the original images but also present new cases of defects or shadows while maintaining the most important characteristics. This is mostly produced by the generative properties of the GAN, which can create new patterns by combining the patterns of the input data; these new patterns improve the diversity of the dataset and can lead to better performance of the machine learning methods that use it. Another interesting finding is that the most underrepresented cases in the original data do not appear in the synthetic data; this is also caused by the properties of the GAN, since it needs a considerable number of samples to find patterns.

#### *4.2. Histogram Analysis*

The histogram of the images gives a lot of information about them. Figure 15a presents the mean histogram of all images of class 0 of the original dataset and the mean for all of the pictures of class 0 of the synthetic one. It can be seen that the images in this class present a large number of light gray to white pixels (values near 200), but they can also present some minor defects or shadows, as shown by the number of black pixels (values near 0). A difference between the two datasets can be seen: the synthetic dataset images have higher but narrower peaks, which are sometimes slightly shifted to the left.

**Figure 14.** Distribution of the relative power generated by a cell as a function of the value of each of the sixteen features used to characterize the images: orange dots—synthetic images; blue dots—original images.

Figure 15b presents the same information for the images in class 1. The images in this class present a large number of dark pixels due to their defects and shadows, and the number of lighter pixels is considerably lower. In the synthetic images, the peak of black pixels is higher, but its width is narrower. The light pixels are extremely similar to the original ones.

**Figure 15.** Mean histograms of class 0 and class 1 for the original and synthetic datasets.

As shown, the aspect of the histograms of both datasets is quite similar. The minor differences are mostly produced due to the augmented variety of patterns of defects and shadows.

Figure 16 presents two different cells of class 1: one original and one synthetic with a similar aspect. Visual inspection of both histograms shows that they have a similar structure, presenting the same number of peaks placed in similar positions. Nevertheless, synthetic images tend to show a more symmetrical histogram and a shift from the maximum toward lower intensities, since the extreme intensity values are less frequent than in real images.

(**a**) A defective synthetic cell with its histogram

(**b**) A defective original cell with its histogram

**Figure 16.** Comparison of a defective synthetic cell and a defective original cell.

Figure 17 presents the same comparison for two images of class 0. Both images have almost no apparent defects, and their histograms have similar shapes, as can be seen from the number of peaks and their placement.

(**a**) A non-defective synthetic cell with its histogram

(**b**) A non-defective original cell with its histogram

**Figure 17.** Comparison of a non-defective synthetic cell and a non-defective original cell.

Figure 18a presents a comparison of the histograms of the position of the maximum, found in the right half of the histograms (gray/white colors); both histograms have a similar shape. Figure 18b presents the histograms of the number of pixels with low values (up to 10% of the maximum value); the synthetic images do not completely imitate the original ones here. A similar case can be seen in Figure 18c, which represents the histograms of the number of pixels in the last decile: even though the shapes are quite similar, there is a shift to the left. It seems this GAN method has some flaws when finding the patterns around the most extreme values. This issue is not critical, but it shows that the method still has room to improve.

**Figure 18.** Comparison of the different aspects of the histograms of both original and synthetic images: (**a**) histogram of the position of the peak of gray/white colors for both original and synthetic images; (**b**) histogram of the number of dark pixels (first decile) for both original and synthetic images; and (**c**) histogram of the number of white pixels (last decile) for both original and synthetic images.
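The three per-image histogram summaries compared in Figure 18 (right-half peak position, dark-pixel count, and bright-pixel count) can be computed as sketched below. The 10% thresholds follow the decile description in the text; the example image is a made-up stand-in, not one of the paper's cells:

```python
import numpy as np

def histogram_features(img, low_frac=0.1, high_frac=0.9):
    """Summary statistics used to compare original vs. synthetic histograms:
    position of the peak in the right (gray/white) half of the histogram,
    number of dark pixels (first decile of the intensity range), and
    number of bright pixels (last decile)."""
    hist = np.histogram(img, bins=256, range=(0, 256))[0]
    peak_right = 128 + int(np.argmax(hist[128:]))   # peak position, right half
    n_dark = int((img < 256 * low_frac).sum())      # pixels below 10% of max
    n_bright = int((img >= 256 * high_frac).sum())  # pixels in the last decile
    return peak_right, n_dark, n_bright

# A toy 10x10 "cell": uniform light gray with a few dark "defect" pixels.
img = np.full((10, 10), 200, dtype=np.uint8)
img[0, :3] = 5
peak, dark, bright = histogram_features(img)
```

Collecting these three values over every original and synthetic image, and histogramming each of them, yields the three panels compared in Figure 18.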


#### *4.3. Image Quality Metrics*

Previous works related to the synthetic generation of EL images of PV cells have not addressed the issue of ensuring the quality of their data by providing objective metrics. The Inception Score (IS) and the Fréchet Inception Distance (FID) are the most important metrics to ensure the quality of synthetic images. In the next paragraphs, both metrics are explained and used. A summary of the results of these metrics can be found in Table 4.

**Table 4.** Metrics for ensuring the quality of the synthetic dataset (Original: O, Synthetic: S, and Noise: N).


#### 4.3.1. Inception Score

This metric was first proposed in 2016 [27] for evaluating the quality of generated artificial images. The score is computed from the results of a pretrained InceptionV3 model [28] applied to the generated images. The score is maximized when two conditions are met: first, the label distribution predicted for each individual image is highly concentrated, i.e., its entropy is minimized; second, the images are diverse, meaning that the predicted labels are evenly distributed across all possible classes.

For our problem, we used a custom implementation of the metric based on Python and TensorFlow. We compared the results of three different datasets: the original dataset, the synthetic dataset, and a dataset composed only of noise. For each experiment, we split the evaluated dataset into 10 subsets and computed the IS of each one; we then computed the mean and the standard deviation. This procedure reduces the memory cost of the algorithm and the effects of randomness.
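The 10-split IS procedure can be sketched in NumPy, assuming the classifier outputs are already available as an (N, C) matrix of class probabilities (the paper obtains these from a pretrained InceptionV3; `probs` here is a stand-in for those outputs):

```python
import numpy as np

def inception_score(probs, n_splits=10, eps=1e-12):
    """IS from a (N, C) matrix of class probabilities p(y|x).
    The data are split into n_splits subsets; the mean and std of the
    per-split scores are returned, mirroring the procedure in the text."""
    scores = []
    for chunk in np.array_split(probs, n_splits):
        p_y = chunk.mean(axis=0, keepdims=True)                # marginal p(y)
        kl = (chunk * (np.log(chunk + eps) - np.log(p_y + eps))).sum(axis=1)
        scores.append(np.exp(kl.mean()))                       # exp of mean KL
    return float(np.mean(scores)), float(np.std(scores))

# Sanity checks: confident and diverse predictions push the IS toward the
# number of classes; uniform predictions give the minimum score of 1.
confident = np.tile(np.eye(4), (25, 1))   # 100 samples, 4 classes, one-hot
uniform = np.full((100, 4), 0.25)
```

This illustrates why the noise dataset scores near 1 (near-uniform, uninformative predictions), while the original and synthetic EL datasets score above it.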

The original dataset obtained a mean score of 2.1440 with a standard deviation of 0.0559, the synthetic dataset obtained a mean score of 2.3418 with a standard deviation of 0.4079, and the noise dataset obtained a mean score of 1.0506 with a standard deviation of 0.0026. Neither dataset obtains a high score with this metric, but both clearly outperform the noise. This is mostly because InceptionV3 was not trained to deal with EL images. It can also be observed that the results of both datasets are similar, with a difference of 9.2%. As discussed in [26], similar IS values for two datasets indicate a high similarity between them; this supports the quality of the synthetic dataset, while also showing that it has some room to improve.

#### 4.3.2. Fréchet Inception Distance

This metric was first proposed in 2017 [29] to evaluate the quality of synthetic images. In contrast to the IS, this metric compares the distribution of the synthetic data with the distribution of the original data, measuring the similarity between the two datasets.

As for the IS, we used a custom implementation based on Python and TensorFlow and compared the same three datasets. To compare each dataset with itself, we divided the datasets into two halves after shuffling them, repeating the process five times to reduce the effects of randomization. The results show that the FID of each dataset with itself was low (0.43118 for the originals, 0.15095 for the synthetics, and 2.5117 for the noise).

We also compared the distances between the three datasets: 15.808 between originals and synthetics, 293.82 between originals and noise, and 296.77 between synthetics and noise. Both the original and synthetic datasets are considerably far from the noise. Although the distance between them is not near 0, it is still low, which shows that the difference between the two datasets is small; this is reinforced when that distance is compared with the distance between the noise and either dataset. The deviation from 0 is mostly caused by the new patterns of shadows and defects generated by combining the different shadows of the original images, thanks to the generative capacity of the DCGAN models.
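The FID computation reduces to fitting a Gaussian (mean and covariance) to the classifier activations of each dataset and evaluating the closed-form Fréchet distance. A minimal NumPy-only sketch, with random feature vectors standing in for the InceptionV3 activations:

```python
import numpy as np

def _sqrtm_psd(a):
    """Matrix square root of a symmetric positive semi-definite matrix."""
    vals, vecs = np.linalg.eigh(a)
    return (vecs * np.sqrt(np.clip(vals, 0, None))) @ vecs.T

def frechet_distance(x, y):
    """FID between two sets of feature vectors (rows):
    ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^{1/2})."""
    mu1, mu2 = x.mean(axis=0), y.mean(axis=0)
    s1 = np.cov(x, rowvar=False)
    s2 = np.cov(y, rowvar=False)
    # Tr((S1 S2)^{1/2}) computed via the symmetric form S1^{1/2} S2 S1^{1/2},
    # which shares its eigenvalues with S1 S2.
    r1 = _sqrtm_psd(s1)
    covmean = _sqrtm_psd(r1 @ s2 @ r1)
    return float(((mu1 - mu2) ** 2).sum() + np.trace(s1 + s2 - 2 * covmean))

# Stand-in "activations": two samples of the same distribution, one shifted.
rng = np.random.default_rng(1)
a = rng.normal(0, 1, (500, 8))
b = rng.normal(0, 1, (500, 8))
c_shift = rng.normal(3, 1, (500, 8))
```

Two samples from the same distribution score near 0, while a shifted distribution scores far higher, mirroring the self-distances and cross-distances reported above.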

#### **5. Conclusions and Future Work**

The creation of synthetic electroluminescence images of photovoltaic cells is not a trivial problem, and different factors need to be taken into account to create high-quality images. Gathering the data is by nature a manual process, requiring the EL image and its IV curve. The IV curve was measured individually for each cell, which is an innovation with respect to other papers in the bibliography. The obtained images need proper preprocessing to perform well in the models. Labeling the generated data is also a complex problem, since it is not possible to measure the output power or the IV curve of a cell that does not physically exist; therefore, we designed a model to assign a performance value. This model was trained with the values of the real images, and its low error values show its good performance.

The algorithm used for creating the new images, DCGAN, has shown great performance, not only creating images that are very similar to the originals but also creating new patterns of defects and shadows. This similarity between the original and synthetic images was proved using different methods: a visual inspection of the cells shows that they present the characteristics of a real PV cell, and the histograms also prove their similarity, since they share the most important aspects of their shape. Image quality metrics such as the Inception Score (2.3) and the Fréchet Inception Distance (15.8) also prove their similarity.

The results of this paper prove the quality of the synthetic images, but the dataset can be improved. The most direct way of improving the dataset would be increasing the amount of data. The inclusion of new kinds of defects and shadows would improve the generative capacities of the GAN. Another interesting option would be trying to use different kinds of PV cells, such as monocrystalline PV cells, since our dataset only consists of polycrystalline PV cells. This would improve the usefulness of the dataset in machine learning problems that use other kinds of PV cells.

Another way of improving the quality of the dataset would be increasing the size of the networks. The size of the current networks is constrained by hardware and budget limitations. Bigger networks could reduce the number of epochs needed and improve the quality of the generated images. The use of more innovative architectures could also improve the dataset.

**Author Contributions:** Conceptualization, H.F.M.R., L.H.-C. and V.C.-P.; methodology, H.F.M.R., V.C.-P. and M.Á.G.R.; validation, H.F.M.R. and V.A.G.; writing—original draft preparation, H.F.M.R. and H.J.B.; writing—review and editing, H.F.M.R., L.H.-C., V.C.-P., J.I.M.A. and R.T.M.; project administration, L.H.-C. All authors have read and agreed to the published version of the manuscript.

**Funding:** This study was supported by the University of Valladolid with the predoctoral contracts of 2020, co-funded by Santander Bank.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The synthetic dataset can be found at https://github.com/hectorfelipe98/Synthetic-PV-cell-dataset.

**Acknowledgments:** This study was supported by the Universidad de Valladolid with ERASMUS+ KA-107. We also appreciate the help of other members of our departments.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

#### **Abbreviations**

The following abbreviations are used in this manuscript:


#### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

### *Article* **An Adaptive Hybrid Model for Wind Power Prediction Based on the IVMD-FE-Ad-Informer**

**Yuqian Tian 1, Dazhi Wang 1,\*, Guolin Zhou 1, Jiaxing Wang 1, Shuming Zhao <sup>1</sup> and Yongliang Ni <sup>2</sup>**


**Abstract:** Accurate wind power prediction can increase the utilization rate of wind power generation and maintain the stability of the power system. At present, a large number of wind power prediction studies are based on the mean square error (MSE) loss function, which generates many errors when predicting original data with random fluctuation and non-stationarity. Therefore, a hybrid model for wind power prediction named IVMD-FE-Ad-Informer, which is based on an Informer with an adaptive loss function and combines improved variational mode decomposition (IVMD) and fuzzy entropy (FE), is proposed. Firstly, the original data are decomposed into *K* subsequences by IVMD, which possess distinct frequency domain characteristics. Secondly, the sub-series are reconstructed into new elements using FE. Then, the adaptive and robust Ad-Informer model predicts the new elements, and the predicted values of each element are superimposed to obtain the final wind power results. Finally, the model is analyzed and evaluated on two real datasets collected from wind farms in China and Spain. The results demonstrate that the proposed model is superior to other models in performance and accuracy on different datasets, and that it can effectively meet the demand of actual wind power prediction.

**Keywords:** wind power prediction; improved variational mode decomposition; fuzzy entropy; adaptive loss function; Informer

#### **1. Introduction**

The global energy-shortage problem is becoming increasingly serious, and it is essential to accelerate the transformation of the energy structure by increasing the proportion of renewable energy. Wind power, as an economical and environmentally friendly emerging renewable energy source, has been vigorously developed by many countries, and its application prospects are promising [1,2]. However, wind power fluctuates randomly and is strongly uncontrollable, which reduces the dispatching efficiency of the power grid and unbalances energy supply and demand [3]. Therefore, achieving high-accuracy and high-reliability wind power prediction in practical applications can minimize energy loss and make the power grid operate more stably and safely.

In recent years, a large number of scholars have studied wind power prediction models, which can be mainly divided into physical models [4], statistical models [5], artificial intelligence (AI) models [6], and hybrid models [7]. The physical models are based on the methods of fluid mechanics, using numerical weather prediction data to calculate the wind turbine output curve and then derive wind power from it [8]. However, the fluid-mechanics approach suffers from high model-building complexity and massive computational cost. The statistical models are based on the mapping relationship between historical data and future data [9,10]. Rajagopalan et al. [11] proposed an autoregressive moving average (ARMA) model for ultra-short-term wind power forecasting and achieved superior results. The autoregressive integrated moving average (ARIMA) model is a widely used statistical model that extends ARMA with a differencing operation [12,13]. However, this method requires a large-scale dataset, making it difficult to mine the nonlinear relationships in complex data.

**Citation:** Tian, Y.; Wang, D.; Zhou, G.; Wang, J.; Zhao, S.; Ni, Y. An Adaptive Hybrid Model for Wind Power Prediction Based on the IVMD-FE-Ad-Informer. *Entropy* **2023**, *25*, 647. https://doi.org/10.3390/e25040647

Academic Editors: Luis Hernández-Callejo, Sergio Nesmachnow and Sara Gallardo Saavedra

Received: 6 March 2023 Revised: 5 April 2023 Accepted: 11 April 2023 Published: 12 April 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

AI models are the current technical trend and are widely used in the field of large-scale and multi-dimensional data prediction [14]. AI models are mainly divided into machine-learning models and deep-learning models [15]. For example, an echo state network (ESN) [16] was applied to wind speed forecasting and improved the prediction performance. Khan et al. [17] used the Naive Bayes Tree (NB) to extract the probabilities of each feature of wind power, successfully predicting wind power values over horizons from hours to years. Machine-learning methods are based on rigorous mathematical theories that enable rapid computation in high-dimensional spaces. However, owing to weak generalization ability, machine-learning methods are prone to overfitting, and it is difficult for them to achieve good prediction effects. In contrast, deep-learning models unify feature-learning tasks and prediction tasks into one model, making them more suitable than shallow machine-learning models for solving wind power prediction problems in complicated uncertainty scenarios [18]. Tian et al. [19] used a model based on the attention mechanism and demonstrated its efficacy in wind power prediction. Liu et al. [20] presented a novel deep convolutional neural network (CNN) capable of automatically extracting hidden information from multidimensional data and efficiently implementing multi-step prediction. Hu et al. [21] applied a model integrating a deep-learning framework with a basic ESN network for energy prediction, which enhanced the model's memory capacity with a stacked hierarchy of reservoirs. Although these methods have achieved some success in wind power prediction, a single model cannot fully exploit the time series information, which leads to limited prediction performance [22].

Hybrid models are created by combining multiple intelligent algorithms or prediction models, drawing on the advantages of different models to improve prediction accuracy. Hybrid prediction models consist of combined multiple models and stacking models based on data processing [23,24]. Chen et al. [25] designed a weighted combination prediction composed of six long short-term memory networks (LSTM), and its prediction effect is better than that of a single prediction model. Xiong et al. [26] proposed a multi-scale hybrid prediction model that combines the attention mechanism, CNN, and LSTM to adequately capture the high-dimensional features in wind farm data. Zheng et al. [27] established a hybrid model combining bidirectional long short-term memory (Bi-LSTM) and CNN, which adopted a unique feature extraction method of space and then time. Although the combined prediction method of multiple models exhibits high prediction accuracy, it suffers from low computational efficiency and narrow application scenarios [28]. Considering the nonlinear implicit relationships in wind power time series, the stacking model based on data processing improves prediction accuracy by mining deep features through data decomposition. For example, Wu et al. [29] proposed a multi-step prediction method using variational mode decomposition (VMD) and a chain ESN, which achieved multi-step prediction at multiple time scales. Yang et al. [30] employed the VMD method to decompose wind speed data, which were then used as input to an optimized LSTM network to perform predictions. Ren et al. [31] proposed a hybrid model of empirical-mode decomposition (EMD) and support-vector regression (SVR) for wind power prediction. Lv et al. [32] decomposed wind speed data into 3-dimensional input features using singular spectrum analysis (SSA) and fed them into a convolutional long short-term memory (ConvLSTM) network, which effectively enhanced the local correlation between multivariate data. Khazaei and Ehsan [33] used prediction methods combining wavelet transform (WT) decomposition with an AI model, and the results showed that the model has high accuracy. Hybrid models based on data processing have a simple structure and strong feasibility, but the prediction accuracy greatly depends on the effectiveness of the data decomposition. Over-decomposition can produce redundant components and reduce computational efficiency, while insufficient decomposition can lead to mode mixing, which fails to meet the needs of high-precision prediction [34].

Most prediction models use mean square error (MSE) as a loss function, which needs to meet the condition that prediction errors obey a Gaussian distribution. However, the use of MSE as a loss function in models that are insensitive to outliers in wind power data with high randomness may result in large errors [35]. In response to this issue, some researchers have improved the loss function of the model to minimize the impact of errors on the prediction results. Hu et al. [36] proposed a loss function without fixed distribution, which effectively solves the problem of prediction-gradient descent at wind power intervals. Duan et al. [35] designed a loss function with non-Gaussian distributed errors and combined it with an LSTM model to predict wind power. The loss function is significantly important in the wind power prediction process as it determines the training direction and accuracy of the model [37]. Although these improved loss functions have positive effects on wind power prediction, the models still exhibit limitations in terms of their adaptability and robustness.

Based on the above analysis, a robust and adaptive IVMD-FE-Ad-Informer hybrid model for wind power prediction is proposed in this paper, which aims to improve the precision of data decomposition and the predictive performance on non-stationary wind power data. The main contributions of this paper are outlined as follows: (1) Considering the difficulty of selecting the number of VMD modes, the IVMD algorithm, improved by the maximum information coefficient (MIC), decomposes the original wind power data into *K* optimal sub-series, which effectively reduces the difficulty of wind power prediction. (2) Fuzzy entropy (FE) is used to reconstruct sub-series of similar complexity into new elements, alleviating the computational burden of the model. (3) An adaptive loss function is innovatively introduced into the Informer network to solve the problem of the traditional MSE's insensitivity to randomly fluctuating wind power data. This novel model can reduce the impact of outliers in non-smooth wind power data. (4) Ablation experiments and comparative experiments are performed on datasets collected from two different wind farms to verify the effectiveness and stability of the model. The prediction results show that the proposed model framework is reasonable, and it exhibits significantly better prediction performance and accuracy compared to other models.

The specific contents of this paper are as follows: Section 2 specifically describes the basic methodologies of hybrid models; Section 3 presents the construction framework and evaluation indicators of the IVMD-FE-Ad-Informer model; Section 4 constructs four experiments to verify the accuracy and validity of the proposed model; and finally, this paper is summarized in Section 5.

#### **2. Methodologies**

#### *2.1. Variational Mode Decomposition*

VMD [38] is a commonly used data decomposition method that converts wind power sequences from the time domain to the frequency domain and subsequently decomposes them into *K* intrinsic mode functions (IMFs). First, the constrained variational problem is constructed:

$$\begin{cases} \min \left\{ \sum_{K=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{i}{\pi t} \right) * u_K(t) \right] e^{-i w_K t} \right\|_2^2 \right\} \\ \text{s.t. } \sum_{K=1}^{K} u_K = x(t) \end{cases} \tag{1}$$

where ∗ is the convolution operator, *x*(*t*) is the wind power sequence, *w<sub>K</sub>* and *u<sub>K</sub>* are the center frequency and band-limited mode component of the *K*th IMF, *δ*(*t*) is the impulse function, and *∂<sub>t</sub>* denotes the time derivative.

To convert the constrained variational problem into an unconstrained one, the Lagrange multiplier *λ*(*t*) and the penalty factor *α* are introduced:

$$L(\{u_K\},\{w_K\},\lambda) = \alpha \sum_{K=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{i}{\pi t} \right) * u_K(t) \right] e^{-i w_K t} \right\|_2^2 + \left\| x(t) - \sum_{K=1}^{K} u_K(t) \right\|_2^2 + \left\langle \lambda(t), x(t) - \sum_{K=1}^{K} u_K(t) \right\rangle \tag{2}$$

Then the optimal solution of the unconstrained problem is found using the alternating direction method of multipliers (ADMM), with the following iterative procedure:

$$\hat{u}_K^{n+1}(w) = \frac{\hat{x}(w) - \sum_{i \neq K} \hat{u}_i(w) + \left(\hat{\lambda}(w)/2\right)}{1 + 2\alpha(w - w_K)^2} \tag{3}$$

$$w\_K^{n+1} = \frac{\int\_0^\infty w \left| \hat{u}\_K^{n+1}(w) \right|^2 dw}{\int\_0^\infty \left| \hat{u}\_K^{n+1}(w) \right|^2 dw} \tag{4}$$

Finally, after applying the above process, the original wind power series is decomposed into the *K* sub-series.
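The updates of Equations (3) and (4) can be sketched directly in the frequency domain. The following is a simplified illustration only (the multiplier λ is fixed at zero, there is no mirror extension, and the iteration count is fixed); it is not the IVMD implementation used in the paper:

```python
import numpy as np

def vmd(x, K=2, alpha=2000.0, n_iter=300):
    """Minimal VMD sketch: Wiener-filter mode update of Eq. (3) followed by
    the power-weighted center-frequency update of Eq. (4).
    Returns the K modes and their sorted center frequencies
    (in cycles/sample, estimated on the positive half-spectrum)."""
    N = len(x)
    f_hat = np.fft.fft(x)
    w = np.fft.fftfreq(N)                     # digital frequencies
    pos = w >= 0
    u_hat = np.zeros((K, N), dtype=complex)
    w_k = np.linspace(0.05, 0.45, K)          # spread the initial centers
    for _ in range(n_iter):
        for k in range(K):
            others = u_hat.sum(axis=0) - u_hat[k]
            u_hat[k] = (f_hat - others) / (1 + 2 * alpha * (w - w_k[k]) ** 2)
            p = np.abs(u_hat[k][pos]) ** 2
            w_k[k] = (w[pos] * p).sum() / p.sum()   # Eq. (4)
    modes = np.real(np.fft.ifft(u_hat, axis=1))
    return modes, np.sort(w_k)

# Two-tone test signal: 5 Hz and 25 Hz sampled at 100 Hz for 10 s.
t = np.arange(1000) / 100.0
x = np.sin(2 * np.pi * 5 * t) + 0.8 * np.sin(2 * np.pi * 25 * t)
modes, centers = vmd(x, K=2)
```

On this toy signal the two recovered center frequencies settle near 0.05 and 0.25 cycles/sample, i.e., the 5 Hz and 25 Hz tones, which is the separation behavior the decomposition relies on.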

#### *2.2. Fuzzy Entropy*

Fuzzy entropy (FE) [39] is a dynamical method for analyzing the complexity of time series. The FE value changes smoothly with changes in the set parameters, which makes it more robust to noise and more resistant to interference. First, for a time series of length *n*, the FE algorithm introduces the fuzzy membership function, whose specific formula is as follows:

$$D(\mathbf{x}) = \exp\left[-\ln(2)\left(\frac{\mathbf{x}}{r}\right)^2\right] \tag{5}$$

where *r* is the similarity tolerance, *x* = *d<sub>ij</sub><sup>m</sup>*, and *d<sub>ij</sub><sup>m</sup>* is the distance between the vectors obtained by reconstructing the time series into the *m*-dimensional phase space, with *i*, *j* = 1, 2, ..., *n* − *m* + 1 and *i* ≠ *j*.

Averaging *D<sub>ij</sub><sup>m</sup>* over each *i* yields the average similarity function:

$$\phi^m(r) = \frac{1}{N - m + 1} \sum\_{i=1}^{N-m+1} \left( \frac{1}{N - m} \sum\_{j=1, j \neq i}^{N-m+1} D\_{ij}^m \right) \tag{6}$$

Therefore, the FE expression is as follows:

$$\text{FuzzyEn}(m, r, n) = \ln \phi^m(r) - \ln \phi^{m+1}(r) \tag{7}$$
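Equations (5)-(7) translate into a short NumPy routine. This is an illustrative sketch with common default parameters (*m* = 2, *r* = 0.2 times the series standard deviation, Chebyshev distance between baseline-removed templates), which are assumptions rather than the paper's stated settings:

```python
import numpy as np

def fuzzy_entropy(x, m=2, r=None):
    """Fuzzy entropy per Eqs. (5)-(7): Gaussian-like membership
    D = exp(-ln(2) * (d / r)^2) over Chebyshev template distances,
    FE = ln(phi_m) - ln(phi_{m+1})."""
    x = np.asarray(x, dtype=float)
    if r is None:
        r = 0.2 * x.std()

    def phi(mm):
        n_vec = len(x) - mm
        templ = np.array([x[i:i + mm] for i in range(n_vec)])
        templ = templ - templ.mean(axis=1, keepdims=True)  # remove local baseline
        d = np.abs(templ[:, None, :] - templ[None, :, :]).max(axis=2)
        sim = np.exp(-np.log(2) * (d / r) ** 2)            # Eq. (5) membership
        np.fill_diagonal(sim, 0.0)                         # exclude i == j
        return sim.sum() / (n_vec * (n_vec - 1))           # Eq. (6) average

    return float(np.log(phi(m)) - np.log(phi(m + 1)))      # Eq. (7)

# A regular signal should score lower than an irregular one.
rng = np.random.default_rng(0)
sine = np.sin(np.linspace(0, 8 * np.pi, 300))
noise = rng.standard_normal(300)
```

A smooth sine yields a small FE while white noise yields a clearly larger one, which is exactly the complexity ordering used to group the IMFs into new elements.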

#### *2.3. Informer*

The Informer network is a variant of the Transformer that effectively addresses the long-sequence prediction problem [40]. The improvements of the Informer include: using a ProbSparse self-attention mechanism to reduce the complexity of the matrix computation; introducing a self-attention distilling mechanism to extract the main features of the time series, which effectively reduces memory usage; and using a decoder that directly outputs the predicted values generatively to achieve long-series prediction. The structure of the Informer model is shown in Figure 1.

The traditional self-attention mechanism consists of query, key, and value, and the expression is as follows:

$$f\_A(Q, K, V) = \text{softmax}\left(\frac{QK^\top}{\sqrt{d}}\right)V\tag{8}$$

where *Q* ∈ R<sup>*L<sub>Q</sub>*×*d*</sup>, *K* ∈ R<sup>*L<sub>K</sub>*×*d*</sup>, *V* ∈ R<sup>*L<sub>V</sub>*×*d*</sup>, and *d* is the input dimension.

**Figure 1.** The structure of the Informer.
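Equation (8) can be sketched in a few lines of NumPy; the shapes below (sequence length 8, dimension 16) are arbitrary illustration values:

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Canonical scaled dot-product attention of Eq. (8)."""
    d = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d)) @ V

rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 8, 16))   # L_Q = L_K = L_V = 8, d = 16
out = attention(Q, K, V)
```

Each output row is a convex combination of the rows of *V*; the attention weights for every query sum to 1, which is the property the ProbSparse variant below exploits when ranking queries.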

As the matrix multiplication involved in Equation (8) is computationally expensive, the ProbSparse self-attention mechanism is introduced to select the important elements in *Q* for calculating the attention values.

$$f_A(Q, K, V) = \text{softmax}\left(\frac{\overline{Q}K^\top}{\sqrt{d}}\right)V \tag{9}$$

where *Q̄* is obtained through probabilistic sparsification of *Q*, controlled by a constant sampling factor *c*; the number of selected queries is *c* · ln *L<sub>K</sub>*.

Therefore, the similarity and importance between query and key are measured by Kullback–Leibler divergence, as follows:

$$k(q_i, k_j) = \ln \sum_{l=1}^{L_K} e^{\frac{q_i k_l^\top}{\sqrt{d}}} - \frac{1}{L_K} \sum_{j=1}^{L_K} \frac{q_i k_j^\top}{\sqrt{d}} - \ln L_K \tag{10}$$

where the relevance of *q<sub>i</sub>* to *k<sub>j</sub>* is proportional to the magnitude of *k*(*q<sub>i</sub>*, *k<sub>j</sub>*). If *p*(*k<sub>j</sub>* | *q<sub>i</sub>*) is close to a uniform distribution, i.e., *p*(*k<sub>j</sub>* | *q<sub>i</sub>*) = 1/*L<sub>K</sub>*, then *q<sub>i</sub>* has the same similarity to all *k<sub>j</sub>*, is deemed a redundant vector, and can be dropped.

Based on this, the sparsity evaluation formula that defines the *i*-th query is:

$$M(q_i, K) = \ln \sum_{l=1}^{L_K} e^{\frac{q_i k_l^\top}{\sqrt{d}}} - \frac{1}{L_K} \sum_{j=1}^{L_K} \frac{q_i k_j^\top}{\sqrt{d}} \tag{11}$$
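The sparsity score of Equation (11) and the resulting query selection can be sketched as follows; the tiny query/key matrices are made-up illustration values:

```python
import numpy as np

def sparsity_scores(Q, K):
    """M(q_i, K) of Eq. (11): log-sum-exp of the scaled scores minus their
    arithmetic mean. Larger M means the query's attention distribution is
    farther from uniform, so ProbSparse attention keeps that query."""
    d = Q.shape[-1]
    s = Q @ K.T / np.sqrt(d)                  # (L_Q, L_K) score matrix
    mx = s.max(axis=1, keepdims=True)         # stabilize the log-sum-exp
    lse = (mx + np.log(np.exp(s - mx).sum(axis=1, keepdims=True))).ravel()
    return lse - s.mean(axis=1)

def top_queries(Q, K, c=1):
    """Keep the u = c * ln(L_K) queries with the largest sparsity score."""
    u = max(1, int(c * np.log(K.shape[0])))
    return np.argsort(sparsity_scores(Q, K))[::-1][:u]

# Query 0 scores all keys equally (redundant); query 1 is sharply peaked.
K_mat = np.eye(8)[:, :4]
Q = np.vstack([np.zeros(4), 10 * np.eye(4)[:1]])
M = sparsity_scores(Q, K_mat)
```

A uniform query scores exactly ln *L<sub>K</sub>*, while a peaked query scores higher and is therefore retained, matching the redundancy criterion described above.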

The self-attention distilling mechanism is introduced in the encoder. The temporal length of the feature map is halved after each distilling layer, which reduces the overall memory usage and effectively solves the problem of long inputs. The concrete representation is as follows:

$$X\_{j+1}^t = f\_{\text{MP}}\left(\text{ELU}\left(f\_{\text{Conv}}\left(\left[X\_j^t\right]\_{AB}\right)\right)\right) \tag{12}$$

where *f*<sub>MP</sub> represents the max-pooling layer function, *f*<sub>Conv</sub> denotes the convolutional layer function, and [·]<sub>*AB*</sub> is the attention unit.

The input of the decoder uses the time shield technique, and its input vector is as follows:

$$X_{\rm dec}^{\rm in} = f_{\rm Concat}\left( X_{\rm token}, X_0 \right) \in \mathbb{R}^{(L_{\rm token} + L_p) \times d_{\rm model}} \tag{13}$$

where *X*<sub>token</sub> ∈ R<sup>*L*<sub>token</sub>×*d*<sub>model</sub></sup> is the input start token, *L*<sub>token</sub> is the length of the start token, *X*<sub>0</sub> ∈ R<sup>*L<sub>p</sub>*×*d*<sub>model</sub></sup> is the zero-valued placeholder matrix, and *L<sub>p</sub>* is the length of the part to be predicted.

#### *2.4. Adaptive Loss Function*

The adaptive loss function [41] obtains a generalized loss function by introducing robustness as a continuous parameter. During the training process, the adaptive loss function automatically adjusts the robustness parameters around the minimization loss algorithm, thereby enhancing the prediction accuracy. The generalized loss function formula is as follows:

$$f(z, \beta, c) = \frac{|\beta - 2|}{\beta} \left( \left( \frac{\left(z/c\right)^2}{|\beta - 2|} + 1 \right)^{\beta/2} - 1 \right) \tag{14}$$

where *z* is the difference between the true value and the predicted value, *c* > 0 is a scale factor that controls the curvature of the quadratic function at *z* = 0, and *β* is a variable parameter that controls robustness.

By analyzing Equation (14), the adaptive loss function changes with *β*. For different values of *β*, the loss function takes the following forms:

$$L(z,\beta,c) = \begin{cases} \frac{1}{2}(z/c)^2 & \text{if } \beta = 2 \\ \log\left(\frac{1}{2}(z/c)^2 + 1\right) & \text{if } \beta = 0 \\ \sqrt{(z/c)^2 + 1} - 1 & \text{if } \beta = 1 \\ 1 - \exp\left(-\frac{1}{2}(z/c)^2\right) & \text{if } \beta = -\infty \\ \frac{|\beta - 2|}{\beta}\left(\left(\frac{(z/c)^2}{|\beta - 2|} + 1\right)^{\beta/2} - 1\right) & \text{otherwise} \end{cases} \tag{15}$$

It can be seen that, by adjusting the variable parameter *β*, the adaptive loss function can reproduce a variety of loss functions, such as the MSE, Cauchy, Charbonnier, and Welsch losses.
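The piecewise form of Equation (15) is easy to implement and check against its special cases. The sketch below is a plain NumPy illustration of the generalized loss (training-time adaptation of *β* is not shown):

```python
import numpy as np

def adaptive_loss(z, beta, c=1.0):
    """Generalized robust loss of Eq. (14) with the special cases of Eq. (15).
    beta = 2 -> (scaled) MSE, beta = 1 -> Charbonnier, beta = 0 -> Cauchy,
    beta -> -inf -> Welsch."""
    zc2 = (np.asarray(z) / c) ** 2
    if beta == 2:
        return 0.5 * zc2
    if beta == 0:
        return np.log(0.5 * zc2 + 1)
    if np.isneginf(beta):
        return 1 - np.exp(-0.5 * zc2)
    b = abs(beta - 2)
    return b / beta * ((zc2 / b + 1) ** (beta / 2) - 1)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])   # residuals true - predicted
```

At *β* = 1 the general branch reduces exactly to the Charbonnier loss, and as *β* → −∞ the loss saturates at 1, which is what makes it insensitive to large wind-power outliers.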

#### **3. Proposed Model**

#### *3.1. Improved VMD*

With a solid mathematical foundation, VMD can effectively separate the components of complex signals and greatly suppress mode mixing. However, the decomposition parameter of VMD must be given in advance, which limits the performance of data decomposition. To overcome this shortcoming in parameter setting, this paper combines the decomposition method with the MIC [42] to determine the most suitable number of decompositions *K*. The degree of decomposition is determined by calculating the MIC value between the original sequence *y* and the reconstructed sequence *y*′, and the MIC<sub>*yy*′</sub> value is positively correlated with the number of decompositions *K*. The closer the MIC<sub>*yy*′</sub> value is to 1, the less information is lost during VMD decomposition, indicating a more adequate decomposition.

#### *3.2. IVMD-FE-Ad-Informer Model Framework*

In consideration of the high volatility of wind power data, this paper introduces the improved VMD and FE methods into the Informer network with an adaptive loss function; the framework is shown in Figure 2. In the data processing stage, the original data are decomposed into *K* IMFs by IVMD. Next, FE is used to calculate the complexity of each IMF, and IMFs with similar values are reconstructed into new elements. In the model-building stage, the input variables for the model are obtained through feature selection using the MIC algorithm and then input into the robust Ad-Informer prediction model. In the results analysis stage, the wind power forecasting results are obtained by linearly superposing the predicted values of each element, followed by visualizing the forecasting curve.

**Figure 2.** The framework of IVMD-FE-Ad-Informer model.

#### *3.3. Evaluation Indexes*

The mean absolute error (MAE), root mean square error (RMSE), and coefficient of determination (R<sup>2</sup>) are used as evaluation indicators for the prediction performance of IVMD-FE-Ad-Informer and the other benchmark models. The mathematical formulas are as follows:

$$MAE = \frac{1}{N} \sum_{t=1}^{N} \left| q_{\text{true}}(t) - q_{\text{pred}}(t) \right| \tag{16}$$

$$RMSE = \sqrt{\frac{1}{N} \sum\_{t=1}^{N} \left( q\_{\text{true}}(t) - q\_{\text{pred}}(t) \right)^2} \tag{17}$$

$$R^2 = 1 - \frac{\sum\_{t=1}^{N} \left( q\_{\text{true}}\left(t\right) - q\_{\text{pred}}\left(t\right) \right)^2}{\sum\_{t=1}^{N} \left( q\_{\text{true}}\left(t\right) - \overline{q} \right)^2} \tag{18}$$

where *q*<sub>true</sub>(*t*) and *q*<sub>pred</sub>(*t*) denote the true and predicted values of wind power at time *t*, respectively, *q̄* is the mean value of *q*<sub>true</sub>, and *N* is the number of samples in the dataset.
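Equations (16)-(18) correspond directly to a few NumPy one-liners; the tiny example vectors below are illustration values only:

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error, Eq. (16)."""
    return np.abs(y_true - y_pred).mean()

def rmse(y_true, y_pred):
    """Root mean square error, Eq. (17)."""
    return np.sqrt(((y_true - y_pred) ** 2).mean())

def r2(y_true, y_pred):
    """Coefficient of determination, Eq. (18)."""
    ss_res = ((y_true - y_pred) ** 2).sum()
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()
    return 1 - ss_res / ss_tot

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.0, 2.0, 3.0, 2.0])
```

On this toy pair, a single error of 2 at the last point yields MAE = 0.5, RMSE = 1.0, and R² = 0.2, illustrating how RMSE penalizes the large residual more heavily than MAE.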

#### **4. Experiment and Analysis**

In this section, four sets of experiments are conducted on datasets with different sampling intervals, capacities, and regions. Experiment 1 describes the specific details of the data processing. Experiment 2 is an ablation experiment designed to verify the prediction performance of the hybrid model. Experiment 3 is a comparative experiment designed to verify the viability and superiority of each module. Experiment 4 verifies the applicability and stability of the proposed model on different datasets. All experiments are run in Python 3.7 with PyTorch on an Intel(R) Core(TM) i5-12500H CPU @ 4.50 GHz (12 cores), an NVIDIA GeForce RTX 3050 GPU, 16 GB of memory, and the Windows 11 operating system.

#### *4.1. Data Description*

The experiments are mainly conducted on two complete datasets without missing values. Dataset A is based on a wind farm in Gansu, China, covering 1 July to 30 September 2019, with a sampling interval of 15 min. Dataset A contains wind power, wind speeds at different heights (10 m, 30 m, 50 m, 70 m, and hub height), air temperature, air pressure, and humidity features. Dataset B was collected from the Sotavento Galicia wind farm in Spain from 18 January to 12 March 2020, with a sampling interval of 10 min. Dataset B contains only wind power, wind speed, and wind direction features. The wind power curves of the two datasets are shown in Figure 3.

The prediction process is the same for both datasets: dataset A is used for Experiments 1 to 3, and dataset B for Experiment 4. The datasets are divided into training, validation, and test sets in a ratio of 7:2:1, and the results of each experiment are obtained by averaging 10 runs. The characteristics of the datasets, including the number of samples, maximum value (Max), minimum value (Min), mean, standard deviation (Std), and coefficient of variation (COV), are shown in Table 1.
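The 7:2:1 chronological split can be sketched in a few lines (an illustrative reading of the setup, since the paper does not publish its preprocessing code):

```python
import numpy as np

def split_7_2_1(series):
    """Chronological 7:2:1 train/validation/test split, as used for
    datasets A and B (no shuffling, so the test set stays in the future)."""
    n = len(series)
    i = int(0.7 * n)
    j = i + int(0.2 * n)
    return series[:i], series[i:j], series[j:]
```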



#### *4.2. Experiment 1: The Specific Details of Data Processing*

The data-processing part mainly includes data decomposition, reconstruction of new elements, and feature selection; this subsection discusses the operation process and the selection of the corresponding parameters.

**Figure 3.** The curve of datasets.

#### 4.2.1. Data Decomposition

The IVMD algorithm solves the traditional VMD problem of selecting *K* by calculating the MICyy value. The original wind power data are fed into the IVMD model and decomposed into *K* IMFs. Based on the MICyy results for different values of *K* shown in Figure 4, the value of MICyy remains stable from *K* = 16 onward. The IMF curves after IVMD decomposition and their corresponding spectra are shown in Figure 5. By observing the principal frequencies of the different IMFs in Figure 5, it can be concluded that the proposed IVMD algorithm effectively and accurately separates each IMF.

**Figure 4.** The curve of MICyy vs. *K*.

**Figure 5.** The results of the IVMD algorithm. The IMFs curves are shown (**left**), and the spectral densities corresponding to the IMFs are shown (**right**).

#### 4.2.2. New Elements Reconstruction

The original wind power data are decomposed into 16 IMFs; feeding all of these sub-series directly into the prediction model would increase its operational burden. Therefore, the complexity of the IMFs is evaluated by FE, and IMFs with similar complexity are reconstructed into new elements. After extensive experiments, m = 2 and r = 0.25 · Std are found to be the optimal settings balancing the accuracy and running time of the model. The FE values of each IMF are shown in Figure 6, and the reconstructed new elements based on these FE values are shown in Table 2.
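For reference, fuzzy entropy with the reported settings (m = 2, r = 0.25 · Std) can be sketched as below. This follows the common Chen-style formulation; the paper's exact implementation may differ in details such as the membership function.

```python
import numpy as np

def fuzzy_entropy(x, m=2, r=None, n=2):
    """Fuzzy entropy of a 1-D series.  Defaults follow the settings
    reported in the text (m = 2, r = 0.25 * Std)."""
    x = np.asarray(x, dtype=float)
    if r is None:
        r = 0.25 * x.std()

    def phi(mm):
        count = len(x) - mm
        # embed the series and remove each template's own mean
        templates = np.array([x[i:i + mm] for i in range(count)])
        templates -= templates.mean(axis=1, keepdims=True)
        # Chebyshev distance between every pair of templates
        dist = np.abs(templates[:, None, :] - templates[None, :, :]).max(axis=2)
        member = np.exp(-(dist / r) ** n)   # fuzzy membership degree
        np.fill_diagonal(member, 0.0)
        return member.sum() / (count * (count - 1))

    return float(np.log(phi(m)) - np.log(phi(m + 1)))
```

A regular signal (e.g., a sine wave) yields a low FE value, while white noise yields a high one, which is exactly the property used to group IMFs of similar complexity.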

**Figure 6.** The FE value of each IMF.



#### 4.2.3. Feature Selection

The computational efficiency and generalization ability of the model can be improved by removing irrelevant or redundant features from the original dataset. Therefore, MIC is used to analyze the correlation between the meteorological features and each element, and to extract the typical features reflecting each element through the MIC value. The confusion matrix of MIC is given in Figure 7. It can be seen from Figure 7 that the influential features differ across elements, reflecting overall correlations and local characteristics, respectively. In order to select the features with the highest relevance to build the input variables, the MIC threshold for each element is set to 0.5. The feature selection results are shown in Table 3.
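The thresholding step can be sketched generically; `score_fn` stands for MIC (or any dependence measure scaled to [0, 1]) and all names here are illustrative:

```python
import numpy as np

def select_features(feature_table, target, score_fn, threshold=0.5):
    """Keep the features whose dependence score with the target element
    reaches the 0.5 threshold used in the paper.  score_fn would be MIC;
    any dependence measure mapped to [0, 1] works for this sketch."""
    return [name for name, column in feature_table.items()
            if score_fn(column, target) >= threshold]
```

For example, with an absolute-correlation stand-in for MIC, a feature that closely tracks the target element is kept while an unrelated one is dropped.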

**Figure 7.** The confusion matrix of MIC.

**Table 3.** The feature selection results.


#### *4.3. Experiment 2: Ablation Experiment*

The purpose of the ablation experiments is to verify whether the complex hybrid model improves prediction accuracy compared to simpler combined models and single models. The selected benchmark models are IVMD-FE-Ad-Informer, Ad-Informer, and Informer, of which the Informer model uses the MSE loss function. The parameters of Ad-Informer are obtained using the grid-search method, with the robustness parameter β adaptively adjusted by the Adam optimizer. The specific parameters are shown in Table 4. The input size of the encoder and decoder is equal to the number of input variables of the model. The prediction curves of each sub-mode after training the IVMD-FE-Ad-Informer model are shown in Figure 8, and the wind power prediction results are obtained by superimposing them. The final forecasting curves of the ablation experiment are shown in Figure 9, and the forecasting errors are shown in Table 5. Figure 9 not only portrays the overall trend of the test set but also magnifies the values from position 300 to 480 to offer an in-depth analysis of the predicted results. This segment is chosen because the wind power data from the 300th to the 480th position display more sudden changes and a wider range of variation, thus providing a more comprehensive evaluation of the predictive performance of the proposed model.


**Table 4.** Parameter setting of the Ad-Informer.

From Figure 9, it can be seen that the Ad-Informer model is significantly closer to the true value than the Informer model at the inflection points, indicating that the proposed adaptive function can effectively mitigate the impact of errors at abrupt points. Compared with the Ad-Informer model, the IVMD-FE-Ad-Informer model is closer to the real values, indicating that the data-processing method can reduce the time delay in the prediction process. According to Table 5, the Ad-Informer model requires less time than the Informer model, mainly owing to the automatic adjustment of the adaptive loss function, which enables the model to obtain the optimal loss during training and enhances its robustness. The hybrid model proposed in this paper shows significant improvements over the Ad-Informer and Informer models, with decreases of 45.09% and 59.67% in MAE and 44.4% and 55.44% in RMSE, and increases of 11.42% and 22.72% in R<sup>2</sup>, respectively. By comparing the considered models, it can be seen that IVMD-FE-Ad-Informer decomposes the original wind power data at a finer granularity, which better explores the internal features of wind power, resulting in a significant improvement in both prediction accuracy and performance.

**Table 5.** Forecasting errors of ablation experiment.


#### *4.4. Experiment 3: Comparative Experiment*

To verify the superiority of each module, EMD-FE-Ad-Informer, IVMD-FE-Informer, IVMD-FE-LSTM, LSTM, and ANN are used as benchmark models in the comparison experiments; the parameter settings of ANN and LSTM are the same as in [19,43]. EMD decomposes the wind power data into 11 IMFs by the trial-and-error method, and these IMFs are then reconstructed into three new components (IMF1~IMF3, IMF4~IMF6, and IMF7~IMF11) using FE. The forecasting curves of the different models are shown in Figure 10, the forecasting errors in Table 6, and the boxplots of the forecasting errors for each model in Figure 11.

**Figure 8.** The prediction curve results of each sub-mode. (**a**) Element 1 prediction curve; (**b**) Element 2 prediction curve; (**c**) Element 3 prediction curve; (**d**) Element 4 prediction curve; (**e**) Element 5 prediction curve.

**Figure 9.** The forecasting curves of the ablation experiment. The overall forecasting trends are shown at the (**top**), and the local enlargement is shown at the (**bottom**).


Based on the results in Table 6, the IVMD-FE-Ad-Informer model outperforms the single prediction model and the other hybrid models across all evaluation metrics: MAE decreased by about 35.68–60.32%, RMSE by about 36.11–59.67%, and R<sup>2</sup> increased by about 5.64–30.78%. According to Table 6 and Figure 10, the IVMD algorithm has superior data-decomposition ability compared to the traditional EMD algorithm under similar data processing. This improved ability enables IVMD to more effectively reduce non-smooth features in the original data, resulting in smoother data and improved wind power prediction accuracy. Furthermore, the prediction accuracy of Ad-Informer is much higher than that of Informer and LSTM for the same data-processing method, with R<sup>2</sup> of 0.925, 0.889, and 0.808, respectively. While IVMD-FE-Ad-Informer is relatively time-consuming because the Ad-Informer prediction module is run five times after the IVMD-FE preprocessing, it follows the actual curve most closely and produces the smallest forecasting errors. This indicates that the model proposed in this paper is an optimal combined model with high prediction performance.

**Figure 10.** The forecasting curves of different models. The overall forecasting trends are shown at the (**top**), and the local enlargement is shown at the (**bottom**).

**Figure 11.** The boxplots of different models.

#### *4.5. Experiment 4: The Stability of IVMD-FE-Ad-Informer Forecasting*

The experimental results demonstrate that IVMD-FE-Ad-Informer outperforms the other benchmark models on dataset A and exhibits considerable wind power prediction ability. However, the statistical distributions of wind power data vary across time intervals, regions, and capacities, which may lead to unstable forecasting. Therefore, the stability and applicability of the model need further discussion. In this section, EMD-FE-Ad-Informer, Ad-Informer, LSTM, and ANN are used as benchmark models on dataset B, which was collected from the Sotavento Galicia wind farm in Spain at 10 min sampling intervals. The parameter-setting method of this experiment is the same as in Experiment 3, and the specific parameter settings of each algorithm are shown in Table A1 in Appendix A. The forecasting curves for dataset B are shown in Figure 12, and the forecasting errors in Table 7.

**Table 7.** Forecasting errors of different datasets.


According to Figure 12 and Table 7, the results obtained on dataset B are comparable to those on dataset A, indicating that the proposed model has high stability and generalization ability across datasets. From Figure 12, IVMD-FE-Ad-Informer exhibits the closest fit to the true values among all the considered models, with EMD-FE-Ad-Informer following closely behind. It can also be seen from Table 7 that IVMD-FE-Ad-Informer has the best prediction performance with regard to MAE, RMSE, and R<sup>2</sup>, which are 83.01 kW, 60.43 kW, and 0.962, respectively. The results further confirm that the IVMD algorithm is a superior and effective method for wind power data decomposition.

Based on the experimental results on the two different datasets, it is apparent that IVMD-FE-Ad-Informer outperforms the other benchmark models in terms of all evaluation metrics, and its prediction curves fit the true values most closely. Meanwhile, the COV value is introduced for further analysis of the influence of the dataset on prediction accuracy. This value is a typical indicator of the degree of data fluctuation, with more volatile data having a higher COV value [44]. It can also be concluded that the prediction accuracy of the proposed model is inversely related to the degree of fluctuation in the original data. For example, when using Ad-Informer to forecast wind power on dataset A, R<sup>2</sup> is 0.866, whereas on dataset B, which has a higher COV, R<sup>2</sup> is slightly lower at 0.858. Furthermore, the superiority of the proposed model in terms of prediction performance becomes more prominent as the original wind power sequence contains more nonlinear features. The outstanding contribution is the development of an adaptive loss function, which can accurately identify and predict violent changes in wind power, thereby effectively mitigating the impact of outliers.
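The COV indicator itself is simply the ratio of standard deviation to mean:

```python
import numpy as np

def cov_indicator(x):
    """Coefficient of variation Std/Mean: the higher the COV, the more
    volatile the wind power series (used to compare datasets A and B)."""
    x = np.asarray(x, dtype=float)
    return float(np.std(x) / np.mean(x))
```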

**Figure 12.** The forecasting curves of different datasets. The overall forecasting trends are shown at the (**top**), and the local enlargement is shown at the (**bottom**).

#### **5. Conclusions**

The actual operation of wind farms is influenced by various factors such as weather conditions, seasonal variation, and atmospheric circulation, which can lead to numerous outliers and non-smooth features in the wind power data. The presence of such factors creates many obstacles to further improving the accuracy and performance of wind power prediction. Thus, an adaptive hybrid model for wind power prediction based on improved VMD, FE, and Informer in conjunction with an adaptive loss function is proposed in this paper. The IVMD-FE-Ad-Informer model is a promising hybrid model that enables adaptive forecasting of stochastically fluctuating wind power data, and its main advantages are summarized as follows:


As can be seen from the above, the hybrid wind power prediction model, which combines the advantages of several algorithms, has higher prediction accuracy and better robustness. However, some aspects of this study need to be improved in the future. Firstly, only the correlation factor is considered in feature selection, while F-score and sensitivity factors are not taken into account; in future work, the relationships between other variables and wind power will be analyzed using F-score and sensitivity to reduce the redundancy of massive data. Secondly, the parameter selection in this paper may not be precise enough; to address this issue, an optimization algorithm will be introduced to overcome the sensitivity of deep-learning network parameter selection.

**Author Contributions:** Y.T. and D.W. described the proposed framework and wrote the whole manuscript; Y.T. implemented the simulation experiments; G.Z. and J.W. collected data; Y.N. and S.Z. revised the manuscript. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research is supported in part by the Natural Science Foundation of China under Grant 52077027 and in part by the Liaoning Province Science and Technology Major Project No. 2022021000014.

**Institutional Review Board Statement:** Not applicable.

**Data Availability Statement:** The data used to support the findings of this study are available from the corresponding author upon request.

**Acknowledgments:** The authors thank the chief editor and the reviewers for their valuable comments on how to improve the manuscript.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A**

**Table A1.** The parameter settings of each algorithm on dataset B.



#### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

### *Article* **Prediction of Faults Location and Type in Electrical Cables Using Artificial Neural Network**

**Ana-Maria Moldovan \* and Mircea Ion Buzdugan**

Faculty of Building Services Engineering, Technical University of Cluj-Napoca, 28 Memorandumului Str., 400114 Cluj-Napoca, Romania

**\*** Correspondence: ana.moldovan@insta.utcluj.ro

**Abstract:** Detecting and locating faults in electrical cables has been a permanent concern in electrical power distribution systems. Over time, several techniques have been developed to manage these faulty situations; such techniques must be fast and accurate but, above all, efficient. This paper develops a new approach for detecting, locating, classifying, and predicting faults, particularly different types of short-circuits in electrical cables, based on a robust artificial neural network (ANN) technique. The novelty of this approach lies in the method's ability to predict a fault's location and type. The proposed method uses the Matlab and Simulink platform and comprises four consecutive stages. The first is devoted to the development of the Simulink model. The second involves a large number of simulations to generate the dataset necessary for training and testing the ANN model. The third stage uses the ANN to classify the location and the type of potential faults. Finally, the fourth stage consists of predicting the location and the type of future faults. In order to reduce the time and the resources of the simulation process, a virtual machine is used. The study reveals the efficiency of the method and its ability to successfully predict faults in real-world electrical power systems.

**Keywords:** electrical cables; detecting, locating and predicting faults; artificial neural network; Classification Learner app

**Citation:** Moldovan, A.-M.; Buzdugan, M.I. Prediction of Faults Location and Type in Electrical Cables Using Artificial Neural Network. *Sustainability* **2023**, *15*, 6162. https://doi.org/10.3390/su15076162

Academic Editors: Luis Hernández-Callejo, Sergio Nesmachnow and Sara Gallardo Saavedra

Received: 7 March 2023 Revised: 29 March 2023 Accepted: 30 March 2023 Published: 3 April 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

#### **1. Introduction**

Today, people's lives are entirely dependent on the sustainability of electrical power systems, which makes continuity of supply in electrical power distribution systems mandatory. In this respect, electrical cables have the important role of linking all components of a power system.

The method depicted in the following sections aims to contribute to a higher degree of sustainability of distribution power systems by accelerating the maintenance process in fault cases, owing to its accuracy in predicting the location and the type of faults in energy cables.

A fault in a cable directly affects the sustainability of the system and the duration of a power outage, so it is crucial to ensure the cables' integrity during their entire operation time [1–3]. However, if any defect occurs in a cable, the reaction must be as fast as possible to minimize the clearing time [4,5].

Methodologies of detecting faults in electrical cables have evolved with the advancements in technology, with several methods being implemented: time domain reflectometry technique, impedance-based method, knowledge-based method, traveling wave methods or hybrid methods [6]. Each of them has its benefits and its limitations [7,8]. For instance, time domain reflectometry can be successfully used in the case of a single cable, being useless for systems that have more than two branches [7,9,10].

Algorithms based on artificial intelligence (AI) propose solutions that are able to manage these more complex systems [11–13]. The artificial neural network technique (ANN) provides efficient pattern recognition algorithms that can be applied in predicting, locating and classifying faults [14,15]. The ANN technique is able to solve nonlinear problems, based on learned experiences, which implies different possible configurations of the electrical distribution systems [16–20]. At the same time, the main ANN algorithms features are robustness, generalizability, and noise immunity [21–23].

A comprehensive method of detecting and locating faults in electrical cables should return their exact type and location [24–26]. For a three-phase electrical power system, the fault types are interruptions or short-circuits, the latter comprising: single-line-to-ground fault, line-to-line fault, double-line-to-ground fault, three-line fault, and three-line-to-ground fault [27–29].

This paper presents an efficient method of detecting, locating, and predicting different types of short-circuits in electrical cables. To develop this method, a distribution electrical system has been modeled in Simulink, followed by the use of the ANN technique of the Classification Learner app available in Matlab.

The model selected for analysis contains a three-phase source block of 20 kV and several distributed parameters line blocks with a total of 22 km of cables. The ANN technique is based on data generated by the Simulink model of the electrical distribution system. The method aims to reach a high rate of validation accuracy of the trained model delivered by the Classification Learner app. After running a large number of simulations, more precisely 6150 simulations, a 98% rate of validation accuracy was obtained.

The generated data represent the training dataset for the ANN algorithm and have an important impact on the accuracy of the method, since the performance of the method increases with the complexity of the dataset. The case presented below highlights the complexity and the advantages of using ANN methods in predicting, detecting, locating, and classifying short-circuits in complex distribution electrical power systems.

The article is structured in five sections. The present section is the introductory one, highlighting the importance of detecting faults in electrical power distribution systems. The second, entitled Materials and Methods, presents the working principle of the method. The Results section then comprises four consecutive stages: the development of the Simulink model; a large number of simulations to generate the dataset necessary for training and testing the ANN model; the use of the ANN to classify the location and the type of potential faults; and the prediction of the location and the type of future faults. The fourth and fifth sections are devoted to the discussion, conclusions, and further work.

#### **2. Materials and Methods**

The need to detect and locate faults in electrical systems has generated different methods in the attempt to solve these problems. In the present section, a new approach involving a simulated model of a distribution electrical power system, combined with the benefits of ANN applications, will be presented.

As mentioned above, the method comprises four consecutive stages. The first one is devoted to the development of the model, and the second one presents a large number of simulations in order to generate a dataset necessary for training and testing the ANN model, while the third stage uses ANN to classify the location and the type of faults. Finally, the fourth stage consists of predicting the location and the type of potential future faults.

After modeling the distribution electrical system in Simulink (R2022a), several simulations were performed for different types of short-circuits in different locations of the cables. The results of all simulations have been saved in a database which became the training dataset for the Classification Learner app from Matlab (R2022a). The input data for the Classification Learner app are the measured values of the voltage and current, the responses being either the location of the faults or both their location and type. Based on the trained neural network, the location and the type of a further fault can be predicted.

The model was created using the blocks contained in the Simscape (R2022a) electrical library, a library dedicated to electrical power systems. The developed simulation model contains a three-phase source block, distributed parameters line blocks, three-phase voltage-intensity (VI) measurement blocks, three-phase load blocks, and a three-phase fault block able to induce faults in different locations of the cables. The model and these blocks will be detailed in the next section.

Since the process of simulating the model that provides the training set for the neural network is time-consuming, it had to be automated. The automation consisted of writing a Matlab code that ran these simulations, modifying the parameters at each simulation and saving the data delivered by the measurement blocks.

Once the database is accessible, the whole set of simulation results is introduced into the Classification Learner app from Matlab, and the training process may start. In this application, different types of artificial intelligence algorithms are available, such as decision trees, discriminant analysis, naive Bayes classifiers, support vector machines, nearest neighbor classifiers, ensemble classifiers, and neural network classifiers [30]. The Classification Learner app allows training all of these algorithms and comparing their validation accuracy, enabling the choice of the most efficient one. For the present case, the most accurate algorithm turned out to be the medium neural network model. This model can be exported into the Matlab workspace and later used to predict the trained model's response for another set of measurements corresponding to a further fault, thanks to the versatility of the Matlab and Simulink platform.
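For readers outside Matlab, the nearest-neighbor option among those algorithms can be illustrated with a brief NumPy sketch on synthetic predictor/response data (illustrative only; the study itself trained its models in the Classification Learner app, where the medium neural network proved most accurate):

```python
import numpy as np

def knn_classify(X_train, y_train, X_query, k=3):
    """Majority-vote k-nearest-neighbor classifier on synthetic data.
    The predictors play the role of the measured voltage-current
    features; labels play the role of the fault location/type response."""
    dists = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]   # indices of k closest rows
    votes = y_train[nearest]                     # their class labels
    return np.array([np.bincount(v).argmax() for v in votes])
```

A query whose measured features resemble a given training scenario is assigned that scenario's label, which is the same predictor/response idea the app applies at much larger scale.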

The presented method is synthesized in the process diagram in Figure 1.

**Figure 1.** The process diagram of the method presented.

#### **3. Results**

As stated above, the case under analysis proposes a method of detecting and locating faults in electrical systems based on the medium neural network algorithm, which can be successfully used in solving faults detection, location, and prediction.

#### *3.1. Simulink Model*

The first stage of the presented method is devoted to the development of the Simulink model of a distribution electrical system. The model concept consists of inserting different types of short-circuits in different locations of the system and observing their influence on the measured voltage–current pairs. These observations provide the training dataset used in the third stage.

As mentioned in Section 2, the Simscape electrical library was used to develop the simulation model. The selected model of the distribution electrical system, presented in Figure 2, contains: a three-phase source block, depicted in green; six subsystems for the six lines L1 to L6, depicted in dark green, which contain the distributed parameters line blocks; eight three-phase voltage-intensity (VI) measurement blocks B1 to B8, depicted in blue; three-phase load blocks, noncolored; a three-phase fault block, bordered in red; and powergui, the environment block for Simscape electrical specialized power system models, set to the discrete simulation type with a sample time of 2 × 10<sup>−6</sup> s.

**Figure 2.** Distribution electrical system—Simulink model.

To introduce the fault block in different locations, each line is divided into three or four sectors of 1 km length, totaling 22 sectors (see Figure 3 and Table 1). As can be seen in Figure 4, the sectors are modeled using three-phase distributed parameter line blocks.

**Figure 3.** Fault block connected to L1—Sector 4.

**Table 1.** Subsystems of Lines.



**Figure 4.** Distributed parameter line block.

The three-phase fault block can be set for twelve types of faults, the no-fault situation included, together with the fault resistance and the ground resistances (see Table 2). The letters a, b, and c indicate the three power lines, while g indicates the ground plane.

**Table 2.** Types of faults.


The three-phase voltage-intensity VI measurement blocks are mandatory to collect the values of the voltage–current pairs. As an example, Figure 5 presents the three-phase VI measurement block B1.


**Figure 5.** Three-phase VI measurement block—B1.

All measurements collected from the Simulink model are exported into the Matlab data acquisition workspace. Its workflow is described in the structure depicted in Figure 6a,b. For the sake of clarity, a cropped detail is presented in Figure 7.

**Figure 6.** Data acquisition from Simulink model. (**a**) first half of the structure (A-A); (**b**) second half of the structure (B-B).

**Figure 7.** Data acquisition from Simulink model (cropped detail).

The voltage and current values are passed from the three-phase VI measurement blocks to the sequence analyzer, and then are exported to the Matlab variables. At each simulation, data are saved in a table which in the end will become the training dataset for the artificial intelligence algorithm.

#### *3.2. Simulation Process*

After implementing the model of the distribution electrical system, the simulation process of different types of short-circuits may start for different values of fault and ground resistances in different locations of the system.

The fault block is moved along the system and positioned at the end of each sector of the six lines, totaling 22 positions. For each of these positions, and for 25 combinations of the chosen fault and ground resistance values, all twelve types of faults are simulated. Performing all these simulations requires changing, for each simulation, the location of the fault block, the type of fault, and the values of the fault or ground resistances. The time needed to run one simulation is approximately two minutes, not counting the process of changing the parameters, which means that fewer than 30 simulations can be performed per hour. Due to the large number of necessary simulations and the time required to perform them, automation was almost mandatory.

The automation process has been achieved by implementing a Matlab program that runs the Simulink model. At each run, either the fault block position or the fault block parameters are automatically changed. The automation thus reduces the time to about 1.5 min per simulation, benefiting the duration of the entire simulation process. Data from the measurement blocks are saved and used later for training the ANN.

The code of the program is presented in Appendix A.

Once the running process is completed, the 6150 simulations performed led to a dataset, which represents the input of the Classification Learner app. Examples of these data can be seen in Tables 3 and 4, which contain examples of the voltage–current values, provided by the eight measurement blocks along with the faults' location and type.


**Table 3.** Data from simulations—voltage measurements.

**Table 4.** Data from simulations—current measurements.


Through the sequence analyzer in the data acquisition part of the Simulink model, the magnitude and phase angle of the three-phase signals are obtained. For instance, measurement block B1 provides the voltage magnitude V\_B1, the voltage phase angle F\_B1, the current magnitude I\_B1, and the current phase angle FI\_B1 (see also Figure 6a,b).

#### *3.3. Classification Learner App*

Once the simulation process is completed, the large amount of obtained data is used in the Classification Learner App from Matlab to train the artificial intelligence (AI) algorithms. This application classifies data based on the training dataset and returns a single response for a new situation [30].

To start the training session, it is necessary to set the parameters shown in Figure 8. The table named "DataTable", containing the results of the 6150 simulations, becomes the dataset variable in the Classification Learner App. Data from this table are divided into two types, namely predictors and response. The predictors are the voltage–current pairs measured at each simulation, and the response is the fault location of the presented situation. The validation scheme was set to cross-validation with five folds. After setting these parameters, the session starts by clicking the Start Session button.
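The five-fold cross-validation scheme can be sketched in pure Python (illustrative only; the Classification Learner app performs this partitioning internally): each fold serves once as the validation set while the other four are used for training.

```python
# Minimal k-fold index partitioning, sketched for the 6150-sample dataset.
def five_fold_indices(n_samples, n_folds=5):
    """Yield (train_idx, val_idx) pairs partitioning range(n_samples)."""
    fold_size = n_samples // n_folds
    for k in range(n_folds):
        start = k * fold_size
        stop = start + fold_size if k < n_folds - 1 else n_samples
        val = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, val

folds = list(five_fold_indices(6150))
print(len(folds))                            # 5 folds
print(len(folds[0][1]))                      # 1230 validation samples per fold
print(len(folds[0][0]) + len(folds[0][1]))   # 6150: every sample is used
```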


**Figure 8.** Classification Learner app—training dataset.

Simulations have also been performed for the cases in which the response comprises both the location and the type of the fault.

In the Classification Learner app, the model of the training algorithm can be selected. To obtain the best validation accuracy, the option of training all model types may also be chosen.

After analyzing several training models, Figure 9 reveals that, when the models are sorted by accuracy, the most efficient is the medium neural network algorithm. The validation accuracy was 98% in the case of fault location and 94.7% in the case of both location and type of fault.

If the response is only the fault location (e.g., L1/S1 for line 1/sector 1), which implies 23 unique answers, the validation accuracy is better than when the response is both the location and the type of the fault (e.g., L1/S1/ab for line 1/sector 1/fault type ab), which implies 243 unique answers. These two cases are compared in the next stage, where the fault location, and the fault location and type, are predicted based on the trained model.

Two useful tools for evaluating the performance of a trained model are the confusion matrix and the receiver operating characteristic (ROC) curve.


**Figure 9.** Classification Learner app.

The confusion matrix contains the predicted classes in its columns and the true classes in its rows; therefore, a 100% validation accuracy corresponds to a confusion matrix whose entries all lie on the principal diagonal. Values outside the principal diagonal indicate situations that are not well predicted and require a supplementary training dataset [30].
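How such a matrix is tallied, and why a perfect classifier yields a purely diagonal one, can be sketched in a few lines of Python (illustrative only; the class labels below are hypothetical examples in the paper's line/sector notation):

```python
# Tally a confusion matrix from true/predicted label lists and derive
# accuracy from its principal diagonal.
from collections import Counter

def confusion_matrix(true, pred, classes):
    """counts[(t, p)] = number of samples of true class t predicted as p."""
    counts = Counter(zip(true, pred))
    return [[counts[(t, p)] for p in classes] for t in classes]

classes = ["normal", "L1/S1", "L1/S2"]
true = ["normal", "L1/S1", "L1/S2", "L1/S2"]
pred = ["normal", "L1/S1", "L1/S2", "L1/S1"]  # one misclassification

m = confusion_matrix(true, pred, classes)
accuracy = sum(m[i][i] for i in range(len(classes))) / len(true)
print(m)         # the off-diagonal entry marks the L1/S2 -> L1/S1 error
print(accuracy)  # 0.75
```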

In Figures 10 and 11, the validation confusion matrices for the two studied cases are presented. From Figure 10, which presents the case of 23 unique responses, one can observe that the only situation with 100% accuracy is the no-fault (normal) situation. In Figure 11, one can also observe the shape of the principal diagonal in the case of 243 unique responses. Due to the complexity of the simulation, the values outside the principal diagonal are not visible at the resolution of Figure 11. The 243 classes containing the location and type of faults are presented in Table 5.

From the confusion matrix depicted in Figure 10, it can be observed that for lines 5 and 6 of the Simulink model the prediction response is poorer, as shown by the larger number of values that deviate from the principal diagonal. For the other lines, the situation is better, with fewer cases where the predicted class does not coincide with the true class.

Even though some values lie far from the principal diagonal, their low counts indicate that a wrong prediction is unlikely. For instance, for the predicted class L4/S4, there is a single situation in which the true class is in fact L1/S2.

Unfortunately, the confusion matrix for the case with 243 unique responses is not useful for identifying the classes that were not well predicted. This difficulty can be alleviated using other tools provided by the Classification Learner app, one of them being the ROC curve, an efficient way of comparing trained models.

**Figure 10.** Validation confusion matrix for medium neural network model—accuracy 98.0%—for location of the fault.

**Figure 11.** Validation confusion matrix for medium neural network model—accuracy 94.7%—for location and type of fault.


**Table 5.** Location and type of faults.

The ROC curve is a plot that shows the false positive rate and the true positive rate for each predicted class. The area under the curve (AUC) is an indicator of the quality of the classifier. AUC values range between 0 and 1, with a higher value indicating better classifier performance [30].
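The AUC can be sketched in pure Python through its rank-statistic interpretation: for a binary classifier, the AUC equals the probability that a randomly chosen positive sample is scored higher than a randomly chosen negative one (an illustrative implementation, not the one used by the Classification Learner app):

```python
# AUC via the rank (Mann-Whitney) statistic, with ties counted as half wins.
def auc(scores, labels):
    """labels are 1 (positive) / 0 (negative); scores are classifier outputs."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Perfect separation gives AUC = 1.0; one inversion lowers it.
print(auc([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))  # 1.0
print(auc([0.9, 0.3, 0.4, 0.1], [1, 1, 0, 0]))  # 0.75
```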

In Figures 12 and 13, the ROC curves for some representative cases studied are presented. Figure 12 presents the ROC curve for the medium neural network model with an accuracy validation of 98.0% and Figure 13 presents the ROC curve for the medium neural network model with an accuracy validation of 94.7%. In both figures, the highest and the lowest values of AUC are shown.

For the model which predicts only the location of the fault, has a validation accuracy of 98.0%, and encompasses 23 situations, the maximum AUC value (1.00) is reached for several classifiers. Figure 12a presents the ROC curve for one of these situations, while Figure 12b presents the lowest AUC value, which in this case is 0.99 and occurs for Line 6-Sector 2.

For the model which predicts both location and type of the fault, has a validation accuracy of 94.7%, and encompasses 243 situations, the maximum AUC value (1.00) is also reached for several classifiers. Figure 13a presents the ROC curve for one of these situations, while Figure 13b presents the lowest AUC value, which in this case is 0.94 and also occurs for Line 6-Sector 2.

Both situations reveal good prediction accuracy.

Analyzing the ROC curves and comparing their results with those of the confusion matrices shows that the two tools convey the same information in different ways: while the confusion matrix presents an overview of all classes, the ROC curve presents specific results for each class.

**Figure 12.** ROC curve for medium neural network model—accuracy 98.0% (**a**) the AUC 1.00 for Line 4-Sector 4, (**b**) the AUC 0.99 for Line 6-Sector 2.

**Figure 13.** ROC curve for medium neural network model accuracy—94.7%—(**a**) the AUC 1.00 for Line 4-Sector 4, (**b**) the AUC 0.94 for Line 6-Sector 2.

#### *3.4. Prediction of Faults Based on the Trained Model*

After analyzing the different types of trained models, they can be exported from the Classification Learner app to the Matlab workspace as a new variable ("trainedModel"). This variable can be used to predict responses for other faults which may occur in the same distribution system and which were not included in the training dataset. As an example, Tables 6 and 7 contain four situations in which different fault and ground resistances have been used.

**Table 6.** Data for predictions—voltage measurements.


**Table 7.** Data for predictions—current measurements.


Measurements generated for the predictions must be placed in a variable ("Ttest") with the same structure as the training variable ("DataTable"). After processing the variable to be tested, applying the prediction function of the trained model yields the response based on the trained model.
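The structural requirement on ("Ttest") amounts to a schema check, sketched below in Python (hypothetical and illustrative; the column names are abridged from Tables 3 and 4, and the sample values are invented):

```python
# Verify that a test record exposes exactly the same columns as the
# training table before the prediction function is applied.
TRAIN_COLUMNS = ["V_B1", "F_B1", "I_B1", "FI_B1", "Fault"]  # abridged

def check_schema(test_row, train_columns=TRAIN_COLUMNS):
    missing = [c for c in train_columns if c not in test_row]
    extra = [c for c in test_row if c not in train_columns]
    if missing or extra:
        raise ValueError(f"schema mismatch: missing={missing} extra={extra}")
    return True

row = {"V_B1": 11.2, "F_B1": -0.4, "I_B1": 0.8, "FI_B1": -28.1,
       "Fault": "L1/S1/ab"}
print(check_schema(row))  # True
```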

Figures 14 and 15 present the responses for the two studied cases, i.e., for the medium neural network model with an accuracy of 98.0% and for the medium neural network model with an accuracy of 94.7%.

**Figure 14.** Predicted results for test examples (for medium neural network model accuracy 98.0%—for the fault location).

**Figure 15.** Predicted results for test examples (for medium neural network model accuracy 94.7%—for location and type of the fault).

Applying the prediction function to Tables 6 and 7, which contain the voltage and current measurement data, creates two cell arrays, namely two column vectors (see Figures 14 and 15).

It can be observed that in the first case the response contains only the location of the fault, while in the second case the response additionally includes the type of the fault. By comparing the result of the prediction function with the last column ("Fault") of the ("Ttest") table, one can verify that the trained model operates properly.

#### **4. Discussion**

The paper presents a solution for detecting and locating faults in electrical distribution systems using a Simulink model combined with the ANN algorithms of the Classification Learner app provided by Matlab.

The high performance of the proposed technique emphasizes the potential of using this principle for real-world distribution electrical systems.

After the first stage, devoted to the development of the simulation model, and the simulations presented in the second stage, the third stage shows that the performance of the trained model is correlated with the training dataset: the larger the dataset, the higher the performance of the trained model. To obtain good validation accuracy, the number of simulations must be adapted to the complexity of the analyzed system.

Simulating all the cases that can provide a solid test set for the presented case study requires high-level hardware and software resources. These needs can be covered using local hardware; however, much better results can be obtained using a virtual machine, which can significantly shorten the required simulation time. If several virtual machines are used simultaneously, the simulation process can be accelerated, and in the end the data collected from all of them can be processed and integrated into a single database.

When applying the method presented above to a real-world distribution electrical system, the Simulink model must contain that system's actual components. This method can also be used in other situations, for instance in industrial estates, where the consumers, the components of the system, and their structure are already well known.

#### **5. Conclusions**

The developed method has good accuracy, and its use in real-world situations is therefore recommended. The trained model which predicted the location of faults (with 23 possible responses) had a validation accuracy of 98%, while the trained model which predicted both location and type of faults (with 243 possible responses) had a slightly lower validation accuracy (94.7%). Both trained models were based on the same training dataset (measurements from 6150 simulations), which shows that a larger number of possible responses requires a correspondingly larger training dataset to maintain good validation accuracy.

The major advantage of the presented method lies in its good precision in detecting, locating, and predicting faults. At the same time, running a large number of simulations can be a tedious operation; the simulation time is closely related to the complexity of the Simulink model and to the number of parameters that must be modified at each simulation. Additionally, the developed Simulink model of the electrical power system can also be used to evaluate updates of the system if necessary, for example to simulate the impact of a new system structure or to analyze possible future improvements of the electrical system.

Methods of detecting, locating, and predicting faults in electrical power systems based on ANN algorithms could also solve complex problems encountered in electric lines or branched cable systems, situations that are difficult to manage using classic fault detection methods.

In further research, this method could be improved by simulating dynamic load changes and by developing a user-friendly graphical interface for the application.

The presented topic, fault detection and location in electrical systems, was and remains a permanent concern of utility companies, which continuously seek to improve and adapt their own methods in order to ensure the optimal operation of their electrical power systems.

**Author Contributions:** Conceptualization, A.-M.M. and M.I.B.; methodology, A.-M.M.; software, A.-M.M.; validation, A.-M.M. and M.I.B.; formal analysis, M.I.B.; investigation, A.-M.M.; resources, A.-M.M.; data curation, M.I.B.; writing—original draft preparation, A.-M.M.; writing—review and editing, A.-M.M. and M.I.B.; visualization, M.I.B.; supervision, M.I.B.; project administration, A.-M.M. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

#### **Appendix A**

```
% Appendix A: Matlab program automating the fault simulations (Section 3.2).
T = [];
model = 'name_model';
tstop = 0.1;
Line.Name = {'L1','L2','L3','L4','L5','L6'};
Line.Sectors = [4 4 4 4 3 3];
Faults = {'0','ag','bg','cg','ab','bc','ac','abg','bcg','acg','abc','abcg'};
nLine = 1:6;
nFault = 1:12;
Faulty.Lines.Name = {Line.Name{nLine}};
Faulty.Sectors = Line.Sectors(nLine);
Type_Faults = {Faults{nFault}};
open_system(model);
for iFaultyLines = 1:length(Faulty.Lines.Name)
  for iFaultType = 1:length(Type_Faults)
    Fault_Type = Faults(iFaultType);
    LinePath = [model,'/',Faulty.Lines.Name{iFaultyLines}];
    open_system(LinePath);
    Nsectors = Faulty.Sectors(iFaultyLines);
    FaultBlock = [LinePath,'/Fault'];
    try
      addbd = add_block([model,'/Fault'],FaultBlock,'Commented','off','Position',[545
    catch
      Position = get_param(FaultBlock,'Position');
      delete_block(FaultBlock);
      add_block([model,'/Fault'],FaultBlock,'Commented','off','Position',Position + 10);
    end
    % enable the phases and ground involved in the current fault type
    if contains(Fault_Type,'g')
      set_param(FaultBlock,'GroundFault','on')
    else
      set_param(FaultBlock,'GroundFault','off')
    end
    if contains(Fault_Type,'a')
      set_param(FaultBlock,'FaultA','on')
    else
      set_param(FaultBlock,'FaultA','off')
    end
    if contains(Fault_Type,'b')
      set_param(FaultBlock,'FaultB','on')
    else
      set_param(FaultBlock,'FaultB','off')
    end
    if contains(Fault_Type,'c')
      set_param(FaultBlock,'FaultC','on')
    else
      set_param(FaultBlock,'FaultC','off')
    end
    for iSectors = 1:Nsectors
      NameBlock = [LinePath,'/S',num2str(iSectors)];
      hFault = get_param(FaultBlock,'PortHandles');
      hBlock = get_param(NameBlock,'PortHandles');
      hLines = add_line(LinePath,hFault.LConn,hBlock.RConn);
      Ron = logspace(log10(1e-4),log10(20),5); % fault resistance values
      Rg = logspace(log10(1e-4),log10(20),5);  % ground resistance values
      for iR1 = 1:length(Ron)
        set_param(FaultBlock,'FaultResistance',num2str(Ron(iR1)));
        for iR2 = 1:length(Rg)
          set_param(FaultBlock,'GroundResistance',num2str(Rg(iR2)));
          out = sim(model);
          % voltage magnitudes and phase angles from blocks B1..B8
          cellrow{1,1} = out.yout{9}.Values.V_B1.Data(end,:);
          cellrow{1,2} = out.yout{9}.Values.F_B1.Data(end,:);
          cellrow{1,3} = out.yout{9}.Values.V_B2.Data(end,:);
          cellrow{1,4} = out.yout{9}.Values.F_B2.Data(end,:);
          cellrow{1,5} = out.yout{9}.Values.V_B3.Data(end,:);
          cellrow{1,6} = out.yout{9}.Values.F_B3.Data(end,:);
          cellrow{1,7} = out.yout{9}.Values.V_B4.Data(end,:);
          cellrow{1,8} = out.yout{9}.Values.F_B4.Data(end,:);
          cellrow{1,9} = out.yout{9}.Values.V_B5.Data(end,:);
          cellrow{1,10} = out.yout{9}.Values.F_B5.Data(end,:);
          cellrow{1,11} = out.yout{9}.Values.V_B6.Data(end,:);
          cellrow{1,12} = out.yout{9}.Values.F_B6.Data(end,:);
          cellrow{1,13} = out.yout{9}.Values.V_B7.Data(end,:);
          cellrow{1,14} = out.yout{9}.Values.F_B7.Data(end,:);
          cellrow{1,15} = out.yout{9}.Values.V_B8.Data(end,:);
          cellrow{1,16} = out.yout{9}.Values.F_B8.Data(end,:);
          % current magnitudes and phase angles from blocks B1..B8
          cellrow{1,17} = out.yout{10}.Values.I_B1.Data(end,:);
          cellrow{1,18} = out.yout{10}.Values.FI_B1.Data(end,:);
          cellrow{1,19} = out.yout{10}.Values.I_B2.Data(end,:);
          cellrow{1,20} = out.yout{10}.Values.FI_B2.Data(end,:);
          cellrow{1,21} = out.yout{10}.Values.I_B3.Data(end,:);
          cellrow{1,22} = out.yout{10}.Values.FI_B3.Data(end,:);
          cellrow{1,23} = out.yout{10}.Values.I_B4.Data(end,:);
          cellrow{1,24} = out.yout{10}.Values.FI_B4.Data(end,:);
          cellrow{1,25} = out.yout{10}.Values.I_B5.Data(end,:);
          cellrow{1,26} = out.yout{10}.Values.FI_B5.Data(end,:);
          cellrow{1,27} = out.yout{10}.Values.I_B6.Data(end,:);
          cellrow{1,28} = out.yout{10}.Values.FI_B6.Data(end,:);
          cellrow{1,29} = out.yout{10}.Values.I_B7.Data(end,:);
          cellrow{1,30} = out.yout{10}.Values.FI_B7.Data(end,:);
          cellrow{1,31} = out.yout{10}.Values.I_B8.Data(end,:);
          cellrow{1,32} = out.yout{10}.Values.FI_B8.Data(end,:);
          % response label: "line/sector/type", or "normal" for no fault
          FaultLocation = [Faulty.Lines.Name{iFaultyLines},'/S',num2str(iSectors)];
          StringFaultLocation = convertCharsToStrings(FaultLocation);
          if contains(Fault_Type,'0')
            cellrow{1,33} = ("normal");
          else
            cellrow{1,33} = strcat(StringFaultLocation,"/",Fault_Type);
          end
          cellrow{1,34} = Ron(iR1);
          cellrow{1,35} = Rg(iR2);
          T1 = cellrow;
          T = [T;T1];
          save(date);
        end
      end
      delete_line(hLines);
    end
  end
end
DataTable = cell2table(T);
save('data_generation');
```
#### **References**

1. Sumit, S.V. Iterative and Non-Iterative Methods for Transmission Line Fault-Location Without using Line Parameters. *Int. J. Eng. Innov. Technol.* **2013**, *3*, 310–314.



**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

### *Article* **A Power System Timing Data Recovery Method Based on Improved VMD and Attention Mechanism Bi-Directional CNN-GRU**

**Kangmin Xie, Jichun Liu \* and Youbo Liu**

School of Electrical Engineering, Sichuan University, Chengdu 610065, China **\*** Correspondence: jichunliu@scu.edu.cn

**Abstract:** The temporal data of the power system are expanding with the growth of the power system and the proliferation of automated equipment. However, data loss may arise during the acquisition, measurement, transmission, and storage of temporal data. To address the insufficiency of temporal data in the power system, this study proposes a sequence-to-sequence (Seq2Seq) architecture to restore power system temporal data. This architecture comprises a convolutional neural network (CNN) and a gated recurrent unit (GRU) network. Specifically, to account for the periodicity and volatility of temporal data, variational mode decomposition (VMD) is employed to decompose the time series data into components of different frequencies. The CNN is utilized to extract the spatial characteristics of the temporal data, while Seq2Seq is employed to reconstruct each component through a triple attention mechanism combining feature, temporal, and multi-model attention. The feature attention mechanism calculates the contribution rate of each feature quantity and independently mines the correlation between the time series data and each feature value. The temporal attention mechanism autonomously extracts information from critical historical moments. The multi-model combination attention mechanism obtains the repair value for missing data by jointly modeling the data on both sides of the gap. Recovery experiments are conducted on actual data, and the method's effectiveness is verified by comparison with other methods.

**Keywords:** neural networks; VMD; data reconfiguration; attention mechanisms

#### **1. Introduction**

The spatial and temporal character of the power grid is becoming more complicated with the development of the power system, and automation equipment is rapidly expanding in large-scale power systems [1,2]. At the same time, the amount of measurement data is increasing; it is starting to resemble big data due to the rapid advancement of power system measurement technologies and the ongoing reduction in measurement costs [3]. The transmission, storage, and analysis of massive amounts of power grid data have emerged as a significant area of research in recent years thanks to the rapid advancement of big data technology [4,5]. Through the analysis of massive and diverse time series data, it is possible to estimate the status of the power system and its equipment to a significant extent, as well as to optimize operation and accident analysis [6,7].

It goes without saying that obtaining authentic and accurate data is crucial for data processing. Still, since signal attenuation, interference, and occasionally failing electronic acquisition equipment cause data to be lost during acquisition, measurement, transmission, and storage, complete time series data cannot always be obtained. In addition to complicating the analysis of prediction outcomes or trend development based on extensive data analysis, missing data can also affect system state estimation, stability analysis, and other critical functions based on network data [8,9]. The power system measurement configuration itself has a certain redundancy by design, and some of the missing time series data, provided the state remains observable for estimation, can be replaced by pseudo-measurements or by data that are similar in time and space without causing an unacceptable impact on the overall system state accuracy. In addition, state estimation of the system can be used as a basis for filling in missing data [10]. However, when a large amount of time series data is missing, state estimation through redundancy is not possible, and the missing data need to be repaired by mathematical or engineering means.

**Citation:** Xie, K.; Liu, J.; Liu, Y. A Power System Timing Data Recovery Method Based on Improved VMD and Attention Mechanism Bi-Directional CNN-GRU. *Electronics* **2023**, *12*, 1590. https://doi.org/10.3390/electronics12071590

Academic Editors: Luis Hernández-Callejo, Sergio Nesmachnow and Sara Gallardo Saavedra

Received: 18 February 2023; Revised: 23 March 2023; Accepted: 27 March 2023; Published: 28 March 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Numerous solutions to the missing data problem have been put forth by domestic and international researchers. They fall into two main categories: pre-processing methods and post-assessment methods. The latter primarily build the system state equation from system timing data and system topology to recover the data. The literature [11] proposes a real-time dynamic parameter estimation method based on phasor measurement unit (PMU) data with extended Kalman filtering (EKF); to reduce the computational difficulty of this method, a model decoupling technique is used. However, this method is only applicable to the real-time estimation of states and parameters related to electromechanical dynamics. The literature [12] proposes a robust detection method using the temporal correlation and statistical consistency of time series data, offering three innovative matrices that capture measurement correlation and statistical consistency by processing predicted states and reliable information inserted from phasor measurement units. Pre-processing techniques are primarily employed to recover missing data from known data. The two main categories of pre-processing ideas are: (1) analyzing the characteristics of the data in the missing data domain to complete the data, as described in the literature [13–16], and (2) analyzing the overall trend and overall structure of the data to complete the data [17–19]. Ref. [13] used a Lagrangian interpolation polynomial method for adaptive estimation of incomplete and missing data, but this method is limited to cases with few missing data. Some scholars convert the measured values into a Page matrix and then use low-rank matrix estimation based on the optimal singular value threshold to reconstruct the original signal [14]. The literature [15] first proved that power quality data have the property of being approximately low rank. Based on this, a multi-parametric joint rank optimization model is designed, and the alternating direction method of multipliers is applied to decompose it into several subproblems solved separately; an optimal selection strategy of adaptive iteration steps is also proposed to speed up the model solution, addressing the slow convergence of the traditional alternating direction method of multipliers. Ref. [16] uses the singular value threshold algorithm to complete the missing data twice and analyzes the resulting error. However, the above two methods are not effective for complex missing data. In Ref. [17], a shallow encoder is used to learn the data features, and after processing, the data are supplemented by weighting the data structure. In Ref. [18], forward and reverse GRU networks are used separately to learn the existing data, and their combined results are then weighted to complete the data. These two methods show a large gap in the reconstruction quality of different types of data. Ref. [19] constructed an improved generative adversarial network to learn time series data with complex temporal and spatial relations; according to the data's redundancy and inherent physical and mathematical relations, the data can be restored to a considerable extent. However, this method consumes many resources, which hinders practical use. Table A1 in Appendix A summarizes similar work.

Based on the above background and on the time series and multidimensional correlation characteristics of power system timing data, a method for recovering missing data from power system measurements based on dual bidirectional gated recurrent units is proposed. The method learns the spatiotemporal characteristics of the historical data, obtains sufficient generalization capability for the time series data, and constructs a mapping from the existing data to the missing data; through a triple attention mechanism, this mapping selects the most valuable information in the existing data to repair the missing data in real time. In order to make full use of the existing information, this paper proposes a joint neural network approach, i.e., building neural networks on both sides of the missing data separately and obtaining the weighted repair results by combining the two networks. Finally, a comparison between simulated and actual data shows that this data-driven method of repairing missing data in power system measurements, which does not rely on the power system topology, can maintain a high accuracy rate under different amounts of missing data.

#### **2. Power System Time Series Missing Data Recovery Model Structure**

The method of time series data recovery of power systems based on VMD and triple attention mechanism bi-directional CNN-GRU is shown in Figure 1, which is divided into four main steps:

**Figure 1.** Power system timing data recovery model based on improved VMD and dual bidirectional GRU with a triple attention mechanism.


#### **3. Decomposition of Quantitative Time Series Data Based on Improved VMD**

VMD is an adaptive, quasi-orthogonal decomposition method and a relatively recent signal processing technique proposed in 2014. The essence of the algorithm is Wiener filtering for noise reduction [20]. The goal is to decompose time series data *X* adaptively into several intrinsic mode functions (IMFs) *xk* of finite bandwidth. To calculate the spectral bandwidth of each component, this goal is achieved in three steps: (1) obtain the one-sided spectrum by the Hilbert transform; (2) shift the spectrum of each component to baseband by mixing with an exponential tuned to the respective estimated center frequency; (3) estimate the bandwidth of the demodulated signal by H1 Gaussian smoothness. For the input signal, the constrained variational model is represented as follows.

$$\min_{\{x_k\},\{\omega_k\}} \left\{ \sum_k \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * x_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \quad \text{s.t.} \quad \sum_k x_k = X \tag{1}$$

In the equation above, {*ωk*} denotes the center frequencies of the corresponding components, and the original input signal *X* is represented by the superposition of all components. Equation (1) can be transformed into the following form by adding a quadratic penalty term and a Lagrange multiplier *λ* to reconstruct the constraint:

$$\mathcal{L}(\{x_k\},\{\omega_k\},\lambda) = \alpha \sum_k \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * x_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| X(t) - \sum_k x_k(t) \right\|_2^2 + \left\langle \lambda(t),\, X(t) - \sum_k x_k(t) \right\rangle \tag{2}$$

where *α* is the penalty factor, which ensures the accuracy of the signal reconstruction even in the presence of noise, and the Lagrangian multiplier *λ* strictly enforces the constraint. The components and their center frequencies can be obtained from the saddle points of the above augmented Lagrangian using the alternating direction method of multipliers (ADMM):

$$\hat{x}_k^{n+1}(\omega) = \frac{\hat{X}(\omega) - \sum_{i \neq k} \hat{x}_i(\omega) + \frac{\hat{\lambda}(\omega)}{2}}{1 + 2\alpha(\omega - \omega_k)^2} \tag{3}$$

$$\omega_k^{n+1} = \frac{\int_0^\infty \omega \left| \hat{x}_k(\omega) \right|^2 d\omega}{\int_0^\infty \left| \hat{x}_k(\omega) \right|^2 d\omega} \tag{4}$$

where $\hat{x}_k^{n+1}(\omega)$ is the result of Wiener filtering the current residual; the frequency-domain mode $\hat{x}_k(\omega)$ can be inverse Fourier transformed, taking the real part, to produce the time-domain mode $x_k(t)$ [21].

The modal number k determines the final number of modes the VMD algorithm produces and plays a non-negligible role in the algorithm. Typically, the center frequencies of the decomposed components are compared to judge whether under- or over-decomposition has occurred, which involves a relatively high degree of subjectivity. To a certain extent, the double-threshold screening method can avoid this issue [22].

The components produced by the VMD analysis have narrow-band properties: most of the energy of each mode is concentrated around its center frequency, where the corresponding amplitude is high. Based on these characteristics, two thresholds are established: an amplitude threshold *T*<sub>2</sub> and a frequency interval threshold *T*<sub>1</sub>. By examining the spectral properties of the input timing signal against the threshold *T*<sub>1</sub>, the entire spectrum can be divided into several frequency bands, each of which is treated as a potential component. The peak amplitude of each band is then measured against the threshold *T*<sub>2</sub>; bands whose amplitude satisfies the criterion are kept, while those with insufficient amplitude are discarded. The application of the double-threshold screening method to determine the modal number k can be divided into four steps: (1) Based on the spectral properties of the input data, choose an appropriate frequency interval threshold *T*<sub>1</sub> and amplitude threshold *T*<sub>2</sub>. (2) Search for local maxima in the spectrum and, using the frequency interval threshold *T*<sub>1</sub>, group the local maxima into corresponding frequency bands. (3) Examine each band: the valid frequency bands are those whose amplitudes meet the amplitude threshold *T*<sub>2</sub>. (4) The number of modes equals the number of valid frequency bands.
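The four screening steps above can be sketched as follows; `select_mode_number`, `T1`, and `T2` are illustrative names, and the gap-based grouping rule is one plausible reading of step (2):

```python
import numpy as np

def select_mode_number(x, T1, T2):
    """Double-threshold screening for the VMD modal number k (sketch).
    T1: frequency interval threshold separating bands (normalized frequency).
    T2: amplitude threshold a band's peak must meet to count as a valid mode."""
    spectrum = np.abs(np.fft.rfft(x)) / len(x)   # single-sided amplitude spectrum
    freqs = np.fft.rfftfreq(len(x))
    # Step (2): find local maxima of the spectrum
    peaks = [i for i in range(1, len(spectrum) - 1)
             if spectrum[i] > spectrum[i - 1] and spectrum[i] >= spectrum[i + 1]]
    # group maxima closer than T1 to the band's strongest peak into one band
    bands = []  # each band: [frequency of strongest peak, peak amplitude]
    for i in peaks:
        f, a = freqs[i], spectrum[i]
        if bands and f - bands[-1][0] < T1:
            if a > bands[-1][1]:
                bands[-1] = [f, a]           # stronger peak becomes the representative
        else:
            bands.append([f, a])             # start a new candidate band
    # Steps (3)-(4): count bands whose peak amplitude meets T2
    return sum(1 for _, a in bands if a >= T2)
```

For a signal composed of two well-separated tones, the method counts two valid bands and therefore suggests k = 2.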

#### **4. Dual Attention Model**

Power system time series data are generated in chronological order. For the purposes of data analysis and storage, they are typically stored as discrete time series, and each power system time series is usually correlated with other power time series [23]. For missing power system timing data, reconstruction can be achieved by analyzing the more complete time series data on both sides of the gap and mining the inner correspondence between the complete time series and the missing data, while accounting for the temporal sequence characteristics [24].

Figure 2 presents a comprehensive schematic of the dual attention model implementation, which comprises three primary components: the CNN layer, the GRU layer, and the attention layer. The components derived from the enhanced VMD algorithm are normalized using MinMaxScaler to confine the overall value range between 0 and 1 and subsequently fed into the CNN layer. During data input, the individual features of the input are weighted using the feature attention mechanism, thereby reinforcing the features that exert a significant impact on the outcome. On the output side, the temporal attention mechanism is employed to enhance the model accuracy by capitalizing on the correlation of the data on the time scale.

#### *4.1. CNN-GRU Neural Network*

Through a convolutional kernel, a convolutional neural network extracts features locally from the input data, and its units learn the patterns within the input data window [25]. Convolutional neural networks have two significant characteristics. First, the patterns learned by a convolutional neural network are translation invariant: a specific pattern learned from one local data segment can be recognized at any other position.

Due to the stability of electricity consumption habits, power system data exhibit apparent repeatability. For example, the regional load curves for the same period on two different days are substantially similar, and photovoltaic generation curves are also strongly similar on two days with similar climate conditions. This similarity provides a reasonable basis for using convolutional neural networks. Second, a convolutional neural network can learn a spatial hierarchy of patterns: the first convolutional layer learns smaller local patterns, and the second convolutional layer recombines the first layer's patterns into larger ones. This feature allows the convolutional neural network to learn increasingly complex and abstract representations of the data [26]. For temporal data processing, one-dimensional (1D) convolution is commonly utilized; its operational principle is depicted in Figure 3. The 1D convolutional layer is adept at detecting local patterns in a sequence. Since the same input transformation is applied to each sequence segment, a pattern discovered at one position in the temporal data can subsequently be recognized at other positions, rendering the 1D convolutional neural network translation invariant (with respect to time translations).

**Figure 2.** Dual attention CNN-GRU model.

**Figure 3.** Working principle of one-dimensional convolution neural network.

By downsampling the feature map, the maximum pooling layer reduces the number of features passed on from the preceding convolutional layer, which makes the model structure more streamlined and significantly decreases the number of parameters to be computed. The maximum pooling layer extracts the essential information from the upper layer and transfers it to the next convolutional layer, allowing successive convolutional layers to have progressively larger observation windows (the proportion of the original input that the window covers) and thus giving the convolutional neural network a spatial hierarchy.
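A minimal sketch of the two operations described above, a "valid" 1D convolution and non-overlapping max pooling, assuming plain NumPy rather than any deep learning framework:

```python
import numpy as np

def conv1d(x, kernel):
    """'Valid' 1D convolution (cross-correlation form): slide the kernel over
    the sequence and take a dot product at every position, so a pattern
    learned at one position is detected at any other (translation invariance)."""
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel) for i in range(len(x) - k + 1)])

def max_pool1d(x, size):
    """Non-overlapping max pooling: keep the strongest response per window,
    shrinking the feature map and widening the next layer's receptive field."""
    return np.array([x[i:i + size].max() for i in range(0, len(x) - size + 1, size)])
```

For instance, a difference kernel `[1, 0, -1]` produces the same response wherever the same local slope occurs in the sequence, which is exactly the translation-invariance property discussed above.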

A variant of the long short-term memory (LSTM) neural network, the GRU is a relatively new neural network structure proposed by Junyoung Chung et al. at the International Conference on Machine Learning in 2015 [27]. In the GRU, the cell state and hidden state are merged, and the forget and input gates are combined into a single update gate. The original three-gate structure is thus reduced to just two gates; the number of parameters is decreased while the characteristics of the LSTM are retained, so the computational speed on large-scale data is noticeably increased.

The structure of the GRU comprises two gates: the reset gate *r<sub>t</sub>* and the update gate *z<sub>t</sub>* [28]. For the update gate, the input signal *x<sub>t</sub>* at time step *t* and the state *h<sub>t−1</sub>* at the previous time step are each linearly transformed and added together, and the result is activated by a sigmoid function. The update gate determines how much of the past signal is passed on to the future. The reset gate is computed similarly: the input signal *x<sub>t</sub>* at time step *t* and the previous state *h<sub>t−1</sub>* are linearly transformed, added together, and activated by a sigmoid function. Its essence, however, is to decide how much information needs to be forgotten. After the reset gate is obtained, the element-wise product of the reset gate and the previous state is linearly transformed and added to the linearly transformed input; the result is passed through the hyperbolic tangent activation function to obtain the current candidate memory content *ĥ<sub>t</sub>*. The state of the present time step is then obtained by adding the previous state weighted by (1 − *z<sub>t</sub>*) to the candidate memory content weighted by the update gate *z<sub>t</sub>*. The specific formulas for the GRU are shown below.

$$\begin{aligned} z_t &= \sigma_g(W_z x_t + U_z h_{t-1} + b_z) \\ r_t &= \sigma_g(W_r x_t + U_r h_{t-1} + b_r) \\ \hat{h}_t &= \phi_h(W_h x_t + U_h(r_t \odot h_{t-1}) + b_h) \\ h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \hat{h}_t \end{aligned} \tag{5}$$

where *x<sub>t</sub>* denotes the input vector; *h<sub>t</sub>* denotes the output vector; *ĥ<sub>t</sub>* denotes the current candidate memory content; *z<sub>t</sub>* denotes the update gate; *r<sub>t</sub>* denotes the reset gate; *W*, *U*, and *b* represent the parameter matrices and vectors; *σ<sub>g</sub>* denotes the sigmoid function; *φ<sub>h</sub>* denotes the hyperbolic tangent function; and ⊙ denotes the Hadamard product.
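A single GRU time step following Equation (5) can be sketched as follows; the parameter dictionary `p` and its key names are illustrative assumptions:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, p):
    """One GRU time step per Equation (5). p holds the parameter matrices
    W_*, U_* and bias vectors b_* (hidden x input, hidden x hidden, hidden)."""
    z = sigmoid(p["Wz"] @ x_t + p["Uz"] @ h_prev + p["bz"])            # update gate z_t
    r = sigmoid(p["Wr"] @ x_t + p["Ur"] @ h_prev + p["br"])            # reset gate r_t
    h_hat = np.tanh(p["Wh"] @ x_t + p["Uh"] @ (r * h_prev) + p["bh"])  # candidate memory
    return (1.0 - z) * h_prev + z * h_hat                              # new state h_t
```

With all parameters at zero, both gates evaluate to 0.5 and the candidate memory to 0, so the new state is simply half the previous state, which is a quick sanity check on the gating arithmetic.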

#### *4.2. Attentional Mechanisms*

The attentional mechanism originates in the way human vision processes visual signals. As one of the most crucial channels of information acquisition, human vision must handle enormous amounts of image data every day. When processing this information, the brain typically concentrates on the critical components of an image while ignoring its comparatively minor components, a mechanism that can significantly speed up and improve the processing of visual data [29]. The attention mechanism in this paper works similarly: the important parts of the signal are selected and given relatively large weights, yielding a greater increase in the output accuracy of the whole system.

Let the input time series and the corresponding features be:

$$x = [x_1, x_2, \dots, x_T] = \left[x^{(1)}, x^{(2)}, \dots, x^{(n)}\right]^T \tag{6}$$

The expansion can be represented by the following matrix.

$$x = \begin{bmatrix} x_1^{(1)} & x_1^{(2)} & \cdots & x_1^{(n)} \\ x_2^{(1)} & x_2^{(2)} & \cdots & x_2^{(n)} \\ \vdots & \vdots & & \vdots \\ x_T^{(1)} & x_T^{(2)} & \cdots & x_T^{(n)} \end{bmatrix} \in \mathbb{R}^{T \times n} \tag{7}$$

$x_t = \left[ x_t^{(1)}, x_t^{(2)}, \cdots, x_t^{(n)} \right]$ $(1 \le t \le T)$ is the set of values of the above $n$ features at moment $t$. $x^{(m)} = \left[ x_1^{(m)}, x_2^{(m)}, \cdots, x_T^{(m)} \right]$ $(1 \le m \le n)$ is the sequence of values of the $m$th relevant feature over the moments $t$ $(1 \le t \le T)$.

In order to obtain the association of each feature variable with the current time series, i.e., the importance of each feature quantity for the present series, the feature attention method is used. The attention weight of each feature quantity at the current moment is calculated by associating the feature variables at moment *t* with the output variable *h<sub>t−1</sub>* and the state variable *s<sub>t−1</sub>* at moment *t* − 1:

$$e_t^{(m)} = V_e^{T} \tanh\left(W_e[h_{t-1}; s_{t-1}] + U_e x^{(m)} + b_e\right) \tag{8}$$

*V<sub>e</sub>*, *W<sub>e</sub>*, and *U<sub>e</sub>* are weight matrices, and *b<sub>e</sub>* is the corresponding bias term.

After obtaining the weight values, they need to be normalized so that the sum of the weight values corresponding to moment *t* is 1.

$$a_t^{(m)} = \frac{\exp\left(e_t^{(m)}\right)}{\sum_{i=1}^{n} \exp\left(e_t^{(i)}\right)} \tag{9}$$

After obtaining the weighting coefficients, the input eigenvalues are multiplied by them to obtain the weighted input:

$$\tilde{x}_t = \left[ a_t^{(1)} x_t^{(1)}, a_t^{(2)} x_t^{(2)}, \dots, a_t^{(n)} x_t^{(n)} \right] \tag{10}$$

The resulting adaptively weighted input $\tilde{x}_t$ is then fed into the subsequent model instead of the original input *x<sub>t</sub>*. This method dynamically extracts the correlation between the feature values and the corresponding time series. The state *h<sub>t</sub>* of the hidden layer at each moment is updated at the next moment:

$$h_t = f_1\left(h_{t-1}, \tilde{x}_t\right) \tag{11}$$

*f*<sup>1</sup> is the GRU network unit.
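Equations (8)-(10) can be sketched as follows; the parameter shapes and names (`Ve`, `We`, `Ue`, `be`) are illustrative assumptions, not the authors' configuration:

```python
import numpy as np

def feature_attention(x_t, h_prev, s_prev, X, p):
    """Feature attention per Equations (8)-(10) (sketch).
    X holds the n feature series column-wise, shape (T, n)."""
    hs = np.concatenate([h_prev, s_prev])              # [h_{t-1}; s_{t-1}]
    # Eq. (8): one attention energy e_t^(m) per feature series x^(m)
    e = np.array([p["Ve"] @ np.tanh(p["We"] @ hs + p["Ue"] @ X[:, m] + p["be"])
                  for m in range(X.shape[1])])
    e = e - e.max()                                    # numerical stability
    a = np.exp(e) / np.exp(e).sum()                    # Eq. (9): softmax over features
    return a * x_t                                     # Eq. (10): weighted input
```

With all parameters at zero the energies are equal, the softmax produces uniform weights 1/n, and each feature is simply scaled by 1/n, which verifies the normalization in Equation (9).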

After the adaptive feature attention results are computed, the outputs of the feature attention mechanism are used as the input of the temporal attention mechanism in the next stage. The temporal attention mechanism focuses attention across the input sequence and obtains an adaptive, time-weighted output as a weighted average over the input time steps [30]. Figure 4 illustrates the implementation of the attention mechanism, which is calculated as follows:

$$c_t = \sum_{s=1}^{T} \alpha_{ts} \overline{h}_s \tag{12}$$

$$\alpha_{ts} = \frac{\exp\left(\mathrm{score}\left(h_t, \overline{h}_s\right)\right)}{\sum_{s'=1}^{T} \exp\left(\mathrm{score}\left(h_t, \overline{h}_{s'}\right)\right)} \tag{13}$$

where *h<sub>t</sub>* is the output of the decoder corresponding to time *t* and $\overline{h}_s$ is the source hidden state of the encoder. Here, the score is specifically calculated as follows:

$$\mathrm{score}\left(h_t, \overline{h}_s\right) = h_t^{\top} \overline{h}_s \tag{14}$$

The core idea is to make the context vector *c<sub>t</sub>*, which is otherwise fixed in the seq2seq structure, dynamic by recomputing it at each step. The traditional context vector takes the output of the last time step as the final output: after GRU processing, the data can be regarded as having had its features gradually extracted, and the output of the last time step often contains important information about the states of past time steps. However, for data with long time spans and strong temporal correlation, this design often causes the vital information of a past time step to be ignored, or fails to highlight the information strongly associated with the prediction results [31]. Through the temporal attention mechanism, the vector *c<sub>t</sub>* becomes dynamic: it is no longer just the output of the last time step but a weighted combination of the individual time steps. Different weight coefficients are assigned for different predicted contents, giving different *c<sub>t</sub>* and thereby extracting the necessary information from the time series.

Fuse the real-time updated context vector *ct* with the output *ht* of time step *t* as the input of the decoder:

$$\tilde{h}_t = \tanh\left(W_c[c_t; h_t] + b_c\right) \tag{15}$$

where *W<sub>c</sub>* and *b<sub>c</sub>* are the weight and bias of the fused input, respectively; tanh is the hyperbolic tangent function; and $\tilde{h}_t$ is the resulting fused output.
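Equations (12)-(15) amount to dot-product attention over the encoder states followed by a fused tanh layer. A minimal sketch, with shapes as stated in the comments:

```python
import numpy as np

def temporal_attention(h_t, h_bar, Wc, bc):
    """Dot-product temporal attention per Equations (12)-(15) (sketch).
    h_t: decoder output at step t, shape (d,); h_bar: encoder states, shape (T, d)."""
    scores = h_bar @ h_t                              # Eq. (14): score(h_t, h_bar_s)
    alpha = np.exp(scores - scores.max())
    alpha = alpha / alpha.sum()                       # Eq. (13): softmax over time steps
    c_t = alpha @ h_bar                               # Eq. (12): dynamic context vector
    h_tilde = np.tanh(Wc @ np.concatenate([c_t, h_t]) + bc)  # Eq. (15): fused output
    return h_tilde, alpha
```

As expected, encoder states aligned with the current decoder output receive larger weights, so the context vector emphasizes the relevant time steps rather than just the last one.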

**Figure 4.** Schematic diagram of realization of attention mechanism.

#### **5. Example Analysis**

#### *5.1. Data Pre-Processing and Error Indicators*

In this paper, three datasets are used to validate the proposed model: Singapore power load data, a public wind power dataset from the United States, and a public photovoltaic (PV) output dataset from Australia. For each dataset, the first 80% of the data is used as the training set, the next 15% (from 80% to 95%) as the validation set, and the last 5% as the test set. The wind power generation dataset has a total of 50,500 points with a sampling period of 10 min. The photovoltaic output dataset has 98,000 points with a sampling period of 5 min; since the photovoltaic output at night is always zero and provides no meaningful training signal, the nighttime portion of the data was removed during data processing. To make neural network training more efficient, the MinMaxScaler in sklearn was used beforehand to apply min-max normalization to the (0, 1) range. The mean absolute error (MAE) and mean squared error (MSE) are adopted as the error indicators, calculated as:

$$MAE = \frac{1}{m} \sum_{i=1}^{m} |y_i - \hat{y}_i| \tag{16}$$

$$MSE = \frac{1}{m} \sum_{i=1}^{m} (y_i - \hat{y}_i)^2 \tag{17}$$

where *m* denotes the total number of model outputs, and *y<sub>i</sub>* and *ŷ<sub>i</sub>* denote the actual value and the model output value of the *i*th point, respectively. Smaller MAE and MSE indicate a better model fit.
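The pre-processing and error indicators described above can be sketched as follows; `split_series` and `minmax_scale` are illustrative helper names (the paper uses sklearn's MinMaxScaler), written in plain NumPy to stay self-contained:

```python
import numpy as np

def minmax_scale(x):
    """Min-max normalization to the (0, 1) range, mirroring MinMaxScaler."""
    return (x - x.min()) / (x.max() - x.min())

def split_series(x, train=0.80, val=0.15):
    """Chronological 80% / 15% / 5% train / validation / test split."""
    a, b = int(len(x) * train), int(len(x) * (train + val))
    return x[:a], x[a:b], x[b:]

def mae(y, y_hat):
    return np.mean(np.abs(y - y_hat))   # Equation (16)

def mse(y, y_hat):
    return np.mean((y - y_hat) ** 2)    # Equation (17)
```

Keeping the split chronological (rather than shuffled) matters here, because the task is reconstructing time series and the test set must follow the training data in time.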

Adam (adaptive moment estimation) is chosen as the optimizer; the Adam optimization algorithm achieves adaptive gradient selection, which helps training escape some local minima.

#### *5.2. Model Configuration*

Two convolutional layers are stacked. Each layer has 128 neurons, the convolutional window is 64, and ReLU is selected as the activation function. Maximum pooling is used for the pooling layer. The encoder uses two stacked GRU layers, with 128 neurons in the first layer and 64 in the second; tanh is selected for both activation functions, and the dropout rate is set to 0.1 for the first layer and 0.24 for the second. The decoder uses single-step GRU decoding with 64 neurons, and the output of each GRU step passes through a single-unit fully connected layer. The training step size (learning rate) is 0.0075.

#### *5.3. Data Set Comparison*

In this paper, a total of five models are introduced for comparison: LSTM, multi-layer perceptron (MLP), Seq2Seq, CNN, and the model of this paper using one-sided reconstruction. All models use the same normalization procedure for data output, and the input of each model is the corresponding historical sequence data, ensuring a scientific and valid comparison. The data in the test set were taken at lengths of 32 sampling points and 128 sampling points for data reconstruction, and the evaluation results for each quantitative index are shown in Tables 1 and 2. To make the comparison more intuitive, the model outputs are compared directly in normalized form, which avoids the incomparability caused by different units and by large differences in the magnitude of the original data.


**Table 1.** MAE and MSE of each model with reconstructed length of 32 sampling points.


**Table 2.** MAE and MSE of each model with reconstructed length of 128 sampling points.

#### 5.3.1. Comparison of Reconstruction Effects with a Reconstructed Data Length of 32 Sampling Time Points

Table 1 presents a summary of the Mean Squared Error (MSE) and Mean Absolute Error (MAE) for various models with a reconstructed data length of 32. The experimental results from three data sets, namely, the load data set, wind power generation data set, and photovoltaic power generation data set, are arranged from left to right. As the summary table employs normalized data results, it does not include any units. Figures 5–7 depict the schematic diagram of the load data reconstruction results, wind power generation data reconstruction results, and photovoltaic power generation data reconstruction results, respectively, with a reconstructed data length of 32 sampling time points. To facilitate comparison and highlight the characteristics of model reconstruction, a period with more salient features was selected from each of the three datasets.

**Figure 5.** Results of reconstructing 32 load time sampling points with different models.

**Figure 6.** Results of reconstructing 32 wind power time sampling points with different models.

**Figure 7.** Results of reconstructing 32 photovoltaic power time sampling points with different models.

For the load data set, we can see that when the reconstructed data length is 32 sampling time points, the average MAE of the proposed model in this paper decreases by 32.41%, 13.66%, 20.97%, 8.41%, and 31.23% relative to the unilateral reconstruction model, LSTM, CNN, Seq2Seq, and MLP, respectively. The mean MSE of the proposed model decreases by 48.53%, 20.29%, 30.34%, 9.13%, and 25.19% relative to the one-sided reconstruction model, LSTM, CNN, Seq2Seq, and MLP, respectively.

For the wind power dataset, we can see that when the reconstructed data length is 32 samples, the mean MAE of the model proposed in this paper decreased by 43.83%, 34.79%, 37.01%, 24.44%, and 37.00%, and the mean MSE decreased by 49.65%, 39.54%, 42.67%, 15.42%, and 37.29%, relative to the unilateral reconstruction model, LSTM, CNN, Seq2Seq, and MLP, respectively.

For the photovoltaic generation dataset, we can see that when the reconstructed data length is 32 samples, the mean MAE of the proposed model decreased by 45.89%, 34.79%, 37.08%, 23.29%, and 37.86%, and the mean MSE decreased by 49.76%, 39.69%, 42.80%, 15.53%, and 37.75%, relative to the unilateral reconstruction model, LSTM, CNN, Seq2Seq, and MLP, respectively.

The reconstructed data results demonstrate that the model proposed in this paper follows the actual trend more closely. Specifically, the proposed model better reconstructs the abrupt-change portions of the data. For the load reconstruction results, an extreme value point near the fifteenth sampling point is observed, and the proposed model fits this extreme value point better; the rises and falls of the load reconstruction results are also close to the actual data. The effect is more pronounced for the photovoltaic and wind power generation data, whose changes are more apparent. The wind power reconstruction results show that these data fluctuate strongly and are highly variable, with changes occurring repeatedly within 32 sampling points. The proposed model can react quickly at the abrupt change points, such as the 6th, 19th, 22nd, and 28th sampling points, and keep up with the data change trend in time. The relative variability of the photovoltaic data lies between that of the load and wind power generation data. However, the photovoltaic data have an obvious characteristic: the values occasionally drop suddenly to very low levels or rise from very low to higher positions. This characteristic is mainly due to photovoltaic generation's dependence on weather, which has a significant impact on light intensity. For all three datasets in this paper, the proposed model's ability to reconstruct relatively short missing data is significantly stronger than that of the comparison models.

#### 5.3.2. Comparison of Reconstruction Effects with a Reconstructed Data Length of 128 Sampling Time Points

Table 2 presents a summary of the MSE and MAE for each model with a reconstructed data length of 128. It includes experimental results from three datasets. Appendix A Figure A1 displays a schematic diagram of the reconstructed load data with a reconstructed data length of 128 sampling time points. Appendix A Figure A2 exhibits a schematic diagram of the reconstructed wind power generation data with a reconstructed data length of 128 sampling time points. Appendix A Figure A3 illustrates a schematic diagram of the reconstructed photovoltaic power generation data with a reconstructed data length of 128 sampling time points. The three datasets were selected for their distinct performance characteristics over a period to facilitate comparison and emphasize the features of model reconstruction.

For the load dataset, when the reconstructed data length is 128 samples, the mean MAE of the model proposed in this paper decreases by 33.01%, 16.56%, 23.68%, 11.04%, and 36.13% relative to the one-sided reconstruction model, LSTM, CNN, Seq2Seq, and MLP, respectively. The mean MSE of the model proposed in this paper decreased by 48.09%, 26.81%, 35.31%, 15.92%, and 37.49% relative to the one-sided reconstruction model, LSTM, CNN, Seq2Seq, and MLP, respectively.

For the wind power dataset, when the reconstructed data length is 128 samples, the mean MAE of the model proposed in this paper decreased by 40.67%, 51.56%, 53.20%, 21.47%, and 36.99%, and the mean MSE decreased by 49.65%, 23.06%, 17.67%, 21.72%, and 44.69%, relative to the unilateral reconstruction model, LSTM, CNN, Seq2Seq, and MLP, respectively.

For the photovoltaic generation data set, when the reconstructed data length is 128 samples, the mean MAE of the model proposed in this paper decreased by 47.71%, 35.76%, 36.41%, 26.31%, and 41.03%, and the mean MSE decreased by 48.52%, 43.49%, 47.82%, 20.50%, and 47.79%, relative to the unilateral reconstruction model, LSTM, CNN, Seq2Seq, and MLP, respectively.

We observe that the model proposed in this paper more closely approximates the actual trend in data reconstruction. Specifically, in the case of data mutations, the model presented here performs better. Similar to the analysis with 32 samples, in the load reconstruction graph, although the change structure of 128-sample data is more prominent than that of 32-sample data, our model can track changes more effectively. Likewise, this model outperforms other models in tracking wind and photovoltaic reconstructed graphs. As with the analysis of 32-sample reconstructions, our model excels at handling large data changes over short periods of time in wind reconstruction; for instance, it fits more closely around the 40th and 60th sample points. The same conclusion applies to photovoltaic data reconstruction. It is evident that our model also surpasses other models in reconstructing data at 128 sampling time points. For all three datasets examined in this paper, our proposed model significantly outperforms comparison models in reconstructing relatively long missing data.

#### 5.3.3. Reconstructing Data Error Distribution Analysis

Figures 8–10 display error analysis plots for the reconstruction results of our model and other comparison models at 32 sampling time points. Appendix A Figures A4–A6 present error analysis plots for the reconstruction results of our model and other comparison models at 128 sampling time points. From the perspective of error for different lengths of missing data, the range and median error of the error distribution for 128 sampling time points in all three datasets are significantly higher than those for 32 sampling time points. For instance, the median absolute values of load errors for our model, CNN, and MLP increase from 0.07, 0.126, and 0.25 to 0.31, 0.49, and 0.55, respectively. In terms of absolute error value distribution data, the reconstruction results of our proposed model are more concentrated around the median compared to several other models.

Figure 8 and Appendix A Figure A4 display the error plots of the load data reconstruction results for 32 and 128 sampling points, respectively. The analysis of the load data reveals that, for all models, the reconstruction error at 128 sampling points is significantly larger than at 32 sampling points. For the proposed model, most of the errors increase from the 0–50 GW interval to the 0–115 GW interval. However, compared to the other models, the error distribution of the proposed model is still relatively concentrated, which is consistent with the earlier conclusion that its MAE and MSE are relatively smaller. The performance of the proposed model is more outstanding in the 10–90% error distribution interval, the 25–75% error distribution interval, and the median error distribution; the 25–75% error distribution index in particular is lower than that of the other models. This indicates that most of the errors of the proposed model are concentrated around 0. Compared to other models, the main reason for the superior performance of the CNN model in the 32-sampling-point case relates to its working principle: the CNN itself has no concept of time flow and uses convolutional kernels to identify data features. Even if the recognition window is increased, as long as the features identified by the CNN convolutional kernels still apply, the error distribution of the CNN model does not change much.

Figure 9 and Appendix A Figure A5 depict error plots of the wind power generation data reconstruction outcomes for 32 and 128 sampling points, respectively. Figure 10 and Appendix A Figure A6 portray error plots of the photovoltaic generation data reconstruction outcomes for 32 and 128 sampling points, respectively. In both the wind power data and the photovoltaic data, we observe a phenomenon similar to the load data: the error distribution of each model increases significantly. However, compared to the other models, the error distribution of the model proposed in this paper remains more concentrated for both the 32-sampling-point and the 128-sampling-point reconstructions. With regard to the 25–75% error distribution metric, it is evident that the proposed model outperforms the other models.

Nonetheless, there are also some individual characteristics. For the wind power data, the rise in error distribution from 32 to 128 sampling points is not as evident as that of the load data set. This is primarily because the wind power data set is more intricate than the load data set, and the changing trend is more challenging to capture. Therefore, even if it is extended to 128 sampling points, the increase in error is limited, given that the error of 32 sampling points is already significant. As for the photovoltaic data, the error variation is between the load data set and the wind power data set, owing to its inherent regularity. However, there are some random variations in the data set.

When reconstructing 128 sampling points of the load data set, the absolute error distribution of the Seq2Seq model appears more concentrated than that of the model proposed in this paper. This is probably because the load data used are from Singapore and are relatively stable, with relatively evident trend changes. In addition, the data set has few dramatic fluctuations at particular points in time, making the data itself highly reconstructable. The Seq2Seq model can therefore grasp the structural characteristics of these data well, and even without additional data processing its reconstruction effect is already excellent. However, for wind power generation data and photovoltaic power generation data, which fluctuate significantly and have less obvious trend changes, it is challenging for the Seq2Seq model to capture the complete intrinsic pattern of the data, resulting in a significantly enlarged error distribution.

**Figure 8.** Error results of reconstructing 32 time sampling points of load with different models.

**Figure 9.** Error results of reconstructing 32 wind power generation time sampling points with different models.

**Figure 10.** Error results of reconstructing 32 photovoltaic power generation time sampling points with different models.

#### 5.3.4. Comparison of 128-Sampling-Point and 32-Sampling-Point Reconstruction Results

It is more challenging to reconstruct 128 sampled time points of data than to reconstruct 32. Comparing Tables 1 and 2, we can observe a significant increase in the MAE and MSE metrics for all models. This is because (1) for the same data set, the information available relative to 128 sampling points is evidently less rich than relative to 32 sampling points, so the models handle 128 points less well; and (2) reconstructing 128 sampling points with the same deep learning model may run into model capacity limitations, as the model may be unable to handle more data points or capture more complex relationships in the data set, resulting in higher MAE and MSE metrics.

Figure 11 illustrates the percentage increase in MAE for 128 sample points compared to 32 sample points, and Figure 12 illustrates the corresponding percentage increase in MSE. From Figure 11, it is evident that the MAE of each model increased significantly at 128 samples compared with 32 samples; the most significant increase was 150%, for LSTM and CNN. Furthermore, comparing the three data sets, the MAE of this model is the smallest in most cases, indicating that this model makes the best use of the data. Similarly, it can be observed from Figure 12 that the performance of this model is superior on the other two datasets, with the exception of the wind power dataset. The possible reasons for the low MSE increase of the CNN and LSTM models on the wind power data are: (1) The MSE of these two models is already relatively large at 32 sampling points, which means that the models themselves do not utilize the data to a high degree. (2) The regularity of the wind power data set is more difficult to find, so the models cannot resolve the data set well; when the reconstruction length increases, the decrease in the degree of data utilization becomes more obvious.

**Figure 11.** Percentage increase in MAE at 128 sampling points compared to 32 sampling points.

**Figure 12.** Percentage increase in MSE at 128 sampling points compared to 32 sampling points.

Comparing the load reconstructed with 32 and with 128 sampling time points shows that the proposed prediction model fits the true values better, and that its accuracy decreases more slowly as the data length increases. Compared with the other models, the proposed model not only reconstructs the trend of the missing data more accurately but also performs better on the strongly fluctuating parts of the load, analyzing sudden changes to a greater extent and improving overall accuracy. On the wind power and photovoltaic power generation data, which are more random and volatile, the proposed model fits sudden changes significantly better than the other models: it follows not only the trend of the data but also the drastic local changes, bringing its output closer to the actual situation. For example, on the photovoltaic data, although the overall trend is easy to capture, the data fluctuate sharply around that trend; the proposed model captures these larger fluctuations more accurately, supplementing the details and reducing the overall error.

#### **6. Conclusions**

This paper proposes a method to recover power system time-series data based on improved VMD and an attention-based bi-directional CNN-GRU. The data are first processed by VMD so that they can be divided into multiple groups by frequency centroid; a CNN model then extracts the temporal characteristics of the time series, and a Seq2Seq structure combined with multiple attention mechanisms reconstructs the data. The paper compares the unilateral reconstruction model with LSTM, CNN, Seq2Seq, and MLP, and analyzes the characteristics of the proposed model in terms of three indicators: MAE, MSE, and reconstruction error. Although the analysis is sensitive to the data scenarios, the following conclusions can be drawn:


In the future, we can study the characteristics of different models with a different focus on data analysis and combine multiple models dynamically to achieve more accurate data processing capability.

**Author Contributions:** Methodology, K.X.; Formal analysis, Y.L.; Resources, K.X.; Writing – original draft, K.X.; Writing – review and editing, J.L. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A**

**Figure A2.** Results of reconstructing 128 wind power time sampling points with different models.

**Figure A3.** Results of reconstructing 128 temporal sampling points of photovoltaic power generation with different models.

**Figure A4.** Error results of reconstructing 128 time sampling points of load with different models.


**Figure A5.** Error results of reconstructing 128 wind power generation time sampling points with different models.

**Figure A6.** Error results of reconstructing 128 photovoltaic power generation time sampling points with different models.

#### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

### *Article* **An Improved Deep Reinforcement Learning Method for Dispatch Optimization Strategy of Modern Power Systems**

**Suwei Zhai 1, Wenyun Li 2, Zhenyu Qiu 3, Xinyi Zhang 3 and Shixi Hou 3,\***


**Abstract:** As a promising branch of machine learning, reinforcement learning has gained much attention. This paper researches a wind-storage cooperative decision-making strategy based on the dueling double deep Q-network (D3QN). Firstly, a new wind-storage cooperative model is proposed. Besides wind farms, energy storage systems, and external power grids, demand response loads are also considered, including residential price response loads and thermostatically controlled loads (TCLs). Then, a novel wind-storage cooperative decision-making mechanism is proposed, which combines the direct control of TCLs with the indirect control of residential price response loads. In addition, a deep reinforcement learning algorithm, D3QN, is utilized to solve the wind-storage cooperative decision-making problem. Finally, the numerical results verify the effectiveness of D3QN for optimizing the decision-making strategy of a wind-storage cooperation system.

**Keywords:** wind farm; energy storage system; reinforcement learning; deep neural networks

#### **1. Introduction**

Since the beginning of the 21st century, higher requirements for energy conservation, emission reduction, and sustainable development have been put forward as a result of the increasing pressure from the use of global resources. Thus, clean energy has gained much attention, which further accelerates the global energy transformation [1–3]. At present, the commonly used clean energy sources include wind energy, solar energy, and tidal energy. Among these clean energy sources, wind energy outperforms with its rich resources, low cost, and relatively mature technology [4,5].

However, because of the strong correlation between wind energy and environmental conditions, its power generation is characterized by randomness, uncontrollability, and volatility, which seriously affects the power balance and threatens stable and safe operation [6]. Equipping the wind farm with an energy storage system can alleviate these problems to a certain extent [7–10]. Therefore, how to realize highly efficient wind-storage cooperative decision-making is a key issue for promoting the full absorption of wind energy [11,12].

Reinforcement learning is a promising machine learning method that learns from environmental feedback [13,14]. Its decision theory is well suited to problems with complex environments and multiple variables. At present, some studies have proven the feasibility and effectiveness of energy allocation strategies using reinforcement learning in the power system field, such as load frequency control on the generation side and market competition strategies [15–18].

Although several works have proposed reinforcement learning methods for wind-storage cooperative decision-making, some issues remain, as follows:

(1) The flexible loads embedded in the wind-storage cooperative framework have not been developed sufficiently in the existing literature. In [11,19–21], the authors did not focus on the favorable effect of the flexible loads in the proposed wind-storage model. As an example, flexible loads were considered in [22], where the benefits of suitable management of demand-side flexible loads were validated. However, a detailed formulation of when the load in the price response load model should be shifted was not given.

**Citation:** Zhai, S.; Li, W.; Qiu, Z.; Zhang, X.; Hou, S. An Improved Deep Reinforcement Learning Method for Dispatch Optimization Strategy of Modern Power Systems. *Entropy* **2023**, *25*, 546. https://doi.org/10.3390/e25030546

Academic Editor: Luis Hernández-Callejo

Received: 26 January 2023; Revised: 15 March 2023; Accepted: 15 March 2023; Published: 22 March 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

<sup>1</sup> Electric Power Research Institute of China Southern Power Grid Yunnan Power Grid Co., Ltd., Kunming 650217, China

(2) The exploration of reinforcement learning methods for wind-storage cooperative decision-making needs to be enhanced. In [19,20,23,24], a deep Q-learning strategy was applied to wind-storage systems. However, the main mechanism of deep Q-learning is to select the actions that obtain the maximum benefit according to Q values constructed from the state and action. It has been reported that using the same network to generate the Q values and their maximum estimate results in a maximization bias, which tends to deteriorate the network accuracy.

Motivated by the above analysis, a novel wind-storage cooperative decision-making model including demand-side flexible loads is developed in this paper. It comprehensively considers the direct or indirect control of various power components, improves the allocation ability of the energy controller, and enhances the economy and stability of the power grid. Moreover, to tackle the defects of the traditional deep Q-learning method, the dueling double deep Q-network (D3QN), constructed from two networks (an evaluation network and a target network), is developed for the wind-storage cooperative decision-making control mechanism in this study.

The remainder of this study is organized as follows: the wind-storage cooperative model and D3QN are presented in Section 2. In Section 3, the wind-storage cooperative decision-making algorithm using D3QN is presented. The algorithm evaluation details and numerical results are presented in Sections 4 and 5. Section 6 presents the conclusions.

#### **2. Wind-Storage Cooperative Model and D3QN**

#### *2.1. Wind-Storage Cooperative Decision-Making Model*

This study mainly focuses on a wind-storage cooperative model, including wind turbines and energy storage systems, which is also connected to the external power grid.

The architecture of the wind-storage cooperative model is shown in Figure 1. Three layers exist: the electricity layer, the information layer, and the signal layer. The electricity layer includes distributed energy resources (DER) based on wind power, an energy storage system (ESS) for the storage and release of wind energy, a group of thermostatically controlled loads (TCLs), and a group of price-responsive loads. The information layer is a two-way communication system between the external power grid, each power module, and the energy controller (EC); information such as the electricity price and the battery charge and discharge status is transmitted here. The signal layer transmits the control signals sent by the energy controller to each controllable module. The whole system model has three direct control points, namely the switch control of the TCLs, the charging and discharging control of the ESS, and the trading of energy with the external power grid.

At the same time, the whole wind-storage cooperative model can also be regarded as a multi-agent system. Each module in the system is regarded as an autonomous agent, which can interact with the environment and other agents. Moreover, the simple or complex behavior of each agent is controlled by its internal model. The models used in each module of the whole wind-storage cooperation model will be introduced in detail below.

#### 2.1.1. External Power Grid

Because of the intermittent and uncontrollable characteristics of DER, the use of DER alone may not be able to balance the relationship between supply and demand in the power grid. Therefore, the external power grid is considered as the regulatory reserve in this system model. The external power grid can provide electric energy immediately when the wind-storage energy is insufficient, and the external power grid can also accept the excess electricity when the wind energy is in surplus. The transaction price is defined by

the real-time price in the power market. The market prices are expressed as $(P_t^u, P_t^d)$, where $P_t^u$ and $P_t^d$ represent the increased and decreased price, respectively.

**Figure 1.** Wind-storage cooperative model.

#### 2.1.2. Distributed Energy Module

Wind turbines are considered the distributed energy equipment in this study. Specifically, actual wind data from a wind farm in Finland [25] are used directly to construct the DER model. The DER shares the currently generated electric energy $G_t$ with the energy controller.

#### 2.1.3. Energy Storage System Module

In order to reasonably optimize the allocation of energy and reduce the cost of energy consumption, this study uses the community energy storage system, rather than a separate household storage battery. As a centralized independent energy storage power station invested by a third party, the community energy storage system can integrate and optimize the allocation of the dispersed energy storage resources from the power grid side, power supply side, and user side.

For each time step *t*, the dynamic model of ESS is defined as follows [26]:

$$B\_t = B\_{t-1} + \eta\_c C\_t - \frac{D\_t}{\eta\_d} \tag{1}$$

where $B_t \in [0, B_{\max}]$ is the electric energy stored by the ESS at time $t$, and $B_{\max}$ is the maximum storage capacity of the ESS. $\eta_c$ and $\eta_d$ are the charging and discharging efficiency coefficients of the energy storage equipment, respectively, with $(\eta_c, \eta_d) \in (0, 1]^2$. The variables $C_t \in [0, C_{\max}]$ and $D_t \in [0, D_{\max}]$ represent the charge and discharge power, respectively, limited by the maximum charge and discharge rates $C_{\max}$ and $D_{\max}$ of the ESS.

The state-of-charge variable of the ESS is defined as $BEC_t$:

$$BEC\_t = \frac{B\_t}{B\_{\text{max}}} \times 100\% \tag{2}$$

When the energy controller releases the charging signal, ESS obtains the current electricity stored in the battery and verifies the feasibility of the charging operation by referring to the maximum storage capacity *B*max and the maximum charging rate *C*max. Then, ESS stores the corresponding electricity according to the actual situation and the remaining excessive electricity will be sold to the external power grid. When ESS receives the discharge signal, it verifies the relevant conditions again to judge the operational feasibility and provides the electricity accordingly. If ESS cannot fully provide the requested electricity, the insufficient part will be automatically provided by the external power grid, and the agent will need to pay the relevant costs.
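The charge/discharge bookkeeping described above, including the feasibility checks against $B_{\max}$, $C_{\max}$, and $D_{\max}$ and the fallback to the external grid, can be sketched as follows. This is a minimal illustration: the function name, the sign convention (positive power requests charging), and the default efficiency values are our own assumptions, not taken from the paper:

```python
def ess_step(B, action_power, B_max, C_max, D_max, eta_c=0.95, eta_d=0.95):
    """One time step of the ESS model, Eqs. (1)-(2).

    action_power > 0 requests charging, < 0 requests discharging.
    Returns the new stored energy, the surplus/deficit traded with the
    external grid, and the state of charge in percent.
    """
    if action_power >= 0:  # charge request
        # Limit by rate and by remaining capacity (accounting for losses).
        C = min(action_power, C_max, (B_max - B) / eta_c)
        B_new = B + eta_c * C
        grid = action_power - C          # excess energy sold to the grid
    else:  # discharge request
        # Cannot deliver more than the stored energy allows.
        D = min(-action_power, D_max, B * eta_d)
        B_new = B - D / eta_d
        grid = -action_power - D         # shortfall bought from the grid
    BEC = B_new / B_max * 100.0          # Eq. (2)
    return B_new, grid, BEC
```

With unit efficiencies, a charge request of 3 against a rate limit of 2 stores 2 units and sells the remaining 1 to the grid, mirroring the behavior described in the text.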

#### 2.1.4. Thermostatically Controlled Load

Thermostatically controlled loads (TCLs) are characterized by their large size, flexible control, and energy conservation. In this study, it is assumed that the vast majority of households are equipped with TCLs, such as air conditioners, water heaters, and refrigerators. These TCLs can be directly controlled in each time unit $t$, with the control signal coming from the TCL aggregator. As the EC directly controls the TCL equipment, this study assumes TCLs are only charged the power generation cost $C_{gen}$, in order to compensate TCL users. To maintain user comfort, each TCL is equipped with a backup controller that keeps the temperature within an acceptable range. The backup controller receives the on/off command $u_t^i$ from the TCL aggregator and modifies the action by verifying the temperature constraints. The specific definition is as follows:

$$u\_{b,t}^i = \begin{cases} 0 & if \quad T\_t^i > T\_{\text{max}}^i \\ u\_t^i & if \quad T\_{\text{min}}^i < T\_t^i < T\_{\text{max}}^i \\ 1 & if \quad T\_t^i < T\_{\text{min}}^i \end{cases} \tag{3}$$

where $u_{b,t}^i$ is the on/off action of the $i$th TCL backup controller at $t$, $T_t^i$ is the operating temperature of the $i$th TCL at $t$, and $T_{\max}^i$ and $T_{\min}^i$ are the upper and lower temperature boundaries set by the client, respectively. The differential equations of the temperature change in the building are designed as follows [27]:

$$\dot{T}\_t^i = \frac{1}{C\_a^i} (T\_t^0 - T\_t^i) + \frac{1}{C\_m^i} \left( T\_{m,t}^i - T\_t^i \right) + L\_{TCL}^i u\_{b,t}^i + q^i \tag{4}$$

$$\dot{T}\_{m,t}^{\dot{i}} = \frac{1}{C\_{\text{m}}^{\dot{i}}} (T\_t^{\dot{i}} - T\_{m,t}^{\dot{i}}) \tag{5}$$

where $T_t^i$, $T_{m,t}^i$, and $T_t^0$ are the indoor air temperature, indoor solid temperature, and outdoor air temperature at $t$, respectively, $C_a^i$ and $C_m^i$ are the equivalent heat capacities of the indoor air and solid, respectively, $q^i$ is the thermal power provided by the indoor temperature control equipment, and $L_{TCL}^i$ is the rated power of the TCL.

Finally, the state of charge (SoC) is used to represent the relative position of the current temperature $T_t^i$ within the expected temperature range. The SoC of each TCL at $t$ is defined as follows:

$$SoC\_t^i = \frac{T\_t^i - T\_{\min}^i}{T\_{\max}^i - T\_{\min}^i} \tag{6}$$
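The backup-controller rule of Eq. (3) and the SoC definition of Eq. (6) can be sketched as two small functions. This is a minimal illustration and the function names are ours:

```python
def backup_control(u, T, T_min, T_max):
    """Eq. (3): override the aggregator's on/off command u when the
    indoor temperature T leaves the comfort band [T_min, T_max]."""
    if T > T_max:
        return 0   # too warm: force off
    if T < T_min:
        return 1   # too cold: force on
    return u       # within the band: keep the aggregator's command

def tcl_soc(T, T_min, T_max):
    """Eq. (6): relative position of T within the comfort band."""
    return (T - T_min) / (T_max - T_min)
```

For a band of 18–24 °C, a temperature of 21 °C gives an SoC of 0.5, and any aggregator command is overridden once the temperature crosses either boundary.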

#### 2.1.5. Resident Price Response Load

Some household power demands exist that the energy controller cannot directly control [28]. This study assumes that the daily electricity consumption of residents is composed of a daily basic consumption and a flexible load affected by the electricity price. The flexible load can operate earlier or later within an acceptable time range and can be shifted according to the power generation of the DER, so that resource utilization improves and household electricity expenditure is reduced. In this module, each household $i$ has a sensitivity factor $\beta_i \in (0, 1)$ and a patience parameter $\lambda_i$: the sensitivity factor represents the percentage of load that can be moved earlier or later when the price decreases or increases, and the patience parameter represents the number of hours within which the shifted load is repaid. For example, when the electricity price is high, part of the load can be cut now and operated after $\lambda_i$ hours.

At $t$, the load $L_t^i$ of household $i$ is modeled by the following formula:

$$L\_t^i = L\_{b,t} - SL\_t^i + PB\_t^i \tag{7}$$

$$SL\_t^i = L\_{b,t} \* \beta\_i \* \delta\_t \tag{8}$$

where $L_{b,t} > 0$ represents the daily basic load of residents, following the daily consumption pattern, which can be inferred from the average daily consumption curve of residential areas. $SL_t^i$ is the shifted load (SL) defined by (8), where $\delta_t$ represents the electricity price level at $t$. Hence $SL_t^i$ is positive when the price is high ($\delta_t > 0 \Rightarrow SL_t^i > 0$) and negative when the price is low ($\delta_t < 0 \Rightarrow SL_t^i < 0$). A positive shifted load is repaid after a certain period of time $\lambda$; a negative shifted load is electricity provided in advance, so it is consumed in the future. The loads to be compensated can be formulated as follows:

$$PB\_t^i = \sum\_{j=0}^{t-1} \omega\_{i,j} \* SL\_j^i \tag{9}$$

where $\omega_{i,j} \in [0, 1]$ represents the degree of compensation at $t$ for the load shifted at $j$. Generally, the closer $t - j$ is to $\lambda_i$, the higher $\omega_{i,j}$ is. In addition, the compensation should also depend on the electricity price, i.e., $\omega_{i,j}$ becomes smaller when $\delta_t > 0$. Therefore, $\omega_{i,j}$ can be designed as follows:

$$\omega\_{i,j} = \operatorname{clip}\left(\frac{-\delta\_t \cdot \operatorname{sign}(SL\_j^i)}{2} + \frac{t - j}{\lambda\_i},\; 0,\; 1\right) \tag{10}$$

$$\operatorname{clip}(X, a, b) = \begin{cases} a & \text{if } X < a \\ X & \text{if } a \le X \le b \\ b & \text{if } X > b \end{cases} \tag{11}$$

Given (10), when $\delta_t > 0$ one obtains $SL_j^i > 0$ and $\operatorname{sign}(SL_j^i) > 0$, so $\frac{-\delta_t \cdot \operatorname{sign}(SL_j^i)}{2} < 0$, which means that $\omega_{i,j}$ becomes smaller and a positive shifted load can hardly be compensated while the price is high [29,30].
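Equations (7)–(11) can be sketched directly. This is a minimal illustration under our own naming; the shift history is passed in as a plain list:

```python
def clip(x, a, b):
    """Eq. (11)."""
    return max(a, min(x, b))

def shifted_load(L_b, beta, delta):
    """Eq. (8): load shifted away (delta > 0) or pulled forward (delta < 0)."""
    return L_b * beta * delta

def payback(t, delta_t, SL_history, lam):
    """Eqs. (9)-(10): load repaid at time t for shifts made at steps j < t."""
    total = 0.0
    for j, SL_j in enumerate(SL_history[:t]):
        sign = (SL_j > 0) - (SL_j < 0)
        w = clip(-delta_t * sign / 2 + (t - j) / lam, 0, 1)
        total += w * SL_j
    return total
```

For example, a load of 2.0 shifted at $j = 0$ with patience $\lambda_i = 3$ is fully repaid at $t = 3$ when the price level is neutral ($\delta_t = 0$).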

#### 2.1.6. Energy Controller

In this study, EC can extract the information provided by different modules and the observable environment to determine the best supply and demand balance strategy. EC mainly manages the power grid through four control mechanisms, as shown in Figure 2, including TCL direct control, price level control, energy deficiency action, and energy excess action.

#### (1) TCL direct control

At each time step $t$, the EC allocates a certain amount of electric energy to the TCLs, which is distributed to each TCL through the TCL aggregator. The aggregator judges the priority of energy distribution according to the power delivered by the EC and the SoC of each TCL, and then determines the on/off action of each TCL: a TCL with a lower SoC has higher priority in energy allocation than one with a higher SoC. The TCL aggregator also acts as an information aggregator, transmitting the real-time average SoC of the TCL cluster to the EC [31]. The specific transmission process is shown in Figure 3.
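The lowest-SoC-first priority rule described above can be sketched as a greedy allocation. The function name, the tuple layout, and the greedy treatment of the budget are our own assumptions; the paper only states that lower SoC means higher priority:

```python
def allocate_tcl_energy(budget, tcls):
    """Distribute the EC's energy budget across TCLs, lowest SoC first.

    tcls: list of (tcl_id, soc, rated_power) tuples.
    Returns a dict mapping tcl_id to an on/off decision (1/0).
    """
    on_off = {tcl_id: 0 for tcl_id, _, _ in tcls}
    remaining = budget
    for tcl_id, soc, power in sorted(tcls, key=lambda x: x[1]):  # low SoC first
        if power <= remaining:       # switch on only if the budget covers it
            on_off[tcl_id] = 1
            remaining -= power
    return on_off
```

With a budget of 3.0 and three TCLs of rated power 2.0, only the lowest-SoC unit is switched on, since the residual budget cannot cover a second unit.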

**Figure 2.** The control mechanism of the energy controller.

**Figure 3.** The intermediary role of the TCL aggregator.

#### (2) Price level control

In order to effectively utilize the elasticity of the residential price response load, the EC must determine the electricity price level $\delta_t$ at each time step $t$. To keep the proposed system model competitive, a pricing mechanism is designed: the price can fluctuate around the median value, but the average daily electricity price $P_{avg}$ cannot exceed the market electricity price provided by power retailers by more than 2.9% [32]. From a practical point of view, the electricity price on the demand-response side is discrete, and its fluctuation is governed by the price level $\delta_t$, so the real-time electricity price is selected from five values:

$$P\_t \in \left\{ P\_{market} + \delta\_t \cdot cst \right\} \tag{12}$$

where $\delta_t \in \{-2, -1, 0, 1, 2\}$ and $cst$ is the constant that determines the specific increment or reduction in the electricity price.

In addition, the model also monitors the electricity price level $\delta_t$ at each moment. When the sum of the previous electricity price levels exceeds the set threshold, the market electricity price $P_{market}$ is used instead of the price given by the agent. The effective electricity price level $\delta_{t,eff}$ is defined as follows:

$$
\delta\_{t,eff} = \begin{cases}
\delta\_t & \text{if } \quad \sum\_{j=0}^t \delta\_j \le threshold \\
0 & \text{if } \quad \sum\_{j=0}^t \delta\_j > threshold
\end{cases}
\tag{13}
$$
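A minimal sketch of the pricing rule in Eqs. (12) and (13). The function name is ours, and we assume that once the cumulative level exceeds the threshold the effective level becomes 0, i.e., the market price is charged, as the text describes:

```python
def effective_price(delta_t, past_deltas, P_market, cst, threshold):
    """Real-time price from the discrete level delta_t (Eq. (12)),
    falling back to the market price once the cumulative price level
    exceeds the threshold (Eq. (13))."""
    if sum(past_deltas) + delta_t > threshold:
        return P_market              # effective level forced to 0
    return P_market + delta_t * cst  # one of the five discrete prices
```

For instance, with a market price of 50, a step of 5 per level, and a threshold of 3, a level of +2 yields 60 while past levels are low, but reverts to 50 once the cumulative level would exceed 3.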

#### (3) Energy deficiency action

When the power generated by the DER cannot meet the demand, the EC can dispatch the energy stored in the ESS or purchase energy from the external power grid, and it determines the priority between the two. If the ESS has the higher priority but its stored electricity cannot meet the demand, the remaining power is automatically supplied by the external power grid.

#### (4) Energy excess action

When the electricity generated by the local DER exceeds demand, the excess must be stored in the ESS or sold to the external power grid. In this case, the EC again determines the priority between the ESS and the external power grid. If the ESS is the preferred option but has reached its maximum capacity, the remaining electricity is automatically transmitted to the external power grid.

#### *2.2. D3QN*

In this section, the basic principles of DQN (deep Q-network) and SARSA (state–action–reward–state–action) are presented first.

The training mechanism of DQN can be formulated as follows:

$$Q\_{k+1}(s, a) = Q\_k(s, a) + \alpha E\_k \tag{14}$$

$$E\_k = R + \gamma \max\_{a'} Q(s', a') - Q(s, a) \tag{15}$$

From (14), the update iteration drives the action-value function toward a fixed point (i.e., $Q_{k+1}(s, a) = Q_k(s, a)$), which means $R + \gamma \max_{a'} Q(s', a') - Q(s, a) \to 0$. Thus, the DQN network parameters can be updated by minimizing a mean square error loss built on this temporal-difference error.

The SARSA algorithm differs in how the Q value is updated. Specifically, when an agent running SARSA is in state $s$, it selects action $a$ according to an $\varepsilon$-greedy policy, observes the next state $s'$ from the environment, and selects the next action $a'$. The sequence $\{s, a, r, s', a'\}$ is stored in the experience replay set, on which the calculation of the target Q value also depends. The core update of the SARSA algorithm can be simplified as follows:

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ R + \gamma Q(s', a') - Q(s, a) \right] \tag{16}$$
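The two tabular updates above differ only in their bootstrap term: Q-learning bootstraps on the greedy value $\max_{a'} Q(s', a')$, while SARSA bootstraps on the action actually taken. A minimal sketch using a list-of-lists Q-table (names and data layout are our own):

```python
def q_learning_update(Q, s, a, r, s_next, alpha, gamma):
    """Off-policy update of Eqs. (14)-(15): bootstrap on max_a' Q(s', a')."""
    td_error = r + gamma * max(Q[s_next]) - Q[s][a]
    Q[s][a] += alpha * td_error

def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma):
    """On-policy SARSA update of Eq. (16): bootstrap on the chosen a'."""
    td_error = r + gamma * Q[s_next][a_next] - Q[s][a]
    Q[s][a] += alpha * td_error
```

Starting from the same table, the two updates generally produce different values whenever the chosen next action is not the greedy one, which is exactly the on-policy/off-policy distinction discussed in the text.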

In existing studies, DQN and SARSA have been developed for wind-storage cooperative decision-making. However, both use $Q(s, a)$ and $\max_{a'} Q(s', a')$ produced by the same network to update the Q-network parameters $\omega$, which leads to variation in the temporal-difference target and a reduction in convergence performance. Therefore, in view of these problems, this paper uses the D3QN algorithm to optimize the model decisions. The specific improvements are as follows:

(1) Referring to the double DQN (DDQN) algorithm, two neural networks with the same structure are constructed: the estimation network $Q(s, a, \omega)$ and the target network $Q'(s, a, \omega')$. The estimation network is used to select the action corresponding to the maximum Q value, and its parameters are updated constantly. The target network is used to calculate the target value $y$; its parameters are fixed and replaced with the current estimation network parameters at regular intervals. Keeping the target network fixed for a period of time makes the convergence target of the estimation network relatively stable, which benefits the convergence of the algorithm and prevents the agent from selecting overestimated suboptimal actions, effectively mitigating the overestimation problem of the DQN algorithm.

(2) In this paper, the structure of the deep neural network is adjusted. Referring to dueling DQN with its competitive architecture, the output is divided into two parts: one is the state-value function $V(S, \omega, \alpha)$, which evaluates the current state; the other is the advantage function $A(S, A, \omega, \beta)$, which judges the additional value of each action in the current state. The neural network structure of DQN is shown in Figure 4, and that of D3QN is shown in Figure 5.

**Figure 4.** The network structure of DQN.

**Figure 5.** The network structure of D3QN.

Finally, the output of the Q network is obtained by the linear combination of the output of the state-value function network and the advantage function network:

$$Q(S, A, \omega, \alpha, \beta) = V(S, \omega, \alpha) + A(S, A, \omega, \beta) \tag{17}$$

However, (17) cannot identify the respective contributions of $V(S, \omega, \alpha)$ and $A(S, A, \omega, \beta)$ in the final output. To provide this identifiability, the advantage function is generally set as the single-action advantage minus the average of all action advantages in the given state, so (17) can be modified as follows:

$$Q(S, A, \omega, \alpha, \beta) = V(S, \omega, \alpha) + A(S, A, \omega, \beta) - \frac{1}{|\mathcal{A}|} \sum\_{a' \in \mathcal{A}} A(S, a', \omega, \beta) \tag{18}$$
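The mean-centred combination of Eq. (18) is a single line of array arithmetic. A minimal NumPy sketch (the function name is ours; in practice V and A would be the outputs of the two network heads):

```python
import numpy as np

def dueling_q(V, A):
    """Eq. (18): combine the state value V (a scalar per state) with the
    advantage vector A (one entry per action), subtracting the mean
    advantage so the V/A split is identifiable."""
    return V + A - A.mean(axis=-1, keepdims=True)

A = np.array([1.0, 2.0, 3.0])
print(dueling_q(5.0, A))  # [4. 5. 6.]
```

Note that the greedy action is unchanged by the mean subtraction; only the decomposition into state value and advantage becomes unique.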

The flow chart of D3QN is shown in Figure 6:

In Figure 6, the D3QN algorithm stores the experience gained from each interaction in the experience pool. After a certain amount is accumulated, the model randomly samples a batch of data from the pool at each step to train the neural network. These randomly drawn experiences break the correlation between data, improve generalization, and benefit the stability of network training. Meanwhile, the D3QN algorithm constructs two neural networks with the same structure, namely the estimation network $Q_E(S, A, \omega, \alpha, \beta)$ and the target network $Q_T(S, A, \omega', \alpha', \beta')$. The estimation network is used to select the action, and its parameter $\omega$ is updated constantly. The target network is used to calculate the temporal-difference target; its parameter $\omega'$ is fixed and replaced with the latest estimation network parameter $\omega$ at regular intervals. Because $\omega'$ remains unchanged for a period of time, the convergence goal of the estimation network $Q_E$ is relatively fixed, which is beneficial for convergence. The actions that maximize the two networks' outputs are not necessarily the same; using $Q_E$ to select actions and $Q_T$ to calculate the target value prevents the model from selecting overestimated sub-optimal actions and effectively mitigates the overestimation problem of the DQN algorithm.

**Figure 6.** The flow chart of D3QN.

#### **3. Wind-Storage Cooperative Decision-Making Based on D3QN**

In this section, the wind-storage cooperative model is converted into a discrete Markov decision process (MDP). According to the reinforcement learning mechanism, the one-day state of the model is discretized into 24 states. The MDP in this paper takes the online environmental information as the state space, the set of command actions executed by the energy controller as the action space, and the income of the electricity seller as the reward function. The interaction between the energy controller and the power system environment is shown in Figure 7.

#### *3.1. State Space*

The state space is composed of the information the agent uses when making decisions at each time step $t$, including the controllable state component $S^C$, the external state component $S^X$, and the time-dependent component $S^T$. The controllable state information includes all environmental variables that the agent can directly or indirectly affect; in this study, it is composed of the TCLs' average $SoC_t$, the ESS charge and discharge state $BSC_t$, and the pricing counter $C_t^b$ [33]. The external state information consists of variables such as the temperature $T_t$, the wind power generation $G_t$, and the electricity price $P_t^u$. In the implementation, the external state information directly uses the real data set, so it is assumed that the controller can accurately predict the values of these three variables at the next moment. The time-dependent component includes the information strongly related to time, where $L_{b,t}$ represents the current load value based on the daily consumption pattern and $t$ represents the hour of the day.

The state space is expressed as follows:

$$s\_t \in S = S^C \times S^X \times S^T \tag{19}$$

$$s\_t = \begin{bmatrix} SoC\_t, BSC\_t, C\_t^b, T\_t, G\_t, P\_t^u, L\_{b,t}, t \end{bmatrix} \tag{20}$$

In the implementation process, the electricity price is not given directly. First, the initial electricity price is set. When the price should be increased or decreased, the pricing counter *C<sub>t</sub><sup>b</sup>* is incremented or decremented by 1. The electricity price is then the initial price plus the product of *C<sub>t</sub><sup>b</sup>* and the unit electricity price.
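The counter-based pricing just described can be sketched in a few lines of Python. The function names and the clipping bound are hypothetical; the paper specifies only the counter update and the unit-price product:

```python
# Hypothetical sketch of the pricing counter C_t^b described above.
# The clipping bound is an assumption for illustration; the paper does not state one.
def apply_price_action(counter: int, delta: int, c_min: int = -5, c_max: int = 5) -> int:
    """Increment or decrement the pricing counter by delta, kept within a bound."""
    return max(c_min, min(c_max, counter + delta))

def price_from_counter(initial_price: float, counter: int, unit_price: float) -> float:
    """Electricity price = initial price + counter * unit price."""
    return initial_price + counter * unit_price
```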

**Figure 7.** Interaction process between the energy controller and the system environment.

#### *3.2. Action Space*

The action space consists of four parts: the TCL action space *A<sub>tcl</sub>*, the price action space *A<sub>P</sub>*, the energy shortage action space *A<sub>D</sub>*, and the energy excess action space *A<sub>E</sub>*. The TCL action space consists of four possible actions, and the price action space of five. The energy shortage and energy excess action spaces each contain two possible actions, namely the priority between the ESS and the external power grid. Therefore, the whole action space contains 4 × 5 × 2 × 2 = 80 potential combinations of these actions, which can be expressed as follows:

$$a\_t = (a\_{tcl}, a\_P, a\_D, a\_E)\_t \tag{21}$$

$$a\_t \in A = A\_{tcl} \times A\_P \times A\_D \times A\_E \tag{22}$$
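A minimal sketch of how this 4 × 5 × 2 × 2 composite space can be enumerated, so that a single discrete network output indexes a full action tuple (the encoding is illustrative, not taken from the paper's code):

```python
from itertools import product

# Hypothetical enumeration of the composite action space in Eq. (22):
# 4 TCL actions x 5 price actions x 2 shortage actions x 2 excess actions = 80.
A_TCL, A_P, A_D, A_E = range(4), range(5), range(2), range(2)
ACTIONS = list(product(A_TCL, A_P, A_D, A_E))

def decode(index: int):
    """Map a flat network output index to the tuple (a_tcl, a_P, a_D, a_E) of Eq. (21)."""
    return ACTIONS[index]
```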

#### *3.3. Reward Function and Penalty Function*

Deep reinforcement learning (DRL) solves problems by maximizing a reward function. The purpose of using DRL in this paper is to maximize the economic profit of the electricity sellers. Thus, the reward value can be selected as the operating gross profit, i.e., the income from selling electricity to the demand side and the external power grid minus the cost of wind power generation and of purchasing electricity from the external power grid. The reward function *R<sub>t</sub>* and penalty function *Costs<sub>t</sub>* are defined as follows:

$$R_t = Rev_t - Costs_t \tag{23}$$

$$Rev_t = P_t \sum_{loads} L_t^i + C_{gen} \sum_{TCLs} L_{TCL}^i u_{b,t}^i + P_t^d E_t^S \tag{24}$$

$$Costs_t = C_{gen} G_t + \left(P_t^u + C_{tr,imp}\right) E_t^P + C_{tr,exp} E_t^S \tag{25}$$

where *C<sub>gen</sub>* is the energy price charged to the TCLs, and it is also the cost of wind power generation. *G<sub>t</sub>* refers to the wind power generation amount. *P<sub>t</sub><sup>d</sup>* and *P<sub>t</sub><sup>u</sup>* are the decreased and increased prices, respectively, i.e., the energy prices for selling to or purchasing from the external power grid [25]. *E<sub>t</sub><sup>S</sup>* and *E<sub>t</sub><sup>P</sup>* are the amounts of energy sold to and purchased from the external power grid, respectively. *C<sub>tr,imp</sub>* and *C<sub>tr,exp</sub>* are the power transmission costs from the interaction with the external power grid.
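Equations (23)–(25) can be sketched directly as Python functions. All arguments are per-time-step scalars with illustrative names, and the TCL loads are assumed to be pre-multiplied by the on/off indicator from Equation (24):

```python
# Hedged sketch of Equations (23)-(25); argument names are illustrative.
def revenue(P_t, loads, C_gen, tcl_loads, P_d, E_sold):
    """Eq. (24): demand-side sales + TCL sales + energy sold to the external grid."""
    return P_t * sum(loads) + C_gen * sum(tcl_loads) + P_d * E_sold

def costs(C_gen, G_t, P_u, C_tr_imp, E_bought, C_tr_exp, E_sold):
    """Eq. (25): generation cost + import cost + export transmission cost."""
    return C_gen * G_t + (P_u + C_tr_imp) * E_bought + C_tr_exp * E_sold

def reward(P_t, loads, C_gen, tcl_loads, P_d, E_sold,
           G_t, P_u, C_tr_imp, E_bought, C_tr_exp):
    """Eq. (23): operating gross profit for one time step."""
    return (revenue(P_t, loads, C_gen, tcl_loads, P_d, E_sold)
            - costs(C_gen, G_t, P_u, C_tr_imp, E_bought, C_tr_exp, E_sold))
```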

#### **4. Implementation Details**

Before the algorithm evaluation, the implementation details are given in this section. The experiments were run on Windows 11 with Python 3.8 and TensorFlow 1.14; the CPU is an AMD R7-5800H, the GPU is an RTX 3060, and the memory is 16 GB.

The network structure of the DQN and SARSA algorithms consists of an input layer, two fully connected hidden layers, and an output layer. The activation function of the neurons is the ReLU function. In addition, to prevent overfitting after model training, this paper applies dropout during neural network training. The number of neurons in the input layer is the same as the dimension of the system state space, and the number of neurons in the output layer is the same as the dimension of the system action space. The D3QN algorithm adds a dueling (competitive) network to the structure of the first two algorithms, splitting the features obtained from the fully connected layer into two branches. The upper path is the state value function *V*(*s*), which represents the value of the state environment itself, and the lower path is the state-dependent action advantage function *A*(*s*, *a*), which represents the additional value brought by selecting an action in the current state. Finally, these two paths are aggregated to obtain the Q value of each action. This dueling structure can, in theory, learn the value of the environmental state independently of the action, which improves performance in practice.
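The aggregation of the two branches can be sketched as follows. Subtracting the mean advantage is the standard identifiability trick for dueling heads; this is a sketch of the aggregation step only, not the paper's full network:

```python
import numpy as np

# Minimal sketch of the dueling aggregation: the branches V(s) and A(s, a)
# are combined into Q-values by subtracting the mean advantage, which keeps
# the V/A decomposition identifiable.
def dueling_q(v: float, advantages: np.ndarray) -> np.ndarray:
    """Q(s, a) = V(s) + (A(s, a) - mean_a A(s, a))."""
    return v + (advantages - advantages.mean())

q = dueling_q(1.0, np.array([2.0, 0.0, 1.0]))  # mean advantage is 1.0
```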

In the training process of the neural network, the dropout rate is 70%, the capacity of the experience replay buffer is 500, the mini-batch size is 200, the reward discount factor is 0.9, and the target network update interval *N* is 200. The detailed network structure diagrams of DQN, SARSA, and D3QN are shown in Figures 8 and 9.


**Figure 8.** Network structure diagram of the DQN algorithm and SARSA algorithm.


**Figure 9.** Network structure diagram of the D3QN algorithm.

The proposed decision-making algorithm will be deployed on a cloud server for real-world applications. Generally, the cloud server possesses enough computational power to execute the DL-based methods.

#### **5. Algorithm Evaluation**


In this section, the simulation evaluation is presented to validate the proposed control mechanism. This paper selects the wind power data of a wind farm in Finland. In the wind-storage cooperative model, the control cycle of ESS is 1 day, i.e., 24 intervals. In addition, the parameters involved in the whole system model are summarized in Table 1.


**Table 1.** Parameters in the system model.



#### *5.1. Comparisons of Training Results*

#### 5.1.1. Penalty Value Curve

The penalty value is composed of the cost of wind power generation, the cost of purchasing power from the external power grid, and the power transmission cost. Figure 10 shows the total cost paid by the wind power producers in each training cycle (episode) during the learning process. The penalty value decreases as the number of training episodes increases and gradually converges.

It can be seen that the convergence performance of D3QN is superior to that of its rivals. Although the penalty value using DQN shows a downward and gradually converging trend, it still oscillates noticeably, which is caused by the defects of DQN. D3QN uses two Q networks to calculate the target Q value and the estimated Q value, respectively, which directly reduces the correlation between them and greatly improves the convergence performance.

#### 5.1.2. Reward Value Curve

Figure 11 shows the reward value curve during the training process, i.e., the income obtained by the wind farm from the external environment during operation. The training time, final mean reward value, and performance improvement rate for the three algorithms are summarized in Table 2. The final reward value of D3QN is higher than that of the other two algorithms, so the overall performance of the system model based on D3QN is improved.

**Figure 10.** Comparison analysis of the penalty value using DQN, SARSA, and D3QN.

**Figure 11.** Comparison analysis of reward value curves using DQN, SARSA, and D3QN.

**Table 2.** Training results of the three algorithms.


#### *5.2. Comparison of Application Results*

#### 5.2.1. 10-Day Revenue Comparison

In order to give a more intuitive understanding of the performance difference for DQN, SARSA, and D3QN, this section selects the data from 10 days in a year, and analyzes the daily total profit obtained by the system model with the three algorithms, as shown in Figure 12.

It can be seen that the daily income using SARSA and D3QN is higher than that of DQN within 10 days. Moreover, the total profit of D3QN is better than that of SARSA in 9 out of 10 days, which also validates the superiority of D3QN.

#### 5.2.2. Daily Electricity Trading Comparison

This section compares the behavior of the three algorithms on one specific day. The one-day environmental data are shown in Figure 13, including the outdoor temperature, wind power generation, electricity prices, and residential load.


**Figure 13.** Environmental data of one-day: (**a**) outdoor temperature, (**b**) energy generated, (**c**) electricity prices, and (**d**) residential loads.

Using DQN, SARSA, and D3QN, one can obtain the energy allocation results of TCLs, the purchased energy, and the sold energy, as shown in Figures 14–16.


**Figure 14.** TCLs status and power exchange using DQN: (**a**) TCLs and (**b**) power exchange.


**Figure 15.** TCLs status and power exchange using SARSA: (**a**) TCLs and (**b**) power exchange.


**Figure 16.** TCLs status and power exchange using D3QN: (**a**) TCLs and (**b**) power exchange.

In Figures 14–16, the SoC of the TCLs reflects the change in indoor temperature for residents. This paper sets the constant temperature range of the TCLs as 19–25 °C. When the charging state of the TCLs is 0%, the indoor temperature is less than or equal to 19 °C; when the charging state is 100%, the indoor temperature is greater than or equal to 25 °C. It can be seen that SARSA and D3QN can allocate sufficient energy to the TCLs when the wind power generation is sufficient, so that their state reaches saturation as soon as possible, the system keeps the room temperature stable, and residents enjoy a warm and comfortable experience. In addition, SARSA selects multiple transactions to ensure the income, while D3QN decisively sells a large amount of power to obtain more income when wind energy is sufficient and the electricity price is at its highest.
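The linear SoC-to-temperature correspondence stated above (0% maps to 19 °C, 100% to 25 °C) can be sketched as a one-line helper (the function name is illustrative):

```python
# Sketch of the linear SoC-temperature mapping described above:
# SoC 0% corresponds to 19 C and SoC 100% to 25 C.
T_MIN, T_MAX = 19.0, 25.0

def soc_to_temperature(soc: float) -> float:
    """Map an SoC in [0, 1] to the indoor temperature in degrees Celsius."""
    return T_MIN + soc * (T_MAX - T_MIN)
```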

#### 5.2.3. Computational Efficiency Comparison

To demonstrate the computational efficiency of the proposed D3QN, the training time, decision-making time, number of trainable parameters, and performance improvement rate are summarized in Table 3. It takes 196.0111 s and 415.5845 s for DQN and SARSA to reach convergence, respectively, while the proposed D3QN takes 244.1469 s. Furthermore, although D3QN possesses the largest number of trainable parameters, its decision-making time is close to that of the other two algorithms, which demonstrates that D3QN can be implemented in real-world applications. From Table 3, one can conclude that the computational cost of D3QN is slightly larger than that of DQN and SARSA, which is still within an acceptable range and is mainly due to its larger number of trainable parameters. Moreover, the performance improvement rate of D3QN is the highest, which is an important criterion for evaluating different algorithms. Generally, some additional computational complexity is worthwhile when the performance gains enough improvement.


**Table 3.** Computational efficiency comparison of the three algorithms.

#### **6. Conclusions**

Considering external conditions such as wind energy resources, demand response load, and market electricity price, this paper puts forward a new research method for wind-storage cooperative decision-making based on the DRL algorithm. The main work of this paper is summarized as follows:

(1) This paper proposes a new wind-storage cooperative model. Based on the conventional model including wind farms, energy storage systems, and external power grids, this paper also takes into account a variety of flexible loads based on demand response, including residential price response loads and thermostatically controllable loads (TCLs). Meanwhile, this model can also be applied to other energy sources, such as photovoltaic, hydroelectric, and thermal power generation.

(2) This paper proposes a new wind-storage cooperative decision-making mechanism using D3QN, which takes the energy controller as the central allocation controller of the system energy, realizing the direct control of TCLs and the indirect control of the residential price response load, and the management of priority between ESS and the external power grid in the case of sufficient or insufficient energy.

(3) It is worth mentioning that the application of the D3QN algorithm is a new attempt in the research field of wind-storage cooperative decision-making. Based on the historical data of wind farm and market electricity prices, the effectiveness of D3QN in dealing with the wind-storage cooperative decision-making problem is verified, and the superior performance of D3QN is also analyzed.

**Author Contributions:** Conceptualization, S.Z. and W.L.; methodology, S.Z. and W.L.; validation, S.Z., W.L. and X.Z.; writing—original draft preparation, S.Z., W.L. and Z.Q.; writing—review and editing, S.Z. and W.L.; supervision, S.H.; project administration, S.H.; funding acquisition, S.H. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Science and Technology Project of China Southern Power Grid Yunnan Power Grid Co., Ltd., grant number YNKJXM20220048, and the China Postdoctoral Science Foundation, grant number 2021MD703895.

**Institutional Review Board Statement:** Not applicable.

**Data Availability Statement:** Data sharing is not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

### *Article* **Probability Density Forecasting of Wind Power Based on Transformer Network with Expectile Regression and Kernel Density Estimation**

**Haoyi Xiao <sup>1</sup>, Xiaoxia He <sup>1,2,</sup>\* and Chunli Li <sup>1</sup>**


**Abstract:** A comprehensive and accurate wind power forecast assists in reducing the operational risk of wind power generation, improves the safety and stability of the power system, and maintains the balance of wind power generation. Herein, a hybrid wind power probabilistic density forecasting approach based on a transformer network combined with expectile regression and kernel density estimation (Transformer-ER-KDE) is methodically established. The wind power prediction results of various levels are exploited as the input of kernel density estimation, and the optimal bandwidth is achieved by employing leave-one-out cross-validation to arrive at the complete probability density prediction curve. In order to more methodically assess the predicted wind power results, two sets of evaluation criteria are constructed, including evaluation metrics for point estimation and interval prediction. The wind power generation dataset from the official website of the Belgian grid company Elia is employed to validate the proposed approach. The experimental results reveal that the proposed Transformer-ER-KDE method outperforms mainstream recurrent neural network models in terms of point estimation error. Further, the suggested approach is capable of more accurately capturing the uncertainty in the forecasting of wind power through the construction of accurate prediction intervals and probability density curves.

**Keywords:** wind power forecasting; transformer network; expectile regression; kernel density estimation; probability density forecasting

#### **1. Introduction**

In response to climate problems, environmental pollution, and the energy crisis, the global focus of energy development and utilization has changed from traditional fossil fuels to clean and renewable energy sources such as wind and solar power [1]. Among these, wind energy is a non-polluting and sustainable energy source with huge storage capacity, stable production, and widespread use, making it one of the most popular sustainable renewable energy sources in the world [2]. According to forecasts, wind energy is estimated to account for a significant share of global electricity generation by 2030 [1], with China, in particular, proposing the development of a new power system based on renewable sources such as wind and solar [3]. Wind power is anticipated to play a pivotal role in the future energy mix with plans to integrate it into power systems around the world. This highlights the enormous potential for future growth in the wind power industry.

However, wind power generation is chiefly influenced by natural wind fluctuations and other meteorological conditions, and its intermittent, stochastic, and unstable nature inevitably produces technical challenges for power system planning and scheduling, as well as safe and stable operations [3]. Comprehensive and precise power network forecasting is necessary for the incorporation of wind farm technology into existing power grids. Successful forecasting is necessary to manage risks and successfully maintain a

**Citation:** Xiao, H.; He, X.; Li, C. Probability Density Forecasting of Wind Power Based on Transformer Network with Expectile Regression and Kernel Density Estimation. *Electronics* **2023**, *12*, 1187. https:// doi.org/10.3390/electronics12051187

Academic Editors: Luis Hernández-Callejo, Sergio Nesmachnow and Sara Gallardo Saavedra

Received: 4 February 2023 Revised: 24 February 2023 Accepted: 27 February 2023 Published: 1 March 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

balanced network with significant wind components as part of the overall electrical grid. The challenges associated with accomplishing this require careful mathematical analysis combined with data verification to merge wind networks into existing power grids. The stochastic issues with wind power differ significantly from more traditional power sources, so data analysis, statistical estimators, stochastic analysis, and predictive methodologies require careful thought.

With the development of wind power generation in recent years, significant research and progress have been made in the field of wind power forecasting (WPF). According to various modeling schemes, WPF can be essentially classified into physical models, statistical models, and artificial intelligence models with machine learning [3–6]. In more detail, physical methods commonly exploit long-term forecasts based on numerical weather predictions (NWPs). Hence, many physical factors are required to achieve the best forecast accuracy [5], and physical models usually exhibit advantages in long-term forecasting [6]. Statistical methods for time-series forecasting include methods such as the Kalman filter (KF), autoregressive integrated moving average (ARIMA), generalized autoregressive conditional heteroskedasticity (GARCH), and its variations [5]. These methodologies are utilized for predicting the future production of wind power based on a large amount of historical data and are more effective than physical methods for short-term wind power forecasting. However, the strict distribution assumptions and smoothness tests on the data mean that these statistical models do not exhibit universality and generality. With the rapid development of artificial intelligence in recent years, many machine learning-based prediction approaches such as support vector machine (SVM) [6], random forest (RF) [7], and XGBoost [8] have been developed to perform wind speed or wind power prediction. Machine learning approaches usually have large-scale data processing capabilities, more accurate prediction precision, and more remarkable universality and generalization capabilities [3].

Due to the powerful ability of deep learning to learn features and handle complex nonlinear problems, neural network algorithms such as long short-term memory neural networks (LSTMs) [9–12], gated recurrent units (GRUs) [12,13], extreme learning machines (ELMs) [14], and convolutional neural networks (CNNs) [15,16] have been recently extensively employed for short-term wind power prediction. In constructing predictive models for time-series data such as wind power data, recurrent neural network (RNN) frameworks, including LSTMs and GRUs, are particularly effective for modeling sequential data in time-series data prediction tasks such as wind power forecasting. Despite these RNN-based frameworks generally performing well, they exhibit some limitations. The RNNs are often employed to iteratively model sequential data, but these methodologies possess a high training time cost and could result in performance reduction for sequential data with longer time steps. This issue is essentially attributed to the fact that the RNNs can only consider the hidden state of the last moment during processing sequential data [17].

In 2017, Google proposed the transformer network [18], which has already exhibited a momentous impact on the field of natural language processing and the application area of deep learning. The model exclusively relies on the self-attention mechanism to establish global dependencies on sequence data and is capable of mining complex and relevant information from various scales of the sequence [19]. Transformer network-based methodologies have been used by various researchers for wind power prediction [19–21]. The core self-attention mechanism has also been used in combination with recurrent neural networks such as LSTM to construct hybrid models for more accurate wind power prediction [1,3,13,22,23]. The transformer networks are capable of capturing the internal correlation of longer sequences and comprehensively obtaining essential information about wind power data [21].

Most explorations so far have focused on providing deterministic values for point estimates, which are difficult to use in measuring the uncertain characteristics of wind power [24]. On the other hand, interval and probabilistic forecasting of wind power has recently attracted considerable attention because it allows the construction of continuous probability density curves and the quantification of uncertainty in wind power output. Thereby, it provides helpful information for power companies, system operators, and related decision-makers and stakeholders [2]. In addition, several investigations have been devoted to interval and probability density forecasting of wind power [25–27]. In [27], the quantile regression neural network (QRNN) approach was implemented for wind power prediction. For this purpose, the prediction results for various conditional quantiles were exploited as input to a nonparametric method of kernel density estimation (KDE) that does not presuppose the data distribution to derive the complete probability density profile of the wind power. The QRNN represents a hybrid model that combines traditional statistics and machine learning. It mainly merges the advantages of quantile regression (QR), such as the ability to estimate the conditional distribution of explanatory variables without considering the distribution type of random variables, with the strong nonlinear fitting capabilities of neural networks.

A nonparametric nonlinear regression model, the so-called expectile regression neural network (ERNN), was proposed in [28]; it builds upon the concept of QRNN by incorporating the expectile regression (ER) framework into the neural network structure. This ERNN model can easily estimate the model parameters by standard gradient-based optimization algorithms and back-propagation owing to the use of an asymmetric squared loss function, a property that outperforms the QRNN model, which uses an asymmetric absolute loss function that is not differentiable at the origin. In addition, the ERNN model can directly output conditional expectation functions that describe the complete distribution of responses based on covariate information and provide more insightful information for decision-making.

The prediction performance of neural networks is commonly influenced by the model structure and hyperparameters [4], and numerous investigators have combined neural network models (NNMs) with modal decomposition techniques [3,20,23,29–31] or optimization algorithms [30–35] to achieve better prediction results. In the current investigation, hence, the transformer (i.e., a model known for its superior performance in sequential data tasks) is utilized as the base model for wind power prediction. Additionally, this model is combined with the asymmetric loss function of expectile regression and then optimized via the cuckoo search (CS) algorithm [36]. The optimal model structure is then exploited to make wind power predictions at various levels *τ*. To this end, the KDE model with a Gaussian kernel function in conjunction with the leave-one-out cross-validation (LOOCV) method is employed to obtain the probability density interval estimates for wind power forecasting. The results obtained with the proposed transformer expectile regression and kernel density estimation (Transformer-ER-KDE) model are compared with those of other models and methods for various point and interval estimates by utilizing the wind power data for January–February 2022 provided by the Belgian grid, demonstrating its superiority over the other models.

The present investigation presents three major contributions in comparison to the preceding ones:


the density function of random variables [27], the leave-one-out cross-validation is employed here for optimal bandwidth selection, fully exploiting the information from the estimation results of various levels τ, while Gaussian kernel functions [3] are commonly utilized to achieve improved probability density estimates.

(3) The probability density estimation results are assessed based on two sets of evaluation criteria for point estimation and interval prediction. The point estimation results, which are attained using the probability density approach, exhibit strong robustness and high accuracy compared with traditional prediction methods [27]. Usually, evaluation metrics such as prediction interval coverage probability (PICP), prediction interval normalized average width (PINAW), and the coverage width-based criterion (CWC) are employed to assess the interval prediction results. The prediction interval estimation error (PIEE) evaluation metrics proposed in [25] are also implemented here for the purpose of evaluating and comparing the probability density interval estimation. Additionally, the PIEE index is incorporated into the CWC composite index to make it more comprehensive and accurate in reflecting the evaluation effect of interval prediction.
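As a reference for the interval metrics named above, PICP and PINAW can be sketched in a few lines of NumPy (minimal versions under the usual definitions; CWC and PIEE combine these and are omitted):

```python
import numpy as np

# Minimal sketches of two interval-prediction metrics under their usual
# definitions: PICP is the fraction of observations inside the interval,
# and PINAW is the mean interval width normalized by the observation range.
def picp(y, lower, upper):
    y, lower, upper = map(np.asarray, (y, lower, upper))
    return np.mean((y >= lower) & (y <= upper))

def pinaw(y, lower, upper):
    y, lower, upper = map(np.asarray, (y, lower, upper))
    return np.mean(upper - lower) / (y.max() - y.min())
```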

#### **2. Related Theories**

#### *2.1. Transformer Network*

A transformer network is a transduction model that relies entirely on a self-attention mechanism to evaluate its input and output representations without employing RNNs or CNNs [20].

#### 2.1.1. Self-Attention Mechanism

The main advantage of the attention mechanism is its ability to extract relevant information from a large amount of input data in the current task context. Specifically, the self-attention mechanism calculates attention values within a sequence and uses this information to identify structural relationships and connections within the sequence [21].

In self-attention, the input sequence *X* ∈ ℝ<sup>*l*×*d*</sup> is transformed by matrix operations into *Q* (Query), *K* (Key), and *V* (Value), where *l* represents the sequence length and *d* denotes the model dimension:

$$Q = XW_Q, \quad K = XW_K, \quad V = XW_V, \tag{1}$$

where *W<sub>Q</sub>* ∈ ℝ<sup>*d*×*d<sub>qk</sub>*</sup>, *W<sub>K</sub>* ∈ ℝ<sup>*d*×*d<sub>qk</sub>*</sup>, and *W<sub>V</sub>* ∈ ℝ<sup>*d*×*d<sub>v</sub>*</sup> are weight matrices learned by the neural network through training iterations. The resulting *Q* ∈ ℝ<sup>*l*×*d<sub>qk</sub>*</sup>, *K* ∈ ℝ<sup>*l*×*d<sub>qk</sub>*</sup>, and *V* ∈ ℝ<sup>*l*×*d<sub>v</sub>*</sup> are then used to compute the output of the self-attention mechanism:

$$A = \mathrm{Softmax}\left(\frac{QK^{T}}{\sqrt{d_{qk}}}\right)V. \tag{2}$$

It is evident that *QK<sup>T</sup>* contains the information of the various positions in the whole sequence and, after normalization, represents the attention weights for each position. The matrix multiplication with *V* then yields the attention output *A* ∈ ℝ<sup>*l*×*d<sub>v</sub>*</sup>. Finally, the output is transformed through the linear transformation specified in the following form:

$$O = AW_O, \tag{3}$$

where *W<sub>O</sub>* ∈ ℝ<sup>*d<sub>v</sub>*×*d<sub>out</sub>*</sup> represents the weight matrix of the linear layer, and the final output is *O* ∈ ℝ<sup>*l*×*d<sub>out</sub>*</sup>.
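Equations (1)–(3) can be reproduced end-to-end in a short NumPy sketch; random weights stand in for the trained matrices, and the dimension names follow the text:

```python
import numpy as np

# Self-contained sketch of Equations (1)-(3) with random weights.
# Shapes follow the text: X is (l, d); W_Q, W_K are (d, d_qk);
# W_V is (d, d_v); W_O is (d_v, d_out).
def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_Q, W_K, W_V, W_O):
    Q, K, V = X @ W_Q, X @ W_K, X @ W_V              # Eq. (1)
    A = softmax(Q @ K.T / np.sqrt(W_Q.shape[1])) @ V  # Eq. (2)
    return A @ W_O                                    # Eq. (3)

rng = np.random.default_rng(0)
l, d, d_qk, d_v, d_out = 6, 8, 4, 4, 8
X = rng.normal(size=(l, d))
O = self_attention(X, rng.normal(size=(d, d_qk)), rng.normal(size=(d, d_qk)),
                   rng.normal(size=(d, d_v)), rng.normal(size=(d_v, d_out)))
```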

#### 2.1.2. Multi-Head Attention Mechanism

Within the transformer network, the self-attention mechanism is extended to a multi-head attention mechanism, which is calculated in an identical way. The primary difference is that the input sequence *X* is divided into *n* subspaces (*n* heads), and the self-attention mechanism is executed on each subspace in parallel. The attention outputs obtained from each head (i.e., *A*<sup>1</sup>, *A*<sup>2</sup>, ... , *A<sup>n</sup>*) are then concatenated, and the final output *O* can be obtained through the following linear transformation:

$$O = \mathrm{Concat}\left(A^1, A^2, \dots, A^n\right)W_O. \tag{4}$$

The operating principle of multi-headed attention is illustrated in Figure 1. Despite the presence of multiple heads, the number of parameters and time complexity are comparable to those of self-attention [20]. The exploitation of multi-head attention allows it to attend to various representation subspaces at various positions, thereby providing enhanced forecasting capabilities. Each subspace makes its own prediction based on its own perspective or a combination of factors, yielding better predictions than a single self-attentive mechanism.
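The split-and-concatenate bookkeeping described above can be sketched as follows. This is a shape-level illustration only; each head would run the self-attention of Equation (2) on its own slice:

```python
import numpy as np

# Shape-level sketch of the multi-head split and concatenation: the model
# dimension d is partitioned into n_heads subspaces of size d / n_heads.
def split_heads(X: np.ndarray, n_heads: int) -> np.ndarray:
    """(l, d) -> (n_heads, l, d // n_heads)."""
    l, d = X.shape
    assert d % n_heads == 0
    return X.reshape(l, n_heads, d // n_heads).transpose(1, 0, 2)

def concat_heads(heads: np.ndarray) -> np.ndarray:
    """(n_heads, l, d_h) -> (l, n_heads * d_h), inverse of split_heads."""
    n, l, dh = heads.shape
    return heads.transpose(1, 0, 2).reshape(l, n * dh)
```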

**Figure 1.** Schematic diagram of the multi-head attention.

#### 2.1.3. Position Encoding

While self-attention considers information from all positions of the sequence data, it may not wholly capture the influence of positional differences. To make full use of the location information of sequence data, this paper incorporates position-encoding information into the sequence data. The position encoding is evaluated in the following form:

$$PE(pos, 2i) = \sin\left(pos/10000^{2i/d}\right), \tag{5}$$

$$PE(pos, 2i+1) = \cos\left(pos/10000^{2i/d}\right), \tag{6}$$

where *pos* denotes the sequence length index, and *i* represents the dimensional index from 0 to *d*/2.
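Equations (5) and (6) translate directly into a vectorized NumPy routine (*d* is assumed even, as in the usual transformer setup):

```python
import numpy as np

# Direct transcription of Equations (5) and (6): interleaved sine/cosine
# columns with geometrically increasing wavelength; d is assumed even.
def positional_encoding(seq_len: int, d: int) -> np.ndarray:
    pe = np.zeros((seq_len, d))
    pos = np.arange(seq_len)[:, None]        # position index
    i = np.arange(0, d, 2)[None, :]          # 2i over the even dimensions
    pe[:, 0::2] = np.sin(pos / 10000 ** (i / d))
    pe[:, 1::2] = np.cos(pos / 10000 ** (i / d))
    return pe
```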

#### 2.1.4. Transformer

The structure of the transformer network utilized in the present work is depicted in Figure 2.

The traditional transformer architecture consists of an encoder and a decoder. In the current exploration, only the transformer encoder structure is employed, which is appropriate for regression problems and serves as a general-purpose module for transforming a sequence into a more informative feature representation. The transformer was originally developed for the NLP field; hence, minor modifications have been made to its architecture. Instead of a word vector embedding layer, the input data are passed through a linear layer before being position-encoded. Similarly, before being output, the prediction results are passed through a linear layer without an activation function rather than a Softmax layer for probabilistic prediction. The remaining elements, the multi-headed attention, two normalization layers, one linear layer, and two residual links, are identical to those in the original transformer.

**Figure 2.** Schematic representation of the proposed transformer network.

#### *2.2. Expectile Regression*

Given a response variable *Y* and a covariate matrix *X* with observations (*yi*, *xi*), where *i* is the sample number such that *i* = 1, 2, ... , *n* and *n* denotes the total number of samples, the values *yi* of the response variable at the *τ* level can be derived by the following classical linear expectile regression model:

$$
\hat{E}\_{y\_i}(\tau|\mathbf{x}\_i) = \mathbf{x}\_i^{\prime} \hat{\beta}(\tau), \quad i = 1, 2, \dots, n \tag{7}
$$

$$\hat{\beta}(\tau) = \arg\min\_{\boldsymbol{\beta}} \sum\_{i=1}^{n} \varphi\_{\tau} \left( y\_i - \mathbf{x}\_i^{\prime} \boldsymbol{\beta} \right). \tag{8}$$

$$\varphi\_{\tau}(u) = \begin{cases} \tau u^2, & u \ge 0 \\ (1-\tau)u^2, & u < 0 \end{cases} \tag{9}$$

where *τ* ∈ (0, 1) is the quantile of a given weight level and denotes the degree of asymmetry of the loss function. *E*ˆ*yi*(*τ*|*xi*) represents the *τ*-th level expectile of the response variable *yi*, and *β*ˆ(*τ*) denotes the regression coefficient at a given *τ*, whose estimate is obtained by solving the optimization problem displayed in Equation (8).

*ϕτ*(*u*) is an asymmetric loss function that depends on the level *τ*. When *τ* = 0.5, the asymmetric squared loss function in Equation (9) degenerates, up to a constant factor, to the squared loss function *ϕ*(*u*) = *u*<sup>2</sup>, and the overall expectile regression model degenerates to a simple linear regression model. It has been widely acknowledged that the squared loss function, commonly utilized in the training of neural networks through back-propagation, is merely a specific instance of the asymmetric loss function of expectile regression.
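As an illustrative sketch (not the authors' code), the asymmetric squared loss and a plain gradient-descent fit of the linear expectile model of Equations (7)–(9) might look as follows; the standard form with (1 − *τ*)*u*<sup>2</sup> on the negative side is used, and the function names are our own:

```python
import numpy as np

def expectile_loss(u, tau):
    """Asymmetric squared loss: tau*u^2 for u >= 0, (1 - tau)*u^2 for u < 0."""
    u = np.asarray(u, dtype=float)
    return np.where(u >= 0, tau * u**2, (1 - tau) * u**2)

def fit_linear_expectile(x, y, tau, lr=0.1, n_iter=5000):
    """Estimate beta(tau) of Eq. (8) by gradient descent on the smooth loss."""
    X = np.column_stack([np.ones(len(y)), x])    # add an intercept column
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        u = y - X @ beta
        w = np.where(u >= 0, tau, 1 - tau)       # asymmetric weights
        grad = -2 * X.T @ (w * u) / len(y)       # gradient of the mean loss
        beta -= lr * grad
    return beta
```

At *τ* = 0.5 the weights are symmetric and the fit reduces to ordinary least squares up to a factor of 1/2, matching the degeneracy noted above.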

A neural network can be conceptualized as a nonlinear function denoted by *f*(·) that serves as a generalized nonlinear model. Given an input *xi*, the output of this model can be displayed as follows:

$$E\_{y\_i}(\tau|\mathbf{x}\_i) = f(\mathbf{x}\_i, w(\tau)), \quad i = 1, 2, \dots, n \tag{10}$$

where *w*(*τ*) represents the model parameter to be estimated. In the ERNN model, the estimator can be appropriately derived by iterating based on the following loss function:

$$\hat{w}(\tau) = \arg\min \sum\_{i=1}^{n} \varphi\_{\tau}(y\_i - f(\mathbf{x}\_i, w(\tau))),\tag{11}$$

where *ϕτ*(*u*) is the same as that given in Equation (9). Unlike the asymmetric absolute value loss function of the QRNN, the empirical loss function of the ERNN model is differentiable and smooth everywhere at various levels of *τ*. The empirical loss function is also convex, so the standard back-propagation and gradient descent optimization algorithms of neural networks are capable of estimating the ERNN model parameters easily and obtaining the optimal solution *w*ˆ(*τ*) at different values of *τ*. Furthermore, it is clear that the ERNN model is derived by replacing the conventional squared loss function employed in general neural networks with an asymmetric quadratic loss function [28].

#### *2.3. Cuckoo Search Algorithm*

The cuckoo search (CS) algorithm was proposed in 2009 [36] as a bionic intelligent algorithm applicable to optimization problems. Similar to genetic algorithms (GAs) and particle swarm optimization (PSO), CS directly searches for the extremum points of the objective function in the feasible domain of the given parameters. Its main strategy relies on the Lévy flight to update the position of the nest. The Lévy flight step formula is given as follows [37]:

$$s = \frac{u}{|v|^{1/\beta}}\tag{12}$$

The value of *β* is usually considered between 1 and 2. In this study, we set *β* = 1.5, which is a commonly used value in the literature. Both *u* and *v* obey the following normal distribution:

$$
u \sim \mathcal{N}\left(0, \sigma\_u^2\right), \quad v \sim \mathcal{N}(0, 1). \tag{13}
$$

$$\sigma\_{u} = \left( \frac{\Gamma(1+\beta) \sin \frac{\pi \beta}{2}}{\beta \cdot \Gamma\left(\frac{1+\beta}{2}\right) \cdot 2^{\frac{\beta-1}{2}}} \right)^{\frac{1}{\beta}}.\tag{14}$$

The Lévy flight, which is commonly characterized by a combination of high-frequency small-step movements and low-frequency large-step movements, mimics the random wandering of a cuckoo. This behavior enables the CS algorithm to effectively search for globally optimal solutions while also avoiding being trapped in local optima. Moreover, the incorporation of small steps in the algorithm guarantees a certain level of accuracy in the solution. The position of the nest is updated according to the following relation:

$$\mathbf{x}\_{i}^{k+1} = \mathbf{x}\_{i}^{k} + \alpha \times \mathbf{s} \odot \mathbf{x}\_{i}^{k},\tag{15}$$

where *x<sub>i</sub><sup>k</sup>* denotes the value at the *k*-th iteration, *α* represents the scaling factor of the step, *s* stands for the step of the Lévy flight, and ⊙ denotes the element-wise product. The overall flow of the cuckoo search algorithm is presented in Figure 3.
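The step generation of Equations (12)–(14) and the position update of Equation (15) are commonly implemented with Mantegna's algorithm; a NumPy sketch follows (illustrative function names, not the authors' code):

```python
import numpy as np
from math import gamma, sin, pi

def mantegna_sigma(beta):
    """sigma_u of Eq. (14) in Mantegna's algorithm for Levy-stable steps."""
    return (gamma(1 + beta) * sin(pi * beta / 2)
            / (beta * gamma((1 + beta) / 2) * 2 ** ((beta - 1) / 2))) ** (1 / beta)

def levy_step(size, beta=1.5, rng=None):
    """Step s = u / |v|^(1/beta) of Eqs. (12)-(13)."""
    if rng is None:
        rng = np.random.default_rng()
    u = rng.normal(0.0, mantegna_sigma(beta), size)
    v = rng.normal(0.0, 1.0, size)
    return u / np.abs(v) ** (1 / beta)

def update_nests(nests, alpha=0.01, beta=1.5, rng=None):
    """Nest update of Eq. (15): element-wise product of step and current position."""
    s = levy_step(nests.shape, beta, rng)
    return nests + alpha * s * nests
```

The mixture of many small steps and occasional large jumps is what lets the search escape local optima while still refining solutions locally.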

This exploration takes the hyperparameters of an NNM into account as the search parameters, with the overall ERNN model employed as the adaptation function. The performance of the model in predicting the test set data, as measured by its goodness-of-fit value, is also utilized as the adaptive value. The objective of the current search is to find the optimal set of hyperparameters by maximizing the adaptive value.

**Figure 3.** The overall flowchart of the cuckoo search algorithm.

#### *2.4. Kernel Density Estimation*

In comparison to the parametric model, kernel density estimation, being a nonparametric method, avoids imposing any prior assumptions on the data distribution, thereby resulting in more accurate estimations. Based on the similarity theory, the obtained conditional quantile is similar to conditional density [27].

#### 2.4.1. KDE-Based Model

The KDE is established based on the sample data to estimate the probability density function. Given the density function of a random variable represented by *f*(*x*) and the empirical distribution function denoted by *F*(*x*), the basic estimation of *f*(*x*) can be provided by the following:

$$f(\mathbf{x}) = \frac{F(\mathbf{x} + h) - F(\mathbf{x} - h)}{2h},\tag{16}$$

where *h* represents a non-negative constant. As the value of *h* approaches zero, an approximate estimation of *f*(*x*) can be obtained in the following form:

$$\hat{f}(\mathbf{x}) = \frac{1}{Nh} \sum\_{i=1}^{N} k\left(\frac{\mathbf{x} - \mathbf{x}\_i}{h}\right),\tag{17}$$

where *N* denotes the number of samples, *h* is the bandwidth, and *k*(*x*) represents the kernel function. It is worth mentioning that various kernel functions bring different estimation effects. This investigation is aimed to utilize the Gaussian kernel function, which is commonly exploited and known to produce effective results [3]. The function is represented by the following equation:

$$k(\mathbf{x}) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{\mathbf{x}^2}{2}\right). \tag{18}$$
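Equations (17) and (18) translate directly into a few lines of NumPy; the sketch below is illustrative only:

```python
import numpy as np

def gaussian_kernel(x):
    """Eq. (18): standard Gaussian kernel."""
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

def kde(x_grid, samples, h):
    """Eq. (17): f_hat(x) = (1/(N*h)) * sum_i k((x - x_i)/h), vectorized over x_grid."""
    x_grid = np.atleast_1d(np.asarray(x_grid, float))[:, None]   # (M, 1)
    z = (x_grid - np.asarray(samples, float)[None, :]) / h       # (M, N)
    return gaussian_kernel(z).sum(axis=1) / (len(samples) * h)
```

A quick sanity check on the implementation is that the estimate integrates to one over a sufficiently wide grid.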

#### 2.4.2. Leave-One-Out Cross-Validation

The bandwidth plays a crucial role in a KDE-based approach. A wide bandwidth prevents the model from accurately estimating the density of critical features, while a small bandwidth results in a noisier estimate. Herein, leave-one-out cross-validation is implemented for the optimal selection of the bandwidth, and the mean integrated squared error (MISE) is utilized to evaluate the error of the kernel density function. The MISE is defined by the following relation:

$$\text{MISE}\left(\hat{f}(\mathbf{x})\right) = E \int \left[ \left( \hat{f}(\mathbf{x}) - f(\mathbf{x}) \right)^2 \right] d\mathbf{x}.\tag{19}$$

The global error of LOOCV is defined as follows:

$$LV = \frac{1}{N} \sum\_{i=1}^{N} MISE\_i. \tag{20}$$

The error resulting from the computation of various bandwidths (*h*) is specified by *LV*(*h*). The optimal bandwidth (*h*0) is determined by identifying the point at which *LV*(*h*) takes its minimum value:

$$h\_0 = \arg\min LV(h), \ h > 0 \tag{21}$$

LOOCV effectively utilizes all the information in the data, resulting in the calculation of optimal parameters for the sample data. However, the corresponding computational cost is high, since the model must be fitted and the error metric calculated *N* times; the method is therefore generally utilized for small samples. In the current investigation, the prediction results of the ERNN-based model for different levels of *τ* are chosen as inputs for the kernel density estimation, and LOOCV is then exploited as the method for bandwidth selection due to the limited number of values for *τ* ∈ (0, 1).

#### **3. Methodology Framework and Evaluation Metrics**

#### *3.1. Methodology Framework*

The framework of the overall WPF is demonstrated in Figure 4. The forecasting process in the present work is divided into the following steps:


**Figure 4.** The framework of the overall WPF.

#### *3.2. Point Estimation Evaluation Metrics*

In regression problems, four of the most commonly used and reliable evaluation metrics for assessing the point prediction accuracy of different models are the mean absolute error (MAE), root mean square error (RMSE), mean absolute percentage error (MAPE), and coefficient of determination (R2). Their calculation formulas are given in Equations (22)–(25):

$$\text{MAE} = \frac{1}{n} \sum\_{i=1}^{n} \left| y\_i - \hat{y}\_i \right| \tag{22}$$

$$\text{RMSE} = \sqrt{\frac{1}{n} \sum\_{i=1}^{n} (y\_i - \hat{y}\_i)^2} \tag{23}$$

$$\text{MAPE} = \frac{1}{n} \sum\_{i=1}^{n} \left| \frac{y\_i - \hat{y}\_i}{y\_i} \right| \tag{24}$$

$$R^2 = 1 - \frac{\sum\_{i=1}^{n} (y\_i - \hat{y}\_i)^2}{\sum\_{i=1}^{n} (y\_i - \overline{y})^2}, \tag{25}$$

where *n* represents the number of predicted samples, *yi* denotes the true value of the response variable, *y*ˆ*<sup>i</sup>* is the predicted value, and *ȳ* specifies the mean value of the real data.
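The four metrics of Equations (22)–(25) can be computed as follows (a straightforward NumPy sketch):

```python
import numpy as np

def point_metrics(y, y_hat):
    """MAE, RMSE, MAPE and R^2 of Eqs. (22)-(25)."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    mae = np.mean(np.abs(y - y_hat))
    rmse = np.sqrt(np.mean((y - y_hat) ** 2))
    mape = np.mean(np.abs((y - y_hat) / y))          # undefined when y contains zeros
    r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
    return mae, rmse, mape, r2
```

As noted later in the paper, MAPE becomes unreliable when the test set contains values close to or equal to zero.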

#### *3.3. Interval Prediction Evaluation Metrics*

The quality of the prediction interval (PI) is a crucial feature in assessing the results of probability density prediction. To evaluate the probability density estimation of the model, herein, the following four metrics are employed for comparison: prediction interval coverage probability (PICP), prediction interval normalized average width (PINAW), prediction interval estimation error (PIEE), and coverage width-based criterion (CWC).

The PICP is a crucial evaluation metric for PI; it represents the probability that future wind power will be within the lower and upper limits of the forecast results, and it is defined by the following equation:

$$\text{PICP} = \frac{1}{n} \sum\_{i=1}^{n} C\_i, \tag{26}$$

$$C\_{i} = \begin{cases} 1, & y\_{i} \in [L\_{i}, U\_{i}] \\ 0, & y\_{i} \notin [L\_{i}, U\_{i}] \end{cases} \tag{27}$$

where *Li* and *Ui* in order represent the minimum and maximum values of the prediction interval for the *i*-th sample. The factor *Ci* denotes a Boolean variable, where *Ci* = 1 if the real value falls within the prediction interval and *Ci* = 0 in other cases. It is evident that a wide PI could result in a high PICP; nevertheless, it has minimal value for power planning and decision making. With this in mind, the PINAW is introduced to evaluate PI; it is defined by the following relation:

$$\text{PINAW} = \sum\_{i=1}^{n} \frac{U\_i - L\_i}{nR},\tag{28}$$

in which *R* denotes the difference between the maximum and minimum values of the response variable *y* to be predicted, and it serves the purpose of standardizing the results to objectively evaluate the width of PI. Lower values of the PINAW imply higher accuracy of the interval prediction results.

The PICP only considers the probability of the real value falling within the prediction interval, without dealing with the error magnitude between the prediction interval and the real value. A relatively novel metric, PIEE [25], provides an understanding of the estimation error of PI. This metric is implemented to more systematically evaluate the risk outside the prediction interval; it is defined as follows:

$$\text{PIEE} = \sum\_{i=1}^{n} \frac{E\_i}{nR},\tag{29}$$

$$E\_i = \begin{cases} y\_i - U\_i, & y\_i > U\_i \\ 0, & L\_i \le y\_i \le U\_i \\ L\_i - y\_i, & y\_i < L\_i \end{cases} \tag{30}$$

The PIEE metric enables us to more precisely evaluate the estimation error of the true value outside the model prediction interval. However, as with PICP, a too-wide PI could result in a low PIEE, which is not significant. To ensure a more accurate and comprehensive evaluation, the CWC metric is introduced. A combination of the three metrics PICP, PINAW, and PIEE is employed to construct an improved CWC metric:

$$\text{CWC} = \text{PINAW} \left\{ 1 + \gamma\_{PICP} \exp[-(1 + \text{PIEE})(\text{PICP} - \mu)] \right\} \tag{31}$$

$$\gamma\_{PICP} = \begin{cases} 0, & PICP \ge \mu \\ 1, & PICP < \mu \end{cases} \tag{32}$$

where the parameter *μ* represents the basic requirement for interval coverage probability, and a PICP value less than *μ* leads to an exponential penalty. In the current investigation, we set *μ* = 0.9. The penalty factor, denoted by 1 + PIEE, is exploited in the case of the coverage probability requirement not being satisfied. Additionally, it can be observed that the CWC metric takes into account the coverage probability, average width, and estimation error of the prediction interval and serves as a comprehensive index. A smaller value of the CWC implies a higher quality of the prediction interval.
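The interval metrics of Equations (26)–(32) can be sketched in NumPy as follows (illustrative only; *μ* defaults to 0.9 as in the text):

```python
import numpy as np

def interval_metrics(y, L, U, mu=0.9):
    """PICP, PINAW, PIEE and CWC of Eqs. (26)-(32)."""
    y, L, U = (np.asarray(a, float) for a in (y, L, U))
    n = len(y)
    R = y.max() - y.min()                       # range of the response variable
    picp = np.mean((y >= L) & (y <= U))         # coverage probability
    pinaw = np.sum(U - L) / (n * R)             # normalized average width
    E = np.where(y > U, y - U, np.where(y < L, L - y, 0.0))
    piee = E.sum() / (n * R)                    # estimation error outside the PI
    gamma = 0.0 if picp >= mu else 1.0          # exponential penalty switch
    cwc = pinaw * (1 + gamma * np.exp(-(1 + piee) * (picp - mu)))
    return picp, pinaw, piee, cwc
```

When the coverage requirement *μ* is met, CWC reduces to PINAW; otherwise the width is inflated exponentially in the coverage shortfall.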

#### *3.4. Constructing Point Estimates from the Probability Density Prediction*

In order to compare the estimation of the probability density prediction with that of the point prediction, the mode, median, and mean of the wind power probability density prediction are selected as the point estimation results. The mode corresponds to the peak value of the probability density curve. The median is the middle value of the predicted interval, while the mean is the weighted sum of all predicted values with their probability densities; the mean therefore takes full advantage of the information in the probability density function [27]. The predicted values of the wind power for the *i*-th sample, sorted as *y*ˆ*i*,1 ≤ *y*ˆ*i*,2 ≤ ... ≤ *y*ˆ*i*,*N*, have corresponding probability values *pi*,1, *pi*,2, ... , *pi*,*N*. The mode, median, and mean values are calculated by Equations (33)–(35):

$$Mode = \hat{y}\_{i, \arg\max\_{j}(p\_{i,j})}, \; j = 1, 2, \dots, N \tag{33}$$

$$Median = \begin{cases} \hat{y}\_{i, \frac{N+1}{2}}, & N \text{ is odd} \\ \frac{1}{2}\left(\hat{y}\_{i, \frac{N}{2}} + \hat{y}\_{i, \frac{N+2}{2}}\right), & N \text{ is even} \end{cases} \tag{34}$$

$$Mean = \sum\_{j=1}^{N} p\_{i,j} \cdot \hat{y}\_{i,j}.\tag{35}$$
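Equations (33)–(35) can be sketched as follows (illustrative NumPy code; the probabilities are normalized before the weighted mean, a practical assumption not stated in the equations):

```python
import numpy as np

def density_point_estimates(y_hat, p):
    """Mode, median and mean of Eqs. (33)-(35); y_hat must be sorted ascending."""
    y_hat, p = np.asarray(y_hat, float), np.asarray(p, float)
    N = len(y_hat)
    mode = y_hat[np.argmax(p)]                   # value at the density peak
    if N % 2 == 1:
        median = y_hat[(N + 1) // 2 - 1]         # 0-based index of the middle value
    else:
        median = (y_hat[N // 2 - 1] + y_hat[N // 2]) / 2
    mean = np.sum(p * y_hat) / np.sum(p)         # probability-weighted mean
    return mode, median, mean
```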

#### **4. Empirical Results**

#### *4.1. Data Sources and Preprocessing*

In the current investigation, we use wind power data from the website of the Belgian power grid company Elia as empirical data to verify and test the validity of the proposed model. For this purpose, the data from the aggregate Belgian wind farms are chosen for the period from 1 January to 28 February 2022. Since the original data have a 15 min frequency, they are resampled to a 1 h frequency to lessen the computational effort and for ease of processing. According to the processed data in Figure 5, it is evident that the wind power series data are highly variable and random. As a result, probability density prediction of wind power is necessary for quantifying the uncertainty of wind power output and providing results that are more informative for relevant decision-makers and stakeholders. About 80% of the data, the purple solid line part (from 1 January 2022 00:00:00 to 17 February 2022 03:00:00), are chosen as the training set, whereas the remaining 20% of the data, the brown dashed line part (from 17 February 2022 04:00:00 to 28 February 2022 23:00:00), are utilized as the test set. A sliding window of 168 periods (seven days) is employed to construct the feature variables, meaning that *yt*−167, *yt*−166, ... , *yt* is employed to predict the value of *yt*+1. After the above process is completed, the 3D tensor data from both the training and test sets are normalized to prepare for the NNM fitting. Table 1 provides information on the main parameters of the NNMs used in the present work.
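The sliding-window construction described above can be sketched as follows (illustrative code; the function name is our own):

```python
import numpy as np

def make_windows(series, window=168):
    """Build (X, y) pairs where y_{t-window+1}, ..., y_t predicts y_{t+1}."""
    series = np.asarray(series, float)
    X = np.stack([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]                          # one-step-ahead targets
    return X, y
```

With hourly data and `window=168`, each input row covers exactly seven days of history.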

**Figure 5.** Wind power plot of aggregate Belgian wind farms from 1 January to 28 February 2022.


**Table 1.** The main parameters of the NNMs.

#### *4.2. Comparison of the Model Prediction Results*

Nine models are utilized for comparison in order to evaluate the prediction results of classical point estimation methods. These models are analyzed via four metrics: MAE, RMSE, MAPE, and R2. A comparison of the point prediction results of some of the models is presented in Figure 6. The depicted results indicate that the predicted and actual values for the four models are relatively close. The models exhibit highly accurate prediction performance in intervals where the wind power data are monotonic, while larger deviations are observed in intervals where the wind power fluctuates. Notably, the QRNN model predicts more dramatic fluctuations between 24 February 2022 and 27 February 2022, which could be related to its training process, which utilizes an absolute value loss function. The four error metrics calculated for all models on the test set are given in Table 2.

**Figure 6.** Plots of the point prediction results based on the partial models.



From the results presented in Table 2, the following conclusions can be drawn:


#### *4.3. The Predicted Results Based on the Various Levels of τ*

The model has been trained and tested with different levels of *τ*. The effect of the prediction curve is presented in Figure 7, and the corresponding evaluation metrics calculated are presented in Figure 8.

**Figure 7.** The graphed prediction results for various levels of *τ*.

Figure 7 illustrates the plotted prediction results based on different *τ* values. The prediction curves are highly similar in trend and degree of fluctuation and are superimposed to form a confidence interval covering the actual values. It is therefore feasible and reliable to use these predicted values to construct probability density estimation curves.

From Figure 8, it can be observed that the prediction performance is better and more consistent, with less error, when *τ* lies in the range of 0.4 to 0.85. When the value of *τ* is too large or too small, the loss function becomes strongly asymmetric, which is appropriate for describing the corresponding conditional distribution, but the overall prediction performance is poorer.

**Figure 8.** Prediction metrics for different levels of *τ*.

#### *4.4. Probability Density Prediction Results*

Before performing the kernel density estimation, the optimal bandwidth size selected is appropriately verified by a leave-one-out cross-validation for each group of bit data in the test set. The box plots of all optimal bandwidths (*h*) are demonstrated in Figure 9. Figure 9 clearly displays that the majority of the optimal bandwidths (*h*) are in the range of 40–90.

**Figure 9.** Optimal range of the bandwidth.

The first nine points of the test set (from 17 February 2022 04:00:00 to 17 February 2022 12:00:00) are chosen, and the actual values and probability density curves of the wind power are demonstrated in Figure 10. The blue curve and the red dashed line represent the kernel density estimation curve and the actual values of the test set, respectively. All the actual values clearly fall within the predicted probability density curve, with the majority of the values concentrated around the peak of the estimated probability density. This indicates that the estimated probability density effectively captures the inherent uncertainty in wind power generation, and the location of the density peak is likely to be close to the true value of the wind power. Probability density estimation offers several advantages, such as quantifying uncertainty and improving prediction accuracy, providing decision-makers with more precise information about the WPF.

**Figure 10.** Probability density curves of the wind power for the partial test set.

The results of the probability density estimation for the proposed model, QRNN, and ER are compared in Table 3. The evaluation metrics for the point estimates (i.e., mode, median, and mean) constructed from the probability density estimates of each model are given in this table. Additionally, the corresponding histograms are presented in Figure 11, providing a visual representation of the performance of each model.


**Table 3.** The evaluated metrics for point estimation based on several approaches.

The results presented in Table 3 and Figure 11 show that the point prediction errors based on the probability density estimation of the proposed Transformer-ER model are substantially lower than those of the QRNN and linear ER models, which do not take temporal effects into account. Additionally, regardless of the model or method used, the mode, median, and mean values of the probability density predictions perform relatively similarly. The mean accuracy is slightly higher than the mode and median accuracies because the mean takes into account all the information of the predicted data. Among all the models and methods, the Transformer-ER model exhibits the lowest MAE and RMSE and the highest R2 for the mean of the probability density, making it the best point prediction result. Its error metrics are smaller than the point prediction results of almost all models in Table 2. It is worth mentioning that the MAPE may not be reliable here due to the presence of values close to or equal to zero in the test set.

**Figure 11.** Evaluation metrics for point estimates constructed by the probability density prediction.

The evaluation metrics for interval estimation are provided in Table 4. The PICP values of the Transformer-ER, QRNN, and ER models differ remarkably. The QRNN model presents a high PICP and, accordingly, a low PIEE. The linear ER model fails to satisfactorily fit the uncertainty of the wind power data, with a PICP value of only 42.25%. However, the higher PICP of the QRNN derives from a larger average width of the prediction interval. This means that the QRNN gives an overly wide prediction interval, which is of little significance for practical decision making. On the contrary, the Transformer-ER-based model exhibits a more moderate PICP and a smaller PINAW, and its composite index CWC has the smallest value. Therefore, the probability density prediction interval of the Transformer-ER model exhibits higher quality than that of the other models.



As can be observed from Figures 12 and 13, while the PIs obtained from the QRNN model cover a majority of the actual values of the wind power, they are also broader than the PIs from the Transformer-ER model. This broader range increases the uncertainty in wind power forecasting and is thus of limited benefit in power planning and decision making. The PIs of the Transformer-ER model are more precise, as they are narrower in zones where the wind power data increase or decrease monotonically and broader in zones where the wind power is volatile and variable. This property helps capture the uncertainty in wind power forecasting, providing decision-makers with more relevant and useful information.

**Figure 12.** Plot of the prediction intervals of the ERNN-based model.

**Figure 13.** Plot of the prediction intervals of the QRNN-based model.

#### **5. Conclusions**

In the current investigation, a combination of the transformer network, which performs strongly on sequential data tasks, and expectile regression is proposed for effective wind power prediction via an ERNN structure. The model is optimized by employing the cuckoo search algorithm. The methodology of kernel density estimation is then exploited to achieve the complete probability density curve, from which point and interval predictions are built. These predicted results are separately evaluated to provide comprehensive information on the uncertainty of the wind power. The proposed approach is then validated and tested based on the wind power generation data from the Belgian power grid company Elia. The major conclusions are as follows: (1) The proposed model effectively addresses the volatility and stochastic nature of wind power data, provides comprehensive and accurate prediction, reduces the operational risks associated with wind power generation, and enhances the stability of power systems. (2) The transformer network, when compared to the commonly exploited recurrent neural networks, demonstrates a superior capability to capture the internal correlations and dependencies in long sequences and yields a higher level of prediction accuracy. (3) The proposed probability density prediction approach is capable of providing more comprehensive information for relevant stakeholders and decision-makers and has been proven to be more robust and accurate than point predictions. (4) The proposed ERNN-based model produces more accurate and narrower prediction intervals compared to QRNN models, thereby leading to higher-quality prediction intervals in general.

**Author Contributions:** Conceptualization, X.H.; methodology, H.X. and X.H.; software, H.X.; validation, H.X.; formal analysis, H.X.; investigation, H.X.; resources, H.X.; data curation, H.X.; writing original draft preparation, H.X.; writing—review and editing, H.X., X.H. and C.L.; visualization, H.X.; supervision, X.H. and C.L.; project administration, X.H.; funding acquisition, X.H. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was supported by the National Natural Science Foundation of China (NSFC) (No. 11201356) and Hubei Province Key Laboratory of Systems Science in Metallurgical Process (Wuhan University of Science and Technology) (No. Y202201).

**Data Availability Statement:** The data presented in this study are available from the following website: https://www.elia.be/en/grid-data/power-generation/wind-power-generation (accessed on 2 January 2023).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

### *Article* **Lightweight Network-Based Surface Defect Detection Method for Steel Plates**

**Changqing Wang 1,2,3, Maoxuan Sun 1,2,3, Yuan Cao 1,2,3,\*, Kunyu He 1,2,3, Bei Zhang 1,2,3, Zhonghao Cao 1,2,3 and Meng Wang 1,2,3**


<sup>3</sup> Henan Engineering Laboratory of Additive Intelligent Manufacturing, Xinxiang 453007, China

**\*** Correspondence: xyuan\_cao@163.com

**Abstract:** This article proposes a lightweight YOLO-ACG detection algorithm that balances accuracy and speed, addressing the classification errors and missed detections present in existing steel plate defect detection algorithms. To highlight the key elements of the desired area of surface flaws in steel plates, a void space convolutional pyramid pooling model is applied to the backbone network. This model improves the fusion of high- and low-level semantic information by designing feature pyramid networks with embedded spatial attention. According to the experimental findings, the suggested detection algorithm enhances the mAP value by about 4% compared to the YOLOv4-Ghost detection algorithm on the homemade data set. Additionally, the real-time detection speed reaches about 103 FPS, which is about 7 FPS faster than the YOLOv4-Ghost detection algorithm, and the detection capability for steel surface defects is significantly enhanced, meeting the needs of real-time detection of realistic scenes on mobile terminals.

**Keywords:** defect detection; lightweight; cavity spatial convolution; spatial attention

#### **1. Introduction**

With the rapid development of industrial automation technology, the study of automated [1,2] detection of defects in industrial production is receiving more and more attention. Due to the influence of various uncertainties, the surface of the steel plate in the production process will produce a variety of defects [3–7], such as scratches, deformation, welds, holes, etc. These defects [8–12] not only affect the integrity of the steel plate but also make a certain impact on the quality of the steel plate, so a more accurate detection of defects [13–16] on the surface of the steel plate is of paramount importance.

Conventional inspection methods use manual observation to detect defects, which is not only time-consuming and labor-intensive but also fails to meet the expected requirements. Building on traditional industrial inspection methods, automated defect detection technology has been driven to a new level. Experts and scholars at home and abroad have conducted profound research and practice on traditional machine vision for the detection of defects in steel plates. An enhanced BP detection algorithm was presented by Peng et al. [17] to detect flaws in steel plates. While this technique has decent detection performance for flaws that are clear targets, it has a sluggish convergence rate and poor performance for small samples. Wang Yixin et al. [18] suggested a comparative detection approach utilizing machine vision; however, despite its high accuracy in recognizing faults in steel plates, it is more sensitive to the environment and is incapable of detecting flaws in harsh conditions due to its difficulty with extracting feature images.

At this juncture, the accuracy of steel plate surface flaw detection [19] has increased due to the rapid development of deep learning technology in industrial inspection. Tian Siyang et al. investigated instances of hot-rolled strip steel surface faults, identifying two types of interference,

**Citation:** Wang, C.; Sun, M.; Cao, Y.; He, K.; Zhang, B.; Cao, Z.; Wang, M. Lightweight Network-Based Surface Defect Detection Method for Steel Plates. *Sustainability* **2023**, *15*, 3733. https://doi.org/10.3390/su15043733

Academic Editors: Luis Hernández-Callejo, Sergio Nesmachnow and Sara Gallardo Saavedra

Received: 18 November 2022 Revised: 9 February 2023 Accepted: 9 February 2023 Published: 17 February 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

watermarks and water droplets. The one-stage YOLOv2 [20] detection algorithm was developed and tested on a wide range of surface flaws on steel sheets, as well as against several interference effects caused by false defects. Although the approach can detect surface flaws in hot-rolled steel sheets with an average mAP of 92.54%, its detection speed of just 14 FPS prevents real-time detection. Xu Qian et al. used a modified YOLOv3 [21] network structure for the detection of surface defects in steel plates, reducing the model size of YOLOv3 by using the lighter MobileNet [22] network model, adding a cavity (dilated) convolutional neural network [23] to improve the defect detection capability for steel plates, and adding the Inceptionv3 structure to enrich the network's layers.

In this paper, a defect detection technique named YOLO-ACG is proposed. First, GhostNet replaces the CSPDarknet53 backbone of the YOLOv4 network, which significantly improves the model's detection accuracy and speed. Second, the original spatial pyramid pooling of the YOLOv4-Ghost network is replaced with the more accurate atrous spatial pyramid pooling, which increases the model's focus on the significant regions of the feature map target, enlarges the perceptual regions of the feature map, and combines the semantic information of the context. Finally, a spatial attention mechanism is embedded into the feature fusion pyramid of the network, and the loss of information at the edges of the feature map is addressed by using the FPN structure to fuse two channels from top to bottom, which facilitates the fusion of information at different scales. The experimental results demonstrate that the YOLO-ACG algorithm detects surface flaws in steel sheets faster and more accurately than other lightweight methods, meeting the expectations of industrial inspection. The remainder of the article analyzes the YOLO-ACG algorithm in detail in terms of network structure and experimental results.

#### **2. Methodology**

#### *2.1. The YOLOv4 Backbone Network*

The YOLO (You Only Look Once) algorithm was put forth as a one-stage target detection technique by Redmon et al. [24] in 2016. The fundamental idea behind YOLO is to treat object recognition as a regression problem and use a convolutional neural network [25] to directly predict bounding boxes and category probabilities from the input image. The fourth iteration, YOLOv4, employs a variety of network architectures, including feature pyramid networks and fully convolutional networks. Its CSPDarknet53 backbone, shown in Figure 1, replaces the Darknet53 backbone of the YOLOv3 algorithm. Additionally, YOLOv4 uses the Mish activation function, logistic regression for image classification, and a feature pyramid network for multi-scale target detection, all of which maintain a high accuracy rate while ensuring real-time detection.

CSPDarknet53, the backbone of the YOLOv4 algorithm, is one of the best-performing backbone networks. After applying 1 × 1 and 3 × 3 convolutional layers, CSPDarknet53 produces three outputs, designated P3, P4, and P5. P3 and P4 each undergo one further 1 × 1 convolution and are then fed into the enhanced feature extraction network for feature fusion. P5 is convolved three times and fed into the pyramid pooling layer, whose pooled results are in turn fed into the enhanced feature extraction network for feature fusion.

#### *2.2. GhostNet*

YOLOv4-Ghost replaces the original YOLOv4 backbone with the GhostNet module [26], making the network lighter and easier to deploy on mobile terminals. The head network uses a PAN (path aggregation network) structure, while the backbone consists of convolution, spatial pyramid pooling (SPP), and GhostBottleneck blocks.


**Figure 1.** CSPDarknet53 backbone network.

GhostBottleneck serves as the core building block of GhostNet. As a plug-and-play, reusable module that can replace other network modules, GhostBottleneck dramatically decreases the computational load and model volume. A GhostBottleneck is created by stacking two Ghost modules. The first Ghost module acts as an expansion layer, adding channels and dimensions for feature extraction. The second Ghost module reduces the number of channels so that the output dimension again matches the input. The input and output of the two Ghost modules are finally connected by a shortcut. The second Ghost module does not use the ReLU activation function: because the input data distribution differs between the front and back layers after activation, constant matching would be required, which reduces training efficiency.
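The structure just described can be sketched in PyTorch. This is a minimal illustration, not the authors' implementation: the channel ratio, expansion factor, and kernel sizes are assumptions, and only the stride-1 case with a plain residual shortcut is shown.

```python
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    """Minimal Ghost module: a primary conv produces part of the channels,
    and a cheap depthwise conv generates the remaining 'ghost' features."""
    def __init__(self, in_ch, out_ch, ratio=2):
        super().__init__()
        primary = out_ch // ratio
        self.primary = nn.Conv2d(in_ch, primary, 1)
        # depthwise conv as the cheap linear operation (out_ch must be even here)
        self.cheap = nn.Conv2d(primary, out_ch - primary, 3,
                               padding=1, groups=primary)

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)

class GhostBottleneck(nn.Module):
    """Stride-1 GhostBottleneck: an expansion Ghost module with ReLU,
    a projection Ghost module without ReLU, and a residual connection."""
    def __init__(self, ch, expand=2):
        super().__init__()
        self.ghost1 = GhostModule(ch, ch * expand)   # expansion layer
        self.relu = nn.ReLU()
        self.ghost2 = GhostModule(ch * expand, ch)   # no ReLU afterwards
    def forward(self, x):
        return x + self.ghost2(self.relu(self.ghost1(x)))
```

Stacking the two Ghost modules with a shortcut preserves the input shape, so the block can drop into an existing backbone without other changes.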

#### *2.3. Loss Function*

Steel plates can exhibit a wide variety of imperfections, so the detection algorithm must be extremely precise in identifying the types and locations of flaws. The loss function consists of three components: (1) confidence loss; (2) classification loss; and (3) bounding box regression loss.

$$loss_a = -\sum_{i=0}^{S^2} \sum_{j=0}^{B} W_{ij}^{obj}\left[\hat{C}_i^j \log(C_i^j) + (1 - \hat{C}_i^j)\log(1 - C_i^j)\right] - \lambda_{noobj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \left(1 - W_{ij}^{obj}\right)\left[\hat{C}_i^j \log(C_i^j) + (1 - \hat{C}_i^j)\log(1 - C_i^j)\right]; \quad C_i^j = P_{i,j} \ast IOU_{pred}^{truth} \tag{1}$$

$$loss_b = -\sum_{i=0}^{S^2} \sum_{j=0}^{B} W_{ij}^{obj} \sum_{c=1}^{C} \left[\hat{p}_i^j(c) \log(p_i^j(c)) + (1 - \hat{p}_i^j(c))\log(1 - p_i^j(c))\right] \tag{2}$$

$$loss_c = 1 - IOU + \frac{\rho^2(d, d^{gt})}{c^2} + \alpha v; \quad v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2 \tag{3}$$

where $W_{ij}^{obj}$ indicates whether the $j$th anchor box in the $i$th grid contains a target, $C_i^j$ is the predicted confidence of the $j$th bounding box in the $i$th grid, $P_{i,j}$ is the discriminant of the objective function, and $\hat{C}_i^j$ is the corresponding ground-truth value; $\hat{p}_i^j(c)$ is the predicted probability that the $j$th bounding box in the $i$th grid belongs to class $c$, and $p_i^j(c)$ is the true probability that it belongs to class $c$; $d$ and $d^{gt}$ are the centers of the predicted and ground-truth boxes, and $\rho$ is the distance between the two centroids. In Figure 2, where the square box represents the prediction box and the rectangular box represents the real box, $c$ is the diagonal distance of the minimum enclosing box of the two boxes.

**Figure 2.** Anchor box.
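As a concrete illustration, the bounding box regression loss of Equation (3) can be evaluated for a pair of axis-aligned boxes. This is a minimal sketch following the CIoU formulation, not the authors' code; the weighting factor `alpha` is the standard CIoU trade-off term.

```python
import math

def ciou_loss(box_p, box_g):
    """CIoU-style regression loss for boxes given as (x1, y1, x2, y2)."""
    # intersection area and IoU
    ix1, iy1 = max(box_p[0], box_g[0]), max(box_p[1], box_g[1])
    ix2, iy2 = min(box_p[2], box_g[2]), min(box_p[3], box_g[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (box_p[2] - box_p[0]) * (box_p[3] - box_p[1])
    area_g = (box_g[2] - box_g[0]) * (box_g[3] - box_g[1])
    iou = inter / (area_p + area_g - inter)
    # squared distance between box centers (rho^2 in Equation (3))
    cpx, cpy = (box_p[0] + box_p[2]) / 2, (box_p[1] + box_p[3]) / 2
    cgx, cgy = (box_g[0] + box_g[2]) / 2, (box_g[1] + box_g[3]) / 2
    rho2 = (cpx - cgx) ** 2 + (cpy - cgy) ** 2
    # squared diagonal of the minimum enclosing box (c^2 in Equation (3))
    c2 = (max(box_p[2], box_g[2]) - min(box_p[0], box_g[0])) ** 2 \
       + (max(box_p[3], box_g[3]) - min(box_p[1], box_g[1])) ** 2
    # aspect-ratio consistency term v and its weight alpha
    wp, hp = box_p[2] - box_p[0], box_p[3] - box_p[1]
    wg, hg = box_g[2] - box_g[0], box_g[3] - box_g[1]
    v = (4 / math.pi ** 2) * (math.atan(wg / hg) - math.atan(wp / hp)) ** 2
    alpha = v / (1 - iou + v + 1e-9)
    return 1 - iou + rho2 / c2 + alpha * v
```

For two identical boxes the loss is zero, and it grows as the overlap shrinks, the centers separate, or the aspect ratios diverge.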

#### **3. Our Approach**

#### *3.1. YOLO-ACG Algorithm*

This study proposes the three-part YOLO-ACG network model, which is based on the YOLOv4 algorithm and is shown in Figure 3. It comprises a backbone network, a feature fusion network, and a detection head network. Three-channel RGB images are used as the input. First, the image is processed at feature scales of 52 × 52, 26 × 26, and 13 × 13 for information screening and extraction through the P3, P4, and P5 levels of the backbone network. Second, the extracted results are sent to the CBM blocks and the CA attention mechanism, and the ASPP module is applied to the extracted features to enhance the recognition of target defect differences by merging global and local characteristics over various perceptual fields. The CA attention mechanism then reinforces the output features to improve their location correlations and cross-dimension interactions. The three feature layers, carrying different semantic information, are fused after the CA attention stage, which effectively resolves the problem of the higher-layer network losing the feature information of the lower-layer network during extraction. Finally, the non-maximal suppression (NMS) algorithm, combined with the center distance factor of the prediction box, filters the redundant anchor boxes by thresholding to create the final prediction box.

#### *3.2. Ghost Module*

Feature map redundancy is one of the most prominent characteristics of convolutional neural networks. When feature maps are visualized, many outputs have very similar features and can be obtained by simple linear transformations without complicated operations. As shown in Figure 4, the working principles of the standard convolution and Ghost modules are presented separately.

As shown in Figure 4a, standard convolution extracts features using a large number of convolution kernels to generate the feature map; the excessive number of kernels and channels produces redundant information and increases computation. The Ghost module in Figure 4b instead splits the regular convolution into two parts: a partial convolution creates an intrinsic feature map, and simple linear operations then efficiently generate the complete feature map.

**Figure 3.** YOLO-ACG network model.

**Figure 4.** Standard convolution and Ghost module.

To reflect the benefits of Ghost convolution, let the input feature map's width, height, and channel count be w, h, and c, respectively; let the output feature map have n channels with width w′ and height h′; and let k and d be the sizes of the standard convolution kernel and the linear-transformation kernel, respectively. Equation (4) compares the computation of standard convolution (numerator) with that of the Ghost module (denominator): for the same parameters, the standard convolution requires approximately s times the computation of the Ghost module.

$$r_s = \frac{n \cdot h' \cdot w' \cdot c \cdot k \cdot k}{\frac{n}{s} \cdot h' \cdot w' \cdot c \cdot k \cdot k + (s-1) \cdot \frac{n}{s} \cdot h' \cdot w' \cdot d \cdot d} = \frac{c \cdot k \cdot k}{\frac{1}{s} \cdot c \cdot k \cdot k + \frac{s-1}{s} \cdot d \cdot d} \approx \frac{s \cdot c}{s + c - 1} \approx s \tag{4}$$
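Equation (4) can be checked numerically. The sketch below evaluates the ratio of standard-convolution FLOPs to Ghost-module FLOPs; the common factors h′ and w′ cancel and are omitted, and the example sizes are illustrative.

```python
def ghost_speedup(n, c, s, k, d):
    """Ratio of standard-conv computation to Ghost-module computation
    (Equation (4)); h' and w' cancel and are omitted."""
    standard = n * c * k * k
    ghost = (n / s) * c * k * k + (s - 1) * (n / s) * d * d
    return standard / ghost

# e.g. n=64 output channels, c=256 input channels, s=2, 3x3 kernels
ratio = ghost_speedup(64, 256, 2, 3, 3)
```

For a typical channel count c that is much larger than s, the ratio is close to s, confirming the approximation at the end of Equation (4).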

#### *3.3. Improved ASPP Module*

The SPP [27] structure serves as the pyramidal pooling module of the original YOLOv4 network. The SPP structure must store a large number of picture features, and feature extraction from the feature map requires laborious multi-stage training that takes too long. Dilated (atrous) convolution, typically used for global feature extraction, can emphasize key regions of the feature map and prevent the loss of image information; however, while it improves the semantic segmentation ability of the feature map and enlarges the perceptual field, it can also lose detail at the edges of the feature map. The ASPP structure addresses both issues: it replaces the pooling process of SPP with dilated convolution and enlarges the perceptual field of the feature map without losing the finer details of the edge information.

The ASPP structure has two parts. The first consists of a 1 × 1 convolutional layer and three 3 × 3 dilated convolutional layers with sampling rates of 6, 12, and 18, respectively, each with 256 convolution kernels; the second is a 1 × 1 convolution applied after global average pooling, likewise with 256 kernels. Figure 5 depicts the ASPP module's structure. Dilated convolution with upsampling and the multiscale structure realize feature extraction at high resolution and over a large perceptual field, significantly improving both the perceptual field of the feature image and the handling of its edge details. The dilation rate introduced in the convolution layer expresses the number of zero values inserted in the convolution kernel.

**Figure 5.** ASPP module.
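The two-part layout described above can be sketched in PyTorch. This is an illustrative sketch, not the paper's code: the input channel count and the final fusion convolution are assumptions.

```python
import torch
import torch.nn as nn

class ASPP(nn.Module):
    """Sketch of the ASPP block: a 1x1 conv, three 3x3 dilated convs with
    rates 6, 12, and 18, and a global-pooling branch, each with 256 output
    channels, fused by a final 1x1 conv."""
    def __init__(self, in_ch, out_ch=256):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, 1),
            nn.Conv2d(in_ch, out_ch, 3, padding=6, dilation=6),
            nn.Conv2d(in_ch, out_ch, 3, padding=12, dilation=12),
            nn.Conv2d(in_ch, out_ch, 3, padding=18, dilation=18),
        ])
        self.pool = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                  nn.Conv2d(in_ch, out_ch, 1))
        self.project = nn.Conv2d(out_ch * 5, out_ch, 1)

    def forward(self, x):
        h, w = x.shape[2:]
        feats = [b(x) for b in self.branches]
        # upsample the pooled branch back to the input resolution
        feats.append(nn.functional.interpolate(self.pool(x), size=(h, w)))
        return self.project(torch.cat(feats, dim=1))
```

Matching the padding to the dilation rate keeps every branch at the input resolution, so the five branches can be concatenated directly.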

#### *3.4. CA Attention Mechanism Module*

During the target detection procedure, it was discovered that subtle defects were not detected effectively. To solve this issue, the CA (coordinate attention) spatial attention mechanism [28] was incorporated into the network; it strengthens location relationships and cross-dimension interactions on top of the channel attention mechanism, making the entire network model more accurate and sensitive to the information and location of defective targets. Figure 6 depicts the structure of the added CA attention network: the feature maps are pooled globally along each of the two directions to obtain directional feature information; concatenation and a 1 × 1 convolutional transform then fuse the features; finally, two 1 × 1 convolutions split the integrated feature map into two feature maps with equal numbers of channels before output. The CA module encodes the feature map's precise locations along the width and height: Equation (5) represents the concatenation of the two directional features for fusion; Equation (6) represents the transformation of the two separated features to match the input's dimensionality; and Equation (7) combines *g<sup>n</sup>* and *g<sup>m</sup>* into a weight matrix applied to the input.

$$f = \beta(F([z^n, z^m]))\tag{5}$$

$$\mathfrak{g}^n = \delta(F\_n(f^n)), \mathfrak{g}^m = \delta(F\_m(f^m)) \tag{6}$$

$$y\_c(i,j) = x\_c(i,j) \times g\_c^n(i) \times g\_c^m(j) \tag{7}$$

where *f* denotes the mapped feature map, *β* denotes the nonlinear activation function, *z<sup>n</sup>* and *z<sup>m</sup>* denote the horizontal and vertical positional features, and *g<sup>n</sup>* and *g<sup>m</sup>* denote the two equal-channel feature maps after the sigmoid output. Finally, the result of Equation (7) is the attention-weighted output feature map.
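The three steps of Equations (5)–(7) can be sketched in PyTorch. This is an illustrative sketch of coordinate attention under assumptions (average pooling per direction, a shared fusion convolution, and a reduction ratio `r`), not the paper's implementation.

```python
import torch
import torch.nn as nn

class CoordAttention(nn.Module):
    """Coordinate attention sketch: pool along each spatial direction,
    fuse with a shared 1x1 conv (Eq. (5)), transform each direction back
    (Eq. (6)), and reweight the input (Eq. (7))."""
    def __init__(self, ch, r=8):
        super().__init__()
        mid = max(ch // r, 8)
        self.fuse = nn.Sequential(nn.Conv2d(ch, mid, 1),
                                  nn.BatchNorm2d(mid), nn.ReLU())
        self.conv_h = nn.Conv2d(mid, ch, 1)
        self.conv_w = nn.Conv2d(mid, ch, 1)

    def forward(self, x):
        n, c, h, w = x.shape
        z_h = x.mean(dim=3, keepdim=True)                      # N x C x H x 1
        z_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)  # N x C x W x 1
        f = self.fuse(torch.cat([z_h, z_w], dim=2))            # Eq. (5)
        f_h, f_w = f.split([h, w], dim=2)
        g_h = torch.sigmoid(self.conv_h(f_h))                  # Eq. (6)
        g_w = torch.sigmoid(self.conv_w(f_w)).permute(0, 1, 3, 2)
        return x * g_h * g_w                                   # Eq. (7)
```

The two direction-wise weight maps broadcast over the opposite axis, so each output element is scaled by one horizontal and one vertical attention weight, exactly as in Equation (7).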

**Figure 6.** CA attention network.

#### **4. Experimental Preparation**

#### *4.1. Test Environment*

The experimental platform runs Windows 10 with a 12th Gen Intel(R) Core(TM) i7-12700KF CPU at 3.60 GHz, 64 GB of memory, and an NVIDIA GeForce RTX 3090 Ti GPU. PyTorch 1.10.2 is used, and the software runs in Anaconda 3.6; CUDA 10.0 and cuDNN 7.5 were installed to accelerate GPU processing, and TensorFlow 1.13.1, OpenCV 4.1, and NumPy 1.14.2 were installed in the environment, along with the auxiliary libraries needed to run the code correctly.

#### *4.2. Production of Data Set*

Regarding the data set, the three types of flaws found to most affect steel plate quality during defect identification were welds, holes, and scratches. Although the existing public German DAGM surface defect data set covers ten types of steel plate defects, the number of feature images it offers for these three categories was insufficient for training. Therefore, the data set was expanded by combining images captured in real scenes with the public data set, yielding a self-made data set of 4500 defect feature images in total, obtained through on-site collection and selection from the public data set. Because the format and size of the feature map were found to influence detection efficiency, the LabelImg annotation tool was used to label the region of each image in proportion to the area to be labeled, with the image's length-to-width ratio kept at or below 3:1; this helps to better train the convolutional neural network model. The key data of the marked defect boxes are stored in XML files so that they can be used for network training.
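LabelImg writes its XML annotations in the Pascal VOC layout, which can be read with the standard library. The sketch below is illustrative; the class name `scratch` and the coordinates are made-up sample values, not data from the paper.

```python
import xml.etree.ElementTree as ET

def parse_labelimg_xml(xml_text):
    """Parse a LabelImg (Pascal VOC style) annotation into
    (class_name, (xmin, ymin, xmax, ymax)) pairs."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        name = obj.findtext("name")
        bb = obj.find("bndbox")
        box = tuple(int(bb.findtext(t))
                    for t in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((name, box))
    return boxes

# hypothetical annotation for one defect box
sample = """<annotation><object><name>scratch</name>
<bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>60</ymax></bndbox>
</object></annotation>"""
```

Each `<object>` element yields one labeled box, which is the form consumed when building training targets for the network.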

The target's proportion in the image varies as a result of variations in the camera's viewing distance, and the varying target sizes reduce the model's adaptability. To tackle this problem, random scaling, cropping, and arrangement were applied to the data set in the preprocessing stage to splice images randomly, which makes training on the data set more robust. Figure 7 displays an example of the data enhancement.

**Figure 7.** Example of data enhancement.

#### **5. Results and Discussion**

The self-made data set utilized in this experiment was randomly separated into a training set and a test set at a ratio of 6:4. Several ablation experiments were then set up to assess the effect of each model improvement on training and to choose the best model. The usefulness of the proposed algorithm was further confirmed by several sets of comparison experiments against existing steel defect detection methods, and the superiority of the algorithm was assessed by the mean average precision (mAP) and the detection speed (FPS).
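The 6:4 split can be sketched with the standard library. The fixed seed is an assumption added for reproducibility; the paper only states that the split is random.

```python
import random

def split_dataset(samples, train_ratio=0.6, seed=0):
    """Randomly split a list of samples into train/test at train_ratio."""
    rng = random.Random(seed)           # fixed seed for reproducibility
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

train, test = split_dataset(list(range(4500)))
```

With the 4500-image data set described above, this yields 2700 training and 1800 test samples.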

#### *5.1. Training Model*

The experiments in this paper were carried out with the following parameters: the input image size was 416 × 416; the number of epochs was set to 300; the batch size was 128 for the first 50 epochs and 64 for the remaining 250; the learning rate was 0.01 for the first 50 epochs and 0.001 for the remaining 250; and the momentum of stochastic gradient descent was set to 0.937 to obtain better convergence. Momentum damps the decay of the learning efficiency caused by the weight initialization. Figure 8 depicts the loss curve during the training phase.
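The two-phase schedule above can be written as a small lookup, shown here as an illustrative sketch (how these values are fed to the optimizer is not specified in the paper).

```python
def training_schedule(epoch):
    """Hyperparameters per epoch: first 50 epochs use batch size 128 and
    lr 0.01; the remaining 250 use batch size 64 and lr 0.001.
    Momentum stays fixed at 0.937 throughout."""
    if epoch < 50:
        return {"batch_size": 128, "lr": 1e-2, "momentum": 0.937}
    return {"batch_size": 64, "lr": 1e-3, "momentum": 0.937}
```

A training loop would call this at the start of each epoch and update the optimizer's learning rate and the data loader's batch size accordingly.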

**Figure 8.** Loss curve during training.

It can be observed from the figure that the loss value decreases continuously as the epochs increase; by round 50, the loss curve is essentially stable with no sign of overfitting. The loss values of the YOLO-ACG algorithm converge to about 0.19 and 0.14, so the overall parameter settings of the algorithm are reasonable.

#### *5.2. Comparison Experiment*

To confirm that the algorithm's improvement is genuine and reliable, the YOLO-ACG method is compared on the self-made data set with currently popular detection techniques, including YOLOv4, YOLOv4-MobileNetv1, YOLOv4-MobileNetv2 [29], YOLOv4-MobileNetv3, YOLOv3-tiny, and YOLOv4-tiny, as shown in Table 1.

**Table 1.** Data set to compare the experimental results.


The table shows that a large network model such as the YOLOv4 [30] detection method achieves a very high detection accuracy of 96.35%, but its model size of 244.7 MB is relatively large, making it difficult to deploy on mobile devices. The lightweight models (Models 2 to 6 in Table 1) streamline the YOLO algorithm. The comparison reveals that the lightweight models' sizes are significantly smaller than YOLOv4's, making them more readily deployed on mobile devices, but their detection accuracy is also significantly lower than that of the YOLOv4 algorithm.

In view of this, the YOLO-ACG network model proposed in this paper balances computational speed, detection accuracy, and model size. Its model size is about 1/4 that of the YOLOv4 network model and only slightly (about 1/3) larger than those of Models 2 to 6. The proposed approach has clear advantages in operating speed, reaching up to 103 FPS: about 18 FPS faster than YOLOv4 and almost twice as fast as Models 2 to 6, realizing high-speed detection. The accuracy of the YOLO-ACG model is around 2% higher than that of Models 2 to 6, although about 4% lower than that of the YOLOv4 model. From these three aspects, YOLO-ACG is more efficient when deployed on mobile devices.

#### *5.3. Ablation Experiment*

The ablation experiments improve different modules based on the YOLOv4-Ghost algorithm and use the self-made data set for training and performance evaluation. Table 2 compares the evaluation results of all models.

**Table 2.** Training results of different algorithm models.


As can be seen from the table, experiments introducing the SPP and ASPP modules into the model reveal that the model with the ASPP module is approximately 1.5 times larger than the one with the SPP module, but about 2% more accurate and approximately 8% better in recall. Detection speed is also improved over the SPP module, reaching about 98 FPS. The overall ablation comparison shows that, despite the model being only about a third larger than the SPP variant, it detects far faster and more accurately; the ASPP module is therefore selected as the algorithm's primary pooling layer. Comparative Experiments 6–9 show that the detection algorithm with only the ASPP module is marginally better in accuracy and speed, with a slightly smaller model, than the variants that add the SE [31], ECA, or CBAM attention modules to ASPP. Comparing Experiments 6 to 10, Experiment 10's model size is only slightly smaller than those of Experiments 6 to 9; its recall exceeds them by about 2% to 5%, its precision is about 3% to 4% higher, and its speed of about 103 FPS exceeds theirs by about 6 to 9 FPS.

The aforementioned studies demonstrate that the improvements in the YOLO-ACG algorithm are effective and increase the model's accuracy in detecting steel plate surface flaws. With its fast detection speed, lightweight model, and ease of deployment, it is well suited to identifying steel plate surface flaws in real-world situations.

#### **6. Conclusions**

This study proposes the YOLO-ACG method to address the shortcomings of the YOLOv4 algorithm in flaw identification on steel plate data. The algorithm is improved in three respects. First, the backbone network of the original YOLOv4 method is replaced with the lightweight Ghost module to lower the model size and make the algorithm easy to deploy on mobile devices. Second, the ASPP module replaces the maximum pooling layer to increase the pooling efficiency of the YOLOv4 algorithm, considerably enhancing both the processing of the feature image's edge details and its receptive field. Finally, embedding the CA module in the pyramid feature fusion network improves the effectiveness of feature map fusion across scale spaces and further enhances the analysis of the feature maps' edge information.

From the analysis of the experimental results, the proposed YOLO-ACG target detection algorithm achieves an mAP about 3% higher on the self-made data set than the existing YOLOv4-MobileNet algorithm models. Its model size is about 1/4 that of the YOLOv4 algorithm model, and its detection speed reaches about 103 FPS, twice as fast as the existing YOLOv4-MobileNet models. Therefore, YOLO-ACG significantly improves the detection of defects on steel plate surfaces and meets the mobile requirements for the real-time detection of realistic scenes.

**Author Contributions:** Conceptualization, C.W. and M.S.; methodology, C.W.; software, M.S.; validation, M.S.; formal analysis, C.W.; investigation, M.S.; resources, C.W.; data curation, M.S.; writing original draft preparation, M.S.; writing—review and editing, C.W., Y.C., B.Z., K.H., Z.C. and M.W. All authors have read and agreed to the published version of the manuscript.

**Funding:** This project was supported by the National Natural Science Foundation of China (Fund Numbered 52177004).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data sets used and analyzed in the current study are available from the corresponding author on reasonable request.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

### *Article* **TCNformer Model for Photovoltaic Power Prediction**

**Shipeng Liu 1,2, Dejun Ning 1,\* and Jue Ma 1,2**


**Abstract:** Despite the growing capabilities of short-term photovoltaic power prediction, longer-range predictions still face two challenges: error accumulation and long-term time series feature extraction. In order to improve longer-range prediction accuracy for photovoltaic power, this paper proposes a seq2seq prediction model, TCNformer, which outperforms other state-of-the-art (SOTA) algorithms by introducing variable selection (VS), long- and short-term time series feature extraction (LSTFE), and one-step temporal convolutional network (TCN) decoding. The VS module employs correlation analysis and periodicity analysis to separate time series correlation information, LSTFE extracts multiple time series features from the data, and one-step TCN decoding realizes generative prediction. We demonstrate that TCNformer achieves the lowest mean squared error (MSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) compared with the other algorithms in the field of short-term photovoltaic power prediction; furthermore, the effectiveness of each module has been verified through ablation experiments.

**Keywords:** transformer; SkipGRU; TCN; photovoltaic power prediction; time series data prediction

#### **1. Introduction**

At present, with the rapid development of perovskite solar cell technology [1,2], the maximum efficiency [3] and stability [4] of photovoltaic power have been greatly improved. Photovoltaic power is increasingly important in the field of new energy. According to the data of the International Energy Agency (IEA), the growth rate of global photovoltaic installed capacity has reached as much as 49%. It is estimated that global photovoltaic power will reach 16% of the total power in 2050 [5]. At the same time, China is promoting the construction of a new power system with new energy as the principal part. Photovoltaic power using solar energy is an important branch of new energy and one of the important means for China to achieve the goal of carbon neutrality. After the large-scale integration of photovoltaic power stations into the energy network, the manner by which to accurately predict photovoltaic power and then accordingly dispatch the power grid has become an urgent problem to be addressed. Therefore, improving the prediction accuracy of photovoltaic power is significant for improving the operation efficiency of power stations themselves and for maintaining the stability of power grids.

Many scholars in China and abroad have researched photovoltaic power prediction extensively. The mainstream prediction methods currently fall into traditional statistical learning and deep learning. In traditional statistical learning, the authors of [6] use historical weather data and historical power data as inputs to a support vector machine (SVM) to build a short-term photovoltaic power prediction model, which is more accurate than traditional autoregressive (AR) or radial basis function (RBF) models. Another study [7] proposed a model based on Support Vector Regression (SVR) and achieved better prediction performance. In deep learning, recurrent neural network (RNN) structures, such as long short-term memory (LSTM), gated recurrent units (GRU), and seq2seq models, are widely used to analyze and predict time series data in applications such as stock price prediction [8], gold price prediction [9],

**Citation:** Liu, S.; Ning, D.; Ma, J. TCNformer Model for Photovoltaic Power Prediction. *Appl. Sci.* **2023**, *13*, 2593. https://doi.org/10.3390/ app13042593

Academic Editor: Sergio Nesmachnow

Received: 7 February 2023 Revised: 15 February 2023 Accepted: 16 February 2023 Published: 17 February 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

traffic flow [10], voice classification [11], etc. Photovoltaic power prediction can also be regarded as a kind of time series prediction, so the above algorithms have been used to predict short-term global horizontal irradiance (GHI) or comprehensive solar loads [12,13]. Furthermore, to preserve accuracy while reducing training time, the GRU network has been applied to short-term photovoltaic power prediction [14], and multivariable GRU models [15–17] have been used to predict solar irradiance or power. Hybrid models have also been applied to photovoltaic power generation prediction, such as combinations of a deep learning model with a heuristic algorithm [18,19], of a deep learning model with a traditional statistical learning method [20,21], and of multiple deep learning models [22,23]. Seq2seq models in the Transformer family, such as Autoformer and Informer [24,25], take the photovoltaic power prediction problem as an experimental benchmark. However, these models usually use only the photovoltaic power data for prediction; that is, the corresponding weather data are not fully used, and the time series features of the data are not fully extracted.

Compared with traditional LSTM, GRU, and other models, the Transformer family of seq2seq models can avoid error accumulation and read longer input data [26], but it is still limited by the length of the input data, and it remains difficult for seq2seq models to capture longer time series features. For this problem, [27] proposes the long- and short-term time series network (LSTNet), whose skip recurrent neural network (Skip RNN) structure captures more long-term time series features.

Based on the above analysis, current research mainly focuses on predictions within a few hours. When applied to longer time ranges [28], these methods typically suffer from two major challenges: error accumulation and long-term time series feature extraction. In order to simultaneously extract multiple time series features from the historical photovoltaic power and weather data while avoiding error accumulation, and inspired by the application of LSTM, LSTNet, and Transformer models to photovoltaic power prediction, this paper proposes the TCNformer, and we verified the model using the real data of a photovoltaic station in Australia. According to the experimental results, the TCNformer model greatly improves all indicators compared with LSTM, SkipGRU, Transformer, and Informer, improving the accuracy of photovoltaic power prediction.

The contributions of this paper include the following:


#### **2. Preliminary**

#### *2.1. Time Series Features of Photovoltaic Power Data*

According to the literature [29,30], the photovoltaic power prediction problem is usually defined as a time series prediction problem. However, as the time granularity increases, the degree to which photovoltaic power data are affected by external factors increases, and their self-similarity decreases. The basic photovoltaic power data studied in this paper are collected at a 15-min granularity; they are greatly affected by external factors that have a certain regularity and contingency, so the statistical features of the photovoltaic power data show certain periodicity, abruptness, and contingency.

As shown in Figure 1, the 4-day power history data of a photovoltaic station were randomly selected, showing obvious periodicity and volatility.

**Figure 1.** Graph of 15-min data from 4 continuous days.

As shown in Figure 2, in order to explore the long-term time series features of photovoltaic power data, this study followed the classic approach of seasonal prediction models and selected the historical data of a photovoltaic power station at 8:30 for 4 consecutive years. Although the data show greater volatility, a certain periodicity can still be seen.

**Figure 2.** Graph of 8:30 data for 4 continuous years.

#### *2.2. LSTM and SkipGRU*

LSTM is a classic model in the field of time series prediction. In the prediction process, LSTM updates the internal state and the external state at the same time, mainly through three gates: a forget gate, an input gate, and an output gate.

The GRU network [31] is a variant of the LSTM network that combines the three gates of the LSTM unit into two. The SkipGRU module introduces a skip connection layer: by sampling at intervals, it can look back over a longer period while the sampled sequence length remains unchanged, so as to capture long-term features.
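The interval-sampling idea behind SkipGRU can be sketched in a few lines of numpy; the `skip_sample` helper and its arguments are illustrative, not the authors' code:

```python
import numpy as np

def skip_sample(x, p):
    """Rearrange a (T, n) sequence into p interleaved subsequences.

    Sampling every p-th step lets a recurrent cell look back p times
    further for the same unrolled length (hypothetical helper).
    """
    T, n = x.shape
    T_trim = (T // p) * p          # drop the remainder so T is divisible by p
    x = x[T - T_trim:]             # keep the most recent T_trim steps
    # subsequence k contains steps k, k+p, k+2p, ...
    return np.stack([x[k::p] for k in range(p)])

x = np.arange(12, dtype=float).reshape(12, 1)   # toy sequence, T = 12
subs = skip_sample(x, p=4)
print(subs.shape)          # (4, 3, 1): 4 subsequences of length 3
print(subs[0].ravel())     # steps 0, 4, 8
```

Each subsequence is then fed to the recurrent cell, so a cell unrolled for 3 steps effectively covers 12 original time steps.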

#### *2.3. Self-Attention Mechanism and ProbSparse Self-Attention Module*

The calculation formula of a traditional self-attention mechanism is as follows:

$$Q, K, V = XW^Q, XW^K, XW^V \tag{1}$$

$$A(Q, K, V) = softmax\left(\frac{QK^T}{\sqrt{d}}\right)V\tag{2}$$

In the formula, $W^Q$, $W^K$, and $W^V$ are the three weight matrices. After random initialization, the three vectors $Q$, $K$, and $V$ are generated according to Equation (1), and then the weighted attention result $A(Q, K, V)$ is calculated according to Equation (2). The result contains, via the attention weights, the information of all of the input data.
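A minimal numpy sketch of Equations (1) and (2), with randomly initialized weight matrices as described above:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention, Equations (1)-(2)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    # numerically stable row-wise softmax
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 8))       # 5 time steps, model dimension 8
Wq, Wk, Wv = (rng.standard_normal((8, 8)) for _ in range(3))
A = self_attention(X, Wq, Wk, Wv)
print(A.shape)   # (5, 8)
```

Every output step is a softmax-weighted mixture over all input steps, which is why the cost grows quadratically with the input length.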

ProbSparse self-attention calculates a sparsity measurement for each query using KL divergence:

$$M(q_i, K) = \ln \sum_{j=1}^{L_K} \exp\left(\frac{q_i k_j^T}{\sqrt{d}}\right) - \frac{1}{L_K} \sum_{j=1}^{L_K} \frac{q_i k_j^T}{\sqrt{d}} \tag{3}$$

Based on the calculated sparsity metric, each key focuses on only u main queries to achieve probsparse self-attention:

$$A(Q, K, V) = \mathrm{softmax}\left(\frac{\overline{Q} K^T}{\sqrt{d}}\right) V \tag{4}$$

In the formula, $\overline{Q}$ is a sparse matrix of the same size as $Q$, and it contains only the top-u queries under the sparsity metric $M(q, K)$.
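A minimal numpy sketch of the query-selection step, assuming the log-sum-exp-minus-mean reading of Equation (3); `sparsity_measure` is an illustrative name:

```python
import numpy as np

def sparsity_measure(Q, K):
    """Equation (3): log-sum-exp of the scaled scores of each query
    minus their mean (the "max-mean" sparsity score)."""
    d = Q.shape[-1]
    S = Q @ K.T / np.sqrt(d)                 # (L_Q, L_K) scaled scores
    lse = np.log(np.exp(S).sum(axis=1))      # log-sum-exp term
    return lse - S.mean(axis=1)

rng = np.random.default_rng(1)
Q = rng.standard_normal((6, 4))
K = rng.standard_normal((6, 4))
M = sparsity_measure(Q, K)
u = 3
top_u = np.argsort(M)[-u:]    # keep only the u most "active" queries
print(M.shape, top_u.shape)
```

Only the selected rows of $Q$ enter Equation (4), which is what reduces the attention cost from quadratic toward $O(L \log L)$ in Informer.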

#### *2.4. Temporal Convolutional Network (TCN) Module*

TCN is a variant of the convolutional neural network for sequence modeling tasks that combines ideas from RNN and CNN architectures. TCN performs better than standard recurrent networks on a range of tasks and data sets, demonstrating longer and more efficient memory. The main component of the TCN network is the dilated causal convolution; the other components are similar to the Feedforward module and serve to deepen the linear features.
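A minimal numpy sketch of a single-channel dilated causal convolution, the core operation of the TCN (the helper function is illustrative):

```python
import numpy as np

def dilated_causal_conv1d(x, w, dilation):
    """Dilated causal 1-D convolution: the output at step t only sees
    x[t], x[t-d], x[t-2d], ... (left zero-padding preserves causality)."""
    k, T = len(w), len(x)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])
    # taps[i] is the input shifted right by i*dilation steps
    taps = np.stack([xp[pad - i * dilation: pad - i * dilation + T]
                     for i in range(k)])
    return w @ taps

x = np.array([1., 2., 3., 4., 5., 6.])
w = np.array([1., 1.])                      # two-tap kernel: x[t] + x[t-2]
y = dilated_causal_conv1d(x, w, dilation=2)
print(y)   # [ 1.  2.  4.  6.  8. 10.]
```

Stacking such layers with growing dilation (1, 2, 4, ...) gives a receptive field that grows exponentially with depth, which is the source of the TCN's long memory.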

#### *2.5. Problem Definition*

The present study abstracts the photovoltaic power prediction problem as a multistep time series prediction problem, which can be defined as a data series with an input of *I* × *n* and an output of *O* × *1*, where *I* is the length of the input data, and *O* is the length of the output data. For example, under a 15-min sampling frequency, if the historical data of photovoltaic power in the past 30 days are used to predict the photovoltaic power data in the future 24 h, the *I* length is 2880, and the *O* length is 96.
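The worked example above can be checked with simple arithmetic:

```python
# 15-min sampling: steps per day, and input/output lengths for the
# 30-day-history / 24-h-horizon example in the text
g_min = 15
steps_per_day = 24 * 60 // g_min      # 96 steps per day
I = 30 * steps_per_day                # 30 days of history -> 2880
O = 24 * 60 // g_min                  # next 24 h -> 96
print(steps_per_day, I, O)            # 96 2880 96
```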

#### **3. Methodology**

#### *3.1. Transformer Based TCNformer Solution*

For the time series features of photovoltaic power data, this paper proposes a TCNformer prediction model. The structure of the model is shown in Figure 3. Based on the traditional Transformer architecture, the TCNformer model mainly includes four modules: a variable selection (VS) module, a long- and short-time series feature extraction (LSTFE) module, an Encoder, and a Decoder.

**Figure 3.** The structure of TCNformer model.

The overall TCNformer network design follows the traditional Transformer structure, in which the Encoder module and the Decoder module are designed with a multilayer structure.

#### *3.2. Variable Selection (VS) Module*

Combined with the information shown in Figures 1 and 2, the historical data of photovoltaic power not only have timing features in the short term, but they also have certain timing features over the long term. Considering the length of the long-term cycle (as shown in Figure 2, the cycle is close to 365 days) and the subsequent optimization problems, it is difficult for the traditional model to capture these timing features at the same time. So, we designed a VS module to divide the input sequence into three dimensions through preliminary analysis and selection of the historical data. Then, the results from the VS module are transferred to the LSTFE module for feature fusion.

$$d\_{l}, d\_{s}, d\_{t} = \text{VariableSelection}(data, input) \tag{5}$$

In the formula, $data \in \mathbb{R}^{I \times n}$, $d_l \in \mathbb{R}^{I_l \times n_l}$, $d_s \in \mathbb{R}^{I_s \times n_s}$, and $d_t \in \mathbb{R}^{I \times n_t}$ respectively represent the preprocessed raw data, month-level time series data, week-level time series data, and day-level time series data. $n$, $n_l$, $n_s$, and $n_t$ respectively represent the number of influencing factors. $VariableSelection(\cdot)$ represents the VS module; the specific calculation method is as follows.

Photovoltaic power data often show strong time series features. Although the volatility is strong, they still have a certain periodicity over a longer time range. In this paper, the Fourier transform decomposition curve of photovoltaic power data and its influencing factor data are selected for periodicity analysis [32] in order to obtain the fluctuation periods of different periodic curves and to provide a certain degree of reference for the analysis of photovoltaic power prediction. The formula of the Fourier transform is as follows:

$$X(k) = \sum_{n=0}^{N-1} x(n) W_N^{nk}, \quad k = 0, 1, \dots, N-1 \tag{6}$$

$$x(n) = \frac{1}{N} \sum_{k=0}^{N-1} X(k) W_N^{-nk}, \quad n = 0, 1, \dots, N-1 \tag{7}$$

$$W_N^{nk} = e^{-j(2\pi/N)nk} \tag{8}$$

$X(k)$ represents the Fourier series, $x(n)$ represents the Fourier coefficients, $W_N^{nk}$ represents the complex exponential factor, $k$ represents the coordinate in the frequency domain, and $N$ represents the period length.
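The periodicity analysis can be sketched with numpy's FFT, recovering the dominant cycle of a synthetic daily signal (the function name and setup are illustrative, not the authors' code):

```python
import numpy as np

def dominant_period(x, g_hours):
    """Estimate the dominant cycle of a series from the FFT magnitude
    spectrum; g_hours is the sampling granularity in hours."""
    x = np.asarray(x, float) - np.mean(x)        # remove the DC component
    mag = np.abs(np.fft.rfft(x))
    k = 1 + np.argmax(mag[1:])                   # skip the zero frequency
    freq = np.fft.rfftfreq(len(x), d=g_hours)[k]
    return 1.0 / freq                            # period in hours

t = np.arange(0, 96 * 4) * 0.25                  # 4 days at 15-min steps
x = np.sin(2 * np.pi * t / 24)                   # synthetic daily cycle
print(round(dominant_period(x, 0.25), 2))        # 24.0
```

Applied to real photovoltaic power and weather series, the same peak-picking step yields the fluctuation periods reported in Table 3.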

Photovoltaic power is correlated with a large number of weather factors, especially the strong correlation between solar radiation intensity and photovoltaic power. In this study, the Pearson correlation coefficient was selected for correlation analysis, and the calculation formula is as follows:

$$P_{x,y} = \frac{\mathrm{cov}(x,y)}{\sigma_x \sigma_y} = \frac{E[(x_i - \overline{x})(y_i - \overline{y})]}{\sigma_x \sigma_y} \tag{9}$$

The VS module processes the month-level time series data, week-level time series data, and day-level time series data according to the analytical results of correlation and periodicity.
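The correlation screening can be sketched in numpy, assuming a simple absolute-value cutoff of 0.1 as used later in the experiments; the helper name and synthetic data are illustrative:

```python
import numpy as np

def select_by_correlation(power, factors, names, threshold=0.1):
    """Keep the factors whose |Pearson r| with power exceeds the
    threshold (Equation (9))."""
    keep = []
    for name, f in zip(names, factors.T):
        r = np.corrcoef(power, f)[0, 1]
        if abs(r) > threshold:
            keep.append((name, round(float(r), 3)))
    return keep

rng = np.random.default_rng(2)
n = 2000
radiation = rng.random(n)
power = 0.9 * radiation + 0.05 * rng.random(n)   # strongly correlated
noise = rng.random(n)                            # unrelated factor
factors = np.column_stack([radiation, noise])
selected = select_by_correlation(power, factors, ["radiation", "noise"])
print(selected)
```

The surviving factors (here, only the radiation-like column) are the ones routed into the month-, week-, and day-level streams.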

#### *3.3. Long- and Short-Time Series Feature Extraction (LSTFE) Module*

In this study, we designed an LSTFE module, and we used it to extract time series features from each time scale. The structure of the LSTFE module is shown in Figure 4. The LSTFE mainly includes the LSTM unit, the SkipGRU unit, and the CycleEmbed unit.

**Figure 4.** The structure of LSTFE module.

We transferred the week-level time-series-related data and the month-level time-series-related data to the LSTM network and the SkipGRU network in the LSTFE module for prediction. The prediction results of the LSTM network made full use of the short-term time series features, while those of the SkipGRU network made full use of the long-term time series features:

$$f_l = LSTM(d_l) \tag{10}$$

$$f_s = SkipGRU(d_s) \tag{11}$$

$$X = Integration(d_t, f_l, f_s) \tag{12}$$

$$X_{en}^0 = CycleEmbed(X) \tag{13}$$

In Formulas (10) and (11), $f_l \in \mathbb{R}^I$ and $f_s \in \mathbb{R}^I$ represent the month-level and week-level time series feature extraction results in the LSTFE module, respectively. Using the feature extraction capabilities of the LSTM and the SkipGRU, the extracted feature results were transformed to the input length $I$ of the Encoder module.

Using the LSTM and the SkipGRU, the time series features at the weekly and monthly levels were extracted, but the question remains: how can the time series features at an annual level be extracted? To solve this problem, we designed the CycleEmbed module.

The structure of the CycleEmbed unit is shown in Figure 5, including data projection, position coding, cycle coding, and timing coding.

**Figure 5.** The structure of the CycleEmbed module.

Data projection is based on the results of the correlation and periodicity analysis, mapping the input data to a vector of the model dimension and aligning the dimensions. The alignment tool is a one-dimensional convolutional filter.

The position coding is calculated in the same way as in Transformer:

$$P(pos, 2j) = \sin\left(\frac{pos}{(2L_x)^{2j/d_{model}}}\right) \tag{14}$$

$$P(pos, 2j+1) = \cos\left(\frac{pos}{(2L_x)^{2j/d_{model}}}\right) \tag{15}$$

In Formulas (14) and (15), $j \in \left\{1, \dots, \lfloor d_{model}/2 \rfloor\right\}$, $L_x$ is the input sequence length, and $d_{model}$ is the Encoder input dimension.
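A minimal numpy sketch of Equations (14) and (15), using the $(2L_x)$ base as written in the text (note that the classic Transformer uses 10000 here instead):

```python
import numpy as np

def position_encoding(L_x, d_model):
    """Sinusoidal position codes per Equations (14)-(15)."""
    P = np.zeros((L_x, d_model))
    pos = np.arange(L_x)[:, None]
    j = np.arange(d_model // 2)[None, :]
    angle = pos / (2 * L_x) ** (2 * j / d_model)
    P[:, 0::2] = np.sin(angle)   # even dimensions, Equation (14)
    P[:, 1::2] = np.cos(angle)   # odd dimensions, Equation (15)
    return P

P = position_encoding(L_x=96, d_model=8)
print(P.shape)   # (96, 8)
```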

Cycle coding is divided according to the results of the periodicity analysis. $\tau$ is the number of steps per cycle, determined by the period $T$ obtained from the periodicity analysis and the sampling granularity $g$; that is, $\tau = T/g$. The cycle information of the input data is then coded according to $\tau$; that is, there are $\tau$ distinct cycle codes, with $C_i = i \,\%\, \tau$.
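A small worked example of the cycle coding, assuming a 24-h period and the 15-min sampling granularity used in this paper:

```python
import numpy as np

# Cycle coding: with period T = 24 h and granularity g = 0.25 h,
# each step i gets cycle index i mod tau.
T_hours, g_hours = 24.0, 0.25
tau = int(T_hours / g_hours)            # 96 distinct cycle codes
i = np.arange(4 * tau)                  # four days of steps
cycle_code = i % tau
print(tau, cycle_code[:3], cycle_code[tau:tau + 3])
```

Steps that occupy the same position within the daily cycle (e.g., 8:30 on different days) therefore share a cycle code.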

Timing coding is used to add the month and year to the coding to extract the longer time series features. In this way, the annual time series features of the data are introduced into the codec along with the embedding operation.

Combining the results of the four parts, the output result of the final period embedding module is the input of the Encoder:

$$\mathrm{CycleEmbed}_t[i] = u_i^t + P(L_x(t-1) + i) + C_i + M_i + Y_i \tag{16}$$

#### *3.4. Encoder*

The input of the Encoder is the output of the LSTFE module. The structure of the Encoder is a multilayer network structure. Each layer of the Encoder is mainly composed of a sparse attention unit and a composition unit.

$$S_{en}^{l,1} = ProbSelfAttention\left(X_{en}^{l-1}\right) \tag{17}$$

$$S_{en}^{l,2} = FeedForward\left(S_{en}^{l,1}\right) \tag{18}$$

$$X_{en}^{l} = S_{en}^{l,2} \tag{19}$$

In Formula (17), $S_{en}^{l,1} \in \mathbb{R}^{I \times d_{model}}$ is the calculation result of the sparse attention mechanism in the layer-$l$ Encoder module, $S_{en}^{l,2} \in \mathbb{R}^{I \times d_{model}}$ is the calculation result of the Feedforward layer in the layer-$l$ Encoder module, and $FeedForward(\cdot)$ is an important part of the traditional Transformer network structure, used to deepen the linear representation and better extract features. The Feedforward structure used in this paper is shown in Figure 6. $ProbSelfAttention(\cdot)$ is the sparse attention mechanism of the Informer model [24].

**Figure 6.** The structure of the Feedforward layer.

*3.5. Decoder*

In the Transformer model, the Encoder can be calculated in parallel, but the Decoder needs to decode step by step. As with the LSTM model, error accumulation will occur. This study introduced a one-step TCN decoding operation:

$$X_0 = Zeros[O, d] \tag{20}$$

$$X_{des} = concat(X, X_0) \tag{21}$$

$$X_{de}^{0} = CycleEmbed(X_{des}) \tag{22}$$

In Formula (20), $X_0$ is the result of the zero-filling operation. One-step decoding divides the Decoder's input into two parts through this zero-filling operation: the first $I$ entries form a known sequence, the last $O$ entries form the sequence to be predicted, and $X_{de}^{0} \in \mathbb{R}^{(I+O) \times d_{model}}$ is the Decoder's input data. Part of the time information of the data to be predicted is also transmitted to the Decoder through the period embedding module for prediction. The prediction process of the Decoder is similar to that of the Encoder, but it has one more self-attention layer than the Encoder.
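The zero-filling step of Equations (20) and (21) amounts to a simple concatenation (toy sizes and placeholder values for illustration):

```python
import numpy as np

# One-step decoding input: append O rows of zeros (the slots to be
# predicted) after the I known rows.
I, O, d = 8, 4, 3                      # toy sizes; the paper's example uses I=2880, O=96
X = np.ones((I, d))                    # known history (placeholder values)
X0 = np.zeros((O, d))                  # zero-filled prediction slots
X_des = np.concatenate([X, X0], axis=0)
print(X_des.shape)                     # (12, 3) = (I + O, d)
```

Because all prediction slots are filled at once, the Decoder emits the whole horizon in a single forward pass instead of decoding step by step, which is what suppresses error accumulation.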

$$S_{de}^{l,1} = ProbSelfAttention\left(X_{de}^{l-1}\right) \tag{23}$$

$$S_{de}^{l,2} = SelfAttention\left(S_{de}^{l,1}, X_{en}^{N}\right) \tag{24}$$

$$S_{de}^{l,3} = FeedForward\left(S_{de}^{l,2}\right) \tag{25}$$

$$X_{de}^{l} = S_{de}^{l,3} \tag{26}$$

In Formulas (23)–(25), $S_{de}^{l,1} \in \mathbb{R}^{(I+O) \times d_{model}}$ is the calculation result of the sparse attention mechanism in the layer-$l$ Decoder module, $S_{de}^{l,2} \in \mathbb{R}^{(I+O) \times d_{model}}$ is the result of matching the sparse attention output in the layer-$l$ Decoder module with the feature map obtained from the Encoder, and $S_{de}^{l,3} \in \mathbb{R}^{(I+O) \times d_{model}}$ is the calculation result of the Feedforward layer in the layer-$l$ Decoder module. The calculation methods of $FeedForward(\cdot)$ and $ProbSelfAttention(\cdot)$ are the same as above. $SelfAttention(\cdot)$ is the self-attention mechanism (see Section 2.3 for the calculation method).

$$X\_{pred} = TCN\left(X\_{dc}^{M}\right) \tag{27}$$

$X_{pred} \in \mathbb{R}^{O \times d_{model}}$ is the final prediction result of TCNformer, which uses the TCN to make generative predictions. The TCN structure used in this paper is shown in Figure 7.

**Figure 7.** The structure of the TCN.

#### **4. Experiment**

*4.1. Experimental Design*

#### 4.1.1. Data Preparation

The data set is an open-source photovoltaic power data set collected at a solar farm in Australia [33] from 2015 to 2016. The time interval is 15 min, there are 96 data points per day, and there are 70,176 samples in total. Each sample contains 13 data fields, including a time stamp, received active energy, the average value at the current stage, active power, performance ratio, wind speed, temperature, relative humidity, global horizontal radiation, diffuse horizontal radiation, wind direction, daily rainfall, global tilt radiation, and diffuse tilt radiation. The test set used data from the last 2 months of 2016.

All data for the two years are shown in Figure 8. The x-axis is the number of days, the y-axis is the 96 time points per day (the sampling granularity is 15 min, so 24 h of data comprise 96 time points), and the z-axis is the photovoltaic power.

**Figure 8.** Historical data set of photovoltaic power.

#### 4.1.2. Data Preprocessing

Since the variables have different dimensions, linear normalization is required before prediction, and the conversion function is:

$$\mathbf{x}\_{norm} = \frac{\mathbf{x}\_{i} - \min(\mathbf{x}\_{i})}{\max(\mathbf{x}\_{i}) - \min(\mathbf{x}\_{i})} \tag{28}$$

In the formula, *xnorm* is the preprocessing result of the data after linear normalization; *xi* is the variable input value to be normalized; *max*(*xi*) is the maximum value of the variable in the original dataset *xi*; and *min*(*xi*) is the minimum value of the variable in the original data set *xi*.

#### 4.1.3. Evaluation Index

To verify the prediction accuracy of the model, the mean square error (MSE), the mean absolute error (MAE), and the mean absolute percentage error (MAPE) were used as the evaluation indicators of model performance. The specific calculation formulas are:

$$MSE = \frac{1}{N} \sum_{i=1}^{N} \left( X_i - \hat{X}_i \right)^2 \tag{29}$$

$$MAE = \frac{1}{N} \sum_{i=1}^{N} \left| X_i - \hat{X}_i \right| \tag{30}$$

$$MAPE = \frac{100\%}{N} \sum_{i=1}^{N} \left| \frac{X_i - \hat{X}_i}{X_i} \right| \tag{31}$$

$X_i$ is the actual output value of the $i$-th data point of the test set; $\hat{X}_i$ is the predicted output value of the $i$-th data point; and $N$ is the total number of samples in the test set.
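The three indices of Equations (29)–(31) can be sketched directly in numpy:

```python
import numpy as np

def mse(y, yhat):
    return np.mean((y - yhat) ** 2)                    # Equation (29)

def mae(y, yhat):
    return np.mean(np.abs(y - yhat))                   # Equation (30)

def mape(y, yhat):
    return 100.0 * np.mean(np.abs((y - yhat) / y))     # Equation (31)

y = np.array([2.0, 4.0, 5.0])       # toy ground truth
yhat = np.array([2.5, 3.0, 5.0])    # toy predictions
print(mse(y, yhat), mae(y, yhat))   # 0.4166... 0.5
```

Note that MAPE is undefined when $X_i = 0$, which is why nighttime zero-power points need care when it is reported for photovoltaic data.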

#### 4.1.4. Experimental Environment and Parameter Setting

The experimental environment used an Intel i7-9700K processor and an NVIDIA GeForce RTX 3080 Ti graphics card, and the algorithm model used Python 3.8 as the programming language. The model networks were built on the open-source machine learning framework PyTorch. The Python libraries used directly in the experiments included pandas, numpy, matplotlib, torch, math, and time. Random search was used to determine the final hyperparameter settings, which are shown in Table 1.


**Table 1.** Model parameter setting.

#### *4.2. Variable Selection Results and Discussion*

The VS module in the long- and short-sequence correction network includes correlation analysis and periodicity analysis. The results of the correlation analysis on photovoltaic power are shown in Table 2.

**Table 2.** Results of correlation analysis.


It can be seen from Table 2 that photovoltaic power is positively correlated with direct radiation intensity, scattered radiation intensity, temperature, and wind speed, and negatively correlated with humidity, wind direction, and rainfall. Factors with an absolute correlation below 0.1 were filtered out. The correlation between direct radiation intensity and photovoltaic power is the largest, while scattered radiation intensity, temperature, humidity, and wind speed have a certain correlation with photovoltaic power; these factors affect the photovoltaic power to a decreasing degree, in that order. Although wind direction and rainfall are negatively correlated with photovoltaic power, the values are too small to impact the output.

It can be seen from Table 3 that the cycle of photovoltaic power, humidity, direct radiation intensity, and scattered radiation intensity is 24.03 h, approximately 1 day, while the cycle of wind speed, wind direction, and rainfall is 0.17 h, which can be regarded as aperiodic. The temperature cycle is 8760 h; that is, the temperature cycle conforms to the changes of the four seasons. The above results basically conform to natural logic.

**Table 3.** Results of periodicity analysis.


By correlation analysis, five influencing factors should be selected, including direct radiation, scattered radiation, temperature, humidity, and wind speed. Three influencing factors, namely, direct radiation, scattered radiation, and humidity, were screened through periodic analysis. Finally, the time series related variables of photovoltaic power were screened through the VS module, those being direct radiation, scattered radiation, and humidity.

#### *4.3. Prediction Results of Different Prediction Steps*

In order to explore the prediction performance of each model under different prediction steps, this study selected LSTM, SkipGRU, Transformer, and Informer to compare with TCNformer.

The results are shown in Table 4. It can be seen that, when the number of prediction steps is 1, the MSE errors of the five models differ little. As the number of prediction steps increases, the LSTM model shows the largest error growth rate, and its error accumulation is obvious. Informer and TCNformer use the generative prediction method, so their errors are relatively stable and their error accumulation is low. The TCNformer model proposed in this paper not only has a low level of error accumulation, but also the lowest MSE error. To observe the error accumulation of the models more intuitively, the prediction results are visualized in Figure 9.


**Table 4.** Prediction accuracy (MSE) results under different prediction steps.

#### *4.4. Prediction Performance of Different Models*

In this experiment, each model was trained five times in a 24-h (96 prediction steps) scenario, and the average value was taken. The final test set prediction results are shown in Table 5.

**Table 5.** The 24-h scenario prediction results.


As shown in Table 5, the TCNformer performs best on all three indicators: the MSE, MAE, and MAPE. Compared with the time series prediction model Informer, the MSE, MAE, and MAPE decreased by 81.90%, 50.03%, and 14.98%, respectively. The training time (153.43 s) and running time (1.29 ms) of TCNformer are relatively long, but considering the 15-min sampling granularity and the 24-h prediction scenario, the training time and running time do not affect the practical application of TCNformer.

**Figure 9.** Prediction performance of the different numbers of prediction steps for each model.

As shown in Figure 10, we visualized the prediction results of TCNformer on the test data set. The figure shows 30 sets of 24-h prediction results, which deviate little from the real data. It can be seen that TCNformer achieves high accuracy and low error.

**Figure 10.** Results of the TCNformer model.

#### *4.5. Error Analysis*

Because the prediction of the TCNformer model is a time series, we did not calculate the standard error over multiple series; instead, error analysis was carried out through the MSE of the predictions against the ground truth. Figure 11 shows the standard error diagram, in which the error bars represent the standard error. Table 6 shows the mean value, standard deviation (SD), and standard error (SE) of the error under different sample numbers.

**Figure 11.** The standard error diagram of the MSE.

**Table 6.** Results of error analysis.


As shown in Table 6 and Figure 11, as the number of samples increases, the standard deviation and standard error gradually decrease, and the mean approaches the mean of the overall sample. Therefore, the prediction result of the TCNformer model has a relatively stable level of error and a high level of reliability.

#### *4.6. Ablation Experiment*

In order to verify the effectiveness of each optimization module of the TCNformer model, we conducted ablation experiments in which the three innovative modules were removed from the TCNformer model for comparison. The experiments were set up as follows:

Experiment 1: removal of the VS module.

Experiment 2: removal of the long- and short-time series feature extraction (LSTFE) module.

Experiment 3: removal of the seq2seq structure, using the VS module, the LSTFE module, and a fully connected network.

Experiment 4: removal of one-step TCN decoding.

Experiment 5: use of the complete TCNformer model.

As shown in Table 7, the three innovations proposed in this paper are the VS module, the LSTFE module, and the seq2seq generative model structure combining ideas from Informer and Transformer. No matter which module was removed, the error of the model increased. When the seq2seq model structure was not used, the error was the largest; the VS module had the smallest impact on the overall model, but removing it still caused a decline in accuracy. From these data, it can be concluded that the TCNformer model proposed in this paper is effective, and its innovative modules are useful.


**Table 7.** Results of ablation experiment.

#### **5. Conclusions**

In this paper, a TCNformer model was proposed for photovoltaic power prediction, and we can draw the following three conclusions based on the experiment results:


**Author Contributions:** Methodology, S.L.; Software, S.L.; Validation, S.L.; Investigation, J.M.; Writing—original draft, S.L.; Writing—review & editing, D.N. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the project "Research and development of large dispatching level data acquisition and monitoring control system" (grant number E212641B01) and the project "AI assisted optimization of hybrid energy system and techno-enviro-economic analysis of green hydrogen supply chain" (grant number PT19797).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

### *Article* **A Two-Stage Bilateral Matching Study of Teams-Technology Talents in New R&D Institutions Based on Prospect Theory**

**Lin Jiang \* and Biyun Chen**

School of Information Engineering, Yancheng Teachers University, Yancheng 224002, China

**\*** Correspondence: jiangl@yctu.edu.cn

**Abstract:** This study considers two-stage bilateral matching of teams and scientific and technological talents in new R&D organizations and proposes a two-stage dual-objective bilateral matching method based on prospect theory. The matching of teams and scientific and technological talent in new R&D institutions is divided into two stages: elimination matching in the first stage and selection matching in the second stage. In the first stage, the evaluation index of the team to talent and the cost index of talent are constructed, the dual reference points of peer and expectation are set for evaluating talent, and the bottom-line reference points are set for talent cost. The comprehensive prospect value in the first stage is calculated based on prospect theory, and the matching in the first stage is completed based on the dual-objective optimization model with the highest evaluation value and the lowest cost value. In the second stage, using the matching results of the first stage, the team evaluates the talent again, while the talent ranks the team to obtain the satisfaction value, and completes the second stage of bilateral matching based on prospect theory and the dual-objective optimization model with the highest evaluation value and the highest satisfaction value. Finally, a case study and method comparison show that the proposed method is feasible and effective.

**Keywords:** prospect theory; new R&D institutions; scientific and technological talent; performance assessment; bilateral matching

### **1. Introduction**

In China, new R&D institutions, with their impressive innovation achievements and rapid development momentum, have developed into the pioneering force for source science and technology innovation and the development of strategic emerging industries in various regions, creating a new model of science and technology R&D that rapidly enhances source innovation capacity and realizes industrialization. The new R&D institution is described as "four unlikes": not exactly like a university, as its culture differs; not exactly like a scientific research institute, as its content differs; not exactly like an enterprise, as its objectives differ; and not exactly like a public institution, as its mechanisms differ. In particular, its essential characteristics of marketization, industrialization, diversification, socialization, and internationalization foreshadow an important direction for the reform and development of scientific research institutions in China.

The development of science and technology cannot be separated from talent, and scientific and technological talent is the first resource for the development of new R&D institutions. With the development of new R&D institutions, the requirements for scientific and technological talents are becoming increasingly high-end and comprehensive. First, they should be patriotic and dedicated, transforming their love for the country and the will to strengthen the country into the act of serving the country. Second, they should be active in innovative thinking and conduct research oriented by the scientific and technological needs of the country and the market as well as "high precision and shortage" projects. At the same time, new R&D institutions should also create various conditions for all kinds of talent to settle down, feel at ease and work, respect talent and creativity, innovate talent assessment mechanisms, and improve the assessment system of scientific and technological talent oriented toward innovation ability, quality, and contribution. There are many teams within the new R&D institutions, and these institutions give the teams a lot of autonomy to choose the direction of scientific research independently, issue performance and rewards independently, and implement the science and technology management system independently. Therefore, teams of new R&D institutions also have autonomy in selecting and employing people. The two-way choice between a team of a new R&D institution and scientific and technological talents is an issue of bilateral matching. On the one hand, the team of the new R&D institution looks for scientific and technological talents according to its own needs, and each team will propose its own personalized evaluation index that classifies and selects scientific and technological talents through an evaluation based on a personalized index, which realistically considers benefits and costs together. On the other hand, the tech talent will also look for teams according to their own needs; there will also be a ranking of the teams.

**Citation:** Jiang, L.; Chen, B. A Two-Stage Bilateral Matching Study of Teams-Technology Talents in New R&D Institutions Based on Prospect Theory. *Sustainability* **2023**, *15*, 3494. https://doi.org/10.3390/su15043494

Academic Editors: Luis Hernández-Callejo, Sergio Nesmachnow and Sara Gallardo Saavedra

Received: 25 July 2022 Revised: 7 February 2023 Accepted: 8 February 2023 Published: 14 February 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

To study the bilateral matching of new R&D institutions and talents, the author searched the SCI literature on the performance assessment of new R&D institutions, performance assessment of scientific and technological talents, and bilateral matching. Moliterno et al. [1] proposed that performance comparison is central to the behavioral theory of the firm, that is, companies assess their performance based on their own prior performance ("historical comparison") and the performance of other organizations ("social comparison") and base subsequent organizational changes on this performance feedback. Bode and Singh [2] argue that the provision of opportunities for employees to participate in social activities helps attract, motivate, and retain employee talent. Abramo et al. [3] proposed that the ultimate goal of research innovation activities is not publication, but scientific and technological progress useful to science or practice, and that there is no incentive to produce low-value papers if innovation performance is assessed and funds are allocated based on the total impact of publications, rather than on the number of publications. Yin et al. [4] proposed the use of a blend of subjective and objective methods to assess green technology innovation capabilities which should consider indicators in four areas: input elements, technological output, economic aggregates, and social effects. Sun and Cao [5] point out that Chinese academic research on innovation has paid particular attention to R&D expenditures, performance assessments, regional innovation ecosystems, the role of state-owned enterprises in innovation, and the role of the Chinese Communist Party in innovation. Mao et al. [6] show that organizational innovation climate, knowledge management capabilities, and internal collaboration networks have a significant positive impact on innovation performance, and that internal collaboration networks have a significant mediating role between them. Choma´c et al. 
[7] showed that consumer knowledge and preferences in the field of renewable energy determine the diffusion of RES solutions in personal use, thus stimulating the progress of the energy transition. In general, to evaluate the performance of new R&D institutions: first, the evaluation indicators should be as comprehensive as possible, taking into account both subjective and objective factors; second, in addition to assessing established results, innovation potential should also be assessed; third, more attention should be paid to the quality of publications rather than their quantity; fourth, the investment of funds and of scientific and technological talents should be increased; and fifth, the economic benefits and social effects generated should be considered.

Chamorro et al. [8] discussed three methods for assessing talent: machine-learning algorithms, social sensing technologies, and user experience. Pillai et al. [9] investigated the application of AI technologies in talent acquisition, designed technology-organization-environment (TOE) and task-technology fit (TTF) frameworks, and proposed a model to explore the adoption of AI technologies for talent acquisition. Jiang et al. [10] proposed developing technological talent in line with the globalization context and integrating talent acquisition, research and development, technological innovation, and enterprise development. Wiblen and Marler [11] examined the role of digitization in talent identification, showing how the same digital talent management techniques can produce different ways to identify talent. Chaudhuri et al. [12] argued that company management teams with PhDs in key roles outperform those of similar companies. Agarwal et al. [13] argue that stable shared leadership is at the root of firms becoming the center of gravity of their industry, accounting for the largest share of output. In general, the performance assessment of scientific and technical talents should, first, consider both their results and their potential. Second, regular, immediate, and dynamic assessments of scientific and technical talents should be conducted to form a digital database of assessments. Third, the capabilities of scientific and technical talents should be integrated with the development of the company and linked to economic and social benefits. Fourth, an artificial intelligence-based approach should be provided so that input assessment data can be quickly and accurately processed and reasoned over to draw assessment conclusions.

Eirinakis et al. [14] propose a time-optimal algorithm that identifies all stable worker-firm pairs and allocations under pairwise stability, individual preferences, and max-min criteria. Wang et al. [15] studied the bilateral matching decision problem using heterogeneous information and attribute associations. Kanoria and Saban [16] introduced a dynamic bilateral search model in which strategic agents incur costs to discover their value for each potential partner and can do so nonsimultaneously. Nguyen et al. [17] developed a many-to-one matching market model in which agents with multiunit demand aim to maximize the underlying linear objective subject to a multidimensional knapsack constraint. Johari et al. [18] proposed that, in a service platform, the job type is known, but the worker type is unknown and must be learned by observing matching results. Deng et al. [19] find that buyer and supplier conformance levels, conformance types, and inconsistency directions affect project performance. Chen et al. [20] found that matching the nature of CEOs' human capital with the type of acquisitions they make is associated with stronger performance. Chomać et al. [21] showed that global electricity price increases can be effectively reduced by conducting feasibility and matching analyses of renewable energy sources based on consumer investment and willingness to support them. In general, research on bilateral matching should, first, consider the heterogeneity, uncertainty, and incompleteness of input information; second, improve the satisfaction and efficiency of bilateral matching through suitable methods; third, consider multistage decision models; and fourth, match dynamically: conduct dynamic assessments, mine dynamic preferences, and consider dynamic reference point values, among others.

New R&D institutions carry the mission of "high-precision and scarce" science and technology innovation, the mission of diversified and flexible reform of the science and technology system and its mechanisms, the mission of responding to market demand and generating economic and social benefits, and the mission of gathering high-end science and technology talents. Studying the matching of new R&D institutions with scientific and technological talents is therefore particularly important and meaningful. Given that there is very little literature on the bilateral matching of new R&D institution teams and scientific and technological talents, this study applies the idea of bilateral matching. First, the matching of new R&D institution teams and technology talents is divided into two phases: an elimination matching phase and a selection matching phase. In the first stage, since the number of talents is greater than the number of teams and a team can match only one talent, some talents are inevitably left unmatched, so we call this stage elimination matching. In the second stage, the number of talents equals the number of teams, so we call it selection matching. Second, elimination matching considers the team's assessment of talent and the cost of talent introduction, while selection matching considers the mutual assessment of team and talent. Third, elimination matching uses interval grey numbers to characterize the uncertain assessment and cost values, while selection matching uses the mean value of expert assessments and the talent's preference order value. Fourth, the psychological factors of decision makers are considered: a historical and a desired double reference point is set for the assessment value, a bottom-line reference point is set for the cost value and the preference order value, and the prospect value is calculated relative to the reference point. Fifth, double-objective matching is considered: elimination matching pursues the double objective of maximum assessment value and minimum cost value, and selection matching pursues the double objective of maximum assessment value and maximum satisfaction; each double objective is reduced to a single objective, and a 0–1 integer programming optimal matching model is constructed. In this paper, the dual objectives are linear, so for simplicity of calculation they can be combined into a single objective using the objectives' weight information. To make the matching results optimal, this paper solves the model by 0–1 integer programming.
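As a minimal illustration of the optimal matching step, the following Python sketch (function and variable names are ours, not from the paper) solves the underlying 0–1 assignment by brute force on a toy instance. A real implementation of the model would pass the combined objective to a 0–1 integer programming solver; this sketch only shows the structure of the problem.

```python
from itertools import permutations

def optimal_matching(scores):
    """Brute-force solution of the 0-1 assignment underlying elimination
    matching: scores[i][j] is the (already aggregated) composite prospect
    value of pairing team i with talent j.  Each team gets exactly one
    talent and each talent joins at most one team; we maximise the total
    value.  Only viable for small toy instances.
    """
    n_teams, n_talents = len(scores), len(scores[0])
    best_value, best_assign = float("-inf"), None
    # Each permutation is an ordered choice of n_teams talents out of n_talents.
    for perm in permutations(range(n_talents), n_teams):
        total = sum(scores[i][perm[i]] for i in range(n_teams))
        if total > best_value:
            best_value, best_assign = total, list(perm)
    return best_value, best_assign

# Two teams, three talents: one talent is necessarily eliminated.
scores = [[0.9, 0.4, 0.1],
          [0.3, 0.8, 0.2]]
value, assign = optimal_matching(scores)
print(assign)  # [0, 1]: talent 2 is eliminated
```

Since m > n in the elimination phase, the m − n talents that appear in no optimal pairing are exactly those eliminated.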

#### **2. Phase I Elimination Match between the Team of New R&D Institutions and Scientific and Technological Talents**

The matching of new R&D institution teams and technology talents is divided into two stages. The first stage is elimination bilateral matching: each team can match technology talents, but because the number of talents is greater than the number of teams, not all technology talents can be matched to a team; limited by the number of teams and the number of matches, some technology talents are always eliminated in this process. In this study, there are n teams of new R&D institutions, each team can match only one technological talent, and there are m technological talents with m > n; thus, m − n technological talents will be eliminated. Therefore, this stage is referred to as elimination matching. Elimination matching must consider many factors, because the new R&D institution eventually has to explain the reason for elimination to the eliminated talents, so the matching method is very demanding.

In the first stage of matching, the highest value of the team's assessment of talent is considered together with the lowest cost of introducing scientific and technological talent. Historical and desired reference points are set for the assessment value, a bottom-line reference point is set for the cost value, and the psychological factors of the new R&D institution team are fully considered so that scientific and technological talents can be compared more accurately; finally, a 0–1 integer programming model is used for bilateral matching. In the first phase of elimination matching, the total number of talents is greater than the number of R&D teams. Red smiling faces represent successfully matched technology talents, and green smiling faces represent unmatched technology talents that will be eliminated, as shown in Figure 1.

**Figure 1.** Phase I elimination matching chart for teams and talents in new R&D institutions.

*2.1. Constructing Indicators for the Assessment of Scientific and Technological Talents by the Teams of New R&D Institutions*

The assessment of scientific and technological talent faces new requirements in this new era. Since new R&D institutions focus on breaking through core technologies, solving "bottleneck" problems, and transforming results, their development process requires scientific and technological talents to pay more attention to patriotism, technical potential, and transformation ability. In Chinese history, many noble men and women had a strong sense of concern for the country and the people; they took the affairs of the country as their responsibility, defended the motherland, and cared for people's livelihood. No matter what environment we are in, we should love our country and take its affairs as our responsibility. For example, during the fight against the novel coronavirus in 2020, many outstanding scientific and technological talents emerged in China, creating "Chinese speed" in nucleic acid detection, vaccines, saving critically ill patients, building hospitals, and so on. They demonstrated not only scientific and technological ability but, more importantly, a patriotic heart and a passion to serve the country. The technical potential of scientific and technological talents depends on their experience and personal drive: graduating from a prestigious university can help a talent take fewer detours, critical innovation keeps them challenging new heights, cross-disciplinary work gives them more inspiration for innovation, and so on. Transformation ability first requires good communication and collaboration skills, because the projects of new R&D institutions are usually completed by teams; second, it requires transformed results that generate economic and social benefits, as well as benefits for the talents themselves.

The author visited 10 new R&D institutions, surveyed scientific and technological talents, reviewed 50 papers on talent assessment and the literature on R&D institution development, and summarized and refined 15 indicators for new R&D institutions' benefit assessment of scientific and technological talents, as shown in Table 1. The ten new R&D institutions are located in the Science and Technology Park of Yancheng City, Jiangsu Province; Yancheng is a coastal city with China's largest mudflat wetlands and abundant wind energy resources. The ten institutions belong to different industries: two in the saline rice industry, two in the cable industry, two in the machinery industry, two in the wind power industry, and two in the energy industry. The author visited the human resources departments of these institutions and communicated with the persons in charge of talent introduction, who generally reported high talent introduction costs and insufficient satisfaction. They also proposed modifications to our assessment index framework, such as serving the motherland, graduating from prestigious universities, cross-disciplinary background, and providing scientific and technological insights.

During these visits, the author also found that different teams of new R&D institutions use different indicators for assessing scientific and technological talent because their requirements differ. For example, new R&D institution A has strict confidentiality signs posted from the entrance onward, with confidentiality signs everywhere on the stairs and in the restrooms, indicating that the team has particularly high confidentiality requirements. The high-end nature of a new R&D institution's technology makes that technology confidential to a certain extent: any knowledge is public to a certain extent and confidential to a certain extent. Intellectual property, for example, may be disclosed in many ways, but some details are still kept confidential. Secrecy sometimes represents importance, sophistication, and high-end quality, so that a few talented people can still learn and improve while the details remain unknown to the general public. Another example is new R&D institution B, whose recruitment website shows that they are looking for PhD graduates from "985" universities, meaning that they require graduates from prestigious universities. Therefore, in this study, we consider the assessment of scientific and technological talents by teams of new R&D institutions using individualized indicators: the first team selects its personalized indicators $k_{1j}$ from the 15 indicators, the second team selects its indicators $k_{2j}$, and so on, until the nth team selects its indicators $k_{nj}$. Each team assigns weights $\omega_j^k$ to the indicators as it selects them.


**Table 1.** Indicators for the assessment of scientific and technological talents by new R&D institution teams.

Because the assessment of scientific and technological talents by the teams of new R&D institutions is affected by the assessment experts' own learning, preferences, reference points, information asymmetry, and so on, the assessment results have a certain degree of uncertainty; this study therefore uses interval grey numbers to characterize the uncertain assessment values. An interval grey number is an uncertain number that takes values in a certain interval or a general set of numbers; only the range of possible values is known, not the exact value, and it is usually denoted by the symbol $\otimes$. A grey number that has both a lower and an upper bound is called an interval grey number and is denoted $\otimes \in [\underline{a}, \overline{a}]$, where $\underline{a}$ denotes the lower bound and $\overline{a}$ denotes the upper bound.

Using interval grey numbers to characterize the teams' assessment values of talents, we obtain $n$ decision matrices $A^k = \left[ a_{ij}^k(\otimes) \right]$, $i = 1, 2, \cdots, m$, $j = 1, 2, \cdots, p$, $k = 1, 2, \cdots, n$, where $i$ indexes the $m$ scientific and technical talents, $j$ indexes the $p$ assessment indicators, and $k$ indexes the $n$ teams.

$$\mathbf{A}^{\mathbf{k}} = \left[ \mathbf{a}\_{\overline{\mathbb{H}}}^{\mathbf{k}} (\bigotimes \mathbf{\bigbeta}) \right] = \begin{bmatrix} \underline{\mathbf{a}}\_{11}^{\mathbf{k}}, \overline{\mathbf{a}}\_{11}^{\mathbf{k}} & \underline{\mathbf{a}}\_{12}^{\mathbf{k}}, \overline{\mathbf{a}}\_{12}^{\mathbf{k}} & \cdots & \underline{\mathbf{a}}\_{1\mathbf{p}}^{\mathbf{k}}, \overline{\mathbf{a}}\_{1\mathbf{p}}^{\mathbf{k}} \\ \underline{\mathbf{a}}\_{21}^{\mathbf{k}}, \overline{\mathbf{a}}\_{21}^{\mathbf{k}} & \underline{\mathbf{a}}\_{22}^{\mathbf{k}}, \overline{\mathbf{a}}\_{22}^{\mathbf{k}} & \cdots & \underline{\mathbf{a}}\_{2\mathbf{p}}^{\mathbf{k}}, \overline{\mathbf{a}}\_{2\mathbf{p}}^{\mathbf{k}} \\ \cdots & \cdots & \cdots & \cdots & \cdots \\ \underline{\mathbf{a}}\_{m1}^{\mathbf{k}}, \overline{\mathbf{a}}\_{m1}^{\mathbf{k}} & \underline{\mathbf{a}}\_{m2}^{\mathbf{k}}, \overline{\mathbf{a}}\_{m2}^{\mathbf{k}} & \cdots & \underline{\mathbf{a}}\_{mp}^{\mathbf{k}}, \overline{\mathbf{a}}\_{mp}^{\mathbf{k}} \end{bmatrix} \tag{1}$$

#### *2.2. Constructing Cost Indicators for the Introduction of Scientific and Technological Talent*

New R&D institutions must pay a certain price when introducing scientific and technological talent. Taking a three-year introduction period as an example, the cost over three years generally includes the settlement fee, science and technology start-up fee, salary, insurance, provident fund, performance incentives, project dividends, and other incentives. The differences between institutions in some of these items are also relatively large, so we choose the more representative indicators as matching indicators, as shown in Table 2.



Some scientific and technological talents can receive high rewards in teams of new R&D institutions, while others receive very little. Since the mechanism of new R&D institutions is very flexible and financially autonomous, there are many uncertain cost factors, and rewards can sometimes far exceed salary income. In general, the cost of introducing scientific and technological talents with higher performance assessment values is relatively high, so matching teams of new R&D institutions with talents that combine high assessment values and low costs is a real problem. In this study, the team of the new R&D institution and the technology talent communicate and, based on the cost assessment indicators of this study, negotiate and finally determine an acceptable cost range. Using interval grey numbers to characterize the cost values of talent introduction, we obtain the decision matrix $B = \left[ b_{ij}(\otimes) \right]$, $i = 1, 2, \cdots, m$, $j = 1, 2, \cdots, q$, where $i$ indexes the $m$ scientific and technical talents and $j$ indexes the $q$ cost indicators.

$$\mathbf{B} = \left[ b_{ij}(\otimes) \right] = \begin{bmatrix} \left[\underline{b}_{11}, \overline{b}_{11}\right] & \left[\underline{b}_{12}, \overline{b}_{12}\right] & \cdots & \left[\underline{b}_{1q}, \overline{b}_{1q}\right] \\ \left[\underline{b}_{21}, \overline{b}_{21}\right] & \left[\underline{b}_{22}, \overline{b}_{22}\right] & \cdots & \left[\underline{b}_{2q}, \overline{b}_{2q}\right] \\ \vdots & \vdots & \ddots & \vdots \\ \left[\underline{b}_{m1}, \overline{b}_{m1}\right] & \left[\underline{b}_{m2}, \overline{b}_{m2}\right] & \cdots & \left[\underline{b}_{mq}, \overline{b}_{mq}\right] \end{bmatrix} \tag{2}$$

#### *2.3. Phase I Elimination Matching Results Based on Prospect Theory and Bilateral Matching Models*

According to prospect theory and grey target theory, this study sets a double reference point for the assessment value, consisting of a historical reference point and a desired reference point, and a bottom-line reference point for the cost value; the reference point serves as the bull's-eye, and prospect theory is applied to the data.

In the first stage of matching the team of a new R&D institution with scientific and technological talents, although the assessment value of scientific and technological talents is given, the assessment value does not reflect whether the scientific and technological talents are good, how good they are, and whether the team is satisfied. Therefore, it is necessary to find a reference point for comparison, and the data for the reference point must be easily accessible.

2.3.1. Setting Reference Points for Assessment Values—Historical Reference Points and Desired Reference Points

The historical reference point is the assessment of the scientific and technological talents that the new R&D institution team has introduced in the past; comparing against it tells whether the current batch of talents is better or worse than historical talents. In general, the new R&D institution team hopes that the technology talents brought in keep improving. If the assessment value is higher than the historical reference point, the team is satisfied; if it is lower, the team is not satisfied.

The desired (expectation) reference point is the goal that the new R&D institution team expects the technology talent to achieve; comparing the assessment value with the expectation in the decision maker's mind determines whether the batch exceeds or falls short of expectations. In general, the team expects the technological talents to meet expectations, and expectations are usually not set too low. If the assessment value is higher than the desired reference point value, the team is satisfied; if it is lower, the team is not satisfied.

There are many teams within the new R&D institution, and each team sends different evaluation experts; therefore, the reference point values differ across teams. The historical reference point values can be obtained by collating historical data, and the desired reference point values can be given by the evaluation experts; in this way, the historical reference point vector $C_1^k$ and the desired reference point vector $C_2^k$ can be obtained.

$$\mathbf{C}\_{1}^{k} = \left[\mathbf{c}\_{1\circ}^{k}(\bigotimes)\right] = \left\{ \left[\underline{\mathbf{c}}\_{11}^{k}, \overline{\mathbf{c}}\_{11}^{k}\right], \left[\underline{\mathbf{c}}\_{12}^{k}, \overline{\mathbf{c}}\_{12}^{k}\right], \dots, \left[\underline{\mathbf{c}}\_{1\mathbf{P}'}^{k}, \overline{\mathbf{c}}\_{1\mathbf{P}}^{k}\right] \right\} \tag{3}$$

$$\mathbf{C}\_{2}^{\mathbf{k}} = \left[ \mathbf{c}\_{2}^{\mathbf{k}} \left( \bigotimes \mathbf{j} \right) \right] = \left\{ \left[ \underline{\mathbf{d}}\_{21}^{\mathbf{k}}, \overline{\mathbf{d}}\_{21}^{\mathbf{k}} \right], \left[ \underline{\mathbf{d}}\_{22}^{\mathbf{k}}, \overline{\mathbf{d}}\_{22}^{\mathbf{k}} \right], \dots, \left[ \underline{\mathbf{d}}\_{2\mathbf{p}}^{\mathbf{k}}, \overline{\mathbf{d}}\_{2\mathbf{p}}^{\mathbf{k}} \right] \right\} \tag{4}$$

2.3.2. Set the Reference Point for the Cost Value—Bottom Line Reference Point

New R&D institutions provide cost value for the introduction of individual scientific and technical talent, but it is not known whether the institution is satisfied. Therefore, a bottom-line reference point was set, and the cost value was compared with the bottom-line reference point. The bottom-line reference point is the bottom-line cost value that the institution can afford, and if the cost value is lower than the bottom-line reference point value, the new R&D institution feels a gain. If the cost value is higher than the bottom-line reference point value, the new R&D institution feels a loss. The bottom-line reference point data are jointly provided by the evaluation experts of the new R&D organization team, and because the team pays about the same cost to the scientific and technological talents, the bottom-line reference point represents the bottom-line cost of the entire new R&D organization. After the evaluation experts' determination, we obtained the bottom-line reference point value vector C3.

$$\mathbf{C}_{3} = \left[ c_{3j}(\otimes) \right] = \left\{ \left[\underline{c}_{31}, \overline{c}_{31}\right], \left[\underline{c}_{32}, \overline{c}_{32}\right], \dots, \left[\underline{c}_{3q}, \overline{c}_{3q}\right] \right\} \tag{5}$$

2.3.3. Calculate the Distance from the Appraised Value and the Cost Value to the Reference Point Separately

The reference point is the bull's-eye of the grey target decision: if the aggregated value is greater than zero, the target is hit; if it is less than zero, the result is off-target. The new R&D institution team's assessment values of technological talents and the costs of introducing them are uncertain and are characterized by interval grey numbers; the historical, desired, and bottom-line reference point values are also uncertain and are likewise characterized by interval grey numbers. To calculate the distance from the assessment value and the cost value to the reference point, we use the formula for the distance between two interval grey numbers. In this study, we use the kernel and the half-interval length of interval grey numbers to calculate this distance.

**Definition 1.** *Let two interval grey numbers be* $\otimes_1 \in [\underline{a}, \overline{a}]$ *and* $\otimes_2 \in [\underline{c}, \overline{c}]$*, and define the kernels of the two interval grey numbers as*

$$\hat{\otimes}_1 = \frac{1}{2}\left(\underline{a} + \overline{a}\right), \quad \hat{\otimes}_2 = \frac{1}{2}\left(\underline{c} + \overline{c}\right) \tag{6}$$

*If* $\hat{\otimes}_1 > \hat{\otimes}_2$*, then* $\otimes_1 > \otimes_2$*.*

**Definition 2.** *Let two interval grey numbers be* $\otimes_1 \in [\underline{a}, \overline{a}]$ *and* $\otimes_2 \in [\underline{c}, \overline{c}]$*, and define the (half-interval) lengths of the two interval grey numbers as*

$$l(\otimes_1) = \frac{1}{2}\left(\overline{a} - \underline{a}\right), \quad l(\otimes_2) = \frac{1}{2}\left(\overline{c} - \underline{c}\right) \tag{7}$$

**Definition 3.** *Let two interval grey numbers be* $\otimes_1 \in [\underline{a}, \overline{a}]$ *and* $\otimes_2 \in [\underline{c}, \overline{c}]$*, and define the distance between the two interval grey numbers as*

$$d(\otimes_1, \otimes_2) = \left| \hat{\otimes}_1 - \hat{\otimes}_2 \right| + \frac{1}{2} \left| l(\otimes_1) - l(\otimes_2) \right| \tag{8}$$

*Following the algorithm of interval grey numbers, we calculate the distances between the assessment values $A^k$ and the historical reference point values $C_1^k$, between the assessment values $A^k$ and the desired reference point values $C_2^k$, and between the cost values $B$ and the bottom-line reference point values $C_3$, obtaining the distance matrices $d(A^k C_1^k)$, $d(A^k C_2^k)$, and $d(B C_3)$.*
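Definitions 1–3 can be sketched directly in Python. The following is a minimal illustration (function names are ours); an interval grey number is represented as a `(lower, upper)` tuple:

```python
def kernel(g):
    """Kernel of an interval grey number g = (lower, upper): its midpoint (Eq. (6))."""
    lo, hi = g
    return 0.5 * (lo + hi)

def length(g):
    """Half-interval length of g (Eq. (7))."""
    lo, hi = g
    return 0.5 * (hi - lo)

def distance(g1, g2):
    """Distance between two interval grey numbers (Eq. (8)):
    |kernel difference| + 0.5 * |length difference|."""
    return abs(kernel(g1) - kernel(g2)) + 0.5 * abs(length(g1) - length(g2))

# An assessment value [70, 80] against a historical reference point [60, 70]:
# equal widths, kernels 75 and 65, so the distance is purely the kernel gap.
print(distance((70, 80), (60, 70)))  # 10.0
```

Applying `distance` entrywise to an assessment matrix and a reference point vector yields the distance matrices used in the next step.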

2.3.4. Calculation of Prospective Values of Assessed and Cost Values Based on Distance

We compare the assessment and cost values with the reference point values through the kernels of the interval grey numbers and substitute the distances calculated above into the prospect theory equations, obtaining n prospect value matrices of assessment values based on the historical reference points, n prospect value matrices of assessment values based on the desired reference points, and one prospect value matrix of cost values based on the bottom-line reference points.

$$V_{ij(1)}^{kc_1} = \begin{cases} \left( d\left( A^{k} C_{1}^{k} \right) \right)^{\alpha}, & \hat{\otimes}_{a_{ij}^{k}(\otimes)} > \hat{\otimes}_{c_{1j}^{k}(\otimes)} \\ -\theta \left( d\left( A^{k} C_{1}^{k} \right) \right)^{\beta}, & \hat{\otimes}_{a_{ij}^{k}(\otimes)} < \hat{\otimes}_{c_{1j}^{k}(\otimes)} \end{cases} \tag{9}$$

$$\mathbf{V}^{\mathbf{k}\mathbf{c}\_{2}}\_{\vec{\mathbf{u}}(1)} = \begin{cases} \left(\mathbf{d}\left(\mathbf{A}^{\mathbf{k}}\mathbf{C}^{\mathbf{k}}\_{2}\right)\right)^{\alpha} \widehat{\bigotimes}\_{\mathbf{a}^{\mathbf{k}}\_{\vec{\mathbf{u}}}(\otimes)} > \widehat{\bigotimes}\_{\mathbf{c}^{\mathbf{k}}\_{2\vec{\mathbf{u}}}(\otimes)}\\ -\boldsymbol{\theta}\*\left(\mathbf{d}\left(\mathbf{A}^{\mathbf{k}}\mathbf{C}^{\mathbf{k}}\_{2}\right)\right)^{\beta} \widehat{\bigotimes}\_{\mathbf{a}^{\mathbf{k}}\_{\vec{\mathbf{u}}}(\otimes)} < \widehat{\bigotimes}\_{\mathbf{c}^{\mathbf{k}}\_{\vec{\mathbf{u}}}(\otimes)} \end{cases} \tag{10}$$

$$\mathcal{V}^{\mathbb{C}\_{\triangleright}}\_{\vec{\mathsf{ij}}(1)} = \begin{cases} (\mathsf{d}(\mathsf{BC}\_{3}))^{\alpha} \xleftarrow{\widehat{\mathsf{Op}}\_{\vec{\mathsf{p}}}} \widehat{\mathsf{Op}}\_{\vec{\mathsf{p}}}(\otimes) < \widehat{\mathsf{Op}}\_{\vec{\mathsf{p}}}(\otimes) \\ -\mathsf{e} \ast (\mathsf{d}(\mathsf{BC}\_{3}))^{\beta} \xleftarrow{\widehat{\mathsf{Op}}\_{\vec{\mathsf{p}}}} \widehat{\mathbb{B}\_{\vec{\mathsf{q}}}}(\otimes) > \widehat{\mathsf{Op}}\_{\vec{\mathsf{q}}}(\otimes) \end{cases} \tag{11}$$

As the assessment value is benefit-type data, the larger the value, the better: values above the reference point are gains, and values below it are losses. Cost values, on the other hand, are cost-type data: the smaller the value, the better, with gains below the reference point and losses above it.

In Formula (9), $V_{ij(1)}^{kc_1}$ denotes the prospect value of the first-stage team's assessment of talent based on the historical reference point, calculated as a power function of the distance between the assessment value $a_{ij}^k(\otimes)$ and the reference point value $c_{1j}^k(\otimes)$. In Equation (10), $V_{ij(1)}^{kc_2}$ denotes the prospect value of the assessment value based on the desired reference point, calculated as a power function of the distance between the assessment value $a_{ij}^k(\otimes)$ and the reference point value $c_{2j}^k(\otimes)$. In Equation (11), $V_{ij(1)}^{c_3}$ denotes the prospect value of the cost value based on the bottom-line reference point, calculated as a power function of the distance between the cost value $b_{ij}(\otimes)$ and the reference point value $c_{3j}(\otimes)$. The prospect value is determined by the subjective perception of the decision maker, and gains and losses are relative to the reference point. $\alpha$ and $\beta$ denote the decision maker's risk attitude coefficients in the gain and loss regions, respectively; $\alpha, \beta < 1$ reflects diminishing sensitivity, and in this study $\alpha = 0.88$ and $\beta = 0.88$. $\theta$ is the decision maker's loss aversion coefficient: decision makers are risk-averse when facing gains and risk-seeking when facing losses, and when $\theta > 1$, the value function is steeper and more sensitive in the loss region than in the gain region. In this study, $\theta = 2.25$.
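Equations (9)–(11) amount to a piecewise power function of the grey distance. A minimal sketch for the benefit-type case (Eqs. (9) and (10)), with the parameter values used in this study, might look as follows (function and variable names are ours; for cost values, Eq. (11), the gain condition is reversed, i.e., being below the bottom-line reference point counts as the gain):

```python
ALPHA, BETA, THETA = 0.88, 0.88, 2.25  # parameter values used in this study

def prospect_value(dist, is_gain):
    """Prospect value for a benefit-type assessment (Eqs. (9)-(10)):
    a concave power of the distance for gains, and a theta-scaled,
    steeper power for losses.
    """
    if is_gain:
        return dist ** ALPHA
    return -THETA * dist ** BETA

gain = prospect_value(10.0, True)
loss = prospect_value(10.0, False)
# Losses loom larger: at equal distance, |loss| = THETA * gain.
assert loss == -THETA * gain
```

With α = β, the loss branch is simply the gain branch scaled by −θ, which is what the assertion checks.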

#### 2.3.5. Normalization and Aggregation of Assessment and Cost Prospect Values

The prospect values obtained from prospect theory are not normalized and need to be normalized so that their values fall in [−1, 1]. We use the maximum value method for normalization, which is simple and convenient to apply and retains the characteristics of the original data.

$$\mathbf{M}^{\mathbf{k}c\_{\mathbf{i}}}\_{\mathbf{i}\mathbf{j}(1)} = \frac{\mathbf{V}^{\mathbf{k}c\_{\mathbf{i}}}\_{\mathbf{i}\mathbf{j}(1)}}{\max\_{\mathbf{i}} \max\_{\mathbf{j}} \mathbf{V}^{\mathbf{k}c\_{\mathbf{i}}}\_{\mathbf{i}\mathbf{j}(1)}}. \mathbf{M}^{\mathbf{k}c\_{\mathbf{i}}}\_{\mathbf{i}\mathbf{j}(1)} = \frac{\mathbf{V}^{\mathbf{k}c\_{2}}\_{\mathbf{i}\mathbf{j}(1)}}{\max\_{\mathbf{i}} \max\_{\mathbf{j}} \mathbf{V}^{\mathbf{k}c\_{2}}\_{\mathbf{i}\mathbf{j}(1)}}. \mathbf{M}^{\mathbf{C}3}\_{\mathbf{i}\mathbf{j}(1)} = \frac{\mathbf{V}^{\mathbf{c}3}\_{\mathbf{i}\mathbf{j}(1)}}{\max\_{\mathbf{i}} \max\_{\mathbf{j}} \mathbf{V}^{\mathbf{c}3}\_{\mathbf{i}\mathbf{j}(1)}} \tag{12}$$

Let $\omega^k_j$ be the weight of the $j$th assessed-value indicator given by the $k$th team and $\mu_j$ the weight of the $j$th cost-value indicator. Then the matrix of assessed-value prospect values based on the historical reference point, $N^{kc_1}_{i(1)}$, the assessed-value prospect value matrix based on the desired reference point, $N^{kc_2}_{i(1)}$, and the cost-value prospect values based on the bottom-line reference point, $N^{c_3}_{i(1)}$, are:

$$N^{kc_1}_{i(1)} = \sum_{j=1}^{p} \omega^{k}_{j} M^{kc_1}_{ij(1)},\quad N^{kc_2}_{i(1)} = \sum_{j=1}^{p} \omega^{k}_{j} M^{kc_2}_{ij(1)},\quad N^{c_3}_{i(1)} = \sum_{j=1}^{q} \mu_{j} M^{c_3}_{ij(1)} \tag{13}$$
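Equations (12) and (13) amount to scaling each prospect-value matrix by its largest entry and then taking indicator-weighted sums per talent. A minimal Python sketch (illustrative numbers, not the paper's data; here we scale by the largest absolute entry so that values are guaranteed to land in [−1, 1], whereas Equation (12) writes the plain maximum):

```python
def max_normalize(V):
    """Equation (12)-style normalization: divide every entry by the largest
    absolute entry so normalized prospect values fall in [-1, 1]."""
    peak = max(abs(v) for row in V for v in row)
    return [[v / peak for v in row] for row in V]

def aggregate(M, weights):
    """Equation (13): indicator-weighted sum of normalized prospect values,
    one row per talent, one column per indicator."""
    return [sum(w * m for w, m in zip(weights, row)) for row in M]

# Illustrative prospect values for 2 talents under 3 indicators:
V = [[0.4, -0.9, 0.2],
     [0.3, 0.6, -0.3]]
N = aggregate(max_normalize(V), [0.5, 0.3, 0.2])
```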

The vectors $N^{kc_1}_{i(1)}$, $N^{kc_2}_{i(1)}$, and $N^{c_3}_{i(1)}$ are then transposed:

$$TS^{c_1}_{ts(1)} = \left[ N^{kc_1}_{i(1)} \right]^{\mathrm{T}},\quad TS^{c_2}_{ts(1)} = \left[ N^{kc_2}_{i(1)} \right]^{\mathrm{T}},\quad TS^{c_3}_{ts(1)} = \left[ N^{c_3}_{i(1)} \right]^{\mathrm{T}} \tag{14}$$

Let the historical reference point weight of this study be θ and the desired reference point weight be 1 − θ. First, the assessed-value prospect values based on the historical and desired reference points are aggregated to obtain the assessed-value composite prospect value. Then, that composite value is aggregated with the cost-value prospect value, with weight δ on the assessed-value composite prospect value and weight 1 − δ on the cost-value prospect value. After aggregation, we obtain the first-stage matched composite prospect value, $TS_{ts(1)}$.

$$TS_{ts(1)} = \delta \times \left( \theta \times TS^{c_1}_{ts(1)} + (1 - \theta) \times TS^{c_2}_{ts(1)} \right) + (1 - \delta) \times TS^{c_3}_{ts(1)} \tag{15}$$

#### 2.3.6. Construction of a Bilateral Matching Model to Derive the First-Stage Elimination Matching Results

Professor David Gale of Brown University and the renowned economist Professor Lloyd Shapley pioneered the theory of bilateral matching decisions in 1962 with their article "College Admissions and the Stability of Marriage". Let $\mu : P \cup Q \rightarrow Q \cup P$ be a one-to-one mapping such that, for all $P_i \in P$ and $Q_j \in Q$, $\mu(P_i) = Q_j$ holds if and only if $\mu(Q_j) = P_i$; then μ is called a bilateral matching. $\mu(P_i) = Q_j$ denotes that $P_i$ is matched with $Q_j$ under μ, while $\mu(Q_j) = Q_j$ indicates that $Q_j$ is unmatched under μ.
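For reference, the deferred-acceptance procedure introduced in that article can be sketched in a few lines of Python; this is the textbook one-to-one algorithm, not the integer-programming method developed in this paper:

```python
def gale_shapley(prop_prefs, resp_prefs):
    """Classic deferred-acceptance (Gale & Shapley, 1962) for one-to-one
    matching. prop_prefs[p] lists responders in p's preference order;
    resp_prefs[r] lists proposers in r's preference order.
    Returns {proposer: responder} for the proposer-optimal stable matching."""
    rank = {r: {p: i for i, p in enumerate(prefs)} for r, prefs in resp_prefs.items()}
    free = list(prop_prefs)               # proposers not yet matched
    next_choice = {p: 0 for p in prop_prefs}
    engaged = {}                          # responder -> proposer
    while free:
        p = free.pop()
        r = prop_prefs[p][next_choice[p]]
        next_choice[p] += 1
        if r not in engaged:
            engaged[r] = p
        elif rank[r][p] < rank[r][engaged[r]]:
            free.append(engaged[r])       # r trades up; old partner is free again
            engaged[r] = p
        else:
            free.append(p)                # r rejects p
    return {p: r for r, p in engaged.items()}

# Two teams (proposers) and two talents (responders), purely illustrative:
match = gale_shapley(
    {"T1": ["S1", "S2"], "T2": ["S1", "S2"]},
    {"S1": ["T2", "T1"], "S2": ["T1", "T2"]},
)
```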

For the bilateral matching decision problem between teams and scientific and technological talents in new R&D institutions, forming a reasonable and effective bilateral matching scheme is the common demand of teams and talents, and constructing a bilateral matching model and proposing a solution algorithm is the most crucial step. The traditional Gale–Shapley algorithm performs matching in preference order; this paper instead proposes bilateral matching based on uncertain interval grey numbers and reference-point-based prospect theory. In the first stage, the new R&D organization teams and the scientific and technological talents are matched many-to-many: the new R&D organization achieves complete matching, the scientific and technological talents are incompletely matched, and the talents who are not successfully matched are eliminated. In this study, we constructed the first-stage elimination matching model, M-1.

$$\max Z = \sum_{t=1}^{n} \sum_{s=1}^{m} \pi_{ts}\, TS_{ts(1)} \tag{16}$$

$$\text{s.t.} \begin{cases} \pi_{ts} \in \{0, 1\} \\ \sum\limits_{s=1}^{m} \pi_{ts} = 1, \quad \forall t \\ \sum\limits_{t=1}^{n} \pi_{ts} \le 1, \quad \forall s \end{cases} \tag{17}$$

The objective of the M-1 model is to transform the dual objective of maximizing the prospect value of the assessed value and the prospect value of the cost value into a single objective. The constraints are as follows: the decision variable $\pi_{ts}$ can only take the value 0 or 1; each new R&D organization team must be matched to exactly one scientific and technological talent; and each scientific and technological talent is matched to at most one team and may remain unmatched.

#### **Theorem 1.** *The M-1 model must have an optimal solution.*

Note: According to the optimality existence theorem, any single-objective program whose feasible domain is non-empty and bounded attains an optimum over that domain. The M-1 model is a single-objective programming problem, and its feasible domain exists and is bounded; therefore, the M-1 model must have an optimal solution. The M-1 model can be solved using LINGO software: $\pi_{ts} = 1$ means that the $t$th new R&D institution team and the $s$th technological talent are successfully matched, $\pi_{ts} = 0$ means that they are not, and the $m - n$ unmatched technological talents are eliminated.
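The paper solves M-1 with LINGO; since the feasible set is finite, a small instance can equally be solved by exhaustive enumeration. The sketch below uses illustrative composite prospect values, not the paper's data:

```python
from itertools import permutations

def solve_m1(TS):
    """Solve the M-1 elimination matching by exhaustive search.

    TS[t][s] is the first-stage composite prospect value of pairing team t
    with talent s.  Each team gets exactly one talent; each talent joins
    at most one team, and the unmatched talents are eliminated.
    """
    n, m = len(TS), len(TS[0])
    best_val, best = float("-inf"), None
    for assign in permutations(range(m), n):   # assign[t] = talent for team t
        val = sum(TS[t][assign[t]] for t in range(n))
        if val > best_val:
            best_val, best = val, assign
    return best, best_val

# Illustrative composite prospect values (4 teams x 6 talents):
TS = [
    [0.30, 0.10, -0.20, 0.25, 0.05, 0.15],
    [0.05, 0.40, 0.20, -0.10, 0.00, 0.10],
    [-0.10, 0.35, 0.05, 0.45, 0.10, -0.05],
    [0.00, 0.20, 0.30, 0.25, 0.15, 0.10],
]
match, value = solve_m1(TS)
eliminated = [s for s in range(len(TS[0])) if s not in match]
```

Enumeration is only practical for small instances like this one; a production model would use an integer-programming solver as the paper does.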

#### **3. New R&D Organization Team and Science and Technology Talent Second-Stage Selection Matching**

After the first phase of elimination matching, the number of teams and talents is equal, and the matching is still many-to-many. After one year, when the trial period expires, the teams re-evaluate the talents, and each talent determines the team they will eventually work for. The second stage of matching considers both the teams' evaluation values for the talents and the talents' satisfaction, obtained from their ranking of the teams: historical and desired reference points are set for the evaluation values, and a bottom-line reference point is set for the satisfaction. This fully accounts for the psychological factors of both the new R&D organization teams and the tech talents and better realizes two-way selection; finally, a 0–1 integer programming model is used for the bilateral matching. In the second stage of selection matching, because the number of scientific and technical talents equals the number of R&D teams, the pressure on decision-makers is much lower, no one remains unmatched, and teams and talents achieve complete matching, as shown in Figure 2.

**Figure 2.** Matching diagram of the second stage of selection of teams and talents in new R&D institutions.

#### *3.1. Collecting the Team's Assessment Value of Scientific and Technical Talents and the Ranking Value of Talents to the Team*

The first phase of elimination matching is dominated by the new R&D institutions and does not consider the dominant weight of the talents. Therefore, the second phase of selection matching considers not only the assessment of the scientific and technological talents by the teams of the new R&D institutions but also the talents' ranking of the new R&D institution teams.

We collected the second-stage assessed values of the new R&D organization teams for the scientific and technological talents, characterized by the average of the expert assessments. The reference points remain unchanged, and the historical and desired reference point data of the first stage are still used, since the expert assessments in this study are all scored on a percentage scale. To meet the standardization requirements of the data, all the data are averaged and then divided by 100, so that all assessed values and reference point values lie between 0 and 1. We obtain the matrix of assessment values $E^k = \left[e^k_{ij}\right]$, $i = 1, 2, \cdots, m$, $j = 1, 2, \cdots, p$, $k = 1, 2, \cdots, n$.

We consider the ranking of the teams in new R&D institutions by the scientific and technical talents to obtain the preference-order information $F = [f_{ts}]$, $t = 1, 2, \cdots, n$, $s = 1, 2, \cdots, m$. To facilitate the assembly of the data, the preference order was transformed into satisfaction according to certain rules, listed in Table 3. Applying these rules yields the satisfaction information $G = [g_{ts}]$, $t = 1, 2, \cdots, n$, $s = 1, 2, \cdots, m$.

**Table 3.** Rules for converting the preference order of talents to satisfaction with the teams.
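The exact conversion rules are those of Table 3; as they are specific to this study, the sketch below uses a hypothetical linear rule purely to illustrate the mechanics of turning a preference rank into a satisfaction score comparable against the 0.5 bottom-line reference point used later:

```python
def rank_to_satisfaction(rank, m):
    """Hypothetical linear conversion of a preference rank (1 = most
    preferred) over m teams into a satisfaction score in (0, 1].
    The paper's actual rules are given in its Table 3."""
    return (m - rank + 1) / m

# A talent ranking 4 teams: rank 1 maps to 1.0, rank 4 to 0.25.
scores = [rank_to_satisfaction(r, 4) for r in range(1, 5)]
```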


*3.2. Matching Integrated Prospect Values Based on Historical, Desired, and Bottom-Line Reference Point Sets for the Second Stage*

Based on the historical and desired reference point values, the distance from the team's assessment of a talent to the historical and desired reference points is calculated; because the values are real numbers, the distance is simply the absolute value of the difference. Let the bottom-line reference point of the scientific and technological talents' satisfaction with the teams be 0.5; the distance from the satisfaction to the bottom-line reference point is computed in the same way. The prospect values are then calculated according to prospect theory.

$$V^{kc_1}_{ij(2)} = \begin{cases} \left| e^{k}_{ij} - e^{kc_1}_{j} \right|^{\alpha}, & e^{k}_{ij} > e^{kc_1}_{j} \\ -\theta \left| e^{k}_{ij} - e^{kc_1}_{j} \right|^{\beta}, & e^{k}_{ij} < e^{kc_1}_{j} \end{cases} \tag{18}$$

$$V^{kc_2}_{ij(2)} = \begin{cases} \left| e^{k}_{ij} - e^{kc_2}_{j} \right|^{\alpha}, & e^{k}_{ij} > e^{kc_2}_{j} \\ -\theta \left| e^{k}_{ij} - e^{kc_2}_{j} \right|^{\beta}, & e^{k}_{ij} < e^{kc_2}_{j} \end{cases} \tag{19}$$

$$V^{c_3}_{ts(2)} = \begin{cases} \left| g_{ts} - 0.5 \right|^{\alpha}, & g_{ts} > 0.5 \\ -\theta \left| g_{ts} - 0.5 \right|^{\beta}, & g_{ts} < 0.5 \end{cases} \tag{20}$$

Combining the prospect value of the team's assessment of talent based on the historical reference point, $V^{kc_1}_{ij(2)}$, with the prospect value of the assessed value based on the desired reference point, $V^{kc_2}_{ij(2)}$, and the indicator weights $\omega^k_j$ of the assessed values given by the team, we obtain:

$$N^{kc_1}_{i(2)} = \sum_{j=1}^{p} \omega^{k}_{j} M^{kc_1}_{ij(2)},\quad N^{kc_2}_{i(2)} = \sum_{j=1}^{p} \omega^{k}_{j} M^{kc_2}_{ij(2)} \tag{21}$$

The vectors $N^{kc_1}_{i(2)}$ and $N^{kc_2}_{i(2)}$ are then transposed:

$$TS^{c_1}_{ts(2)} = \left[ N^{kc_1}_{i(2)} \right]^{\mathrm{T}},\quad TS^{c_2}_{ts(2)} = \left[ N^{kc_2}_{i(2)} \right]^{\mathrm{T}} \tag{22}$$

Let the historical reference point weight be θ and the desired reference point weight be 1 − θ. The assessment-value prospect values based on the historical and desired reference points are first pooled to obtain the assessment-value composite prospect value; that composite value is then pooled with the satisfaction prospect value, with weight σ on the assessment-value composite prospect value and weight 1 − σ on the satisfaction prospect value. After aggregation, we obtain the second-stage matched composite prospect value, $TS_{ts(2)}$.

$$\mathrm{TS}\_{\mathrm{ts}(2)} = \sigma \times \left(\theta \times \left(\mathrm{TS}\_{\mathrm{ts}(2)}^{\mathrm{c}\_{1}}\right) + (1 - \theta) \times \left(\mathrm{TS}\_{\mathrm{ts}(2)}^{\mathrm{c}\_{2}}\right)\right) + (1 - \sigma) \times \left(\mathrm{V}\_{\mathrm{ts}(2)}^{\mathrm{c}\_{3}}\right) \tag{23}$$

#### *3.3. Construction of a Bilateral Matching Model to Derive the Second Stage Selection Matching Results*

In the second stage, the new R&D organization teams and the scientific and technological talents are matched many-to-many, and both sides achieve complete matching: every team and every talent is successfully matched, and no one is eliminated. In this study, we constructed the second-stage selection matching model, M-2.

$$\max Z = \sum_{t=1}^{n} \sum_{s=1}^{m} \varphi_{ts}\, TS_{ts(2)} \tag{24}$$

$$\text{s.t.} \begin{cases} \varphi_{ts} \in \{0, 1\} \\ \sum\limits_{s=1}^{m} \varphi_{ts} = 1, \quad \forall t \\ \sum\limits_{t=1}^{n} \varphi_{ts} = 1, \quad \forall s \end{cases} \tag{25}$$

The objective of the M-2 model is to transform the dual objective of maximizing the prospect value of the team's assessment of the talents and the prospect value of the talents' satisfaction with the teams into a single objective. The constraints are as follows: the decision variable $\varphi_{ts}$ can only take the value 0 or 1; each new R&D organization team must be matched to exactly one technology talent; and each technology talent must be matched to exactly one team.

#### **Theorem 2.** *The M-2 model must have an optimal solution.*

Note: According to the optimality existence theorem, any single-objective program whose feasible domain is non-empty and bounded attains an optimum over that domain. The M-2 model is a single-objective programming problem, and its feasible domain exists and is bounded; thus, the M-2 model must have an optimal solution. The M-2 model can be solved using LINGO software: $\varphi_{ts} = 1$ means that the $t$th new R&D institution team and the $s$th technological talent are successfully matched, and $\varphi_{ts} = 0$ means that they are not.
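Like M-1, the M-2 model can be checked on a small instance by enumerating all complete one-to-one matchings; the values below are illustrative, not the paper's data:

```python
from itertools import permutations

def solve_m2(TS):
    """Solve the M-2 selection matching by exhaustive search: n teams and
    n talents, every team matched to exactly one talent and vice versa,
    maximizing the total composite prospect value."""
    n = len(TS)
    best_val, best = float("-inf"), None
    for assign in permutations(range(n)):   # assign[t] = talent for team t
        val = sum(TS[t][assign[t]] for t in range(n))
        if val > best_val:
            best_val, best = val, assign
    return best, best_val

# Illustrative 3x3 second-stage composite prospect values:
TS = [[0.2, 0.5, 0.1],
      [0.4, 0.3, 0.0],
      [0.1, 0.6, 0.7]]
match, value = solve_m2(TS)
```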

In the second stage, the numbers of new R&D institution teams and scientific and technological talents are equal, and each team matches exactly one talent. This is an exact match within the many-to-many framework, and a two-way selection match owing to the consideration of the two-way assessment.

#### *3.4. Methodological Steps of This Paper*

The steps of this study's prospect theory-based two-stage bilateral matching method for teams and scientific and technological talents in new R&D institutions are as follows.

In the first step, based on the first stage of elimination matching of new R&D institution teams and scientific and technological talents, a general framework of assessment indicators for new R&D institution teams, personalized assessment indicators, and indicators for the cost of introducing scientific and technological talents are constructed.

In the second step, the team-to-talent assessment data of new R&D institutions based on personalized assessment indicators in the first stage, historical reference point data and desired reference point data, cost of introducing technology talent data, and bottom-line reference point data, all characterized by interval grey numbers, were collected to calculate the benefit prospect value and cost prospect value of team-to-talent based on prospect theory and the given indicator weights, respectively.

In the third step, based on the benefit-cost weights, the integrated prospect values of the first stage are integrated, substituted into the M-1 model, and solved using LINGO software to obtain the elimination matching results of the first stage.

In the fourth step, for the second stage of selection matching between the teams of the new R&D institution and the scientific and technological talents, the teams' assessment data for the talents are collected, characterized by the mean value and using the historical and desired reference point data from the first stage; the data are divided by 100 for normalization. Meanwhile, the talents' ranking values for the teams are collected, converted into satisfaction, and given the satisfaction bottom-line reference point value. Based on prospect theory and the given indicator weights, the prospect values of both the teams' assessments and the talents' satisfaction are calculated.

In the fifth step, the prospect values of the second stage are integrated based on the team-talent weights, substituted into the M-2 model, and solved using LINGO software to obtain the selection matching results of the second stage.

#### **4. Case Studies**

#### *4.1. Background of the New R&D Agency Team*

The new R&D institution T was established in 2019 by the talent team of the University of D, the government, and social capital, of which the talent team holds 70% of the shares. The institution aims at the frontier of science and technology and market demand and implements the specific measures of General Secretary Xi Jinping's "three firsts," reflecting the organic unity of superior disciplines, innovation clusters, and high-quality development. It is currently engaged mainly in new energy and intelligent distributed power generation, new high-efficiency refrigeration and heat pumps and equipment, key technologies for building environment and air quality, and the optimization and monitoring of new energy systems, actively building a national research platform for intelligent building environmental energy. As a leading domestic intelligent energy development and innovation enterprise, the institution adheres to a business philosophy of integrity, professionalism, aggressiveness, and cooperation; it has accumulated rich experience in research and development, manufacturing, construction, operation, and maintenance in the fields of building energy conservation, distributed photovoltaic power generation, and comprehensive energy utilization, and it is dedicated to providing customers with high-quality, fast, and efficient services. Few energy companies do energy extraction independently; they generally work in partnership with many other companies. Now that many new energy companies are emerging and eager to find clean alternative energy sources, there is a greater need for collaboration among scientific and technical talents across multiple fields. The institution therefore needs to recruit more scientific and technical talents whose research and development directions match its teams.

The new R&D institution T currently has four teams: the production team for high-efficiency building energy-saving equipment, the R&D team for intelligent environmental energy, the power engineering design and construction team, and the distributed energy management team. Since its establishment, the new R&D organization T has introduced 36 high-end talents from home and abroad who hold PhD degrees, have achieved remarkable results in the industry, have won national awards, and have earned considerable economic income. The organization has always attached importance to the introduction and training of scientific and technological talent, and its talent development, constantly pioneering and innovative, has been fruitful: it has won more than 20 national projects and one National Technical Invention Award, been authorized more than 160 invention patents, and incubated four enterprises.

The new R&D organization T is always looking for scientific and technical talents in environmental engineering and electronic information, requiring solid knowledge of environmental monitoring, theoretical knowledge of electronic information technology, relevant papers published in foreign professional academic journals, familiarity with laboratory technical experimental processes, a PhD degree, 1–3 years of relevant work experience, strong work responsibility, excellent language skills, and communication and coordination skills. Other requirements include the ability to handle complex problems and critical incidents independently, strong work motivation, and a sense of proactive service. An annual salary of ¥100,000–¥1,000,000 is offered for work in the Yangtze River Delta.

In August of this year, four teams of the new R&D institution T put forward plans to recruit talents. After screening, preliminary tests, practical exercises, and other steps, the personnel department identified six scientific and technical talents and delegated the decision-making power to the teams, which sent experts to conduct the first-stage elimination matching and the second-stage selection matching. Based on the results of the first-stage matching, the scientific and technical talents enter the four teams for a one-year probationary period, working in each team for three months. The talents then finalize which team to work for based on the second-stage selection matching. According to the two-way assessment index system proposed in this paper for teams and tech talents in new R&D institutions, both teams and talents completed the assessment; the data and the bilateral matching are presented in detail next.

#### *4.2. Phase I Phase-Out Matching of Teams and Scientific and Technological Talents in New R&D Institutions*

We obtained the assessed values of teams T1–T4 for the six scientific and technical talents from the HR director of the new R&D organization T; the values are characterized by interval grey numbers, as shown in Tables 4–7. There are 22 science and technology teams in the new R&D organization T; the four teams considered here have similar specialties, and each sent three experts to assess the talents under each indicator. The twelve experts, three from each of the four teams, filled in the assessment values on the percentage scale. In this study, the uncertainty of the teams' assessment values is fully considered; therefore, interval grey numbers are used to characterize the assessment values.


**Table 4.** Assessed values of talents S by team T1 in new R&D institutions.


**Table 5.** The assessed values of the new R&D organization team T2 for the talents S.

**Table 6.** Assessed values of the new R&D organization team T3 for talents S.




As can be seen from the above table:

(1) Each of the four teams of the new R&D institutions selected its own personalized assessment indicators and assessed the talents based on them.

(2) The assessment data have a certain level of uncertainty and are characterized by interval grey numbers, which differ for each assessment.

(3) The four teams of new R&D institutions provide both historical and desired reference point data, which are also uncertain and characterized by interval grey numbers.

The data in the above table do not by themselves yield a bilateral match between the new R&D institution teams and the scientific and technological talents; a suitable method of integration is needed to facilitate comparison and analysis. The teams rely on the new R&D institutions to select and hire scientific and technological talents, but the mechanism is very flexible: the new R&D institutions give the teams great hiring autonomy, and funds are allocated as a lump sum. The team leaders were therefore invited to determine the indicator weights using the Delphi method, and after several rounds of confirmation the personalized indicator weights of the four teams were finally determined as ω1j = (0.2, 0.1, 0.3, 0.2, 0.2), ω2j = (0.1, 0.3, 0.3, 0.2, 0.1), ω3j = (0.1, 0.4, 0.2, 0.2, 0.1), and ω4j = (0.15, 0.3, 0.15, 0.25, 0.15). In this study, a historical reference point and a desired reference point were set and assumed to be equally important, so θ = 0.5.

Based on the interval grey number distance formula, the prospect theory formulas, the indicator weights, and the reference point weights, we integrate the data to obtain the matrix of prospect values based on the dual reference points (historical and desired), as shown in Table 8.

**Table 8.** Prospect values for the assessment of talents S by the new R&D organization teams T.


As can be seen from the table above:

(1) The results of the assessment of scientific and technological talents by the four teams of the new R&D institutions are different: some are satisfied, some are not, and the degree of satisfaction and dissatisfaction are also different and ranked differently.

(2) Team T1 of the new R&D institution is satisfied with S1 and S5 and dissatisfied with S2, S3, S4 and S6, a satisfaction rate of 33.33%, and ranks the scientific and technological talents as S5 ≻ S1 ≻ S2 ≻ S6 ≻ S4 ≻ S3.

(3) Team T2 of the new R&D organization is satisfied with S1 and S2 and dissatisfied with S3, S4, S5 and S6, a satisfaction rate of 33.33%, and ranks the scientific and technological talents as S2 ≻ S1 ≻ S3 ≻ S5 ≻ S6 ≻ S4.

(4) Team T3 of the new R&D organization is satisfied with S2 and S4 and dissatisfied with S1, S3, S5 and S6, a satisfaction rate of 33.33%, and ranks the scientific and technological talents as S2 ≻ S4 ≻ S1 ≻ S5 ≻ S3 ≻ S6.

(5) Team T4 of the new R&D organization is satisfied with S2 and S5 and dissatisfied with S1, S3, S4 and S6, a satisfaction rate of 33.33%, and ranks the scientific and technological talents as S5 ≻ S2 ≻ S3 ≻ S4 ≻ S6 ≻ S1.

If we let the new R&D organization teams T and the tech talents S match directly by the satisfaction represented in the prospect values, then teams T1 and T4 select technology talent S5, and teams T2 and T3 select tech talent S2, so that only two tech talents are selected. This phenomenon of internal talent grabbing does not achieve the overall optimum for the new R&D organization. In the first stage, we select the technology talents that match at this stage and eliminate two of them. One might also argue that removing the least satisfactory candidates is the right rule; however, our analysis found that team T1 would eliminate S3, team T2 would eliminate S4, team T3 would eliminate S6, and team T4 would eliminate S1. This again eliminates too many, because we do not know how dominant each team is in the process.

The first stage of matching between the teams of the new R&D organization and the scientific and technological talents is the elimination stage, so we must consider not only the teams' satisfaction with the assessed value of each talent relative to the reference points but also the cost of introducing that talent, which should be low. We collected data on the cost of introducing these six scientific and technological talents into the different teams, as shown in Table 9.

**Table 9.** Costs of bringing in scientific and technological talents S.


As can be seen from the table above:

(1) The cost of introducing scientific and technical talent S is different and uncertain and can only be estimated as an approximate range within which any value taken is possible, which we characterize as an interval grey number.

(2) The cost of bringing in scientific and technological talent under different projects varies, with the cost of housing subsidies generally being higher, and other incentives lower.

(3) The new R&D organization gives a bottom-line reference point; it feels a loss when the cost of bringing in a technology talent exceeds the bottom-line reference point and a gain when the cost falls below it.

The introduction cost alone does not fully characterize the scientific and technological talents; we need the bottom-line reference point to measure whether the introduction cost paid by the new R&D institutions counts as a gain or a loss. In this paper, we calculate the introduction cost of the talents against the bottom-line reference point and set the indicator weights of the introduction cost as ω5j = (0.25, 0.25, 0.2, 0.2, 0.1), obtaining the prospect values of the introduction costs of the six scientific and technological talents: ST1j = (−0.2509, 0.0188, 0.2156, 0.3149, −0.0846, 0.2919). From these prospect values of the introduction costs, it can be seen that:

(1) Scientific and technological talents S1 and S5 are beyond the bottom-line reference point, and new R&D institutions would feel a loss if they brought them in.

(2) Scientific and technological talents S2, S3, S4, and S6 do not exceed the bottom-line reference point, and new R&D institutions will feel the benefits of bringing them in.

(3) Talent S4 has the highest introduction-cost prospect value, indicating that S4 is the least expensive; however, this does not by itself mean that introducing S4 is appropriate, which must be judged in conjunction with the teams' assessment of S4.

We cannot consider the selection and recruitment of scientific and technological talents from the perspective of cost alone; cost must be weighed together with the benefits the talents may bring. In this study, we consider dual-objective bilateral matching between maximum team satisfaction and minimum introduction cost in the new R&D organization and simplify the dual objective into a single objective to construct the matching model. To achieve the overall optimum for the new R&D organization, we apply the M-1 bilateral matching model with the weight of the benefit assessment set to δ = 0.5 and the weight of the cost assessment set to 1 − δ = 0.5: if a technology talent can create more value, the more promising talent should not be missed merely because of the new R&D organization's desire for low cost. The comprehensive prospect value combines the historical reference point, the desired reference point, and the introduction cost, fully reflecting the psychological and cost factors of the decision-makers: the higher the comprehensive prospect value, the higher their satisfaction, and the lower the value, the lower their satisfaction. Based on the prospect values of the assessed values, the prospect values of the introduction costs, and the weight δ between the two, we obtain the matrix of integrated prospect values for the first-stage matching of the new R&D organization teams T and the scientific and technological talents S, as shown in Table 10.

**Table 10.** Combined Phase I outlook values for new R&D organization teams T and talents S.


As can be seen from the table above:

(1) After incorporating the cost of introduction, the satisfaction rate and ranking of scientific and technological talents by the team of new R&D institutions changed.

(2) Team T1 of the new R&D organization is satisfied with S2, S4, S5, and S6 and dissatisfied with S1 and S3, a satisfaction rate of 66.67%, and ranks the scientific and technological talents as S6 ≻ S5 ≻ S4 ≻ S2 ≻ S3 ≻ S1.

(3) Team T2 of the new R&D organization is satisfied with S2 and S3, and is dissatisfied with S1, S4, S5, and S6, with a satisfaction rate of 33.33%, and ranks the scientific and technological talents as S2 ≻ S3 ≻ S6 ≻ S4 ≻ S1 ≻ S5.

(4) Team T3 of the new R&D organization is satisfied with S2 and S4, and is dissatisfied with S1, S3, S5, and S6, with a satisfaction rate of 33.33%, and ranks the scientific and technological talents as S4 ≻ S2 ≻ S3 ≻ S5 ≻ S1 ≻ S6.

(5) Team T4 of the new R&D organization is satisfied with S2, S3, S4, S5, and S6, and is dissatisfied with S1, with a satisfaction rate of 83.33%, and ranks the scientific and technological talents as S4 ≻ S3 ≻ S6 ≻ S2 ≻ S5 ≻ S1.

The combined prospect values of the first-stage matching of new R&D organization teams T and technology talents S were substituted into the M-1 bilateral matching model, which was solved using the LINGO software to obtain the first-stage team-talent matching results, as shown in Table 11.

**Table 11.** Results of the first stage matching of new R&D organization teams T and talents S.
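The M-1 assignment described above can also be reproduced without LINGO. The sketch below, under the assumption that each team receives exactly one talent and each talent joins at most one team, brute-forces the maximum-total-prospect assignment (the function name `m1_match` is ours; with 4 teams and 6 talents this enumerates only 360 candidate assignments):

```python
from itertools import permutations

def m1_match(v):
    """Brute-force an M-1-style assignment: each team (row) receives exactly
    one talent (column), each talent joins at most one team, and the total
    combined prospect value is maximized. v[r][c] is team r's combined
    prospect value for talent c."""
    n_teams, n_talents = len(v), len(v[0])
    best_total, best_cols = float("-inf"), None
    for cols in permutations(range(n_talents), n_teams):
        total = sum(v[r][c] for r, c in enumerate(cols))
        if total > best_total:
            best_total, best_cols = total, cols
    matched = {f"T{r + 1}": f"S{c + 1}" for r, c in enumerate(best_cols)}
    eliminated = [f"S{c + 1}" for c in range(n_talents) if c not in best_cols]
    return matched, eliminated, best_total
```

The talents left unmatched by the optimal assignment are exactly those eliminated in the first stage.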


From the above table we can see that:

(1) New R&D organization teams T1, T2, and T3 were each matched with the scientific and technological talent of their highest comprehensive prospect value, while T4 was matched with S3, with which it was also satisfied.

(2) Scientific and technological talents S1 and S5 were eliminated in the first matching stage.

The first-stage match between the teams of the new R&D institution and the scientific and technological talents considers both the teams' assessment of the talents and the cost of introducing them, which is more in line with the actual situation. In the assessment process, the teams' personalized indicators, historical reference points, and desired reference points are taken into account, so the teams' psychological gains and losses, whether a team is satisfied, and its degree of satisfaction can all be read from the assessment prospect value. In the negotiation of the introduction cost between team and talent, the bottom-line reference points of the new R&D organization are taken into account, so the organization's overall gain or loss on cost, whether it is satisfied, and its degree of satisfaction can be read from the cost prospect value. The combined benefit and cost prospect values of teams and talents were substituted into the M-1 model for bilateral matching, making the matching results more convincing.

#### *4.3. Second-Stage Trial Matching of Teams and Scientific and Technical Talents in New R&D Institutions*

After the first phase of elimination matching, S2, S3, S4, and S6 enter the second phase of probationary matching. Each scientific and technological talent works and learns in the new R&D institution for a one-year probationary period, rotating through the four teams for three months each. After the one-year probation, the new R&D organization teams evaluate the scientific and technological talents again to determine the team in which each talent will eventually work. In this study, we collect the second-stage assessment values of the scientific and technological talents by the new R&D organization teams, characterized by their mean values; the historical and desired reference points use the data given in the first stage. All data are divided by 100 so that they are normalized to fall within [0, 1], as shown in Tables 12–15. For ease of calculation, the scientific and technological talents keep their original numbers.


**Table 12.** Assessment values of talents S by the new R&D organization team T1 phase 2.

**Table 13.** Assessment values of talents S by the new R&D organization team T2 phase 2.


**Table 14.** Assessment values of talents S by the new R&D organization team T3 phase 2.


**Table 15.** Assessment values of talents S by the new R&D organization team T4 phase 2.


As can be seen from the above table:

(1) The four teams' assessed values of the scientific and technological talents S have changed compared with the first-stage assessed values. Because the talents have now been on trial for a period of time, the teams' familiarity with them has changed, and the teams have formed a new understanding of them. The talents are also influenced by the team culture and values in their new working environment. For example, if everyone in a team is determined to serve the motherland and meetings often include ideological education, the talents will also strengthen their love for the motherland in this atmosphere; such changes are reflected in the assessment values.

(2) The mean value is used in this assessment because the second-stage match is a fit-for-post match: no more talents are eliminated, so the matching pressure is much lower. Eventually, every team is matched with a talent and every talent with a team. For simplicity of calculation, uncertainty is ignored in the second-stage match.

(3) Reference points were set, and the teams of the new R&D institution were asked to provide the corresponding reference point data. The reference point data were not updated in the second stage, partly for convenience of calculation and partly because the interval between the two stages is only one year and each talent spends only three months in each team, so the reference points would not change much; the dynamics of the reference points were therefore not considered in this study.

In the second stage, the reference point values are the first-stage reference point values averaged and divided by 100 for normalization; the indicator weights and reference point weights are the same as in the first stage. Based on the second-stage assessed values, reference point values, indicator weights, and reference point weights of the new R&D organization teams T for the scientific and technological talents S, we obtain the prospect value matrix of the second-stage match between teams T and talents S, as shown in Table 16.

**Table 16.** Prospect values for the second stage assessment of talents S by the new R&D organization teams T.
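The mapping from assessed values and reference points to prospect values can be sketched with the standard prospect-theory value function. The parameter values α = β = 0.88 and λ = 2.25 are the typical Tversky-Kahneman estimates, not necessarily the paper's, and the equal weighting of the two reference points is our assumption:

```python
def prospect_value(x, ref, alpha=0.88, beta=0.88, lam=2.25):
    """Prospect-theory value function relative to a reference point:
    gains are concave, losses are convex and amplified by loss aversion lam."""
    d = x - ref
    return d ** alpha if d >= 0 else -lam * (-d) ** beta

def dual_reference_prospect(x, hist_ref, desired_ref, w_hist=0.5):
    """Blend the prospect values against the historical and the desired
    reference points, with weight w_hist on the historical one."""
    return (w_hist * prospect_value(x, hist_ref)
            + (1 - w_hist) * prospect_value(x, desired_ref))
```

An assessed value above both reference points yields a positive prospect value; falling short of a reference point drags the blended value down more sharply, because losses loom larger than gains.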


As can be seen from the table above:

(1) After the trial of the scientific and technological talents, both the satisfaction and the satisfaction rates of the new R&D organization teams with the talents have changed.

(2) Team T1 of the new R&D organization is satisfied with S2, S3, and S6, but is not satisfied with S4, with a satisfaction rate of 75%, and ranks the scientific and technological talents as S3 ≻ S2 ≻ S6 ≻ S4.

(3) Team T2 of the new R&D organization is satisfied with all scientific and technological talents, with a satisfaction rate of 100%, and ranks the scientific and technological talents as S6 ≻ S4 ≻ S2 ≻ S3.

(4) Team T3 of the new R&D organization is satisfied with S3 and S6, and is dissatisfied with S2 and S4, with a satisfaction rate of 50%, and ranks the scientific and technological talents as S6 ≻ S3 ≻ S2 ≻ S4.

(5) Team T4 of the new R&D organization is satisfied with S2, S4, and S6, and is not satisfied with S3, with a satisfaction rate of 75%, and ranks the scientific and technological talents as S6 ≻ S4 ≻ S2 ≻ S3.

After the probationary period, the technology talent becomes a full staff member of the new R&D organization team. In the second stage of matching, the new R&D organization team holds only 50% of the weight, so the choice of the technology talent must also be considered. In this study, the scientific and technological talents ranked the four teams, as shown in Table 17.


**Table 17.** Second-stage rankings of the new R&D institution teams T by talents S.

Since ordinal values cannot be matched directly, they are converted into satisfaction values: ordinal 1 corresponds to a satisfaction of 1, ordinal 2 to 0.75, ordinal 3 to 0.5, and ordinal 4 to 0.25. Let a satisfaction of 0.5 be the reference point of the scientific and technological talents. According to prospect theory, we obtain the prospect values of the talents' satisfaction with the teams, as shown in Table 18.

**Table 18.** Ranking prospect values of talents S for the second stage of the new R&D organization team T.
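The ordinal-to-satisfaction conversion and the reference point of 0.5 translate directly into code. This is a sketch of that step; the prospect-theory parameters (α = β = 0.88, λ = 2.25) are the typical literature values, assumed here rather than taken from the paper:

```python
def ordinal_to_satisfaction(rank):
    """Map ordinal 1..4 to satisfaction 1.0, 0.75, 0.5, 0.25."""
    return 1.0 - 0.25 * (rank - 1)

def rank_prospect(rank, ref=0.5, alpha=0.88, beta=0.88, lam=2.25):
    """Prospect value of a talent's ranking of a team, with satisfaction 0.5
    as the reference point; rankings below it are felt as amplified losses."""
    d = ordinal_to_satisfaction(rank) - ref
    return d ** alpha if d >= 0 else -lam * (-d) ** beta
```

A first-choice team yields a positive prospect value, a third choice sits exactly at the reference point (prospect value zero), and a last choice yields a loss-averse negative value.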


We weighted the prospect values of the team-to-talent assessment and the prospect values of the talent-to-team ranking in the second stage to obtain the combined prospect values of the second-stage match, as shown in Table 19.

**Table 19.** Combined prospect values of the second-stage matching of the new R&D organization teams T and talents S.

As can be seen from the table above:

(1) After incorporating the talents' rankings of the teams, the satisfaction, the satisfaction rate, and the ranking of the bilateral match between the new R&D institution teams and the scientific and technological talents changed.

(2) Team T1 of the new R&D organization is satisfied with S3 and S6, and is dissatisfied with S2 and S4, with a satisfaction rate of 50%, and ranks the scientific and technological talents as S6 ≻ S3 ≻ S4 ≻ S2.

(3) Team T2 of the new R&D organization is satisfied with S2 and S3, and is dissatisfied with S4 and S6, with a satisfaction rate of 50%, and ranks the scientific and technological talents as S2 ≻ S3 ≻ S6 ≻ S4.

(4) Team T3 of the new R&D organization is satisfied with S2 and S4, and is dissatisfied with S3 and S6, with a satisfaction rate of 50%, and ranks the scientific and technological talents as S2 ≻ S4 ≻ S3 ≻ S6.

(5) Team T4 of the new R&D organization is satisfied with S2, S4, and S6, and is dissatisfied with S3, with a satisfaction rate of 75%, and ranks the scientific and technological talents as S2 ≻ S6 ≻ S4 ≻ S3.

The prospect values of the teams' second-stage assessments of the talents were substituted into the M-2 model for bilateral matching, and the second-stage matching results were obtained, as shown in Table 20. The overall assessment satisfaction for this match was 0.2478, and the match was stable.


**Table 20.** Results of the second stage matching of the new R&D organization teams T and the talents S.

From the above table we can see that:

(1) The results of the matching between the new R&D organization teams T and the scientific and technological talents S are completely different from the previous matching results, even though the indicators by which each team assesses the talents have not changed. This indicates that, after a period of acquaintance, the teams' understanding of the talents has changed; bilateral matching should therefore be based on dynamic ideas so that the matching results are more accurate.

(2) In terms of the composite prospect values, the Phase 2 data also validate the Phase 1 match: all matches reach satisfaction, but there is still plenty of room for improvement. The combined satisfaction of teams and talents is more balanced, with scientific and technological talent S3 achieving the highest satisfaction.

#### *4.4. Comparison of Methods*

The first stage of this paper is elimination matching, and different methods eliminate different talents. The second stage of selection matching only evaluates and ranks the scientific and technological talents matched in the first stage; therefore, other methods cannot obtain all the second-stage data. This paper thus compares methods on the first stage of matching.

Method A: The approach in this study, which considers dual historical and desired reference points based on prospect theory, the cost of introducing scientific and technological talent, and two stages.

Method B: Based on the approach in this paper, only the historical reference point was considered, for one stage.

Method C: Based on the approach in this paper, only the desired reference point was considered, for one stage.

Method D: Based on the data in this study, one stage was considered, without the cost of introducing scientific and technological talent.

Method E: Based on the data in this study, one stage was considered, based on regret theory.

Method F: Based on the data in this study, one stage was considered, based on grey correlation.

After calculation, the results are shown in Table 21.


**Table 21.** Bilateral Matching Results of Teams and Talents in New R&D Institutions under Different Methods.

As can be seen from the table above:

(1) The approaches differ: some consider dual historical and desired reference points while others consider a single historical or desired reference point; some consider the introduction cost and some do not; some consider psychological factors and some do not. Thus, the bilateral matching results differ.

(2) For the first stage of elimination matching, the scientific and technological talents eliminated differ across methods, so the choice of method has a considerable impact on the elimination outcome.

(3) Among the six methods, the probability of being eliminated was 50% for scientific and technological talents S1, S3, and S6, 33.33% for S5, and 16.67% for S4. Therefore, for scientific and technological talents, the higher the probability of being eliminated, the higher the risk; they need to identify directions for improvement from the personalized assessment indicators of the new R&D institution teams, find their deficiencies, and work to improve them so as to enhance their scientific and technological capabilities.
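These elimination probabilities are simple frequency counts across the six methods. The per-method outcomes below are hypothetical (Table 21 holds the actual results) but are chosen so that the quoted percentages emerge:

```python
from collections import Counter

# Hypothetical per-method elimination outcomes (Methods A-F eliminate two
# talents each); the actual outcomes are those of Table 21.
eliminated_by_method = [
    {"S1", "S5"},  # Method A
    {"S1", "S3"},  # Method B
    {"S3", "S6"},  # Method C
    {"S5", "S6"},  # Method D
    {"S1", "S3"},  # Method E
    {"S4", "S6"},  # Method F
]

# Elimination probability = times eliminated / number of methods
counts = Counter(s for elim in eliminated_by_method for s in elim)
elimination_probability = {
    s: counts[s] / len(eliminated_by_method) for s in sorted(counts)
}
```

With these placeholder sets, S1, S3, and S6 come out at 50%, S5 at 33.33%, and S4 at 16.67%, matching the probabilities cited in the text.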

#### **5. Conclusions**

In summary, the following conclusions can be drawn.

First, this study uses interval grey numbers to characterize the uncertainty in the new R&D organization teams' assessed values of scientific and technological talents and in the cost of introducing them. This reflects the multi-expert group decision-making of the teams, the inconsistent backgrounds and preferences of the experts, and the adjustment interval allowed by talent policy, and is therefore closer to reality.

Second, this study considers personalized assessment indicators for the teams in new R&D institutions, because different teams do not have exactly the same requirements for scientific and technological talents. Although the teams belong to the same institution, their focuses differ: some concentrate more on market research, some on R&D innovation, and some on communication and collaboration. This study therefore provides a general framework of assessment metrics and allows each team to select individualized metrics within it; some metrics are selected and some are not, but all selected metrics must come from the framework. Different teams may thus choose assessment metrics that are both relevant and easy to use.

Third, this study sets dual reference points, a historical reference point and a desired reference point, for the teams' assessment of talents, and a bottom-line reference point for the introduction cost agreed between team and talent. Prospect theory is applied to transform assessed values and cost values into prospect values relative to these reference points, fully considering human psychology: instead of comparing absolute assessed or cost values, relative values based on the reference points are compared.

Fourth, the bilateral matching of teams and scientific and technological talents in new R&D institutions is divided into two stages: elimination matching in the first stage and selection matching in the second. Both stages are many-to-many matchings, but in the first stage all teams are matched while not all talents are; because there are more talents than teams, some talents are eliminated. In the second stage, the numbers of teams and talents are equal. Considering these two stages embodies the idea of dynamic assessment.

Fifth, in the first stage of elimination matching, both the team's assessed value of a talent and the cost of that talent are considered: the larger the assessed value and the lower the cost, the better. The resulting dual objectives ensure that the matched talents conform to the principle of maximizing benefits without excessive cost, and the assessment and cost factors can be adjusted by weighting.

Sixth, in the second stage of selection matching, both the team's assessment of the talent and the talent's ranking of the team are considered: the larger the assessment value and the smaller the ranking, the better. The resulting dual objectives ensure that the satisfaction of neither side is too low, fully considering the psychological factors of both the matching team and the talent, and the assessment and ranking factors can be adjusted by weighting.

In summary, this study considers two stages of team and talent matching in new R&D organizations. The first stage considers the team's assessment of the talent and the cost of introducing the talent; the assessment uses historical and desired reference points, the cost uses a bottom-line reference point, and prospect theory is combined with high assessment value and low cost to optimize the solution. The second stage considers the team's evaluation of the talent and the talent's satisfaction with the team; the evaluation still uses the historical and desired reference points, and the solution is optimized for high evaluation value and high satisfaction in combination with prospect theory, fully considering the psychological factors of decision makers. In real life, assessment values, ranking values, and reference point values are decided by multiple participants, so most of them carry uncertainty; the interval grey number possesses an independent algorithm that characterizes this uncertainty yet remains simple to calculate. The method in this paper addresses the problems of high cost and low satisfaction in talent introduction raised by HR directors of new R&D organizations and greatly improves the efficiency of team-talent matching. New R&D organization teams should follow market changes, update their personalized assessment indicators, assess scientific and technological talent dynamically, continuously build a reasonable talent ladder, identify and train for talent deficiencies, and adjust talent policies to promote output.

Scientific and technological talents should strive to adapt to the matching practices of the new R&D institution teams, adjust their efforts in a timely manner according to the teams' personalized assessment indicators, conduct their scientific and technological work with the team's goals as the focus, create more scientific and technological achievements, and realize their self-worth. Because data acquisition was limited, only one new R&D institution was selected for the method calculation; the sample size is small, which is a limitation. The current study also considers only factors such as the team's evaluation of the talent, the talent's cost, and the talent's satisfaction with the team, which is a further limitation; future work can incorporate government factors, team leadership factors, talents' self-motivation factors, and project-based management factors. In addition, only static reference points were considered. In the future, the author will study bilateral matching of teams and scientific and technological talents in new R&D institutions under government supervision, bilateral matching based on dynamic reference points, and trilateral team-project-talent matching based on reference points.

**Author Contributions:** Conceptualization, L.J.; Methodology, L.J.; Writing—original draft, L.J.; Writing—review & editing, B.C. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by (1) the National Natural Science Foundation of China (NSFC) project "A method and application of dynamic emergency group decision-making driven by ternary group intelligence information interaction for public emergencies", Project No. 72071106; (2) the Jiangsu Province Education Reform and Development Strategic and Policy Research Major Project "Jiangsu high-level teacher team construction research", Project No. 202000206; and (3) the Project of Jiangsu Provincial Education Department "Research on talent management innovation of universities based on big data in the context of 'double first-class'", Project No. 2019SJA1734.

**Institutional Review Board Statement:** There are no human ethical aspects to this study.

**Informed Consent Statement:** Informed consent was obtained from all subjects involved in the study.

**Data Availability Statement:** The author obtained data from HR managers of the new R&D organization T for two phases of two-way team and talent assessment. For the new R&D organization T, these data are not publicly available and are not published in a publicly available dataset. These data are presented in the article. The focus of this paper is to demonstrate the validity of the method by simulating the data, and the data from the computation process are shown in detail in the paper.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

### *Article* **Comparative Estimation of Electrical Characteristics of a Photovoltaic Module Using Regression and Artificial Neural Network Models**

**Jonghwan Lee and Yongwoo Kim \***

Department of System Semiconductor Engineering, Sangmyung University, Cheonan 31066, Republic of Korea **\*** Correspondence: yongwoo.kim@smu.ac.kr

**Abstract:** Accurate modeling of photovoltaic (PV) modules under outdoor conditions is essential to facilitate the optimal design and assessment of PV systems. As alternatives to the translation equations based on regression methods, various data-driven models have been adopted to estimate the current–voltage (I–V) characteristics of a photovoltaic module under varying operating conditions. In this paper, artificial neural network (ANN) models are compared with regression models for the five parameters of a single-diode solar cell. In the proposed PV models, the five parameters are predicted by regression and neural network models, and these parameters are put into an explicit expression such as the Lambert W function. The multivariate regression parameters are determined by using the least square method (LSM). The ANN model is constructed as a four-layer, feed-forward neural network, in which the inputs are temperature and solar irradiance and the outputs are the five parameters. By training on an experimental dataset, the ANN model is built and used to predict the five parameters from the temperature and solar irradiance. The performance of the regression and ANN models is evaluated by using the root mean squared error (RMSE) and the mean absolute percentage error (MAPE). A comparative study shows that the performance of the ANN models is better than that of the regression models.

**Keywords:** regression; artificial neural network; I–V characteristics; photovoltaic module

#### **1. Introduction**

The output power of photovoltaic (PV) systems is strongly affected by arbitrary operating conditions such as the temperature and solar irradiance of PV modules [1,2]. However, highly predictive and efficient models across different temperatures and irradiances have not been established [3–6]. In addition, the nonlinear characteristics of PV modules make highly predictive modeling even more difficult [7–13]. The single-diode model (SDM) with five parameters is widely used to reproduce the current–voltage (I–V) characteristics [5–8]. Owing to the inherently implicit expression for the electrical equivalent circuit of the SDM, analytical and explicit I–V models have been proposed to calculate the I–V relationship [1,14–16]. The explicit I–V model based on the Lambert W function is simple and efficient, while the implicit model requires more computational time [14–16]. Although optimization methods have been proposed to obtain the five parameters at the standard test condition (STC), significant extraction effort is required to account for the dependence of the unknown parameters on temperature and solar irradiance [3–8,17–21]. For arbitrary operating conditions, the performance of the parameter translation model is greatly limited by the chosen translation equation and correction factors [13,17–21]. In order to construct a complete PV model for climatic conditions, the translational formula should be further modified [17,19] and new parameters may need to be taken into account [18,20,21]. Moreover, the accuracy of the translational formula varies significantly at low irradiance levels [3–5]. However, artificial neural network (ANN) models provide parameter identification, I–V prediction with higher accuracy directly from the measured data [22,23], and fault detection and diagnosis for

**Citation:** Lee, J.; Kim, Y. Comparative Estimation of Electrical Characteristics of a Photovoltaic Module Using Regression and Artificial Neural Network Models. *Electronics* **2022**, *11*, 4228. https:// doi.org/10.3390/electronics11244228

Academic Editors: Luis Hernández-Callejo, Sergio Nesmachnow and Sara Gallardo Saavedra

Received: 21 November 2022 Accepted: 16 December 2022 Published: 19 December 2022


photovoltaic systems [24–26]. To accurately estimate the performance of PV modules under varying operating conditions, it is necessary to establish a data-driven model for the five parameters with change in irradiance and temperature [27–37]. In recent years, improved ANN models have been proposed by adding more variables [38] and utilizing the efficient training schemes and processing of neural networks [39–42]. This paper compares the performance of regression and ANN models for the five parameters in predicting the I–V relationship of a PV module based on an explicit expression. The results show that the ANN model provides better performance than the regression model. The novelty of the proposed approach lies primarily in the successful integration of comparative models into an analytical and explicit Lambert W function, in contrast with the previous practice for the electrical equivalent circuits. (1) In this new framework, temperature and solar irradiance serve as inputs to establish the regression and ANN models for I–V prediction under arbitrary operating conditions. (2) An advanced ANN model for the five parameters is developed by determining an optimum ANN architecture to improve the estimation of the model. The ANN model developed can provide an efficient method with higher accuracy in predicting I–V characteristics, compared to the regression model.

In this work, the modeling process begins in Section 2 with the theoretical formulation of an explicit I–V model and the translation equations for the five parameters. Section 3 describes the regression and ANN models, followed in Section 4 by a comparative validation of both models against experimental data for a PV module. In Section 5, the main conclusions are drawn.

#### **2. Theoretical Models**

#### *2.1. Explicit and Analytical I–V Model*

The PV-equivalent circuit of a single diode with two resistors is shown in Figure 1. The I–V relationship of a PV module can be expressed with a single diode as [1,2,5–8]:

**Figure 1.** PV-equivalent circuit of a single diode with series and parallel resistance at arbitrary irradiance (*G*) and temperature (*T*).

$$I = I_{ph} - I_0 \left[ \exp\left(\frac{V + IR_s}{nV_t}\right) - 1\right] - \frac{V + IR_s}{R_p} \tag{1}$$

where *Iph* is the photogenerated current, *I*0 is the diode reverse saturation current, *n* is the ideality factor, and *Rs* and *Rp* are the series and parallel resistance, respectively. The thermal voltage is given by *Vt* = *NskbT*/*q*, where *Ns* is the number of series-connected cells, *kb* is the Boltzmann constant, *T* is the temperature, and *q* is the elementary charge. The explicit solution of the transcendental module Equation (1) is given as a function of the Lambert *W* function [1,14–16]:

$$I = \frac{R_p \left(I_{ph} + I_0\right) - V}{R_s + R_p} - \frac{nV_t}{R_s} W(a(V)) \tag{2}$$

$$a(V) = \frac{R_s R_p I_0}{n V_t \left(R_s + R_p\right)} \exp\left(\frac{R_p \left(R_s I_{ph} + R_s I_0 + V\right)}{n V_t \left(R_s + R_p\right)}\right) \tag{3}$$
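Equations (2) and (3) can be evaluated directly once the five parameters are known. The sketch below assumes illustrative parameter values (the function names are ours), and computes the principal branch of the Lambert W function with a short Newton iteration rather than a library call:

```python
import math

def lambert_w(x, tol=1e-12):
    """Principal branch of the Lambert W function for x >= 0, via Newton's
    method on f(w) = w * exp(w) - x."""
    w = math.log1p(x)  # reasonable starting guess for x >= 0
    for _ in range(100):
        e = math.exp(w)
        w_next = w - (w * e - x) / (e * (w + 1.0))
        if abs(w_next - w) < tol:
            return w_next
        w = w_next
    return w

def pv_current(V, Iph, I0, n, Rs, Rp, Vt):
    """Explicit module current from Eqs. (2)-(3) of the single-diode model."""
    a = (Rs * Rp * I0) / (n * Vt * (Rs + Rp)) * math.exp(
        Rp * (Rs * Iph + Rs * I0 + V) / (n * Vt * (Rs + Rp)))
    return (Rp * (Iph + I0) - V) / (Rs + Rp) - (n * Vt / Rs) * lambert_w(a)
```

At *V* = 0 the expression reduces to roughly *Iph·Rp*/(*Rs* + *Rp*), i.e. the short-circuit current, which is a quick sanity check on an implementation.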

#### *2.2. Five Parameters as a Function of Temperature and Solar Irradiance*

Environmental conditions such as temperature (*T*) and solar irradiance (*G*) strongly affect the I–V characteristics of the PV module. In order to extract accurate estimates of the model parameters under arbitrary *T* and *G*, mathematical expressions for the five parameters are reformulated by taking advantage of previously reported formulas [1,2,5–8]. The I–V curve translation from STC (*G*0, *T*0) to the desired solar irradiance and temperature (*G*, *T*) is obtained by using the short-circuit current (*Isc*) and the open-circuit voltage (*Voc*). Assuming the condition *Isc* ≈ *Iph*, *Isc*(*G*, *T*) and *Voc*(*G*, *T*) are determined as [1,2,5–8]:

$$I_{sc}(G, T) \approx I_{ph}(G, T) = \left(\frac{G}{G_0}\right) \left[I_{ph0} + \alpha_i(T - T_0)\right] \tag{4}$$

$$V_{oc}(G, T) = V_{oc0} \left[ 1 + \alpha_v (T - T_0) + \beta_v V_t \ln \left( \frac{G}{G_0} \right) \right] \tag{5}$$

where *Iph*0 and *Voc*0 are the photogenerated current and open-circuit voltage at standard test conditions, respectively; *αi* and *αv* are temperature coefficients and *βv* is an irradiance coefficient. From the relationships *n* = *n*0(*Voc*/*Voc*0) and *Rs*,*p* = *Rs*0,*p*0(*Voc*/*Voc*0)(*Isc*0/*Isc*), the values of the translated parameters *n*(*G*, *T*), *Rs*(*G*, *T*), and *Rp*(*G*, *T*) are calculated as follows [1,2,7]:

$$n(G, T) = n_0 \left[ 1 + \alpha_n (T - T_0) + \beta_n V_t \ln \left( \frac{G}{G_0} \right) \right] \tag{6}$$

$$R_s(G, T) = R_{s0} \frac{1 + \alpha_{R_s}(T - T_0) + \beta_{R_s} V_t \ln\left(\frac{G}{G_0}\right)}{\left(\frac{G}{G_0}\right) \left[1 + \alpha^*_{R_s}(T - T_0)\right]}\tag{7}$$

$$R_p(G, T) = R_{p0} \frac{1 + \alpha_{R_p}(T - T_0) + \beta_{R_p} V_t \ln\left(\frac{G}{G_0}\right)}{\left(\frac{G}{G_0}\right) \left[1 + \alpha^*_{R_p}(T - T_0)\right]}\tag{8}$$

where $n_0$, $R_{s0}$, and $R_{p0}$ are the ideality factor, series resistance, and parallel resistance at standard test conditions, respectively; $\alpha_n$, $\alpha_{R_s}$, $\alpha^*_{R_s}$, $\alpha_{R_p}$, and $\alpha^*_{R_p}$ are temperature coefficients and $\beta_n$, $\beta_{R_s}$, and $\beta_{R_p}$ are irradiance coefficients. By setting $I = 0$ in (1), the translation expression for the reverse saturation current $I_0(G, T)$ as a function of $I_{ph}(G, T)$, $V_{oc}(G, T)$, $n(G, T)$, and $R_{s,p}(G, T)$ is obtained as [1,2,7]:

$$I_0(G, T) = \frac{I_{ph}(G, T) - \frac{V_{oc}(G, T)}{R_p(G, T)}}{\exp\left(\frac{V_{oc}(G, T)}{n(G, T)\, V_t}\right) - 1} \tag{9}$$
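A minimal sketch of the translation Equations (4)–(9) follows; all temperature and irradiance coefficients are illustrative placeholders rather than the paper's fitted values, and the thermal voltage $V_t$ is written explicitly in the exponential of Equation (9). At $(G_0, T_0)$ every correction factor reduces to one, so the function must return the STC values, which is a convenient sanity check:

```python
import math

def translate_parameters(G, T, stc, c):
    """Translate the five parameters from STC to (G, T), Equations (4)-(9)."""
    g = G / stc["G0"]
    dT = T - stc["T0"]
    lng = math.log(g)
    Vt = stc["Vt"]

    Iph = g * (stc["Iph0"] + c["alpha_i"] * dT)                           # Eq. (4)
    Voc = stc["Voc0"] * (1 + c["alpha_v"] * dT + c["beta_v"] * Vt * lng)  # Eq. (5)
    n = stc["n0"] * (1 + c["alpha_n"] * dT + c["beta_n"] * Vt * lng)      # Eq. (6)
    Rs = stc["Rs0"] * (1 + c["alpha_Rs"] * dT + c["beta_Rs"] * Vt * lng) \
        / (g * (1 + c["alpha_Rs_star"] * dT))                             # Eq. (7)
    Rp = stc["Rp0"] * (1 + c["alpha_Rp"] * dT + c["beta_Rp"] * Vt * lng) \
        / (g * (1 + c["alpha_Rp_star"] * dT))                             # Eq. (8)
    I0 = (Iph - Voc / Rp) / (math.exp(Voc / (n * Vt)) - 1)                # Eq. (9)
    return {"Iph": Iph, "n": n, "Rs": Rs, "Rp": Rp, "I0": I0}

# Illustrative STC values and coefficients (placeholders, not fitted values)
STC = {"G0": 1000.0, "T0": 298.0, "Vt": 0.9266, "Iph0": 3.45,
       "Voc0": 21.7, "n0": 1.3, "Rs0": 0.3, "Rp0": 200.0}
COEFFS = {"alpha_i": 1e-3, "alpha_v": -3e-3, "beta_v": 0.05,
          "alpha_n": 1e-4, "beta_n": 0.01,
          "alpha_Rs": 1e-3, "beta_Rs": 0.1, "alpha_Rs_star": 1e-3,
          "alpha_Rp": 1e-3, "beta_Rp": 0.1, "alpha_Rp_star": 1e-3}

params_stc = translate_parameters(1000.0, 298.0, STC, COEFFS)
```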

#### **3. Parameter Identification Approaches**

#### *3.1. Multiple Regression*

The multivariate regression analysis is employed to model the statistical relationship between inputs (*G*, *T*) and outputs (*Iph*,*n*,*Rs*,*Rp*,*I*0). The parametric regression equation for a linear or nonlinear function *f* is expressed as [43–45]:

$$y = f(X\_1, \dots, X\_n; \theta\_1, \dots, \theta\_m) + \varepsilon \tag{10}$$

where $y$ is an $n \times 1$ vector of the dependent variable, $X_1, \dots, X_n$ form an $n \times m$ matrix of independent variables, $\theta_1, \dots, \theta_m$ are an $m \times 1$ vector of regression parameters, and $\varepsilon$ is an $n \times 1$ vector of random errors. The regression parameters $\theta_1, \dots, \theta_m$ are usually determined by the least squares method (LSM), which minimizes $\sum_{i=1}^{k}\left(y_i - f(X_{1i}, \dots, X_{ni}; \theta_1, \dots, \theta_m)\right)^2$ over $k$ sample points. Based on the estimated regression parameters, the optimum regression model is chosen for prediction.
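Equation (4), for instance, is linear in the parameters $(I_{ph0}, \alpha_i)$, so the LSM fit reduces to ordinary least squares. The sketch below uses synthetic data with illustrative "true" coefficient values and recovers them with `numpy.linalg.lstsq`:

```python
import numpy as np

rng = np.random.default_rng(0)
G0, T0 = 1000.0, 298.0
Iph0_true, alpha_i_true = 3.45, 1.3e-3  # illustrative "true" values

# Synthetic sample points (G, T) and noisy "measured" Iph following Eq. (4)
G = rng.uniform(200.0, 1000.0, 50)
T = rng.uniform(298.0, 343.0, 50)
y = (G / G0) * (Iph0_true + alpha_i_true * (T - T0)) + rng.normal(0.0, 1e-3, 50)

# Eq. (4) is linear in (Iph0, alpha_i), so the LSM fit is ordinary least squares
X = np.column_stack([G / G0, (G / G0) * (T - T0)])
theta, *_ = np.linalg.lstsq(X, y, rcond=None)  # theta approximates [Iph0, alpha_i]
```

The same design-matrix approach extends to any of the translation models that are linear in their coefficients; genuinely nonlinear models require an iterative least squares solver.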

#### *3.2. Artificial Neural Network (ANN)*

The artificial neural network is well suited to modeling complex systems, especially nonlinear or stochastic ones. The multilayer perceptron (MLP), a fully connected feed-forward network for supervised learning, is the most common and successful architecture for modeling nonlinear systems [22,23,31]. The MLP network configuration has an input layer, two hidden layers, and an output layer. The input layer consists of two neurons ($G$, $T$), and the output layer contains five neurons ($I_{ph}$, $n$, $R_s$, $R_p$, $I_0$). Every neuron in one layer is fully connected to every neuron in the next layer. Using a hyperbolic tangent sigmoid activation function for the $N$ neurons of each hidden layer, the output $h_i^{(k)}$ of the $i$th neuron in the $k$th hidden layer is computed as follows [22,23]:

$$h\_i^{(1)} = \tanh\left(\sum\_{j=1}^2 w\_{ij}^{(1)} x\_j + b\_i^{(1)}\right) \tag{11}$$

$$h\_i^{(2)} = \tanh\left(\sum\_{j=1}^N w\_{ij}^{(2)} h\_j^{(1)} + b\_i^{(2)}\right) \tag{12}$$

where $x_j$ is the $j$th input to the neuron, $w_{ij}^{(k)}$ is the weight for the $i$th neuron and $j$th input in the $k$th hidden layer, and $b_i^{(k)}$ is the bias for the $i$th neuron in the $k$th hidden layer. With the use of a linear activation function for neurons in the output layer, the network's output can be written as

$$y\_i = \sum\_{j=1}^{N} w\_{ij}^{(3)} h\_j^{(2)} + b\_i^{(3)} \tag{13}$$

In vectorized form, with a weight matrix $W^{(k)}$, an activation vector $h^{(k)}$, and a bias vector $b^{(k)}$, the network's computations are given by [22,23]:

$$h^{(1)} = \tanh\left(W^{(1)} x + b^{(1)}\right) \tag{14}$$

$$h^{(2)} = \tanh\left(W^{(2)} h^{(1)} + b^{(2)}\right) \tag{15}$$

$$y = W^{(3)} h^{(2)} + b^{(3)} \tag{16}$$

where $x = [G \; T]^T$ is the input vector. The neural network is trained by using the Levenberg–Marquardt algorithm, a method used extensively for training feed-forward networks, to realize rapid correction of the network weights and biases. Since the initial values of the weights and biases affect the training results, the neural network can be retrained several times to obtain a network that generalizes well. The configuration of the proposed model is summarized in Figure 2. The five parameters are predicted by an MLP neural network, and these parameters are put into the Lambert W function.
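Equations (14)–(16) amount to a few matrix products. A sketch of the forward pass for the 2-5-5-5 topology, with random (untrained) weights for illustration:

```python
import numpy as np

def mlp_forward(x, params):
    """Forward pass of the 2-5-5-5 MLP, Equations (14)-(16)."""
    W1, b1, W2, b2, W3, b3 = params
    h1 = np.tanh(W1 @ x + b1)    # Eq. (14): first hidden layer
    h2 = np.tanh(W2 @ h1 + b2)   # Eq. (15): second hidden layer
    return W3 @ h2 + b3          # Eq. (16): linear output layer

rng = np.random.default_rng(1)
N = 5  # neurons per hidden layer
params = (rng.normal(size=(N, 2)), rng.normal(size=N),
          rng.normal(size=(N, N)), rng.normal(size=N),
          rng.normal(size=(5, N)), rng.normal(size=5))

x = np.array([0.8, 1.05])    # illustrative normalized (G, T) input
y = mlp_forward(x, params)   # five outputs (weights are untrained here)
```

Training (e.g. by Levenberg–Marquardt, as in the paper) would adjust `params` to minimize the error between `y` and the extracted five parameters.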

**Figure 2.** Configuration of the proposed photovoltaic model.

#### **4. Model Verification**

The validation of the regression and ANN models for the five parameters ($I_{ph}$, $n$, $R_s$, $R_p$, $I_0$) is assessed by using the experimental data of the monocrystalline SM55 PV panel ($N_s = 36$) [1]. The five parameters are extracted by using the quality factor variation method from the manufacturer's data sheet [1,17]. After determining the parameters at standard test conditions ($G_0 = 1000$ W/m$^2$, $T_0 = 298$ K), the procedure is applied to estimate variations in the five parameters for different temperature levels ($T = 298 \sim 343$ K) and solar irradiance levels ($G = 200 \sim 1000$ W/m$^2$). The database obtained from the procedure is used to develop the multiple regression models for $I_{ph}(G, T)$, $n(G, T)$, $R_s(G, T)$, $R_p(G, T)$, and $I_0(G, T)$.

Table 1 shows the regression models and the coefficient of determination $R^2$ estimated by using Equations (4)–(9). As shown in Table 1, the regression models have a high coefficient of determination ($R^2 = 0.9692 \sim 1.000$), and $I_0(G, T)$ is obtained from the other parameters, also with a high coefficient of determination.


**Table 1.** Regression model and *R*<sup>2</sup> for parameters estimated from experimental data.

The best ANN model has an input layer with two variables, two hidden layers with five neurons in each layer, and an output layer with five variables (2-5-5-5 topology). Logarithmic data preprocessing is used to improve the ANN model accuracy for the reverse saturation current $I_0(G, T)$. For training the ANN model, the offline method is utilized to generate the dataset for the five parameters of the PV panel, which is extracted from the manufacturer's data sheet. Figure 3 shows the dependence of the five parameters on temperature and solar irradiance for the regression model and the ANN model. Correlation coefficients are employed to evaluate the accuracy of the ANN model for the five parameters in the training, validation, and testing phases. Correlation coefficients greater than 99.85% were observed between the predicted and measured data in all network phases.

**Figure 3.** *Cont*.

**Figure 3.** Dependence of the five parameters on temperature and solar irradiance for the regression model (**a**–**e**) and ANN model (**f**–**j**).

As plotted in Figure 4, two different statistical metrics based on the measured and estimated five parameters are employed to compare the accuracy of the ANN model with the regression model, including root mean squared error (RMSE) and mean absolute percentage error (MAPE), as follows [14,15]:

$$RMSE = \sqrt{\frac{1}{n} \sum\_{i=1}^{n} (y\_{m,i} - y\_{c,i})^2} \tag{17}$$

$$MAPE = \frac{1}{n} \sum\_{i=1}^{n} \left| \frac{y\_{m,i} - y\_{c,i}}{y\_{m,i}} \right| \times 100 \text{ (\%)}\tag{18}$$

where $y_{m,i}$ and $y_{c,i}$ are the measured and calculated values, respectively. Both the RMSE and MAPE values of the ANN models are lower than those of the regression models. This is attributed to the strong capability of MLP–ANN models to learn the nonlinear relationship between the inputs and the outputs, whereas the regression models may be limited to a specific condition. Figure 5a,b show the I–V characteristics and the absolute error for simulated and experimental data at different irradiances and temperatures, respectively. The absolute error values of the ANN models are lower than those of the regression models, resulting from the better performance of the ANN models for the five parameters shown in Figure 3. Table 2 compares the maximum absolute errors of the I–V curves estimated from the proposed ANN model and different models for the SM55 PV panel. As can be seen, the maximum absolute errors of the proposed ANN model are much lower than those of the other models for different irradiances and temperatures. These results justify the higher accuracy of the ANN models compared with other works [1,4,46].
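The two metrics of Equations (17) and (18) can be computed directly, for example:

```python
import numpy as np

def rmse(y_m, y_c):
    """Root mean squared error, Equation (17)."""
    y_m, y_c = np.asarray(y_m, float), np.asarray(y_c, float)
    return float(np.sqrt(np.mean((y_m - y_c) ** 2)))

def mape(y_m, y_c):
    """Mean absolute percentage error in percent, Equation (18)."""
    y_m, y_c = np.asarray(y_m, float), np.asarray(y_c, float)
    return float(np.mean(np.abs((y_m - y_c) / y_m)) * 100.0)
```

Note that MAPE is undefined when a measured value is zero, which matters for $I_0$ and is one reason logarithmic preprocessing of that parameter is helpful.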

**Figure 4.** Statistical metrics for the five parameters in evaluating the accuracy of the regression model and ANN model: (**a**) RMSE and (**b**) MAPE.

**Figure 5.** I–V curves and absolute error by simulated and experimental data (**a**) at different irradiances and (**b**) at different temperatures.


**Table 2.** Comparison of maximum absolute errors estimated from different models.

#### **5. Conclusions**

The electrical characteristics of a PV module under arbitrary operating conditions have been estimated by using the regression and ANN models. The models are utilized to predict the five parameters of a single diode solar cell, and the parameters are combined with an explicit equation for I–V characteristics. The inputs of the regression and ANN models are temperature and solar irradiance, while the outputs are the five parameters. The dataset needed for the five parameters was extracted from the manufacturer's data sheet and used to construct the regression and ANN models. The best neural network architecture had a 2-5-5-5 topology for the five parameters, leading to correlation coefficients with values greater than 99.85%. Both the RMSE and MAPE values of the ANN models were found to be lower than those of the regression models. These comparative results show that the ANN models outperform the regression models in predicting I–V characteristics under varying temperature and solar irradiance. The demonstrated capability of the ANN models can be extended to predicting the electrical characteristics of diverse solar cells under actual weather conditions.

**Author Contributions:** Conceptualization, J.L. and Y.K.; methodology, J.L.; software, Y.K.; validation, J.L. and Y.K.; writing—original draft preparation, J.L.; writing—review and editing, J.L. and Y.K.; visualization, J.L.; supervision, Y.K.; project administration, Y.K.; funding acquisition, Y.K. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work is funded by a 2020 research grant from Sangmyung University.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


### *Article* **Prediction of Solid Conversion Process in Direct Reduction Iron Oxide Using Machine Learning**

**Masih Hosseinzadeh 1, Hossein Mashhadimoslem 1,2, Farid Maleki <sup>3</sup> and Ali Elkamel 2,4,\***


**Abstract:** The direct reduction process has been developed and investigated in recent years due to its lower pollution than other methods. In this work, the first direct reduction iron oxide (DRI) model has been developed using artificial neural network (ANN) algorithms such as the multilayer perceptron (MLP) and radial basis function (RBF) models. A DRI operation takes place inside the shaft furnace, a gas–solid reactor that transforms iron oxide particles into sponge iron. Because of its low environmental pollution, the MIDREX process, one of the DRI procedures, has received much attention in recent years. The main purpose of the shaft furnace is to achieve the desired percentage of solid conversion at the furnace output. The network parameters were optimized, and an algorithm was developed to achieve an optimum NN model. The results showed that the MLP network has a mean squared error (MSE) of $8.95 \times 10^{-6}$, the lowest error compared to the RBF network model. The purpose of the study was to identify the shaft furnace solid conversion using machine learning methods without solving nonlinear equations. Another advantage of this approach is that its running speed is 3.5 times that of mathematical modeling.

**Keywords:** direct reduction; MIDREX; neural network; optimization; algorithm; modeling

#### **1. Introduction**

Direct reduction of iron oxide (DRI) is one of the most important non-catalytic gas–solid reactions in industry, and it continues to be an important field of study in chemical engineering [1,2]. The MIDREX process, which is one of the direct reduction technologies, has received a lot of interest because it is a great technology for considerably reducing carbon dioxide (CO2) emissions from steel plants [3,4]. This is primarily accomplished by using natural gas instead of coke or coal [5]. Several approaches were used to develop these solutions, an overview of which is provided in Figure 1.

**Figure 1.** Direct reduction methods have been developed extensively [6].

**Citation:** Hosseinzadeh, M.; Mashhadimoslem, H.; Maleki, F.; Elkamel, A. Prediction of Solid Conversion Process in Direct Reduction Iron Oxide Using Machine Learning. *Energies* **2022**, *15*, 9276. https://doi.org/10.3390/ en15249276

Academic Editors: Luis Hernández-Callejo, Sergio Nesmachnow and Sara Gallardo Saavedra

Received: 7 November 2022 Accepted: 4 December 2022 Published: 7 December 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

Despite the global COVID-19 pandemic, global DRI output in 2020 was 104.4 million tonnes, a 3.4% decrease from the previous year's record of 108.1 million tonnes. India and Iran produced about half of the world's DRI [7].

The shaft furnace, reformer, and recuperator are the three main parts of the MIDREX process, of which the shaft furnace is the most important. Within the shaft furnace, reduction processes take place, and iron oxide turns into sponge iron. Researchers have recently worked to regenerate hydrogen and develop new MIDREX process designs. Pimm et al. improved the MIDREX process by using renewable energies to satisfy the energy needs of the revised process and the hydrogen-based MIDREX unit. According to Rechberger et al.'s research, the carbon footprint of the power used to manufacture hydrogen has a significant impact on the potential that the hydrogen-based pathway offers for environmentally friendly steelmaking [8,9].

Figure 2 indicates the direct reduction processes for the production of sponge iron that use natural gas as the major reducing agent. Today these processes provide more than 70% of the overall production of DRI and hot briquetted iron (HBI). Natural gas is transformed into reducing agents, mostly carbon monoxide and hydrogen, which operate as iron oxide reducers [6]. The shaft furnace is divided into three main parts: (i) reduction zone, (ii) transition zone, and (iii) cooling zone. The most fundamental part of the shaft furnace is the place where reduction occurs. Therefore, most of the modeling has been conducted around this area. The unreacted shrinking core model (USCM) is an assumption adopted by the majority of prior simulations at the pellet scale [10–12]. Furthermore, some modeled direct reduction reactors in industrial units use this model and achieved desirable results [13,14]. Nevertheless, the grain model can be better than the USCM at predicting plant data [15].

**Figure 2.** World DRI Production report (2020) using different technologies [7].

Hamadeh et al. assumed that the shaft furnace had pellets made of grains and crystals [4]. In some reactor models, only one reductant gas is used, such as pure H2 gas [12,16–18], pure CO gas [19,20], and H2 and CO mixtures [21,22]. In a real shaft furnace, the reducing gas is a combination of H2, CO, H2O, CH4, and CO2 [4]. Modeling and simulation of industrial direct reduction furnaces have been performed by numerical solutions and computational fluid dynamics (CFD), which are summarized in Table 1. Additionally, some notable non-industrial modeling is given in Table 2.

Machine learning (ML) is being employed at research centers today to understand its present and future uses in energy systems. The deployment of ML algorithms is one of the most effective techniques in a variety of industrial fields. Because hybrid approaches take advantage of two or more methods to make an accurate forecast, they sometimes yield better results than a single method; in light of this, it is advised to use hybrid ML strategies in the future [23–25].









The reduction zone occupies the upper part of the furnace. The following are the main chemical reactions that take place in the reduction zone [13,15]:

$$\text{Fe}\_2\text{O}\_3 + 3\text{H}\_2 \to 2\text{Fe} + 3\text{H}\_2\text{O} \tag{1}$$

$$\text{Fe}\_2\text{O}\_3 + 3\text{CO} \rightarrow 2\text{Fe} + 3\text{CO}\_2\tag{2}$$

Before being recirculated for reuse, the gas discharged from the top of the shaft furnace is cleaned and cooled by a wet scrubber. A compressor pressurizes the top gas, which contains CO2 and H2O, before mixing it with natural gas, preheating it, and feeding it into a reformer furnace. Hundreds of reformer tubes filled with a nickel catalyst are installed in the reformer furnace. The mixture of the top gas and natural gas is reformed in these tubes to produce reductant gas, which consists of carbon monoxide and hydrogen. The following reactions take place in the reformer tubes [45]:

$$\text{CH}\_4 + \text{H}\_2\text{O} \rightarrow \text{CO} + \text{3H}\_2\tag{3}$$

$$\text{CH}\_4 + \text{CO}\_2 \rightarrow 2\text{CO} + 2\text{H}\_2 \tag{4}$$

After ongoing investigations on modeling, Parisi et al. developed a network for steady-state heterogeneous reactors by using neural networks for steam reformers that were able to respond 20 times faster than the numerical model [46]. In other research, a tubular reactor with a fixed bed of porous pellets was modeled isothermally by using an unsupervised grid with an accuracy of approximately $10^{-9}$ [47]. The use of ML approaches in the simulation of the shaft furnace to estimate the conversion rate of pellets for making sponge iron can overcome the challenges of nonlinear modeling, and this is one of the most important achievements of this research. Due to the complexity of pellet behavior and the difficulty of modeling and precisely predicting it, a new model is proposed in this study using an artificial neural network (ANN). Figure 3 presents four fundamental models for the DRI process. The low error of the ANN method, and its avoidance of the complexity of mathematical modeling, make the technique attractive.

**Figure 3.** Models developed for pellets in shaft furnaces [3,17–19,30–34,39].

The purpose of this research is to investigate industrial units. Since various industrial data are either non-existent or limited, the network was constructed from simulation results that conform closely to the industrial data. In this investigation, modeling was conducted using the real data of four industrial units, and MLP and RBF networks were built [13–15]. After determining the optimal number of neurons in the hidden layers, several optimization techniques were applied to tune the MLP network structure.

#### **2. Numerical Modeling**

As shown in Figures 3 and 4, to understand the DRI modeling process, a control volume is positioned in the cylindrical section of the shaft furnace for modeling the reduction zone and the heat and mass transfer equations.

Firstly, the extent of the reaction should be defined to derive the balances around the element:

$$X_1 = C^0_{\text{H}_2} - C_{\text{H}_2} \tag{5}$$

$$X_2 = C^0_{\text{CO}} - C_{\text{CO}} \tag{6}$$

$$X_{rs} = X_1 + X_2 = 3\left(C^0_{\text{Fe}_2\text{O}_3} - C_{\text{Fe}_2\text{O}_3}\right) = X_3 \tag{7}$$

These values for the solid and gas phases are obtained from the stoichiometric coefficients and the mass and energy balances as follows [13,15]:

$$u_g \frac{dX_1}{dz} + R_{r1}(X_1, X_3) = 0 \tag{8}$$

$$u_g \frac{dX_2}{dz} + R_{r2}(X_2, X_3) = 0 \tag{9}$$

$$u_s \frac{dX_3}{dz} + \left( R_{r1}(X_1, X_3) + R_{r2}(X_2, X_3) \right) = 0 \tag{10}$$

$$\frac{dT_g}{dz} - \frac{n_p A_p h (T_s - T_g)}{G_{mg}\, C_{pg}(X_1, X_2, T_g)} = 0 \tag{11}$$

$$\frac{dT_s}{dz} - \frac{n_p A_p h (T_g - T_s) - \sum_{i=1}^{2} \Delta H_i(T_s) R_{ri}(X_i, X_3, T_s)}{G_{ms}(X_1)\, C_{ps}(X_3, T_s)} = 0 \tag{12}$$

where $u_g$ is the gas velocity, $u_s$ is the solid velocity, $R_{r1}$ and $R_{r2}$ are the first and second reaction rates, $n_p$ is the number of pellets per unit bed volume, $A_p$ is the pellet external area, $T_g$ and $T_s$ are the gas and solid temperatures, $G_{mg}$ and $G_{ms}$ are the gas and solid molar flows, $C_{pg}$ and $C_{ps}$ are the gas and solid heat capacities, $h$ is the heat transfer coefficient, and $\Delta H$ is the reaction enthalpy.

The aforementioned mathematical modeling of the moving bed direct reduction reactor yields a set of nonlinear ordinary differential equations that can be solved by numerical methods such as Runge–Kutta and the shooting technique [13,15]. Several researchers have made the assumption that, since the radius of the reactor is 200–250 times greater than the pellet diameter, porosity fluctuations of the bed are disregarded [48–51].
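As an illustration of the numerical solution, the sketch below integrates a simplified single-reaction analogue of Equation (10) with a classical Runge–Kutta scheme; the rate law $R = k(1 - X)$, the sign convention (conversion growing along the bed), and the values of $k$ and $u_s$ are toy assumptions, chosen so the result can be checked against the analytic solution $X(z) = 1 - e^{-kz/u_s}$:

```python
import math

def rk4(f, x0, z0, z1, n):
    """Classical fourth-order Runge-Kutta integration of dx/dz = f(z, x)."""
    h = (z1 - z0) / n
    z, x = z0, x0
    for _ in range(n):
        k1 = f(z, x)
        k2 = f(z + h / 2, x + h * k1 / 2)
        k3 = f(z + h / 2, x + h * k2 / 2)
        k4 = f(z + h, x + h * k3)
        x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        z += h
    return x

# Toy single-reaction analogue of Eq. (10): u_s dX/dz = k (1 - X), with
# conversion X growing along the bed; k and u_s are illustrative values.
k, u_s = 0.8, 0.1
X_out = rk4(lambda z, X: (k / u_s) * (1.0 - X), 0.0, 0.0, 1.0, 200)
# Analytic solution for comparison: X(z) = 1 - exp(-k z / u_s)
```

The full system of Equations (8)–(12) is coupled and requires boundary conditions at both ends of the bed, which is why the shooting technique is combined with the integrator.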

#### *Case Studies*

Gilmore, Siderca, Mobarakeh, and Khorasan are industrial units with a comprehensive dataset on the network inputs, as shown in Table 3. Using these four industrial units, 200 samples were extracted. The datasets of other shaft furnaces could not be used because they lack the required parameters. To develop a new general model, the research dataset shown in Figure 5 was rendered dimensionless. Simulation results from previous research projects, obtained by mathematical modeling, were employed as feed data. Table 3 also reports the error of the mathematical modeling relative to the industrial data (relative error), as given in previous research efforts.

**Figure 5.** Different parameters in shaft furnaces at different plants.

The following variables have been selected as network input parameters, based on the dependent and independent parameters that are effective in direct reduction simulation, to predict the percentage of X3:

(i) dimensionless temperature of the gas and solid, (ii) percentage of gas entering the furnace, (iii) length-to-diameter ratio of the furnace. The network output is also investigated as a percentage of X3, which is practical for the calculation of the degree of metallization (MD) shown in Equation (13) as a key output parameter [30].

$$\text{MD}(\%) = \frac{\text{Fe}}{\text{Total Fe}(\text{Fe} + \text{FeO})} \times 100 \tag{13}$$
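Equation (13) in code form (the inputs are illustrative mass fractions of metallic Fe and of Fe bound in FeO, not plant data):

```python
def metallization_degree(fe_metallic, fe_in_feo):
    """Degree of metallization, Equation (13): metallic Fe over total Fe, in %."""
    return fe_metallic / (fe_metallic + fe_in_feo) * 100.0

# Illustrative sponge-iron composition: 92 mass units of metallic Fe,
# 8 mass units of Fe still bound in FeO
md = metallization_degree(92.0, 8.0)
```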



**Table 3.** Model parameters range for the current study.

#### **3. Artificial Neural Network (ANN)**

This work aims to examine machine learning (ML) strategies for addressing existing difficulties in DRI performance. Additionally, the optimization of ML models is a promising method that is quickly gaining traction in a variety of fields, such as medicine [52] and engineering [53–56].

Supervised learning, the most common machine learning task, learns a function that maps inputs to outputs from sample input–output pairs [57]. During the training process, part of the data is set aside for network validation, and the remaining data is used for network testing. Network training is concluded when generalization has stopped improving. A variety of activation functions were evaluated to determine the best one: logistic, ReLU, and identity functions were used to obtain the optimal MLP network. During network training, the mean square error (MSE) should be driven to a minimum at each iteration in order to determine precise network parameter values. The MSE, coefficient of determination ($R^2$), and root mean square error (RMSE) are used as assessment metrics to relate the model outputs to the validation dataset. MSE, RMSE, and $R^2$ are calculated as follows [58–60]:

$$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( X_{predicted,i} - X_{actual,i} \right)^{2} \tag{14}$$

$$\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( X_{predicted,i} - X_{actual,i} \right)^2} \tag{15}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left( X_{actual,i} - X_{predicted,i} \right)^2}{\sum_{i=1}^{n} \left( X_{actual,i} - X_{mean} \right)^2} \tag{16}$$
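The assessment metrics are straightforward to implement; the sketch below uses the conventional definition of the coefficient of determination, $R^2 = 1 - SS_{res}/SS_{tot}$:

```python
import numpy as np

def mse(actual, predicted):
    """Mean squared error, Equation (14)."""
    d = np.asarray(actual, float) - np.asarray(predicted, float)
    return float(np.mean(d ** 2))

def r_squared(actual, predicted):
    """Coefficient of determination, conventional definition."""
    actual = np.asarray(actual, float)
    predicted = np.asarray(predicted, float)
    ss_res = float(np.sum((actual - predicted) ** 2))
    ss_tot = float(np.sum((actual - actual.mean()) ** 2))
    return 1.0 - ss_res / ss_tot
```

A perfect prediction gives $R^2 = 1$, and predicting the mean of the data gives $R^2 = 0$, which makes the metric easy to sanity-check.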

Three types of algorithms, stochastic gradient descent (*SGD*) Equation (17) [61], adaptive moment estimation (*Adam*) Equation (18) [62], and Broyden–Fletcher–Goldfarb–Shanno (*BFGS*) Equation (19) [63], were applied to find the best ANN algorithm which has the lowest MSE, RMSE, and highest R2. The basic concept of the mentioned algorithms is as follows:

$$w(k+1) = w(k) - \eta \frac{\partial E(k)}{\partial w(k)} + m \cdot (w(k) - w(k-1)) \tag{17}$$

where $w$ represents the weight vector, $k$ the iteration index, $\eta$ the learning rate, $E$ the cost function, and $m$ the momentum coefficient.
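Equation (17) can be sketched as a heavy-ball update on a toy quadratic cost with a known minimizer; the learning rate, momentum value, iteration count, and cost function are illustrative choices:

```python
import numpy as np

def sgd_momentum(grad, w0, eta=0.1, m=0.9, iters=300):
    """Gradient descent with momentum, Equation (17):
    w(k+1) = w(k) - eta * dE/dw + m * (w(k) - w(k-1))."""
    w_prev = np.array(w0, float)
    w = w_prev - eta * grad(w_prev)   # first step has no momentum term yet
    for _ in range(iters):
        w_next = w - eta * grad(w) + m * (w - w_prev)
        w_prev, w = w, w_next
    return w

# Toy cost E(w) = ||w - w_star||^2 with known minimizer w_star
w_star = np.array([1.0, -2.0])
w_opt = sgd_momentum(lambda w: 2.0 * (w - w_star), np.zeros(2))
```

In real stochastic training the gradient would be evaluated on a mini-batch (the batch size discussed in Section 4) rather than on the full cost.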

$$w(k+1) = w(k) - \eta \left( \frac{m}{\sqrt{V(k)} + \varepsilon} \right) \tag{18}$$

where $V$ is the second moment, and $\varepsilon$ is a small scalar used to prevent division by zero.

$$H(k+1) = H(k) + \frac{y_k y_k^T}{y_k^T \Delta x_k} - \frac{H_k \Delta x_k \Delta x_k^T H_k}{\Delta x_k^T H_k \Delta x_k} \tag{19}$$

$$y_k = \nabla f(x_{k+1}) - \nabla f(x_k) \tag{20}$$

$$\Delta x_k = \psi_k p_k \tag{21}$$

where $H$ is the Hessian matrix approximation, $\psi$ is the step size, and $p$ is the search direction. The flowchart shown in Figure 6 was developed to select the best model.
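The update of Equation (19) has the secant property $H(k+1)\,\Delta x_k = y_k$, which the sketch below verifies for an arbitrary pair of vectors (assuming a symmetric Hessian approximation $H$; the vectors themselves are arbitrary illustrations):

```python
import numpy as np

def bfgs_hessian_update(H, dx, y):
    """BFGS Hessian-approximation update, Equation (19); H must be symmetric."""
    Hdx = H @ dx
    return H + np.outer(y, y) / np.dot(y, dx) - np.outer(Hdx, Hdx) / np.dot(dx, Hdx)

H = np.eye(3)                      # initial Hessian approximation
dx = np.array([1.0, 0.0, 1.0])     # step, as in Equation (21)
y = np.array([2.0, 1.0, 0.0])      # gradient difference, as in Equation (20)
H_new = bfgs_hessian_update(H, dx, y)
```

In a line-search implementation the curvature condition $y_k^T \Delta x_k > 0$ is enforced so that the updated matrix stays positive definite.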

**Figure 6.** Flowchart of optimization for the ANN approach to finding the best network.

#### **4. Results and Discussion**

Firstly, the MLP network was constructed using all three optimization techniques and the ReLU activation function. The parameters of the algorithms are optimized to achieve the optimal network. In the SGD algorithm, there are three parameters: batch size, learning rate, and momentum. The batch size is the number of training samples used in one iteration. The learning rate is the size of each iteration's step while approaching a minimum of the loss function. The momentum considers the gradient of previous steps rather than depending only on the current step's gradient to control the process. For the selected algorithm, the learning rate and batch size were set to 0.02 and 20, respectively. A high learning rate enables the model to learn more rapidly but at the expense of a less-than-optimal final weight set. A slower learning rate may enable the model to discover a more optimal or even globally optimal combination of weights, but training will take significantly longer. In deep learning, most practitioners set the value of momentum to 0.9. The optimized SGD algorithm parameters are shown in Figure 7.

**Figure 7.** Optimization Parameters SGD algorithm. (**a**) Study the effect of batch size on the MSE. (**b**) Study the effect of momentum on the MSE (**c**) Study the effective learning rate on the MSE.

To examine the RBF network and compare it with the MLP network, RBF network parameters should be optimized. Therefore, according to Figure 8, the spread parameter should be optimized by network data.

**Figure 8.** Optimization Spread parameter in RBF network.

#### *4.1. Comparative Analysis of ANN Models*

In order to design the structure of the MLP network, its effective components should be optimized. One of the most crucial components is the activation function used in the network. According to Figure 9, different activation functions in the MLP network were evaluated against the number of hidden layer neurons to find the best one. Although the ReLU function provides an unsatisfactory output when the number of neurons is smaller than 20, it makes fewer errors when the number of neurons is greater than 20.

The MSE, RMSE, and R parameters for the MLP and RBF networks are given in Tables 4 and 5. Based on these evaluation metrics, the LBFGS approach is preferable. In the MLP network, several activation functions and optimization algorithms were also evaluated. As illustrated in Figure 9, the ReLU activation function provided the most accurate result.

Figure 10 presents the R values of the MLP network with the ReLU activation function for the different optimization methods and various numbers of hidden layer neurons, showing that the BFGS approach is preferable.

As shown in Table 4, the BFGS optimization algorithm was selected as the optimization method because it has the lowest error considering the number of hidden layer neurons. According to the network comparison, the MLP network with 27 neurons in the hidden layer has the best result, with an MSE value of $8.95 \times 10^{-6}$. Other network structures achieve comparably small errors only with more neurons. In Figure 11, the three optimization algorithms used for the MLP network are compared with the RBF network. The comparison in Figure 11 shows that the LBFGS optimization method reaches the lowest mean squared error with the fewest neurons.

**Figure 10.** Comparison accuracy performance of MLP network using the Relu activation function. (**a1**–**a3**) SGD method, (**b1**–**b3**) Adam method, and (**c1**–**c3**) BFGS methods (blue dotes are data and the red line is the fit line).

**Figure 11.** Comparison of the MLP and RBF networks performance optimization.


#### *4.2. Optimum ANN Results for Prediction of Solid Conversion for DRI Process*

The accuracy of the network results and the agreement between the real data and the predictions are quite acceptable. The neural network has a low error rate, and it can calculate the percentage of X3 from the shaft furnace based on input variables such as diameter, length, and input flow to the shaft furnace. The best result, obtained with the LBFGS optimization method and the ReLU activation function, is shown in Figure 12.

**Figure 12.** The best developed MLP network with two hidden layers: (**a**) comparison of predicted and real data; (**b**) strong positive correlation.

As a result, the most functional network consists of two hidden layers, with 13 neurons in the first layer and 14 neurons in the second layer. According to Figure 13, these neurons are fully connected using weights. Furthermore, each neuron has a bias that is listed in Appendix A, along with its weight. The matrix created using the MLP algorithm simulated for the prediction of the MIDREX process is shown in Appendix A.

**Figure 13.** Schematic of final MLP structure.

#### 4.2.1. Effect of Dimensions on Pellet Conversion Rate

Heatmap and pair plots have been used to show the effects of different parameters on the sponge iron produced, as shown in Figures 14 and 15. In the heatmap chart, Pearson's correlation coefficients show the relationship between the various parameters [64]. In Figure 14, the effect of the different network parameters is shown on a spectrum from linear to nonlinear: the farther a coefficient is from unity (1 or −1), the more nonlinear the relationship, and the closer it is to unity, the more linear and proportional the effect of the two parameters. A positive value indicates a direct relationship between the two parameters, while a negative value means they change in opposite directions. According to Figure 14, the correlation coefficient between the real data and the network prediction is equal to one, which shows that the network can fully predict the real data. The correlation coefficient for the effect of dimensions on the conversion rate is 0.91, which shows that the relationship between the two is linear and direct. Based on Figure 15, the direct relationship between these parameters and the degree of linearity is clear. This direct relationship arises because a longer reactor keeps the pellets in the reduction zone longer, yielding a higher conversion rate [15].
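Pearson's coefficient for two parameter series can be obtained with `numpy.corrcoef`; the series below are hypothetical values, purely for illustration:

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation coefficient between two parameter series."""
    return float(np.corrcoef(np.asarray(a, float), np.asarray(b, float))[0, 1])

# Hypothetical reactor lengths and solid conversions (illustrative only)
length = [4.0, 5.0, 6.0, 7.0]
conversion = [0.62, 0.70, 0.79, 0.86]
r = pearson(length, conversion)  # close to +1: strongly linear and direct
```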

**Figure 14.** Heatmap of Pearson correlation coefficient matrix for network MLP.
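A correlation heatmap like Figure 14 can be produced from a table of simulation results. The sketch below (with invented column names and synthetic data, not the furnace dataset) shows the typical pandas workflow:

```python
# Sketch of the assumed workflow: Pearson correlation matrix of
# furnace parameters rendered as a heatmap, as in Figure 14.
# Column names and data are illustrative placeholders.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "length": rng.uniform(4, 12, 300),      # placeholder reactor length (m)
    "gas_T":  rng.uniform(900, 1200, 300),  # placeholder gas temperature (K)
})
df["conversion"] = (0.05 * df["length"] + 1e-4 * df["gas_T"]
                    + 0.01 * rng.standard_normal(300))

corr = df.corr(method="pearson")   # values in [-1, 1]; diagonal is exactly 1
print(corr.round(2))

# Plotting (requires seaborn/matplotlib):
# import seaborn as sns; sns.heatmap(corr, annot=True, cmap="coolwarm")
```

In this synthetic example, `length` dominates `conversion`, so their coefficient is close to +1, mirroring the 0.91 reported for the real dimensions-conversion pair.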

#### 4.2.2. The Effect of Gaseous Compounds on Pellet Conversion

According to Figure 14, the correlation coefficients between XCO2 and XH2O and the pellet conversion rate are −0.83 and −0.75, indicating that the relationships are nonlinear and that these variables change in the opposite direction to the conversion rate. The corresponding values for XCO and XH2 are 0.74 and 0.77, showing direct, nonlinear relationships. This behavior arises because H2 and CO enter the furnace at the bottom of the reduction zone; after reacting with the iron oxide pellets, they are converted to CO2 and H2O and exit from the top of the zone. Hence, XCO2 and XH2O vary inversely with the pellet conversion rate: from the top of the reduction zone to the bottom, the pellet conversion increases while XCO2 and XH2O decrease. This relationship can be seen in all of the simulations shown in Figure 15 for the four shaft furnaces.

**Figure 15.** Comparing the effects of different parameters on each other and on the amount of solid conversion (based on dimensionless data).

#### 4.2.3. The Effect of Flow Rate on Pellet Conversion Rate

According to Figure 14, the correlation coefficient between the gas-to-solid flow-rate ratio and the conversion rate is 0.22. This value indicates several things. First, the solid flow rate and the pellet conversion rate vary inversely, because the solid flow rate appears in the denominator of the dimensionless flow parameter. Second, the gas flow rate and the conversion rate vary directly, because the gas flow rate appears in the numerator of that parameter. In addition, because the coefficient is near zero, the relationship is nonlinear; of all the parameters in Figure 14, the flow-rate ratio is the most nonlinear. This strong nonlinearity indicates that this parameter affects iron ore reduction more than the others, a conclusion that has also been confirmed by solving the governing equations of the problem [14].
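The interpretation that a near-zero Pearson coefficient signals nonlinearity can be illustrated directly: a perfectly deterministic but symmetric nonlinear relationship yields a coefficient near zero, while a linear one yields ±1. A minimal sketch:

```python
# Pearson's r measures *linear* association only: a deterministic but
# symmetric nonlinear relationship (a parabola) gives r ~ 0, while a
# linear relationship gives |r| = 1.
import numpy as np

x = np.linspace(-1.0, 1.0, 201)
y_linear = 2.0 * x + 1.0          # linear: r = 1
y_nonlin = x ** 2                 # symmetric parabola: r ~ 0

r_lin = np.corrcoef(x, y_linear)[0, 1]
r_non = np.corrcoef(x, y_nonlin)[0, 1]
print(round(r_lin, 3), round(abs(r_non), 3))  # 1.0 and ~0.0
```

This is why a coefficient of 0.22 can coexist with a strong (but nonlinear) influence of the flow-rate ratio on conversion.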

The relationship between the solid flow rate and the conversion rate closely resembles that between the furnace dimensions and the conversion rate: both are governed by residence time. At a lower solid flow rate, the pellets move more slowly and remain in contact with the reducing gas for longer, so the conversion rate grows as the residence time of the pellets in the reduction zone increases. In contrast to the solid flow rate, the pellet conversion rate increases with increasing gas flow rate. This increase is not due to a reduction of the external gas-to-solid mass-transfer resistance; simulations reveal that even at the lowest flow rate, the Sherwood number is large enough to render that resistance insignificant [15]. Rather, as the gas flow rate increases, so does the concentration of reducing gases in the top portion of the reactor.

#### 4.2.4. The Effect of Temperature on the Pellet Conversion Rate

The lowest part of the reduction zone, where the reducing gas enters, has the highest temperature [4]. The pellet conversion is also at its maximum in this area, showing that raising the gas and solid temperatures increases the pellet conversion rate. The correlation coefficients between the gas and solid temperatures and the solid conversion rate are 0.71 and 0.72, respectively, indicating direct, nonlinear relationships. The smaller the particle, the higher the mass-transfer coefficient and hence the lower the film resistance; the film resistance is therefore greatest for the largest particle. Consequently, if the film resistance can be overcome for the largest particle, all smaller particles can be reduced at least as quickly [50].

Turning to environmental issues, air pollution has been one of the most challenging problems facing humanity in recent years. Since about 7% of carbon dioxide production is derived from the steel industry [9], this aspect of the process could be modeled in future research; for example, MIDREX production with green hydrogen could be assessed for thermal and economic feasibility. Another vital issue is the analysis of energy consumption in the MIDREX unit: Salimi et al. recently investigated the technical and economic aspects of harvesting waste heat from the MIDREX direct-reduction process with a Kalina cycle [65].

#### **5. Conclusions**

In this study, two networks were constructed for the X3 output parameter (conversion percentage) of the shaft furnace, and both were investigated using different optimization methods. The Adam and L-BFGS algorithms were the fastest and most accurate, delivering an MSE on the order of 8.95 × 10<sup>−6</sup>. Various activation functions were also tested; ReLU produced the smallest error for the chosen number of hidden-layer neurons. After optimization, the RBF and MLP networks were compared with the same number of hidden-layer neurons. The MLP network achieved an error of 8.95 × 10<sup>−6</sup> and better predicted the pellet conversion percentage as the output parameter. This network can be used to improve shaft furnace performance and makes modeling the conversion percentage of the iron oxide output straightforward: using the network results (weights and biases), predictions can be obtained in the shortest time and with the highest accuracy. Finally, to better analyze the effect of the different parameters, a heatmap was used to show how each network input correlates with the output and with the other input parameters. This research could be the beginning of the utilization of ML in the direct reduction process: the complex behavior of pellets in the shaft furnace and its complex reactions (six heterogeneous reactions of the iron oxide phases, methane reforming, the water-gas shift reaction, and other side reactions) can be captured with high speed and accuracy through ML. This modeling is superior to earlier numerical approaches because it is both more precise and faster.
Another application of ML is in unit control; for instance, it could be used to optimize the temperature of the bustle gas in the shaft furnace. This alone could make for very interesting future research, and ML could be applied for control purposes in processes such as MIDREX.

**Author Contributions:** Conceptualization, H.M., M.H. and A.E.; Methodology, H.M., M.H. and F.M.; Software, M.H.; Validation, M.H.; Formal analysis, M.H., H.M. and F.M.; Data curation, M.H.; Writing—original draft, M.H., H.M.; Writing—review & editing, H.M. and F.M.; Supervision, A.E.; Project administration, H.M.; Funding acquisition, A.E. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Nomenclature**



#### **Appendix A**

**Table A1.** Characteristic weights and biases of the DRI-ANN-MLP model with the best algorithm.


#### **References**




### *Article* **Artificial Intelligence (AI)-Based Occupant-Centric Heating Ventilation and Air Conditioning (HVAC) Control System for Multi-Zone Commercial Buildings**

**Alperen Yayla 1, Kübra Sultan Świerczewska 2, Mahmut Kaya 3, Bahadır Karaca 4, Yusuf Arayici 5, Yunus Emre Ayözen 6 and Onur Behzat Tokdemir 7,\***


**Citation:** Yayla, A.; Świerczewska, K.S.; Kaya, M.; Karaca, B.; Arayici, Y.; Ayözen, Y.E.; Tokdemir, O.B. Artificial Intelligence (AI)-Based Occupant-Centric Heating Ventilation and Air Conditioning (HVAC) Control System for Multi-Zone Commercial Buildings. *Sustainability* **2022**, *14*, 16107. https://doi.org/10.3390/su142316107

Academic Editors: Luis Hernández-Callejo, Sergio Nesmachnow and Sara Gallardo Saavedra

Received: 11 October 2022 Accepted: 12 November 2022 Published: 2 December 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

**Abstract:** Buildings are responsible for almost half of the world's energy consumption, and approximately 40% of total building energy is consumed by the heating ventilation and air conditioning (HVAC) system. The inability of traditional HVAC controllers to respond to sudden changes in occupancy and environmental conditions makes them energy inefficient. Despite the oversimplified building thermal response models and inexact occupancy sensors of traditional building automation systems, investigations into a more efficient and effective sensor-free control mechanism have remained entirely inadequate. This study aims to develop an artificial intelligence (AI)-based occupant-centric HVAC control mechanism for cooling that continually improves its knowledge to increase energy efficiency in a multi-zone commercial building. The study is carried out using two years of occupancy and environmental-conditions data from a shopping mall in Istanbul, Turkey. The research model consists of three steps: prediction of hourly occupancy, development of a new HVAC control mechanism, and comparison of the traditional and AI-based control systems via simulation. After determining the attributes relevant to occupancy in the mall, hourly occupancy is predicted using real data and an artificial neural network (ANN). A sensor-free HVAC control algorithm is then developed from the occupancy data obtained in the previous stage, the building characteristics, and real-time weather forecast information. Finally, the traditional and AI-based HVAC control mechanisms are compared using the IDA Indoor Climate and Energy (ICE) simulation software. The results show that applying AI to HVAC operation achieves energy savings of at least 10% while providing a better thermal comfort level to occupants. The findings demonstrate that the proposed approach can be a very advantageous tool for sustainable development and, since it improves as data accumulate, can also serve as a standalone control mechanism.

**Keywords:** artificial intelligence (AI); automatic HVAC control; occupant behavior; model predictive control; energy efficiency

#### **1. Introduction**

Due to high demand and the need for an ever-increasing energy supply, energy efficiency has become crucial. Constrained energy markets have wide-ranging effects, from household budgets to international relations, and because of their high energy consumption, buildings are at the front line of energy-efficiency research. Buildings account for approximately 40% of total global energy consumption (International Energy Agency, 2019 [1]), and almost 40% of this goes towards heating, ventilation, and air conditioning (HVAC) systems (Yang et al., 2014 [2]). Clearly, the development and implementation of efficient building energy control systems is essential for economic and environmental sustainability. The HVAC system is the most commonly used tool to maintain thermal comfort in buildings; it also serves as an essential demand-response resource for peak-load reduction and system-wide stabilization through effective demand-side energy management strategies. To date, this energy demand in buildings has been measured with sensors. Since large thermal masses heat and cool slowly, the inability to respond to sudden changes in occupancy and environmental conditions makes traditional HVAC control systems energy-inefficient, especially in large commercial buildings.

An HVAC system is a dynamic mechanism with multiple input and output variables, subject to various fluctuations and uncertainties, including occupant behavior, external air temperature, humidity, air volume, and regulated air temperature (Alcalá et al., 2003 [3]; Mirinejad et al., 2008 [4]). All of these features and characteristics need to be taken into consideration to operate the HVAC system effectively. Thus, the research question of this paper is: how can HVAC systems be made efficient in meeting sudden changes in demand in large commercial buildings by taking occupancy patterns and prediction into consideration? The following section provides a critical review of the literature on energy management with HVAC control systems to establish the setting for the research.

#### **2. Related Studies**

#### *2.1. Traditional and Advanced Control Strategies*

HVAC control strategies can be examined in general terms under two headings: traditional control strategies (TCSs) and advanced control strategies (ACSs). This section presents a review of related studies focusing on ACSs. Different control mechanisms are examined, and then a limited number of occupancy-based control approaches are discussed.

TCSs generally include sequencing, on-off, process, and proportional-integral-derivative (PID) controls. Their simple structure, quick response, easy implementation, and low initial cost are the main advantages of TCSs. They also have many disadvantages, including low accuracy, quality, and performance, and thus poor energy efficiency. Furthermore, they do not interact with the external environment or adapt their setpoints, schedules, and working modes to the input variables accurately (Gholamzadehmir et al., 2020 [5]). The diversity and complexity of the variables involved make it impossible to create accurate and reliable mathematical HVAC models for TCSs.

ACSs obtain superior results in HVAC applications. They can be divided into four categories: (i) soft-computing, (ii) hard-computing, (iii) hybrid, and (iv) adaptive-predictive control strategies.

#### 2.1.1. Soft Computing Strategies

Reinforcement learning (RL), artificial neural network (ANN)-based deep learning, fuzzy logic (FL), and agent-based controls together comprise the soft-computing control strategies. As a control mechanism, this enables solutions to more complex problems by generating more accurate and statistical responses for unclear and uncertain inputs. The key benefit of fuzzy logic controllers is that no mathematical simulation is needed for controller design (Mizumoto, 1995 [6]; Mirinejad et al., 2008 [4]; Soyguder et al., 2009 [7]). The knowledge-based methodology is the fundamental aspect of a fuzzy controller. This consists of if-then rules, membership functions, and scaling factors constructed based on expert experience or learning and self-organization methods that do not involve the system's mathematical model forms.
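As a minimal illustration of the if-then rule structure described above (the rule shapes, setpoints, and output levels here are invented for illustration, not taken from any cited controller), a tiny fuzzy cooling rule base might look like:

```python
# Minimal fuzzy-logic sketch: triangular membership functions, two
# if-then rules, and weighted-average defuzzification with singleton
# outputs. All numbers are invented placeholders.
def tri(x, a, b, c):
    """Triangular membership function rising from a, peaking at b, falling to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def cooling_demand(temp_c):
    # Rule base: IF temp is warm THEN cooling is medium; IF temp is hot THEN cooling is high.
    mu_warm = tri(temp_c, 22.0, 26.0, 30.0)
    mu_hot  = tri(temp_c, 27.0, 33.0, 39.0)
    weights, outputs = [mu_warm, mu_hot], [0.5, 1.0]
    total = sum(weights)
    return 0.0 if total == 0 else sum(w * o for w, o in zip(weights, outputs)) / total

print(cooling_demand(21.0), cooling_demand(26.0), cooling_demand(34.0))
```

Note that no mathematical plant model is required: the controller is defined entirely by the membership functions and the linguistic rules, which is exactly the advantage cited above.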

Since the human sensation of thermal comfort is subjective, and self-reporting can vary among occupants and over time, linguistic rules, on which fuzzy logic is based, are well suited to characterize HVAC systems and thus ideal for increasing thermal comfort (Chiou and Lan, 2005 [8]; Mirinejad et al., 2008 [4]). There are two different approaches to the automation of rule-based construction in fuzzy systems, which can be used for optimizing the fuzzy system parameters (Mirinejad et al., 2012 [9]): one involves evolutionary techniques and the other soft-computing methods and technologies, such as ANNs.

Soft-computing methods with ANNs can integrate the learning ability of neural networks with the knowledge representation of fuzzy logic. They are frequently used when the aim is to decrease the error between the fuzzy system output and the target value, as characterized by the general term "neurofuzzy system" (Mirinejad et al., 2012 [9]). ANNs can also be applied to optimize the fuzzy database, including membership functions and scaling factors in a fuzzy system (Egilegor et al., 1997 [10]; Kruse et al., 1997 [11]; Wu et al., 2011 [12]). Some studies have utilized advanced fuzzy methods to optimize the function of existing, traditional PID controllers (Malki et al., 1994 [13]; Ying, 1994 [14]; Wu et al., 1996 [15]; Patel and Mohan, 2002 [16]; Li et al., 2005 [17]), while others have used them more directly in the development of new HVAC control mechanisms (Fanger, 1972 [18]; Alcalá et al., 2003 [3]; Liang and Ru, 2008 [19]; Gacto et al., 2011 [20]; Nowak and Urbaniak, 2011 [21]).

Together with model predictive control (MPC) algorithms, fuzzy control algorithms have been implemented in a hierarchical framework for HVAC device control (Nowak and Urbaniak, 2011 [21]). Wei et al. (2017) [22] presented a deep reinforcement learning (RL) method to develop an HVAC system that they found to be energy-efficient compared with the traditional rule-based approach. Du et al. (2021) [23] presented a model-free deep RL framework for optimized control of a multi-zone residential building. This RL model was reported to provide substantial energy savings and 98% fewer comfort violations than a rule-based HVAC control strategy.

#### 2.1.2. Hard Computing Strategies

Hard-computing control strategies, which include auto-tuning PID control, gain-scheduling control, self-tuning control, supervisory/optimal control, MPC, and robust control, rely on a mathematical/analytical model that needs real input variables to respond accurately and rapidly. Some important hard-computing control strategy examples are summarized below, with a focus on MPC applications, as these are most relevant here.

Pasgianos et al. (2003) [24] applied a non-linear feedback approach for climate control in greenhouses, especially for ventilation, cooling, and moisturizing. A non-linear multi-input, multi-output model has been used for air-handling unit (AHU) control (Moradi et al., 2010 [25]). Robust control has been applied to regulate the temperature in a multi-zone HVAC mechanism (Al-Assadi et al., 2004 [26]) and the supply air temperature (Anderson et al., 2008 [27]). Optimal control strategies have been used to manage both single-zone heating in buildings (Dong, 2010 [28]) and a multi-zone air conditioning system (Mossolly et al., 2009 [29]). An adaptive optimal control approach has also been employed to optimize HVAC system control using a genetic algorithm (Yan et al., 2008 [30]).

MPC is an optimization technique that involves the construction of an objective function and an input sequence considering both specified and forced constraints. Serale et al. (2018) [31] aimed to describe the problem formulation, applications, and advantages of an MPC framework for improving building and HVAC energy efficiency. MPC has four functions in buildings, related to weather, user behavior, grid, and thermal mass. Kusiak et al. (2011) [32] created a predictive model with a data-mining approach to optimize HVAC mechanisms using information gathered from an experiment performed at a research facility. Kusiak et al. (2014) [33] presented an HVAC optimization approach with data-driven models and an interior-point method. The Poisson and uniform distributions modeled the uncertainty of occupant behavior, and the internal heating gain was measured with the stochastic mechanism of the building's occupancy. The results showed that the future performance of HVAC was estimated precisely.

Another data-driven approach for optimizing HVAC energy consumption was proposed by Wei et al. (2015) [34], who built a quad-objective optimization problem to balance energy usage and occupant comfort and solved it with a modified particle swarm optimization algorithm, obtaining substantial energy savings. Biyik et al. (2015) [35] and Kelman et al. (2013) [36] suggested an MPC solution in a standard commercial building for two traditional HVAC setups to maximize energy efficiency and increase occupant comfort using weather forecast data. The effect of occupants on internal load prediction, and learning from occupant activity, is one of the key features of MPC and can have a major impact on energy efficiency (Serale et al., 2018 [31]).
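The receding-horizon idea behind MPC can be sketched with a deliberately tiny example (the thermal model, cost weights, and setpoint below are invented for illustration, not any cited controller): at each step, the controller evaluates candidate input sequences over a short horizon against a comfort-plus-energy cost, then applies only the first move.

```python
# Toy receding-horizon MPC: one-state linear thermal model, on/off
# cooling input chosen by exhaustive search over a short horizon.
# All coefficients and setpoints are invented placeholders.
from itertools import product

def step(temp, outdoor, cool_on, a=0.1, b=2.0):
    """Zone temperature drifts toward the outdoor temperature, minus cooling effect."""
    return temp + a * (outdoor - temp) - b * cool_on

def mpc_action(temp, forecast, setpoint=24.0, horizon=3, energy_w=0.2):
    best_cost, best_first = float("inf"), 0
    for seq in product([0, 1], repeat=horizon):      # all on/off sequences
        t, cost = temp, 0.0
        for u, out in zip(seq, forecast[:horizon]):
            t = step(t, out, u)
            cost += (t - setpoint) ** 2 + energy_w * u   # comfort + energy terms
        if cost < best_cost:
            best_cost, best_first = cost, seq[0]
    return best_first    # receding horizon: apply only the first move

print(mpc_action(28.0, [33.0, 34.0, 35.0]))  # hot forecast -> cooling on (1)
```

Real MPC formulations replace the exhaustive search with a constrained optimizer and a calibrated building model, but the structure (model, objective, constraints, first-move application) is the same.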

#### 2.1.3. Hybrid Strategies

Huang et al. (2015a) [37] proposed a hybrid MPC framework that integrated classical MPC with a neural network feedback linearization method to reduce the cost and energy of HVAC in commercial buildings; the results indicated that a significant level of energy savings could be achieved without compromising thermal comfort. Garnier et al. (2015) [38] implemented predictive control for a multi-zone HVAC mechanism in non-residential buildings using EnergyPlus software for the building model and ANN-based models for the controller's internal models, taking the predicted mean vote index as the measure of thermal comfort. Basic scheduling techniques were compared with the proposed HVAC system, which used a genetic algorithm for optimization, and the importance of the predictive approach was demonstrated. Barzin et al. (2016) [39] carried out an experimental study using weather prediction and a price-based control system for passive solar buildings, achieving energy savings of up to 90%.

Alibabaei et al. (2016) [40] explored a MATLAB-TRNSYS co-simulator for controlling a TRNSYS model, previously designed and calibrated against a real case-study building, with an advanced predictive controller; this study is relevant here in terms of its co-simulation application. Among other surveys, Afram and Sharifi (2013) [41] supplied a detailed literature review of control techniques focusing on the theory and implementation of MPC approaches for HVAC mechanisms, and Afram et al. (2017) [42] presented another comprehensive MPC review focusing on artificial neural network applications, with a case study involving ANN models built and calibrated with on-site data from a residential house. Trčka and Hensen (2010) [43] and Afroz et al. (2017) [44] presented critical reviews of the latest simulation and modeling techniques used in HVAC, focusing on their benefits, limitations, implementations, and efficiency.

#### 2.1.4. Adaptive-Predictive Control Strategies

The adaptive-predictive control strategy (APCS) method can be adapted to a controlled system with time-dependent variables through online variation of its control gains. Huang et al. (2015b) [45] presented an ANN model-based system identification approach for modeling multi-zone buildings. The thermal interactions between the zones were well captured by the ANN model, which incorporated the energy input from mechanical cooling, ventilation, changes in the weather, and the convective heat transfer between adjacent zones; more precise outcomes were thus obtained than with a single-zone model. Javed et al. (2017) [46] introduced a random neural network (RNN)-based controller on an Internet of Things (IoT) platform combined with cloud computing, in which an RNN estimated the number of occupants inside the area and sent the information to a central RNN-based occupancy calculator placed in the sensor node.

Cardoso et al. (2018) [47] introduced a study of HVAC power-demand forecasting based on occupant activity; this influences our study in terms of its use of real data from a research building for estimation, since estimating HVAC demand plays a vital role in developing a more efficient HVAC system. Yang et al. (2019) [48] proposed an adaptive, robust MPC and compared its performance with predictive model controllers, showing that adaptive modeling and robust optimization minimize unsuitable indoor conditions caused by uncertainties. Zhou et al. (2019) [49] developed a non-linear MPC in MATLAB using production control systems and weather forecasts and reported a substantial decrease in energy consumption. Finally, Gholamzadehmir et al. (2020) [5] presented a review of adaptive-predictive control strategies for HVAC systems in smart buildings, focusing on advanced control approaches and their effect on building energy consumption and cost. This review indicated that although adaptive control strategies eliminate shortcomings of model predictive approaches, such as uncertainty and unpredictable data, a high degree of inconsistency is observed in the literature.

#### *2.2. Occupancy Related Studies*

Since the primary focus of our study is occupancy patterns and prediction, the following paragraphs look at occupancy-related studies. Erickson et al. (2009) [50] indicate that a 14% reduction in HVAC energy usage can be achieved with occupancy prediction and usage patterns; they created a wireless camera sensor network for occupancy data and estimated occupancy with an accuracy of 80%. Erickson and Cerpa (2010) [51] proposed a strategy for HVAC systems using real-time occupancy monitoring and occupancy estimation with a camera sensor network, indicating energy savings of up to 20%. Oldewurtel et al. (2013) [52] developed an MPC framework using occupancy information to investigate the effect of occupancy patterns on achieving a more energy-efficient HVAC mechanism. Furthermore, an RFID-based occupancy detection system was presented by Li et al. (2012) [53] to decrease HVAC consumption; the study shows how efficient demand-driven HVAC operation becomes when an occupancy detection system is integrated.

A clustering-based iterative evaluation algorithm for estimating when and how occupants occupy a building was introduced by Yang et al. (2016) [54], who evaluated the energy implications at the building level with building information modeling, using the building geometry, HVAC system configuration, and spatial information as inputs for computing the possible energy consequences. Capozzoli et al. (2017) [55] applied an occupancy-related HVAC operation schedule that focused on shifting groups of occupants with similar activity into the same thermal zone; as a result of the new scheduling approach, HVAC-related energy use decreased by almost 14%.

Another occupant-centric model predictive control approach was developed by Aftab et al. (2017) [56], who created and applied an occupancy-predictive HVAC mechanism using real-time occupancy recognition, user-activity prediction, and building thermal simulation. Their work focused on a single-zone mosque area, whereas the research in this paper focuses on multi-zone commercial buildings and adopts AI for the prediction of occupancy activity; with these advancements, the research in this paper differs from theirs.

Shi et al. (2017) [57] used a change-point logistic regression model for precise occupancy estimation to create an occupant-centric model predictive algorithm; their findings indicated that an HVAC control strategy with real-time occupancy estimation saves energy and increases building occupant comfort. Peng et al. (2018) [58] found that 52% energy savings are possible in office buildings with occupancy-prediction-based cooling control using machine learning, developing a demand-responsive method based on energy-related occupant activity. Nikdel et al. (2018) [59] estimated the benefits of occupancy-centric HVAC control in small office buildings based on programmable thermostats; compared with no thermostat control, their proposed HVAC control approach reduced electricity and natural gas use by up to 50% and 87%, respectively.

Ahmadi-Karvigh et al. (2019) [60] presented an automation system that continually learns occupant behavior to support service system control, determining a set of rules according to users' preferences and behaviors; adaptive automation gave better results than inquisitive automation in terms of benefits and occupant satisfaction. Pang et al. (2020) [61] determined the energy-efficiency potential of a new HVAC system combined with occupancy-sensing methods; their energy simulation covered three occupancy scenarios, with occupancy-presence and occupant-counting sensors providing energy savings in office buildings.

Azuatalam et al. (2020) [62] developed a reinforcement learning (RL) framework to optimize and control the HVAC of a whole commercial building; simulations showed that, compared with a handcrafted baseline controller, energy savings of up to 22% could be reached. Deng and Chen (2020) [63] developed a smart HVAC control mechanism for multi-occupant offices using the physiological signals of occupants, applying an ANN model to predict indoor conditions and physiological signals such as clothing level, (wrist) skin temperature, relative skin humidity, and heart rate. The heating and cooling loads in interior offices were reduced by 90% and 30%, respectively, after coupling with occupancy-based control through lighting sensors and wristband Bluetooth. This study was vital for our research in terms of its development of occupancy-related HVAC control and its direct measurement of occupant comfort. Jung and Jazizadeh (2019) [64] presented a structured literature review examining the user-centric operation and human dynamics of HVAC systems, focusing on occupancy, comfort, and energy-saving aspects. Finally, Jazaeri et al. (2019) [65] analyzed the complex relationships among local climates, building characteristics, occupancy patterns, and the annual and peak HVAC demand of residential buildings. These studies are important for us in terms of occupancy, but, as mentioned before, no study predicts occupancy without real-time detection tools.

#### *2.3. IDA Indoor Climate and Energy (ICE) Software Background*

The IDA Indoor Climate and Energy (ICE) simulation software is one of the four primary building-energy simulation tools used in research (Ryan and Sanquist, 2012 [66]) and one of the twenty main building-energy simulation software packages (Crawley et al., 2008 [67]). As with many other simulation packages, it uses the building geometry as the foundation for accurate calculation of the solar radiation distribution in and between spaces. The program dynamically computes energy balances while accounting for climatic changes, using a variable time-step, and solves the heat balance equations from the building geometry, design, HVAC conditions, and internal heat loads. The effectiveness and validity of the IDA-ICE software have been demonstrated in several studies over recent years (Bring et al., 1999 [68]; Achermann and Zweifel, 2003 [69]; ISO, 2003 [70]; Karlsson et al., 2007 [71]; Loutzenhiser et al., 2009 [72]; Hilliaho et al., 2015 [73]; Salvalai, 2012 [74]; Mazzeo et al., 2015 [75]; Milić et al., 2018 [76]).

#### **3. Aim of the Research**

The advanced prediction ability of AI methods can be employed in place of sensors to determine occupant behavior, which offers an excellent opportunity to minimize the weaknesses of traditional HVAC systems. The aim of this paper is to develop an AI-based, occupant-centric HVAC control mechanism that uses actual weather predictions and continually improves its knowledge to increase energy efficiency in a commercial building. Since the cooling problem has gained importance in recent years, the focus is on the cooling function of the HVAC system.

The novelty of the work is twofold. Firstly, a new HVAC control algorithm is proposed, based on forecasted weather and occupancy information, to establish a sensor-free mechanism. Secondly, an artificial intelligence-based occupancy forecast system is presented, which considers all relevant parameters (weather information, time indicators, social situations) and provides year-round usage with accurate prediction. Although there are limited examples of real-time occupancy detection in multi-zone buildings, no research study has yet predicted occupancy without cameras or sensors. There is also no other study that establishes the relationship between occupancy prediction, real-time weather, and indoor temperature to manage HVAC control via an algorithm. While the sensor-free algorithm allows both low installation cost and high energy efficiency, AI-based occupancy forecasting provides a system that improves itself as data accumulate, allowing the control mechanism to be used standalone and to obtain better energy savings.

#### **4. Research Methodology**

This paper adopts the design science research (DSR) methodology, which facilitates the development of innovative, information-driven solutions for industry and organizations (Vaishnavi et al., 2019 [77]). Its characteristics involve iterative design processes leading to the development of innovative solutions in the problem domain (Wieringa, 2014 [78]). The DSR methodology integrates both the social context and the technical capability of the knowledge base to achieve the aim of the research (Markus et al., 2002 [79]). Wieringa (2014) [78] described two types of DSR: "problem-oriented research—evaluation research" and "solution-oriented—technical research". Problem-oriented research examines what causes or effects a problem has, or how to solve a problem, whereas solution-oriented research designs and validates a system or a requirement (Peffers et al., 2006 [80]).

With DSR, this paper promotes the adoption of AI-based and occupant-centric HVAC control systems in commercial buildings to address the research problem around inefficient energy management of the existing HVAC systems. The DSR features with social context that are relevant to the paper are given in Table 1 and the overall research methodology is illustrated in Figure 1.

**Table 1.** Design science research (DSR) features.


**Figure 1.** The flowchart of the design science research (DSR) methodology.

The design science research (DSR) methodology enabled the development of an HVAC control system that accurately predicts the energy supply needed to meet occupant demand in commercial buildings. The key innovation of the system developed via DSR is the embedded artificial intelligence, which processes occupancy and weather data without the use of sensors. The DSR implementation thus brings not only novelty but also an important practical solution for energy management in commercial buildings.

In information systems (IS) science, the DSR methodology is widely preferred for solving identified organizational problems by developing information technologies. This paper is designed in accordance with the DSR methodology. The research problem domain and opportunities are elaborated by means of a literature review through the relevance cycle. The design cycle is covered in Section 5 with the development of the artifact (the AI-based occupant-centric HVAC control system), which is extended with the testing and demonstration of the proposed artifact in Section 6 through the rigor cycle. This then leads to the accumulation of findings into the new knowledge base, articulated in Sections 6 and 7.

#### **5. Design and Development of the AI-Based Occupant-Centric HVAC Control System**

The artifact, which embodies the novelty of the research, is created in this stage; in this paper, the artifact is the AI-based occupant-centric HVAC control system. Since the purpose of the study is to reveal the energy efficiency potential of the proposed HVAC control mechanism, energy analyses according to different scenarios constitute the central part of this section. The research focuses on a specific site to obtain realistic results, using two years of occupancy and environmental conditions data from a shopping mall in Istanbul. Figure 2 shows the architecture of the system, consisting of three steps: predicting hourly occupancy, a new HVAC control mechanism, and comparison of the traditional and AI-based control systems via simulation according to different scenarios.

In the first step, building properties and real occupancy information are collected. In the second step, after determining the attributions for occupancy in the mall, hourly occupancy predictions are made using real data and ANNs, and a sensor-free HVAC control algorithm is developed with the help of occupancy data obtained from the previous stage, building characteristics, and real-time weather forecast information.

The ANN is considered one of the traditional and most widely used artificial intelligence methods, and it remains one of the most accurate and effective. This enables the comparison between the traditional and the AI-based sensor-free HVAC control mechanisms to be performed in the final step, using the IDA-ICE 4.8 software developed by EQUA Simulation AB, Stockholm, Sweden.

#### *5.1. Building Properties, Occupancy, and Environmental Information*

According to the Association of Real Estate and Real Investment Companies of Turkey (2019) [81], there are currently 454 shopping malls in Turkey; across Europe, there are more than 9500 malls, with over 1000 in France and more than 1500 in the UK (STATISTA, 2021 [82]). Worldwide, there is a huge number of shopping malls, which makes them a significant target for energy savings and important in the development of sustainable energy policy. These buildings tend not to have good energy efficiency strategies because they are mostly constructed for consumption and entertainment purposes. It is commonplace for them to use varied and excessive lighting to attract people and make them feel good inside the building.

Poor heating and cooling settings disrupt occupant comfort as well as causing energy inefficiencies. As a complicating factor, and unlike office buildings, shopping malls do not have fixed daily occupancy distributions, so accurately gauging correct heating and cooling settings is not easy. It is for these reasons that this study takes a shopping mall as its case study. For more accurate energy analysis, a realistic model of the building is used that incorporates the real properties of the building elements. Furthermore, the solar radiation and weather data for the building location are obtained automatically from the IDA-ICE software for the energy simulations. Figure 3 shows the model of the building story; Table 2 shows the properties of the building elements.

**Figure 2.** AI-based occupant-centric HVAC control system design.

**Figure 3.** Sample 2D drawings and 3D models of the building story.

**Table 2.** Building components.


*5.2. Occupancy Prediction with ANNs*

#### 5.2.1. ANN Parameters

Many factors affect the occupancy numbers and distribution of a shopping mall; they can be divided into two categories, social and environmental. When the collected real entrance data are examined, temperature, humidity, and weather conditions, along with the type and time of day, come to the forefront as significant parameters, and these are determined as the attributes in the ANN calculation.


Table 3 illustrates the detailed categories, variables, and unit/index of attributes used in the ANN model; Table 4 shows a sample of the actual data. Furthermore, the histograms of the temperature, humidity, weather conditions, and occupancy variables are presented in Figure 4 to show the distribution of the collected data.

**Table 3.** Detailed category, variable, and unit/index information of attributes.


**Table 4.** Real data for ANN (sample).




**Figure 4.** Histograms of the temperature, humidity, weather conditions, and occupancy variables.

#### 5.2.2. ANN Models

Due to their strong logic, error tolerance, versatility, and generalization capabilities, AI methods are used in various applications. The ANN, a mathematical model that imitates the biological nervous system, is one of the most widely used types of AI and has been implemented to solve a variety of practical challenges in many fields of study.

The fundamental biological unit of the nervous system is the neuron, a basic processing element that receives and integrates signals from other neurons through dendrite input paths. If the combined input signal is sufficiently high, the neuron generates an output signal along the axon, which links to the dendrites of several other neurons. ANNs were developed from attempts to model the behavior of biological neural systems, with artificial neurons modeling the components of a real neuron. An ANN is thus a set of interconnected processing units that function as a parallel-distributed computing network.

Unlike traditional computers, which are programmed to perform particular tasks, ANNs may learn from examples and eliminate the need for complicated mathematical formulas or costly physical models by acting as (human) brain-like mathematical models. They are fault-tolerant and can work with noisy data, allowing for quick generalization of unknown inputs (Wijayasekara et al., 2011 [83]). They also have specific adaptation abilities that enable them to solve highly non-linear problems in which finding analytical formulations that relate the input data to the output data is especially challenging (Hagan et al., 2014 [84]). Unlike other statistical or parametric approaches, ANNs can extract non-explicit relationships from a massive volume of correlated data using the high computational capabilities of current computers; thus, ANNs have become a prevalent problem-solving strategy in a diverse range of study areas.

The architecture of ANN models is formed by layers with complete or random connections between them. Each neuron is connected to others, and information exchange takes place across these connections. The network receives data through the input layer; the nodes in this layer have no weights or activation functions, so it is not a neural computing layer. The hidden (intermediate) layers carry out the data processing and computing steps, and the final response to a given input is produced by the output layer (see Figure 5). The ANN model is developed using TensorFlow's Keras API, and the Adam algorithm is used to train the model. Five-fold cross-validation was applied to split the data into two subsets, namely, training and testing. Ninety percent of the cases were used for training in each trial, and the remainder were utilized to test the model accuracy. All equations are adopted from the book *Artificial Neural Networks* by Springer US (2021) [85].
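As a concrete illustration of the data-splitting step, the sketch below shows how k-fold cross-validation partitions a dataset's indices into training and testing subsets. The function name, fold count, and seed are illustrative choices, not taken from the study:

```python
import numpy as np

def k_fold_splits(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation:
    indices are shuffled once, then each fold in turn is held out for
    testing while the remaining folds are used for training."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx

splits = list(k_fold_splits(100, k=5))
print(len(splits))                            # 5 folds
print(len(splits[0][0]), len(splits[0][1]))   # 80 20
```

With five folds, each test fold holds one-fifth of the data; the exact training fraction reported in the paper may reflect a different split configuration.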

**Figure 5.** ANN structure for the study.

Generally, the net input of a neuron, its activation potential *Ai*, is the sum of the products *wij xj*, where *wij* is the weight of the corresponding connection on the *i*-th postsynaptic neuron and *xj* is the input signal (Equation (1)). Connection weights can be considered as the storage of the knowledge that underlies the processing. Thus,

$$A\_i = \sum\_j w\_{ij} x\_j - a\_i \tag{1}$$

where *ai* is the threshold activation constant of the neuron. An output is obtained by propagating this net input through a specific activation function:

$$y\_i = \varphi(A\_i) = \varphi\left(\sum\_j w\_{ij}x\_j - a\_i\right) \tag{2}$$

where *yi* is the output of a layer and *ϕ*(•) is the transfer function.
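Equations (1) and (2) amount to a matrix-vector product followed by an element-wise transfer function. A minimal NumPy sketch follows; the weights, thresholds, and inputs are invented for illustration, not taken from the study:

```python
import numpy as np

def sigmoid(a):
    """Logistic transfer function phi(.)."""
    return 1.0 / (1.0 + np.exp(-a))

def layer_forward(x, W, a, phi=sigmoid):
    """Equations (1)-(2): A_i = sum_j w_ij x_j - a_i, then y_i = phi(A_i)."""
    A = W @ x - a   # activation potentials, one per neuron
    return phi(A)   # layer outputs y_i

# toy example: a layer of 2 neurons fed by 3 inputs
x = np.array([1.0, 0.5, -0.5])
W = np.array([[0.2, 0.4, 0.1],
              [-0.3, 0.1, 0.5]])
a = np.array([0.0, 0.1])
y = layer_forward(x, W, a)
print(y.shape)   # (2,)
```

Stacking several such layers, each consuming the previous layer's outputs as its inputs, yields the multi-layer network of Figure 5.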

The sigmoid activation function has long been a common activation function for neural networks. It converts its input to a value between 0.0 and 1.0, with inputs significantly greater than 1.0 mapped close to 1.0 and inputs significantly smaller than 0.0 mapped close to 0.0. However, due to the vanishing gradient problem, the sigmoid and hyperbolic tangent activation functions are not suitable for networks with many layers. This problem can be overcome by using the rectified linear activation function, which allows ANN structures to learn faster and achieve better performance. The formula of the rectifier, or rectified linear unit (ReLU), is as follows:

$$f(x) = x^+ = \max(0, x)\tag{3}$$

where *x* is the input to a neuron. This is also known as a "ramp function" and is analogous to half-wave rectification in electrical engineering. Connection weights are modified by the ANN model using a suitable learning method during the training phase. The network uses a learning mode to obtain the desired output by adjusting the weights. This is executed by introducing input and desired output to the network. The difference between the expected output and the network's output is then used to determine the error value. In the training phase, recalculations are carried out to decrease the error to an acceptable value. Due to zero occupancy on some days, mean absolute error (*MAE*) is used to calculate the error value, thus:

$$MAE = \frac{1}{N}\sum\_{i=1}^{N} \left| y\_i - \hat{y}\_i \right| \tag{4}$$

where *y*ˆ*i* is the corresponding desired output value. An error close to zero shows that the ANN output values match the expected values very well and the network is well-trained. Backpropagation training begins by assigning random weights to all nodes. Equation (5) is used to measure the variation quantity of the connection weights:

$$\Delta w\_{ij}(t) = \lambda \delta\_i x\_j + \alpha \Delta w\_{ij}(t-1) \tag{5}$$

where the training rate is *λ*, the momentum coefficient is *α*, and the error of the *i*-th output layer is *δi*, which is calculated thus:

$$\delta\_i = y\_i (1 - y\_i) MAE\_i \tag{6}$$

*MAE* and mean absolute percentage error (*MAPE*) are also calculated as indices to evaluate the performance of the ANN model, thus:

$$MAPE = \frac{1}{N} \sum\_{i=1}^{N} \left| \frac{y\_i - \hat{y}\_i}{y\_i} \right| = \frac{1}{N} \sum\_{i=1}^{N} |Relative\ Error\_i| \tag{7}$$

where *N* is the total number of data sequences.
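The two performance indices can be computed directly from the actual and predicted occupancy series. The NumPy sketch below (the sample values are invented for illustration) also shows why zero-occupancy hours motivate preferring MAE over MAPE as the training loss:

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error, Equation (4)."""
    return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))

def mape(y_true, y_pred):
    """Mean absolute percentage error, Equation (7). Zero-occupancy
    hours must be excluded to avoid division by zero, which is why
    MAE is used as the training loss in the paper."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mask = y_true != 0
    return np.mean(np.abs((y_true[mask] - y_pred[mask]) / y_true[mask]))

actual    = np.array([1000, 1200, 0, 800])   # hourly visitor counts
predicted = np.array([ 900, 1300, 0, 880])
print(mae(actual, predicted))                # 70.0
print(round(mape(actual, predicted), 4))
```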

#### 5.2.3. HVAC Control Scenarios for Energy Simulation

The primary goal of establishing HVAC control scenarios in terms of the level of development here is to measure the amount of energy to be saved with the proposed AI-based control approach. Although great progress has been made in air conditioning systems, a large proportion of commercial buildings have the most traditional type of control system, which is one that is operated manually by an attendant (janitor or similar) responsible for turning the system on and off. The most common HVAC control is based on the measurements of environmental conditions via sensors, generally temperature, humidity, and pressure sensors. The most serious deficiency of sensors in terms of energy consumption is the failure to facilitate a quickly responsive control system.

Many shopping malls serve as lunch places for people working near the building, which causes short-term occupancy peaks during the lunch-break period. The rise in temperature due to a sudden increase in people density is a slower process; by the time the heat reaches the sensor, the control system responds, and the appropriate ambient temperature is provided, most people will already have left the building to return to work. Moreover, traditional building automation systems depend on quite imperfect occupancy sensors, which retards system responsiveness. Passive infrared and ultrasonic occupancy sensors, for example, perform poorly in this application: they are unable to accurately assess the occupancy condition, especially when people are stationary for an extended period, and their limited range particularly reduces their effectiveness in large areas.

AI prediction technology offers significantly more accurate occupancy information and improved energy efficiency than traditional building automation systems. Accordingly, our HVAC control mechanism takes the predicted occupancy information and the maximum number of people per day and adjusts its power according to the occupancy rate over time. Furthermore, new schedule algorithms are developed based on occupancy information and weather forecasts for the scenarios (S3 and S4) explained below. The on-off status of the HVAC is determined according to these setpoint schedule algorithms. The maximum setpoint value is set at 24 °C for all scenarios since we focus on the summer period in this study. Finally, four different scenarios, representing increasing levels of development (from traditional to advanced), are determined as follows:


1. S1: The S1 scenario represents the traditional manual approach, in which the HVAC operates at full power at all times.
2. S2: The S2 scenario represents the conventional sensor-based control approach, in which a thermostat switches the cooling on and off around the fixed setpoint.
3. S3: The S3 scenario represents the proposed AI-based control approach without pre-cooling (Algorithm 1), in which the HVAC adjusts its power according to predicted occupancy and forecasted weather.
4. S4: The S4 scenario represents the HVAC control system in the S3 scenario with a pre-cooling ability along with a quick response. The control algorithm provides pre-cooling time to control the system according to predicted weather conditions and occupant numbers. All other features are the same as for S3.

Figure 6 provides basic illustrations for the four energy simulation scenarios.

**Figure 6.** HVAC control scenarios for energy simulations.

Algorithms 1 and 2, shown below, present the proposed HVAC control schedule algorithms in terms of cooling for S3 and S4. The control algorithm takes the occupancy prediction results from the ANN analysis and real weather forecast information from provider websites as inputs (see lines 1–3). In the time intervals when the occupancy volume increases, the HVAC control activates according to the maximum setpoint (lines 4–6); otherwise, the algorithm checks the forecasted temperature and compares it with the maximum setpoint value.


If the weather forecast temperature for time t is greater than the maximum setpoint, the HVAC control uses the maximum setpoint (lines 8–10); if not, the HVAC deactivates cooling automatically for S3 (line 12 in Algorithm 1), while for S4 the algorithm checks the occupancy trend one hour ahead. If there is a sudden increase (set at 250 visitors), it activates pre-cooling 30 min before the upward trend begins (line 15 in Algorithm 2).

Due to fluctuations in the occupancy numbers, sudden changes can cause comfort limit values to be exceeded, especially in situations where the number of visitors will increase too much one or two hours later, even if the occupancy trend is downward for the current time. To prevent this, S4 presents a 30-min pre-cooling. If there is no such increase, the HVAC control deactivates cooling (see line 17 in Algorithm 2), just as for the S3 algorithm.
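The decision logic described above can be sketched in a few lines of Python. All names, the hourly granularity, and the toy data here are ours (the paper's algorithm operates on a finer schedule with a 30-min pre-cooling offset), but the branch structure mirrors the description of Algorithms 1 and 2:

```python
SETPOINT_MAX = 24.0      # degrees C, maximum cooling setpoint
SURGE_THRESHOLD = 250    # visitors; sudden-increase trigger for pre-cooling

def cooling_status(t, occupancy_pred, forecast_temp, pre_cooling=False):
    """Simplified sketch: pre_cooling=False mimics Algorithm 1 (S3),
    pre_cooling=True mimics Algorithm 2 (S4). occupancy_pred holds hourly
    predicted visitor counts, forecast_temp hourly forecast temperatures.
    Returns True when cooling should be on at hour t."""
    # occupancy rising: cool at the maximum setpoint
    if t > 0 and occupancy_pred[t] > occupancy_pred[t - 1]:
        return True
    # otherwise compare the forecasted outdoor temperature with the setpoint
    if forecast_temp[t] > SETPOINT_MAX:
        return True
    # S4 only: pre-cool ahead of a surge of more than 250 visitors
    if pre_cooling and t + 1 < len(occupancy_pred):
        if occupancy_pred[t + 1] - occupancy_pred[t] > SURGE_THRESHOLD:
            return True
    return False

occ  = [200, 150, 600, 900]       # hourly predicted occupancy (toy data)
temp = [22.0, 23.0, 26.0, 27.0]   # hourly forecast temperature
s3 = [cooling_status(t, occ, temp) for t in range(4)]
s4 = [cooling_status(t, occ, temp, pre_cooling=True) for t in range(4)]
print(s3)   # [False, False, True, True]
print(s4)   # [False, True, True, True]
```

At hour 1 the S4 variant switches cooling on purely because of the predicted surge at hour 2, which is exactly the pre-cooling behavior that distinguishes Algorithm 2 from Algorithm 1.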


#### **6. Demonstration and Evaluation of the AI-Based Occupant-Centric HVAC Control System**

In this stage, the designed and developed system is tested in relation to the scenarios for energy analysis. According to design science research, research can exploit experimentation, simulation, case study, proof, or other activities to demonstrate the proposed solution to the research problem. Hence, experimentation of the scenarios via simulation using IDA-ICE 4.8 software is performed for the computational energy analysis. The model of the shopping mall was created using Revit software and imported to IDA-ICE in IFC format. Four different energy analyses are carried out according to the four scenarios. For each scenario, the HVAC control system corresponding to the characteristics of the scenario is created in the simulation software using macros. Figure 7 shows the MPC algorithm framework. Simulations are carried out daily, with the results for two days explained in detail in the results section.

**Figure 7.** Model predictive control (MPC) algorithm framework.

The built-in weather data of the software are used in the energy analysis. The day-ahead predictions give almost the same values as the real values; therefore, the forecasted weather data are not used separately in the algorithm (to avoid repeating the results and graphics). In scenarios S3 and S4, both the estimated occupancy numbers obtained from the ANN calculation and the real occupancy numbers are used in different simulations to show how the small difference between the real data and the ANN prediction affects the energy simulation.

The research artifact, the AI-based occupant-centric HVAC control system, is assessed and evaluated to determine how well the developed and demonstrated artifact serves as a solution to the research problem. At this stage, research can benefit from surveys, feedback, and simulations. If the degree to which the solution addresses the research problem, or the functionality of the solution, is not at an acceptable level, the iterative process is performed by returning to stages 2 and 3.

#### *6.1. ANN Results*

In addition to the initial network settings (attributes, layer design, training algorithm, etc.), the ANN parameters (hidden layer size; number of neurons in the hidden layer; batch size, which refers to the number of training examples utilized in one iteration; number of epochs, the number of complete passes through the training dataset; etc.) have a highly significant influence on the network output during the training and prediction phases. While a model with too few neurons has poor predictive performance because it cannot capture a complex model structure, too many neurons also lead to weak prediction performance, as overfitting easily results from minor fluctuations in the data.

Therefore, it is crucial to test the model's output with different design parameters. Different ANN models were trained for this study using a grid search methodology, with the number of neurons in each layer and the number of epochs as variables and the number of hidden layers kept constant at 8. Figure 8 shows the MAPE and R-squared results of the ANN models created using the grid search. Since the computational times are not long and do not vary much between models, they are not considered as parameters. The ANN models with 8 neurons in each hidden layer and 500 epochs (ANN-1), 8 neurons and 750 epochs (ANN-2), 16 neurons and 250 epochs (ANN-3), and 16 neurons and 500 epochs (ANN-4) give the best results, with overall MAPE values of 0.1323, 0.1344, 0.1315, and 0.1335, respectively.
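The grid search amounts to a simple enumeration over the candidate hyperparameters. In the sketch below, the `evaluate` callback and the candidate lists are illustrative; the stand-in scores echo the overall MAPE values reported above, while in the study each pair would be scored by actually training the ANN:

```python
from itertools import product

# candidate hyperparameters (hidden layer count fixed at 8, as in the study)
neuron_options = [8, 16, 32, 64]
epoch_options  = [250, 500, 750, 1000]

def grid_search(evaluate):
    """Enumerate all (neurons, epochs) pairs and keep the configuration
    with the lowest score returned by `evaluate` (e.g. overall MAPE)."""
    best_cfg, best_score = None, float("inf")
    for neurons, epochs in product(neuron_options, epoch_options):
        score = evaluate(neurons, epochs)
        if score < best_score:
            best_cfg, best_score = (neurons, epochs), score
    return best_cfg, best_score

# stand-in MAPE scores for illustration only
fake_mape = {(16, 250): 0.1315, (8, 500): 0.1323, (16, 500): 0.1335}
cfg, score = grid_search(lambda n, e: fake_mape.get((n, e), 0.2))
print(cfg, score)   # (16, 250) 0.1315
```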

Figure 9 shows the learning curves of the different ANN models. It is clear from the loss curves that the training and validation loss values for ANN-1 and ANN-3 (Figure 9a and 9c, respectively) are in the ideal range for model complexity. However, the distance between the training loss line and the validation loss line gradually increases after a certain point because of overfitting in the ANN model with 64 neurons in each hidden layer and 1000 epochs (Figure 9d).

The comparison between actual occupancy numbers and predicted occupancy values as given by ANN-1 and ANN-2 is illustrated in Figure 10. Although the prediction values are naturally far from the real values at some peak points, the prediction trend follows the real numbers in a general fashion. Thus, as the quantity of data increases in the future, more accurate results will be obtained.

Furthermore, to examine the results in more detail, four days (18, 19, 20, and 30 August 2019) were removed from the training data set and used as prediction values. Prediction values for these days were obtained using the ANN-1 features because of the model accuracy and processing time; a comparison with actual occupancy numbers is shown as a list (Table 5) and graphically (Figure 11).

Additionally, 18 August was a Sunday, while 30 August was a national holiday. These days are important for examining the ANN algorithm across weekdays, weekends, and special days. More people are expected to visit the shopping center on weekends and national holidays than on weekdays.

**Figure 8.** Grid search results of the ANN models according to MAPE and R-squared values.

**Figure 9.** Learning curves of the different ANN models.

**Figure 10.** Actual and ANN-predicted occupancy results (ANN-1 and ANN-2).


**Table 5.** Actual and ANN-predicted occupancy results (hourly).

From Table 5 and Figure 11, it is clear that the prediction values track the actual values consistently across the time parameters. Moreover, although 30 August was a Friday, the analysis approximated the actual values with an accuracy of about 87%, an important measure of the success of the predictions.

**Figure 11.** Actual and ANN-predicted occupancy results (hourly).

#### *6.2. Energy Analysis Results*

The energy analyses aim to demonstrate the effectiveness of the proposed HVAC control algorithm and compare it with traditional control systems. For this purpose, indoor temperature results and daily energy consumption values for the four scenarios were obtained from the IDA-ICE software. Two days were selected for detailed energy analysis and comparison of indoor temperatures according to the energy simulation scenarios, Monday, 29 August 2019, and Saturday, 7 June 2019; these are illustrated in Figures 12 and 13, respectively.

Since the S1 scenario represents the full-powered HVAC at all times, the indoor temperature remains nearly constant at 24 °C, with small fluctuations, for both 29 August and 7 June (Figures 12a and 13a, respectively), as expected. When the indoor temperature results of the S2 scenario, which represents the sensor-based traditional control approach, are examined for 29 August (Figure 12b), the temperature is found to vary between 23 and 25 °C across wide intervals. This is because basic thermostats allow the temperature to fluctuate a few degrees from the fixed temperature to reduce the frequency with which the cooling device is turned on and off. Consequently, the HVAC control mechanism fails to respond to the rapid increase in outdoor temperature and occupancy numbers between 10 and 11 o'clock, and when the maximum occupancy number is reached, the indoor temperature values stay outside the comfort limits. Additionally, although the fluctuations for 7 June (Figure 13b) show a similar pattern, low outdoor temperatures cause the indoor temperatures to return to comfort limit values more quickly.

**Figure 12.** Comparison of indoor temperatures of scenarios for Monday, 29 August 2019.

**Figure 13.** Comparison of indoor temperatures of scenarios for Saturday, 7 June 2019.

The S3 scenario represents the energy simulation according to our new control approach without pre-cooling (Algorithm 1), with the HVAC control adjusting according to the occupancy rate. Although there was a decrease in occupancy between 01:00 p.m. and 02:00 p.m. and between 04:00 p.m. and 05:00 p.m. on 29 August, the cooling status remained on because the outdoor temperature was higher than the setpoint (Figure 12c). At 07:00 p.m., as occupancy started to decrease and the air temperature dropped, the cooling went off, and the indoor temperature increased due to the occupancy. As a result of the dramatic decrease in the number of people, this increase ended before exceeding the comfort level. Furthermore, since the HVAC became operational in response to the rise in occupancy, the indoor air temperature remained mostly at the comfort level.

The scenario S4 for 29 August (Figure 12d) produces the same result as does S3 because conditions that would activate the pre-cooling did not arise on this day. Similarly, there is no difference in the application of the algorithm in the simulation with the estimated occupancy values on 29 August (Figure 12e,f) because the increase and decrease trends are captured correctly by ANN. Depending on the difference in occupancy values between real and predicted, small changes are observed in temperature changes and fluctuations. For instance, indoor temperatures do not rise as high for the simulations with predicted values as for the simulation with real values after the cooling is off because the predicted values are smaller than others for those time intervals.

In the S3 scenario for 7 June (Figure 13c), the cooling is switched off between 11:00 a.m. and 01:00 p.m. because the outdoor temperature was below the setpoint with a decrease in the number of people between these hours. An indoor temperature increase is observed to occur naturally in this period, but the low outdoor temperature prevents this increase from reaching significant levels.

Likewise, with the increase in the number of people, the cooling becomes active again from 01:00 p.m. Similar to the scenario for 29 August, cooling is deactivated by the algorithm in the hours close to the shopping mall closing time. While the actual occupancy numbers increase, estimated occupancy values decrease between 03:00 p.m. and 04:00 p.m. However, the S3 scenario simulation with estimated occupancy (Figure 13e) follows the same cooling status as the simulation with actual occupancy (Figure 13c) since air temperature is above the setpoint between these hours. This situation is critical to minimize the failures due to inaccurate estimation by ANN.

The main difference between S3 (Figure 13c,e) and S4 (Figure 13d,f) is that the cooling status is active at 12:30 p.m. for S4. The reason is that the pre-cooling algorithm is activated under suitable conditions in S4. While there is a decrease in both real and predicted occupancy numbers between 12:00 p.m. and 01:00 p.m., there is an increase of more than 250 people between 01:00 p.m. and 02:00 p.m. The algorithm starts the cooling 30 min before this increase in occupancy to prevent comfort disturbances caused by the rapid increase. As a result of pre-cooling, the indoor temperature falls to the setpoint level at the beginning of the occupancy increase, contrary to the S3 scenarios. Similar to the simulations performed for 29 August, the actual and predicted occupancy numbers lead to only slight differences in the simulations.

When the daily energy consumption results are examined for 29 August (Figure 14), scenario S1 has the greatest consumption, 4090.61 kWh, as expected. Scenario S2 provides an energy saving of approximately 30% compared to S1, with a consumption value of 2279.26 kWh, while scenarios S3 and S4 consume 22% less energy than S2. When the daily energy consumption results are analyzed for 7 June (Figure 14), the energy consumption trends generally show a similar pattern to that of 29 August: the S2 scenario uses almost 30% less energy than S1, while S3 and S4 provide an energy saving of approximately 10% over S2. The savings presented by the HVAC control scenarios are lower in June than in August because the delays resulting from the sensor-based approaches affect energy efficiency less at low air temperatures.

There are also some minor natural differences between the simulations performed according to actual and estimated occupancy numbers because the simulation tool adjusts the HVAC power depending on occupancy. Regarding the ANN values, it is natural that the simulations with predicted values for 29 August yield a lower energy consumption, since the average predicted occupancy, 1485.27, is lower than the real occupancy average, 1613.57. Similarly, for 7 June, the energy consumption values of the simulations with predicted occupancy are greater than those of the simulations with real occupancy because the average predicted occupancy, 1755.32, is greater than the real occupancy average, 1679.64.

**Figure 14.** Comparison of energy consumption values of scenarios.

#### **7. Conclusions**

This paper presented an analysis of different HVAC control approaches according to their level of development using energy simulations, with the ANN as the focus of the study due to the need for a sensor-free control mechanism. The ANN analysis was performed using real occupancy and weather information collected for each day and hour, with the energy simulations performed for four scenarios using the IDA-ICE software.

The ANN results showed that the prediction of occupancy numbers according to time intervals could be calculated with almost 87% accuracy. This accuracy rate was achieved with a limited dataset, and estimation precision should be expected to increase with stronger datasets developed over time. Further, the ANN prediction responded to different parameters, such as special days. This allows the proposed HVAC control algorithm to be used year-round, without exceptions.

According to Wong and Li (2010), "total energy use" is the top selection criterion, followed by "system reliability and stability", "operating and maintenance cost", and "control of indoor humidity and temperature". Since our control strategy is based on data rather than real-time detection tools, it reduces energy consumption while also positively affecting reliability and operating cost. Different scenarios, varying according to level of development, were used to measure the effectiveness of our new HVAC control mechanism. A detailed examination of the energy simulation results revealed that the scenarios representing our AI-based occupant-centric control approach (S3 and S4) reduce energy consumption by at least 10% compared with the traditional sensor-based approach (S2) and by at least 35% compared with full-powered HVAC at all times (S1). In the months when the outside temperature is high, these rates reach approximately 20% and 40%, respectively, because the traditional approaches allow the indoor temperature to fluctuate excessively, increasing the power consumed for cooling.

Another significant result is that there were only very slight differences in indoor temperature and energy consumption results between simulations performed with predicted and real occupancy numbers. This shows that using estimated values in the HVAC control algorithm does not significantly change the energy consumption or comfort level. Manifestly, the transformation of control approaches proposed has great potential for energy savings.

A few limitations should be noted. First, the proposed control algorithms (Algorithms 1 and 2) were deliberately kept simple to demonstrate the achievable savings in a straightforward way. In cases where occupancy decreases slightly for long periods and the outdoor temperature is low, for example, the cooling may remain off for a long time, a situation that was not represented here. In such cases, occupancy that is not actually very low could cause the interior temperature to rise even though the outdoor air temperature is low. To avoid such a situation, the algorithm can easily be extended with further parameters, such as an occupancy limit and a cooling-off time limit.
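As an illustration of how such parameters could be added, the following sketch extends a simple occupancy-driven cooling rule with an occupancy limit and a maximum cooling-off time. This is not the paper's Algorithm 1 or 2; the function name and all thresholds are hypothetical.

```python
def cooling_power(occupancy, outdoor_temp_c, minutes_cooling_off,
                  occ_limit=200, max_off_minutes=30):
    """Hypothetical occupant-centric cooling rule (illustrative only).

    Returns a cooling power fraction in [0, 1]; 0.0 means cooling stays off.
    All threshold values are invented for demonstration, not from the paper.
    """
    if occupancy < occ_limit and outdoor_temp_c < 20.0:
        # Low occupancy and cool outdoor air: cooling may stay off,
        # but only up to max_off_minutes to avoid indoor heat build-up.
        if minutes_cooling_off < max_off_minutes:
            return 0.0
    # Otherwise scale cooling power with (predicted) occupancy.
    return min(1.0, occupancy / 2000.0)
```

The cooling-off time limit is what prevents the failure mode described above: once the off period exceeds the limit, cooling resumes even if occupancy and outdoor temperature remain low.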

Second, although day-ahead weather forecasts are usually very accurate for the following day, some days might fall outside the acceptable margin of error. Such a situation could cause a decrease in the comfort level or inefficiency in the energy consumption, albeit only for very limited periods (a very few days). However, and similarly not considered in this study, existing sensors might be used as an aid to measure the real situation and be included in the algorithm (as stated) to prevent both of these shortcomings.

As a major condition of the experimental design, and thus a third limitation, only the cooling function of the HVAC was investigated. Regarding further research, therefore, a control algorithm can also be developed for heating. The HVAC control method introduced in this study may then be applied to the shopping mall in a real experimental setup and the results observed in practice. Furthermore (as indicated), more complex control algorithms can be developed according to the specific occupancy pattern of the building studied.

Finally, this study differs from others in taking the prediction of occupancy numbers with an ANN as its main focus, so that significant energy savings can be achieved with a simple control algorithm. For this reason, the study can be a pioneer for a new HVAC system with low installation cost and high energy efficiency. This research can play a major role in guiding AI-based occupant-centric control tools for sustainable development, which can be used as standalone control mechanisms as they mature.

**Author Contributions:** Conceptualization, A.Y. and O.B.T.; methodology, A.Y., K.S.S., Y.A., M.K. and B.K.; software, A.Y. and K.S.S.; validation, M.K. and B.K.; formal analysis, Y.A.; investigation, A.Y.; resources, B.K., Y.E.A. and O.B.T.; data curation, A.Y.; writing—original draft preparation, A.Y.; writing—review and editing, Y.A.; visualization, A.Y.; supervision, O.B.T.; project administration, O.B.T. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Some or all data, models, or codes that support the findings of this study are available from the corresponding author upon reasonable request.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


### *Article* **Semantic Segmentation Algorithm-Based Calculation of Cloud Shadow Trajectory and Cloud Speed**

**Shitao Wang, Mingjian Sun and Yi Shen \***

Department of Control Science and Engineering, Harbin Institute of Technology, Harbin 150001, China **\*** Correspondence: shen@hit.edu.cn; Tel.: +86-451-86413411-8602; Fax: +86-451-86418378

**Abstract:** Cloud cover is an important factor affecting solar radiation and causes fluctuations in solar energy production. Therefore, the real-time recognition and prediction of cloud cover, and the adjustment of the angle of photovoltaic panels to improve power generation, are important research areas in the field of photovoltaic power generation. In this study, several methods, namely, the principle of depth camera distance measurement, a semantic segmentation algorithm, and a long- and short-term memory (LSTM) network, were combined for cloud observation. The semantic segmentation algorithm was applied to identify and extract the cloud contour lines, determine the feature points, and calculate the cloud heights and the geographic locations of the cloud shadows. The LSTM algorithm was used to predict the trajectory and speed of the cloud movement, achieving accurate, real-time detection and tracking of the clouds and the sun. Based on the results of these methods, the shadow area of the cloud on the ground was calculated. The recurrent LSTM network was also used to predict the track and moving speed of the clouds according to the cloud centroid data of the cloud images at different times. The findings of this study can provide insights for establishing a low-cost intelligent monitoring and prediction system for cloud cover and power generation.

**Keywords:** solar energy; semantic segmentation algorithm; cloud moving prediction; cloud shadow; cloud speed

#### **1. Introduction**

Solar energy is a widely distributed and sustainable source of energy worldwide. Photovoltaic power generation technology can directly convert light energy into electrical energy through the photovoltaic effect, and it has the advantages of no pollution, safe use, and convenient maintenance. With continuous technical improvement and cost reduction, photovoltaic power generation has increased rapidly. In 2005, the global cumulative installed photovoltaic capacity exceeded 5 GW. According to "Snapshot of Global PV Markets 2020" [1] issued by the International Energy Agency, by the end of 2019, the global installed capacity exceeded 600 GW, and the average annual growth rate was 41%. In the past three years (2019–2022), the annual installed capacity has exceeded 100 GW. Figure 1 shows the global installed photovoltaic capacity over the past 10 years (2011–2019).

Large-scale photovoltaic projects require real-time monitoring of power quality and operating information while maintaining optimal scheduling. It is therefore essential to ensure accurate forecasting of generation capacity, especially short-term and real-time forecasting [2]. Consequently, dynamically adjusting the solar panels according to weather type, cloud occlusion, and the radiation angle of sunlight to maximize the power generated by photovoltaic modules has always been an important research topic in the field of photovoltaic power generation [3–5]. Changes in photovoltaic power generation are almost proportional to changes in radiation intensity, which are directly affected by cloud occlusion. Different weather types and cloud cover lead to considerable changes in the power generated by photovoltaic systems and to power grid fluctuations [6–9].

However, in the field of photovoltaic power generation, it has always been challenging to accurately predict the weather type and cloud movement [10,11].

**Citation:** Wang, S.; Sun, M.; Shen, Y. Semantic Segmentation Algorithm-Based Calculation of Cloud Shadow Trajectory and Cloud Speed. *Energies* **2022**, *15*, 8925. https://doi.org/10.3390/en15238925

Academic Editors: Luis Hernández-Callejo, Sergio Nesmachnow and Sara Gallardo Saavedra

Received: 24 October 2022 Accepted: 23 November 2022 Published: 25 November 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

Cloud cover is usually analyzed based on the shape, size, distribution, evolution, and height of the clouds. Cloud shapes change easily; therefore, clouds should be monitored continuously and in real time. Traditional artificial observation is based on subjective judgment and observation experience, so it cannot accurately predict cloud shading. With research advances, observation technology has developed significantly. Contemporary observation technologies for different space environments include satellite- or space-based equipment for atmospheric observations and ground-based equipment for near-Earth observations. In large ground-based photovoltaic power stations, power prediction is mainly based on short-term and real-time monitoring of the weather in the plant area. Given the area of the power station, the ground equipment and the monitoring method for cloud observation should be selected with regard to speed and economic cost. Table 1 lists the various cloud height observation methods.



Among the methods mentioned in Table 1, cloud meter and radar measurement are widely used; however, these methods have the disadvantages of high cost and inconsistent measurement results; moreover, it is challenging to obtain the edge profiles of clouds and predict the cloud shading range. Therefore, in this study, we developed and investigated a new low-cost prediction method that combined sky images and machine learning methods to obtain an accurate cloud height, extract the edge contours of clouds, measure the shade range (i.e., cloud cover), and predict and analyze the moving direction and speed of the clouds.

In this study, the weather type was detected and identified in real-time by using artificial intelligence algorithms and deep learning networks. In sky images, the existence of cloud shielding, range of shielding, and moving speed and track of clouds are determined to obtain insights for guiding the angle of photovoltaic panels and increasing power generation, providing a basis for the real-time prediction of photovoltaic power generation.

#### **2. Predicting Cloud Shadow Moving Trajectory and Speed**

#### *2.1. Method for Cloud Monitoring*

For cloud monitoring, multiple cloud cameras are distributed across a photovoltaic power station field to obtain aerial images that contain high-resolution spatiotemporal information about solar radiation. Various software can be used to process the information in the control room. Thereafter, the predicted value of solar energy is obtained. Several researchers have conducted related studies. In 2013, Tao et al. used a pair of CCD (charge coupled device) digital cameras to set a baseline length of 60 m to form a cloud base-height measurement system with binocular imaging [12–14]. The Harris corner detector was used to extract the corner features of the images, and then the relative disparity was obtained according to the matching feature points. The principle of photogrammetry was used to calculate the height of the cloud base. In 2013, Zhang et al. used industrial cameras and image processing technologies for cloud monitoring [15,16]. Cloud height was calculated based on the dual-camera measuring distance principle. The same feature points were obtained using CSIFT (color scale-invariant feature transform) and SIFT (scale-invariant feature transform) methods for object matching to detect the cloud speed. In 2015, Peng et al. used support vector machine classifiers to identify cloud clusters from multiple TSI images and evaluated the essential height and movement of each cloud cluster [17–21]. In 2018, the German DLR Solar Energy Research Institute developed the WobaS system [22–26]. This system comprises 2–4 cloud cameras that are used to capture sky images. These images are evaluated and the cloud speed and future distribution are calculated, enabling the successful prediction of solar radiation values in the next 15 min.

Typical cloud detection and measurement methods based on dual imaging systems use similar hardware; they often utilize two or more cameras (especially fisheye cameras with large viewing angles and TSI devices) and apply the similar-triangle principle to calculate the distance between the cloud and the cameras (the depth camera principle). These methods can thus achieve high-resolution images at low equipment cost. However, the software used in these methods varies, often relying on conventional machine learning algorithms or early deep learning algorithms [27–31], and the recognition accuracy and feature matching of clouds and the sun have been insufficient. The results showed significant calculation errors in cloud parameters such as cloud height, cloud area, cloud shadow, and cloud speed.

Deep learning algorithms are mainly divided into three categories:

A convolutional neural network (CNN) is commonly used for image data analysis and processing, such as image classification, target detection, and semantic segmentation (e.g., Mask R-CNN and YOLACT). A recurrent neural network (RNN), such as a long- and short-term memory (LSTM) network, is often used for text analysis or natural language processing. A generative adversarial network (GAN) is typically used for data generation or unsupervised learning applications, such as generating data similar to the original data; 3D-GAN is used to generate high-quality 3D objects [32–35]. In 2018, He et al. proposed the Mask R-CNN method, which extends Faster R-CNN with a mask prediction branch [36–38].

In this study, first, the edge contours of clouds were obtained, and then, the feature points on the edges were obtained using the PSPNet semantic segmentation algorithm [39] based on the images obtained from the CMOS imaging system. Furthermore, the LSTM algorithm was used to obtain the cloud parameters such as cloud height and moving track and speed of cloud shadow, which were then combined with geographic information to predict the cloud shadow occlusion on the ground.

#### *2.2. Cloud Edge Contour Extraction and Feature Point Recognition*

This study requires distinguishing the cloud and non-cloud parts of a picture, that is, classifying each pixel to form the boundary of a cloud. The sky and clouds have different colors; therefore, we considered using color features to classify each pixel to form a boundary. The texture features of the sky and clouds also differ; thus, texture features can likewise be used for classification. Texture features can be extracted using the gray-gradient co-occurrence matrix (GGCM).

Several types of neural networks can realize semantic segmentation; herein, the PSPNet network semantic segmentation algorithm (Figure 2) [11] was used for classification. In the PSPNet network, the netscope space pyramid pool structure was adopted, as shown in Figure 3.

**Figure 3.** Netscope space pyramid pool structure.

PSPNet is a modification of the basic ResNet architecture that uses dilated (atrous) convolution. The features are pooled and then processed at the same resolution throughout the encoder network (one-fourth of the original image input) until the spatial pooling module is reached. An auxiliary loss is applied in the middle layers of ResNet to aid the overall learning, and the global context is aggregated in the spatial pyramid pooling layer at the top of the modified ResNet encoder.

In this study, 3800 cloud pictures were annotated using the Labelme software and given as input to the network model for training. The training process is as follows:

Step 1: The weights and biases of the neurons in each layer are initialized.

Step 2: Forward propagation: the image is converted into an input matrix in RGB format, a linear combination is computed using the weights and biases of the neurons in each layer, and the activation function is then applied to the linear combination.

Step 3: The loss function is used to calculate the error between the forward-propagation output and the annotated images, and the weights and biases of the neurons in each layer are optimized using the backpropagation algorithm according to the error.

Step 4: Steps 2 and 3 are repeated iteratively until the error falls below a specified value, and the weights and biases of each layer of neurons are saved to obtain a well-trained model.
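The four steps above can be illustrated with a deliberately minimal one-layer classifier trained on toy RGB pixels. This is a sketch of the generic training loop only, not the actual PSPNet model or the annotated dataset; the data and hyperparameters are invented for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: initialize weights and biases.
W = rng.normal(scale=0.1, size=(3, 2))   # 3 inputs (RGB) -> 2 classes (cloud / sky)
b = np.zeros(2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy "annotated pixels": RGB values with one-hot labels (cloud vs. sky).
X = np.array([[0.9, 0.9, 0.9],           # bright pixel -> cloud (class 0)
              [0.2, 0.4, 0.8]])          # blue pixel   -> sky   (class 1)
Y = np.array([[1.0, 0.0],
              [0.0, 1.0]])

lr = 1.0
for _ in range(500):
    # Step 2: forward propagation (linear combination + activation).
    A = sigmoid(X @ W + b)
    # Step 3: squared-error gradient and backpropagation update.
    dZ = (A - Y) * A * (1 - A)
    W -= lr * X.T @ dZ
    b -= lr * dZ.sum(axis=0)

# Step 4: after iterating, the trained model classifies the training pixels.
pred = sigmoid(X @ W + b).argmax(axis=1)
```

A real segmentation network repeats exactly this loop, with the single layer replaced by the deep convolutional encoder–decoder and the squared error replaced by a per-pixel cross-entropy loss.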

The cloud images are given as input into the model, and the training results are shown in Figure 4.

**Figure 4.** Example results of PSPNet.

#### *2.3. Cloud Movement Trajectory and Velocity Recognition Based on LSTM Network*

Cloud movement can be considered a time series prediction problem. Contemporary deep learning methods mainly use RNNs. In this study, an LSTM network model was used, which comprises an input layer, a hidden layer, and an output layer. The internal structure of the hidden layer is shown in Figure 5.

**Figure 5.** Internal structure of the hidden layer.

In Figure 5, *t* − 1, *t*, and *t* + 1 are consecutive time steps, *X* is the input sample, *St* is the memory of the sample at time *t*, and *St* = *f*(*W* × *St*−1 + *U* × *Xt*), where *W* is the weight applied to the previous state, *U* is the weight applied to the input sample at the current time, and *V* is the weight applied to the output.

For general initialization, the start time is considered as *t* = 1, the initial state is *S*0 = 0, and *W*, *U*, and *V* are initialized randomly; then, Equation (1) is used for prediction.

$$\begin{aligned} h\_1 &= Ux\_1 + Ws\_0 \\ s\_1 &= f(h\_1) \\ o\_1 &= g(Vs\_1) \end{aligned} \tag{1}$$

where *f* and *g* are activation functions.

As time progresses, the state *s*1 is carried forward as the memory state at time *t*1, and these parameters then participate in the next prediction, as shown in Equation (2).

$$\begin{aligned} h\_2 &= Ux\_2 + Ws\_1 \\ s\_2 &= f(h\_2) \\ o\_2 &= g(Vs\_2) \end{aligned} \tag{2}$$

By iterating in this way, the output at any time *t* is obtained using Equation (3).

$$\begin{aligned} h\_t &= Ux\_t + Ws\_{t-1} \\ s\_t &= f(h\_t) \\ o\_t &= g(Vs\_t) \end{aligned} \tag{3}$$

LSTM updates the weight parameters *W*, *U*, and *V* using the loss function. For each time sequence, LSTM produces an error value *et*. The total error value *E* is calculated using Equation (4).

$$E = \sum\_{t} \varepsilon\_{t}$$

$$\nabla U = \frac{\partial E}{\partial U} = \sum\_{t} \frac{\partial \varepsilon\_{t}}{\partial U} \qquad \nabla V = \frac{\partial E}{\partial V} = \sum\_{t} \frac{\partial \varepsilon\_{t}}{\partial V} \qquad \nabla W = \frac{\partial E}{\partial W} = \sum\_{t} \frac{\partial \varepsilon\_{t}}{\partial W} \tag{4}$$
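Equations (1)–(3) describe a plain (vanilla) recurrent cell; the gated LSTM used in the paper adds input, forget, and output gates on top of this recursion. The recursion itself can be sketched as follows (an illustrative implementation, with tanh as *f* and the identity as *g*; the function name and toy data are assumptions):

```python
import numpy as np

def rnn_forward(x_seq, U, W, V, f=np.tanh, g=lambda z: z):
    """Vanilla RNN forward pass following Equations (1)-(3):
    h_t = U x_t + W s_{t-1},  s_t = f(h_t),  o_t = g(V s_t)."""
    s = np.zeros(W.shape[0])          # s_0 = 0
    outputs = []
    for x_t in x_seq:
        h = U @ x_t + W @ s           # linear combination of input and memory
        s = f(h)                      # updated memory state
        outputs.append(g(V @ s))      # output at time t
    return np.array(outputs)

# Feed a short sequence of 2D cloud-centroid offsets (toy data).
xs = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
out = rnn_forward(xs, U=np.eye(2), W=np.zeros((2, 2)), V=np.eye(2))
```

With *W* set to zero the memory term vanishes and each output reduces to tanh(*x*<sub>t</sub>), which makes the recursion easy to check by hand; training would then update *U*, *W*, and *V* using the gradients of Equation (4).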

#### *2.4. Calculating the Cloud Height and Shadow*

2.4.1. Method of Calculating the Cloud Height and Shadow

Two cameras with the same internal parameters were placed in parallel so that their optical axes were parallel to each other and the cameras faced vertically upward. The corresponding coordinate axes of the two cameras were collinear, and the two imaging planes were coplanar. The optical centers of the two cameras were at a fixed distance *d*. Figure 6 shows a schematic of binocular stereo vision.

**Figure 6.** Schematic of binocular stereo vision.

In the above camera arrangement, we assumed that the coordinate system of camera *C*1 was *O*1*X*1*Y*1*Z*1, the coordinate system of camera *C*2 was *O*2*X*2*Y*2*Z*2, the focal length of the two cameras was *f*, and the distance between the cameras was *d*. The coordinates of any space point *p* photographed by the two cameras at the same time are expressed as (*x*1, *y*1, *z*1) in the *C*1 coordinate system and (*x*2, *y*2, *z*2) in the *C*2 coordinate system. The image coordinates of point *p* in camera *C*1 are (*u*1, *v*1), and those in camera *C*2 are (*u*2, *v*2). By similar triangles, the ratio of the focal length *f* to the depth *z*1 equals the ratio of *u*1 to *x*1 and the ratio of *v*1 to *y*1; the same holds for *p* and camera *C*2. From this, the 3D depth is obtained using Equations (5) and (6).

$$\begin{cases} \frac{f}{z\_1} = \frac{u\_1}{x\_1} = \frac{v\_1}{y\_1} \\ \frac{f}{z\_2} = \frac{u\_2}{x\_2} = \frac{v\_2}{y\_2} \end{cases} \tag{5}$$

$$\begin{cases} X = x\_1 = x\_2 + d \\ \quad Y = y\_1 = y\_2 \\ \quad Z = z\_1 = z\_2 \end{cases} \tag{6}$$

These two equations are combined as follows.

$$\begin{aligned} \mathbf{x}\_1 - \mathbf{x}\_2 &= d\\ \mathbf{x}\_1 &= \frac{z\_1}{f}\mathbf{u}\_1 = \frac{z}{f}\mathbf{u}\_1\\ \mathbf{x}\_2 &= \frac{z\_2}{f}\mathbf{u}\_2 = \frac{z}{f}\mathbf{u}\_2 \end{aligned} \tag{7}$$

The binocular 3D vision method is used to reconstruct the 3D space points using Equations (8)–(12).

$$d = \frac{z}{f}(u\_1 - u\_2) \tag{8}$$

$$X = x\_1 = \frac{z}{f}u\_1 = \frac{u\_1}{u\_1 - u\_2}d \tag{9}$$

$$Y = y\_1 = \frac{z}{f}v\_1 = \frac{v\_1}{u\_1 - u\_2}d \tag{10}$$

$$Z = \frac{f}{u\_1 - u\_2}d \tag{11}$$

$$S\_{\mathrm{act}} = \left(\frac{Z}{f}\right)^2 S\_{\mathrm{img}} \tag{12}$$
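Under the assumptions above (image coordinates measured from the principal point, in the same units as the focal length), Equations (8)–(12) amount to a few lines of arithmetic. A sketch, with illustrative function names and numbers:

```python
def triangulate(u1, v1, u2, f, d):
    """Equations (9)-(11): 3D position of a matched point from its image
    coordinates in the two cameras, the focal length f, and the baseline d."""
    disparity = u1 - u2            # Equation (8): d = (z/f)(u1 - u2)
    X = u1 * d / disparity         # Equation (9)
    Y = v1 * d / disparity         # Equation (10)
    Z = f * d / disparity          # Equation (11): depth
    return X, Y, Z

def actual_area(Z, f, S_img):
    """Equation (12): scale an image-plane area to the real cloud area."""
    return (Z / f) ** 2 * S_img

# Example: a 7.35 m baseline (as in the flagpole experiment below) with
# f = 1000 px and a 10 px disparity puts the point at Z = 735 m.
X, Y, Z = triangulate(u1=100.0, v1=50.0, u2=90.0, f=1000.0, d=7.35)
```

Note the inverse relationship between disparity and depth: halving the disparity doubles the recovered height, which is why matched feature points must be located accurately for high clouds.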

#### 2.4.2. Calculation Method of Solar Irradiation Angle

To calculate the illumination angle of sunlight, first, the altitude and azimuth of the Sun are calculated. The relationship between the altitude angle and the latitude angle and the time angle is obtained from the geometric relationship of the Sun and the Earth using Equation (13):

$$
\sin h = \sin \varphi \sin \delta + \cos \varphi \cos \delta \cos \omega \tag{13}
$$

where *h* is the solar altitude angle, *φ* is the local latitude, *δ* is the declination angle, and *ω* is the hour angle.

The solar declination angle (*δ*) is the angle between the line connecting the centers of the Sun and the Earth and the equatorial plane. As the Earth moves around the Sun, the declination angle changes accordingly. The declination angle is representative of the season, fluctuates between −23◦26′ and +23◦26′, and repeats this cycle yearly. The approximate declination angle is calculated using Equation (14):

$$\delta = 23.45 \times \sin\left(360 \times \frac{284 + n}{365}\right) \tag{14}$$

where *n* represents the date serial number (based on 1 year), and it is in the range of 1–365. For a leap year, the value of n will be 1–366, and the denominator 365 will be changed to 366.

The azimuth is represented by *γ*, and it can be considered the angle between the shadow cast on the ground by a vertical line erected under the Sun and the local meridian. *γ* is set to 0◦ due north of the target and increases clockwise, varying in the range 0–360◦. The measurements were carried out in a clockwise direction: the starting direction of the solar azimuth was set to the north of the reference object, the ending direction was the incident direction of the sunlight, and the required angle was measured between them in a clockwise direction.

The relationship between the azimuth, altitude angle, declination angle, dimension, and time angle is expressed using the following equations.

$$
\sin \gamma = \frac{\cos \delta \sin \omega}{\cos h} \tag{15}
$$

$$\cos \gamma = \frac{\sin h \sin \varphi - \sin \delta}{\cos h \cos \varphi} \tag{16}$$

The solar time angle *ω* in Equations (13) and (15) can be obtained using the following equations.

$$
\omega = 15(ST - 12) \tag{17}
$$

$$ST = LT + Z \tag{18}$$

where *ST* is true solar time, *LT* is the local time, and *Z* is the time zone; the 24 h format is used to calculate time.

The projection area of a cloud on the ground is predicted by calculating the cloud height, the edge contour of cloud, and the illumination angle of sunlight.
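Equations (13), (14), (17), and (18) can be combined into a short solar-position routine. The sketch below is illustrative (function names are assumptions) and omits refinements such as the equation-of-time correction and atmospheric refraction:

```python
import math

def declination_deg(n):
    """Equation (14): approximate solar declination for day-of-year n."""
    return 23.45 * math.sin(math.radians(360.0 * (284 + n) / 365.0))

def hour_angle_deg(local_time_h, zone_correction_h=0.0):
    """Equations (17)-(18): hour angle from solar time ST = LT + Z (hours)."""
    return 15.0 * (local_time_h + zone_correction_h - 12.0)

def solar_altitude_deg(latitude_deg, delta_deg, omega_deg):
    """Equation (13): sin h = sin(lat)sin(delta) + cos(lat)cos(delta)cos(omega)."""
    lat, delta, omega = (math.radians(v) for v in
                         (latitude_deg, delta_deg, omega_deg))
    return math.degrees(math.asin(math.sin(lat) * math.sin(delta)
                                  + math.cos(lat) * math.cos(delta)
                                  * math.cos(omega)))

# Sanity check: at solar noon with delta = 0 (equinox), the altitude
# equals 90 degrees minus the latitude.
h_noon = solar_altitude_deg(37.530085, 0.0, hour_angle_deg(12.0))
```

Given the altitude and azimuth from these relations, the cloud shadow is obtained by projecting the extracted cloud contour from its computed height along the incident sunlight direction.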

#### **3. Results and Discussion**

#### *3.1. Verification Experiment and Results of Object Shadow Casting*

Real-time measurement of clouds and cloud shadows is challenging; therefore, we used fixed objects such as a flagpole to replace clouds for the experiments. A local coordinate system was established with the flagpole as the origin. First, the relative position of the flagpole and the camera was estimated, and then the relative position of the flagpole and the shadow was determined. The estimated results were compared with the actual measurement to verify the effectiveness of the cloud shadow position calculation in the local coordinate system. There was only a rotation and translation transformation relationship between the local coordinate system and the world coordinate system; therefore, the effectiveness in the local coordinate system is equal to that in the world coordinate system. The experimental method is presented in Figure 7.

**Figure 7.** Relative position model of the flagpole and camera.

The experimental steps are as follows:

Step 1. Two adjustable leveling platforms are set up under the flagpole; a level ruler is placed on each platform, and the platforms are adjusted until horizontal.

Step 2. The cameras are placed on a horizontal platform in parallel, and they capture images in a vertically upward position.

Step 3. The distance between the two platforms is measured and recorded.

Step 4. The length and azimuth of the shadow is measured and recorded.

Step 5. The azimuth and distance of the flagpole relative to the two cameras are measured and recorded.

Step 6. The length and azimuth of the shadow and the azimuth and distance of the flagpole relative to the two cameras are calculated.

Step 7. The local coordinate system is built with the flagpole as the origin, and the calculation results and measurement results are expressed in the local coordinate system for comparison.

The relative position between the flagpole and the cameras can be determined from the distance between the flagpole and each camera and from the angles between the flagpole–camera lines and the camera baseline. First, the angles between the line from the flagpole to each camera and the line connecting the two cameras, that is, the angles *α* and *β*, respectively (Figure 8), are calculated from the images captured by cameras A and B. Because camera correction was carried out, the line connecting the two observation points can be considered the horizontal dividing line passing through the center point of the photos taken by cameras A and B. O is the center point of the photo, and P is the imaging point of the flagpole vertex in the photo; it was assumed that the pixel coordinates of points O and P were (*x*0, *y*0) and (*x*, *y*), respectively. Then,

$$\sin \alpha = \frac{|y - y\_0|}{\sqrt{\left(\chi - \chi\_0\right)^2 + \left(y - y\_0\right)^2}} \tag{19}$$

$$\alpha = \arcsin(\sin \alpha) \tag{20}$$

The *α* obtained using the above equation agrees with that shown in Figure 7. *β* was obtained in a similar manner.

Using the obtained values of *α* and *β*, the distance between the flagpole and the camera was calculated, as shown in Figure 7, *DE*⊥*AB*, *DE* = *AB*/(cot*α* + cot*β*). Then, *AD* = *DE*/tan*α*, where *AD* is the horizontal distance between observation point A and flagpole vertex *C*. Similarly, the distance *BD* between the flagpole and observation point *B* could be calculated. Thus, we determined the relative positions of the flagpole and the camera.

Because the Sun is sufficiently far from the Earth, the sunlight reaching the Earth can be considered parallel. Therefore, for the same object, the length of its shadow is determined by the solar altitude angle: a larger solar altitude angle implies a shorter shadow, and a smaller solar altitude angle a longer one. As shown in Figure 8, when the cloud height *H* and the solar altitude angle *α* are known, the shadow length is *d* = *H*/tan*α*.

The direction in which the shadow extends is opposite to the direction of the Sun; thus, the direction of the shadow can be calculated from the Sun's azimuth, as shown in Figure 9, where *α* is the azimuth of the Sun with 0◦ due north and *β* is the shadow azimuth with 0◦ due north: *β* = *α* − 180◦.
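The shadow projection follows directly from the two relations *d* = *H*/tan *α* and *β* = *α* − 180◦. A minimal sketch, exercised with the flagpole values reported later in this section (function names are assumptions):

```python
import math

def shadow_length_m(height_m, altitude_deg):
    """d = H / tan(alpha): shadow length from object height and solar altitude."""
    return height_m / math.tan(math.radians(altitude_deg))

def shadow_azimuth_deg(sun_azimuth_deg):
    """beta = alpha - 180 deg (mod 360): the shadow points away from the Sun."""
    return (sun_azimuth_deg - 180.0) % 360.0

# Flagpole experiment values: height 16.3039 m, solar altitude 27.2826 deg,
# solar azimuth 275.2011 deg.
d = shadow_length_m(16.3039, 27.2826)       # about 31.61 m
beta = shadow_azimuth_deg(275.2011)         # 95.2011 deg
```

The modulo keeps the result within 0–360◦ for morning sun positions, where *α* − 180◦ would otherwise be negative.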

**Figure 8.** Demonstration of shadow length.

**Figure 9.** Demonstration of the shadow orientation.

The experimental contents and steps are as follows.

Two platforms were placed under the flagpole and adjusted to horizontal using a level ruler. The cameras were placed on the horizontal platforms with the lenses facing vertically upward to capture the top of the flagpole. The distance between the two horizontal platforms, as well as the length and extension direction of the flagpole's shadow on the ground, were measured and recorded. Any distortion of the captured pictures was corrected, and the height *H* of the flagpole was calculated. The solar altitude angle *α* and azimuth *β* at that moment were calculated according to the longitude and latitude of the shooting location and the shooting time. The shadow length and extension direction were then calculated. A local coordinate system was built with the flagpole as the origin, and the calculated and measured results were expressed in this coordinate system for comparison.

The pictures taken by the left and right cameras are presented in Figure 10.

**Figure 10.** Pictures taken by the (**left**) and (**right**) cameras.

The pixel coordinates of the center point of the picture were (2144,1424); in the pictures taken by the left and right cameras, the pixels at the top of the middle flagpole were (2913,1849) and (1465,1797), respectively. Camera distance (baseline) was 7.35 m, the measured shadow length was 31.2 m, and the measured shadow orientation was 94◦ (0◦ due north). The angle between the connecting line of flagpole and the left camera and the connecting line of the two cameras was 28◦, and the distance from the flagpole to the left camera was 3.9 m. The angle between the connecting line of the flagpole and the right camera and the connecting line of the two cameras was 30◦, and the distance from the flagpole to the right camera was 3.8 m.

According to the principle of calculating the relative position between the flagpole and the cameras, the angle between the line connecting the flagpole and the left camera and the line connecting the two cameras was calculated as 28.93°, with an error rate of 3.32%. The calculated distance from the left camera to the flagpole was 3.66 m, with an error rate of 6.15%. The angle between the line connecting the flagpole and the right camera and the line connecting the two cameras was calculated as 28.78°, with an error rate of 4.07%. The calculated distance from the right camera to the flagpole was 3.69 m, with an error rate of 2.89%.

The calculated flagpole height was 16.3039 m. The longitude and latitude of the flagpole were 122.082920° E and 37.530085° N. The shadow was measured at 16:40:00 on 23 July 2021. The calculated solar altitude angle and azimuth angle were 27.2826° and 275.2011°, respectively.

According to the shadow length formula *d* = *H*/tan *α*, the calculated shadow length was 31.6117 m, an error rate of ~1.32%.
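The shadow-length and shadow-azimuth calculations above can be sketched in a few lines (a minimal illustration using the values reported in this experiment; the function names are our own):

```python
import math

def shadow_length(pole_height_m, solar_altitude_deg):
    """Shadow length d = H / tan(alpha), where alpha is the solar altitude angle."""
    return pole_height_m / math.tan(math.radians(solar_altitude_deg))

def shadow_azimuth(solar_azimuth_deg):
    """Shadow azimuth = Sun azimuth - 180 deg (0 deg due north)."""
    return (solar_azimuth_deg - 180.0) % 360.0

# Values reported in the experiment: H = 16.3039 m, altitude = 27.2826 deg,
# Sun azimuth = 275.2011 deg.
d = shadow_length(16.3039, 27.2826)   # ~31.61 m (measured: 31.2 m)
beta = shadow_azimuth(275.2011)       # 95.2011 deg (measured: 94 deg)
```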

Because the shadow azimuth equals the Sun azimuth minus 180°, the calculated shadow azimuth was 95.2011°. Three groups of experiments were carried out, and the results are shown in Table 2.


**Table 2.** Experimental results.

P1 and P2—pixel coordinates of the top of the flag pole in the images taken by the left and right cameras, respectively; BL—length of the baseline, that is, distance between the two cameras (m); LS—measured shadow length of the flagpole (m); SA—azimuth angle of the flagpole shadow; CLS—flagpole shadow length predicted using the proposed method; CSA—flagpole shadow azimuth predicted using the proposed method; ER—error ratio of the predicted shadow length.

After many experiments, the experimental data of the solar altitude and solar azimuth were compared with the reference data, and the average errors were 0.0568° and 0.0629°, respectively. Therefore, we believe that the experimental method for calculating the solar altitude and solar azimuth is reliable.

#### *3.2. Verification Experiment and Results of Cloud Shadow Moving Track and Speed*

For continuously moving clouds, the LSTM network is used to predict the moving direction and speed of clouds from a set of continuous cloud images. Considering the influence of the changing solar orientation on the cloud shadow position, the cloud shadow cannot be predicted directly. The proposed method first predicts the cloud position and then calculates the cloud shadow position by combining it with the cloud height and solar orientation information. Because the cloud height changes over time, the training data must also capture changes in cloud height. Using multiple groups of continuous cloud images, the cloud centroid and cloud height were obtained as the training set for the LSTM network. Through its gate mechanism, the LSTM network introduces an additive update path into the network, which mitigates the vanishing gradient problem.

After training the neural network, some data are used for prediction. Using the cloud centroid and cloud height calculated from continuous pictures as the input, the network predicts the cloud position and cloud height at a certain time in the future; the shadow position is then calculated by combining this prediction with the Sun orientation information. The predicted position of the cloud is the position of its centroid. Because changes in cloud shape are irregular, the shape in the last input picture is used as the approximate shape of the prediction result.
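The projection of a predicted cloud position to a ground shadow position can be sketched as follows (a simplified flat-ground model under our own assumptions, with east as +x, north as +y, and azimuths measured clockwise from north; the function name is our own):

```python
import math

def cloud_shadow_position(cloud_x, cloud_y, cloud_height_m,
                          solar_altitude_deg, solar_azimuth_deg):
    """Shift the cloud's ground projection along the shadow azimuth.

    The horizontal offset is h / tan(altitude); the shadow azimuth is the
    Sun azimuth minus 180 deg (0 deg due north).
    """
    offset = cloud_height_m / math.tan(math.radians(solar_altitude_deg))
    beta = math.radians((solar_azimuth_deg - 180.0) % 360.0)
    return (cloud_x + offset * math.sin(beta),
            cloud_y + offset * math.cos(beta))

# Sun due south at 45 deg altitude: the shadow of a cloud 1000 m up
# falls 1000 m to the north of the cloud's ground projection.
x, y = cloud_shadow_position(0.0, 0.0, 1000.0, 45.0, 180.0)
```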

Next, the prediction results were verified. For example, when the predicted position of the cloud was for 5 min later, then the cloud was photographed 5 min later to calculate the actual position and compare it with the predicted position.

From the captured photos, it was observed that the cloud contour changed with time; thus, contour information alone cannot adequately characterize a cloud (the cloud contours in pictures taken at different times differ). Therefore, this study used the centroid of the cloud contour to identify the location characteristics of the cloud.

To calculate the center of mass, n contour points with masses m1, m2, ..., mn are set on the x–O–y coordinate plane, with coordinates (x1, y1), (x2, y2), ..., (xn, yn), respectively; these n particles comprise a particle system.

Furthermore, x̄ = My/M = (∑i=1..n mixi)/(∑i=1..n mi) and ȳ = Mx/M = (∑i=1..n miyi)/(∑i=1..n mi), where M is the total mass of the contour points, and Mx and My are the static moments of the particle system about the *x*- and *y*-axes, respectively; the point (x̄, ȳ) is then the required center of mass.

The centroid of the cloud was obtained by calling the centroid-computation API of the OpenCV library in Python, giving the coordinate data of the cloud contour as the input to the proposed model.
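The mass-weighted centroid formula can be sketched as below (a NumPy illustration with our own function name; in the paper the equivalent computation is done through OpenCV's image-moments API, e.g. `cv2.moments`, where the centroid is `(m10/m00, m01/m00)`):

```python
import numpy as np

def contour_centroid(points, masses=None):
    """Center of mass (x̄, ȳ) of contour points.

    x̄ = Σ m_i x_i / Σ m_i and ȳ = Σ m_i y_i / Σ m_i; with equal
    masses this reduces to the arithmetic mean of the coordinates.
    """
    pts = np.asarray(points, dtype=float)
    m = np.ones(len(pts)) if masses is None else np.asarray(masses, dtype=float)
    cx, cy = m @ pts / m.sum()
    return (float(cx), float(cy))

# Corners of a 2x2 square -> centroid at its center.
print(contour_centroid([(0, 0), (2, 0), (2, 2), (0, 2)]))  # (1.0, 1.0)
```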

To obtain the centroid data of the cloud, the cloud edge contour was first recognized from the cloud image by the UNET network, and the centroid position of the cloud was then calculated from the edge contour, as shown in Figure 11.

**Figure 11.** Method of calculating the centroid position.

From the images captured at 10 s intervals, we manually selected 11 images with whole clouds. The continuous photos were processed with the centroid acquisition method in succession; each data record contains 11 triples of information comprising cloud centroid longitude, cloud centroid latitude, and time, as shown in Figure 12.


**Figure 12.** Cloud position data.

Figure 13 shows 1140 such records; the first 1040 were used as the training dataset and the remaining 100 as the test dataset. Each record was divided into two parts: the centroid longitude and latitude from time 1 to time 10 were used as the training input, and the centroid longitude and latitude from time 2 to time 11 were used as the verification targets. Similarly, for each record in the test dataset, the centroid longitude and latitude from time 1 to time 10 were used as the input, and the model output was the predicted centroid longitude and latitude at the next time step.
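The input/target split described above can be sketched as a sliding-window construction (a NumPy illustration under our reading of the setup; the names are our own):

```python
import numpy as np

def make_lstm_dataset(records):
    """Split records of 11 (lon, lat) centroid steps into input/target pairs.

    Inputs are steps 1..10 and targets are steps 2..11, i.e. the network
    learns to predict the centroid one time step ahead.
    """
    records = np.asarray(records, dtype=float)   # shape (N, 11, 2)
    x = records[:, :-1, :]                       # steps 1..10
    y = records[:, 1:, :]                        # steps 2..11
    return x, y

# 1140 records: the first 1040 for training, the remaining 100 for testing.
data = np.zeros((1140, 11, 2))
train_x, train_y = make_lstm_dataset(data[:1040])
test_x, test_y = make_lstm_dataset(data[1040:])
print(train_x.shape, test_y.shape)  # (1040, 10, 2) (100, 10, 2)
```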

**Figure 13.** Comparison of the prediction of different cloud centroids at two times and corresponding real longitude and latitude.

The locus diagram of centroid points demonstrates the real and predicted centroid longitude and latitude of the cloud, where t0 is time 1, t1 is time 2, and so on. Figure 13 compares the predicted results with the real longitude and latitude obtained by tracking the centroids of different clouds at two different times; the red and blue dots are the true and predicted centroid longitude and latitude, respectively.

After randomly selecting 100 centroid points, these were normalized and the root mean square error was calculated, as shown in Figure 14. The blue points in the figure are the root mean square errors of the predicted values. The root mean square error is the square root of the ratio of the sum of squared prediction errors to the number of predictions; a smaller value implies a more accurate prediction.
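The error metric described here can be sketched as follows (a NumPy illustration; the function name is our own):

```python
import numpy as np

def rmse(predicted, actual):
    """Root mean square error: sqrt(mean((pred - actual)^2))."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))

print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))  # sqrt(4/3) ≈ 1.1547
```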

**Figure 14.** Mean square error of 100 centroids randomly selected; the blue points in the figure are the root mean square error of the predicted value.

After verification with multiple groups of data, the CNN-LSTM network model was shown to predict the moving trajectory of the cloud accurately.

#### **4. Conclusions**

This study presents a new low-cost, easy-to-implement method for predicting the influence of clouds on solar radiation.

This method can accurately predict the trajectory of a cloud and can be used at solar power stations to predict the location of cloud shadows tens of minutes in advance, enabling the solar panels to be adjusted to a suitable angle beforehand. Compared with other implementations, it reduces cost and increases the energy yield of solar panels. This research contributes to related work on solar radiation and energy generation.

The proposed method also has some limitations. The sky camera's field of view is limited; thus, the calculation and prediction of clouds in the sky are affected to some extent and are possible only within a limited range. Considering this limitation, in an actual deployment the cloud prediction range can be increased by deploying sky cameras at multiple points around the photovoltaic power station.

**Author Contributions:** Conceptualization, S.W.; Methodology, M.S. and Y.S.; Validation, S.W. and M.S.; Formal analysis, S.W.; Investigation, S.W. and M.S.; Supervision, Y.S.; Writing—original draft preparation, S.W.; Writing—review and editing, Y.S. and M.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


### *Article* **Maritime Autonomous Surface Ships: Problems and Challenges Facing the Regulatory Process**

**Mohamad Issa 1,2,\*, Adrian Ilinca 2, Hussein Ibrahim 3 and Patrick Rizk 2**


**Abstract:** Technological innovation constantly transforms and redefines the human element's position inside complex socio-technical systems. Autonomous operations are in various phases of development and practical deployment across several transport domains, with marine operations still in their infancy. This article discusses current trends in developing autonomous vessels and some of the most recent initiatives worldwide. It also investigates the individual and combined effects of maritime autonomous surface ships (MASS) on regulations, technology, and sectors in reaction to the new marine paradigm change. Other essential topics, such as safety, security, jobs, training, and legal and ethical difficulties, are also considered to develop a solution for efficient, dependable, safe, and sustainable shipping in the near future. Finally, it is advised that holistic approaches to building the technology and regulatory framework be used and that communication and cooperation among various stakeholders based on mutual understanding are essential for the MASS to arrive in the maritime industry successfully.

**Keywords:** autonomous shipping; MASS; IMO; maritime law; Maritime Safety Committee; advanced sensor module; shore control center; cyber security threats

#### **1. Introduction**

With fast-increasing technology, a new paradigm shift is occurring, considering alternative marine fuels that promise safer, greener, and more efficient ships than ever before in response to stringent international legislative requirements. The first change occurred during the First Industrial Revolution in the 1800s when mechanical power was introduced, and vessels began to be driven by steam-powered coal engines. The Second Industrial Revolution began in the early 1900s when the advent of diesel engines improved the efficiency and reliability of ships by using oil as a new fuel. The internet–digital revolution, representing the Third Industrial Revolution, introduced computerized ship control in the 1970s. With the introduction of gas as a fuel, such as liquefied natural gas (LNG) [1–5], we are taking a step closer to the new paradigm linked with cyber-physical systems and autonomy as part of "Shipping 4.0 [6–8]".

Porathe et al. [9] present four reasons why autonomous shipping is seen as a feasible choice: (1) the efforts to reduce transportation costs; (2) the need for a better onboard working environment for crews and the prevention of future seafarer shortages; (3) the need to reduce emissions on a worldwide scale; and (4) the desire to improve shipping safety. According to a 2010 report submitted to the International Maritime Organization (IMO) by the Baltic and International Maritime Council (BIMCO) and the International Shipping Federation (ISF), the shipping industry is expected to face tightening labor markets, with recurrent shortages of ship officers [10], due to hazardous working conditions and extended periods away from land. Under the fiercely competitive economy of scale, the shipping industry has seen downward pressure on freight rates and excess capacity. Reduced ship pollution and emissions and improved ship safety are more important than ever with the emergence of low- or zero-carbon alternative fuels [11].

**Citation:** Issa, M.; Ilinca, A.; Ibrahim, H.; Rizk, P. Maritime Autonomous Surface Ships: Problems and Challenges Facing the Regulatory Process. *Sustainability* **2022**, *14*, 15630. https://doi.org/10.3390/su142315630

Academic Editors: Luis Hernández-Callejo, Sergio Nesmachnow and Sara Gallardo Saavedra

Received: 11 October 2022; Accepted: 22 November 2022; Published: 24 November 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Under these conditions, the launch of maritime autonomous surface ships (MASS) will be a watershed moment that will either disrupt or precipitate a paradigm shift in the shipping sector and maritime transport system. Therefore, communication and coordination among stakeholders, particularly those involved in the maritime and port industries, would be required for the safe, effective, and efficient adoption and operation of MASS. As a result, critical concerns related to autonomous shipping and their impact on policy, technology, and industry should be investigated together with their interaction for a successful introduction and smooth settlement of MASS and associated infrastructures in the marine industry.

On the regulatory side, the IMO agreed to conduct a regulatory scoping exercise (RSE) to assess the safe, secure, and environmentally sound operation of MASS [12]. However, the RSE would be a complicated issue because it would touch a few areas, including safety, security, contacts with ports, pilotage in the event of an incident, and the marine environment. In addition, international maritime conventions, such as the International Convention for the Safety of Life at Sea (SOLAS), the International Regulations for the Prevention of Collisions at Sea (COLREG), and the Standards for Training and Certification of Watchkeepers (STCW), apply to MASS [13]. Therefore, IMO Member States will be asked to review the scope of their domestic laws considering the RSE.

Technological development will improve ships' control capabilities, communication, and interfaces using the newest information and communications technology (ICT) systems. As a result, they will soon be operated by remote land-based or offshore services [14]. Unmanned watercraft have already been deployed for military, aeronautical, and research purposes. Deep-sea exploration also uses submersible unmanned vehicles, such as autonomous underwater vehicles (AUV) and remotely operated vehicles (ROV), which are still being developed. However, regarding safety, efficiency, and environmental protection, the technology that replaces manning must outperform the personnel [15].

On the industrial side, autonomous vehicles are already being developed in various means of transportation, such as airplanes, trains, and automobiles. Therefore, MASS is expected to significantly impact shipbuilding, equipment, and devices, as well as shipping and port infrastructures in the maritime industry. Furthermore, autonomy, automation, unmanned operation, big data, enterprise-grade connectivity, and analytics will steadily grow in the maritime industry [16]. As a result, good communication and coordination with essential stakeholders, particularly the shipping, shipbuilding, and port industries, are required to implement MASS properly.

To the best of the authors' knowledge, several review studies have discussed, briefly or in depth, the regulatory challenges concerning MASS. The authors' discussions and interviews with maritime experts such as naval officers, senior marine engineers, and naval architects inspired this essay. In this paper, the authors have chosen to focus on all the effects that MASS may have on the maritime industry at the human level (such as training and education), the legislative level (definition of transparent laws and regulations), and the technological level (such as navigation security). The paper presents some reflections on the obstacles and issues that need to be clarified soon. It does not deal with data based on experiments, calculations, or quantified scenarios. The primary motivation of the authors is to present the magnitude of these challenges and the work that remains to be done to achieve safe autonomous surface ship navigation worldwide.

The structure of the present paper is as follows. We first introduce the latest projects on global trends in building autonomous vessels. Second, the impact of MASS on regulations, technology, and industries is explored, together with their relationships, to uncover both previous and future efforts to prepare for the new maritime paradigm change. Finally, other essential problems, e.g., safety, security, jobs, training, ethics, liability, and insurance, are explored to obtain greater insight into future shipping that is efficient, reliable, safe, and sustainable.

#### **2. Global Autonomous Vessel Developments**

The shipping industry has recently faced changes due to the Fourth Industrial Revolution. One such transition is the AI (artificial intelligence), robots, IoT (internet of things), and autonomous vehicles paradigm shift in technological progress [17]. Big data and the achievements of the Third Industrial Revolution have been integrated with AI and IoT technology to enable smart shipping. Autonomous ships, e-Navigation, and smart ports are further examples of marine transportation advancements.

Many companies, including Rolls Royce, DNV, the Norwegian University of Science and Technology (NTNU), and Norway's Kongsberg, have announced ambitious intentions to create all-electric and autonomous container ships by 2020, as shown in Table 1. Other groups worldwide are working on similar, if not competing, concepts and systems to enable unmanned operations and infrastructure initiatives, such as autonomous ports and high-speed communications.

**Table 1.** Maritime full-electric and autonomous vessels projects since 2012.


In 2012, the European Commission-funded project Maritime Unmanned Navigation through Intelligence in Networks (MUNIN) began looking into unmanned ships' feasibility in various areas, including technical maturity, economic benefits, social impact, and safety during deep-sea voyages [18,19]. Following the MUNIN project, DNV and NTNU launched the Revolt as a specific research project to build an autonomous, zero-emission, and short-sea vessel to help manage traffic congestion in urban regions on the EU's road network [20,21].

The Advanced Autonomous Waterborne Applications Initiative (AAWA), founded by Rolls-Royce in 2015, is another notable initiative related to autonomous vessels. This project brought together a diverse group of stakeholders, including universities, ship designers, equipment manufacturers, and classification societies, to examine the economic, social, legal, regulatory, and technological barriers that must be overcome for autonomous ships to become a reality. Its goal is to provide preliminary designs for the future generation of innovative ship solutions, complete with technical specifications [22].

The Yara Birkeland is one of the most recent autonomous ship initiatives. Yara and Kongsberg built the world's first fully electric container feeder vessel. Eliminating up to 40,000 truck journeys in densely populated urban areas is estimated to significantly cut NOX and CO2 emissions while enhancing road safety and alleviating traffic congestion [23,24].

Last but not least, the e5 project is a Japanese consortium dedicated to developing renewable energy-powered commercial ships. The name "e5" refers to the partnership's five "focus points": electrification, environment, evolution, efficiency, and economics. The e5 Tanker claims to be the world's first entirely electric oil tanker, with a 3.5 MWh battery that can "operate non-stop for 10 h on a half-capacity battery", according to the company [25]. In addition, the ship will have a high level of automation [26] and will be charged using wind and solar energy to cut emissions further [27].

#### **3. Problems and Challenges Facing the Regulatory Process**

The fact that all technical shipping rules relating to the safety of navigation, environmental protection, and training/watchkeeping standards were designed with the idea that humans would perform certain functions must be reviewed in the context of autonomous vessels. A few instances are sufficient to demonstrate the flaws in the current regulatory structure if applied to MASS operations without modification. Chapter V, Regulation 24 of the International Convention for the Safety of Life at Sea (SOLAS) 1974 requires that manual control of the ship's steering be established promptly in dangerous navigational situations; an autonomous ship without a crew would be unable to comply with this requirement [28]. Regulations that require human judgment are an even more complex matter. It is unclear how such rules would apply to vessels designed to make navigational decisions using algorithms based on data collected from their sensors. Rule 2 of the International Regulations for Preventing Collisions at Sea (COLREGs) 1972, for example, states that nothing shall exonerate any vessel, or the owner, master, or crew thereof, from the consequences of any neglect of any precaution which may be required by the ordinary practice of seamen. Those developing the new technology often remind us that deep-learning-based programs are flexible and adapt to the new patterns they are programmed to identify, meaning that a program could learn situational awareness and the subjective aspects of the COLREGs. Even so, this poses a significant challenge to those who seek to regulate the matter.

Additionally, there are severe risks in today's fully automated ships, such as sensor defects and software errors. For example, the aviation incidents involving the Boeing 737 MAX in 2018 and 2019 are cases where the airplane's angle-of-attack sensors gave the flight control system inaccurate information; the airplane crashed because it was difficult to bypass the mechanism manually. Under the current blame system, harm brought on by improper algorithms may therefore be categorized as both a product defect (and hence a technical failure) and negligence (based on the root issue).

Although there is no definitive answer at this time regarding how, if at all, regulations such as the COLREGs will be modified for MASS application, this is a crucial topic of discussion in the maritime sector. Based on the information available at the time this article was written, the implementation of the COLREGs with MASS faces numerous obstacles. For most of the rules, participants preferred the original COLREGs; however, some rules were preferred with modest modifications. Most of these findings are consistent with the conclusions of the IMO's regulatory scoping exercise. Adding or refining definitions for terminology, e.g., "master and crew", "the ordinary practice of seamen", "crew ashore", and "lookout", was among the most popular revisions.

Additionally, an all-round colored MASS-identifying light was selected, along with the addition of the different traffic separation schemes required for MASS. Since almost 75% of participants preferred more than one amendment over the original regulation, it was clear that participants were amenable to some adjustment. Moreover, those with more practice using the COLREGs showed a modest propensity toward selecting the revised rules compared to participants with less practice. To better train seafarers for the future as the maritime sector adopts autonomy, it is crucial that MASS and its impact on the COLREGs and other IMO instruments be investigated further without delay.

#### *3.1. Impact on Regulation*

Despite the rapid advancement of science and technology in the marine industry, autonomous vessels must unquestionably adhere to international standards in order to operate securely between nations and even in seabed areas beyond national jurisdiction. Although some parts of manned-vessel regulation, such as certain clauses of the International Safety Management (ISM) Code, may be compatible with unmanned vessels, unique international rules are needed to account for the characteristics of unmanned vessels. A request for an RSE was recently submitted to the Maritime Safety Committee (MSC) and was incorporated into the MSC work plan at MSC 98 [29] to ensure MASS safety, security, and environmental soundness. The RSE for MASS aims to determine how the degree of autonomy may affect the existing regulatory frameworks addressing MASS operations. The degrees of autonomy were divided at MSC 100 [30] into four phases to help with the RSE process (see Figure 1). One should emphasize that MASS can operate at multiple levels of autonomy during a single voyage.

**Figure 1.** MASS's level of automation, according to IMO.

All conventions seem obsolete, and new regulatory standards will be needed. It is recommended that all IMO committees and subcommittees work together using the goal-based approach. The MSC recently authorized a revision of generic principles for producing IMO goal-based standards (GBS) to set safety goals and functional requirements while considering the whole MASS lifetime [31]. Risk assessment and software quality assurance (SQA) will be necessary, in addition to the GBS, for MASS's safety in both the real and virtual worlds.

Autonomous shipping is a new technology requiring an international regulatory framework, or harmonization of the existing regulatory rules, across all states' territorial waters. The matter is further complicated because the existing rules and regulations are embodied in numerous international agreements concluded over the last century or so, in some cases after years of negotiations by the international community. Until an international consensus on regulating this new technology is reached, it is doubtful that autonomous ships will operate in international waters beyond any state's territorial waters [32].

#### *3.2. Impact on Technology*

Demonstrating that autonomous systems are at least as safe as piloted ship systems and providing the ship shore control center (SSCC) with sufficient situation awareness are among the most challenging issues in building the technology for MASS. The ship systems should be remotely monitored and managed by the operators of the SSCC, who obtain essential information via satellite at short intervals in case of emergencies such as rescue attempts or evasive maneuvers. If the autonomous system fails, the SSCC should include a smart alarm system and the capacity to transition to manual control mode. Figure 2 depicts the MASS and SSCC systems, their essential equipment and operations, and the relationship formed by satellite data.

**Figure 2.** The relationship between MASS and the ship-shore-control-center (SSCC).

The sensors' dependability must be ensured through design approval, remote and on-premises testing, and monthly inspections, particularly for sensors that support monitoring and decisions from the SSCC. Sensor failures pose a significant risk to the system's safety. Therefore, the most safety-significant sensors should incorporate redundancy, diagnostics, and prognosis, in both homogeneous and heterogeneous forms. It is worth noting that heterogeneous redundancy is more dependable than the alternatives because it can eliminate sensor-type dependency [33]. A more extensive elicitation of experts could also be advantageous in overcoming concerns connected to threats affecting autonomous ships' safe and efficient operations, given the lack of failure data and of easy access to such data.

The Relevance of Cyber Risks Management for Shipping Operations

Based on their complexity, transportation systems may have the following four levels of cyber systems:

The first is the perceptual layer, which uses components such as wireless sensors and GPS to connect the cyber and physical worlds. The second is the network layer, which is used to convey data (e.g., satellite networks and the mobile internet communication network). The third is the support layer, which includes cloud computing and intelligent computing, and the fourth is the application layer, which connects people and the physical world to cyber systems (e.g., intelligent transportation and environmental monitoring); see Figure 3. All four layers are present in the modern vessel. Such integration is achieved using Ethernet industrial protocols that collect and process data via wireless and fiber-optic sensors, cameras, radars, satellite communications, and cloud computing.

**Figure 3.** Illustration of the main systems integrated into modern vessels and the four levels of cyber systems.

While the integration of technology promises to make sea transportation safer, more ecologically friendly, and entertaining while lowering costs, it also raises the risk of disrupted vessel operations. The rising use of information technology (IT) systems in marine transportation eliminates the need for a perpetrator to bypass physical security measures, as happened in the 9/11 attacks. Taking control of a vessel or disrupting its operation can now potentially be achieved electronically by remotely interfering with any of these four layers. Interference can be achieved in a variety of ways, the most prominent of which are as follows:


The IMO was alarmed by two events in particular: the first occurred in 2017, when at least 20 vessels in the Black Sea appeared in the automatic identification system (AIS) 20 miles inland, close to a Russian airport [37], and the second occurred between 2011 and 2013 when a criminal gang infiltrated the container tracking system at the Port of Antwerp located in Flanders (Belgium) and stole containers in which illicit substances were hidden, unbeknownst to their owners [38].

The IMO, alerted by such instances, emphasized the necessity for enterprise-wide cyber risk management by all industry stakeholders, including public authorities and commercial companies. Given that interconnectivity is the fundamental pillar of digitalized and autonomous operations, it is understandable that these recommendations are addressed to such a large audience. Perhaps the most critical recommendation of the IMO, however, is that a cyber risk management program be included in safety management systems.

#### *3.3. Impact on Industry*

The shipping industry has relied on the knowledge and experience of ship crews for hundreds of years. With unmanned vessels, autonomous technology is designed to revolutionize the marine sector. Small autonomous boats have already entered operation, while larger vessel technology is still developing. It is time for the marine industry to embrace autonomy and comprehend how it will influence the industry's future and how to utilize it best. MASS will affect ship design, shipbuilding, and port infrastructure, including services and interfaces. On-shore shipping ports will be transformed by automation, from port infrastructure and cargo handling to land-based logistics and transportation. One of the logistics industry's goals is to provide fast service, which allows shippers and customers to adjust dispatches and receive deliveries from this self-contained logistics transport chain on the fly [39].

Communication and cooperation among MASS stakeholders based on mutual understanding will be critical to the successful introduction of the MASS to the marine industry. Figure 4 depicts the main stakeholders and their relationships. Stakeholders in the maritime sector include seafarers onboard and ashore, insurance companies, cargo and bunkering corporations, research institutions, universities, and training centers. Autonomous vessels will also transform existing industries by introducing system integration and control, system management and maintenance, SSCC operation and management, fleet management, cybersecurity, big data analysis, smart sensors, and communication. Finally, to make autonomous ships effective and dependable, maritime rules and regulations must be developed, amended, and interpreted accordingly, with continued communication and cooperation among stakeholders.

**Figure 4.** The main stakeholders for ships.

#### *3.4. Impact on Jobs and Training*

While the marine business is rapidly expanding, finding suitably skilled sailors is a constant challenge. Lloyd's Register [40], in particular, has forecast severe shortages of skilled officers and crews by 2025. Furthermore, the introduction of the MASS has generated concerns about seafarers and positions being replaced by AI and autonomous systems. However, this change will trigger new business and jobs for highly qualified crews and operators, particularly those with knowledge of technology, IT systems, engineering, and public relations and regulations [41].

Crewmembers' training needs to cover a range of skills and competencies, from seafaring skills to automation and communication engineering knowledge, since the engineering support team must ensure efficient, bidirectional communication between the shore team and the automated ship [42]. In addition, watchkeeping personnel and companies have an essential role in ensuring safe voyages. Well-designed simulators are used to practice challenging safety situations; their main limitation is the inability to create such situations in real time, which requires creativity and deep knowledge of seagoing accidents. Thus, ship operators require a combination of nautical and technological expertise, covering voyage planning, digital and port-approach communication duties, mooring, unmooring, ship monitoring, and docking [40–42].

The use of automation could mitigate the predicted worker shortage. Many maritime jobs will be transferred to land-based SSCC due to remote and autonomous operations, allowing the industry to recruit new people who find a marine career onshore more appealing. It is also expected that autonomous ships will improve seafarers' quality of life. The difficulties of staying on board for extended periods and the risks of marine mishaps will be reduced if ships are controlled from the shore.

MASS on-shore operators receive relevant training and education under the International Convention on Standards of Training, Certification, and Watchkeeping for Seafarers (STCW). However, in light of the declining number of seafarers, it may also be essential to develop new STCW Convention qualifying criteria or new requirements for knowledge, understanding, and proficiency. When implementing reliable maritime education and training (MET), qualified trainers must be considered along with their ability to teach and assess their trainees. An effective training methodology must combine cognitive, psychomotor, and affective learning approaches, with clear objectives corresponding to the domain and level of the required competencies. Moreover, trainers must be creative and engage trainees in the learning process by promoting a leadership spirit in an appropriate way, i.e., seeing, thinking, and applying what is learned. Finally, continuous educational research and training must be provided to face future challenges in shipping while deploying the MASS.

#### *3.5. Issue of Laws and Ethics*

The industry has embraced advanced and new technologies to boost productivity, cut costs, and increase safety. As regulations and technologies influence each other, effective and timely regulatory procedures are essential for the industry to fully profit from the technology. Traditionally, liability has been assigned to human individuals or to organizations that are considered legal entities, such as shipping companies. An algorithm is not regarded as a moral or legal agent, so assigning it blame for wrongdoing is impossible. This issue has been thoroughly analyzed in the automotive industry, where the testing of classic moral dilemmas is part of the debate about the safety of self-driving cars [41]. The ISM Code (SOLAS Chapter IX) requirements to establish a legal organization responsible for the safe operation of ships and pollution prevention, for example, will continue to apply to the MASS [42].

The development and use of autonomous ships will raise a wide range of ethical challenges. Human communication has dominated ship operations in the past, but the implementation of MASS includes man–machine and machine–machine communication. This implementation's risk or change assessment should include analysis and protocols of cases in which machine communication fails or is denied. The definition of legal liability boundaries, particularly the establishment of reasonable criteria and scopes of responsibility between shipowner and manufacturer, is required, as well as an appropriate security structure for insurance coverage.

As an example, consider the following ethical question. Suppose a MASS takes the most cost-effective path, but a manned passenger ship capsizes nearby while communication systems between the MASS and the manned ship are unavailable or misdirected, leaving the crew and passengers on the capsized ship with little choice but to wait for assistance. The MASS may be unable to recognize that the passenger ship is in dangerous circumstances. Who is liable for failing to recognize the ship and perform rescue duties?

#### **4. Discussion**

The standards of the IMO conventions are structured into several categories. Many of these standards will need revision, as some may be obsolete.

The first category concerns detailed control requirements. The existing SOLAS approach assumes a physical navigation bridge, staffed by an officer of the watch, from which the vessel can be controlled immediately. This assumption underlies several distinct criteria: for example, standards requiring that steering gear, propulsion controls, propeller pitch controls, and watertight compartment controls be supplied on the bridge, as well as voice communications provisions. Another example is the requirement for pilot transfers to be supervised by a certified person with bridge communication. These will need to be updated primarily to allow for shore-based control.

Second, the precise criteria for electronic communications systems presume that a crew is on board and in regular contact with the shore and other vessels. Among these are requirements for radar, radio operators on board, a VHF unit on the bridge, and a constant radio watch. Communications systems must include facilities for sending distress calls by at least two separate and independent means, equipment capable of receiving shore-to-ship distress alerts and transmitting and receiving ship-to-ship distress alerts, and search and rescue coordinating communications. They must also support on-scene communications, maritime safety information, general radio communications to and from shore-based radio systems, and bridge-to-bridge communications. Beyond these hardware requirements, the master is further required to convey any navigational dangers encountered [43–45]. Most of these communication obligations must be preserved: no one wants the requirement for a radio watch on distress frequencies, the ability to send distress calls, or inter-ship communication to go away in the case of autonomous shipping. Even the ability to receive maritime safety notices on board may be helpful, if only because they will need to be relayed when the shore-based controller is outside the transmitting station's range. However, the standards will need to be changed so that they refer to radio signals being relayed to and from the shore-based controller via the vessel rather than to someone on the vessel.

Third, the numerous references in the IMO conventions to the master need clarification. The Comité Maritime International (CMI) has produced a spreadsheet in its submission to the IMO identifying provisions in the IMO regulations that will need clarification or amendment to deal with unmanned vessels, and it flags numerous provisions whose application hinges on the interpretation of the term "master".

Fourth, the International Maritime Organization must adopt new regulations to deal with autonomous vessels that do not have a crew on board. Finally, training and certification standards for remote onshore controllers will need to be added to the STCW, applying in the states that are party to the relevant conventions.

SOLAS must also be addressed in terms of what it does not include. For example, safe autonomous shipping depends entirely on the features required of the communications and remote-control systems used to manage the vessel at sea. SOLAS will therefore have to deal in depth with issues such as the following:

• The reliability of propulsion and other machinery, such as steering gear, will have to be controlled for long periods, possibly weeks, from a distance with limited possibilities of interim maintenance.


#### **5. Conclusions**

Regarding safety, security, and environmental protection conventions and regulations for autonomous surface ships, there are new and distinct concerns to be addressed. Before the MASS is introduced into commercial shipping, more holistic, worldwide, and unified approaches to new regulatory frameworks must therefore ensure the prevention of marine accidents and the protection of the environment. It is also crucial to comprehend the MASS's impact on legislation, technology, and industries, and the interactions among the relevant players. While some preliminary studies have been completed, various projects are underway or planned worldwide to develop pilot ships, competing concepts and systems to support unmanned operations, and infrastructure initiatives such as autonomous ports and high-bandwidth communications. The MASS should be monitored and managed remotely by the SSCC's operators, with a smart alarm system receiving critical information via satellite. The MASS and SSCC systems and sensors must be designed and built with their synergetic effects carefully examined. Onboard equipment and devices will need to be interconnected to efficiently gather, manage, and analyze data from the MASS. They will be heavily modularized to avoid failures and will have a high degree of redundancy and endurance. The MASS will affect ship design, shipbuilding, and port infrastructure, including services and interfaces.

Communication and cooperation among numerous stakeholders based on mutual understanding will be critical for a successful introduction of the MASS to the maritime industries, including shipping, shipbuilding, equipment production, and classification societies. The MASS may alter the behavior patterns of pirates, terrorists, and criminals, so technical and institutional measures, such as new inspection procedures, should be taken to increase security. While the number of seafarers is expected to decline, developing qualification criteria for MASS onshore operators and providing relevant training and education will be critical. Regarding legal and ethical concerns, the gap between the time it takes for technology to mature and the time it takes to implement relevant legislation and procedures may negatively impact the timely adoption of innovations. A quantitative analysis of the influence of the MASS on technologies and industries, including economic consequences, will be addressed as part of future work.

**Author Contributions:** Conceptualization, M.I. and A.I.; methodology M.I.; validation: A.I.; investigation, M.I. and P.R.; resources, P.R. and H.I.; writing—original draft preparation, M.I.; writing—review and editing, M.I., P.R. and H.I.; visualization, M.I. and H.I.; supervision, A.I.; project administration, A.I. All authors have read and agreed to the published version of the manuscript.

**Funding:** The authors received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


### *Article* **Solar Irradiance Probabilistic Forecasting Using Machine Learning, Metaheuristic Models and Numerical Weather Predictions**

**Vateanui Sansine 1,2,\*, Pascal Ortega 1, Daniel Hissel <sup>2</sup> and Marania Hopuare <sup>1</sup>**


**Abstract:** Solar-power-generation forecasting tools are essential for microgrid stability, operation, and planning. The prediction of solar irradiance (SI) usually relies on the time series of SI and other meteorological data. In this study, the considered microgrid was a combined cold- and power-generation system, located in Tahiti. Point forecasts were obtained using a particle swarm optimization (PSO) algorithm combined with three stand-alone models: XGboost (PSO-XGboost), the long short-term memory neural network (PSO-LSTM), and the gradient boosting regression algorithm (PSO-GBRT). The implemented daily SI forecasts relied on an hourly time-step. The input data were composed of outputs from the numerical forecasting model AROME (Météo France) combined with historical meteorological data. Our three hybrid models were compared with other stand-alone models, namely, artificial neural network (ANN), convolutional neural network (CNN), random forest (RF), LSTM, GBRT, and XGboost. The probabilistic forecasts were obtained by mapping the quantiles of the hourly residuals, which enabled the computation of 38%, 68%, 95%, and 99% prediction intervals (PIs). The experimental results showed that PSO-LSTM had the best accuracy for day-ahead solar irradiance forecasting compared with the other benchmark models, through overall deterministic and probabilistic metrics.

**Keywords:** solar irradiance; forecasting; numerical weather predictions; machine learning; deep learning; metaheuristic models; optimization

#### **1. Introduction**

Global electricity demand is expected to rise by 2.4% in 2022, despite economic weaknesses and high prices [1]. This rise, driven by the growth of the world population, the industrialization of developing countries, and worldwide urbanization [2], is still met mainly by fossil fuels, which has proven detrimental to the environment and the climate. Renewable energies have therefore gained a lot of attention, especially photovoltaics (PVs), due to their accessibility, low cost, long lifetime, and environmental benefits. Solar PV installations are growing faster than any other renewable energy; indeed, PVs are forecast to account for 60% of the increase in global renewable capacity in 2022 [3]. In this context, PVs provide many environmental and economic benefits. However, uncontrollable factors such as the weather, seasonality, and climate lead to intermittent, random, and volatile PV power generation. These significant constraints still hinder the large-scale integration of PVs into the power grid and interfere with the reliability and stability of existing grid-connected power systems [4]. Thus, a reliable forecast of PV power outputs is essential to ensure the stability, reliability, and cost-effectiveness of the system [5]. Such forecasts are usually implemented through prediction of the global horizontal irradiance (GHI). There are three main groups of solar irradiance forecasting models [6]:

**Citation:** Sansine, V.; Ortega, P.; Hissel, D.; Hopuare, M. Solar Irradiance Probabilistic Forecasting Using Machine Learning, Metaheuristic Models and Numerical Weather Predictions. *Sustainability* **2022**, *14*, 15260. https://doi.org/ 10.3390/su142215260

Academic Editors: Luis Hernández-Callejo, Sergio Nesmachnow and Sara Gallardo Saavedra

Received: 9 October 2022 Accepted: 31 October 2022 Published: 17 November 2022




Machine learning algorithms, classified under statistical models, have become very popular for studies related to PV power-output forecasting, and play an important role in contemporary solar-irradiance forecasting for conventional grid management and for smaller and independent microgrids.

Ogliari et al. [7] compared two deterministic models with a hybrid method combining an artificial neural network (ANN) and a clear-sky radiation model for PV power-output forecasting. The models were trained on one year of measured data from a PV plant located in Milan, Italy. The results show that the hybrid method is the most precise for PV output forecasting, demonstrating the advantage of combining physical models with machine learning algorithms.

Crisosto et al. [8] used a feedforward neural network (FFNN) with Levenberg–Marquardt backpropagation (LM–BP) to make predictions for one hour ahead with one-minute resolution in the city of Hanover, Germany. The model was trained on a four-year dataset including all-sky images, used for cloud-cover computation, and measured global irradiance. For hourly average predictions, the FFNN-LM-BP showed the best results, with an RMSE of 65 Wh/m² and R² = 0.98, compared with the persistence model (RMSE = 91 Wh/m², R² = 0.91).

Yu et al. [9] used a long short-term memory (LSTM) model to predict GHI at three locations in the USA, namely, New York, Atlanta, and Hawaii. The time horizons of the model were one hour ahead and one day ahead. The model's performance was compared with other models such as the autoregressive integrated moving average (ARIMA), convolutional neural network (CNN), FFNN, and recurrent neural network (RNN). For hourly predictions, the LSTM model was more precise at all three locations, with R² exceeding 0.9 on cloudy and partially cloudy days, whereas the RNN's R² was only 0.70 and 0.79 in Atlanta and Hawaii, respectively. For daily forecasting, LSTM outperformed the other models except on clear-sky days in New York; for Hawaii and Atlanta, LSTM was better in every case.

However, it is difficult to improve the forecast with only one machine learning model, which sometimes suffers from instability originating from poor parameter choices or from a reduced number of input variables. Ensemble learning is a popular development trend in artificial intelligence (AI) algorithms [10]. It combines independent models into a stronger learner, which can achieve better stability and prediction performance than the individual models [11].

Huang et al. [12] used gradient boosting regression (GBRT), extreme gradient boosting (XGboost), Gaussian process regression (GPR), and random forest (RF) models to carry out GHI predictions. Those ensemble models performed better than stand-alone models such as decision tree (DT), backpropagation neural network (BPNN), and support vector machine regression (SVR). The authors concluded that stacking models—including GBRT, XGboost, GPR, and RF—are the best models for predicting solar radiation.

Li et al. [13] used XGboost to implement point forecasts for solar irradiance and kernel density estimation (KDE) to generate probabilistic forecasts from the above prediction results. This method enabled the computation of confidence levels and demonstrated better results than other benchmark algorithms such as SVR and random forest.

To improve the efficiency of machine learning (ML) models, an increasing number of studies have used metaheuristic models in order to optimize the parameters of the considered GHI forecasting model.

Jia et al. [14] utilized particle swarm optimization (PSO) coupled with a Gaussian exponential model (GEM) to predict daily and monthly solar radiation (Rs). The hybrid PSO-GEM model showed the best results for Rs prediction.

Duan et al. [15] used NWP, together with the kernel-based nonlinear extension of Arps decline (KNEA) to predict solar irradiance. The KNEA algorithm is optimized by a metaheuristic algorithm called the Bat algorithm (BA). The proposed method for GHI forecasting is called the BA-KNEA. Duan et al. also implemented other hybrid models such as PSO-XGboost, BA-XGboost, and PSO-KNEA. The results showed that BA-KNEA is better at performing solar radiation forecasts.

In summary, ensemble learning models are an emerging trend in ML, proving to be appropriate tools for regression, and therefore, GHI forecasting. They have shown good results compared with deep learning models for day-ahead GHI point forecasts [12]. Moreover, ensemble methods can be further improved with metaheuristic models for parameter optimization, as well as the prevention of potential numerical instability from which various ML models suffer. However, one of the drawbacks of point forecasts is that they contain limited information about the volatility and randomness of solar irradiance. Point forecasts cannot satisfy the needs of a power system's optimized operation [13]. For this reason, considerable attention has been drawn to probabilistic forecasting, which enables the computation of prediction intervals to provide to grid dispatchers in order to facilitate grid operation.

This study focused on the implementation of daily probabilistic forecasts with hybrid models such as PSO-XGboost, PSO-LSTM, PSO-GBRT, and quantile mapping for the computation of prediction intervals. The hybrid models were compared with other reference models, namely, ANN, CNN, LSTM, RF, and GBRT.

The novelty of this work lies in the residual modeling implemented with an innovative hybrid model (PSO-LSTM), enabling us to compute prediction intervals with different confidence levels, and thus obtain probabilistic forecasts. To the best of our knowledge, no day-ahead probabilistic GHI predictions have been implemented with this method. Secondly, we demonstrate that using a deep learning approach combined with metaheuristic models can achieve higher accuracy than ensemble models, or their optimized versions.

In order to produce those forecasts, historical data measured on-site coupled with NWP were used in the training of GHI forecasting models. These forecasting tools are intended to control a combined cold- and power-generation system, comprising several energy production and storage sub-systems, the whole being powered by solar energy. This prototype is called RECIF (the French abbreviation for a microgrid for electricity and cold cogeneration), and has been developed within the framework of a project funded by the French National Agency for Research (ANR) and is being implemented at the University of French Polynesia (UPF).

The rest of the paper is organized as follows: the historical data and the implemented data processes are presented in Section 2, followed in Section 3 by a theoretical background of machine learning and metaheuristic models. The results, analysis, suggestions for future research, and perspectives are presented in Section 4. The conclusions and the principal results are presented in Section 5.

#### **2. Materials and Methods**

#### *2.1. Input Variables*

This study utilized historical data measured by the weather station set up at the University of French Polynesia. Two years of measurements were at our disposal, from 2019 to 2020. Those measurements are crucial to the design and implementation of a reliable forecasting system based on machine learning algorithms. The meteorological variables are measured with a time step of 1 min. The GHI is measured with a BF5 pyranometer supplied by Delta-T Devices, which uses an array of photodiodes with a unique computer-generated shading pattern to measure the diffuse horizontal irradiance (DHI) and GHI [16]. This enables the computation of the direct normal irradiance (DNI) for a given solar zenith angle. The set of inputs chosen from the weather station for the GHI day-ahead forecasting models was as follows:


An overview of the data is presented in Table 1. The processing of the historical data is detailed in Section 2.2.

**Table 1.** Descriptive statistics including the mean, standard deviation (std), minimum/maximum values, and the quantiles for each meteorological variable.


In addition to these in situ measurements, numerical weather predictions (NWPs) were used to train our day-ahead forecasting models. The numerical weather prediction model AROME was implemented by Météo-France with a resolution of 0.025° × 0.025° (2.5 × 2.5 km) in French Polynesia. These predictions have a maximum time horizon of 42 h and are updated every 12 h in French Polynesia. In Figure 1, each node (or grid point) of the AROME model for the north-eastern part of Tahiti is depicted, numbered from 1 to 34. Two years of NWP outputs are available, spanning from January 2019 to December 2020 with an hourly time-step. The GHI values predicted by AROME are only available from 9 am to 4 pm.

**Figure 1.** Points from the AROME grid (red dot represents the University).

#### *2.2. Data Processing*

This section explains the steps involved in processing the historical data and the AROME output. A vital step in data processing is to remove anomalous data caused by technical glitches in the sensors, such as negative values, Not a Number (NaN) values, or outliers. After the removal of NaN values, the outliers are detected through the interquartile range (IQR) method. A sliding mean is applied to the 1 min time-step meteorological data to obtain hourly values, making correlation with the AROME output possible. The mean value at time t is computed from the 60 previous measurements.
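As an illustration, the cleaning and resampling steps described above might be sketched as follows (a minimal pandas sketch; the function name, the 1.5 × IQR fence, and the exact hourly sampling positions are assumptions, not the authors' code):

```python
import numpy as np
import pandas as pd

def clean_and_resample(series: pd.Series) -> pd.Series:
    """Drop NaN and negative values, mask IQR outliers, then average
    the 1-min samples into hourly values (mean of the 60 previous ones)."""
    s = series.dropna()
    s = s[s >= 0]  # irradiance cannot physically be negative
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    s = s[(s >= q1 - 1.5 * iqr) & (s <= q3 + 1.5 * iqr)]
    # sliding mean over the 60 previous measurements, sampled hourly
    return s.rolling(window=60).mean().iloc[59::60]
```

A series of 1-min GHI samples containing a stuck-sensor spike would have the spike removed by the IQR fence before the hourly averages are formed.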

In order to quantify the errors between the in situ measurements and the AROME model, and then determine which points of the AROME grid to use for the training of the machine learning algorithms, the following metrics were used: the mean square error (MSE), the root-mean-square error (RMSE), and the determination coefficient (R2).

$$\text{MSE} = \frac{1}{N} \sum_{i=1}^{N} (y_{\text{measured},i} - y_{\text{predicted},i})^2 \tag{1}$$

$$\text{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_{\text{measured},i} - y_{\text{predicted},i})^2} \tag{2}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{N} (y_{\text{measured},i} - y_{\text{predicted},i})^2}{\sum_{i=1}^{N} (y_{\text{measured},i} - \overline{y}_{\text{measured}})^2},\tag{3}$$

where N is the number of observations, ymeasured,i are the measured values, ypredicted,i are the predicted values, and the overbar denotes the mean of the measurements. The results are presented in Figure 2.

**Figure 2.** Errors between the measured GHI at the UPF and the AROME predictions for each grid point.
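As an illustration, the error metrics of Equations (1)–(3) translate directly into code (a minimal NumPy sketch; the function names are illustrative):

```python
import numpy as np

def mse(y_measured, y_predicted):
    """Mean square error, Equation (1)."""
    return np.mean((np.asarray(y_measured) - np.asarray(y_predicted)) ** 2)

def rmse(y_measured, y_predicted):
    """Root-mean-square error, Equation (2)."""
    return np.sqrt(mse(y_measured, y_predicted))

def r2(y_measured, y_predicted):
    """Determination coefficient, Equation (3)."""
    y = np.asarray(y_measured)
    ss_res = np.sum((y - np.asarray(y_predicted)) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot
```

Note that R² can be negative when the predictions are worse than simply predicting the mean, which is why selected grid points with R² below −0.45 can appear in the analysis.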

Points 31 to 34 were not used because they contained a great number of outliers in the first semester. The selected points were, arbitrarily, points with some of the lowest correlations (R² < −0.45), i.e., nos. 3, 12, 20, and 25, and points that exhibited a positive correlation with the measured data, i.e., nos. 7, 13, 14, 15, 16, 17, 21, and 26.

The missing data were not replaced (through linear interpolation for example), but the consecutiveness of the dates of the data was ensured in the construction of the input data (or input vector); thus, no missing values were processed in the machine learning algorithms.

The night hours were removed from the measured data; consequently, the GHI forecasts implemented in this study were only performed for the hours between 6 am and 8 pm. After correlating the measurements and the AROME output, the merged data were normalized according to Equation (4):

$$
x_{\text{normalized}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}},\tag{4}
$$

The data were then split into 70% as training data, 20% as validation data, and 10% as testing data.
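The normalization of Equation (4) and the chronological 70/20/10 split might look as follows (an illustrative Python sketch; the helper names and the choice of a non-shuffled split are assumptions):

```python
import numpy as np

def min_max_normalize(x):
    """Min-max normalization, Equation (4)."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

def chronological_split(data, train=0.70, val=0.20):
    """70/20/10 split without shuffling, so that validation and test
    samples come strictly after the training samples in time."""
    n = len(data)
    i_train = int(n * train)
    i_val = int(n * (train + val))
    return data[:i_train], data[i_train:i_val], data[i_val:]
```

Keeping the split chronological avoids leaking future meteorological conditions into the training set of a time-series model.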

#### **3. Theoretical Background**

#### *3.1. Long Short-Term Memory (LSTM)*

In recent years, LSTM has been widely applied to GHI forecasting [17,18]. One of the main advantages over a classical RNN is that LSTM models use forget gates to deal with long-term dependencies in the data without suffering from problems such as vanishing gradients [19].

As shown in Figure 3, a typical LSTM network consists of one cell and three gates (an input gate, forget gate, and output gate). The input gate adjusts the amount of new data stored in the unit. The output gate determines which information to obtain from the cell, while the forget gate determines which information can be discarded [15]. Each gate uses either tanh or sigmoid as activation functions.

**Figure 3.** Basic structure of an LSTM model [15].

The input gate can be calculated with Equation (5) [15]:

$$\text{gate}(f_i) = \sigma_s(w_i x_t + u_i h_{t-1} + b_i),\tag{5}$$

where *σ<sup>s</sup>* is the sigmoid activation function, *ht*−<sup>1</sup> is the cell output at the previous time-step, *Wi* and *Ui* are weight factors, and *bi* is the bias.

The forget gate can be computed with Equation (6) [15]:

$$
\text{gate}(f_t) = \sigma_s(w_t x_t + u_t h_{t-1} + b_t),
\tag{6}
$$

where *Wt* and *Ut* are weight factors and *bt* is the bias. The output is finally computed with Equation (7) [15]:

$$
\text{gate}(f_o) = \sigma_s(w_o x_t + u_o h_{t-1} + b_o),
\tag{7}
$$

where *Wo* and *Uo* are weight factors and *bo* is the bias.
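The three gates of Equations (5)–(7) share the same functional form, which a short NumPy sketch makes explicit (illustrative only; the parameter-dictionary layout and function names are assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_gates(x_t, h_prev, params):
    """Evaluate the input (i), forget (f), and output (o) gates of
    Equations (5)-(7). `params` holds, for each gate g, the input
    weights W_g, recurrent weights U_g, and bias b_g."""
    gates = {}
    for g in ("i", "f", "o"):
        W, U, b = params[f"W_{g}"], params[f"U_{g}"], params[f"b_{g}"]
        gates[g] = sigmoid(W @ x_t + U @ h_prev + b)  # sigma_s(W x_t + U h_{t-1} + b)
    return gates
```

Each gate thus produces a vector of values in (0, 1) that scales how much information is admitted, retained, or emitted by the cell.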

In this study, an LSTM model and an optimized LSTM (PSO-LSTM) model were used to implement daily GHI forecasting. They were compared with other models for probabilistic predictions. The parameters used for optimization are listed in Section 3.4.

The implemented LSTM model was composed of two LSTM models for day-ahead forecasting. One model was to process historical data; the second model was used to process AROME outputs. The outputs of the two LSTM models were concatenated, before being processed by a classical ANN.

#### *3.2. Particle Swarm Optimization (PSO)*

Particle swarm optimization was first proposed by Kennedy and Eberhart [20]. This algorithm simulates the predatory actions of a swarm of animals to find the best solution. A massless swarm of particles is created, with only two parameters: their position and speed. Each particle searches for the optimal solution separately in the search space and records it as the current individual extremum. The position of the extremum is shared with other particles in the whole swarm. If one individual extreme value is the best out of all other extremes, it is recorded as the global optimal solution. The global optimal solution is updated every time a particle finds a better extremum.

All the particles in the swarm adjust their velocity and position according to the current extremum already seen by the individual and the current global optimal solution shared by the whole swarm. The formulas for updating the position and speed of the PSO algorithm are shown in Equations (8) and (9) [20]:

$$X\_{i,t} = X\_{i,t-1} + V\_{i,t}, \tag{8}$$

$$V\_{i,t} = I\_W \times V\_{i,t-1} + c\_1 \times \theta\_1 \times (pbest\_i - X\_{i,t-1}) + c\_2 \times \theta\_2 \times (gbest - X\_{i,t-1}),\tag{9}$$

where *Xi,t* is the position of the i-th particle during the t-th iteration, and *Vi,t* is the speed of the i-th particle during the t-th iteration. *c*<sup>1</sup> and *c*<sup>2</sup> are called the cognitive (personal) and social (global) coefficients, respectively. These coefficients control the exploitation of the individual extremum found by each particle and the level of exploration performed by the swarm in the entire search space. *θ*<sup>1</sup> and *θ*<sup>2</sup> are random numbers in the range [0, 1]. *pbesti* is the best location of the i-th particle among all iterations, and *gbest* is the best global location found by all particles. *IW* is the inertia weight, initialized randomly in the range [0, 1].
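The update rules in Equations (8) and (9) can be sketched as a minimal PSO loop. The sphere objective, swarm size, iteration count, and search bounds below are illustrative choices for demonstration, not the paper's configuration:

```python
import random

def pso(objective, dim, n_particles=20, iters=100, c1=1.5, c2=1.5,
        bounds=(-5.0, 5.0)):
    lo, hi = bounds
    X = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    V = [[0.0] * dim for _ in range(n_particles)]
    pbest = [x[:] for x in X]
    pbest_val = [objective(x) for x in X]
    g = pbest_val.index(min(pbest_val))
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            iw = random.random()  # inertia weight I_W, drawn from [0, 1]
            for d in range(dim):
                r1, r2 = random.random(), random.random()  # theta_1, theta_2
                # Equation (9): velocity update
                V[i][d] = (iw * V[i][d]
                           + c1 * r1 * (pbest[i][d] - X[i][d])
                           + c2 * r2 * (gbest[d] - X[i][d]))
                # Equation (8): position update, clamped to the search space
                X[i][d] = min(hi, max(lo, X[i][d] + V[i][d]))
            val = objective(X[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = X[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = X[i][:], val
    return gbest, gbest_val

# Minimize the 2-D sphere function as a stand-in objective
best, best_val = pso(lambda x: sum(v * v for v in x), dim=2)
```

In the hybrid models of Section 3.4, the stand-in objective above is replaced by the validation error of the ML model being tuned.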

#### *3.3. XGboost*

XGboost is a machine learning algorithm realized through gradient boosting, and is one of the first parallel gradient boosted decision tree (GBDT) algorithms. XGboost is based on classification and regression tree (CART) theory [21]. It provides parallel tree boosting and is one of the leading machine learning algorithms for regression, classification, and ranking problems. The XGboost model is built by adding trees iteratively. The predicted value of the i-th sample in the t-th iteration can be expressed as follows [21]:

$$
\hat{y}\_{i,t} = \hat{y}\_{i,t-1} + f\_t(X\_i), \tag{10}
$$

where *ft*(*Xi*) represents the addition needed to improve the model. The tree is added iteratively to minimize the objective function, which can be expressed as [21]:

$$obj^{(t)} = \sum\_{i=1}^{n} L(y\_i, \hat{y}\_{i, t-1} + f\_t(X\_i)) + \Omega(f\_t), \tag{11}$$

where *obj*(*t*) is the objective function, *L* is the loss function, and Ω(*ft*) is the regularization term [21]:

$$
\Omega(f\_t) = \gamma T + \frac{1}{2}\lambda \sum\_{j=1}^{T} w\_j^2, \tag{12}
$$

*γ* and *λ* are parameters that represent the model complexity. *T* is the number of leaves, and *wj* is a weight parameter.
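To make Equations (11) and (12) concrete, the sketch below evaluates the objective for one boosting step. A squared-error loss is assumed for *L* (the text does not fix a specific loss), and all numeric values are illustrative:

```python
def xgb_objective(y, y_prev, f_t, gamma, lam, leaf_weights):
    # Equation (11), assuming a squared-error loss L (an assumption here),
    # plus the regularization Omega(f_t) of Equation (12).
    loss = sum((yi - (pi + fi)) ** 2 for yi, pi, fi in zip(y, y_prev, f_t))
    T = len(leaf_weights)  # number of leaves in the added tree
    omega = gamma * T + 0.5 * lam * sum(w * w for w in leaf_weights)
    return loss + omega

# Toy step: one sample predicted perfectly, one leaf with weight 2
step_obj = xgb_objective([1.0], [0.0], [1.0], gamma=1.0, lam=0.5,
                         leaf_weights=[2.0])
```

Larger γ and λ penalize trees with many leaves or large leaf weights, which is how XGboost trades accuracy against model complexity.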

#### *3.4. Hybrid Models*

In this study, a hybrid model, PSO-XGboost, was implemented in order to obtain point forecasts of the GHI. The PSO algorithm is used to choose the best parameters for the XGboost algorithm. Seven important parameters for the XGboost model were chosen, as listed in Table 2. Those parameters were also used by Yu et al. [10] in order to estimate daily reference evapotranspiration values. The parameter "number of trees" has been added, because it is also an important parameter for XGboost.


**Table 2.** Parameters used in the optimization of XGboost.

The ML models were used, in this case, to solve a regression problem; therefore, we set R<sup>2</sup> to be the main metric of the PSO algorithm. R<sup>2</sup> is a positively oriented metric; thus, the practical objective function used here was 1 − R<sup>2</sup>: as the precision of the results increases, R<sup>2</sup> approaches 1, which corresponds to a minimum of the objective function 1 − R<sup>2</sup>. Twenty particles were used for the PSO algorithm, in order to limit computation time while still exploring the entire search space. The flow chart of the hybrid model is presented in Figure 4.

**Figure 4.** Flow chart of the hybrid models.
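The fitness minimized by PSO follows directly from the definition of R<sup>2</sup>; a minimal pure-Python sketch (no ML library required):

```python
def r2_score(y_true, y_pred):
    # Coefficient of determination R^2 = 1 - SS_res / SS_tot
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def pso_fitness(y_true, y_pred):
    # The objective minimized by PSO: 1 - R^2
    # (maximal R^2 corresponds to a minimum of this function)
    return 1.0 - r2_score(y_true, y_pred)
```

A perfect forecast gives a fitness of 0; predicting the mean everywhere gives a fitness of 1.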

A PSO-Gradient boosting model was also implemented, but with fewer parameters than the PSO-XGboost. Only the maximum depth, the learning rate, number of trees, and subsample were used in this instance. The other parameters seen for PSO-XGboost were not available for the gradient boosting algorithm. As stated above, a hybrid PSO-LSTM model was also implemented for daily GHI forecasting. The parameters chosen for the optimization are presented in Table 3.


**Table 3.** Parameters of the LSTM model for particle swarm optimization.

#### *3.5. Residual Modeling*

Probabilistic forecasting was implemented in this study through residual modeling. For each individual hour, the residuals were computed and assumed to have either a Gaussian or a Laplacian distribution. This method was inspired by He et al. [22]. The quantiles of the residuals were computed and taken as prediction intervals (PIs). To compute the different quantiles for all the considered distributions, we first needed to consider their cumulative distribution function (cdf) FResidus(x) in Equation (13).

$$\forall x \in \mathbb{R},\ F\_{\text{Residus}}(x) = \mathbb{P}(\text{Residus} \le x).\tag{13}$$

The inverse of the cdf is called the percent point function or quantile function *Q*(*q*), and is provided in Equation (14):

$$\forall q \in [0,1],\ Q(q) = F\_{\text{Residus}}^{-1}(q) = \inf\{x \in \mathbb{R},\ F\_{\text{Residus}}(x) \ge q\},\tag{14}$$

where *Q*(0.25), *Q*(0.5), and *Q*(0.75) are the first quantile, the median, and the third quantile, respectively. The specific quantile function corresponded to a specific distribution (Gaussian or Laplacian). The PIs were calculated at different confidence levels or CLs. In this study, the 38%, 68%, 95%, and 99% PIs were derived from this inverse cdf for the Gaussian distribution in Equation (15). For the Laplacian distribution, the PIs could be derived using Equation (16), defined in [22]:

$$P\_{cl+\frac{1-cl}{2}} = \sigma\_t Q(cl),\tag{15}$$

$$P\_{cl+\frac{1-cl}{2}} = -\sigma\_t \ln(2(1-cl)),\tag{16}$$

where *σ<sup>t</sup>* is the standard deviation of the distribution (Laplacian or Gaussian). Given the symmetry of those distributions, the upper bounds, *Ut*, and lower bounds, *Lt*, were derived using Equations (17)–(19) [22]:


$$U\_t = P\_{cl + \frac{1-cl}{2}}, \tag{17}$$

$$L\_t = P\_{\frac{1-cl}{2}}, \tag{18}$$

$$L\_t = -U\_t, \tag{19}$$
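Under the zero-mean Gaussian or Laplacian residual assumption, Equations (13)–(19) reduce to evaluating the inverse cdf at the upper quantile and mirroring it. A sketch using only the standard library follows; the confidence level and standard deviation are illustrative values:

```python
import math
from statistics import NormalDist

def laplace_ppf(q, mu=0.0, b=1.0):
    # Inverse cdf (quantile function) of a Laplace(mu, b) distribution
    if q < 0.5:
        return mu + b * math.log(2.0 * q)
    return mu - b * math.log(2.0 * (1.0 - q))

def prediction_interval(point, residual_std, cl, dist="gaussian"):
    # Symmetric prediction interval at confidence level cl around a point
    # forecast, assuming zero-mean residuals with standard deviation
    # residual_std (sigma_t in Equations (15)-(19)).
    upper_q = cl + (1.0 - cl) / 2.0  # upper quantile, as in Equation (17)
    if dist == "gaussian":
        offset = NormalDist(0.0, residual_std).inv_cdf(upper_q)
    else:
        # Laplace scale b from the standard deviation: std = b * sqrt(2)
        offset = laplace_ppf(upper_q, 0.0, residual_std / math.sqrt(2.0))
    return point - offset, point + offset

# Illustrative 95% PI around a 500 W/m^2 point forecast with sigma_t = 50
low, high = prediction_interval(500.0, 50.0, cl=0.95)
```

By the symmetry of both distributions, the lower-bound offset is the negative of the upper-bound offset, as in Equation (19).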

#### *3.6. Metrics for Probabilistic Forecasting*

The quality of probabilistic forecasts was quantified using three different metrics, namely, the prediction interval coverage percentage (PICP), the prediction interval normalized average width (PINAW), and the coverage width-based criterion (CWC), as defined in [13].

The PICP, detailed in Equations (20) and (21), indicates how many real values lie within the bounds of the prediction interval:

$$\text{PICP} = \frac{1}{N} \sum\_{i=1}^{N} \delta\_i, \tag{20}$$

$$\delta\_i = \begin{cases} 1 & \text{if } y\_i \in [L\_i, U\_i] \\ 0 & \text{if } y\_i \notin [L\_i, U\_i], \end{cases} \tag{21}$$

The PINAW, shown in Equation (22), quantitatively measures the width of the different PIs:

$$\text{PINAW} = \frac{1}{NR} \sum\_{i=1}^{N} (U\_i - L\_i), \tag{22}$$

where R is a normalizing factor. The PINAW represents the quantitative width of the PIs; thus, a lower value of PINAW represents better performance for the prediction intervals.

The CWC, shown in Equations (23) and (24), combines the PICP and PINAW to optimally balance the probability and coverage.

$$\text{CWC} = \text{PINAW} \left( 1 + \gamma (\text{PICP}) e^{-\rho (PICP - \mu)} \right), \tag{23}$$

$$\gamma(PICP) = \begin{cases} 0 & \text{if } PICP \ge \mu \\ 1 & \text{if } PICP < \mu, \end{cases} \tag{24}$$

where *μ* is the preassigned PICP which is to be satisfied, and *ρ* is a penalizing term. When the preassigned PICP is not satisfied, the CWC increases exponentially. The CWC is a negatively oriented metric, meaning the lower the value, the better.
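The three metrics can be written directly from Equations (20)–(24). In the sketch below, the normalizing factor R is taken as the range of the observations, which is an assumption (the text only calls R a normalizing factor):

```python
import math

def picp(y, lower, upper):
    # Equations (20)-(21): fraction of observations inside their PI
    return sum(1 for yi, li, ui in zip(y, lower, upper) if li <= yi <= ui) / len(y)

def pinaw(y, lower, upper):
    # Equation (22), with R taken as the observation range (an assumption)
    r = max(y) - min(y)
    return sum(ui - li for li, ui in zip(lower, upper)) / (len(y) * r)

def cwc(y, lower, upper, mu, rho=50.0):
    # Equations (23)-(24): penalize PIs whose coverage misses the target mu
    p = picp(y, lower, upper)
    gamma = 0.0 if p >= mu else 1.0
    return pinaw(y, lower, upper) * (1.0 + gamma * math.exp(-rho * (p - mu)))
```

When the coverage target μ is met, the CWC reduces to the PINAW; otherwise the exponential penalty dominates.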

#### **4. Results**

#### *4.1. Preliminary Results*

Firstly, before implementing any hybrid model, it is necessary to quantify whether the AROME predictions are effective in increasing the accuracy of our forecasting models. Secondly, a study was also performed to determine how many days should be input into the models, so that we have optimal precision in daily forecasts. These two preliminary results are shown in Table 4 and were only performed for the XGboost model and for lagged terms, from 1 day prior to 5 days prior. The employed metrics were MAE, RMSE, and R2.


**Table 4.** Results for lagged days and NWP with the XGboost model.

The results show that the use of AROME does increase the prediction accuracy for daily GHI forecasts. Indeed, for the same number of lagged days, the results with AROME are always better than without AROME, in terms of MAE, R2, and RMSE.

With the AROME data, the best values of MAE and RMSE are 110.24 W/m<sup>2</sup> and 176.47 W/m<sup>2</sup>, respectively, with R<sup>2</sup> = 0.76, obtained for 5 days prior. For this reason, 5 days prior was taken as the standard configuration for our GHI forecasting tools.

However, it would be interesting to carry out the same study with more lagged days at inputs of the machine learning algorithms. In order to carry out such studies, more historical data and AROME outputs are needed.

The results show a decrease in accuracy for 2 days prior. One possible explanation for this decrease is that the default parameters of the XGboost algorithm might not be ideal for daily GHI predictions for 2 days prior.

#### *4.2. Hybrid Models Results*

Tables 5 and 6 present the parameters found by PSO for the XGboost and LSTM algorithms, respectively. Once the optimal parameters are found, the optimized models are tested on the testing data.

**Table 5.** Optimal parameters for XGboost.


**Table 6.** Optimal parameters for LSTM.


The results for all the models used for daily GHI predictions are summarized in Table 7 with deterministic metrics, and in Table 8 with probabilistic metrics.


**Table 7.** Deterministic metrics for all implemented models used for daily GHI predictions.

**Table 8.** Probabilistic metrics for the implemented models.



For the deterministic metrics, we first note that the use of PSO increases the accuracy of standalone models such as LSTM, GBDT, and XGboost. Indeed, there were decreases in MAE and RMSE and an increase in R<sup>2</sup> when comparing the standalone models with their optimized versions.

The deterministic metrics also show that the hybrid PSO-XGboost method is the best for daily forecasting in terms of RMSE (153.69 W/m<sup>2</sup>), with R<sup>2</sup> = 0.82. The PSO-LSTM model is also strong, with the best MAE (99.37 W/m<sup>2</sup>) and the same R<sup>2</sup> = 0.82. Neither of the two models has a significant advantage over the other.

In order to choose the best model, a Taylor diagram was drawn (Figure 5) for all implemented machine learning models. It can be seen that PSO-LSTM was slightly better than all the other models for deterministic predictions, because it was closer to the observation than the other models. The standard deviation was also the same for the observation and PSO-LSTM (red dotted line), meaning that it appropriately represented the variability in solar irradiance.

**Figure 5.** Taylor diagram for deterministic comparison between models.

For all models, we can see that the RMSE is greater than the MAE, which is the manifestation of high variance in individual errors. Indeed, because the RMSE is a quadratic scoring rule, it tends to assign high weight to large errors, whereas the MAE gives the same weight to all errors, independently of their magnitude. This variance has been studied thanks to residual modelling and the generation of prediction intervals.

For the probabilistic forecasts, the PICP, PINAW, and CWC values were computed for all forecasting models. Highlighted in black in Table 8 are the best CWC values for the 38%, 68%, 95%, and 99% PIs. PSO-LSTM is the best algorithm for all prediction intervals: for the 38%, 68%, 95%, and 99% PIs, the CWC values are 6.3, 14.68, 59.13, and 40.1, respectively. Notably, for 38% and 68%, the best fit was the Laplacian distribution, whereas for the 95% and 99% PIs, the best fit was the Gaussian distribution. The proposed methods were implemented in Python 3.7, with the machine learning package TensorFlow 2.2.0.

Duan et al. [15] also used PSO-XGboost for predicting solar radiation in four different locations in China. After training with four different datasets, the four R<sup>2</sup> values for 1-day-ahead forecasting were 0.816, 0.84, 0.787, and 0.755. Those values are not far from those of our own PSO-XGboost algorithm, with R<sup>2</sup> = 0.82. In our case, the PSO-LSTM model was even better than the PSO-XGboost, demonstrating that deep learning models can still outperform ensemble learning models for day-ahead forecasting. To the best of our knowledge, no PSO-LSTM has ever been used with quantile mapping to obtain day-ahead GHI probabilistic forecasting. The accuracy of point forecasts depends, however, on the global structure of the LSTM, meaning that a simpler LSTM structure might not give the same results.

Figure 6 shows the PSO-LSTM predictions with the corresponding prediction intervals. We can see that the GHI measurements do stay within the prediction intervals; however, we can see that the prediction intervals are quite large. For this problem, it would be interesting to use another method for computing the confidence levels (CLs), which are smaller than the prediction intervals computed in this paper. Li et al. [13] used kernel density estimation (KDE) for confidence level computation, which gave PINAW values of 15.45, 17.03, and 19.55 for 80%, 85%, and 90% CIs, respectively. This is considerably smaller than the PIs in this article, which are approximately equal to 30 for 95% PIs.

**Figure 6.** PSO-LSTM probabilistic forecasting.

#### *4.3. Perspectives and Future Research*

For the PSO algorithm, the higher the number of particles, the better the exploration of the entire search space; however, the computation time increases accordingly. In order to reduce computation time, we limited ourselves to 20 particles in the swarm. According to Eberhart et al. [23], population sizes ranging from 20 to 50 are optimal in terms of minimizing the number of evaluations (population size multiplied by the number of iterations) needed to obtain a sufficient solution. Nevertheless, it would be interesting to see the result for GHI day-ahead predictions with the number of particles in a range from 20 to 500 particles for maximum exploration ability of the search space.

As presented in Section 4.1, the maximum precision was obtained for five days of measurements at the input of the models. It is assumed that the more information (lagged days), the better the precision of the forecasting models. For this reason, it would be interesting to carry out a study with more lagged days fed into the models. However, a constraint arises when more lagged days are used as input vectors. Indeed, with the available data, using more lagged days would greatly reduce the number of training samples. To retain a sufficient number of days for the training of our forecasting models, while simultaneously increasing the number of lagged days, more meteorological data and AROME outputs are needed.

Testing another meta-heuristic model also seems a promising way to improve GHI forecasts. Duan et al. [15] used the Bat algorithm for parameter optimization. Other bioinspired optimization processes could be implemented, such as the grey wolf optimizer (GWO), whale optimization algorithm (WOA), or salp swarm algorithm (SSA). Duan et al. also showed that the KNEA algorithm is appropriate for providing accurate point forecasts. Therefore, a hybrid model with the metaheuristic models listed above, coupled with the KNEA algorithm, seems to be a very good way to implement daily GHI forecasts. As mentioned in the last section, combining the computation of confidence levels with the KDE method represents a very efficient way of obtaining better probabilistic forecasting from the aforementioned hybrid models.

#### **5. Conclusions**

The accurate forecasting of solar irradiance is paramount for photovoltaic power generation. In this study, the solar irradiance forecasts from the operational weather prediction model (AROME), implemented by Météo-France, were compared with in situ measurements for error quantification. In order to drastically improve the on-site forecasting accuracy, to control an isolated solar-powered microgrid called RECIF, implemented in Tahiti, ML algorithms were coupled with a metaheuristic particle swarm optimization (PSO) model for parameter optimization. The novelty of this paper resides in the implementation of probabilistic forecasting by combining an innovative hybrid model (PSO-LSTM) with quantile mapping. Mapping of the residuals allowed us to generate 38%, 68%, 95%, and 99% prediction intervals (PIs) with two different distributions, for probabilistic forecasting. Nine machine learning models were used for comparison purposes, namely, artificial neural network (ANN), convolutional neural network (CNN), long short-term memory (LSTM), random forest (RF), gradient boosting (GBRT), XGboost, PSO-LSTM, PSO-GBRT, and PSO-XGboost. PSO-LSTM was superior to all other models, with MAE = 99.37 W/m<sup>2</sup>, RMSE = 154.84 W/m<sup>2</sup>, and R<sup>2</sup> = 0.82, as confirmed by a Taylor diagram. The PSO-LSTM model was also the best for all probabilistic metrics, exhibiting a Laplacian distribution for the 38% and 68% prediction intervals, with CWC values equal to 6.33 and 14.68, respectively, and a Gaussian distribution for the 95% and 99% prediction intervals, with CWC values equal to 59.13 and 40.1, respectively. This demonstrates that deep learning models coupled with metaheuristic models can outperform ensemble learning methods for day-ahead GHI forecasting.

**Author Contributions:** Conceptualization, V.S.; methodology, V.S., D.H. and P.O.; software, V.S. and P.O.; validation, P.O. and D.H.; formal analysis, V.S., P.O. and D.H.; investigation, V.S.; resources, P.O. and M.H.; data curation, V.S. and M.H.; writing—original draft preparation, V.S.; writing—review and editing, V.S., P.O. and D.H.; supervision, P.O. and D.H.; project administration, P.O. and D.H.; funding acquisition, P.O. and D.H. All authors have read and agreed to the published version of the manuscript.

**Funding:** This study was supported by the EIPHI Graduate School (contract ANR-17-EURE-0002) and the Region Bourgogne Franche-Comté. We thank the National Agency of Research (ANR-18- CE05-0043) for purchasing the equipment needed for this investigation. We also thank the FEMTO-ST laboratory and the University of French Polynesia, for funding this research.

**Data Availability Statement:** Data are provided within this article.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


### *Article* **System Frequency Control Method Driven by Deep Reinforcement Learning and Customer Satisfaction for Thermostatically Controlled Load**

**Rusi Chen 1, Haiguang Liu 1, Chengquan Liu 2,\*, Guangzheng Yu 2, Xuan Yang <sup>3</sup> and Yue Zhou <sup>3</sup>**


**\*** Correspondence: y20103028@mail.shiep.edu.cn

**Abstract:** The intermittence and fluctuation of renewable energy aggravate the power fluctuation of the power grid and pose a severe challenge to the frequency stability of the power system. Thermostatically controlled loads can participate in the frequency regulation of the power grid due to their flexibility. To overcome the limited adjustment ability of traditional control methods, while maintaining a positive influence on customers, a deep reinforcement learning control strategy based on the soft actor–critic framework is proposed, considering customer satisfaction. Firstly, the energy storage index and the discomfort index of different users are defined. Secondly, the fuzzy comprehensive evaluation method is applied to evaluate customer satisfaction. Then, the multi-agent models of thermostatically controlled loads are established based on the soft actor–critic algorithm. The models are trained using the local information of thermostatically controlled loads, the comprehensive evaluation index fed back by users, and the frequency deviation. After training, each agent can realize the cooperative response of thermostatically controlled loads to the system frequency by relying only on local information. The simulation results show that the proposed strategy can not only reduce the frequency fluctuation, but also improve customer satisfaction.

**Keywords:** thermostatically controlled load; frequency regulation; customer satisfaction; soft actor–critic; energy storage index; discomfort index

#### **1. Introduction**

With the increasing proportion of renewable energy in the power grid, the characteristics of intermittence and fluctuation will bring considerable challenges to the active power balance and frequency stability of the power grid [1]. The traditional power system maintains the balance of the system by adjusting the output of the generating side units. The regulation method is relatively simple and will generate additional economic and environmental costs [2]. In addition, with the increase in power load and the extensive access to renewable energy, the regulation capacity of the power generation side gradually decreases [3]. The power system with renewable energy as the main body can utilize advanced information technology to integrate and dispatch demand-side resources to provide a variety of auxiliary services [4,5]. Therefore, reasonable control of demand-side resources can supplement the traditional system frequency regulation, and thus enhance the stability of the power system [6].

In the demand-side resources, the thermostatically controlled load (TCL) is a kind of electric equipment controlled by a thermostat, which can realize electric heating conversion and adjustable temperature, including heat pumps, electric storage water heaters (ESWHs), refrigerators, and heating, ventilation and air conditioning (HVAC) systems [7]. TCL can be used to provide frequency regulation services, and is mainly based on the

**Citation:** Chen, R.; Liu, H.; Liu, C.; Yu, G.; Yang, X.; Zhou, Y. System Frequency Control Method Driven by Deep Reinforcement Learning and Customer Satisfaction for Thermostatically Controlled Load. *Energies* **2022**, *15*, 7866. https:// doi.org/10.3390/en15217866

Academic Editors: Luis Hernández-Callejo, Sergio Nesmachnow and Sara Gallardo Saavedra

Received: 18 September 2022 Accepted: 18 October 2022 Published: 24 October 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

<sup>1</sup> State Grid Hubei Electric Power Research Institute, Wuhan 430077, China

following three points. Firstly, it is widely distributed in residential, commercial and industrial buildings, with adjustable potential. Secondly, it has sufficient thermal storage capacity and can be regarded as distributed energy storage equipment. Thirdly, the control method is flexible and can respond to the power demand of the system in time [8]. Therefore, in order to fully excavate the frequency regulation potential of flexible resources on the demand side and maintain the grid frequency within a certain offset range, it is necessary to conduct in-depth research on the control strategy of large-scale TCLs on the demand side.

In the current research, there are mainly three control methods for TCLs to participate in ancillary services: centralized control, decentralized control and hybrid control [9–11]. In centralized control, the control center sends control signals to all controlled loads, but it needs to build a large number of communication channels, leading to high control costs. Hu et al. [9] established a hierarchical centralized load tracking control framework that coordinates demand-side heterogeneous TCL aggregators and uses a state–space model for modeling. The decentralized control decentralizes the judgment mechanism of load control to the local control terminal, and pre-sets the procedures or thresholds at the local control terminal. When the demand-side device detects important parameter changes, the load acts according to the pre-set strategy. Because the judgment of decentralized control is performed at the local port, the demand for communication is low and the response speed is fast. However, the control effect is largely influenced by the user behavior and the error of the detection device. Delavari and Kamwa [10] applied a multi-objective optimization approach for optimizing each load setting to reduce the amount of load response required and trigger the load based on the frequency response index of decentralized control. The hybrid control combines the features of centralized and decentralized control, and establishes a control framework of "centralized parameter setting–decentralized decision making", and coordinates large-scale users and grid control centers through load aggregators (LAs). Song et al. [11] built a two-stage control model based on hybrid control to participate in energy market trading. Based on hybrid control, Wang et al. [12] used TCLs to mitigate PV and load variations in microgrid communities. 
The above methods require a communication network between the control center and all aggregates, increasing the cost and difficulty of demand-side load control.

In the research on the participation of TCLs in auxiliary service, Ref. [13] built a dynamic model and verified the performance of a variable-frequency heat pump in providing frequency modulation services by using direct load control. This paper mainly studied the dynamic response performance of a single air-conditioning system, but focused less on the coordinated control of large-scale air-conditioning loads. Ref. [14] established a virtual energy storage model of variable-frequency air conditioning, shielded part of the model information through a hierarchical control framework and simplified the downlink control by using a unified broadcast signal. However, in this paper, the adjustable capacity of the air conditioning cluster will be sacrificed in order to simplify the downlink control. There are two main control modes of TCL, namely direct switch and temperature setting [15]. Ref. [16] realized frequency adjustment based on load direct switch. The advantage of this method is that the tracking accuracy of the system is high and the influence on the user's comfort is low within the range of load regulation ability. The disadvantage is that when the indoor temperature of the load is concentrated near the temperature boundary, the equipment will be frequently switched on and off, which will not only fail to complete the adjustment task, but also reduce the service life of the equipment [17]. Temperature setting can avoid the above disadvantages, but its limitation is that the tracking effect of power depends on the designed controller including a minimum variance controller [18], a sliding mode controller [19] and an internal model controller [20]. In addition, its limitations are also shown in the large range of temperature changes, which will have an impact on the comfort of users [21]. Pallonetto et al. [22] established a residential building energy management system (EMS) based on a combination of optimization techniques and machine learning models. 
This EMS reduces energy consumption while maintaining thermal comfort. Therefore, it is important to consider the influence of customer satisfaction in the process of load response control, in order to motivate user participation in demand response.

The above-mentioned documents all adopt a single control method to schedule and control TCL, but they often fail to meet the application requirements. This is not only because of their inherent defects, but also because of the different requirements of users. Direct switching is suitable for loads with high temperature requirements. At this time, the user expects the start and stop frequency of the equipment to be kept at a low level to extend the service life of the equipment. The load with a low temperature requirement is suitable for temperature setting. At this time, the user expects the temperature change to be maintained within a certain range to improve the comfort. Therefore, it is of practical significance to combine the two control modes. Ref. [23] proposed a hybrid control strategy based on a parallel structure. This strategy can improve the tracking accuracy of the system and reduce the number of switches in the equipment. However, the temperature changes widely, which will reduce the user's comfort. In recent years, the deep reinforcement learning algorithm has provided a new solution to the frequency control problem of power system.

With its strong search and learning ability, the deep reinforcement learning algorithm has the potential of online optimization of decision making in the face of complex nonlinear frequency control problems. The Q-learning algorithm of deep reinforcement learning is used to realize the cooperative control of distributed generation units, thus eliminating the frequency deviation of the system in [24,25]. Ref. [26] combined electric water heater buffer models with domain randomization to reduce the initialization time of Q-learning in demand response control. Ruelens et al. [27] applied batch reinforcement learning to coordinate the power consumption of users with thermostatically controlled loads. However, the Q-learning algorithm can only discretely select control actions from low-dimensional action domains, so it cannot deal with problems with continuous variables [28]. Ref. [29] proposed a deep reinforcement learning algorithm that acts on the continuous action domain, thus realizing the adaptive control of load frequency. An energy management scheme for AC control based on the deep deterministic policy gradient (DDPG) algorithm is proposed in [30]. However, this algorithm is only suitable for the optimal control of a single generator set, and is not suitable for the control of large-scale thermostatically controlled load. In [31], a distributed soft actor–critic (DSAC)-based data-driven frequency control method is proposed; the DSAC model estimates the distribution of value function over returns instead of only estimating the mean. The DSAC method based on entropy regularization has a faster learning speed compared to the traditional expectation-based reinforcement learning methods.

In view of the above problems, we take the thermostatically controlled load as the frequency control object, and based on the deep reinforcement learning algorithm, propose a frequency stability control method with the participation of thermostatically controlled load, considering customer satisfaction. Firstly, considering the operation characteristics of thermostatically controlled load with different control types, the energy storage index and the discomfort index are established, and the fuzzy comprehensive discrimination method is used to evaluate the customer satisfaction. Then, in order to realize the frequency cooperative control of large-scale thermostatically controlled load, a multi-agent control model is established based on the soft actor–critic algorithm to realize the continuous action control of thermostatically controlled load. Through the multi-agent reinforcement learning model considering customer satisfaction, the frequency response control of each thermostatically controlled load cluster can be coordinated online. The main contributions of this paper are as follows:


This method can rely only on local load and frequency data to achieve real-time cooperative control of large-scale TCLs, which reduces the communication pressure in the scheduling process.

The remainder of this paper is organized as follows. In Section 2, the TCL dynamic model and control methods are formulated. Section 3 is concerned with the comprehensive control index of TCLs considering customer satisfaction. In Section 4, the frequency response control of TCLs based on SAC deep reinforcement learning is modeled. Case studies are provided in Section 5. Finally, conclusions are summarized in Section 6.

#### **2. TCL Dynamic Model and Control Methods**

#### *2.1. TCL Dynamic Model*

The first-order ordinary differential equation model considering indoor environment, outdoor environment and building characteristics has high accuracy and simple calculation, and is widely used in practice [32,33]. State variables *Ti* and virtual variables *si* are introduced into the model. The operating characteristics of the *i*-th TCL in the cooling mode can be expressed as:

$$\frac{dT\_i(k)}{dk} = \frac{1}{C\_i R\_i} (T\_\infty(k) - T\_i(k) - s\_i(k)R\_i P\_i) \tag{1}$$

Among them, the change rule of *si*(*k*) is as follows:

$$s\_i(k + \Delta k) = \begin{cases} 0 & s\_i(k) = 1 \text{ and } T\_i(k) \le T\_i^{\text{min}} \\ 1 & s\_i(k) = 0 \text{ and } T\_i(k) \ge T\_i^{\text{max}} \\ s\_i(k) & \text{others} \end{cases} \tag{2}$$

$$\begin{cases} T\_i^{\min} = T\_i^{\text{set}} - \frac{\delta}{2} \\ T\_i^{\max} = T\_i^{\text{set}} + \frac{\delta}{2} \end{cases} \tag{3}$$

where *T*∞(*k*) and *Ti*(*k*) are the outdoor temperature and indoor temperature, respectively; *Ci*, *Ri* and *Pi* are the equivalent heat capacity, equivalent thermal resistance and energy transfer rate of the *i*-th TCL, respectively; *si*(*k*) indicates the load switch state, with on state *si*(*k*) = 1 and off state *si*(*k*) = 0. *Ti*<sup>max</sup> and *Ti*<sup>min</sup> are the upper and lower limits of the temperature during load operation, respectively. *Ti*<sup>set</sup> is the temperature setting value, and *δ* is the width of the temperature dead zone, a constant. *k* and Δ*k* are the operation time and control period, respectively. Solving Equation (1) yields:

$$T\_i(k) = T\_\infty(k) - s\_i(k)R\_i P\_i - \left(T\_\infty(k) - s\_i(k)R\_i P\_i - T\_i(0)\right)e^{-\frac{k}{C\_i R\_i}}\tag{4}$$

where *Ti*(0) represents the initial indoor temperature. For a load cluster composed of *N* TCLs, the aggregate power consumption is the sum of the rated power of all loads.

$$P\_{total}(k) = \sum\_{i=1}^{N} P\_i^n s\_i(k) \tag{5}$$

$$P\_i^n = \frac{P\_i}{\eta\_i} \tag{6}$$

where *Pi*<sup>n</sup> is the rated power of the *i*-th TCL and *ηi* is the energy conversion efficiency coefficient of the *i*-th TCL.
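As an illustration, the hysteresis dynamics of Equations (1)–(5) can be sketched in a short Python routine. This is a minimal sketch with hypothetical parameter values; the function and variable names are not from the paper:

```python
import math

def simulate_tcl(T0, T_out, C, R, P, T_set, delta, dt, steps):
    """Simulate one TCL in cooling mode per Eqs. (1)-(4).

    The exact solution of Eq. (1) is applied over each control period
    dt (Eq. (4)), and the switch state s follows the hysteresis rule
    of Eq. (2) with the dead zone of Eq. (3).
    """
    T_min, T_max = T_set - delta / 2, T_set + delta / 2   # Eq. (3)
    T, s = T0, 0
    trace = []
    for _ in range(steps):
        # Hysteresis switching rule, Eq. (2)
        if s == 1 and T <= T_min:
            s = 0
        elif s == 0 and T >= T_max:
            s = 1
        # Exact solution of Eq. (1) over one control period, Eq. (4)
        T_ss = T_out - s * R * P          # steady-state temperature
        T = T_ss - (T_ss - T) * math.exp(-dt / (C * R))
        trace.append((T, s))
    return trace

def aggregate_power(states, P_rated):
    """Aggregate consumption of a cluster of TCLs, Eq. (5)."""
    return sum(Pn * s for (_, s), Pn in zip(states, P_rated))
```

With a dead zone of 1.2 °C around a 22 °C setpoint, the simulated indoor temperature cycles within the band while the switch toggles on and off, as Figure 2 depicts qualitatively.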

Figure 1 shows the frequency response model of the power system with TCL load participating in frequency modulation, where *TGa* and *TGb* are the time constants of the governor and the turbine, respectively. The governor and the turbine are the instantaneous characteristic compensation links, which are expressed by the lead lag transfer function between the time constants *T*<sup>1</sup> and *T*2. *TR* is the TCL response delay time constant, *Tc* is the communication delay time constant and *R*eq is the unit adjustment rate. Δ*PG*, Δ*PL*, *H* and *D* represent the total output power, the disturbance power, inertia time constant and load damping coefficient of the system, respectively. Δ*f* is the frequency offset.

**Figure 1.** Frequency modulation model of power system with TCL participation.

#### *2.2. TCL Control Methods*

The control methods of TCL are mainly divided into direct switch control and temperature setting control. The TCL operation characteristics of the direct switch are shown in Figure 2a. The temperature setting value of the load remains unchanged, and the dispatching command directly acts on the equipment switch during the operation time *k* = *k*0. The advantage of this method is that it can accurately track the power within the adjustable temperature range and has little impact on the user's comfort. However, when the indoor temperature approaches the temperature boundary, it will cause frequent switching, thus reducing the service life of the equipment. The TCL operating characteristics of the temperature setting are shown in Figure 2b. The dispatching command increases the temperature setting value at time *k* = *k*0. Since the temperature dead zone of the load is unchanged, its operating range will change, thus indirectly changing the switching state of the equipment. Since the indoor temperature of the load is always uniformly distributed in the temperature dead zone, the indoor temperature will not approach the temperature boundary, but it has a notable impact on the user's comfort, and its tracking effect depends on the designed controller.

In practical application, the reasonable distribution of power regulation can make load clusters with different control modes cooperate with each other, which can not only realize accurate tracking of power, but also avoid their limitations and meet the different needs of users. Many factors need to be considered when allocating the power regulation amount. On the one hand, it is necessary to meet the requirements of the power system for frequency regulation and ensure the tracking accuracy within a certain range. On the other hand, for the load with direct switch, the number of switches should be reduced as much as possible to prolong the service life of the equipment. For the load with temperature setting, the temperature change should be reduced as much as possible to improve the user's comfort. There is often a highly nonlinear relationship between these factors and power distribution, and the frequency regulation requires high real-time performance. Therefore, the conventional optimization method cannot obtain the optimal power allocation.

**Figure 2.** Operation characteristics of TCL in different control modes. (**a**) Direct switch control; (**b**) temperature setting control.

#### **3. Comprehensive Control of TCL Considering Customer Satisfaction**

*3.1. Calculation of TCL Load Regulation Index*

The temperature setting value of the direct switch load is almost constant, and the influence on the user's comfort is negligible. Its control mode directly acts on the equipment switch. When the indoor temperature is close to the temperature boundary, the load regulation capacity will decrease, and the start and stop frequency of the equipment will be higher and the service life will be shortened. When the indoor temperature is close to the temperature setting value, the load regulation capacity will rise. At this time, the start and stop frequency of the equipment is low and the service life is extended. In order to characterize the regulation ability of TCL and provide a basis for evaluating the service life of equipment, we use the definition of battery state of charge for reference, and define the energy storage index *C*s for TCL cluster with a direct switch under refrigeration mode:

$$C\_{\rm s} = \frac{2}{\delta} \left| \frac{\sum\_{i=1}^{N} \left(T\_i^{\max} - T\_i\right)}{N} - \frac{\delta}{2} \right|, \quad T\_i \in \left( T\_i^{\min}, T\_i^{\max} \right) \tag{7}$$

According to the definition of *C*s, the closer *C*s is to 0, the closer the indoor temperatures are to the temperature setting value. In this case, the temperature distribution of the TCLs is relatively uniform, their adjustable potential is large and switching is infrequent. The closer *C*<sup>s</sup> is to 1, the closer the indoor temperatures are to the upper and lower temperature limits. In this case, the temperature distribution of the TCLs is more concentrated, their adjustable potential is small and switching is more frequent. Therefore, when the agent outputs a control command, it should keep *C*<sup>s</sup> as close to 0 as possible, so as to reduce the start and stop frequency of the equipment.

The tracking effect of a temperature-setting load depends on the designed controller, and because the setting value changes, the temperature varies over a large range, which reduces the comfort level of the user. To characterize the user's comfort, the discomfort index *C*u is defined as follows for a TCL cluster under temperature setting control:

$$\mathbf{C}\_{\mathbf{u}} = \left| T\_i^{\text{set}} - T\_i^{\text{set}}(0) \right| \tag{8}$$

where *Ti*<sup>set</sup>(0) indicates the initial temperature setting value. According to the definition of *C*u, the more the temperature setting value deviates from its initial value, the higher the user's discomfort level. Therefore, when the agent outputs a control command, it should keep *C*u as close to 0 as possible to reduce the user's discomfort.
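The two regulation indices of Equations (7) and (8) reduce to a few lines of code. The sketch below assumes, for simplicity, a common upper temperature limit for the whole cluster; the names are illustrative only:

```python
def energy_storage_index(T, T_max, delta):
    """Energy storage index C_s of a direct-switch cluster, Eq. (7).

    T: list of indoor temperatures; T_max: upper temperature limit
    (assumed common to the cluster here); delta: dead zone width.
    """
    N = len(T)
    mean_margin = sum(T_max - Ti for Ti in T) / N
    return (2 / delta) * abs(mean_margin - delta / 2)

def discomfort_index(T_set, T_set0):
    """Discomfort index C_u for temperature-setting control, Eq. (8)."""
    return abs(T_set - T_set0)
```

When the temperatures are uniformly spread over the dead zone the mean margin equals δ/2 and *C*s is 0; when they all sit at a temperature limit *C*s is 1, matching the interpretation given above.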

#### *3.2. Customer Satisfaction Assessment Method Based on Regulation Index*

According to the above analysis, users' needs are different under different control methods. In order to comprehensively evaluate customer satisfaction, the fuzzy comprehensive evaluation (FCE) method is adopted for evaluation. The specific operations are as follows.


$$r\_{sp}(y\_s) = e^{-\left(\frac{y\_s - w\_{sp}}{\sigma\_{sp}}\right)^2} \tag{9}$$

where *ys* is the input of the *s*-th factor (*C*<sup>s</sup> or *C*u), and *wsp* and *σsp* are the mean and standard deviation of the membership function for the *s*-th factor and the *p*-th comment, respectively. Then, the fuzzy evaluation matrix *R* is:

$$R = \begin{bmatrix} r\_{11} & r\_{12} & r\_{13} & r\_{14} & r\_{15} \\ r\_{21} & r\_{22} & r\_{23} & r\_{24} & r\_{25} \end{bmatrix} \tag{10}$$

(5) Fuzzy comprehensive evaluation is carried out. The fuzzy evaluation set is:

$$B = A \circ R = \begin{bmatrix} b\_1 & b\_2 & b\_3 & b\_4 & b\_5 \end{bmatrix} \tag{11}$$

where ◦ represents the operation of the fuzzy matrix. Since the weighted average fuzzy synthesis operator has an obvious weight effect and strong comprehensive degree, and can make full use of the information of *R*, the element *bp* is:

$$b\_p = \min\left(1, \sum\_{s=1}^{2} a\_s r\_{sp}\right) \tag{12}$$

(6) Evaluate customer satisfaction. In order to make the level continuous and quantitative, the ranks corresponding to the elements of matrix *B* are set as 1, 2, 3, 4 and 5, and the customer satisfaction *m* is defined as:

$$m = \frac{b\_1 + 2b\_2 + 3b\_3 + 4b\_4 + 5b\_5}{b\_1 + b\_2 + b\_3 + b\_4 + b\_5} \tag{13}$$

According to the definition of *m*, the smaller *m* is, the higher the user's satisfaction.
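The evaluation pipeline of Equations (9)–(13) can be sketched as follows. The membership means and standard deviations are hypothetical placeholders, since the paper does not list numerical values for them:

```python
import math

def membership(y, means, sigmas):
    """Gaussian membership degrees of one factor against the five
    comment levels, Eq. (9). The means/sigmas are assumed values."""
    return [math.exp(-((y - w) / s) ** 2) for w, s in zip(means, sigmas)]

def satisfaction(Cs, Cu, weights, means, sigmas):
    """Fuzzy comprehensive evaluation of customer satisfaction,
    Eqs. (10)-(13). weights = (a1, a2) are the factor weights."""
    R = [membership(Cs, means[0], sigmas[0]),   # row of Eq. (10) for C_s
         membership(Cu, means[1], sigmas[1])]   # row of Eq. (10) for C_u
    # Weighted-average fuzzy synthesis operator, Eq. (12)
    b = [min(1.0, sum(a * R[s][p] for s, a in enumerate(weights)))
         for p in range(5)]
    # Continuous satisfaction grade m, Eq. (13); per the text,
    # a smaller m indicates higher satisfaction
    return sum((p + 1) * bp for p, bp in enumerate(b)) / sum(b)
```

With membership means spread over [0, 1], indices near 0 yield *m* close to 1 and indices near 1 yield *m* close to 5, so *m* moves monotonically with the regulation indices.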

In practical application, the agent should not only consider customer satisfaction, but also meet the requirements of the power system for frequency regulation, that is, to ensure the tracking accuracy within a certain range. In order to evaluate the tracking performance of the system, the root mean square error index *E*RMS is defined as:

$$E\_{\rm RMS} = \sqrt{\frac{\sum\_{\Delta k=1}^{N\_s} \left(\varepsilon(\Delta k)\right)^2}{N\_s \left(P\_{\rm target}^{\rm max} - P\_{\rm target}^{\rm min}\right)^2}} \times 100\% \tag{14}$$

where *Ns* is the number of control periods Δ*k*, *ε*(Δ*k*) is the error signal in a control period, and *P*<sup>max</sup><sub>target</sub> and *P*<sup>min</sup><sub>target</sub> are the maximum and minimum values of the tracking power signal, respectively.

According to the definition of *E*RMS, the smaller *E*RMS is, the higher the tracking accuracy of the system. In order to comprehensively evaluate the regulation effect and provide the basis for the optimization of the power distribution signal, the comprehensive evaluation index *J* is defined as:

$$J = (1 - \lambda)E\_{\text{RMS}} + \lambda m \tag{15}$$

where *λ* is the proportion of satisfaction.

In fact, priority should be given to ensuring the frequency stability of the power grid: when the tracking accuracy is within a certain range, customer satisfaction can be considered; otherwise, it will not be considered. The relationship between *λ* and *E*RMS is as follows:

$$\lambda = \begin{cases} G\_1 & 0 < E\_{\text{RMS}} \le F\_1 \\ G\_2 & F\_1 < E\_{\text{RMS}} \le F\_2 \\ G\_3 & F\_2 < E\_{\text{RMS}} \le F\_3 \\ 0 & E\_{\text{RMS}} > F\_3 \end{cases} \tag{16}$$

where *F*1, *F*2, *F*3, *G*1, *G*<sup>2</sup> and *G*<sup>3</sup> are constants, which can be set according to actual operation conditions.
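Equations (14)–(16) combine into the comprehensive index *J* as sketched below. The thresholds *F*1–*F*3 follow the example settings described in Section 5 (2%, 3% and 5%), while the weights *G*1 and *G*3 are assumed for illustration (only *G*2 = 0.5 is stated in the paper):

```python
import math

def e_rms(errors, p_max, p_min):
    """Root mean square tracking error in percent, Eq. (14)."""
    Ns = len(errors)
    return math.sqrt(sum(e * e for e in errors)
                     / (Ns * (p_max - p_min) ** 2)) * 100

def satisfaction_weight(E, F=(2.0, 3.0, 5.0), G=(0.8, 0.5, 0.2)):
    """Piecewise satisfaction weight lambda of Eq. (16).
    G1 and G3 are assumed values; only G2 = 0.5 is given."""
    if E <= F[0]:
        return G[0]
    if E <= F[1]:
        return G[1]
    if E <= F[2]:
        return G[2]
    return 0.0   # tracking too poor: ignore customer satisfaction

def comprehensive_index(E, m):
    """Comprehensive evaluation index J, Eq. (15)."""
    lam = satisfaction_weight(E)
    return (1 - lam) * E + lam * m
```

The piecewise weight implements the priority rule of the text: good tracking shifts weight toward satisfaction, and tracking error above 5% drops satisfaction from the objective entirely.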

#### **4. Frequency Response Control of TCL Based on SAC Deep Reinforcement Learning**

*4.1. A Deep Reinforcement Learning Model of Soft Actor–Critic*

Reinforcement learning is adaptive learning through the trial and error of agents. The agent interacts with the environment continuously: it observes the environmental state and takes actions that change that state. The agent receives corresponding rewards or punishments, which guide the update of the model parameters, so as to obtain the maximum cumulative reward through continuous learning. Through this perception–action–evaluation learning method, the agent continuously acquires knowledge from the interaction process, adjusts and improves its action strategy to adapt to the environment, and finally yields a better task execution strategy. The environment interaction is generally described by a Markov decision process (MDP) composed of the five-tuple (S, A, P, *r*, *γ*): state space S, action space A, state transition probability P, reward function *r* and discount factor *γ*.

In this paper, deep reinforcement learning based on the soft actor–critic framework is used to control the frequency response of TCLs. The framework of the proposed control model is shown in Figure 3. In the iterative calculation at time *t*, the actor first generates the action *at* through the policy network according to the operating state *st* of the TCL cluster observed at this time. The TCL cluster then performs a state transition according to the current control strategy and reaches the state *st*+1 at the next time. At the same time, the system environment calculates the reward *r*(*st*, *at*) at time *t* and feeds it back to the agent, which records (*st*, *at*, *r*(*st*, *at*), *st*+1) in the experience pool. Then, the actor's sampled action strategy and the system state are input to the critic at the same time, and the action value function *Q*(*st*, *at*) is output to evaluate the strategy. This process is carried out cyclically, and the actor and the critic update their neural network parameters through gradient descent, so as to realize adaptive learning of the model. During training, the accumulated return of the agent in the response period gradually increases and eventually becomes stable. By introducing the maximum entropy term, the SAC reinforcement learning algorithm improves robustness and accelerates training. It can make accurate and effective control decisions for large-scale thermostatically controlled loads in a complex power supply and demand environment.

**Figure 3.** Frequency response control framework of large-scale TCLs based on deep reinforcement learning.

#### *4.2. SAC Deep Reinforcement Learning Method*

#### 4.2.1. SAC Objective Function

The objective function of SAC requires the strategy to maximize the policy entropy while maximizing the cumulative return, so as to avoid greedy sampling in the learning process and avoid falling into local optima. Accordingly, the objective function *π*∗max is constructed as shown in Equation (17).

$$\pi\_{\text{max}}^\* = \arg\max\_{\pi} \sum\_{t=1}^T E\_{(s\_q, a\_q) \sim p\_\pi} \left( r(s\_q, a\_q) + \alpha H(\pi(\cdot | s\_q)) \right) \tag{17}$$

where *E*(·) is the expectation, *π* is a policy, *sq* is the state of the *q*-th agent, *aq* is the action of the TCL and *r*(*sq*, *aq*) is the reward function of the *q*-th agent. The state–action trajectory (*sq*, *aq*) ∼ *p<sup>π</sup>* is formed by the strategy *π*. *α* is the temperature term, which determines the influence of entropy on the reward. *H*(*π*(·|*sq*)) is the entropy term of the strategy in the state, calculated as shown in Equation (18).

$$H(\pi(\cdot|s\_q)) = -\int\_{a\_q} \pi(a\_q|s\_q) \log(\pi(a\_q|s\_q)) da\_q = E\_{a\_q \sim p\_{\pi}}(-\log(\pi(a\_q|s\_q))) \tag{18}$$

#### 4.2.2. SAC Iteration Strategy

The value function used for strategy value evaluation in the reinforcement learning process, *Q*(*sq*, *aq*), is shown in Equation (19). The Bellman backup operator, used for policy updating, is shown in Equation (20).

$$Q(s\_q, a\_q) = r(s\_q, a\_q) + \gamma E\_{s\_{q+1} \sim p}(Q(s\_{q+1}, a\_{q+1})) \tag{19}$$

$$T^{\pi}Q(s\_q, a\_q) \stackrel{\Delta}{=} r(s\_q, a\_q) + \gamma E\_{s\_{q+1} \sim p}(V(s\_{q+1})) \tag{20}$$

where *E*<sub>*sq*+1∼*p*</sub> is the expectation over the next state, *T<sup>π</sup>* is the Bellman backup operator under the policy *π* and *γ* is the discount factor of the reward. *V*(*sq*+1) is the soft value function of the state, calculated as shown in the following equation.

$$V(s\_q) = E\_{a\_q \sim \pi}(Q(s\_q, a\_q) - \log \pi(a\_q|s\_q)) \tag{21}$$

Meanwhile, there are

$$Q^{k+1} = T^{\pi} Q^k \tag{22}$$

where *Q<sup>k</sup>* is the value function at the *k*-th iteration.

Iterating Equations (20) and (22) continuously yields Equation (23).

$$\lim\_{k \to \infty} Q^k = \hat{Q} \tag{23}$$

where *Q*ˆ is the soft *Q*-value.
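The convergence of the soft *Q*-value stated in Equation (23) can be checked numerically on a toy tabular MDP. The sketch below repeatedly applies the Bellman backup of Equation (20), using the closed form *V*(*s*) = log Σ<sub>*a*</sub> exp *Q*(*s*, *a*) that results from substituting the maximum-entropy policy into Equation (21) (temperature fixed at 1; the MDP itself is invented for illustration):

```python
import math

def soft_q_iteration(P, r, gamma=0.9, iters=500):
    """Tabular soft Q-iteration illustrating Eqs. (20)-(23).

    P[s][a] is a list of (probability, next_state) pairs and
    r[s][a] is the reward. Each sweep applies the Bellman backup
    operator T^pi of Eq. (20); by Eq. (23) the iterates Q^k
    converge to the soft Q-value.
    """
    nS, nA = len(P), len(P[0])
    Q = [[0.0] * nA for _ in range(nS)]
    for _ in range(iters):
        # Soft value from the max-entropy policy, cf. Eq. (21)
        V = [math.log(sum(math.exp(q) for q in Q[s])) for s in range(nS)]
        # Bellman backup, Eqs. (20) and (22)
        Q = [[r[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
              for a in range(nA)] for s in range(nS)]
    return Q
```

Because the backup is a γ-contraction, running the loop for a few hundred sweeps leaves the iterates essentially unchanged, which is exactly the limit claimed in Equation (23).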

#### 4.2.3. SAC Policy Update

The strategy updating method in the calculation process is shown in Equation (24).

$$\pi\_{\text{new}} = \arg\min\_{\pi \in \Pi} D\_{KL} \left( \pi(\cdot|s\_q) || \frac{\exp\left(\frac{1}{\alpha} Q^{\pi\_{\text{old}}}(s\_q, \cdot)\right)}{Z^{\pi\_{\text{old}}}(s\_q)} \right) \tag{24}$$

where *D<sub>KL</sub>* is the Kullback–Leibler divergence, Π is the policy set and *Q<sup>πold</sup>*(·) is the value function under the old strategy *π*old. *Z<sup>πold</sup>*(*sq*) is the partition function under the old strategy, which normalizes the distribution.

#### 4.2.4. Construction of SAC Algorithm

The SAC algorithm constructs neural networks, including a *Q*-value network and a policy network. The *Q*-value network outputs a single value through several layers of neurons, and the policy network outputs a Gaussian distribution. In this process, the neural networks are updated: the *Q*-value network parameters are updated according to Equation (25), and the policy network parameters are updated according to Equation (26).

$$J\_Q(\theta) = E\_{\left(s\_q, a\_q, s\_{q+1}\right) \sim D} \left( \frac{1}{2} \left( Q\_{\theta}(s\_q, a\_q) - \left( r(s\_q, a\_q) + \gamma V\_{\overline{\theta}}(s\_{q+1}) \right) \right)^2 \right) \tag{25}$$

$$J\_{\pi}(\phi) = D\_{KL}\left(\pi(\cdot|s\_q) || \exp\left(\frac{1}{\alpha}Q\_{\theta}(s\_q, \cdot) - \log Z(s\_q)\right)\right) \tag{26}$$

where *θ* is the *Q*-value network parameter, *φ* is the policy network parameter, and *V<sub>θ̄</sub>* and *Q<sub>θ</sub>* are the value function computed with the target network parameters and the value function given by the *Q*-value network, respectively. *Z*(*sq*) is the partition function of the state.

The temperature parameter is an important parameter to assist in maximizing entropy, which can maximize the exploration of action space. A reasonable temperature parameter setting is helpful to realize iterative testing of all feasible actions. Therefore, the update of the temperature parameter is as shown in Equation (27).

$$J(\alpha) = E\_{a\_q \sim \pi\_q}(-\alpha \log \pi\_q(a\_q|s\_q) - \alpha H\_0) \tag{27}$$

where *π<sup>q</sup>* is the control strategy of the *q*-th agent and *H*<sup>0</sup> is the target entropy.

By minimizing the objectives in Equations (25)–(27), the *Q*-value network parameters, policy network parameters and temperature parameter are continuously updated through deep neural network learning, which makes the model converge and yields the optimal strategy.
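As a concrete reading of Equations (25) and (27), the sketch below estimates the critic objective *J<sub>Q</sub>* and the temperature objective *J*(*α*) from a batch of transitions. The networks are replaced by plain callables, so this is only an illustration of the loss structure, not of the full SAC training loop:

```python
def critic_loss(batch, Q, V_target, gamma=0.99):
    """Monte-Carlo estimate of the critic objective J_Q, Eq. (25).

    batch: list of (s, a, r, s_next) transitions sampled from the
    replay buffer D; Q and V_target are callables standing in for
    the Q-value network and the target value network.
    """
    total = 0.0
    for s, a, r, s_next in batch:
        td_target = r + gamma * V_target(s_next)   # soft TD target
        total += 0.5 * (Q(s, a) - td_target) ** 2
    return total / len(batch)

def temperature_loss(log_probs, alpha, H0):
    """Temperature objective J(alpha), Eq. (27), estimated from the
    log-probabilities of actions sampled from the policy."""
    return sum(-alpha * lp - alpha * H0 for lp in log_probs) / len(log_probs)
```

In a full implementation both quantities would be minimized by gradient descent on the network parameters *θ* and the temperature *α*, as the text describes.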

#### *4.3. Design of SAC Deep Reinforcement Learning Model for TCL Frequency Response*

In this paper, the SAC algorithm is used to solve the control strategy problem of large-scale thermostatically controlled loads participating in system frequency regulation. The structural model of the proposed control method is shown in Figure 3. The agent in the figure is based on a deep neural network. The environment observed by the controller comprises the frequency deviation Δ*f* of the power system, the differentiation and integration of Δ*f*, the baseline power signal *P*base of the aggregated TCLs, the aggregate power consumption *P*total and the automatic generation control signal *P*AGC. The automatic generation control signal is a series of positive and negative power signals representing the active power deviation between the supply and demand of the power system. The baseline power signal of the aggregated TCLs is the sum of the rated power *P*<sup>n</sup><sub>*i*,set</sub> of the TCL cluster under a certain temperature setting value.

$$P\_{\text{base}} = \sum\_{i=1}^{N} P\_{i,\text{set}}^{n} \tag{28}$$

$$P\_{i,\text{set}}^{n} = \frac{T\_{\infty} - T\_i^{\text{set}}}{\eta\_i R\_i} \tag{29}$$

A tracking power signal *P*target is generated by superimposing *P*AGC and *P*base. The tracking error signal *ε* is generated by subtracting *P*total from *P*target, that is:

$$P\_{\text{target}} = P\_{\text{AGC}} + P\_{\text{base}} \tag{30}$$

$$\varepsilon = P\_{\text{target}} - P\_{\text{total}} \tag{31}$$

The agent obtains the optimized response power of each TCL cluster according to the environmental information. Then, the direct switch control cluster and the temperature setting control cluster complete the adjustment task according to the obtained control signals. The method includes two stages: offline pre-learning and online application. In the offline pre-learning stage, the pre-learning process iteratively updates all parameters of the agent. During each self-learning iteration, the agent conducts action exploration (i.e., generates different commands) to interact with the environment. After exploration, the parameters of the agent are updated according to the system frequency deviation and the reward function of the TCL controller. With an appropriate reward function *R* and considering environmental constraints, the gradient of the actor (i.e., the gradient of the control target with respect to the parameters of the agent) is calculated and used to update all the parameters of the agent. In the online application stage, the agent calculates the action value (i.e., generates a command) for each control cluster according to its observations and learned parameters.
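The environment signals of Equations (28)–(31) can be assembled as follows (a minimal sketch; the function names are illustrative):

```python
def baseline_power(T_out, T_set, eta, R):
    """Baseline power of the aggregated cluster, Eqs. (28)-(29):
    the sum of each TCL's steady-state rated power at its setpoint."""
    return sum((T_out - Ts) / (e * r)
               for Ts, e, r in zip(T_set, eta, R))

def tracking_signals(P_agc, P_base, P_total):
    """Tracking power target and error signal, Eqs. (30)-(31)."""
    P_target = P_agc + P_base
    return P_target, P_target - P_total
```

The error signal returned here is the quantity fed into the *E*RMS index of Equation (14) and, through the reward, into the agent's parameter updates.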

For the frequency response model of large-scale temperature control load, the negative value of the comprehensive evaluation index *J* is taken as the reward function of the agent, that is

$$R = -J = -\left((1 - \lambda)E\_{\rm RMS} + \lambda m\right) \tag{32}$$

By introducing the system frequency deviation and customer satisfaction into the reward function, the obtained control strategy can improve the tracking accuracy of the system, reduce the switching frequency of the equipment, reduce the temperature change and improve the customer satisfaction.

#### **5. Result Analysis**

#### *5.1. Example Introduction and Scenario Setting*

In order to verify the effectiveness of the proposed method for large-scale thermostatically controlled loads participating in power grid frequency control, we take a distribution network with large-scale thermostatically controlled loads as an example. The disturbance power of the regional power grid is set as the net load power, that is, the difference between the original load power and the power generated by new energy sources such as photovoltaic and wind power. The disturbance power over the simulation time is shown in Figure 4. As an important thermostatically controlled load, HVAC accounts for a large proportion of demand and is easy to control and manage; therefore, we selected 2000 HVAC units for the simulation experiments. The load parameter settings are shown in Table 1. The initial indoor temperature of each load is evenly distributed in the temperature dead zone, and the dead zone width is set to 1.2 °C. *P*AGC is an actual frequency regulation signal from the PJM power market in the United States, which changes every four seconds.

**Figure 4.** Variation of disturbance power within 2 h.



Note: *N* (2, 0.01) represents the normal distribution with the mean value of 2 and the variance of 0.01, and the others are the same.

The parameter settings of the customer satisfaction evaluation and the comprehensive evaluation index are shown in Table 2. When *E*RMS lies in (0, 2%], the frequency regulation effect of the power system is good, and customer satisfaction is mainly considered. When *E*RMS lies in (2%, 3%], *λ* is 0.5, indicating that the root mean square error index and customer satisfaction have the same impact on the system regulation. When *E*RMS lies in (3%, 5%], the frequency regulation effect of the power system is poor, and the root mean square error index is mainly considered. When *E*RMS is greater than 5%, *λ* is 0, indicating that the system regulation no longer considers customer satisfaction and focuses on improving the tracking accuracy.



#### *5.2. Frequency Control Effect Analysis Considering Customer Satisfaction*

Figure 5 compares the changes in customer satisfaction before and after customer satisfaction is included in the agent optimization process. As can be seen from Figure 5, on the one hand, the peak value of customer satisfaction increased from 3.5 before optimization to 4.7 after optimization. On the other hand, within the simulation time range, customer satisfaction after optimization is higher than that before optimization. Figure 6 compares the frequency control effect of the method proposed in this paper before and after considering customer satisfaction. As can be seen from Figure 6, since the customer satisfaction index is added to the control target, the algorithm's penalty weight for frequency deviation is relatively reduced, resulting in an increase in the maximum frequency deviation of the system compared with that before considering customer satisfaction. Considering customer satisfaction will limit the number of TCLs participating in frequency response, resulting in a poor system frequency control effect, but the impact is not significant. Therefore, the method proposed in this paper can better balance the frequency deviation control and customer satisfaction.

**Figure 5.** Comparison of customer satisfaction before and after optimization.

**Figure 6.** Frequency deviation before and after considering customer satisfaction.

#### *5.3. Frequency Control Effect Analysis Based on SAC Deep Reinforcement Learning*

In order to verify the effectiveness of the proposed SAC deep reinforcement learning algorithm in the collaborative control of large-scale thermostatically controlled load compared with the traditional PID method, we used the traditional PID controller, the PID controller optimized by particle swarm optimization (PSO) algorithm parameters, and the algorithm proposed in this paper to conduct simulation experiments on the thermostatically controlled load controlled by the direct switch control and the temperature setting control. The number of HVAC experiments is 2000. The comparison of the three control methods in the system frequency control effect is shown in Figure 7.

It can be seen from Figure 7 that under the regulation control of the conventional PID controller, there is a system frequency deviation of about 0.047 Hz in the period when the disturbance power fluctuates violently. Compared with the traditional PID controller, the frequency effect of the PID controller is improved after PSO algorithm parameter optimization, but the effect is still not ideal. The frequency deviation of the system fluctuates within the range of (−0.037 Hz, 0.038 Hz). The algorithm proposed in this paper can keep the system frequency deviation within (−0.02 Hz, 0.023 Hz), and can significantly improve the frequency stability of the power grid.

**Figure 7.** Comparison of frequency control effect between the proposed algorithm and PID controller.

#### *5.4. Comparative Analysis of the Algorithm and Traditional Deep Reinforcement Learning*

This section shows the advantages of the algorithm in this paper compared with the data-driven deep Q network (DQN) algorithm and the DDPG algorithm from the aspects of system frequency control effect and algorithm convergence speed. The proposed algorithm and DDPG are both deep reinforcement learning algorithms based on continuous action space. As shown in Figure 8, the frequency control effect of the two algorithms is significantly better than that of the DQN algorithm designed based on the discrete action space. After integrating the advantages of model driven and data driven, the algorithm proposed in this paper further improves the real-time frequency control effect in the continuous action domain compared with the fully trained DDPG algorithm.

Figure 9 compares the iterative convergence process of the cumulative reward value of the DQN algorithm, the DDPG algorithm and the algorithm proposed in this paper. Among them, after about 250 and 300 iteration cycles, the cumulative reward value of DQN algorithm and DDPG algorithm tends to be stable and will not continue to increase. It is worth noting that each iteration cycle in this paper is an empirical trajectory containing 200 iterations. That is, the two algorithms need 50,000 and 60,000 iterations, respectively, to converge. However, the proposed algorithm only needs about 150 iteration cycles (30,000 iterations) to complete the parameter training of the deep neural network, and the oscillation amplitude of the convergence curve of the algorithm is the smallest.

**Figure 8.** Comparison of frequency control effect between different algorithms.

**Figure 9.** Iterative convergence process of cumulative reward value: (**a**) DQN algorithm; (**b**) DDPG algorithm; (**c**) SAC algorithm in this paper.

#### **6. Conclusions**

In this paper, considering the support that large-scale demand-side thermostatically controlled loads provide to the power grid frequency, a frequency cooperative control method considering customer satisfaction is proposed based on soft actor–critic deep reinforcement learning, in order to solve the frequency control problem of a power system in which large-scale demand-side thermostatically controlled loads participate in frequency regulation. In the example analysis, a distribution network is taken as the research object, and the performance of different algorithms is compared and verified through time domain simulation. The simulation results show that, compared with existing deep reinforcement learning methods, the proposed algorithm has obvious advantages in system frequency control, customer satisfaction and training time.

The algorithm proposed in this paper is mainly used to solve the frequency cooperative control problem under the participation of large-scale thermostatically controlled load in the distribution network, and does not consider the application of other flexible resources on the demand side in the system frequency modulation. The next step is to explore the application and practice of multi-agent deep reinforcement learning method in the frequency response of demand-side flexible resources based on the operation mechanism and mathematical modeling of demand-side flexible resources.

**Author Contributions:** Conceptualization, R.C. and G.Y.; Data curation, C.L. and X.Y.; Investigation, H.L. and Y.Z.; Methodology, R.C.; Software, H.L.; Supervision, G.Y. and Y.Z.; Validation, G.Y.; Visualization, X.Y.; Writing—original draft, R.C. and H.L.; Writing—review and editing, C.L., X.Y. and Y.Z. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by the Science and Technology Project of State Grid Hubei Electric Power Research Institute: "Research on frequency dynamic prediction and active control strategy of high proportion new energy power system under mutational weather" (project number B31532225680).

**Data Availability Statement:** The data in this paper are from a real distribution network and involve a confidentiality agreement. The dataset in this paper is not publicly available.

**Conflicts of Interest:** The authors declare no conflict of interest.

### *Systematic Review* **Identifying Key Components in Implementation of Internet of Energy (IoE) in Iran with a Combined Approach of Meta-Synthesis and Structural Analysis: A Systematic Review**

**Mir Hamid Taghavi 1, Peyman Akhavan 2,\*, Rouhollah Ahmadi 3,\* and Ali Bonyadi Naeini <sup>1</sup>**


**Abstract:** The increasing consumption of energy and the numerous obstacles in the way of its extraction, including diminishing fossil fuels and the turn towards renewable energies, environmental changes, a tendency towards systems of information networks, rising costs of energy, and the advancement of technology, have made the need for new technologies aimed at efficient energy management more imminent. The Internet of Energy (IoE) technology has been recognized as a novel and efficient strategy that provides the necessary tools for optimal energy management. The present study was carried out with the purpose of identifying key components in the implementation of IoE in Iran. This study is practical in its goal and descriptive-explorative in its methodology. First, the data were categorized using the qualitative method of meta-synthesis, following the Sandelowski and Barroso method. The statistical population of the study consisted of scholarly works published between 2010 and 2021, from which 55 papers were sampled. The kappa coefficient was used to determine reliability and quality control; calculated with SPSS, it equals 0.87, which falls in the "excellent" category. Second, the frequency and importance of each component were determined using the Shannon entropy technique. The purpose of this method is to measure the weight or importance of each component based on frequency and to identify the key components. Third, the MICMAC structural analysis method was used, with eight experts in the field of energy, to evaluate the influence and dependence of the components and determine the strategic components. The purpose of this step is to compare the results with the results of the second step of the research.
The results show that 82 indicators play a role in implementation of the concept of IoE; these indicators can be divided into ten axial categories of rules and regulations, individual and human factors, funding, technological infrastructure, cultural and social factors, security factors, technological factors, knowledge factors, learning style, and management factors. In the Shannon entropy method, technological infrastructure, management factors, and rules and regulations are the most significant, respectively. In MICMAC structural analysis, the components of managerial factors, technological infrastructure, and financing have the largest share in influence and dependence, respectively. Conclusion: The two components of management factors and technological infrastructure can be considered as key and strategic components in implementation of IoE in Iran.

**Keywords:** IoE; optimal energy management; sustainable development; meta-synthesis; MICMAC analysis

**Citation:** Taghavi, M.H.; Akhavan, P.; Ahmadi, R.; Bonyadi Naeini, A. Identifying Key Components in Implementation of Internet of Energy (IoE) in Iran with a Combined Approach of Meta-Synthesis and Structural Analysis: A Systematic Review. *Sustainability* **2022**, *14*, 13180. https://doi.org/10.3390/su142013180

Academic Editors: Luis Hernández-Callejo, Sergio Nesmachnow and Sara Gallardo Saavedra

Received: 4 May 2022; Accepted: 28 September 2022; Published: 14 October 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

#### **1. Introduction**

In the 21st century, the growing demand for energy and the widespread use of fossil fuels and traditional energy sources have been challenged by factors like the energy crisis, environmental pollution, and global warming [1]. In 2011, 82% of energy was generated from fossil fuels [2]. Global energy demand in 2018 increased by 2.3 percent compared to 2017, the largest increase since 2010; as a result, CO2 emissions from the energy sector set a new record in 2018. According to figures released by the International Energy Agency (IEA), global energy demand will increase by more than two-thirds by 2035 [3]. Compared to pre-industrial levels, global warming has reached 1.5 degrees Celsius and, if this trend continues, will exceed 2 degrees Celsius and have a negative impact on the planet and human life [2]. Undoubtedly, such an increase in energy demand will put an additional burden on the old energy infrastructure, leading to serious problems of grid congestion and reduced energy quality. The usual grid structure faces reliability problems due to the lack of real-time monitoring, automation techniques, error detection, transparency, and flexibility [3,4]. The demand for renewable energy sources, such as solar and wind energy, is increasing significantly as a solution to the problems of traditional energy sources. In addition to protecting the environment, such an approach will also meet future energy demands [5].

Although renewable energy sources have advantages, such as sustainable development and environmental conservation, they have disadvantages too. It is difficult to accurately predict the amount of energy generation from renewable energy sources and it mainly depends on environmental conditions [6,7]. Moreover, with the existing electricity infrastructure, energy from renewable sources cannot be fully efficient. China, for example, generates most of its energy in a green way but still faces an energy crisis because it cannot deliver the energy it needs to its large population. The gradual shift to decentralized renewable sources also shows that electricity generation depends on the seasons, and this unpredictable nature of electricity generation requires new demand-supply management techniques [8,9]. In addition, a key question for peer-to-peer energy trading is making it possible to share and connect the existing energy infrastructure with decentralized renewable energy sources. This matter requires systematic management and intelligent control, in addition to renewable energy sources, Distributed Generation (DG), flexibility, and transparency to achieve a smart, sustainable, and coordinated energy market. Distributed Generation (DG) of electricity provides several advantages, such as high efficiency and environmental protection, reduction of transmission and distribution losses, supporting the local power grid, and improving system stability. A better way to understand the potential benefits of DG is to take a system approach that considers generation and associated loads as a subsystem or "micro-grid" [10]. Micro-Grid (MG) ecosystems are increasingly being utilized to integrate smart grids with renewable energy sources such as wind power, photovoltaics, hydro turbines, biogas, etc. 
A micro-grid is a set of micro-resources such as micro-turbines, fuel cells, photovoltaic systems, storage systems, and wind turbines that provides distributed energy generation [2,11,12]. It can be connected to the utility grid (grid mode) or used independently and separated from the utility grid (island mode) [10]. The micro-grid also allows local energy exchange in the smart grid and reduces the waste due to energy transmission. In short, micro-grids are considered a solution to meet the challenges facing traditional power systems [2,11,12].

Smart grid technology is created in the context of micro-grids. The smart grid provides a platform for the production, distribution, storage, and transmission of energy and creates a reliable, transparent, flexible, and automated power system. A smart grid system with balanced generation and consumption of energy ensures energy sustainability [2,11,13]. Since decentralized renewable energy sources are widely used in micro-grids, achieving a stable power balance is difficult [4]. Therefore, there is a more imminent need to find a solution for the demand–supply balance, optimal management of energy, sustainable development, and all the problems mentioned before. IoE is of great importance as a future solution for the optimal management of energy production and consumption. IoE provides access to large amounts of decentralized energy sources by considering microgrids as infrastructure in future energy systems. The purpose of this study is to identify the infrastructure components to implement the new concept of IoE in Iran.

The existing studies have presented concepts related to IoE in a scattered manner; a comprehensive classification of the factors affecting this concept therefore helps in understanding it correctly. Moreover, rapid global and technological developments in areas such as IoE are inevitable, which is why futures research approaches, including scenario analysis, are becoming more important. To use this approach, the key drivers of the area must be identified so that future scenarios, as well as appropriate policies for each scenario, can be designed based on them. In addition to identifying and classifying the components affecting IoE, the present study ranks them and determines the strategic components, which can serve as input for future research.

The present study answers two basic questions: What are the fundamental components in the implementation of the concept of IoE? What are the key drivers of the implementation of the IoE concept? To answer these questions, two approaches, meta-synthesis and MICMAC analysis, were used. In the first approach, after screening the papers based on the Critical Appraisal Skills Program (CASP), the set of relevant papers for review was determined. Then, based on a review of the research literature and library studies, a set of key parameters for the implementation of IoE was extracted in the form of main and sub-categories. At this stage, the research parameters were coded using the MAXQDA software and the frequency of each was determined. In the next step, the validity of the extracted parameters was measured based on the opinions of ten experts; expertise and related education in the area of IoE were the two main criteria in selecting them. Finally, strategic components were determined based on the Shannon entropy method and MICMAC structural analysis.

#### **2. Theoretical Foundations of Research**

The term IoE was first coined in 2011 by renowned American researcher Jeremy Rifkin. In his book, *The Third Industrial Revolution*, he points to the role of IoE in reducing fossil fuel consumption, increasing the use of decentralized energy sources, and decreasing environmental pollutions [14].

IoE combines the two concepts of the smart grid and the Internet of Things (IoT). IoT is a concept in which every object can be identified, accessed, and even remotely controlled through the Internet via the Internet Protocol (IP). This concept, built on smart grids, has been developed and introduced to the scientific community as IoE [3,15]. IoE brings together IoT, big data, artificial intelligence technologies, and computing capabilities in centralized and decentralized energy management systems with the aim of optimizing the efficiency of the existing energy infrastructure. It also facilitates coordination between renewable energy sources, smart grids, micro-grids, electric vehicles, and control centers, with the primary goal of improving efficiency, flexibility, and energy support [4,16]. In other words, IoE provides a real-time interface between the smart grid and a large set of equipment and, by processing data and information, creates the capacity for optimal energy production and storage while balancing energy production and consumption in the smart grid [17]. IoE is a paradigm that transforms current grid systems from centralized, one-way energy production to sustainable, flexible, efficient, reliable, and highly secure energy grids [1]. The IoE paradigm provides a complete set of benefits. First, with IP-based networking, it is possible to coordinate interactions with a large number of ICT technologies. In addition, Machine-to-Machine (M2M) interactions decentralize the control process, removing the dependence on a central communication network. Finally, interactive communication is the key to success in the global free energy market [18].

The development of renewable energies, along with the growth of information and communication technology, are the two driving and key elements in the field of IoE. Therefore, IoE can be seen as an energy efficiency system that enables the distribution of clean energy through systems of information and communication technology and can be studied as a smart grid [7,19]. Smart energy control, energy security, demand-side management of energy, increasing the use of renewable energies and their integration, reducing energy loss, reducing blackouts due to reduced energy production, the possibility of real-time monitoring, reducing operating and maintenance costs, increasing energy efficiency, system flattening, resource management, and self-organization are the main benefits of IoE [2,4,5].

#### **3. Empirical Foundations of Research**

The concept of IoE has gained the attention of various sectors, including universities, industries, and government departments [4]. For example, in the ARTEMIS IoE project in Europe, 38 companies from 10 European countries are developing IoE technology by focusing on Electric Mobility Infrastructure and smart grids [20]. In the United States, the Center for Future Renewable Electric Energy Delivery and Management (FREEDM), established by the National Science Foundation (NSF), has created new energy distribution infrastructure with the ability to plug and play decentralized renewable energy sources. In their view, IoE is considered a tool for flexible and automatic distribution of electricity [21]. In 2015, the President of China introduced IoE as a green solution to the global electricity demands [22]. In the same year, China launched a project called the Global Energy Internet (GEI), which works by developing smart grids to connect decentralized renewable energy sources by exchanging their information over the Internet [8]. In 2016, as well, the Global Energy Interconnection Development and Cooperation Organization (GEIDCO) in China introduced IoE as a sustainable new source of energy [23].

Numerous studies have been conducted on the new concept of IoE and its application in optimal energy management. Miglani et al. (2020) [4] introduced IoE as an essential technology required in the energy sector to not only manage demand response and peerto-peer energy trading, but also provide smart grid security. In this study, the use of blockchain technology in the context of IoE is considered an important tool in creating a decentralized structure, countering cyber-attacks, and maintaining smart grid security.

Hossein Motlagh et al. (2020) [2] examined the widespread uses of IoT technology in the energy sector (production, transmission, distribution, and consumption of energy). They also offer blockchain technology as a solution to the challenges of IoE, such as privacy and security.

Taghavi et al. (2021) [17], expressing the need for optimal energy management in the country due to the increased likelihood of facing an energy crisis in the near future, considered the new paradigm of IoE as a suitable solution and presented an IoE model for optimal energy management.

Sani et al. (2019) [23] considered the structure of the existing smart grids in the field of energy to be insufficient, and therefore proposed a cybersecurity structure for IoE. This structure introduces an identity-security mechanism called "I-ICAAAN" (Integrity, Confidentiality, Availability, Authorization, Authenticity, and Nonrepudiation), a secure communication protocol and a smart security system for energy management. Such a structure provides sufficient privacy and security for data and components of the network. It defines IoE as a software platform for controlling, monitoring, and managing the entire smart network through two-way interaction between all sources of energy production and consumption.

Nguyen et al. (2018) [24] propose a building energy management system (BEMS) based on IoE to manage issues such as large volumes of building energy data and energy overload problems in the future. Based on the studies, the most important key components for the implementation of IoE were identified as follows (Table 1).


#### **Table 1.** Key components in implementation of IoE.


Planning for energy management of smart cities [3,24,36,38,39,60,61]

#### **4. Research Method**

The present study is practical in its purpose and descriptive-explorative in its methodology. In the first step of the research, the Sandelowski and Barroso method (2007) [62] was used to identify the components of the implementation of the concept of IoE. Meta-synthesis is a qualitative method based on a systematic review of the literature, used to gain in-depth knowledge of the phenomenon under study. With the expansion of research in various fields of science and the confrontation of the scientific community with an explosion of information, researchers have, in practice, concluded that it is usually not possible to be aware, up to date, and expert in all aspects of a field. Therefore, synthesis methods that offer researchers the essence of research on a particular subject in a systematic and scientific way have become increasingly popular (Tables S1 and S2). Because meta-synthesis evaluates other research, it is called an evaluation of evaluations. Meta-synthesis is not merely an integrated review of the literature, but an analysis of the findings of these studies [63].

In the second step of the research, a foresight approach called structural analysis was used. The potential of this method to use qualitative data along with quantitative data has made it one of the most widely used methods in futures research. In this step, the matrix of interactions between variables is completed by a panel of eight experts in the field of energy. Then, within the MICMAC forecasting software, the influence and dependence (direct and indirect) of each variable on the others are measured and the strategic or key driving variables are obtained. MICMAC is one of the best software tools designed to implement structural analysis; its output, in the form of tables and graphs, helps in understanding the relationships within the system and how they will work in the future [64].

#### **5. Research Findings**

#### *5.1. Meta-Synthesis*

In this step, papers and studies conducted from 2010 to 2021 in the field of IoE were studied and analyzed. The Web of Science, Science Direct, Google Scholar, Springer, Emerald, ResearchGate, and Scopus databases were used to collect and categorize papers based on content, using two keywords of "Internet of Energy" and "Energy Internet" in the title; a total of 417 studies were found. Then, the process of reviewing papers, including the title, abstract, content, and research methodology began, the purpose of which was to exclude studies that were not relevant to the research questions. The review process is summarized in Figure 1.

The next step was to evaluate the methodological quality of the research, which aimed to eliminate studies in which the researcher did not trust the findings. The most commonly used tool for assessing the quality of primary studies in qualitative research is the Critical Appraisal Skills Program (CASP), which helps identify the accuracy, validity, and importance of qualitative studies by asking ten questions. These questions focus on the following: 1. Research objectives, 2. Methodological logic, 3. Research design, 4. Sampling method, 5. Data collection, 6. Reflectivity, 7. Ethical considerations, 8. Accuracy of data analysis, 9. Clarity of results and findings, and 10. Value of research [63].

Using this tool, studies were assigned a score of 1 to 5 on each of the above criteria. Based on the resulting 50-point CASP scale, the researchers adopted the scoring system shown in Table 2 and categorized the studies by methodological quality. Studies that scored below the "good" category (i.e., below 31 points) were excluded from the project [65].
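The scoring procedure above can be sketched in a short snippet. The 31-point cut-off for the "good" band is taken from the text; the boundary of the "very good" band used below is an illustrative assumption (the paper's Table 2 defines the actual bands):

```python
def casp_total(scores):
    """Total CASP score: ten criteria, each rated 1-5 (maximum 50)."""
    assert len(scores) == 10 and all(1 <= s <= 5 for s in scores)
    return sum(scores)

def casp_decision(total):
    """Classify a study by total CASP score.
    The 31-point exclusion threshold follows the text; the 41-point
    'very good' boundary is an illustrative assumption."""
    if total >= 41:
        return "very good"
    if total >= 31:
        return "good"
    return "excluded"

# A study rated 4 on every criterion totals 40 and lands in the 'good' band.
print(casp_decision(casp_total([4] * 10)))  # good
```

Studies returning "excluded" here correspond to those dropped before the meta-synthesis stage.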

**Table 2.** Scoring system of the Critical Appraisal Skills Program (CASP).


**Figure 1.** Screening process of papers.

In this study, the 55 studies that had survived the first round of filtering based on title, abstract, content, and research methodology were then evaluated using the CASP system (Table 3). After assigning scores to each study based on the given criteria and eliminating studies with a score of less than 31, 55 studies were accepted into the evaluation process, of which 11 were assigned to the "very good" category and 44 to the "good" category. Therefore, after several rounds of filtering, 362 papers were eliminated from the initial 417 and 55 entered the analysis (Figure S1, Table 4). After evaluating the papers, the data were categorized as primary codes (open codes) with reference to source and frequency. The codes were then classified according to their meaning into groups of similar concepts, which helped identify the main components of the research (Table 1).

**Table 3.** The result of the Critical Appraisal Skills Program (CASP).


**Table 3.** *Cont.*



**Table 3.** *Cont.*

**Table 4.** List of papers evaluated using the Critical Assessment Skills Program (CASP).


| Paper Code | Title |
|---|---|
| C20 | Energy management based on Internet of Things: practices and framework for adoption in production management |
| C21 | Energy Internet blockchain technology |
| C22 | Energy Management Strategies for RES-enabled Smart-grids empowered by an Internet of Things (IOT) Architecture |
| C23 | The Internet of Energy: Smart Sensor Networks and Big Data Management for Smart Grid |
| C24 | Internet of Things Role in Renewable Energy Resources |
| C25 | Optimal sharing energy of a complex of houses through energy trading in the Internet of Energy |
| C26 | Does the Internet development affect energy and carbon emission performance? |
| C27 | Digitalization and energy: How does Internet development affect China's energy consumption? |
| C28 | Dynamic assessment of Energy Internet's emission reduction effect—a case study of Yanqing, Beijing |
| C29 | An overview of "Energy + Internet" in China |
| C30 | Energy Internet: The business perspective |
| C31 | Modeling of the Internet of Energy (IoE) for Optimal Energy Management with an Interpretive Structural Modeling (ISM) Approach |
| C32 | Internet of Things (IOT) and the Energy Sector |
| C33 | The Internet of Energy: A Web-Enabled Smart Grid System |
| C34 | A Review of Internet of Energy Based Building Energy Management Systems: Issues and Recommendations |
| C35 | Energy Management in Smart Cities Based on Internet of Things: Peak Demand Reduction and Energy Savings |
| C36 | Towards an Internet of Energy |
| C37 | Discussion on Energy Internet and Its Key Technology |
| C38 | An integrated approach for multi-objective optimization and MCDM of Energy Internet under uncertainty |
| C39 | A comprehensive review of Energy Internet: basic concept, operation and planning methods, and research prospects |
| C40 | Energy Harvesting for the Internet-of-Things: Measurements and Probability Models |
| C41 | Cyber security framework for Internet of Things-based Energy Internet |
| C42 | The Energy and Emergy of the Internet |
| C43 | Optimal Charging Control of Energy Storage and Electric Vehicle of an Individual in the Internet of Energy with Energy Trading |
| C44 | Information and resource management systems for Internet of Things: Energy management, communication protocols, and future applications |
| C45 | Research on operation and management multi-node model of mega city Energy Internet |
| C46 | Energy Internet forums as acceleration phase transition intermediaries |
| C47 | Energy-Efficient Device Architecture and Technologies for the Internet of Everything |
| C48 | Internet of Things for Modern Energy Systems: State-of-the-Art, Challenges, and Open Issues |
| C49 | An Overview of Internet of Energy (IoE) Based Building Energy Management System |
| C50 | Integration of electric vehicles and management in the Internet of Energy |
| C51 | Green Energy Management of the Energy Internet Based on Service Composition Quality |
| C52 | IoT Technologies for Augmented Human: a Survey |
| C53 | The Development of the Energy Internet of Things in Energy Infrastructure |
| C54 | Energy Internet and We-Energy |
| C55 | Architecture of the Internet of Energy Network: An Application to Smart Grid Communications |

#### 5.1.1. Analytical Quality Control

In qualitative research, the concept of trustworthiness is used instead of the concepts of reliability and validity. In this regard, to control the extracted concepts, the coding of the two researchers was compared. To evaluate the degree of agreement between two coders (by two people or using two tools or at two different times) and, therefore, to evaluate internal reliability, the Kappa interclass correlation was used in SPSS. The kappa index value is calculated to be 0.87, which is in the range of excellent agreement (0.81–1) [66].
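As a minimal sketch of this reliability check, Cohen's kappa can be computed directly from two coders' label sequences. The codings below are hypothetical and only illustrate the calculation (the study itself computed the coefficient in SPSS on its own coding data):

```python
from collections import Counter

def cohen_kappa(coder_a, coder_b):
    """Cohen's kappa: agreement between two coders beyond chance."""
    assert len(coder_a) == len(coder_b)
    n = len(coder_a)
    # Observed agreement: fraction of items coded identically
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement from each coder's marginal label frequencies
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / n**2
    return (observed - expected) / (1 - expected)

# Hypothetical codings of eight text segments by two coders
a = ["tech", "mgmt", "rules", "tech", "security", "mgmt", "tech", "rules"]
b = ["tech", "mgmt", "rules", "mgmt", "security", "mgmt", "tech", "rules"]
print(round(cohen_kappa(a, b), 2))  # 0.83
```

A kappa of 0.81 or higher falls in the "excellent" agreement range cited above [66].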

#### 5.1.2. Shannon Entropy

The steps for data analysis based on the Shannon entropy method are as follows:


• The frequency matrix is first normalized (Equation (1)):

$$n\_{ij} = \frac{x\_{ij}}{\sum\_{i} x\_{ij}} \tag{1}$$

• The entropy value of each indicator (*Ej*) is calculated based on Equations (2) and (3):

$$k = \frac{1}{\ln(a)}; \quad a = \text{number of indicators} \tag{2}$$

$$E\_{j} = -k \sum\_{i} \left[ n\_{ij} \ln\left(n\_{ij}\right) \right] \tag{3}$$

• The significance coefficient of each indicator (*Wj*) is calculated; the higher the value of *Wj*, the more significant the indicator (Equation (4)):

$$W\_{j} = \frac{E\_{j}}{\sum\_{j} E\_{j}} \tag{4}$$

To calculate the weight of each of the components, the total weight of its codes was calculated, and the ranking took place based on the weights obtained in Table 5.
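As a generic sketch of these entropy-weighting steps, the computation can be written as follows. The weight is defined exactly as in Equation (4) as printed (note that some entropy-weighting variants use 1 − *Ej* instead), and the small frequency matrix is hypothetical:

```python
import math

def entropy_weights(freq):
    """Shannon entropy significance coefficients W_j (Equations (1)-(4)).
    freq[i][j] is the frequency of item i under indicator j."""
    a = len(freq)                    # number of rows in the frequency matrix
    k = 1 / math.log(a)              # Equation (2)
    E = []
    for col in zip(*freq):           # one column per indicator j
        total = sum(col)
        n = [x / total for x in col if x > 0]              # Equation (1)
        E.append(-k * sum(p * math.log(p) for p in n))     # Equation (3)
    return [Ej / sum(E) for Ej in E]                       # Equation (4)

# Hypothetical 3-row x 2-indicator frequency matrix
w = entropy_weights([[2, 1], [2, 3], [2, 1]])
```

The weights sum to one, so they can be read directly as relative significance and used to rank the components.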



**Table 5.** *Cont.*


**Table 5.** *Cont.*


**Table 5.** *Cont.*


#### *5.2. Structural Analysis Using MICMAC Software*

In this step, the ten components extracted in the previous step are placed in a 10 × 10 matrix and evaluated by the experts, who assign each relationship a score from 0 to 3 in accordance with Table 6. The final availability matrix after expert scoring is shown in Figure 2. Based on the findings in Table 7, the matrix filling index is 88%, which indicates a high degree of connectivity and mutual influence among the identified variables.


**Table 6.** Evaluation table of relationships between variables.


**Figure 2.** Final availability matrix of research variables.


#### 5.2.1. Determining the Degree of Direct Influence and Dependence of Components

Based on the matrix of direct effects, the sums of the rows and columns of the matrix indicate the degree of influence and dependence of the components, respectively. As can be seen in Table 8, the component of management factors has the greatest influence on the other factors, with rules and regulations and technological infrastructure in second and third place. In terms of dependence, the software results show that technological infrastructure depends on the other components the most, with management factors and security factors in second and third place.
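The row-sum/column-sum computation described above can be sketched as follows; the 3 × 3 matrix is a hypothetical toy example, not the study's 10 × 10 expert matrix:

```python
def micmac_direct(M):
    """Direct influence (row sums) and dependence (column sums) of each
    component in a MICMAC direct-effects matrix (entries scored 0-3)."""
    influence = [sum(row) for row in M]
    dependence = [sum(col) for col in zip(*M)]
    return influence, dependence

def filling_index(M):
    """Share of non-zero entries, used here as a simple connectivity gauge."""
    cells = [x for row in M for x in row]
    return sum(1 for x in cells if x != 0) / len(cells)

# Hypothetical matrix: M[i][j] = strength of component i's effect on j
M = [[0, 3, 2],
     [1, 0, 3],
     [2, 1, 0]]
inf, dep = micmac_direct(M)
print(inf, dep, round(filling_index(M), 2))  # [5, 4, 3] [3, 4, 5] 0.67
```

In this toy matrix, component 0 is the most influential (largest row sum) while component 2 is the most dependent (largest column sum), mirroring how Table 8 ranks the study's components.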

**Table 8.** The degree of direct influence and dependence of components.


5.2.2. Location of Components in the Zones of the Influence and Dependence Map

Variables are divided into four types based on their location in one of the four zones of the influence-dependence map (Figure 3):


**Figure 3.** Diagram of system stability/instability.

5.2.3. Analyzing the Graph of Influence

The graph of influence shows the relationships between the components and how they influence each other. It is drawn with red and blue lines ending in arrows that indicate the direction of each component's influence. Red lines indicate a strong influence of one factor on another, while blue lines, with varying thickness, show moderate to weak relationships (Figure 4).

The status of relationships in the graph of influence indicates that the variables of management factors, laws and regulations, and technological infrastructure have been the source of the most severe influences and increased their role in the system. Management factors, technological infrastructure, and security factors are also strongly influenced by other components of the system. Table 9 shows the share of each component in influence and dependence and Figure 5 shows the movement of each component.

**Figure 4.** The influence cycle graph.


**Figure 5.** Movement of components in direct and indirect influence and dependence.


**Table 9.** Arrangement of components with the largest contribution to direct influence and dependence.

#### **6. Discussion and Conclusions**

The Internet of Energy (IoE) is a novel technology that has changed the methods of production, transmission, and consumption of energy and has affected human life. IoE plays an essential role as an efficient tool for increasing energy efficiency and supporting the energy economy and sustainable development. To answer the research questions, two approaches, meta-synthesis and MICMAC analysis, were used. First, after screening the papers based on the Critical Appraisal Skills Program (CASP), relevant papers were identified and carefully reviewed. Then, the research parameters were coded using the MAXQDA software to determine their frequency and classification. The kappa coefficient is a statistic used in qualitative research that indicates the robustness of the methodology by measuring the agreement of experts on the extracted codes. In this research, the kappa coefficient is 0.87, which lies in the excellent range and indicates the reliability of the method; there is thus a consensus among experts in the field of IoE about the research parameters. In the next step, the importance of each component was determined using the Shannon entropy and MICMAC structural analysis methods. In the Shannon entropy method, the components are ranked based on their frequency and the significance coefficient calculated for each of them. In the MICMAC structural analysis method, the influence and dependence levels of the components were obtained, which determined the strategic components with the largest share in influence and dependence. In other words, the accuracy of the results can be checked by comparing the results obtained from the Shannon entropy and MICMAC structural analysis methods.
The results show that 82 indicators under the umbrella of ten axial components are involved in the implementation of IoE: rules and regulations, individual and human factors, financing, technological infrastructure, cultural and social factors, security factors, technological resources, knowledge resources, learning style, and managerial factors. In the Shannon entropy method, technological infrastructure (1), management factors (2), rules and regulations (3), technological resources (4), security factors (5), financing (6), cultural and social factors together with individual and human factors (7), knowledge resources (8), and learning style (9) are the most significant, respectively. In the MICMAC structural analysis, the components of management factors (1), technological infrastructure (2), security factors and financing (3), knowledge resources (4), rules and regulations and technological resources (5), cultural and social factors and individual and human factors (6), and learning style (7) have the largest share in influence and dependence, respectively. In conclusion, the two components of management factors and technological infrastructure are the most important in both methods and can be considered key and strategic components, which is consistent with the findings of researchers such as Taghavi et al., 2021 [17]; Miglani et al., 2020 [4]; Hua et al., 2019 [5]; Qiu et al., 2019 [19]; Sun, 2019 [45]; Lombardi et al., 2018 [47]; and Town et al., 2018 [38]. On the other hand, individual and human factors and cultural and social factors are of equal importance, which accords with the findings of Umer et al., 2019 [46]; Pirmagomedov and Koucheryavy, 2019 [36]; and Mahapatra, 2018 [41]. In both methods, learning style has the lowest priority.

One of the important points about qualitative research is that it is based on the opinions of experts. Undoubtedly, the emergence of new studies in the area of IoE introduces new parameters that keep the way open for future research.

Today, Scenario-Based Strategic Planning (SBSP) is one of the most important and key tools in the field of future studies that has attracted the attention of many researchers. SBSP outlines a more realistic future for individuals and helps them make future decisions. The use of this tool requires the identification of key drivers in the subject under study. The output of this research can be a good criterion for future works of researchers. Therefore, it is suggested that researchers use the results of this study on the subject of future studies regarding the IoE. Blockchain technology is another emerging technology that is influential in various fields such as energy. In their study, Azizi et al. (2021) [67] mentioned the use of Internet of Things (IoT) and blockchain in the smart supply chain. In addition, as another suggestion to researchers, studying the application of blockchain technology in the field of IoE is another interesting topic that can pave the way for future research.

**Supplementary Materials:** The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/su142013180/s1, Figure S1: PRISMA 2020 flow diagram for systematic reviews which included searches of databases and registers; Table S1: PRISMA 2020 Main Checklist; Table S2: PRISMA 2020 Abstract Checklist [68].

**Author Contributions:** Conceptualization, P.A., R.A. and M.H.T.; methodology, A.B.N. and M.H.T.; software, M.H.T.; validation, P.A., R.A. and A.B.N.; formal analysis, P.A. and M.H.T.; investigation, R.A. and A.B.N.; resources, P.A., R.A., A.B.N. and M.H.T.; data curation, M.H.T.; writing—original draft preparation, M.H.T.; writing—review and editing, P.A., R.A. and A.B.N.; visualization, A.B.N.; supervision, P.A. and R.A.; project administration, P.A., R.A. and A.B.N. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


### *Article* **Neuro-Cybernetic System for Forecasting Electricity Consumption in the Bulgarian National Power System**

**Kostadin Yotov, Emil Hadzhikolev, Stanka Hadzhikoleva \* and Stoyan Cheresharov**

Faculty of Mathematics and Informatics, University of Plovdiv Paisii Hilendarski, 236 Bulgaria Blvd., 4027 Plovdiv, Bulgaria

**\*** Correspondence: stankah@uni-plovdiv.bg

**Abstract:** Making forecasts for the development of a given process over time, which depends on many factors, is in some cases a difficult task. The choice of appropriate methods—mathematical, statistical, or artificial intelligence methods—is also not obvious, given their great variety. This paper presents a model of a forecasting system built by comparing the errors of time series methods on the one hand and artificial neural networks on the other. The model aims at multifactor predictions based on forecast data for significant factors, obtained by automated testing of different methods and selection of the methods with the highest accuracy. Successful experiments were conducted to forecast energy consumption in Bulgaria, including household consumption; consumption by industry, the public sector, and services; and total final energy consumption.

**Keywords:** electricity consumption; forecast energy consumption; forecasting system

#### **1. Introduction**

The forecasting of the future is extremely important for the effective management of a process or system. Forecasting is about predicting the future as accurately as possible, given all of the information available, including historical data and knowledge of any future events that might impact the forecasts [1]. From a scientific point of view, forecasting is a scientifically based assumption about the future state and development of processes, events, indicators, etc. [2]. Considering the possibility of the existence of many different forecasts for the development of a given process in the future, forecasting can be defined as a reasonable assumption of possible options for development in a given area and the probability that they will be realized.

The synergy between mathematics and computer science has led to the development of a wide variety of algorithms, approaches, methods, and tools for forecasting. Mathematical and statistical methods are widely used across various fields, including regression and clustering [1,3], time series [4,5], polynomial approximations [6], and fuzzy collaborative methods [7], as well as many artificial intelligence prediction methods, such as machine learning [8,9]. On the one hand, this diversity provides an opportunity to choose a specific approach for a given task; on the other hand, it makes it difficult to find the most effective solution.

In the course of our work on multifactor and multi-step forecasting of energy consumption in the Republic of Bulgaria, we encountered the need to forecast many socio-economic factors from which to derive the final forecast. The functions by which the individual factors, as well as energy consumption itself, change can take a variety of linear and nonlinear forms, so the appropriate forecasting method may differ for each of them. Determining the most accurate forecast values for the factors has a positive effect on the accuracy of forecasting the target value, which in our case is energy consumption.

The automation of the process of choosing the most effective method for any individual factor or target value contributes to the acceleration of the process and the improvement of the forecast accuracy. Finding the most effective forecasting method requires experimenting

**Citation:** Yotov, K.; Hadzhikolev, E.; Hadzhikoleva, S.; Cheresharov, S. Neuro-Cybernetic System for Forecasting Electricity Consumption in the Bulgarian National Power System. *Sustainability* **2022**, *14*, 11074. https://doi.org/10.3390/ su141711074

Academic Editors: Luis Hernández-Callejo, Sergio Nesmachnow and Sara Gallardo Saavedra

Received: 25 July 2022 Accepted: 31 August 2022 Published: 5 September 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

with several different approaches that are combined and compared. Investing more effort in solving the prognostic task can save future effort, time, and money: fast but inaccurate forecasts often lead to unreasonable investments or missed opportunities.

#### **2. Predicting Electricity Consumption—State of the Art**

Whether the forecast is long-term, medium-term, or short-term, predicting electricity consumption has a key role to play in investment planning, the introduction of new capacities or the decommissioning of unnecessary ones, and the assessment of the behavior of the entire economic system. Effective modeling of electricity consumption is becoming a vital task aimed at avoiding costly mistakes in unreasonable investments, shutting down important facilities, improperly scheduled repairs or short-sightedness in exports or imports. Therefore, we should not be surprised that in the literature on the subject there are many proposed options for dealing with forecasting problems.

Official studies focused on the development of the energy sector in Bulgaria, factors influencing electricity consumption and approaches for forecasting the consumption and factors have been conducted by teams from the Bulgarian Academy of Sciences (BAS) and Risk Management Lab. Traditional methods such as correlation and regression analyses have been used in both studies.

In the research conducted at BAS, a basic and in-depth analysis was made of the impact of the individual factors influencing electricity consumption. It takes into consideration the country's gross domestic product, gross value added by economic sector, population size, number of employees, income, prices, temperature changes until 2040 according to the Bulgarian National Institute of Meteorology and Hydrology, etc. Forecasts were made for three scenarios reflecting different expectations for changes in the factors [10].

The team of Risk Management Lab creates mathematical and statistical models for forecasting the electricity balance, which includes, as one of its elements, forecasting electricity consumption. The study examines specific factors in the electricity consumption of households and industry [11].

Due to the great social and economic importance of forecasting electricity consumption, many scientists have proposed different types of forecasting models over the last few decades. The methods for forecasting electricity consumption can be divided into several categories:


Each of the considered approaches has its advantages and disadvantages in different specific tasks and situations. This shows that the creation of an automated forecasting system in which multiple forecasting models can be implemented and compared could bring many benefits in forecasting not only electricity consumption but also other quantities.

#### **3. Materials and Methods**

#### *3.1. Forecasting Process*

In the general case, the solution of forecasting tasks is performed in a similar way and the process can be reduced to several stages—data collection and processing, a study of solution methods, solution, and analysis of the results [1]. The algorithm is iterative and individual stages can be performed and overlap repeatedly over time.

Data collection is often associated with preliminary research in the subject area, which provides additional information related to the set task. At this stage, an idea of the possible factors influencing the predicted values is formed. In the best case, the data on the selected factors are provided by the stakeholder or can be collected from one or more sources. In other cases, new technical equipment and software applications must be built in order to collect them.

The merging and synchronization of data by time, location, seasons, or other criteria is an integral activity when using multiple data sources providing data for various factors.

Data analysis and processing include activities to check and clear incorrect input data; conversion of data into structures suitable for modeling; and graphical representation of the data through which trends, periodicity, seasonality, etc., can be detected. In some cases, data behavior may be affected by various methodological and technological differences in data collection, as well as social, societal, climatic, or other changes. This implies the use of various methods for analysis and subsequent forecasting of the formed segments.

In the stage of research of methods for solving the problem of forecasting, various existing methods and algorithms are considered or specific ones are created. Both standard statistical and artificial intelligence methods and models can be used as forecasting methods. The number of methods and their use may depend on the requirements of the specific task, on factors related to the environment, performers, etc. An important part of this stage is the definition of indicators for evaluating the effectiveness of forecasting methods. The evaluation of efficiency may include various parameters depending on the volume of data, technical capabilities of computer systems, cost, speed, etc.

In the last stages, one or several of the perspective models for forecasting are selected and applied for the solution of the problem. After evaluating the results, individual steps can be repeated many times in order to achieve better results.

This whole process requires a lot of time and resources. This raises the question—is it possible to fully or partially automate the forecasting process, including the conducting of experiments with various mathematical and artificial intelligence methods, comparing errors, and choosing the best solution? This is the purpose of our work that is presented in this article.

#### *3.2. Multifactor Forecasting System through Automated Selection of the Best Methods*

The successful solution of the set task for forecasting energy consumption in the national power system is directly related to the correct assessment of the factors influencing energy consumption. These are macroeconomic and demographic indicators, social parameters, weather conditions, and others. This work investigated several factors influencing electricity consumption: gross domestic product; energy intensity; population size; population income; price of electricity; expected temperatures for the respective period; energy efficiency; and electricity consumption in a preceding period.

The automation of various stages and activities of the forecasting process requires careful planning. Such a system must support the following basic capabilities:


In our proposed model for an automated search of effective forecasting methods, two main approaches are used:


The long-term goal of our research is to develop a neurocybernetic system for energy consumption forecasting that supports various mathematical and artificial intelligence methods. In the first version, we used a neural network in its basic version. Further work on the system involves adding other forecasting methods.

#### *3.3. Mathematical Model of the Forecasting Task*

The main approach to work in the forecasting system through artificial neural networks can be formally presented as follows:

Let us assume we have the following event:

$$\mathcal{S} = \mathcal{S}\left(\overrightarrow{X}_1, \overrightarrow{X}_2, \dots, \overrightarrow{X}_k\right),$$

whose outcome is determined by *k* influencing factors $\overrightarrow{X}_1, \overrightarrow{X}_2, \dots, \overrightarrow{X}_k$. Let a sample of values be provided for each of the factors $\overrightarrow{X}_i$:

$$\overrightarrow{X}_i : \left(x_{i,1}, x_{i,2}, \dots, x_{i,j_i}\right), \quad i = 1, 2, \dots, k, \quad i, j_i \in \mathbb{N}.$$

The main steps performed by the system when using complex forecasting are the following:


Various criteria can be used to evaluate the effectiveness of forecasting methods, such as prediction error, cost, speed, computer resources used, etc. In our study, we call the method $M_1$ more effective than $M_2$ if it has a smaller prediction error, i.e.,

$$E\_{M\_1} \le E\_{M\_2}.\tag{1}$$

Condition (1) allows us to introduce a formula for the efficiency of a forecasting method as inversely proportional to the error that occurs when forecasting with it:

$$\textit{efficiency} = \frac{1}{\textit{error}}.\tag{2}$$

The absolute error (AE) or the root mean square error, obtained when forecasting on an *n*-tuple data array, can be used. For single-point forecasts created from the first (*n* − 1) sample elements, the absolute forecasting error for the last element is:

$$AE = |Real\_n - Forecast(n)|,\tag{3}$$

where


For multi-step forecasts for the last *k* elements of the data array, some of the most popular error metrics were used: the mean absolute error (MAE), the mean square error (MSE), and the symmetric mean absolute percentage error (SMAPE) [39]. They are calculated by the following formulas:

$$\text{MAE} = \frac{1}{k} \sum_{i=n-k+1}^{n} |real_i - forecast(i)|, \tag{4}$$

$$\text{MSE} = \frac{1}{k} \sum\_{i=n-k+1}^{n} [real\_i - forecast(i)]^2. \tag{5}$$

$$\text{SMAPE} = \frac{100\%}{k} \sum_{i=n-k+1}^{n} \frac{|forecast(i) - real_i|}{|forecast(i)| + |real_i|}. \tag{6}$$
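Formulas (2)–(6) translate directly into code. The sketch below is a plain-Python rendering for illustration; the function names are ours, and `real` and `forecast` hold the last *k* observed and predicted values, respectively.

```python
def abs_error(real, forecast):
    """Absolute error (3) for a single-point forecast."""
    return abs(real - forecast)

def mae(real, forecast):
    """Mean absolute error (4) over the last k points."""
    return sum(abs(r - f) for r, f in zip(real, forecast)) / len(real)

def mse(real, forecast):
    """Mean square error (5) over the last k points."""
    return sum((r - f) ** 2 for r, f in zip(real, forecast)) / len(real)

def smape(real, forecast):
    """Symmetric mean absolute percentage error (6), in percent."""
    return 100.0 / len(real) * sum(
        abs(f - r) / (abs(f) + abs(r)) for r, f in zip(real, forecast))

def efficiency(error):
    """Efficiency (2): the inverse of the forecasting error."""
    return 1.0 / error
```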

#### *3.4. System Architecture*

The modules in a forecasting software system automate the main activities (Figure 1). The user of the system accesses the individual modules through a common interface provided by the Manager module. In addition to the connection with the user, it provides management and control over the other modules.

**Figure 1.** Main modules and the connections between them in an automated forecasting system.

The data entry module assists the user in entering data and their initial classification.

Merging and synchronization tools are useful in cases where data from different sources are used. At the user's choice, through functions or parameters, the data are converted into a format suitable for making forecasts. The data analysis is supported by graphical tools integrated in the module and standard statistical methods, providing the user with opportunities for additional classification and clearing of incorrect data, as well as preparation of various data models.

The forecasting module presents an opportunity to choose one or more forecasting methods, as well as an opportunity to evaluate the most effective method. An important feature that the system must support is easy integration of new forecasting and evaluation methods in the module.

The presentation module contains graphical tools for visualization of the results obtained from the most effective method, as well as from all other methods used.

The data storage and management module provides access to various types of data that can be used in the configuration and training of forecasting methods and in their subsequent use:


#### *3.5. Multifactorial Multi-Step Forecasts*

The forecasting module (Figure 2) provides us with two different forecasting approaches:


**Figure 2.** Forecasting module.

During the process of factor forecasting, for each of the factors the most effective method was sought, which could be used repeatedly (Figure 3). It is appropriate to forecast the individual factors in parallel and to use cloud computing to save time and resources.

The factor forecasting module receives a two-dimensional array of input data $\{x_{i,j}\}_{i=1\dots k,\ j=1\dots n_i}$, containing the samples for the *k* factors, each with $n_i$ values. Other input parameters are the number of desired forecast values $c \in \mathbb{N}$ and the lower limit of the desired *efficiency*. For each of the factors, independent forecasts were made using the library of "prediction methods".

The set of forecasting methods $\{m_t\}_{t=1\dots p}$, $p \in \mathbb{N}$, is pre-set in the software system and can be extended. The methods include a variety of time series forecast models and models based on artificial neural networks, in which parameters such as the number of neurons, the activation functions, and the learning algorithms can be changed, as well as other artificial intelligence algorithms. So far, time series models and artificial neural networks are integrated in the system.

Applying the forecasting methods, for each of the factors we obtain approximating functions with the desired efficiency: $\{f_t(x_{i,j})\}_{t=1\dots p}$, $j = (n_i - c + 1) \dots n_i$, $i = 1, 2, \dots, k$.

In the process of evaluating the effectiveness, using the AE, MAE, and MSE errors, the efficiencies of the tested methods or variants of methods were compared. The most appropriate forecasting model was selected, including the method $m_{t_1} \in \{m_t\}_{t=1\dots p}$ and its corresponding parameters.

**Figure 3.** Choice of forecasting method for each factor.

The end result of the process is a set of ordered triples <factor, factor data, selected forecasting method and its corresponding parameters>, i.e., $\{(i, x_{i,j}, m_{t_i})\}_{i=1\dots k,\ j=1\dots n_i}$, $m_{t_i} \in \{m_t\}_{t=1\dots p}$. When the data change, the same selected method can be used, or the process can be restarted to search for a new method.
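The per-factor selection loop described above can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the toy `naive` and `linear_trend` methods stand in for the system's library of prediction methods, and MAE over a held-out tail is used as the error criterion.

```python
def best_method(series, methods, horizon):
    """Select the most effective forecasting method for one factor.

    series:  historical values of the factor
    methods: dict name -> callable(train, horizon) returning `horizon` forecasts
    horizon: number of held-out points (c) used for the comparison
    """
    train, holdout = series[:-horizon], series[-horizon:]
    scores = {}
    for name, method in methods.items():
        forecast = method(train, horizon)
        # MAE over the held-out tail, as in Equation (4).
        scores[name] = sum(abs(r - f) for r, f in zip(holdout, forecast)) / horizon
    return min(scores, key=scores.get), scores

# Two toy stand-in methods: last-value carry-forward and a linear trend.
def naive(train, h):
    return [train[-1]] * h

def linear_trend(train, h):
    step = train[-1] - train[-2]
    return [train[-1] + step * (i + 1) for i in range(h)]

winner, scores = best_method([1, 2, 3, 4, 5, 6],
                             {"naive": naive, "trend": linear_trend}, horizon=2)
```

For this perfectly linear toy series, the trend model wins with zero holdout error, and the winning (factor, data, method) triple would then be stored for reuse.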

#### *3.6. Algorithm for Searching for an Optimal Artificial Neural Network*

To solve a task, many different neural networks can be built, with different numbers of neurons in the hidden layer, with greater or lesser error, and with different result functions that behave similarly on the input–output samples used. The error is not always an indicator of the complexity of the neural network. A smaller number of neurons implies faster and easier training of the neural network and, subsequently, faster operation. The optimal neural network for a given task has a minimum number of neurons and complies with the user-specified allowable error in training, testing, and validation with the available input–output samples. Its effectiveness on new input data can be evaluated at a later stage. An algorithm providing capabilities for automated construction of multiple neural networks would help choose the optimal solution.

One of the approaches we experimented with for creating an optimal neural network is iteration over the various parameters (Figure 4) needed to create neural networks: the number of neurons in the hidden layer, the activation functions (Table 1), the training algorithms (Table 2), and the number of training epochs. It successively changes the parameters for creating neural networks and examines the efficiency of the current neural network. The first neural network that meets the requirements set by the user is considered optimal. Therefore, an important part of the algorithm is how the iterative parameters are changed. Since the creation and training of each neural network requires a certain amount of computer resources and time, the iterative approach is appropriate on single-processor machines only for tasks where the appropriate neural network is expected to be found in a relatively small number of iterations.

**Figure 4.** A variant of a parallel algorithm for constructing an optimal neural network for forecasting. **Table 1.** Used activation functions.


**Table 2.** Used training methods.


Finding the optimal solution faster is associated with:


The iterative algorithm for automated construction of artificial neural networks (Figure 5) has the following main steps:

	- Tensor data—input data for the factors, usually in the form of a one-dimensional array $\{x_i\}_{i=1\dots k}$ or a two-dimensional array $\{x_{i,j}\}_{i=1\dots k,\ j=1\dots n_i}$.
	- Number of forecasted results—*c*—which is 1 for single-point forecasts or a larger integer for multi-step forecasts.
	- Desired efficiency—*efficiency*—of the trained neural network.
	- List of training methods *lms* = {*mi*}, where *i* varies from *1* to the number of methods (Table 2). Depending on the task, to achieve the desired result faster, it is possible to arrange the methods in the list according to the expected efficiency, and some of them may even be excluded if they are considered inappropriate.
	- List of activation functions *afs* = $\{af_j\}$, where *j* varies from *1* to the number of functions (Table 1). Here, too, the functions can be ordered according to the judged appropriateness of their use in the specific task.
	- Minimal and maximal number of neurons—*min*\_*n* and *max*\_*n*—as well as the step by which the number of neurons changes—*step*\_*n*. The current number of neurons is denoted by *n*. For more elementary tasks, the number of neurons may start from *1* (*min*\_*n* = 1) with a step of *1*. The maximum number limits the possible iterations related to the number of neurons.
	- The epochs *epochs* change from *min*\_*ep* to *max*\_*ep* with a step *step*\_*ep*. Values we have experimented with are *min*\_*ep* = 1000, *max*\_*ep* = 5000, and *step*\_*ep* = 1000, where usually 3–4 iterations are enough to assess whether the change of epochs affects the efficiency of the trained neural network.
	- By iterating over the number of neurons, training methods and activation functions, a neural network with their current values is created—the ordered triple (*n*, *lm*, *a f*) and the input data. The nesting of the loops for the specific task is a matter of judgment, which determines the sequence of the parameters change. In the experiments, we chose to increase the number of neurons in the outermost loop, as we wanted to find a neural network with the lowest number of neurons. Training methods and activation functions change in inner loops.
	- If the neural network meets the condition, its data is saved and the task is completed;
	- Otherwise, attempts are made to increase the efficiency of the neural network by increasing the number of learning epochs. The information about the most efficient neural network found (with the smallest error) is saved, and it can be current or obtained in a previous iteration.
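The iterative algorithm above can be summarized in code. This is a hedged sketch: `train_and_score` stands in for the actual construction and training of one neural network and is assumed to return its efficiency (1/error); the loop order follows the text, with the number of neurons in the outermost loop so that the first acceptable network is also the one with the fewest neurons.

```python
def find_optimal_network(train_and_score, lms, afs,
                         min_n=1, max_n=10, step_n=1,
                         min_ep=1000, max_ep=5000, step_ep=1000,
                         target_efficiency=10.0):
    """Iterate over neurons, training methods, activation functions, and
    epochs; return the first configuration meeting the target efficiency,
    otherwise the best configuration found.

    Returns (efficiency, (n, lm, af, epochs)).
    """
    best = (0.0, None)                       # best network found so far
    for n in range(min_n, max_n + 1, step_n):
        for lm in lms:                       # training methods (Table 2)
            for af in afs:                   # activation functions (Table 1)
                for epochs in range(min_ep, max_ep + 1, step_ep):
                    eff = train_and_score(n, lm, af, epochs)
                    if eff >= target_efficiency:
                        return eff, (n, lm, af, epochs)
                    if eff > best[0]:        # remember the smallest-error network
                        best = (eff, (n, lm, af, epochs))
    return best                              # target efficiency not reached
```

A real `train_and_score` would build and train a network with the given triple of parameters and return the inverse of its validation error, as in Equation (2).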

The use of the "brute force" method, traversing all possible values of the iterative parameters to find the neural network with the optimal ratio of number of neurons to efficiency, is not the most rational approach. Tracking changes in this ratio can lead to heuristic variants of the algorithm that automatically change the order in which the iterative parameters are varied.

The availability of sufficient computing power makes it possible to use parallel algorithms for this automated search for an optimal neural network (Figure 5), in which, for example, all combinations of training methods and activation functions are started in parallel processes: $(lm_i, af_j)$, where $i$ and $j$ range from *1* to the number of training methods and activation functions, respectively. After the individual processes complete, the efficiencies of all neural networks created during their execution are compared and the most appropriate one is selected.
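A minimal sketch of this parallel variant, with threads standing in for the parallel processes and the same assumed `train_and_score` callback returning efficiency (1/error):

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def parallel_search(train_and_score, lms, afs, n, epochs):
    """Evaluate every (training method, activation function) pair in
    parallel and keep the most efficient network."""
    combos = list(product(lms, afs))
    with ThreadPoolExecutor(max_workers=len(combos)) as pool:
        # One worker per (lm, af) combination, as in the parallel scheme.
        effs = list(pool.map(lambda c: train_and_score(n, c[0], c[1], epochs),
                             combos))
    best = max(range(len(combos)), key=lambda i: effs[i])
    return effs[best], combos[best]
```

In practice, process-based workers (or cloud instances) would replace the threads, since neural network training is CPU-bound.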

**Figure 5.** General scheme of forecasting processes.

#### **4. Results and Discussion**

A prototype was developed to test the presented model. It was implemented in MATLAB, which was used to realize the described basic modules and functionalities. The prototype was tested on prognostic tasks in the field of energy.

#### *4.1. Setup of the Experiment*

The problem of forecasting the demand for, and hence the consumption of, electricity is extremely important for the planning and management of the national energy system of every country. Accurate forecasting of probable electrical loads is an important prerequisite for effective planning of production capacity, proper maintenance of the transmission and distribution network, planning of future exports or imports of electricity, and managing the behavior and direction of energy flows both within the country and in connected international networks.

The developed prototype was used to solve three tasks for forecasting electricity consumption in the National Power System of Bulgaria. The targets subject to forecasting were:


The forecasts were made by taking into account their dependencies on the following socio-economic factors:


These factors have been identified as significant in studies carried out by the Bulgarian Academy of Sciences (BAS) and Risk Management Lab [10,11]. A subsequent step would be to build a module for the automated study of correlations between the factors with the aim of minimizing their number and optimizing post-processing.

In order to forecast the target values for a specific year, we first made forecasts for the factors for the respective year.

In the process of work, data from official sources such as the National Statistical Institute [42], Information System INFOSTAT [43], Electricity System Operator [44], The World Bank Group [45], and Eurostat [46] were used. All available data on factors and target values for 17 years were used for the study.

The factor forecasting module uses different models of the time series trend. A substructure involving the use of neural networks to predict these factors provides for the possibility of further processing of the input data. Some of the data are submitted to the neural networks in normalized form, implemented by multiplying them by a specific coefficient. This reduces the magnitude of the input data vectors and facilitates the training of this type of artificial intelligence. The general scheme showing the joint operation of the modules is shown in Figure 6:


#### *4.2. Single-Point Forecasts for the Factors*

The use of the prototype for forecasting electricity consumption in single-point (annual) factor forecasting showed different results for the effectiveness of different types of trends in time series (Figure 6).

When forecasting GDP, population, and average annual income, maximum efficiency was achieved with a linear trend of the time series. The forecasting of energy intensity, the price of electricity for households, and the price of electricity for industry achieved the best results with logarithmic, quadratic, and hyperbolic time series trends, respectively. Models for the most efficient neural networks forecasting the same factors are presented in Figure 7.

**Figure 6.** Comparison of efficiency in different types of time series trend.

**Figure 7.** Forecasting the behavior of the factors influencing the electricity consumption in NEES through separate artificial neural networks. (**a**) Gross domestic product, (**b**) energy intensity of the economy (Net\_Int), (**c**) population, (**d**) average annual income, (**e**) price of electricity for households, (**f**) price of electricity for the industry.

During neural network training, all factors were considered as functions of time in order to enable comparison with the results of time series forecasting. The training algorithm that proved most effective for the current tasks is the Levenberg–Marquardt algorithm. The most appropriate activation function for the neurons in the hidden layers of the energy intensity neural network is the logarithmic sigmoid function:

$$g(x) = \sigma(x) = \frac{1}{1 + e^{-x}},$$

and for all the other networks it is the hyperbolic tangent:

$$g(x) = \tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}.$$

The activation function of the output neuron in all neural networks is linear:

$$g(x) = x.$$
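For reference, the three activation functions can be written directly in code. The MATLAB-style names `logsig`, `tansig`, and `purelin` are a naming assumption, not confirmed by the paper:

```python
import math

def logsig(x):
    """Logarithmic sigmoid, used in the hidden layer of the
    energy-intensity network: 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))

def tansig(x):
    """Hyperbolic tangent, used in the other networks' hidden layers,
    written in the exponential form given in the text."""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

def purelin(x):
    """Linear activation of the output neuron: g(x) = x."""
    return x
```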

The comparative analysis between the efficiency of the found neural networks in a one-year forecast and the type of trend in time series, leading to minimal error, is presented in Figure 8.

**Figure 8.** Comparison of efficiency between the most effective time series trend and the results of the forecasts of artificial neural networks.

The use of the presented prototype with time series and artificial neural networks showed a significant advantage in favor of neural networks in the case of forecasting GDP, population, and average annual income. In the other three cases, the single-point forecasts of energy intensity, the price of electricity for households, and the price of electricity for industry, the error comparison module showed some advantage in efficiency for the time series. In these cases, the type of trend was logarithmic, quadratic, and hyperbolic, respectively. Based on them, forecasts were made for the development of the values of the factors presented in Table 3.


**Table 3.** Single-point forecasts for the factors for the last year of the available dataset.

#### *4.3. Multifactor Single-Point Forecasts of Target Values*

In the second part of the conducted experiments, the influence of the considered factors on the target values in the National Power System of Bulgaria was studied. Appropriate optimal neural structures were created for their forecast (Figure 9):


**Figure 9.** Neural networks for consumption forecasting. (**a**) Total final consumption in the NES (Net\_Nees), (**b**) electricity consumption in industry, public sector, and services (Net\_Industry), and (**c**) electricity consumption in households (Net\_Households).

The hidden-layer activation functions of all neural networks found by the software system were the hyperbolic tangent, and the activation of the output neuron was linear.

The deviations of the forecast results from the actual data for single-point forecasts (one-year consumption) were relatively small (Table 4): the largest deviation, in forecasting consumption in industry, was about 41 thousand tonnes of oil equivalent (toe), i.e., about 1.525%. The deviation in forecasting consumption in the entire energy system was 0.0739%, and in household consumption, 0.0302%.


**Table 4.** Forecasting of consumption with artificial neural networks, in thousand toe.

Treating the multifactor target values as time series did not provide good results. The errors obtained with the best time series approximations were many times larger than those of the corresponding neural networks (Table 5).

**Table 5.** The comparative table of errors in the use of these neural networks and time series.


The analysis of the weights of the neural networks can show us the influence of the individual factors on the predicted value of the target variable.

Let us denote by *wi*,*<sup>p</sup>* the weight by which the *i*-th factor is transmitted to neuron *p* of the hidden layer of the trained neural network, where *i*, *p* ∈ *N*. As a criterion for the significance of factor *i*, we choose the value:

$$INPF_i = \frac{\left| \sum_{p=1}^{r} w_{i,p} \right|}{r},$$

where *r* is the number of neurons in the hidden layer of the neural network.
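The INPF criterion can be computed directly from the first-layer weight matrix of a trained network; this is a minimal sketch with made-up weights:

```python
import numpy as np

def inpf(W):
    """Significance of each input factor from a network's first-layer weights.

    W[i, p] is the weight from input factor i to hidden neuron p;
    INPF_i = |sum_p W[i, p]| / r, where r is the number of hidden neurons.
    """
    r = W.shape[1]
    return np.abs(W.sum(axis=1)) / r

# Illustrative 2-factor, 3-hidden-neuron weight matrix (made-up numbers).
W = np.array([[0.5, -0.2, 0.9],
              [0.1,  0.05, -0.05]])
scores = inpf(W)
```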

The study showed that the largest impact on the total electricity consumption in the national network belongs to the gross domestic product of the country, with *INPF*<sup>1</sup> = 11.6, and the smallest to the average annual income per capita, with *INPF*<sup>4</sup> = 1.06. For consumption in the industry sector, the situation is similar: the most significant factor was GDP, with *INPF*<sup>1</sup> = 5.01, and the least important was the price of electricity for households, with *INPF*<sup>6</sup> = 2.3. The most important factors in the energy consumption of households are population (*INPF*<sup>3</sup> = 13.82) and average annual income (*INPF*<sup>4</sup> = 6.16).

#### *4.4. Multi-Step Forecasts*

Using the created neural structures for factor forecasting, we created forecasts for a period of seven years; their accuracy can be assessed over time as actual data become available (Table 6). Similarly, we created seven-year forecasts for the studied multifactor values (Table 7). The forecast results showed that household electricity consumption, as well as total final consumption, will gradually increase, while electricity consumption in industry, the public sector, and services will decrease.
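The chaining of factor forecasts into multi-year target forecasts can be sketched as follows; the toy callables stand in for the trained factor and consumption networks and use made-up coefficients:

```python
def multi_step_forecast(factor_models, target_model, start_year, horizon=7):
    """Chain single-step models into a multi-year forecast.

    Each year, every factor is predicted as a function of time by its own
    model, and the factor vector is fed to the multifactor target model.
    All models here are plain callables; the real system uses trained ANNs.
    """
    forecasts = []
    for step in range(1, horizon + 1):
        year = start_year + step
        factors = [f(year) for f in factor_models]
        forecasts.append((year, target_model(factors)))
    return forecasts

# Toy stand-ins: two "factor networks" and a "consumption network".
gdp = lambda y: 50.0 + 1.5 * (y - 2020)
pop = lambda y: 7.0 - 0.05 * (y - 2020)
consumption = lambda f: 100.0 + 2.0 * f[0] - 3.0 * f[1]
out = multi_step_forecast([gdp, pop], consumption, 2020)
```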


**Table 6.** Seven-year forecasting of factors.

**Table 7.** Seven-year forecast for electricity consumption in the National Energy System, made through artificial neural networks.


#### **5. Conclusions**

Forecasting is a complex task. Choosing among the wide variety of mathematical, statistical, and artificial intelligence methods in pursuit of the most accurate forecast usually requires a lot of time and effort. The use of software tools to automate some of the activities greatly simplifies the work.

The article proposed a model for multifactor forecasting which automatically selects the best method for forecasting the significant factors and then uses the predicted factor data to create a complex multifactor forecast. The developed model was successfully applied in experiments to make multiple forecasts of the energy consumption of households, industry, and total consumption, for one-year and seven-year periods. By automating the forecasting process, we made it easier to produce predictions with fewer errors than before.

The presented model has wide applications in various subject areas. It can be used for air quality forecasting, demographic forecasting, forecasting in industry, etc.

In the future, the system can be expanded in several directions. An important part of its development is the integration of additional forecasting methods. The development of a module to evaluate the correlation between the individual factors and the target variable would also help to optimize the forecasting process. Automated generation of graphics and documentation for each step in the overall forecasting process would be beneficial to the end user of the system.

**Author Contributions:** Conceptualization, K.Y., E.H. and S.H.; data curation, K.Y. and S.C.; formal analysis, K.Y., E.H. and S.H.; funding acquisition, S.H. and S.C.; investigation, K.Y., E.H. and S.C.; methodology, K.Y., E.H. and S.H.; project administration, E.H.; software, K.Y. and S.C.; supervision, E.H.; validation, K.Y., E.H. and S.H.; visualization, K.Y. and E.H.; writing—original draft, K.Y., E.H. and S.H.; writing—review and editing, S.H. and S.C. All authors have read and agreed to the published version of the manuscript.

**Funding:** The work is partly funded by the MU21-FMI-004 project at the Research Fund of the University of Plovdiv "Paisii Hilendarski".

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Data are available through the public databases mentioned in the text.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


### *Systematic Review* **Solar Panels Dirt Monitoring and Cleaning for Performance Improvement: A Systematic Review on Smart Systems**

**Benjamin Oluwamuyiwa Olorunfemi \*, Omolola A. Ogbolumani and Nnamdi Nwulu**

Center for Cyber-Physical Food, Energy and Water Systems (CCP-FEWS), University of Johannesburg, P.O. Box 524, Auckland Park, Johannesburg 2006, South Africa

**\*** Correspondence: ben93olorunfemi@gmail.com

**Abstract:** The advancement in technology to manage energy generation using solar panels has proved vital for increased reliability and reduced cost. Solar panels emit no pollution while producing electricity as a renewable energy source. However, solar panels are adversely affected by dirt, a major environmental factor affecting energy production. The intensity of light falling on a solar panel is reduced when dirt accumulates on its surface. This, in turn, lowers the output of electrical energy generated by the panel. Since cleaning the solar panel is essential, constant monitoring and evaluation of these processes are necessary to optimize them. This emphasizes the importance of using smart systems to monitor dirt and clean solar panels to improve their performance. The paper verifies the existence and degree of research interest in this topic and seeks to evaluate the impact of smart systems to detect dirt conditions and clean solar panels compared to autonomous and manual technology. Research on smart systems for addressing dirt accumulation on solar panels was reviewed taking into account efficiency, accuracy, complexity, reliability, and initial and running costs. Overall, real-time monitoring and cleaning of the solar panel with integrated smart systems improved its output power. It helps users get real-time updates of the solar panel's condition and control actions from distant locations. A critical limitation of this research is the insufficient empirical analysis of existing smart systems, which should be thoroughly examined to allow further generalization of theoretical findings.

**Keywords:** photovoltaic panel; remote solar plant; automated cleaning; condition monitoring; internet of things; solar panels dirt; dirt detection; dirt accumulation and removal; device management; real-time monitoring and cleaning

#### **1. Introduction**

In many industrialized nations, electricity generation is still dependent on fossil fuels. Although these fuels are very effective in terms of energy quality, they are not suited for long-term use because fossil fuel reserves will eventually be exhausted. Furthermore, fossil fuels are a considerable threat to environmental balance and create numerous ecological problems such as global warming [1,2]. Therefore, renewable sources must be adopted as soon as possible. A significant feature of renewable electricity generation is the infinite supply [3]. Compared to conventional fossil fuel technologies, renewable electrical energy sources have a far smaller effect on the environment in terms of cleanliness.

Solar panel technology is becoming more popular as a form of renewable electricity generation due to the growing demand for renewable energy [4,5]. By the end of this decade, China's solar capacity is expected to reach 400 GW [6]. The cumulative installed solar capacity in megawatts between 2012 and 2021 is shown in Figure 1, based on the information provided by the International Renewable Energy Agency (IRENA) [7].

**Citation:** Olorunfemi, B.O.; Ogbolumani, O.A.; Nwulu, N. Solar Panels Dirt Monitoring and Cleaning for Performance Improvement: A Systematic Review on Smart Systems. *Sustainability* **2022**, *14*, 10920. https://doi.org/10.3390/su141710920

Academic Editors: Luis Hernández-Callejo, Sergio Nesmachnow and Sara Gallardo Saavedra

Received: 15 June 2022 Accepted: 22 August 2022 Published: 1 September 2022


**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

It is critical to appropriately manage solar power plants, aiming to optimize their performance and reliability for their continued use. The efficiency and stability of solar panels can be increased, while costs can be reduced [8]. Irradiation and temperature are the key environmental factors that determine the power output of a solar panel module. A decrease in irradiance and an increase in temperature both reduce solar panel module efficiency [9]. Solar panels convert solar radiation into direct current electrical energy, so a panel must constantly be exposed to the maximum amount of sunlight to maximize electricity production [10]. Nonetheless, the decrease in irradiation on the solar panel surface caused by shading due to dirt accumulation can be well controlled. This happens repeatedly and decreases the amount of sunlight reaching the panels [11]. Dirt accumulated on solar panels can include dust, snow, ice, and other organic waste [12]. Fine dust particles settle more deeply on the surface of solar panel modules, affecting their output performance more than coarse dust particles [13]. A controlled experiment conducted in [14], using spotlights to simulate solar radiation, found that obstruction of the external irradiance can reduce photovoltaic performance by up to 85%. Rain can naturally wash away dust and sand, but moss requires proper cleaning [15,16]. Solar panel cleaning is one of the major challenges for solar power developers because cleaning the solar panel surface requires careful planning and resources (time, materials, and labor) and results in higher production costs. However, cleaning solar panels is an important task to ensure the long-term operational and financial success of a solar power plant [17]. Cleaning is necessary because it keeps the solar panel surfaces properly maintained for efficient energy generation. It also prevents damage from accelerated aging or corrosion caused by weather conditions such as heavy rains, snow, hail, or high humidity [18,19].

The performance of a solar panel is mainly measured by its efficiency, which indicates how much of the solar energy falling on the panel is converted into electricity. For example, a solar panel with an efficiency of 20% converts 20% of the incident solar energy into electrical energy. An experiment on the cleanliness and tracking mechanism for the various conditions of a solar panel was carried out by [20]. The conditions examined were the fixed and clean panel, the dirty and fixed panel, the dirty and tracking panel, and the clean and tracking panel. Dust buildup on the solar panels' surfaces causes the efficiencies to decline even with installed sun-tracking. The high transmission rate of light on the cleaned solar panel causes an increase in efficiency [19]. Tracking a solar panel without cleaning is less efficient than keeping the solar panel fixed and cleaned, with an efficiency decrease of up to 50%. Dust deposition on solar panels causes greater losses in large-scale (megawatt) power plants [21]. A 1% decrease in efficiency may meaningfully influence the Internal Rate of Return (IRR). In comparison, low-level dust accumulation might not significantly affect the production output of small-scale solar plants [22].
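For clarity, panel efficiency is the ratio of electrical output power to the solar power incident on the panel area; a minimal sketch with illustrative numbers:

```python
def panel_efficiency(p_out_w, irradiance_w_m2, area_m2):
    """Efficiency = electrical output power / solar power incident on the panel."""
    return p_out_w / (irradiance_w_m2 * area_m2)

# A 1.6 m^2 panel producing 320 W under standard 1000 W/m^2 irradiance.
eta = panel_efficiency(320.0, 1000.0, 1.6)   # about 0.20, i.e., 20%
```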

Solar panels are monitored with data acquisition systems, where the performance of the system is evaluated under real-world conditions [23]. Real-time monitoring and evaluation of dirt accumulated on solar panels are required to optimize the cleaning operation. Generally, monitoring dirt accumulation on solar panels can be done either online or offline [24]. Smart systems enhanced by an internet connection are integrated into solar panel cleaning to improve the performance of autonomous cleaning methods. This makes the system intelligent enough to monitor a remote solar panel: it can detect a dirty condition and activate its removal from the solar panel surface without human control. Solar panel surface maintenance can be done at a fraction of the installation cost, and improvements in electricity generation are possible [25]. There are several reviews of cleaning methods for solar systems, both manual and autonomous, but to the authors' knowledge, none has reviewed the different smart system approaches applied to solar panel monitoring and cleaning as considered in this study. Published research papers have been reviewed and analyzed. The study aims to conduct a literature review of the theoretical framework for smart systems as it relates to solar panel cleaning and remote monitoring, to promote the concept of smart solar systems. Literature searches were conducted using specific keywords related to the paper's subject. Although there are many review papers in the literature concerning the concept of smart solar systems, only a limited number concentrate on the technology adoption aspect of smart solar systems.

#### *1.1. Review of Solar Panels Automated Cleaning Techniques*

The continuous cleaning and monitoring of solar panels after installation on a roof or at a remote solar farm is difficult [26]. Solar panels can currently be cleaned using a variety of techniques, including the traditional method of brushing away dust, coating processes, and robotic cleaning devices. This process has been automated, since cleaning with manual brushes and water is extremely time-consuming, labor-intensive, and costly for industrial solar installations [27]. An automated cleaning system for solar panels is composed of an autonomous unit using sensors and controllers and a cleaning mechanism unit that can operate with or without water. Solar panels can be cleaned using several methods of removing dirt [28,29]: robotic (brush), heliotex, electrostatic, coating, vibrating, and forced-air cleaning. The review of the cleaning methods, listed in Table 1, compares each method's pros and cons.

#### 1.1.1. Brush Cleaning

The brush cleaning method combines mechanical and electronic components to control the brush's movement, as shown in Figure 2, for cleaning the solar panel either with or without water [38]. The turn-on and turn-off process is automated by sensing the current dust accumulation on the solar panels and comparing it with the reference set by the program. The electronic component supplies a signal to the motor that moves the cleaning system [39]. The system has to be robust, as many types of complex procedures are performed with greater precision, flexibility, and control than with conventional techniques [40]. Furthermore, the developed system improves the efficiency and output power of the solar panels as a result of improved performance [41].


#### **Table 1.** Comparison of various cleaning techniques.

**Figure 2.** Robotic brush cleaning of solar panels [42].

#### 1.1.2. Heliotex Cleaning

Heliotex cleaning involves spraying water onto the solar surfaces [43]. The cleaning system can be programmed based on the environment whenever necessary. No further maintenance is required, other than a periodic replacement of the water filter if it is blocked by sand and top-ups of the cleanser. Pumps are connected via piping to a water reservoir and to nozzles fixed on the solar surface. The system is very effective and is recommended for locations with no water deficiency, due to the high amount of water consumed for cleaning [44]. Figure 3 demonstrates the heliotex method of cleaning. This system is not suitable for all situations [19].

**Figure 3.** Demonstration of the heliotex method of cleaning solar panels [45].

#### 1.1.3. Electrostatic Cleaning

Another dust removal method is electrostatic cleaning, used on dry and dusty solar panels, as shown in Figure 4. In electrostatic precipitation (ESP), fine dust particles on the surface of the solar panel can be removed by induced electrostatic charges [46]. The solar panels are covered with transparent plastic or glass sheets of electrostatically chargeable material; when a high AC voltage is applied to the electrostatic material, a force acts on the dust close to it and causes the dust particles to be repelled and shaken off the solar panel surface. The system can clean 90 percent of accumulated dust in less than two minutes [47]. A significant concern that limits the application of this method is safety: it would be unsafe, since the solar panel would always remain charged, even in showery weather. Moreover, the dynamic motion of all the particles cannot be conveyed by fixed wire electrodes, as experimented in [48].

**Figure 4.** The electrostatic cleaning procedure. (**a**) Some dust particles pass through the hole in the upper screen electrode due to their inertial force, and (**b**) the alternating electrostatic field near the electrodes stirs up the dust particles; a high-speed microscope camera was used to capture the results [49].

#### 1.1.4. Coating Cleaning

The coating method is also a technique for cleaning solar panels, using an anti-soiling coating [50]. This method can be used with either a solid, liquid, or gas-based substrate. It relies on the self-repellent action of the coating material to prevent dust particles from adhering to solar modules. Hydrophilic film and hydrophobic film are the two approaches to coating cleaning [51]. The superhydrophobic coating surface method allows for self-cleaning PV panels, with benefits such as preventing water damage and graffiti [44]. In the hydrophilic approach, water is absorbed into the film and rinses the dirt away. The hydrophobic film, on the other hand, repels water as it falls; due to its hydrophobic properties, water drops that reach the surface are pushed off quickly, carrying particles along with them. The coatings, applied as liquids with low viscosity, required a specific cure time for evaporation of the solvent and drying of the nanoparticle base [19]. Each of the coating samples had high transmission, low reflection, and low absorption properties in the ultraviolet (UV), visible (Vis), and near-infrared (NIR) regions.

The fundamental raw material for the coating cleaning is nano metal oxide particles and resin. The product is made by mixing chemicals [19]. Figure 5 compares a layer of hydrophilic coating causing a sliding motion to a rolling motion made by a hydrophobic coating. Despite their differences, both methods of self-cleaning serve the same end [52].

**Figure 5.** Droplets slide and roll during the self-cleaning process [53].

#### 1.1.5. Vibrating Cleaning System

The vibrating cleaning method prevents solar panels from getting dirty and does not require water or manual labor [46]. To overcome the adhesive force between dust particles and the solar panel surface, a mechanical vibrator attached to the panel produces a harmonic excitation force. In [54], wind energy is converted into mechanical vibration for dust removal from solar panel surfaces without consuming any energy from the solar system, thereby improving its efficiency. As vibration intensifies, the inertial force of vibration increases, which converts the dust particles' adhesion into kinetic energy. External sources of power are usually needed to operate the vibrating motor in vibrating cleaning systems [36]. The panel's self-cleaning system is driven by a DC motor fastened to the rear sheet. In [55], a solar module was supported on four edges and excited by an unbalanced mass to induce vibrations (see Figure 6). As the DC motor's rotor reached the first natural frequency, a large amount of vibration was induced on the panel.

**Figure 6.** DC motor attached to solar panel rear for vibration cleaning [55].

#### 1.1.6. Forced-Air Cleaning System

A forced-air cleaning system for solar panels can help to keep them clean and free of debris. This type of system uses a blower to force air across the panels, which helps to remove dirt, dust, and other debris, in addition to improving the efficiency and performance of residential and commercial solar panels. However, this method is only effective for removing dust that can be blown off the solar panels by air [56]. Water is neither consumed nor directly contacted by the turbulent airflow generated by compressed air [37]. These results were used to construct a pilot cleaning and cooling system [37] that utilized a compressed-air unit composed of a compressor, air tank, airflow management valve, and nozzles with a thickness of 5 mm (see Figure 7). The compressor is powered by PV panels, and a valve controls the flow of compressed air from the tank to meet the needs of cleaning and cooling. A pipe assembly that can be moved around an installation as needed transports air between the panels [19].

#### *1.2. Evaluation of the Performance and Cost of Solar Panel Cleaning Techniques*

Several studies may be done regarding cost–performance considerations after a cleaning technique is developed, put into use, and a cleaning frequency is defined. It is important to point out that the choice among cleaning methods depends on market demands, with common procedures employing natural, manual, mechanical, electrostatic, vibration, and coating processes. Performance and cost can be evaluated for different environmental variables and setups. To assess the necessity of a self-cleaning system, an analysis of the relationship between soiling and its impact on performance efficiency is required [57]. The amount of sunshine, dust concentration, and rain effects are a few examples of environmental variables. The effectiveness of each cleaning method and its costs may be forecasted for each circumstance using optimization models and machine learning. Comparing the different circumstances yields a configuration strategy and an investment plan.

**Figure 7.** Regulatory mechanism for solar PV panel arrays using compressed air [37].

The configuration plan identifies the hardware and software modifications required for the current platforms as well as the system architectural modifications that should be implemented. Some experts believe that the cost of restoring solar panels' capacity to capture energy should be determined exclusively by a satisfactory return on investment (ROI) [37]. The effectiveness of an integrated smart system for solar panel cleaning may be determined by this analysis [22]. The limitation of ROI analysis is that it only evaluates the economic side of the problem. Reliability should also be taken into account when determining how much to spend on solar panel cleaning methods, because it is a major obstacle to effective monitoring and cleaning.

Based on the analysis of various cleaning systems by [58], the electrostatic cleaning method is the most effective. It removes dust particles from the surface without using water; however, spraying water on the photovoltaic cells during cleaning increases their efficiency. The economic viability of automatic self-cleaning mechanisms for solar panels is evaluated in [59] to determine their contribution to the total system cost. When comparing the power generation of PV modules with and without automated self-cleaning mechanisms, the findings reveal a difference of 35%. A domestic installation has a payback period of about five years, while an installation in a commercial setting will typically pay off after 2.25 years. Similarly, ref. [60] reported an efficiency increase of 30–33% in a solar panel array when a robotic cleaning system was used. A robot can also be programmed to fit panels of different sizes. Cleaning a complete array is extremely beneficial, since the accumulation of dust on one panel can hinder the performance of the entire array. The fact that solar panel cells are usually connected in series makes it extremely important that they operate at maximum efficiency.
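The payback periods quoted above follow from a simple payback calculation (upfront cost divided by yearly savings); the cost, energy, and tariff figures below are hypothetical, not taken from [59]:

```python
def payback_years(system_cost, annual_energy_gain_kwh, tariff_per_kwh):
    """Simple payback: upfront cost divided by yearly monetary savings."""
    annual_saving = annual_energy_gain_kwh * tariff_per_kwh
    return system_cost / annual_saving

# Hypothetical domestic case: a $600 cleaning system that recovers
# 800 kWh/year of lost generation at a $0.15/kWh tariff.
years = payback_years(600.0, 800.0, 0.15)   # about 5 years
```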

#### *1.3. Review of Solar Panel Remote Monitoring*

#### 1.3.1. Condition Monitoring

It has been an important research topic to continuously check the condition of solar panels in remote areas and detect faults to provide stable power [61]. The status application captures and reports the operation, performance, and usage of the solar panel being monitored. With diagnostics applications, monitoring, troubleshooting, repairing, and maintaining networked devices are possible. IoT-enabled smart solar monitoring systems provide remote monitoring and recording. This platform monitors the solar system in real time via the internet. Monitoring of parameters such as voltage, current, temperature, and humidity is performed by a smart solar panel cleaning system built with IoT. Solar panel performance is typically characterized by measuring the I–V curve under standard conditions (1000 W/m<sup>2</sup> solar irradiance and 25 °C temperature) [62].

In general, remote monitoring systems consist of three components: a sensing unit, a processing unit, and a display unit [63]. The sensing unit is located near the solar system to gather all relevant data to monitor system performance. The data from the sensing unit is carried to a processing unit using a wired or wireless (wireless sensor network—WSN) network, then to the display unit. These services are made possible by wireless sensor networks, which are cost-efficient to install, consume low power, and require little maintenance. Long-range features enable their deployment at remote sites [64]. Smart sensors are often used in a sensing unit so that the signals generated by a solar monitoring system can be handled efficiently before they are sent to a central processing unit. A plant health monitoring system utilizing IoT was proposed by [65], in which the sensors were embedded in the solar system and connected to the internet via wireless networks.

According to [66], the sensor values are crucial for determining a panel's output. A solar power plant's dirt condition can play a crucial role in monitoring the need for maintenance [22]. Despite the unpredictable nature of solar energy and the initial installation costs, research has been undertaken to discover the execution of solar energy optimization. To improve electrical systems reliability, the optimization method aims to minimize investment, operating, and maintenance costs and emissions [67,68]. Ref. [69] examined how the internet of things can be used to monitor solar panels and found its usage is crucial to the proper management of the solar system. Sensing hardware, data acquisition software, and block management modules measure data. This allows all the real-time data collection on the solar plant's electrical output variables to be viewed and stored within the block management. When the panel is not operating correctly, the smart system will offer suggestions, display errors, and send alerts when maintenance is needed.

#### 1.3.2. Dirt Detection

Monitoring and cleaning solar systems have been studied extensively. Before the performance of the cleaning system, it can be challenging to predict the deposition rate of the organic and inorganic particles on the solar module surfaces. Therefore, to ensure that the cleaning system is as effective and efficient as possible and to make the best use of the energy yield, it is required to inspect the solar panels for dust. Besides analyzing dust effects and deposition rates, identifying the crucial information from the previous research is the purpose of this section.

A dirt detection mechanism on a solar panel was made by [39]. A weight sensor in the system continuously measures the dust. Upon receiving defined feedback from the sensors, the Arduino controller commands the dust cleaning. The solar panels are fitted with weight sensors that measure dust thickness according to changes in weight. Cleaning is triggered when the panel weight exceeds the predefined value due to dust: a dusty solar panel weighs more than a clean one, and the Arduino controller can reference the actual weight of the solar panel. In their findings, the continuous monitoring of the weight by the microcontroller through the load cell aids the dirt detection and cleaning of the solar panels. A mounting plate holds the load cell below the panel.
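The load-cell logic described for [39] amounts to a threshold comparison; this sketch uses hypothetical weights and a hypothetical threshold:

```python
def needs_cleaning(measured_weight_g, clean_weight_g, dust_threshold_g=50.0):
    """Trigger cleaning when accumulated dust weight exceeds a preset threshold.

    Mirrors the load-cell logic described for [39]: the controller compares
    the current panel weight with its known clean weight. All values here
    are illustrative, not taken from the cited work.
    """
    return (measured_weight_g - clean_weight_g) > dust_threshold_g

# A clean panel weighing 12,000 g; the load cell now reads 12,070 g.
trigger = needs_cleaning(12070.0, 12000.0)   # 70 g of dust exceeds the 50 g threshold
```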

In 2019, ref. [70] developed an innovative system for monitoring solar panels' condition. Radiometric sensors are used in a condition monitoring system linked to an Arduino platform; it works by analyzing the emissivity of a surface and recognizing a low value when dust is present. A thermographic camera is employed as the radiometric sensor to provide reliable results. Unmanned aerial vehicles are designed to carry the system. With the Internet of Things, radiometric data can be sent to the cloud for analysis, and thermograms can be stored for further processing. Sensor output and surface conditions were measured on an actual solar panel from various angles and distances. Results from the radiometric sensor analysis show a high degree of accuracy, and dust is recognized in all set-ups. As part of a thermography analysis, it is found that the measured quantities are consistent and regular. The average variation of each experiment determines the accuracy. When the luminous emittance of the solar panel increases, the radiometer measures a higher temperature. The thermal images verify the results of this measurement using a radiometer sensor; the surface being measured is characterized by its temperature. Obtaining reliable measurements for thermal image processing requires knowing the surface characteristics and the ambient conditions, such as humidity, air temperature, distance to the object, reflected temperature, and incident radiation [9].

A design and fabrication demonstration in [22] shows a prototype that cleans the panel's surface. The sensing unit was programmed with a regression model developed from a month's worth of data from spotless and dirty panels. Using the regression model and the integrated sensing unit, the autonomous unit determines the optimal time for cleaning. The prototype monitors input and influencing parameters with direct or indirect impact on the solar farm's output power, and the authors also investigated automatic cleaning. The system can therefore determine whether dust particles impact the panels' power generation. In the prototype, light intensity is measured with a TSL2561 illuminance sensor, and voltage and current sensors determine the output power. Other parameters affecting solar panel output, such as temperature and humidity, are measured by DHT11 sensors, whereas a GP2Y1014AU0F dust sensor determines the dust density. The measured illuminance and output power are stored in the cloud interface, and regression analysis of the processed data determines the connection between input variables and power output.
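A minimal sketch of the kind of regression-based cleaning decision described above, assuming a linear model fitted to clean-panel data and a fixed fractional power-drop threshold. The sensor readings, feature set, and 10% threshold are illustrative assumptions, not values taken from [22].

```python
import numpy as np

# Hypothetical clean-panel training data: columns are illuminance [lux],
# temperature [degC], humidity [%]; target is output power [W].
X = np.array([[30000, 25, 40],
              [45000, 28, 35],
              [60000, 30, 30],
              [75000, 33, 28],
              [90000, 35, 25]], dtype=float)
y = np.array([120.0, 178.0, 235.0, 291.0, 348.0])

# Fit a linear regression (with intercept) by ordinary least squares.
A = np.hstack([X, np.ones((len(X), 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

def predicted_clean_power(features):
    """Expected clean-panel power for the given sensor readings."""
    return float(np.append(np.asarray(features, dtype=float), 1.0) @ coef)

def needs_cleaning(features, measured_power, threshold=0.1):
    """Trigger cleaning when measured power falls more than `threshold`
    (fractional) below the clean-panel prediction (assumed decision rule)."""
    return measured_power < (1.0 - threshold) * predicted_clean_power(features)
```

For example, if the model predicts roughly 235 W for the current conditions, a measured 200 W triggers cleaning while 230 W does not.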

Another study [71] investigated robotic technology for removing dust from solar panels. The strategy monitors power generation on a mobile app and cleans the solar surfaces in response. The input mechanism in the experiment includes an Android switch unit, an IP camera, a voltage sensor, and a current sensor. The IP camera monitors solar panel cleaning and condition; it is internet-connected, and its feed can be displayed on a Windows PC or an Android device, although images can be rather costly to transfer over the internet. A related study [72] used smart cameras with RGB and infrared (for night vision) to continuously take pictures of solar panels. A real-time algorithm determines from the picture whether a panel needs to be cleaned. An Advanced RISC Machines (ARM) processor, also mounted on board, processes the incoming data and executes the algorithm. The intelligent system detects a dirty panel automatically and triggers a mechanism to clean it. While the panels become dusty or exposed to bird feces but do not accumulate enough deposits to exceed the threshold, the energy drop stays within the average fluctuation of clean panels. The dustier the panels become, the larger the energy drop, and the energy output can be increased by cleaning the panels. A typical cleaning interval can range from several months to less than a day, depending on weather conditions.

The work described in [73] used machine vision for solar power plants, researching a self-inspection cleaning device, fault detection systems, and combined power units using drone platforms for multi-image fusion contaminant recognition. The autonomous detection and recognition function is achieved through image recognition analysis. To collect images, an infrared thermal imaging camera, a color visible-light camera, and a black-and-white visible-light camera are used; the acquired images are fused and then processed. The infrared thermal imaging camera is combined with an image camera to create a visual image for inspection purposes. A recognition algorithm analyzes the image to identify hot spots and surface impurities that generate faults inside the panel, and posts a pollution notification in the control system. According to [74], four generations of outdoor soiling loss monitoring systems were developed by the authors. In the fourth-generation soiling monitoring stations, a glass shutter is opened for 2 min (or less as necessary) at approximately solar noon to take Isc measurements on the clean cell. While the shutter was open, the shunt voltage of this panel was also recorded. These measurements make it possible to determine any soil accumulation on the glass surface. Positive gains occur due to thick soil layers at around 1%/mm, whereas at thin soil layers they are typically much lower than 0.2%/mm. The design allows data to be sent via mobile networks from anywhere around the globe, enabling the monitoring of multiple sites.

An approach based on computer vision for detecting soil and dust on solar surfaces was presented in [75]. To sense dirt on the solar panel, texture features are extracted using the Gray Level Co-occurrence Matrix (GLCM) method. Solar panel detection is the first step of the proposed classification method: the solar surface is located in the image, its background is removed, and the input image is stripped of extraneous information. Pre-processing the image minimizes the effects of lighting while emphasizing the fine details that represent dust and soil; during this process, the red, green, blue (RGB) image is converted to hue, saturation, value (HSV). This is followed by feature extraction, which converts the input image into a limited number of parameters, and finally by classification. A flexible camera orientation was used in this study to collect two hundred images under controlled conditions with variable lighting types. One half of the collected images show clean, dry panels, while the other half show dirty panels. Histogram equalization and a high-pass filter are used to improve image contrast. Based on the results of the tested images, the proposed method has a high recognition rate. Incorporating shadowed areas, broken panels, and wet panels into the pattern recognition stage would be a helpful future extension.
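The GLCM statistics used in studies like [75] can be sketched in a few lines of NumPy. The quantization level, pixel offset, and the particular feature set below are illustrative choices of ours, not the exact configuration of the reviewed study.

```python
import numpy as np

def glcm(img, levels=8, dx=1, dy=0):
    """Normalized gray-level co-occurrence matrix for a single pixel offset."""
    q = (img.astype(float) / 256.0 * levels).astype(int)  # quantize to `levels` bins
    m = np.zeros((levels, levels))
    h, w = q.shape
    for y in range(h - dy):
        for x in range(w - dx):
            m[q[y, x], q[y + dy, x + dx]] += 1  # count co-occurring gray pairs
    return m / m.sum()

def glcm_features(p):
    """Classic Haralick-style statistics computed from a normalized GLCM."""
    i, j = np.indices(p.shape)
    contrast = float(np.sum(p * (i - j) ** 2))
    homogeneity = float(np.sum(p / (1.0 + np.abs(i - j))))
    energy = float(np.sum(p ** 2))
    return contrast, homogeneity, energy
```

In a classifier pipeline, these statistics would form the feature vector passed to the final classification stage.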

#### *1.4. Device Management and Performance Analysis*

As a device management system [76], a smart system for cleaning and monitoring can enhance solar panel performance. In the event of a system failure, the analyzed data are transferred to the cloud via the internet for predictive maintenance and cause assessment. In addition to real-time monitoring data, historical data and trends can be used for comparison [77]. Analytics applications such as predictive analytics, pattern recognition, and machine learning analyze the data and trigger sequenced patterns of behavior based on data filtering, normalization, and transformation [78]. Site engineers can make future decisions based on the historical data stored in this way, which prevents equipment failure, eliminates the need to keep track of upgrades, and saves time and money.

The main aim of a PV power plant monitoring system is to transmit data in a reliable, secure, and efficient manner. However, several issues significantly affect the performance of monitoring technologies in terms of efficiency, range, data processing capability, sampling rate, and signal interference. There remains a clear link between dirt monitoring and dirt cleaning, especially under varying environmental test conditions [79,80]. The performance ratio of a photovoltaic system is the ratio between its instantaneous power generation and its rated power generation [9]. Existing dirt accumulation monitoring and diagnostic systems do not take into account the instantaneous rate of dirt accumulation, and the cost of dirt contamination could not be quantified because no specific data were available. The qualitative analysis presented in this study is therefore vital for evaluating smart system performance in monitoring and cleaning dirt from solar panels.
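The performance ratio defined above reduces to a one-line computation. The accompanying soiling-loss helper is our own illustrative derivation (comparing a soiled panel's performance ratio against a clean reference), not a metric defined in the reviewed papers.

```python
def performance_ratio(p_measured, p_rated):
    """Performance ratio [9]: instantaneous power generation over rated generation."""
    return p_measured / p_rated

def soiling_loss(pr_dirty, pr_clean):
    """Fractional output loss attributable to soiling, comparing a soiled
    panel's performance ratio with a clean reference (illustrative helper)."""
    return 1.0 - pr_dirty / pr_clean
```

For instance, a panel producing 4 kW against a 5 kW rating has a performance ratio of 0.8; if a clean reference panel achieves 0.8 while the soiled one achieves 0.72, the soiling loss is 10%.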

In an attempt to systematize the literature on optimizing the performance of smart systems monitoring solar panel cleaning, we investigated the variables and factors that produce the best outcomes in such systems. Parameters for the performance evaluation of solar panel monitoring and cleaning systems include the complexity of the system (hardware and software structure), the number of sensors required to provide reliable sensory feedback, the detection time after an abnormal solar panel condition, and the steps involved in the monitoring and cleaning action. The most important challenge in the dirt detection process is to counter the unique operating characteristics that result from changing environmental conditions. Measuring dirt accumulation rates with environmental sensors such as weight and irradiance sensors is more susceptible to environmental effects; these methods are less complex but still demand high measurement accuracy. In contrast, image-based detection is more reliable, requires less detection time, and is reported to be more robust in detecting dirt and shading on solar panels [81].

The remaining sections of the study present the methodology of the systematic literature review, followed by the results and conclusions. The conclusion summarizes the results, implications, and limitations.

#### **2. Methods**

Scopus and Google Scholar were used as the fundamental databases to retrieve the data for this study. Although additional sources were cited in this article, they were not included in the review, since they were only used to clarify the background of the topic. The Scopus database provides free access to STM (scientific, technical, and medical) journal articles and the references cited in those articles; both research and collection development can be performed with the database [82]. Search topics included "solar panel", "monitoring", and "cleaning", and the terms appeared in the titles, abstracts, and keywords of the publications. The period covered spanned from 2008 to 2022. Various types of publications were indexed, and the number of publications associated with smart solar panel monitoring and cleaning was 45. By document type, the majority are journal articles (*n* = 25, 55.6%) and conference papers (*n* = 17, 37.8%), while conference reviews are few (*n* = 3, 6.7%). The Scopus database contains details about every publication, such as the publication year, the authors and their addresses, the title, the abstract, the journal, subject categories, and references. This set of data from Scopus was exported for the analysis of publication output and growth trends, as well as geographical and institutional distribution and collaboration. The Scopus analysis feature was used to visualize the geographical and institutional distribution and collaboration, while VOSviewer was used to analyze and visualize author relationships, co-citations, and terms. VOSviewer represents the similarity or relatedness between items by their distance as accurately as possible [83]. Clustering was used to classify topics into groups, where each group is denoted by a different color. The results section describes in detail how the visualizations were interpreted.

#### *2.1. Search*

The research scope was formulated at the intersection of the broad terms "solar panel performance improvement" and "remote monitoring and cleaning". The search string is composed of two main parts: (i) the smart solar system; (ii) the exclusions and limitations of the search scope. The structure of the search string used in the Scopus search comprises solar AND panel AND monitoring AND cleaning.

#### *2.2. Eligibility Criteria*

An overview of the hypotheses of the current study can be found in the preregistration on the Open Science Framework (osf.io/rk8yj). The screening step in the evaluation process involves a deeper full-text analysis of each publication for potential research items, which is discussed accordingly. Scientific publications in the English language, including reviews, research articles, and open access documents, were considered. Moreover, duplicate publications were checked with the EPPI-Reviewer.

After the full-text analysis, two articles were excluded because they failed to meet the inclusion criteria or did not meet the quality requirements, since this review focuses on smart systems for solar panel cleaning and their implications. A total of 55 scientific publications entered the data collection phase. This is illustrated in Figure 8.

**Figure 8.** Process flow chart of the search.

#### **3. Results**

#### *3.1. Overview of Selected Articles*

Compared to other reviews on solar panel monitoring and cleaning (e.g., [46,84]), the current review provides a relatively short bibliometric analysis. The bibliometric analysis followed a systematic literature review (with PRISMA) that eliminated articles outside the pre-defined scope (see Section 3) and retained only those within it. Due to the lack of studies on the implementation of the Internet of Things for improving solar panel performance, the number of articles was significantly reduced during the filtering process (see Figure 8), resulting in only 41 papers for the bibliometric analysis.

#### 3.1.1. Publication Output and Growth Trend

Measuring the publication count of a scientific research discipline or subject is important for gauging its development trend. The number of publications on smart systems for solar panel monitoring and cleaning has grown since 2008, as seen in Figure 9. In 2008, there was just one publication on the topic, and until 2016 there were few (fewer than four per year). Every year since 2016 has had more than four publications, except for 2017 and 2019, when there were decreases. Publications peaked in 2020 (*n* = 11), followed by a downward trend (*n* = 4 in the first quarter of 2022).

**Figure 9.** Publication trends related to smart systems for solar panels (between 2008 and 2022).

3.1.2. Authors and Their Collaboration

A total of 159 authors contributed to the 41 articles. Only a small set of prolific authors contributes a considerable percentage of publications on a given topic, which is consistent with observations in other subject areas. As shown in Figure 10, the subject areas of engineering and energy received the highest share (22.0 percent each; approximately *n* = 9/41), while computer science accounts for 15.0 percent (*n* = 6/41). Given the number of multi-subject-area publications, it can be stated that there is considerable collaborative research in smart system technology.

**Figure 10.** Analysis of subject area on smart systems for the solar panel.

One significant cluster of authors can be identified in the collaboration network in Figure 11. The average publication year for each of the principal researchers is 2020. In terms of authorship, it is worth noting a potential bias: authors with the same name could not be separated from one another.

**Figure 11.** Author and co-citation visualization in the VOS viewer.

#### 3.1.3. Geographical Distribution

One hundred and fifty-nine authors from various nations or territories contributed to the smart systems for solar panel publications. Eleven are in India, six are in China, two are in Egypt, and one publication each came from Senegal, Morocco, and Algeria. Figure 12 depicts the global distribution of contributing countries and territories for the most productive solar panel research on smart systems technology. Cleaning the module surfaces is an economic investment, and the investment must be offset by a sufficient increase in energy production [85]. Economic growth appears to encourage scientific and academic investment, since the most prolific papers on smart systems for solar panel research are found in all of the world's major industrialized countries. A publication might be written by various authors from different nations or territories, or a single author can be affiliated with multiple countries or territories. When looking at the continents in Figure 12, a geographical discrepancy can be detected in the extent of the information on countries and territories. The depth of the color on the map represents the number of authors from each country.

#### *3.2. Integration of Smart System for Solar Panel Monitoring and Cleaning*

To keep solar panels clean, automatic connections and continuous monitoring are necessary. Smart solar monitoring and cleaning applications can overcome these challenges with robust and efficient cloud-based tracking systems that provide consistent, real-time monitoring from remote locations. A key characteristic of smart system applications for solar panel cleaning is the combination of their essential functions, providing timely monitoring and device management as a solution for improving the efficiency of solar plants. Sensors and actuators would be integrated in different configurations to provide autonomous applications with a smart system that supports solar panel cleaning. Table 2 summarizes the focus of various journal papers on smart solar systems.

**Figure 12.** Geographical distribution of authors.








#### **4. Discussion and Future Prospects**

The current status of smart system integration in solar panel monitoring and cleaning is summarized in this review. Through the proposed harmonized data structures, future assessments can be planned and integrated more efficiently by showing what data have already been used and what data can be used in the future. This is especially beneficial when optimizations are implemented based on real-time data from a solar panel site. From this review, we have identified the following gaps and recommendations:

• Though the purpose of communication technologies and cloud platform implementation was justified in past studies for monitoring real-time data for decision making, most do not assess analytical soundness, measurability, or platform deployment, or their linkages to one another. Furthermore, more theoretically grounded research is required to create reliable evidence for selecting communication technologies and implementing cloud platforms.


The main gaps in the smart system for solar panel monitoring and cleaning are the optimal cleaning frequency and costs, which have yet to be established with monitored data. Modeling energy output degradation is an important tool for increasing the bankability of solar plants, since dirt does not accumulate uniformly over time but is affected by day-to-day variations in weather conditions.

#### **5. Conclusions**

This systematic review has shown that well-conducted studies improve solar panel cleaning and monitoring through smart system integration. The findings of other reviews of smart systems for solar panels are consistent with the observation that smart systems for solar panel monitoring and maintenance are effective. The ability to visualize solar panel dirt conditions can be instrumental in optimizing cleaning time and operation. Our research identified four areas of intervention: dirt detection, cleaning methods, wireless communication technologies for data gathering, and cloud platforms for IoT implementation. The current evidence is encouraging, but more studies are needed to fill the identified knowledge gaps. Smart systems for solar panels have the potential to improve lifetime performance, reduce maintenance costs, reduce human intervention, and increase energy output. Ultimately, the optimal frequency and cost of cleaning must be determined with monitored data, but the evidence reviewed here can be helpful to practice, policy, and future research.

**Author Contributions:** Conceptualization, B.O.O. and N.N.; methodology, B.O.O. and N.N.; writing original draft preparation, B.O.O.; writing—review and editing, B.O.O., O.A.O. and N.N.; supervision, O.A.O. and N.N. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


### *Article* **Intelligent Deep-Q-Network-Based Energy Management for an Isolated Microgrid**

**Bao Chau Phan 1, Meng-Tse Lee <sup>2</sup> and Ying-Chih Lai 1,3,\***


**Abstract:** The development of hybrid renewable energy systems (HRESs) can be the most feasible solution for stable, environmentally friendly, and cost-effective power generation, especially in rural and island territories. In the studied HRES, solar and wind energy are used as the major resources. Moreover, electrolyzed hydrogen is utilized to store energy for the operation of a fuel cell. In case of insufficiency, the battery and fuel cell are storage systems that supply energy, while a diesel generator serves as a backup system to meet the load demand under bad weather conditions. An isolated HRES energy management system (EMS) based on a Deep Q Network (DQN) is introduced to ensure the reliable and efficient operation of the system. A DQN can deal with continuous state spaces and manage the dynamic behavior of hybrid systems without exact mathematical models. Using power consumption data from Basco island in the Philippines, HOMER software is used to calculate the capacity of each component of the proposed power plant. The plant and its DQN-based EMS are simulated in MATLAB/Simulink. Under different load profile scenarios, the proposed method is compared to conventional dispatch (CD) control for validation. Given its outstanding performance with lower fuel consumption, DQN is a powerful and promising method for energy management.

**Keywords:** hybrid renewable energy system (HRES); isolated microgrid; energy management system (EMS); Deep Q Network (DQN); HOMER software

#### **1. Introduction**

The worldwide increase in energy demand leads to the consideration of renewable energy types such as solar, wind, tidal, and geothermal. Currently, fossil fuels are still the major reliable power sources, especially for rural and island electrification. On the other hand, fossil fuel prices are constantly increasing, and fossil fuels are responsible for global environmental pollution. Consequently, many countries have recently opted for the long-term sustainable development of renewable energy. By 2025, the Ministry of Economic Affairs (Taiwan) aims at increasing the share of renewable energy to 20% of total power generation, as well as phasing out nuclear energy. Several developing countries, such as the Philippines, Thailand, and Vietnam, have changed their power development plans around green energy; these are among the most representative countries for the deployment of renewable energy power plants [1].

Solar and wind energy have recently received particular attention because of the widely available solar radiation and wind resources. These energy types are environmentally friendly and cost effective, but also unpredictable and uncontrollable due to their strong dependence on weather conditions. To improve the operational ability and efficiency of these power systems, the concept of a hybrid renewable energy system (HRES) was created [2]. In terms of power generation for rural and island

**Citation:** Phan, B.C.; Lee, M.-T.; Lai, Y.-C. Intelligent Deep-Q-Network-Based Energy Management for an Isolated Microgrid. *Appl. Sci.* **2022**, *12*, 8721. https://doi.org/10.3390/ app12178721

Academic Editors: Luis Hernández-Callejo, Sara Gallardo Saavedra and Sergio Nesmachnow

Received: 30 July 2022 Accepted: 27 August 2022 Published: 31 August 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

areas, HRES is more cost effective than a grid extension. Depending on the distance from a power station, a grid extension can cost from 10,000 to 50,000 USD per kilometer [3].

In an HRES, the combination of renewable energy resources, energy storage systems (ESSs), and diesel generators (DGs) for a sustainable and reliable power supply can create economic, technical, environmental, and social benefits for investors. The role of ESSs is to store excess energy from the renewable sources, while DGs are operated when both the renewable resources and the ESSs are out of power. The configuration and topology of an HRES can vary in several ways: the most generic classification distinguishes on-grid and off-grid systems, and according to the bus interconnection, i.e., the physical link between all components, the system can be classified as DC, AC, or hybrid DC/AC [4]. To ensure a high level of system reliability and operational efficiency, energy management algorithms are needed to manage the power flow inside the system; in particular, such an algorithm has to accommodate the variation of load demand and the complexity of the system.

The energy management system (EMS) is one of the most important components of an HRES. Its main function is to balance power between the system components while reducing the amount of fossil fuel used for power generation. EMS control can be classical or intelligent [4,5]. Classical EMS is based on linear, nonlinear, or dynamic programming [6], as well as rule-based and flowchart methods [7]. More recent classical EMS controllers are based on the proportional-integral controller [8], the sliding mode controller [9], and the H-infinity controller [10]. Classical EMS, which may require complicated mathematical models with many system variables, has low computational complexity. Compared to classical EMS, intelligent EMS tends to be more robust and more efficient. Examples include fuzzy logic (FL) [11], the artificial neural network (ANN) controller, the neuro-fuzzy (ANFIS) controller [12], and model predictive control (MPC). In addition, evolutionary-algorithm-based EMS methods have also been developed, such as Particle Swarm Optimization (PSO), the Genetic Algorithm [13], and the Modified Bat Algorithm (MBA) [14]. Recently, machine learning methods such as the support vector machine (SVM) have been applied to EMS [15]. Among these intelligent EMS methods, fuzzy logic, neural networks, and ANFIS are the most popular.

Unlike classical EMS, intelligent EMS requires only simple mathematical models to manage the dynamic behavior of hybrid systems. However, the current forms of these methods still cannot guarantee better optimal-control performance [15]. Over time, many hybrid studies have been conducted to improve global optimal solutions and convergence speed; the major purpose is to find the action that optimizes the value of an objective function. In [16], a method named PF3SACO was developed to improve optimization ability and convergence speed, in which PSO and fuzzy logic are used to adjust system parameters. In [17], to adapt to complex scenes, the authors proposed a robust tracking method based on a feature weight pool that holds multiple weights for different features. In [18], a variable neighborhood search and non-dominated sorting genetic algorithm II (VNS-NSGA-II) were applied to optimally solve a routing problem with multiple time windows. In [19], principal component analysis (PCA), local binary patterns (LBP), and a gray wolf algorithm were combined to optimize the parameters of a kernel extreme learning machine (KELM) for image classification. These hybrid methods are powerful in solving complex optimization problems, especially since they can be used to optimize the parameters of machine-learning-based approaches; however, they depend heavily on complex mathematical models and have high computational complexity.

More studies on agent-based machine learning methods for hybrid EMS have been conducted recently, such as deep learning (DL) and deep reinforcement learning (DRL) [20,21]. Instead of using a complex mathematical control model, these agent-based approaches can manage the system by learning the control policy from historical data of interactions with the environment, providing a potential solution to energy management problems. Following the concepts of RL and DRL, the control objective is to obtain the maximum reward by continuously interacting with the system environment. Based on exploration-exploitation strategies, such as ε-greedy or softmax, the action with the highest reward is taken [22]. Q-learning is a popular model-free RL algorithm. However, RL-based methods can only handle discrete control problems, which may be hard to implement in practical applications. DRL-based methods combine RL with deep learning to handle continuous control problems with large state-action pairs. DRL has been successfully applied to play Go and Atari games [23]. It is a powerful method for complex optimal control problems with large state spaces, using a deep neural network, and has also been applied in robotics [23], building HVAC control [24], and hybrid electric cars [25].
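For readers unfamiliar with the tabular case, the sketch below shows the Q-learning update and an ε-greedy policy on a toy two-state problem; it is a generic illustration of the algorithm, not the microgrid formulation used later in this paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(Q, s, eps=0.1):
    """Exploration-exploitation: random action with probability eps, else greedy."""
    if rng.random() < eps:
        return int(rng.integers(Q.shape[1]))
    return int(np.argmax(Q[s]))

def q_update(Q, s, a, r, s_next, done, alpha=0.5, gamma=0.9):
    """One tabular Q-learning step: move Q[s, a] toward the bootstrapped target."""
    target = r if done else r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])

# Toy two-state task (our own illustration): from state 0, action 1 reaches the
# terminal state 1 with reward 1, while action 0 earns nothing.
Q = np.zeros((2, 2))
for _ in range(20):
    for a in (0, 1):
        q_update(Q, 0, a, 1.0 if a == 1 else 0.0, 1, done=True)
```

After training, `epsilon_greedy(Q, 0, eps=0.0)` selects action 1, the rewarded one; the table Q is exactly the structure that DRL replaces with a neural network when the state space becomes continuous.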

To date, studies applying RL and DRL to the energy management of a stand-alone microgrid are not common. A self-learning single neural network was proposed by Huang and Liu (2013) for residential EMS applications [26]. A two-step-ahead Q-learning method was defined by Kuznetsova (2013) for scheduling the operation of the battery in a wind system. In [27], a three-step-ahead Q-learning method was used to schedule battery operation in a solar energy system. A Q-learning-based multi-agent system for a solar installation was developed in [28] to reduce energy consumption. The autonomous multi-agent system in [29] can manage renewable energy buying and selling optimally. In [30], the authors proposed a multi-agent system to monitor energy generation and consumption. A Q-learning single-agent system was applied to manage a solar energy system by Kofinas (2016) [31]. In [32], a Q-learning algorithm with a fuzzy reward function was introduced to improve system performance; it learns the power flow between the components of the solar system, which includes a photovoltaic (PV) array, a battery, the load demand, and a desalination unit (for water supply), more efficiently. Later, Kofinas (2018) [33] proposed a cooperative fuzzy Q-learning-based multi-agent system for the energy management of a stand-alone microgrid. The latter system included a PV array, a fuel cell, a diesel generator, an electrolyzer, a hydrogen tank, a battery, and a desalination plant. Each component was represented by an agent, which acted as an individual learner and interacted with the other agents. The simulation results from MATLAB/Simulink indicated that the controller could handle continuous state and action spaces. The learning of each agent took place through exploration/exploitation, with fast convergence towards a policy and good performance. In [33], the authors used fuzzy logic as the function approximation for determining the Q-values.
Similar to the above approach, a deep Q-network (DQN) applies a neural network to calculate the Q-values in order to increase the learning capacity of the agents. In [25], deep Q-learning was applied to the energy management of a hybrid electric vehicle; the DRL-based controller learned an optimal policy autonomously, without using any prediction or predefined rule.
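The step from tabular Q-learning to a DQN replaces the Q-table with a function approximator and adds a periodically synchronized target network for stable bootstrapping. The sketch below uses a linear approximator for brevity (a real DQN uses a deep network and a replay memory); the class, its parameters, and the toy problem are our own illustrations, not the paper's controller.

```python
import numpy as np

rng = np.random.default_rng(1)

class LinearDQN:
    """Minimal DQN-style agent with a linear Q-function and a target network.
    A sketch only: the paper's agent uses a deep network and replay memory."""

    def __init__(self, n_features, n_actions, lr=0.01, gamma=0.95):
        self.W = np.zeros((n_actions, n_features))  # online weights
        self.W_target = self.W.copy()               # target-network weights
        self.lr, self.gamma = lr, gamma

    def q(self, s, target=False):
        W = self.W_target if target else self.W
        return W @ s  # Q-value per action for state feature vector s

    def act(self, s, eps=0.1):
        if rng.random() < eps:
            return int(rng.integers(self.W.shape[0]))
        return int(np.argmax(self.q(s)))

    def update(self, s, a, r, s_next, done):
        # Bootstrapped target comes from the frozen target network.
        target = r if done else r + self.gamma * self.q(s_next, target=True).max()
        td = target - self.q(s)[a]
        self.W[a] += self.lr * td * s  # gradient step on the squared TD error

    def sync(self):
        self.W_target = self.W.copy()

# Toy illustration (our own, not the paper's microgrid): from state [1.0],
# action 0 yields reward 1 and action 1 yields reward 0, both terminal.
agent = LinearDQN(n_features=1, n_actions=2, lr=0.1, gamma=0.9)
s = np.array([1.0])
for _ in range(200):
    agent.update(s, 0, 1.0, s, done=True)
    agent.update(s, 1, 0.0, s, done=True)
    agent.sync()
```

After training, the greedy policy `agent.act(s, eps=0.0)` picks the rewarded action 0, mirroring how the DQN-based EMS selects dispatch actions from learned Q-values.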

The main goal of this study is to propose a DQN algorithm for the energy management of an isolated HRES and to present a case study conducted at Basco island in the Philippines. It extends our previous work, which developed a DRL-based controller to track the maximum power point of PV systems under various weather and partial shading conditions [34]. DL and DRL are widely used in robotics and autonomous systems; however, only a few studies address DRL applications in HRES energy management. Thus, the advantage and novelty of this study is the application of a DQN-based EMS for rural and island areas, in which the system includes a battery, a DG, and a hydrogen system, together with a case study using practical load demand data. The adopted power system consists of a PV system, a wind turbine (WT), a battery, a DG, a fuel cell, an electrolyzer, and a hydrogen tank. Based on weather data and the load demand at the applied site, we used HOMER software to determine the structure of the HRES.

The major contributions of this paper are described below:


The rest of the paper is organized as follows. The mathematical models of the system components are introduced in Section 2. The DQN algorithm and the CD control are introduced in Section 3. The performance of the DQN-based EMS controller is simulated in Section 4. The final section presents the conclusions and future work directions.

#### **2. Mathematical Models of the System Components**

This section describes the mathematical models of the system components, which are used to calculate their power generation and consumption. In this HRES, solar and wind energy are the primary energy resources. Short-term energy storage technologies have the ability to store and discharge energy for minutes or hours after being charged. In contrast, long-term energy storage can extend the time between charging and discharging to weeks or seasons [35]. In an HRES, FCs can be used as a long-term energy storage option [4]. However, the slow dynamics of fuel cells and their degradation due to frequent start-up and shut-down cycles are a major disadvantage. Hence, batteries are also needed to create a hybrid system in which they cover the power deficit and act as a short-term energy storage medium [36]. Batteries can provide or absorb large power gradients in a short time. However, due to their short lifetime, high self-discharge rate, sensitivity to environmental conditions, and limited storage capacity, batteries are not suitable as a long-term solution.

#### *2.1. PV System*

A *PV* system is composed of one or more solar panels integrated with an inverter or other electrical and mechanical hardware, using energy from the Sun to generate electricity. The output power of the *PV* system is strongly affected by the amount of solar radiation and the ambient temperature. The expression for the *PV*-generated power is as follows [22]:

$$P\_{PV} = V\_{pv} I\_{pv} = I\_{pv} \left\{ \frac{AkT}{q} \ln \left( \frac{I\_{ph} - I\_{pv} + I\_{pvo}}{I\_{pvo}} \right) - I\_{pv} R\_s \right\} \tag{1}$$

where *k* is the Boltzmann constant, *A* is the non-ideality factor, *q* is the electron charge, *T* is the temperature, *Iph* is the light-generated current, *Ipvo* is the dark saturation current, and *Rs* is the series resistance.
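As a quick sketch of Equation (1), the following Python snippet evaluates the single-diode expression for the PV power at a given operating current. All parameter values are illustrative assumptions, not taken from the paper.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant (J/K)
Q_E = 1.602176634e-19   # electron charge (C)

def pv_power(i_pv, i_ph, i_pvo, r_s, a, t_kelvin):
    """Power of a PV generator at operating current i_pv, using the
    single-diode relation V = (A*k*T/q)*ln((Iph - Ipv + Ipvo)/Ipvo) - Ipv*Rs."""
    v_pv = (a * K_B * t_kelvin / Q_E) * math.log(
        (i_ph - i_pv + i_pvo) / i_pvo) - i_pv * r_s
    return v_pv * i_pv

# Illustrative single-cell values (not from the paper)
p = pv_power(i_pv=2.0, i_ph=5.0, i_pvo=1e-9, r_s=0.01, a=1.3, t_kelvin=298.0)
```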

#### *2.2. Wind Turbine System*

During wind power generation, the kinetic energy of the wind drives the blades, allowing the turbine to rotate. The mechanical energy is then converted into electricity by the generator. The output of the wind turbine system is significantly influenced by the wind speed. The generated power of the *WT* system is obtained from the manufacturer's power curve as follows [3]:

$$P\_{WT} = \begin{cases} 0 & \text{if } V < V\_{\text{in}} \text{ or } V > V\_{\text{out}}\\ P\_r \left(\frac{V - V\_{\text{in}}}{V\_r - V\_{\text{in}}}\right)^3 & \text{if } V\_{\text{in}} \le V < V\_r\\ P\_r & \text{if } V\_r \le V \le V\_{\text{out}} \end{cases} \tag{2}$$

where *PWT* denotes the output power at a particular value of wind speed. *Pr* represents the rated capacity. *Vin*, *Vr*, *Vout* stand for the cut-in, rated, and cut-out speeds, respectively.
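Equation (2) is a simple piecewise function. A minimal Python implementation might look as follows; the 10 kW turbine and its cut-in, rated, and cut-out speeds are illustrative assumptions.

```python
def wt_power(v, p_r, v_in, v_r, v_out):
    """Wind-turbine output from the piecewise power curve of Eq. (2)."""
    if v < v_in or v > v_out:
        return 0.0                                   # below cut-in or above cut-out
    if v < v_r:
        return p_r * ((v - v_in) / (v_r - v_in)) ** 3  # cubic ramp-up region
    return p_r                                       # rated region: v_r <= v <= v_out

# Assumed 10 kW turbine: cut-in 3 m/s, rated 12 m/s, cut-out 25 m/s
curve = [wt_power(v, p_r=10.0, v_in=3.0, v_r=12.0, v_out=25.0)
         for v in (2, 7, 12, 20, 26)]
```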

#### *2.3. Battery Storage System*

Among various kinds of battery storage systems, such as lithium-ion or nickel-zinc batteries, we chose lead-acid batteries for their low cost (300–600 USD per kWh). Lead-acid batteries have a good cycle efficiency of up to 90% and a low self-discharge rate of less than 0.3% [37]. They are designed to withstand more deep discharge cycles, which makes them suitable for an HRES.

One of the most important parameters of the battery system is the *SOC*, which expresses the level of charge relative to its capacity. The excess power is used to charge the battery, while a power deficiency towards the load demand discharges the battery. The battery *SOC* can be defined as follows [38]:

$$SOC\_{t+1} = SOC\_t \pm \frac{P\_{Bat} \eta\_{Bat}}{P\_{n,Bat}} \times 100\tag{3}$$

where *SOCt*+<sup>1</sup> and *SOCt* denote the battery *SOC* at the next and the current time step, respectively. *PBat* stands for the battery charging or discharging energy (kWh), while *Pn*,*Bat* denotes the battery rated capacity, and *ηBat* denotes the round-trip efficiency.

When the battery is turned on during the operation, the charging and discharging rates of the battery are defined based on the amount of power required at the current time step, always satisfying:

$$P\_{Bat, discharge} \le P\_{Bat} \le P\_{Bat, charge} \tag{4}$$

where *PBat*,*discharge*, with a negative sign, indicates the maximum discharge rate of the battery, and *PBat*,*charge*, with a positive sign, indicates the maximum charge rate of the battery.

At any time-step, the value of *SOC* must satisfy:

$$SOC\_{min} \le SOC \le SOC\_{max} \tag{5}$$
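Equations (3)–(5) can be sketched in Python as a one-step SOC update with bound enforcement. The sign convention and the 30% lower bound below are assumptions for illustration (the 30% limit matches the value used in the simulations of Section 4).

```python
def soc_next(soc, p_bat, eta_bat, p_n_bat):
    """One-step SOC update per Eq. (3), in percent.
    Assumed sign convention: p_bat > 0 charges, p_bat < 0 discharges."""
    return soc + (p_bat * eta_bat / p_n_bat) * 100.0

def clamp_soc(soc, soc_min=30.0, soc_max=100.0):
    """Enforce the SOC bounds of Eq. (5)."""
    return max(soc_min, min(soc_max, soc))

# Charging a 20,948 kWh pack (case-study size) by 500 kWh from 50% SOC
soc = clamp_soc(soc_next(soc=50.0, p_bat=500.0, eta_bat=0.9, p_n_bat=20948.0))
```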

#### *2.4. Diesel Generator*

In the HRES, a diesel generator is used as the back-up system when the load demand cannot be met by the other components. The diesel generator ensures the availability, reliability, and quality of the power system at all times. We modeled the *DG* system according to its fuel consumption. In [39], an approximate linear model is presented in which the hourly fuel consumption is calculated from the rated capacity of the *DG* and its operating power:

$$Fuel\_t = \alpha\_{DG} P\_{DG,t} + \beta\_{DG} P\_{r,t} \tag{6}$$

where *Fuelt* expresses the fuel consumption (L). *PDG*,*<sup>t</sup>* denotes the operating power, while *Pr*,*<sup>t</sup>* denotes the rated power of the *DG* system (kW). The fuel consumption coefficients are *αDG* = 0.246 and *βDG* = 0.08145; the same values were used in several studies [40,41].
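A direct Python transcription of the linear fuel model of Equation (6), with the coefficient values quoted above; running the 750 kW case-study generator at half load is an illustrative example.

```python
ALPHA_DG = 0.246    # coefficient on operating power (L/kWh)
BETA_DG = 0.08145   # coefficient on rated power (L/kWh)

def dg_fuel(p_dg, p_rated):
    """Hourly diesel fuel consumption in litres, Eq. (6)."""
    return ALPHA_DG * p_dg + BETA_DG * p_rated

fuel = dg_fuel(p_dg=375.0, p_rated=750.0)  # 750 kW unit at half load
```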

#### *2.5. Fuel Cell*

A fuel cell (*FC*) uses the chemical energy of hydrogen or another fuel to produce electricity. Various types of *FCs* are available on the market. The so-called proton exchange membrane fuel cell (PEMFC) is the most frequently used. The advantages of the PEMFC include high power density, low operating temperature, small size, and good start-up and shut-down performance. For this reason, the PEMFC was chosen for this project. The hourly hydrogen consumption can be expressed as follows [9]:

$$q\_{H\_{2,con}} = \frac{P\_{FC}}{E\_{low,H\_2} \eta\_{therm} U\_f \eta\_{FC}} \tag{7}$$

where *PFC* denotes the output power supplied by the *FC*, *Elow*,*H*<sup>2</sup> = 33.35 kWh/kg is the lower heating value of hydrogen, *ηtherm* = 0.98 is the thermodynamic efficiency at 289 *K*, and *Uf* is the fuel utilization coefficient, namely, the ratio of the mass of fuel reacting in the *FC* to the mass of fuel entering the *FC*. Finally, *ηFC* denotes the *FC* efficiency.
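Equation (7) maps the FC output power to a hydrogen mass flow. The sketch below uses the constants quoted above; the utilization coefficient and FC efficiency values are assumptions for illustration, not from the paper.

```python
def fc_hydrogen_consumption(p_fc, u_f, eta_fc,
                            e_low_h2=33.35, eta_therm=0.98):
    """Hourly hydrogen consumption (kg/h) of the fuel cell, Eq. (7)."""
    return p_fc / (e_low_h2 * eta_therm * u_f * eta_fc)

# Assumed u_f = 0.95 and eta_fc = 0.5 for a 500 kW PEMFC at full output
q_h2 = fc_hydrogen_consumption(p_fc=500.0, u_f=0.95, eta_fc=0.5)
```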

#### *2.6. Electrolyzer*

To supply the hydrogen fuel for the operation of the *FC*, an electrolyzer is used. It generates hydrogen from water via electrolysis. The chemical reaction in an electrolyzer is the reverse of that in an *FC*. The power absorbed by the electrolyzer and the generated hydrogen mass are related by the expression below [9]:

$$P\_{\rm EL} = Bq\_{H\_{2,\,\,nom}} + Aq\_{H\_{2,\,\,gen}} \tag{8}$$

where *PEL* denotes the power consumed by the electrolyzer system, *qH*2, *nom* denotes the nominal hydrogen mass flow generated by the electrolyzer, while *qH*2, *gen* symbolizes the actual generated hydrogen mass flow (kg/h). A and B are the consumption coefficients of the electrolyzer power curve where A = 10 kW/kg and B = 40 kW/kg were used in this paper.

#### *2.7. Hydrogen Tank*

In the HRES, a hydrogen tank is used as the container of hydrogen that is generated by the electrolyzer and is consumed by the *FC* system. Hydrogen can be stored as either liquid or pressurized gas. There are three methods to store the hydrogen: compressed high-pressure gas, hydrogen-absorbing materials, and liquid storage, among which, the first one is the most common. The hydrogen level in a hydrogen tank can be determined by the following expression [9]:

$$L\_{H\_2}(t+1) = L\_{H\_2}(t) + \frac{q\_{H\_2, \text{gen}} - q\_{H\_2, \text{con}}}{CAP\_{H\_2}} \tag{9}$$

where *LH*<sup>2</sup> (*t* + 1) and *LH*<sup>2</sup> (*t*) stand for the level of the hydrogen at the next and the current time-steps, respectively, and *CAPH*<sup>2</sup> denotes the capacity of the hydrogen tank (kg).
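A one-line Python transcription of the tank update in Equation (9); the level is treated as a fraction of the tank capacity, and the flow values below are illustrative.

```python
def h2_level_next(level, q_gen, q_con, cap_h2):
    """Hydrogen-tank level update of Eq. (9); level is a fraction of capacity,
    q_gen and q_con are mass flows (kg/h), cap_h2 is the tank capacity (kg)."""
    return level + (q_gen - q_con) / cap_h2

# 500 kg tank (case-study size) gaining 5 kg of hydrogen in one hour
level = h2_level_next(level=0.10, q_gen=5.0, q_con=0.0, cap_h2=500.0)
```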

#### *2.8. Power Balance*

Power balance is the state of equality between the produced energy and the load demand. More precisely, at each time step, the total available power generation should never fall short of the power consumption. Weather data were collected over one year to facilitate the system analysis and to allow for scheduling the operation of the whole system. The power balance equation is expressed as follows:

$$P\_{PV} + P\_{WT} + P\_{Bat} + P\_{DG} + P\_{FC} + P\_{EL} = P\_{Load} \tag{10}$$

#### **3. Energy Management of an HRES Based on Deep Q-Network**

*3.1. Introduction of the Proposed HRES*

The EMS is one of the most important parts of the system, ensuring reliable and efficient operation. The main function of the EMS is to balance the power flow between the system components while simultaneously reducing fossil fuel consumption and the cost of energy production. The proposed DC/AC-bus system for power generation is presented in Figure 1. Excess energy from the PV and WT is stored in the battery and the hydrogen system by controlling the K\_Battery and K\_Electrolyzer switches. When the PV and WT cannot fulfill the load demand, the EMS, based on the available energy levels of the system components, discharges the battery or turns on the FC and DG via the K\_Fuel-Cell and K\_Diesel switches, respectively.

The proposed EMS control schema is presented in Figure 2. It is a learning-based approach, so no explicit mathematical model of the system is needed. A Markov Decision Process (MDP) model of the EMS is required for the implementation of the DQN algorithm. Based on the MDP model, the objective is to find the optimal policy for the dispatch control of the system components, ensuring stable operation of the power system at the lowest cost of energy. The MDP model of the EMS is first defined in Section 3.2, including states (S), actions (A), transition probabilities (P), and rewards (R); it is considered as a tuple (S, A, P, R). Here, "S" is a finite set of states describing all the operating points of the system, "A" is the set of control actions, "P" is the probability of moving from one state to another, and "R" is the immediate return given to the agent when it performs a specific action. Good actions receive positive rewards, while bad actions are penalized.

A description of the DQN algorithm for EMS control is given in the following part. In the DQN approach, a deep neural network is designed to approximate the action-value function, and the DQN algorithm is adopted to train the neural network. The network takes the state of the HRES as its input and outputs the signals for the dispatch control of the system components. The combination of the states of K\_Battery, K\_Electrolyzer, K\_Fuel-Cell, and K\_Diesel basically determines the system's modes of operation. Finally, in Section 3.4, a conventional dispatch-based EMS is also applied for the validation of the proposed method.

**Figure 1.** The diagram of the proposed HRES.

**Figure 2.** The proposed Deep-Q-Network-based EMS.

*3.2. Markov Decision Process Model for the EMS*

#### 3.2.1. States and State Variables

During the operation of the HRES, the EMS controller receives the current state, takes an action, and moves to the next state based on its knowledge. The state information provides the basis for power flow control among all system components. The elements of our proposed HRES include the *PV*, *WT*, *DG*, battery, and hydrogen system. The state variables are defined as combinations of the powers of the load, *PV*, *WT*, *DG*, battery, fuel cell, and electrolyzer, as well as the state of charge and the percentage of hydrogen in the tank (*LH*<sup>2</sup>):

$$S = \left\{ P\_{Load}, P\_{PV}, P\_{WT}, P\_{DG}, P\_{Bat}, P\_{FC}, P\_{EL}, SOC, L\_{H\_2} \right\} \tag{11}$$

#### 3.2.2. Actions and Action Variables

Given the state at the current time step *st*, the EMS controller chooses an action and moves to the next state by switching or dispatching the operation of the following elements: the *DG*, fuel cell, electrolyzer, and battery system. The action set *A* is formed by 4-component control signals:

$$A = \sigma\_{\text{Battery}} \times \sigma\_{\text{DG}} \times \sigma\_{\text{FC}} \times \sigma\_{\text{EL}} \tag{12}$$

The control actions of the battery system are discharging (−1), stopping (0), and charging (1), that is:

$$
\sigma\_{Battery} = \{-1, 0, 1\} \tag{13}
$$

The control action variable of the diesel generator is in Equation (14), including stop, operating 25%, 50%, 75%, and full capacity, that is:

$$
\sigma\_{DG} = \{0, 0.25, 0.5, 0.75, 1\} \tag{14}
$$

The control action variables of the *FC* and the electrolyzer are defined as *σFC* and *σEL*, respectively, including OFF (0) and ON (1), that is:

$$
\sigma\_{\rm FC} = \{0, 1\} \tag{15}
$$

$$
\sigma\_{\rm EL} = \{0, 1\} \tag{16}
$$
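Equations (12)–(16) define the discrete action set as the Cartesian product of the four component action variables. Enumerating it in Python shows that the controller chooses among 3 × 5 × 2 × 2 = 60 discrete actions:

```python
from itertools import product

SIGMA_BATTERY = (-1, 0, 1)            # discharge / stop / charge, Eq. (13)
SIGMA_DG = (0, 0.25, 0.5, 0.75, 1)    # diesel generator loading, Eq. (14)
SIGMA_FC = (0, 1)                     # fuel cell off / on, Eq. (15)
SIGMA_EL = (0, 1)                     # electrolyzer off / on, Eq. (16)

# Cartesian product of Eqs. (13)-(16): the action set A of Eq. (12)
ACTIONS = list(product(SIGMA_BATTERY, SIGMA_DG, SIGMA_FC, SIGMA_EL))
```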

#### 3.2.3. Transition Probability

Transition probability defines the probability that the agent moves from one state to another. Given an action *at*, where *t* denotes the current time step, the transition probability from the current state *st* = *s* to the next state *st*+<sup>1</sup> = *s'* is denoted by *P<sup>a</sup><sub>ss'</sub>*, that is [42]:

$$P\_{ss'}^{a} = P\left[S\_{t+1} = s'|S\_t = s, A\_t = a\right] \tag{17}$$

In model-based energy management approaches, the transition probabilities *P<sup>a</sup><sub>ss'</sub>* are estimated by Monte Carlo simulation based on a prior probability distribution, or they are predicted by a short-term prediction model. However, in a model-free approach such as the DQN algorithm, they are learned from data.

#### 3.2.4. Rewards

The reward function calculates the reward from the environment in response to a given state and action; it describes how the agent ought to behave. A good reward function can accelerate convergence during the training process and also affects the controller performance. As a simple approach, our reward function is the sum of the rewards from each system component:

$$r\_t(s\_t, a\_t) = r\_{t,Bat} + r\_{t,FC} + r\_{t,EL} + r\_{t,DG} \tag{18}$$

where *rt*,*Bat*, *rt*,*FC*, *rt*,*EL*, and *rt*,*DG* are the rewards from the subsystems: battery, fuel cell, electrolyzer, and diesel generator.

The component rewards are essentially defined as follows:

$$r\_{t, Bat} = \begin{cases} \frac{P\_{Bat}}{P\_{discharge, max}} & \text{if } \left( P\_{PV} + P\_{WT} + P\_{DG} - \frac{P\_{Load}}{\eta\_{invert}} \right) \ge 0\\ -\frac{P\_{Bat}}{P\_{discharge, max}} & \text{otherwise} \end{cases} \tag{19}$$

$$r\_{t, FC} = \begin{cases} \frac{2 P\_{FC}}{P\_{FC, max}} & \text{if } \left( P\_{PV} + P\_{WT} - P\_{Bat} - \frac{P\_{Load}}{\eta\_{invert}} \right) \le 0 \text{ and } SOC \le 0.5\\ \frac{P\_{FC}}{P\_{FC, max}} & \text{if } \left( P\_{PV} + P\_{WT} - P\_{Bat} - \frac{P\_{Load}}{\eta\_{invert}} \right) \le 0\\ -\frac{P\_{FC}}{P\_{FC, max}} & \text{otherwise} \end{cases} \tag{20}$$

$$r\_{t, EL} = \begin{cases} \frac{2 P\_{EL}}{P\_{EL, max}} & \text{if } \left( P\_{PV} + P\_{WT} + P\_{DG} - P\_{Bat} - \frac{P\_{Load}}{\eta\_{invert}} \right) \ge 0 \text{ and } SOC \ge 0.9\\ \frac{P\_{EL}}{P\_{EL, max}} & \text{if } \left( P\_{PV} + P\_{WT} + P\_{DG} - P\_{Bat} - \frac{P\_{Load}}{\eta\_{invert}} \right) \ge 0\\ -\frac{P\_{EL}}{P\_{EL, max}} & \text{otherwise} \end{cases} \tag{21}$$

$$r\_{t,DG} = -\frac{Fuel\_t}{Fuel\_{\text{max}}} \tag{22}$$

where *ηinverter* is the inverter efficiency, *Fuelt* is the fuel consumption of the diesel generator based on the actual operating power at time step *t*, and *Fuelmax* is the fuel consumption at the maximum capacity.

As shown in Equations (19)–(22), the component reward functions are defined based on the result of the power balance. For example, the battery receives a negative reward when the sum of the *PV*, *WT*, and *DG* powers minus the load demand is negative. Thus, the agent learns to avoid choosing negative-reward actions. The reward functions of the *FC* and *EL* are similar to that of the battery. In the *DG* reward function, more fuel consumption means a more negative reward, which encourages the agent to reduce the operating time of the *DG* as much as possible.

In addition, the agent receives a big penalty if these parameters are out of their boundaries as shown below:

$$SOC\_{min} \le SOC \le SOC\_{max} \tag{23}$$

$$P\_{Bat, discharge} \le P\_{Bat} \le P\_{Bat, charge} \tag{24}$$

$$L\_{H2,min} \le L\_{H2} \le L\_{H2,max} \tag{25}$$

$$0 \le P\_{\rm FC} \le P\_{\rm FC,max} \tag{26}$$

$$0 \le P\_{EL} \le P\_{EL,\text{max}}\tag{27}$$
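To sketch how Equations (18)–(27) combine, the snippet below implements the diesel reward of Equation (22) and an out-of-bounds penalty over the SOC and hydrogen-level constraints. The penalty magnitude of −100 is an assumption, since the paper only describes the penalty as "big".

```python
def dg_reward(fuel_t, fuel_max):
    """Diesel-generator reward of Eq. (22): more fuel, more negative reward."""
    return -fuel_t / fuel_max

def boundary_penalty(soc, l_h2, soc_min=0.3, soc_max=1.0,
                     l_min=0.0, l_max=1.0, penalty=-100.0):
    """Large penalty when SOC or the hydrogen level leaves its bounds,
    per Eqs. (23) and (25). The penalty value is an illustrative assumption."""
    if not (soc_min <= soc <= soc_max) or not (l_min <= l_h2 <= l_max):
        return penalty
    return 0.0

# Half-load fuel burn of a 750 kW DG, with parameters inside their bounds
r = dg_reward(fuel_t=153.3, fuel_max=245.6) + boundary_penalty(soc=0.45, l_h2=0.1)
```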

#### *3.3. Methodology of the DQN-Based EMS*

In this part, the DQN algorithm is described. Its objective is to find an optimal policy that maximizes the expected total rewards from a starting state. Figure 3 shows a graph of DQN-based EMS. The optimal policy is formulated as [42]:

$$V^{\pi^\*}(s) = \max E\_{\pi} \left[ \sum\_{t=0}^T \gamma^t r\_{t+1} | s\_0 = s \right] \tag{28}$$

where *π*<sup>∗</sup> ∈ Π is the optimal policy in response to a given state; a policy is the strategy applied by the agent to decide the next action based on the current state. 0 < *γ* < 1 is the discount factor, which defines the importance of future rewards, and *E<sup>π</sup>* denotes the expected reward under the policy the agent follows.

In the DQN formulation, the optimal policy is represented by the optimal action-value function:

$$V^{\pi^\*}(s) = \max\_{a} Q^{\pi^\*}(s, a) \tag{29}$$

where *Vπ*<sup>∗</sup>(*s*) is the optimal state-value function of the MDP, i.e., the expected return starting from state "*s*" and following the optimal policy *π*∗; *Qπ*<sup>∗</sup>(*s*, *a*) is the optimal action-value function, i.e., the expected return starting from state "*s*", taking action "*a*", and thereafter following *π*∗. It focuses on a particular action in a particular state.

It is expressed as follows [42]:

$$\mathbf{Q}^{\pi^\*} (\mathbf{s}, a) = \mathbb{E}\_{\pi^\*} \left[ \sum\_{k=1}^{\infty} \gamma^{k-1} r\_{t+k} | \mathbf{s}\_t = \mathbf{s}, a\_t = a \right] = \mathbb{E}\_{\pi^\*} \left[ r\_t + \gamma \max \mathbf{Q}^{\pi^\*} (\mathbf{s}\_{t+1}, a\_{t+1}) | \mathbf{s}\_t = \mathbf{s}, a\_t = a \right] \tag{30}$$

Following the optimal action-value function, the optimal policy can be determined by [42]:

$$
\pi^\*(s) = \arg\max Q^{\pi^\*}(s, a) \tag{31}
$$

In the DQN algorithm, as shown in Figure 4, a deep neural network is used to calculate *Qπ*<sup>∗</sup>(*s*, *a*). It is expressed as the *Q*(*s*, *a*|*θ*) network, where *θ* is the weight vector of the neural network. As shown in the pseudo code in Figure 4, two separate *Q*-networks are used: *Q*(*s*, *a*|*θ*) represents the prediction network, while *Q*(*s*, *a*|*θ*′) represents the target network [42]. To train the *Q*-network, gradient descent is applied to minimize the loss between the target and prediction networks. In every time step of the training process, the prediction *Q*-network is updated by back-propagation, while the target network is frozen. After a period of C time steps (C steps in the algorithm), the target weights are updated by copying the weights from the current prediction *Q*-network. Freezing the target *Q*-network for a period of time helps stabilize the training process. In general, the deep *Q*-network must be trained through the process in Figure 4 to ensure that the EMS controller always chooses the best action. The EMS then uses its trained deep *Q*-network to calculate the *Q*-values from the current state information, and the next action is chosen according to those *Q*-values.
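The paper implements this loop in the MATLAB Reinforcement Learning Toolbox. As a language-neutral sketch of the two-network mechanics, the pure-Python fragment below substitutes a tiny linear Q-function for the deep network; the feature vector, the single-transition batch, and the C = 50 sync period are illustrative assumptions.

```python
class LinearQ:
    """Tiny linear Q-function standing in for the deep Q-network:
    q(s, a) = w[a] . s  (illustrative only)."""
    def __init__(self, n_features, n_actions):
        self.w = [[0.0] * n_features for _ in range(n_actions)]

    def q(self, s, a):
        return sum(wi * si for wi, si in zip(self.w[a], s))

    def copy_from(self, other):
        """Target-network sync: copy weights from the prediction network."""
        self.w = [row[:] for row in other.w]

def dqn_update(pred, target, batch, gamma=0.9, alpha=0.001):
    """One gradient step on the squared TD error, mirroring the DQN loss:
    y = r + gamma * max_a' Q_target(s', a'); move Q_pred(s, a) toward y."""
    n_actions = len(pred.w)
    for s, a, r, s_next in batch:
        y = r + gamma * max(target.q(s_next, b) for b in range(n_actions))
        td = y - pred.q(s, a)
        for i, si in enumerate(s):
            pred.w[a][i] += alpha * td * si

pred, tgt = LinearQ(3, 4), LinearQ(3, 4)
batch = [((1.0, 0.2, 0.5), 2, 1.0, (0.9, 0.1, 0.4))]  # (s, a, r, s')
for step in range(200):
    dqn_update(pred, tgt, batch)
    if step % 50 == 0:          # freeze target, sync every C = 50 steps
        tgt.copy_from(pred)
```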

**Figure 3.** A graph of DQN-based EMS.


#### *3.4. Methodology of the Conventional Dispatch-Based EMS*

The EMS controller chooses the operational mode of the HRES according to the power difference between generation and consumption and the available power in the energy storage system. It aims to satisfy the power demand at all times with the lowest fuel consumption. Following the work in [14], a conventional dispatch (CD) EMS method is applied in this study and compared with the DQN-based method in terms of system performance efficiency. The control actions of the CD method are the same as those of the DQN, including switching the diesel generator, the fuel cell, and the electrolyzer on/off, as well as charging/stopping/discharging the battery. The flow chart of the considered method is shown in Figures 5 and 6.
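The fragment below is only a loose Python sketch of a conventional-dispatch priority rule of this kind; the thresholds and the priority order are assumptions for illustration, not the exact logic of the flow charts in Figures 5 and 6.

```python
def cd_dispatch(p_ren, p_load, soc, l_h2,
                soc_min=0.3, soc_max=1.0, l_h2_min=0.0):
    """Simplified conventional-dispatch rule (assumed priority order):
    surplus -> charge battery, then electrolyzer;
    deficit -> battery first, then fuel cell, then diesel as last resort."""
    surplus = p_ren - p_load
    if surplus >= 0:
        if soc < soc_max:
            return "charge_battery"
        return "run_electrolyzer"
    if soc > soc_min:
        return "discharge_battery"
    if l_h2 > l_h2_min:
        return "run_fuel_cell"
    return "run_diesel"

# Deficit with an empty battery and empty tank -> diesel back-up
mode = cd_dispatch(p_ren=400.0, p_load=700.0, soc=0.25, l_h2=0.0)
```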

**Figure 5.** Flow chart of the EMS controller of our HRES based on the CD method (branch 1).

**Figure 6.** Flow chart of the EMS controller of our HRES based on the CD method (branch 2).

#### **4. Results and Discussion**

#### *4.1. Site Description*

Based on the weather and load data collected from this area, an optimal configuration of the HRES was calculated with the HOMER software [43]. Then, its simulation model was defined in MATLAB/Simulink for the implementation of our DQN-based EMS. In this part, Basco Island is introduced. The island is about 190 km from Taiwan and is located in the northern region of the Philippines, where the major economic sectors are farming and fishing. On the island, the current sources of power generation are diesel generators running on fossil fuels, which incur high operational costs due to constantly increasing fuel prices and logistics costs. The location of Basco Island is excellent for marine resource management and tourism. As the government supports the development of a sustainable economy, the local governors took the opportunity to invest in a more environmentally friendly power system for the local community. Thus, this research plays an important role in the economic development plan of the area: it ensures a continuous power supply with a low cost of energy and environmental friendliness.

Figure 7 shows the diagram of the presented HRES in the HOMER software (left), as well as the load profile through the year at Basco station (right). The daily power consumption, with an average demand of 700 kW every hour, is shown in Figure 8. The weather data used for the system simulation were taken from the database of the National Renewable Energy Laboratory (NREL), which can be generated by the HOMER software. The year-round average solar radiation is 4.44 kWh/m²/day, while the average wind speed is 7.22 m/s. Following the data, the energy system should supply 18 MWh a day with a peak power of 1.4 MW.

**Figure 7.** The proposed HRES (**left**) and the load demand at Basco station (**right**) presented in HOMER software.

Following the analysis in the HOMER software, the optimal configuration of the HRES for this case study was obtained [43]. It is reliable, environmentally friendly, and cost-effective. The proposed design includes a 5483 kW PV system, 236 wind turbines of 10 kW each, a 20,948 kWh battery system (48 V DC, 4 modules, 5237 strings), a 750 kW diesel generator, a 500 kW fuel cell system, a 3000 kW electrolyzer, a 500 kg hydrogen tank, and a 1575 kW converter. The Net Present Cost (NPC) of the system, i.e., the present value of the costs of investment and operation over its lifetime, was about 72.5 million USD. The Cost of Energy (COE), i.e., the average cost per kWh of useful electrical energy produced by the system, was about 0.696 USD/kWh. Furthermore, it can be concluded that the combination of the FC and the battery as the storage system is the best option for the design of an HRES with the lowest cost of energy: the FC serves long-term storage, while the battery serves short-term usage. Given the load demand at the applied area, the system is practical and cost-effective.

**Figure 8.** The structure of the critic network for the DQN-based EMS.

#### *4.2. Implementation of DQN-Based EMS in MATLAB/Simulink*

We carried out the simulation of the designed HRES in the Reinforcement Learning Toolbox of the MATLAB/Simulink environment. The time interval between two time-steps was one hour. There was a total of 5000 episodes during the training process where each episode ran for a randomly selected 48-h period. At the beginning of each episode, random initial conditions were generated including the initial state of charge and the initial amount of hydrogen in the tank.

Based on the experience from a previous publication [34], as well as trial and error during the training process, the structure of the network and its training parameters were determined. This is a useful reference in this area, because few publications discuss the details of the implementation of DRL for an HRES. In this study, the structure of the critic network applied for the DQN method is depicted in Figure 8, while the initial parameter settings for the simulation are displayed in Table 1. The amount by which the network weights are updated during training is referred to as the step size or learning rate (*α*). A large learning rate helps the agent learn faster but may lead to a local optimum. On the other hand, a smaller learning rate may allow the agent to find a global solution but may take significantly longer to train. In this study, the learning rate of the critic network is set to 0.001, meaning that the weights in the Q-network are updated by 0.1% of the estimated weight error at each update. The action space of the DQN comprises the combinations of the actions of the four system components: battery, fuel cell, electrolyzer, and diesel generator.

**Table 1.** Parameters for the simulation of the DQN-based EMS.


The discount factor (γ) determines how much weight the agent gives to future rewards in the value function. γ = 0 means that the agent is completely myopic and only considers actions that produce an immediate reward. γ = 1 means that the agent assesses each of its actions based on the sum of all of its future rewards. The exploration rate (ε) is the probability that the agent explores the environment rather than exploiting it. It is set to 1 at the beginning and reduced gradually over the training time. This ensures that the agent has enough time to explore and learn the environment.
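A typical way to implement such a schedule is an exponentially decaying ε-greedy rule; the exact decay law below is an assumption, since the paper only states that ε starts at 1 and decreases gradually.

```python
import random

def epsilon(step, eps_start=1.0, eps_min=0.01, decay=0.999):
    """Exploration rate: start at eps_start and decay each training step,
    never falling below eps_min. Decay rate is an illustrative assumption."""
    return max(eps_min, eps_start * decay ** step)

def choose_action(q_values, step):
    """Epsilon-greedy selection over the discrete action set: explore with
    probability epsilon(step), otherwise pick the highest-Q action."""
    if random.random() < epsilon(step):
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)
```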

#### *4.3. Training Result*

The training progress of the EMS controller based on the DQN algorithm is shown in Figure 9. The blue line represents the total reward in each episode, while the running average reward over episodes is indicated by the red line. The critic's estimate of the discounted long-term reward at the start of each episode, Episode Q0, is marked by the yellow line. The average reward flattens after 500 episodes. During the training process, we saved the trained agents for online use whenever the average reward exceeded the designed average value.

**Figure 9.** The training process of the DQN-based EMS.

#### *4.4. Performance under Various Conditions*

We used two scenarios to validate the performance of the proposed method. Each test also included a comparison with a conventional dispatch-based control. In the first scenario, the operation of the diesel generator is turned off entirely by the controller, because the battery and the hydrogen system can fulfill the load demand when the solar PV and wind turbine output is insufficient. The second scenario tests the operation of the diesel generator when all other energy resources run out of energy. It starts with little energy from the PV and WT, so the operation of the battery and the hydrogen system is required. Finally, the diesel generator must be turned on to ensure the operation of the power system.

The simulation period was two days, using one-hour intervals between consecutive steps. The WT, PV, and load demand profiles were randomly drawn from the year-round data. The SOC and hydrogen levels were initialized with random values. Training on random inputs shows that the proposed DQN method can make effective schedules for the EMS in a deterministic environment from any initial conditions. The minimum SOC level was set to 30% to avoid deep discharging, thereby increasing battery lifetime. The minimum hydrogen level was set to 0. The simulation was implemented in the Reinforcement Learning Toolbox of MATLAB/Simulink.

#### 4.4.1. Scenario 1

The first scenario aimed at demonstrating the performance of the proposed DQN approach without the operation of the diesel generator. Figure 10 indicates the available power from the PV (green) and WT (blue) systems; the load demand is depicted by the red line. The simulation result is displayed in Figure 11. The three subfigures on the left correspond to the DQN-based (red) EMS method, while those on the right correspond to the CD-based (blue) EMS method. The first row displays the SOC of the battery, the second row the level of hydrogen in the tank, and the third row the fuel consumption of the diesel generator.

**Figure 10.** Load demand, PV, and WT power in Scenario 1.

**Figure 11.** Comparison between the DQN and CD methods in Scenario 1.

Figure 11 shows that the diesel generators remained shut down under both methods. The SOC curves on the left and right are almost identical. Between steps 0 and 5, the battery was charged by the power production of the WT system. Between steps 5 and 11, the battery switched to discharging because no power was available from the PV and WT. Between steps 11 and 24, more renewable power was available, so the battery was charged and the excess power was used to run the electrolyzer. The amount of hydrogen increased between 17 and 20 h and between 38 and 43 h. Under the DQN method, the battery itself handled the problem of insufficient renewable input. Since the fuel cell remained shut down, there was no reduction of the hydrogen level in the tank over the simulation time. Under the CD approach, the fuel cell operated during steps 21–25 and 45–46, reducing the hydrogen level.

#### 4.4.2. Scenario 2

The second scenario aimed at demonstrating the performance of the proposed DQN approach with the operation of the diesel generator. Similar to the previous case, Figure 12 shows the PV and WT productions and the load demand, while Figure 13 demonstrates the performance of the DQN- and CD-based methods. No renewable energy was available at the beginning; the SOC was 45%, and the amount of hydrogen in the tank was 10%. Thus, the diesel generator was forced to operate when a power deficit occurred.

**Figure 12.** Load demand, PV, and WT power in Scenario 2.

**Figure 13.** Comparison between the DQN and CD methods in Scenario 2.

From time steps 0 to 4, since no power was available from the PV and WT, the battery discharged to its lower limit of 30%. The fuel cell supplied the power demand from steps 5 to 7, reducing the hydrogen level. Since the power deficit persisted, the diesel generator turned on from steps 6 to 8. The battery was charged fully from steps 9 to 15, when more power was produced by the PV and WT; similarly, the extra power was used by the electrolyzer to generate hydrogen. After that, the battery discharged from steps 23 to 34 and was charged to its upper limit by step 38. The diesel generator and the fuel cell remained off because there was no further power deficit. The operating time of the diesel generator under both methods was 2 h; however, the proposed DQN method consumed only 353 L of fuel, while the CD-based method consumed 492 L.

#### **5. Conclusions**

This study presents a DQN-based control scheme to solve the complex problem of energy management in an HRES, in which the energy flow between the HRES units is managed. The power system for the case study on Basco Island, Philippines, includes a PV system, a WT system, a battery system, a diesel generator, and a hydrogen system. Owing to its non-polluting power generation, the hydrogen system is included in the proposed HRES: an electrolyzer uses the excess energy from PV and WT to generate hydrogen, which powers the fuel cell when a power shortage occurs. Most current studies in the field of HRES apply the Q-learning method, which is limited to finite state and action spaces. To handle a continuous state space and a large discrete action space, we introduced a deep neural network, allowing the agent to generalize across states through function approximation instead of using a Q look-up table. For any given state, the agent chooses the action with the highest estimated value and moves to the next state. An MDP model of the HRES and the reward functions were formulated to implement the proposed method in the MATLAB/Simulink environment.
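As a rough illustration of this idea, the sketch below contrasts Q-network action selection with a Q look-up table: a small network maps a state vector to one Q-value per action, and the agent picks the highest-valued action. The state layout, dimensions, and random weights are assumptions made for this example, not the paper's actual trained model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: an HRES state (e.g., SOC, hydrogen level, PV power,
# WT power, load demand) and a discrete set of dispatch actions.
STATE_DIM, N_ACTIONS, HIDDEN = 5, 8, 32

# A one-hidden-layer Q-network standing in for the paper's deep neural network.
W1 = rng.normal(0, 0.1, (STATE_DIM, HIDDEN))
W2 = rng.normal(0, 0.1, (HIDDEN, N_ACTIONS))

def q_values(state):
    """Approximate Q(s, a) for every action instead of reading a Q look-up table."""
    return np.maximum(state @ W1, 0.0) @ W2  # ReLU hidden layer

def select_action(state, epsilon=0.1):
    """Epsilon-greedy: mostly exploit the highest-valued action, sometimes explore."""
    if rng.random() < epsilon:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax(q_values(state)))

state = np.array([0.45, 0.10, 0.0, 0.0, 0.6])  # e.g., SOC = 45%, H2 = 10%, no PV/WT
action = select_action(state, epsilon=0.0)      # greedy choice
```

Because the network generalizes, nearby states (e.g., SOC = 45.1%) yield similar Q-values without ever having been visited, which is what makes the continuous state space tractable.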

A basic rule-based EMS method, named CD, was used as a baseline for comparing power efficiency with the proposed DQN. This comparison shows that the proposed method always performs at least as well as, and often better than, the CD method. Although only two scenarios were analyzed, it can be concluded that the proposed method performs well and outperforms the CD method under uncertain environments, because the agent was trained with random initial conditions and with random weather data and load demands generated from the whole-year data.

Future work will involve comparative real-time experiments with other advanced EMS methods, such as fuzzy logic, ANFIS, and PSO. Furthermore, to overcome the limitations of the proposed method, which uses a simple network structure and a basic reward function, the design of the deep neural network and of gradient reward functions should be studied further to achieve fast convergence and less fluctuation of the average reward during training; these two factors ensure that the optimal policy for EMS control of the HRES is always obtained. Computational complexity should also be an important metric for testing and validation. In addition, lithium-ion batteries are now about as cheap as lead-acid batteries while offering lower self-discharge rates, longer lifetimes, and higher efficiencies; in the multi-MW storage category, high-temperature batteries such as sodium-sulfur batteries may also be worth investigating. Finally, instead of two-day data, multiple-year data will be used for the simulation.

In conclusion, we believe that deep reinforcement learning is a promising trend in the field of energy conversion and management owing to the following features: (1) the ability to learn from experience, (2) the ability to solve complex optimal control problems without prior knowledge of the environment, (3) the need for only a simple mathematical model, and (4) the ability to handle continuous state and action spaces.

**Author Contributions:** Conceptualization, B.C.P., M.-T.L. and Y.-C.L.; methodology, M.-T.L. and Y.-C.L.; software, B.C.P.; validation, B.C.P. and Y.-C.L.; formal analysis, B.C.P.; investigation, B.C.P.; resources, Y.-C.L.; data curation, B.C.P.; writing—original draft preparation, B.C.P.; writing—review and editing, B.C.P., M.-T.L. and Y.-C.L.; visualization, B.C.P. and Y.-C.L.; supervision, Y.-C.L.; project administration, Y.-C.L.; funding acquisition, Y.-C.L. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Ministry of Science and Technology (MOST) under grant number MOST 111-2221-E-006-110- and MOST 111-2622-E-006-012-.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


### *Article* **Development of an Intelligent Solution for the Optimization of Hybrid Energy Systems**

**Djamel Saba 1,\*, Fahima Hajjej 2, Omar Cheikhrouhou 3, Youcef Sahli 1, Abdelkader Hadidi 1 and Habib Hamam 4,5,6,7**


**Abstract:** This paper proposes a new intelligent solution for the optimization of hybrid energy systems. The solution is of great value to installers of hybrid energy systems, as it helps them obtain the best configuration (efficient and less expensive) of the system. The user only has to enter the name of the location where the hybrid energy system is to be installed; the solution then returns the technique from which the optimal configuration of the system can be obtained. To accomplish this goal, the study relied on the ontology approach for two reasons: first, hybrid systems are characterized by a large amount of information that requires good structuring; second, hybrid energy systems interact with the external environment (climate, site characteristics). To develop the knowledge base of the ontology, several steps were followed: a detailed study of the existing literature and the extraction of the basic elements, such as the concepts and the relations between them, followed by the development of intelligent reasoning rules, which define the interactions between the elements of the ontology through which all possible cases are treated. The "Protégé" software was used to edit these elements and to run the simulations showing the results of the developed solution. Finally, the paper includes a case study whose results demonstrate the value of the developed solution, which remains open to future developments.

**Keywords:** decision-making tool; intelligent reasoning rules; energy saving; energy domain ontology; hybrid energy system

### **1. Introduction**

Air pollution, climate change, and limited fossil resources have raised awareness that sustainable development that takes care of the environment in which we live is necessary [1]. Given the difficulty of connecting electricity grids to remote areas, RE presents a good alternative to fossil fuels: it does not emit greenhouse gases, and it allows the decentralized production of resources [2]. However, the random nature of these energy sources imposes special rules for the optimization and operation of energy systems. In addition, the hybridization of several RE sources provides complementarity in energy production and an alternative to the conventional generators usually used to produce electricity. The design of an HES is an important step because of its relationship to completion costs and reliability [3]. Therefore, it is necessary to provide new solutions and to focus in particular on meeting consumer demands for energy while ensuring the minimum cost [4].

**Citation:** Saba, D.; Hajjej, F.; Cheikhrouhou, O.; Sahli, Y.; Hadidi, A.; Hamam, H. Development of an Intelligent Solution for the Optimization of Hybrid Energy Systems. *Appl. Sci.* **2022**, *12*, 8397. https://doi.org/10.3390/app12178397

Academic Editors: Luis Hernández-Callejo, Sergio Nesmachnow and Sara Gallardo Saavedra

Received: 5 July 2022 Accepted: 8 August 2022 Published: 23 August 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

To meet the rapidly increasing demand for energy, all energy sources must be exploited. Renewable energies are unlimited and clean, but the biggest problem with them lies in their intermittent aspect. To overcome this problem, a combination of several energy sources is made to obtain the so-called hybrid renewable energy system [5,6]. Many works have been carried out on HES optimization methodologies. A study by Ammari et al. aimed at presenting and analyzing a literature review of recently published works in the field of hybrid renewable energy [7]. The study focuses on four basic categories of hybrid renewable energy systems, which are scaling (using software or using conventional methods), optimization (classical, synthetic, and hybrid methods), control (centralized, distributed, and hybrid control), and energy management (technical and economic objective). Furthermore, the review compares the different methods used in each category. A research work by Nourollahi et al. presents a hybrid approach to process improvement and addresses uncertainties in a residential energy system consisting of photovoltaics, fuel cells, boilers, and storage units [8]. In this regard, uncertain parameters are divided into two categories, including poorly behaved and well-behaved parameters, and then robust optimization and stochastic programming are used for modeling. Additionally, the conditional value at risk is implemented to assess the risks of well-behaved parameters. According to simulations, it has been shown that uncertainties with bad behavior have greater effects on the operation of the system. Moreover, changing the control parameters for robust optimization and conditional value at risk from 0 to 24 and from 0 to 1 increases the total cost by 5.2% and 0.47%, respectively. Comparative results also show that the proposed hybrid method makes less conservative decisions for cost optimization. 
The application of hybrid energy storage to distributed energy systems can greatly improve energy efficiency and reduce operating costs. However, few efforts have focused on investigating the integration of the two systems, configuration optimization, and the system operation strategy. Therefore, a new distributed energy system that combines hybrid energy storage was proposed, and the configuration optimization and the operation strategy of the new system were considered simultaneously [9]. The studied hybrid system contained thermal storage and two forms of electrical energy storage, namely, supercapacitors and a lithium battery. The impact of hybrid energy storage on distributed energy systems was fully considered. Then, a two-layer cooperative optimization method for the new system was introduced, taking into account energy efficiency, economy, and environmental protection. The new system was applied to a nearly zero-energy community. The results indicate that the energy-saving rate and the equivalent-pollutant primary emission reduction rate of the new system were 54.8% and 63.6%, respectively, under the low-pass filter operating strategy combined with secondary feedback modulation of the supercapacitor charge state. The system configuration and operating strategy can be optimized simultaneously by the two-layer cooperative optimization method. Renewable energy resources often suffer from challenges, such as erratic energy generation, as a consequence of weather and seasonal changes. Hybrid renewable energy systems address these challenges by combining several renewable resources so that they compensate for the limitations of operating each resource in stand-alone mode. A study by Sharma et al. 
proposed multitarget dynamic optimization of candidate hybrid renewable energy systems using a genetic algorithm in which the operating cost of hybrid renewable energy systems, nonrenewable energy use, and fuel emissions were simultaneously reduced over a limited period [10]. In optimization, three strategies are evaluated by considering the profiles of wind, solar, and 24 h load (Strategy 1), past and 1 h ahead (Strategy 2), and 1 h ahead (Strategy 3). A comparison of the results shows that the energy to be purchased from the network for Strategy 1 is 8.7% and 10.7% lower compared with Strategies 2 and 3, and also the energy sold to the network is 19% and 22% higher than for Strategies 2 and 3, respectively, while meeting the specified load profile of 100 households.

One of the most important applications of renewable energy systems is installing a well-designed HES in remote areas where grid extension is complicated and expensive. However, the proper design of such a system is a difficult task, as the coordination between the different energy sources, the energy storage, and the load is very complex. Hybrid renewable energy system optimization consists of selecting appropriate components, sizing, and control strategies to provide efficient, reliable, and cost-effective alternative energy to society. A work by Fulzele and Daigavane presents the design of a hybrid renewable energy system consisting of photovoltaic cells and a wind generator with a battery and an inverter [11]. The system was optimally simulated using the HOGA (Hybrid Optimization by Genetic Algorithms) tool developed by the Department of Electrical Engineering of the University of Zaragoza, Spain. In the same context, a work by Mahmoud et al. applies modern optimization methods to derive the optimal configuration of hybrid renewable energy sources, including photovoltaic panels and wind turbines with battery storage systems and diesel generators [12]. The cost function of the optimization problem was chosen to be the energy cost and the potential loss of energy supply of the hybrid renewable energy system, and the main task of the optimization algorithms is to minimize this cost function. However, the optimal configuration cannot be achieved without meeting system reliability and operational limits. Their results were compared in order to determine the most effective method. Furthermore, a statistical study was conducted to determine the stability of the performance of each optimization strategy. The results revealed the optimal variables, including the number of photovoltaic panels, wind turbines, and batteries and the diesel generator capacity. 
Over the course of a year, the optimal configuration was tested against capital and fuel expenditures. The statistical results also proved the robustness of the developed algorithm. Cano et al. presented a comparative study of four optimization methods for autonomous HES comprising two energy sources (PV, wind), storage batteries, and fuel cells [13]. The first method is based on mathematical equations, the second uses the SDO program, the third uses the Homer optimization software, and the last uses genetic algorithms based on the HOGA software. The results show that the HES designed by each method guarantees reliability of the energy supply, and that SDO is the best HES optimization method. Other contributions use AI approaches. Maleki and Askarzadeh applied four heuristic algorithms, PSO, TS, SA, and HS, to an autonomous HES (PV, wind turbine, fuel cell) [14]. This study showed that an HES with energy storage batteries is the best choice economically, and that PSO is the most reliable optimization technique. In the same context, Aydin et al. developed a methodology based on GIS information to identify preferred sites for an HES based on wind and PV [15]. This study used fuzzy logic and the MCDM approach to determine the economic feasibility of wind and solar energy. Finally, the associated maps were superimposed to obtain the most feasible places for an HES. The proposed methodology can help policymakers and investors easily adapt it to other types of energy resources. Luna-Rubio et al. provided an overview of optimization methods and presented the different architectures of stand-alone or grid-connected HES [16]. This study concluded with the following observations:


Djamel Saba et al. provided an optimization solution for HES based on intelligent reasoning rules [17,18]. This solution was then improved to be generic; after implementing it, the results showed its effectiveness in choosing the optimization technique, together with various advantages, such as flexibility in operation and updating [19]. Coelho et al. proposed a MAS solution to optimize the HES [20]. Amicarelli et al. proposed to minimize the costs, maximize the use of RE, and simplify the integration into the grid [21]. Elamine et al. present an intelligent method of HES energy management based on the MAS [22]; this concept allows the different units of the MAS to work together to achieve the objectives. Morstyn et al. studied the performance of the control strategy by verification on a microgrid that includes storage batteries and a PV generator [23]. This study shows the advantages of distributed management compared with centralized management. L. Raju et al. developed a MAS for managing the energy of a microgrid containing two PV systems, two wind turbines, a battery unit, and a diesel plant [24]. This solution offers the consumer the possibility of choosing actions that increase energy efficiency. A study adapted to the West African climate was conducted by Mbodji et al.; it is based on MAS control applied to an HES (PV, wind, batteries) [25]. In this study, three load profiles were considered in the simulations. The control made it possible to reduce battery use by 3%, 5%, and 6% for profiles 1, 2, and 3, respectively, for daily demands of 35 kWh for profile 1 and 20 kWh for profile 3. From an economic standpoint, the strategy applied to profile 1 allowed a gain of EUR 6 per day and EUR 2322 per year, or more than EUR 46,442 over 20 years (the lifetime of the project).

The present study addresses the absence of a generic optimization solution for hybrid energy systems. The main objective of this work is to set up a reliable, easy-to-use solution to optimize hybrid energy systems that is suitable for most sites (generic). For these reasons, this study proposes a generic optimization solution with which the user is not required to know much about optimization techniques, energy sources, or climatic data. The user only chooses the installation site, and the adopted solution provides the appropriate sources as well as the most reliable technique to calculate the optimal configuration of the system to be installed. This contribution is based on an ontology of the energy field. The choice of ontology is mainly due to the nature of the studied system and its environment (energy system), which is dynamic over time. In addition, this technique makes it possible to present a set of knowledge precisely, in a machine-usable form, which allows users to introduce the necessary updates without damaging the rest of the data.

#### **2. Materials and Methods**

#### *2.1. Proposed Approach for Ontology Construction*

In this work, the problem revolves around the absence of a generic optimization solution for hybrid energy systems. The main objective is to set up a reliable and easy-to-use solution to optimize hybrid energy systems that is suitable for most sites (generic). As with any research problem, two subproblems must initially be separated and finally integrated: the conceptual subproblem and the operational subproblem. For this research, the following hypothesis is employed:


In addition, the following main question can be asked according to this hypothesis: what is the impact of the developed solution on the optimization of hybrid energy systems? Some subquestions can also be extracted from this question, such as:


#### *2.2. Proposed Approach*

The goal of this solution is to find the optimal configuration of the HES. It is based on a knowledge base (KB) and includes several steps, as shown in Figure 1:


**Figure 1.** Proposed solution flowchart.

In addition, each optimization technique in the literature has been developed for a specific situation (types of energy sources, type of load, duration of application, etc.). The proposed solution first chooses the most appropriate optimization technique according to the data previously presented (concept, attribute, relation, etc.) and recorded in the knowledge base of the solution ontology. The second step imports and executes the technique's algorithm to finally obtain the best optimal configuration.
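This two-step flow can be sketched as follows; the knowledge-base contents and the function and technique names (e.g., `lpsp`) are assumptions for this example, not the paper's actual implementation:

```python
# Step 0 (illustrative): a stand-in for one optimization algorithm.
def lpsp(site):
    """Placeholder for the LPSP optimization algorithm."""
    return f"optimal configuration for {site} computed via LPSP"

# A toy knowledge base mapping a situation (simplified here to the load type)
# to the most appropriate optimization technique.
KNOWLEDGE_BASE = {"DailyLoad": lpsp}

def optimize(site, load_type):
    # Step 1: choose the most appropriate technique from the knowledge base.
    technique = KNOWLEDGE_BASE[load_type]
    # Step 2: execute its algorithm to obtain the optimal configuration.
    return technique(site)

result = optimize("Adrar", "DailyLoad")
```

The point of the design is that the user supplies only the site; every other decision is resolved by lookups and rules over the knowledge base.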

#### *2.3. Construction of Ontology*

The modeling process consists of several steps, as itemized below [19,26]:

#### 2.3.1. Define the Ontology Domain

This step defines the subject or domain of the ontology and requires a full understanding of that subject. The ontology in this work focuses on the management of the energy required to model and optimize the HES.

#### 2.3.2. Reuse of Existing Ontologies

A review of previous works shows many contributions concerning the optimization of HES [27]. These works achieved very encouraging results, but they still require some development through the introduction (or removal) of concepts and the development of intelligent reasoning [28–30]. In this work, the ontology was created from scratch, without reusing previous ontologies.

#### 2.3.3. Interesting Concepts for the Ontology

The main concepts of this ontology are sources (photovoltaic, wind energy, etc.), load, and climatic data (temperature, wind speed, etc.).

#### 2.3.4. Explain Classes and Their Hierarchy

The ontology concepts form a hierarchy [30]. There are several methods for developing a class hierarchy [31]:


In this work, the first method (top–down) was chosen (Table 1).
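As a toy illustration of the top-down method, the hierarchy can be sketched as a set of classes, starting from the most general concepts and specializing downward. The class names below are illustrative, not the ontology's actual class list:

```python
# Top-down: define the most general concepts first, then specialize them.
class EnergySource: ...
class RenewableSource(EnergySource): ...
class Photovoltaic(RenewableSource): ...
class WindTurbine(RenewableSource): ...

class Load: ...

class ClimaticData: ...
class Temperature(ClimaticData): ...
class WindSpeed(ClimaticData): ...
```

Each subclass relation mirrors an "is-a" link in the ontology, so a reasoner can infer, for example, that anything true of `EnergySource` also applies to `Photovoltaic`.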

**Table 1.** An extract from the classes of the system.


#### 2.3.5. The Properties of Classes and the Facets of Attributes

After presenting the concepts, it is essential to describe their internal structure [28,29]. The attribute facets describe the types of values that can be assigned to each attribute [32] (Table 2).


#### 2.3.6. Design Instances and Relationships

The last step concerns the creation of instances and relationships for the proposed solution (Tables 3 and 4). These two elements are necessary for the development of the ontology [30].

**Table 3.** An extract from the ontology relations.


**Table 4.** Examples of instances.


#### 2.3.7. Intelligent Reasoning of the Solution

All intelligent reasoning rules are constructed through a detailed study of the internal and external environments of hybrid energy systems (the main elements of a hybrid energy system, environmental data, geographic data, etc.). We focused mainly on the aspects that influence HES operation. Through this study, we extracted a series of information items, namely, optimization techniques, installation site, climate data, load, and so on. Finally, using predicate logic, the Python programming language, and the essential elements of the ontology (concepts, attributes, relationships, and instances), we formulated the intelligent reasoning rules associated with choosing the optimization technique (Figure 2, Table 5).
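In this spirit, a single rule can be read as a predicate-logic implication and sketched in Python. The predicate names, fact layout, and the body of the rule below are assumptions made for illustration; they are not the paper's actual R1:

```python
# Hypothetical facts about an installation site.
facts = {"Adrar": {"load_type": "DailyLoad"}}

def has_load_type(site, load_type):
    """Predicate: hasLoadType(site, load_type)."""
    return facts.get(site, {}).get("load_type") == load_type

def rule_r1(site):
    """R1 (sketch): hasLoadType(?x, DailyLoad) -> proposeTechnique(?x, LPSP)."""
    return "LPSP" if has_load_type(site, "DailyLoad") else None

technique = rule_r1("Adrar")
```

A rule engine evaluates many such implications over the ontology's instances; whichever rule's antecedent is satisfied determines the technique proposed to the user.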

**Figure 2.** Examples of the solution rules.

**Table 5.** Description of R1, R2, and R3.


#### *2.4. Ontology Editing and Presentation of a Scenario*

#### 2.4.1. Choice of Editor

For editing the ontology, the "Protégé OWL 3.4.4" software was used because of the following advantages [19,35]:


#### 2.4.2. Choice of Reasoning Tools

Most inference engines can process rules added to the ontology, and rule-specific engines, such as the Jess engine [36], can also be used. The latter has a language for expressing knowledge in the form of rules. It can be used from the "Protégé-OWL API" thanks to a bridge that translates an ontology model into the Jess language, executes it, and retrieves the result in the "Protégé" software.

#### 2.4.3. SWRL

SWRL is a rule language for the Semantic Web combining OWL-DL and RuleML [37]. It makes it possible to enrich the semantics of an ontology presented in OWL and to manipulate instances through variables (?x, ?y, ?z). It allows relations to be added according to the values of the variables and the satisfaction of the rule. However, SWRL does not allow the generation of new concepts or relationships.

#### 2.4.4. Editing the Ontology

The ontology elements edited in Protégé are the concepts, attributes, relationships, and instances (Figure 3).

**Figure 3.** Elements of ontology in Protégé-OWL 3.4.4.

#### 2.4.5. Implementation of Intelligent Reasoning Rules

The SWRL rules editor operates in Protégé OWL and provides a highly interactive interface for editing rules (supporting all the features of the SWRL language). Rule engines, such as Jess, can be integrated with this editor, providing richer rule-based reasoning (Figure 4).


**Figure 4.** Rules of inference in Protégé-OWL.

#### 2.4.6. Presentation of a Scenario

In this scenario, three Algerian cities with different climatic characteristics were considered: Adrar, Annaba, and Illizi. The three cities are presented with their average annual radiation, average annual wind speed, average annual soil temperature, and type of energy load (Tables 6 and 7).


**Table 7.** Criteria for selecting energy sources.


The criteria for selecting renewable energy sources are defined (Table 7):

The values of 2550 Wh/m<sup>2</sup> for solar radiation, 5 m/s for wind speed, and 110 °C for soil temperature are considered the minimum acceptable values for electrical energy production from the solar, wind, and geothermal energy sources, respectively. For this reason, they were selected for the case study.

In this case study, we tested three techniques: LPSP, MUMT, and YMOST. The first technique is applied to loads of the "Load Daily" type, the second to loads of the "Load Monthly" type, and the third to loads of the "Load Annual" type.
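The source-selection thresholds and the load-type-to-technique mapping described above can be sketched as follows; the site measurements passed in at the end are illustrative values, not data from the paper's tables:

```python
# Minimum values for viable electricity production, as stated in the text.
SOLAR_MIN_WH_M2 = 2550  # solar radiation
WIND_MIN_M_S = 5        # wind speed
SOIL_MIN_C = 110        # soil temperature (geothermal)

# Technique proposed for each load type.
TECHNIQUE_BY_LOAD = {"Load Daily": "LPSP",
                     "Load Monthly": "MUMT",
                     "Load Annual": "YMOST"}

def select_sources(radiation_wh_m2, wind_speed_m_s, soil_temp_c):
    """Keep every source whose site measurement meets its minimum threshold."""
    sources = []
    if radiation_wh_m2 >= SOLAR_MIN_WH_M2:
        sources.append("solar")
    if wind_speed_m_s >= WIND_MIN_M_S:
        sources.append("wind")
    if soil_temp_c >= SOIL_MIN_C:
        sources.append("geothermal")
    return sources

# Illustrative measurements for a sunny, windy site with low soil temperature.
sources = select_sources(2800, 6, 25)        # solar and wind qualify
technique = TECHNIQUE_BY_LOAD["Load Daily"]  # daily load -> LPSP
```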

The execution of the solution includes a set of steps:

Step 1:

This step is reserved for the choice of the site based on the rule form presented in Figure 5.

**Figure 5.** Rule form.

In this example, the choice is Adrar City.

Step 2:

The objective in this step is to choose the energy sources. This operation is realized based on the intelligent reasoning rules (Figure 6).

**Figure 6.** Examples of intelligent reasoning rules.


Step 3:

In this step, the objective is to choose the optimization technique by using the R1, R2, and R3 reasoning rules and replacing the variable "y = DailyLoad" in the three rules (R1), (R2), and (R3). The "LPSP" technique is then proposed by the reasoning of the solution.

This solution was proposed because of the complexity of using and mastering the software and optimization tools for HES. It allows the appropriate tool for calculating the optimal configuration to be selected in a simple, easy way, and it relieves the user of needing any knowledge of energy sources or optimization techniques.

Other methods require knowledge of several aspects (sites, techniques, energy sources). With this solution, in contrast, the user only chooses the installation site, and the remaining steps are handled by the developed solution.

#### **3. Results and Discussion**

#### *Evaluation of the Proposed Solution*

Aligning the proposed solution with the research question, which is the key to evaluating solutions from a research point of view, requires well-formulated hypotheses and conceptual subproblems. Research is only feasible if the validity of the proposed hypotheses and conceptual subproblems can be established through a thorough literature review. Correct answers to the conceptual subproblems then ensure good alignment.

The results obtained in this work show that the answers to the elements mentioned in Section 2.1, "Proposed Approach for Ontology Construction", respond to the subquestions stated in that subsection (Table 8).

**Table 8.** Samples of evaluation of the proposed solution.


To assess the hypotheses proposed in this work, the following steps were carried out:


#### **4. Conclusions**

This work addressed the problem of optimizing HES. In this context, an optimization solution based on ontology was developed.

HES combine multiple energy sources to meet electricity needs. They improve energy efficiency and increase the integration of RE sources. However, the process of optimizing these systems is complex, especially given their distributed nature and their interaction with the external environment.

In an HES, energy sources and storage tools are combined to meet consumer demand for electricity while ensuring the lowest cost; these tasks are the main goals of HES optimization. However, because of the intermittent nature of RE, optimizing an HES is difficult, as it depends principally on the energy source and climate data. Responding to this optimization problem is precisely the objective of this work.

A set of tools exists for HES optimization, including algorithms and software, each based on one approach or a hybridization of several approaches (probability, linear programming, fuzzy logic, neural networks, etc.). Popular optimization software packages include Hybrid, TRNSYS [38], Hybrid2, Homer [39], and HOGA [40]. However, choosing among these tools is a real problem. Accordingly, a generic optimization solution for HES was developed. This contribution demonstrated its reliability in an explanatory scenario based on real data from Algerian sites. The results indicate the value of the proposed solution for real uses, offering the user advantages such as saved time, accuracy, and ease of use.

Future work will require collaboration with experts in many fields, which will allow the existing solution to be developed on many points, including the introduction of other technologies such as multiagent systems. In particular, a mobile agent can travel to the source of information to retrieve it, rather than staying at a fixed address and communicating with the other agents through network protocols, which would make the solution more effective. Finally, extensive testing must be performed on other case studies to find the limits of this solution.

**Author Contributions:** Formal analysis, D.S.; writing—original draft, D.S.; software, D.S. and H.H.; methodology, D.S. and O.C.; validation, D.S. and F.H.; funding acquisition, O.C., F.H. and H.H.; supervision, D.S., O.C. and Y.S.; visualization, F.H. and H.H.; writing—review and editing, Y.S., A.H. and H.H.; project administration, D.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia, Project Number PNURSP2022R236.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** This project was supported by Princess Nourah bint Abdulrahman University researchers supporting Project Number (PNURSP2022R236), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviation**


#### **References**


### *Article* **A Hybrid Generative Adversarial Network Model for Ultra Short-Term Wind Speed Prediction**

**Qingyuan Wang 1,†, Longnv Huang 1,†, Jiehui Huang 1,†, Qiaoan Liu 2, Limin Chen 1,\*, Yin Liang 1, Peter X. Liu 1,3 and Chunquan Li <sup>1</sup>**


**Abstract:** To improve the accuracy of ultra-short-term wind speed prediction, a hybrid generative adversarial network model (HGANN) is proposed in this paper. Firstly, to reduce the noise of the wind sequence, the raw wind data are decomposed using complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN). Then the decomposed modalities are entered into the HGANN network for prediction. HGANN is a continuous game between the generator and the discriminator, which in turn allows the generator to learn the distribution of the wind data and make predictions about it. Notably, we developed the optimized broad learning system (OBLS) as a generator for the HGANN network, which can improve the generalization ability and error convergence of HGANN. In addition, improved particle swarm optimization (IPSO) was used to optimize the hyperparameters of OBLS. To validate the performance of the HGANN model, experiments were conducted using wind sequences from different regions and at different times. The experimental results show that our model outperforms other cutting-edge benchmark models in single-step and multi-step forecasts. This demonstrates not only the accuracy and robustness of the proposed model but also the applicability of our model to more general environments for wind speed prediction.

**Keywords:** wind speed forecast; OBLS; data preprocessing; optimized hyper-parameters

#### **1. Introduction**

Energy demand has always been one of the main problems of human development, since energy consumption increases as living standards improve. In recent years, renewable energy has gradually become a research hotspot. Wind energy is valued for being clean, pollution-free, renewable, and abundant. However, wind is highly random and volatile, which may affect the stability of the power system and hinder the efficient use of wind energy [1]. Accurate ultra-short-term wind speed prediction models are therefore crucial in power dispatch planning and power market operations [2]. Thus, reliable wind speed prediction has drawn considerable interest.

The three common wind speed prediction models are physical models, statistical models, and hybrid models. Physical models take into account the physical conditions and locations of wind farms, which require abundant meteorological data. Numerical weather prediction is a typical physical model: it accounts for temperature, pressure, and obstacles in wind speed prediction, so it has a long calculation period [3]. Physical models are efficient and accurate for long-term forecasting, but they are computationally intensive and expensive for small-scale forecasting.

**Citation:** Wang, Q.; Huang, L.; Huang, J.; Liu, Q.; Chen, L.; Liang, Y.; Liu, P.X.; Li, C. A Hybrid Generative Adversarial Network Model for Ultra Short-Term Wind Speed Prediction. *Sustainability* **2022**, *14*, 9021. https://doi.org/10.3390/su14159021

Academic Editors: Luis Hernández-Callejo, Sergio Nesmachnow and Sara Gallardo Saavedra

Received: 5 June 2022 Accepted: 20 July 2022 Published: 22 July 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Statistical models make better use of historical wind speed data to predict future wind speeds than physical models. Statistical models include both traditional statistical models and neural network-based models. Traditional statistical models include the autoregressive moving average model [4], the autoregressive integrated moving average model [5], the Bayesian model [6], etc. However, the non-linear nature of wind makes it difficult for traditional statistical models to extract the deeper features of wind speed data. Neural networks have been introduced into the field of wind speed prediction for their ability to fit the non-linear part of the data well. Neural network-based models can extract deeper features from wind speed data than traditional statistical models—for example, BP [7], RBF [8], artificial neural networks [9], SVR [10], etc. To improve the learning ability and predictive ability of predictive models, deep neural networks have been introduced into wind speed prediction, such as the deep belief network [11], RNN [12], GNN [13], and LSTM [14].

In recent years, hybrid models have gradually become the mainstream wind speed prediction models. Hybrid models typically use one or more auxiliary strategies to assist the main forecasting network in wind speed prediction. Therefore, hybrid models can achieve better prediction performance than physical models and statistical models. The auxiliary strategies involved in hybrid models include data preprocessing techniques, optimization algorithms, error correction, and weighting strategies.

(1) Data preprocessing techniques. Zhang et al. [15] used EMD for data pre-processing of wind speed, which effectively reduced the volatility of the wind speed series. However, EMD suffers from the problem of modal confusion, which leads to unsatisfactory decomposition results. Santhosh et al. [16] used EEMD to process the raw wind speed series, which effectively mitigated the EMD problem. However, EEMD suffers from a residual noise problem. Wang et al. [17] used CEEMD for wind speed prediction. CEEMD cancels out the residual noise with a pair of white noises, effectively improving the efficiency of the calculation. Ren et al. [18] experimentally demonstrated that the CEEMDAN-based model always performs best compared to the EMD-based model.

(2) Optimization algorithms. Optimization algorithms can be used to optimize the hyperparameters, weights, network structure, and thresholds of predictive models. Li et al. [19] used PSO to optimize two hyper-parameters of LSTM, which solved the problem of wide intervals caused by interval superposition and thus improved the wind speed prediction accuracy. Tian [20] used PSO to optimize the weight coefficients of each prediction model, and the experimental results demonstrate the necessity of introducing the weight coefficient optimization strategy. Liu et al. [21] used GA to optimize the internal parameters of LSTM, which improved the efficiency and accuracy of the prediction model. Cui et al. [22] used the Bat algorithm to optimize the thresholds of BP networks, effectively improving the generalization ability and nonlinear mapping ability of BP networks.

(3) Error correction. Error correction is a post-processing technique for wind forecasting. It predicts the residuals and superimposes the predictions on the original predictions to obtain the final predictions. Duan et al. [23] used improved CEEMDAN to decompose the errors, and the experimental results showed that the error decomposition correction method can significantly enhance the prediction accuracy. Liu et al. [24] proposed an adaptive multiple error correction method, which makes full use of the deeper predictable components and effectively improves the reliability and accuracy of the model. Zhang et al. [25] demonstrated experimentally that the final predicted values after Markov chain correction are closer to the original wind field data, which proves that the use of the Markov chain is effective.

(4) Weighting strategies. To scientifically determine the weights of different prediction networks in a hybrid model, many scholars have proposed different weighting strategies. To alleviate the adverse effects of multi-collinearity in combinatorial prediction models, Jiang et al. [26] used a GMDH neural network to automatically identify the weights of three nonlinear models. The application of GMDH can significantly improve the predictive

capability compared to the widely used equal-weighting scheme. Wang et al. [27] used MTO to minimize the error sum of squares of the IOWA operator, which obtains the optimal weight vector for the combined prediction model and ensures the stability of the prediction results. Altan et al. [28] optimized the weighting coefficients for each IMF using the gray wolf optimizer algorithm.

Although the above models achieve good predictive performance, they still have some problems. Methods involving deep neural networks [27] incur huge computational costs. Hybrid methods based on weighting strategies [28] may suffer from multicollinearity, which reduces prediction accuracy. The performance of hybrid methods based on parameter optimization [26] is largely influenced by the researcher's understanding of the optimization algorithm.

Considering the above issues, we propose a hybrid model combining data preprocessing techniques and optimization algorithms for ultra-short-term wind speed prediction. We design the hybrid generative adversarial network (HGANN) as the prediction master network for the proposed hybrid model. The contributions and innovations of this research are concluded as follows:


The rest of this article is organized as follows. Section 2 introduces the model framework and methods involved in this article in detail. In Section 3, the experimental cases and prediction results are elaborated in detail, which verifies the validity of the framework we propose. Section 4 contains a discussion of the results of the experiment. The conclusions are presented in Section 5.

#### **2. Proposed Predictive Framework**

#### *2.1. Overall Framework of HGANN*

Generative adversarial networks (GANs) [29] are deep learning networks, which are composed of a generator and discriminator that confront each other. The role of the generator is to generate false samples that are close to the real ones. The role of the discriminator is to distinguish between true and false samples as correctly as possible. However, GANs often suffer from the problem of target confusion. Our proposed HGANN alleviates this problem to a great extent.

We developed a hybrid generative adversarial network model (HGANN) for ultra-short-term wind speed prediction, which uses two networks that compete with each other to achieve highly accurate wind speed predictions. The proposed model is shown in Figure 1. First, CEEMDAN decomposes the raw wind speed data into multiple modalities. These modalities are separately fed into the generator of HGANN, the OBLS. The generator is used to obtain virtual samples that are similar to real samples. The virtual samples and real samples are then fed into the discriminator, which consists of convolutional layers and fully connected layers. The discriminator extracts the high-dimensional features of the input samples through the convolutional layers and then further extracts effective features through the fully connected layers. The discriminator's scalar outputs, "1" or "0," are passed to the generator and the discriminator to perform the iterative update of HGANN. Through continuous iterative updating, OBLS obtains the best parameters and performs wind speed prediction. Finally, the final wind speed forecast can be obtained by stacking all forecast values.

**Figure 1.** The proposed short-term wind speed forecasting framework. In the data processing step, CEEMDAN turns the wind data into multiple modalities. The HGANN network consisting of a generator and discriminator predicts these modalities. The final wind speed prediction result can then be obtained by stacking the prediction results of all modalities.

#### *2.2. CEEMDAN Model*

Due to the high volatility of the wind speed series, CEEMDAN [30] is introduced to smooth the wind speed data. CEEMDAN decomposes a signal into some modalities.

The original wind speed series is defined as *X*(*n*). CEEMDAN decomposes *X*(*n*) into *IMFj*(*n*), *j* = 1, 2, 3, ... *J* and residue *rj*(*n*). Figure 2 shows the flow chart of the CEEMDAN algorithm. The specific steps of the algorithm are as follows.

Randomly generate white noise with distribution $N(0, 1)$, defined as $w^{i}(n)$, $i = 1, 2, \dots, I$. Define an operator $E\{*\}$, which generates the IMFs by EMD. We set the noise standard deviation to $\varepsilon = 0.2$ and the ensemble size to $I = 500$.

Add *w<sup>i</sup>* (*n*) to the *X*(*n*) and generate a new series with noisy signals *X*(*n*) + *ε*0*w<sup>i</sup>* (*n*). For *j*= 1, the first-order *IMF*1(*n*) that is decomposed by EMD is expressed as:

$$\overline{IMF\_1}(n) = \frac{1}{I} \sum\_{i=1}^{I} E\left\{ X(n) + \varepsilon\_0 w^i(n) \right\} \tag{1}$$

The first-order residue is computed as follows:

$$r\_1(n) = X(n) - \overline{IMF\_1}(n) \tag{2}$$

For *j* = 2, 3, . . . *J*, calculate the *IMFj*(*n*) and the *j*th residue as follows:

$$\overline{IMF\_j}(n) = \frac{1}{I} \sum\_{i=1}^{I} E\left\{ r\_{j-1}(n) + \varepsilon\_{j-1} w^{i}(n) \right\} \tag{3}$$

$$r\_j(n) = r\_{j-1}(n) - \overline{IMF\_j}(n) \tag{4}$$

Decompose $E\left\{ r_{j-1}(n) + \varepsilon_{j-1} w^{i}(n) \right\}$ until the residue $r_J(n)$ cannot be decomposed further and has only one extremum. Then we can get $\overline{IMF_j}(n) = \frac{1}{I} \sum_{i=1}^{I} E\left\{ r_{j-1}(n) + \varepsilon_{j-1} w^{i}(n) \right\}$ and the final residue $r_J(n) = X(n) - \sum_{j=1}^{J} \overline{IMF_j}(n)$.

The original wind speed time series $X(n)$ can thus be decomposed as $X(n) = \sum_{j=1}^{J} \overline{IMF_j}(n) + r_J(n)$, where each $\overline{IMF_j}(n)$ or $r_J(n)$ represents a different feature of the wind speed.
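The recursion in Equations (1)–(4) can be illustrated with the following minimal sketch. Note that `extract_detail` (a moving-average high-pass filter) is only a stand-in for the true EMD sifting operator $E\{*\}$, and the noise amplitude is held fixed across stages, so this is an illustration of the recursion's bookkeeping rather than the CEEMDAN implementation used in the paper.

```python
import random

def extract_detail(x, win=5):
    # Stand-in for the EMD operator E{*}: returns the "detail", i.e.,
    # the signal minus a centered moving average. Real CEEMDAN uses
    # true EMD sifting here.
    n = len(x)
    out = []
    for i in range(n):
        lo, hi = max(0, i - win // 2), min(n, i + win // 2 + 1)
        out.append(x[i] - sum(x[lo:hi]) / (hi - lo))
    return out

def ceemdan_like(x, n_modes=3, eps=0.2, ensemble=50, seed=0):
    """Sketch of the CEEMDAN recursion (Eqs. 1-4): each mode is the
    ensemble average of details extracted from noise-perturbed residues;
    the residue is updated by subtracting each extracted mode."""
    rng = random.Random(seed)
    residue = list(x)
    modes = []
    for _ in range(n_modes):
        acc = [0.0] * len(x)
        for _ in range(ensemble):
            noisy = [r + eps * rng.gauss(0, 1) for r in residue]  # r + eps*w^i
            acc = [a + v for a, v in zip(acc, extract_detail(noisy))]
        imf = [a / ensemble for a in acc]                  # Eqs. (1)/(3)
        residue = [r - v for r, v in zip(residue, imf)]    # Eqs. (2)/(4)
        modes.append(imf)
    return modes, residue
```

By construction, the extracted modes plus the final residue sum back to the original series, mirroring $X(n) = \sum_j \overline{IMF_j}(n) + r_J(n)$.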

#### *2.3. Generator OBLS for HGANN*

BLS [31] can provide incremental structural learning and achieves good results in time-series forecasting. Furthermore, because of its shallow network structure, BLS has better error convergence performance than CNN. Compared with BLS, OBLS provides both higher convergence performance and higher predictive accuracy, because OBLS uses IPSO to optimize the network hyper-parameters. Therefore, instead of using CNN as the generator as in GAN, we use OBLS as the generator of HGANN to alleviate the problem of target confusion during HGANN training; this improves the generalization ability and error convergence of HGANN and makes it more suitable for wind speed prediction. The detailed OBLS algorithm is as follows.

Randomly generate *n* particles so that each particle is a three-dimensional vector {*NF*, *NW*, *NE*} corresponding to the three parameters of BLS. Initialize the particle positions *xid* ∈ (1, 100) and speeds *vid* ∈ (−1, 1). Set the learning factors *c*<sup>1</sup> = 1.5 and *c*<sup>2</sup> = 1.5, the inertia weights *wmax* = 1.0 and *wmin* = 0.4, and the maximum number of iterations *itermax* = 100.

Take the input wind speed series data $X(n)$ and project it using $\phi_i(X(n)W_{ei} + \beta_{ei})$ to obtain the $i$th mapped feature $Z_i$, where $W_{ei}$ is a random weight matrix with the proper dimensions. The $j$th group of enhancement nodes $\phi_j\left(Z_i W_{hj} + \beta_{hj}\right)$ is denoted as $H_j$. $\phi_i$ and $\phi_j$ can be different functions. The $i$th mapping can be denoted as:

$$Z\_i = \phi\_i(X(n)W\_{ei} + \beta\_{ei}), \; i = 1, 2, \dots, n \tag{5}$$

The feature nodes are denoted as $Z^n \triangleq [Z_1, Z_2, \dots, Z_n]$, where $W_{hj}$ and $\beta_{hj}$ are random weights. The enhancement nodes are denoted as:

$$H\_j = \phi\_j\left( Z\_i W\_{hj} + \beta\_{hj} \right), \; j = 1, 2, \dots, m \tag{6}$$

Let $H^m \triangleq [H_1, H_2, \dots, H_m]$, where the symbol $\triangleq$ means "denoted as"; then the output of the BLS can be denoted as:

$$Y = \{ Z^n | H^m \} W^n \tag{7}$$

where $W^n$ is the final target weight needed by OBLS, obtained through the ridge regression algorithm, that is, $W^n \triangleq \{Z^n|H^m\}^{+} Y$.

Let $\{M\} = \{Z^n|H^m\}$; then $\{Z^n|H^m\}^{+}$ can be expressed as follows:

$$\{Z^{n}|H^{m}\}^{+} = \lim\_{\lambda \to 0} \left\{ \lambda I + \{M\}\{M\}^{T} \right\}^{-1} \{M\}^{T} \tag{8}$$

where $\lambda$ is the $l_2$ regularization parameter.
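To make Equations (5)–(8) concrete, here is a minimal, self-contained broad-learning sketch in plain Python. Everything here (the dimensions, `tanh` as the mapping functions $\phi$, the omitted bias terms, the tiny Gaussian-elimination solver) is an illustrative assumption rather than the authors' implementation; the ridge step uses the equivalent normal-equation form $(\lambda I + A^{T}A)^{-1}A^{T}Y$ of the pseudo-inverse in Equation (8).

```python
import math, random

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def solve(A, B):
    # Gauss-Jordan elimination with partial pivoting; solves A X = B.
    n = len(A)
    M = [row[:] + Brow[:] for row, Brow in zip(A, B)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c and M[r][c]:
                f = M[r][c]
                M[r] = [v - f * w for v, w in zip(M[r], M[c])]
    return [row[n:] for row in M]

class TinyBLS:
    """Minimal broad learning system sketch (Eqs. 5-8): random mapped
    features Z, random enhancement nodes H, and output weights W^n from
    ridge regression on A = [Z | H]. Toy code, biases omitted."""
    def __init__(self, n_in, n_feat=8, n_enh=8, lam=1e-3, seed=1):
        rng = random.Random(seed)
        rand = lambda r, c: [[rng.uniform(-1, 1) for _ in range(c)] for _ in range(r)]
        self.We, self.Wh = rand(n_in, n_feat), rand(n_feat, n_enh)
        self.lam, self.W = lam, None

    def _nodes(self, X):
        Z = [[math.tanh(v) for v in row] for row in matmul(X, self.We)]  # Eq. (5)
        H = [[math.tanh(v) for v in row] for row in matmul(Z, self.Wh)]  # Eq. (6)
        return [z + h for z, h in zip(Z, H)]  # A = [Z^n | H^m]

    def fit(self, X, Y):
        A = self._nodes(X)
        At = transpose(A)
        G = matmul(At, A)
        for i in range(len(G)):
            G[i][i] += self.lam            # ridge term, cf. Eq. (8)
        self.W = solve(G, matmul(At, Y))   # W^n = (lam*I + A^T A)^-1 A^T Y

    def predict(self, X):
        return matmul(self._nodes(X), self.W)  # Eq. (7)
```

Because the output weights come from a single linear solve rather than gradient descent, training cost stays low even as more feature or enhancement nodes are added, which is the "width over depth" point the section makes.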

IPSO [32] is introduced to iteratively optimize the parameters of BLS: {*NF*, *NW*, *NE*}. As the IPSO iterations proceed, the position and speed of the particles are continually updated through the following equations:

$$v\_{id} = w v\_{id} + c\_1 r\_1 (p\_{id} - x\_{id}) + c\_2 r\_2 \left(p\_{gd} - x\_{id}\right) \tag{9}$$

$$x\_{id} = x\_{id} + \gamma v\_{id} \tag{10}$$

Here, $\gamma$ is the velocity coefficient, and the inertia weight is $w = w_{max} - (w_{max} - w_{min}) \cdot 1/iter$. When the maximum iteration number $iter_{max}$ is reached, the iteration stops and the optimal values of {*NF*, *NW*, *NE*} are obtained.
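The particle updates of Equations (9)–(10) can be sketched as follows. This is a generic PSO loop, not the paper's exact IPSO: the inertia schedule used here is the common linear decay from $w_{max}$ to $w_{min}$ (the paper's schedule may differ), the search box $[1, 100]$ matches the stated position range, and the sketch minimizes a fitness function rather than maximizing Equation (14).

```python
import random

def ipso(fitness, dim=3, n=20, iters=100, c1=1.5, c2=1.5,
         wmax=1.0, wmin=0.4, lo=1.0, hi=100.0, seed=0):
    """Sketch of the PSO updates in Eqs. (9)-(10) with a linearly
    decaying inertia weight. Returns (best position, best fitness)."""
    rng = random.Random(seed)
    X = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n)]
    V = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n)]
    P = [x[:] for x in X]                     # personal bests p_id
    pf = [fitness(x) for x in X]
    g = min(range(n), key=lambda i: pf[i])
    G, gf = P[g][:], pf[g]                    # global best p_gd
    for it in range(1, iters + 1):
        w = wmax - (wmax - wmin) * it / iters  # inertia decay (assumed)
        for i in range(n):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                V[i][d] = (w * V[i][d] + c1 * r1 * (P[i][d] - X[i][d])
                           + c2 * r2 * (G[d] - X[i][d]))       # Eq. (9)
                X[i][d] = min(hi, max(lo, X[i][d] + V[i][d]))  # Eq. (10)
            f = fitness(X[i])
            if f < pf[i]:
                P[i], pf[i] = X[i][:], f
                if f < gf:
                    G, gf = X[i][:], f
    return G, gf
```

In OBLS the three optimized dimensions would be rounded to the integer node counts {*NF*, *NW*, *NE*} before building the network.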

The generator takes the wind speed subsequence $\{x_1, x_2, \dots, x_n\}$ generated by CEEMDAN as input. The generator then produces a new wind speed sequence $\{\hat{y}_1, \hat{y}_2, \dots, \hat{y}_n\}$, which is statistically similar to the real wind speed sequence $\{y_1, y_2, \dots, y_n\}$.

From Equations (8)–(10), OBLS does not require layer-to-layer coupling. Since there are no multi-layer connections, OBLS does not need gradient descent to update the weights, so its computational cost is significantly lower than that of deep learning. When the accuracy of OBLS does not meet requirements, it can be improved by increasing the "width" of the network, i.e., by adding nodes. Compared with the increase in computation caused by adding layers to a deep network, the increase caused by widening the network in OBLS is negligible.

#### *2.4. Discriminators for HGANN*

To maintain the stability of the generated samples, we used the discriminator of WGAN [33] as the discriminator of HGANN. In HGANN, the discriminator takes $\{x_i, \hat{y}_i\}$ or $\{x_i, y_i\}$ as input. The training goal of the discriminator is to classify $\{x_i, \hat{y}_i\}$ as false and $\{x_i, y_i\}$ as true. The discriminator is trained by minimizing the loss function $\mathcal{L}_D$, defined as follows:

$$\mathcal{L}\_D = \mathcal{L}(D(\{x\_i, y\_i\}), 1) + \mathcal{L}(D(\{x\_i, \hat{y}\_i\}), 0) + GP \tag{11}$$

$$GP = \frac{\lambda}{m} \sum\_{i=1}^{m} \left[ \left\| \nabla\_{x\_i, \hat{y}\_i} D\left(x\_i, \hat{y}\_i\right) \right\|\_2 - 1 \right]^2 \tag{12}$$

where $D(*)$ represents the output of the discriminator and $\mathcal{L}$ is the binary cross entropy, defined as:

$$\mathcal{L} = -[k \log(s) + (1 - k) \log(1 - s)]\tag{13}$$

Based on this loss function, the discriminator learns to output 1 when the input is $\{x_i, y_i\}$ and 0 when the input is $\{x_i, \hat{y}_i\}$, and thereby discriminates the wind speed sequence $\{y_1, y_2, \dots, y_n\}$.

The discriminator outputs a scalar of "0" or "1." The scalar of "0" or "1" has two purposes: (1) It can influence and then adjust the weights of the neural network in the discriminator and maximize Equation (14) through a backpropagation algorithm. (2) It can be passed to the generator to assist the PSO algorithm to find the optimal hyperparameters of the OBLS and then calculate the value of the fitness function *Fc*, which is defined as follows:

$$F\_{\mathbb{C}} = \frac{1}{n} \sum\_{i=1}^{n} D(G(\mathbf{x}(n))) \tag{14}$$

where *G*(∗) represents the output of the generator.
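Equations (11), (13), and (14) reduce to a few lines of code. The sketch below treats the discriminator outputs as already-computed scalars in (0, 1) and takes the gradient-penalty term GP as a precomputed number, since computing it requires the discriminator's gradients; all function names are illustrative.

```python
import math

def bce(s, k):
    # Binary cross entropy of Eq. (13): s is the discriminator output
    # in (0, 1), k is the target label (0 or 1).
    return -(k * math.log(s) + (1 - k) * math.log(1 - s))

def discriminator_loss(d_real, d_fake, gp=0.0):
    # Eq. (11): push real pairs {x_i, y_i} toward 1 and generated pairs
    # {x_i, y_hat_i} toward 0, plus a gradient-penalty term GP
    # (computed elsewhere from the discriminator's gradients).
    return bce(d_real, 1) + bce(d_fake, 0) + gp

def generator_fitness(d_outputs):
    # Eq. (14): the generator's fitness F_c is the mean discriminator
    # score over the generated samples D(G(x(n))).
    return sum(d_outputs) / len(d_outputs)
```

A well-trained discriminator (high `d_real`, low `d_fake`) yields a small loss, while the generator's IPSO search seeks hyper-parameters that drive `generator_fitness` up, i.e., that fool the discriminator.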

#### *2.5. Prediction Steps of the Proposed HGANN Model*

We propose the HGANN model for ultra-short-term wind speed prediction. The flow chart of the prediction process of the proposed model is shown in Figure 3. CEEMDAN is used to decompose the raw wind speed data $\{x_1, x_2, \dots, x_n\}$ into multiple modes $IMF_j(n)$. These $IMF_j(n)$ are separately sent into the generator (OBLS) of HGANN to obtain virtual samples $\{\hat{y}_1, \hat{y}_2, \dots, \hat{y}_n\}$. The discriminator (WGAN) takes $\{x_i, \hat{y}_i\}$ or $\{x_i, y_i\}$ as input and then outputs the scalar "1" or "0." These scalars are passed to the generator (OBLS) and the discriminator (WGAN) to participate in iterative model updates. Through continuous iterative updating, OBLS obtains the optimal values of {*NF*, *NW*, *NE*}. The final wind speed forecast $\{y_1, y_2, \dots, y_n\}$ can be obtained by stacking all forecast values.

**Figure 3.** Flowchart of the proposed HGANN model prediction procedure.
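At the highest level, the prediction pipeline of Figure 3 is a thin orchestration around three steps: decompose, predict each modality, and stack the per-modality forecasts. The sketch below makes this explicit; `decompose` and `train_mode_model` are placeholders for CEEMDAN and the per-modality adversarial training described above.

```python
def hgann_forecast(series, decompose, train_mode_model):
    """High-level sketch of the pipeline in Figure 3:
    decompose -> predict each modality -> stack (sum) the forecasts.
    `decompose` maps a series to a list of modality series;
    `train_mode_model` maps one modality to its forecast sequence."""
    modes = decompose(series)
    preds = [train_mode_model(m) for m in modes]
    # "Stacking" the forecasts: element-wise sum over modalities.
    return [sum(vals) for vals in zip(*preds)]
```

Because CEEMDAN's modes sum back to the original signal, summing the per-modality forecasts yields a forecast for the original series.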

#### **3. Case Analysis**

#### *3.1. Data Description*

To demonstrate the applicability of the proposed model in different locations, we used datasets from the 50Hertz wind farm in Germany and the Mahuangshan wind farm in China [26]: HER and MHS, respectively. The HER datasets are freely available at http://www.netztransparenz.de/ (accessed on 5 October 2021). Both datasets cover one year, with wind speeds measured at 15 min intervals. We selected wind speed series from both the HER and MHS datasets for March, June, September, and December, representing spring, summer, autumn, and winter, respectively. Experiments using the four wind speed series of spring, summer, autumn, and winter can verify the applicability of our model at different periods. Each series contains 2880 samples. Table 1 shows information on the selected wind speed data for spring, summer, autumn, and winter.


**Table 1.** Seasonal statistics of the wind speed data.

In our experiments, the first 80% of the wind speed sequence was used as the training set, and the rest was used as the test set for ultra-short-term wind prediction. Table 1 displays the information of the four datasets. The experiments were implemented in MATLAB R2021b on a 64-bit personal computer with Intel(R) core i5-9300 CPU/16.00 GB RAM.
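The experimental split described above is straightforward to reproduce. The helpers below are illustrative: the 80/20 split follows the paper, while `make_windows` shows one common way to turn a univariate series into supervised (lags, target) pairs for ultra-short-term forecasting; the lag count and horizon are assumptions, not values from the paper.

```python
def split_series(series, train_frac=0.8):
    """First 80% of the sequence for training, the rest for testing,
    matching the experimental setup."""
    k = int(len(series) * train_frac)
    return series[:k], series[k:]

def make_windows(series, n_lags=4, horizon=1):
    # Build (lag-vector, target) pairs: predict the value `horizon`
    # steps ahead from the previous `n_lags` observations.
    X, y = [], []
    for t in range(n_lags, len(series) - horizon + 1):
        X.append(series[t - n_lags:t])
        y.append(series[t + horizon - 1])
    return X, y
```

With 2880 samples per series, this split gives 2304 training and 576 test points per season.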

#### *3.2. Evaluation Index*

To comprehensively evaluate the prediction performance of HGANN, four evaluation indicators are used. MAE accurately reflects the average absolute error. MAPE divides the absolute error by the corresponding actual value. RMSE represents the sample standard deviation between the predicted and actual values; it is very sensitive and reflects prediction accuracy well. SSE represents the total error of the model. Their definitions are as follows:

$$MAE = \frac{1}{N} \sum\_{i=1}^{N} |y\_i - \hat{y}\_i| \tag{15}$$

$$MAPE = \frac{1}{N} \sum\_{i=1}^{N} \left| \frac{y\_i - \mathcal{Y}\_i}{y\_i} \right| \tag{16}$$

$$RMSE = \sqrt{\frac{1}{N} \sum\_{i=1}^{N} (y\_i - \hat{y}\_i)^2} \tag{17}$$

$$SSE = \sum\_{i=1}^{N} (y\_i - \hat{y}\_i)^2 \tag{18}$$

where *y*ˆ*<sup>i</sup>* is the predicted value and *yi* is the actual value.
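Equations (15)–(18) translate directly into code; note that MAPE requires every actual value $y_i$ to be nonzero.

```python
import math

def metrics(y, yhat):
    """MAE, MAPE, RMSE, and SSE of Eqs. (15)-(18)."""
    n = len(y)
    err = [a - b for a, b in zip(y, yhat)]
    mae = sum(abs(e) for e in err) / n
    mape = sum(abs(e / a) for e, a in zip(err, y)) / n  # y_i must be nonzero
    sse = sum(e * e for e in err)
    rmse = math.sqrt(sse / n)
    return mae, mape, rmse, sse
```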

#### *3.3. Comparable Methods*

To verify the prediction performance of the proposed HGANN, it was compared with 10 advanced predictive models: PSO-ANFIS [34], VMD-GA-BP [35], EEMD-GPR-LSTM [36], EMD-ISSA-LSTM [37], MWS-CE-ENN [20], CNN [38], WGAN [39], BLS [31], OBLS, and WGAN-OBLS. Table 2 lists the parameter settings of six of these methods. The settings of BLS, WGAN, PSO-BLS, and PSO-WGAN-OBLS are kept the same as those of HGANN in order to perform the ablation experiment for HGANN.

**Table 2.** Parameter settings of the models.


In Table 2, *itermax* is the iterative number; *ep* is the number of network iterations; *np* is the population size; *c*<sup>1</sup> and *c*<sup>2</sup> are the personal and global learning coefficients, respectively; *nr*, *nv*, *ni*, and *no* are the numbers of rules, variables, input nodes, and output nodes, respectively; *k* is the decomposition number of VMD/EEMD; *nbi* is the number of hidden nodes in the *i*th hidden layer; *lr* is the learning rate of the network; *pr* is the training requirement accuracy; *nstd* is the noise standard deviation in ICEEMDAN/CEEMDAN; *ve* is the early warning value; *pd* and *ps* are the proportions of discoverers and of sparrows aware of danger, respectively; and *λ* is the regularization parameter for ridge regression.

#### *3.4. Experimental Results*

#### (1) Experiment I: Comparison Between Different Forecasting Methods

We experimentally verified the effectiveness and advancement of the proposed HGANN by comparing it with PSO-ANFIS, VMD-GA-BP, EEMD-GPR-LSTM, MWS-CE-ENN, and EMD-ISSA-LSTM. Considering that wind data characteristics show strong seasonality, experiments were conducted using wind series from multiple seasons to further validate the predictive performance of the model. We chose the HER and MHS datasets for March, June, September, and December for this experiment. The training and testing processes of each of the compared models were repeated 10 times. The experimental results of the different datasets are presented in Tables 3 and 4, where the first-best predictions are highlighted. Figure 4 depicts the wind speed prediction results of the proposed model for the HER dataset.


**Table 3.** Forecast results of different models for the HER data.


**Table 3.** *Cont.*

**Table 4.** Forecast results of different models for the MHS data.


**Figure 4.** Forecasting results of HER wind speed data sets: (**a**) experiment results of spring wind speed sequences; (**b**) experiment results of summer wind speed sequences; (**c**) experiment results of autumn wind speed sequences; (**d**) experiment results of winter wind speed sequences.

Interestingly, it can be seen from Tables 3 and 4 that the proposed HGANN had the best prediction performance for each of the RMSE, SSE, MAPE, and MAE indicators on four datasets among all models. Wind speed forecasts at different times or at different locations may yield different results. Notably, our model achieved promising predictive results on both the geographically distinct German dataset HER and the Chinese dataset MHS. Furthermore, our model showed competitive prediction performance for wind series in different seasons. This indicates that our model can be extended to more general environments for wind speed prediction.

The abscissa and ordinate in Figure 4 represent the actual wind speed and the predicted wind speed, respectively; the blue line indicates that the predicted value is equal to the actual value. The ordinate of the green point is the predicted value, so the fit of the green point to the straight line reflects the accuracy of the prediction. As can be seen from Figure 4, the green points are very close to the blue line, which indicates that our model can predict wind speed effectively.

(2) Experiment II: Multi-Step Prediction Experiment

Multi-step forecasting can be built based on single-step forecasting. Compared to single-step forecasting, multi-step forecasting is more practical for power systems. Therefore, in wind speed prediction, multi-step prediction is of high practical value. This experiment aimed to demonstrate the predictive performance of the HGANN model in multi-step forecasting. We selected 2880 samples from 23 August to 22 September from the HER dataset for the one-step, two-step, and three-step experiments. The performance metrics were RMSE, SSE, MAPE, and MAE. The benchmark models were PSO-ANFIS, VMD-GA-BP, EEMD-GPR-LSTM, MWS-CE-ENN, and EMD-ISSA-LSTM. The training and testing processes of each model were repeated 10 times. The experimental results of the proposed model and the benchmark models are shown in Table 5.

**Table 5.** Multi-step prediction results for 15 min wind speed.


From Table 5, it can be seen that the one-step, two-step, and three-step prediction results of the proposed HGANN model provided lower RMSE, SSE, MAPE, and MAE values than those of the benchmark models. For instance, the proposed HGANN provided 0.0091 (one-step), 0.0136 (two-step), and 0.0178 (three-step) on RMSE, compared with EMD-ISSA-LSTM, which provided the predictive results of 0.0109 (one-step), 0.0139 (two-step), and 0.0183 (three-step). Furthermore, we also provide clear visual results of multi-step predictions for the six models in Figure 5. The results from Table 5 and Figure 5 indicate that the proposed HGANN model had the best robustness and the highest wind speed prediction accuracy among all compared models.

**Figure 5.** Multi-step forecasting experiments under RMSE, SSE, MAPE, and MAE indicators: (**a**) experiment results on the RMSE indicator; (**b**) experiment results on the SSE indicator; (**c**) experiment results on the MAPE indicator; (**d**) experiment results on the MAE indicator.

(3) Experiment III: Ablation Experiment Between Single Models and Hybrid Models.

To verify the rationality of the proposed HGANN model, it was compared with WGAN-OBLS, OBLS, WGAN, BLS, and CNN on the HER dataset. The generator of HGANN is OBLS and its discriminator is the discriminator of WGAN. To emphasize the effectiveness of OBLS, it was compared with the generator of WGAN, namely, CNN. Similarly, all compared models were repeatedly trained and tested 10 times. In the HER dataset, 2880 data from 23 August to 22 September were selected for this experiment. The experimental results are shown in Table 6, where the first-best predictions are highlighted with dark gray backgrounds. The forecast results for 22 September 2019 are plotted in Figure 6, which also shows the forecast errors in superimposed shades.

**Table 6.** Forecasting performances of the proposed model and reference models.


Figure 6 shows that among all the compared models, our proposed model had the best curve fitting and the smallest predicted error. The suggested model consistently outperformed WGAN-OBLS, PSO-BLS, WGAN, OBLS, CNN, and BLS, as shown in Table 6. This further demonstrates the advantages of our proposed model, as it combines CEEMDAN, OBLS, and WGAN.

**Figure 6.** Forecasting results in HER dataset (22 September 2019).

Furthermore, first, compared with WGAN-OBLS without CEEMDAN, the proposed model had better predictive performance because it combines CEEMDAN with WGAN-OBLS, thus showing the effectiveness of CEEMDAN. Second, compared with WGAN or OBLS alone, WGAN-OBLS provided better predictive performance due to combining OBLS and WGAN, thus showing the effectiveness of both components. Third, compared with BLS, OBLS provided better predictive performance due to using the improved PSO, thus showing the effectiveness of PSO in OBLS. Fourth, OBLS had better predictive results than the generator of WGAN, namely CNN. This demonstrates the advantage of OBLS over CNN as a generator, which may be due to the flexible structure and better error convergence of OBLS.

#### **4. Discussion**

Our model was compared with five advanced models to evaluate its performance and advantages in various wind sequence experiments. Experimental results show that the proposed model had better predictive performance. The reasons behind this fact are given as follows.

First, the wind speed data were one-year records from wind farms in Germany and China, which exhibit complex fluctuation characteristics. Our HGANN model therefore uses CEEMDAN to smooth the volatility of the data and improve predictive performance.
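The decomposition-based pipeline works by splitting the raw series into components, forecasting each component, and summing the sub-forecasts. The sketch below illustrates only this decompose-and-recombine pattern: a crude moving-average split stands in for CEEMDAN, which would instead produce several intrinsic mode functions (IMFs); the series is synthetic.

```python
import numpy as np

def crude_decompose(x, window=5):
    """Stand-in for CEEMDAN: split a series into a smooth trend and a
    high-frequency residual. Real CEEMDAN yields several IMFs instead."""
    kernel = np.ones(window) / window
    pad = np.pad(x, (window // 2, window - 1 - window // 2), mode="edge")
    trend = np.convolve(pad, kernel, mode="valid")  # moving average
    residual = x - trend
    return trend, residual

rng = np.random.default_rng(0)
# Synthetic "wind speed" series: slow oscillation plus noise.
wind = np.sin(np.linspace(0, 6, 200)) + 0.3 * rng.standard_normal(200)
trend, resid = crude_decompose(wind)

# The components reconstruct the original exactly; in the pipeline each
# component is forecast separately and the sub-forecasts are summed.
assert np.allclose(trend + resid, wind)
```

The benefit is that each component is less volatile than the raw series, so the downstream predictor faces an easier learning problem.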

Second, HGANN uses OBLS as the generator. Its special shallow, broad incremental learning network structure not only improves prediction accuracy for one-dimensional wind speed data compared to a CNN but also greatly decreases computational cost, since the network weights are determined by pseudo-inverse operations rather than convolution operations.
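The computational advantage comes from solving for the output weights in closed form. The sketch below shows the generic broad-learning-system pattern on a toy one-dimensional regression task (all sizes, targets, and the ridge term are illustrative assumptions, not the paper's configuration):

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy 1-D regression standing in for a wind-speed mapping (illustrative).
X = rng.uniform(-1, 1, (500, 1))
y = np.sin(2 * X)

# Random mapped-feature nodes and enhancement nodes, as in a broad
# learning system; the weights of these layers stay fixed at random.
Wf = 2.0 * rng.standard_normal((1, 20))
Z = np.tanh(X @ Wf)                      # feature nodes
We = rng.standard_normal((20, 40))
H = np.tanh(Z @ We)                      # enhancement nodes
A = np.hstack([Z, H])                    # broad expanded layer

# Output weights from one ridge-regularized pseudo-inverse solve --
# no iterative gradient training, which is the source of the speedup.
lam = 1e-3
W_out = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ y)

pred = A @ W_out
print("train RMSE:", float(np.sqrt(np.mean((pred - y) ** 2))))
```

A single linear solve replaces many epochs of backpropagation, which is why the shallow broad structure is cheap relative to a convolutional generator.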

Third, in our HGANN model, the proposed OBLS uses an improved PSO to optimize its network hyper-parameters, which searches a wider range and obtains better parameters than plain BLS. OBLS therefore has better generalization ability than BLS.
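For orientation, a minimal standard PSO loop (not the paper's improved variant) looks like the sketch below; in OBLS the objective would be a validation error over candidate hyper-parameters, whereas here a simple sphere function stands in.

```python
import numpy as np

def pso(objective, dim, n_particles=30, iters=100, bounds=(-5.0, 5.0), seed=0):
    """Minimal standard particle swarm optimizer (illustrative only)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, (n_particles, dim))   # positions
    v = np.zeros_like(x)                          # velocities
    pbest = x.copy()
    pbest_f = np.array([objective(p) for p in x])
    g = pbest[pbest_f.argmin()].copy()            # global best
    w, c1, c2 = 0.7, 1.5, 1.5                     # inertia, cognitive, social
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)
        f = np.array([objective(p) for p in x])
        better = f < pbest_f
        pbest[better], pbest_f[better] = x[better], f[better]
        g = pbest[pbest_f.argmin()].copy()
    return g, float(pbest_f.min())

# Sphere function as a stand-in for a validation-error objective.
best, best_f = pso(lambda p: float(np.sum(p ** 2)), dim=3)
print("best fitness:", best_f)
```

The paper's improved PSO modifies this baseline to search a wider range; the swarm structure itself is the same.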

Finally, HGANN can better extract the deeper features of wind speed data by playing a min-max game between the generator and discriminator for wind speed prediction.
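The min-max objective follows the Wasserstein GAN formulation: the critic maximizes the score gap between real and generated samples, while the generator minimizes it. The sketch below computes both losses with a linear critic on random placeholder data (all shapes and values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

def critic(x, w):
    """Linear critic scoring how 'real' a candidate sample looks."""
    return x @ w

real = rng.normal(5.0, 1.0, (64, 4))   # placeholder real wind-speed windows
fake = rng.normal(4.0, 1.0, (64, 4))   # placeholder generator output
w = rng.standard_normal(4)

# Wasserstein objectives of the adversarial min-max game.
critic_loss = np.mean(critic(fake, w)) - np.mean(critic(real, w))
gen_loss = -np.mean(critic(fake, w))
print(critic_loss, gen_loss)
```

Alternating updates on these two losses drive the generator's forecasts toward the distribution of real wind speeds, which is how the adversarial pairing sharpens feature extraction.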

#### **5. Conclusions**

Although various existing hybrid predictive models provide competitive performance in ultra-short-term wind speed prediction, they still need improvement: for instance, in how to effectively reduce the computational cost of hybrid predictive models, and in how to handle the multicollinearity problem of weighted-strategy hybrid forecasting models, which reduces forecasting accuracy. To enhance predictive power and decrease computational cost, this paper proposes the HGANN model for ultra-short-term wind speed forecasting. HGANN is a generative adversarial network in which the generator and discriminator play against each other to obtain highly accurate wind speed predictions. In HGANN, we developed OBLS as the generator and a convolutional structure as the discriminator, enabling an effective synergy that improves predictive performance. In particular, OBLS involves a special shallow broad incremental learning network structure that effectively handles one-dimensional wind speed data; this shallow structure also significantly decreases computational cost by using pseudo-inverse operations rather than convolution operations. In addition, the proposed OBLS applies an improved PSO to obtain optimal network hyper-parameters, while CEEMDAN performs noise reduction and decomposition of the wind data. Through this combination, the proposed HGANN provides high predictive accuracy and generalization ability with low computational cost in ultra-short-term wind speed prediction. The experimental results confirm this: for instance, on the spring wind data of the HER dataset, the RMSE predictive errors of the proposed model were 29.35%, 49.22%, 38.09%, and 30.10% compared to the four state-of-the-art predictive models PSO-ANFIS, VMD-GA-BP, EEMD-GPR-LSTM, and MWS-CE-ENN, respectively.
In the future, we plan to use parallel computing to speed up the PSO optimization of BLS during training. Furthermore, the proposed HGANN will be extended to a wider range of applications, such as financial time-series forecasting, electricity-load forecasting, and traffic forecasting.

**Author Contributions:** Conceptualization, Q.W. and L.H.; methodology, Q.W.; software, J.H.; validation, Q.W., J.H. and L.H.; formal analysis, L.H.; investigation, Q.W. and Q.L.; resources, J.H.; data curation, Q.W. and Q.L.; writing—original draft preparation, L.H.; writing—review and editing, L.C. and Y.L.; visualization, Q.W.; supervision, P.X.L.; project administration, P.X.L.; funding acquisition, C.L. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported in part by the National Natural Science Foundation of China under grants 62173176 and 61863028, and in part by the Science and Technology Department of Jiangxi Province of China under grant 20204ABC03A39.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Nomenclature**



#### **References**

