*Article* **Harris Hawk Optimization-Based Deep Neural Networks Architecture for Optimal Bidding in the Electricity Market**

**Kavita Jain 1, Muhammed Basheer Jasser 2, Muzaffar Hamzah 3,\*, Akash Saxena <sup>1</sup> and Ali Wagdy Mohamed 4,5**

	- Giza 12613, Egypt; aliwagdy@gmail.com

**Abstract:** In the power sector, competitive strategic bidding optimization has become a major challenge. Digital platforms provide a superior technical base, as well as support for carrying out the optimization. The state-of-the-art frameworks used in this article for simulating strategic bidding decisions in deregulated electricity markets (EMs) are bi-level optimization and neural networks. In this research, we present HHO-NN (Harris Hawk Optimization-Neural Network), a novel algorithm based on Harris Hawk Optimization (HHO) that converges faster than previous evolutionary algorithms when automatically searching for meaningful multi-layered perceptron neural network (MPNN) topologies for optimal bidding. Such a search usually demands a considerable amount of time and computing resources. The method formulates the problem in multidimensional continuous state-action spaces, allowing market players to obtain precise information on the effect of their bidding decisions on the market clearing results, and to implement more valuable bidding decisions by utilizing the whole action domain and accounting for non-convex operating principles. Due to the use of the MPNN, case studies show that the suggested methodology delivers a much larger profit than other state-of-the-art methods and has better computational performance than the benchmark HHO technique.

**Keywords:** electricity market; optimal bidding; Harris Hawk Optimization; multi layered neural network; bi-level optimization; strategic bidding

**MSC:** 68T01; 68T05; 68T07; 68T09; 68T20; 68T30

#### **1. Introduction**

Many economic systems used in power sector research are centralized, optimize a system-wide objective (e.g., maximizing social welfare) and assume that market participants behave in a perfectly competitive (price-taking) manner. However, increasing attempts to deregulate the power industry have resulted in increased competition among several self-interested (profit-driven) market competitors, particularly in the production and supplier domains. Because the nature of self-interested market players is not always aligned with global targets, existing centralized frameworks are no longer able to give accurate perspectives. Consequently, useful emerging market methods are those capable of tracking the strategic (price-making) behavior of self-interested market players as well as recognizing the market outcomes that result from their interactions [1].

Power production and transmission have evolved from vertically integrated operations to market-driven operations in the world's main economies. The liberalization of

**Citation:** Jain, K.; Jasser, M.B.; Hamzah, M.; Saxena, A.; Mohamed, A.W. Harris Hawk Optimization-Based Deep Neural Networks Architecture for Optimal Bidding in the Electricity Market. *Mathematics* **2022**, *10*, 2094. https://doi.org/10.3390/math10122094

Academic Editor: Jian Dong

Received: 16 May 2022 Accepted: 10 June 2022 Published: 16 June 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

the power sector has resulted in increased effectiveness through rivalry among market players. In the energy sector, generators are the ideal candidates for this rivalry, competing to enhance the effectiveness and competitiveness of resource allocation and to offer superior products at the lowest cost. Real-time balancing and day-ahead are the two kinds of energy markets. In the real-time market, rivals purchase energy during the business day, and energy prices are established daily or hourly based on demand and production. A study of spikes in electricity prices has been conducted in reference [2]. In a day-ahead market, participants purchase wholesale energy a day before the operating day, and the discrepancy between planned and real needs on the operating day is compensated in the real-time marketplace [3].

An energy pool in the day-ahead marketplace is one of the key services in the wholesale competitive liberalized energy sector emerging from the reformation of the electrical energy industry across the world [4]. The Independent System Operator (ISO) finalizes generators' bids for every hour of the next day in the day-ahead marketplace, so each generator provides bids to the ISO for each hour of the following day. The proposed price must be greater than zero but less than the industry restriction. The ISO examines the bids to establish the hourly MCP and the amount of power to be delivered by each generator, in the framework of the proposed cost and the MCP [5]. The EM mechanism explained above is shown in Figure 1.

**Figure 1.** Electricity Market Mechanism.

Two auction systems exist: (a) uniform price, which pays at the agreed level (with the price available at the cheapest clearance cost) based on the hourly MCP, and (b) pay-as-bid, which pays each generator at its accepted level based on its applicable rate [6]. In every marketplace, the price-forecasting process is linked to: (a) the size of the market (the number of prospective producers and consumers); (b) market structure and process (including payment methods and accessibility) [7]; (c) the degree of access to market and player information; and (d) risk analysis options [8]. Even so, there are discrepancies between energy and other commodity markets, such as:


Optimization approaches influenced by nature: swarm intelligence is a popular branch of artificial intelligence in which algorithms are created by emulating the intelligent behaviour of various animals such as wolves, whales, ants, lions, crows, and bees. HHO [11] is a recently introduced swarm intelligence-based method for solving real-world optimization problems. Ali Asghar Heidari et al. proposed it in 2019 [11], inspired by the cooperative behaviour and chasing style of Harris hawks in a manoeuvre known as the surprise pounce. A few hawks cooperatively pounce on a victim from varied angles in order to surprise it. Harris hawks can exhibit a variety of chasing patterns based on the distinct situations and escaping behaviours of the rabbit. To derive an optimization method, this research numerically simulates such patterns and behaviours. Although various nature-inspired optimization methods based on stochastic behaviour are available in the literature, the HHO algorithm was chosen in this study due to its acceptable search-space exploration strength when compared to other meta-heuristic algorithms. A multi-strategy search was inspired by the prey-hunting behaviour of the Harris hawk. The least squares support vector machine (LSSVM) was utilised to represent the reactive power output of the synchronous condenser, and the developed Harris Hawks optimization algorithm was employed to optimise it.

Time series models [12], the GARCH model [13], a mixture of wavelet transform and ARIMA [14], the fuzzy auto-regression framework [15], game theory [16], Bayesian optimization [17], neural networks [18], and a combination of a machine learning algorithm and the bat algorithm [19] have all been presented in the literature as methodologies for predicting electricity prices. Game theory, time series models, and simulation models make up the MCP prediction methods; time series architectures are classified into three categories: stochastic models, machine intelligence models, and causal models [20].

#### *1.1. Related Work*

From the models described above, the ANN has been identified as the most appropriate for predicting MCP. ANNs can evaluate the complicated relationship between the next-day MCP and past information on demand and other factors such as type of day, load, temperature, settling point, period, and so on. NN architectures are among the most extensively utilized techniques for forecasting MCP from past data [21]. Authors used an NN to estimate MCP for the Australian power industry several hours ahead. Because synthetic data and the Euclidean distance norm with normalized considerations were used to pick comparable days, the study revealed a significant exponential relationship, with the hourly price prediction error rising from 9.75% for one-hour-ahead forecasts to 20.03% for six-hour-ahead forecasts [22].

A Combinatorial Neural Network (CNN) was presented by Abedinia et al. to predict MCP in the Pennsylvania-New Jersey-Maryland (PJM) and mainland Spain markets. The parameters of the NN in CNN are optimized using the Chemical Reaction Optimization (CRO) technique in this system, and the mean WME is equivalent to 4.04% [5]. Authors

created a novel hybrid method for predicting MCP in the Spanish and Pennsylvania-New Jersey-Maryland energy markets by merging the bat optimization with an NN; numerical results reveal that the average MAPE value is less than 1% [19]. Anbazhagan and Kumarappan suggested a model to predict MCP in the Spanish and New York electrical markets, with improvements in the average MPCE of 0.7 percent and 0.9 percent above the NN method, correspondingly [23].

The above-mentioned NN methods can be subdivided as follows: a heuristic technique which was used to measure the periodicity pattern of MCP in group 1, and another which was used to improve NN performance in group 2. Each of these qualities has been addressed in this article. It is difficult to enhance MCP forecast accuracy because of its inherent stochastic and nonlinear tendencies. As a result, we report a novel hybrid model architecture that combines NN, PSO, and GA algorithms. Rather than using the classic back-propagation method, PSO is used to enhance the learning power of a conventional neural network and optimize the weights of the NN, avoiding a local optimal configuration [19]. The GA is used to optimize the number of hidden layers of the NN [19], because network performance is sensitive to the number of neurons in the hidden layers. Since MCP has a periodicity tendency, the K-means method was used to analyse the NN's training dataset and identify MCP's periodicity trend [24]. One study proposes a machine learning-driven portfolio optimization methodology for virtual bidding in electricity markets that takes both risk and price sensitivity into account; to maximise profit, an algorithmic trading strategy is built from the standpoint of a proprietary trading firm [25]. Another article aims to maximise power market transactions and the clearing price using metaheuristic algorithms; to achieve feasible results in a short amount of time, the exhaustive search algorithm is implemented on a parallel computer architecture, the global optimal outcome is used as a metric to compare the effectiveness of various metaheuristic algorithms, and the results, discussion, comparison, and recommendations for the suggested set of algorithms and performance tests are presented in that work [26].
A further article gives a case study of Pakistan's electricity system, with information on electricity generated, connected load, frequency deviation, and load shedding during the course of a 24-hour period. The data were evaluated using two methods: a traditional artificial neural network (ANN) with a feed-forward back-propagation model, and a Bootstrap aggregating (bagging) approach.

#### *1.2. Major Contributions and Paper Structure*

As previously stated, the following is a summary of our suggested method's contribution:


The arrangement of this manuscript is systematized as follows in Figure 2.

**Figure 2.** Systematized format of manuscript.

#### **2. Problem Formulation**

The following section presents a brief overview of the Gen-Co bidding strategy approach as well as the issue's algebraic formulation. The sub-sections that follow cover everything that was previously mentioned.

#### *2.1. Bidding Strategy*

A generating company has to run at a high degree of efficiency to flourish in fierce competition. However, in the electricity sector, good execution may not be enough since, in order to generate the most profit, it must sell its products at a competitive price. Different factors affect a producing company's profit, including its own bids, bids submitted by competitors, overall energy demand, and so on. Although a generating company has no control over its competitors' bids or the energy demand, it can create its own strategy for putting in a bid that maximizes profit while minimizing risk, as shown in Figure 3.

**Figure 3.** Supplier side's bidding strategy.

#### *2.2. Mathematical Formulation*

The market operator plots an upward-sloping supply curve and a vertical line for consumer demand after accepting offers from the Gen-Cos. The point where the two curves intersect is the equilibrium point, and the horizontal line drawn from the equilibrium point to the y-axis determines the system's MCP.
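The clearing mechanism just described can be sketched in a few lines. This is a hypothetical merit-order illustration (the function name and the sample offers are invented, and demand is assumed inelastic):

```python
def clearing_price(offers, demand):
    """Merit-order clearing (sketch): stack offers in ascending price order
    until cumulative quantity covers demand; the marginal offer sets the MCP."""
    supplied = 0.0
    for price, qty in sorted(offers):        # ascending price (merit order)
        supplied += qty
        if supplied >= demand:
            return price                     # marginal unit's price = MCP
    raise ValueError("demand exceeds total offered capacity")

# made-up offers in ($/MWh, MW) and an inelastic demand of 200 MW
print(clearing_price([(22.0, 100), (18.0, 150), (30.0, 120)], 200))  # -> 22.0
```

The 150 MW offered at $18/MWh is taken first; the remaining 50 MW comes from the $22/MWh offer, which therefore sets the MCP.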

#### 2.2.1. Objective Function

The profit of a Gen-Co is calculated using Equation (1).

$$\text{Gen-Co}_{k,profit} = \text{Revenue} - \text{Gen-Co}_{k,cost} \tag{1}$$

here,

$$Revenue = MCP \times Q\_k \tag{2}$$

$$\text{Gen-Co}_{k,cost} = a_k Q_k + b_k Q_k^2 \tag{3}$$

where *Qk* is the quantity that the *k*th Gen-Co trades in the market, and *ak* and *bk* are its cost coefficients.
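Equations (1)–(3) combine into a one-line profit computation; the sketch below uses made-up coefficients purely for illustration:

```python
def genco_profit(mcp, q, a, b):
    """Profit of the k-th Gen-Co per Equations (1)-(3)."""
    revenue = mcp * q           # Eq. (2): revenue = MCP * Q_k
    cost = a * q + b * q ** 2   # Eq. (3): quadratic cost curve
    return revenue - cost       # Eq. (1)

# made-up coefficients, purely illustrative
print(genco_profit(mcp=25.0, q=100.0, a=10.0, b=0.05))  # -> 1000.0
```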

2.2.2. Operating Constraints

i Generation Limits

$$Q_{\min} U_{k(t)} \le Q_{k(t)} \le Q_{\max} U_{k(t)} \quad \forall t \in T \tag{4}$$

ii Inter-temporal constraints

$$(1 - U_{k(t+1)})M_k^{ut} \le h_{k(t)}^{on} \quad \text{if } U_{k(t)} = 1 \tag{5}$$

$$U_{k(t+1)} M_k^{dt} \le h_{k(t)}^{off} \quad \text{if } U_{k(t)} = 0 \tag{6}$$

iii Limits on bid price

$$C_{\min} \le \text{Gen-Co}_{k,cost} \le C_{\max} \tag{7}$$

#### **3. Proposed Technique**

*3.1. Harris Hawks Optimization Algorithm: A Framework*

Ali Asghar Heidari et al. [11] presented the Harris Hawks Optimization (HHO) metaheuristic algorithm in 2019. Harris hawks exhibit superb social behavior with respect to hunting and attacking the rabbit. Searching for a rabbit, hitting it in various ways, and executing a rapid pounce are all part of the exploitative aspects of the technique. Harris hawks spread to various locations in search of rabbits, using two distinct investigating tactics. Candidate solutions are either the intended prey or close to it, with the targeted prey being the ideal. In the first tactic, Harris hawks perch in spots close to those of other family members and the rabbit (prey). In the second, the hawks perch in tall trees randomly. The HHO then drives an optimization process by mathematically mimicking such beneficial approaches and behaviors.

#### 3.1.1. Initialization Step

At this step, the search space and objective function are defined. Furthermore, the first population is generated based on chaotic maps. In addition, all of the attribute values are specified.

#### 3.1.2. The Step of Exploration

During this step, all Harris hawks are viable candidate solutions. In each cycle, the fitness value is computed for each of these candidates with respect to the desired prey. Two approaches have been introduced to replicate the exploring capabilities of Harris hawks in the search area, as specified in Equation (8).

$$z(i+1) = \begin{cases} z_r(i) - g_1|z_r(i) - 2g_2 z(i)| & q \ge 0.5\\ \left(z_{rabbit}(i) - z_a(i)\right) - g_3(lb + g_4(ub - lb)) & q < 0.5 \end{cases} \tag{8}$$

The hawks' positions within the (ub − lb) borders are based on two precepts: (1) generate solutions based on a random hawk from the current population and the other hawks, and (2) generate solutions based on the prey's position, the average hawk location, and randomly weighted elements. Although g3 is a scale parameter, when the value of g4 approaches one, it helps to boost the unpredictability of the algorithm. This rule adds a randomly scaled move length to lb.

Additional dynamic capabilities to investigate other sections of the feature space are explored with a random scaled component. The average hawk posture (solutions) is stated as follows in Equation (9):

$$z\_a(i) = \frac{1}{n} \sum\_{l=1}^{n} z\_l(i) \tag{9}$$

Rule 1 in Equation (8) is usually applied when a hawk uses the random hawks' information to catch the rabbit. Rule 2 is executed once all hawks have accepted the finest hawk and the optimal option has been picked.
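A minimal sketch of the exploration update of Equations (8) and (9), assuming the random coefficients q and g1–g4 are drawn uniformly from (0, 1); the clipping of out-of-bound hawks back to [lb, ub] is our assumption, since the text does not state how they are handled:

```python
import random

def explore(hawks, rabbit, lb, ub):
    """One exploration cycle per Equations (8) and (9) (sketch)."""
    dim = len(lb)
    z_mean = [sum(h[d] for h in hawks) / len(hawks) for d in range(dim)]  # Eq. (9)
    new_hawks = []
    for z in hawks:
        q = random.random()
        zr = random.choice(hawks)                   # a randomly selected hawk
        new = []
        for d in range(dim):
            g1, g2, g3, g4 = (random.random() for _ in range(4))
            if q >= 0.5:    # rule 1: perch relative to a random hawk
                v = zr[d] - g1 * abs(zr[d] - 2 * g2 * z[d])
            else:           # rule 2: perch relative to the rabbit and the mean
                v = (rabbit[d] - z_mean[d]) - g3 * (lb[d] + g4 * (ub[d] - lb[d]))
            new.append(min(max(v, lb[d]), ub[d]))   # clip to the search bounds
        new_hawks.append(new)
    return new_hawks
```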

#### 3.1.3. From Exploration to Exploitation Transition

This step depicts how HHO progresses from exploration to exploitation based on the rabbit's level of energy (E). According to HHO, the strength of the rabbit is gradually depleted as a result of its escaping behavior. The energy decline is modeled in Figure 4.

**Figure 4.** The behavior of E over time for 250 iterations.

*E*<sup>0</sup> is the initial energy of the prey, and the energy decline is modeled by Equation (10).

$$E = 2E\_0 \left( 1 - \frac{i}{I} \right), \quad E\_0 \in [-1, 1] \tag{10}$$
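Equation (10) is a single line of code; the printed values illustrate how |E| shrinks from the exploration regime toward pure exploitation:

```python
def escaping_energy(e0, i, total_iters):
    """Equation (10): E = 2*E0*(1 - i/I), with E0 in [-1, 1].
    Large |E| keeps HHO exploring; small |E| switches it to exploitation."""
    return 2 * e0 * (1 - i / total_iters)

print(escaping_energy(0.8, 0, 250))    # -> 1.6 (start: exploration)
print(escaping_energy(0.8, 250, 250))  # -> 0.0 (end: pure exploitation)
```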

#### 3.1.4. Step of Exploitation

In this step, the exploitation phase is accomplished by employing four distinct strategies. The position discovered during the exploration stage determines these strategies. Despite the hawks' best efforts to track down and catch the prey, it frequently seeks to flee. To emulate the hawks' offensive style, HHO exploitation employs four basic strategies: soft besiege, soft besiege with progressive speedy dives, hard besiege, and hard besiege with progressive speedy dives. These approaches are contingent on two variables, *r* and |*E*|, which determine the technique to be used, where |*E*| is the prey's escaping energy and *r* is the chance of escaping, with *r* < 0.5 indicating a better likelihood of the prey escaping effectively and *r* ≥ 0.5 representing an unsuccessful escape. The following is an overview of these approaches:

• Soft besiege approach

In the soft besiege method, where *r* ≥ 0.5 and |*E*| ≥ 0.5, the rabbit still has some energy to escape, while the hawks softly encircle the prey to tire it further before performing the surprise pounce. In Equations (11)–(13), soft besiege is described mathematically.

$$z(i+1) = \Delta z(i) - E|j z_{rabbit}(i) - z(i)|\tag{11}$$

$$\Delta z(i) = z_{rabbit}(i) - z(i) \tag{12}$$

$$j = 2(1 - r_5), \quad r_5 \in [0, 1] \tag{13}$$
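The soft besiege update of Equations (11)–(13) can be sketched as follows; the jump strength j is resampled on each call, per Equation (13):

```python
import random

def soft_besiege(z, rabbit, E):
    """Equations (11)-(13): z(i+1) = dz(i) - E*|j*z_rabbit - z(i)|,
    where dz = z_rabbit - z and the jump strength j = 2*(1 - r5)."""
    j = 2 * (1 - random.random())            # Eq. (13), j in (0, 2]
    return [(rabbit[d] - z[d])               # Eq. (12): dz component
            - E * abs(j * rabbit[d] - z[d])  # Eq. (11)
            for d in range(len(z))]
```

With E = 0 (exhausted prey) the hawk lands exactly on the displacement vector toward the rabbit.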

• Hard besiege approach

When *r* ≥ 0.5 and |*E*| < 0.5, the prey is so tired that its escaping energy is very low. The Harris hawks then hardly encircle the intended prey before finally performing the surprise pounce. The current positions are updated in this condition by using Equation (14).

$$z(i+1) = z\_{rabbit}(i) - E|\Delta z(i)|\tag{14}$$

A simple instance of this step with one hawk is represented in Figure 5.

**Figure 5.** In the scenario of a hard besiege, a sample of overall vectors.

• Soft besiege approach with progressive speedy dives

In this circumstance, *r* < 0.5 and |*E*| ≥ 0.5, and the rabbit has enough energy to flee. The hawks move astutely around the prey and calmly plunge before the surprise pounce. The Harris hawk's position is refreshed in two stages throughout this action, which is referred to as adaptive soft besiege. In the initial stage, the Harris hawks advance towards the rabbit by assessing its next move, as shown by Equation (15).

$$y = z\_{rabbit}(i) - E|j z\_{rabbit}(i) - z(i)|\tag{15}$$

In the subsequent stage, the Harris hawks decide whether to dive, based on a comparison between the previous dive and the possible outcome. If the dive is not beneficial, the Harris hawk performs an irregular dive based on the Levy Flight (LF) concept, as detailed in Equation (16).

$$x = y + rv \times lf(Dim) \tag{16}$$

In Equation (16), *Dim* represents the dimension of the solutions, *rv* is a random vector of size 1 × *Dim*, and *lf* is the Levy flight function, calculated using Equation (17).

$$lf(z) = 0.001 \times \frac{u \times \sigma}{|v|^{\frac{1}{\beta}}}, \quad \sigma = \left(\frac{\Gamma(1+\beta) \times \sin\left(\frac{\pi\beta}{2}\right)}{\Gamma\left(\frac{1+\beta}{2}\right) \times \beta \times 2^{\left(\frac{\beta-1}{2}\right)}}\right)^{\frac{1}{\beta}} \tag{17}$$

where *u* and *v* are random values inside (0, 1) and *β* is a default constant set to 1.5. In the soft besiege stage, Equation (11) can still update the positions of the hawks in the final strategy.

$$z(i+1) = \begin{cases} y & \text{if } f(y) < f(z(i)) \\ x & \text{if } f(x) < f(z(i)) \end{cases} \tag{18}$$

An example of this process for one hawk is shown in Figure 6. During some iterations, the position history of the LF-based dive trends is also documented and shown in this illustration. There is an LF-based trend in one trial, and the colored dots are the location footprints. Only the better position, Y or Z, can be picked at every step; the same applies to all of the search agents.

**Figure 6.** In the scenario of a soft besiege with progressive speedy dives, a sample of overall vectors.
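A sketch of the Levy-flight dive of Equations (16) and (17). One assumption to flag: following the common Levy-flight convention, u is drawn here from a zero-mean Gaussian with standard deviation σ and v from a standard Gaussian, although the text states both lie in (0, 1); the 0.001 scale is taken from Equation (17):

```python
import math, random

def levy(dim, beta=1.5):
    """Levy-flight step per Equation (17), 0.001 scale as in the text.
    Assumption: u ~ N(0, sigma^2) and v ~ N(0, 1), the usual convention."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2) /
             (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    return [0.001 * random.gauss(0, sigma) / abs(random.gauss(0, 1)) ** (1 / beta)
            for _ in range(dim)]

def rapid_dive(y, dim):
    """Equation (16): x = y + rv * lf(Dim), with rv a random vector."""
    lf = levy(dim)
    return [y[d] + random.random() * lf[d] for d in range(dim)]
```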

• Hard besiege with progressive speedy dives

The rabbit does not have enough energy to escape when *r* < 0.5 and |*E*| < 0.5, and before the surprise pounce to capture and kill the prey, a hard besiege is constructed, as shown in Figure 7.

There are similarities between this step and the soft besiege, but this time the hawks try to reduce their average distance from the escaping prey. Under hard besiege, the update therefore follows Equation (19):

$$z(i+1) = \begin{cases} y & \text{if } f(y) < f(z(i)) \\ x & \text{if } f(x) < f(z(i)) \end{cases} \tag{19}$$

where, *y* and *x* are calculated using Equations (20) and (21).

$$y = z\_{rabbit}(i) - E|jz\_{rabbit}(i) - z\_{a}(i)|\tag{20}$$

$$x = y + rv \times lf(Dim) \tag{21}$$

**Figure 7.** In the circumstance of a hard besiege with progressive speedy dives in 2D and 3D dimension, an example of overall vectors.
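The greedy selection shared by Equations (18) and (19) keeps a dive only if it improves the fitness; the toy fitness (squared distance from the origin) below is used purely for illustration:

```python
def greedy_select(z, y, x, f):
    """Equations (18)-(19): accept the first dive y if it improves on the
    current position z (lower fitness is better); otherwise try the
    Levy-based dive x; otherwise stay where we are."""
    if f(y) < f(z):
        return y
    if f(x) < f(z):
        return x
    return z

# toy fitness: squared distance from the origin (illustrative only)
f = lambda p: sum(c * c for c in p)
print(greedy_select([2.0, 2.0], [1.0, 1.0], [3.0, 3.0], f))  # -> [1.0, 1.0]
```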

The procedure of the HHO algorithm is presented in Figure 8.

**Figure 8.** The Procedure of HHO.

#### **4. Deep Neural Network**

Deep neural networks form a machine learning discipline that focuses on learning several levels of representation by creating a hierarchy of features in which the top levels are derived from the lower tiers, and the same lower-tier features can be used to construct many top-level features [27]. The relation between artificial intelligence, machine learning and deep learning is presented in Figure 9.

**Figure 9.** Relation between Artificial Intelligence, Machine Learning and Deep Learning.

To describe more complicated and nonlinear relationships, the DL (Deep Learning) structure extends classic neural networks (NNs) by adding more hidden layers to the network design between the input and output layers. This approach has piqued the interest of academics in recent years due to its superior performance in a variety of EM applications [28–30]. Convolutional Neural Networks (C-NNs) have become a popular DL design in recent years because they can perform sophisticated functions using convolution filters. Another DL design commonly and successfully used for classification or regression in many areas is the Deep Neural Network (DNN). It is a common feed-forward network in which the input passes from the input layer to the output layer via a number of hidden layers exceeding two [30]. The usual design for DNNs is shown in Figure 10, where Ni is the input layer, which contains neurons for input features, No is the output layer, which contains neurons for output classes, and Nh,l are the hidden layers.

**Figure 10.** Three Layered Neural Network.
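The feed-forward pass through such a DNN can be sketched in pure Python; the tanh hidden activation and the linear output layer are our assumptions, since the text does not fix the activation functions:

```python
import math

def forward(x, weights, biases):
    """Feed-forward pass of a fully connected DNN: tanh hidden layers,
    linear output layer (assumed; suitable for regression)."""
    a = x
    last = len(weights) - 1
    for l, (W, b) in enumerate(zip(weights, biases)):
        # one dense layer: z = W a + b, row by row
        z = [sum(w * ai for w, ai in zip(row, a)) + bl for row, bl in zip(W, b)]
        a = z if l == last else [math.tanh(v) for v in z]
    return a

# a 2-2-1 network with identity-like weights, purely illustrative
W = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0]]]
b = [[0.0, 0.0], [0.0]]
print(forward([1.0, 0.0], W, b))  # -> [tanh(1.0)] ~ [0.7616]
```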

#### **5. Harris Hawk Optimization of Deep Neural Networks Architecture**

Figure 11 shows the general framework of the suggested model for estimating the optimal bids in the EM. The suggested model's main goal is to improve the performance of the NN by applying the HHO algorithm to discover the ideal NN weights, hence the name HHO-NN. The suggested HHO-NN begins by determining the initial values for a group of N individuals X. Each individual represents a set of NN weights; thus we have a collection of N networks from which to choose the best. The data set is randomly divided into training and testing sets of 70% and 30%, respectively.

**Figure 11.** The proposed HHO-NN method.

The training data set is used to analyse the effectiveness of the existing network (solution) by computing the corresponding objective function, which depends on the original value *yi* and the forecast value *y*ˆ.

$$fit = \sqrt{\frac{\sum_{l=1}^{ns} (y_l - \hat{y}_l)^2}{ns}} \tag{22}$$

The next phase is to locate the network, Yb, with the lowest fitness value. Then, using the optimal solution and the HHO's operators, the other solutions are modified. The process of updating solutions and determining the best option completes when the stopping conditions are achieved. The test set is then used to determine the quality of the output, evaluating the performance of the best network developed during the training phase.
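The training loop just described can be summarized as a skeleton; `predict` and `step` are hypothetical hooks (the latter standing in for the full HHO position-update rules of Section 3), and the fitness is the training RMSE of Equation (22):

```python
import math, random

def fitness(w, data, predict):
    """Equation (22), read as the RMSE over the training set."""
    err = [(y - predict(w, x)) ** 2 for x, y in data]
    return math.sqrt(sum(err) / len(err))

def hho_nn(data, predict, dim, pop=50, iters=500, lb=-1.0, ub=1.0, step=None):
    """HHO-NN skeleton: each individual is a flat weight vector; the
    network with the lowest training RMSE is retained as the best.
    `step` is a hypothetical hook standing in for the HHO updates."""
    X = [[random.uniform(lb, ub) for _ in range(dim)] for _ in range(pop)]
    best = min(X, key=lambda w: fitness(w, data, predict))
    for i in range(iters):
        if step is not None:                  # HHO position updates
            X = [step(x, best, i, iters) for x in X]
        cand = min(X, key=lambda w: fitness(w, data, predict))
        if fitness(cand, data, predict) < fitness(best, data, predict):
            best = cand
    return best
```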

#### **6. Simulation Results and Experimentation**

The algorithm is written in MATLAB 2019 and operates on a 4.00 GHz i5 processor with 8 GB of RAM. The number of iterations and the population size are kept constant across all algorithms in order to provide a fair evaluation of the optimization routines (i.e., maximum number of iterations = 500 and number of search agents = 50).

To test the forecasting potential of the ANN version, several criteria are used. This potential can be checked after the MCP is calculated. Four types of error are checked: the root mean squared error (RMSE), the mean absolute error (MAE), the mean absolute percentage error (MAPE) and the coefficient of correlation (CC). The performance results of these tests are shown in Table 1. Regression verification of the proposed algorithm was also undertaken to check its validity.

• Mean Absolute Error (MAE) The MAE is the average of the absolute values of the forecasting errors and is calculated with Equation (23).

$$MAE = \frac{1}{1000} \sum\_{i=1}^{1000} \left| f\_i - \hat{f}\_i \right| \tag{23}$$

• Mean Absolute Percentage Error (MAPE) The mean absolute percentage error is usually taken as a loss function for regression problems and in model evaluation because of its very intuitive interpretation in terms of relative error, as shown in Equation (24).

$$MAPE = \frac{1}{1000} \sum\_{i=1}^{1000} \left| \frac{f\_i - \hat{f\_i}}{f\_i} \right| \times 100 \tag{24}$$

• Root Mean Square Error (RMSE) The RMSE is defined as the square root of the second sample moment of the differences between predicted and actual data. RMSE is shown in Equation (25).

$$RMSE = \sqrt{\frac{1}{1000} \sum\_{i=1}^{1000} \left( f\_i - \hat{f\_i} \right)^2} \tag{25}$$

• Coefficient of Correlation (CC) The CC measures the linear association between the forecast and actual values and is calculated with Equation (26):
	- **–** A strong positive association is indicated by a value of one.
	- **–** A negative association is indicated by a value of −1.
	- **–** A zero means that there is no connection at all.

$$\text{CC} = \frac{\sum\_{i=1}^{1000} (\hat{f}\_i - \overline{\hat{f}}\_i)(f\_i - \overline{f}\_i)}{\sqrt{\sum\_{i=1}^{1000} (\hat{f}\_i - \overline{\hat{f}}\_i)^2 \sum\_{i=1}^{1000} (f\_i - \overline{f}\_i)^2}} \tag{26}$$
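The four metrics of Equations (23)–(26) can be collected in one helper; the equations fix the sample count at 1000, while the sketch below accepts any length:

```python
import math

def metrics(actual, forecast):
    """MAE, MAPE, RMSE and CC per Equations (23)-(26)."""
    n = len(actual)
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / n                # Eq. (23)
    mape = 100 / n * sum(abs((a - f) / a) for a, f in zip(actual, forecast))   # Eq. (24)
    rmse = math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n)  # Eq. (25)
    ma, mf = sum(actual) / n, sum(forecast) / n
    num = sum((f - mf) * (a - ma) for a, f in zip(actual, forecast))
    den = math.sqrt(sum((f - mf) ** 2 for f in forecast) *
                    sum((a - ma) ** 2 for a in actual))
    return mae, mape, rmse, num / den                                          # Eq. (26)

# made-up actual/forecast values, purely illustrative
print(metrics([10.0, 20.0, 30.0], [12.0, 18.0, 33.0]))
```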



**Table 1.** Performance Test Results.

**Figure 12.** Regression Verification.

#### **7. Application of HHO-NN on Optimal Bidding Challenge of Electricity Market**

The IEEE-14 bus test system is taken, where three competing generating companies compete with Gen-Co-G. The competition is for selling power in the EM, and the bidding strategy is designed for optimum output. Table 2 shows the bid data of competitor prices and Table 3 shows the power-block data of Gen-Co-G [30].

For constructing the neural network, we generated 1000 samples to prepare the HHO-NN; of these data, 70% were used for training and the remaining data were kept for validation and testing purposes. We report the results of some unknown samples in this analysis. A total of ten unknown samples were taken, and the analysis of these samples is depicted in the following figures. Figures 13 and 14 show the input and target data used to train the NN. The input data to train the NN for the strategic bidding problem in the EM are the competitors' bidding data, and the target data are the profit of Gen-Co-G for the same inputs.


**Table 2.** Rival's Bidding Data for IEEE-14 Bus System [30].

**Table 3.** Power Blocks data of Gen-Co-G for IEEE-14 Bus System [30].


**Figure 13.** Input data for training the Neural Network.

Figure 15 represents the input data for testing the trained NN for the specific problem of attaining the optimal bids and optimal profit of the Gen-Co-G in the EM.

Figure 16 represents the profit curves obtained by the selected algorithms. From the figure, it can be observed that the proposed supervised network yields the maximum profit as compared to the other Monte Carlo-based optimization approaches. The cumulative profit calculated by this architecture is \$153,275, whereas the profit calculated by HHO is \$138,758.75, by ALO-NN \$128,543.42, by GWO-NN \$117,641, by MFO-NN \$121,051, by ALO \$80,176.20, by GWO \$126,070.07, by SSA \$86,375.01 and by WOA \$119,826.25. This is due to the better anticipation of market conditions by the HHO-tuned neural network.

**Figure 14.** Target data for training the Neural Network.

**Figure 15.** Input data for testing the Trained Neural Network.

**Figure 16.** Comparative Analysis Cumulative Profit.

#### **8. Conclusions and Future Scope**

The proposed Harris Hawk Optimization-based Deep Neural Network architecture is used to investigate the optimum bidding strategy problem in the power market. Using the expected load and competitors' bidding data, the planned architecture determines the best bidding technique for maximising the profit. The proposed methodology is examined for various power demand values, provider-end income, customer-end profit, and MCP values. The IEEE-14 bus system is used to assess the viability of the suggested technique in the MATLAB/Simulink platform. The proposed approach displays excellent productivity in comparative examination with alternate techniques such as ALO-NN, GWO-NN, MFO-NN, ALO, GWO, SSA, WOA and standard HHO.

The proposed calculations have the advantages of reduced computational complexity and good accuracy in extrapolating subjective data. Furthermore, the proposed work's statistical measurements such as mean and standard deviation, as well as performance metrics such as best, worst, average, and computational time, were validated. It was demonstrated that the proposed methodology outperformed other strategies in terms of statistical measures when compared to methodologies for comparing results.

Furthermore, the investigations on the bigger network with multiple players and more constraints pertaining to generation, transmission limits, transmission congestion and consumer side bidding, will be addressed in our future publications.

**Author Contributions:** Conceptualization, K.J., A.S. and M.B.J. writing—original draft preparation: K.J. and A.S.; Data curation, K.J. and A.S.; Funding acquisition, M.H.; Investigation, K.J., A.S., M.B.J., M.H. and A.W.M.; Methodology, A.S.; Project administration, Resources, Supervision, A.S., M.B.J., M.H. and A.W.M.; Writing—review & editing, K.J., A.S., M.B.J. and A.W.M. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research is funded by the UMS publication grant scheme.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**



#### **References**


**Feifei Hou, Xu Liu, Xinyu Fan and Ying Guo \***

School of Automation, Central South University, Changsha 410083, China **\*** Correspondence: yingguo@csu.edu.cn

**Abstract:** Cavities under urban roads have increasingly become a huge threat to traffic safety. This paper aims to study cavity morphology characteristics and proposes a deep learning (DL)-based morphology classification method using 3D ground-penetrating radar (GPR) data. Fine-tuning technology in DL can be used in some cases with relatively few samples, but in the case of only one or very few samples, overfitting problems persist. To address this issue, a simple and general framework, few-shot learning (FSL), is employed for the cavity classification task, in which a classifier learns to identify new classes given only very few examples. We adopt a relation network (RelationNet) as the FSL framework, which consists of an embedding module and a relation module. Furthermore, the proposed method is simpler and faster because it does not require pre-training or fine-tuning. The experimental results are validated using 3D GPR road modeling data obtained from the gprMax3D system. The proposed method is compared with other FSL networks such as ProtoNet, R2D2, and BaseLine on different benchmarks. The experimental results demonstrate that this method outperforms prior approaches, with an average accuracy reaching 97.328% in a *four-way five-shot* problem using few support samples.

**Keywords:** ground-penetrating radar (GPR); cavity morphology recognition; few-shot learning (FSL); deep learning (DL); relation network (RelationNet)

**MSC:** 86-08

#### **1. Introduction**

Urban areas around the world continue to experience a series of sudden sinkhole collapses that cause severe traffic disruptions and significant economic losses. Underground cavities are the main reason for the formation of sinkholes. Complex conditions such as changes in drainage patterns, excessive pavement loads, and disturbances in infrastructure construction often lead to various cavities [1]. Therefore, the cavities may vary in morphology, for example, cavities with different shapes or combinations of several basic shapes, or they may be filled with different media or have different positions and sizes. Due to the unpredictability and morphological complexity of cavities, there is a growing need for their early recognition.

**Citation:** Hou, F.; Liu, X.; Fan, X.; Guo, Y. DL-Aided Underground Cavity Morphology Recognition Based on 3D GPR Data. *Mathematics* **2022**, *10*, 2806. https://doi.org/10.3390/math10152806

Academic Editor: Andrej Brodnik

Received: 27 June 2022; Accepted: 1 August 2022; Published: 8 August 2022

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Ground-penetrating radar (GPR) has gradually been applied to the detection and perception of underground cavities [2], owing to its nondestructive inspection, strong penetrating ability, and high precision. GPR transmitters emit electromagnetic (EM) waves into the surface at multiple spatial positions, and the reflected signal is measured by the GPR receiver to form a two-dimensional (2D) GPR image. Three-dimensional (3D) GPR images can be obtained immediately when multichannel GPR transmitters and receivers operate in parallel along the scanning direction at the same time [3,4]. The morphological scale of a cavity can reflect the evolution speed of the cavity and the severity of future road collapse, and it can also accurately reflect the 3D spatial state of the cavity. Therefore, the accurate detection of morphology has scientific value for studying the mechanism of cavity formation and summarizing the corresponding prevention and repair methods. However, as the quantity of 3D GPR data increases, the manual analysis of GPR data becomes time-consuming and can hardly meet the requirements of efficient and fine detection of cavity morphology.

Three challenges in automating this task cannot be ignored. The first challenge is the selection and extraction of morphological features. Environmental complexities, such as interference from surrounding pipelines and groundwater leakage, make it difficult to describe cavity morphological features. It is impossible to comprehensively describe the reflection properties with one or a few features. Additionally, the acquired morphological attributes still need to be inferred and identified by experienced professionals, making it difficult to obtain a general descriptive feature. The second challenge is the fine classification of cavity morphology in the GPR data. Due to the variety of types, varying sizes, distinct directions and extensions, and irregular shapes, cavity morphology analysis faces difficulties in identification and fine classification. The third challenge is to address the issue of insufficiently labeled GPR data. Compared with objects such as buried rebars and pipes, the scale and diameter distribution of a cavity posing a collapse threat is relatively large. It is very laborious to fabricate such a large target in the lab, and lab-made targets are generally of regular forms that are neither universal nor representative. To obtain reliable results, the lack of sample data must be considered and addressed.

Therefore, in this study, a novel deep learning (DL)-aided framework is proposed to extract the morphological features of cavities and classify them when only a small amount of 3D GPR data is available. Figure 1 shows the details of the proposed framework. The contribution of this work is twofold:


**Figure 1.** GPR cavity morphology recognition framework.

The rest of this paper is organized as follows: Section 2 introduces the literature review of the GPR cavity detection. In Section 3, the imaging scheme of the 3D GPR data is proposed. Section 4 introduces the details of the FSL network and RelationNet structure for morphological classification. In Section 5, the experimental results are compared and analyzed, and finally, in Section 6, conclusions are drawn.

#### **2. Literature Review**

Previous studies have focused on the automated GPR cavity detection process. Qin et al. [5] proposed a pattern recognition method based on the support vector machine (SVM) classifier to identify cavities in GPR images. Park et al. [6] combined instantaneous phase analysis with the GPR technique to identify hidden cavities. Hong et al. [7] developed a new time-domain-reflectometry-based penetrometer system to accurately estimate the relative permittivity at different depths and estimate the state of a cavity. Yang et al. [8] constructed a horizontal filter to identify cavity disease and eliminate the interference of rebar echo. Based on the data collected by multisensors such as unmanned aerial vehicles (UAVs) and GPR, the authors of [9,10] detected and analyzed cavity diseases in disaster-stricken areas to rescue potential victims trapped in cavities. In 2022, Rasol et al. [11] reviewed state-of-the-art processing techniques such as machine learning and intelligent data analysis methods, as well as their applications and challenges in GPR road pavement diagnosis. To better localize pavement cracks and overcome the interference of various factors in the on-site scene, Liu et al. [12] integrated a ResNet50vd-deformable convolution backbone into YOLOv3, along with a hyperparameter optimization method. To detect subsurface road voids, Yamaguchi et al. [13] constructed a 3D CNN to extract hyperbolic reflection characteristics from GPR images.

Previous results were based on the processing of only B-scans; however, once faced with specific subsurface objects, it was difficult to classify them using B-scans alone. In particular, the characteristics of various cavities in GPR B-scan images tended to be similar. Therefore, to improve the classification performance, both the GPR B-scan and C-scan images were considered in the classification process using the DL network [14–17]. Compared with the 2D GPR data, 3D data can provide rich spatial information and greatly improve the process in terms of data volume, imaging methods, and disease detection accuracy. Luo et al. [18] established a cavity pattern database including C-scans and B-scans, where the C-scan provides location information of objects, and B-scan information assists in verifying object types. Kim et al. [19] proposed a triplanar convolutional neural network (CNN) for processing the 3D GPR data, enabling automated underground object classification. Kang et al. [20] designed the UcNet framework to reduce the misclassification of cavities, and the next year, Kang et al. [21] developed a transfer-enhanced CNN to improve the classification accuracy. Khudoyarov et al. [22] proposed a 3D CNN architecture to process the 3D GPR data. The authors of another study [23] visualized and distinguished underground hidden cavities from other objects (such as buried pipes, and manholes). In 2021, Kim et al. [24] used the AlexNet network with the transfer learning technology to achieve underground object classification, further improving detection accuracy and speed. Abhinaya et al. [25] detected cavities around sewers using in-pipe GPR equipment and confirmed that YOLOv3 [26] was suitable for cavity recognition tasks. Liu et al. [27] combined the YOLO model and the information embedded in 3D GPR images to address the recognition issue of road defects. 
The above research demonstrated that, compared with using only B-scan images, the developed CNNs using both the B-scans and C-scans improved the classification performance. However, it was found that the cavity morphology is still indistinguishable due to the difficulty of cavity data acquisition and the lack of a GPR database.

Faced with such a problem, FSL [28,29], as a novel DL technique, was developed to generalize a network from very few training samples per class. This changes the situation in which traditional DL models require large quantities of labeled data. FSL can be divided into three categories: model-based, optimization-based, and metric-based learning methods [30]. The model-based learning method first designs the model structure, then uses the designed model to quickly update parameters on a small number of samples, and finally directly establishes the mapping function between the input and prediction values. Santoro et al. [31] proposed the use of memory augmentation to solve this task and a memory-based neural network approach to adjust bias through weight updates. Munkhdalai et al. [32] proposed a meta-learning network whose fast generalization ability derives from its "fast weight" mechanism, in which the gradients generated during training are used for fast weight generation. The optimization-based learning method completes the task of small-sample classification by adjusting the optimization method instead of using the conventional gradient descent method. Based on the fact that gradient-based optimization algorithms do not work well with a small quantity of data, Ravi et al. [33] studied an updated function or rule for model parameters. The method proposed by Finn et al. [34] can deal with situations with a small number of samples and can obtain better model generalization performance with only a small number of training iterations. The main advantages of this method are that it does not depend on the model form, nor does it need to add new parameters to the meta-learning network. The metric-based learning method measures the distance/similarity between the training set and the support set and completes the classification with the help of the nearest neighbor method. Vinyals et al. [35] proposed a new matching network, which builds different encoders for the support set and the batch set, respectively. Sung et al. [36] proposed the RelationNet network to model the measurement method, which learns a distance metric by training a CNN.

#### **3. Imaging Scheme of 3D GPR Data**

#### *3.1. The 3D GPR Data Format*

The GPR data comes in three forms: A-, B-, and C-scan. The transmitter radiates EM waves to the underground, and the receiver collects signals reflected by underground objects or stratum interfaces, so as to obtain underground information. The original data format of the reflected signal is a one-dimensional (1D) waveform, which can also be called a GPR A-scan waveform. By scanning a region of interest with the single-channel GPR system, a 2D radargram can be obtained, called a GPR B-scan image. A single C-scan image is formed by imaging data points at the same depth in multiple B-scan images.

As shown in Figure 2, the *x*-axis is the same as the scanning direction, the *y*-axis denotes the width direction of the radar antenna device, and the *z*-axis indicates the depth direction of the measured object. The 3D GPR data can be represented as two kinds of orthogonal planes: B-scan and C-scan. B-scans and C-scans are arbitrary sections perpendicular to the *y*- and *z*-axes, respectively. This is conducive to identifying the cavity morphology from multiple perspectives, which can effectively avoid the misjudgment of a single angle in a 2D image. Therefore, 3D GPR can guarantee the accuracy of the detected type and reduce the number of core sampling verifications.
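The relationship between A-, B-, and C-scans can be pictured as slices of a single data cube. The minimal NumPy sketch below (with an arbitrary random volume and illustrative axis sizes, not real GPR data) shows how each scan type is obtained by fixing one or two axes:

```python
import numpy as np

# Hypothetical 3D GPR volume: axes (x: scan direction, y: antenna width, z: depth/time).
rng = np.random.default_rng(0)
volume = rng.normal(size=(400, 8, 256))  # (nx, ny, nz), illustrative sizes only

# An A-scan is one reflected waveform at a fixed (x, y) position.
a_scan = volume[100, 3, :]   # shape (256,)

# A B-scan is a vertical section perpendicular to the y-axis (length-depth plane).
b_scan = volume[:, 3, :]     # shape (400, 256)

# A C-scan is a horizontal section perpendicular to the z-axis (length-width plane).
c_scan = volume[:, :, 128]   # shape (400, 8)

print(a_scan.shape, b_scan.shape, c_scan.shape)
```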

**Figure 2.** Orthogonal slice planes (B-, C-scan) of 3D GPR data.

#### *3.2. GPR Morphological Data Extraction*

The hyperbolic signature in B-scan images is a kind of typical characteristic that is often used in the detection of underground objects. The C-scan image can reflect the detailed shape information of the subsurface cavity. The B-scan and C-scan images contain morphological information in length–depth and length–width directions, respectively. The color on scan images reflects the field strength (V/m) of subsurface media. In this study, we propose an automatic algorithm for cavity morphology classification using GPR B-scan and C-scan images simultaneously.

The two forms are extracted from the 3D GPR data *S*. Take the irregular cavity as an example: (1) B-scan images are sequentially extracted parallel to the *XOZ* plane, and their stacked display is shown in Figure 3a. They can be expressed as *S*1 = {*B*1, *B*2, *B*3, ··· , *Bn*}, *n* = *ly*/Δ*y*, where *Bn* represents each slice image, *S*1 is a collection of *n* B-scan slices, and *ly* and Δ*y* are the model space length and spatial step size in the *y*-axis direction. (2) C-scan images are sequentially extracted parallel to the *XOY* plane. The horizontal slices are extracted at equal intervals, and their stacked display is shown in Figure 3b. The slices can be expressed as *S*2 = {*C*1, *C*2, *C*3, ··· , *Cm*}, *m* = *lz*/Δ*z*, where *Cm* represents each C-scan image, *S*2 is the set of *m* C-scan slices, and *lz* and Δ*z* are the model space length and spatial step size in the *z*-axis direction.
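This slicing scheme can be sketched as follows. The lengths and step sizes below are assumed values chosen only so that *n* = 8 B-scans and *m* = 12 C-scans result, as used later in Section 3.3; the data cube is random, not real GPR output:

```python
import numpy as np

l_y, dy = 1.2, 0.15   # model length and slice step along y (assumed values)
l_z, dz = 1.2, 0.10   # model length and slice step along z (assumed values)
n = round(l_y / dy)   # number of B-scan slices, n = l_y / dy
m = round(l_z / dz)   # number of C-scan slices, m = l_z / dz

rng = np.random.default_rng(1)
S = rng.normal(size=(400, n, m))     # 3D GPR data cube with axes (x, y, z)

S1 = [S[:, i, :] for i in range(n)]  # B-scans {B_1, ..., B_n}, parallel to XOZ
S2 = [S[:, :, j] for j in range(m)]  # C-scans {C_1, ..., C_m}, parallel to XOY
print(n, m, len(S1), len(S2))        # → 8 12 8 12
```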

The typical GPR B-scan and the corresponding C-scan images of an irregular cavity are shown in Figures 4 and 5. The C-scan images (right) are intersectional layers of red lines on the B-scan images (left). In the B-scan image, the horizontal *x*-axis and the vertical *t*-axis indicate the GPR scanning trajectory (m) and the two-way travel time of EM wave (ns), respectively. In the C-scan image, the *x*- and *y*-axes indicate the GPR scanning trajectory (m) and the width of GPR equipment (m), respectively. As shown in Figure 4, the irregular cavity shows a hyperbolic pattern on the B-scan image and elliptical signatures on the C-scan image. As shown in Figure 5, the rectangular cavity shows a double hyperbolic signature on the B-scan image and quadrilateral signatures (Figure 5b) on the C-scan images. Therefore, different morphologies of cavities can be well-recognized using B-scan and C-scan images.

**Figure 4.** Typical GPR images of an irregular cavity: (**a**) B-scan image; (**b**) C-scan images.

**Figure 5.** Typical GPR images of a rectangular cavity: (**a**) B-scan image; (**b**) C-scan images.

#### *3.3. The 2D Morphological Image Generation*

Multiple B-scan and C-scan images are integrated into a 2D morphological map, where the B-scan and C-scan images are assigned to the upper and lower parts of the map, respectively. Each cavity model corresponds to a 2D morphological image *I* = {*S*1, *S*2}, which consists of 8 B-scan images *S*1 = {*B*1, *B*2, *B*3, *B*4, *B*5, *B*6, *B*7, *B*8} and 12 C-scan images *S*2 = {*C*1, *C*2, *C*3, *C*4, *C*5, *C*6, *C*7, *C*8, *C*9, *C*10, *C*11, *C*12}. The morphological image *I*, composed of *S*1 and *S*2, is formed into a new 5 × 4 matrix and can be expressed as Equation (1). Examples of the morphological images for each model are presented in Figure 6. The 2D morphological image is then used as the input of the following deep network.

$$I = \begin{Bmatrix} B_1 & B_2 & B_3 & B_4 \\ B_5 & B_6 & B_7 & B_8 \\ C_1 & C_2 & C_3 & C_4 \\ C_5 & C_6 & C_7 & C_8 \\ C_9 & C_{10} & C_{11} & C_{12} \end{Bmatrix} \tag{1}$$
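Assembling Equation (1) amounts to tiling the 20 slice images row by row, B-scans on top and C-scans below. A minimal sketch (the per-slice size and contents are arbitrary placeholders):

```python
import numpy as np

h, w = 64, 96   # per-slice image size (illustrative only)
rng = np.random.default_rng(2)
S1 = [rng.normal(size=(h, w)) for _ in range(8)]   # 8 B-scan images
S2 = [rng.normal(size=(h, w)) for _ in range(12)]  # 12 C-scan images

tiles = S1 + S2  # row-major order of Equation (1): two B-scan rows, three C-scan rows
rows = [np.hstack(tiles[r * 4:(r + 1) * 4]) for r in range(5)]
I = np.vstack(rows)  # 5 x 4 mosaic used as the network input
print(I.shape)       # → (320, 384), i.e., (5*h, 4*w)
```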

**Figure 6.** The 2D GPR morphological images of (**a**) a spherical cavity, (**b**) a rectangular cavity, (**c**) a cylindrical cavity, and (**d**) an irregular cavity.

#### **4. Few-Shot Learning Designed for Morphology Classification**

#### *4.1. FSL Definition*

FSL is able to quickly identify new classes on very few samples. It is generally divided into three kinds of datasets: training set, support set, and testing set. The training set can be further divided into a sample set and a query set. If the support set contains *K* labeled examples for each of *C* unique classes, the target few-shot problem is called *C-way K-shot*. Figure 7 shows the FSL architecture for a *four-way one-shot* problem.

**Figure 7.** FSL architecture for a *four-way one-shot* problem with one query example.

The parameters *θ* are optimized on the training set, hyperparameters are tuned using the support set, and finally, the performance of the function *f*(*x*, *θ*) is evaluated on the test set. Each sample *x̂* is assigned a class label *ŷ*. The data structure in the training phase is constructed to be similar to that in the testing phase; that is, the sample set *S* and query set *Q* during training simulate the support set and testing set at testing time. The sample set *S* = {(*x*1, *y*1), (*x*2, *y*2), . . . , (*xm*, *ym*)} is built by randomly picking *C* classes from the training set with *K* labeled samples each, and the rest of these samples are used in the query set *Q* = {(*x*1, *y*1), (*x*2, *y*2), . . . , (*xn*, *yn*)}.
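The episodic construction described above can be sketched as follows; the helper `sample_episode` and the toy dataset are illustrative inventions, not part of any FSL library:

```python
import random

def sample_episode(dataset, C=4, K=1, Q=16):
    """Sample one C-way K-shot episode from a {class: [examples]} dict.

    Returns a sample (support) set with K examples per class and a query
    set with up to Q examples per class, mirroring the test-time setup.
    """
    classes = random.sample(sorted(dataset), C)
    sample_set, query_set = [], []
    for c in classes:
        examples = random.sample(dataset[c], len(dataset[c]))  # shuffled copy
        sample_set += [(x, c) for x in examples[:K]]
        query_set += [(x, c) for x in examples[K:K + Q]]
    return sample_set, query_set

# Toy dataset: 4 cavity classes with 17 images each (counts as in Section 5.1).
data = {c: [f"{c}_{i}" for i in range(17)] for c in
        ["spherical", "rectangular", "cylindrical", "irregular"]}
s, q = sample_episode(data, C=4, K=5, Q=12)
print(len(s), len(q))  # → 20 48
```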

#### *4.2. Relation Network Architecture and Relation Score Computation*

The relation network (RelationNet) is a typical metric-learning-based FSL method. In essence, metric-learning-based methods [36–41] compare the similarities between query images and support classes in a feed-forward pass, trained through an episodic training mechanism [35]. The core of RelationNet is to learn a nonlinear metric through a deep CNN rather than selecting a fixed metric function. RelationNet is a two-branch architecture that includes an embedding module and a relation module. The embedding module is used to extract image features. The relation module obtains the correlation score between query images and sample images; that is, it measures their similarity, so as to realize the recognition task with a small number of samples.

Figure 8 represents the RelationNet architecture settings for FSL. The embedding module utilizes four convolutional blocks, and each convolutional block consists of a 64-filter 3 × 3 convolution, a batch normalization, and a ReLU nonlinearity layer. In addition to the above, the first two convolutional blocks also include a 2 × 2 max-pooling layer, and the latter two convolutional blocks do not contain the pooling layer. The output feature maps are then obtained for the following convolutional layers in the relation module. The relation module consists of two convolutional blocks and two fully connected layers. Each convolutional block is a 3 × 3 convolution containing 64 filters, followed by batch normalization, ReLU nonlinearity, and 2 × 2 max-pooling. For the network architectures, in order to generate relation scores within a reasonable range, in all fully connected layers, ReLU functions are employed, except for the output layer, in which Sigmoid is used.

**Figure 8.** RelationNet architecture settings.

The prior few-shot works use fixed pre-specified distance metrics, such as the Euclidean or cosine distance, to perform classification [35,42]. Compared with these fixed metrics, RelationNet can be viewed as a metric capable of learning deep embeddings and deep nonlinearities. By learning the similarity with a flexible function approximator, RelationNet can better identify matching/mismatching pairs. Sample *xj* in the query set *Q* and sample *xi* in the sample set *S* are fed through the embedding module *fφ* to produce the feature maps *fφ*(*xi*) and *fφ*(*xj*), respectively. Then, these two feature maps are combined using the concatenation operator C(*fφ*(*xi*), *fφ*(*xj*)). After that, the combined feature map is fed into the relation module *gϕ*, which finally produces a scalar in the range of 0–1 representing the similarity between *xi* and *xj*, also called the relation score. Thus, the relation score *ri*,*j* is generated as shown in Equation (2):

$$r_{i,j} = g_{\phi}\left(\mathcal{C}\left(f_{\varphi}(x_i), f_{\varphi}(x_j)\right)\right), \quad i = 1, 2, \dots, C \tag{2}$$

Here, the mean square error loss is computed to train the model, as shown in Equation (3), regressing the relation score *ri*,*<sup>j</sup>* to the ground truth: the similarity of matched pairs is 1, and the similarity of unmatched pairs is 0.

$$\varphi, \phi \leftarrow \operatorname*{argmin}_{\varphi, \phi} \sum_{i=1}^{m} \sum_{j=1}^{n} \left( r_{i,j} - \mathbf{1}\left( y_i == y_j \right) \right)^2 \tag{3}$$
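A small worked example of this objective, with toy relation scores (not values from the paper) and the indices arranged as queries × support classes for readability: the target matrix is the indicator 1(*yi* == *yj*), and the loss is the mean squared deviation of the scores from it.

```python
import numpy as np

# Relation scores of 3 query images against 4 support classes (toy values).
scores = np.array([[0.9, 0.1, 0.2, 0.0],
                   [0.2, 0.7, 0.1, 0.3],
                   [0.1, 0.2, 0.2, 0.8]])
query_labels = np.array([0, 1, 3])       # ground-truth class of each query
support_labels = np.array([0, 1, 2, 3])  # class represented by each support column

# 1(y_i == y_j): matched pairs regress to 1, mismatched pairs to 0.
target = (query_labels[:, None] == support_labels[None, :]).astype(float)
mse = np.mean((scores - target) ** 2)
print(round(mse, 4))  # → 0.035
```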

#### *4.3. RelationNet-Based Cavity Morphology Classification Scheme*

Based on the data and structural characteristics of the GPR cavity morphological recognition system, we divide the overall processing pipeline into two main parts, a training phase and a testing phase, as shown in Figure 9. The cavity morphologies discussed here are, e.g., cylindrical, rectangular, spherical, and irregular hemispherical. First, a training set is input to learn classification rules inside the network in the training phase. Then, in the testing phase, a small number of support samples (labeled) and test samples (unlabeled) are input into the trained network model, and the unlabeled test samples are predicted and classified, thereby outputting the final morphological classification results.

**Figure 9.** RelationNet-based GPR cavity morphology classification scheme.

Based on the above principle, the RelationNet-based GPR cavity morphology classification system first obtains the trained model on the training set and then recognizes new categories of cavity images. The embedding model is used to extract the feature information of each input GPR image, and the image features of the test sample and support sample are then concatenated. The integrated features are then input into the relation model for comparison. According to the comparison results, the category to which the test sample belongs is determined, thereby achieving the classification of cavity morphologies.

#### **5. Experiments and Results**

#### *5.1. Experimental Settings*

The classification category should be labeled for each morphological image since the DL-based classification algorithm is essentially a supervised learning algorithm. The choice of category is related to the morphology of the cavity. Four types of cavities are involved, namely cylindrical, rectangular, spherical, and irregular cavities. To train the CNN, 2D morphological images, each of size 498 × 395 × 3 pixels, were fed to the input layer, and the convolutional layers were then used to extract multilevel image features with convolution kernels. A total of 68 GPR morphological images were obtained, each consisting of 8 B-scans and 12 C-scans. Among them, there were 17 spherical, 17 cylindrical, 17 rectangular, and 17 irregular cavity images.

We used the PyTorch 1.5 implementation of the LibFewShot package [43], which is a comprehensive FSL library integrating the most advanced FSL methods. The code was run on an NVIDIA RTX 3080Ti GPU and an Intel i9-12900K CPU. In the training phase, in all FSL experiments, we used the Adam optimizer [44] with an initial learning rate of 10<sup>−3</sup> and step decay. The backbone adopted was Conv64F. The batch size was set to 128. In the testing phase, the test epoch was set to 10 and the test episode to 17; the *four-way one-shot* setting contained 16 query images, and the *four-way five-shot* setting had 12 query images for each of the 4 classes in each training episode.
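The stated schedule (initial learning rate 10<sup>−3</sup> with step decay) can be written out explicitly. The decay factor of 0.5 and step size of 10 epochs below are assumptions, since the paper specifies only "step decay":

```python
base_lr, gamma, step_size = 1e-3, 0.5, 10  # decay factor/step are assumed values

def step_decay(epoch):
    """Learning rate under step decay after `epoch` completed epochs."""
    return base_lr * gamma ** (epoch // step_size)

print(step_decay(0), step_decay(10), step_decay(25))  # → 0.001 0.0005 0.00025
```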

We compared the results against those of various networks for few-shot recognition, including ProtoNet [42], R2D2 [45], and BaseLine [46]. Embedding backbones, such as Conv64F, ResNet12, and ResNet18, were also compared to identify and select the main backbone with a better performance. The main experiments were conducted on two benchmark datasets: miniImageNet [35] and tieredImageNet [47]. The miniImageNet dataset [35] consists of 60,000 color images with 100 classes, and the input images are resized to 84 × 84. The tieredImageNet dataset [47] consists of 779,165 color images with 608 classes, each of size 84 × 84.

#### *5.2. The 3D GPR Cavity Data Acquisition*

The GPR data are difficult to collect and label. To address this issue, a 3D GPR forward modeling tool, gprMax3D [48,49], is often used for the 3D modeling and simulation of underground structures. Based on the finite-difference time-domain (FDTD) method, the technique increases the volume of training data by creating synthetic GPR images. Maxwell's equations govern the propagation of the EM waves used by the GPR. Figure 10 shows the flowchart of the gprMax3D forward simulation technique. gprMax3D was used to generate synthetic GPR images [50]. To further imitate the real situation, we set up cavity objects with uneven surfaces and random media to approximate real objects. An example of the GPR system's parameter settings is shown in Table 1.

**Figure 10.** gprMax3D simulation flowchart.

**Table 1.** GPR system parameters in road structure scene.


Road cavities are generally distributed in the underground range of 0.3–1 m, which is also the junction of the pavement structure and the subgrade. The subgrade is generally a soil structure, while the pavement structure is generally flexible, semi-rigid, or rigid, typically made of cement or asphalt concrete; in particular, its bottom layer is generally made of hard materials such as gravel. Due to the high probability of soil erosion, cavities are most likely to appear at the junction of the soft and hard layers. Figure 11 shows a simulation model example of a road structure with a first layer of asphalt, a second layer of concrete, and a third layer of sandstone. Their attribute parameter settings are shown in Table 2. In addition, the subgrade is generally dominated by soil, and a more realistic soil model was established by simulating random media. The soil dispersion materials were defined as follows: soil with 50% sand and 50% clay, a sand density of 2.66 g/cm<sup>3</sup>, a clay bulk density of 2 g/cm<sup>3</sup>, and a volumetric water content ranging from 0.001 to 0.25. These materials were distributed over a model with a volume of 2 × 1.2 × 1 m<sup>3</sup> (in which the soil layers were randomly distributed).
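The random-media soil described above maps naturally onto gprMax's `#soil_peplinski` and `#fractal_box` commands. The input-file sketch below is illustrative only: the soil parameters are taken from the text, but the layer permittivities, conductivities, geometry, and antenna settings are assumptions (the paper's actual values are in Tables 1 and 2, which are not reproduced here).

```
Illustrative gprMax3D input sketch (lines without a leading # are comments)

#title: Road structure with a spherical cavity (illustrative sketch)
#domain: 2.0 1.2 1.0
#dx_dy_dz: 0.005 0.005 0.005
#time_window: 20e-9

Assumed layer materials: relative permittivity, conductivity (S/m), rel. permeability, magnetic loss
#material: 6 0.005 1 0 asphalt
#material: 9 0.01 1 0 concrete
#material: 5 0.01 1 0 sandstone

Peplinski soil from the text: 50% sand, 50% clay, bulk density 2 g/cm3,
sand particle density 2.66 g/cm3, volumetric water fraction 0.001-0.25
#soil_peplinski: 0.5 0.5 2.0 2.66 0.001 0.25 my_soil
#fractal_box: 0 0 0 2.0 1.2 0.6 1.5 1 1 1 50 my_soil soil_box

Assumed layer geometry and an air-filled spherical cavity at the soft/hard junction
#box: 0 0 0.9 2.0 1.2 1.0 asphalt
#box: 0 0 0.75 2.0 1.2 0.9 concrete
#box: 0 0 0.6 2.0 1.2 0.75 sandstone
#sphere: 1.0 0.6 0.5 0.1 free_space

#waveform: ricker 1 900e6 my_ricker
#hertzian_dipole: z 0.05 0.6 0.95 my_ricker
#rx: 0.09 0.6 0.95
```

Repeating the source/receiver pair along *x* (e.g., with gprMax's B-scan mode) then yields the synthetic B-scans, from which C-scans are sliced as in Section 3.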


**Table 2.** Dielectric properties of road structure.


Figure 12 presents the simulated models of four representative cavity morphologies: spherical, rectangular, cylindrical, and irregular hemispherical cavities. The first three cavities have smooth surfaces, and the last one has an uneven surface. Figure 13 shows representative 2D GPR morphological images of cavities. Figure 13a shows the case of spherical cavities, with parabolic and circular features observable on the B-scan and C-scan images, respectively. As shown in Figure 13b, rectangular cavities generally have distinguishable features, namely a double-parabola shape in B-scans. Similar features are also revealed in the cylindrical case of Figure 13c. However, Figure 13b,c can be distinguished by C-scans because they show quadrilateral signatures and double circular intersection features, respectively. Compared with Figure 13a, it can be observed in Figure 13d that the C-scan images of the irregular cavity show unsmoothed circular features with considerable noise randomly distributed inside the circle. Due to these distinguishable and representative characteristics, the cavity morphology can be classified well with the FSL frameworks.

**Figure 12.** Display of simulation model from different perspectives (cavity in red).

**Figure 13.** Representative 2D GPR images of cavities: (**a**) spherical, (**b**) rectangular, (**c**) cylindrical, and (**d**) irregular hemispherical.

#### *5.3. Classification Results and Analysis*

Figure 14 shows the RelationNet-based underground cavity morphology classification results. These results were obtained with the Conv64F backbone, the tieredImageNet benchmark dataset, and a *four-way five-shot* problem setting. As expected, compared with the ground truth, the irregular hemispherical cavity, with significant characteristics in the GPR morphological image, was correctly classified. Moreover, the classification accuracy rates of spherical, rectangular, and cylindrical cavities were 97.5%, 98.33%, and 98.33%, respectively. However, 0.83% and 1.67% of spherical cavities were misclassified as cylindrical and hemispherical, respectively, due to their similar morphological features. In addition, 1.67% of rectangular cavities were misclassified as hemispherical due to their double-parabola shapes in B-scans, while 0.83% of cylindrical cavities were misclassified as spherical and another 0.83% as hemispherical, due to their similar features in B- or C-scans. The classification performance of RelationNet was evaluated based on three indices (*precision*, *recall*, and *F-score*) using the following Equations (4)–(6):

$$precision = \frac{true\ positive}{true\ positive +false\ positive} \tag{4}$$

$$recall = \frac{true\ positive}{true\ positive +false\ negative} \tag{5}$$

$$F-score = \frac{2 \cdot precision \cdot recall}{precision + recall} \tag{6}$$
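For concreteness, Equations (4)–(6) can be computed directly from raw true-positive/false-positive/false-negative counts; the counts below are illustrative and are not taken from Table 3:

```python
def prf(tp, fp, fn):
    """Precision, recall, and F-score as defined in Equations (4)-(6)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score

# Toy counts for one class: 118 true positives, 0 false positives, 2 false
# negatives. Zero FP gives 100% precision while the two FN lower the recall.
p, r, f = prf(tp=118, fp=0, fn=2)
print(round(p, 4), round(r, 4), round(f, 4))  # → 1.0 0.9833 0.9916
```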

Table 3 shows the statistical results obtained from the RelationNet results, namely the *precision*, *recall*, and *F-score* values. For the spherical cases, the 99.15% *precision* and 97.5% *recall* indicated that both false-positive (FP) and false-negative (FN) results occurred, with more FN results than FP results. Similarly, the 99.16% *precision* and 98.33% *recall* values of the cylindrical case implied FP and FN alarms because RelationNet sometimes recognized a cylindrical cavity as spherical or hemispherical. For the rectangular cases, the 100% *precision* and relatively low *recall* value indicated that FN results occurred due to the misclassification between rectangular and hemispherical. On the contrary, the 96% *precision* and 100% *recall* values meant that the hemispherical samples were properly classified using RelationNet, but multiple samples in other categories were misclassified as hemispherical at the same time. According to the *F-scores*, it can be concluded that the performance of RelationNet was acceptable.

**Table 3.** Statistical results obtained from RelationNet (%).


*5.4. Comparison Experiments*

5.4.1. RelationNet Evaluation on Different Embedding Backbones

The performance of RelationNet relies on the quality of the embedding backbone. To select a suitable main embedding backbone, we compared three candidates: Conv64F, ResNet12, and ResNet18. These comparison experiments were run for 10 epochs on the miniImageNet dataset, with all other settings unchanged. Conv64F consisted of four convolutional blocks, each composed of a convolutional layer, a batch-normalization layer, a ReLU layer, and a max-pooling layer. ResNet12 consisted of four residual blocks, each of which contained three convolutional blocks along with a skip connection. ResNet18 had the same architecture as used in [50]. Table 4 shows the comparison results among the different backbones. It can be observed that RelationNet equipped with the Conv64F backbone achieved the best performance in both the *four-way one-shot* and *four-way five-shot* problems.
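The feature-map arithmetic behind the Conv64F backbone can be traced with a short sketch (a hypothetical check, assuming 84 × 84 input images as commonly used for miniImageNet: each 3 × 3 convolution with padding 1 preserves the spatial size, and each 2 × 2 max-pooling floor-halves it):

```python
# Trace the spatial size of a feature map through the four Conv64F
# blocks: conv3x3 (padding 1, size-preserving) -> batch norm -> ReLU
# -> 2x2 max-pooling (floor-halves the spatial size).

def conv64f_output_size(input_size, num_blocks=4):
    size = input_size
    for _ in range(num_blocks):
        # conv 3x3 with padding 1 keeps the size; max-pooling halves it
        size = size // 2
    return size

spatial = conv64f_output_size(84)    # 84 -> 42 -> 21 -> 10 -> 5
flat_dim = 64 * spatial * spatial    # 64 feature channels per block
```

With these assumptions, an 84 × 84 image is reduced to a 5 × 5 map with 64 channels, i.e., a 1600-dimensional embedding when flattened.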


**Table 4.** Comparison results on different embedding backbones (%) (the best results in bold).

5.4.2. RelationNet Evaluation on Different Benchmark Datasets

Table 5 compares the performance of RelationNet on the miniImageNet and tieredImageNet datasets while keeping most implementation details fixed. For this comparison, RelationNet adopted Conv64F as the main backbone. As can be seen from Table 5, in the *four-way one-shot* problem, the accuracy achieved on the miniImageNet dataset was slightly higher than that on the tieredImageNet dataset. In the *four-way five-shot* problem, however, training on the tieredImageNet dataset outperformed training on the miniImageNet dataset, improving the accuracy by 8.394%. Overall, these results indicate that RelationNet performed well when trained on the tieredImageNet dataset.

**Table 5.** Comparison results on different benchmark datasets: miniImageNet vs. tieredImageNet (%) (the best results in bold).


5.4.3. Performance Comparison of Different FSL Networks

To validate the effectiveness of RelationNet, we compared the experimental validation results of four frameworks: ProtoNet, R2D2, BaseLine, and RelationNet, all using the same embedding backbone (Conv64F) and benchmark dataset (miniImageNet). It can be observed from Table 6 that RelationNet achieved the best results in the *four-way one-shot* problem, improving the accuracy by 11.994% over the BaseLine. With an increased number of sample images, all four frameworks achieved substantial improvements in the *four-way five-shot* problem, and RelationNet still achieved the highest accuracy among them.
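For context, an N-way K-shot episode of the kind used in these comparisons can be sampled as in the sketch below (a generic illustration, not the authors' code; the class names and sample identifiers are hypothetical):

```python
import random

def sample_episode(dataset, n_way=4, k_shot=5, n_query=15, seed=0):
    """Sample one N-way K-shot episode: a support set of k_shot
    labelled samples per class and a disjoint query set to classify."""
    rng = random.Random(seed)
    classes = rng.sample(sorted(dataset), n_way)
    support, query = {}, {}
    for c in classes:
        picks = rng.sample(dataset[c], k_shot + n_query)
        support[c] = picks[:k_shot]
        query[c] = picks[k_shot:]
    return support, query

# Hypothetical dataset: four cavity classes with 120 samples each.
data = {c: [f"{c}_{i}" for i in range(120)]
        for c in ("spherical", "rectangular", "cylindrical", "hemispherical")}
support, query = sample_episode(data)   # one four-way five-shot episode
```

Each evaluation episode thus presents the model with only five labelled examples per class, which is what makes the few-shot comparison in Table 6 meaningful.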

**Table 6.** Comparison results of different FSL networks (%) (the best results in bold).


#### **6. Conclusions**

In this paper, we first applied the FSL technique to classify and identify cavity morphology characteristics based on 3D GPR data. RelationNet was adopted as the FSL framework and trained end-to-end from scratch. Owing to its ability to learn a deep distance metric, RelationNet addressed the issue of insufficient cavity data and produced classification results from only a few samples. The experimental results demonstrated the effectiveness of RelationNet for morphology classification: the model achieved average classification accuracies of 97.328% in the *four-way five-shot* problem and 78.097% in the *four-way one-shot* problem.

There is a limitation that could be addressed in future research. In the experiments, all the models were trained on a source domain (e.g., miniImageNet and tieredImageNet) and directly tested on the target domain (i.e., the cavity radar dataset). However, FSL performance hardly improves, or even drops significantly, when there is a large domain shift between the two. In this work, because there was no intersection between the source sets and our cavity radar dataset, a large domain shift existed between the source and target domains. Future efforts should integrate prior knowledge into FSL or explore *one-shot* or *zero-shot* classification methods.

For on-site applications, there are two limitations that could be addressed in future research. First, real cavity data are difficult to collect for training the proposed method, and publicly available GPR cavity datasets are limited. Efforts should be made in the future to collect and prepare GPR datasets to facilitate the implementation of this method. Second, the proposed method was only tested for cavity morphology classification using GprMax3D data. The scalability of the method to other challenging environments and applications needs further investigation. Future studies could test this method for collecting cavity data and classifying cavity morphologies on in-service city roads.

**Author Contributions:** F.H.: Conceptualization, Methodology, Writing—Original Draft Preparation, Software. X.L.: Data Curation, Writing, Validation. X.F.: Visualization, Investigation. Y.G.: Supervision, Writing—Reviewing and Editing. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was funded by the National Natural Science Foundation of China (no. 61871407).

**Institutional Review Board Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**

