**Contents**


## **About the Editors**

#### **Francois Rivest**

Francois Rivest, Ph.D., has been an associate professor in the Department of Mathematics and Computer Science at the Royal Military College of Canada (RMC) since 2010 and a member of the Centre for Neuroscience Studies at Queen's University since 2011. Dr. Rivest received his M.Sc. degree in Machine Learning from McGill University (Dean's Honours List) and his Ph.D. degree in computational neuroscience from the University of Montreal. His current research focuses on understanding how animals learn so quickly, in particular how they learn timing, in order to develop better representation-construction algorithms for real-time machine learning. His research interests include the brain's dopaminergic system, animal interval timing, machine reinforcement learning, and the automatic construction of representations.

#### **Abdellah Chehri**

Abdellah Chehri, Ph.D., is an associate professor in the Department of Mathematics and Computer Science at the Royal Military College of Canada (RMC). Dr. Chehri received his master's degrees in Digital Communications and Signal Processing from the University Nice-Sophia Antipolis-Eurecom (France) and his Ph.D. degree in Electrical Engineering, with applied research in Information Communication and Telecommunications, from Université Laval (Quebec City). Dr. Chehri has received many prestigious awards, including the Dean's Scholarship Award for Postdoctoral Studies (University of Ottawa), the Scholarship Fund to Support Success (Université Laval), and fellowships from the Japan Society for the Promotion of Science, MITACS, and the NSERC Postdoctoral Fellowship program. He has been listed among the top 2% of cited scientists in the rankings reported by Stanford University since 2020. His research interests include big data, data analytics, AI/ML, IoT (Internet of Things) for the real-time response and control of autonomous intelligent systems, intelligent biometric monitoring systems, ML/federated learning in wireless systems, and unmanned aerial vehicle communications.

## *Editorial* **Editorial for the Special Issue "Advances in Machine Learning and Mathematical Modeling for Optimization Problems"**

**Abdellah Chehri \* and Francois Rivest**

Mathematics and Computer Science, Royal Military College of Canada, Kingston, ON K7K 7B4, Canada; francois.rivest@rmc.ca

**\*** Correspondence: chehri@rmc.ca

Machine learning and deep learning have made tremendous progress over the last decade and have become the de facto standard across a wide range of image, video, text, and sound processing domains, from object recognition to image generation. Recently, deep learning and deep reinforcement learning have begun to develop end-to-end training to solve more complex operations research and combinatorial optimization problems, such as covering problems, vehicle routing problems, traveling salesman problems, scheduling problems, and other complex problems requiring general simulations. These methods also sometimes combine machine learning with classic search and optimization algorithms, such as Monte Carlo Tree Search in AlphaGo.

Starting from the above considerations, this Special Issue aims to report the latest advances and trends concerning advanced machine learning and mathematical modeling for optimization problems. This Special Issue intends to provide a universally recognized international forum to present recent advances in mathematical modeling for optimization problems. We welcomed theoretical contributions as well as papers describing interesting applications. Papers invited for this Special Issue considered aspects of this problem, including:


After reviewing submissions, we accepted a total of nine papers for publication.

The Internet of Things (IoT) encompasses many applications and service domains, from smart cities, autonomous vehicles, surveillance, and medical devices to crop control. Most experts regard virtualization in wireless sensor networks (WSNs) as the most revolutionary technique in these areas. However, node failures, communication latency, and the need to regularly identify nodes present additional hurdles for virtualization in WSNs.

In the contribution by Othman et al. [1], "A Multi-Objective Crowding Optimization Solution for Efficient Sensing as a Service in Virtualized Wireless Sensor Networks", the authors present a novel architecture for heterogeneous virtual networks on the Internet of Things. They propose to embed the architecture in WSN settings to improve fault tolerance and communication latency in service-oriented networking. Moreover, the authors utilize the Evolutionary Multi-Objective Crowding Algorithm (EMOCA) to maximize fault tolerance and minimize communication delay for virtual network embedding in WSN environments for service-oriented applications focusing on heterogeneous virtual networks in the IoT. Unlike the current wireless virtualization approach, which uses the Non-dominated Sorting Genetic Algorithm-II (NSGA-II), EMOCA uses both domination and diversity criteria in the evolving population for optimization problems. The analysis of the results demonstrates that the proposed framework successfully optimizes fault tolerance and communication delay for virtualization in WSNs.

**Citation:** Chehri, A.; Rivest, F. Editorial for the Special Issue "Advances in Machine Learning and Mathematical Modeling for Optimization Problems". *Mathematics* **2023**, *11*, 1890. https://doi.org/10.3390/math11081890

Received: 10 April 2023 Accepted: 12 April 2023 Published: 17 April 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Scholars have recently introduced various non-systematic satisfiability studies on Discrete Hopfield Neural Networks to address their lack of interpretability. Although a flexible structure was established to help generate a wide range of spatial solutions that converge on global minima, the fundamental issue is that the existing logic completely ignores the distribution and features of the probability dataset, as well as the distribution of literal statuses.

In the study by Abdeen et al. [2], "S-Type Random k Satisfiability Logic in Discrete Hopfield Neural Network Using Probability Distribution: Performance Optimization and Analysis", the authors consider a new type of non-systematic logic known as S-type Random k Satisfiability, which employs a novel layer of a Discrete Hopfield Neural Network and plays a significant role in identifying the predominant attribute likelihood of a binomial distribution dataset. The objective of the probability logic phase is to establish the logical structure and assign negative literals based on two specified statistical parameters. Abdeen et al. examined the performance of the proposed logic structure by comparing a proposed metric to current state-of-the-art logical rules. As a result, they discovered that the models attain high values in the two parameters that efficiently introduce a logical structure in the probability logic phase. In addition, the study observed that implementing a Discrete Hopfield Neural Network reduced the cost function. The authors employed a novel statistical method of synaptic weight assessment to investigate the influence of the two proposed parameters on the logic structure. Overall, they revealed that regulating the two proposed parameters positively impacts synaptic weight management and the generation of global minimum solutions.
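The probability-driven assignment of negative literals can be illustrated with a minimal sketch. The per-literal Bernoulli scheme and the parameter names below are our assumptions for illustration, not the authors' exact S-type formulation:

```python
import random

def random_ksat_clauses(n_vars, n_clauses, k, p_neg, seed=0):
    """Generate random k-SAT clauses in which each literal is negated
    with probability p_neg -- a simplified stand-in for the statistical
    parameters that govern the literal-status distribution in S-type
    Random k Satisfiability. A clause is a list of signed variable
    indices (a negative index denotes a negated literal)."""
    rng = random.Random(seed)
    clauses = []
    for _ in range(n_clauses):
        # pick k distinct variables, then decide each literal's sign
        vars_ = rng.sample(range(1, n_vars + 1), k)
        clauses.append([-v if rng.random() < p_neg else v for v in vars_])
    return clauses
```

Tuning `p_neg` (and a second parameter controlling clause composition) is what lets the logic phase match the predominant attribute likelihood of a binomial dataset.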

Traditional leak detection methods for gas pipelines require task-offloading decisions to be made in the cloud, which has poor real-time performance. Edge computing provides a solution by allowing decisions to be made directly at the edge server, improving real-time performance; however, energy becomes the new bottleneck. In "Edge Computing Offloading Method Based on Deep Reinforcement Learning for Gas Pipeline Leak Detection", Wei et al. [3] concentrate on the real-time detection of gas transmission pipeline leaks. To this end, the authors propose a novel detection algorithm that combines the benefits of a heuristic algorithm and the advantage actor-critic (AAC) algorithm.

The proposed detection algorithm seeks to optimize and ensure real-time pipeline mapping analysis tasks and maximize the survival time of portable gas leak detectors. Because the computing power of portable detection devices is limited due to their battery power, the main problem posed in this study is how to account for node energy overhead while ensuring system performance requirements.

Wei et al. establish the optimization model by introducing the concept of edge computing and using the mapping relationship between resource occupation and energy consumption as a starting point to optimize the total system cost (TSC). The TSC comprises the node's transmission energy consumption, its local computing energy consumption, and a residual-electricity weight.
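One plausible way to formalize a cost of this shape (our notation, not necessarily the authors' exact model) is a weighted sum over detection nodes $i$ with a binary offloading decision $x_i$:

```latex
\mathrm{TSC} \;=\; \sum_{i=1}^{N} w_i \left[ x_i \, E_i^{\mathrm{tx}} + (1 - x_i) \, E_i^{\mathrm{loc}} \right],
\qquad x_i \in \{0, 1\},
```

where $E_i^{\mathrm{tx}}$ is node $i$'s transmission energy when its task is offloaded, $E_i^{\mathrm{loc}}$ its local computing energy, and $w_i$ a weight that grows as the node's residual electricity shrinks, so that low-battery detectors are penalized more heavily.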

To reduce TSC, the algorithm employs the AAC network to make task scheduling decisions and determine whether tasks should be offloaded. Furthermore, it uses heuristic strategies and the Cauchy–Buniakowsky–Schwarz inequality to allocate communication resources.
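The role the Cauchy–Buniakowsky–Schwarz inequality typically plays in such allocations can be sketched as follows: to minimize the total transfer time $\sum_i d_i/b_i$ under a bandwidth budget $\sum_i b_i = B$, the inequality yields the optimum $b_i \propto \sqrt{d_i}$. This is a generic sketch of the technique, not the authors' exact scheme:

```python
import math

def allocate_bandwidth(data_sizes, total_bw):
    """Split total_bw across tasks to minimize the total transfer time
    sum(d_i / b_i). By the Cauchy-Schwarz inequality,
    (sum d_i/b_i)(sum b_i) >= (sum sqrt(d_i))^2, with equality when
    b_i is proportional to sqrt(d_i)."""
    roots = [math.sqrt(d) for d in data_sizes]
    s = sum(roots)
    return [total_bw * r / s for r in roots]
```

For two tasks of sizes 4 and 1 sharing 10 units of bandwidth, this gives the split (20/3, 10/3), which beats the naive equal split.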

Their experiments show that their proposed algorithm can meet the detector's real-time requirements while consuming less energy. Compared to the Deep Q Network (DQN) algorithm, their proposed algorithm saves approximately 56% of the system energy. It saves 21%, 38%, 30%, 31%, and 44% of energy consumption compared to the artificial gorilla troops optimizer (GTO), the black widow optimization algorithm (BWOA), the exploration-enhanced grey wolf optimizer (EEGWO), the African vulture optimization algorithm (AVOA), and driving training-based optimization (DTBO), respectively. Moreover, it saves 50% and 30% compared to fully local computing and fully offloading algorithms, respectively. Meanwhile, the algorithm's task completion rate is 96.3%, the best real-time performance among these algorithms.

Pickup and delivery problems are pertinent in our interconnected world: moving goods and people efficiently can decrease costs, emissions, and time. In the contribution by Little et al. [4], "Comparison of Genetic Operators for the Multi-Objective Pickup and Delivery Problem", the authors develop a genetic algorithm to solve the multi-objective capacitated pickup-and-delivery problem by adapting standard benchmarks.

They aim to reduce the total distance traveled and the number of vehicles employed. Based on NSGA-II, the authors investigate the effects of inter-route and intra-route mutations on the final solution. Little et al. introduce six inter-route operations and sixteen intra-route operations. Then, they calculate the hypervolume to compare their impact directly. In addition, the authors present two unique crossover operators tailored to this problem.
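The hypervolume indicator used for such comparisons can be computed directly in two dimensions: it is the area, bounded by a reference point, that a minimization front dominates. This is a minimal illustration of the metric, not the authors' implementation (their objectives are total distance and vehicle count):

```python
def hypervolume_2d(front, ref):
    """Hypervolume (dominated area) of a 2-D minimization front
    relative to a reference point ref that is worse than every point."""
    pts = sorted(set(front))            # ascending in the first objective
    # keep only non-dominated points
    nd, best_y = [], float("inf")
    for x, y in pts:
        if y < best_y:
            nd.append((x, y))
            best_y = y
    # sweep from the largest x back toward the smallest, adding strips
    hv, prev_x = 0.0, ref[0]
    for x, y in reversed(nd):
        hv += (prev_x - x) * (ref[1] - y)
        prev_x = x
    return hv
```

A larger hypervolume means the operator produced a front that is both closer to the ideal point and better spread, which is why it serves as a single scalar for comparing the twenty-two operators.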

Their methodology identified optimal results in 23% of the instances in the first benchmark. In most other instances, it generated a Pareto front within one vehicle and 20% of the best-known distance. Because multiple solutions are returned, users can select the routes that best suit their requirements.

In a disaster, the capacity and usability of the road network are often compromised, which poses a challenge for humanitarian operations delivering critical medical supplies. In the contribution by Anuar et al. [5], "A Multi-Depot Dynamic Vehicle Routing Problem with Stochastic Road Capacity: An MDP Model and Dynamic Policy for Post-Decision State Rollout Algorithm in Reinforcement Learning", the authors optimize vehicle routing for a Multi-Depot Dynamic Vehicle-Routing Problem with Stochastic Road Capacity (MD-DVRPSRC) using a Markov Decision Process (MDP) model. They use the Post-Decision State Rollout Algorithm (PDS-RA) as a look-ahead approach within an Approximate Dynamic Programming (ADP) solution method. The authors execute a PDS-RA for each assigned vehicle to effectively solve the problem, and the agent then makes its decision based on the completed rollouts.

For the PDS-RA, Anuar et al. propose five types of constructive base heuristics. Firstly, they propose the Teach Base Insertion Heuristic (TBIH-1) to investigate the partial random construction approach for non-obvious decisions. The paper presents TBIH-2 and TBIH-3 as extensions to the TBIH-1 to demonstrate how experts could execute the Sequential Insertion Heuristic (I1) and Clarke and Wright (CW) in a dynamic setting, respectively. Additionally, the authors propose TBIH-4 and TBIH-5 (TBIH-1 with the addition of Dynamic Look-ahead SIH (DLASIH) and Dynamic Look-ahead CW (DLACW)). The goal is to improve the on-the-fly constructed decision rule (dynamic policy on the fly) in look-ahead simulations.
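The core move shared by such constructive insertion heuristics is cheapest insertion: place a customer at the position in the partial route that adds the least travel distance. The sketch below shows that move in isolation; it is not the authors' TBIH variants, which add teaching, look-ahead, and dynamic elements on top:

```python
def cheapest_insertion(route, customer, dist):
    """Insert `customer` at the position in `route` (a depot-to-depot
    node list) that minimizes the added travel distance -- the basic
    step underlying sequential insertion heuristics such as I1.
    Returns the new route and the insertion cost."""
    best_pos, best_cost = None, float("inf")
    for i in range(len(route) - 1):
        a, b = route[i], route[i + 1]
        # extra distance incurred by inserting between a and b
        extra = dist[a][customer] + dist[customer][b] - dist[a][b]
        if extra < best_cost:
            best_pos, best_cost = i + 1, extra
    return route[:best_pos] + [customer] + route[best_pos:], best_cost
```

In a dynamic setting, the same move is re-evaluated as road capacities are revealed, which is what the look-ahead variants (DLASIH, DLACW) exploit.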

COVID-19 has shaken the world economy and affected millions of people in a brief period. COVID-19 shares countless symptoms with other upper respiratory conditions, making correct diagnosis challenging. Several mathematical models have been presented for its diagnosis and treatment. In "An Optimized Decision Support Model for COVID-19 Diagnostics Based on Complex Fuzzy Hypersoft Mapping", Saeed et al. [6] propose a mathematical framework based on a novel agile fuzzy-like arrangement, the complex fuzzy hypersoft (CFHS) set, a combination of the complex fuzzy (CF) set and the hypersoft set (an extension of the soft set).

First, the authors develop the CFHS elementary theory, which considers the amplitude term (A-term) and phase term (P-term) of complex numbers simultaneously to address uncertainty, ambivalence, and mediocrity of data. This new fuzzy-like hybrid theory is versatile in two parts.

First, it provides access to a wide range of membership function values by broadening them to the unit circle on an Argand plane and incorporating an additional term, the P-term, to account for the periodic nature of the data. Second, it divides the distinct attributes into corresponding sub-valued sets for easier comprehension. The CFHS set and CFHS mapping, with its inverse mapping (INM), can manage such issues. The authors validate their proposed framework by connecting COVID-19 symptoms to medications. This work also includes a generalized CFHS mapping [6], which can assist a specialist in extracting the patient's health record and predicting how long it will take to overcome the infection.
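The A-term/P-term pairing can be pictured as a complex number on the unit disc. The sketch below uses one common convention for intersecting two complex fuzzy memberships (minimum of amplitudes, minimum of phases); the literature admits several such operators, so treat this as illustrative rather than the paper's definition:

```python
import cmath

def cf_membership(amplitude, phase):
    """Complex fuzzy membership value r * e^(i*theta): the amplitude
    term (A-term) r in [0, 1] grades belongingness, while the phase
    term (P-term) theta captures periodic context such as time of
    symptom onset."""
    assert 0.0 <= amplitude <= 1.0
    return amplitude * cmath.exp(1j * phase)

def cf_intersection(m1, m2):
    """One common t-norm for complex fuzzy sets: take the minimum of
    the amplitudes and the minimum of the (normalized) phases."""
    r = min(abs(m1), abs(m2))
    two_pi = 2.0 * cmath.pi
    theta = min(cmath.phase(m1) % two_pi, cmath.phase(m2) % two_pi)
    return r * cmath.exp(1j * theta)
```

The hypersoft part of the CFHS set then attaches such complex-valued grades to tuples of sub-attribute values rather than to single attributes.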

As the fourth industrial revolution develops, the way factories operate will no longer be the same. Factory automation can save labor and, with online fault-detection systems, avoid equipment failures. In recent years, various signal-processing methods have received much attention for fault-detection systems. In the article by Lee et al. [7], "Application of ANN in Induction-Motor Fault-Detection System Established with MRA and CFFS", the authors propose a fault-detection system for faulty induction motors (bearing faults, inter-turn shorts, and broken rotor bars) based on multiresolution analysis (MRA), correlation and fitness-values-based feature selection (CFFS), and an artificial neural network (ANN).

For induction-motor current signature analysis, Lee et al. compare two feature-extraction methods: MRA and the Hilbert–Huang transform (HHT). This work compares feature-selection methods to reduce the number of features while maintaining the best detection-system accuracy, thereby reducing operating costs. In addition, the proposed detection system is tested with additive white Gaussian noise, and the best signal-processing and feature-selection methods are chosen to create the best detection system. According to their results, features extracted with MRA outperform those from HHT when using CFFS and an ANN. The authors also confirm that CFFS significantly reduces operating costs (removing 95% of the features) while maintaining 93% accuracy with the ANN in their proposed detection system.
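The correlation half of CFFS can be approximated by a simple filter: score each feature by its absolute Pearson correlation with the label and keep the top k. The fitness-value component of CFFS is omitted here, so this is a simplified stand-in rather than the paper's method:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def select_features(X, y, k):
    """Rank feature columns of X by |correlation with label y| and
    return the indices of the k best-scoring features."""
    cols = list(zip(*X))                # column-wise view of X
    scored = sorted(range(len(cols)),
                    key=lambda j: abs(pearson(cols[j], y)),
                    reverse=True)
    return scored[:k]
```

Filters of this kind are what make the 95% feature reduction cheap: no classifier needs to be retrained while scoring candidates.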

Detection and recognition of scene text, such as automatic license plate recognition, is a technology with various applications. Although numerous studies have been conducted to increase detection performance, accuracy decreases when low-resolution and low-quality legacy license plate images are input into a recognition module.

In "HIFA-LPR: High-Frequency Augmented License Plate Recognition in Low-Quality Legacy Conditions via Gradual End-to-End Learning", Lee, S.-J. et al. [8] propose a model for high-frequency augmented license plate recognition. They integrate and collaboratively train the super-resolution and the license plate recognition modules using a proposed gradual end-to-end learning-based optimization. To train their model optimally, the authors propose a holistic feature extraction method that effectively precludes the generation of grid patterns from the super-resolved image during training.

Moreover, to exploit high-frequency information that affects license plate recognition performance, the authors propose a high-frequency augmentation-based license plate recognition module. In addition, they present a three-step, gradual, and end-to-end learning process based on weight immobilization. Their three-step methodological approach optimizes each module for robust performance in recognition. The experimental outcomes demonstrate that their model outperforms extant methods in low-quality legacy conditions for the UFPR and Greek vehicle datasets.

In machine learning, the problem of minimizing the sum of two convex functions is fundamental. Many authors have analyzed this problem due to its applications in various fields, such as data science, computer science, statistics, engineering, physics, and medical science. These applications include signal processing, compressed sensing, medical image reconstruction, digital image processing, and data prediction and classification. In the contribution by Chumpungam et al. [9], "An Accelerated Convex Optimization Algorithm with Line Search and Applications in Machine Learning", the authors introduce a new line search technique and use it to build a novel accelerated forward–backward algorithm for minimizing the sum of two convex functions, one of which is smooth, in a real Hilbert space.

The authors demonstrate weak convergence of the proposed algorithm to a solution in the absence of the Lipschitz assumption on the gradient of the objective function. Furthermore, they evaluate its performance by applying the proposed algorithm to classification problems on various data sets and comparing it to other line search algorithms. The authors' experiments show that their proposed algorithm outperforms other line search algorithms.
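The forward–backward template with a backtracking line search can be sketched as follows. This is the generic, non-accelerated scheme under standard assumptions, with an L1 regularizer as the nonsmooth part for concreteness; it is not the authors' new accelerated algorithm:

```python
def prox_l1(x, t):
    """Proximal operator of t * ||x||_1 (soft-thresholding)."""
    return [max(abs(v) - t, 0.0) * (1.0 if v > 0 else -1.0) for v in x]

def fbs_line_search(grad_f, f, prox_g, x0, L0=1.0, eta=2.0, iters=100):
    """Forward-backward splitting for min f(x) + g(x), f smooth,
    with backtracking on the curvature estimate L: increase L until
    the quadratic upper bound on f holds at the candidate point."""
    x, L = list(x0), L0
    for _ in range(iters):
        g = grad_f(x)
        while True:
            # forward (gradient) step, then backward (proximal) step
            y = prox_g([xi - gi / L for xi, gi in zip(x, g)], 1.0 / L)
            diff = [yi - xi for yi, xi in zip(y, x)]
            quad = (f(x)
                    + sum(gi * di for gi, di in zip(g, diff))
                    + (L / 2.0) * sum(d * d for d in diff))
            if f(y) <= quad + 1e-12:
                break                   # sufficient-decrease test passed
            L *= eta                    # backtrack: shrink the step 1/L
        x = y
    return x
```

For the one-dimensional problem min 0.5(x − 3)² + |x|, the scheme converges to x = 2, the known soft-thresholded solution.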

The articles presented in this Special Issue provide insights into fields related to "Advances in Machine Learning and Mathematical Modeling for Optimization Problems", including models, performance evaluation and improvements, and application developments. We hope that readers will benefit from the insights of these papers and contribute to these rapidly growing areas. We also hope that this Special Issue sheds light on major developments in the area of machine learning and mathematical modeling for optimization problems and attracts the attention of the scientific community to pursue further investigations leading to the rapid implementation of these techniques.

**Acknowledgments:** We would like to express our appreciation to all the authors for their informative contributions and to the reviewers.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

## *Article* **A Multi-Objective Crowding Optimization Solution for Efficient Sensing as a Service in Virtualized Wireless Sensor Networks**

**Ramy A. Othman 1, Saad M. Darwish 2,\* and Ibrahim A. Abd El-Moghith <sup>3</sup>**


**Abstract:** The Internet of Things (IoT) encompasses a wide range of applications and service domains, from smart cities, autonomous vehicles, surveillance, and medical devices to crop control. Virtualization in wireless sensor networks (WSNs) is widely regarded as the most revolutionary technique used in these areas. However, node failures, communication latency, and the need to regularly identify nodes present additional hurdles for virtualization in WSNs. Previous research on virtual WSNs has focused on issues such as resource maximization, node failure, and link-failure-based survivability, but has neglected to account for the impact of communication latency. Communication-link latency in WSNs affects the various virtual networks providing IoT services, and research in this field is currently lacking. In this study, we utilize the Evolutionary Multi-Objective Crowding Algorithm (EMOCA) to maximize fault tolerance and minimize communication delay for virtual network embedding in WSN environments for service-oriented applications focusing on heterogeneous virtual networks in the IoT. Unlike the current wireless virtualization approach, which uses the Non-dominated Sorting Genetic Algorithm-II (NSGA-II), EMOCA uses both domination and diversity criteria in the evolving population for optimization problems. The analysis of the results demonstrates that the proposed framework successfully optimizes fault tolerance and communication delay for virtualization in WSNs.

**Keywords:** fault tolerance; virtualization; internet-of-things; multi-objective optimization; evolutionary crowding algorithm

**MSC:** 37M05; 37-04

## **1. Introduction**

To accommodate the ever-expanding range of services offered by the IoT, network virtualization has been heralded as a crucial future-proofing mechanism for the Internet [1]. Through virtualization, a computer's hardware may be abstracted into a set of logical units that can then be shared across several users and, in some cases, competing software programs. Multiple applications will be able to coexist on the same virtualized WSNs, making this a promising strategy for enabling efficient use of WSN deployments [2]. The virtualization of networks has been proposed as a component of future inter-network communication models that might make it simple to integrate new functions into the Internet without requiring fundamental changes to the underlying architecture. This would hasten the evolution of Internet architectures [3].

As a whole, the network virtualization environment is made up of individual network nodes and the connections between them. A virtual topology is created when virtual nodes are linked together via virtual connections to overcome the limitations of a single connection, such as low bandwidth. The same physical hardware can host many virtual networks, each of which may have drastically different features. Resource-virtualization technologies also make things more abstract, which gives network operators a lot of freedom in how they run and change the network [4].

**Citation:** Othman, R.A.; Darwish, S.M.; Abd El-Moghith, I.A. A Multi-Objective Crowding Optimization Solution for Efficient Sensing as a Service in Virtualized Wireless Sensor Networks. *Mathematics* **2023**, *11*, 1128. https://doi.org/10.3390/math11051128

Academic Editors: Francois Rivest and Abdellah Chehri

Received: 26 November 2022 Revised: 16 February 2023 Accepted: 21 February 2023 Published: 24 February 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Sensing as a service (SaaS), which may be carried out in conjunction with network as a service (NaaS), is one of several fascinating application areas where the concept of WSN virtualization can be put to use. WSN virtualization enhances IoT security, resource usage, and administration, and decreases energy consumption [5]. Figure 1 shows how WSN virtualization can be achieved by making it easier for different kinds of networks to work together on the same physical infrastructure. The current four-tiered virtualization architecture for WSNs is designed to cut down on unnecessary duplication of sensor networks across various IoT use cases [6–8].

**Figure 1.** Architecture for virtualized wireless sensor networks.

Current virtualized wireless sensor network architectures have not taken into account the possibility of a communication breakdown on a virtual network as a result of a breakdown in communications on real-world WSNs. All nodes in a WSN are susceptible to failures, such as node failures, communication failures, or internal component malfunctions of the sensors (such as the transceiver, CPU, or battery), due to the wide variety of risk or hazard situations in which WSNs are deployed. In addition to the constraints of sensor attributes (low cost, compact size, high quality, etc.) [9], WSN technology faces a number of challenges, of which fault tolerance is by far the most significant. The severity of these problems makes it all the more important to include procedures that remedy these flaws and reinforce network operation in order to boost fault tolerance [10].

In many scientific and technical contexts, it is important to optimize many objectives simultaneously while weighing the tradeoffs between them. Recent years have seen extensive studies devoted to the development of effective algorithms for resolving such multi-objective optimization (MOO) challenges. To solve MOO problems, these algorithms employ a population of candidate solutions, investigating a number of non-dominated solutions simultaneously; this is in contrast to the single-solution-at-a-time approach taken by conventional methods. In this vein, the authors in [11] used a probabilistic approach to formulate a novel evolutionary multi-objective crowding algorithm (EMOCA). Their method appears to provide a middle ground between the competing concerns of dominance and diversity in the evolving population.

In this context, this paper presents a novel architecture for heterogeneous virtual networks in the IoT that may be embedded into WSN settings to improve fault tolerance and decrease communication latency in service-oriented networking. Since fault tolerance and communication latency are often two conflicting objectives in WSN settings, the problem can be formulated as a reactive optimization of fault tolerance and communication delay, which in our case is carried out by adapting an evolutionary multi-objective crowding algorithm (EMOCA). EMOCA's novelty lies in its use of a non-domination ranking scheme and a probabilistic technique to decide whether an individual's offspring will be considered during the replacement-selection phase. EMOCA incorporates diversity preservation as an integral part of the algorithm. Compared with the well-known non-dominated sorting genetic algorithm NSGA-II, EMOCA discovers a diverse set of non-dominated solutions with near-uniform spacing [11]. Simulations are used to assess how well EMOCA optimizes fault tolerance for virtualization in WSNs.
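The domination criterion at the heart of both NSGA-II and EMOCA can be stated in a few lines (minimization convention). The probabilistic replacement selection and diversity mechanisms that distinguish EMOCA are omitted, so this is only the shared core:

```python
def dominates(a, b):
    """True if solution a dominates solution b (minimization): a is no
    worse in every objective and strictly better in at least one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def nondominated_front(pop):
    """Members of pop (tuples of objective values) that no other
    member dominates -- the first front of a non-domination ranking."""
    return [p for p in pop
            if not any(dominates(q, p) for q in pop if q != p)]
```

In our setting each solution's objective tuple would be (negated fault tolerance, communication delay), so the first front contains exactly the embeddings among which neither objective can improve without worsening the other.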

The remaining sections of the paper are organized as follows: the literature on fault tolerance in virtual network embedding is discussed in Section 2. Section 3 lays out the mathematical formulation of the multi-objective optimization problem and EMOCA's application to resolving it. The simulation environment, metrics, and performance comparisons are discussed in Section 4, and a summary is provided in Section 5.

#### **2. Related Works**

This section will provide an overview of some of the studies that have been carried out on fault tolerance in virtual network embedding (VNE). We surveyed the literature and classified past research into three broad classes: that focusing on link failure, that focusing on node failure, and that focusing on multi-objective optimization for network survival. We will next move on to a discussion of virtualization as a contributing area in WSNs. Many approaches have been suggested to strengthen VNE's dependability against the failure of the substrate resources, and many researchers have attempted to address the VNE problem using these mechanisms [12].

There are two main types of solutions to VNE survivability issues identified in the literature: (a) proactive solutions, which reserve resources in advance of a potential failure, and (b) reactive solutions, which respond to a failure by immediately initiating a restoration mechanism [13]. In the latter schemes, part of each link's backup-storage quota is drawn down for protection and restoration. Survivability techniques based on connection restoration and protection are useful from a commercial standpoint, but they have certain limitations: in many instances, the reactive method can cause data loss, and the survivability measurement does not account for the fault-tolerance capabilities of connections or for communication latency [13].

Proactive solutions utilize a path-selection algorithm to determine backup pathways for each underlying connection before any VNE request is received. An existing embedding technique is then used to create the virtual node and link for the subsequent request. With increased data loads, a failure can cause a significant loss of data, and the backup mechanism may not be able to restore the VNE [14]. In [15], the authors presented a link-based backup strategy as a preventative measure against link failure. A portion of each core link's backup bandwidth is reserved during setup, in advance of any incoming VN request. Scheduling the backup bandwidth ahead of time, before a problem occurs, is preferable, and the VN embedding process then requires fewer computational resources. However, with this shared pre-allocation method, backup bandwidth is held regardless of the VN requests, meaning it may sit unused when only a few VN requests arrive at once.

To choose the most suitable virtual link for failure recovery, a hybrid technique was presented in [16]. In contrast to the reactive approach, which seeks to reallocate any capacity negatively impacted by a large request, the preventative approach embeds virtual links into numerous core channels to promote resistance to attacks and efficiency in resource use. This method depends on the WSN's remaining hardware resources, which may not be enough to fix the virtual network on a very busy network. An approach for identifying the alternate link among the impacted virtual network (VN) resources is introduced in [17]. While a dynamic recovery method is useful in general, it is especially useful when physical failures cause additional downtime and resources are limited. This approach demands a full VN reset, which takes a long time and makes the service inaccessible.

The authors in [18] presented a two-step methodology for restoring the whole VN of a failed attachment node. First, a graph is built to request a VN with a virtual link backup contract, and then the improved VN is requested on the core set by employing both the redundant and K-redundant schemes. While this strategy may help optimize the allocation of certain resources, it may not be able to do so for all of them, and it requires setting aside a spare node and link for every vital node in the network. A second two-step strategy for restoring a VN is presented in [19]. The VN is augmented using virtual nodes (VNodes) and virtual links (VLinks) in the first stage, and sensor networks are then given access to this improved VN in the second stage. In the worst case, each VNode needs to have a backup set aside. The research in [20] offered an enhanced VN based on a failover method to minimize backup resources. Despite being resource-efficient, this method is impractical because VNodes frequently migrate.

Contrary to these approaches, in [21], the authors presented a joint optimization approach to assign both primary and backup resources. Although heuristic-based mapping quickly tackles single-node failure, the complexity and inconvenience of considering backup resources and the possibility of node and connection failure are inherent in this embedding technique. A method for improving long-term viability with minimal operating expenses was discussed in [22], which takes advantage of the spatial distribution of VNE's physical resources. A heuristic-based method was used for the smaller network, while an integer linear-programming model was used for the larger one. It has been hypothesized that this is a multi-commodity network-flow problem. Since smaller networks often have faster physical connectivity, location data have less of an effect. If the structure of the virtual networks is altered, undesirable topology-based survival characteristics will emerge as a direct result. Even though more and more factors take survivability into account, the use of single-objective optimization approaches has stalled progress toward the best values for network parameters [23–26].

To improve fault tolerance in WSN virtualization, the popular MOO approach of non-dominated sorting based on a genetic algorithm (NSGA-II) is developed in [4]. Through a process of chromosome sorting, NSGA-II is adapted to address the optimization problem. The sorting technique prioritizes chromosomes according to competing criteria. Concerning solution dispersion and convergence to the true Pareto front, NSGA-II performs better than other Pareto-optimal approaches. However, the framework has drawbacks because the distribution of solutions is restricted for some problems. Moreover, crowded comparisons can restrict convergence. Virtualization proposals for WSNs tend to focus on improving resource (sensor) usage via application-centric multitasking and the abstraction of sensors according to their use (i.e., virtual sensors).

The research in [27] investigated the challenge of finding the optimal lifetime and number of relay nodes for a network operating in three-dimensional environments. To achieve a better compromise between the two goals, a new method is suggested. The technique combines a decomposition-based multi-objective evolutionary algorithm with a targeted local search to improve its component parts. In [28], the controller-placement problem is formulated as a multi-objective optimization problem for selecting the optimal locations of Software-Defined Network (SDN) controllers to improve WSN performance. Considerations such as cost, time, and dependability are among the constraints applied here. In addition, a novel adaptive population-based cuckoo optimization (APB-CO) is used to position controllers optimally.

The work in [29] discussed WSN resource allocation for combined time-slot assignment, channel allocation, and power control. The study analyses resource dependency to design a two-stage resource-allocation optimization technique for a non-convex problem with diverse research aims and computing complexity. First, a graph-coloring technique for time-slot assignment is created for conflict-free sensor information interchange. Building on this first stage, combined power control and channel allocation are examined and articulated as a multi-objective optimization problem to resolve the tradeoff between energy efficiency and network-capacity maximization under link-interference and load-balancing constraints. In their work, multi-objective hybrid particle-swarm optimization yields Pareto-optimal solutions.

In [30], the time function of the goal function perception matrix is presented, taking into account the features of low-power and real-time performance of sensor nodes in WSN. In order to limit the perceptual nodes' inherent bias, a constraint on the number of targets they can detect is suggested; a weighted factor on the utility function is employed to ensure users are treated fairly; and finally, an optimization model of multi-objective resource allocation is established. To effectively allocate resources, a new technique is presented that builds on top of a modified version of simulated annealing (SA), bringing together the speedy optimization capabilities of SA with the robust search capabilities of logistic chaos.

The authors in [31] presented a multi-objective protocol (MOP) that maximizes network lifespan and residual energy using a mixed-integer linear-programming (MILP) optimization technique. Within the boundaries of the nodes that make up a given target, sets of MILP are solved locally. Therefore, within the same coverage nodes, energy is conserved. This research takes into account the goals of optimizing network residual energy and neighbor node connections. In order to determine which nodes to deactivate, each round's local MILP solution is used to identify the nodes that have the lowest connection to their neighbors and are thus the most heavily used throughout the routing process.

For 5G systems that support the Internet of Things, the research in [32] developed a new method of clustering based on optimization via network slicing. By using network slicing and cluster construction, multi-objective improved seagull optimization-based clustering with network slicing (MOISGO-CNS) aims to improve 5G systems' energy efficiency and load distribution. Both ISGO-based clustering and ISGO-based slicing using bidirectional long short-term memory (BiLSTM) form the backbone of the MOISGO-CNS method. Two-hop connectivity ratio, residual energy, and link quality are the three metrics used to build a fitness function in the ISGO-based clustering method. In addition, the ISGO algorithm is developed as part of the network-slicing process in order to pick hyperparameters for optimum slicing classification performance. See [33,34] for an updated review of multi-objective optimization in wireless sensor networks. Recent studies of node- and network-level virtualization in WSNs for the IoT [35,36] and its applications [6,8,37–39] show this to be the case.

In general, the problem with employing evolutionary algorithms for improving fault tolerance in WSN virtualization is that they cannot determine whether a solution is optimal; they can only determine whether it is "better" than other solutions they already know about. It is also tricky to assign accurate weights to the objective functions, running the algorithm numerous times yields various Pareto-optimal solutions, and concave regions of the Pareto front are notoriously difficult to handle. A key challenge in the development of effective algorithms is the incorporation of diversity mechanisms into evolutionary algorithms for multi-objective optimization problems. This is the case for problems with an exponentially large number of possible non-dominated objective vectors. What we aim to obtain is an acceptable approximation of the Pareto front.

We look at how this can be carried out using the diversity mechanism of crowding dominance and highlight where this idea is demonstrably beneficial for handling internal failures in WSN virtualization. We use EMOCA as an MOO technique to maximize fault tolerance and minimize communication delay. The performance of EMOCA is compared with that of the well-known non-dominated sorting genetic algorithm NSGA-II. According to [11], EMOCA performs better than the other algorithm in eight of the nine test problems in terms of convergence and diversity, and it consistently finds a wide range of non-dominated solutions.

#### **3. The Proposed Framework**

Here we cover the topic of virtualization's fault tolerance in WSNs. We evaluate a network structure with four layers. There is the "physical" layer, which is made up of the real sensor nodes, and then there is the "virtualization" layer, which creates additional "virtual" sensors that can perform additional jobs and services beyond what the "physical" layer can. In the third layer, known as the "access layer", different WSNs are developed based on the fault-tolerant incorporation of mission-oriented sensors. There is an access agent for every embedded network. The applications layer is where the IoT's smart applications, such as humidity, fire monitoring, temperature, etc., are represented to the end users who really benefit from them. In order to implement the suggestion, the access layer is modified.

Every node in a traditional sensor network cooperates to deploy sensors at the same level [24]. When many sensor networks operate together and share the same physical location, they form a Virtual Sensor Network (VSN). The same domain hosts a variety of physically distinct sensor networks. A VSN is established, within a larger wireless sensor network, by the sensor nodes that are most relevant to a certain activity or use case at that moment [20]. In a virtual sensor network, the nodes work together to complete a specified task at a precise moment. To create a virtual sensor network, logical connections must be made between cooperating sensor nodes. Depending on the phenomenon being monitored or the function being served, nodes may be organized into distinct virtual sensor networks. The virtual sensor network protocol should provide the capability for network construction, utilization, adaptation, and maintenance of a subset of sensors working on a given job. The proposed framework's flowchart is shown in Figure 2, and the mathematical terminology used to describe its key processes is included in Table 1.

Say we have a sensor network with nodes dispersed over the network region *NA*. Assume a mesh topology, meaning all nodes are connected. This network supports virtual networks. Assume a link-route breakdown causes the connection between *s<sup>v</sup>* and *d<sup>v</sup>* to fail. The wireless sensor network connects the source physical sensor node *s<sup>p</sup>* and the destination physical sensor node *d<sup>p</sup>*. To discover a fault-tolerant alternative, we investigate all possible paths between *s<sup>p</sup>* and *d<sup>p</sup>*. To find these routes, the expected number of intermediary nodes must be known. By calculating the average distance to the nearest neighboring node, we may count the paths. Obtaining the sensor's probability density function (pdf) is all that is required to compute the nearest-neighbor distance; the pdf gives the probability of a neighbor sensor lying between *r* and (*r* + Δ*r*), where *r* is the transmission radius and Δ*r* is the incremental distance. The physical wireless sensor network is considered to have a uniform sensor distribution *λ* such that [4]:

$$\int\_{\mathcal{N}\_A} \lambda \, d\mathcal{N}\_A = 1 \Rightarrow \lambda = \frac{1}{\mathcal{N}\_A} \tag{1}$$

For any two sensors separated by a distance between *r* and (*r* + Δ*r*), the probability *P<sup>c</sup><sub>r|(r+Δr)</sub>* that the closest neighbor lies in this interval is the product of the probability *P<sup>s</sup><sub>r|(r+Δr)</sub>* that a sensor is present at that distance and the probability *P<sup>0</sup><sub>&lt;r</sub>* that no other sensor is closer. Assume that the *Nn* sensor nodes in the network can only forward data within a sector of 0.5 rad toward the destination *d<sup>p</sup>*. In this case, *P<sup>c</sup><sub>r|(r+Δr)</sub>* can be computed as:

$$P\_{r|(r+\Delta r)}^{c} = P\_{<r}^{0} \cdot P\_{r|(r+\Delta r)}^{s} \tag{2}$$

$$P\_{r|(r+\Delta r)}^{c} = \left[1 - P\_{<r}^{s}\right] \cdot \left[P\_{r|(r+\Delta r)}^{s}\right] \tag{3}$$

$$P\_{r|(r+\Delta r)}^{c} = \left[1 - \sum\_{j=1}^{N\_n} \binom{N\_n}{j} \left(\lambda \pi r^2\right)^j \left(1 - \lambda \pi r^2\right)^{N\_n - j}\right] \cdot \left[\sum\_{j=1}^{N\_n} \binom{N\_n}{j} \left(\lambda \pi \left(r\Delta r + \Delta r^2\right)\right)^j \left(1 - \lambda \pi \left(r\Delta r + \Delta r^2\right)\right)^{N\_n - j}\right] \tag{4}$$

$$P\_{r|(r+\Delta r)}^{c} = \left(1 - \lambda \pi r^2\right)^{N\_n} \left[1 - \left(1 - \lambda \pi \left(r\Delta r + \Delta r^2\right)\right)^{N\_n}\right] \tag{5}$$

$$P\_{r|(r+\Delta r)}^{c} = \left(1 - \lambda \pi r^2\right)^{N\_n} \left[1 - \left\{1 - \binom{N\_n}{1} \lambda \pi \left(r\Delta r + \Delta r^2\right) + \binom{N\_n}{2} \left(\lambda \pi \left(r\Delta r + \Delta r^2\right)\right)^2 - \dots \right\}\right] \tag{6}$$

$$P\_{r|(r+\Delta r)}^{c} = \left(1 - \lambda \pi r^2\right)^{N\_n} \left[N\_n \lambda \pi r\Delta r + N\_n \lambda \pi \Delta r^2 - \binom{N\_n}{2} \left(\lambda \pi \left(r\Delta r + \Delta r^2\right)\right)^2 + \dots \right] \tag{7}$$

**Figure 2.** Flowchart of the proposed framework.


**Table 1.** Mathematical Nomenclature.

In order to calculate the probability density function of the nearest-neighbor distance *fr*(*r*), we can use the limit in Equation (7) as:

$$f\_r(r) = \lim\_{\Delta r \to 0} \frac{P\_{r|(r + \Delta r)}^c}{\Delta r} = N\_n \lambda \pi r \left(1 - \lambda \pi r^2\right)^{N\_n} \tag{8}$$

Considering *R* as the transmission range of the sensors, the expected closest-neighbor distance *E*(*r*) can be expressed as

$$E(r) = \int\_0^R r f\_r(r)\, dr = \int\_0^R N\_n \lambda \pi r^2 \left(1 - \lambda \pi r^2\right)^{N\_n} dr \tag{9}$$

$$E(r) = \left[\frac{-r\left(1-\lambda\pi r^2\right)^{N\_n+1}}{\lambda\pi \left(N\_n+1\right)}\right]\_0^R + \int\_0^R \frac{\left(1-\lambda\pi r^2\right)^{N\_n+1}}{\lambda\pi \left(N\_n+1\right)} dr \tag{10}$$

$$E(r) = \left[\frac{1}{\lambda \pi \left(N\_n + 1\right)} \sum\_{i=0}^{N\_n+1} \binom{N\_n + 1}{i} \frac{\left(-\lambda \pi r^2\right)^i r}{i + 1}\right]\_0^R \tag{11}$$

$$E(r) = \frac{\sqrt{N\_A}}{\lambda \pi^{\frac{3}{2}} \left(N\_n+1\right)} \sum\_{i=0}^{N\_n+1} \binom{N\_n + 1}{i} \frac{\left(-1\right)^i}{i+1} \tag{12}$$
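As a sanity check, Equation (9) can be integrated numerically. The sketch below is illustrative only; the values chosen for *NA*, *Nn*, and *R* are assumptions taken from the later simulation settings, not fixed by the derivation:

```python
import math

def expected_nn_distance(n_area, n_nodes, r_max, steps=20000):
    """Numerically integrate Equation (9): E(r) = ∫ N_n·λ·π·r²·(1 − λπr²)^N_n dr."""
    lam = 1.0 / n_area          # uniform density, Equation (1)
    dr = r_max / steps
    total = 0.0
    for i in range(steps):
        r = (i + 0.5) * dr      # midpoint rule
        total += n_nodes * lam * math.pi * r**2 * (1 - lam * math.pi * r**2) ** n_nodes * dr
    return total

# Illustrative values: 1500 m² region, 100 nodes, 30 m transmission range.
print(expected_nn_distance(1500.0, 100, 30.0))
```

The result is a small expected hop length (well under the transmission range), which is what drives the intermediary-node count in Equations (13) and (14).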

It can be shown that there are exactly C(*Nin* − 2, *k*) pathways from *s<sup>p</sup>* to *d<sup>p</sup>* with exactly *k* intermediary nodes, where

$$k = \left\{ 1, 2, 3, \dots \left( \left\lfloor \frac{D}{E(r)} \right\rfloor - 1 \right) \right\} \tag{13}$$

$$N\_{in} = \left( \left\lfloor \frac{D}{E(r)} \right\rfloor - 1 \right) \tag{14}$$

*D* represents the distance between *s<sup>p</sup>* and *d<sup>p</sup>*. The total number of routes, *N<sup>p</sup>*, from *s<sup>p</sup>* to *d<sup>p</sup>* is given by:

$$N^p = \binom{N\_{in}-2}{1} + \binom{N\_{in}-2}{2} + \dots + \binom{N\_{in}-2}{N\_{in}-2} \tag{15}$$

$$N^p = \left\{\binom{N\_{in}-2}{0} + \binom{N\_{in}-2}{1} + \dots + \binom{N\_{in}-2}{N\_{in}-2}\right\} - 1 \tag{16}$$

$$N^p = 2^{N\_{in}-2} - 1 \tag{17}$$
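The path enumeration of Equations (14)–(16) can be sketched as follows; the values chosen for *D* and *E*(*r*) are hypothetical:

```python
import math

def path_count(distance, expected_nn):
    """Count candidate s_p → d_p paths per Equations (14)–(16)."""
    n_in = math.floor(distance / expected_nn) - 1       # Equation (14)
    # Sum the binomial coefficients C(n_in − 2, k) for k = 0 .. n_in − 2, minus 1 (Eq. 16).
    total = sum(math.comb(n_in - 2, k) for k in range(n_in - 1)) - 1
    return n_in, total

n_in, n_p = path_count(12.0, 1.5)   # D = 12 m, E(r) = 1.5 m (illustrative)
print(n_in, n_p)                    # the sum collapses to a power of two minus one
```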

If we want to maximize fault tolerance (*FT*), we can write it as:

$$\text{Maximize } FT = \max\_{i=1,2,\ldots,N^p} \left( FT\_i^p \right) \tag{18}$$

$$FT\_i^p = \frac{1}{\left\lfloor \frac{D}{E(r)} \right\rfloor - 1} \sum\_{i=s^p, j=1}^{i=\left(\left\lfloor \frac{D}{E(r)} \right\rfloor - 1\right), j=d^p} FT\_{i,j}^{l} \tag{19}$$

*FT<sup>p</sup><sub>i</sub>* is the fault tolerance of the *i*th path from source *s<sup>p</sup>* to destination *d<sup>p</sup>*, and *FT<sup>l</sup><sub>i,j</sub>* is the fault tolerance of the link between an adjacent pair of nodes. The ordered set of nodes of the *i*th path is represented by *S<sup>op</sup><sub>i</sub>*:

$$S\_i^{op} = \left\{s^p, 1, 2, \dots, \left(\left\lfloor \frac{D}{E(r)} \right\rfloor - 2\right), \left(\left\lfloor \frac{D}{E(r)} \right\rfloor - 1\right), d^p\right\} \tag{20}$$

Similar to how the maximize *FT* function is written, the communication-delay (*CD*) minimization function is given by:

$$\text{Minimize } CD = \min\_{i=1,2,\ldots,N^p} \left(CD\_i^p\right) \tag{21}$$

$$CD\_{i}^{p} = \frac{1}{\left\lfloor \frac{D}{E(r)} \right\rfloor - 1} \sum\_{i=s^{p}, j=1}^{i=\left(\left\lfloor \frac{D}{E(r)} \right\rfloor - 1\right), j=d^{p}} \frac{CD\_{i,j}^{l}}{CD\_{max}^{l}} \tag{22}$$

*CD<sup>p</sup><sub>i</sub>* represents the delay of the *i*th path from *s<sup>p</sup>* to *d<sup>p</sup>*, *CD<sup>l</sup><sub>i,j</sub>* is the delay of a link between an adjacent pair of nodes, and *i*, *j* ∈ *S<sup>op</sup><sub>i</sub>*. The maximum link delay among all the links is represented by *CD<sup>l</sup><sub>max</sub>*. The optimization problem outlined above has the following constraints:

$$0 < FT\_i^p \le 1, \ 0 < FT\_i^l \le 1, \ 0 < CD\_i^p \le 1, \ 0 < \frac{CD\_{i,j}^l}{CD\_{\max}^l} \le 1\tag{23}$$
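A minimal sketch of the path objectives in Equations (19) and (22), treating the prefactor as the number of links on the path and using hypothetical per-link values:

```python
def path_fault_tolerance(link_fts):
    """Equation (19): mean of the link fault-tolerance values along the path."""
    return sum(link_fts) / len(link_fts)

def path_delay(link_cds, cd_max):
    """Equation (22): mean of the link delays, each normalized by the maximum link delay."""
    return sum(cd / cd_max for cd in link_cds) / len(link_cds)

# Hypothetical per-link values for one candidate path.
fts = [0.9, 0.7, 0.8]
cds = [0.02, 0.05, 0.03]
print(path_fault_tolerance(fts), path_delay(cds, max(cds)))
```

Both quantities land in (0, 1], matching the constraints of Equation (23).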

The problem can be formulated as a reactive optimization of fault tolerance and communication delay, which is accomplished in our case by adapting an evolutionary multi-objective crowding algorithm (EMOCA). The number of objectives being optimized is the primary dividing line between single- and multi-objective optimization. When there are several competing goals, there is no single best solution; instead, there is a set of good candidate solutions. Pareto-optimal solutions are those in which no objective can be improved without degrading another. Across all goals, the Pareto front does not single out one optimal solution. Accordingly, without problem-specific knowledge of the relative importance of the different goals, all Pareto-front solutions are equally valuable. Finding numerous such solutions that represent tradeoffs between goals is the primary aim of multi-objective optimization [40,41].

The primary objectives of multi-objective evolutionary algorithms (MOEAs) include: (1) settling on a Pareto-optimal solution set; and (2) acquiring a wide variety of options that are evenly spaced. When solutions are distributed unevenly, the Pareto front becomes crowded in certain areas. The EMOCA solution prioritizes variety throughout the algorithm to solve this problem [11]. Evolutionary operators such as crossover and mutation, in addition to chromosomal sorting through the non-dominance concept and diversity, are used to alter the solutions in EMOCA. After multiple cycles, the EMOCA eventually arrives at a collection of tradeoffs known as the Pareto front. Unlike an aggregate optimization strategy that only offers one solution, this set of alternatives gives the system designer many to choose from. The main structure of EMOCA is illustrated in Algorithm 1. Now we will discuss each of EMOCA's distinct steps [11].

**Algorithm 1:** EMOCA main structure

	- 1. Initialize the population.
	- 2. Repeat until the maximum number of generations is reached:
	- 2.1. Generate the mating population.
	- 2.2. Generate offspring by crossover followed by mutation.
	- 2.3. Create a new pool consisting of the parents and some offspring.
	- 2.4. Trim the new pool to generate the population for the next iteration.
	- 2.5. Update the archive to contain all non-dominated solutions.

Mating Population Generation: EMOCA uses binary tournament selection to form the mating population. An individual's fitness is the sum of its non-dominance rank and its diversity rank. Non-dominance ranks are determined using the non-dominated sorting algorithm presented in [42–44]. Each individual in the population is compared to the others to determine dominance, which yields the first non-dominated front. The first front's solutions are temporarily set aside, and the procedure is repeated until every solution has been assigned to a front. Solutions from the same non-dominated front are ranked equally.
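The non-dominance ranking described above can be sketched as follows (a simplified version that recomputes each front in O(*n*²), with fault tolerance maximized and delay minimized; not the authors' implementation):

```python
def dominates(a, b):
    """a = (ft, cd). a dominates b if FT is no worse, CD is no worse, and one is strictly better."""
    return a[0] >= b[0] and a[1] <= b[1] and (a[0] > b[0] or a[1] < b[1])

def non_dominated_ranks(points):
    """Assign rank 1 to the first non-dominated front, rank 2 to the next, and so on."""
    ranks = {}
    remaining = set(range(len(points)))
    rank = 1
    while remaining:
        front = {i for i in remaining
                 if not any(dominates(points[j], points[i]) for j in remaining if j != i)}
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return [ranks[i] for i in range(len(points))]

# (FT, CD) pairs for four candidate paths (hypothetical values).
print(non_dominated_ranks([(0.9, 0.05), (0.7, 0.02), (0.6, 0.06), (0.5, 0.07)]))
```

The resulting ranks feed the fitness used during binary tournament selection.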

For the diversity rank, NSGA-II's crowding-distance metric determines each solution's crowding density. To determine the density of solutions around a specific solution *xi* in a front, we calculate the average distance between the two solutions on either side of *xi* along each objective. Front-boundary solutions are assigned an infinite crowding distance. For all other solutions within a front, Algorithm 2 is used to assign the crowding distance [36]. A greater crowding distance suggests more variety (diversity). Based on their crowding distances, the solutions in the population are ranked and ordered.

#### **Algorithm 2:** Crowding distance measure

	- 1. Initialize the crowding distance of each solution in front *F* to zero.
	- 2. For each objective *fm*:
	- 2.1 Sort the solutions in *F* along objective *fm*;
	- 2.2 Assign an infinite crowding distance to the two boundary solutions;
	- 2.3 For each remaining solution, add the normalized difference between the objective values of its two neighboring solutions to its crowding distance.
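The crowding-distance measure can be sketched in the standard NSGA-II style (a simplified illustration over (FT, CD) vectors, not the authors' code):

```python
def crowding_distances(points):
    """NSGA-II-style crowding distance over one front of (ft, cd) objective vectors."""
    n = len(points)
    dist = [0.0] * n
    for m in range(2):                                    # one pass per objective
        order = sorted(range(n), key=lambda i: points[i][m])
        dist[order[0]] = dist[order[-1]] = float("inf")   # boundary solutions
        f_min, f_max = points[order[0]][m], points[order[-1]][m]
        if f_max == f_min:
            continue                                      # degenerate objective: no spread
        for k in range(1, n - 1):
            i = order[k]
            # normalized gap between the two neighbors of solution i along objective m
            dist[i] += (points[order[k + 1]][m] - points[order[k - 1]][m]) / (f_max - f_min)
    return dist

# One non-dominated front (hypothetical values): only the middle solution is finite.
print(crowding_distances([(0.5, 0.02), (0.7, 0.04), (0.9, 0.08)]))
```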


New Pool Generation: After comparing each child to one of its randomly selected parents, taking dominance and crowding density into account, a new pool consisting of all the parents and some of the offspring is formed. Possible outcomes include the following three scenarios: (1) if the offspring dominates the parent, the offspring enters the new pool; (2) if the parent dominates the offspring, the offspring is discarded; and (3) if neither dominates the other, the offspring enters the pool with a probability *P* based on the crowding distances of the two solutions:


$$P = 1 - \exp\left(\delta(parent) - \delta(offspring)\right) \tag{24}$$

*δ* denotes the crowding distance of a solution. A more diverse child with a larger crowding distance than its parent has a greater chance of survival. More diverse solutions are rewarded by being given a chance to thrive in subsequent generations.
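Equation (24) can be sketched as a small helper; the function names are illustrative, and a negative probability (offspring less diverse than its parent) is clamped to zero:

```python
import math, random

def accept_probability(delta_parent, delta_offspring):
    """Equation (24): survival probability of a mutually non-dominated offspring."""
    return max(0.0, 1.0 - math.exp(delta_parent - delta_offspring))

def survives(delta_parent, delta_offspring, rng=random.random):
    """Accept the offspring with probability P (Bernoulli trial)."""
    return rng() < accept_probability(delta_parent, delta_offspring)

# An offspring in a less crowded region (larger δ) has a positive chance of survival.
print(accept_probability(0.5, 2.0))
```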


Trimming New Pool: Both the non-domination rank and the diversity rank are used to sort the new pool; the diversity rank is used to compare alternatives that have the same non-domination rank. The new population is made up of the initial items of the sorted list of fronts *F1*, *F2*, ... , *Fn*, where elements of *Fi+1* are dominated only by elements in *F1*, *F2*, ... , *Fi*. All generations of non-dominated solutions are saved in EMOCA's archive.

For the most part, EMOCA relies on an individual's diversity score to determine whether its offspring will be allowed to join the new population. While EMOCA discards offspring that are dominated by their parents, it does allow some low-quality offspring to remain in the population, provided they have sufficient diversity. The result is a more well-rounded population. Although NSGA-II allows all viable offspring to go on into the next generation, EMOCA only allows a percentage to do so. Therefore, whereas NSGA-II executes non-dominated sorting on a population of size 2*N*, where *N* is the population size, EMOCA executes non-dominated sorting on a population of size between *N* and 2*N*. With this, EMOCA's computational complexity decreases [11].

#### *3.1. Chromosome Representation*

In EMOCA's solution space for the optimization problem, a chromosome *CHi* is an ordered collection of intermediate nodes *S<sup>op</sup><sub>i</sub>* that begins with source *s<sup>p</sup>* and ends with destination *d<sup>p</sup>*. Each node in the set represents a gene of the chromosome.

$$CH\_{i} = \left\{s^{p}, 1, 2, \dots, \left(\left\lfloor \frac{D}{E(r)} \right\rfloor - 2\right), \left(\left\lfloor \frac{D}{E(r)} \right\rfloor - 1\right), d^{p}\right\}^{FT, CD} \tag{25}$$

$$FT\_{i,j}^{l} = \left(1 - \sum\_{t=0}^{N\_{re}} \left(e\_{i,j}\right)^{t} \left(1 - e\_{i,j}\right)\right) + d\_{i,j}^{l} \tag{26}$$

Given a link with packet error rate *ei,j* and a degree estimate of the link *d<sup>l</sup><sub>i,j</sub>*, we may calculate the number of retransmissions, *Nre*, required for a successful transmission. A path's cumulative fault tolerance is calculated by adding the fault tolerances of its individual links. With the help of packet-error-rate-based link-quality estimation and neighbor-density-based degree estimation, we can calculate a link's fault tolerance *FT<sup>l</sup><sub>i,j</sub>*. The degree estimate is given by Equation (27), where *d<sup>e</sup><sub>i</sub>* and *d<sup>e</sup><sub>j</sub>* are the degrees of nodes *i* and *j*, respectively, and *α* is a decision variable varying between 0 and 1.

$$d\_{i,j}^{l} = \begin{cases} 1, & d\_i^e = d\_j^e = N\_n - 1 \\ 1 - \alpha^{d\_i^e}, & d\_i^e = d\_j^e < N\_n - 1 \\ 1 - \alpha^{\frac{\left(d\_i^e - d\_j^e\right)^2}{d\_i^e + d\_j^e}}, & \left|d\_i^e - d\_j^e\right| > 0 \end{cases} \tag{27}$$
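A direct transcription of the three degree-estimate cases, assuming an illustrative *α* = 0.5 and hypothetical node degrees:

```python
def degree_estimate(deg_i, deg_j, n_nodes, alpha=0.5):
    """Equation (27): degree-based contribution to link fault tolerance (alpha in (0, 1))."""
    if deg_i == deg_j == n_nodes - 1:   # both endpoints fully connected
        return 1.0
    if deg_i == deg_j:                  # equal but not full degree
        return 1.0 - alpha ** deg_i
    return 1.0 - alpha ** ((deg_i - deg_j) ** 2 / (deg_i + deg_j))

# Hypothetical degrees in a 10-node network.
print(degree_estimate(9, 9, 10), degree_estimate(4, 4, 10), degree_estimate(6, 2, 10))
```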

When calculating the communication delay *CD<sup>l</sup><sub>i,j</sub>*, we factor in interference on the link, which is based on the link quality, as well as the propagation and transmission delays, where *di,j* is the distance between the pair of nodes *i* and *j*, *Sp* represents the propagation speed, *Spkt* is the packet size, and *st* represents the transmission speed.

$$CD\_{i,j}^{l} = \left(1 - \sum\_{t=0}^{N\_{re}} \left(e\_{i,j}\right)^{t} \left(1 - e\_{i,j}\right)\right) + \frac{d\_{i,j}}{S\_p} + \frac{S\_{pkt}}{s\_t} \tag{28}$$
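Equations (26) and (28) share the residual-loss term 1 − Σ *e*<sup>t</sup>(1 − *e*). A sketch with hypothetical link parameters (error rate, retransmission budget, link length, speeds, and packet size are all assumptions):

```python
def residual_loss(err_rate, n_re):
    """1 − Σ_{t=0}^{N_re} e^t (1 − e): probability the packet still fails after N_re retries."""
    return 1.0 - sum(err_rate ** t * (1.0 - err_rate) for t in range(n_re + 1))

def link_fault_tolerance(err_rate, n_re, degree_est):
    """Equation (26): link-quality term plus the degree estimate."""
    return residual_loss(err_rate, n_re) + degree_est

def link_delay(err_rate, n_re, dist, s_prop, s_pkt, s_tx):
    """Equation (28): interference term plus propagation and transmission delay."""
    return residual_loss(err_rate, n_re) + dist / s_prop + s_pkt / s_tx

# Hypothetical link: 10% packet error rate, 3 retransmissions, 20 m link,
# 2e8 m/s propagation speed, 1024-bit packets sent at 250 kbit/s.
print(link_fault_tolerance(0.1, 3, 0.9), link_delay(0.1, 3, 20.0, 2e8, 1024, 250e3))
```

The geometric sum telescopes, so the residual loss is simply *e*<sup>Nre+1</sup>.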

#### *3.2. Crossover and Mutation*

The crossover procedure randomly swaps a collection of nodes between two chromosomes from the population (all paths between *s<sup>p</sup>* and *d<sup>p</sup>*). The exchange is limited to nodes that are reachable both downstream and upstream. Larger group sizes are desirable in the earlier stages (lower generations) of the search; the crossover group size is determined by the generation number and the size of the chromosome pair. Because intermediate nodes may recur, chromosomes produced by crossover operations (also called offspring in optimization theory) are repaired. Intermediate nodes present in a parent chromosome but not in the offspring are considered during repair. If two randomly chosen nodes on a chromosome are reachable (present as neighbors) from their respective descendant nodes, the mutation process swaps their positions.
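The crossover-and-repair step can be sketched as follows (a simplified illustration that represents each path as a node-ID list sharing its source and destination, and crosses over at a single common intermediate node; not the authors' implementation):

```python
import random

def repair(path):
    """Keep the first occurrence of every node, cutting out the loop a repeat creates."""
    seen, out = set(), []
    for node in path:
        if node in seen:
            out = out[:out.index(node) + 1]   # fall back to the earlier occurrence
            seen = set(out)
        else:
            seen.add(node)
            out.append(node)
    return out

def crossover(path_a, path_b, rng=random.Random(1)):
    """Swap the tails of two paths at a randomly chosen common intermediate node."""
    common = [n for n in path_a[1:-1] if n in path_b[1:-1]]
    if not common:
        return path_a[:], path_b[:]           # no valid crossover point
    cut = rng.choice(common)
    ia, ib = path_a.index(cut), path_b.index(cut)
    child_a = path_a[:ia] + path_b[ib:]
    child_b = path_b[:ib] + path_a[ia:]
    return repair(child_a), repair(child_b)

a, b = crossover(["s", 1, 2, 3, "d"], ["s", 4, 2, 5, "d"])
print(a, b)
```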

#### *3.3. Non-Dominance and Crowding-Distance-Based Sorting for Chromosomes*

Chromosomes are sorted using non-dominance; multiple competing goals are used to rank them. Consider population chromosomes *CHi* and *CHj*. According to Pareto optimality, a chromosome *CHi* dominates *CHj* if at least one of its fitness values is better than the corresponding value of *CHj* and none of its fitness values are worse. Multi-objective optimization in communication networks favors Pareto-optimal prioritizing [40,41]. For two goals, this is:

$$CH\_{i} \succ CH\_{j} = \begin{cases} CH\_{i}(FT) > CH\_{j}(FT) \,\wedge\, CH\_{i}(CD) \not< CH\_{j}(CD) \\ CH\_{i}(CD) > CH\_{j}(CD) \,\wedge\, CH\_{i}(FT) \not< CH\_{j}(FT) \end{cases} \tag{29}$$

The population's chromosomes are sorted by fitness using the non-dominance notion. Non-dominated chromosomes rank first in the population. Chromosomes dominated by exactly one other chromosome rank second, and chromosomes dominated by two others rank third. Each chromosome's crowding distance is computed after ranking. The next generation is chosen via a tournament method.

Algorithm 3 lays out the whole process followed to obtain an optimal solution. A population (paths between pairs of sources and destinations) of size *Spop* is formed by randomly scattering the decision variable throughout some allowed range (low, high). The population is ordered using non-dominance-based sorting of *oldpop*. The objective-1 normalized fault tolerance and the objective-2 normalized delay are determined for each *Si* ∈ *oldpop*, the best half of the population is selected, and for each *Si* the crowding distance *Cdist* is computed from all points excluding boundary points. Using the tournament-selection approach, the best half of the population is chosen based on the rank *Ri* of the *i*th solution and the crowding distance *Cdist*. By introducing mutations and performing crossovers, a superior solution may be generated from the preselected parent population. The best half of the population is once again chosen from the whole population. These steps are iterated until the stop criterion is met (the maximum number of generations is reached) in order to produce optimal chromosomes. The time complexity of EMOCA is *O*(2 × *Spop* × *Ngen*), where *Spop* is the size of the old population and *Ngen* represents the number of generations. The number of generations, and hence the running time, is indirectly determined by the size of the network. As a result, the time needed for each generation might change based on the system's hardware.

In summary, convergence is emphasized by the concept of non-domination rank. During tournament selection and population reduction, variety is preserved through the use of diversity rank. It is also possible to apply the crowding distance in the parameter space [11]. In contrast, we measure crowding in the objective space to determine the optimal solution. When compared to NSGA-II and other multi-objective evolutionary algorithms (MOEAs) such as multi-objective ant-colony optimization (MOACO) and multi-objective particle-swarm optimization (MOPSO), EMOCA's most distinguishing features are its diversity-based acceptance test for offspring and the reduced population size on which it performs non-dominated sorting.


The next section details the simulations run to assess the framework's efficacy, paying special attention to the test-bed parameters, the metrics used, and the analysis of the resulting data. The simulations were designed around two case-study goals. First, the effect of the number of generations on the efficacy of fault-tolerant optimization is examined. Second, network density is used as a key indicator of fault-tolerant optimization's effectiveness.

```
Algorithm 3: EMOCA for solving the optimization problem
Input: Spop, Ngen, l(S,D), F_i^path

Generate an initial population of size Spop and save one copy as oldpop.
For each Si ∈ oldpop
    Calculate objective-1 normalized fault tolerance using Equation (19)
    Calculate objective-2 normalized delay using Equation (22)
End for
g = 1
While (g ≤ Ngen)
    Non_dominated_sorting(oldpop)
    For each Si ∈ oldpop
        Calculate the dominating set D_i^s
    End for
    j = 1
    For each Si ∈ oldpop
        If (D_i^s == φ)
            Fj = Fj ∪ Si, Ri = 1
        End if
    End for
    j = 2
    For each Si ∈ oldpop
        If (D_i^s ≠ φ && Ri == j − 1)
            Fj = Fj ∪ Si, Ri = j, j = j + 1
        End if
    End for
    Crowding_distance(oldpop)
        Set Cdist of the boundary solutions of each front to ∞.
        For each Si ∈ oldpop
            Calculate Cdist from all points excluding boundary points
        End for
    Select the best half of the population as parentpop considering Ri and Cdist
        using the tournament-selection approach.
    childpop = φ
    S_childpop = 0
    While (S_childpop ≤ Spop)
        Randomly select two chromosomes from the parent population.
        Perform crossover to produce two child chromosomes.
        Update childpop and S_childpop = S_childpop + 2
        Randomly choose a chromosome from the parent population.
        Mutate the chromosome to produce a child chromosome.
        Update childpop and S_childpop = S_childpop + 1
    End while
    Generate a new population of size (2 × Spop) as oldpop ∪ childpop
    Calculate normalized fault tolerance using Equation (19).
    Calculate normalized delay using Equation (22).
    Non_dominated_sorting(oldpop ∪ childpop)
    Crowding_distance(oldpop ∪ childpop)
    Select again the best half of the population as oldpop using rank and Cdist
    g = g + 1
End while
Output: optimized chromosomes
```
#### **4. Experimental Results**

In order to evaluate the proposed framework in virtual networks, the simulation's primary classes are developed in C++ for the NS2 network simulator. The major classes of the simulation include 'NetworkNode', 'VirtualNode', 'RandomProvider', 'PathSearch', and 'MainApp'. All the characteristics of a node in a network, such as position, list of neighbors, link delay with neighbors, and fault tolerance of associated links, are implemented in 'NetworkNode'. At 'VirtualNode', tasks are processed using an interface-based architecture. Different sets of network nodes are generated at random by the 'RandomProvider' for each simulation run. 'PathSearch' is a tool for optimizing virtual-network generation with respect to delay and fault tolerance. The simulation is run on a machine with a 64-bit Ubuntu (Linux) operating system, 16 GB of RAM, and an Intel Core i7-11700K processor running at 3.6 GHz. Sets of randomly formed networks of 100, 500, 1000, 1500, and 2000 nodes are constructed using the Poisson distribution method. For each network size, the EMOCA algorithm is run for 500, 1000, 1500, and 2000 generations in an effort to maximize fault tolerance and minimize communication latency. The most recent generation's chromosomes in the results table stand for the final set of optimized values.
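For illustration, the node-deployment and neighbor-list step performed by 'RandomProvider'/'NetworkNode' can be sketched in Python (the actual classes are C++/NS2; the function name, its parameters, and the square deployment region are assumptions made for this sketch):

```python
import math, random

def deploy_nodes(n_nodes, area_side, tx_range, seed=7):
    """Scatter nodes uniformly at random over a square region and build neighbor lists."""
    rng = random.Random(seed)
    pos = [(rng.uniform(0, area_side), rng.uniform(0, area_side)) for _ in range(n_nodes)]
    # Two nodes are neighbors if they lie within each other's transmission range.
    neighbors = {i: [j for j in range(n_nodes)
                     if j != i and math.dist(pos[i], pos[j]) <= tx_range]
                 for i in range(n_nodes)}
    return pos, neighbors

# ~1500 m² square (side ≈ 38.7 m), 100 nodes, 30 m transmission radius.
pos, nbrs = deploy_nodes(100, 38.7, 30.0)
print(len(pos), min(len(v) for v in nbrs.values()))
```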

The parameter settings used in the simulations are listed in Table 2. Between 100 and 1000 sensors are deployed according to a specific deployment pattern, with a maximum transmission radius of 30 m, uniformly and randomly distributed across a circle of area *NA* = 1500 m². Each sensor starts with the same initial energy level. The power consumed while transmitting, receiving, and idle is 175 mJ, 175 mJ, and 0.015 mJ, respectively, and the power consumed for sensing is 1.75 μJ. To focus on coverage measurement, a sensing range of 10 m and a transmission range of 30 m are used during the simulation. Propagation delays are deemed negligible for the chosen simulation region. Each experiment was repeated 30 times with the specified simulation settings and variables, and the arithmetic mean is reported with a 95% confidence level.
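As a quick sanity check of the power figures above, the per-round energy of a node can be tallied directly; the duty-cycle counts in the example call are illustrative assumptions, not values from the simulation.

```python
# Energy parameters quoted in the text (per event/slot, in millijoules):
# 175 mJ transmit, 175 mJ receive, 0.015 mJ idle, 1.75 uJ (= 0.00175 mJ) sensing.
TX_MJ, RX_MJ, IDLE_MJ, SENSE_MJ = 175.0, 175.0, 0.015, 0.00175

def round_energy_mj(n_tx, n_rx, n_idle, n_sense):
    """Energy (mJ) a node spends in one round with the given activity counts."""
    return n_tx * TX_MJ + n_rx * RX_MJ + n_idle * IDLE_MJ + n_sense * SENSE_MJ

# Hypothetical round: 2 transmissions, 3 receptions, 1000 idle slots,
# 100 sensing events -- radio costs dominate by several orders of magnitude.
print(round_energy_mj(2, 3, 1000, 100))  # ≈ 890.175 mJ
```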


**Table 2.** Basic parameter setting for simulation.

#### *4.1. Comparative Results*

In Figures 3–6, we compare how EMOCA, NSGA-II, and multi-objective versions of particle-swarm optimization (PSO) and ant-colony optimization (ACO) perform while optimizing a network with 100 nodes across 500–2000 generations. The comparative algorithms were employed as black-box versions with their default parameters (open-source code from GitHub). EMOCA clearly outperforms the other algorithms with regard to both fault tolerance and communication latency, demonstrating that virtualized WSNs based on EMOCA can successfully cope with failure. More specifically, the optimal values for fault tolerance and communication latency are 0.67 and 0.02, respectively. This is because packet-error rate is a reliable predictor of fault tolerance. For the multi-objective version of ACO, the optimal value of fault tolerance is approximately 0.57 and the optimal value of communication delay is approximately 0.038, while for the multi-objective version of PSO, the optimal values of fault tolerance and communication delay are 0.31 and 0.11, respectively.

**Figure 3.** Optimized chromosome with 100 nodes after 500 generations.

**Figure 4.** Optimized chromosome with 100 nodes after 1000 generations.

**Figure 5.** Optimized chromosome with 100 nodes after 1500 generations.

**Figure 6.** Optimized chromosome with 100 nodes after 2000 generations.

The optimal value of fault tolerance for NSGA-II is around 0.44, whereas the optimal value of delay is approximately 0.05. This is because its fault-tolerance estimate relies on the degree of connectivity, an estimate that is ill-suited to a wireless environment. Having a large number of chromosomes also increases latency and decreases fault tolerance. In addition, because of the small size of the network (100 nodes), the effect of a larger number of generations on the final optimized chromosome is far less pronounced, and it is difficult to distinguish one set of results from the next, since fewer paths can be created in more compact networks.

The network is scaled up to 500 nodes in order to amplify the optimization performance gap between generations. Figures 7–10 display a comparison of optimization performance with the increased network size. The results show that, when both goals are included, EMOCA achieves greater optimization performance than NSGA-II and the multi-objective versions of both ACO and PSO. Specifically, the fault-tolerance value of the latest optimized chromosome is about 0.92, while the communication-latency value is around 0.015. This is because more paths are available in larger networks, allowing the selection of higher-quality connections with higher fault tolerance and reduced communication latency. There is a tradeoff between fault tolerance and communication latency, with the optimal (fault tolerance, latency) values being approximately (0.82, 0.06) for ACO, (0.72, 0.08) for NSGA-II, and (0.59, 0.1) for PSO. The pace at which the system converges on an optimal solution slows, and the number of optimized chromosomes decreases. Additionally, the bigger network (500 nodes) mitigates the negative effects of increasing the number of generations on the optimized chromosome.

The convergence rate toward the ideal solution is boosted by increasing the network size to 1000 nodes. Figures 11–14 display a comparison of the optimization convergence rates. As expected, the results show that EMOCA has a higher optimization convergence rate than NSGA-II, ACO, and PSO for both goals. Comparatively, the optimal chromosomal value for communication latency is about 0.010, whereas the fault-tolerance value is around 0.98. This is because, as the size of the network grows, more paths become suitable for use, allowing greater selectivity in the paths that are ultimately chosen. The optimal fault tolerance for ACO chromosomes is around 0.82, with an optimal communication delay of about 0.06; for NSGA-II, the values are around 0.78 and 0.07; and for PSO, around 0.59 and 0.1. In addition, when the size of the network is ramped up, the proportion of optimized chromosomes grows, and the chromosomes are packed closely together. We can also observe that the Pareto front obtained by EMOCA covers a wider region of the objective space compared to the Pareto fronts obtained by the other algorithms.

**Figure 7.** Optimized chromosome with 500 nodes after 500 generations.

**Figure 8.** Optimized chromosome with 500 nodes after 1000 generations.

#### *4.2. Summary of Results*

EMOCA also yields much smaller crowding-distance values for its solutions than the competing techniques, and it finds a wide variety of non-dominated solutions spaced almost uniformly. These characteristics enable EMOCA to search for solutions in a much larger space with less complexity, and the results show that the EMOCA approach provides more accurate solutions at a lower computational cost than the existing compared methods.

**Figure 9.** Optimized chromosome with 500 nodes after 1500 generations.

**Figure 10.** Optimized chromosome with 500 nodes after 2000 generations.

**Figure 11.** Optimized chromosome with 1000 nodes after 500 generations.

**Figure 12.** Optimized chromosome with 1000 nodes after 1000 generations.

**Figure 13.** Optimized chromosome with 1000 nodes after 1500 generations.

**Figure 14.** Optimized chromosome with 1000 nodes after 2000 generations.

Algorithms built on the NSGA-II framework outperform their PSO-based counterparts. Several explanations might be at play. Because of NSGA-II's crossover and mutation processes, chromosomes may be shifted across large distances in the solution space. Additionally, in NSGA-II, there is no correlation between individual chromosomes and the present local or global best results. Such capabilities allow NSGA-II-based VNE algorithms to explore solutions in a considerably broader area than is possible with the PSO method alone, in which only the "best" particle shares its knowledge. In contrast to PSO-based algorithms, those based on ACO vary in the calculation rank of the nodes, which affects the sequence in which virtual node consolidation and pheromone computing occur. As a result, EMOCA algorithms are a viable option for multi-objective optimization since they may provide more workable solutions.
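The narrow information sharing in PSO mentioned above is visible in the canonical velocity update, where each particle consults only its own best position and the single global best. The coefficients below are conventional defaults, not parameters from the paper.

```python
import random

def pso_step(pos, vel, pbest, gbest, w=0.7, c1=1.5, c2=1.5):
    """One velocity-and-position update for a single particle.

    pos/vel: current position and velocity; pbest: this particle's best;
    gbest: the swarm's global best -- the only shared knowledge.
    """
    new_vel = [w * v
               + c1 * random.random() * (pb - x)   # pull toward personal best
               + c2 * random.random() * (gb - x)   # pull toward global best
               for x, v, pb, gb in zip(pos, vel, pbest, gbest)]
    new_pos = [x + v for x, v in zip(pos, new_vel)]
    return new_pos, new_vel
```

A particle already sitting on both bests with zero velocity stays put, which illustrates why PSO's search moves are more constrained than crossover and mutation.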

#### **5. Conclusions and Future Work**

Both diversity and convergence are crucial for VNE optimization techniques. A system designer interested in analyzing several tradeoff alternatives in order to make an informed decision gains little from a Pareto set with few solutions concentrated in one region of the Pareto front. In this study, we optimized two conflicting objectives, fault tolerance and communication latency, in virtualized WSNs by concentrating on heterogeneous network requirements for IoT applications, utilizing an EMOCA framework whose stochastic replacement-selection technique accounts for both non-domination and diversity. To address the issues of fault tolerance and communication latency in virtualized WSNs, a mathematical formulation of a multi-objective optimization problem was presented. Using NSGA-II as a benchmark, we found that EMOCA's optimization framework was superior to the current standard. Simulation results demonstrate that EMOCA produces superior optimization results with fewer generations. Moreover, the time needed to achieve the optimization outcomes is reduced compared to the best current methods. This demonstrates the effectiveness of the suggested framework in terms of convergence and diversity, since a diverse range of non-dominated solutions is consistently discovered. Future research will consider the performance of EMOCA in optimization challenges across a broad range of sectors, such as routing and battery life in virtualized WSNs, in addition to expanding the current work to include further objective functions beyond fault tolerance and communication delay.

**Author Contributions:** Conceptualization, S.M.D.; methodology, S.M.D. and R.A.O.; software, R.A.O.; validation, S.M.D. and I.A.A.E.-M.; formal analysis, S.M.D. and I.A.A.E.-M.; investigation, S.M.D.; resources, R.A.O.; data curation, R.A.O. and I.A.A.E.-M.; writing—original draft preparation, S.M.D., I.A.A.E.-M. and R.A.O.; writing—review and editing, S.M.D.; visualization, R.A.O. and I.A.A.E.-M.; supervision, S.M.D.; project administration, R.A.O. and I.A.A.E.-M.; funding acquisition, R.A.O. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** The study did not require ethical approval.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The study did not report any data.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

## *Article* **S-Type Random** *k* **Satisfiability Logic in Discrete Hopfield Neural Network Using Probability Distribution: Performance Optimization and Analysis**

**Suad Abdeen <sup>1,2</sup>, Mohd Shareduwan Mohd Kasihmuddin <sup>1,\*</sup>, Nur Ezlin Zamri <sup>3</sup>, Gaeithry Manoharam <sup>1</sup>, Mohd. Asyraf Mansor <sup>3</sup> and Nada Alshehri <sup>2</sup>**

**\*** Correspondence: shareduwan@usm.my; Tel.: +60-46534769

**Abstract:** Recently, a variety of non-systematic satisfiability studies on Discrete Hopfield Neural Networks have been introduced to overcome a lack of interpretation. Although a flexible structure was established to assist in the generation of a wide range of spatial solutions that converge on global minima, the fundamental problem is that the existing logic completely ignores the distribution and features of the probability dataset, as well as the literal status distribution. Thus, this study considers a new type of non-systematic logic termed S-type Random *k* Satisfiability, which employs a creative layer of a Discrete Hopfield Neural Network and plays a significant role in the identification of the prevailing attribute likelihood of a binomial distribution dataset. The goal of the probability logic phase is to establish the logical structure and assign negative literals based on two given statistical parameters. The performance of the proposed logic structure was investigated by comparing a proposed metric to current state-of-the-art logical rules; consequently, it was found that the models have a high value in the two parameters that efficiently introduce a logical structure in the probability logic phase. Additionally, by implementing a Discrete Hopfield Neural Network, it was observed that the cost function experiences a reduction. A new form of synaptic weight assessment via statistical methods was applied to investigate the effect of the two proposed parameters on the logic structure. Overall, the investigation demonstrated that controlling the two proposed parameters has a positive effect on synaptic weight management and the generation of global minima solutions.

**Keywords:** discrete hopfield neural network; non-systematic satisfiability; probability distribution; binomial distribution; statistical learning; optimization problems; travelling salesman problem; evolutionary computation

**MSC:** 37M22; 37M05

#### **1. Introduction**

**Citation:** Abdeen, S.; Kasihmuddin, M.S.M.; Zamri, N.E.; Manoharam, G.; Mansor, M.A.; Alshehri, N. S-Type Random *k* Satisfiability Logic in Discrete Hopfield Neural Network Using Probability Distribution: Performance Optimization and Analysis. *Mathematics* **2023**, *11*, 984. https://doi.org/10.3390/math11040984

Academic Editors: Francois Rivest and Abdellah Chehri

Received: 14 December 2022; Revised: 17 January 2023; Accepted: 20 January 2023; Published: 15 February 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

A Discrete Hopfield Neural Network (DHNN) is a significant type of Artificial Neural Network (ANN) that employs a learning model based on association features formulated by Hopfield and Tank [1]. ANNs have long been used as a mathematical method with which to solve a range of issues [2–8]. A DHNN is a recurrent ANN with feedback connections comprising interconnected neurons in which every neuron's output is fed back into every neuron's input. Neurons are stored in either binary or bipolar form in the input and output neurons of the DHNN structure [9]. The structures of DHNNs have been extensively modified to approximate optimization solutions for problems, and the network exhibits many interesting behaviors. Fault tolerance is also a feature of the Content Addressable Memory (CAM) technique, which has a large capacity for pattern storage and is useful for its converging iterative process [10]. Numerous applications have made use of DHNNs, including optimization problems [1], clinical diagnosis [11–13], the electric power sector [14], the investment sector [15], location detectors [16], and others. Despite the importance of using the intelligent decision systems of the DHNN to solve optimization problems, it is necessary to implement a symbolic rule to guarantee that the DHNN always converges to the ideal solution, because recent studies have failed to conduct a thorough analysis of a DHNN based on neural connections. This issue was solved by Wan Abdullah [17], who suggested a logical rule for ANNs by associating each neuron's connection with a true or plausible interpretation.
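The recurrent dynamics described above (every neuron's output fed back as input, bipolar states) can be sketched with a minimal asynchronous update rule; the stored pattern and weights below are illustrative, not taken from the paper.

```python
import numpy as np

def hopfield_converge(W, s, max_sweeps=100):
    """Asynchronously update bipolar states S_i <- sign(sum_j W_ij * S_j)."""
    s = np.array(s, dtype=float)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(s)):
            h = W[i] @ s                       # local field of neuron i
            new = 1.0 if h >= 0 else -1.0      # bipolar threshold
            if new != s[i]:
                s[i], changed = new, True
        if not changed:                        # fixed point (stable state) reached
            break
    return s

# Store one pattern via a Hebbian outer product (zero self-connections),
# then recover it from a corrupted probe -- the CAM behavior noted above.
p = np.array([1.0, -1.0, 1.0])
W = np.outer(p, p) - np.eye(3)
print(hopfield_converge(W, [1.0, 1.0, 1.0]))  # [ 1. -1.  1.]
```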

The Wan Abdullah approach is notable in that the synaptic weight is determined by matching the logic cost function with the Lyapunov energy function. This approach led to better performance than traditional learning techniques, such as Hebbian learning, with respect to obtaining the synaptic weight during the training phase. More specific logical rules have been developed since the logical rule was first introduced in the original DHNN. Sathasivam [18] expanded the work of Wan Abdullah and proposed Horn Satisfiability (HORNSAT) as a new Satisfiability (SAT) concept. This study introduced the Sathasivam method of relaxation to improve the finalized state of neurons and demonstrated the strong capability of HORNSAT to reach the global minimum energy. The outcome demonstrates that logical rules can be embedded in DHNNs. Nevertheless, because DHNNs relax too quickly and offer fewer possibilities for neurons to interchange information, more local minimum solutions result, which makes it difficult to understand how different logical rules affect DHNNs. This motivated a new era of research with different perspectives, beginning with Kasihmuddin et al. [9], who introduced systematic *k* Satisfiability (*k*SAT) for *k* = 2, namely, 2 Satisfiability (2SAT). With each clause containing two literals and all clauses joined by conjunction, the implementation of 2SAT in a DHNN was reported to achieve a high global minima ratio while keeping computational time to a minimum. Subsequently, Mansor et al. [19] continued the research by proposing a higher order of *k*SAT for *k* = 3, namely, 3 Satisfiability (3SAT), in a DHNN. With each clause containing three literals and all clauses joined by conjunction, the proposed 3SAT in a DHNN increases the storage capacity of the network because each neuron's number of local minimum solutions tends to be low.
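As a concrete illustration of the Wan Abdullah method described in this paragraph (a standard worked example in the SAT-DHNN literature, not reproduced from this paper), consider a single 2SAT clause $(A \vee B)$ over bipolar neurons $S_A, S_B \in \{-1, 1\}$. The clause is violated only when both neurons are $-1$, so its cost function is

$$
E_P = \frac{1}{4}(1 - S_A)(1 - S_B) = \frac{1}{4} - \frac{1}{4}S_A - \frac{1}{4}S_B + \frac{1}{4}S_A S_B .
$$

Matching this term by term against the Lyapunov energy $H = -\frac{1}{2}\sum_{i \neq j} W_{ij} S_i S_j - \sum_i W_i S_i$ gives $W_{AB} = -\frac{1}{4}$ and $W_A = W_B = \frac{1}{4}$, so each clause contributes its synaptic weights directly, without iterative training.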
Despite the success of the implementation of systematic logic in DHNNs, this approach lacks control with respect to distributing the number of negative literals as well as regarding the variety of clauses. Furthermore, as the number of such neurons increases, the efficiency of the training phase in the DHNN decreases, and there is less neuronal variation during the testing phase. Sathasivam et al. [20] clarified that the rigidity of the logical structure contributes to overfitting solutions in DHNNs. When the number of neurons is large, the restricted number of literals per clause results in suboptimal synaptic weight values, thereby decreasing the likelihood of locating diverse global minima solutions. Variance in the recovered solutions is necessary to ensure that the search space is well explored. As further stated in [21], DHNNs are still vulnerable to various challenges, including a lack of generality resulting from non-flexible logical rules and a strict logic structure, even though the accuracy of research based on real-world datasets has been satisfactory.

Due to the need for a different logical clause set that contributes to the degree of connection between the logical formulae, Sathasivam et al. [20] proposed a non-systematic SAT called Random *k* Satisfiability (RAN*k*SAT), which uses first-order and second-order clauses together, where *k* = 1, 2, namely, Random 2 Satisfiability (RAN2SAT), with all clauses connected by conjunction. RAN2SAT introduces a flexible logic structure that contributes to the generation of more logical inconsistency, which expands the diversity of synaptic weights. The proposed RAN2SAT in a DHNN achieved about 90% of the global minima ratio with fewer neurons. Due to the necessity of increasing the storage capacity of RAN2SAT and dealing with the absence of interpretation in typical systematic satisfiability logic limited to *k* ≤ 2, Karim et al. [22] were inspired to resolve this problem and proposed a flexible logic structure that increases storage capacity by incorporating third-order clauses into the formulation. Random 3 Satisfiability (RAN3SAT) offers three logical structures (*k* = 1, 3; *k* = 2, 3; and *k* = 1, 2, 3), with all clauses joined by conjunction. This increases the capacity of the DHNN to recover neuronal states based on different logical orders, which can lead to a variety of convergent interpretations of global minimum solutions. Both RAN*k*SAT types experience difficulty regarding the selection system in terms of the composition represented by the first-, second-, and third-order logical formulations, which is still poorly defined. Thus, the combination of correct interpretations is restricted to the number of *k*-order clauses with a predefined term assigned in the logical formula.

Another fascinating study on non-systematic logic with a different perspective was introduced by Alway et al. [23]; this solution increases the representation of 2SAT compared to 3SAT clauses in non-systematic SAT logic through an assigned 2SAT ratio (*r\**) in DHNN in order to decrease the duplication of final neuron state patterns. The proposed Major 2 Satisfiability (MAJ2SAT) in the DHNN successfully provides more neuronal variation. Zamri et al. [24] introduced Weighted Random *k* Satisfiability (*r*SAT) as a non-systematic method with a proposed logical structure that ideally produces the proper *r*SAT logical structure using a Genetic Algorithm (GA) by taking into account the desired proportion of negative literals (*r*). Another method introduced by Sidik et al. [25] consisted of altering the *r*SAT logic phase by adding a binary Artificial Bee Colony algorithm to guarantee that negative literals are distributed properly. The proposed *r*SAT in a DHNN with a weighted ratio of negative literals leads to a significant global minima ratio. Nonetheless, despite this significant advancement in controlling the logical structure of selecting clauses and using a metaheuristic approach to distribute the number of negative literals, these techniques fail to account for the representation of the probability distribution of the dataset in the selection system.

Unique, flexible logical systems were formed by combining systematic and non-systematic approaches from a unique perspective. This approach offers great potential for solution diversity, as it randomly generates a number of clauses. Guo et al. [26] proposed Y-Type Random 2 Satisfiability (YRAN2SAT), in which the numbers of first-order and second-order clauses are randomly assigned, allowing YRAN2SAT in a DHNN to retrieve further final states with the global minimum energy. With higher-order logic, Gao et al. [27] proposed G-Type Random *k* Satisfiability (GRAN3SAT), in which a set of first-, second-, and third-order clauses is randomly generated. In a DHNN, GRAN3SAT exhibits a larger storage capacity and is capable of investigating complex dimensional issues. Despite this success, the selection system still has a flaw: there is no clear mechanism with which to control the distribution of the desired number of negative literals based on the probability distribution of a dataset.

The Probabilistic Satisfiability problem (PSAT) involves assigning probabilities to a set of propositional formulations and deciding whether this assignment is consistent. The pioneering work was introduced by George Boole [28], who proposed the PSAT to determine whether a probability measure could be found for truth assignments that satisfies all assessments. The PSAT framework was developed to express such details as logical sentences with linked probabilities in order to infer the likelihood of a query sentence. Initially suggested by George Boole, the PSAT was subsequently refined by Nilsson [29]. This perspective was followed by different studies [30–33], which all aimed to integrate probability tools into satisfiability without considering their implementation in a DHNN. The present study addresses this gap by introducing a probability distribution for the prevailing attribute in the dataset, which is represented in a DHNN through the desired logic.

There are no studies in this area regarding the way in which the probability distribution for literals with SAT may be represented in a DHNN. Thus, findings addressing this issue can be used to guarantee the most effective search for satisfying interpretations. Therefore, this study introduces S-type Random *k* Satisfiability (*δkSAT*), where *k* = 1, 2 (*δ*2*SAT*), with the probability distribution of the prevailing attribute in the simulation dataset. It aims to address the structural randomness of RAN*k*SAT by utilizing two statistical features, the probability distribution and the sample size formula, to obtain an estimator for the binomial distribution dataset, in addition to helping to assign the negative literals that are mapped to the prevailing attribute in a dataset with the non-systematic logical rule RAN2SAT. The main feature of RAN2SAT is its structural flexibility, which takes advantage of another logical rule, 2SAT, whereas the non-systematic logical rule provides a more diversified solution [34,35]. Furthermore, the probability distribution is used to control the probability of a composition appearing in first- and second-order logic, avoiding poor explainability or a lack of interpretation in non-systematic SAT by providing suitable logical combinations depending on the dataset's distribution. Moreover, the logic system uses the binomial distribution's sample size to determine the appropriate number of negative literals based on the predetermined proportion appearing in the dataset. Then, the clauses are distributed in each order depending on the probability distribution governing appearance. This approach will help determine the appropriate weight of the negative literal number in logic systems based on the distributed clauses in order to create suitable solutions [24]. Notably, researchers tend to neglect negative literals because they are indirectly mapped errors in a logical structure [36]; however, in this study, negative literals represent the prevailing attribute in a binomial distribution that has only two characteristics.

Our proposed logical rule will provide flexibility with respect to controlling the overall structure of *δ*2*SAT* in terms of the dataset's characteristics by combining both the effects of statistical parameters and non-systematic features to identify suitable neuronal variation and diversity in the proposed logic. The main aims of this study are as follows:


The framework of this paper is as follows: The motivation for this study is described in detail in Section 2. An overview of *δ*2*SAT*'s structure is given in Section 3. The integration of *δ*2*SAT* into a DHNN is described in Section 4. Section 5 explains the experimental setup and performance assessment metrics incorporated into the simulation. In Section 6, the effectiveness of the proposed logic in a DHNN is discussed and analyzed, with comparisons made to several existing logical structures with regard to various parameters and phases. The conclusions and future work are presented in Section 7 at the end of the article.

#### **2. Motivation**

#### *2.1. Issue with the Identified Probability Distribution*

With reference to the structural issue of existing systematic and non-systematic satisfiability: in the systematic logic *k*SAT [19,37], the relevant approaches implement random selection for the literal states within clauses, where the clauses are selected uniformly, without regard to their individual probability of appearing in the required population dataset. Likewise, the structure of the non-systematic logic RAN*k*SAT [20,22] is defined randomly, with the clauses again selected uniformly. Moreover, the chance of obtaining both negative and positive literals is uniformly distributed [38], with both outcomes being equally likely to appear. This implies that the population follows a uniform distribution, which is a limited assumption. In this study, we address this research gap by giving the clauses, and the negative literals inside clauses, the priority of the population dataset's probability distribution; when the dataset has two characteristics, i.e., negative and positive literals, we assign the negative literal to the prevailing attribute drawn from a binomial distribution.

#### *2.2. Initialization for the Number of Clauses and Number of Neurons*

The investigation into controlling the general structure of SAT is still ongoing. Cai and Lei [39] proposed a Partial Maximum Satisfiability (PMAXSAT) clausal weighting mechanism, with a positive integer as its weight. This method demonstrated the power of weights in terms of controlling the distribution of a logical structure based on the desired result. Conversely, Alway et al. [23] suggested a non-systematic logical rule, MAJ2SAT, which seeks to create bias in the selection of 2SAT over 3SAT via the *r\** ratio. The MAJ2SAT system successfully provides more neuronal variations, increasing the composition of 2SAT with the same number of neurons. Despite the benefit of extracting information from real datasets that exhibit the behaviors of 2SAT and 3SAT, the persistent issue is the selection system, which limits the value of *r* to a set of limited pre-defined intervals chosen randomly without considering a dataset's probability distribution. Therefore, we propose the non-systematic logical rule *δ*2*SAT*, which incorporates a probability logic phase to calculate the probability of first- and second-order clauses appearing from the dataset by determining the required number of literals and clauses.

#### *2.3. Initialization for the Number of Negative Literals*

The structure of SAT should be subjected to systematic analysis to avoid a poor description of a dataset. Dubois and Prade [40] examined the role of logic in dealing with uncertainty in an ANN and concluded that it was crucial to use a generalization method to determine how many negative literals should be distributed for technical convenience. Zamri et al. [24] introduced *r*SAT with a new phase (the logic phase) to produce a non-systematic logical structure based on the ratio of negative literals. The ratio is generated in the logic phase by employing a GA to increase the logic phase's effectiveness. The findings showed that the proposed model performed well, indicating that a dynamic distribution of negative literals benefits the generation of global minimum solutions with different final neuron states. One limitation of the weighting scheme is the method of choosing the number of negative literals, where the value of *r* lies in a set of limited pre-defined intervals and is subject to random selection without considering the probability distribution of literals.

The studies of Alway and Zamri motivated the current study, in which we propose the non-systematic logical rule *δ*2*SAT*. It incorporates a probability logic phase that calculates the appearance-related probability distribution of the first-order and second-order clauses from the real dataset, with the required number of neurons or clauses predetermined by harnessing the behavior of 2SAT, so as to explore a wider solution space and extract information from datasets. It also assigns the number of negative literals required for the logic by using the sample size formula with a predefined prevailing-attribute proportion from the dataset.
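The sample size formula itself is not reproduced in this excerpt, so as a plausible stand-in the sketch below uses the textbook sample size for estimating a binomial proportion, n = z²p(1 − p)/e²; the confidence level z and margin of error e are illustrative assumptions, not the authors' settings.

```python
import math

def negative_literal_count(p, z=1.96, e=0.05, n_max=None):
    """Candidate count of negative literals for prevailing-attribute proportion p.

    Uses the standard binomial-proportion sample size n = z^2 * p * (1-p) / e^2,
    optionally capped at n_max (e.g., the number of literal slots in the logic).
    """
    n = math.ceil(z ** 2 * p * (1 - p) / e ** 2)
    return min(n, n_max) if n_max is not None else n

print(negative_literal_count(0.5))            # 385 (worst-case proportion)
print(negative_literal_count(0.5, n_max=40))  # 40 (capped by available slots)
```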

#### *2.4. Synaptic Weight Performance Using Statistical Analysis*

The research on satisfiability in DHNNs suffers from a lack of statistical analysis, especially regarding the synaptic weight, which is considered the backbone of the global minimum solutions achieved during the testing phase. We determine the synaptic weight by contrasting the cost function with the Lyapunov energy. Previous studies on systematic and non-systematic approaches were limited in terms of assessing the performance accuracy of the logic in different phases, as mentioned in [9,21,22]. In this study, the synaptic weight was analyzed at several points, since the dimensions of the synaptic weight values described in [20,26] were not completely comprehensible. In addition, [27] measured the accuracy of the error in the synaptic weight by evaluating the differences between the synaptic weight obtained by Wan Abdullah's method and the synaptic weight achieved in the training phase. Given the absence of statistical tools in synaptic weight analysis, this study addresses the gap by using new statistical tests to capture the impact of changes in the synaptic weight during the training phase.

#### **3. S-Type Random 2 Satisfiability Logic**

S-Type Random 2 Satisfiability (*δ*2*SAT*) is a new category of non-systematic-clause SAT in which a probability distribution is used to assign prevailing attributes in the dataset via two methods. First, depending on the dataset requirements, we assign the probability of the appearance of first- and second-order logic. Second, we use the sample size from a binomial population [41] to determine the appropriate number of negated literals inside each clause based on its assigned probability, since the appearance of a negative literal follows a binomial distribution. The novelty of these methods is that they determine a suitable number of negative literals (*ξ*) in the logic depending on the probability distribution of the clauses, which leads to greater structural diversity. In addition, the number of negative literals is not fixed: by increasing or decreasing the probability of obtaining a given literal count in the logic system, there is greater flexibility with respect to the dataset.

Our approach can be introduced as a form of non-systematic logic comprising *n* literals per *T* clauses. It is a general form of RAN*k*SAT logic, where *k* = 1,2 is expressed in the *k* Conjunctive Normal Form (*k*CNF). The components of the S-Type Random 2 Satisfiability Logic problem are as follows:

	- i. A set of *x* first-order clauses: $T_1^{(1)}, T_2^{(1)}, T_3^{(1)}, \ldots, T_x^{(1)}$, $x \in \mathbb{N}$.
	- ii. A set of *y* second-order clauses: $T_1^{(2)}, T_2^{(2)}, T_3^{(2)}, \ldots, T_y^{(2)}$, where $T_y^{(2)} = (r_i \vee r_j)$, $y \in \mathbb{N}$.

The general formulation of S-Type Random 2 Satisfiability is given as follows:

$$
\Theta\_{\delta 2SAT} = \bigwedge\_{i=1}^{x} T\_i^{(1)} \wedge \bigwedge\_{j=1}^{y} T\_j^{(2)} \quad \text{for } k = 1, 2 \tag{1}
$$

$$T\_i^{(k)} = \begin{cases} (r\_i), & k=1\\ (r\_i \lor r\_j), & k=2 \end{cases} \tag{2}$$

where Θ*δ*2*SAT* in Equation (1) is *δ*2*SAT* for *k* = 1, 2. The difference between *δ*2*SAT* and RAN2SAT lies in the selection system for the number of clauses and the number of negative literals in *δ*2*SAT*. This system is established under the condition that the number of clauses corresponds to:

$$\begin{cases} x\_m = p(x\_m) \cdot \lambda\_m \\ y\_m = p(y\_m) \cdot \lambda\_m \end{cases} \tag{3}$$

where $\lambda_m$ denotes the total number of literals ($\lambda_1$) or the total number of clauses ($\lambda_2$); $x_m$ and $y_m$ denote the number of first- and second-order literals or clauses, respectively, when $m = 1, 2$; $x_m, y_m \geq 0$ represent clauses $T_i^{(k)}$ for different values of $k$; and $p(x_m)$ and $p(y_m)$ denote the probabilities of first- and second-order logic appearing, which are calculated by the Laplace formula [42] to find the probability of $A_{y_m}$ from the population $\Omega$, expressed as follows:

$$p(y\_m) = \frac{|A\_{y\_m}|}{|\Omega|}\tag{4}$$

$|A_{y_m}|$ represents the number of elements that contain the prevailing attribute out of the total dataset size $|\Omega|$ in this study. We denote the probability of second-order logic $p(y_m)$ by *Y*, which is considered the first parameter in *δ*2*SAT*.
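As an illustration of Equations (3) and (4), the clause counts can be computed directly from the dataset proportions. The following is a minimal sketch in Python; the function names and the toy counts (7 prevailing elements out of 10) are ours, not taken from the paper:

```python
# Illustrative sketch (not the authors' code) of Eqs. (3)-(4): the Laplace
# probability of the prevailing attribute and the resulting split of
# lambda_m into first- and second-order counts.

def laplace_probability(prevailing: int, total: int) -> float:
    """p(y_m) = |A_ym| / |Omega| (Eq. 4)."""
    return prevailing / total

def clause_counts(lam: int, p_y: float) -> tuple[int, int]:
    """Split lambda_m into first- and second-order counts (Eq. 3)."""
    y_m = round(p_y * lam)   # second-order share
    x_m = lam - y_m          # first-order share, so p(x_m) + p(y_m) = 1
    return x_m, y_m

p_y = laplace_probability(prevailing=7, total=10)   # Y = 0.7
x_m, y_m = clause_counts(lam=10, p_y=p_y)
print(p_y, x_m, y_m)   # 0.7 3 7
```

The rounding step is our assumption; it keeps $x_m + y_m = \lambda_m$ exact for integer clause counts.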

The number of negated literals in each $T_i^{(k)}$ is determined by *ξ*, where $\xi \in \mathbb{N}$ is the negative literal count used to obtain *ρ* in the dataset [41] and is calculated as follows:

$$\xi = \frac{\lambda\_m \rho\_0 (1 - \rho\_0)}{(\lambda\_m - 1)(d^2 / z^2) + \rho\_0 (1 - \rho\_0)}\tag{5}$$

where:

*ρ*: the pre-defined proportion of negative literals required in the logic system (the second parameter in the logic).

$\rho_0$: the proportion of negative literals in the population (available before the survey; if no estimate of $\rho_0$ is available prior to the survey, a worst-case value of $\rho_0 = 0.5$ can be used to determine the sample size).

*d*: the margin of error (or the maximum error) of the negative literal proportion, which is calculated as follows:

$$d = Z\_{\alpha/2} \sqrt{\frac{\rho(1-\rho)}{\lambda\_m}}\tag{6}$$

*Z*: the upper *α*/2 point of the standard normal distribution with *α* = 0.01, where the significance level = P(type I error) = *α*.
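Equations (5) and (6) together are the standard binomial sample-size calculation. The following is a minimal sketch, assuming *α* = 0.01 so that the upper *α*/2 normal quantile is approximately *z* = 2.576; the function name and toy inputs are illustrative:

```python
# Hedged sketch of Eq. (5): the binomial sample-size formula that fixes the
# number of negative literals xi. rho0 = 0.5 is the worst case used when no
# prior estimate is available.

def negative_literal_count(lam: int, rho0: float, d: float, z: float) -> int:
    """xi from Eq. (5); lam is lambda_m, rho0 the population proportion."""
    num = lam * rho0 * (1.0 - rho0)
    den = (lam - 1) * (d**2 / z**2) + rho0 * (1.0 - rho0)
    return round(num / den)

# 10 literals, no prior information, 10% margin of error, alpha = 0.01:
xi = negative_literal_count(lam=10, rho0=0.5, d=0.1, z=2.576)
print(xi)   # 9
```

Note how a tight margin of error keeps *ξ* close to $\lambda_m$; loosening *d* shrinks the required count, mirroring ordinary survey sample-size behavior.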

The distribution of the number of negated literals in each order of logic clause $T_i^{(k)}$ depends on the value of $\beta_k$, where:

$$\begin{cases} \beta\_1 = \xi \times p(x\_m) \\ \beta\_2 = \xi \times p(y\_m) \end{cases} \tag{7}$$

In (7), $\beta_1$ and $\beta_2$ correspond to first- and second-order logic, respectively, and $\sum \beta_k$ is the total number of negated literals in *δ*2*SAT* logic, where:

$$
\sum \beta\_k - \xi = 0 \tag{8}
$$

The structure of Θ*δ*2*SAT* is believed to provide more variations and greater diversity of the final neuron states and to be able to find more global solutions in other solution spaces via two effective parameters: *Y* and *ρ*. The implementation of S-type Random *k* Satisfiability logic in this study is outlined in Figure 1.

#### *Probability Logic Phase in δ*2*SAT*

The probability logic phase was developed to assess the features of a prevailing attribute in the dataset via a probability distribution, which is then reflected in the logic system by the two parameters *Y* and *ρ*. This differs from the logic phase in *r*SAT [24], which is established to allocate the correct ratio of negative literals and their positions in the *r*SAT logic via metaheuristics. The main purpose of the probability logic phase is to extract the required information from the dataset and then generate the correct structure of RAN2SAT logic according to the dataset features assigned by the two probability Equations (3) and (5). Once the desired logic has been attained, the probability logic phase is complete. This section introduces some logic generated from the dataset using the two parameters *Y* and *ρ*; the restriction in the probability logic phase is as follows:

$$p(y\_m) + p(x\_m) = 1, \quad p(y\_m) > p(x\_m), \quad p(x\_m) \neq 0 \tag{9}$$

whose probability function can be defined as follows (Nilsson, 1986) [29]:

$$\begin{cases} p(\lambda) = 1\\ \text{if } r\_i \wedge r\_j \equiv 0 \text{ (mutually exclusive), then} \\ p(r\_i \vee r\_j) = p(r\_i) + p(r\_j) \end{cases} \tag{10}$$

**Figure 1.** Block diagram of the proposed S-type Random 2 Satisfiability logic Θ*δ*2*SAT*.

According to the method applied to determine the probability, there are two types of *δ*2*SAT*. In the first, the probability logic phase determines the probability of appearance of first- and second-order logic with respect to the number of literals $\lambda_1$, together with the distribution of the desired number of negative literals in each clause, depending on the selected dataset. In the second, the probability logic phase determines the same quantities with respect to the number of clauses $\lambda_2$. Table 1 introduces some possible examples of the two cases of *δ*2*SAT* logic that can be generated from the dataset using Equations (4), (5) and (7) when *ρ* = 0.7.



We observe that applying the same probability to the number of clauses $\lambda_2$ results in fewer first-order logic items than applying it to the number of neurons $\lambda_1$; notably, the number of unique logic combinations that a probability logic phase can create using specific values of the two parameters *Y* and *ρ* is $x_1 \times y_1$. Algorithm 1 presents the pseudocode for generating Θ*δ*2*SAT*, which starts with the determination of the values of the two parameters *Y* and *ρ*; then, applying the logic constraint in Equation (9), the probability logic phase operates under the following conditions: (a) *ρ* ≠ 0.5, because we need to expose the prevailing attribute. (b) *z* is a randomly generated number that ensures the negative literals are distributed randomly in the logic phase. (c) The loop runs *ω* times to ensure that the logic system is generated correctly. (d) The probability logic phase ends when Equation (8) is satisfied, at which point the DHNN training phase begins.

The limitation that we observed in *δ*2*SAT*'s logic structure is the position of negative literals; these are selected randomly depending on the *z* random numbers, and this randomization can result in an inconsistent interpretation. In addition, there are no redundant literals. Also, due to the high probability of 2SAT, the Exhaustive Search (ES) algorithm is unable to find the best number of instances of first-order logic satisfying Equation (9) for a small number of clauses. The utilization of Θ*δ*2*SAT* in a DHNN is presented as *DHNN* − *δ*2*SAT*. In the next section, we clarify how Θ*δ*2*SAT* functions as a representational command to control the neurons of the DHNN mappings.

```
Algorithm 1: Pseudocode for generating the probability logic phase
    Input: λm, ρ, p(ym), set of ri
    Output: The best Θδ2SAT
Begin
    // Generate Θδ2SAT
    Initialize λm;
    Initialize proportion ρ;
    Initialize second-order clause probability p(ym);
    // Calculate the number of first- and second-order clauses
    While (β1 ≤ y & β2 ≤ x & (β1 + β2) = ξ & p(ym) + p(xm) = 1
           & p(ym) > p(xm) & ym + xm = λm & ym ≠ 0 & xm ≠ 0) do
        Calculate ym, xm by Equation (3);
        Calculate ξ by Equation (5);
        Calculate β1 and β2 by Equation (7);
    End while
    // Distribute the negative literals in the logic
    While (ω ≤ 1000) do
        While (b ≠ β1 & b* ≠ β2 & ρ ≠ 0.5) do
            For (u = 0 to xm) do
                Generate random number z;
                Generate proportion ρ* to be the initial negative literal;
                IF (ρ* ≥ z) THEN
                    ¬ri;
                    b = b + 1;
                ELSE
                    ri;
                End IF
            End for
            For (u = 0 to ym) do
                Generate random number z;
                Generate proportion ρ* to be the initial negative literal;
                IF (ρ* ≥ z) THEN
                    ¬ri;
                    b* = b* + 1;
                ELSE
                    ri;
                End IF
            End for
        End while
    End while
End
```

```
Note: b and b* are counters.
```
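The negation-distribution loop of Algorithm 1 can be sketched in Python (this is not the authors' implementation): each literal is negated when a random number *z* falls below the proportion, and the pass is repeated until the quota *β* for that clause order is met exactly. The literal names r1–r5 and the `~` prefix are illustrative:

```python
import random

# Hedged sketch of Algorithm 1's inner loop: negate literals with
# probability rho until exactly `beta` of them are negated.

def distribute_negations(literals: list[str], beta: int, rho: float,
                         rng: random.Random) -> list[str]:
    """Return the literals with exactly `beta` of them negated."""
    out = list(literals)
    negated = 0
    while negated < beta:             # repeat the pass until the quota
        out = list(literals)          # is met, as in the outer while loop
        negated = 0
        for i in range(len(out)):
            if negated < beta and rng.random() < rho:
                out[i] = "~" + out[i]   # "~r_i" stands for the negation
                negated += 1
    return out

rng = random.Random(0)
clause = distribute_negations(["r1", "r2", "r3", "r4", "r5"], beta=3,
                              rho=0.7, rng=rng)
print(clause, sum(l.startswith("~") for l in clause))
```

Capping negations at *β* inside the pass, and retrying whole passes, guarantees the exact count required by Equation (8) while keeping the positions random, which matches the intent of the *z*-based selection described above.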
#### **4.** Θ*δ*2*SAT* **in Discrete Hopfield Neural Network**

A DHNN is a fully connected, self-feedback network comprising *N* interconnected neurons with no hidden layers. The neurons are updated one at a time; Ref. [23] asserts that the possibility of neuronal oscillation is eliminated by asynchronous updating. The network offers parallel computation and quick convergence and is also effective in terms of its CAM capacity, which has encouraged researchers to use DHNNs as mediums for solving challenging optimization problems. A general description of the state of an activated neuron in a DHNN is provided below:

$$S\_i = \begin{cases} 1, & \sum\_{j}^{N} W\_{ij} S\_j \ge \varepsilon \\ -1, & \text{otherwise} \end{cases} \tag{11}$$

where the synaptic weight from unit *i* to unit *j* is $W_{ij}$. The synaptic weight of a DHNN is always symmetrical, whereby $W_{ij} = W_{ji}$, and has no self-looping, $W_{ii} = W_{jj} = 0$. $S_i$ represents the state of neuron *i*; *ε* is a predetermined threshold value, set to *ε* = 0 in this study to guarantee a uniform decrease in DHNN energy [18]; and *h* is the number of logic variables. The *δ*2*SAT* is implemented in a DHNN (*DHNN* − *δ*2*SAT*) because of the requirement for a symbolic rule that can control the network's output and decrease logical inconsistency by minimizing the network's cost function. To derive the cost function $E_{\Theta_{\delta 2SAT}}$ of Θ*δ*2*SAT*, the following formula can be used:

$$E\_{\Theta\_{\delta 2SAT}} = \sum\_{i=1}^{x\_2} \left( \prod\_{j=1}^{1} \Psi\_{ij} \right) + \sum\_{i=1}^{y\_2} \left( \prod\_{j=1}^{2} \Psi\_{ij} \right) \tag{12}$$

where $x_2$ and $y_2$ are the numbers of first- and second-order clauses, respectively. The inconsistency of Θ*δ*2*SAT*, denoted as $\Psi_{ij}$, is specified in Equation (13), since both positive and negative literals are possible in Θ*δ*2*SAT*:

$$\Psi\_{ij} = \begin{cases} \frac{(1-S\_r)}{2}, & \text{if } \neg r\\ \frac{(1+S\_r)}{2}, & \text{if } r \end{cases} \tag{13}$$

where *r* denotes a random literal assigned in Θ*δ*2*SAT*. If every term of the form $\frac{(1+S_r)}{2}$ equals 0, then $E_{\Theta_{\delta 2SAT}} = 0$; this indicates that all clauses in Θ*δ*2*SAT* are satisfied during the training phase (i.e., a consistent interpretation is found). A consistent interpretation helps the logic program derive the correct synaptic weights of the Θ*δ*2*SAT* clauses, and the Wan Abdullah (WA) method [17] can be used to determine the values of $W_{ij}$ by directly comparing the cost function with the Lyapunov energy function of the DHNN. The DHNN's synaptic weights can also be trained using a traditional approach such as Hebbian learning [1]; however, Ref. [43] demonstrated that the WA method, compared to Hebbian learning, achieves the optimal synaptic weights with minimal neuron oscillation. The synaptic weight matrix is the building block of the CAM. Therefore, a specific output-squashing mechanism is applied to every neuron in *DHNN* − *δ*2*SAT* via the Hyperbolic Tangent Activation Function (HTAF) to retrieve the correct logic pattern of the CAM; according to Karim et al. [22], the function is expressed as follows:

$$\tanh(h\_i) = \frac{e^{h\_i} - e^{-h\_i}}{e^{h\_i} + e^{-h\_i}}\tag{14}$$

A DHNN's testing phase allows for the asynchronous updating of the neuronal state based on the following equation:

$$h\_i = \sum\_{j=1, j \neq i}^{N} W\_{ij}^{(2)} S\_j + W\_i^{(1)} \tag{15}$$

where $h_i$ represents the network's local field, $W_{ij}^{(2)}$ is the second-order synaptic weight, and $W_i^{(1)}$ is the first-order synaptic weight. By applying the HTAF to the $h_i$ values, the final states of the neurons are retrieved, and the neuron states $S_i(t)$ are updated by:

$$S\_i(t) = \begin{cases} 1, & \text{if } \tanh(h\_i) \ge 0 \\ -1, & \text{otherwise} \end{cases} \tag{16}$$
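The retrieval dynamics of Equations (14)–(16) amount to computing each local field and thresholding its tanh value. The following is a minimal sketch with an illustrative two-neuron weight matrix, not one derived from a real logic rule:

```python
import math

# Hedged sketch of Eqs. (14)-(16): local field from second- and first-order
# weights, HTAF squashing, and asynchronous bipolar update.

def local_field(W2, W1, S, i):
    """h_i = sum_{j != i} W2[i][j] * S[j] + W1[i] (Eq. 15)."""
    return sum(W2[i][j] * S[j] for j in range(len(S)) if j != i) + W1[i]

def update_state(W2, W1, S):
    """One asynchronous sweep with HTAF thresholding (Eqs. 14 and 16)."""
    S = list(S)
    for i in range(len(S)):           # neurons updated one at a time
        S[i] = 1 if math.tanh(local_field(W2, W1, S, i)) >= 0 else -1
    return S

W2 = [[0.0, 0.25], [0.25, 0.0]]   # symmetric, zero diagonal (W_ii = 0)
W1 = [0.5, -0.25]
print(update_state(W2, W1, [-1, -1]))   # [1, 1]
```

Because the sweep is asynchronous, the update of the second neuron already sees the new state of the first, which is the property Ref. [23] credits with eliminating oscillations.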

The information that results in $E_{\Theta_{\delta 2SAT}} = 0$ must be present in the neuron's final state [44], which corresponds to $H_{\Theta_{\delta 2SAT}}$, the Lyapunov energy function [18]:

$$H\_{\Theta\_{\delta 2SAT}} = -\frac{1}{2} \sum\_{i=1, i \neq j}^{n} \sum\_{j=1, j \neq i}^{n} W\_{ij}^{(2)} S\_i S\_j - \sum\_{i=1}^{n} W\_i^{(1)} S\_i \tag{17}$$

The convergence of the energy indicates when the network has reached a stable state [22]. This is supported by Sathasivam [18], who states that if a DHNN is stable and oscillation-free, the Lyapunov energy will reach its lowest value (the equilibrium state). Hence, a DHNN will always converge to the global minimum energy [45]. The convergence of the final neuron state can be assessed with the following equation:

$$\left| H\_{\Theta\_{\delta 2SAT}} - H\_{\Theta\_{\delta 2SAT}}^{\min} \right| \le \text{Tol} \tag{18}$$

where $H_{\Theta_{\delta 2SAT}}^{\min}$ is the anticipated global minimum energy produced by the final neuron state, calculated as follows:

$$H\_{\Theta\_{\delta 2SAT}}^{\min} = -\left(\frac{x\_2}{2} + \frac{y\_2}{4}\right) \tag{19}$$

where $x_2$ and $y_2$ denote the numbers of first- and second-order clauses, respectively. Algorithm 2 presents the pseudocode of *DHNN* − *δ*2*SAT*, explaining the processes of its training and testing phases. Conventionally, the logic program employs a $2^n$ search space to find consistent interpretations by ES in the training phase.
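Equations (17)–(19) can be checked numerically: compute the Lyapunov energy of a retrieved state and test whether it lies within Tol of $-(x_2/2 + y_2/4)$. The weights, clause counts, and tolerance below are illustrative:

```python
# Hedged sketch of Eqs. (17)-(19): Lyapunov energy of a bipolar state and
# the global-minimum test of Eq. (18).

def lyapunov_energy(W2, W1, S) -> float:
    """H = -(1/2) * sum_{i != j} W2[i][j]*S_i*S_j - sum_i W1[i]*S_i."""
    n = len(S)
    pair = sum(W2[i][j] * S[i] * S[j]
               for i in range(n) for j in range(n) if i != j)
    return -0.5 * pair - sum(W1[i] * S[i] for i in range(n))

def expected_minimum(x2: int, y2: int) -> float:
    """H_min = -(x2/2 + y2/4), Eq. (19)."""
    return -(x2 / 2 + y2 / 4)

def is_global(H: float, H_min: float, tol: float = 0.001) -> bool:
    """Eq. (18): |H - H_min| <= Tol."""
    return abs(H - H_min) <= tol

H_min = expected_minimum(x2=1, y2=1)   # -0.75
print(H_min, is_global(-0.75, H_min))  # -0.75 True
```

The double sum over $i \neq j$ counts each pair twice, which the leading 1/2 in Equation (17) compensates for.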

Figure 2 illustrates the schematic diagram of *DHNN* − *δ*2*SAT*. The different orders *k* = 1, 2 are shown in two different blocks. The orange block has two input/output (I/O) lines, green and yellow, representing the two types of logic distributed by clauses and neurons, respectively. Inside the orange box, the second-order clauses are depicted, and every line represents the connection of the neuron states via weights. On the right side, the dashed blue line denotes the first-order clause, which is also present in this phase, with two (I/O) lines: green and yellow. Inside, each line represents the connection of the neuron states via weights. The satisfied clauses from the two boxes result in $E_{\Theta_{\delta 2SAT}} = 0$; the figure represents only the satisfied clauses of Θ*δ*2*SAT*.
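The satisfaction condition $E_{\Theta_{\delta 2SAT}} = 0$ illustrated in Figure 2 can be checked directly from Equations (12) and (13). In the sketch below we use the standard Wan Abdullah convention, in which a positive literal contributes $(1 - S_r)/2$ and a negated literal $(1 + S_r)/2$, so each term of a satisfied clause vanishes; the clause encoding is ours:

```python
# Hedged sketch of the cost function in Eqs. (12)-(13). A clause is a list
# of (index, negated) pairs; states are bipolar (+1/-1). E = 0 iff every
# clause is satisfied, i.e., a consistent interpretation is found.

def clause_inconsistency(clause, states) -> float:
    prod = 1.0
    for idx, negated in clause:
        s = states[idx]
        prod *= (1 + s) / 2 if negated else (1 - s) / 2
    return prod

def cost(clauses, states) -> float:
    """E: sum of per-clause inconsistencies (Eq. 12)."""
    return sum(clause_inconsistency(c, states) for c in clauses)

# (r1) AND (~r2 OR r3): satisfied by S = [+1, -1, -1] since r1 is true
# and ~r2 is true.
formula = [[(0, False)], [(1, True), (2, False)]]
print(cost(formula, [1, -1, -1]))   # 0.0 -> consistent interpretation
print(cost(formula, [-1, 1, -1]))   # 2.0 -> both clauses violated
```

Each violated clause contributes exactly 1 to the cost, so the cost equals the number of unsatisfied clauses for bipolar states.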

**Algorithm 2:** Pseudocode of *DHNN* − *δ*2*SAT*


**Figure 2.** Schematic diagram of *DHNN* − *δ*2*SAT* for both types of logic; the total number of literals is *n* for first- and second-order logic.

#### **5. Experimental Procedure for Testing DHNN -***δ***2SAT**

In this section, we explain the proposed logic output and evaluate it using several evaluation metrics at all phases to guarantee the effectiveness of adding statistical parameters to RAN2SAT, which aims to produce Θ*δ*2*SAT* logic. Furthermore, the simulation platform, the assignment of parameters, and the performance metrics are all explained. All models were run with the ES algorithm, which uses trial and error to minimize the cost function ($E_{\Theta_{\delta 2SAT}} = 0$) [23].

#### *5.1. Simulation Platform*

All simulations were carried out using Visual C++ (Version 2022) on a 64-bit Windows 10 operating system. To avoid biases in the interpretation of the results, the simulations were run on a single personal computer equipped with an Intel Core i5 processor. The open-source software RStudio was used to perform the statistical analysis. Eight different simulations, depending on the statistical parameters (probability and proportion), were conducted, including those involving different numbers of clauses and neurons. In addition, different numbers of logic combinations (*η*) were tested in this study.

Each simulation's specifics are as follows:


#### *5.2. The Parameter Setting in Probability Logic Phase*

The proposed model incorporates a probability logic phase. As previously mentioned, there are two types of Θ*δ*2*SAT*, depending on whether the probability is applied to the number of neurons or the number of clauses. Numerous simulations are conducted to examine the impacts of different probabilities and several expected proportions of negative literals on the dataset, upon which the probability logic phase depends. The different probability logic phases are denoted as $\delta_{\gamma}2SAT_{\rho}$, where *γ* = 1, 2 (**1** refers to the probability with respect to the number of neurons and **2** to the probability with respect to the number of clauses) and *ρ* is the proportion of negative literals; for example, one model can be denoted as $\delta_1 2SAT_{0.9}$. A degenerate type of logic is possible if the range of the probability parameter *Y* with respect to the number of neurons or clauses stated in the simulation step generates only one type of neuron or clause state, which yields a systematic 2SAT during initialization and is not covered in this study; alternatively, the first-order clauses may outnumber the second-order clauses. When either occurs, the structural benefit of the proposed system cannot be seen, because only one specific type of solution can be found in the final neuron state. To prevent these two types of logic, we propose *Y* > 0.5, so that more second-order than first-order logic features are implemented in the DHNN. In parallel, to determine the range of the proportion, we propose *ρ* > 0.5 to obtain the correct number of negative literals representing the prevailing attribute in the dataset, and we consider $\rho_0 = 0.5$ since no information is available prior to the survey; the symbols of the stages are presented in Table 2.

**Table 2.** Parameter list for probability logic phases.


#### *5.3. Parameter Setup of DHNN* − *δγ*2*SATρ*

All simulations were run with 100 logical combinations (*η* = 100). This aids the analysis of the DHNN model and the approximate evaluation of the efficacy of the proposed logic in a DHNN with various distributions of the two parameters *Y* and *ρ*. The total number of literals in the logic system is represented by the number of neurons ($\lambda_1$) in the DHNN, for which we chose 5 < $\lambda_1$ < 50. For the DHNN, we apply a relaxation procedure in accordance with [18]. We select *R* = 3 in this context because a further reduction in the potential neuron oscillation has been observed, and a value of *R* greater than 4 yields the same outcome as [27]. Table 3 summarizes all the parameters necessary for $DHNN-\delta_{\gamma}2SAT_{\rho}$. In addition, each $\delta_{\gamma}2SAT_{\rho}$ has a neuron combination equivalent to those of the other DHNN logic systems, which eliminates the issue of a small sample size.

**Table 3.** List of parameters for *DHNN* − *δ*2*SAT*.


#### *5.4. Performance Metrics*

The objective of each phase includes the evaluation of the performance of the proposed model. Therefore, this study will utilize several performance metrics to assess the efficacy of each simulation in the different phases with respect to the *DHNN* − *δγ*2*SATρ* model to verify the effectiveness of the proposed logic system in terms of the probability logic, learning, and testing analysis phases.

#### 5.4.1. Assessment Logic Structure

The probability logic phase is the phase in which the correct logic sequence is generated; it controls the numbers of clauses and negative literals by solving Equations (3), (5) and (7). We evaluate the features of the output logic by comparing it with other models to guarantee well-produced logic in terms of clauses and negative literals, which will facilitate the attainment of the minimum cost function given in Equation (12). To determine the appropriate synaptic weight based on the main objective of this phase, we examine three features: (a) the number of negative literals, affected by parameter *ρ*; (b) the weights of the second-order logic clauses, affected by parameter *Y*; and (c) the fully negative second-order logic clauses, affected by both parameters *Y* and *ρ*. The goal is to compare these features to determine whether the probability logic phase succeeds in achieving the desired logic system by changing these parameters and to demonstrate its strength in expressing the logic features. The parameter *ρ* controls the proportion of negative literals; hence, in this section, we test the effectiveness of this parameter based on the several aspects provided below.

The proportion of negativity: in the probability logic phase, the optimal number of negative literals in the logic system is assigned by *ξ*, a constant ratio dependent on $\lambda_1$; the probability of negative literals in the logic system is computed using the following equation:

Probability Of total Negativity (PON):

$$\text{PON} = \frac{1}{\eta} \sum\_{i=1}^{\eta} \frac{\xi}{\lambda\_1} \tag{20}$$

Equation (20) is derived from the Laplace formula [42]; we need to test whether a change in *ρ* affects the probability of a negative literal structure occurring in the two types of logic, compared to other forms of logic that introduce random proportions of negative literals in the logic structure. When this metric corresponds to the required proportion, it gives us the correct probability of negative literals in the logic structure relative to other types of logic. To analyze the deviation of the negative literals over the whole logic system, we introduce a second measure that determines the state of the negative literals in the whole logic system, as shown below:

Negativity Absolute Error (NAE):

$$\text{NAE} = \frac{1}{\eta} \sum\_{i=1}^{\eta} \frac{|\lambda\_1 - \xi|}{\xi} \tag{21}$$

The proposed NAE scale measures the deviation of the logic string from the desired proportion of negative literals in Equation (5). The optimal NAE is zero, which corresponds to the required number of negative literals.

The probability of the full negativity of second-order logic: fully negative second-order clauses (¬*ri* ∨ ¬*rj*) help us to represent a greater number of the attributes in the final solution. The main objective of *δ*2*SAT* is to control the number of negative literals and second-order items in the logic structure. We need to expose the features of second-order logic, as mentioned previously, to fully enjoy the benefits of 2SAT in our proposed logic system. Therefore, the next measure is presented as follows:

Full-Negativity Absolute Error second clauses (FNAE):

$$\text{FNAE} = \frac{1}{\eta} \sum\_{i=1}^{\eta} \frac{|\xi\_{2SAT} - \lambda\_{2SAT}|}{\lambda\_{2SAT}} \tag{22}$$

where $\xi_{2SAT}$ is the number of fully negative second-order clauses and $\lambda_{2SAT}$ is the number of second-order clauses in a specific string of logic. The FNAE scale measures the accuracy of the logic in generating the fully negative second-order clauses, expressed as (¬*ri* ∨ ¬*rj*), from the rest of the second-order clauses, that is, (¬*ri* ∨ *rj*), (*ri* ∨ ¬*rj*), and (*ri* ∨ *rj*). Similarly, using this scale, we address the degree of effectiveness of the two parameters *Y* and *ρ* with respect to their significance in altering the second-order clauses. The properties of this measure let us determine whether the required logic can represent the prevailing attributes. The optimal FNAE value is zero, which corresponds to the required number of fully negative second-order clauses.
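The structural metrics of Equations (20)–(22) are simple averages over the *η* generated logic strings. The following is a hedged sketch with illustrative per-string counts:

```python
# Hedged sketches of Eqs. (20)-(22). Each entry of `runs` is illustrative:
# (xi, lambda1) for PON/NAE, and (xi_2sat, lambda_2sat) for FNAE.

def pon(runs) -> float:
    """Probability Of total Negativity (Eq. 20): mean of xi / lambda1."""
    return sum(xi / lam for xi, lam in runs) / len(runs)

def nae(runs) -> float:
    """Negativity Absolute Error (Eq. 21): mean of |lambda1 - xi| / xi."""
    return sum(abs(lam - xi) / xi for xi, lam in runs) / len(runs)

def fnae(runs) -> float:
    """Full-Negativity Absolute Error (Eq. 22) over second-order clauses."""
    return sum(abs(xi2 - lam2) / lam2 for xi2, lam2 in runs) / len(runs)

runs = [(7, 10), (6, 10)]        # (xi, lambda1) for each logic string
print(round(pon(runs), 2))       # 0.65
```

As in the text, an NAE or FNAE of zero indicates that every generated string carries exactly the required number of (fully) negative literals.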

To address the effect of the parameter *Y* on the second-order weight, we propose a weighted error measure, which captures the accuracy of the effect of changing *Y* in both proposed logic types when compared to other logic systems, as follows:

Weight Full-Negativity Absolute Error (WFNAE):

$$\text{WFNAE} = \frac{1}{\eta} \sum\_{i=1}^{\eta} \frac{\left| \xi\_{2SAT} - \overline{\lambda}\_{2SAT} \right| \times w(y\_m)}{\sum\_{i=1}^{\eta} \lambda\_{2SAT}} \tag{23}$$

where $\overline{\lambda}_{2SAT}$ is the mean number of second-order clauses, and $w(y_m)$ is the weight of the second-order clauses, which equals *Y* because the Laplace formula assigns an equally likely probability to all elements. Using this measure, we can determine the effect of *Y* on the deviation of the fully negative clauses from the mean; the real weight of this deviation is obtained by multiplying it by $w(y_m)$. A large value signifies a high degree of representation in terms of the weight of the negative strings, which greatly improves our understanding of the weight of the dominating attributes in the logic. By comparing the scale across the other logic systems, with weight assigned to the prioritized fully negative clauses, we can see toward which logic the deviation is biased. Table 4 lists the symbols that we require during this phase.



#### 5.4.2. Assessment during the Training Phase

In the training phase, we obtain satisfying assignments of the clauses, which generate the optimal synaptic weights of $\Theta_{\delta_{\gamma}2SAT_{\rho}}$ by minimizing Equation (12). The Root-Mean-Square Error (RMSE) has been used as a basic statistical metric for measuring the quality of a model's predictions in many fields [24]; here, it is utilized to assess the quality of the training phase, wherein the training RMSE (RMSEtrain) signifies the root square of the error between the neurons' desired fitness value $F_{\text{desired}}$ and their current fitness $F_i$ [22]. The RMSEtrain formula is:

$$\text{RMSE}\_{\text{train}} = \sqrt{\frac{1}{\eta \times \upsilon} \sum\_{i=1}^{\eta \times \upsilon} \left( F\_i - F\_{\text{desired}} \right)^2} \tag{24}$$

The optimal RMSE value in the DHNN model is zero, which means the WA method derived the correct synaptic weights; furthermore, a good model is achieved when the measure lies between 0 and 60. The Root-Mean-Square Error of the synaptic weight (RMSEweight) is assessed based on the following formula:

$$\text{RMSE}\_{\text{weight}} = \sqrt{\frac{1}{\upsilon \times \eta} \sum\_{i=1}^{\eta \times \upsilon} \left(\text{W}\_{\text{E}} - \text{W}\_{\text{A}}\right)^{2}} \tag{25}$$

where $\text{W}_{\text{E}}$ denotes the expected synaptic weight obtained by the WA method and $\text{W}_{\text{A}}$ the actual synaptic weight obtained in the testing phase. This measure gives us a complete understanding of the error produced by the WA method, wherein the best result is 0, corresponding to Equation (12).
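Both Equations (24) and (25) are instances of the same RMSE computation over the *η* × *υ* runs. The following is a minimal sketch with illustrative fitness and weight values:

```python
import math

# Hedged sketch of Eqs. (24)-(25): one RMSE helper applied to fitness
# values (training quality) and to synaptic weights (WA-method accuracy).
# The inputs are illustrative flat lists over the eta x upsilon runs.

def rmse(values, targets) -> float:
    n = len(values)
    return math.sqrt(sum((v - t) ** 2 for v, t in zip(values, targets)) / n)

# RMSE_train: current fitness F_i against the desired fitness (Eq. 24)
fitness = [10, 9, 10, 8]
desired = [10, 10, 10, 10]
print(round(rmse(fitness, desired), 4))   # 1.118

# RMSE_weight: expected (WA-derived) vs actual synaptic weights (Eq. 25)
w_expected = [0.25, 0.5, 0.25]
w_actual   = [0.25, 0.5, 0.25]
print(rmse(w_actual, w_expected))         # 0.0 -> WA weights reproduced
```

A weight RMSE of exactly zero corresponds to the ideal case described in the text, where the training phase reproduces the WA-derived weights.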

#### 5.4.3. Assessment for Testing Phase

If the suggested network satisfies the requirement in Equation (18), the proposed *DHNN* − *δ*2*SAT* will act in conformance with the embedded logical rule during the testing phase. The final neuron state will enter a state of minimum energy corresponding to the cost function of the proposed *DHNN* − *δ*2*SAT* logical rule. Therefore, based on the synaptic weights generated in the training phase, we evaluate the quality of the retrieved final neuron states, namely, the global minima solutions, using the following measure. Global minima ratio ($R_G$): the goal of the global minima ratio is to assess the retrieval efficiency of *DHNN* − *δ*2*SAT*. The formula for $R_G$ is:

$$R\_G = \frac{1}{\eta \times \phi} \sum\_{i=1}^{\lambda\_1} G\_{\Theta\_{\delta 2SAT}} \tag{26}$$

where $G_{\Theta_{\delta 2SAT}}$ is the number of global minimum solutions that satisfy condition (18) after the distribution in Equation (19), *ϕ* is the number of trials in the training phase, and *η* is the number of logical combinations for each run. This metric has frequently been used in articles such as [21,38] to assess the convergence property of the proposed *DHNN* − *δ*2*SAT*.

The second measure in the testing phase is the Root-Mean-Square Error energy (RMSEenergy) [22], which is used to evaluate the minimization of energy achieved by *DHNN* − *δ*2*SAT*. The energy profile can be determined using RMSEenergy:

$$\text{RMSE}\_{\text{energy}} = \sqrt{\frac{1}{\upsilon \times \eta} \sum\_{i=1}^{\eta \times \upsilon} \left( H\_{\Theta\_{\delta 2SAT}} - H\_{\Theta\_{\delta 2SAT}}^{\min} \right)^2} \tag{27}$$

We use RMSEenergy to analyze the convergence of *δ*2*SAT* by determining the actual energy difference between the absolute minimum energy $H_{\Theta_{\delta 2SAT}}^{\min}$ and the final minimum energy $H_{\Theta_{\delta 2SAT}}$.

#### 5.4.4. Similarity Index

The similarity index [38] and the cumulative neuronal variation [24] can be used to evaluate SAT performance in a DHNN. The similarity index values are compared with benchmark neuron states $S_i^{\max}$ to determine the quality of each optimal final neuron state that achieved the global lowest energy, as indicated in the following formula:

$$S\_i^{\max} = \begin{cases} 1, & r\_i \\ -1, & \neg r\_i \end{cases} \tag{28}$$

where 1 denotes a positive literal of *ri*, and −1 denotes a negative literal of ¬*ri* in each clause. It should be noted that the benchmark neuron states are the DHNN model's ideal neuron states that satisfy the conditions in Equation (18). The retrieved final neuron states are compared to the benchmark neuron states indicated in Table 5 to provide a comprehensive comparison of the benchmark neuron states and final neuron states.

**Table 5.** Variables' similarity index specifications.


The overall comparison of the benchmark and final neuron states is conducted as follows [9]:

$$\mathcal{L}\_{S\_i S\_i^{\max}} = \left\{ \left( S\_i, S\_i^{\max} \right) \mid i = 1, 2, \dots, n \right\} \tag{29}$$

According to Case 1 of Θ*δ*2*SAT* in the examples in Table 1, the benchmark neuron states can be generalized as follows: *S*<sub>*i*</sub><sup>max</sup> = (−1, 1, −1, 1, 1, −1, 1, −1, −1, −1).

In this study, we selected a well-known measure to determine the similarity index from diverse perspectives, namely, that developed by Sokal and Michener (Sokal) [46], which is employed to evaluate the viability of the retrieved final neuron states. It should be noted that Sokal measures the similarity of the negative cases of *S*<sub>*i*</sub> with *S*<sub>*i*</sub><sup>max</sup> over the range (0, 1). The formulation is as follows:

$$\text{Sokal}\left(S\_i, S\_i^{\max}\right) = \frac{f + e}{f + e + h + g} \tag{30}$$

The Ratio of Cumulative Neuronal Variation (*Rtv*) is used because, in the testing phase, the DHNN tends to directly memorize the final neuron states rather than create new ones. It is expressed as follows:

$$\begin{cases} R\_{tv} = \frac{1}{\varrho \times \eta \times \upsilon} \sum\_{i=1}^{\varrho} \sum\_{j=1}^{\eta \times \upsilon} \mathrm{E}\_i \\ \mathrm{E}\_i = \begin{cases} 1, & S\_i \neq S\_i^{\max} \\ 0, & S\_i = S\_i^{\max} \end{cases} \end{cases} \tag{31}$$

where E<sub>*i*</sub> denotes the point score used to assess the difference between the newly retrieved final neuron states and the benchmark neuron states. The symbols required for the training and testing phases are shown in Table 4.
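The two similarity measures above can be sketched together. The Sokal–Michener index is the simple-matching coefficient; here we assume, in line with its standard definition, that *f* + *e* count the matching pairs and *h* + *g* the mismatching pairs (the exact pair definitions are specified in Table 5, not reproduced here), and the function names are illustrative:

```python
def sokal_michener(S, S_max):
    """Sketch of Eq. (30): simple-matching coefficient, assuming f + e are
    the matching pairs and h + g the mismatching pairs of (S_i, S_i_max)."""
    matches = sum(1 for a, b in zip(S, S_max) if a == b)
    return matches / len(S)

def cumulative_variation_ratio(retrieved_states, S_max):
    """Sketch of Eq. (31): fraction of retrieved neuron states that differ
    from the benchmark (E_i = 1), averaged over all runs and neurons."""
    diff = sum(1 for S in retrieved_states
               for a, b in zip(S, S_max) if a != b)
    return diff / (len(retrieved_states) * len(S_max))
```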

#### *5.5. Comparison of Method and Baseline Models*

Since this study focuses on investigating the performance of *δγ*2*SATρ* with respect to its logical behavior, we need to investigate its performance in terms of *Y* and *ρ* with regard to constructing a good logical structure in the probability logic phase. Therefore, we compare *δγ*2*SATρ* with the existing logic systems in DHNNs based on their logic structures, testing phases, and solution quality in order to examine two behaviors relating to logic:


In order to examine the logic in a DHNN after its implementation, we also compare the quality of its final neuron states to that of RAN2SAT, and we evaluate the variation introduced in the testing phase, the global minima solutions, and the variation of neurons. The most recent logic systems with a 2SAT structure were selected for this comparison, one of their features being the structure of the logic systems themselves: each clause contains two literals joined by a disjunction.


#### *5.6. Benchmark Dataset*

In this study, the proposed model generated bipolar interpretations randomly from a simulated dataset. More specifically, the logical representation used in the simulations serves as the foundation for the structure of the simulated data. Simulated datasets are commonly used in modeling and evaluating the efficacy of SAT logic programming, as demonstrated in the work of [18,22,27].

#### *5.7. Statistical Test*

This section provides a brief definition of the statistical measures that will be used in this study for two purposes (description and testing):

(a) The measure of central tendency is defined as "the statistical measure that designates a single value as being indicative of a whole distribution" [47]. We therefore selected two measures: (a) the average, known as the arithmetic mean (or, simply, the "mean"), which is calculated by adding all of the values in the dataset and dividing by the number of observations. It is one of the most significant measures of central tendency. The mean has the disadvantage of being sensitive to extreme values/outliers, especially when the sample size is small; as a result, it is ineffective as a measure of central tendency for a skewed distribution [48]. Its formula is expressed as follows:

$$\overline{X} = \frac{\sum\_{i=1}^{n^{\ast}} x\_i}{n^{\ast}} \tag{32}$$

where *X* denotes the mean, *x<sub>i</sub>* represents the set of data, and *n***\*** denotes the sample size of the data. (b) The median is the value that occupies the central position when all observations are arranged in ascending/descending order. It divides the frequency distribution into two halves, is not biased by outliers, and is determined by the following formula [49]:

$$\widetilde{X} = \begin{cases} \frac{x\_{\frac{n^{\ast}}{2}} + x\_{\frac{n^{\ast}}{2}+1}}{2}, & \text{if } n^{\ast} \text{ is even} \\ x\_{\frac{n^{\ast}+1}{2}}, & \text{if } n^{\ast} \text{ is odd} \end{cases} \tag{33}$$

where *X̃* denotes the median, and *n***\*** denotes the sample size of the data.

(b) The measure of dispersion: Variability measures inform us about the distribution of the data and allow us to compare the dispersion of two or more sets of data. We can determine whether the data are stretched or compressed using dispersion metrics, namely, the Standard Deviation (SD), which evaluates variability by considering the distance between each score and the distribution's mean as a reference point. It is the square root of the variance and indicates the average separation from the mean. It is presented as follows:

$$\sigma \left(SD\right) = \sqrt{\frac{\sum\_{i=1}^{n^{\ast}} \left( x\_i - \overline{X} \right)^2}{n^{\ast} - 1}} \tag{34}$$
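The three descriptive statistics above, Equations (32)–(34), can be sketched in a few lines (function names are illustrative; the SD uses the sample denominator *n*\* − 1 as in Equation (34)):

```python
import math

def mean(xs):
    # Eq. (32): arithmetic mean
    return sum(xs) / len(xs)

def median(xs):
    # Eq. (33): middle value; average of the two middle values when n* is even
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2

def sample_sd(xs):
    # Eq. (34): square root of the sample variance (n* - 1 denominator)
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))
```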


$$P(a \le X \le b) = \int\_a^b f(x) \, dx \tag{35}$$

where *f*(*x*) denotes the probability density function of a random variable; its shape provides a visualization of the distribution of a continuous random variable, and the area under it gives the probability that the variable's value falls within a specific interval.

(f) The Wilcoxon signed-rank test: The Wilcoxon signed-rank test was introduced for the first time by Frank Wilcoxon in 1945 [52]. It is a nonparametric test for the one-sample location problem, used to test the null hypothesis that the median of a distribution equals some value (*H*<sup>0</sup> : *X̃* = 0) for data that are skewed or otherwise do not follow a normal distribution. It can be used instead of a one-sample t-test or paired t-test, or for ordered categorical data. If (*p***-value** *≤ α*), the null hypothesis is rejected; this is strong evidence that the null hypothesis is invalid, i.e., the result for the median is significant. The formula for the Wilcoxon signed-rank statistic (*W*) for independent random variables *x<sub>i</sub>* is:

$$W = \frac{W\_s^{\ast} - \frac{\pi \left(\pi + 1\right)}{4}}{\sqrt{\frac{\pi \left(\pi + 1\right) \left(2\pi + 1\right)}{24}}} \tag{36}$$

where *π* is the number of pairs whose difference is not 0, and *W*<sub>*s*</sub><sup>∗</sup> is the smaller of the absolute values of the rank sums of *x<sub>i</sub>*. The symbols of these statistics are listed in Table 6. The details of the implementation of Θ*δ*2*SAT* in a DHNN are presented in Figure 3, which contains the probability logic, learning, and testing phases and the evaluation metric used in each phase.
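Equation (36) is the normal approximation of the signed-rank statistic. A self-contained sketch (illustrative function name; ties receive average ranks, zero differences are dropped, and *W*<sub>*s*</sub><sup>∗</sup> is taken as the smaller of the positive/negative rank sums):

```python
import math

def wilcoxon_z(diffs):
    """Sketch of Eq. (36): normal approximation of the Wilcoxon signed-rank
    statistic, where pi is the number of non-zero differences."""
    d = [x for x in diffs if x != 0]
    n = len(d)
    # rank the absolute differences, averaging ranks over ties
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    w_minus = sum(r for r, x in zip(ranks, d) if x < 0)
    w = min(w_plus, w_minus)  # W_s*
    return (w - n * (n + 1) / 4) / math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
```

In practice, a library routine such as `scipy.stats.wilcoxon` would be used; the sketch only makes the formula's terms explicit.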

**Table 6.** Parameters List for *DHNN* − *δ*2*SAT*.


**Figure 3.** Flowchart of *DHNN* − *δ*2*SAT* and Experimental evaluation.

#### **6. Results and Discussion**

In this section, we describe the proposed logical output and evaluate it using a variety of evaluation metrics throughout all three phases to ensure that the addition of statistical tools to the RAN2SAT structure and the produced *δγ*2*SATρ* logic was effective. Furthermore, the simulation platform, assigned parameters, and the metrics' performance are discussed. It is important to note that we did not consider any optimization during the probability logic phase, following Zamri et al. [24]; the training phase, as proposed in [21,38]; or the testing phase, as proposed in [9,53].

#### *6.1. Logic Structure Capability*

The probability logic phase gives us different models in terms of negative literals and second-order logic with respect to the two parameters *Y* and *ρ*. Since both parameters fall within the [0,1] interval, an infinite number of 2SAT models can be generated using them. For the majority of the 2SAT representations, we chose *Y* (*p*(*ym*)) more frequently than *p*(*xm*), so that the values lie in the range (0.6–0.9). In this study, we chose values of *ρ* greater than 0.5, in the range (0.6–0.9), in the probability logic phase to obtain a greater representation of negative literals, in order to study the predominating attributes in the dataset, as previously mentioned.

We selected the most significant differences from the two intervals and designated them as models, which are illustrated in Table 7, in order to examine the efficacy of the two parameters with different numbers of *λm*, where 5 < *λ*<sup>1</sup> < 50, so as to improve on other recently developed logic systems. Subsequently, we test two *δγ*2*SATρ* types with different values of *λm*, *Y*, and *ρ*; these values are selected considering the significant change in probability and negative literals. Notably, *ρ* = 1 is disregarded because we do not need all literals to be negative, as the structure would then not represent the Binomial distribution dataset. Moreover, *DHNN* − *δ*2*SAT* gives one satisfied interpretation of a first-order clause [54]; on the other hand, *Y* = 1 gives us purely second-order logic. It is important to emphasize that we do not consider a systematic *δ*2*SAT* logical system in this study. Table 7 shows the names of the two *δγ*2*SATρ* types for the different possible models depending on the two parameters *Y* and *ρ*, as well as the other logic symbols.
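The role of the two parameters can be made concrete with a small generator: each clause is second-order with probability *Y* (first-order otherwise), and each literal is negated with probability *ρ*. This is a hedged sketch of the probability logic phase only; the function name and clause encoding are illustrative, not the authors' implementation:

```python
import random

def generate_delta_2sat(num_neurons, Y, rho, rng=None):
    """Sketch of the probability logic phase: build clauses over num_neurons
    variables, second-order with probability Y, with each literal negated
    with probability rho. A clause is a list of (variable, is_positive)."""
    rng = rng or random.Random()
    clauses, i = [], 0
    while i < num_neurons:
        order = 2 if (rng.random() < Y and i + 1 < num_neurons) else 1
        clause = [(i + k, rng.random() >= rho) for k in range(order)]
        clauses.append(clause)
        i += order
    return clauses
```

With *ρ* = 0.9, most generated literals come out negated, mirroring the high-PON groups discussed below.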


**Table 7.** The logical symbols in the experiment.

The negativity representation: The PON measure for the different logic models was tested using Equation (20). The PON represents the probability of a negative literal appearing in the entire logic system across all combinations with different *λ*1. It is necessary to control the negative literals in order to determine the prevailing attributes in the dataset: negative literals ensure more negativity in the final neurons, and we can then ensure that the attribute appears in the solution space, helping the DHNN find the optimal solution [24].
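Equation (20) is not reproduced in this excerpt; the sketch below assumes the PON is simply the fraction of negative literals among all literals in the generated system, with an illustrative encoding of +1/−1 literal signs:

```python
def pon(literal_signs):
    """Hedged sketch of the PON measure: fraction of negative literals,
    where literal_signs holds +1 (positive) / -1 (negated) entries."""
    return sum(1 for s in literal_signs if s < 0) / len(literal_signs)
```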

Figure 4, a line representation, shows different layers of logic in different proportions for both types of *δγ*2*SATρ*. For the other groups, *ρ* = 0.5 for *r*SAT logic, and *ρ* is random for the other logic systems (YRAN2SAT, MAJ2SAT, RAN3SAT, 2SAT, and RAN2SAT). These systems sit below the minimum levels of the proposed *δγ*2*SATρ* because, as already noted, the probability of receiving a negative literal in standard SAT is extremely low. The highest two layers were recorded at *ρ* = 0.9 and *ρ* = 0.8 in both types of *δγ*2*SATρ*, respectively. By applying Equation (5), we obtain the best number of negative literals for all *λ*1; the third layer corresponds to the other two groups, *ρ* = 0.6 and *ρ* = 0.7, the lowest probabilities in both types of *δγ*2*SATρ*. The change in the proportional parameter *ρ* thus succeeds in producing the desired number of negative literals in the logic system, representing the predominant attributes in our dataset. Additionally, there was a direct correlation between the number of neurons in each class of the desired proportion and the proportions, where a high PON recorded low probability when the number corresponded to *λ*1. When *λ*<sup>1</sup> is less than 17 or greater than 31, the PON becomes approximately stable. This is because *d* in the sample size formula, Equation (6), always selects the optimal sample reflecting the number of negative literals, even when the number of neurons is low. Table 8 provides detailed information on the PON in each proportion group for the two types of logic. Note that the group *ρ* = 0.9 recorded the maximum PON and the highest mean PON with low *σ* in both types of *δγ*2*SAT*0.9; the small *σ* indicates that, across different numbers of neurons *λ*1, the PON means remain close, and the results are highly similar within each group for all models, increasing as *Y* increases in both types, namely, *δ*12*SATρ* and *δ*22*SATρ*. The minimal PON value was recorded for YRANSAT, with a minimum mean of 0.4966 and a low SD (*σ* = 0.015), which was also the lowest across different numbers of neurons; the PON means of the other logic systems are close to one another, showing low values (less than or equal to 0.5) for different numbers of neurons. The PON results prove the flexibility of the *δγ*2*SATρ* structure in controlling the literals' states.

**Figure 4.** PON line representation for models in both types of logic (**a**) *δ*12*SATρ*, (**b**) *δ*22*SATρ*, and recently developed logic systems.

The accuracy of the models is evaluated by the NAE measure in Equation (21), in terms of the error in the negative literal status across the entire logic system in each proportion group for both types of *δγ*2*SATρ* models. According to the line representation in Figure 5, the proportional changes in the logic structure guarantee the required restructuring of RAN2SAT, i.e., effective representation of the prevailing attribute in the dataset, where different proportions give different layers. The details of Figure 5 can be found in Table 9, which shows that the minimum NAE values were recorded in the group *ρ* = 0.9, where A4 in *δ*12*SAT*0.9 recorded the lowest error (0.1429). Its median value (0.3090) was also the lowest possible, indicating that A4 always had a smaller error in the middle sections for all *n* neurons of *λ*1. Additionally, all models in the same group, A16, A12, and A8, have very similar median values (0.333, 0.31, and 0.320); as shown by the PON, this group has the highest probability of representing a negative literal, which is accomplished by the proportion *ρ* = 0.9. Similarly, in *δ*22*SAT*0.9, Q4 recorded the lowest error (0.1429), but the smallest median was recorded by Q16 (0.13125), meaning the minimum error lies in the middle values with respect to the number of neurons *λ*1. Moreover, Figure 5 shows that, for a small number of neurons *λ*1, Q4 has lower NAE values than Q8, Q12, and Q16; the reverse holds for the middle values of Q16 compared to Q12, Q8, and Q4, reflecting the previously mentioned effect of *Y* on *λ*1. In Table 9, the median values differ only slightly among the models in group *ρ* = 0.9; as discussed for the PON, this indicates the success of the proportional representation in the logic system. The highest NAE value was observed for *r*SAT, with a high median, where *r* = 0.5, with the other logic systems (YRAN2SAT, MAJ2SAT, RAN3SAT, 2SAT, and RAN2SAT) recording nearby NAE values; as previously mentioned, these systems lack representation of negative literals, as they recorded the lowest probability of negative literals appearing.

**Table 8.** PON results for models with both types of logic, *δ*12*SATρ* and *δ*22*SATρ*, and recently developed logic systems, with details determined by the Wilcoxon test for the median, divided by *ρ* value.


**Note:** The yellow highlights indicate the highest number in the column, and the green highlights indicate the smallest number in the column.

**Figure 5.** NAE line representation for models in both types of logic (**a**) *δ*12*SATρ*, (**b**) *δ*22*SATρ*, and recently developed logic systems.


**Table 9.** Maximum and minimum NAE results for models with both types of logic *δ*12*SATρ*, *δ*22*SATρ*, and recently developed logic systems with details determined by Wilcoxon test for median divided by *ρ* value.

**Note:** The results highlighted in yellow indicate the highest number in the column and those in green the smallest number in the column; (*p*-value < 0.00) for all models in terms of the Wilcoxon test, meaning that *H*<sup>0</sup> should be rejected.

The probability of full negativity of second-order logic: We examined the ability of several models incorporating the two types of *δγ*2*SATρ* to produce fully negative second-order clauses with greater accuracy than other recently developed logic systems by manipulating the two parameters, *Y* and *ρ*, using the FNAE measure for the second-order clause in Equation (22). Obtaining fully negative second-order logic guarantees that the prevailing attribute in the desired logic structure is represented. Figure 6, a columnar representation, shows the results of the FNAE measure: the highest accuracy was achieved by A8 and A4 in *δ*12*SATρ* and Q4 in *δ*22*SATρ*, which obtained an FNAE value of 0. This is due to the effect of the two parameters in these models, where the proportion of negative literals is *ρ* = 0.9 and the probability of second-order logic, *Y* = 0.6, 0.7, is lower than in the other models, meaning that all second-order clauses are satisfied by negative literals because of the small representation of second-order clauses. In the same figure, the low accuracy obtained by A1 and Q1, which recorded the maximum FNAE values (0.8930, 0.8650), results from the low representation of the negative proportion in the logic system. Thus, if we need greater representation of the prevailing attributes in the desired logic structure, we should choose A8 and A16 from *δ*12*SAT*0.9 and Q4 from *δ*22*SAT*0.9. Model A4 recorded higher accuracy with the lowest FNAE median value (0.3995), meaning the minimum error lies in the middle values for all neuron quantities *λ*1. In *δ*22*SATρ*, the lowest FNAE median (0.4147) was recorded by model Q12, whose proportion of negative literals is *ρ* = 0.9, meaning there are more negative second-order clauses. The full FNAE results are listed in Table 10.
It is evident that the ratios *ρ* = 0.9 and *Y* = 0.9 indicate that the model has a higher fraction of negative second-order representations. Compared to these results, all of the other state-of-the-art logic systems provide low accuracy due to higher median values, indicating that their means cannot accurately represent the fully negative second-order values; among them, RAN2SAT performs best. The latest logic systems give higher errors because of the fluctuation in their predetermined assignment of second-order logic and their low representation of negative literals, which indicates that *δ*12*SATρ* and *δ*22*SATρ* are more flexible than the recent logic systems in controlling the two parameters.

**Figure 6.** FNAE column representation for models in both types of logic (**a**) *δ*12*SATρ*, (**b**) *δ*22*SATρ*, and recently developed logic systems.

**Table 10.** Maximum and minimum FNAE results for models in both types of logic *δ*12*SATρ*, *δ*22*SATρ*, and recently developed logic systems with details determined by Wilcoxon test for median.


**Note:** The results highlighted in yellow indicate the highest number in the column and those in green the smallest number in the column; (*p*-value < 0.00) for all models in terms of the Wilcoxon test, meaning *H*<sup>0</sup> is rejected.

A high result in the WFNAE measure in Equation (23) indicates that fully negative second-order logic is more greatly represented. Using this scale, the weight of the clauses in the logic is evaluated, and the *Y* parameter may be used to determine whether the model is desirable, because the highest probability gives the highest weight. As shown in Figure 7, the maximum probability, i.e., the highest represented weight, is obtained by A16 and Q16 in *δ*12*SAT*0.9 and *δ*22*SAT*0.9, respectively, and 0 for YRANSAT, because it also produces first-order logic. In Table 11, note that the highest significant median values were achieved by the A16 and Q16 models (0.4477 and 0.4691, respectively), and the lowest significant median value was the YRANSAT WFNAE value of 0. This ensures that the prevailing attribute has the highest representation in our logic compared to other state-of-the-art logic systems, in addition to its ability to minimize and maximize changes in *Y*. In conclusion, it is evident that the two parameters, *Y* and *ρ*, have a direct impact on the probability distribution dataset in the *δγ*2*SATρ* logic structure.

**Figure 7.** WFNAE column representation for models in both types of logic (**a**) *δ*12*SATρ*, (**b**) *δ*22*SATρ*, and recently developed logic systems.



**Note:** The results highlighted in yellow indicate the highest number in the column and those in green the smallest number in the column; (*p***-value < 0.00**) for all models in terms of the Wilcoxon test, meaning *H*<sup>0</sup> is rejected.


#### *6.2. Training Phase Capability*

The objective of this phase is to evaluate the efficiency of the various *δγ*2*SATρ* structures produced in the probability logic phase, which are trained in a DHNN to minimize logical inconsistency using Equation (12) and obtain the correct synaptic weights. In this phase, ES obtains consistent interpretations for Θ*δγ*2*SAT<sup>ρ</sup>* and derives the correct synaptic weights for the logic system. If the model arrives at an inconsistent interpretation (*E*Θ*δ*2*SAT* ≠ 0), the *DHNN* − *δ*2*SAT* model resets the whole search space and generates a new one until *φ* = *υ*. The error relative to the maximum fitness of the logic, which is represented by the total number of clauses, is quantified in the training phase by employing RMSEtrain and RMSEweight via Equations (24) and (25), respectively. Figures 8 and 9 show the RMSEtrain and RMSEweight results for both types of *δγ*2*SATρ* when *υ* = 100. For both types, RMSEtrain undergoes an exponential (logistic-like) increase with a growth rate equal to |*Fi* − *Fdesired*|, and RMSEweight increases linearly. According to [26], the error value in the training phase starts off low when the learning set is small because it is more difficult to fit a larger learning set. As *λ*<sup>1</sup> rises, more iterations are required for the DHNN to locate SAT structures with satisfying interpretations, and the training phase metrics obtain a value of 0 when *λ*<sup>1</sup> is small.
When the value of *Y* is high, the error is always low, because the second-order logic structure helps ES achieve satisfaction (*Fi* = *Fdesired*) to a greater extent than first-order logic, and because the probability of finding a consistent interpretation for each *δγ*2*SATρ* clause follows a binomial distribution, which measures the effect of the flexible structure, via changes in the two parameters *Y* and *ρ*, on the RMSEtrain and RMSEweight results [24]. As shown in Figures 8 and 9, a high probability of obtaining second-order clauses (*Y*) makes it easier to locate optimal interpretations [22], which means the WA method can derive the correct synaptic weights. On the other hand, when *Y* decreases, the probability of the first-order clauses being satisfied is very low compared with 2SAT. Due to its limited number of interpretations, the non-systematic logical rule with first-order clauses reduces the cost function of the logic.
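The ES training step described above can be sketched as a resampling loop: draw random bipolar neuron states until every clause is satisfied (*Fi* = *Fdesired*) or the trial budget *υ* is exhausted. This is a minimal sketch, not the authors' implementation; clauses use a hypothetical `(variable, is_positive)` encoding, and the returned value is the best fitness error |*Fi* − *Fdesired*| found:

```python
import random

def train_exhaustive_search(clauses, num_neurons, max_trials, rng=None):
    """Sketch of the ES training step: resample random bipolar states
    until all clauses are satisfied or max_trials (upsilon) is reached."""
    rng = rng or random.Random()
    f_desired = len(clauses)  # maximum fitness = total number of clauses
    best_err = f_desired
    for _ in range(max_trials):
        state = [rng.choice((-1, 1)) for _ in range(num_neurons)]
        # a clause is satisfied if any literal matches its required sign
        fitness = sum(
            1 for clause in clauses
            if any(state[v] == (1 if pos else -1) for v, pos in clause)
        )
        best_err = min(best_err, f_desired - fitness)
        if best_err == 0:
            break
    return best_err
```

Aggregating these per-trial errors over all runs is what RMSEtrain (Equation (24)) measures; the non-randomized resampling here also illustrates why ES degrades as *λ*<sup>1</sup> grows.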

**Figure 8.** RMSEtrain line representation for models in both types of logic (**a**) *δ*12*SATρ*, (**b**) *δ*22*SATρ*.

Table 12 records the values in Figure 8. In the line representation, for *δ*12*SATρ*, a large RMSEtrain was reported for A4 (118.895), which belongs to group *Y* = 0.6 and has the smallest number of 2SAT clauses. At the same time, the RMSEtrain median gives a more significant result, reported for group *Y* = 0.7, where A8 (68.5274) has a large RMSEtrain value unaffected by outliers for all *λ*1; thus, when *Y* decreases, ES cannot find a consistent interpretation for first-order logic. The low RMSEtrain medians belong to group *Y* = 0.9, led by A14 (38.16665), which indicates that a large number of 2SAT clauses makes it simpler for ES to achieve a consistent interpretation. For *δ*22*SATρ*, a large error was reported for the *Y* = 0.6 group in Q1 (114.342) because of its small number of 2SAT clauses. For the median results, Q3 (64.7599) reported a high RMSEtrain in the same group, and group *Y* = 0.9 reported a lower value for Q16 (41.0488), indicating the same behavior as *δ*12*SATρ*; it is worth noting that large *Y* and *ρ* produce large fitness errors. It is clear for Q(4,8,12,16) that when *ρ* = 0.9 in both measures, it is difficult for ES to satisfy the negative literals, because the extreme number of negative literals makes it difficult to achieve optimal fitness, as mentioned in [24]. Due to the limited search room, it is also challenging to apply ES to large *Y* with small *λ*1. Finally, the ES mechanism in the training phase of a DHNN is only effective when *λ*<sup>1</sup> is small and is hindered by a high number of neurons because of its non-randomized operator [24]. The training phase can be improved further by embedding a learning algorithm in the DHNN and using global and local search operators [26]. This approach may aid the search for optimal Θ*δγ*2*SAT<sup>ρ</sup>* interpretations and ensure that logical inconsistencies are minimized.

**Figure 9.** RMSEweight column representation for models in both types of logic (**a**) *δ*12*SATρ*, (**b**) *δ*22*SATρ*.

In the column representation in Figure 9, the RMSEweight results for the two types of *δγ*2*SATρ* models help us to better understand the fitness of the neuron states. Based on the results, a value of 0 was obtained for various quantities of *λ*<sup>1</sup> in the interval [5,18] in different models of both types of *δγ*2*SATρ*; the values then started to fluctuate at large *λ*1. The maximum RMSEweight values were reported for A7 and Q3, where the number of negative literals is large (*ρ* = 0.9) and *λ*<sup>1</sup> is large. Table 13, which corresponds to Figure 9, reports that the maximum RMSEweight values in terms of the median are A1 (0.0791) and Q10 (0.0548), where *ρ* is small. In addition, small values were reported for A16 (0.0075) and Q14 (0.0048), where the number of negative literals is large. This is clearly a result of RMSEweight being affected by the clause fitness measured by RMSEtrain: when ES cannot find an interpretation for a clause with a high value of *λ*1, the DHNN cannot derive the correct synaptic weights via the WA method, and the result exceeds zero. The fluctuation in the results arises because the DHNN selects random weights if *E*Θ*δ*2*SAT* ≠ 0 after the number of iterations *φ* reaches the maximum. In conclusion, it is evident that the two parameters, *Y* and *ρ*, have a direct impact on the probability distribution dataset during the training phase.


**Table 12.** Maximum and minimum RMSEtrain results for models in both types of logic, *δ*12*SATρ* and *δ*22*SATρ*, with details determined by the Wilcoxon test for the median.

**Note:** The results highlighted in yellow indicate the highest number in the column and those in green the smallest number in the column; (*p*-value < 0.00) for all models in terms of the Wilcoxon test, meaning *H*<sup>0</sup> is rejected.

**Table 13.** Maximum and minimum RMSEweight results for models in both types of logic, *δ*12*SATρ* and *δ*22*SATρ*, with details determined by the Wilcoxon test for the median.


**Note:** The results highlighted in yellow indicate the highest number in the column and those in green the smallest number in the column; (*p*-value < 0.00) for all models in terms of the Wilcoxon test, meaning *H*<sup>0</sup> is rejected.

#### *6.3. Testing Phase Capability*

The optimal testing phase is achieved when *E*Θ*δ*2*SAT* = 0 and the optimal synaptic weights are retrieved, after *DHNN* − *δ*2*SAT* completes checking clause satisfaction and generates the optimal synaptic weights through the WA method. The final state of the neuron will then converge towards the global minimum energy. It is important to evaluate the testing phase because a DHNN frequently produces similar final neuron states as opposed to novel final neuron states [55]. Therefore, we compare the *δγ*2*SATρ* logic with the recent logic systems using the global minima ratio metric. If a model is unable to reach a global solution, it is trapped in a local solution, which makes it impossible to determine whether the proposed *DHNN* − *δ*2*SAT* is satisfied or not.

Figure 10, a column representation, shows the global minima ratio results calculated by Equation (26) for the two types of *δγ*2*SATρ* and the state-of-the-art logic systems, without considering any optimizer, to assess the actual testing phase capability of *DHNN* − *δ*2*SAT*. The optimal global minima ratio *RG* is 1. Figure 10 shows that all models are capable of retrieving the optimal synaptic weight values at small *λ*1, after which *RG* decreases linearly at large *λ*1, because ES is unable to manage the synaptic weights in the training phase and is thus susceptible to retrieving non-optimal neuron states and becoming ensnared in local minima. A model's ability to achieve the maximum global minima ratio demonstrates that the suggested SAT is effectively integrated into the DHNN. The maximum global minima ratio was reported for YRAN2SAT, *r*SAT, and the (A1, A11, Q11) models of *δγ*2*SATρ*. YRAN2SAT recorded a high global minima ratio for small *λ*<sup>1</sup> [26] because the flexibility of its structure offers accurate results. Table 14 gives the numerical results for Figure 10. From the *RG* median results, which are unaffected by outliers, both types of *δγ*2*SATρ* achieve results close to the other latest logic systems. A high median goes to MAJ2SAT because of its (2SAT, 3SAT) logic structure [23]; the fair representation of literal states in *r*SAT [24] likewise allows it to achieve a high *RG*. Based on the *RG* medians in Table 14, the two parameters *Y* and *ρ* have a strong effect in *δγ*2*SATρ*: with small *λ*1, the DHNN can retrieve the correct synaptic weights for small *Y* and *ρ*, as in (A1, Q1), but in terms of the median, high *Y* and *ρ* achieve more global minima, as in A(13,14,15) and Q(9,10,13,14). It can be said that the proposed models showcase the efficiency of *δγ*2*SATρ* in controlling the DHNN as a symbolic structure that drives network convergence.
Since the local field in Equation (15) drives the neuron's final state in accordance with the behavior of the second- and first-order clauses, it exhibits the same behavior as the non-systematic RAN2SAT structure presented in [20].

**Figure 10.** Column representation of the global minima ratio for models in both types of logic (**a**) *δ*12*SATρ*, (**b**) *δ*22*SATρ*, and recently developed logic systems.


**Table 14.** Maximum *RG* results for models in both types of logic, *δ*12*SATρ* and *δ*22*SATρ*, and RAN2SAT, with details determined by the Wilcoxon test for the median.

**Note:** Yellow highlighting indicates the highest number in the column; (*p*-value < 0.00) for all models in terms of the Wilcoxon test, meaning *H*<sup>0</sup> is rejected.

The purpose of computing RMSEenergy in Equation (27) is to measure the difference between the final energy and the absolute minimum energy stated in the condition of Equation (18); it indicates whether or not the solutions produced by *DHNN* − *δ*2*SAT* are optimal, so the flexibility of *δγ*2*SATρ* must be assessed by determining the value of RMSEenergy. The column representation in Figure 11 shows that small *λ*<sup>1</sup> yields a low RMSEenergy value for all models, which indicates successful convergence towards the optimal final neuron state; thereafter, the final energy difference fluctuates as *λ*<sup>1</sup> increases. This phenomenon results from the decreased probability of obtaining the cost function *E*Θ*δ*2*SAT* = 0, as is clear from RMSEtrain, which leads to higher energy and an ineffective learning strategy for *DHNN* − *δ*2*SAT*. As *λ*<sup>1</sup> increases, some synaptic weights become suboptimal, leaving final neuron states stuck in local minimum energy. Additionally, Sathasivam [18] argues that during the DHNN testing phase, suboptimal neuron updates are what cause local minimum energy to exist; in this situation, such updates produce more unsatisfied sentences, which widens the energy gap. When the logical formulation containing 2SAT was incorporated into *DHNN* − *δ*2*SAT*, *δγ*2*SATρ* behaved like the traditional non-systematic logical rule RAN2SAT. Figure 11 also shows the adverse impact of negative literals at high *λ*<sup>1</sup>: A4 and Q12 record the highest RMSEenergy values, whereas A1 and Q1 show the opposite at small *λ*<sup>1</sup>. Table 15 gives the medians of RMSEenergy from Figure 11, which provide a more accurate picture: the smallest medians go to A13 and Q9, with a low value of the parameter *ρ*, whereas A8 and A16, with a high value of *ρ*, give high RMSEenergy errors.
This demonstrates that when most neuron states are negative, the network tends to converge towards local minimum energy. In conclusion, it is evident that the two parameters, *Y* and *ρ*, have a direct impact on the probability distribution of the dataset during the testing phase.
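The energy-gap metric described above can be sketched directly (an illustrative implementation under our own naming; the paper's exact form is Equation (27)): the root-mean-square gap between each run's final energy and the absolute minimum energy.

```python
import math

def rmse_energy(final_energies, h_min):
    """Root-mean-square gap between each run's final energy and the
    absolute minimum energy h_min. A value near 0 means the network
    converged to optimal final neuron states; large values indicate
    runs trapped in local minimum energy."""
    n = len(final_energies)
    return math.sqrt(sum((h - h_min) ** 2 for h in final_energies) / n)

# Toy usage: two of four runs stop 1.0 above the absolute minimum
print(rmse_energy([-2.0, -1.0, -2.0, -1.0], h_min=-2.0))
```

As *λ*<sup>1</sup> grows and fewer runs satisfy the cost function, more terms in the sum are nonzero and the metric rises, matching the fluctuation reported for Figure 11.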

**Figure 11.** RMSEenergy column representation for models in both types of logic (**a**) *δ*12*SATρ*, (**b**) *δ*22*SATρ*.

**Table 15.** Maximum and minimum RMSEenergy results for the models in both types of logic, *δ*12*SATρ* and *δ*22*SATρ*, with Wilcoxon-test details for the median.


**Note**: Results highlighted in yellow indicate the highest value in each column and green indicates the smallest. The Wilcoxon test gives *p*-value < 0.00 for all models, meaning *H*0 is rejected.

#### *6.4. Similarity Index Analysis*

For the quality of the final neuron states, the two *δγ*2*SATρ* types are compared only with RAN2SAT, since *δγ*2*SATρ* is an enhancement and development of RAN2SAT and the two share the same structural behavior. We tested the variation introduced by the testing phase for the *δγ*2*SATρ* models and compared the quality of their final neuron states with RAN2SAT, where the degree of state redundancy in the DHNN training phase is indicated by the similarity index of the final neuron state. A standard indexing metric, the Sokal index, is introduced, together with an effective metric known as the ratio of total neuron variation *Rtv*.

Firstly, consider the Sokal index in Equation (30): a lower value in the similarity index matrices indicates that the obtained final neuron states are highly distinct from the benchmark states. According to the column representation in Figure 12, both *δγ*2*SATρ* types report low values, implying greater solution variety; the lowest are recorded by A16 and Q16, whereas Q1 and A5 record high values, owing to the parameter *ρ*. Table 16 translates Figure 12 numerically, where A16 and Q16 report low median values. All logic with *ρ* = 0.9 and *Y* = 0.9 records low values, indicating that more negative neurons and less first-order logic make the final neuron state distinct from the benchmark state, as shown by the blue numbers in Table 16 for Q, A (4,8,12,16). In other words, low negativity and greater representation of first-order logic give a high Sokal index, as shown for Q, A (1,5,9,13) in red.
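As a hedged sketch of the similarity computation (the paper's exact definition is Equation (30); here we assume the common Sokal–Sneath variant on bipolar states, with our own naming), the index compares a final neuron state against the benchmark state element by element:

```python
def sokal_sneath(benchmark, final):
    """Sokal-Sneath similarity between a benchmark neuron state and a
    final neuron state, both bipolar (+1 / -1). Using the counts
    a = both +1, b = benchmark +1 but final -1, c = benchmark -1 but
    final +1, the index is a / (a + 2*(b + c)). Lower values mean the
    retrieved state is more distinct from the benchmark, i.e. more
    solution variety."""
    a = sum(1 for p, q in zip(benchmark, final) if p == 1 and q == 1)
    b = sum(1 for p, q in zip(benchmark, final) if p == 1 and q == -1)
    c = sum(1 for p, q in zip(benchmark, final) if p == -1 and q == 1)
    return a / (a + 2 * (b + c))

# Toy usage: one mismatch out of four neurons
print(sokal_sneath([1, 1, -1, 1], [1, -1, -1, 1]))  # 0.5
```

Note that mutual −1 agreements do not enter this variant, which is why a high proportion of negative neurons (high *ρ*) drives the index down.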

**Figure 12.** Sokal column representation for models in both types of logic (**a**) *δ*12*SATρ*, (**b**) *δ*22*SATρ* and RAN2SAT.

Secondly, consider the effective parameter known as the ratio of total neuron variation *Rtv* in Equation (31). The column representation in Figure 13 clearly shows that the two types yield different numbers of solution variations for different values of *λ*<sup>1</sup>, because of the effect of the two parameters *Y* and *ρ* in the training phase. High oscillation is recorded for the *δ*12*SATρ* models in 14 < *λ*<sup>1</sup> < 20, with the highest oscillation value recorded for A16 in 17 < *λ*<sup>1</sup> < 20; for the *δ*22*SATρ* models, high oscillation is recorded in 14 < *λ*<sup>1</sup> < 26, with the highest value traced to Q15 in 13 < *λ*<sup>1</sup> < 23. Both *δγ*2*SATρ* types are affected by the number of neurons: they start their ups and downs at different *λ*<sup>1</sup> according to the effect of the two parameters *Y* and *ρ*. The total oscillation reaches zero for some models, such as A (1,3,4,5,8,10,12), when *λ*<sup>1</sup> < 5 or *λ*<sup>1</sup> > 39, and is very low for the other *δ*12*SATρ* models; likewise for Q1 and Q4 when *λ*<sup>1</sup> < 5 or *λ*<sup>1</sup> > 35 in *δ*22*SATρ*. It can therefore be said that there are no significant variations above 37. The effect of *Y* can also be noted here: the models cannot achieve the global solution for low *Y*, because the ES disturbs the *δγ*2*SATρ* model in reaching the optimal training phase (i.e., it learns inconsistent interpretations). As introduced previously from Figure 10, the number of global solutions acquired by the *δγ*2*SATρ* models grows as *λ*<sup>1</sup> decreases.
Table 17 gives the numerical results for Figure 13. Note the effect of increasing *ρ*: the logic with *ρ* > 0.7 records the highest *Rtv*, with the highest variation going to A16 (0.2149) and Q15 (0.2084). It can also be seen that *δ*22*SATρ* records a higher *Rtv* than *δ*12*SATρ* in general; the reason is that *δ*22*SATρ* yields fewer first-order clauses than *δ*12*SATρ* for the same *Y*, as mentioned previously in Table 1, so the ES deals with fewer first-order clauses and finds it harder to reach the optimal training phase. Moreover, Figure 13 shows the reason for the decrease as *λ*<sup>1</sup> increases: the global solution becomes harder to achieve. It was observed that RAN2SAT behaves similarly to *δγ*2*SATρ*, with a high *Rtv* of 0.1764 recorded while increasing in the interval 13 < *λ*<sup>1</sup> < 42 and then decreasing at high *λ*<sup>1</sup>. The impact of the global minimum solution on *Rtv* is related to the number of neurons: as *λ*<sup>1</sup> rises, the probability of obtaining global solutions falls. We can conclude from the above results that *Rtv* is related to the occurrence of other neuron states that lead to global minimum solutions in other domain adaptations [22].
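Under one plausible reading of Equation (31) (an assumption on our part; the names below are illustrative), *Rtv* counts how many *distinct* final neuron states reached the global minimum, relative to the total number of retrieval runs:

```python
def ratio_total_variation(global_solutions, total_runs):
    """Assumed Equation (31)-style reading of R_tv: the number of
    distinct final neuron states that reached the global minimum
    energy, divided by the total number of retrieval runs. More
    distinct global states means more solution variety."""
    distinct = {tuple(state) for state in global_solutions}
    return len(distinct) / total_runs

# Toy usage: 3 global-minimum runs, but only 2 distinct states, over 10 runs
runs = [[1, -1, 1], [1, -1, 1], [-1, -1, 1]]
print(ratio_total_variation(runs, total_runs=10))  # 0.2
```

This reading is consistent with the text: *Rtv* falls as *λ*<sup>1</sup> rises because fewer runs reach the global minimum at all.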

**Table 16.** Maximum and minimum Sokal results for the models in both types of logic, *δ*12*SATρ* and *δ*22*SATρ*, and RAN2SAT, with Wilcoxon-test details for the median.


**Note:** Results highlighted in yellow indicate the highest value in each column and green indicates the smallest. The Wilcoxon test gives *p*-value < 0.00 for all models, meaning *H*0 is rejected.

**Figure 13.** Column representation for models in both types of logic (**a**) *δ*12*SATρ*, (**b**) *δ*22*SATρ* and RAN2SAT.


**Table 17.** Maximum and minimum *Rtv* results for the models in both types of logic, *δ*12*SATρ* and *δ*22*SATρ*, and RAN2SAT.

**Note:** Results highlighted in yellow indicate the highest value in each column and green indicates the smallest.

#### *6.5. Synaptic Weight Analysis*

The mean is important because it signifies the location of the dataset's center value and contains information from every observation in the dataset; however, when a dataset is skewed or contains outliers, the mean may be misleading. We utilize various statistical tests to help us comprehend the behavior of the synaptic weights and to deduce information about the performance of the logic in the training phase, for further inquiry into the synaptic weight distribution. The descriptive statistic of mean synaptic weight offers a novel perspective in synaptic weight analysis; we consider the mean over the full logic to obtain a meaningful result in this analysis, using the following formula:

$$\text{Mean of } \delta 2SAT = \frac{\sum\_{i=1}^{\eta} \mathcal{W}\_{r\_i} + \sum\_{j=1}^{\upsilon} \left( \mathcal{W}\_{r\_j} + \mathcal{W}\_{r\_{j+1}} + \mathcal{W}\_{r\_j r\_{j+1}} \right)}{\lambda\_1} \tag{37}$$

where *Wri* = ±0.5 is the synaptic weight of a first-order clause, *Wrj* = ±0.25 is the synaptic weight of a second-order clause literal, and *Wrjrj*+<sup>1</sup> = ±0.25 is the synaptic weight of a second-order clause connection. An example of the formula is as follows:

$$\begin{cases} \delta 2SAT = \neg a \land b \land (\neg e \lor \neg f) \land (\neg k \lor l) \\ \text{Mean of } \delta 2SAT = \dfrac{(-0.5 + 0.5) + (-0.25 - 0.25 - 0.25) + (-0.25 + 0.25 + 0.25)}{6} \approx -0.0833 \end{cases} \tag{38}$$
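Equation (38) can be cross-checked numerically (a sketch under the clause-weight signs listed above, which follow the Wan Abdullah derivation; the exact sign convention depends on the paper's cost function):

```python
# Worked check of Equation (38): delta2SAT = !a ^ b ^ (!e v !f) ^ (!k v l).
# First-order weights: !a -> -0.5, b -> +0.5.
# Second-order weights (two literal terms and one clause term per clause):
#   (!e v !f) -> -0.25, -0.25, -0.25
#   (!k v l)  -> -0.25, +0.25, +0.25
weights = [-0.5, 0.5, -0.25, -0.25, -0.25, -0.25, 0.25, 0.25]
mean = sum(weights) / 6  # lambda_1 = 6 neurons (a, b, e, f, k, l)
print(round(mean, 4))  # -0.0833
```

The magnitude 0.0833 agrees with the value in Equation (38); negative literals pull the mean below zero, which is the effect the *ρ* analysis below exploits.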

The center value of a dataset carries a piece of information from every observation; accordingly, the mean gives the central value of all the synaptic weights in the logic, which act together in the cost function during the training phase. In this study, the mean of 100 combinations was calculated in the training phase as the sampling size for each logic in both *δγ*2*SATρ* types, so we have 100 individual mean results sharing the same characteristics in the two parameters *Y* and *ρ*. It is worth noting that all mean values were first checked with appropriate tests that yielded significant *p*-values, to ensure correct outcomes. The values are statistically characterized by the curve of the probability density function *f*(*x*), the representing points, and the boxplot and whiskers, together denoted a Raincloud Plot. With these figures, we aim to achieve the following:

(a) The probability density function *f*(*x*) curve gives an accurate picture of the data behavior (symmetry or skewness), so we can determine whether there are outliers or whether all values are normally distributed in the *δ*12*SATρ* and *δ*22*SATρ* logic (a normal bell curve indicates no outliers, and such logic has a high probability of achieving satisfaction in terms of *Y* and *ρ*).

(b) The representing points show the spread of the mean values, while the boxplot and whiskers show the amount of spread around the median, with outliers relative to the median indicated by the whisker sides.

This investigation looks at the impact of mean-value analysis in evaluating *DHNN* − *δ*2*SAT* during the training phase. We consider the highest *λ*<sup>1</sup> in each logic system combination to calculate the mean, i.e., *λ*<sup>1</sup> between 48 and 50, to obtain more accurate results. In the training phase, the mean synaptic value was determined under the ES effect to uncover inconsistent interpretations, which gives a basic understanding of the behavior of the logic and of achieving satisfiability. There are four figures for the two *δγ*2*SATρ* types; each includes a probability density function curve, the representing points, and the boxplot and whiskers, classified by the *Y* value in both types, since the two types have the same structure and *Y* is the key parameter affecting the mean values. The discussion for both *δ*12*SATρ* and *δ*22*SATρ* follows:

(a) When *Y* = 0.6, the following is noted from Figure 14:

**Figure 14.** The Raincloud Plot analysis for (**a**) *δ*12*SATρ*, (**b**) *δ*22*SATρ* synaptic weight means when *Y* = 0.6.

The *δ*12*SATρ* probability function curve is thin-tailed on both sides, so it is fairly symmetric in shape, indicating that outliers are infrequent (an observation is considered an outlier if it differs numerically from the rest of the data), and the mean values tend to be normal for A (1,2,3,4). The probability function curves of Q3 and Q4 behave similarly to *δ*12*SATρ*: fairly symmetric in shape, thin-tailed on both sides, with rare outliers. Q1 and Q2, however, show different results: they tend to be non-symmetric, with a heavier tail on the left, which means there are many outliers. This result is supported by the boxplot and whiskers. Looking at the interquartile ranges, IQR (the lengths of the boxes), the longer the box, the more dispersed the data, and the shorter the box, the less dispersed. It can be observed that *δ*12*SATρ* is more dispersed about the median than *δ*22*SATρ*, since the IQR is larger in A (1,2,3,4) than in Q (1,2,3,4). In terms of outliers, when checking a boxplot an outlier is defined as a data point lying outside the whiskers; *δ*12*SATρ* and *δ*22*SATρ* show similar behavior with large outliers, but it can be noted that *δ*12*SATρ* has more outliers than *δ*22*SATρ*, because ES could not achieve a consistent interpretation in the training phase, owing to the *δ*22*SATρ* model structure that leads to random synaptic weight values. Finally, the boxplot clearly shows that the distribution is non-symmetric for both *δ*12*SATρ* and *δ*22*SATρ*, as previously explained (a distribution is symmetric when the median is in the center of the box and the whiskers are nearly the same on both sides). The reasons for these results are:
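The whisker rule invoked above can be made concrete (an illustrative sketch with our own naming; the conventional whisker length factor is 1.5): a value is flagged as an outlier when it falls outside [Q1 − 1.5·IQR, Q3 + 1.5·IQR].

```python
def boxplot_outliers(values, k=1.5):
    """Flag outliers by the boxplot-whisker rule used in the text:
    a point is an outlier if it lies outside [Q1 - k*IQR, Q3 + k*IQR],
    where IQR = Q3 - Q1 and k = 1.5 is the conventional whisker length."""
    xs = sorted(values)
    n = len(xs)

    def quantile(q):
        # Simple linear-interpolation quantile over the sorted sample.
        pos = q * (n - 1)
        lo, frac = int(pos), pos - int(pos)
        return xs[lo] + frac * (xs[min(lo + 1, n - 1)] - xs[lo])

    q1, q3 = quantile(0.25), quantile(0.75)
    iqr = q3 - q1
    return [x for x in values if x < q1 - k * iqr or x > q3 + k * iqr]

# Toy usage: four tightly clustered synaptic-weight means and one extreme
print(boxplot_outliers([0.1, 0.12, 0.11, 0.13, 0.9]))  # [0.9]
```

A longer box (larger IQR) widens the whisker fences, which is why the more dispersed *δ*12*SATρ* boxes can still flag fewer points than a short box with extreme tails.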

In terms of the *Y* parameter, the first-order clauses, which have *p*(*xm*) = 0.4 in the logic, pull the curve towards the sides, because the suboptimal synaptic weights of first-order clauses appear clearly in the distribution tails and in the box-and-whisker plot. Also, *δ*22*SATρ* has more 2SAT clauses than *δ*12*SATρ* for the same *Y* parameter, which is reflected in the spread of values in the boxplot, which is higher in *δ*12*SATρ*; this indicates high variation between the mean values when ES fails to find a consistent interpretation. In terms of the *ρ* parameter, the boxplot also shows that *ρ* produces more negative synaptic weights, but we should also consider the value of WBB, which is positive in the clauses (¬*ri* ∨ *rj*), (*ri* ∨ ¬*rj*), and (*ri* ∨ *rj*) and affects the mean values of the 2SAT clauses. It is notable that in *δ*22*SATρ* there is no visible effect of *ρ*, since, as mentioned previously, it has more 2SAT clauses than *δ*12*SATρ* for the same *Y*; the ES therefore tends to obtain a consistent interpretation, which is reflected in the mean values of the whole logic's synaptic weights. Conversely, for *δ*12*SATρ* the effect of *ρ* is clearer in the mean values, with most value points located on the negative side.

(b) When *Y* = 0.7, the following is noted from Figure 15:

The probability function curve for *δ*12*SATρ* exhibits the same behavior as for *Y* = 0.6: a symmetric shape with normal mean values and thin tails on both sides, so outliers are infrequent. For *δ*22*SATρ* the picture is slightly different: all of Q (5,6,7,8) are symmetric, with mean values tending to be normal and light-tailed, except for Q6, whose curve is fat-tailed, so there are many outliers on both sides. The boxplot and whiskers tell the same story as for *Y* = 0.6. Looking at the boxes, *δ*12*SATρ* is more dispersed about the median than *δ*22*SATρ*, because the IQR is higher in A (4,5,6,7) than in Q (4,5,6,7). In terms of outliers, both *δ*12*SATρ* and *δ*22*SATρ* show similar behavior with large outliers, but *δ*12*SATρ* has more outliers than *δ*22*SATρ*, except for Q6. Most of the logic has outliers while also having a short box (which implies that high-frequency data tend to be more fat-tailed). Finally, the non-symmetric shape of both *δ*12*SATρ* and *δ*22*SATρ* is clearly visible in the boxplot. The reasons for these results are justified as follows:

In terms of the *Y* parameter, the number of second-order clauses with *p*(*xm*) = 0.3 is considered somewhat high, especially at high *λ*<sup>1</sup>, which reduces the chance of *E*Θ*δ*2*SAT* = 0 and pulls the logic curve towards the two sides, because the suboptimal synaptic weights appear clearly in the tails of the probability distribution curve and in the box-and-whisker plot. *δ*22*SATρ* has more 2SAT clauses than *δ*12*SATρ* for the same *Y* parameter, which is reflected in the spread of values in the boxplot, at its highest beyond that of *δ*12*SATρ*; it therefore shows high variation between mean values, because ES failed to find consistent interpretations. In terms of the *ρ* parameter, the boxplots of *δ*12*SATρ* and *δ*22*SATρ* reflect negative synaptic weight values. As with *Y* = 0.6, the spread of the data is affected by *ρ* in the 2SAT clauses, and it affects the mean, which tends to be positive, as mentioned previously. Finally, as seen for Q6, the reason for the right fat tail is the high number of second-order sentences generating suboptimal synaptic weights, resulting in positive mean values.

**Figure 15.** The Raincloud Plot analysis for (**a**) *δ*12*SATρ*, (**b**) *δ*22*SATρ* synaptic weight means when *Y* = 0.7.

(c) When *Y* = 0.8, the following is observed from Figure 16:

Here the 2SAT clauses are the common clauses. For *δ*12*SATρ*, the curve shows a semi-normal shape in A (9,11,12) and is semi-skewed in A10, with light tails on both sides and fewer outliers across all of *δ*12*SATρ*. On the other side, *δ*22*SATρ* gives similar results: Q (10,12) are fairly symmetric in shape, with mean values tending to be normal and thin tails on both sides, while Q (9,11) tend to be non-symmetric; light tails on both sides with fewer outliers appear across all of *δ*22*SATρ*. From the boxplots and whiskers, *δ*12*SATρ* is more dispersed about the median than *δ*22*SATρ*, because the IQR is higher in A (9,10,11,12) than in Q (9,11,12), and shortest in Q10. In terms of outliers, from the boxplot whiskers, *δ*12*SATρ* and *δ*22*SATρ* show similar behavior with large outliers on both sides, but Q11 has more outliers on the left than the others and Q9 more on the right. Finally, the boxplot clarifies that both logic systems have non-symmetric distributions. The reasons for these results are justified as follows:

In terms of the *Y* parameter, the small appearance probability of first-order clauses is what made the range of mean values high in the two previous Figures 14 and 15; here, the chance of *δ*12*SATρ* and *δ*22*SATρ* obtaining a (±0.5) synaptic weight is small, so most of the mean value range is small, leading to a less spread-out curve. On the other side, the high representation of 2SAT clauses makes the box lengths greatest, because the volatility in the mean values of 2SAT clauses gives different results depending on the negative literals, where (¬*ri* ∨ *rj*), (*ri* ∨ ¬*rj*), and (*ri* ∨ *rj*) have mean values different from (¬*ri* ∨ ¬*rj*). The ES search also has an effect through the cost function in Equation (12), pulling the logic curve and box-and-whisker plot towards the sides, which is reflected in a spread of values in the boxplot higher than for *Y* = 0.6, 0.7. In terms of the *ρ* parameter, its effect is strong here: in the boxplots of *δ*12*SATρ* and *δ*22*SATρ* it is clear in the range of values, most of which fall on the negative side, most clearly in Q, A (11,12), because the mean values of fully negative second-order clauses are highest here, as clarified by the FNAE metric. It is also noted that Q (9,10) and A10 are on the positive side, because *ρ* is small, so the mean is positive and the ES search tends to find a consistent interpretation. This indicates the effect of the parameter *ρ*, but A9 still has first-order clauses, which makes the data spread to both sides with a light tail. In Q10 and Q12, however, the tails arise from the extreme mean values that come from fully negative clauses and first-order clauses.

**Figure 16.** The Raincloud Plot analysis for (**a**) *δ*12*SATρ*, (**b**) *δ*22*SATρ* synaptic weight means when *Y* = 0.8.

(d) When *Y* = 0.9, the following is observed from Figure 17:

**Figure 17.** The Raincloud Plot analysis for (**a**) *δ*12*SATρ*, (**b**) *δ*22*SATρ* synaptic weight means when *Y* = 0.9.

The *δ*12*SATρ* probability function curve indicates a reasonably symmetric shape in A (15,16), while A14 tends to be non-symmetric, with thin tails on both sides, implying that outliers are infrequent. A13 is left-skewed and heavy-tailed, implying many outliers on the left. In *δ*22*SATρ*, Q (13,14,16) are symmetric while Q15 tends to be non-symmetric; they have thin tails on both sides, implying infrequent outliers, although Q14 is heavy-tailed, indicating many outliers, whereas Q13 and Q16 have light tails and infrequent outliers. Looking at the interquartile ranges, in *δ*12*SATρ*, A (15,16) are considerably more dispersed about the median than A (13,14); in *δ*22*SATρ*, Q (13,15,16) are more dispersed about the median than Q14, because their IQR is highest. In terms of outliers, reviewing the box whiskers, *δ*12*SATρ* and *δ*22*SATρ* show similar behavior with large outliers; however, Q, A (13,14) have more outliers than Q, A (15,16). Finally, the boxplot clearly shows the non-symmetry of both *δ*12*SATρ* and *δ*22*SATρ*, as previously mentioned. The reasons for these results are justified as follows:

In terms of the *Y* parameter, the first-order clauses have the smallest appearance here, so the mean values are high, as is clear in the *δ*12*SATρ* and *δ*22*SATρ* figures. Moreover, the majority representation of 2SAT clauses makes the spread across the whole box length highest in *δ*12*SATρ* and *δ*22*SATρ*, because of the volatility in the means of the 2SAT clauses, as mentioned previously; this pulls the logic curve and the box-and-whisker plot towards the two sides, so the dispersion of values in the boxplot is greater than for *Y* = 0.6, 0.7. In terms of the *ρ* parameter, it likewise has a strong effect: in the boxplots of *δ*12*SATρ* and *δ*22*SATρ* it is clear in the range of values, most of which fall on the negative side, most clearly in Q, A (15,16), because the means of fully negative second-order clauses are highest here, as explained with the FNAE metric. The other logic, A13 and Q14, still has more first-order clauses, which causes the mean to spread in two directions, with heavy tails in Q14 and A13 due to the extreme values arising from fully negative clauses and second-order clauses.

From these results, we can note the significance of the synaptic weight analysis: it summarizes the search space of a specific algorithm in the training phase. This is clarified by the mean synaptic weight results, which give the center of the search space (optimal) and its width via the range of spread (suboptimal); the mean synaptic weight thus gives a general perspective on the mechanism of the ES algorithm in this search space. We can therefore observe how the algorithm works within this limited space, as well as how it obtains solutions using optimal and suboptimal synaptic weights. The ES has a unique search space that is heavily influenced by the number of neurons and the structure of the logic.

#### *6.6. The Limitation of the DHNN-δ2SAT*

One limitation of *DHNN* − *δ*2*SAT* in this study is that the proposed hybrid network considers only propositional logic programming. The DHNN is unable to embed other variants of logic, such as predicate logic, fuzzy logic, or probabilistic logic, owing to the nature of the Hopfield Neural Network proposed by Pinkas [56], which is limited to symmetric connectionist networks, as well as the DHNN's low storage capacity and the cost function proposed by Wan Abdullah (1992), which considers only bipolar neurons. Furthermore, this study limits the number of neurons to fewer than 52 because of ES; as an improvement, ES will be replaced by metaheuristics such as the Artificial Bee Colony Algorithm [57] and the Election Algorithm [58]. Despite the DHNN's flexibility, the quality of the solutions offered by *δ*2*SAT* still needs improvement; we can increase the number of iterations in our simulations by increasing the number of learning cycles. With more iterations, the proposed model may yield more neuron variation, fewer errors, and the global minimum solution.

#### *6.7. Summary*

In this section, we provide a brief summary of the beneficial properties of the logical structure of the proposed model; moreover, we provide a simple summary of the most important accomplishments of the proposed logic system, clarifying the findings given in the Results and Discussion section with respect to the following points:


Statistical tests were used to study the behavior of the synaptic weights and to deduce information about the performance of the proposed logic system in the training phase, whereas, in this study, the descriptive statistical method analyzed the synaptic weight distribution by obtaining the mean of the synaptic weights in the testing phase.




### **7. Conclusions and Future Work**

It is critical to create a non-systematic logical framework in a DHNN, employing parameters conducive to building a flexible final neuronal state. This study introduced a new probability logic phase that assigns the probability of the first- and second-order clauses and of the desired negative literals appearing in each sentence, which helped to address the requirements of datasets. Statistical tools govern the creation of Θ*δ*2*SAT* during the probability logic phase. The novel logic probability phase of the proposed *δ*2*SAT* model provides a new enhancement with which to shape the logic structure according to the dataset, for which it was found that models with high values of the two parameters (*Y* = 0.9, *ρ* = 0.9) in the two *δγ*2*SATρ* types introduced efficient logic structures in the probability logic phase. The new logic was embedded in *DHNN* − *δ*2*SAT* by reducing the logical inconsistency of the logical rule corresponding to the zero-cost function. The cost function corresponding to satisfaction was used to calculate the synaptic weights of the DHNN, whose effectiveness with the *δ*2*SAT* logical structure was examined using three proposed metrics in comparison with state-of-the-art methods, such as 2SAT, MAJ2SAT, RAN2SAT, RAN3SAT, YRAN2SAT, and *r*SAT. The final neuron state was assessed based on various initial neuron states, statistical method parameters, and various performance metrics, such as learning errors, synaptic weight errors, energy profiles, testing errors, and similarity metrics, which were compared with existing benchmark works. To further demonstrate the efficiency and robustness of the proposed Θ*δ*2*SAT*, it was validated using four different second-order probability distributions with four different proportions in extensive simulations.
Further, a new prospective logical investigation was introduced in this study, consisting of the analysis of the mean synaptic weight of *DHNN* − *δ*2*SAT* to evaluate the existence of a flexible logical structure. The findings demonstrated that the proposed *δ*2*SAT* succeeded in achieving a flexible logical structure with a prevailing attribute dataset, compared to other state-of-the-art SAT. For future work: (1) A metaheuristic analysis of the probability logic phase would aid the selection of the negative literals' positions in a logic system. (2) A metaheuristic analysis of the training phases would aid the satisfaction of Equation (12). (3) A metaheuristic analysis of the testing phases would aid the generation of a vast range of space solutions. (4) Synaptic weight analysis can be applied in the training phases to address the effects of the energy function and global solutions on the synaptic weight. Moreover, we can add a measure of variability to address the deviation in the results. Notably, the robust architecture of ANNs integrated with our proposed logic would serve as a good foundation for real-life applications such as natural disaster prediction. In this context, each neuron would represent an attribute from the data, such as rainfall trends, river levels, and drainage and ground conditions. These attributes would be embedded into the logic-mining approach proposed by [45], leading to the formation of induced logic, which, in turn, has predictive and classificatory abilities. In other developments, the proposed logic system would be indispensable in finding the optimal route in the Travelling Salesman Problem.

**Author Contributions:** Conceptualization, methodology, software, writing—original draft preparation, S.A.; formal analysis, validation, N.E.Z.; supervision and funding acquisition, M.S.M.K.; writing—review and editing, G.M.; visualization, N.A.; project administration, M.A.M. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was supported by Ministry of Higher Education Malaysia for Transdisciplinary Research Grant Scheme (TRGS) with Project Code: TRGS/1/2022/USM/02/3/3.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** The authors would like to express special thanks to all researchers in the Artificial Intelligence Research Development Group (AIRDG) for their continued support.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**



#### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

## *Article* **Edge Computing Offloading Method Based on Deep Reinforcement Learning for Gas Pipeline Leak Detection**

**Dong Wei 1, Renjun Wang 1,2,3,4, Changqing Xia 1,2,3,4,\*, Tianhao Xia 2,3,4, Xi Jin 2,3,4 and Chi Xu 2,3,4**

<sup>1</sup> School of Information Science and Engineering, Shenyang University of Technology, Shenyang 110870, China

<sup>2</sup> State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences,


**Abstract:** Traditional gas pipeline leak detection methods require task offloading decisions to be made in the cloud, which offers poor real-time performance. The emergence of edge computing provides a solution by enabling offloading decisions directly at the edge server, improving real-time performance; however, energy becomes the new bottleneck. Therefore, focusing on the real-time gas transmission pipeline leakage detection scenario, a novel detection algorithm that combines the benefits of a heuristic algorithm and the advantage actor critic (AAC) algorithm is proposed in this paper. It targets optimization with the twin goals of guaranteeing the real-time execution of pipeline mapping analysis tasks and maximizing the survival time of portable gas leak detectors. Since the computing power of portable detection devices is limited, as they are powered by batteries, the main problem to be solved in this study is how to account for node energy overhead while guaranteeing the system's performance requirements. By introducing the idea of edge computing and taking the mapping relationship between resource occupation and energy consumption as the starting point, an optimization model is established with the goal of optimizing the total system cost (TSC), which is composed of the node's transmission energy consumption, local computing energy consumption, and residual electricity weight. In order to minimize the TSC, the algorithm uses the AAC network to make task scheduling decisions and judge whether tasks need to be offloaded, and uses heuristic strategies and the Cauchy–Buniakowsky–Schwarz inequality to determine the allocation of communication resources. The experiments show that the proposed algorithm meets the real-time requirements of the detector and achieves lower energy consumption, saving approximately 56% of the system energy compared to the Deep Q Network (DQN) algorithm.
Compared with the artificial gorilla troops Optimizer (GTO), the black widow optimization algorithm (BWOA), the exploration-enhanced grey wolf optimizer (EEGWO), the African vultures optimization algorithm (AVOA), and the driving training-based optimization (DTBO), it saves 21%, 38%, 30%, 31%, and 44% of energy consumption, respectively. Compared to the fully local computing and fully offloading algorithms, it saves 50% and 30%, respectively. Meanwhile, the task completion rate of this algorithm reaches 96.3%, which is the best real-time performance among these algorithms.

**Keywords:** edge computing; deep reinforcement learning; heuristic algorithm; task offloading; resource allocation

**MSC:** 68W99

**Citation:** Wei, D.; Wang, R.; Xia, C.; Xia, T.; Jin, X.; Xu, C. Edge Computing Offloading Method Based on Deep Reinforcement Learning for Gas Pipeline Leak Detection. *Mathematics* **2022**, *10*, 4812. https://doi.org/10.3390/math10244812

Academic Editors: Francois Rivest and Abdellah Chehri

Received: 18 November 2022; Accepted: 16 December 2022; Published: 18 December 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

#### **1. Introduction**

With the advent of 5G, high-performance computing, and other technologies, industry has developed in the direction of high real-time performance and low energy consumption, and many delay-sensitive, computationally intensive applications and services have emerged. Although cloud computing can provide sufficient computing resources, the large amount of traffic generated when tasks are delivered to the cloud is likely to cause network congestion, unpredictably high delay, and massive transmission energy consumption, so a distributed computing method is needed to solve these problems. Edge computing makes this feasible: moving computation to the edge of the network solves the high-latency problem of cloud services and, to a certain extent, compensates for the limited computing resources of end devices.

Although edge computing provides a feasible solution for such scenarios, it raises the problem of achieving high real-time performance and low energy consumption with limited resources. Much research has been performed in this field, with good results. The main concern is to balance low latency and low energy consumption, which can effectively solve the offloading problem when the attributes of the task set to be processed are known. However, such solutions share the limitation of low robustness: when unexpected tasks enter the system, a chain reaction sharply degrades system performance, and this is more likely to occur when tasks arrive in real time. Sun et al. [1] proposed a task offloading algorithm based on a hierarchical heuristic strategy, aiming to minimize task delay and energy consumption, but it assumes that the task set to be scheduled is known and does not take sudden tasks into account. Similarly, Li et al. [2] proposed a deep reinforcement learning-based task offloading algorithm that likewise schedules tasks from a known task set.

Take the leak detection of a natural gas transmission pipeline as an example: once a leak occurs, there is great danger, and detectors need to work in the leak area. The faster they locate the leak point, the smaller the security risk; hence this scenario demands high real-time reliability. Many portable gas leak detectors rely on the collection of infrared or other spectral images for image analysis [3]. Because the detector must constantly change position during operation, it needs immediate feedback of results so that it does not miss the leak point. However, due to the size of the detector, its computing and battery capacity are limited, and it is difficult to complete some complex recognition tasks on time, which greatly affects detection efficiency and accuracy. Figure 1 shows the workflow of the solution: by introducing edge computing, complex image processing tasks generated by the detection equipment can be uploaded to the cloud and processed quickly, enabling accurate and rapid location of the leak point.

**Figure 1.** Workflow of edge computing in natural gas pipeline detection.

This paper proposes a natural gas leak detection algorithm that combines edge computing task offloading with portable natural gas leak detection technology, namely a real-time multi-leak detection algorithm based on the improved advantage actor-critic (AAC) method, to improve the detection efficiency and endurance of instruments. We consider a three-tier edge computing architecture with cloud–edge–end collaboration, where the portable gas leak detector sits at the end of the system and has some computing power of its own. To improve the efficiency and range of the detector, the image analysis task must be offloaded, which requires determining where each task is processed and how resources are allocated to it. The current system state is determined and input to the constructed AAC network, which determines the processing position of tasks in the system. The results obtained from the network are optimized using the proposed heuristic algorithm; at the same time, the allocation strategy for communication resources is determined, and tasks are scheduled and executed according to the offloading results. Analysis of the problem shows that improving the detection efficiency and range of existing detection instruments corresponds to the objectives of the edge computing task offloading problem: to improve real-time performance and minimize the energy consumption of the edge computing system as much as possible. This paper makes the following contributions:


The remainder of this paper is organized as follows. Section 2 describes related work. Section 3 presents the proposed system model and describes the problem. Section 4 details the main steps of the proposed algorithm. Section 5 compares the performance of the proposed algorithm with baseline algorithms such as DQN and GTO through experiments. Section 6 concludes the paper.

#### **2. Related Work**

Many studies have addressed the task offloading problem of edge computing, which is NP-hard; all solutions thus far have been approximate, although different optimization techniques can be used to make the approximate solution converge toward the optimal one. These solutions start either from machine learning or from traditional means such as greedy heuristics, integer optimization, branch and bound, game theory, or convex optimization. The two most important factors in edge computing are latency and energy consumption.

#### *2.1. Traditional Task Offloading Methods*

Kan et al. proposed a heuristic algorithm for offloading tasks to MEC servers that considers radio and computational resources, with the goal of minimizing the average task latency; experiments showed that it achieves excellent results under different latency requirements [4]. To cope with a relative lack of infrastructure, drones were introduced to assist edge computing, and a USS algorithm was proposed [5] that can satisfy the task processing latency constraint in the multiuser case. Wang, Shen, and Zhao introduced a dynamic penalty function in a study of edge computing in the smart grid domain and proposed an improved algorithm for solving Lagrange multipliers [6], which overcomes the inability of traditional grid systems to provide deterministic services and can effectively improve overall system revenue while reducing the average delay of user tasks. Li et al. considered event-triggered decision systems whose goal is to optimize the average system revenue while satisfying the average delay constraint for services of different priorities [7]. Ref. [8] presented online computing task scheduling methods for multi-server edge computing scenarios. Sun et al. [9] considered an ultra-dense network environment that supports edge computing, in which constantly moving users dynamically generate computational tasks that must be offloaded to base stations for computation; to minimize the average delay under a limited energy budget, users must make mobility management decisions about base station association and switching based on their service requirements, without knowing future information.

System energy consumption has long been a concern among edge computing researchers, as an important component of system cost, and especially in mobile edge computing, where energy consumption directly affects system endurance and reliability. Michael proposed a hybrid method based on particle swarm optimization and the gray wolf optimizer [10] to optimize the energy consumption of MEC task offloading. Ding and Zhang [11] proposed a game theory-based computational offloading strategy for massive IoT devices, which improves data transfer and reduces task energy consumption using the beneficial task offloading theory.

Delay and energy consumption are usually considered together, as both strongly affect the user experience; researchers can decide whether to optimize delay or energy consumption based on specific requirements. Some studies have used heuristic algorithms to minimize energy consumption while satisfying a latency constraint [12,13]. Others have proposed a more flexible optimization objective that synthesizes both into a cost, where the weights of delay and energy consumption in the cost formulation can be changed case by case [14,15]. Ref. [16] considered two cases, with adjustable and non-adjustable AP CPU frequency, and proposed a linear relaxation-based approach and an exhaustive search-based approach to obtain the offloading decision for each case, aiming to minimize the total task execution delay and the energy consumption of the mobile device (MD). To trade off energy consumption against computational latency, Ref. [17] proposed a Lyapunov-based algorithm for computing task offloading decisions in mobile edge computing systems, which greatly reduces device energy consumption while satisfying the latency constraint. Ref. [18] investigated the computational offloading and scheduling problem of minimizing the cost per mobile device, where cost is defined as a linear combination of task completion time and energy consumption; it also considers inter-device communication and competition for computational resources, defines the problem formally as a game model, and designs a decentralized algorithm that achieves a pure-strategy Nash equilibrium. Tang et al. modeled the multi-user computational offloading problem in an uncertain wireless environment as a non-cooperative game based on PT, and then proposed a distributed computational offloading algorithm that obtains a Nash equilibrium and minimizes user overhead [19]. Yi et al. considered tasks randomly generated by mobile users and proposed a queuing-model-based mechanism to maximize social welfare and achieve equilibrium in the non-cooperative game among mobile users [20].

The task offloading algorithms in the above studies rely on idealized mathematical models and cannot consider all the factors that affect the optimization objective, which limits their offloading performance. To solve this problem, a new class of offloading methods based on deep learning techniques has been proposed, with good results.

#### *2.2. Machine Learning Task Offloading Methods*

To cope with the variability of edge computing application environments, Wang and Jia et al. proposed a meta-reinforcement learning-based approach to solve the computational offloading problem [21], which enables fast adaptation to dynamic scenarios without updating too many parameters. A joint task offloading and bandwidth allocation problem was considered for multiuser computational offloading, with the goal of minimizing the overall delay in completing user tasks, using a DQN approach to find the optimal solution [22].

Wang Jin et al. [23] found that studies using DRL for task offloading rarely focus on the dependencies between tasks, and proposed a DRL offloading method that can address dependent tasks. The general dependency of tasks was modeled as a directed acyclic graph (DAG), and a sequence-to-sequence (S2S) neural network captured the features of the DAG and output the offloading strategy. The method can use delay, energy consumption, or a tradeoff of both as the optimization objective.

In Ref. [24], the authors were the first to consider end-device energy consumption in a deep learning-based model of MEC partial offloading schemes. They proposed EEDOS, a novel partial offloading scheme based on a fine-grained partial offloading framework, whose cost function comprehensively considers important parameters such as the residual energy of end devices and the energy consumption of previous application components. Dai and Niu [25] used unmanned aerial vehicles (UAVs) to assist edge servers in task offloading, minimizing the energy consumption of all mobile end devices by jointly optimizing UAV trajectories, task association, and the resource allocation of computation and transmission. They reduced problem complexity by decomposing the joint optimization problem into the subproblems of UAV trajectory planning, task association scheduling, and resource allocation of computation and transmission. Their proposed hybrid heuristic and learning-based scheduling strategy (H2LS) algorithm incorporates long short-term memory neural networks, fuzzy c-means, deep deterministic policy gradients, and convex optimization techniques.

As with traditional optimization techniques, most research on deep learning for edge computing task offloading focuses on the combined consideration of delay and energy consumption. Focusing on only one of these aspects can bring the results closer to the optimal solution, at the price of a narrow range of practical applications. Yang and Lee proposed a deep supervised learning-based dynamic computing task offloading approach (DSLO) for mobile edge computing networks [26], minimizing delay and energy consumption by jointly optimizing the offloading decision and bandwidth allocation. Cao et al. proposed a multi-agent deep reinforcement learning (MADRL) scheme [27] to solve the multichannel access and task offloading problems in edge computing-enabled Industry 4.0, which allows edge devices to collaborate and significantly reduces computational latency and mobile device energy consumption relative to traditional methods. Huang et al. [28] considered a mobile edge computing system in which each user transfers multiple tasks to the edge server over a wireless network, and proposed a deep reinforcement learning-based approach to solve the joint task offloading and resource allocation problem. In Refs. [29,30], the authors proposed deep reinforcement learning methods to solve the task offloading problem in mobile edge computing and made some progress, obtaining better latency and energy consumption than with deep learning alone.

Although the above deep learning-based solutions have achieved good results, they are limited if only the latency and energy consumption of task processing are optimized. In the problem addressed in this paper, each image analysis task is generated in real time, and an optimization goal of low latency can give some tasks low processing latency at the cost of subsequent tasks exceeding their deadlines; hence, overall high real-time performance cannot be guaranteed. We propose a task offloading algorithm based on AAC and a heuristic policy that simultaneously considers real-time overall task execution and low energy consumption, and we use it to optimize the performance of a portable gas leak detector. The algorithm reduces the energy consumption of the detector as much as possible by jointly optimizing the task offloading location and resource allocation while ensuring completion within deadlines.

#### **3. System Model and Problem Description**

#### *3.1. System Model*

The edge computing system (ECS) consists of a cluster of cloud servers, a wireless communication base station with small edge servers, and *K* portable gas leak detectors *γ* = {*U*1, *U*2, *U*3, . . . , *UK*}. Each detector *Ui* generates, in time order, a series of independent image recognition tasks; every task is generated in real time and cannot be split. The task set of detector *i* is denoted Γ*i* = {*Ti*,1, *Ti*,2, *Ti*,3, . . . , *Ti*,*N*}. Each task has six attributes, and any task *i* can be denoted as *Ti* = {*j*, *si*, *di*, *Di*, *cyi*, *ωi*}, where *j* is the serial number of the detector, *si* is the release time of task *i* (in seconds), *di* is the relative deadline of the task, *Di* is the size of the data carried by the task (in Mb), *cyi* is the number of CPU cycles required by the task, and *ωi* is its priority. An example of the system model is shown in Figure 2. The cloud server has sufficient resources for the detectors, so task waiting and preemption in the cloud need not be considered; on a detector, only one task can be processed at a time. The task offloading algorithm is deployed on the edge server in the communication base station, and state changes of each node are transmitted to the edge server in real time. In this model, the tasks to be offloaded are generated by the detector in real time, and each task is indivisible. Although the offloading decision requires global system information, only the main task parameters (not the complete task) are transmitted, and the base station is very close to the detectors with no conflict in the transmission process; therefore, the communication energy consumption and delay of the offloading algorithm itself are comparable and almost negligible whether it runs on the detector or on the base station equipped with the edge computing server [1]. 
Since the edge server has more computing power and executes faster, the communication base station is left in charge of communication and makes the offloading decision, based on which the detector offloads the computational task to the cloud or processes it locally. If a task is offloaded, the cloud server returns the results after processing; the energy consumption of the detector during offloading consists of transmission and local processing.
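The six-attribute task tuple described above can be sketched as a simple data structure; the field names below are illustrative choices, not identifiers from the paper:

```python
from dataclasses import dataclass

@dataclass
class Task:
    """One indivisible image-recognition task T_i = {j, s_i, d_i, D_i, cy_i, w_i}."""
    detector: int    # j: serial number of the detector that generated the task
    release: float   # s_i: release time (s)
    deadline: float  # d_i: relative deadline (s)
    data_mb: float   # D_i: data carried by the task (Mb)
    cycles: float    # cy_i: CPU cycles required (GCycles)
    priority: float  # w_i: initial priority

t = Task(detector=3, release=0.0, deadline=1.5, data_mb=20.0, cycles=0.1, priority=2.0)
```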

**Figure 2.** System model.

#### *3.2. Problem Description*

Since each task in the system can be chosen to be executed either locally or in the cloud, an offloading decision variable is introduced to indicate the execution location of a task,

$$
\pi\_i^C = \begin{cases} 0, & \text{task executed locally} \\ 1, & \text{task executed in the cloud} \end{cases} \tag{1}
$$

The transmission power of all edge devices (detectors) is *P*. The data transmission rate assigned to any task *i* is *ri*, the average CPU frequency of the cloud server is $F^C$, and the CPU frequency of edge device *i* is $f\_i^L$. Therefore, the time to locally execute task *Ti* of edge device *U* is

$$t\_i^L = \frac{cy\_i}{f\_i^L} \tag{2}$$

The local execution energy consumption of a task is

$$c\_i^L = a \* \left( f\_i^L \right)^2 \* cy\_i \tag{3}$$

If a task is offloaded, its transmission time is

$$t\_i^{LC} = \frac{D\_i}{r\_i} \tag{4}$$

The cloud processing time of a task is

$$t\_i^C = \frac{cy\_i}{F^C} \tag{5}$$

The offloading transmission energy consumption of a task is

$$
\sigma\_i^T = \frac{P\_i \ast D\_i}{r\_i} \tag{6}
$$

where *a* in Equation (3) is the chip-related energy consumption coefficient of edge device *U* [31].
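The per-task cost terms in Equations (2) through (6) can be collected into a few helper functions. This is a minimal sketch with illustrative parameter values; note that the transmission energy in Equation (6) equals the transmit power multiplied by the transmission time Di/ri:

```python
def local_time(cy_i, f_L):
    """Eq. (2): local execution time t_i^L = cy_i / f_i^L."""
    return cy_i / f_L

def local_energy(a, f_L, cy_i):
    """Eq. (3): local execution energy c_i^L = a * (f_i^L)^2 * cy_i."""
    return a * f_L ** 2 * cy_i

def transmit_time(D_i, r_i):
    """Eq. (4): offloading transmission time, data size over allocated rate."""
    return D_i / r_i

def cloud_time(cy_i, F_C):
    """Eq. (5): cloud processing time t_i^C = cy_i / F^C."""
    return cy_i / F_C

def transmit_energy(P_i, D_i, r_i):
    """Eq. (6): transmission energy sigma_i^T = P_i * D_i / r_i."""
    return P_i * D_i / r_i

# Illustrative values: 0.1 GCycles on a 0.2 GCycles/s detector takes 0.5 s locally
t_local = local_time(0.1, 0.2)
```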

The mathematical model in this paper must jointly optimize the real-time performance of the system and the total energy consumption of the edge devices while considering load balancing. To achieve this joint optimization, the optimization objective is expressed as the total system cost (TSC),

$$TSC = \min\_{r\_i, \pi\_i^C} \sum\_{i=1}^{M} \left[ \left(1 - \pi\_i^C\right) \* a \* \left(f\_i^L\right)^2 \* cy\_i + \pi\_i^C \* \frac{\overline{E}}{E\_i} \* \frac{P\_i \* D\_i}{r\_i} \right] \tag{7a}$$

$$
\pi\_i^C \in \{0, 1\} \tag{7b}
$$

$$0 \le f\_i^L \le F\_i^L \tag{7c}$$

$$0 \le E\_i \le 1\tag{7d}$$

$$0 \le \sum\_{i=1}^{M} r\_i \le R \tag{7e}$$

where *Ei* is the remaining power percentage of edge device *i*, and *E* is the average power percentage of all devices that are idle and must perform offload tasks.

Equation (7a) is the weighted sum, over all tasks, of the local execution energy consumption and the offloading transmission energy consumption, where the ratio *E*/*Ei* measures the distance between the remaining power of device *i* and the average power. A larger *E*/*Ei* indicates that the remaining power of the device is farther below the average; to reduce that device's energy consumption, it is given the opportunity to share more communication resources (a faster data transmission rate) when the system performs bandwidth resource allocation [1].

Constraints (7b), (7c), (7d), and (7e) restrict, respectively, the offloading decision variable, the range of the CPU frequency of each device, the range of the power percentage of each edge device, and the total data transmission rate of all devices.
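As a sketch of how the objective (7a) might be evaluated for a given offloading decision, the following hypothetical function sums the local-energy and weighted transmission-energy terms over all M tasks; the function and argument names are illustrative, not from the paper:

```python
def tsc(pi, a, f_L, cy, E, E_bar, P, D, r):
    """Eq. (7a): total system cost for one candidate solution.
    pi[i] = 1 means task i is offloaded, 0 means local execution;
    the remaining arguments are per-task lists matching the model's symbols."""
    cost = 0.0
    for i in range(len(pi)):
        if pi[i] == 0:
            cost += a * f_L[i] ** 2 * cy[i]          # local computing energy
        else:
            cost += (E_bar / E[i]) * P[i] * D[i] / r[i]  # weighted transmission energy
    return cost

# Two tasks: the first runs locally, the second is offloaded
example = tsc(pi=[0, 1], a=0.1, f_L=[0.2, 0.2], cy=[0.1, 0.2],
              E=[0.5, 0.25], E_bar=0.5, P=[0.1, 0.1], D=[10.0, 20.0],
              r=[100.0, 100.0])
```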

The variables involved in the model are shown in Table 1.



#### **4. Task Offloading Algorithm**

The proposed task offloading algorithm has two parts. The AAC algorithm gives the scheduling location of each task, while the initial offloading decision is obtained by the heuristic algorithm, on the basis of which the AAC network is updated. The heuristic algorithm can quickly produce a solution to the NP-hard problem, but the suboptimal solution it finds leaves room for improvement, so reinforcement learning is used to optimize the obtained offloading strategy. The algorithms are described below.

#### *4.1. Heuristic Algorithm*

The heuristic algorithm considered in this paper takes Equation (7a) as its optimization objective. Since optimizing the TSC is an NP-hard problem, the deep reinforcement learning algorithm first determines whether a newly arrived task is to be offloaded, and the Cauchy–Buniakowsky–Schwarz inequality is then used to derive the transmission rate allocation, and thus the processing time of each task. If the processing time exceeds the task's deadline, the processing position of the task is redetermined according to its priority, and the transfer rate allocation is recalculated. Iteration continues until an approximately optimal solution is found.

The Cauchy–Buniakowsky–Schwarz inequality is often applied to quickly solve n-dimensional inequalities [32], and applying it to the system communication resource allocation simplifies computation and reduces the execution time of the offloading algorithm. When using this inequality, we must first ensure that the left-hand side can be split into the product of two non-negative expressions.

**Theorem 1.** *Suppose the inequality*

$$R \* \sum\_{i=1}^{M} \frac{\pi\_i^C \* \frac{\overline{E}}{E\_i} \* P\_i \* D\_i}{r\_i} \geq \left( \sum\_{i=1}^{M} \sqrt{\pi\_i^C \* \frac{\overline{E}}{E\_i} \* P\_i \* D\_i} \right)^2$$

*satisfies both R > 0 and* $\frac{\pi\_i^C \* \frac{\overline{E}}{E\_i} \* P\_i \* D\_i}{r\_i} \geq 0$ *for every i. Then the equality sign holds when and only when*

$$r\_i^\* = \frac{R \* \sqrt{\pi\_i^C \* \frac{\overline{E}}{E\_i} \* P\_i \* D\_i}}{\sum\_{i=1}^M \sqrt{\pi\_i^C \* \frac{\overline{E}}{E\_i} \* P\_i \* D\_i}} \tag{8}$$

*Additionally, when* $r\_i = r\_i^\*$*, TSC attains its minimum value.*

**Proof of Theorem 1.** It is known that *R*, the total transmission rate of the system, is always positive, and each term $\frac{\pi\_i^C \* \frac{\overline{E}}{E\_i} \* P\_i \* D\_i}{r\_i}$ is greater than or equal to zero, which satisfies the conditions for applying the Cauchy–Buniakowsky–Schwarz inequality. Combining the optimization objective (7a) with its constraint (7e), the following inequalities are obtained:

$$R \* \sum\_{i=1}^{M} \frac{\pi\_i^C \* \frac{\overline{E}}{E\_i} \* P\_i \* D\_i}{r\_i} \geq \sum\_{i=1}^{M} r\_i \* \sum\_{i=1}^{M} \frac{\pi\_i^C \* \frac{\overline{E}}{E\_i} \* P\_i \* D\_i}{r\_i} \geq \left( \sum\_{i=1}^{M} \sqrt{\pi\_i^C \* \frac{\overline{E}}{E\_i} \* P\_i \* D\_i} \right)^2 \tag{9}$$

According to the Cauchy–Buniakowsky–Schwarz inequality, if some *ri* is not equal to 0, the equality sign holds when and only when there exists a real number *X* such that for every *i* = 1, 2, . . . , *M*, $r\_i \* X + \frac{\pi\_i^C \* \frac{\overline{E}}{E\_i} \* P\_i \* D\_i}{r\_i} = 0$, i.e.,

$$r\_i^\* = \frac{R \* \sqrt{\pi\_i^C \* \frac{\overline{E}}{E\_i} \* P\_i \* D\_i}}{\sum\_{i=1}^M \sqrt{\pi\_i^C \* \frac{\overline{E}}{E\_i} \* P\_i \* D\_i}} \tag{10}$$

and when $r\_i = r\_i^\*$, TSC attains its minimum value, and the theorem is proved. □
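Theorem 1 gives the optimal rates in closed form: each offloaded task receives bandwidth proportional to the square root of its weight, where the weight is the product of the offloading indicator, the power ratio, the transmit power, and the data size. A small numerical sketch follows (the function and variable names are ours, not the paper's), and the resulting cost can be checked against, say, a uniform split:

```python
import math

def allocate_rates(R, offload, weights):
    """Eq. (8): r_i* is proportional to sqrt(w_i), where
    w_i = (E_bar / E_i) * P_i * D_i for each offloaded task i."""
    roots = [math.sqrt(w) if off else 0.0 for off, w in zip(offload, weights)]
    total = sum(roots)
    return [R * x / total for x in roots]

# Three offloaded tasks sharing a total system transmission rate R = 800 Mb/s
weights = [2.0, 8.0, 18.0]
rates = allocate_rates(800.0, [True, True, True], weights)
# Transmission cost sum_i w_i / r_i at the optimum equals (sum_i sqrt(w_i))^2 / R
opt_cost = sum(w / r for w, r in zip(weights, rates))
```

Compared with splitting the rate uniformly, the square-root allocation never yields a larger weighted transmission cost, matching the equality case of the inequality.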

In the task scheduling process of the real-time edge system in this paper, we should not only make the energy consumption of the edge devices as low as possible but also ensure that tasks meet their deadline requirements to the greatest extent, improving the real-time performance of the whole system. Traditional scheduling methods often use only the remaining execution time or the deadline of a task to reflect its urgency, an evaluation criterion that is too narrow. We propose a dynamic priority evaluation method that integrates the initial priority, remaining execution time, deadline, and idle time of a task. The dynamic task priority Ω*i* is composed of the preemption cost *δi* of the task and its execution urgency *ϕi*,

$$
\Omega\_i = \delta\_i \* \phi\_i \tag{11}
$$

where

$$\delta\_i = \frac{\omega\_i}{t\_i^{LC} + t\_i^C} \tag{12}$$

Tasks have different levels of importance. Equation (12) integrates the initial priority of a task with its execution time, ensuring that important tasks are completed while as many tasks as possible that are close to their deadlines can still be executed, which protects tasks already being executed to some extent. The task execution urgency is

$$\phi\_i = q^{\frac{t\_i^{LC} + t\_i^C}{d\_i - t}} \tag{13}$$

where *t* is the current moment and *q* ∈ (1, ∞). The execution urgency of the task decreases as the task is executed, which in turn gives a somewhat greater chance of execution for newly arrived tasks.
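Equations (11) through (13) combine into a single priority score. The sketch below assumes, as an illustration, that the deadline is supplied as an absolute time so that its difference from the current moment gives the time remaining; all names are ours:

```python
def dynamic_priority(w_i, t_trans, t_cloud, d_i, t_now, q=2.0):
    """Dynamic task priority Omega_i = delta_i * phi_i (Eqs. (11)-(13)).
    delta_i = w_i / (t_trans + t_cloud)                 -- preemption cost, Eq. (12)
    phi_i   = q ** ((t_trans + t_cloud) / (d_i - t_now)) -- urgency, Eq. (13), q > 1
    """
    remaining = t_trans + t_cloud      # remaining transmission + cloud processing time
    delta = w_i / remaining
    phi = q ** (remaining / (d_i - t_now))
    return delta * phi

# The same task becomes more urgent as the current time approaches its deadline
early = dynamic_priority(2.0, 0.2, 0.1, d_i=2.0, t_now=0.0)
late = dynamic_priority(2.0, 0.2, 0.1, d_i=2.0, t_now=1.5)
```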

#### *4.2. Deep Reinforcement Learning Algorithms*

To further optimize the task offloading decision obtained from the heuristic strategy, a deep reinforcement learning model, AAC, is used to make the offloading decision for each newly arrived task. The network structure of the model is shown in Figure 3.

**Figure 3.** Advantage actor-critic network structure.

As Figure 3 shows, the AAC network is composed of two sub-networks, actor and critic, whose first two layers are shared in order to reduce model complexity and speed up convergence. The hidden layers of both sub-networks consist of 256 × 128 neurons, the best combination found after several attempts in the experiments. Keeping the other experimental conditions constant, we varied the number of neurons evenly between 64 × 64 and 256 × 256 and found that too few neurons make training unstable and hard to converge, while too many lead to overfitting. Optimal model performance is achieved only when the number of neurons is near 256 × 128.
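The described structure, two shared hidden layers of 256 and 128 neurons feeding separate actor and critic heads, can be sketched as a plain NumPy forward pass. The layer sizes follow the text, while the state dimension, activation, and initialization are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, H1, H2, N_ACTIONS = 8, 256, 128, 2  # 256 x 128 hidden layers, 2 actions

# Shared trunk (first two layers), then separate actor / critic heads
W1 = rng.normal(0, 0.1, (STATE_DIM, H1)); b1 = np.zeros(H1)
W2 = rng.normal(0, 0.1, (H1, H2));        b2 = np.zeros(H2)
Wa = rng.normal(0, 0.1, (H2, N_ACTIONS)); ba = np.zeros(N_ACTIONS)  # actor head
Wc = rng.normal(0, 0.1, (H2, 1));         bc = np.zeros(1)          # critic head

def forward(s):
    h = np.tanh(s @ W1 + b1)                                   # shared layer 1
    h = np.tanh(h @ W2 + b2)                                   # shared layer 2
    logits = h @ Wa + ba
    probs = np.exp(logits - logits.max()); probs /= probs.sum()  # softmax policy
    value = float(h @ Wc + bc)                                 # state value V(s)
    return probs, value

probs, v = forward(rng.normal(size=STATE_DIM))
```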

The offloading decision is a prerequisite for resource allocation. We discuss the three elements of the reinforcement learning-based offloading decision method: environment, action, and reward.

1. Environment state *S*

The quality of the state representation has a great impact on the final training effect of the model. In this model, the environment state includes the state of the task and of the external environment. The model is updated only when a new task arrives, and tasks arrive in chronological order, so the task's own state consists of the properties of the new task, while the external environment state consists of the remaining power *Ei* of each node in the system, the average remaining power *E*, the CPU speed $f\_i^L$ of the node generating the task, the average CPU speed $F^C$ of the cloud, and the number of tasks waiting to be transmitted in the system.

2. Action *a*

In the reinforcement learning model, the action is the decision made by the agent, and there are only two actions in this scheduling model: transmission and non-transmission.

3. Reward function

The output of the reinforcement learning model is the probability *p<sup>θ</sup>* (*a*|*S*) of selecting different actions in a certain state. To measure the goodness of an action, the system cost TSC is used as the reward.

The AAC algorithm first defines an initial actor *π* to interact with the environment, as shown in Figure 4. The collected information is used to train the critic network to estimate the value function *V*, which is the sum of the rewards received by the system after performing an action until the end of the interaction. The actor network is updated and iterated until both networks converge. The actor network parameters are updated as follows:

$$\nabla \overline{R\_\theta} \approx \frac{1}{N} \sum\_{n=1}^{N} \sum\_{t=1}^{T\_n} \left( TSC\_t^n + V^\pi \left( S\_{t+1}^n \right) - V^\pi \left( S\_t^n \right) \right) \nabla \log p\_{\theta} \left( a\_t^n | S\_t^n \right) \tag{14}$$

$$
\theta = \theta - \eta \* \nabla \overline{R\_\theta} \tag{15}
$$

where $\nabla \overline{R\_\theta}$ is the gradient of the mean of the reward sums over multiple trajectories, and *θ* is the parameter vector of the actor network. Since the optimization goal is to reduce energy consumption while satisfying the real-time requirements of tasks, which is the opposite of the reinforcement learning goal of maximizing reward, gradient descent is used to update the network.
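A minimal sketch of the update in Equations (14) and (15), using a linear softmax policy over the two actions and a caller-supplied critic estimate V; the function shape, learning rate, and data layout are all assumptions for illustration, not the paper's implementation:

```python
import numpy as np

def actor_grad_step(theta, trajectories, V, eta=0.01):
    """One update of Eqs. (14)-(15) for a linear softmax policy
    p_theta(a|s) = softmax(theta @ s) over 2 actions.
    trajectories: list of episodes, each a list of (s, a, tsc, s_next) tuples.
    V: callable state-value estimate supplied by the critic.
    Gradient *descent* is used because TSC is a cost to be minimized."""
    grad = np.zeros_like(theta)
    N = len(trajectories)
    for episode in trajectories:
        for s, a, tsc, s_next in episode:
            logits = theta @ s
            p = np.exp(logits - logits.max()); p /= p.sum()
            adv = tsc + V(s_next) - V(s)                # advantage estimate, Eq. (14)
            dlogp = -np.outer(p, s); dlogp[a] += s      # gradient of log p_theta(a|s)
            grad += adv * dlogp
    return theta - eta * grad / N                       # Eq. (15), descent on cost

# One transition with positive cost lowers the probability of the taken action
theta = np.zeros((2, 2))
s = np.array([1.0, 0.0])
theta = actor_grad_step(theta, [[(s, 0, 1.0, s)]], V=lambda st: 0.0)
```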

**Figure 4.** Trajectory of interaction between actor π and environment.

#### *4.3. Algorithm Process*

We combine the heuristic algorithm and deep reinforcement learning algorithm, using the AAC network for offloading decisions, and a heuristic algorithm for resource allocation, as shown in Algorithm 1.

**Algorithm 1:** Offloading algorithm for edge computing tasks based on deep reinforcement learning.


The core of Algorithm 1 uses a heuristic algorithm and the AAC network, a deep reinforcement learning model. In the edge computing scenario considered in this paper, task offloading and resource allocation together constitute an NP-hard problem, and the uncertainty of task arrival poses a further challenge. Facing this multi-objective optimization problem, traditional optimization techniques (e.g., linear programming) have difficulty obtaining good results [33]. Deep reinforcement learning has two advantages here: (1) compared with many one-shot optimization methods, it can adjust its strategy as the environment changes; and (2) its learning process does not require a priori knowledge of how the network state changes over time [34,35]. In fact, the heuristic algorithm is the basis on which the model operates efficiently, and the main purpose of the AAC is to further refine the results derived from the heuristic algorithm. The heuristic algorithm performs its optimization search by introducing the Cauchy–Buniakowsky–Schwarz inequality, which, by the conclusion of Theorem 1, reduces the number of iterations and greatly accelerates the solution.

The AAC model is an improvement on the actor-critic model. In the actor-critic model, both the Q-network (which evaluates actions) and the V-network (which evaluates states) must be estimated, which is not only time-consuming but also introduces greater uncertainty. In the AAC model, the expectation of the V-network is used directly to estimate the Q-network; that is, the critic network learns the advantage value directly instead of the Q value. In this way, an action is assessed not only by how good it is, but by how much better it is than the state's baseline. The advantage function reduces the variance of the policy network's value estimates and stabilizes the model, giving the AAC model superior convergence.
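The advantage estimate that such a critic learns can be written in a few lines; the following is a one-step TD sketch (the discount factor and function signature are illustrative assumptions, not taken from the paper):

```python
def advantage(reward, v_s, v_next, gamma=0.99, done=False):
    """One-step advantage estimate A(s, a) = r + gamma * V(s') - V(s).

    The critic only needs to learn V; the Q value is replaced by the
    TD target, so an action is scored against the state's baseline."""
    target = reward if done else reward + gamma * v_next
    return target - v_s
```

A positive advantage means the action improved on the state's expected value, and the actor's probability of that action is increased accordingly.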

#### **5. Experimental Results and Analysis**

Simulation experiments are used to demonstrate the performance of the proposed algorithm. All parameters are chosen according to realistic scenarios. As shown in Table 2, the number of portable detection devices is set to 10, their computational power is 0.2 GCycles/s, and that of the cloud is 10 GCycles/s. The transmission power (W) of each portable device is a random number in (0.1, 0.2), with a total system transmission rate of 800 Mb/s. The amount of data for each task lies in (10, 40) Mb, and the required computation period lies in (0.01, 0.3) GCycles. Task arrival times follow a uniform distribution [36].

**Table 2.** Simulation parameters.


To demonstrate the performance of the improved AAC-based multi-leakage real-time detection algorithm, it and the DQN algorithm are trained simultaneously in the same environment. The proposed algorithm is also compared with two benchmark algorithms, namely fully local computation and full offloading. To further characterize its performance, we also compare it with a series of strong heuristics: GTO [37], BWOA [38], EEGWO [39], AVOA [40] and DTBO [41].

The variation in the total cost TSC per iteration for the improved AAC multi-leakage real-time detection algorithm and the DQN algorithm is shown in Figure 5. The proposed algorithm has nearly stabilized and the model reaches convergence at 50 rounds of training, while the DQN algorithm only shows a significant trough when training reaches 700 rounds. Although both use a 256 × 128 network structure, the AAC algorithm trains more stably and converges faster due to the presence of the critic network. The figure also shows that the total cost per round of the proposed algorithm is lower than that of the DQN algorithm, so its overall performance is better. In Figure 6, the vertical coordinate indicates the total system energy consumption. After convergence, the energy consumption of the improved AAC-based multi-leakage real-time detection algorithm is approximately 56% lower than that of the DQN algorithm. The AAC algorithm improves on the DQN algorithm and overcomes its unstable training. Moreover, the AAC algorithm in this paper is not used alone; it serves as a further refinement after the heuristic algorithm obtains a suboptimal solution of the model. Therefore, this algorithm achieves a large improvement relative to the DQN algorithm.

**Figure 5.** TSC based on improved AAC multi-leakage real-time detection algorithm and DQN algorithm.

**Figure 6.** Total system energy consumption based on improved AAC multi-leakage real-time detection algorithm and DQN algorithm.

Figures 7 and 8 compare the energy consumption of the improved AAC-based multi-leakage real-time detection algorithm to that of the fully local computation algorithm and the fully offloading algorithm. Since these two baselines are not machine learning algorithms, they have no training process, so convergence need not be compared here. The figures show that the system using the improved AAC-based multi-leakage real-time detection algorithm has lower total energy consumption than both baselines, saving approximately 50% of the energy consumption relative to fully local computation and approximately 30% relative to the fully offloaded algorithm. To make the experiments more realistic, the test tasks have different data volumes and complexities; executing them all locally or all in the cloud would therefore result in higher energy consumption due to the underutilization of system resources. Comparing Figures 5–7 together, the system energy consumption of the DQN algorithm is around 9, which is higher than the 6.5 for local computation and the 5.3 for full offloading. This is because, in the scenario considered in this paper, the tasks to be offloaded arrive so randomly that the DQN algorithm's performance is no longer sufficient for this scenario, and incorrect predictions can waste a great deal of energy.

**Figure 7.** Total system energy consumption based on improved AAC multi-leakage real-time detection algorithm with fully locally calculated system.

Figure 9 compares the total system energy consumption of the proposed algorithm with some current state-of-the-art heuristics. A simple calculation shows that the proposed algorithm saves 21%, 38%, 30%, 31% and 44% of energy consumption compared to the GTO, BWOA, EEGWO, AVOA and DTBO algorithms, respectively. Combined with Figures 6–8, it can be seen that all of these comparison heuristics perform well; nevertheless, the proposed algorithm outperforms them. Thus, we can say with greater certainty that the addition of deep reinforcement learning can bring the performance of traditional heuristic algorithms to a higher level.

**Figure 8.** Total system energy consumption based on improved AAC multi-leakage real-time detection algorithm with fully offloaded system.

**Figure 9.** Total system energy consumption based on improved AAC multi-leakage real-time detection algorithm with some excellent current heuristics algorithms.

In this experiment, we also measured another metric of algorithm performance, namely the task completion rate. From the experimental output, the task completion rate is 96.4% for the proposed algorithm and 93.2% for the DQN algorithm; the corresponding values are 92.8%, 90.3%, 89.4%, 94.3% and 91.1% for the GTO, BWOA, EEGWO, AVOA and DTBO algorithms, respectively. The task completion rates for the fully local computing and fully offloading algorithms are 86.7% and 93.4%, respectively. According to these results, the proposed algorithm achieves the highest execution success rate, indicating that it has the best real-time performance and can ensure that as many tasks as possible are completed before their deadlines. The fully local computing algorithm has the lowest task completion rate, mainly due to the high complexity of the tasks and the limited computing power of the nodes.

The experiments designed in this paper show that the algorithm's design is reasonable and effective. It is based on the principle of first using a heuristic algorithm for initial optimization and then refining the result with deep reinforcement learning. This yields more efficient task offloading for edge computing, which not only ensures real-time performance but also further reduces system energy consumption compared to current strong optimization algorithms such as GTO.

#### **6. Conclusions and Future Work**

We studied an edge computing task offloading and resource allocation problem in a natural gas pipeline leak detection scenario, with the optimization goal of minimizing energy consumption while ensuring high real-time performance of the system. Due to the unpredictability of computational tasks, deep reinforcement learning was used to solve this problem. Within the AAC framework, the final offloading strategy was obtained by minimizing the overall system cost and continuously optimizing the task offloading strategy, followed by optimizing the allocation of communication resources through a heuristic algorithm based on the Cauchy–Buniakowsky–Schwarz inequality. Simulation results show that this algorithm converges faster than the DQN algorithm while reducing energy consumption by 56%. Although heuristics such as GTO, BWOA, EEGWO, AVOA and DTBO outperform the DQN algorithm, the proposed algorithm still saves 21%, 38%, 30%, 31% and 44% of energy consumption compared to them, respectively. Energy consumption is reduced by 50% compared to fully local computation, and by 30% compared to the fully offloaded algorithm. The proposed algorithm also has the highest task completion rate and the best real-time performance. Furthermore, this paper proves a sufficient condition for the heuristic algorithm to achieve a suboptimal solution using the Cauchy–Buniakowsky–Schwarz inequality. As the experiments with the DQN algorithm show, because the scenario in this paper is strongly real-time and the system environment is highly uncertain, a reinforcement learning algorithm alone converges slowly, and incorrect offloading predictions tend to lead to higher energy consumption. Finally, the proposed algorithm is not optimal for every application scenario. It uses a complex deep reinforcement learning model in order to meet the performance requirements of real-time task arrival. In contrast, for deterministic scenarios where the set of tasks to be offloaded is known and no prediction of future tasks is required, simpler methods, such as linear programming, can achieve the same or even better performance and are clearly the better choice.

In this paper, the system's communication environment was simplified during modeling, and channel interference was not considered. The allocation of network resources in the edge computing system was also idealized; this will be studied in detail in future work in conjunction with SDN technology. In future work, we will also consider cooperation among edge nodes in order to maximize the utilization of idle system resources and further reduce the system's energy consumption. To further improve this model, we will also allocate computation and storage resources in edge and cloud servers at a finer granularity.

**Author Contributions:** Conceptualization, D.W., R.W. and C.X. (Changqing Xia); Formal analysis, R.W. and C.X. (Changqing Xia); Funding acquisition, C.X. (Changqing Xia); Methodology, R.W.; Project administration, C.X. (Changqing Xia), X.J. and C.X. (Chi Xu); Resources, D.W. and C.X. (Changqing Xia); Software, R.W. and T.X.; Supervision, D.W.; Validation, R.W.; Writing—original draft, R.W.; Writing—review and editing, C.X. (Changqing Xia). All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was partially supported by National Key Research and Development Program of China (2022YFB3304004), the National Natural Science Foundation of China (61903356, 61972389, 62133014, 62022088, 62173322 and U1908212), the National Natural Science Foundation of Liaoning province (2022JH6/100100013), and the Youth Innovation Promotion Association CAS (2020207, 2019202, Y2021062).

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Comparison of Genetic Operators for the Multiobjective Pickup and Delivery Problem**

**Connor Little \*, Salimur Choudhury, Ting Hu and Kai Salomaa**

**\*** Correspondence: connor.little@queensu.ca

**Abstract:** The pickup and delivery problem is a pertinent problem in our interconnected world. Being able to move goods and people efficiently can lead to decreases in costs, emissions, and time. In this work, we create a genetic algorithm to solve the multiobjective capacitated pickup and delivery problem, adapting commonly used benchmarks. The objective is to minimize the total distance travelled and the number of vehicles utilized. Based on NSGA-II, we explore how different inter-route and intra-route mutations affect the final solution. We introduce 6 inter-route operations and 16 intra-route operations and calculate the hypervolume measure to directly compare their impact. We also introduce two crossover operators that are specialized for this problem. Our methodology found optimal results in 23% of the instances in the first benchmark, and in most other instances it generated a Pareto front within at most one vehicle and +20% of the best-known distance. By providing multiple solutions, it allows users to choose the routes that best suit their needs.

**Keywords:** optimization; vehicle routing; genetic algorithm; local search; pickup and delivery

**MSC:** 90C59

#### **1. Introduction**

The pickup and delivery problem is a problem that has gained significant popularity since its inception. Much of its interest is due to strong applications in industry across several important problems such as supply chain routing, distribution, ride hailing, food delivery, etc. [1]. In recent years, in part due to the emergence of COVID-19, these problems have been brought to the forefront of the public consciousness. It has been harder to fulfill the demands of a global population. Supply chains have been hit particularly hard leading to a sharp decrease in the number of goods that have been shipped [2]. The need for robust and efficient solutions is more important than ever.

The pickup and delivery problem is a variation on vehicle routing [3]. Specifically, this paper addresses the multiobjective capacitated pickup and delivery problem (PDP) with time windows. Vehicle routing, in turn, is a generalization of the travelling salesman problem. Vehicle routing expands upon its predecessors by allowing multiple routes and multiple vehicles while still maintaining the goal of minimizing the total distance or time travelled. The pickup and delivery problem further expands on vehicle routing by adding precedence to pairs of nodes, pickup nodes and delivery nodes [3]. Each pickup node must be visited prior to the corresponding delivery node. The added precedence constrains the problem in unique ways such that many algorithms developed for regular vehicle routing must be altered. The added constraints of capacity and time windows further restrict operations on the solution.

The pickup and delivery problem also has some additional constraints which are important to this version of the problem. All vehicles start and end at the same location, called a depot. To simplify the problem, the fleet is assumed to be homogeneous: all vehicles are identical and travel at uniform speeds. It is assumed that no node is visited twice, as each pickup and delivery only needs to be completed once. If a vehicle must travel through a node to arrive at another node, this is simply ignored.

**Citation:** Little, C.; Choudhury, S.; Hu, T.; Salomaa, K. Comparison of Genetic Operators for the Multiobjective Pickup and Delivery Problem. *Mathematics* **2022**, *10*, 4308. https://doi.org/10.3390/math10224308

Academic Editors: Abdellah Chehri and Francois Rivest

Received: 27 September 2022; Accepted: 10 November 2022; Published: 17 November 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

School of Computing, Queen's University, Kingston, ON K7L 3N6, Canada

As the pickup and delivery problem with time windows is NP-hard [1], there have been many attempts to develop heuristics. One class of heuristics that has had success in solving the multiobjective PDP is genetic algorithms. With the introduction of NSGA-II (Non-dominated Sorting Genetic Algorithm II) [4], evolutionary algorithms have a strong framework for modelling this kind of problem. Standard techniques for multiobjective problems include assigning weights to the different objective values or solving them one at a time; the latter essentially turns the problem into a sequential single-objective problem. NSGA-II introduces nondominated sorting, which ranks solutions according to numerous objectives without specifying precedence or weights, by sorting solutions into dominating fronts. Further, a crowding distance allows comparison of solutions within each front while still avoiding the previous issues. This allows better solutions to propagate without placing bias or preference on any one solution.
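Nondominated sorting as described can be sketched in a few lines of Python; this is a simple O(n²)-per-front illustration for minimization objectives, not the bookkeeping-optimized version from the NSGA-II paper:

```python
def dominates(a, b):
    """a dominates b when a is no worse in every objective and strictly
    better in at least one (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def nondominated_sort(points):
    """Split objective vectors into successive Pareto fronts: front 0 holds
    the nondominated points, front 1 those dominated only by front 0, etc."""
    remaining = list(range(len(points)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(points[j], points[i]) for j in remaining)]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts
```

For example, with objective vectors `[(1, 2), (2, 1), (2, 2), (3, 3)]`, the first two points are mutually nondominated and form front 0, `(2, 2)` forms front 1, and `(3, 3)` forms front 2.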

The large majority of current research into the pickup and delivery problem is single objective. In many cases this is insufficient in practice. Real-world scenarios are dynamic, with profit, vehicle count, and greenhouse emissions all influencing route choice. Even the visual nature of the routes can determine whether a solution will be utilized in practice [5]. The research that does explore multiobjective pickup and delivery often reduces it to a weighted single-objective problem, adding bias and eliminating diverse solutions [6]. This area of research has also grown with the recent popularity of green vehicle routing, which attempts to minimize environmental impacts alongside reducing costs and distance, making it a prime candidate for multiobjective techniques.

To improve upon NSGA-II, mutation and crossover operators can be specified. There are many different mutation and crossover operators that can be utilized for travelling salesman problems and their variants [7].

This paper introduces several local search operators and quantifies and compares their effectiveness. Two main classes of operators are covered: inter-route operations and intra-route operations. As multiple inter-route operations need to be included in order to cover the search space, an ablation study is used to explore each operator's effectiveness. Intra-route operations are compared directly.

The paper is structured as follows. Section 2 reviews the current state of work in the field of multiobjective pickup and delivery. Section 3 formally introduces the problem and supplies a linear programming model. Section 4 introduces the genetic algorithm and explains its properties. Section 5 presents the results and, finally, Section 6 concludes with a discussion of the results and future directions.

#### **2. Background and Related Work**

The pickup and delivery problem is a well-researched problem within the literature. There are numerous different variations and distinctions, each presenting different challenges. A full taxonomy was constructed by Berbeglia et al. [8].

As MOCPDPTW (multiobjective capacitated pickup and delivery with time windows) is a combinatorial optimization problem, much of the research looks into heuristics to reduce computation time. MOCPDPTW is an extension of the vehicle routing problem, and its highly constrained nature makes designing heuristics a unique challenge. One important paper for multiobjective problems is "A fast and elitist multiobjective genetic algorithm: NSGA-II" by Deb et al. [4], which proposed an efficient framework for multiobjective evolutionary algorithms. Based on nondominated sorting, solutions are assembled into fronts: the first front contains the nondominated solutions, the second those dominated only by members of the first, and so on. This allows direct comparison of groups of solutions.

Evolutionary algorithms such as genetic algorithms and memetic algorithms are common approaches to solving the multiobjective pickup and delivery problem. Bravo, Rojas, and Parada [9] focused on green vehicle routing, specifically on reducing pollution. They introduced an MO-PRP (multiobjective pollution routing problem) model which considered customers serviced, distance travelled, and fuel consumption; this can involve introducing additional variables to capture each objective. Chami et al. [10] offered a hybrid algorithm that combined genetic algorithms with a local search to optimize distance and cost; they did not cover time windows in their formulation. Fatemi-Anaraki et al. [11] also offered a hybrid genetic algorithm which first clustered the nodes before creating an initial population using one genetic algorithm; after the population had been generated, NSGA-II was run to find the final solutions. They aimed to minimize greenhouse emissions and cost of travel, and their formulation also did not contain time windows. Garcia-Najera and Gutierrez-Andrade [12] attempted to solve the multiobjective capacitated pickup and delivery problem with time windows by designing their own evolutionary algorithm based on solution domination. Gong et al. [13] used a bee-inspired algorithm to solve the MOCPDPTW. Their framework combined NSGA-II with the bee-evolutionary-guided algorithm to minimize fuel consumption, waiting time, and distance. Again, their model abstained from considering time windows. Li, Sahoo, and Chiang [14] designed their evolutionary algorithm based on R2 indicators. Velasco et al. [15] formulated their problem assuming the vehicles would be helicopters with no need for time windows; they also designed a genetic algorithm based on NSGA-II and improved upon it with local search operators. Wang and Chen [16] explored genetic algorithms with numerous different mutations in order to minimize the number of vehicles and the travel distance.
Finally, Zhu et al. [17] introduced a memetic algorithm with locality-sensitive-hashing local-search operators. They did not include time windows in their analysis.

There has also been much exploration outside the field of evolutionary computing. Grandinetti [18] solved the problem with an ε-constraint method, iteratively solving constrained single-objective functions to approximate the Pareto front. Ren et al. [19] designed a variable neighbourhood search algorithm: their methodology generated solutions, perturbed them with non-improving search operators, and then improved them again in the hope of escaping local optima. Wang et al. [6] compared two different frameworks for MOCPDPTW: multiobjective local search (MOLS) and multiobjective memetic algorithms (MOMA). Zou, Li, and Li [20] used particle swarm optimization hybridized with a variable neighbourhood search.

The majority of the work in the domain of vehicle routing and pickup and delivery has been conducted on single objective functions. This is slowly changing as demand and the types of problems encountered change, and the number of multiobjective papers has been increasing in recent years. Previous surveys mention very few instances of MO-PDP [21], but in the last 5 years there have been papers by Chami et al. [10], Li et al. [14], Gong et al. [13], and Bravo et al. [9], amongst others. A consequence of single-objective work receiving the majority of the attention is that many ideas have not been applied in this domain. Carrabs, Cordeau, and Laporte [22] worked on the single-objective version of the problem and introduced novel local search operators based on combining pickups and deliveries into single entities. The recent increase in attention to multiobjective problems is driven in part by green vehicle routing, which aims to reduce not only the distance and the number of vehicles but also the emissions the routes will produce.

One drawback of all of these papers is that most abstain from describing their methodology in full. The mutation and crossover operators are overlooked or implemented far more simply than they need to be. The work by Bravo et al. [9] did not mention which operators were chosen, making replication and derivation of their work difficult. Others, such as the work by Chami et al. [10], only tested one operator: the swap operator. This operator, while a classic genetic algorithm operator, does not take into account the structure of the problem. Our work aims to further help the creation and study of genetic algorithms by giving other researchers a jumping-off point when creating their algorithms. We aim to compare and contrast how different operators affect the final solutions so that a more intelligent algorithm design can be implemented.

#### **3. Problem Definition**

The capacitated pickup and delivery with time windows problem is built on a complete directed graph *G* = {*N*, *A*}. *N* is the set of nodes, broken down into *N* = {0} ∪ *P* ∪ *D*, where 0 is the depot, *P* = {1, 2, ... , *n*} is the set of all pickups, and *D* = {*n* + 1, *n* + 2, ... , 2*n*} is the set of all deliveries, with *n* the number of requests (pickup and delivery pairs). Pickup and delivery locations always come in pairs: for any pickup *i*, the corresponding delivery is *n* + *i*. The depot defines the starting and ending location for all vehicles.

For each pickup or delivery node, additional information is supplied. Given a node *i* ∈ *N*, there exist *qi*, *di*, *ETWi*, and *LTWi*. *qi* designates the demand at node *i*: for pickups it is positive, representing the space needed in the vehicle to pick an item up, while for deliveries it is negative, representing the removal of an item from the vehicle. *di* is the service time at node *i*, the amount of time it takes to perform a pickup or delivery. Finally, [*ETWi*, *LTWi*] are the early and late time windows, respectively, defining when a vehicle can visit and the service can be performed. Should a vehicle arrive prior to *ETWi*, it has to wait, and should a vehicle arrive after *LTWi*, the route is invalid.

We are also given a set *K* of vehicles. Each vehicle must keep track of how much it is carrying and how long it has been travelling. Let *Qi,k* be the load of vehicle *k* upon leaving node *i*. As a vehicle traverses the graph, this value is updated by adding *qi*; for nodes a vehicle does not visit, it is irrelevant. Let *Bi,k* be the time at which vehicle *k* arrives at node *i*. For each visited node, this adds the travel time and the service time of the node; again, should a vehicle not visit a node, this value is irrelevant.

*A* is the set of all edges between the nodes: *A* = {(*i*, *j*) | *i*, *j* ∈ *N*, *i* ≠ *j*}. Each element (*i*, *j*) in *A* has an associated cost *Ci,j*, which typically represents the distance or time to travel.

In addition to *G*, we also receive the max capacity of each vehicle *Q*. As this version of the problem has a homogeneous fleet, it is a constant. The max route time is implicitly supplied by the latest time that the depot may be visited. Again, we are assuming a homogeneous fleet, so all vehicles travel at the same speed and have the same capacity.

Let *xi*, *j*, *k* be a decision variable to determine if a vehicle *k* travels from node *i* to node *j*. This is a multiobjective problem so there are two objective functions to minimize.

$$\min \sum\_{i \in N, k \in K} \mathbf{x}\_{i, 0, k}$$

Objective 1 aims to minimize the total number of vehicles used.

$$\min \sum\_{i,j \in N, i \neq j, k \in K} c\_{i,j} \ast x\_{i,j,k}$$

Objective 2 minimizes the total travel time over all vehicles. It does not include waiting time, so as not to incentivize idling.

With these objectives, the following constraints are added to construct a linear programming model:

$$\sum\_{k \in K} \sum\_{j \in N} x\_{i,j,k} = 1 \; \forall i \in N \tag{1}$$

$$\sum\_{j \in N} \mathbf{x}\_{i,j,k} - \sum\_{j \in N} \mathbf{x}\_{n+i,j,k} = 0 \,\,\forall \, i \in P, k \in K \tag{2}$$

$$\sum\_{j \in N} x\_{0,j,k} = 1 \; \forall \; k \in K \tag{3}$$

$$\sum\_{j \in N} x\_{j,i,k} - \sum\_{j \in N} x\_{i,j,k} = 0 \,\,\forall \, i \in N, k \in K \tag{4}$$

$$\sum\_{j \in N} x\_{j,0,k} = 1 \; \forall \; k \in K \tag{5}$$

$$x\_{i,j,k} \ast (Q\_{i,k} + q\_j) \leq Q\_{j,k} \; \forall \; i \in N, j \in N, k \in K \tag{6}$$

$$\max\{0, q\_i\} \leq Q\_{i,k} \leq \min\{Q, Q + q\_i\} \; \forall \; i \in N, k \in K \tag{7}$$

$$x\_{i,j,k} \ast (B\_{i,k} + d\_i + c\_{i,j}) \leq B\_{j,k} \; \forall \; i \in N, j \in N, k \in K \tag{8}$$

$$B\_{i,k} + d\_i + c\_{i,n+i} \leq B\_{n+i,k} \; \forall \; i \in P, k \in K \tag{9}$$

$$ETW\_i \leq B\_{i,k} \leq LTW\_i \; \forall \; i \in N, k \in K \tag{10}$$

$$x\_{i,j,k} \in \{0, 1\} \tag{11}$$

Constraint 1 enforces that each node is visited once and only once across all vehicles. As the pickup and delivery problem with time windows assumes a complete graph, any intermediate stops are irrelevant. Constraint 2 enforces that pickup and delivery pairs are in the same route: if a vehicle picks up a product, it must also be the one to deliver it. Constraints 3, 4, and 5 ensure that a route is consistent and both starts and ends at the depot; in other words, each vehicle must form a cycle through the depot. Constraints 6 and 7 guarantee that a vehicle's load is propagated correctly and never exceeds its capacity. Constraints 8, 9, and 10 guarantee that arrival times propagate along a route, that each pickup is visited before its corresponding delivery, and that routes always arrive within the allowed time windows. Lastly, Constraint 11 enforces that the decision variable is Boolean.
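The constraints can also be checked procedurally for a single route, which is how feasibility is typically tested inside a heuristic. The sketch below is illustrative only: the data layout (`q`, `d`, `etw`, `ltw`, `cost` as Python lists, depot at index 0) is an assumption, not the authors' implementation.

```python
def route_feasible(route, n, q, d, etw, ltw, cost, Q):
    """Check one vehicle route (depot 0 implicit at both ends) against the
    capacity, precedence, and time-window constraints of the model.

    route -- sequence of node ids; pickup i has delivery n + i
    q, d  -- demand and service time per node; etw/ltw -- time windows
    cost  -- cost[i][j] travel time; Q -- vehicle capacity
    """
    load, time, prev, seen = 0, 0.0, 0, set()
    for node in route:
        # precedence (Constraint 9): a delivery only after its pickup
        if node > n and (node - n) not in seen:
            return False
        seen.add(node)
        time += cost[prev][node]          # time propagation (Constraint 8)
        if time > ltw[node]:              # late window (Constraint 10)
            return False
        time = max(time, etw[node])       # wait until the early window opens
        time += d[node]                   # service time at the node
        load += q[node]                   # load propagation (Constraint 6)
        if load < 0 or load > Q:          # capacity bounds (Constraint 7)
            return False
        prev = node
    time += cost[prev][0]
    return time <= ltw[0]                 # back at the depot before it closes
```

For a one-request instance, the route `[1, 2]` (pickup then delivery) is feasible, while `[2, 1]` violates precedence.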

This formulation is a 3-index model of the pickup and delivery problem with time windows, constructed by [23]. With mixed-integer programming, the objectives are solved hierarchically: first, the minimum number of vehicles is found by solving the model with only that objective; the number of vehicles is then fixed by adding an equality constraint, and the model is rerun with the second objective function to find the minimum distance.

#### **4. Genetic Algorithm**

The motivation for constructing a genetic algorithm heuristic is the size of the problem. MOCPDPTW is NP-hard [1], as it extends the vehicle routing problem (VRP), which is provably NP-hard. This makes finding solutions increasingly difficult as the size of the problem grows. Using Gurobi, the three-index model was unable to find solutions to 50-request instances within the time limit of an hour. The two-index model [24] was able to find solutions, but they were worse than those from simple construction heuristics such as the cheapest insertion method. Heuristics are required, as they trade solution quality for speed. For unexplained notation, and for those unfamiliar with evolutionary computing, the reader is referred to the review by Katoch et al. [25] or the introduction by Mitchell [26]. A survey of genetic algorithms for capacitated vehicle routing is provided by Karakatič and Podgorelec [21].

#### *4.1. Solution Representation*

For this problem, we encoded a solution (a chromosome, in genetic algorithm terms) as an array of arrays. Each array in the outer array represents the route a vehicle takes, and each route is a permutation of nodes sampled from *N*. An example route can be seen in Figure 1. For each pickup *x*, the corresponding delivery *x* + *n* appears later in the same route. Each route implicitly starts and ends at the depot, so those nodes are added during the evaluation step.

**Figure 1.** Example route with 5 pickups and 5 deliveries; n = 5.

#### *4.2. Initial Population*

The populations were initialized using insertion heuristics as construction heuristics. First, we predicted an upper bound on the number of available vehicles to cap the number of routes generated. Each route was seeded with a random request, ensuring that each individual would be different. Afterwards, a cheapest parallel insertion heuristic was used, as described in Algorithm 1. Given a solution, each request is inserted into each possible location and the cost is calculated; the cheapest insertion is kept, and the process is repeated with the next request. This inserts the request which minimizes the total route time. Once every request has been inserted, a 2-opt algorithm is run on each route to improve the initial solutions.

#### **Algorithm 1** Parallel Insertion

**Input:** Insertion heuristic H, insertion operator I, local search operator O, number of routes K

**Output:** A feasible solution


#### *4.3. Evaluation*

Our genetic algorithm utilized NSGA-II to enable multiple objectives. The first objective was to minimize the total distance over all routes. This did not include waiting time or service time: service time was constant across all nodes, so adding it did not change the solutions relative to each other, and waiting time was the time a vehicle spent sitting idle, which could occur if it arrived before a node's earliest time window opened. The second objective was to minimize the number of vehicles needed. This evaluation step differed from the linear programming version in that nondominated sorting was used instead of hierarchical methods.
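A minimal sketch of this evaluation, assuming a distance lookup indexed with the depot as node 0 (waiting and service time omitted, as described above):

```python
def evaluate(chromosome, dist):
    """Two objectives for NSGA-II: (total distance, number of vehicles used).
    `dist` is assumed to be indexable as dist[a][b] with the depot as node 0."""
    total = sum(
        sum(dist[a][b] for a, b in zip([0] + route, route + [0]))
        for route in chromosome if route
    )
    vehicles = sum(1 for route in chromosome if route)  # empty routes use no vehicle
    return total, vehicles
```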

#### *4.4. Selection*

Selection followed the standard given by NSGA-II [4]. For parent selection, a binary tournament was employed with replacement. Parents were chosen iteratively until the number of parents was equal to two times the population, which allowed the number of offspring to equal the population. The offspring were then generated by performing a crossover and a mutation operation before being added into the population. The next generation was selected by sorting the combined offspring and prior population into nondominated fronts, according to which solutions they dominated or were dominated by, and then sorting within each front by crowding distance. Individuals were then chosen hierarchically, first by front and then by crowding distance, until the new population was the same size as the old population. This framework is the same as the (*μ* + *λ*) framework [26].

#### *4.5. Crossover*

Due to the highly constrained nature of the problem, a specialized crossover function was used. The crossover function began by initializing an empty solution. Iteratively, a route was selected from each parent until no route was left in either parent. If that route contained only pickup and delivery pairs which had not been seen before, the route was appended to the solution as is. If a route had a node which had already been included, that pickup and delivery pair was removed, and the rest of the route was kept intact; the shortened route was then added to the solution. Routes added later tended to be smaller, as more nodes had already been removed. As a final optimization step, all routes with two or fewer pickup and delivery pairs were removed from the solution, and their pickup and delivery pairs were extracted. These requests were then reinserted into the solution in a parallel fashion, and the final solution was returned. The intuition for allowing partial routes was that it did not separate requests that were often paired together. This crossover function was called route crossover with ejection. For an example of how this works, see Figure 2.
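A sketch of route crossover with ejection, under the assumptions that routes are drawn alternately from the parents and that the final small-route reinsertion step is handled elsewhere (both details the text leaves open):

```python
import random

def pair_of(node, n):
    """Map a node to the pickup index identifying its request."""
    return node if node <= n else node - n

def route_crossover_with_ejection(p1, p2, n, rng=random):
    """Build a child by taking routes alternately from two parent chromosomes,
    ejecting any pickup/delivery pair whose request is already in the child."""
    seen, child = set(), []
    pools = [list(p1), list(p2)]
    turn = 0
    while pools[0] or pools[1]:
        pool = pools[turn % 2] or pools[(turn + 1) % 2]
        route = pool.pop(rng.randrange(len(pool)))
        kept = [v for v in route if pair_of(v, n) not in seen]
        seen.update(pair_of(v, n) for v in kept)
        if kept:
            child.append(kept)
        turn += 1
    # Final step (not shown): remove routes with <= 2 pairs and reinsert them.
    return child
```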


**Figure 2.** Example of crossover without ejection. The final offspring would then have all "small" routes removed and reinserted.

A second crossover operator, developed by Wang et al. [6] and Alvarenga and Mateus [27], was also tested. It chooses routes iteratively but only accepts routes that can be added in their entirety; the nodes of rejected routes are set aside to be reinserted into the surviving routes.

In our trials, the first crossover operator produced slightly better results, so we opted to use it in the experiments reported below. A crossover rate of 1/5 was used. The crossover algorithm is described in Algorithm 2.


#### *4.6. Mutation*

Mutation was divided into two stages. The first stage performed an inter-route operation, moving nodes between subroutes. The second stage performed intraroute optimizations: after a route was chosen, its operational neighbourhood was searched for an improved solution. The motivation for exploring different mutation operators stemmed from the lack of diversity within the literature. Of the works that employ intraroute operations, swap mutations are by far the most common; Chami et al. [10], Gong et al. [13], Garcia-Najera and Gutierrez-Andrade [12], and Zhu et al. [17] all used a variation of this operator.

4.6.1. Inter-Route Operations

There were six inter-route mutations applied.

	- **–** Removes a single pickup and delivery pair from a random route and attempts to insert it into another route.
	- **–** Randomly selects two routes and attempts to swap a pickup and delivery pair between them.
	- **–** Randomly picks a route and attempts to add a random pickup and delivery pair.
	- **–** Randomly picks a route and attempts to add a pickup and delivery pair according to a heuristic.
	- **–** Selects a route and unassigns all of its pickup and delivery pairs, then attempts to reinsert all of them.
	- **–** Selects a route and creates two new routes out of its pickup and delivery pairs.

These six mutations were given an equal probability of occurring. Each of them required an insertion operator; to make the insertion operators as general as possible, we followed Algorithms 1 and 3. These inter-route operations were inspired by the works of Wang and Chen [16] and Yanik, Bozkaya, and Dekervenoael [28].
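The first of these mutations, relocating a pickup and delivery pair, might look as follows; time-window and capacity feasibility checks, which the real operator would need, are omitted for brevity:

```python
import random

def relocate_pair(chromosome, n, rng=random):
    """Move one random pickup/delivery pair from one route to another,
    keeping the delivery strictly after the pickup in the target route."""
    routes = [r for r in chromosome if r]
    if len(routes) < 2:
        return chromosome
    src = rng.choice(routes)
    x = rng.choice([v for v in src if v <= n])  # pick a random pickup
    src.remove(x)
    src.remove(x + n)
    dst = rng.choice([r for r in routes if r is not src])
    i = rng.randrange(len(dst) + 1)             # pickup position
    dst.insert(i, x)
    j = rng.randrange(i + 1, len(dst) + 1)      # delivery strictly after pickup
    dst.insert(j, x + n)
    return chromosome
```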


For the sequential insertion, a heuristic, an insertion operator, and a local search operator were supplied. Starting with an empty solution, routes were built iteratively by choosing a request based on the insertion heuristic and the insertion operator. Once the optimal request had been inserted, the resulting route was improved by the local search operator until it could not be improved any further. If at any point a new pickup and delivery pair could not be inserted, the route was added to the solution as is and a new route was started with the uninserted pair. This process was repeated until all requests were inserted.

The parallel insertion heuristic worked very similarly. Given a starting number of vehicles k, k routes were initialized, each with a randomly chosen request. The optimal request across all routes was then chosen via the insertion heuristic and operator, with only one route modified at each iteration. From there, the route was improved with the local search operator and inserted into the solution. If a request could not be inserted into any route, a new route was appended, much like in the sequential variation.

#### 4.6.2. Insertion Operators

Another consideration in the design was which insertion operators to use. As mentioned previously, one can insert in both parallel and sequential fashions. There are also several heuristics governing which requests get inserted and where. Common methods include cheapest insertion, which inserts the request that minimizes the total distance; furthest insertion, which maximizes the total distance; and random insertion, which places a random request.

We found that choosing requests via the cheapest insertion method was the most effective. Moreover, our trials found that the number of vehicles had little effect on the final result, so the choice of the number of initial routes for the parallel insertion operator was not impactful. In the end, we chose the parallel construction heuristic for our initial populations: while both produced similar results, the parallel construction created more varied populations and therefore maintained more diversity throughout.

#### 4.6.3. Intraroute Operations

In addition to insertion operators, the routes were often further optimized with local search operators. The local search operators tested are explained in Table 1 below. Some were standard genetic algorithm operators, such as the swap mutation, while others were more problem-specific, such as the blocked 2-opt operator. A blocked operation groups the pickup and delivery pairs into single entities; the idea is to keep pickup and delivery pairs together. Moving sequentially down a route, nodes are added into a bin starting with a pickup until the corresponding delivery node is reached. After the first bin has been filled, the route is restarted at the next pickup node, and the process repeats until all pickup and delivery pairs have been considered. All nodes between a pickup and its delivery are included in the group, so this can result in multiple copies of some nodes. The original decoding by [22] was LIFO (last in, first out) and assumed that there would never be any overlap; to address this, only the first copy of a node was kept when converting back into a normal route. This is illustrated in Table 2. Operations were then performed on these groups instead of individual elements. The reasoning was to preserve precedence: if pickup and delivery pairs move together, precedence cannot be violated.
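The blocking and decoding steps can be sketched as follows, keeping only the first copy of each duplicated node when converting back, as described above (function names are ours):

```python
def block_route(route, n):
    """Group each pickup with all nodes up to and including its delivery,
    so blocked operators can move whole pickup/delivery pairs at once.
    Overlapping blocks duplicate the nodes they share."""
    blocks = []
    for i, v in enumerate(route):
        if v <= n:  # a pickup starts a new block
            j = route.index(v + n, i)  # position of its delivery
            blocks.append(route[i:j + 1])
    return blocks

def unblock(blocks):
    """Flatten back to a route, keeping only the first copy of each node."""
    seen, route = set(), []
    for b in blocks:
        for v in b:
            if v not in seen:
                seen.add(v)
                route.append(v)
    return route
```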

#### *4.7. Datasets*

When choosing a dataset, there are often many factors to consider. For nonstandard PDPs, there is no consensus on a benchmark dataset, with most papers generating their own [1]. While this allows data to be curated for any problem, it raises the issue of how representative of real-world scenarios the synthesized data will be.

One dataset we elected to use comes from Sulzbach Sartori and Buriol [29]. It is an open-source dataset based on geographical data from capital cities; it supplies instances of varying node counts and incorporates real-world travel times to ensure that it is representative of actual data. We used 25 instances with 100 nodes. The instances are labelled with the city they are based on, the number of nodes, and the instance number; for example, bar-n100-2 is the second 100-node instance based on Barcelona.

In addition, we used the well-known Li Lim [30] dataset, a commonly used benchmark for the pickup and delivery problem. It uses Euclidean distances between points and hierarchically solves for the number of vehicles and then the distance. Li Lim [30] distinguished their instances by how the nodes were arranged: lr instances were randomly distributed, lc instances were clustered, and lrc instances were partially randomly distributed.


**Table 1.** Descriptions of each local search operator.

**Table 2.** Example of a route that has been blocked.


Both datasets can be trivially adapted into multiobjective instances. Moreover, both datasets come with a best-known solution, which was treated as the optimal solution with respect to the lower bound and used for calculating optimality. Optimality gaps were calculated through the equation

(found solution − best-known solution)/best-known solution

#### **5. Results**

All instances were run for a maximum time of 30 min (1800 s) or a maximum of 300 epochs on an i7-9750H CPU with 16 GB of RAM, with a population of 50. The full results for each instance can be found in Appendix A.

The maximum of 300 epochs was chosen such that the time limit was the binding factor. This allowed highly complex operations to be compared against very efficient operators without heavily biasing the results towards search techniques with larger neighbourhoods; measuring epochs instead of time biases the comparison towards complex and costly operations. Instances rarely hit the epoch limit before the time limit. Here, convergence meant that all genomes within the population had the same fitness value, reducing the diversity to essentially zero. Convergence matters because a still-diverse population indicates that the local optimum has not yet been reached, while at zero diversity no more learning can occur. To choose these values empirically, test runs were performed at 10 min intervals on the first five instances of the Li Lim dataset, using the 2-opt operator and holding all other parameters constant. With a time limit of 10 min, the algorithm did not converge at all, still having around 16 fronts on average. In four out of five cases the 20 min runs converged, while with 30 min, all test cases converged; solutions also improved as the time limit grew. A final time limit of 30 min was chosen to allow a greater chance of convergence, especially with more complex operators. On the final run, with a 30 min time limit, the algorithm converged about 25% of the time.

The population size was chosen in much the same fashion. Using a grid search, population sizes of 25, 50, and 100 were tested with a 30 min time limit. On the five test instances, a population size of 50 performed the best. With a population size of 25, the population converged very early, preventing further learning. With a population size of 50, some diversity remained within the population. A population size of 100 was far too large for the problem: within 30 min, none of the five test cases had converged, and four instances had upwards of 60 fronts. Of the resulting best solutions, all five came from the runs with a population size of 50. With population sizes of 25 and 50, the optimal solution was found twice.

The results for the Li Lim dataset can be seen in Table A1, which lists the instance name, the best-known solution, and our found solution, each in the form (number of vehicles, distance travelled). Our algorithm found the optimal (best-known) result in 13 out of 56 instances. In the remaining trials, we found results within one vehicle of the best known and, in most cases, within 10% of its distance. Our worst result was on instance lr205, where our three-vehicle solution was 27% from optimality in distance; our five-vehicle solution, however, was within 13%. Four of our results had gaps of 20% or more, while in forty-three instances our result was within 10% of optimality. Omitting the lr2 instances, the average distance optimality gap was 4%, as seen in Table 3. Our algorithm performed worst when nodes were randomly distributed and best when they were clustered, solving 9 out of 17 clustered instances to optimality. In three cases, our algorithm reduced the total distance below that of the best-known solution. Often, the genetic algorithm would converge to a local optimum and cease learning; the addition of the 4-opt mutation occasionally helped pull solutions out of these optima.


**Table 3.** Summary on Li-Lim benchmark.

The results for Sulzbach Sartori and Buriol's [29] dataset can be seen in Table A2 and are summarized in Table 4. This dataset was more complex than the previous one, and our algorithm did not always converge within the 30 min time limit; as such, it did not solve any instance to optimality. Despite this, we produced a feasible solution in every case, within one vehicle of the best known and with an average distance gap of 2.97%. Distance-wise, the worst solution was bar-n100-6, with an optimality gap of 11.81%; it was one of only two solutions with a gap larger than 10%. It is not surprising that we did not find many optimal solutions within the specified time limit: our algorithm dedicates considerable overhead to finding many feasible solutions rather than a single optimal one, since maintaining multiple unique genomes allows greater diversity and more options when choosing a final solution.


**Table 4.** Summary on Sulzbach Sartori and Buriol's benchmark.

A secondary study was conducted to determine which inter-route and intraroute operators would be most effective.

To choose which intraroute operations to utilize, a comparison was generated. For each operator five trials were run on five different instances. Each trial was run using identical parameters. Each run had a population of 50 and was run for 200 epochs. Initial populations were generated with a parallel construction and then solved to be 2-opt optimal. The crossover rate was 0.2 and the starting number of vehicles was chosen to be slightly higher than the known best solution, typically higher than four.

For each run, the total number of fronts and the number of unique solutions were measured to quantify the diversity of the population, as seen in Table 5. Table 6 measures the solution quality. The points at the Pareto front and the hypervolume were measured to compare the quality of the solutions. For each operator the z score of the hypervolume was also recorded. This allowed a direct measurement of how much better each operator was in comparison. Table 7 aggregates all of the trials.


**Table 5.** Local search operators effects on diversity of bar-n100-1 [29].

Each operator, with the exception of the 4-opt operator, was run in a dynamic programming fashion, fully exhausting the neighbourhood to ensure the best move was made. For the 4-opt and blocked 4-opt operators this was infeasible due to the size of the search space, so a Monte Carlo approach was used: one hundred random neighbourhood moves were tested, and the best one was applied in that iteration of local search.


**Table 6.** Local search operators effects on solution quality of bar-n100-1 [29].

Reference point: (8, 900).

The results indicated that the 2-opt operator was the best by a sizable margin. The 3-opt operator was too slow to test exhaustively and so failed to converge like the other trials. The 4-opt operator had the largest variance of any operator: on some runs it found the best solution, and on some the worst. The standard array of mutation operators performed adequately but could not compare to the more specialized operators.

As for diversity, among the operators that converged, the 4-opt operator had the best diversity within each front, with an average of eight unique individuals per front. Insertion mutation had the worst diversity, averaging only 1.8 fronts and 5.6 unique individuals.

To address both diversity and solution quality, a combination of 4-opt and 2-opt was used in the final model.

To assess the effectiveness of each inter-route operator, an ablation study was conducted. Seven different scenarios were run over five different instances. All runs held all parameters constant aside from the inter-route operations: 200 epochs with a population of 50, an initial population created with parallel insertion and the 2-opt operator, and the 2-opt operator as the intraroute operation. The 2-opt operator is a standard local search operator for variants of the travelling salesman problem; it involves selecting two edges and swapping them, effectively generating two new routes. In each scenario, a single inter-route operation was removed to assess its effect on the final solution. Results are in Tables 8–10. Figure 3 shows the average hypervolume over time, and Figure 4 shows how the minimum distance is affected. The hypervolume is a measure that calculates the area between the solution set and a reference point that is larger in magnitude than any given point in all dimensions; this area allows a direct comparison of the solutions generated. It was first introduced by Zitzler and Thiele [31] in 1999. The main benefit of this measure is that it makes no assumptions about the Pareto front, which other measures require.
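For two minimized objectives, the hypervolume reduces to a simple area computation against the reference point; a minimal sketch:

```python
def hypervolume_2d(points, ref):
    """Hypervolume for two minimized objectives: the area dominated by the
    nondominated subset of `points`, bounded by the reference point `ref`
    (assumed to exceed every point in both dimensions)."""
    # Keep nondominated points, sorted by the first objective.
    front, best_y = [], float("inf")
    for x, y in sorted(points):
        if y < best_y:
            front.append((x, y))
            best_y = y
    # Sum rectangular slabs from right to left.
    area, prev_x = 0.0, ref[0]
    for x, y in reversed(front):
        area += (prev_x - x) * (ref[1] - y)
        prev_x = x
    return area
```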


**Table 7.** Local search operators summary statistics.

The experiment tested how much each pickup and delivery operation affected the end result. For each operator, the genetic algorithm was run with all other parameters fixed. The only difference was the inclusion of each operator. Each run was executed with a population of 40, over 300 epochs. All other parameters were held constant.

**Figure 3.** Hypervolume after removing each inter-route operator.

**Figure 4.** Minimum distance after removing each inter-route operator.


**Table 8.** Inter-route operation effects on diversity of bar-n100-5 [29].

**Table 9.** Inter-route operation effects on solution quality of bar-n100-5 [29].


Reference point: (8, 900).

**Table 10.** Inter-route operation effects on solution quality of bar-n100-5 [29] Part 2.


Reference point: (8, 900).

#### **6. Discussion and Conclusions**

In this work we formulated a genetic algorithm based on NSGA-II for solving the multiobjective capacitated pickup and delivery problem with time windows. We built two generic metaheuristics which allowed solution construction and insertion and explored six different inter-route operations and sixteen different intraroute operations. We found that adding intraroute operations in addition to inter-route operations greatly improved solution quality, with 2-opt being the best operator we trialed.

All of the tested inter-route operators benefited the end result, and their variety maintained good diversity within the population. The intraroute operators produced more interesting results: standard genetic algorithm operators such as swap mutation tended to perform poorly, while operators that took structure into account, such as k-opt, performed much better. Blocked operators did not have as much success, despite taking more of the problem structure into consideration.

Our work does contain some limitations. Without proper benchmarks in the literature, it is difficult to compare results directly. Additionally, future work would involve exploring different ways to maintain diversity, as our algorithm occasionally became stuck in local optima; this could be implemented via speciation, island models, or geographical encodings.

**Author Contributions:** Conceptualization, C.L., K.S., T.H. and S.C.; methodology, C.L.; software, C.L.; validation, C.L.; investigation, C.L.; data curation, C.L.; writing—original draft preparation, C.L.; writing—review and editing, K.S., T.H. and S.C.; visualization, C.L.; supervision, K.S., T.H. and S.C.; project administration, C.L. and S.C. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research is funded by Vector Scholarship in Artificial Intelligence and NSERC CGS-M 2021 scholarships of Connor Little, as well as NSERC discovery grant received by Dr. Salimur Choudhury (RGPIN-2018-1507).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Li Lim dataset [30]: https://www.sintef.no/projectweb/top/pdptw/ li-lim-benchmark/ (accessed on 25 September 2022).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**

The following abbreviations are used in this manuscript:

PDP Pickup and delivery problem

MOCPDPTW Multiobjective capacitated pickup and delivery problem with time windows

#### **Appendix A**

Below, Tables A1 and A2 display the found and best-known solutions for each instance in the Li Lim [30] and Sulzbach Sartori and Buriol [29] datasets, respectively.

**Table A1.** Results on Li-Lim's benchmark.


**Table A1.** *Cont.*


Results in which distance was improved are indicated with an \*.




**Table A2.** *Cont.*

Results in which distance was improved are indicated with an \*.

#### **References**


