**A Multi-Depot Dynamic Vehicle Routing Problem with Stochastic Road Capacity: An MDP Model and Dynamic Policy for Post-Decision State Rollout Algorithm in Reinforcement Learning**

**Wadi Khalid Anuar 1,2, Lai Soon Lee 2,3,\*, Hsin-Vonn Seow <sup>4</sup> and Stefan Pickl <sup>5</sup>**


**Abstract:** In the event of a disaster, the road network is often compromised in terms of its capacity and usability. This is a challenge for humanitarian operations in the context of delivering critical medical supplies. To optimise vehicle routing for such a problem, a Multi-Depot Dynamic Vehicle-Routing Problem with Stochastic Road Capacity (MDDVRPSRC) is formulated as a Markov Decision Process (MDP) model. An Approximate Dynamic Programming (ADP) solution method is adopted, in which the Post-Decision State Rollout Algorithm (PDS-RA) is applied as the lookahead approach. To perform the rollout effectively, the PDS-RA is executed for each vehicle assigned to the problem, and a decision is then made by the agent. Five types of constructive base heuristics are proposed for the PDS-RA. First, the Teach Base Insertion Heuristic (TBIH-1) is proposed to study the partial random construction approach for the non-obvious decision. This heuristic is extended into TBIH-2 and TBIH-3 to show how the Sequential Insertion Heuristic (SIH) (I1) and the Clarke and Wright (CW) heuristic, respectively, can be executed in a dynamic setting as modifications of the TBIH-1. Additionally, two further heuristics, TBIH-4 and TBIH-5 (TBIH-1 with the addition of the Dynamic Lookahead SIH (DLASIH) and the Dynamic Lookahead CW (DLACW), respectively), are proposed to improve the decision rule constructed on the go (the dynamic policy) in the lookahead simulations. The results obtained are compared with the matheuristic approach from previous work based on the PDS-RA.

**Keywords:** reinforcement learning; Markov decision processes; approximate dynamic programming; rollout algorithm; constructive base heuristic; vehicle routing problem.

**MSC:** 90C40; 90B15; 90C59

**Citation:** Anuar, W.K.; Lee, L.S.; Seow, H.-V.; Pickl, S. A Multi-Depot Dynamic Vehicle Routing Problem with Stochastic Road Capacity: An MDP Model and Dynamic Policy for Post-Decision State Rollout Algorithm in Reinforcement Learning. *Mathematics* **2022**, *10*, 2699. https://doi.org/10.3390/math10152699

Academic Editors: Francois Rivest and Abdellah Chehri

Received: 2 June 2022; Accepted: 26 July 2022; Published: 30 July 2022

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

#### **1. Introduction**

Recent events have shown that the occurrence of a disaster continues to claim many lives despite the growing number of relief organisations supporting victims throughout the world. In the 2015 Nepal earthquake, for example, nearly 9000 lives were lost and 23,000 people were injured [1]. Critical medical supplies and health personnel were far from lacking, as aid rushed into Nepal as soon as the news went out. The relief supplies received from around the world were so large in volume that the Kathmandu airport was overwhelmed by the sudden increase in air traffic [2]. However, these life-saving resources could not be distributed accordingly and in a timely manner given the lack of coordination between the local and international relief aid providers. This led to inefficiencies in executing relief operations [1,2]. The fact that Nepal is a landlocked country without sea access exacerbated the logistical challenges further, as seen from the bottleneck at Kathmandu airport [3]. The hilly topography of Nepal is prone to landslides [1,2,4], while continuous aftershocks damaged road infrastructure such as the Pasang Lhamu and Araniko highways [5], compromising the road network to the disaster zone [4,6]. Additionally, the sudden onset of the disaster meant that vehicle availability was limited and there was little preparation for emergency logistics operations [4]. The viability of long-term humanitarian operations with the available vehicles, with regard to safety, logistics, and asset management, was also in question since the road network was compromised [4]. This hindered efficient relief operations in terms of transport and delivery [4,7].

Meanwhile, urgent local medical supplies were rapidly diminishing as both field and local hospitals were overrun by victims seeking immediate treatment [2,8]. If the relief aid and supplies lying dormant at Tribhuvan Airport could have been channelled through effectively, the urgent need for medical supplies and treatment would certainly have been alleviated.

In terms of disaster management preparedness, this case study serves to highlight the critical role of transportation in the event of a disaster. Transportation is a crucial element of humanitarian logistics operations, in particular the in-country transportation for delivery of relief supplies [4]. From the 2015 Nepal earthquake, some observations have been made with regard to ensuring efficient delivery of goods through vehicle routing. The study in [4] pointed out that the geographical topology and mountainous landscape of Nepal, as well as the second earthquake tremors and the weather conditions during the event, heavily impacted delivery speed, especially over the last mile of delivery. In the disaster event, road condition and capacity were compromised by major accidents, causing traffic congestion and loss of cargo [4]. Furthermore, there was an overwhelming demand for the limited number of trucks in terms of capacity and availability due to critical delays and backlogs, which increased the price of transport vehicle procurement by as much as 40%. Additionally, the lack of a decision support system (DSS) to monitor and track transport vehicles, coupled with untrained drivers, led to a serious shortage of reliable transportation. Landslides, in particular, limited routing to certain areas. In addition, the risk of accidents due to unfamiliar routes was significant, and was not helped by landslides, which are sensitive to weather conditions. As such, intelligent routing and communication access are pivotal for this particular vehicle routing problem.

The proposed Multi-Depot Dynamic Vehicle-Routing Problem with Stochastic Road Capacity (MDDVRPSRC) model addresses such delivery problems in the setting of humanitarian relief operations by incorporating the aforementioned challenges. The bottleneck problem could be solved if "mini" airports were erected at strategic locations (multi-depot). The problem of limited transport vehicles in terms of availability and capacity could be addressed to some extent by having transport vehicles perform split deliveries, which is known to be a cheaper alternative by almost 50% [9]. Furthermore, these vehicles could also perform multi-trip deliveries. Communication among Logistics Service Providers (LSPs) and their local contacts helps in updating the road network conditions, which hints towards dynamic problem modelling with information updates at regular intervals. Uncertainty, which is pivotal to efficient delivery operations, should be addressed in terms of dynamic and stochastic settings of road capacity and conditions within the road network [4,7].

The Markov Decision Processes (MDP) modelling framework offers a natural way to address the dynamic and stochastic aspects of the problem. However, incorporating these aspects leads to an exponential growth of the state, action, and outcome spaces when modelling such a real and complex problem. To deal with this curse of dimensionality [10], the Approximate Dynamic Programming (ADP) approach is commonly applied to solve the problem from the Machine Learning (ML), or Reinforcement Learning (RL), perspective. To address the large outcome space, the Post-Decision State (PDS) is applied, extending the basic MDP modelling framework [11].

The ADP approach is usually applied to approximate the value of the next state *s<sub>k+1</sub>* or the value of the action *a<sub>k</sub>* when solving the Bellman equation [12]. In ADP, the lookahead approach is suitable for dealing with the large action and outcome spaces that constitute part of the curse of dimensionality. In this paper, the known Post-Decision State Rollout Algorithm (PDS-RA) [13] is applied as part of the lookahead approach in the ADP.
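The one-step lookahead that underlies this approach can be illustrated with a minimal, generic sketch. This is not the paper's implementation: `sample_transition`, `reward`, and `estimate` are hypothetical stand-ins for sampling the stochastic outcome, the stage reward, and the downstream value approximation (which a rollout algorithm would supply by simulating a base policy).

```python
def lookahead_value(state, actions, sample_transition, reward, estimate, n_samples=20):
    """Score each action a by the sampled mean of r(s, a, s') + V(s'),
    then return the best action and its estimated value."""
    best_action, best_value = None, float("-inf")
    for a in actions:
        total = 0.0
        for _ in range(n_samples):
            s_next = sample_transition(state, a)   # draw s' given (s, a)
            total += reward(state, a, s_next) + estimate(s_next)
        value = total / n_samples
        if value > best_value:
            best_action, best_value = a, value
    return best_action, best_value
```

For the MDDVRPSRC, `estimate` would itself be the mean return of PDS-RA simulations under a base heuristic rather than a closed-form function.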

A base heuristic in the PDS-RA serves as the guiding policy for the decision rule applied as the state transitions within the lookahead horizon. In modelling the VRP through the MDP framework, the decision rule is generally the assignment of vehicles when the state transitions. In the rollout, the transition occurs in the lookahead horizon beginning from the potential next state *s<sub>k+1</sub>* to the lookahead end state *s<sub>K</sub>*. In other words, the base decision rule is a route computed for all vehicles to navigate within the future lookahead horizon. In the case of the MDDVRPSRC, however, assigning vehicles based on a route that is computed only once becomes problematic: the stochastic road capacity may render the computed route unusable in the next lookahead update. One way to navigate around this is to have the route built dynamically on the go by a simple constructive heuristic, known as a base heuristic.
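The contrast with a fixed precomputed route can be sketched as a single on-the-go assignment step that only considers edges currently reporting spare capacity. This is a simplified illustration under assumed interfaces; `capacity_ok` and `cost` are hypothetical stand-ins, not part of the TBIH heuristics themselves.

```python
def next_assignment(current, pending, capacity_ok, cost):
    """One step of an on-the-go constructive rule: from the current node,
    choose the cheapest pending customer reachable over an edge that still
    has capacity.  Returns None when no pending customer is reachable."""
    feasible = [n for n in pending if capacity_ok(current, n)]
    if not feasible:
        return None
    return min(feasible, key=lambda n: cost(current, n))
```

Because the choice is re-evaluated at every lookahead update, a road whose capacity collapses simply drops out of `feasible` at the next step instead of invalidating a whole precomputed route.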

From the decision rule or route obtained through this approach at each iteration, only the first assignments of the constructed route are applied while the rest are ignored. This method is feasible and practical due to the less expensive computation performed by the simple constructive heuristic. Here, the Teach Base Insertion Heuristic (TBIH-1) is proposed to balance random exploration and to guide exploitation by dictating obvious assignments. Extending from this, the authors apply two known constructive heuristics, the Sequential Insertion Heuristic SIH(I1) and Clarke and Wright (CW), in a dynamic setting and embed them into the TBIH-1 (as the proposed TBIH-2 and TBIH-3). To the best of our knowledge, no paper has shown how these constructive heuristics can be executed, as proposed, in a PDS-RA setting. From these two heuristics, TBIH-4 and TBIH-5 are further derived; they seek, within their algorithms, promising vehicle assignments by looking up to two steps ahead. To deal further with the large action space, most research provides some mechanism to segregate or cluster vehicles with a specific set of customers to serve. Due to the stochastic road capacity, such a mechanism is difficult to adopt. Thus, a further approximation is made when computing the optimal action *a<sub>k</sub>* with regard to executing the PDS-RA: through the proposed method, a PDS-RA is executed for each vehicle and for every potential assignment that can be given to that vehicle.
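The per-vehicle decomposition described above, one PDS-RA evaluation for each vehicle and each of its potential assignments, can be sketched as follows. The hooks `candidate_moves` and `simulate_episode` are hypothetical stand-ins for the feasible-assignment enumeration and the base-heuristic rollout from the resulting post-decision state.

```python
def pds_rollout_decision(vehicles, candidate_moves, simulate_episode, n_sims=10):
    """Approximate the joint decision one vehicle at a time: every candidate
    assignment of a vehicle is scored by the mean return of rollout
    simulations, and the best-scoring assignment is kept per vehicle."""
    decision = {}
    for v in vehicles:
        best_move, best_score = None, float("-inf")
        for move in candidate_moves(v):
            score = sum(simulate_episode(v, move) for _ in range(n_sims)) / n_sims
            if score > best_score:
                best_move, best_score = move, score
        decision[v] = best_move
    return decision
```

This trades the exactness of a joint argmax over all vehicles for a number of rollouts that grows linearly, rather than combinatorially, in the fleet size.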

The contributions of this paper are as follows. First, the novel MDDVRPSRC is proposed in a disaster event setting based on the MDP modelling framework. Next, the TBIH-1 heuristic is proposed along with four extended variants, namely TBIH-2, TBIH-3, TBIH-4, and TBIH-5. By doing so, it is shown how SIH(I1) and CW can be applied in the dynamic setting of a route-based MDP for the PDS-RA [14]. The authors also show how these two can be extended by looking up to two steps ahead. Finally, it is shown and validated how the near-optimal decision can be approximated further by decomposing the collective assignment decision into individual near-optimal decisions. To the best of our knowledge, both the MDDVRPSRC MDP model and the five base heuristics applied in the PDS-RA are novel and have not been proposed before.

This work extends the research published in [15], from which the determination of road damage within the road network and the Poisson distribution of the stochastic road capacity are adopted. In this paper, the dynamic and stochastic MDDVRPSRC MDP model is presented to complement the earlier research in [15], which modelled the Deterministic Multi-Depot VRP with Road Capacity (D-MDVRPRC) as well as the two-stage Stochastic Integer Linear Programming (SILP) model of a Multi-Depot VRP with Stochastic Road Capacity (MDVRPSRC-2S). To solve the MDP model presented in this paper via the PDS-RA, the five base heuristics are proposed as an alternative to the "cluster first, route second" approach, which cannot be applied here. Meanwhile, to benchmark these heuristics, the matheuristic rollout presented in [15] is applied where tractable.

This paper is organised as follows: Section 2 reviews the literature relevant to the proposed model and base heuristics. Section 3 describes the MDDVRPSRC problem, where some elements of the models are referenced from the authors' previous work. The known PDS-RA approach is briefly described in Section 4, together with how the optimal decision is approximated through the proposed mechanism; the variants of the proposed base heuristics are also presented there. Section 5 presents the computational results, while Section 6 synthesises the findings. Finally, Section 7 concludes the paper.

#### **2. Literature Review**

An extensive synthesis of the literature on works adapting the VRP for humanitarian operations can be found in [16,17]. In [17], numerous papers from the preceding decade (as of 2020) are reviewed in terms of the application of VRPs to three selected humanitarian operations. Various modelling aspects relevant to the model proposed in this work, such as dynamic and stochastic problems, multiple disaster phases, multiple objectives, multi-trips, multi-depots, split delivery, and more, are elaborated. Furthermore, the solution approaches applied are discussed in detail, especially the challenges when dealing with stochastic and dynamic VRPs. Ref. [15] extends these findings by discussing papers that apply the VRP outside humanitarian operation settings but remain relevant to the proposed model. Meanwhile, the adaptation of RL to Supply Chain Management (SCM) is discussed in [18], and the adoption of RL for the VRP and the Travelling Salesman Problem (TSP) is discussed in [19].

#### *2.1. Stochastic Vehicle Routing Problem for Humanitarian Operations*

The survey in [20] discussed recent dynamic VRPs for various applications from 2015 to 2021. From their analysis, it can be observed that research focusing on dynamic problems usually also addresses stochastic problems; the opposite, however, is not necessarily true. As such, some discussion of research on stochastic VRPs for humanitarian operations is warranted to complement the discussion in [17]. For example, the recent work of [21] addressed the Inventory Routing Problem (IRP) with an uncertain traffic network. Here, the distribution of essential multi-commodities among relief shelters and distribution centers in the chaotic post-disaster phase is modelled through a network flow model. The uncertain traffic network is due to the earthquake's attributes, such as its magnitude and the time of its occurrence; such attributes also affect vehicle speeds when making split deliveries. Apart from the optimised routing decision, the efficiency of the commodity distribution is further improved through the optimised inventory decision. Both are computed via a simulation optimisation technique where the Sample Average Approximation (SAA) method is applied.

On the other hand, Ref. [22] presented a two-stage Location Routing Problem (LRP) for distributing first-aid relief materials after an earthquake disaster, where complex demand uncertainty is addressed. Here, the mixture of uncertain demand from the perspectives of randomness (probability theory) and fuzziness (possibility theory), due to the merging of subjective and objective data, forms the hybrid demand uncertainties. A scenario-based stochastic demand over a specified interval per scenario is considered for stochastic probabilistic programming due to the strong relationship between events/scenarios, such as post-earthquakes and aftershocks, and the demand uncertainty. As such, the demand parameter is considered a fuzzy random variable. In the two-stage robust programming model, the decision for locating a warehouse among existing warehouses is first determined, while the routing and distribution decisions are computed in the second stage. The objective is to minimise the cost of warehouse location and the penalty induced by unsatisfied demand. Here, the equivalent crisp model is represented by the Basic Possibilistic Chance Constraint Programming (BPCCP) model and later modified into the two-stage robust programming model to overcome the drawbacks of the BPCCP. A small-scale instance based on an actual case study of the Hamadan province of Iran, a location prone to earthquakes, is applied to verify the proposed model. The solution obtained using CPLEX indicates that the demand satisfaction level is significantly higher than in conventional scenario-based stochastic programming.

Meanwhile, the work proposed in [23] focused on the redistribution of food to charitable agencies, such as homeless shelters and soup kitchens, by picking up the resource (food) from donors such as grocery stores and restaurants. Here, the objective is to ensure fairness in distributing the donated food while also considering waste implicitly through the constraints introduced. The demand from the charitable agencies and the resource (donations) are stochastic in this problem. Additionally, equity is the ratio of the amount allocated to each respective agency over its total demand. Some assumptions are made, such as unlimited vehicle capacities and that all donors must be visited before distribution is performed. A decomposition strategy in the form of a heuristic is applied to solve the problem: the recipients and donors are clustered first, then a route is computed for each cluster, and resources are allocated to each recipient.

Another multi-objective problem is addressed in [24] to deliver both non-perishable and perishable items to demand points under uncertainty, deciding the location and number of relief centers that should be established at the demand points as well as the delivery means of the relief items post-disaster. A Mixed-Integer Non-Linear Programming (MINLP) model is formulated to minimise the total distance travelled, the maximum travelled distance between relief centers and demand points, and the total cost associated with acquiring the relief items and vehicles utilised as well as the inventory cost. This model is solved with GAMS software for small-scale instances, while larger instances are solved by a Grasshopper Optimization Algorithm (GOA) metaheuristic.

Ref. [25] addressed another multi-objective problem involving the COVID-19 pandemic. In this problem, multi-period collection and delivery of multiple products in a single open- and closed-loop Supply Chain (SC) is modelled through the formulation of transportation problems and the Pick-up and Delivery VRP (PDVRP). Here, the open- and closed-loop supply chain system involving reusable and non-reusable products operates from hybrid depots that produce and recycle the products through a forward and reverse flow. Transfer collection centers located in the affected COVID-19 areas act as intermediaries between the depots and the hospitals. Heterogeneous vehicles traverse the forward and reverse flows via the PDVRP from the transfer collection centers to the hospitals, delivering products through split delivery while collecting old products for reproduction. Various uncertain parameters are defined, involving vehicle cost, production cost, hospital demand, and returned products, when computing multiple decision variables focusing on the number and routing of vehicles, production and return of products, shipping, and inventory of products. Finally, a robust optimization approach is applied, where the Tchebycheff method is adopted to solve the complex problem.

A complex problem of relief distribution within the humanitarian logistics network and victim evacuation is addressed by [26]. This problem is based on the Facility Location Problem (FLP) and the VRP, where uncertain demand, transportation time, miscellaneous costs, injured victims, and facility capacity are considered. Moreover, the formulated problem involves stakeholders such as the suppliers of relief materials (e.g., charitable organizations), the distribution centers, the emergency centers that distribute the aid, the distribution units where the evacuated victims are located, and the hospitals to which they are sent for further treatment. Thus, the decisions are based on locating and distributing relief materials and allocating evacuated victims to respective routed vehicles. Multiple objectives are also addressed, where the total humanitarian logistics cost, the total time of relief operations, and the variation between the lower and upper bounds of the transportation cost of the distribution centers are minimised. The uncertainties in the form of inconsistencies and unclear and inaccurate information are captured through a neutrosophic fuzzy set. These uncertainties are prioritised and dealt with through the neutrosophic set and robust optimisation, respectively, where the latter deals with the uncertainties associated with worst-case planning involving victims, facility capacity, and relief supply.

The work by [27] similarly addressed the LRP through a Conditional Value at Risk with Regret (CVaR-R) bi-objective mixed nonlinear programming model dealing with relief distribution post-disaster. In this problem, the decisions include selecting which distribution centers to make operational among all existing distribution centers, the number of vehicles that should be assigned, the assignment of vehicles to respective demand points, and the allocation of relief aid delivered to each of these demand points. The concept of regret when making these decisions, due to inaccurate expectations of demand, allows for a novel chance constraint programming approach introduced through the CVaR-R measure. The regret value for each objective, (1) minimising the total waiting time of demand points and (2) minimising the total system cost, is measured by defining possible demand scenarios with respective probabilities. For each demand scenario, the difference between the ideal objective values (from the deterministic model) and the objective values computed given the demands in that scenario is determined, from which the worst-case scenarios are identified and applied to compute the CVaR-R. To solve the model, the Nash Bargaining Solution (NBS) approach is applied with the help of a hybrid Genetic Algorithm (GA), solving the single-objective LRP version of the problem (via the GA) as well as determining the Pareto frontier (via the Non-Dominated Sorting GA II (NSGA-II)) used to compute the NBS.

Finally, Ref. [28] also addressed the risk decision factor regarding relief distribution with uncertain travel time. This problem is formulated on a multi-level network via a mixed integer programming model to minimise the total arrival time of vehicles delivering relief materials to only selected demand points instead of satisfying the demand of all demand points. Furthermore, the near-optimal delivery route computed ensures that a particular service level is reached throughout the operation by formulating the associated constraints. By introducing the risk-averse approach, the objective function is adapted by incorporating a standard deviation term representing the risk of the decision under uncertain travel time; the other term is the expected total arrival time that needs to be minimised. Both are weighted to balance the importance of risk to the decision maker when computing the near-optimal decision. Here, the Variable Neighbourhood Search (VNS) is employed to solve the problem based on data obtained from the 2010 Haiti earthquake case study.

#### *2.2. Markov Decision Processes Model for Humanitarian Operations*

In this subsection, the current trends of VRP within the scope of MDP modelling are discussed. It can be derived from [17] that MDP modelling for VRPs in the setting of humanitarian operations is scarce; only [29] addressed such a problem in humanitarian operations. Ref. [29] looked at multiple humanitarian operations, such as delivery and search and rescue, by incorporating the Relief Assessment Team (RAT) and the Emergency Relief Team (ERT), with a Decision-Making Agent (DMA) coordinating the two. The problem is modelled as an MDP with multiple random parameters, such as the demand and the stability of the transport links. Here, the RAT is tasked to assess affected areas and dynamically report the demand situation for each zone. The ERT, on the other hand, is tasked to serve the zones assigned by the DMA. Meanwhile, decisions are composed by assigning various combinations of aid organisations to a specified affected zone, as well as routing decisions for both the ERT and RAT. Finally, Q-learning is applied to solve the problem involving a maximum of 7 RATs and 10 ERTs.

Other than the papers reviewed by [17], there are works such as [30], which plans evacuation routing for the last-minute disaster preparedness phase. In this problem, residents are evacuated prior to the disaster occurrence via buses that pick up stochastic resident evacuees at bus stops. The problem is modelled as a single-trip operation with a homogeneous fleet of vehicles within a finite horizon. Here, the action or decision is the assignment of the next pick-up point for each of the vehicles and the number of evacuees taken, suggesting a split delivery operation. Small instances of the problem are solved exactly using structured value iteration, while dynamic re-routing is applied for large-scale problems through a reduced network-flow Mixed-Integer Programming (MIP) model and a Robust MIP (RMIP) model. Beyond these aforementioned works, related work on modelling a VRP through the MDP framework for humanitarian operations is sparse.

Instead, the MDP is applied to humanitarian operations that revolve around coverage, allocation, path planning, and scheduling problems. Ref. [31] computed evacuation routes during a disaster by modelling the problem as an MDP. However, that work cannot be regarded as a VRP, as the target application is not specified in terms that would constitute a VRP, such as the vehicles' availability or capacity. Similarly, Ref. [32] used an MDP model to address the problem of clearing debris from blocked edges or roads in an optimal assignment with uncertain clearance resources. In the post-disaster event, the optimal decision was computed considering the delivery of aid or services to demand nodes. Ref. [33] addressed the problem of congestion in hospital facilities as well as the limited number of ambulances to rescue patients in the aftermath of a disaster. Considering the stochastic treatment time and transportation availability, the decisions for such planning were to allocate ambulances to affected patients and to choose which medical facility each ambulance should head for. Two types of vehicles are considered: an ambulance dedicated to a location, and another ambulance with locational flexibility; the latter might suggest split delivery. However, neither capacity nor comprehensive routing was considered in this problem. Dynamic patient treatment times were updated, and the problem was solved with proposed heuristics that applied a myopic approach with policy improvement.

#### *2.3. Markov Decision Processes Model for Industrial Problems and the Application of Approximate Dynamic Programming*

Apart from applications in humanitarian operations, the MDP modelling framework has also been adopted for industrial problems since 2000. Interestingly, most of the early works are dedicated to solving the single VRP with stochastic demand, such as in [34–37]. The study in [38] is among the first to look into the theoretical aspects of the pick-up and delivery single-vehicle routing problem with stochastic requests using MDP modelling. The focus on multi-vehicle problems is seen later in [39–41] for a VRP with stochastic demands. On the other hand, Ref. [42] deviated from stochastic demands by addressing the problem with stochastic customers or stochastic requests. In this type of application, diverse solution strategies were adopted, in contrast with the problem with stochastic demands, which mainly applied lookahead approaches such as the PDS-RA.

Meanwhile, in [14,43] the route-based MDP was introduced, as opposed to the conventional formulation of an MDP addressing a VRP, by incorporating a dynamic route plan at every decision epoch. This framework was applied in [44] to solve the problem of maximising the number of services within the same period for stochastic service requests using Value Function Approximation (VFA). The problem was to decide which new stochastic requests should be accepted and which should be postponed to the next operation period. Apart from the fixed working time, the multi-period operation could be considered a multi-trip operation. VFA was applied as one of the ADP solution approaches, allowing for online computation while dealing with the decision and outcome spaces. Through the VFA, the state space was segregated based on common features. The resulting MDP model led to a huge number of decision points, which was dealt with by introducing a classification of multi-period operations. The application of VFA as shown here is commonly known to ignore some details of the state due to its aggregation mechanism. The same authors in [45] alleviated this problem by proposing a hybrid of two ADP approaches, the VFA and the Rollout Algorithm (RA), to address the stochastic requests of a single vehicle-routing problem. The authors later addressed the same problem in [46] with a different hybrid mechanism of VFA and RA by having the second part of the simulation horizon driven by a base policy dictated by the VFA. This method was effective in increasing the solution quality while reducing the computation time. A single vehicle-routing problem was also addressed by [47] with regard to taxi routing when searching for a passenger. In this MDP model, the transition probability is derived from the taxi–passenger matching probability on a link. Here, an enhanced value iteration was applied to solve the problem by reformulating Bellman's equation into a series of matrix operations.

#### *2.4. Approximate Dynamic Programming in Machine Learning*

As can be seen from the examples mentioned above, aside from [47], ADP is commonly adopted as a solution approach for problems formulated as MDPs. The ADP emerged as a means to solve real and practical MDP problem models, since the complexity of an MDP model increases exponentially with the explosion of the state, outcome, and action spaces [48,49]. Such problems render the exact solution prohibitive, as shown in [50,51]. Ref. [52] later coined the term ADP, which is otherwise known as RL or neuro-dynamic programming. Interest in ADP sparked around the mid-1990s when it was written on extensively [53,54], although the ideas and concepts date back to the 1950s. For instance, Ref. [55] described the concept of approximating the value of a position in a chess game based on the state of the board, likening it to an experienced player evaluating a move roughly but not based on all possible scenarios. Later, Ref. [12], when introducing dynamic programming, hinted at the idea with the mention of value space and approximation. Then, Ref. [56] not only applied the idea of [55] in evaluating a position move but also showcased the practical use of the lookahead approach while approximating the value of a move in the game of checkers. At the same time, he considered the possibility of remembering all the positions and moves, much like the concept of a dynamic lookup table. The work was further improved in [57] with regard to a tree-search lookahead with an improved alpha–beta pruning scheme based on the memorisation of board positions, known as the book-learning procedure. The basic ideas from the aforementioned works were explored further, resulting in the pivotal works of [58–60], which formed what is known as the modern era of ADP [53]. Following that, Refs. [61–63], through experiments with the game of Backgammon, demonstrated the practical application of ADP to solving real and complex problems.

#### *2.5. Rollout Algorithm and Post-Decision State Rollout Algorithm in Approximate Dynamic Programming*

The authors take advantage of the well-known operations of the VRP from the humanitarian operations perspective and propose that a model based on MDP be adopted for the MDDVRPSRC, with a known variant of RA (PDS-RA) applied to solve the problem online. The intuition for the lookahead approach as elaborated above can be traced back to the work of [55], who thought of evaluating a move by thinking ahead in a lookahead manner. The RA is based on the same principle, refined through the quality of the lookahead, which depends on the base policy applied. Furthermore, the horizon of the lookahead plays an important role, as mentioned by [64,65]. Another crucial aspect of the rollout is the Monte Carlo sampling that enables multiple lookaheads to account for the stochastic parameters of the simulated episode. Finally, the number of simulations performed also contributes to a better mean approximation of the state or PDS value. The RA was proposed by [48] and belongs to the policy iteration class of reinforcement learning. Note that [34] was among the first to develop a specialised RA for solving sequencing problems. This work was extended in [35] for a single-vehicle VRP. In [49], the performance of the rollout policy was compared to optimistic approximate policy iteration for the single-vehicle VRP, with the former showing better performance. A cyclic heuristic was applied as the base policy for the RA in [66] to solve a VRP with stochastic demand (VRPSD) as well as a Travelling Salesman Problem (TSP) with stochastic travel time. The former was also addressed by [36], who presented a two-stage stochastic programming solution approach and compared it to the online approach using RA. Up to this point, the outcome space had been addressed exactly for all these problems. However, Ref. [37] tackled this challenging computation by applying Monte Carlo sampling instead in a single- and two-stage RA. The computation was shown to have shrunk by 65% when compared with [35].

In [39], the multi-vehicle VRPSD was finally addressed. The challenges that come with the multi-vehicle problem were addressed through clustering, dedicating a set of different customers to each vehicle. An offline a priori route is computed for each vehicle, and RA is only applied if a route failure occurs. Although RA was not applied from the outset, this work highlighted the potential of a clustering mechanism when dealing with a multi-vehicle VRP. This was seen in [40], who made the clustering mechanism dynamic at every decision point as the status of the stochastic demands was updated. The problem was then solved by proposed variants of RA making use of the extension of the MDP framework, the pre-decision state (PRE) and post-decision state (PDS), advocated by [67]. Here, the base policy applied was the fixed-route heuristic. This heuristic was relaxed to form the known restocking fixed-route heuristic in [41], from which policies were iterated and obtained through a local search. These policies were evaluated with the help of the optimal value computed by dynamic programming, which helped prune the search space in the search for a more effective rollout base policy. This base policy was then applied in the RA to obtain a better policy for the problem. The same concept was applied in [68] for the same problem, apart from the duration limit for a single vehicle, where a hybrid of backward and forward recursive dynamic programming was applied instead. In the work of [42], the authors applied RA with the cheapest insertion heuristic as the base policy for a single-vehicle VRP with stochastic requests. The solution was compared with solutions obtained from a greedy heuristic and VFA, respectively. Meanwhile, Ref. [69] proposed a framework for applying RA to dynamic and stochastic VRPs as an ADP solution approach. In [45], PDS-RA was applied with the VFA method driving the decision rule for the lookahead.

#### *2.6. Rollout Algorithm as Matheuristic*

An RA whose base heuristic is driven by the policy of applying a mathematical programming method can be regarded as a matheuristic. Among those who applied such a technique are [15,70–73]. In the work of [70], the authors addressed a scheduling problem with RA, where the decision rule was obtained using a quadratic programming approach. In [71], Mixed Integer Linear Programming (MILP) was applied to obtain the decision rule for the RA in solving an inventory routing problem with a single vehicle. Such approaches were also seen in [72,73] for solving the inventory routing problem. In other work, Ref. [15] proposed a matheuristic RA method to solve the multi-vehicle routing problem for humanitarian delivery operations by reducing the two-stage stochastic programming model to two reduced models that were dependent on the vehicle's mode of operation: replenishment or serving an emergency shelter.

#### *2.7. Knowledge Gap and Research Contribution*

Referring to Section 2.2, it is clear that literature addressing humanitarian VRPs through machine learning is still lacking. Meanwhile, from the industrial problem's perspective, the available literature shows that solving the VRP via reinforcement learning and RA is largely limited to problems with stochastic demands, customers, or requests. No such approach has addressed the problem with stochastic road capacity. The authors intend to fill this gap.

Furthermore, addressing the VRP through MDP formulation and the RA solution method is often limited to single-vehicle problems, as seen in the works [35–37,41,42,45,49,66,68]. This is evident even in works published in the last five years. Those who solve multi-vehicle problems, such as [39–41], often resort to clustering or decomposition of the vehicles over sets of different customers. Such a method, with the exception of a dynamic decomposition method, cannot be performed when addressing the MDDVRPSRC, as the road capacity, and thus the route, is uncertain. Furthermore, it is not clear how a dynamic clustering or decomposition method could address the problem with stochastic road capacity while also accounting for split delivery and multi-trip operations. To the best of the authors' knowledge, no literature has proposed building the decision rule on the go, guided by a constructive heuristic as the policy, while performing the rollout. Similarly, no known variants of such heuristics have been introduced to allow for decision ruling on the go. Here, the authors intend to fill these research gaps by proposing the application of the proposed heuristics (TBIH-1–TBIH-5) within the PDS-RA adopted from [13]. In terms of the modelling approach, to the best of the authors' knowledge, no literature addresses the MDDVRPSRC, whether in the form of a mathematical programming model or an MDP formulation, especially in humanitarian operations settings. Most MDP formulations for VRPs address multi-trip operations only for en-route failure occasions. For example, work such as [74] addressed the split operation only when triggered by route failure due to stochastic demands. Literature such as [44,45], which addressed multi-period operations, does not include the possibility of split delivery operations. Furthermore, most of these models only describe operations involving a single vehicle.

We differ from the existing literature by intentionally allowing multi-trip operation as well as split delivery to address the limited number of delivery trucks during a disaster event, rather than as a result of route failure. Addressing multiple vehicles leads to an explosion of the action space and is often very difficult to solve without resorting to a clustering approach. The authors therefore present how, with a moderate number of destination nodes in various simulated road networks, the MDDVRPSRC can be solved without utilising a clustering or dynamic clustering method.

#### **3. MDDVRPSRC MDP Model**

#### *3.1. Problem Statement*

The problem of the MDDVRPSRC focuses on the delivery problem, one of the crucial humanitarian operations during a disaster and post-disaster event. Here, the road network is represented as an undirected incomplete graph *G* = (*H*, *E*). *H* is the set of nodes in graph *G* such that *H* = *D* ∪ *S* ∪ *N*, where *D*, *S*, and *N* are the sets of depots, emergency shelters, and connecting nodes, respectively. The connecting nodes represent the junctions connecting the edges (*i*, *j*) ∈ *E*, which represent the roads, such that *E* = {(*i*, *j*) : *i*, *j* ∈ *N* ∪ *S* ∪ *D*, *i* ≠ *j*}. Note that emergency shelters whose demands are satisfied can act as connecting nodes for vehicles to travel through.

In the MDDVRPSRC, medical supplies are to be delivered to temporarily erected emergency shelters, *s* ∈ *S*, with different demands, *ws*, by a homogeneous fleet of vehicles, *m* ∈ *M*, with capacity *qm*. The delivery of medical supplies is conducted via split delivery to account for the limited number of vehicles during the sudden onset of a disaster. Vehicles are allowed to perform multi-trip deliveries throughout the humanitarian operations to satisfy all demands. The vehicle capacity, *qm*, is replenished to full capacity, *Q*, as soon as a vehicle returns to a depot, *d* ∈ *D*. All vehicles must be dispatched throughout the operation until the total demand is satisfied, since there is no guarantee that any assigned vehicle will reach its designated emergency shelter. On this point, a vehicle is considered stranded when it is unable to travel to the next node on the way to a depot for *ST* consecutive decision points once the total demand is satisfied. Unless such an event occurs, all vehicles must return to a depot. However, they are not constrained as to which depot they should return to, providing the flexibility needed for humanitarian operations.

Throughout the operation, the road capacity, *ri*,*j*, is uncertain. The mean road capacity for each road, (*i*, *j*), as well as its capacity distribution, deteriorates as the operation time progresses for all damaged roads [15]. The deteriorating mean road capacity is due to the damage inflicted by subsequent post-disaster events, such as the tremors of an earthquake. This is simulated as outward radial circles originating from the epicentre of the earthquake [15]. For more information on the road capacity distribution, road damage determination, and the simulation of earthquake tremors, refer to the work of [15].

#### *3.2. Agent Solving MDDVRPSRC in Reinforcement Learning*

In reinforcement learning, an agent learns to make decisions based on given information about the system. The information is given in the form of the state representation of the system as well as the transition of the state when a constrained action or decision is made. An agent learns to make optimal decisions through the series of interactions it has made by making decisions and obtaining rewards in return. An MDP model formulates how an agent sees the system through: (i) the state representation, (ii) how the system transitions, (iii) the constrained actions or decisions it is allowed to make, and (iv) the reward it receives from making a decision. In this work, the agent perceives the problem as the MDDVRPSRC and makes a near-optimal decision at every decision point. This agent then becomes part of the DSS for delivering critical medical supplies during a disaster.
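The four MDP components above can be summarised, purely as an illustrative sketch, in an abstract interface. The class and method names below are ours, not from the model:

```python
from abc import ABC, abstractmethod
from typing import Any, List

class RoutingMDP(ABC):
    """How the agent sees the system: (i) states, (ii) transitions,
    (iii) constrained decisions, and (iv) rewards."""

    @abstractmethod
    def feasible_decisions(self, state: Any) -> List[Any]:
        """The constrained decision space A(s_k)."""

    @abstractmethod
    def transition(self, state: Any, decision: Any) -> Any:
        """Deterministic PRE -> PDS transition."""

    @abstractmethod
    def observe(self, pds: Any) -> Any:
        """Stochastic PDS -> PRE transition once new information arrives."""

    @abstractmethod
    def reward(self, state: Any, decision: Any) -> float:
        """Reward received for making a decision in a state."""
```

A concrete MDDVRPSRC environment would then implement these four methods.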

The agent learns of the aforementioned delivery operations of the MDDVRPSRC through a series of discrete states it observes at each decision point, *k*, beginning with the initial state, *s*0, until the end state, *sK*, representing the end of delivery operations. The Pre-Decision State (PRE), *sk*, represents the state of the operations at decision point *k* prior to the decision, *ak*, computed by the agent. The state observed by the agent after decision *ak* is made is denoted as the Post-Decision State (PDS), $s_k^{a_k}$. These states (PRE and PDS) are described through the state variables *l*, *t*, *q*, *w*, *e*, and *r*, respectively:


In the MDDVRPSRC MDP formulation, an agent computes a decision, *ak*, to send a fleet of vehicles to destination nodes based on the state that it observes, *sk*. Once the decision *ak* is executed, the PRE transitions deterministically to the PDS, $s_k^{a_k}$, and waits for the next decision point, *k* + 1, which is triggered at $T_{k+1}$ by the arrival of one or more vehicles $m \in M'_{k+1}$ simultaneously at an emergency shelter *s* ∈ *S*, connecting node *n* ∈ *N*, or depot *d* ∈ *D*.

Once decision point *k* + 1 is triggered, the random road capacities $\hat{r}_{i,j}$ are observed for all roads (*i*, *j*) ∈ *E* through the dynamic update from the locals at the arrival destinations. At this point, the PDS transitions stochastically to the PRE, *sk*+1, and the PRE state variables *ek* and *rk* are updated. This new random information is now known to the agent (via the updates of *ek* and *rk*), who then uses it to compute the next decision, *ak*+1, for all vehicles. Once the decision *ak*+1 is computed, the PRE transitions to the PDS, $s_{k+1}^{a_{k+1}}$. At the same time, demand *wh*, ∀*h* ∈ *S*, is served, or the vehicle's capacity is replenished, or neither (when a vehicle arrives at a connecting node *h* ∈ *H* \ {*S* ∪ *D*}), depending on where the vehicles arrived. Hence, the PDS variables representing the next destination, $l^{a_k}$, arrival time to the next destination, $t^{a_k}$, capacity, $q^{a_k}$, status of edges travelled, $e^{a_k}$, and demand status, $w^{a_k}$, are updated accordingly. The MDDVRPSRC formulation that follows is developed by referring to [75] and the work of [13]. All parameters, state variables, and decision variables are listed in Tables 1 and 2.
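As a rough illustration of the state representation described above, the six state variables could be grouped as follows. The field names mirror the model's state variables, but the container choices are ours:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Edge = Tuple[int, int]

@dataclass
class State:
    """One PRE or PDS state s_k = [l, t, q, w, r, e] (illustrative sketch)."""
    l: List[int]        # next/current destination node of each vehicle
    t: List[float]      # arrival time of each vehicle
    q: List[int]        # remaining capacity of each vehicle
    w: Dict[int, int]   # outstanding demand at each shelter node
    r: Dict[Edge, int]  # last observed capacity of each road (i, j)
    e: Dict[Edge, List[int]] = field(default_factory=dict)  # vehicles per travelled edge

# A toy initial state: two vehicles at depots 0 and 1, shelters 2 and 3.
s0 = State(l=[0, 1], t=[0.0, 0.0], q=[10, 10],
           w={2: 7, 3: 4}, r={(0, 2): 2, (1, 3): 1})
```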

**Table 1.** MDDVRPSRC: Parameters.


**Table 2.** MDDVRPSRC: State and Decision Variables.



#### *3.3. MDDVRPSRC Formulation*

The pre-decision state (PRE), *sk*, is a multi-dimensional vector consisting of vectors representing each state variable. Within this vector are the state variables defined in Table 2:

$$s_k = [l_k, t_k, q_k, w_k, r_k, e_k].\tag{1}$$

The PDS representation shares the same features as the PRE, differing only in notation and in that its variables are updated after decision *ak* is made:

$$s\_k^{a\_k} = [l^{a\_k}, t^{a\_k}, q^{a\_k}, w^{a\_k}, r^{a\_k}, e^{a\_k}].\tag{2}$$

Once the decision point is triggered, based on the minimum current values within the arrival time vector $t^{a_k}$, at:

$$T_k = \min_{m \in M,\ t_m^{a_k} \ge 0} t_m^{a_k},\tag{3}$$

the respective decision/assignment is computed for all vehicles, including those that arrived, identified as shown in Equation (4) below:

$$M'_k = \underset{m \in M,\ t_m^{a_k} \ge 0}{\arg\min}\ t_m^{a_k}.\tag{4}$$

Here, a vehicle that is still en route to its destination at decision point *k* is denoted by $m \in M \setminus M'_k$.
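Assuming arrival times are held in a simple list indexed by vehicle (an illustrative encoding, not the model's), Equations (3) and (4) amount to a min and an argmin over the nonnegative entries:

```python
def next_decision_point(t_pds):
    """Eq. (3): the next decision epoch T_k is the smallest nonnegative
    arrival time; Eq. (4): M'_k is the set of vehicles arriving at exactly
    that instant.  Vehicles flagged with negative times (e.g. finished or
    stranded) are skipped.  Sketch only; the list encoding is ours."""
    T_k = min(t for t in t_pds if t >= 0)
    M_k = [m for m, t in enumerate(t_pds) if t == T_k]
    return T_k, M_k

# Vehicles 0 and 2 arrive together at time 3.5; vehicle 3 is flagged done.
T_k, M_k = next_decision_point([3.5, 6.0, 3.5, -100.0])
```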

In the MDDVRPSRC, the decision *ak* is an |*M*|-dimensional vector in a decision space (a set) *A*(*sk*) given the state *sk*. The decision involves assigning the next destination for all vehicles at every decision point *k*,

$$a_k = a \in H^{|M|} = [a_0, a_1, \dots, a_{|M|-1}] \in A(s_k).\tag{5}$$

However, the decision space *A*(*sk*) for the MDDVRPSRC is too large to obtain a good solution within reasonable computational effort. Therefore, for every decision point *k*, the decision by the agent is computed as proposed in [15], where the reduced decision set for a single vehicle, *Om*, is as defined for the rollout in Table 1:

$$A(s_k) = \Bigg\{ a_k \in H^{|M|} : \begin{cases} a_m = j, & \forall\{m \in M'_k : i = l_m,\ j \neq i,\ r_{i,j} > 0,\ w_j \neq 0,\ q_m \neq 0,\ j \in O_m : O_m \subset S\}\\ a_m = j, & \forall\{m \in M'_k : i = l_m,\ j \neq i,\ r_{i,j} > 0,\ \sum_{h \in H} w_h \neq 0,\ q_m = 0,\ j \in O_m : O_m \subset D\}\\ a_m = j, & \forall\{m \in M'_k : i = l_m,\ j \neq i,\ r_{i,j} > 0,\ \sum_{h \in H} w_h = 0,\ i \notin D,\ j \in O_m : O_m \subset D\}\\ a_m = i, & \forall\{m \in M'_k : i = l_m,\ r_{i,j} = 0\ \forall (i,j) \in E\}\\ a_m = l_m, & \forall\{m \in M \setminus M'_k\} \end{cases} \Bigg\}\tag{6}$$

The state *sk* transitions deterministically to the PDS, $s_k^{a_k}$:

$$s_k^{a_k} = S^{M,a}(s_k, a_k),\tag{7}$$

where the decision is made by the agent ($l^{a_k} = a_k$). This is where the next destination, $l^{a_k}$, arrival time, $t^{a_k}$, vehicle capacity, $q^{a_k}$, travelled-edge status, $e^{a_k}$, as well as the demand status, $w^{a_k}$, are updated. At this point, the stochastic road capacity is not yet known; hence $r^{a_k}$ is not updated.

The time of arrival $t_m^{a_k} \in \mathbb{R}$, ∀*m* ∈ *M*, is updated to:

$$t_m^{a_k} = \begin{cases}
t_m - Z, & \forall\{m \in M'_k : l_m \in D,\ \sum_{h \in H} w_h = 0\}\\
t_m - Z, & \forall\{m \in M'_k : i = l_m,\ r_{i,j} = 0\ \forall (i,j) \in E,\ |F(m,i)| = ST\}\\
t_{wait}, & \forall\{m \in M'_k : a_m = l_m,\ l_m \notin D,\ w_h = 0\ \forall h \in H\}\\
T_k + t_{l_m,a_m}, & \forall\{m \in M'_k : l_m \neq a_m,\ l_m \notin D,\ \sum_{h \in H} w_h = 0,\ p_{l_m,a_m} = 0\}\\
T_k + t_{l_m,a_m} + \frac{t_{l_m,a_m} \times p_{l_m,a_m}}{10}, & \forall\{m \in M'_k : l_m \neq a_m,\ l_m \notin D,\ \sum_{h \in H} w_h = 0,\ p_{l_m,a_m} \neq 0\}\\
t_{wait}, & \forall\{m \in M'_k : a_m = l_m,\ \sum_{h \in H} w_h \neq 0\}\\
T_k + t_{l_m,a_m}, & \forall\{m \in M'_k : l_m \neq a_m,\ \sum_{h \in H} w_h \neq 0,\ p_{l_m,a_m} = 0\}\\
T_k + t_{l_m,a_m} + \frac{t_{l_m,a_m} \times p_{l_m,a_m}}{10}, & \forall\{m \in M'_k : l_m \neq a_m,\ \sum_{h \in H} w_h \neq 0,\ p_{l_m,a_m} \neq 0\}\\
t_m, & \text{otherwise}
\end{cases}\tag{8}$$

where *twait* is defined as:

$$t_{wait} = \begin{cases} t_2, & t'_k = (t_1, t_2, \dots, t_n),\ t_i \in t'_k : t_i \ge 0,\ t_i - t_{i-1} > 0,\ n \ge 2\\ \min_{\forall (i,j) \in E} t_{i,j}, & \text{otherwise} \end{cases},\tag{9}$$

and $t'_k$ is an *n*-tuple with arrival times in increasing order.
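Under our reading of Equation (9), *twait* is the second-smallest distinct nonnegative arrival time when one exists, and the smallest edge travel time otherwise; a sketch (the dict encoding of travel times is ours):

```python
def waiting_time(arrival_times, edge_travel_times):
    """Sketch of Eq. (9): a waiting vehicle re-enters the schedule at the
    second-smallest distinct nonnegative arrival time if one exists, and
    otherwise after the smallest edge travel time."""
    distinct = sorted({t for t in arrival_times if t >= 0})
    if len(distinct) >= 2:
        return distinct[1]
    return min(edge_travel_times.values())
```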

Meanwhile, the capacity of all vehicles *m* ∈ *M* is updated to:

$$q_m^{a_k} = \begin{cases} Q, & \forall m \in \{M'_k : l_m \in D\}\\ \max(q_m - w_{l_m}, 0), & \forall m \in \{M'_k : l_m \in S\}\\ q_m, & \text{otherwise} \end{cases}.\tag{10}$$

The travelled edges $e^{a_k}$ are updated ∀(*i*, *j*) ∈ *E*, ∀*m* ∈ *M*:

$$e_{i,j,m}^{a_k} = \begin{cases} 1, & \forall m \in \{M'_k : i = l_m,\ j = a_m,\ i \neq j\}\\ e_{i,j,m}, & \text{otherwise} \end{cases}.\tag{11}$$

Finally, the demand of the emergency shelters is also updated ∀*h* ∈ *H*:

$$w_h^{a_k} = \begin{cases} \max(w_{l_m} - q_m, 0), & \forall m \in \{M'_k : h = l_m,\ l_m \in S\}\\ w_h, & \text{otherwise} \end{cases}.\tag{12}$$
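The deterministic updates of Equations (10) and (12) can be sketched together. The function below is illustrative (plain lists and dicts instead of the model's vectors) and processes arriving vehicles sequentially:

```python
def serve_or_replenish(q, w, l, arrived, depots, shelters, Q):
    """Deterministic PRE -> PDS bookkeeping sketch for Eqs. (10) and (12):
    replenish to full capacity Q at a depot; at a shelter, unload and serve
    via split delivery.  q (per-vehicle capacity), w (per-shelter demand)
    and l (per-vehicle location) are updated in place."""
    for m in arrived:
        node = l[m]
        if node in depots:
            q[m] = Q                          # Eq. (10): refill at depot
        elif node in shelters:
            q_old, w_old = q[m], w[node]
            q[m] = max(q_old - w_old, 0)      # Eq. (10): unload
            w[node] = max(w_old - q_old, 0)   # Eq. (12): serve demand
```

Note that both updates read the pre-decision values, so the order of the two assignments inside the shelter branch matters only through the saved `q_old`/`w_old` pair.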

At decision point $T_{k+1}$, the uncertainty of the road capacity $\hat{r}_{i,j}$, ∀(*i*, *j*) ∈ *E*, is now observed by the agent, which leads to the transition from PDS to PRE, *sk*+1:

$$s\_{k+1} = S^{M,W}(s\_k^{a\_k}, \mathcal{W}\_{k+1}).\tag{13}$$

The road capacity at this point is no longer uncertain ($r_{i,j}$, ∀(*i*, *j*) ∈ *E*) since it has been sampled and is known to the vehicles that have arrived at their destinations. This information is thus known to the agent. The next destination $l_m = l_m^{a_k}$ (∀*m* ∈ *M*), the arrival time $t_m = t_m^{a_k}$ (∀*m* ∈ *M*), the vehicle capacity $q_m = q_m^{a_k}$ (∀*m* ∈ *M*), as well as the shelter demand $w_h = w_h^{a_k}$ (∀*h* ∈ *H*) remain the same.

The travelled-edges status, *ek*, is updated ∀*m* ∈ *M*, ∀(*i*, *j*) ∈ *E*:

$$e_{i,j,m} = \begin{cases} 0, & \forall m \in M'_k\\ e_{i,j,m}^{a_k}, & \text{otherwise} \end{cases}.\tag{14}$$

The road capacity *rk* is at this point observed and updated:

$$r_{i,j} = \max\Big(\hat{r}_{i,j} - \sum_{m \in M} e_k(i, j, m),\ 0\Big) \quad \forall (i,j) \in E,\tag{15}$$

where the random road capacity $\hat{r}_{i,j}$, ∀(*i*, *j*) ∈ *E*, is obtained from a Poisson distribution as described in [15].
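A sketch of this stochastic observation step, combining a Poisson capacity sample (a stand-in for the distribution of [15], whose parameters we do not reproduce) with the traversal deduction of Equation (15), with capacity floored at zero:

```python
import math
import random

def observe_road_capacities(edges, mean_capacity, vehicles_on_edge):
    """Sample each road's capacity from a Poisson distribution, then
    subtract the vehicles currently on the edge, flooring at zero
    (Eq. (15)).  The dict encodings keyed by edge (i, j) are illustrative."""
    r = {}
    for edge in edges:
        # Poisson sample via Knuth's multiplication method (stdlib only).
        threshold = math.exp(-mean_capacity[edge])
        count, p = 0, 1.0
        while True:
            p *= random.random()
            if p <= threshold:
                break
            count += 1
        r_hat = count
        r[edge] = max(r_hat - vehicles_on_edge.get(edge, 0), 0)
    return r
```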

When transitioning to the PDS, $s_k^{a_k}$, the agent receives a reward *Rk*(*sk*, *ak*) contributed by all vehicles *m* ∈ *M* at decision point *k*:

$$R_k(s_k, a_k) = \sum_{m \in M} R_{k,m}(s_k, a_k),\tag{16}$$

where *Rk*,*m*(*sk*, *ak*), ∀*m* ∈ *M* is given by:

$$R_{k,m}(s_k, a_k) = \begin{cases}
0, & \forall m \in \{M \setminus M'_k\}\\
0, & \forall m \in \{M'_k : i = l_m,\ l_m = a_m,\ \sum_{h \in H} w_h = 0\}\\
\max(w_{l_m} - q_m, 0) \times G_2 - c_{l_m,a_m} - t_{l_m,a_m}, & \forall m \in \{M'_k : q_m \neq 0,\ l_m \in S,\ w_{l_m} > q_m,\ p_{l_m,a_m} = 0\}\\
\max(q_m - w_{l_m}, 0) \times G_2 - c_{l_m,a_m} - t_{l_m,a_m}, & \forall m \in \{M'_k : q_m \neq 0,\ l_m \in S,\ w_{l_m} < q_m,\ p_{l_m,a_m} = 0\}\\
\max(w_{l_m} - q_m, 0) \times G_2 - c_{l_m,a_m} - \left(t_{l_m,a_m} + \frac{t_{l_m,a_m} \times p_{l_m,a_m}}{10}\right), & \forall m \in \{M'_k : q_m \neq 0,\ l_m \in S,\ w_{l_m} > q_m,\ p_{l_m,a_m} \neq 0\}\\
\max(q_m - w_{l_m}, 0) \times G_2 - c_{l_m,a_m} - \left(t_{l_m,a_m} + \frac{t_{l_m,a_m} \times p_{l_m,a_m}}{10}\right), & \forall m \in \{M'_k : q_m \neq 0,\ l_m \in S,\ w_{l_m} < q_m,\ p_{l_m,a_m} \neq 0\}\\
G_2 - c_{l_m,a_m} - t_{l_m,a_m}, & \forall m \in \{M'_k : l_m \neq a_m,\ q_m \neq 0,\ l_m \in S,\ w_{l_m} = q_m,\ p_{l_m,a_m} = 0\}\\
& \text{or } \forall m \in \{M'_k : l_m \neq a_m,\ q_m = 0,\ a_m \in D,\ \sum_{h \in H} w_h \neq 0,\ p_{l_m,a_m} = 0\}\\
& \text{or } \forall m \in \{M'_k : l_m \neq a_m,\ a_m \in D,\ \sum_{h \in H} w_h = 0,\ p_{l_m,a_m} = 0\}\\
G_2 - c_{l_m,a_m} - \left(t_{l_m,a_m} + \frac{t_{l_m,a_m} \times p_{l_m,a_m}}{10}\right), & \forall m \in \{M'_k : l_m \neq a_m,\ q_m \neq 0,\ l_m \in S,\ w_{l_m} = q_m,\ p_{l_m,a_m} \neq 0\}\\
& \text{or } \forall m \in \{M'_k : l_m \neq a_m,\ q_m = 0,\ a_m \in D,\ \sum_{h \in H} w_h \neq 0,\ p_{l_m,a_m} \neq 0\}\\
& \text{or } \forall m \in \{M'_k : l_m \neq a_m,\ a_m \in D,\ \sum_{h \in H} w_h = 0,\ p_{l_m,a_m} \neq 0\}\\
-c_{l_m,a_m} - t_{l_m,a_m}, & \text{otherwise},\ p_{l_m,a_m} = 0,\ \forall m \in M'_k\\
-c_{l_m,a_m} - \left(t_{l_m,a_m} + \frac{t_{l_m,a_m} \times p_{l_m,a_m}}{10}\right), & \text{otherwise},\ p_{l_m,a_m} \neq 0,\ \forall m \in M'_k
\end{cases}\tag{17}$$

Here, a policy denotes the guiding principle on which decisions are based. For example, a policy $\pi^B \in \Pi$ could be a heuristic if a decision function $A^{\pi^B}(s_k)$ is mapped from that heuristic. In this model formulation, the decision $a_k = A^{\pi}(s_k) \in A(s_k)$ [67] is selected among other potential decisions in the decision space $A(s_k)$ following a certain policy $\pi \in \Pi$, such that the decision rule function is $A^{\pi}(s_k) : s_k \to a_k$.

The objective (Equation (18)) is to find an optimal policy $\pi^\star$ such that the expected total reward is maximised (the objective function for the Bellman optimality equation [67]):

$$\max_{\pi \in \Pi} \quad \mathbb{E}^{\pi} \left\{ \sum_{k=0}^{K} R_k(s_k, A^{\pi}(s_k)) \right\}.\tag{18}$$

Hence:

$$\pi^{\star} = \underset{\pi \in \Pi}{\arg\max} \quad \mathbb{E}^{\pi} \left\{ \sum_{k=0}^{K} R_k(s_k, A^{\pi}(s_k)) \right\},\tag{19}$$

where for every decision point *k*, the optimal decision $a_k^\star$ is chosen: $a_k^\star = A^{\pi^\star}(s_k)$, by following the optimal policy $\pi^\star$.

#### **4. MDDVRPSRC Solution Approach**

To solve an MDP is to seek an optimal policy $\pi^\star$. Through this optimal policy, every decision made by the agent is optimal: $A^{\pi^\star}(s_k) : s_k \to a_k^\star$, as stated by the principle of optimality [12]. To obtain the optimal policy, the Bellman equation is solved [12] such that the optimal decision $a_k^\star$ is computed at every decision point *k* for each state *sk* observed by the agent. This series of optimally computed decisions is said to be guided by the optimal policy $\pi^\star \in \Pi$, and thereby the problem formulated as an MDP is solved. To compute the optimal decision, $a_k^\star$, the Bellman equation can be written as Equation (20) [67]:

$$a_k^\star = \underset{a_k \in A(s_k)}{\arg\max} \{ R_k(s_k, a_k) + \lambda^k \mathbb{E}\{ V_{k+1}(s_{k+1}) \} \}.\tag{20}$$

Due to the curse of dimensionality, as seen in Equation (20), computing an optimal decision is often intractable. To alleviate the curse associated with the outcome or transition space, the PDS was introduced [67] (Equation (21)), and the equation is rewritten as Equation (22):

$$V_k^{a_k}(s_k^{a_k}) = \mathbb{E}\{V_{k+1}(s_{k+1})\},\tag{21}$$

$$a\_k^\* = \underset{a\_k \in A(s\_k)}{\text{arg}\max} \left( R\_k(s\_k, a\_k) + \lambda^k V\_k^{a\_k}(s\_k^{a\_k}) \right). \tag{22}$$

Even with the PDS introduced as above, the computation of an optimal decision is usually challenging, especially when computing the value of the PDS, $V_k^{a_k}(s_k^{a_k})$, in Equation (22). The ADP approach instead approximates the value of the PDS, in order to deal with the large state space of the MDP. Through the ADP approach, Equation (22) is rewritten as Equation (23), where the value of the PDS is approximated and the decision is thus computed to near optimality instead:

$$\widetilde{a_k^\star} = \underset{a_k \in A(s_k)}{\arg\max} \left( R_k(s_k, a_k) + \lambda^k \overline{V_k^{a_k}}(s_k^{a_k}) \right).\tag{23}$$

For the MDDVRPSRC, this equation is still challenging to solve since the decision space *A*(*sk*) in Equation (23) is too large for a practical number of vehicles. Furthermore, the decisions consist of combinations of vehicle assignments that would require a long rollout horizon as well as a large number of Monte Carlo simulations. The concern is that the reward obtained for one vehicle may exaggerate the value of the decision for all vehicles collectively if computed prematurely. This was seen in initial experiments where, given the limited computational capabilities of the machine, inefficient vehicle assignments resulted.
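For illustration, the per-decision computation of Equation (23) is a straightforward argmax once the approximate PDS values are available. The callables below are hypothetical stand-ins for the model's reward function and value approximation:

```python
def near_optimal_decision(decisions, reward, pds_value, lam, k):
    """Eq. (23): pick the decision maximising immediate reward plus the
    discounted approximate PDS value.  `reward(a)` and `pds_value(a)` are
    caller-supplied callables (illustrative interface)."""
    return max(decisions, key=lambda a: reward(a) + (lam ** k) * pds_value(a))
```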

Alternatively, to cope with this challenge, the near-optimal decision can be further approximated as in Equation (24), as proposed in [15]:

$$\widetilde{a_k^\star} \in H^{|M|} \approx [\widetilde{a_1^\star}, \widetilde{a_2^\star}, \dots, \widetilde{a_{|M|}^\star}] \quad \forall k,\tag{24}$$

where the decision space, made up of combinations of vehicle assignments (which could be astronomical in number), is restricted to a decision space of possible assignments for each vehicle, *Om*, instead, as shown in Equation (25):

$$\widetilde{a_m^\star} = \underset{a_m \in O_m}{\arg\max} \left( R_k(s_k, a_k) + \lambda^k \overline{V_k^{a_m}}(s_k^{a_m}) \right) \quad \forall (k,m).\tag{25}$$

Even with such measures, given the machine's capabilities, the computation is limited to a small rollout horizon and number of Monte Carlo simulations. However, it is shown in this work that the decisions computed are applicable to this type of problem.

To compute Equation (25), the PDS value $\overline{V_k^{a_m}}(s_k^{a_m})$ is approximated using the PDS-RA, as proposed by [40]. However, different base heuristics can be applied to solve MDP problems with characteristics such as those in this work. Finally, this approach is applied in accordance with the model for the agent's decision in Equation (6).

#### *4.1. PDS-RA Algorithm*

PDS-RA belongs to the family of rollout algorithms first introduced in [40] as an ADP solution algorithm for the dynamic VRP with stochastic demand. PDS-RA takes advantage of the PDS structure, which alleviates the problem associated with the outcome (transition) space, and thus reduces the number of rollout executions needed, compared with the conventional RA, to approximate the value of the PDS in a modified Bellman equation effectively. The general PDS-RA is given in Algorithm 1. Here, the PDS reached within a rollout (simulation) is denoted *s<sup>a</sup>* to avoid confusion with the real-time PDS *s<sub>k</sub><sup>a<sub>k</sub></sup>* observed by the agent. In this algorithm, the value of the PDS, *V<sub>k</sub><sup>a<sub>k</sub></sup>*(*s<sub>k</sub><sup>a<sub>k</sub></sup>*), associated with each *a<sub>k</sub>* ∈ *A*(*s<sub>k</sub>*) is approximated by *V̄<sub>k</sub><sup>a<sub>k</sub></sup>*(*s<sub>k</sub><sup>a<sub>k</sub></sup>*). For each possible PDS associated with the next decision *a<sub>k</sub>* ∈ *A*(*s<sub>k</sub>*), the PDS-RA is executed, and at the end of the execution the approximated PDS value *V̄<sub>k</sub><sup>a<sub>k</sub></sup>*(*s<sub>k</sub><sup>a<sub>k</sub></sup>*) is obtained. In every execution of the PDS-RA, the base policy *π*<sub>B(*s<sub>k</sub><sup>a<sub>k</sub></sup>*)</sub> ∈ Π is first assigned (the policy of applying heuristic B); the decision rule function *π*<sub>B(*s<sub>k</sub><sup>a<sub>k</sub></sup>*)</sub>(*s<sub>k</sub>*) for the rollout simulation is then computed from heuristic B given the PDS *s<sub>k</sub><sup>a<sub>k</sub></sup>*. Based on this specific decision rule (which, for a VRP, is normally the assignment of vehicles to their next destinations), the lookahead is performed as far as horizon *K*.

Transiting from the simulated PRE to the PDS is enabled by referring to *π*<sub>B(*s<sub>k</sub><sup>a<sub>k</sub></sup>*)</sub>(*s<sub>k</sub>*) when making a decision during the lookahead. This decision is followed by a stochastic transition from the PDS back to a PRE, *s<sub>k</sub>* = *S<sup>M,W</sup>*(*s<sup>a</sup>*, *W<sub>k+1</sub>*(*ω*(*k* + 1))), in the lookahead simulation. Here, the random information on the capacities of all roads, *W<sub>k+1</sub>*(*ω*(*k* + 1)), is obtained by sampling *ω*(*k* + 1) ∈ Ω from a known distribution as part of the Monte Carlo simulation approach. Sampling *ω*(*k* + 1) ∈ Ω avoids the exhaustive computation over all random outcome transitions in the outcome space Ω, as first observed by [37] in her application of the RA.

Each time a transition occurs within the lookahead along the horizon, the rewards *R<sub>k</sub>*(*s<sub>k</sub>*, *a<sub>k</sub>*) are accumulated. At the end of a one-episode lookahead simulation, the sum of rollout rewards *B.n*(*π*<sub>B(*s<sub>k</sub><sup>a<sub>k</sub></sup>*)</sub>, *k* + 1, *K*) is obtained. This value is used to update the approximated PDS value of the respective potential decision *a<sub>k</sub>* through an incremental-mean update (Algorithm 1, line 14). The process repeats for *N* Monte Carlo simulation episodes, after which the resulting approximated PDS value *V̄<sub>k,N</sub><sup>a<sub>k</sub></sup>*(*s<sub>k</sub><sup>a<sub>k</sub></sup>*) is considered a good enough estimate for the respective decision *a<sub>k</sub>*. This approximated PDS value is mapped to the respective decision by a function *f* : *a<sub>k</sub>* ← *V̄<sub>k,N</sub><sup>a<sub>k</sub></sup>*(*s<sub>k</sub><sup>a<sub>k</sub></sup>*).
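The incremental-mean update of Algorithm 1 (line 14) can be sketched as follows. This is a minimal illustration, not the paper's implementation: `simulate_episode` stands in for one full lookahead episode under the base policy and is a hypothetical callback returning that episode's summed (discounted) rollout reward.

```python
def estimate_pds_value(simulate_episode, n_episodes=100):
    """Approximate the value of one post-decision state by averaging
    N rollout returns with the incremental mean of Algorithm 1, line 14:
    V_n = V_{n-1} + (B_n - V_{n-1}) / n.

    `simulate_episode` is an assumed callback that runs one lookahead
    episode under the base policy and returns its accumulated reward.
    """
    v = 0.0
    for n in range(1, n_episodes + 1):
        reward = simulate_episode()
        v += (reward - v) / n  # incremental mean: no reward list is stored
    return v
```

The incremental form avoids storing all *N* episode returns, which matters when the rollout is executed once per candidate decision at every real-time decision epoch.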

**Algorithm 1** Compute *V̄<sub>k</sub><sup>a<sub>k</sub></sup>*(*s<sub>k</sub><sup>a<sub>k</sub></sup>*) (as shown in [75], based on the PDS-RA proposed by [13] and highlighted in [15]).

**Require:** *s<sub>k</sub>*, *λ*, *a<sub>k</sub>*
**Ensure:** *V̄<sub>k</sub><sup>a<sub>k</sub></sup>*(*s<sub>k</sub><sup>a<sub>k</sub></sup>*)
1: Initialise *n*, *k*, *R<sub>k</sub>*(*s<sub>k</sub>*, *a<sub>k</sub>*), *B<sup>n</sup>*
2: *s<sub>k</sub><sup>a<sub>k</sub></sup>* = *S<sup>M,a</sup>*(*s<sub>k</sub>*, *a<sub>k</sub>*)
3: *π*<sub>B(*s<sub>k</sub><sup>a<sub>k</sub></sup>*)</sub>(*s<sub>k</sub>*) ← *π*<sub>B(*s<sub>k</sub><sup>a<sub>k</sub></sup>*)</sub> ← B(*s<sub>k</sub><sup>a<sub>k</sub></sup>*)
4: **while** *n* ≤ *N* **do**
5: *s<sup>a</sup>* ← *s<sub>k</sub><sup>a<sub>k</sub></sup>*
6: **while** *k* ≠ *K* **do**
7: *R<sub>k</sub>*(*s<sub>k</sub>*, *a<sub>k</sub>*) = *R<sub>k</sub>*(*s<sub>k</sub>*, *a<sub>k</sub>*) + *λ<sup>k</sup>R<sub>k</sub>*(*s<sub>k</sub>*, *a<sub>k</sub>*)
8: *s<sub>k</sub>* = *S<sup>M,W</sup>*(*s<sup>a</sup>*, *W<sub>k+1</sub>*(*ω*(*k* + 1)))
9: *a<sub>k</sub>* ← *π*<sub>B(*s<sub>k</sub><sup>a<sub>k</sub></sup>*)</sub>(*s<sub>k</sub>*)
10: *s<sup>a</sup>* = *S<sup>M,a</sup>*(*s<sub>k</sub>*, *a<sub>k</sub>*)
11: *k* = *k* + 1
12: **end while**
13: *B.n*(*π*<sub>B(*s<sub>k</sub><sup>a<sub>k</sub></sup>*)</sub>, *k* + 1, *K*) ← *R<sub>k</sub>*(*s<sub>k</sub>*, *a<sub>k</sub>*)
14: *V̄<sub>k,n</sub><sup>a<sub>k</sub></sup>*(*s<sub>k</sub><sup>a<sub>k</sub></sup>*) = *V̄<sub>k,n−1</sub><sup>a<sub>k</sub></sup>*(*s<sub>k</sub><sup>a<sub>k</sub></sup>*) + (1/*n*)[*B.n*(*π*<sub>B(*s<sub>k</sub><sup>a<sub>k</sub></sup>*)</sub>, *k* + 1, *K*) − *V̄<sub>k,n−1</sub><sup>a<sub>k</sub></sup>*(*s<sub>k</sub><sup>a<sub>k</sub></sup>*)]
15: *n* = *n* + 1
16: **end while**
17: **return** *V̄<sub>k,N</sub><sup>a<sub>k</sub></sup>*(*s<sub>k</sub><sup>a<sub>k</sub></sup>*)

The PDS-RA is then executed for the next possible decision *a<sub>k</sub>* ∈ *A*(*s<sub>k</sub>*), after which the process repeats. Finally, once the PDS-RA has been executed ∀*a<sub>k</sub>* ∈ *A*(*s<sub>k</sub>*), Equation (23) can be computed based on all the approximated PDS values associated with each respective potential next decision. Based on the computation of Equation (23), a decision is then made by the agent.

It should be noted that Algorithm 1 applies the rollout and looks into the future of each potential decision, where each decision *a<sub>k</sub>* comprises the assignments of all vehicles, *a<sub>m</sub>* = *a<sub>k</sub>*[*m*], ∀*m* ∈ *M*, simultaneously per PDS-RA. The base policy *π*<sub>B(*s<sub>k</sub><sup>a<sub>k</sub></sup>*)</sub> guides the construction of the decision function *π*<sub>B(*s<sub>k</sub><sup>a<sub>k</sub></sup>*)</sub>(*s<sub>k</sub>*) in one go, per PDS-RA execution, ∀*a<sub>k</sub>* ∈ *A*(*s<sub>k</sub>*).

In the applied solution, the rollout is instead executed for every potential next destination of each vehicle, *a<sub>m</sub>*, ∀*m* ∈ *M*, and the value of each PDS associated with the potential next destination is computed by the PDS-RA, such that near-optimal assignments *a<sub>m</sub><sup>∗</sup>* are computed in Equation (25). These near-optimal individual assignments form the near-optimal decision as described in Equation (24). The base policy *π*<sub>B(*s<sub>k</sub><sup>a<sub>k</sub></sup>*)</sub> is based on an iterative policy *π*<sub>B(*s<sub>k</sub>*)</sub> applied at each lookahead decision point *s<sub>k</sub>* to construct the decision rule *π*<sub>B(*s<sub>k</sub><sup>a<sub>k</sub></sup>*)</sub> on the go using a constructive base heuristic B. The decision rule is constructed on the go such that *π*<sub>B(*s<sub>k</sub><sup>a<sub>k</sub></sup>*)</sub> : *s<sub>k</sub>* → *η*<sup>*π*<sub>B(*s<sub>k</sub>*)</sub></sup>(*s<sub>k</sub>*), where *η*<sup>*π*<sub>B(*s<sub>k</sub>*)</sub></sup>(*s<sub>k</sub>*) is the decision rule computed at every decision epoch *k* in the lookahead horizon by applying heuristic B to the rollout state *s<sub>k</sub>* according to the iterative policy *π*<sub>B(*s<sub>k</sub>*)</sub>.
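The final selection in line 22 of Algorithms 2 and 3, where each vehicle's assignment is chosen by maximising immediate reward plus discounted approximated PDS value, can be sketched as follows. The dict-based lookups `reward` and `pds_value` are illustrative stand-ins for the quantities the rollout produces, not the paper's data structures.

```python
def select_assignment(candidates, reward, pds_value, discount=1.0):
    """Pick a near-optimal next destination for one vehicle:
    a*_m = argmax over a_m of R(s_k, a_m) + lambda^k * f(a_m),
    mirroring line 22 of Algorithms 2 and 3.

    `reward` maps a candidate destination to its immediate reward and
    `pds_value` maps it to the approximated PDS value f(a_m) produced
    by the rollout (both assumed, simplified structures)."""
    return max(candidates,
               key=lambda a: reward[a] + discount * pds_value[a])
```

Repeating this selection for every vehicle *m* ∈ *M* assembles the near-optimal joint decision of Equation (24) from the individual assignments of Equation (25).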

In Algorithm 2, for example, CPLEX (denoted *CPLEX*) is applied as the base heuristic instead. Here, CPLEX is run on a Stochastic Integer Linear Programming (SILP) version of the reduced MDDVRPSRC to construct *π*<sub>CP(*s<sub>k</sub><sup>a<sub>k</sub></sup>*)</sub>(*s<sub>k</sub>*) on the go given the current rollout state *s<sub>k</sub>*, resulting in *η*<sup>*π*<sub>CP(*s<sub>k</sub>*)</sub></sup>(*s<sub>k</sub>*). Since the policy here is to apply CPLEX, the decision rule is denoted as following the policy *π*<sub>CP(*s<sub>k</sub><sup>a<sub>k</sub></sup>*)</sub>.

A detailed explanation of this matheuristic is given in [15] (Algorithm 2). The authors use this exact configuration (with *CPLEX* as the base heuristic) as a benchmark where tractable. As the general solution approach, Algorithm 3 is used. The authors first introduce the TBIH-1 heuristic (Algorithm 4), based on a purely random insertion for the non-obvious decisions. Other constructive heuristics are then applied dynamically (TBIH-2 and TBIH-3), extended from TBIH-1, to construct the decision rule *π*<sub>B(*s<sub>k</sub><sup>a<sub>k</sub></sup>*)</sub> on the go as shown in Algorithm 3, in contrast to Algorithm 2. In addition to these constructive heuristics, the authors propose two further variants (TBIH-4 and TBIH-5) for this problem by introducing an exploitation mechanism into TBIH-2 and TBIH-3, respectively.

**Algorithm 2** Matheuristic Extended from Algorithm 1 to Compute *a<sub>m</sub><sup>∗</sup>* [15].

**Require:** *s<sub>k</sub>*, *λ*, *a<sub>k</sub>*
**Ensure:** *a<sub>m</sub><sup>∗</sup>*
1: **for** *a<sub>m</sub>* ∈ *O<sub>m</sub>* at decision point *k* **do**
2: Initialise *n*, *k*, *R<sub>k</sub>*(*s<sub>k</sub>*, *a<sub>k</sub>*), *B<sup>n</sup>*
3: **while** *n* ≤ *N* **do**
4: *s<sub>k</sub><sup>a<sub>m</sub></sup>* = *S<sup>M,a</sup>*(*s<sub>k</sub>*, *a<sub>m</sub>*)
5: *s<sup>a</sup>* ← *s<sub>k</sub><sup>a<sub>m</sub></sup>*
6: **while** *k* ≠ *K* **do**
7: *R<sub>k</sub>*(*s<sub>k</sub>*, *a<sub>k</sub>*) = *R<sub>k</sub>*(*s<sub>k</sub>*, *a<sub>k</sub>*) + *λ<sup>k</sup>R<sub>k</sub>*(*s<sub>k</sub>*, *a<sub>k</sub>*)
8: *s<sub>k</sub>* = *S<sup>M,W</sup>*(*s<sup>a</sup>*, *W<sub>k+1</sub>*(*ω*(*k* + 1)))
9: *π*<sub>CP<sub>k</sub></sub>(*s<sub>k</sub>*) ← *CPLEX*(*s<sub>k</sub>*)
10: *η*<sup>*π*<sub>B(*s<sub>k</sub>*)</sub></sup> ← *π*<sub>CP<sub>k</sub></sub>(*s<sub>k</sub>*)
11: *π*<sub>B(*s<sub>k</sub><sup>a<sub>k</sub></sup>*)</sub> : *s<sub>k</sub>* → *η*<sup>*π*<sub>B(*s<sub>k</sub>*)</sub></sup>(*s<sub>k</sub>*)
12: *a<sub>m</sub>* ← *π*<sub>B(*s<sub>k</sub><sup>a<sub>k</sub></sup>*)</sub>(*s<sub>k</sub>*)
13: *s<sup>a</sup>* = *S<sup>M,a</sup>*(*s<sub>k</sub>*, *a<sub>m</sub>*)
14: *k* = *k* + 1
15: **end while**
16: *B.n*(*π*<sub>B(*s<sub>k</sub><sup>a<sub>k</sub></sup>*)</sub>, *k* + 1, *K*) ← *R<sub>k</sub>*(*s<sub>k</sub>*, *a<sub>k</sub>*)
17: *V̄<sub>k,n</sub><sup>a<sub>m</sub></sup>*(*s<sub>k</sub><sup>a<sub>m</sub></sup>*) = *V̄<sub>k,n−1</sub><sup>a<sub>m</sub></sup>*(*s<sub>k</sub><sup>a<sub>m</sub></sup>*) + (1/*n*)[*B.n*(*π*<sub>B(*s<sub>k</sub><sup>a<sub>k</sub></sup>*)</sub>, *k* + 1, *K*) − *V̄<sub>k,n−1</sub><sup>a<sub>m</sub></sup>*(*s<sub>k</sub><sup>a<sub>m</sub></sup>*)]
18: *n* = *n* + 1
19: **end while**
20: *f* : *a<sub>m</sub>* ← *V̄<sub>k,N</sub><sup>a<sub>m</sub></sup>*(*s<sub>k</sub><sup>a<sub>m</sub></sup>*)
21: **end for**
22: *a<sub>m</sub><sup>∗</sup>* = arg max<sub>*a<sub>m</sub>*∈*O<sub>m</sub>*</sub> (*R<sub>k</sub>*(*s<sub>k</sub>*, *a<sub>k</sub>*) + *λ<sup>k</sup> f*(*a<sub>m</sub>*)) ∀(*k*, *m*)
23: **return** *a<sub>m</sub><sup>∗</sup>*

**Algorithm 3** Compute *a<sub>m</sub><sup>∗</sup>* based on [15] using another Base Heuristic B.

**Require:** *s<sub>k</sub>*, *λ*, *a<sub>k</sub>*
**Ensure:** *a<sub>m</sub><sup>∗</sup>*
1: **for** *a<sub>m</sub>* ∈ *O<sub>m</sub>* at decision point *k* **do**
2: Initialise *n*, *k*, *R<sub>k</sub>*(*s<sub>k</sub>*, *a<sub>k</sub>*), *B<sup>n</sup>*
3: **while** *n* ≤ *N* **do**
4: *s<sub>k</sub><sup>a<sub>m</sub></sup>* = *S<sup>M,a</sup>*(*s<sub>k</sub>*, *a<sub>m</sub>*)
5: *s<sup>a</sup>* ← *s<sub>k</sub><sup>a<sub>m</sub></sup>*
6: **while** *k* ≠ *K* **do**
7: *R<sub>k</sub>*(*s<sub>k</sub>*, *a<sub>k</sub>*) = *R<sub>k</sub>*(*s<sub>k</sub>*, *a<sub>k</sub>*) + *λ<sup>k</sup>R<sub>k</sub>*(*s<sub>k</sub>*, *a<sub>k</sub>*)
8: *s<sub>k</sub>* = *S<sup>M,W</sup>*(*s<sup>a</sup>*, *W<sub>k+1</sub>*(*ω*(*k* + 1)))
9: *π*<sub>B(*s<sub>k</sub>*)</sub> ← B(*s<sub>k</sub>*)
10: *η*<sup>*π*<sub>B(*s<sub>k</sub>*)</sub></sup> ← *π*<sub>B(*s<sub>k</sub>*)</sub>
11: *π*<sub>B(*s<sub>k</sub><sup>a<sub>k</sub></sup>*)</sub> : *s<sub>k</sub>* → *η*<sup>*π*<sub>B(*s<sub>k</sub>*)</sub></sup>(*s<sub>k</sub>*)
12: *a<sub>m</sub>* ← *π*<sub>B(*s<sub>k</sub><sup>a<sub>k</sub></sup>*)</sub>(*s<sub>k</sub>*)
13: *s<sup>a</sup>* = *S<sup>M,a</sup>*(*s<sub>k</sub>*, *a<sub>m</sub>*)
14: *k* = *k* + 1
15: **end while**
16: *B.n*(*π*<sub>B(*s<sub>k</sub><sup>a<sub>k</sub></sup>*)</sub>, *k* + 1, *K*) ← *R<sub>k</sub>*(*s<sub>k</sub>*, *a<sub>k</sub>*)
17: *V̄<sub>k,n</sub><sup>a<sub>m</sub></sup>*(*s<sub>k</sub><sup>a<sub>m</sub></sup>*) = *V̄<sub>k,n−1</sub><sup>a<sub>m</sub></sup>*(*s<sub>k</sub><sup>a<sub>m</sub></sup>*) + (1/*n*)[*B.n*(*π*<sub>B(*s<sub>k</sub><sup>a<sub>k</sub></sup>*)</sub>, *k* + 1, *K*) − *V̄<sub>k,n−1</sub><sup>a<sub>m</sub></sup>*(*s<sub>k</sub><sup>a<sub>m</sub></sup>*)]
18: *n* = *n* + 1
19: **end while**
20: *f* : *a<sub>m</sub>* ← *V̄<sub>k,N</sub><sup>a<sub>m</sub></sup>*(*s<sub>k</sub><sup>a<sub>m</sub></sup>*)
21: **end for**
22: *a<sub>m</sub><sup>∗</sup>* = arg max<sub>*a<sub>m</sub>*∈*O<sub>m</sub>*</sub> (*R<sub>k</sub>*(*s<sub>k</sub>*, *a<sub>k</sub>*) + *λ<sup>k</sup> f*(*a<sub>m</sub>*)) ∀(*k*, *m*)
23: **return** *a<sub>m</sub><sup>∗</sup>*

#### *4.2. Teach Base Insertion Heuristic (TBIH-1)*

In this section, the base heuristics applied are described in general (Algorithm 4); each is elaborated in the subsections that follow. TBIH-1, TBIH-2, TBIH-3, TBIH-4, and TBIH-5 are the heuristics applied in this work both to validate the model and to cross-compare the performance of each variant.

Each heuristic applied here follows the same main structure: (i) a teaching part (TP) and (ii) a seeking part (SP).

In the TP, the obvious decisions are chosen without running any heuristics to search for the best next destination. These obvious decisions are stated in Equation (6) and applied to each vehicle:


**Algorithm 4** TBIH-1 and General Structure Algorithm. (Note: |*M<sup>k</sup>*| ≤ 1 in rollout.)

**Require:** *s<sub>k</sub>*, *M<sup>k</sup>*, *M*\*M<sup>k</sup>*
**Ensure:** *Decision* during lookahead
1: update *q<sub>m</sub>*, ∀*m* ∈ *M* and *w<sub>h</sub>*, ∀*h* ∈ *H* as in Equations (10) and (12)
2: unserved shelters vector, *US* = [*h*]<sub>∀*h*∈*H* : *w<sub>h</sub>* ≠ 0</sub>
3: **for** *m* ∈ *M<sup>k</sup>* **do**
4: *i* = *l<sub>m</sub>*
5: potential next destination vector, *next* = [*h*]<sub>∀*h*∈*H* : (*i*,*h*) ∈ *E*, *r<sub>i,h</sub>* > 0</sub>
6: init empty lists *nextS*, *nextD*, *nextND*, *Decision*
7: *nextS* = [*i* : *i* ∈ (*next* ∩ *US*)]
8: *nextD* = [*i* : *i* ∈ (*next* ∩ *D*)]
9: *nextND* = [*i* : *i* ∈ (*next* \ *D*)]
10: **if** *q<sub>m</sub>* ≠ 0 AND len(*nextS*) != 0 **then**
11: **if** len(*nextS*) != 1 **then**
12: *a<sub>m</sub>* = random.choice(*nextS*)
13: **while** len(*nextS*) != 0 **do**
14: **if** *r<sub>i,a<sub>m</sub></sub>* > 0 **then**
15: *Decision*.append(*a<sub>m</sub>*)
16: *r<sub>i,a<sub>m</sub></sub>* = max(*r<sub>i,a<sub>m</sub></sub>* − 1, 0)
17: break
18: **else**
19: *nextS*.remove(*a<sub>m</sub>*)
20: **if** len(*nextS*) != 0 **then**
21: *a<sub>m</sub>* = random.choice(*nextS*)
22: **else**
23: continue
24: **end if**
25: **end if**
26: **end while**
27: **if** len(*Decision*) != 0 **then**
28: break
29: **else**
30: continue
31: **end if**
32: **else**
33: *a<sub>m</sub>* = random.choice(*nextS*)
34: **if** *r<sub>i,a<sub>m</sub></sub>* > 0 **then**
35: *Decision*.append(*a<sub>m</sub>*)
36: *r<sub>i,a<sub>m</sub></sub>* = max(*r<sub>i,a<sub>m</sub></sub>* − 1, 0)
37: break
38: **else**
39: continue
40: **end if**
41: **end if**
42: **else if** *q<sub>m</sub>* == 0 AND len(*nextD*) != 0 **then**
43: **if** len(*nextD*) != 1 **then**
44: *a<sub>m</sub>* = random.choice(*nextD*)
45: **while** len(*nextD*) != 0 **do**
46: **if** *r<sub>i,a<sub>m</sub></sub>* > 0 **then**
47: *Decision*.append(*a<sub>m</sub>*)
48: *r<sub>i,a<sub>m</sub></sub>* = max(*r<sub>i,a<sub>m</sub></sub>* − 1, 0)
49: break
⋮




These are considered obvious decisions on which computational effort should not be focused. Only when none of the above TP decisions is applicable (no obvious decisions) is the heuristic applied. Ideally, none of these obvious decisions would be specified; instead, the agent should figure out and learn the obvious decisions from the rewards obtained. However, such an ideal mechanism would require an enormous number of rollout episodes, each with a horizon extending to the termination state. For practical purposes, limited by computational power, the obvious decisions are filtered out to avoid extensive computation. Hence the term "teach" in TBIH-1's teaching part (TP).

For the SP, a purely random selection of the next destination can be applied, as in TBIH-1 (Algorithm 4, line 122). In the proposed TBIH-1, the obvious decisions are inserted by "teaching", while a non-obvious decision is made by purely random selection among the possible next destinations. The general structure of TBIH-1 is described in Algorithm 4, where the SP is shown in lines 121–147.

The TP consists of updating the capacities of all vehicles as well as the demands of the shelters (Algorithm 4, line 1). This is performed so that the selected decision is based on the current status of demand and capacity, which are otherwise updated/observed by the agent during the transition from PRE to PDS. The next part involves determining the potential next destinations *j* for vehicle *m* and classifying these destinations as emergency shelters, depots, or non-depot nodes (lines 5–9). The obvious decision selection then follows, as described in Equation (6) (lines 10–120). The SP is executed if none of the obvious decisions is suitable (lines 121–147). For en route vehicles, the decision is to remain bound for their current destination (lines 149–153). Finally, a decision is selected and returned by the algorithm.
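The classification step in lines 5–9 of Algorithm 4 can be sketched as follows. The data structures are illustrative simplifications, not the paper's notation: `road_cap` maps each neighbour to the residual capacity of the edge from the vehicle's current position, rather than indexing a full edge-capacity matrix.

```python
def classify_next(neighbours, road_cap, unserved, depots):
    """Split the reachable neighbours of a vehicle's current position
    into shelters with demand (nextS), depots (nextD), and remaining
    non-depot nodes (nextND), as in lines 5-9 of Algorithm 4.

    `road_cap` is an assumed per-neighbour residual road capacity."""
    nxt = [j for j in neighbours if road_cap.get(j, 0) > 0]
    next_s = [j for j in nxt if j in unserved]   # next ∩ US
    next_d = [j for j in nxt if j in depots]     # next ∩ D
    next_nd = [j for j in nxt if j not in depots]  # next \ D
    return next_s, next_d, next_nd
```

The TP then checks `next_s` and `next_d` against the vehicle's load *q<sub>m</sub>* for an obvious decision; only when both checks fail does the SP draw from `next_nd`.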

Instead of the purely random selection in Algorithm 4 (lines 121–147), more meaningful, guided approaches can be used to select the next destination in the SP. From here, extensions of Algorithm 4 (lines 121–147) show the possibility of inserting a better next destination into the route by applying a dynamic SIH-I1 (DSIH) (Algorithm 5) in TBIH-2 and a dynamic CW (DCW) in TBIH-3 (Algorithm 6), in their respective SPs. The authors also experiment with the proposed heuristics TBIH-4 (with an embedded Dynamic Lookahead SIH (DLASIH) in the SP, Algorithm 7) and TBIH-5 (with an embedded Dynamic Lookahead CW (DLACW) in the SP, Algorithm 8) to see whether both aforementioned heuristics can be enhanced further for better insertion.

#### *4.3. TBIH-2*

Among the first to develop an SIH was [76], whose work is based on the generalised savings algorithm. Ref. [77] then introduced three types of SIH to solve the VRP and scheduling problem with time windows. The proposed SIH (I1) constructs a route by considering two criteria: the first determines the best place for insertion, *c*<sub>1</sub>, based on *c*<sub>1,1</sub> and *c*<sub>1,2</sub>; the second considers the best un-routed node *υ* to be inserted, *c*<sub>2</sub>. For the VRP with time windows, the SIH (I1) is computed with the following equations [77]:

$$c_{1,1}(i,\upsilon,j) = c_{i,\upsilon} + c_{\upsilon,j} - \xi c_{i,j}, \quad \xi \ge 0,$$

$$c_{1,2}(i,\upsilon,j) = b_{j_{\upsilon}} - b_{j},$$

where *c*<sub>1,1</sub> is the generalised savings, and *c*<sub>1,2</sub> is the difference between the new service time of *j* after inserting *υ*, *b<sub>j<sub>υ</sub></sub>*, and its service time *b<sub>j</sub>* prior to the insertion. The best insertion place for *υ*, *c*(*i*(*υ*), *υ*, *j*(*υ*)), is then given by:

$$c(i(\upsilon), \upsilon, j(\upsilon)) = \min_{(i,\upsilon,j)} c_{1}(i,\upsilon,j),$$

where *c*<sup>1</sup> is given as:

$$c_{1} = \theta_{1} c_{1,1}(i,\upsilon,j) + \theta_{2} c_{1,2}(i,\upsilon,j), \quad \theta_{1} + \theta_{2} = 1.$$

Meanwhile, the best *υ* insertion criterion *c*<sup>2</sup> is given by:

$$c_{2}(i,\upsilon,j) = \zeta c_{0,\upsilon} - c_{1}(i,\upsilon,j), \quad \zeta \ge 0,$$

where node 0 in the formulation is the depot; *υ* is then chosen based on:

$$\upsilon^{*} : (i,\upsilon^{*},j) = \underset{(i,\upsilon,j)}{\arg\max}\; c_{2}(i,\upsilon,j).$$

Since the MDDVRPSRC does not consider time windows, *θ*<sub>2</sub> is set to zero and *c*<sub>1,2</sub> is therefore not considered; *c*<sub>1</sub> then reduces to the generalised savings of [76]. Furthermore, both *ζ* and *θ*<sub>1</sub> are set to one. Both node 0 and *i* are taken as the current position of vehicle *m* at *s<sub>k</sub>* during the lookahead. In the DSIH, the seed *j* is chosen randomly by looking one step ahead beyond the next destination of *m*, currently at *i*, from the set {*j* : (*υ*, *j*) ∈ *E*, *r<sub>υ,j</sub>* > 0, *j* ≠ *i*, ∀*j* ∈ *H*, ∀*υ* ∈ *N*(*i*)}, where *N*(*i*) denotes the set of immediate neighbours of *i*. This is illustrated in Figure 1a–c; road capacity is not considered here, to simplify the description of the DSIH. In Figure 1a, the current position of vehicle *m* is node 0 (not a depot), with capacity to serve. Node S9 is an emergency shelter with demand, while the rest are connecting nodes. The purpose of the DSIH is to treat the next possible destination from the current position as *υ*, with *j* as the seed, when constructing a route from node 0. In Figure 1b, the candidate seeds *j* are identified as node S9, node 4, and node 3; the seed *j* is then chosen randomly among these three. In Figure 1c, node 3 is chosen randomly as the seed *j*. Two nodes could then be inserted as *υ*: node 1 and node 5. After applying the SIH (I1) (without time-window consideration; *θ*<sub>1</sub> and *ζ* are set to one), node 5 is found to be the best inserted node, and the route *η*<sup>*π*<sub>B(*s<sub>k</sub>*)</sub></sup> is constructed as 0–5–3. The next destination of the vehicle from node 0, *a<sub>m</sub>* = *η*<sup>*π*<sub>B(*s<sub>k</sub>*)</sub></sup>(*s<sub>k</sub>*), is then node 5. This also means that at the lookahead decision point *k*, the lookahead route *π*<sub>B(*s<sub>k</sub><sup>a<sub>k</sub></sup>*)</sub> is constructed/updated on the go such that *π*<sub>B(*s<sub>k</sub><sup>a<sub>k</sub></sup>*)</sub> : *s<sub>k</sub>* → *η*<sup>*π*<sub>B(*s<sub>k</sub>*)</sub></sup>(*s<sub>k</sub>*) with heuristic B as per TBIH-2. In addition, when applying TBIH-2, the route-based MDP concept is applied [14]: at the lookahead decision point *k* + 1, the vehicle may not necessarily move to node 3 (*η*<sup>*π*<sub>B(*s<sub>k</sub>*)</sub></sup>(*s<sub>k+1</sub>*)) upon arriving at node 5, as the DSIH is executed anew at every decision point. In this example, road capacity is ignored to simplify the explanation of the DSIH used in the SP of TBIH-2. In reality, if edge (0, 1) has no road capacity available (*r*<sub>0,1</sub> = 0), then the SIH (I1) is not applied, as the only possible *υ* is node 5.
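A single DSIH step as described above can be sketched as follows. This is a simplified illustration under the stated parameter choices (*θ*<sub>2</sub> = 0, *θ*<sub>1</sub> = *ζ* = 1, node 0 = *i*); the `seed_candidates`, `insertables`, and symmetric `cost` dict are assumed, simplified structures, not the paper's notation.

```python
import random

def dsih_next(i, seed_candidates, insertables, cost, rng=random):
    """One DSIH step (TBIH-2's SP): pick a seed j at random among the
    one-step-ahead nodes, then choose the best node v to insert
    between the current position i and the seed using SIH(I1) with
    theta2 = 0 and theta1 = zeta = 1, so that
    c1(v) = c_{i,v} + c_{v,j} - c_{i,j} and c2(v) = c_{i,v} - c1(v).

    `insertables` maps each candidate seed to the nodes that could be
    inserted before it (an assumed, simplified structure)."""
    j = rng.choice(seed_candidates)

    def c1(v):
        return cost[(i, v)] + cost[(v, j)] - cost[(i, j)]

    # best v maximises c2 = c_{i,v} - c1(v), with node 0 taken as i
    v = max(insertables[j], key=lambda v: cost[(i, v)] - c1(v))
    return v, j
```

On the Figure 1 example (costs chosen so that inserting node 5 between node 0 and seed node 3 yields the larger *c*<sub>2</sub>), this returns node 5 as the next destination, matching the text.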

**Figure 1.** Performing the DSIH in TBIH-2 in an example network (**a**) with node 0 as current position of vehicle *m* and node S9 as an emergency shelter. The potential seeds *j* are considered in (**b**) and chosen randomly in (**c**). As a result, node 5 (*υ*) is inserted and route (0–5–3) is constructed.

The decision selection specific to the DSIH is highlighted in Algorithm 5. Here, the possible seed candidates are selected based on the neighbours of the potential destinations *υ* of vehicle *m* (lines 3–5). The seed is then randomly selected (line 7), from which the number of potential destinations *υ* is reduced (line 8). For each possible *υ*, the insertion criteria are evaluated according to the SIH (I1) (lines 9–10), and the insertion of *υ* is determined (lines 11–12), from which the decision is made and returned (line 18).


#### *4.4. TBIH-3*

The CW derives from the work of [78] as an alternative heuristic to the method proposed in [79] for solving the general VRP [80]. This algorithm, also known as the savings algorithm [81], is based on the savings computed for two non-depot nodes (*i*, *j*) in a complete graph (where all non-depot nodes have arcs connecting them to the depot). The savings is computed as *S<sub>i,j</sub>* = 2*c<sub>i,0</sub>* + 2*c<sub>j,0</sub>* − (*c<sub>i,0</sub>* + *c<sub>j,0</sub>* + *c<sub>i,j</sub>*), where 0 is the depot [81]. The route is then constructed by considering the savings of all non-depot node pairs in decreasing order, provided that the capacity constraint is respected and the connection between edges is allowed (normally in the theoretical application to a complete graph). Whenever the capacity constraint would be violated, a new route is constructed for the next vehicle in the same manner from the remaining savings pairs. The concept of the CW algorithm is illustrated in Figure 2 [82]. One of the few surveys on the CW algorithm for the VRP can be found in [83]; a good example of a CW application can be found in [84].
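The savings computation above can be sketched as follows; the `cost` dict keyed by unordered node pairs is an illustrative assumption, not the paper's notation.

```python
def clarke_wright_savings(depot, pairs, cost):
    """Compute and rank Clarke-Wright savings
    S_{i,j} = 2c_{i,0} + 2c_{j,0} - (c_{i,0} + c_{j,0} + c_{i,j})
            = c_{i,0} + c_{j,0} - c_{i,j}
    for the given candidate pairs, largest first.

    `cost` is an assumed symmetric cost dict keyed by frozensets."""
    def c(a, b):
        return cost[frozenset((a, b))]

    savings = {(i, j): c(i, depot) + c(j, depot) - c(i, j)
               for i, j in pairs}
    return sorted(savings.items(), key=lambda kv: kv[1], reverse=True)
```

The simplified form *c<sub>i,0</sub>* + *c<sub>j,0</sub>* − *c<sub>i,j</sub>* is algebraically identical to the two-round-trip expression: serving *i* and *j* on one route saves one out-and-back leg compared with serving them separately.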

**Figure 2.** Concept of the Savings Algorithm. Reprinted/adapted with permission from Ref. [82]. 2015, ProQuest Information and Learning Company.

In this work, the Dynamic Clarke and Wright algorithm (DCW) is proposed and applied. By replacing the random selection in the SP of TBIH-1 with the DCW, TBIH-1 is modified to form TBIH-3, which is used as one of the base heuristics in the execution of the PDS-RA. In the DCW, the route-based MDP approach [14] is adopted during the rollout, resulting in the on-the-go construction of the decision function *π*<sub>B(*s<sub>k</sub><sup>a<sub>k</sub></sup>*)</sub>. The idea is to apply the CW iteratively (*π*<sub>B(*s<sub>k</sub>*)</sub>) whenever a decision on a single vehicle's next assignment is required during the lookahead, given that TBIH-3 has been selected as the base heuristic; that is, the CW is applied whenever no obvious decision can be taken during a lookahead transition. At the point of decision, the current position *l<sub>m</sub>* is regarded as the single depot in the CW, while the neighbouring nodes *j* ∈ *O<sub>m</sub>* : *O<sub>m</sub>* ⊂ *H* are considered customers. An example of applying the DCW is illustrated in Figure 3a,b, where node 3 is the current position of vehicle *m* and a shelter in the example network is denoted as node S2. Using the CW, a route *η*<sup>*π*<sub>B(*s<sub>k</sub>*)</sub></sup> is constructed from the assumed depot (node 3) and back to the depot.

An example of a constructed route *η*<sup>*π*<sub>B(*s<sub>k</sub>*)</sub></sup> could be (3–5–1–4–3) or (3–4–1–5–3), or both if the edges (5, 1) and (4, 1) happen to have the same highest savings. *a<sub>m</sub>* would then be chosen as the first insertion of the chosen route, *π*<sub>B(*s<sub>k</sub><sup>a<sub>k</sub></sup>*)</sub> : *s<sub>k</sub>* → *η*<sup>*π*<sub>B(*s<sub>k</sub>*)</sub></sup>(*s<sub>k</sub>*), to transition within the lookahead horizon.

The algorithm for TBIH-3 is shown in Algorithm 6 as an extension of Algorithm 4 replacing lines 121–147. In this algorithm, a decision is computed when no obvious decision can be chosen. To execute the CW, all possible pairs (*j*, *k*) ∈ *E* are detected among the possible next-destination nodes in the list *nextND* (lines 3–4). If no such edges exist, a randomly selected node is chosen as the next destination of vehicle *m* (lines 5–9). Otherwise, the savings are computed for these edges and sorted in decreasing order (lines 11–12) prior to constructing the temporary decision function *η*<sup>*π*<sub>B(*s<sub>k</sub>*)</sub></sup> (line 16), which yields the on-the-go base decision function *π*<sub>B(*s<sub>k</sub><sup>a<sub>k</sub></sup>*)</sub> : *s<sub>k</sub>* → *η*<sup>*π*<sub>B(*s<sub>k</sub>*)</sub></sup>(*s<sub>k</sub>*) (line 17). Lines 148–156 are similar to Algorithm 4, where the computed decision is returned.
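Under the DCW's framing (current position as a temporary depot, reachable neighbours as customers), a minimal sketch of how one CW route yields the vehicle's next move might look like this. The end-to-end route merging is the classical savings mechanism; the `cost(a, b)` function and the assumption of a complete graph over the neighbours are illustrative simplifications, not the paper's implementation.

```python
def dcw_first_move(pos, customers, cost):
    """Sketch of the DCW step in TBIH-3: treat the current position
    `pos` as the depot, build a CW route over the neighbouring nodes,
    and return the route's first customer as the decision a_m.

    `cost(a, b)` is an assumed symmetric travel-cost function."""
    routes = {c: [c] for c in customers}  # start: one route per customer
    savings = sorted(
        ((cost(i, pos) + cost(j, pos) - cost(i, j), i, j)
         for i in customers for j in customers if i < j),
        reverse=True)
    for s, i, j in savings:
        ri, rj = routes[i], routes[j]
        if ri is rj:
            continue  # i and j already share a route
        # merge only at route ends, as in the classical savings algorithm
        if ri[-1] == i and rj[0] == j:
            merged = ri + rj
        elif rj[-1] == j and ri[0] == i:
            merged = rj + ri
        else:
            continue
        for c in merged:
            routes[c] = merged
    route = routes[customers[0]]
    return route[0]  # first leg of the (symmetric) constructed route
```

Because the DCW is re-run at every lookahead decision point under the route-based MDP concept [14], only this first leg of the constructed route is ever committed.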

**Figure 3.** Performing DCW in TBIH-3 in an example network (**a**) with node 3 as current position of vehicle *m* and node S2 as an emergency shelter. The components for performing CW are selected in (**b**).

**Algorithm 6** TBIH-3 Algorithm.

**Require:** *s<sub>k</sub>*, *M<sup>k</sup>*, *M*\*M<sup>k</sup>*, current position of vehicle
**Ensure:** *Decision*
1: Perform Algorithm 4 (lines 1–120)
2: **if** len(*nextND*) > 1 **then**
3: possible edges from the pairs that could be formed in *nextND*, *Pairs* = all 2-combinations of *nextND*
4: remove (*j*, *k*) ∈ *Pairs* if (*j*, *k*) ∉ *E*
5: **if** len(*Pairs*) == 0 **then**
6: *a<sub>m</sub>* = random.choice(*nextND*)
7: *Decision*.append(*a<sub>m</sub>*)
8: *r<sub>i,a<sub>m</sub></sub>* = max(*r<sub>i,a<sub>m</sub></sub>* − 1, 0)
9: break
10: **else**
11: compute savings: (*j*, *k*) ← *c<sub>i,j</sub>* + *c<sub>i,k</sub>* − *c<sub>j,k</sub>* ∀(*j*, *k*) ∈ *Pairs*
12: sort (*j*, *k*) ∈ *Pairs* by decreasing savings
13: **if** len(*Pairs*) == 1 **then**
14: *a<sub>m</sub>* = *j* : (*j*, *k*) ∈ *Pairs*
15: **else**
16: construct route *π*<sub>B(*s<sub>k</sub><sup>a<sub>k</sub></sup>*)</sub> : *s<sub>k</sub>* → *η*<sup>*π*<sub>B(*s<sub>k</sub>*)</sub></sup>(*s<sub>k</sub>*) from *i* = *l<sub>m</sub>* by inserting (*j*, *k*) ∈ *Pairs* as would be done in CW (decreasing savings)
17: *a<sub>m</sub>* ← *π*<sub>B(*s<sub>k</sub><sup>a<sub>k</sub></sup>*)</sub>(*s<sub>k</sub>*)
18: **end if**
19: *Decision*.append(*a<sub>m</sub>*)
20: *r<sub>i,a<sub>m</sub></sub>* = max(*r<sub>i,a<sub>m</sub></sub>* − 1, 0)
21: break
22: **end if**
23: **else**
24: *Decision*.extend(*nextND*)
25: break
26: **end if**
27: continue Algorithm 4 (lines 148–156)
**return** *Decision*

#### *4.5. TBIH-4*

The application of the SIH in the MDDVRPSRC rollout is quite clear, given that the chosen "seed customer" is the main driver of the method. For the MDDVRPSRC, the authors concur with [85] that choosing an appropriate seed is very important for insertion heuristics. In this problem, depending on the capacity status of vehicle *m*, the seed is either a depot in the route (to replenish capacity) or an emergency shelter in the route (to serve a shelter). This differs from Algorithm 5, where the seed is randomly chosen among the neighbours of the potential destinations *υ*. In Figure 1a–c, it is clear that more can be done to guide the vehicles towards fulfilling their task. In these figures, it could be argued that the selection of node 5 (ignoring the road-capacity condition of the network), which might occur through the random seed selection of Algorithm 5, may not be the best choice. Instead, node 1 could be the better choice, as it would lead to serving node S9. In TBIH-4 (Algorithm 7), the potential seeds *j* are therefore reserved for nodes that are depots, emergency shelters, or neighbours of either. In Figure 1a–c, node 1 is then chosen as *υ*, since this node would lead to serving shelter S9. Only node 1 can be inserted, as only a one-time insertion procedure is performed in the DLASIH at every decision point in the lookahead during route construction. For a shelter or depot farther away than a one-step lookahead, as in Figure 4a–c, a neighbour of that shelter or depot is chosen as the seed *j*. In the case where the vehicle is packed with delivery supplies, node 9 is chosen as the seed *j*, since it is the neighbour of shelter S7. Since only node 1 connects node 0 to node 9 in the one-time insertion, node 1 is regarded as the next destination *υ*; no SIH (I1) computation is necessary, as the option is obvious. In this illustrative case, recognising node 9 as the neighbour of emergency shelter S7 helps trim down the potential seeds to be considered and thus also reduces the number of potential *υ*.

**Figure 4.** Performing the DLASIH in the TBIH-4 in an example network (**a**) with node 0 as current position of vehicle *m* ready to serve and node S7 as an emergency shelter. The potential seeds *j* are considered in (**b**) and Node 9 was selected since it is the neighbour of shelter S7 in (**c**). As a result, node 1 (*υ*) is inserted and route (0–1–9–S7) is constructed. (**c**) shows two potential seeds *j* (node 4 and node 9) in which case a random selection between node 4 and node 9 as a seed is done.

However, if more potential *υ* lead to the neighbours of emergency shelters, then one of these neighbours is chosen randomly. If more than one possible *υ* is connected to the chosen seed *j*, the SIH (I1) is executed.

In the TBIH-4 algorithm (Algorithm 7), the selection of *j* is restricted to nodes that would lead to either a shelter or a depot, depending on *q<sub>m</sub>* (lines 5 and 55). If such nodes are not available, another lookahead is performed to see whether there are potential destinations *j* that lead to neighbours of either a depot or a shelter (lines 21 and 71). Depending on the case considered, the number of possible *j* from which the SIH seed can be chosen (lines 9, 37, 59, and 94) is thus reduced. In either case, the insertion criteria are computed and evaluated such that the on-the-go lookahead route is updated, *π*<sub>B(*s<sub>k</sub><sup>a<sub>k</sub></sup>*)</sub> : *s<sub>k</sub>* → *η*<sup>*π*<sub>B(*s<sub>k</sub>*)</sub></sup>(*s<sub>k</sub>*), and the decision *a<sub>m</sub>* = *η*<sup>*π*<sub>B(*s<sub>k</sub>*)</sub></sup>(*s<sub>k</sub>*) is returned (line 98 onwards).
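The DLASIH seed restriction described above can be sketched for the loaded-vehicle case as follows. The adjacency dict `edges` is an illustrative assumption, not the paper's notation, and the symmetric depot-seeking case for an empty vehicle is omitted as it is analogous.

```python
def dlasih_seeds(next_nd, edges, unserved):
    """Seed filtering sketch for TBIH-4's DLASIH (loaded vehicle):
    keep only one-step-ahead nodes j that are unserved shelters; if
    none exist, look one step further and accept neighbours of
    unserved shelters instead (Figure 4's case).

    `edges` maps a node to its reachable neighbours (assumed,
    simplified structure); the empty-vehicle case targeting depots
    is analogous and omitted."""
    # direct case: a candidate destination v borders a shelter j
    seeds = {j for v in next_nd for j in edges.get(v, ()) if j in unserved}
    if not seeds:
        # extended lookahead: accept neighbours of unserved shelters
        shelter_nbrs = {n for s in unserved for n in edges.get(s, ())}
        seeds = {j for v in next_nd for j in edges.get(v, ())
                 if j in shelter_nbrs}
    return seeds
```

On a Figure 4-style instance, where no candidate destination borders shelter S7 directly but node 9 borders it, the fallback step selects node 9 as the only seed, matching the narrative above.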

**Algorithm 7** TBIH-4 (with an embedded DLASIH) Algorithm.

**Require:** *s<sub>k</sub>*, *M<sup>k</sup>*, *M*\*M<sup>k</sup>*, current position of vehicle
**Ensure:** *Decision*
1: Perform Algorithm 4 (lines 1–120)
2: **if** len(*nextND*) > 1 **then**
3: *dict* : *υ* ← {*j* : ∀*j* ∈ *H*, (*υ*, *j*) ∈ *E*, *r<sub>υ,j</sub>* > 0, *j* ≠ *i*, *j* ∈ *nextND*} ∀*υ* ∈ *nextND*
4: **if** *q<sub>m</sub>* ≠ 0 AND len(*US*) != 0 **then**
5: *dictA* : *υ* ← {*j* : ∀*j* ∈ *US*, (*υ*, *j*) ∈ *E*, *r<sub>υ,j</sub>* > 0, *j* ≠ *i*, *j* ∈ *nextND*} ∀*υ* ∈ *nextND*
6: **if** len(*dictA*) != 0 **then**
7: *newnextNDS* = {*j* : ∀*j* ∈ *dictA*(*υ*), ∀*υ* ∈ *nextND*, *dictA*(*υ*) ≠ {}}
8: remove duplicate nodes: list(set(*newnextNDS*))
9: *seed* = random.choice(*newnextNDS*)
10: list of potential inserted nodes based on selected *seed*, *InsertList* = {*υ* : ∀*υ* ∈ *nextND*, (*υ*, *seed*) ∈ *E*}
11: *dictC1* : (*i*, *υ*, *seed*) ← (*c<sub>i,υ</sub>* + *c<sub>υ,seed</sub>* − *c<sub>i,seed</sub>*), ∀*υ* ∈ *InsertList*
12: *dictC2* : (*i*, *υ*, *seed*) ← (*λc<sub>i,seed</sub>* − *dictC1*(*i*, *υ*, *seed*)), ∀*υ* ∈ *InsertList*
13: choose *υ* from (*i*, *υ*, *seed*) = arg max<sub>(*i*,*υ*,*seed*)∈*dictC2*</sub> (*dictC2*)
14: *a<sub>m</sub>* = *υ*
15: *Decision*.append(*a<sub>m</sub>*)
16: *r<sub>i,a<sub>m</sub></sub>* = max(*r<sub>i,a<sub>m</sub></sub>* − 1, 0)
17: break
18: **else**
19: list of shelters' neighbours, *USN* = {*n* : *n* ∈ *H*, (*s*, *n*) ∈ *E*, ∀*s* ∈ *US*, *n* ≠ *i*}
20: *USN* = list(set(*USN*))
21: *dictB* : *υ* ← {*j* : ∀*j* ∈ *USN*, (*υ*, *j*) ∈ *E*, *r<sub>υ,j</sub>* > 0, *j* ≠ *i*, *j* ∈ *nextND*} ∀*υ* ∈ *nextND*
22: **if** len(*dictB*) != 0 **then**
23: *LNS* = {*j* : ∀*j* ∈ *dictB*(*υ*), ∀*υ* ∈ *nextND*, *dictB*(*υ*) ≠ {}}
24: *LNS* = list(set(*LNS*))
25: **else**
26: continue
27: **end if**
28: init *nextNDS*
29: **if** len(*LNS*) == 0 **then**
30: *nextNDS* = {*j* : ∀*j* ∈ *dict*(*υ*), ∀*υ* ∈ *nextND*, *dict*(*υ*) ≠ {}}
31: **else**
32: *nextNDS*.extend(*LNS*)
33: **end if**
34: **if** len(*nextNDS*) == 0 **then**
35: *a<sub>m</sub>* = random.choice(*nextND*)
36: *Decision*.append(*a<sub>m</sub>*)
37: *r<sub>i,a<sub>m</sub></sub>* = max(*r<sub>i,a<sub>m</sub></sub>* − 1, 0)
38: break
39: **else**
40: pass
41: **end if**
42: *seed* = random.choice(*nextNDS*)
43: list of potential inserted nodes based on selected *seed*, *InsertList* = {*υ* : ∀*υ* ∈ *nextND*, (*υ*, *seed*) ∈ *E*}
44: *dictC1* : (*i*, *υ*, *seed*) ← (*c<sub>i,υ</sub>* + *c<sub>υ,seed</sub>* − *c<sub>i,seed</sub>*), ∀*υ* ∈ *InsertList*
45: *dictC2* : (*i*, *υ*, *seed*) ← (*λc<sub>i,seed</sub>* − *dictC1*(*i*, *υ*, *seed*)), ∀*υ* ∈ *InsertList*
46: choose *υ* from (*i*, *υ*, *seed*) = arg max<sub>(*i*,*υ*,*seed*)∈*dictC2*</sub> (*dictC2*)
47: *a<sub>m</sub>* = *υ*
48: *Decision*.append(*a<sub>m</sub>*)
49: *r<sub>i,a<sub>m</sub></sub>* = max(*r<sub>i,a<sub>m</sub></sub>* − 1, 0)
50: break
51: **end if**
52: **else**
53: *dictC* : *υ* ← {*j* : ∀*j* ∈ *D*, (*υ*, *j*) ∈ *E*, *r<sub>υ,j</sub>* > 0, *j* ≠ *i*, *j* ∈ *nextND*} ∀*υ* ∈ *nextND*
54: **if** len(*dictC*) != 0 **then**
55: *newnextNDS* = {*j* : ∀*j* ∈ *dictC*(*υ*), ∀*υ* ∈ *nextND*, *dictC*(*υ*) ≠ {}}
56: remove duplicate nodes: list(set(*newnextNDS*))
57: *seed* = random.choice(*newnextNDS*)
58: list of potential inserted nodes based on selected *seed*, *InsertList* = {*υ* : ∀*υ* ∈ *nextND*, (*υ*, *seed*) ∈ *E*}
59: *dictC1* : (*i*, *υ*, *seed*) ← (*c<sub>i,υ</sub>* + *c<sub>υ,seed</sub>* − *c<sub>i,seed</sub>*), ∀*υ* ∈ *InsertList*
60: *dictC2* : (*i*, *υ*, *seed*) ← (*λc<sub>i,seed</sub>* − *dictC1*(*i*, *υ*, *seed*)), ∀*υ* ∈ *InsertList*
61: choose *υ* from (*i*, *υ*, *seed*) = arg max<sub>(*i*,*υ*,*seed*)∈*dictC2*</sub> (*dictC2*)
62: *a<sub>m</sub>* = *υ*
63: *Decision*.append(*a<sub>m</sub>*)
64: *r<sub>i,a<sub>m</sub></sub>* = max(*r<sub>i,a<sub>m</sub></sub>* − 1, 0)
65: break
66: **else**
67: list of depots' neighbours, *UDN* = {*n* : *n* ∈ *H*, (*d*, *n*) ∈ *E*, ∀*d* ∈ *D*, *n* ≠ *i*, *n* ∉ *D*}
68: *UDN* = list(set(*UDN*))
69: *dictD* : *υ* ← {*j* : ∀*j* ∈ *UDN*, (*υ*, *j*) ∈ *E*, *r<sub>υ,j</sub>* > 0, *j* ≠ *i*, *j* ∈ *nextND*} ∀*υ* ∈ *nextND*
70: **if** len(*dictD*) != 0 **then**
71: *LND* = {*j* : ∀*j* ∈ *dictD*(*υ*), ∀*υ* ∈ *nextND*, *dictD*(*υ*) ≠ {}}
72: *LND* = list(set(*LND*))
73: **else**
74: continue
75: **end if**
76: init *nextNDD*
77: **if** len(*LND*) == 0 **then**
78: *nextNDD* = {*j* : ∀*j* ∈ *dict*(*υ*), ∀*υ* ∈ *nextND*, *dict*(*υ*) ≠ {}}
79: **else**
80: *nextNDD*.extend(*LND*)
81: **end if**
⋮


#### *4.6. TBIH-5*

would make more sense.

In the previous section (Section 4.4), an example network (Figure 3a) is used to demonstrate how the DCW could be applied in constructing a temporary route *<sup>η</sup>π*B(*sk* ) with <sup>B</sup> being the heuristic from TBIH-3. From the temporary route, the first insertion is selected as the decision for the current lookahead state *sk* based on the temporary route constructed: *<sup>π</sup>*B(*<sup>s</sup> ak <sup>k</sup>* ) : *sk* <sup>→</sup> *<sup>η</sup>π*B(*sk* )(*sk*). From the example, it is seen that either node 4 or 5 (Figure 3b) could be selected as the next destination since two routes could be computed from the CW heuristic (if two edges have similar highest savings). However, it is also seen in the example network that node S2 is next to node 4, while node 5 is much further than node S2. If the vehicle is with capacity, inserting node 4 for the on-the-go lookahead route *<sup>π</sup>*B(*<sup>s</sup> ak k* )

If TBIH-3 (with an embedded DCW) could perform a sort of lookahead (for non-obvious decisions, SP) towards a nearby emergency shelter when vehicle *m* has capacity, then the selection of the next destination would be more accurate. This is illustrated in Figure 5a,b. In Figure 5a, the current position of vehicle *m* is node 3, and the nearby emergency shelter is node S9. With the exception of node 5, which is only a neighbour of node 3, both nodes 1 and 4 are neighbours of node S9. As a result, only nodes 4 and 1 are considered when constructing *η<sub>π<sub>B</sub>(s<sub>k</sub>)</sub>*(*s<sub>k</sub>*), even though node 5 is also a neighbour of node 3. This leads to a more promising construction of *π<sub>B</sub>*(*s<sub>k</sub><sup>a<sub>k</sub></sup>*) on the go. If node 5 were taken into consideration, it could be selected as the next destination for vehicle *m*. This is undesirable, as it would lead vehicle *m*, which has capacity, farther from serving S9. With this concept, the DCW is extended into the DLACW (turning TBIH-3 into TBIH-5). The principle of the proposed DLACW is largely the same, with the exception of a mechanism that detects a nearby shelter or depot depending on the capacity status of vehicle *m*.

The algorithm for TBIH-5 is presented in Algorithm 8. As with Algorithm 5, Algorithm 6 is extended, resulting in a different base heuristic. Compared to Algorithm 6, parts of this algorithm detect the potential next destinations *j* of vehicle *m* that might lead to either a shelter or a depot, depending on the current capacity status *q<sub>m</sub>* (lines 3–9). Through this mechanism, the possible options for *j* are restricted to those ideally more guided destinations. Edges are detected (line 10) and removed if they do not exist in the network (line 11), while savings are computed (line 18) and sorted (line 19). Finally, the on-the-go base policy *π<sub>B</sub>*(*s<sub>k</sub><sup>a<sub>k</sub></sup>*) is updated for the lookahead state *s<sub>k</sub>*, where *π<sub>B</sub>*(*s<sub>k</sub><sup>a<sub>k</sub></sup>*) : *s<sub>k</sub>* → *η<sub>π<sub>B</sub>(s<sub>k</sub>)</sub>*(*s<sub>k</sub>*), and the decision *a<sub>m</sub>* = *η<sub>π<sub>B</sub>(s<sub>k</sub>)</sub>*(*s<sub>k</sub>*) is returned.

## **Algorithm 8** TBIH-5 (with an embedded DLACW) algorithm.

**Require:** *s<sub>k</sub>*, *M<sup>k</sup>*, *M*\*M<sup>k</sup>*, current position of vehicle
**Ensure:** *Decision*

1: Perform Algorithm 4 (lines 1–120)
2: **if** len(*nextND*) > 1 **then**
3: **if** *q<sub>m</sub>* ≠ 0 AND len(*US*) != 0 **then**
4: *dict* : *j* ← {*k* : ∀*k* ∈ *US*, (*j*, *k*) ∈ *E*, *r<sub>j,k</sub>* > 0, *k* ≠ *i*, *k* ∉ *nextND*} ∀*j* ∈ *nextND*
5: *newnextND* = {*j* : ∀*j* ∈ *nextND*, *dict*(*j*) ≠ {}}
6: **else**
7: *dict* : *j* ← {*k* : ∀*k* ∈ *D*, (*j*, *k*) ∈ *E*, *r<sub>j,k</sub>* > 0, *k* ≠ *i*, *k* ∉ *nextND*} ∀*j* ∈ *nextND*
8: *newnextND* = {*k* : ∀*k* ∈ *dict*(*j*), ∀*j* ∈ *nextND*, *dict*(*j*) ≠ {}}
9: **end if**
10: Possible edges from pairs that can be formed in *newnextND*: *Pairs* = {all 2-element combinations of *newnextND*}
11: Remove (*j*, *k*) ∈ *Pairs* if (*j*, *k*) ∉ *E*
12: **if** len(*Pairs*) == 0 **then**
13: *a<sub>m</sub>* = random.choice(*nextND*)
14: *Decision*.append(*a<sub>m</sub>*)
15: *r<sub>i,a<sub>m</sub></sub>* = max(*r<sub>i,a<sub>m</sub></sub>* − 1, 0)
16: break
17: **else**
18: compute savings, (*j*, *k*) ← *c<sub>i,j</sub>* + *c<sub>i,k</sub>* − *c<sub>j,k</sub>* ∀(*j*, *k*) ∈ *Pairs*
19: sort (*j*, *k*) ∈ *Pairs* by decreasing savings
20: **if** len(*Pairs*) == 1 **then**
21: *a<sub>m</sub>* = *j* : (*j*, *k*) ∈ *Pairs*
22: **else**
23: construct route *π<sub>B</sub>*(*s<sub>k</sub><sup>a<sub>k</sub></sup>*) : *s<sub>k</sub>* → *η<sub>π<sub>B</sub>(s<sub>k</sub>)</sub>*(*s<sub>k</sub>*) from *i* = *l<sub>m</sub>* by inserting (*j*, *k*) ∈ *Pairs* as would be done in CW (decreasing savings)
24: *a<sub>m</sub>* ← *A*<sup>*π<sub>B</sub>*(*s<sub>k</sub><sup>a<sub>k</sub></sup>*)</sup>(*s<sub>k</sub>*)
25: **end if**
26: *Decision*.append(*a<sub>m</sub>*)
27: *r<sub>i,a<sub>m</sub></sub>* = max(*r<sub>i,a<sub>m</sub></sub>* − 1, 0)
28: break
29: **end if**
30: **else**
31: *Decision*.extend(*nextND*)
32: break
33: **end if**
34: continue Algorithm 4 (lines 148–156)

**return** *Decision*
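As a purely illustrative sketch (not the authors' code) of the savings step in Algorithm 8 (lines 10–11 and 18–19), the following Python fragment enumerates the candidate pairs, drops pairs for which no edge exists in the network, and ranks the remainder by decreasing CW savings. All names, costs, and edges are hypothetical:

```python
from itertools import combinations

def ranked_savings(i, nodes, c, edges):
    """Return feasible pairs from `nodes` ranked by decreasing CW savings
    s(j, k) = c[i,j] + c[i,k] - c[j,k], measured from the current node i."""
    pairs = [p for p in combinations(nodes, 2) if p in edges]  # drop non-edges
    savings = {(j, k): c[i, j] + c[i, k] - c[j, k] for (j, k) in pairs}
    return sorted(savings, key=savings.get, reverse=True)

c = {(0, 1): 5, (0, 2): 6, (0, 3): 4, (1, 2): 2, (1, 3): 7, (2, 3): 3}
edges = {(1, 2), (2, 3)}          # the pair (1, 3) does not exist in the network
print(ranked_savings(0, [1, 2, 3], c, edges))  # → [(1, 2), (2, 3)]
```

The pair (1, 2) is ranked first because its savings (5 + 6 − 2 = 9) exceed those of (2, 3) (6 + 4 − 3 = 7), mirroring the decreasing-savings insertion order of line 23.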

**Figure 5.** Performing the DLACW in TBIH-5 in an example network (**a**) with node 3 as the current position of vehicle *m* and node S9 as an emergency shelter. The components for performing the CW are selected in (**b**).

#### **5. Computational Results**

This study presents computational results for the following purposes:


The experiment is conducted using the authors' MDDVRPSRC Decision Support System (DSS) program (Figure 6), with code written in the Python 2.7 programming language. Embedded in this program is a network-monitoring layout through which the simulation of medical supplies delivery operations in the MDDVRPSRC setting is observed in real time (live simulation). Both the MDDVRPSRC model and the computation of the agent's decision are implemented at the heart of the DSS. As part of the computation of the agent's decision, the program also executes (upon selection) the matheuristic proposed in the work of [15], with CPLEX computation executed through the DOCPLEX API for Python. For the experiment, the MDDVRPSRC DSS is run on a laptop with an Intel® Core™ i7-7500U CPU at 2.70–2.90 GHz and 20 GB RAM.

**Figure 6.** MDDVRPSRC DSS for medical delivery operation.

To test and validate the model and solution algorithms for a problem as distinctive as medical supplies delivery under compromised circumstances, the common benchmarks of Perl, Gaskell, and Christofides cannot be applied. Therefore, several test instances are designed [86,87], ranging from a small and simple road network to medium and more challenging networks, based on the available experimental apparatus (Figures A1–A10). Accompanying each test instance is the associated initial damage for each road within the network of the respective test instance [86,87]. These datasets, consisting of the test instances and the associated damage files, are available in [87] and are shown in Table 3. In Table 3, the test instances are ordered by increasing complexity, characterised by the total number of nodes as well as the ratios of connecting nodes to depots and to emergency shelters. The test instances associated with road network D4N30S10 are placed last, as it is hypothesised to be the most challenging network among those listed. Each network comprises multiple depots, multiple emergency shelters with different demands, and connecting nodes, as described in Section 3.1. The placement of the nodes is based on the lessons learned from the 2015 Nepal earthquake.

In each of the road networks seen in Figures A1–A10, the blue, yellow, and brown nodes represent the depots, emergency shelters, and connecting nodes, respectively. The violet circles represent the outward tremor originating from an earthquake epicentre (coordinate (460, 180) in all the networks). The degree of the initial road damage is based on the intersections of these circles with the edges, and the corresponding random road capacity is denoted in red at the centre of each edge. The demands of each emergency shelter hover above it in a pink box. The green boxes represent vehicles that have arrived at the nodes where they are currently stationed. The blue boxes represent vehicles en route to their next assigned destinations. In Figure A11, a simulation example of an ongoing medical supplies operation is shown. In road network D8N20S8, five vehicles are assigned to deliver medical supplies to eight emergency shelters with their respective demands. The full road capacity in this network for a city road, normal road, and highway is given in Table 3 as (6, 7, 8), respectively. In all road networks, the highways are placed at the outer part of the network, while the city roads are placed at the innermost sections. Normal roads can be found connecting highways with city roads in most cases, especially in the larger networks. At decision point 104, which is at the simulated time of 3097 min (translated as 2:3:37:00, i.e., 2 days, 3 h, 37 min), the road capacity for each road changes randomly based on the dynamic road capacity mean of the random distribution of each road. These dynamically deteriorating road capacities in turn depend on the initial damage sustained by the road (given in the damage file of each test instance in the repository [87]). Thus, the edges with more intersections with the radial earthquake tremor circles tend to have less road capacity at random compared with the edges with few or zero intersections. Hence, vehicles travelling on these edges suffer longer travel times proportional to the initial damage sustained by the edges, as accounted for in the MDDVRPSRC model described in Section 3.3. The work [15] is referred to for more explanation of how the random road capacity is sampled at each decision point. The experiment settings for both the simulation and the computation of the agent's decision (PDS-RA) are given in Table 4.
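The interaction between the tremor circles, the initial damage, and the deteriorating random capacity can be illustrated as a toy pipeline. The exact sampling scheme is specified in [15]; the Gaussian noise, the exponential decay of the mean, and the decay constant below are illustrative assumptions only, while the epicentre coordinate (460, 180) follows the text:

```python
import math
import random

def crossings(edge, epicentre, radii):
    """Count how many tremor circles a straight road segment crosses."""
    (x1, y1), (x2, y2) = edge
    cx, cy = epicentre
    dx, dy = x2 - x1, y2 - y1
    # closest point on the segment to the epicentre (clamped projection)
    t = 0.0 if dx == 0 and dy == 0 else max(0.0, min(1.0,
        ((cx - x1) * dx + (cy - y1) * dy) / (dx * dx + dy * dy)))
    near = math.hypot(x1 + t * dx - cx, y1 + t * dy - cy)
    far = max(math.hypot(x1 - cx, y1 - cy), math.hypot(x2 - cx, y2 - cy))
    return sum(1 for r in radii if near <= r <= far)

def sample_capacity(full_cap, n_crossings, elapsed, rng, decay=0.0005):
    """Sample an integer capacity whose mean deteriorates with the initial
    damage (more crossings) and elapsed simulated time (assumed form)."""
    mean = full_cap * math.exp(-decay * (1 + n_crossings) * elapsed)
    return min(full_cap, max(0, round(rng.gauss(mean, 1.0))))

def travel_time(base_minutes, capacity, full_cap):
    """Travel time grows as the sampled capacity shrinks; 0 means impassable."""
    return float("inf") if capacity == 0 else base_minutes * full_cap / capacity

rng = random.Random(7)
edge = ((400, 180), (520, 180))                  # passes through the epicentre
k = crossings(edge, (460, 180), [30, 80, 200])   # crosses only the r = 30 circle
cap = sample_capacity(8, k, elapsed=3097, rng=rng)
print(k, cap, travel_time(45.0, cap, 8))
```

A heavily damaged edge (many crossings) is thus more likely to draw a low or zero capacity as the simulation progresses, lengthening or forbidding traversal, in the spirit of the model of Section 3.3.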

For the model and solution validation, simulated data are compiled (Figure 6). For each proposed base heuristic (TBIH-1, TBIH-2, TBIH-3, TBIH-4, and TBIH-5) applied to all test instances in Table 3, 10 complete simulations of a medical supplies delivery operation are performed. From the 10 complete simulations, there are 10 readings for each of 4 key measurements:



**Table 3.** Simulated test instances applied to validate the model and solution algorithms.

The first three key measurements are self-explanatory. The last key measurement is the average time taken by the agent to make one decision at a decision point *k*: the total computation time divided by the number of decisions made (decision points) to complete the delivery operation simulation. The PDS-RA with the proposed base heuristics is benchmarked against the matheuristic rollout from the work of [15] for all vehicle-number settings (4, 8, 15, 30, and 50) for road networks D3N8S3–D7N18S7. For road networks D8N20S8–D4N30S10, however, the benchmarking is completed only for the vehicle-number settings of 4, 8, and 15. This is due to the resulting computational time, which is far longer than reasonable when compared with the longest computation time obtained among the five proposed heuristics (Figures A14 and A18). With the resulting simulated data, the model is then validated based on the analysis of the output data produced (Figure 6). Furthermore, the performance of the proposed heuristics relative to the matheuristic rollout is observed through descriptive and comparative analyses.
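The last key measurement is a simple ratio, which can be sketched as follows (the numbers are hypothetical):

```python
def average_decision_time(total_seconds, decision_points):
    """K4: total computation time divided by the number of decision points
    needed to complete one delivery-operation simulation."""
    if decision_points == 0:
        raise ValueError("a completed simulation has at least one decision point")
    return total_seconds / decision_points

print(average_decision_time(3120.0, 104))  # → 30.0 seconds per decision
```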

**Table 4.** Simulation and PDS-RA Configuration.


The computational results in the supplementary file (Tables S1–S81) were collected and recorded over a span of more than two years, given the hardware available for the experiments. A total of 10 readings were taken for each of the proposed base heuristics applied in the PDS-RA for all test instances, for all key measurements (K1–K4), given the stochastic road capacity and the dynamic deterioration of the mean road capacity in the problem addressed. From each set of 10 readings, a descriptive analysis is performed to measure the mean, standard deviation, variance, and covariance of the sample data. A normality test is performed to determine a suitable comparative analysis method for benchmarking. A total of 11,600 key readings were recorded from the 2900 simulations performed, for further analysis involving the key measurements K1, K2, K3, and K4 mentioned in Section 5. The 2900 simulations consist of 290 sets of 10 readings per set, for each of the 4 key measurements, which are then used to compute the average reading. Not all 290 sets tested were found to be normally distributed based on the Shapiro–Wilk test [88] performed in Excel [89]. The highest percentage of normally distributed data (around 50%) is seen only in the K3 and K4 measurements. Furthermore, the 10 readings for each key measurement of a test instance are considered too few for a parametric test. As such, a non-parametric test (the Wilcoxon signed-rank test) was applied to test for significant differences against the matheuristic solution (PDS-RA with CPLEX as the base heuristic). Moreover, the Best So Far (BSF) measurement among the solution algorithms at each test instance was recorded to observe the performance of the PDS-RA with each of the proposed heuristics against the matheuristic rollout.
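For reference, the Wilcoxon signed-rank statistic used in the comparative analysis can be computed with a short, dependency-free sketch such as the one below (in practice, a statistics package would be used). It returns W = min(W+, W−); for 10 pairs, W ≤ 8 indicates a significant difference at the 5% level (two-sided). The implementation details (average ranks for ties, dropping zero differences) follow the standard definition of the test, not the authors' code:

```python
def wilcoxon_w(x, y):
    """Wilcoxon signed-rank statistic W = min(W+, W-) for paired samples;
    ties in |d| receive average ranks, zero differences are dropped."""
    d = [a - b for a, b in zip(x, y) if a != b]
    order = sorted(range(len(d)), key=lambda i: abs(d[i]))
    ranks = [0.0] * len(d)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1                      # extend the block of tied |d| values
        avg = (i + j) / 2 + 1           # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, v in zip(ranks, d) if v > 0)
    w_minus = sum(r for r, v in zip(ranks, d) if v < 0)
    return min(w_plus, w_minus)

print(wilcoxon_w([1, 2, 3], [3, 1, 2]))  # → 3.0
```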

The full computational results are presented in the supplementary file, and the abbreviations used are listed in Table 5. Furthermore, a general overview of the simulated data collected is shown in Table A1. Thorough investigation and synthesis of the resulting simulation data, by means of cross-referencing key values, was performed to ensure that no errors are present.

The results obtained in Tables S1–S81 are further synthesised for numerical analysis focusing on model validation and base-heuristic performance. The MDDVRPSRC model is validated based on the trends and patterns observed in Figures A12, A13, A16 and A17. Meanwhile, the performance of the proposed heuristics relative to the matheuristic rollout can be seen in the remaining figures among Figures A12–A27 and in supplementary file S1 (Figures S1–S62). Figures A12–A15 show the trends of the averages of the 10 sample readings for all four key measurements, while Figures A16–A19 show the trends of the best measurement among the 10 sample readings for each key measurement. Figures A20–A23 show the total BSF counts for each algorithm across all 40 instances with the matheuristic rollout benchmark. Figures A24–A27 depict the total BSF counts, in percentage, for each algorithm across all 50 instances with and without the matheuristic rollout benchmark (omitting 10 test instances for the matheuristic rollout due to computation-time limitations). A more detailed per-instance breakdown of the percentage of BSF associated with each heuristic is shown in Figures S1–S40. Meanwhile, Tables A2–A7 give a more detailed breakdown of the BSF counts, the normally distributed data, and the significant differences for each key measurement. Finally, the detailed performance of each PDS-RA with the proposed heuristics for all key measurements is shown in Figures S41–S62.
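The BSF tally itself reduces to a per-instance argmin, as sketched below with hypothetical instance names and readings (smaller is better for all four key measurements):

```python
from collections import Counter

def bsf_counts(readings):
    """readings: {instance: {algorithm: value}} -> Counter of BSF per algorithm.
    For every test instance, the algorithm with the best (smallest) reading
    receives one BSF count; ties all receive a count."""
    tally = Counter()
    for per_alg in readings.values():
        best = min(per_alg.values())
        for alg, val in per_alg.items():
            if val == best:
                tally[alg] += 1
    return tally

readings = {
    "D3N8S3-4v": {"TBIH-1": 310.0, "TBIH-2": 295.5, "MH": 290.1},
    "D4N11S4-4v": {"TBIH-1": 400.2, "TBIH-2": 415.0, "MH": 402.7},
}
print(bsf_counts(readings))  # MH best on the first instance, TBIH-1 on the second
```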

**Table 5.** Abbreviations used in the tables and figures.


#### **6. Discussion**

In terms of the MDDVRPSRC MDP model validation, the behaviours plotted in Figures A12, A13, A16 and A17 conform to natural expectations of how the humanitarian operation unfolds with respect to the key measurements. Figures A12 and A16, for example, show a logical increase in total distance travelled as the number of vehicles increases. Here, the increase in total distance is also attributed to the policy that all vehicles must be dispatched for delivery, to compensate for the risk that a vehicle might be stranded en route due to the road damage incurred. Furthermore, under stochastic road capacity, dispatching multiple vehicles might ensure a faster delivery time at the cost of an increase in total distance travelled.

For road network D3N8S3, the increase in total distance is higher than for networks D4N11S4, D5N13S5, and D6N16S6; it is comparable to that of network D7N18S7 onwards for operations involving 30 and 50 vehicles. This is due to the large number of vehicles travelling on a road network with limited roads. The random road capacity, as well as the deteriorating road conditions, causes a bottleneck at some connecting nodes. However, the steady increase in the number of roads in more complex networks alleviates this problem, as shown in networks D4N11S4, D5N13S5, and D6N16S6. Given the increasing demands and more complex networks, a different observation can be made.

The road networks D7N18S7, D8N20S8, D9N25S9, and D9N30S10, for example, indicate roughly the same trend of total distance travelled, with an occasional peak at about 10,000 km for networks D8N20S8 and D9N25S10. However, an obvious increase in total distance travelled can be seen for network D4N30S10, confirming the hypothesis that this network is the most complex in terms of delivery operations. This is explained by the ratio of depots to connecting nodes: the vehicles have only a limited number of depots at which to replenish supplies in this network compared to the other networks. Furthermore, the ratio of depots to shelters also contributes to this observation, showing the difficulty of completing the deliveries given the smaller number of depots at which to replenish.

Additionally, the reduced number of depots in this network also leads to more connecting options between the depots and connecting nodes, which may not necessarily be advantageous to the delivery operations. This is especially true for networks that tend to have a shorter route disabled due to random road capacity and a dynamic reduction of road capacity due to damage. As a result, a longer route is taken, leading to an increase in the total distance travelled by all vehicles.

All of the observations for total distance travelled in Figures A12 and A16 also apply to those seen in Figures A13 and A17. In general, increasing the total number of vehicles reduces the delivery operation time (total travel time). For network D3N8S3, a limited number of vehicles in a small network with high demands relative to the number of vehicles leads to an increase in total travel time compared to network D4N11S4. This is due to the longer time required by the smaller number of vehicles to satisfy the total demand within the network. Moreover, the deteriorating capacity of each damaged road may reduce road availability in an already small road network, leading vehicles to take longer routes compared to network D4N11S4.

Vehicles may also travel back and forth along the same road as connecting roads become increasingly unavailable. The bottleneck effect is also seen for larger numbers of vehicles when comparing the total travel time within road network D3N8S3 with road networks D4N11S4, D5N13S5, and D6N16S6. It is also shown more clearly here that the bottleneck effect can be alleviated, as observed in the trends for networks D7N18S7, D8N20S8, D9N25S9, and D9N30S10. Similarly, the reduced ratios of depots to shelters and of depots to connecting nodes lead to a more complex network; despite not having the highest number of nodes, this contributes to a higher total travel time for some of the algorithms. Interestingly, the matheuristic rollout does not show the same behaviour, which indicates its potential for navigating more complex networks compared to the proposed heuristics.

This, however, comes at the cost of computation time, as shown in Figures A14, A15, A18, A19, A22, A23, A26 and A27. This was observed when investigating the performance of the proposed base heuristics against the matheuristic rollout as a benchmark. The total computation time increases for the matheuristic rollout, which applies CPLEX at every lookahead decision point, from road network D5N13S5 onwards for all vehicle settings (Figures A14 and A18) when compared with the results obtained with the PDS-RA applying the proposed base heuristics. This trend is even more obvious in Figures A15 and A19, which show a clear increase in the average computation time for the agent to make a decision. As a result, no BSF count was ever obtained by the matheuristic rollout for key measurements K3 and K4 (Figures A22, A23, A26 and A27).

Apart from showing an exponential increase for both K3 and K4 (see Figures A15 and A19), it is also apparent that this trend depends on the total number of nodes in the network. This is evident when comparing the two key measurements for networks D9N30S10 and D4N30S10. However, road networks sharing a similar number of nodes with D4N30S10, such as D9N25S10, do not show a similar magnitude of increase. Therefore, it can be concluded that both the number of nodes and the complexity associated with each network affect these two key measurements for the matheuristic rollout.

Meanwhile, the performance of the proposed PDS-RA applying the base heuristics is further investigated through the BSF count for all instances tested. Figures A22, A23, A26 and A27 confirm the observation made for the matheuristic rollout in terms of computation time (K3 and K4). However, the matheuristic rollout shows clear dominance in terms of key measurements K1 and K2 (Figures A20, A21, A24 and A25). This is especially visible in the breakdowns of K1 and K2 in Figures S46 and S52, with Figure S46 interestingly also showing a good performance of TBIH-1 for K1. This shows the relevance of the matheuristic approach for complex stochastic problems. In most of the individual networks, the matheuristic approach also appears to perform better than the other proposed approaches for K1 and K2 (Figures S46 and S52). However, as can be seen in Figures A12–A17 (with the exception of Figures A14 and A15), the PDS-RA with the proposed heuristics remains competitive, with small gaps. This is also supported by the statistical evidence, which shows few significant differences recorded throughout all simulated data involving K1 and K2 when compared with the data obtained by the matheuristic rollout (Table A1).

Of those, the vast majority of significant differences are seen in Figures A14, A15, A18 and A19, which corroborates the findings in terms of computation time for K3 and K4. Judging from the trends, the practicality of the matheuristic rollout as a benchmark is shown to be poor, at least for the hardware used in the experimentation. This is despite its good performance for K1 and K2, albeit with no significant difference.

On the other hand, TBIH-1 shows a clear advantage in terms of BSF count, as seen in Figures A20–A27. This is perhaps expected, considering that the stochastic problem may favour exploration more than exploitation, as performed in TBIH-1 with random selection for the SP part of the algorithm. The comparable performance of the PDS-RA with TBIH-1 relative to the matheuristic rollout is also shown in most of the road networks. Furthermore, TBIH-1 is at times neck and neck with the benchmark when looking at the performance in individual networks, such as in Figures S26, S29, S33, S34, and S36, among others. It is also noteworthy that the algorithm performs rather well for network D4N30S10, with the exception of key measurement K2. Moreover, the dominance of TBIH-1 is increasingly noticeable for larger networks, as best seen in Figures S41, S47, S53, and S58. Meanwhile, both TBIH-2 and TBIH-4 also perform well in the overall BSF count (as seen in Figures A20–A27) compared to TBIH-3 and TBIH-5, which are based on the DCW. This highlights the advantage of the DSIH, which centres on inserting and placing promising nodes in ways that optimise the operation.

The DCW, on the other hand, tends to ignore the inner nodes and favour the outer nodes in an attempt to reduce parallel connections to the origin node, as the CW has always been intended to do. This is evident in the performance of TBIH-3, which is the lowest, followed by TBIH-2, when looking at the BSF counts for both individual networks and all networks (Figures S41–S62). Except for K3 in Figure S55, TBIH-3 scores a BSF count of only one for all other key measurements (Figures S43, S49 and S60). This translates into the low BSF counts in Figures A20–A27, where TBIH-3 is seen multiple times with BSF counts as low as 0% and 2.5%, topping out at only 8%, as seen in Figure A26. TBIH-5 shows improved performance compared to TBIH-3 (for K1, K2, K3, and K4 in Figures S45, S51, and S57) and TBIH-2 (except for K3 and K4), owing to the addition of a lookahead mechanism for selecting more promising nodes in the route. This demonstrates the strength of exploitation in the heuristics to improve selection. TBIH-2, however, is better in terms of K3 and K4 (Figures S54 and S59), displaying the trade-off of embedding such features.

Similarly, TBIH-2's performance is improved in TBIH-4 by means of an exploitation mechanism that performs a lookahead to select more promising nodes and filter out those that are not. Unlike between TBIH-3 and TBIH-5, however, the gap in computation speed between TBIH-2 and TBIH-4 is not obvious. This shows that TBIH-5 might be more costly to implement than TBIH-4, which improves on TBIH-2 with less trade-off, as seen in Figures S54, S56, S59 and S61. As such, TBIH-4 can be considered an all-rounder with a balanced performance, next to TBIH-1.

It should be noted that the PDS-RA is performed per vehicle when making a collective decision for all vehicles. Furthermore, both the number of Monte Carlo simulations and the length of the lookahead horizon shown in Table 4 could be considered low compared to other similar work. However, the new perspective of computing the decision, as proposed in Equation (24), demands that some compromises be made, especially with the limited computational power available for this research. Furthermore, the method applied in this research is necessary to break the usual practice of clustering the emergency hot-spots per vehicle and then computing the routing decision afterwards. Additionally, research on stochastic road capacity problems with additional consideration of damaged roads is very limited among reinforcement-learning-oriented research. Due to the stochastic road capacity, the resulting key measurements are highly varied, as shown in the variance and covariance measurements of each of the simulated samples collected (Tables S2–S81). Ideally, a large number of Monte Carlo rollout simulations and a longer lookahead horizon would best account for such stochastic problems. A trade-off still needs to be made where limited computation is a concern. If anything, this research demonstrates that the proposed methods can be applied on a machine with limited capability to simulate, visualise, and compute decisions as a DSS for an emergency medical supplies delivery humanitarian operation.

However, more study of this problem is warranted. With capable machines, the lookahead horizon and the number of Monte Carlo simulations should be increased. With such an increase in parameters, perhaps the TP of the algorithm could be discarded, allowing the agent a pure learning opportunity when making decisions. In the experiments here, this could not be achieved; hence, the TP is needed. Furthermore, with enough Monte Carlo simulations, the highly stochastic routing problem can be properly addressed, and a longer lookahead horizon ensures better decisions from a long-term perspective.

The investigation included a one-factor experiment performed by varying the fixed number of vehicles per road network, which is a limitation. It should be noted, however, that various road networks were tested, consisting of varying numbers of depots, emergency shelters, and connecting nodes. Furthermore, given the entry-level machine utilised, this experiment (involving 2900 simulations and 11,600 measurement readings) took more than one year to complete. Given a more capable machine, factorial experiments should be performed to investigate the performance of the proposed heuristics against the matheuristic benchmark. For example, through a factorial experiment, the existing networks could be expanded into more challenging ones. For D4N30S10, for instance, it would be interesting to see how a delivery operation with such a number of depots and an increased number of connecting nodes fares with a smaller number of emergency shelters, as more routing options become available. Will the agent with the proposed solution method be able to navigate intelligently among these many options? Hence, more studies should be performed with expanded networks in which the ratios between depots, connecting nodes, and emergency shelters are varied. In this experiment, the vehicles are initially placed randomly at depots. This is done to account for the degree of unpreparedness, where coordination must be planned with a random distribution of assets. Hence, even though the key measurements are assessed through 10 averaged readings for each test instance, the initial situation for each simulation run is varied. There are two ways this study could be expanded further: (1) increasing the number of simulations per test instance to obtain more than 10 readings for a better average, and (2) applying a fixed assignment of vehicles per depot for all simulations.
The latter approach, however, would not account for a more realistic scenario of emergency medical supplies delivery operations. Finally, in this study, the placement of depots, connecting nodes, and emergency shelters is made such that the findings from the lessons learned in the 2015 Nepal earthquake are addressed. Instead of utilising a simulated network, a more concrete simulation could be performed by applying real networks and incorporating details of the depots, connecting nodes, emergency shelters, vehicles, road damage, and road capacities during the actual disaster event. Such data is usually of a sensitive nature, however; developing a simulated network allows for flexibility when conducting planning exercises and experiments.

#### **7. Conclusions**

As part of the DSS for humanitarian emergency medical supplies delivery operations, the 2015 Nepal earthquake is referred to in developing the MDDVRPSRC MDP model. The presented model focuses on the difficulty of navigating stochastic road capacity within a road network compromised by the ongoing tremors from the earthquake. The model also incorporates multi-depot, multi-trip, and split-delivery operations. Here, the conventional "cluster first, route second" approach largely applied in related research cannot be used. Instead, to solve the problem, a lookahead approach of ADP is adopted, where the PDS-RA is applied. As part of the PDS-RA mechanism, five constructive base heuristics are proposed to construct the decision rule on the go, dynamically and iteratively. Unlike conventional applications of the PDS-RA in VRP, this research adopts the method proposed in the work of [15]: a consecutive application of the PDS-RA for each vehicle that arrives at every decision point. The individual vehicle assignments computed collectively form an MDP decision at every decision point.

The five proposed base heuristics are based on a decision-making strategy that consists of obvious decisions (TP) and non-obvious decisions (SP) to reduce the computational burden. In TBIH-1, the SP applies pure random selection of a vehicle's next destination. Alternatively, the principles of the constructive heuristics SIH (I1) and CW (coined DSIH and DCW, respectively) are adopted in TBIH-2 and TBIH-3. A lookahead exploitation mechanism is added to the DSIH and DCW, giving rise to the DLASIH and DLACW applied in the proposed TBIH-4 and TBIH-5, respectively. These five proposed base heuristics are compared with the matheuristic proposed in the authors' previous work [15]. Moreover, test instances were developed and made available in the repository [87]. The results presented in the supplementary file validate the model: expected behaviour is observed in the simulated operations based on four key measurements, K1–K4. Furthermore, the PDS-RA applied with the five proposed base heuristics shows comparable performance for K1 and K2, with no significant difference recorded, while all the proposed heuristics show superior performance for K3 and K4 when compared to the matheuristic. The results also highlight the power of the exploration associated with pure random selection in TBIH-1 for addressing a highly stochastic problem such as the MDDVRPSRC. Furthermore, the advantages of exploitation are shown in TBIH-4 and TBIH-5 when compared with the performance of TBIH-2 and TBIH-3, respectively. For problems such as the MDDVRPSRC, it would appear that the DSIH (TBIH-2) and DLASIH (TBIH-4) perform better than their counterparts, the DCW (TBIH-3) and DLACW (TBIH-5).

**Supplementary Materials:** The following are available at https://www.mdpi.com/article/10.3390/ math10152699/s1.

**Author Contributions:** Conceptualisation, W.K.A., L.S.L. and S.P.; methodology, W.K.A., L.S.L. and S.P.; software, W.K.A. and L.S.L.; validation, W.K.A. and L.S.L.; formal analysis, W.K.A., L.S.L. and H.-V.S.; investigation, W.K.A. and L.S.L.; resources, W.K.A.; data curation, W.K.A. and L.S.L.; writing—original draft preparation, W.K.A., L.S.L. and H.-V.S.; writing—review and editing, W.K.A., L.S.L. and H.-V.S.; visualisation, W.K.A.; supervision, L.S.L. and S.P. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The primary simulated data are presented in the supplementary file. The test instances from which the simulated data are generated can be found in [87].

**Acknowledgments:** This research was supported by the Ministry of Higher Education Malaysia through the Fundamental Research Grant Scheme with reference code FRGS/1/2019/STG06/UPM/02/1.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**

The following abbreviations are used in this manuscript:


#### **Appendix A. Simulated Road Networks and Analysis Results**

**Figure A1.** Road network for instance D3N8S3 [15].

**Figure A2.** Road network for instance D4N11S4 [15].

**Figure A3.** Road network for instance D5N13S5 [15].

**Figure A4.** Road network for instance D6N16S6.

**Figure A5.** Road network for instance D7N18S7.

**Figure A6.** Road network for instance D8N20S8.

**Figure A7.** Road network for instance D8N22S9.

**Figure A8.** Road network for instance D9N25S10.

**Figure A9.** Road network for instance D9N30S10.

**Figure A10.** Road network for instance D4N30S10.

**Figure A11.** Example of medical supply delivery in progress for the network D8N20S8.


**Table A1.** Descriptive overview of the simulated data collected.

**Table A2.** PDS-RA performance with TBIH-1 application as base heuristic.






**Table A3.** PDS-RA performance with TBIH-2 application as base heuristic.




**Table A4.** PDS-RA performance with TBIH-3 application as base heuristic.






**Table A5.** PDS-RA performance with TBIH-4 application as base heuristic.





**Table A6.** PDS-RA performance with TBIH-5 application as base heuristic.



**Table A7.** PDS-RA performance with CPLEX application as base heuristic.




**Figure A12.** Algorithms performance compared to matheuristic rollout applied in PDS-RA: average total distance travelled (km) based on test instances.

**Figure A13.** Algorithms performance compared to matheuristic rollout applied in PDS-RA: average total travel time (min) based on test instances.

**Figure A14.** Algorithms performance compared to matheuristic rollout applied in PDS-RA: average total computation time (sec) based on test instances.

**Figure A15.** Algorithms performance compared to matheuristic rollout applied in PDS-RA: average decision computation time (sec) based on test instances.

**Figure A16.** Algorithms performance compared to matheuristic rollout applied in PDS-RA: best total distance travelled (km) measured based on test instances.

**Figure A17.** Algorithms performance compared to matheuristic rollout applied in PDS-RA: best total travel time (min) measured based on test instances.

**Figure A18.** Algorithms performance compared to matheuristic rollout applied in PDS-RA: best total computation time (sec) measured based on test instances.

**Figure A19.** Algorithms performance compared to matheuristic rollout applied in PDS-RA: best average decision computation time (sec) measured based on test instances.

**Figure A20.** Algorithms performance compared to matheuristic rollout applied in PDS-RA: total BSF measured for total travelled distance over all test instances applied (omitting 10 non-benchmarked measurements).

**Figure A21.** Algorithms performance compared to matheuristic rollout applied in PDS-RA: total BSF measured for total travel time over all test instances applied (omitting 10 non-benchmarked measurements).

**Figure A22.** Algorithms performance compared to matheuristic rollout applied in PDS-RA: total BSF measured for total computation time over all test instances applied (omitting 10 non-benchmarked measurements).

**Figure A23.** Algorithms performance compared to matheuristic rollout applied in PDS-RA: total BSF measured for average decision computation time over all test instances applied (omitting 10 non-benchmarked measurements).

**Figure A24.** Algorithms performance compared to matheuristic rollout applied in PDS-RA: total BSF measured for total travelled distance over all test instances applied (including 10 non-benchmarked measurements).

**Figure A25.** Algorithms performance compared to matheuristic rollout applied in PDS-RA: total BSF measured for total travel time over all test instances applied (including 10 non-benchmarked measurements).

**Figure A26.** Algorithms performance compared to matheuristic rollout applied in PDS-RA: total BSF measured for total computation time over all test instances applied (including 10 non-benchmarked measurements).

**Figure A27.** Algorithms performance compared to matheuristic rollout applied in PDS-RA: total BSF measured for average decision computation time over all test instances applied (including 10 non-benchmarked measurements).

#### **References**


## *Article* **An Optimized Decision Support Model for COVID-19 Diagnostics Based on Complex Fuzzy Hypersoft Mapping**

**Muhammad Saeed 1, Muhammad Ahsan 1, Muhammad Haris Saeed 1, Atiqe Ur Rahman 1, Asad Mehmood 1, Mazin Abed Mohammed 2, Mustafa Musa Jaber 3,4 and Robertas Damaševičius 5,\***


**Abstract:** COVID-19 has shaken the entire world economy and affected millions of people in a brief period. COVID-19 has numerous overlapping symptoms with other upper respiratory conditions, making it hard for diagnosticians to diagnose correctly. Several mathematical models have been presented for its diagnosis and treatment. This article delivers a mathematical framework based on a novel agile fuzzy-like arrangement, namely, the complex fuzzy hypersoft (CFHS) set, which is a formation of the complex fuzzy (CF) set and the hypersoft set (an extension of the soft set). First, the elementary theory of CFHS is developed, which considers the amplitude term (A-term) and the phase term (P-term) of complex numbers simultaneously to tackle uncertainty, ambivalence, and mediocrity of data. This new fuzzy-like hybrid theory is versatile in two respects. First, it provides access to a broad spectrum of membership function values by broadening them to the unit circle on an Argand plane and incorporating an additional term, the P-term, to accommodate the data's periodic nature. Second, it categorizes the distinct attributes into corresponding sub-valued sets for better understanding. The CFHS set and CFHS-mapping with its inverse mapping (INM) can manage such issues. Our proposed framework is validated by a study establishing a link between COVID-19 symptoms and medicines. For the COVID-19 types, a table is constructed relying on the fuzzy interval of [0, 1]. The computation is based on CFHS-mapping, which identifies the disease and selects the optimum medication correctly. Furthermore, a generalized CFHS-mapping is provided, which can help a specialist extract the patient's health record and predict how long it will take to overcome the infection.

**Keywords:** COVID-19; disease modelling; complex numbers (C-numbers); complex fuzzy hypersoft set; mapping; inverse mapping

**MSC:** 03E72, 68U35

### **1. Introduction**

COVID-19 marked itself on the world's map at the end of 2019 in the Hunan Seafood market of the Wuhan district (Hubei, China) [1]. After a few days, deep sequencing analysis of samples taken from infected patients' lower respiratory tracts led to the identification of a novel virus belonging to the severe acute respiratory syndrome (SARS) family of coronaviruses. From there, it was given the name SARS-CoV-2 and was found to be the infection-causing agent of the pneumonia clusters observed in the infected patients [2]. The coronavirus family is thought to be the cause of sickness in both animals and humans, according to [3]. A total of seven

**Citation:** Saeed, M.; Ahsan, M.; Saeed, M.H.; Rahman, A.U.; Mehmood, A.; Mohammed, M.A.; Jaber, M.M.; Damaševičius, R. An Optimized Decision Support Model for COVID-19 Diagnostics Based on Complex Fuzzy Hypersoft Mapping. *Mathematics* **2022**, *10*, 2472. https://doi.org/10.3390/math10142472

Academic Editors: Francois Rivest and Abdellah Chehri

Received: 31 May 2022 Accepted: 12 July 2022 Published: 15 July 2022


**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

family members of the coronavirus family can produce infection in humans. Of these viruses, the most common causative agents of infection in humans are 229E, HKU1, NL63, and OC43 [1].

Machine learning and deep learning methods are widely used for COVID-19 diagnostics (see, for example, [4–6]), severity prediction [7], and spread prediction [8,9]. The overview of these methods can be found in review papers [10,11]. Fuzzy-logic-based methods have also been applied extensively for disease diagnostics with various examples ranging from advice on the common cold [12] to Huntington's disease [13].

Conventional methods are insufficient to solve multidimensional challenges in the environment, economics, engineering, and robotics. The four theories discussed here that specialize in solving these types of problems are the fuzzy set theory of Zadeh [14], interval mathematics [15], probability theory [16], and the rough set theory of Pawlak [17]. They are widely used in various fields such as statistics, machine learning, and artificial intelligence. Liu et al. [18] characterized the concept of a correlation coefficient between hesitant fuzzy sets and applied it to medical diagnoses. Molodtsov [19] showed that soft set (SS) theory has significant applications in the fields of data mining, medical imaging, Riemann integration, game theory, and pattern recognition. Soft sets were initially deployed by Maji et al. [20] to handle decision-making dilemmas. S-sets and associated variants are relevant according to Yang et al. [21]. The paradigm of the imprecise SS and its various forms was established by Maji et al. [22]. Kharal et al. [23,24] established the concepts of mappings on fuzzy soft classes and soft classes. They deployed examples and empirical evidence to explore the preservation of the images of fuzzy soft sets and soft sets. In [25], Karaaslan investigated the soft class and its relevant functions. Alkhazaleh et al. [26] developed the concept of a mapping on classes, categorised neutrosophic soft set collections into single-valued neutrosophic classes, and explored single-valued neutrosophic images and neutrosophic soft images of neutrosophic soft sets. Ropiak [27] combined rough-set-based granular computing with deep learning methods, which allowed for the improvement of knowledge extraction.

The notion of mappings over collections of multifunctional fuzzy soft sets was pioneered by Sulaiman et al. [28]. They focused on a few factors linked to the images and inverse images of multi-aspect fuzzy soft sets and demonstrated their findings with numerical examples. The concept of mappings on intuitionistic fuzzy soft classes, together with their inverse images, was defined by Bashir and Salleh [29].

Smarandache [30] offered the fuzzified hypersoft (FHS) and hypersoft sets (HS) as modifications of fuzzy soft and soft settings. Saeed et al. [31–34], Zulqarnain et al. [35], Martin et al. [36], Musa et al. [37], Ajay et al. [38], and Debnath et al. [39] discussed the basics of the HS and their entire mappings in an HS environment, as well as their exposition of the HS in object classification, cell imaging, and multi-eligibility requirements. Ramot et al. [40] proposed an extensive analysis of the mathematical properties of the CF set. Elementary set operations on CF sets were studied, including the CF complement, union, and intersection. Thirunavukarasu et al. [41] examined the intuitive understanding of a soft CF set's aggregation operation. They also illustrated uses for consolidation techniques, demonstrating that the approach may be successfully used in a wide range of circumstances involving uncertainty and periodicity. In 2020, Rahman et al. [42] combined two major theories, the complex set and the hypersoft set, in fuzzy, neutrosophic, and intuitionistic fuzzy settings to build complex hypersoft hybrids.

The paradigm of a complex multi-fuzzy collection, which is a fusion of CF collections and multi-fuzzy sets, was established by Al-Qudah et al. [43]. Their scheme is equipped to deal with instabilities, ambiguities, and evaluations based on two-dimensional inputs by simultaneously storing the magnitude and P-terms of the C-numbers.

The main objective of this study is to simulate a feasible COVID-19-specific diagnosis situation, as well as to ensure effective treatment, because it is difficult to distinguish other upper-respiratory infections from COVID-19 using existing theoretical and empirical models and techniques [23,24,44,45], which are restrained from finalizing configurations. The above-mentioned strategies are inadequate for thoroughly assessing the data to gain substantial insight and a correct diagnosis. To remedy this defect, these foundations are coupled into a multifaceted system composed of a fuzzy setting and a hypersoft (HS) setting.

This approach is far more flexible in two main ways. First, it extends the membership grades of the CFHS with the P-term to support the recurring aspect of the data, permitting a broad range of membership values extended from the unit interval to the unit circle in the Argand plane. Furthermore, the CFHS attributes can be further grouped into sub-values to enhance explanation. A mapping is a correlation between two or more segments, governed by rules that transfer an input feature to its corresponding output based on subsystem properties; this tool enables comparable inputs to be treated by a single basic value. The goal of the research is to investigate COVID-19 treatments in the community, as well as the manifestations that correlate with them. It is difficult to discern which characteristic of COVID-19 is causing issues, and how substantial it is, merely by looking at the COVID-19 health consequences. To reduce this issue, the CFHS set and CFHS-mapping with its INM are used.

When linked with scientific modelling, these concepts are effective and crucial for appropriately addressing the issues. A table based on the fuzzy interval [0, 1] is constructed for the diverse strains of COVID-19. The approaches rely on CFHS-mapping and are used to create an index that indicates the ailment and then decides the correct diagnosis. In addition, a detailed CFHS-mapping, which will support a practitioner in estimating the time before the symptoms are alleviated, is established.

The main contributions and advantages of the proposed method lie in its handling of the following issues:

	- (a). Uncertainties involved in the approximation of alternatives.
	- (b). Periodic nature of the data.
	- (c). Consideration of sub-parametric values as disjointed classes.
	- (d). Entitlement of multi-argument approximate function.

The proposed method tackles the first issue by assigning a fuzzy membership grade to each alternative corresponding to the parameters, the second by considering phase and amplitude terms, and the third and fourth by considering the hypersoft setting. Thus, it leads to a reliable decision support system by addressing these issues collectively.


The article is presented in the following manner: Section 2 highlights the concept of complex fuzzy hypersoft classes. The opted approach is validated by practically applying it to a problem, with a comparative analysis, in Section 3, while the conclusion sums up the study in Section 4.

#### **2. Implementation of CFHS Set for COVID-19**

This section's primary focus is to analyze the highlighted problem related to COVID-19. The analysis is based on the cause of the disease, its symptoms, diagnosis, and treatment of patients. CFHS mapping and inverse mapping are applied for a precise and accurate analysis and to suggest a treatment policy based purely on the mathematical strategies presented in this article.

#### *2.1. COVID-19 and Its Variants*

With the passage of time, scientists have identified numerous variants of COVID-19 as it has evolved. There are many distinct forms of coronaviruses, but just four are discussed below.

The first is SARS, also known as Severe Acute Respiratory Syndrome, whose causative agent also belongs to the coronavirus family. The SARS-CoV virus has a zoonotic origin, targets the lungs, and causes acute respiratory problems. It was the first virus whose virology or genetic sequence was remotely similar to the COVID-19 virus upon the first examination.

SARS-CoV-2, or COVID-19, is responsible for the pandemic that started back in 2019. As explained in the introduction, it also hinders respiratory functions and is recognised by the US National Institutes of Health as the successor of the SARS-CoV-1 virus.

MERS, or Middle Eastern Respiratory Syndrome, also belongs to the coronavirus family. It first presented itself in the Middle Eastern countries of Asia around 2012, and it is called MERS or the Camel Flu. Its symptoms are quite similar to those of the COVID-19 virus, but the most prominent are mild to high fever, shortness of breath, diarrhea, and cough. MERS is regarded as more severe when compared with other diseases.

The OC43 (HCoV-OC43) strain of the human coronavirus is a component of the COVID family and belongs to a group of viruses called Betacoronavirus 1. This strain is prominent in infecting humans and cattle. As far as the virus's structural integrity goes, it is a single-stranded, positive-sense, enveloped RNA virus. It is one of the seven coronavirus strains that infect humans. Its host-entering mechanism involves binding to the N-acetyl-9-O-acetylneuraminic acid receptor of the host cell.

These are the specific symptoms associated with these problems: loss of speech or movement, chest pain or pressure, difficulty breathing or shortness of breath, loss of taste or smell, headache, diarrhea, sore throat, aches and pains, tiredness, dry cough, and fever.

An algorithm based on CFHS-mapping is proposed to diagnose COVID-19, suggest appropriate treatments, and track the treatment steps and improvement measures for the patients.

#### *2.2. Preliminaries*

This section provides a few basic concepts to facilitate a clear understanding of the proposed approach.

**Definition 1** ([14])**.** *The FS is characterized by a membership mapping* $\hat{\mu} : \hat{\Theta} \to \hat{\Omega}$*, stated as a family of pairs* $(\hat{\theta}, \hat{\mu}(\hat{\theta}))$*, where* $\hat{\mu}(\hat{\theta})$ *and* $\hat{\Omega}$ *are the belonging degree of* $\hat{\theta} \in \hat{\Theta}$ *and the closed unit interval* $[0,1]$*, respectively.*

**Definition 2** ([19])**.** *An SS is stated as the family of pairs* $(\hat{o}, \hat{\zeta}(\hat{o}))$*, where* $\hat{\zeta} : \hat{\Xi} \to \hat{\mathcal{P}}(\hat{\Theta})$*, with* $\hat{\zeta}(\hat{o})$*,* $\hat{\Xi}$*, and* $\hat{\mathcal{P}}(\hat{\Theta})$ *as an* $\hat{o}$*-approximate member of the SS, a set of evaluating indicators, and the family of subsets of* $\hat{\Theta}$*, respectively.*

**Definition 3** ([41])**.** *Let* $\hat{\Theta}$ *and* $\hat{\Pi}$ *be the initial universal set and the set of attributes, respectively. For* $F \subseteq \hat{\Pi}$*, let* $(\phi, F)$ *be a CF soft set over* $\hat{\Theta}$*, specified by a function* $\phi_F$ *representing a mapping* $\phi_F : F \to \mathcal{C}(\hat{\Theta})$*. Here,* $\phi_F$ *is known as the CF approximate function of the CF soft set, and the set can be signified as*

$$(\phi, F) = \{ (\mathfrak{g}, \phi_F(\mathfrak{g})) : \mathfrak{g} \in F, \ \phi_F(\mathfrak{g}) \in \mathcal{C}(\hat{\Theta}) \}.$$

**Definition 4** ([30])**.** *An HS is stated as a class of pairs* $(\ddot{\delta}, \hat{\zeta}_{HS}(\ddot{\delta}))$*, where* $\hat{\zeta}_{HS} : \hat{\Lambda} \to \hat{\mathcal{P}}(\hat{\Theta})$*, with* $\hat{\zeta}_{HS}(\ddot{\delta})$ *as the* $\ddot{\delta}$*-multi-argument approximate member of the HS for* $\ddot{\delta} \in \hat{\Lambda}$*, and* $\hat{\Lambda} = \hat{\Lambda}_1 \times \hat{\Lambda}_2 \times \cdots \times \hat{\Lambda}_n$*, where all* $\hat{\Lambda}_i$ *are disjoint sub-classes of parameters with their respective sub-parametric values. For further details, see [31].*
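The grades of a CFHS set pair an amplitude with a phase, and its attribute domain is a product of disjoint sub-valued sets. A minimal sketch of such a structure follows; the dictionary layout, the key shapes, and the max-amplitude/max-phase union convention are illustrative assumptions, not the paper's formal definition (the paper's union is shown later in Table 6).

```python
import cmath


def cfhs_union(z1, z2):
    """Union of two CFHS charts (sketch). Each chart is a dict mapping
    (object, attribute_tuple) -> complex grade r*e^{i*theta}, where the
    attribute tuple is one multi-argument choice from the product of
    sub-valued sets. Here the union takes the componentwise maximum of
    the amplitude and phase terms (one common convention)."""
    out = {}
    for key in set(z1) | set(z2):
        a = z1.get(key, 0j)
        b = z2.get(key, 0j)
        r = max(abs(a), abs(b))  # A-term: maximum amplitude
        theta = max(cmath.phase(a) % (2 * cmath.pi),
                    cmath.phase(b) % (2 * cmath.pi))  # P-term: maximum phase
        out[key] = cmath.rect(r, theta)
    return out
```

Keying by a tuple of sub-attribute values is what distinguishes the hypersoft structure from an ordinary soft set, whose keys would be single attributes.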

#### *2.3. Methodology*

This section describes the various stages of the complete methodology adopted for this study. Figure 1 presents a graphical view of the complete methodology.

**Figure 1.** Pictographic view of various stages involved in adopted methodology.

2.3.1. Description of Fuzzy Rules

In accordance with the terminological understanding of "fuzzy rule", the following criteria have been employed to justify fuzzy rule requirements:



**Table 1.** Comparison of employed membership function with other membership functions.

#### 2.3.2. Pre-Stage

COVID-19 patients show symptoms similar to the sicknesses caused by the viruses listed above, making it hard to pinpoint the cause of the ailment and propose an appropriate treatment for the disease. This ambiguity and vagueness are dealt with by using the CFHS set in a specialized manner. To translate verbal data into numerical language, a fuzzy interval [0, 1] is constructed for the various types of COVID-19. A chart is created to find the actual form of COVID-19 from its different types; see Table 2.

**Table 2.** COVID diagnosis table with ranges.


Diseases are known to progress over time, so this paper utilizes this fact by collecting the patient's data for 2–3 days and comparing the symptoms and any side effects presented, leading to a complete workup of the patient's history. Further on, additional graphs comparing the patient's present condition to their previous condition are created for better monitoring and trend-identification purposes. The above statement is expanded in Table 3 and Figure 2. Depending on the patient's condition, COVID-19 is divided into three ranges, namely serious, moderate, and low. Figure 2 defines the ranges along with the constraints allocated to them.
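The serious/moderate/low split over the fuzzy interval can be sketched as a simple threshold rule. The cutoffs 0.33 and 0.66 below are hypothetical placeholders; the actual constraints for each range are those defined in Figure 2.

```python
def covid_range(score, low_cut=0.33, serious_cut=0.66):
    """Map an aggregated fuzzy score in [0, 1] to one of the three ranges
    described in the text. The cutoffs are illustrative placeholders, not
    the constraints defined in Figure 2."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in the fuzzy interval [0, 1]")
    if score >= serious_cut:
        return "serious"
    if score >= low_cut:
        return "moderate"
    return "low"
```

Tracking this label day by day, as the text suggests, is what turns the static diagnosis table into a monitoring trend.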


**Table 3.** COVID is analysed using associated concerns and how they are treated on a daily basis.

**Figure 2.** Flowchart of various ranges related to COVID-19's mentioned criteria.

*2.4. Proposed Algorithm for Pre-Diagnosis of Patients Based on CFHS Mapping*

This section proposes a multi-attribute decision-making (MADM)-based algorithm (Algorithm 1) for the pre-diagnosis of COVID-19 in patients who are under observation.

#### **Algorithm 1:** Procedural flow of pre-diagnosis of COVID-19 in patients.

#### -**Start**

#### -**Input**

*Step 1.* Categorise the coronavirus family. Let $\mathcal{W} = \{\partial_1, \partial_2, \partial_3, \ldots, \partial_n\}$ be the set of patients suspected to have COVID-19 and $\mathcal{A} = \{\hat{a}_1, \hat{a}_2, \hat{a}_3, \ldots, \hat{a}_v\}$ be the set of $v$ symptoms whose sub-values relate to the sets $\mathcal{F}_i$, where $\mathcal{F} = \prod_{i=1}^{v} \mathcal{F}_i$. Following a crucial evaluation at the $\varepsilon$th time, the consultant's CFHS set chart is customised as $z^{\varepsilon}_{\mathcal{F}} = \{ z^{\varepsilon}_{s} = \{ \partial, \mathcal{T}^{\varepsilon}_{s}(\partial) \} : \mathcal{T}^{\varepsilon}_{s}(\partial) \in \mathcal{C}(\mathcal{F}),\ \partial \in \mathcal{W},\ s \in \mathcal{F} \}$, where the $\mathcal{T}^{\varepsilon}_{s}(\partial)$ are the CFHS memberships of SARS-CoV, SARS-CoV-2, MERS-CoV, and OC43 (beta) for the $l$th patient and $k$th symptom, respectively, with $\varepsilon = 1, 2, \ldots, t$; $k = 1, 2, \ldots, |\mathcal{F}|$; $l = 1, 2, \ldots, n$. The CFHS union of all $t$ daily clinical charts is used to procure the most relevant data on all patients.

*Step 2.* It is anticipated that $\mathcal{B} = \{\hat{b}_1, \hat{b}_2, \hat{b}_3, \ldots, \hat{b}_w\}$ is a class of relevant indications whose respective sub-classes are $\mathcal{F}'_i$, with $\mathcal{F}' = \prod_{i=1}^{w} \mathcal{F}'_i$. A CFHS set is constructed with weights proposed by the decision-makers (health experts) after assessing the physical condition of the patient under observation over time $\varepsilon$.

*Step 3.* Mappings are now defined as $\lambda : \mathcal{W} \to \mathcal{W}$ and $\varrho : \mathcal{F} \to \mathcal{F}'$, characterised by $\lambda(\partial_l) = \partial_l$ and $\varrho(s_k) = s'_{k'}$, with $k = 1, 2, \ldots, |\mathcal{F}|$; $k' = 1, 2, \ldots, |\mathcal{F}'|$; $l = 1, 2, \ldots, n$ (based on the interrelations with the basic symptoms). Suppose the CFHS-mapping $\sigma = (\lambda, \varrho) : CFHS(\mathcal{W}) \to CFHS(\mathcal{W})$ is defined as

$$\mathcal{T}_{\sigma(z_{\mathcal{F}})}(s')(\partial) = \mathcal{T}_{s'_{k'}} \begin{cases} \max\limits_{v \in \lambda^{-1}(\partial)} \left( \max\limits_{s \in \varrho^{-1}(s') \cap \mathcal{F}} \mathcal{T}_{z_{\mathcal{F}}}(s)(v) \right) & \text{if } \lambda^{-1}(\partial) \neq \emptyset,\ \varrho^{-1}(s') \cap \mathcal{F} \neq \emptyset, \\ 0 & \text{otherwise,} \end{cases}$$

where the $\mathcal{T}_{s'_{k'}}$ are the connected weights from $z_{\mathcal{F}'}$. The image of $z^{\varepsilon}_{\mathcal{F}}$ under the mapping $\sigma$ is obtained and denoted $z'_{\mathcal{F}'}$.

#### -**Construction**

*Step 4.* Transform the CFHS set into aggregated values by using $\mathcal{T}_z(s)(\partial) = w_1 \mu_z(s)(\partial) + w_2 \left(\frac{1}{2\pi}\right) \omega_z(s)(\partial)$ [46], where $w_1, w_2 \in [0, 1]$.

*Step 5.* Then, using the information from Table 3, constitute a set of symptoms and assemble the pre-diagnosis table, which leads to the assessment of the consistency of the proposed study.

*Step 6.* Take the mean for each specific patient, centred on their clinical manifestations. Now, compare the outcomes with the diagnosis Table 2.

#### -**Computation**

*Step 7.* Consider a class $\mathcal{B} = \{\hat{b}_1, \hat{b}_2, \hat{b}_3, \ldots, \hat{b}_w\}$ of concurrently correlated symptoms, where $k = \prod_{i=1}^{w} |\mathcal{F}'_i|$, and let $C = \{1, 2, 3, \ldots, x\}$ be a list of potential medicines. This allows the construction of $\chi_{\mathcal{F}'}$, where $\chi$ is the CFHS function from $\mathcal{F}'$ to $\mathcal{W}(C)$ that collects the physician's recommendations.

*Step 8.* Obtain $\mathcal{W}^1_C$ by applying the min–max composition over $z'_{\mathcal{F}'}$ and $\chi_{\mathcal{F}'}$.

*Step 9.* Use medications that offer additional benefits while having fewer side effects. To determine the patient's status, the guidelines are followed.


#### -**Output**

*Step 10.* Consider two mappings $\lambda' : \mathcal{W}^{q-1} \to \mathcal{W}^{q}$ and $\varrho' : C^{q-1} \to C^{q}$ such that $\lambda'(\partial_l) = \partial_l$ and $\varrho'(x) = x$. Then the mapping $\sigma' = (\lambda', \varrho') : \mathcal{W}^{q-1}_C \to \mathcal{W}^{q}_C$ can be constructed and regarded as

$$\mathcal{W}^{q}_{C} = \sigma'(\mathcal{W}^{q-1}_{C})(g)(\partial) = \frac{1}{q} \begin{cases} \bigvee\limits_{\pi \in \lambda'^{-1}(\partial)} \left( \bigvee\limits_{\vartheta \in \varrho'^{-1}(g) \cap C} \mathcal{W}^{q-1}_{C}(\vartheta)(\pi) \right) & \text{if } \lambda'^{-1}(\partial) \neq \emptyset,\ \varrho'^{-1}(g) \cap C \neq \emptyset, \\ 0 & \text{otherwise,} \end{cases}$$

where $g \in \varrho(C) \subseteq C$, $v \in \mathcal{W}^{q}$, $\pi \in \mathcal{W}^{q-1}$, $\vartheta \in \varrho(C)^{q-1}$, and $q = 2, 3, 4, \ldots$ is the number of episodes of treatment.

*Step 11.* Repeat Step 10 whenever the outcomes need to be reassessed, and finally compute the score values by taking the arithmetic mean of all final obtained values corresponding to each patient.
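Step 8's min–max composition can be sketched as follows. This is a minimal sketch under simplifying assumptions: the grades are plain fuzzy scores (after Step 4's aggregation), plain dictionaries stand in for the CFHS structures, and the names `z` and `chi` mirror the symbols above without reproducing their full definitions.

```python
def min_max_compose(z, chi):
    """Min-max composition (Step 8, sketch): `z` maps each patient to
    symptom scores, `chi` maps each symptom to medicine-recommendation
    scores; the result maps each patient to medicine-suitability scores
    via (z o chi)(p, m) = max over symptoms s of min(z[p][s], chi[s][m])."""
    meds = {m for rec in chi.values() for m in rec}
    out = {}
    for patient, sym_scores in z.items():
        out[patient] = {
            m: max(min(sym_scores[s], chi[s].get(m, 0.0))
                   for s in sym_scores if s in chi)
            for m in meds
        }
    return out
```

The max-of-min rule means a medicine scores highly for a patient as soon as one strongly present symptom is strongly matched by that medicine, which is the behaviour Step 9 then filters by side effects.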

#### Methodological Limitations

Prior to the application of the algorithm above, the following limitations of the technique are checked:


#### **3. Experimental Study**

The main emphasis of this section is the usage of the algorithm described above in a clinical situation. The patient's medical condition is first translated into mathematical syntax with the aid of medical personnel. The next step involves comparing the mathematical syntax of the patient with the syntax of patients previously recorded in the database. A patient with distinct symptoms of COVID-19 is monitored with the help of a diagnostic map, and day-by-day reports can be seen in Tables 2 and 3. These tables can be used for a comparative analysis to deduce the intensity of the disease in a particular patient. The most significant advantage of the algorithm is its use for determining a particular disease, based on its symptoms and severity, using mapping functions. The algorithm can propose an optimal treatment method for the disease based on the patient's condition. The technique's development will be aided by a fully generalized mapping of the physician's rehabilitation and convenient restoration graphs for clinical practice, retrospective cohort analysis, and application users. Four patients present similar symptoms, making it complicated for medical professionals to suggest a diagnosis based on their overlapping symptoms. Many dynamics are considered, but some, such as previous skin colour changes and history, are ruled out for ease of explanation of the algorithm. Based on the diagnosis presented by the algorithm and the doctor's intuition, a treatment method can be started along with the patient's rehabilitation plan. The following example is performed on hypothetical data, but if real data were used, it could lead to fruitful results and help optimize the workflow in hospitals while minimizing human errors and misdiagnosis problems.

#### *Step 1.*

Let $\mathcal{W} = \{\partial_1, \partial_2, \partial_3, \partial_4\}$ be a set of four patients. Let $\hat{a}_1$ = Fever, $\hat{a}_2$ = Cough, and $\hat{a}_3$ = Pain be ailments with distinct attributes, associated with the sets $\mathcal{F}_1$, $\mathcal{F}_2$, and $\mathcal{F}_3$, respectively. Let $\mathcal{F}_1 = \{\hat{a}_{11}$ = Intermittent fever, $\hat{a}_{12}$ = Remittent fever$\}$, $\mathcal{F}_2 = \{\hat{a}_{21}$ = Dry cough$\}$, and $\mathcal{F}_3 = \{\hat{a}_{31}$ = Pain in temples of head, $\hat{a}_{32}$ = Pain in forehead$\}$. Now, generate the first two ($\varepsilon = 2$) days' charts, given in Tables 4 and 5, which are in the form of CFHS sets. After that, take the union between them. The results can be seen in Table 6, where $0 \leq \theta \leq 2\pi$.

**Table 4.** $z^1_F$: Symptoms from F on the first day of the patient's treatment.


**Table 5.** $z^2_F$: Symptoms from F on the second day of the patient's treatment.


**Table 6.** $z^{\varepsilon}_F$: CFHS union of $z^1_F$ and $z^2_F$.


#### *Step 2.*

Let $F'_1 = \{\hat{\wp}'_{11}$ = Tightness sensation in the head, $\hat{\wp}'_{12}$ = Stroke$\}$, $F'_2 = \{\hat{\wp}'_{21}$ = Scratchy sensation$\}$, $F'_3 = \{\hat{\wp}'_{31}$ = Malaise, $\hat{\wp}'_{32}$ = Body aches$\}$ be three sets related to three different attributes $\hat{\wp}'_1$ = Headaches, $\hat{\wp}'_2$ = Sore throat, $\hat{\wp}'_3$ = Weakness, respectively, for COVID-related symptoms. Specialists weight clinical conditions depending on clinical knowledge and translate the relevant knowledge into a quantitative transcription to establish the CFHS set $z_F$ displayed in Table 7.


**Table 7.** $z_F$: Scales assigned to each patient's clinical manifestations in the CFHS set.

#### *Step 3.*

Define the mappings $\lambda : \mathcal{W} \to \mathcal{W}$ and $\varpi : F \to F'$ such that $\lambda(\partial_1) = \partial_1$, $\lambda(\partial_2) = \partial_2$, $\lambda(\partial_3) = \partial_3$, $\lambda(\partial_4) = \partial_4$, and $\varpi(\hat{\wp}_{11}, \hat{\wp}_{21}, \hat{\wp}_{31}) = (\hat{\wp}'_{11}, \hat{\wp}'_{21}, \hat{\wp}'_{31})$, $\varpi(\hat{\wp}_{11}, \hat{\wp}_{21}, \hat{\wp}_{32}) = (\hat{\wp}'_{12}, \hat{\wp}'_{21}, \hat{\wp}'_{31})$, $\varpi(\hat{\wp}_{12}, \hat{\wp}_{21}, \hat{\wp}_{31}) = (\hat{\wp}'_{11}, \hat{\wp}'_{21}, \hat{\wp}'_{32})$, $\varpi(\hat{\wp}_{12}, \hat{\wp}_{21}, \hat{\wp}_{32}) = (\hat{\wp}'_{12}, \hat{\wp}'_{21}, \hat{\wp}'_{32})$. Measure the image of $z^{\varepsilon}_F$ as well as $z'_{F'}$ in Table 8.

**Table 8.** $z'_{F'}$: The image of $z^{\varepsilon}_F$ under the CFHS mapping.


#### *Step 4.*

Convert Table 8 into fuzzy values (Table 9) by applying $T_{z(s)}(\partial) = w_1\,\mu_{z(s)}(\partial) + w_2\left(\frac{1}{2\pi}\right)\omega_{z(s)}(\partial)$ [46], with weights $w_1 = 0.2$, $w_2 = 0.4$.
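The conversion applied in this step is a simple weighted aggregation; a minimal sketch with hypothetical $(\mu, \omega)$ values:

```python
import math

# Sketch of the Step 4 conversion: each complex-valued grade (mu, omega)
# collapses to a single fuzzy score T = w1*mu + w2*(1/(2*pi))*omega, with
# w1 = 0.2 and w2 = 0.4 as in the text. The (mu, omega) pair below is a
# hypothetical table entry.

def to_fuzzy_score(mu, omega, w1=0.2, w2=0.4):
    """Collapse an amplitude/phase grade to a single fuzzy score."""
    return w1 * mu + w2 * omega / (2 * math.pi)

score = to_fuzzy_score(0.8, math.pi)   # 0.2*0.8 + 0.4*0.5
```

Applying `to_fuzzy_score` entry-wise to Table 8 produces the FHS scores of Table 9.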

**Table 9.** Scores in the form of FHS set.


#### *Step 5.*

Compare Table 9 with Table 3 to obtain an initial diagnosis and generate the diagnosis in Table 10. This table is utilized to check the accuracy of the generated diagnosis.

#### *Step 6.*

Determine the average of all the aspects from Table 9 that correspond to each individual's symptoms; this can be seen in Table 11. The findings of Table 11 are then compared to the COVID chart (Table 2). Patients $\partial_1$ and $\partial_3$ are diagnosed with SARS-CoV, while patients $\partial_2$ and $\partial_4$ are suspected of having SARS-CoV-2.


**Table 10.** Initial treatment chart developed to assess the validity of the results.

#### *Step 7.*

The doctor prescribes medicine after accurately assessing the true essence of each clinical condition. The CFHS set evolves based on critical specific suggestions, along with the adequate care for the different sorts of COVID. Suppose $C = \{\mathbb{O}_1$ = Pfizer, $\mathbb{O}_2$ = Moderna, $\mathbb{O}_3$ = Novavax, $\mathbb{O}_4$ = AstraZeneca$\}$ are distinctive sustainable therapies. Then $\chi_{F'}$ is established, a set of the surgeon's advice on effective treatments for COVID manifestations, and the CFHS values are again converted to fuzzy values using $T_{z(s)}(\partial) = w_1\,\mu_{z(s)}(\partial) + w_2\left(\frac{1}{2\pi}\right)\omega_{z(s)}(\partial)$ [46], with weights $w_1 = 0.2$, $w_2 = 0.4$, to obtain aggregation values. Table 12 contains $\chi_{F'} \in$ CFHS$(\mathcal{W})$. The assessment methods in Table 12 are determined depending on each patient's condition.

#### *Step 8.*

Measure the CFHS union of $\chi_{F'}$ and $z'_{F'}$ and collect the linkage between the predicted treatments and the doctors' advice as the CFHS set $\chi_{F'} \cup z'_{F'} = \mathcal{W}^1_C$; see Table 13.

#### *Step 9.*

The prescription is pertinent for the patients because it generates more rewards while having low toxicity. Table 14 shows the best medicine dosages for each patient. From Table 14, treatment $\mathbb{O}_4$ is most suitable for patient $\partial_1$; one of the treatments among $\mathbb{O}_1$, $\mathbb{O}_3$, and $\mathbb{O}_4$ is advised for patient $\partial_2$; for patient $\partial_3$ the most suitable treatment is $\mathbb{O}_1$; and treatment $\mathbb{O}_4$ is the most suitable for patient $\partial_4$. The concluding decision relies on the person's actual status and disease features.

#### *Step 10.*

The individual's predicament is classified by the characteristics of the ailments and the patient's condition. The process repeats until the illnesses are cured. CFHS mappings are created to assess the development of each patient: $\lambda' : \mathcal{W}^{q-1} \to \mathcal{W}^{q}$ and $\varpi' : C^{q-1} \to C^{q}$ such that

$$
\lambda'(\partial\_1) = \partial\_1, \lambda'(\partial\_2) = \partial\_2, \lambda'(\partial\_3) = \partial\_3, \lambda'(\partial\_4) = \partial\_4;
$$

and

$$
\varpi'(\mathbb{O}_1) = \mathbb{O}_1,\ \varpi'(\mathbb{O}_2) = \mathbb{O}_2,\ \varpi'(\mathbb{O}_3) = \mathbb{O}_3,\ \varpi'(\mathbb{O}_4) = \mathbb{O}_4.
$$

This is how the CFH S -mapping can be conveyed;

$$
\sigma' = (\lambda', \varpi') : \mathcal{W}_{\mathbb{C}}^{q-1} \to \mathcal{W}_{\mathbb{C}}^{q}
$$

The CFH S -mapping is underlying as;

$$\mathcal{W}_{\mathbb{C}}^{q} = \sigma'(\mathcal{W}_{\mathbb{C}}^{q-1})(\mho)(\partial) = \frac{1}{q}\begin{cases} \bigvee_{\pi \in \lambda'^{-1}(\partial)} \left( \bigvee_{\vartheta \in \varpi'^{-1}(\mho) \cap \mathbb{C}} \mathcal{W}_{\mathbb{C}}^{q-1}(\vartheta)(\pi) \right) & \text{if } \lambda'^{-1}(\partial) \neq \varnothing \text{ and } \varpi'^{-1}(\mho) \cap \mathbb{C} \neq \varnothing \\ 0 & \text{otherwise} \end{cases}$$

where $\mho \in \varpi(C) \subseteq C$, $\partial \in \mathcal{W}^{q}$, $\pi \in \mathcal{W}^{q-1}$, and $\vartheta \in C^{q-1}$. Identify the number of remedies and rehabilitation exacerbations in Tables 15–18 for $q$ = 2, 3, 4, 5.

#### *Step 11.*

Step 10 is reiterated until patients' targets are met. Figures 3–6 depict each patient's status update.
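Under the identity mappings of this example, the Step 10 update reduces to scaling the previous chart by $1/q$; a toy sketch with hypothetical grades and dictionary-based preimage lookups:

```python
# Sketch of the Step 10 update: the CFHS mapping aggregates the previous
# stage W^{q-1} by taking a maximum over the preimages of lambda' and
# varpi', scaled by 1/q; empty preimages yield 0, as in the piecewise
# definition. All grades below are hypothetical.

def cfhs_map(prev, q, lam_pre, var_pre):
    """prev: dict (treatment, patient) -> grade; *_pre: preimage lookups."""
    new = {}
    for (t, p) in prev:
        pis = lam_pre.get(p, [])       # preimage of the patient under lambda'
        thetas = var_pre.get(t, [])    # preimage of the treatment under varpi'
        if not pis or not thetas:
            new[(t, p)] = 0.0          # "otherwise" branch
            continue
        new[(t, p)] = max(prev[(tt, pp)] for pp in pis for tt in thetas) / q
    return new

prev = {("O1", "d1"): 0.9, ("O1", "d2"): 0.6}
identity_p = {"d1": ["d1"], "d2": ["d2"]}   # lambda' is the identity here
identity_t = {"O1": ["O1"]}                 # varpi' is the identity here
w2 = cfhs_map(prev, 2, identity_p, identity_t)
```

Iterating `cfhs_map` for q = 2, 3, 4, 5 mirrors the improvement reports of Tables 15–18.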


**Table 11.** Patient-wise average significance levels related to clinical manifestations.

**Table 12.** $\chi_{F'}$ in tabular format: the doctor's advice for COVID symptoms and the appropriate treatment.


**Figure 3.** Graph of progress of patient *∂*1.

**Figure 4.** Graph of progress of patient *∂*2.

**Figure 5.** Graph of progress of patient *∂*3.

**Figure 6.** Graph of progress of patient *∂*4.

**Table 13.** Tabular representation of $\mathcal{W}^1_C$: union of $\chi_{F'}$ and $z'_{F'}$ to investigate the affiliation between the envisaged treatments and the patients.


**Table 14.** Data pertaining to the suggested treatments, represented in tabular format.



**Table 15.** Tabular representation of $\mathcal{W}^2_C$: the patient's improvement report after the second therapy event.

**Table 16.** Tabular representation of $\mathcal{W}^3_C$: the patient's improvement report after the third therapy event.

**Table 17.** Tabular representation of $\mathcal{W}^4_C$: the patient's improvement report after the fourth therapy event.

**Table 18.** Tabular representation of $\mathcal{W}^5_C$: the patient's improvement report after the fifth therapy event.


In Table 19, the symbols ✓ and × stand for YES and NO, respectively. Similarly, FMG stands for "fuzzy membership grade", COP for "consideration of parameters", COSP for "consideration of sub-parameters", MOPND for "management of periodic nature of data", and REC for "ranking evaluation criteria".


**Table 19.** The proposed CFHS is compared to established paradigms.

#### *3.1. Target Users of the Proposed Approach*

The proposed algorithm aims to be a problem-solving support for early triage alternatives and for identifying sufferers with conflicting medical indications. This exploration demonstrates a well-built association between the signs and records them mathematically. The scheme is assembled on trimming CFHS set designs that can predict a patient's state and estimate medical indications over time to analyze the health effects of a medicine. It can be carried out to foresee the contagion's reinfection possibilities in anticipation of a cure. In upcoming implementations, such pattern-recognition-based algorithms are committed to diminishing medical inaccuracies and achieving inspiring results for various patient configurations.

#### *3.2. Comparative Analysis*

The concept of CFHS mapping is both broad and appropriate for various illnesses. Existing theories cannot cope with and examine these challenges; however, our proposals also have their limits (Table 19). Because of such boundaries, some physicians may be incapable or unwilling to gather all the initial information. The presented method can transform the patient's condition into a quantitative form without gaps or overlaps, permitting us to secure the best diagnosis and treatment. The presented approach is compared to existing theories on a structural and computational basis in Tables 19 and 20. When attributes are further split into attribute values and the concerns include complex (2*D*) data, current techniques fail to execute; the proposed mapping addresses these shortfalls. This reveals that, compared to conventional techniques, our framework is stable and effective in responding to such obstacles satisfactorily.

Now, the recommended plan is discussed along with its comprehensive nature.



**Table 20.** Comparison of computational results of proposed algorithm.

#### **4. Conclusions**

COVID-19 and its associated complications have been discussed in this article. A technique is suggested for diagnosing the patient's primary symptoms and analyzing their COVID-19 severity. To this end, the CFHS mapping, the INM, and a few practical examples with associated features are described. The calculation is established in three stages. The model examines the patients' actual COVID-19 condition in the first stage. In the second stage, CFHS mapping is utilized to locate suitable medications for the patients depending on their COVID-19 severity. Thirdly, a generalized mapping is developed for the patient's progress. The system predicts which medication will best treat the patient until the patient achieves a suitable immune response. By associating this approach with existing knowledge, the findings thus gained are precise, simple to cope with, and offer outstanding flexibility for examining MCDM issues. Other areas, such as the Neutrosophic HS set, Plithogenic HS set, Plithogenic Intuitionistic Fuzzy HS set, Q-Rung Orthopair Fuzzy HS set, and their gluing models, are to be explored for developing flexible hybrid structures. The approach may also be adapted for intelligent machines, diagnostic devices, information retrieval, information processing, social bonding, personalized recommendation approaches, algorithms, media platforms, remote sensing, the macroeconomic paradigm, classification techniques, image recognition, virtual architecture, and probabilistic reasoning.

**Author Contributions:** Conceptualization, M.S., M.A., A.U.R. and M.A.M.; methodology, M.S., M.A., A.U.R. and M.M.J.; software, M.H.S., A.U.R., M.A.M., M.M.J. and R.D.; validation, M.H.S., A.M., M.M.J. and R.D.; formal analysis, M.S., A.U.R., M.A. and R.D.; investigation, M.S., M.A., A.U.R., M.A.M. and M.M.J.; data curation, M.H.S., A.M., M.A.M. and R.D.; writing of the original draft, M.A., A.U.R., M.H.S. and M.A.M.; writing of the review and editing, M.S., M.A., M.A.M. and R.D.; visualization, M.S., M.A., A.U.R., M.H.S. and M.M.J.; supervision, M.S., M.A.M., M.M.J. and R.D.; project administration, M.S., M.A.M. and M.M.J.; funding acquisition, R.D. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Application of ANN in Induction-Motor Fault-Detection System Established with MRA and CFFS**

**Chun-Yao Lee 1,\*, Meng-Syun Wen 1, Guang-Lin Zhuo <sup>1</sup> and Truong-An Le <sup>2</sup>**


**Abstract:** This paper proposes a fault-detection system for faulty induction motors (bearing faults, interturn shorts, and broken rotor bars) based on multiresolution analysis (MRA), correlation and fitness-values-based feature selection (CFFS), and an artificial neural network (ANN). First, this study compares two feature-extraction methods, MRA and the Hilbert–Huang transform (HHT), for induction-motor-current signature analysis. Furthermore, feature-selection methods are compared to reduce the number of features while maintaining the best accuracy of the detection system, in order to lower operating costs. Finally, the proposed detection system is tested with additive white Gaussian noise, and the signal-processing and feature-selection methods with good performance are selected to establish the best detection system. According to the results, features extracted with MRA achieve better performance than those extracted with HHT when using CFFS and an ANN. In the proposed detection system, CFFS significantly reduces the operation cost (removing 95% of the features) while maintaining 93% accuracy with the ANN.

**Keywords:** multiresolution analysis (MRA); correlation and fitness values-based feature selection (CFFS); artificial neural network (ANN); feature selection

**MSC:** 68T07

#### **1. Introduction**

With the fourth industrial revolution developing, the way factories operate will no longer be the same. Factory automation can save manpower and avoid equipment failures through online fault-detection systems [1–3]. In factories, motor failures can cause production-equipment downtime and have a significant economic impact [4]. Therefore, establishing a motor fault-detection system can address failure problems before severe damage is caused to factory production. This study analyzes and builds a fault-detection system for three common cases of motor failure [5]: (1) bearing fault, (2) interturn short circuit, and (3) broken rotor bar, based on motor-current signature analysis (MCSA) [6].

In recent years, many signal-processing methods have received considerable attention for fault-detection problems. For example, R. Romero-Troncoso improved the fast Fourier transform (FFT) through fractional resampling and proposed a multirate signal-processing technique for induction-motor fault detection [7]. M. Riera-Guasp et al. proposed Gabor analysis of the current via the chirp z-transform to obtain high-resolution time–frequency images of transient motor currents [8]. V. Climente-Alarcon used a combination of the Wigner–Ville distribution (WVD) and particle-filtering feature extraction to study in detail the evolution of principal slot harmonics (PSH) in induction motors under different load profiles [9]. M. Z. Ali et al. proposed a threshold-based fault-diagnosis method for induction motors, first using the discrete wavelet transform to process the stator current and then calculating the threshold value of the motor load through a curve-fitting equation [10].

**Citation:** Lee, C.-Y.; Wen, M.-S.; Zhuo, G.-L.; Le, T.-A. Application of ANN in Induction-Motor Fault-Detection System Established with MRA and CFFS. *Mathematics* **2022**, *10*, 2250. https://doi.org/ 10.3390/math10132250

Academic Editors: Francois Rivest and Abdellah Chehri

Received: 22 March 2022 Accepted: 24 June 2022 Published: 27 June 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

**<sup>\*</sup>** Correspondence: cyl@cycu.edu.tw

The above signal-processing methods have their own advantages, but the current signal may contain nonlinear and nonstationary noise in the time and frequency domains due to the faulty motor, which limits the performance of these methods. For example, the FFT and the Gabor transform (GT) are sensitive to noise [7]. The cross-term interference of nonstationary signals limits the performance of the WVD [11]. The predefined wavelet-based parameters mean that the WT may not be able to adaptively process nonstationary signals [12].

In recent years, several studies have demonstrated the advantages of multiresolution analysis (MRA) [13,14] and the Hilbert–Huang transform (HHT) [15–17] in analyzing the nonlinear and nonstationary noise signals of induction motors. Therefore, this study compares these two signal-processing approaches; the result could help establish the best fault-detection system for induction motors. (1) MRA can reveal otherwise undetectable fault information in the time and frequency domains from current signals decomposed into detail and approximation coefficients; it is used here to analyze motor-failure current signals and extract the important features for the fault-detection system from noisy signals. (2) HHT is widely used to analyze nonlinear and nonstationary signals; it is used here to analyze the noisy current signals caused by a faulty motor and to find the noise frequencies through the Hilbert transform, improving the accuracy of the fault-detection system.

The fault-detection system is established with the features extracted by the signal-processing approaches. Therefore, this study uses feature engineering to improve the system. Feature engineering can be divided into three categories [18]: feature construction [19,20], feature extraction [21–23], and feature selection [24–26]. Feature construction increases the number of features by creating new features from old ones; if the new features carry important information, the fault-detection system may achieve better performance. Feature extraction reduces the dimension of high-dimensional features with a transfer function and also avoids the accuracy loss that occurs with the Hughes phenomenon. Feature selection has two methods: filter and wrapper. The filter selects features based on feature correlation; the wrapper selects features based on an evaluation function. Therefore, this study uses correlation and fitness-values-based feature selection (CFFS) [27] to select the features. CFFS is improved from correlation-based feature selection (CFS) [28]. CFFS uses Relief [29,30] and ReliefF [31] to calculate the correlations, and selects the features based on both an evaluation function (the performance of an artificial neural network (ANN)) and feature correlation. In conclusion, CFFS combines the advantages of the filter and wrapper methods.

The classifier is the last part of the fault-detection system. In [32], most classifier types are compiled and their advantages and disadvantages discussed; it is shown that ANNs, trained by supervised machine learning, achieve robust performance on irrelevant inputs, noise, and nonlinear data. This study trains the neural network with the Levenberg–Marquardt (LM) algorithm [33,34]. LM has advantages when training neural networks with small or medium-sized data, so it is widely used for training feedforward networks [35–37]. Therefore, this study uses an artificial neural network with LM to establish the fault-detection system, selects important features via a feature-selection method, and adds additive white Gaussian noise with different signal-to-noise ratios (SNR) to test the efficiency of the fault-detection system.

#### **2. Measure and Analyze the Current Signals**

The classes of motor faults and damages are shown in Figure 1. The equipment layout is shown in Figure 2: this study uses a three-phase, 220 V AC power supply for the motors. The control panel adjusts the load of the servo motor, which has a 220 V rated voltage, a 60 Hz power frequency, a 2 Hp output, a 1764 rpm rated speed, and a 0.8 power factor. The data-acquisition equipment (PXI-1033) captures the current from all types of motors. Corresponding to the four types (one healthy motor and three faulty motors), Labview collects 400 observations for each case, saving each observation for 2 s at a sampling frequency of 1 kHz.

**Figure 1.** Faulty motor failure sample. (**a**) Bearing fault (0.53 mm width and 1.96 mm length), (**b**) interturn short circuit (5 insulation destructive coils), (**c**) broken rotor bar (2 holes—10 mm depth and 8 mm diameter).

**Figure 2.** Equipment layout.

After measuring the data, this study establishes the fault-detection system with Matlab, as shown in Figure 3. The classification system is divided into five parts: (a) the NI PXI-1033 captures 400 observations of current signals for each of the four types of motors; the current signals are then normalized, benefiting system operation. (b) A total of 1600 observations (4 classes) of normalized current signals are analyzed using MRA and HHT, and features are extracted with Matlab; in this part, a fault dataset of the 4 types of induction motors with 1600 observations and 4 classes is established (the number of extracted features is described in detail in the next subsection). (c) Critical features are selected by the feature-selection approaches to lower the number of features. (d) For each type, the dataset is divided into 300 observations for training and 100 observations for testing, and the artificial neural network is trained by LM to build the fault-detection system. (e) Finally, the accuracy of the fault-detection system is calculated.

**Figure 3.** Schematic diagram of classification system. (**a**) capture the observations, (**b**) build fault detection dataset, (**c**) feature selection, (**d**) train the ANN, (**e**) classification result.

#### *2.1. MRA and Feature Distribution of Current Signals*

The MRA is used to analyze the current signals of the four motors. According to [38], the MRA function in (1) demonstrates that a signal *f*(*t*) can be decomposed into approximation coefficients *aj* and detail coefficients *dj*, where *ϕ*(*t*) is the scaling function, *ψ*(*t*) is the wavelet function, and *g*<sup>0</sup> and *h*<sup>0</sup> are filter coefficients.

$$f(t) = \sum\_{k} a\_{j0,k} \varphi\_{j0,k}(t) + \sum\_{j} \sum\_{k} d\_{j,k} \psi\_{j,k}(t) \tag{1}$$

$$\varphi(t) = \sqrt{2} \sum\_{k} g\_0(k)\, \varphi(2t - k) \tag{2}$$

$$\psi(t) = \sqrt{2} \sum\_{k} h\_0(k)\, \varphi(2t - k) \tag{3}$$

Firstly, the MRA decomposes the signal, and the detail and approximation coefficients compose the signal, as shown in Figure 4, where the x-axis is time and the y-axis is amplitude. Then, 60 features are extracted from the sub-bands d1–d5 and a5, as shown in Table 1, namely (1) Tmax; (2) Tmin; (3) Tmean; (4) Tmse; (5) Tstd; (6) Fmax; (7) Fmin; (8) Fmean; (9) Fmse; (10) Fstd. The frequency domain is analyzed with the FFT. Finally, Figure 5 shows the feature distribution of the induction motors.
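As an illustration of the decomposition and the per-band statistics, a one-level Haar analysis step (the simplest instance of the filter bank in (1)–(3)) can be sketched in plain Python; the paper's 5-level decomposition and the frequency-domain features are omitted for brevity, and the input signal below is a toy example:

```python
import math

# One Haar analysis step: the signal is split into approximation (a1) and
# detail (d1) coefficients, after which the five time-domain statistics
# (max, min, mean, mse, std) are computed per sub-band, mirroring the
# Tmax/Tmin/Tmean/Tmse/Tstd features of Table 1.

def haar_step(signal):
    """One Haar filter-bank step: returns (approximation, detail)."""
    s = math.sqrt(2.0)
    a = [(signal[2 * i] + signal[2 * i + 1]) / s for i in range(len(signal) // 2)]
    d = [(signal[2 * i] - signal[2 * i + 1]) / s for i in range(len(signal) // 2)]
    return a, d

def band_features(x):
    """max, min, mean, mse, and std of one sub-band."""
    n = len(x)
    mean = sum(x) / n
    mse = sum(v * v for v in x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    return {"max": max(x), "min": min(x), "mean": mean, "mse": mse, "std": std}

a1, d1 = haar_step([1.0, 3.0, 2.0, 2.0])
feats = band_features(a1)
```

Repeating `haar_step` on the approximation yields the d1–d5 and a5 sub-bands of a 5-level decomposition.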


**Figure 4.** The MRA of current signal. (**a**) Normal motor, (**b**) bearing fault, (**c**) interturn short circuit, (**d**) broken rotor bar.

**Figure 5.** Feature distribution of the MRA. (**a**) Normal motor, (**b**) bearing fault, (**c**) interturn short circuit, (**d**) broken rotor bar.



#### *2.2. Hilbert–Huang Transform and Feature Distribution of Current Signals*

This study uses the Hilbert–Huang transform (HHT) to analyze the current signals of the four classes of motors. According to [39], the HHT decomposes the signal into several intrinsic mode functions (IMF) *ci* by empirical mode decomposition (EMD) and calculates *Hi*(*t*) from *ci* with the Hilbert transform (HT), as shown in (4). Equations (5) and (6) calculate the instantaneous amplitude *ai*(*t*) and instantaneous phase angle *θi*(*t*). Finally, (7) differentiates the instantaneous phase angle *θi*(*t*) to obtain the instantaneous frequency *ωi*(*t*).

$$H\_i(t) = \frac{1}{\pi} \int\_{-\infty}^{\infty} \frac{c\_i(\tau)}{t - \tau}\, d\tau \tag{4}$$

$$a\_i(t) = \sqrt{c\_i^2(t) + H\_i^2(t)}\tag{5}$$

$$\theta\_i(t) = \tan^{-1} \frac{H\_i(t)}{c\_i(t)} \tag{6}$$

$$
\omega\_i(t) = \frac{d\theta\_i(t)}{dt} \tag{7}
$$

Firstly, the HHT decomposes the signal into seven (the limit for this signal) intrinsic mode functions, IMF1 (c1) to IMF7 (c7), by EMD, as shown in Figure 6, where the x-axis is time and the y-axis is amplitude. Then, the instantaneous frequencies w1 to w7 are calculated from c1 to c7, as shown in Figure 7, where the x-axis is time and the y-axis is frequency. In w1, most of the bandwidth is around 60 Hz (the fundamental frequency), and some of the bandwidth is close to 1 kHz, because the AC current has a large slope where its value crosses zero. Furthermore, 70 features are extracted from c1 to c7 and w1 to w7, as shown in Table 2, namely (1) max; (2) min; (3) mean; (4) mse; (5) std. Finally, Figure 8 shows the feature distribution of the induction motors.
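Equations (4)–(6) can be illustrated on a toy signal; this sketch builds the analytic signal with a naive DFT (adequate for short signals) rather than the continuous-time transform in (4), and recovers the instantaneous amplitude and phase of a pure cosine:

```python
import cmath
import math

# The Hilbert transform of an IMF is obtained here via the spectral method:
# zero the negative frequencies, double the positive ones, and invert. The
# analytic signal c + jH then gives the instantaneous amplitude (5) and
# phase (6); differencing the unwrapped phase approximates the
# instantaneous frequency (7).

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def analytic_signal(x):
    n = len(x)
    spec = dft(x)
    for k in range(n):
        if k == 0 or (n % 2 == 0 and k == n // 2):
            pass                 # DC and Nyquist bins are kept as-is
        elif k < n / 2:
            spec[k] *= 2         # double the positive frequencies
        else:
            spec[k] = 0          # zero the negative frequencies
    # Inverse DFT.
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

n = 32
c = [math.cos(2 * math.pi * t / n) for t in range(n)]   # toy IMF
z = analytic_signal(c)
amp = [abs(v) for v in z]                                # Eq. (5)
phase = [math.atan2(v.imag, v.real) for v in z]          # Eq. (6)
```

For a unit cosine the instantaneous amplitude is constant at 1 and the phase advances linearly, as expected.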


**Figure 6.** The EMD of current signal. (**a**) Normal motor, (**b**) bearing fault, (**c**) interturn short circuit, (**d**) broken rotor bar.

**Figure 7.** Instantaneous frequency of EMD. (**a**) Normal motor, (**b**) bearing fault, (**c**) interturn short circuit, (**d**) broken rotor bar.

**Table 2.** Feature extraction of the HHT.

|     |    | max | min | mean | mse | std |
|-----|----|-----|-----|------|-----|-----|
| EMD | c1 | F1  | F2  | F3   | F4  | F5  |
|     | c2 | F6  | F7  | F8   | F9  | F10 |
|     | c3 | F11 | F12 | F13  | F14 | F15 |
|     | c4 | F16 | F17 | F18  | F19 | F20 |
|     | c5 | F21 | F22 | F23  | F24 | F25 |
|     | c6 | F26 | F27 | F28  | F29 | F30 |
|     | c7 | F31 | F32 | F33  | F34 | F35 |
| HT  | w1 | F36 | F37 | F38  | F39 | F40 |
|     | w2 | F41 | F42 | F43  | F44 | F45 |
|     | w3 | F46 | F47 | F48  | F49 | F50 |
|     | w4 | F51 | F52 | F53  | F54 | F55 |
|     | w5 | F56 | F57 | F58  | F59 | F60 |
|     | w6 | F61 | F62 | F63  | F64 | F65 |
|     | w7 | F66 | F67 | F68  | F69 | F70 |

**Figure 8.** Feature distribution of the HHT. (**a**) Normal motor, (**b**) bearing fault, (**c**) interturn short circuit, (**d**) broken rotor bar.

#### **3. Feature-Selection Approaches for Features of the MRA and HHT**

#### *3.1. ReliefF*

The ReliefF algorithm is shown as Algorithm 1. ReliefF improves Relief for multiclass classification situations. This study uses ReliefF to calculate the correlation between each feature and the classification. The algorithm selects a feature (*Fh*) from all of the features and samples one of its values. Then, for the selected instance, the nearest values of the same classification (nearest hit) and of the other classifications (nearest misses) are chosen. Function (8) is used to calculate the correlation; features with greater correlation are considered more important.


5: Calculate the *Fh* correlation *RfFh* in (8);

6: **until** all correlations *RfF* are obtained with ReliefF for feature selection

7: Choose the best-performing feature set for establishing the ANN

$$R\_{fF} = W\_i - \frac{1}{km}\,\mathrm{diff}(f\_h, f\_{nh}) + \frac{p(c)}{1 - p(c\_h)} \cdot \frac{1}{km}\,\mathrm{diff}(f\_h, f\_{nm}) \tag{8}$$

where

$$R\_{fF} = \begin{pmatrix} R\_{fF1} \\ \vdots \\ R\_{fFi} \\ \vdots \\ R\_{fFm} \end{pmatrix},$$

is the correlation between feature and classification.
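A single update of the ReliefF-style correlation can be sketched as follows; this toy version drops the multi-class prior weighting of (8) and uses hypothetical feature values:

```python
# Toy sketch of a Relief/ReliefF update: a feature's weight is rewarded
# when it separates the sampled instance from its nearest miss (different
# class) and penalised when it differs from its nearest hit (same class).
# diff() is the scaled absolute difference commonly used with numeric
# features; a single binary-class update with k = m = 1 is shown.

def diff(a, b, span):
    """Scaled absolute difference of two numeric feature values."""
    return abs(a - b) / span

def relief_update(w, f_h, f_nh, f_nm, span, k=1, m=1):
    """One weight update in the spirit of Eq. (8), without the class prior."""
    return w - diff(f_h, f_nh, span) / (k * m) + diff(f_h, f_nm, span) / (k * m)

# Hypothetical values: the sampled instance (0.9) is close to its nearest
# hit (0.8) and far from its nearest miss (0.2), so the weight rises.
w = relief_update(0.0, f_h=0.9, f_nh=0.8, f_nm=0.2, span=1.0)
```

Repeating the update over *m* sampled instances and *k* neighbors, and weighting the miss term by the class priors, recovers the full multi-class form of (8).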

#### *3.2. CFS*

The CFS algorithm is shown as Algorithm 2. CFS calculates the Merit value for selecting the features under three conditions: (I) the correlation between features, (II) the correlation between each feature and the classification, and (III) the Merit value itself. The algorithm calculates the correlation *Rf* between features with Relief, as shown in (9). Next, ReliefF is used to calculate the correlation *RfF* between each feature and the classification in (8). Finally, the Merit value is calculated in (10).



2: **repeat**


$$R\_f = W\_i - \frac{1}{k}\,\mathrm{diff}(f\_h, f\_{nh})^2 + \frac{1}{k}\,\mathrm{diff}(f\_h, f\_{nm})^2 \tag{9}$$

where

$$R\_f = \begin{pmatrix} 1 & R\_{f12} & R\_{f13} & \dots & R\_{f1m} \\ 0 & 1 & R\_{f23} & \dots & R\_{f2m} \\ \vdots & 0 & \ddots & & \vdots \\ \vdots & \vdots & 0 & 1 & R\_{f(m-1)m} \\ 0 & \dots & \dots & 0 & 1 \end{pmatrix},$$

is the correlation between features;

$$\text{Merit} = \frac{n\_f \times \overline{R}\_{fF}}{\sqrt{n\_f + n\_f(n\_f - 1) \times \overline{R}\_{ff}}} \tag{10}$$
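The Merit computation is a one-liner once the average correlations are available; a sketch with hypothetical values (the feature–feature term here uses $n_f(n_f-1)$, following the standard CFS formulation):

```python
import math

# Sketch of the CFS merit: n_f selected features, average feature-class
# correlation r_fF_mean in the numerator, and average feature-feature
# correlation r_ff_mean in the denominator. A high class correlation with
# low inter-feature redundancy yields a high merit. Values are hypothetical.

def merit(n_f, r_fF_mean, r_ff_mean):
    return (n_f * r_fF_mean) / math.sqrt(n_f + n_f * (n_f - 1) * r_ff_mean)

m = merit(4, 0.5, 0.25)   # (4*0.5) / sqrt(4 + 12*0.25) = 2 / sqrt(7)
```

Note how increasing `r_ff_mean` (redundant features) lowers the merit while increasing `r_fF_mean` raises it, which is exactly the trade-off CFS searches over.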

#### *3.3. CFFS*

The CFFS algorithm is shown as Algorithm 3. CFFS is a feature-selection approach improved from CFS and was proposed in our previous study [27]. CFFS selects the features under four conditions. The algorithm calculates (I) the correlation between features in (9), (II) the correlation between each feature and the classification in (8), and (III) the Merit value in (10). Then, (IV) the fitness value *Wfi* is used to calculate the Merit\_new value in (11).

$$\text{Merit\\_new} = \text{Merit} \times \mathcal{W}\_{fi} \tag{11}$$

The fitness value is calculated by particle swarm optimization (PSO). PSO is used to optimize the weights of the features [40,41] and selects the best-known solution in the swarm. Therefore, this study can establish the best induction-motor fault-detection system with the features selected by CFFS and the weights of these features after training the ANN.

To compare the performance of the feature-selection approaches, this study lists the 1st to the 10th features in the selection order for the MRA and the HHT, as shown in Table 3. MRA–ReliefF, MRA–CFS, and MRA–CFFS share the same 9 features (F35, F24, F54, F60, F27, F57, F30, F51, and F58), while HHT–ReliefF, HHT–CFS, and HHT–CFFS share only 2 features (F5 and F4). The important features mentioned above are marked in Figures 5 and 8 (the red dot •). As Table 3 shows, the features selected from the MRA by the different approaches are more similar than those from the HHT. According to this result, the performance of feature selection is affected by the features extracted by the signal processing.



**Table 3.** Feature order.


#### **4. The Result of Induction-Motor Fault Detection**

This section demonstrates the results of the fault-detection system, analyzing the current signals using MRA and HHT. As shown in Figure 9, the feature-selection method is used to reduce the number of features, and the efficiency of induction-motor fault detection is tested with noisy current signals (SNR: 40 dB, 30 dB, 20 dB, and 10 dB): (a) use Matlab to add AWGN to the current signals; (b) analyze the data; (c) select the features (the feature order after adding noise is the same as that of the feature-selection method applied to the original signal); (d) train the fault-detection system; (e) finally, obtain the accuracy of the fault-detection system. ReliefF and CFS both select features based on feature correlation, so their feature orders are the same; the accuracies of MRA–ReliefF, MRA–CFS, HHT–ReliefF, and HHT–CFS are therefore averaged over 50 rounds of training and testing. CFFS selects features based on both feature correlation and the performance of the fault-detection system, so its feature order changes with the achieved accuracy; MRA–CFFS and HHT–CFFS undergo the training and testing process only once, and their accuracy curves are therefore more unstable than those of the ReliefF- and CFS-based systems. In conclusion, this study compares the accuracy curves of all results and proposes the best model for establishing the fault-detection system.
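Step (a) can be sketched in plain Python; this stand-in for Matlab's `awgn()` scales the noise power to the target SNR:

```python
import math
import random

# Additive white Gaussian noise at a target SNR (dB): the noise power is
# set to P_signal / 10^(SNR/10), so e.g. 20 dB means the noise carries 1%
# of the signal power. The 60 Hz test tone sampled at 1 kHz mirrors the
# measurement setup of Section 2; the seeded RNG keeps the sketch
# deterministic.

def add_awgn(signal, snr_db, rng=random.Random(0)):
    p_signal = sum(v * v for v in signal) / len(signal)
    p_noise = p_signal / (10 ** (snr_db / 10))
    sigma = math.sqrt(p_noise)
    return [v + rng.gauss(0.0, sigma) for v in signal]

clean = [math.sin(2 * math.pi * 60 * t / 1000) for t in range(1000)]
noisy = add_awgn(clean, snr_db=20)
```

Sweeping `snr_db` over 40, 30, 20, and 10 reproduces the four noise conditions used to stress-test the detection system.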

**Figure 9.** Schematic diagram of establishing the fault-detection system with noise-added current signals. (**a**) capture the observations, (**b**) build fault detection dataset, (**c**) feature selection, (**d**) train the ANN, (**e**) classification result.

#### *4.1. Parameter Setting of ANN*

The ANN is composed of an input layer, a hidden layer, an output layer, and neurons. In the hidden layer, the input is computed via weights, biases, and activation functions, and the classification result is computed by the output layer. In the ANN, the weight and bias of each neuron are adjusted by calculating the error between the output and the target; updating the weights and biases over the iterations reduces the cross-entropy loss. The parameter settings of the ANN used in this study are shown in Table 4.

**Table 4.** Parameter setting of ANN.
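As a hedged sketch of the forward pass and cross-entropy loss described above (the layer sizes and the tanh activation here are arbitrary placeholders; the actual settings are those listed in Table 4):

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with max-subtraction for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, targets):
    """Mean cross-entropy between predicted probabilities and one-hot targets."""
    return -np.mean(np.sum(targets * np.log(probs + 1e-12), axis=1))

def forward(x, W1, b1, W2, b2):
    """One hidden layer: weights and biases feed an activation function,
    then the output layer produces class probabilities."""
    h = np.tanh(x @ W1 + b1)
    return softmax(h @ W2 + b2)
```

Training would adjust `W1, b1, W2, b2` along the gradient of the cross-entropy, as the paragraph above describes.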


#### *4.2. Compare the Signal-Processing Approaches: The MRA and the HHT*

The accuracies of the MRA–ReliefF (Figure 10) are displayed for 60 feature numbers, and the accuracies of the HHT–ReliefF (Figure 11) are displayed for 70 feature numbers under different noise conditions. The comparison results are summarized below: the accuracies of the MRA–ReliefF are higher than those of the HHT–ReliefF under all noise conditions.


The accuracies of the MRA–CFS (Figure 12) are displayed for 60 feature numbers, and the accuracies of the HHT–CFS (Figure 13) are displayed for 70 feature numbers under different noise conditions. The comparison results are summarized below: the accuracies of the MRA–CFS are higher than those of the HHT–CFS under all noise conditions.


The accuracies of the MRA–CFFS (Figure 14) are displayed for 60 feature numbers, and the accuracies of the HHT–CFFS (Figure 15) are displayed for 70 feature numbers under different noise conditions. The comparison results are summarized below: the accuracies of the MRA–CFFS are higher than those of the HHT–CFFS under all noise conditions.


**Figure 10.** Accuracy curves of the MRA–ReliefF.

**Figure 11.** Accuracy curves of the HHT–ReliefF.

**Figure 12.** Accuracy curves of the MRA–CFS.

**Figure 13.** Accuracy curves of the HHT–CFS.

**Figure 14.** Accuracy curves of the MRA–CFFS.

**Figure 15.** Accuracy curves of the HHT–CFFS.

#### *4.3. Compare the Feature-Selection Approaches: ReliefF, CFS, and CFFS*

The highest accuracies of the MRA with different feature-selection approaches under different noise conditions are shown in Tables 5–7. The comparison is summarized as follows: the accuracies of the CFFS are slightly higher than those of ReliefF and the CFS under ∞ dB, 40 dB, and 30 dB. Under severe noise conditions such as 20 dB and 10 dB, the CFFS achieves a better performance than ReliefF and the CFS.


The highest accuracies of the HHT with different feature-selection approaches under different noise conditions are shown in Tables 8–10. The comparison is summarized as follows: the accuracies of the CFFS are slightly lower than those of ReliefF and the CFS under ∞ dB, 40 dB, and 30 dB. Under severe noise conditions such as 20 dB and 10 dB, the CFFS achieves a better performance than ReliefF and the CFS.


According to the comparison of the signal-processing and feature-selection approaches, the performance of the MRA is better than that of the HHT, and the CFFS establishes a more effective fault-detection system than ReliefF and the CFS. This result can be inferred from the feature distributions of the MRA (Figure 5) and the HHT (Figure 8): the MRA (Figure 5) yields more significant features than the HHT (Figure 8). For establishing the fault-detection system, the selected signal-processing approach has an impact on the system, and the system established with a feature-selection approach can considerably reduce the number of features.

**Table 5.** Result of the MRA–ReliefF.

**Table 6.** Result of the MRA–CFS.

**Table 7.** Result of the MRA–CFFS.

**Table 8.** Result of the HHT–ReliefF.

**Table 9.** Result of the HHT–CFS.


**Table 10.** Result of the HHT–CFFS.


#### **5. Conclusions**

This study proposes the CFFS, which combines the advantages of the filter and wrapper methods; accordingly, the CFFS performs significantly well in the fault-detection system. According to the results of this research, the choice of signal-processing and feature-selection approach has a crucial influence on the accuracy of the fault-detection system. The MRA is a useful method for analyzing the faulty motors in this paper: it provides good features for the CFFS, which has a significant effect on the system, reducing the features from the MRA by 57 (95%) while achieving 93% accuracy. The system established with the CFFS also achieves excellent performance under 40 to 10 dB AWGN, reducing about 54 to 57 features (90% to 95%) while maintaining an accuracy of about 92% to 93%. In this research, the low-dimensional feature set is suitable for the CFFS; in other words, applying the CFFS to cases with high-dimensional features could incur higher computational costs, which is the limitation of the CFFS. Therefore, this study establishes the fault-detection system with the MRA and the CFFS for the faulty motors studied here.

**Author Contributions:** Conceptualization, M.-S.W. and G.-L.Z.; methodology, M.-S.W.; software, G.-L.Z.; validation, C.-Y.L., M.-S.W. and G.-L.Z.; formal analysis, M.-S.W.; resources, C.-Y.L.; data curation, M.-S.W.; writing—original draft preparation, M.-S.W.; writing—review and editing, C.-Y.L., G.-L.Z. and T.-A.L.; visualization, M.-S.W.; supervision, C.-Y.L. and T.-A.L.; project administration, C.-Y.L. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Nomenclature**


#### **References**


## *Article* **HIFA-LPR: High-Frequency Augmented License Plate Recognition in Low-Quality Legacy Conditions via Gradual End-to-End Learning**

**Sung-Jin Lee 1,†, Jun-Seok Yun 1,†, Eung Joo Lee <sup>2</sup> and Seok Bong Yoo 1,\***


**Abstract:** Scene text detection and recognition, such as automatic license plate recognition, is a technology utilized in various applications. Although numerous studies have been conducted to improve recognition accuracy, accuracy decreases when low-quality legacy license plate images are input into a recognition module due to low image quality and a lack of resolution. To obtain better recognition accuracy, this study proposes a high-frequency augmented license plate recognition model in which the super-resolution module and the license plate recognition module are integrated and trained collaboratively via a proposed gradual end-to-end learning-based optimization. To optimally train our model, we propose a holistic feature extraction method that effectively prevents generating grid patterns from the super-resolved image during the training process. Moreover, to exploit high-frequency information that affects the performance of license plate recognition, we propose a license plate recognition module based on high-frequency augmentation. Furthermore, we propose a gradual end-to-end learning process based on weight freezing with three steps. Our three-step methodological approach can properly optimize each module to provide robust recognition performance. The experimental results show that our model is superior to existing approaches in low-quality legacy conditions on UFPR and Greek vehicle datasets.

**Keywords:** gradual end-to-end learning; single-image super-resolution; automatic license plate recognition; low-quality legacy conditions; holistic feature extraction; high-frequency augmentation

**MSC:** 68T45

### **1. Introduction**

#### *1.1. License Plate Recognition in a Real-World Scenario*

Scene text detection and recognition is a task that detects text regions and recognizes letters and numbers in image frames. This task can be utilized in various applications in smart parking and driving such as illegal parking detection and traffic sign recognition. When this task is applied to image frames in real-world scenarios, it does not assure satisfactory performance due to the varying resolutions of the input images. Specifically, this is a critical issue for the license plate recognition task. As shown in Figure 1, the plate regions detected in vehicle LP images may have small resolutions depending on the distance between the camera and the vehicle object. Even if the detected region images are input directly to the LP recognition module, it causes severe recognition accuracy degradation, as shown in the result of the low-resolved LP recognition approach in Figure 1. To address this problem, a bicubic-interpolation-based approach can be considered. However, this approach also generates low recognition accuracy with resized low-quality images, as shown in the result of the bicubic-interpolation-based LP recognition approach in Figure 1. To acquire high-quality LP images, super-resolution (SR) techniques can be considered. Nevertheless, conventional SR modules may not be suitable for enhancing recognition accuracy because the SR modules only focus on improving image quality, not recognition. For this reason, such an approach causes insufficient recognition performance, as shown in the result of the super-resolved LP recognition approach in Figure 1. Hence, in the real world, there is a necessity for an integrated model to improve LP recognition accuracy by restoring LP images in terms of LP recognition.

**Citation:** Lee, S.-J.; Yun, J.-S.; Lee, E.J.; Yoo, S.B. HIFA-LPR: High-Frequency Augmented License Plate Recognition in Low-Quality Legacy Conditions via Gradual End-to-End Learning. *Mathematics* **2022**, *10*, 1569. https://doi.org/10.3390/math10091569

Academic Editors: Francois Rivest and Abdellah Chehri

Received: 5 April 2022 Accepted: 4 May 2022 Published: 6 May 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

**Figure 1.** Examples of LP recognition approaches in a real-world scenario. Our proposed HIFA-LPR model outperforms conventional approaches. "\_" denotes a missing character recognition. Red character denotes incorrect prediction. Blue character denotes correct prediction. "GT" denotes ground truth. "Pred" denotes the prediction result.

#### *1.2. High-Frequency Augmented License Plate Recognition Model*

To tackle the issue in Section 1.1, we propose a high-frequency augmented license plate recognition (HIFA-LPR) model. The HIFA-LPR model can improve the input image resolution optimally from the point of view of LP recognition and robustly classify the LP characters in the optimal super-resolved image, as shown in the last row of Figure 1. To this end, we suggest gradual end-to-end learning so that LP recognition accuracy is robust to input data with various image qualities and resolutions. When performing the gradual end-to-end learning method, most SR modules are trained with a small-sized image patch of about 8 × 8 pixels. However, this patch-based SR approach is not suitable for end-to-end learning due to the grid patterns generated by the SR module. Since grid patterns cut off characters, patch-based SR approaches cannot preserve the character information that is closely related to LP recognition accuracy. Hence, we propose a holistic image feature extraction method adopted to prevent grid pattern generation while preserving character information in the SR module.

In recognition tasks, high-frequency information, such as edge, contrast, and texture, is closely related to recognition performance. For this reason, we propose an LP recognition module based on high-frequency augmentation to exploit enhanced high-frequency information. Our LP recognition module mainly consists of high-frequency augmentation blocks (HAB). We utilize the discrete cosine transform (DCT) principle that components toward the right and bottom of the DCT domain correspond to higher frequencies. In the HAB, we extract the desired high-frequency components using the DCT principle and augment the high frequency of the feature map.

The training process of HIFA-LPR consists of three steps based on a weight freezing technique. First, the SR and the LP recognition modules are independently trained for stabilizing the training process. Second, to properly restore images in terms of LP recognition, the SR module is trained with SR loss and recognition loss while the LP recognition module weights are frozen. Third, the recognition module is trained with super-resolved LP images for enriching LP recognition accuracy. By using the weight freezing technique, we enhance the collaborative correlations between each module. To verify HIFA-LPR, we perform experiments with SR and recognition performance in low-quality legacy conditions using the UFPR [1] and Greek vehicle datasets [2]. The UFPR dataset is organized with Brazilian LP images and character labels for detecting LP and recognizing LP characters. The Greek vehicle dataset is organized with Greek LP images for LP detection only. To utilize this dataset for low-resolution (LR) recognition, we build the LP recognition dataset by manually annotating each LP character in the LP images. The contributions of this study can be summarized as follows:


#### **2. Related Works**

#### *2.1. Single-Image Super-Resolution*

Single-image SR is a method to predict a high-resolution (HR) image from the corresponding LR image. However, single-image SR is an ill-posed problem because there are various methods of degradation while reducing the image quality from HR to LR. To address this problem, several studies have been conducted with deep-learning-based methods. A super-resolution convolutional neural network (SRCNN) [3] proposed the SR method based on convolutional neural networks for the first time, and it showed innovative restoration performance. A very deep convolutional network (VDSR) [4] proposed a deeper SR neural network with a residual learning strategy. An efficient sub-pixel convolutional neural network (ESPCNN) [5] proposed a pixel-shuffling layer that can learn an up-sampling module. Using this layer, ESPCNN solved the limitation of feature map magnification in the neural network. A deep back-projection network (DBPN) [6] proposed an iterative up-sampling and down-sampling module that repeatedly stacks the image upscaling and downscaling layers. A residual channel attention network (RCAN) [7] proposed a channel attention mechanism that helps to create a deep model. A dual regression network (DRN) [8] proposed a closed circuit and added an LR domain loss function that calculates the difference from the input image. A residual dense network (RDN) [9] proposed a neural network that can learn the hierarchical representation of all feature maps through the residual density structure. A second-order attention network (SAN) [10] showed outstanding performance by strongly improving the representation of image feature maps and learning the interdependencies between feature maps. Meta-transfer learning for zero-shot SR (MZSR) [11] proposed a flexible algorithm for restoring images that are blurred under actual blur conditions by training on various kernels. Shifted windows using image restoration (SwinIR) [12] proposed an SR method for image restoration using a shifted windows (Swin) transformer, which provides reliable performance on high-level vision tasks. By using this method, SwinIR outperforms the state-of-the-art SR methods.

However, these SR modules only focus on image reconstruction. Since the SR modules cannot appropriately restore the image in terms of LP recognition, the recognition performance is degraded. To obtain better recognition performance, we propose a stepwise gradual end-to-end learning method using combined loss and weight freezing. Specifically, the above SR modules [3–12] generate grid patterns during the end-to-end training process due to patch-extraction-based approaches. The extracted patches that include insufficient character information hinder the optimization of the LP recognition module. To address this issue, we propose the holistic-feature-extraction-based SR module that takes the whole LP image as input. Since our method uses full character information, our SR module can be trained to improve LP character recognition compared with existing patch-extraction-based SR modules.

#### *2.2. License Plate Recognition*

LP recognition is the task of recognizing the LP characters in a vehicle image. Various studies have been conducted to boost LP character recognition performance. OpenALPR [13] is an LP recognition API based on OpenCV and TesseractOCR [14]. Lee et al. [15] noted that LP recognition performance improves with an SR model when the LP image is too small to recognize with a recognition module that has a fixed input size; this shows that the SR method can improve LP recognition performance on LR images. A super-resolved recognition method [16] was proposed for LR image character recognition, together with a data augmentation algorithm for left-right reversal. Wang et al. [17] proposed a method that exploits a synthetic data generation approach based on a generative adversarial network (GAN) to obtain a large representative LP dataset. Hamdi et al. [18] proposed a double GAN for image enhancement of LP images; they performed SR training for constructive LP denoising and SR to increase LP recognition accuracy when an LR image is used for recognition. Wang et al. [19] proposed a convolutional recursive neural network followed by connectionist temporal classification for LP character recognition; combined with multitask cascaded convolutional neural network detection, they proposed a recognition module that can detect the LP region and classify the characters of the LP.

LPRNet [20] proposed an LP character recognition module with an end-to-end method for automatic LP recognition (ALPR) without preliminary character segmentation. Moreover, this method is lightweight enough to run on a variety of platforms, including embedded devices. Laroca et al. [1] proposed an LP dataset with 4500 fully annotated images focused on usual and different real-world scenarios to help address the inadequacy of LP databases and the low-recognition problem. Nguyen et al. [21] proposed an LP detection and LP recognition module embedded with a spatial transformer to increase the accuracy of LP character detection and recognition with the CCPD dataset [22]. Xu et al. [23] proposed a location-aware 2D attention-based recognition module that recognizes both single-line and double-line plates with perspective deformation. Vasek et al. [24] proposed an LP recognition neural network with the CNN method in LR frames. Lee et al. [25] proposed a GAN-based SR method that can be adopted in LP-recognition-challenged environments. Zhang et al. [26] proposed the multitask generative adversarial network (MTGAN), which combines the SR and recognition modules in a one-step end-to-end learning method. Li et al. [27] proposed a unified deep neural network for localizing the LP image and recognizing the characters at once in a single forward pass; this method operates LP detection and recognition jointly in a single network to avoid intermediate error accumulation and accelerate LP processing speed. However, these methods do not use an end-to-end training method for LP recognition and therefore cannot guarantee training stability. Moreover, these methods cannot optimize the recognition module so that the loss function of LP recognition approaches the global minimum. Zhang et al. [28] proposed a robust LP recognition module for in-the-wild situations; this method also proposed a GAN-based LP generation engine to reduce exhausting human annotation work. However, there is significant performance degradation when these methods are applied to image frames in real-world scenarios because the input images have various resolutions and qualities.

Even if an SR-based approach [15–17,24,25] is applied to a low-quality legacy image, it produces insufficient performance because the SR module has no direct relationship with the recognition module. Moreover, other approaches [18,26] attempted to address this using a single-step end-to-end learning method. However, methods based on single-step end-to-end learning cannot optimally strengthen the collaborative correlations between the SR and LP recognition modules. Meanwhile, these LP recognition modules [13–28] do not have any module that augments the high frequency of characters, which is the main clue for LP recognition. For this reason, the performance of the referred LP recognition modules is not satisfactory when recognizing low-quality characters.

To tackle this issue, in this study, we propose a gradual end-to-end learning method with three steps: Step 1: the independent training of the SR module and LP recognition module; Step 2: SR module training with LP recognition module weight freezing; Step 3: LP recognition module training with SR module weight freezing.

Furthermore, we propose the LP recognition module based on high-frequency augmentation. Our LP recognition module mainly consists of HAB which reinforces the high frequencies of precise character components such as edge, texture, and contrast. Due to HAB, our LP recognition can provide robust recognition performance even if low-quality legacy LP images are inputted.

#### **3. Proposed Method**

In this section, we present our methodological contributions. First, a holistic image feature-extraction SR module is proposed to guarantee a stable end-to-end learning process, unlike DBPN [6]. Second, an LP recognition module based on high-frequency augmentation is proposed to strengthen the high-frequency component for LP recognition accuracy, unlike the state-of-the-art object recognition module, Yolov5 [29]. Our LP recognition module mainly consists of HAB. The proposed HAB extracts only the desired frequency by using our new DCT-based frequency mask. Finally, a gradual end-to-end learning process with three steps based on weight freezing is suggested to strengthen the collaborative correlations with each module.

#### *3.1. Architecture of Each Module in HIFA-LPR Model*

In this section, we introduce the HIFA-LPR model architecture. The HIFA-LPR model consists of a holistic-feature-extraction-based SR module and an LP recognition module based on high-frequency augmentation.

**Holistic-Feature-Extraction-based SR Module.** The existing SR methods [3–12] randomly extract patches of about 8 × 8 pixels by cropping the LR image, due to a lack of computing power. The extracted patches pass through the SR module and LP recognition module in an end-to-end learning process. However, character information is not fully considered because of the truncated character information, which causes severe training performance degradation of the recognition module in the end-to-end learning process. On the other hand, our holistic-feature-extraction-based SR ensures the training stability of the LP recognition module because our module utilizes the character position and full character information by using the whole image. Due to a lack of computing power, we set the training batch size to 1 during the end-to-end learning process.

In this study, considering end-to-end learning for super-resolved character recognition, we propose the holistic-feature-extraction-based SR that takes the whole LP image as input. The DBPN [6] is benchmarked to improve the quality of LP images. Our holistic feature extraction consists of a 3 × 3 convolution layer and a 1 × 1 convolution layer. Unlike the original DBPN method, our SR method takes the whole LP image and performs SR by extracting holistic features and repeatedly upscaling and downscaling the holistic features through the convolution layers, as shown in Figure 2. The SR module extracts the feature map of the input image. The extracted feature maps pass through the up-blocks and down-blocks to obtain feature maps of 64 channels per block. The SR module connects the feature maps obtained for each block so that they are passed to the final output layer, and then the super-resolved RGB 3-channel image is acquired. This SR module can magnify the image resolution and improve the image quality.

However, training stability cannot be guaranteed with patch-based SR models, since they generate grid patterns in the super-resolved image. Such grid patterns cause restoration performance degradation. As shown in Figure 3a, the patch-based end-to-end training process loses character information, which causes severe training performance degradation; this is a critical issue in terms of LP recognition. In addition, LP recognition model training is impossible with a patch that contains only a small part of the input image. On the other hand, our holistic image feature extraction, consisting of a 3 × 3 convolution layer and a 1 × 1 convolution layer, preserves character information in the super-resolved image while alleviating annoying grid patterns, as shown in Figure 3b. By using this method, our LP recognition module can be trained with a stable training process. Algorithm 1 shows the pseudocode of the holistic-feature-extraction-based SR module for a scale factor of ×4. Our holistic-feature-extraction-based SR module utilizes character position and character information by using the whole image, unlike DBPN.
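A minimal sketch of the holistic idea, under the assumption of single-channel images and hand-picked kernels (the real module uses learned multi-channel filters plus up- and down-projection blocks): the whole image is convolved with a 3 × 3 and then a 1 × 1 kernel, with no patch cropping, so character regions are never truncated:

```python
import numpy as np

def conv2d(img, kernel):
    """'Same' 2-D cross-correlation with zero padding."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)))
    out = np.zeros_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

def holistic_features(img, k3, k1):
    """Whole image in, no patch extraction: 3x3 conv followed by 1x1 conv,
    so the output keeps the full spatial extent of the characters."""
    return conv2d(conv2d(img, k3), k1)
```

Because the feature map covers the entire plate, no grid seams between patches can appear in the downstream super-resolved image.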

**Figure 2.** Architecture of holistic-feature-extraction-based SR module.

**Figure 3.** (**a**) Diagram of the patch-based end-to-end training process. (**b**) Diagram of the holisticfeature-based end-to-end training process. In output images, red boxes denote the detected character region.

**LP Recognition Module based on High-frequency Augmentation.** In the LP recognition task, high-frequency information, such as edge, contrast, and texture, affects LP recognition performance. Hence, to improve LP recognition performance, high-frequency components should be appropriately augmented. To this end, we propose the LP recognition module which is benchmarked and improved from Yolov5 [29]. Since there is no correlation between adjacent numbers or characters in a single LP, our proposed LP recognition module independently detects and recognizes each character in the LP instead of a whole-character-based recognition. To this end, we adopt and improve Yolov5, which is known to provide state-of-the-art accuracy in the object detection field. We note that we utilize high-frequency components to promote LP recognition accuracy, unlike Yolov5.

To augment the high-frequency component, it is necessary to extract only the desired high frequency. Therefore, we utilize the DCT, which has the principle that low-frequency components are concentrated on the upper left and high-frequency components are concentrated on the lower right in the DCT spectrum. A two-dimensional image of size *N* × *M* can be transformed into the frequency domain through DCT, as shown in Equation (1).


$$\mathbf{F}(u,v) = \alpha(u)\,\beta(v)\sum\_{i=0}^{N-1}\sum\_{j=0}^{M-1}\mathbf{f}(i,j)\,\gamma(i,j,u,v) \tag{1}$$

$$\gamma(i,j,u,v) = \cos\left(\frac{\pi(2i+1)u}{2N}\right)\cos\left(\frac{\pi(2j+1)v}{2M}\right) \tag{2}$$

$$\alpha(u) = \begin{cases} \sqrt{\frac{1}{N}}, & u = 0 \\ \sqrt{\frac{2}{N}}, & u \neq 0 \end{cases} \tag{3}$$

$$\beta(v) = \begin{cases} \sqrt{\frac{1}{M}}, & v = 0 \\ \sqrt{\frac{2}{M}}, & v \neq 0 \end{cases} \tag{4}$$

$$\mathbf{f}(i,j) = \sum\_{u=0}^{N-1}\sum\_{v=0}^{M-1} \alpha(u)\,\beta(v)\,\mathbf{F}(u,v)\,\gamma(i,j,u,v) \tag{5}$$

In Equation (1), **f**(*i*, *j*) is the pixel value at the (*i*, *j*) position of the image, and **F**(*u*, *v*) is the DCT coefficient at the (*u*, *v*) position. Equations (2)–(4) define the cosine basis function and the normalization constants, respectively. As shown in Equation (5), the frequency-domain signal can be transformed back into the spatial domain using the two-dimensional inverse DCT (IDCT).
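Assuming SciPy's orthonormal 2-D DCT-II implements the normalization of Equations (3) and (4), the round trip of Equations (1) and (5) can be checked numerically:

```python
import numpy as np
from scipy.fft import dctn, idctn

rng = np.random.default_rng(0)
f = rng.random((8, 8))             # spatial-domain block f(i, j)
F = dctn(f, norm="ortho")          # forward 2-D DCT, Equation (1)
f_rec = idctn(F, norm="ortho")     # inverse 2-D DCT, Equation (5)
```

`f_rec` reproduces `f` up to floating-point error, and the orthonormal transform preserves the signal energy between the two domains.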

By using the DCT principle, we can dynamically extract high-frequency components via a frequency mask *M*. The frequency mask *M* consists of a binary value and is determined depending on the hyper-parameter *ε* as given below.

$$M(x,y) = \begin{cases} 0, & y < -x + 2\varepsilon h \\ 1, & \text{otherwise} \end{cases} \tag{6}$$

where *h* denotes the height of the input image and *x*, *y* denote the horizontal and vertical coordinates of *M*, respectively. The hyper-parameter *ε* ranges from 0 to 1. Since components further along the zigzag direction from the top left to the bottom right of the DCT domain correspond to higher frequencies, we can extract the desired high-frequency components. Figure 4 shows examples of *M* according to the hyper-parameter *ε*.

**Figure 4.** Examples of *M* according to *ε*. The larger the hyper-parameter *ε*, the more low-frequency components are masked. We empirically set *ε* to 0.2.
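Equation (6) translates directly into a few lines of NumPy (a hedged illustration; in the model, the mask multiplies DCT-domain feature maps):

```python
import numpy as np

def freq_mask(h, w, eps):
    """Binary mask M of Equation (6): zero below the anti-diagonal line
    y = -x + 2*eps*h (low frequencies), one elsewhere (high frequencies)."""
    y, x = np.mgrid[0:h, 0:w]
    return np.where(y < -x + 2 * eps * h, 0, 1)
```

With `eps = 0`, nothing is masked; larger `eps` removes more of the upper-left (low-frequency) corner, matching Figure 4.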

Our LP recognition module mainly consists of HABs, which reinforce the high frequency of the feature map. As shown in Figure 5, the HAB receives the feature map as input. Then, the feature map is transformed into the DCT domain using the 2D-DCT. To extract the high frequency in the DCT domain, the element-wise product of the feature map and the frequency mask *M* determined by hyper-parameter *ε* is conducted. The extracted high-frequency feature map is transformed into the spatial domain by the 2D-IDCT. The obtained high-frequency feature map is added to the original feature map.

**Figure 5.** Architecture of HAB.
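Under the assumption of a single-channel feature map (the actual HAB operates channel-wise on multi-channel feature maps inside the network), the block in Figure 5 can be sketched with SciPy's orthonormal 2-D DCT:

```python
import numpy as np
from scipy.fft import dctn, idctn

def hab(feature_map, eps=0.2):
    """High-frequency Augmentation Block sketch: 2D-DCT, mask out low
    frequencies via Equation (6), 2D-IDCT, then add the extracted
    high-frequency map back to the input."""
    h, w = feature_map.shape
    y, x = np.mgrid[0:h, 0:w]
    mask = np.where(y < -x + 2 * eps * h, 0.0, 1.0)   # Equation (6)
    spectrum = dctn(feature_map, norm="ortho")        # to DCT domain
    high = idctn(spectrum * mask, norm="ortho")       # masked inverse DCT
    return feature_map + high                         # residual augmentation
```

With `eps = 0` the whole spectrum survives the mask, so the output is exactly twice the input, which makes the residual structure easy to verify.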

Our LP recognition module based on high-frequency augmentation is illustrated in Figure 6. The Focus layer downscales the input image with as little information loss as possible for fast recognition by transforming input image space into depth. By transporting to a convolution batch normalization leaky ReLU (CBL) and a cross-stage partial (CSP) layer, feature maps obtain richer gradient combinations while maintaining lower computations. Moreover, by splitting the gradient flow, CSP reduces the computation of the architecture with the residual unit, which maintains the gradient of the neural network. Then, the feature map passes through the spatial pyramid pooling (SPP) layer to generate a fixed one-dimensional array as input to the fully connected layer to predict. The up-sample block performs up-sampling of the feature map to expand the feature map size, and it allows for small-object detection. Before the concatenation of each feature map, the proposed HAB is employed to exploit enhanced high-frequency information. Finally, the LP recognition module can obtain three different-sized feature maps to detect and recognize the characters of the LP image. The character recognition loss (localization, classification, and confidence losses) is obtained by comparing the result of the LP recognition module with the ground truth label. For training for LP recognition, we define 10 numerical classes for 0–9 and 26 character classes for A–Z. Algorithm 2 shows the pseudocode of the LP recognition module based on high-frequency augmentation.

**Figure 6.** Flowchart of LP recognition module based on high-frequency augmentation.




#### *3.2. Gradual End-to-End Learning Process Based on Weight Freezing*

A schematic diagram of the gradual end-to-end learning method is shown in Figure 7. We implemented the training process based on weight freezing in Steps 1, 2, and 3 as follows. The gradual end-to-end learning process based on weight freezing is one of our methodological contributions.
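As a hedged toy illustration of the freezing mechanic (the parameter names, gradients, and the plain gradient step are placeholders, not the actual modules or optimizer), a frozen module is simply excluded from the update:

```python
import numpy as np

def train_step(params, grads, lr, frozen):
    """One gradient-descent update; parameters listed in `frozen`
    keep their current values."""
    return {name: p if name in frozen else p - lr * grads[name]
            for name, p in params.items()}

# hypothetical parameters of the two modules
params = {"sr.w": np.array([1.0]), "rec.w": np.array([1.0])}
grads = {"sr.w": np.array([0.5]), "rec.w": np.array([0.5])}

# Step 2: update the SR module while the recognition module is frozen
step2 = train_step(params, grads, lr=0.1, frozen={"rec.w"})
# Step 3: update the recognition module while the SR module is frozen
step3 = train_step(step2, grads, lr=0.1, frozen={"sr.w"})
```

Only the unfrozen module moves in each step, which is what lets each stage specialize without disturbing its partner.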

**Step 1 process.** Step 1 requires the pretrained weights of both modules to guarantee optimized performance. As shown in Figure 7a, we independently train the SR and LP recognition modules using SR loss and LP recognition loss, respectively. The SR loss *LossSR* reduces the L1 difference in pixel values between SR images and HR images. The *LossSR* is defined as

$$Loss\_{SR} = \sum\_{i=1}^{N} |HR\_i - f(LR\_i)| \tag{7}$$

where *N* is the number of training images, *LRi* is the *i*-th LR training image, *f*(*LRi*) is the SR result of *LRi*, and *HRi* is the *i*-th HR training image corresponding to the SR image *f*(*LRi*).
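Equation (7) is a plain per-image L1 difference; a minimal PyTorch sketch (the tensors below are toy placeholders, not the paper's data) is:

```python
import torch

def sr_l1_loss(hr_batch: torch.Tensor, sr_batch: torch.Tensor) -> torch.Tensor:
    # Sum of absolute pixel differences between HR targets and SR outputs, as in Eq. (7)
    return torch.abs(hr_batch - sr_batch).sum()

# toy check: a constant difference of 1 over a 2 x 3 x 32 x 32 batch
hr = torch.ones(2, 3, 32, 32)
sr = torch.zeros(2, 3, 32, 32)
loss = sr_l1_loss(hr, sr)
```

In practice the mean (rather than the sum) is often used so that the loss scale is independent of image size; Equation (7) is written as a sum over the training set.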

**Figure 7.** Flowchart of gradual end-to-end learning method. (**a**) Step 1: Independent training of the SR module and LP recognition module. (**b**) Step 2: SR module training while freezing LP recognition module. (**c**) Step 3: LP recognition module training while freezing SR module. "GT" denotes ground truth. "Pred" denotes the prediction result.

The LP recognition loss *Lossrecognition* consists of localization, confidence, and classification losses [29]. The loss function is defined as

$$Loss\_{recognition} = \lambda\_{coord} \sum\_{i=0}^{S^2} \sum\_{j=0}^{B} \mathbb{1}\_{ij}^{obj} \left[ \left( x\_i - \hat{x}\_i \right)^2 + \left( y\_i - \hat{y}\_i \right)^2 \right]$$

$$+ \lambda\_{coord} \sum\_{i=0}^{S^2} \sum\_{j=0}^{B} \mathbb{1}\_{ij}^{obj} \left[ \left( \sqrt{w\_i} - \sqrt{\hat{w}\_i} \right)^2 + \left( \sqrt{h\_i} - \sqrt{\hat{h}\_i} \right)^2 \right] \tag{8}$$

$$+ \sum\_{i=0}^{S^2} \sum\_{j=0}^{B} \mathbb{1}\_{ij}^{obj} \left( C\_i - \hat{C}\_i \right)^2 + \lambda\_{noobj} \sum\_{i=0}^{S^2} \sum\_{j=0}^{B} \mathbb{1}\_{ij}^{noobj} \left( C\_i - \hat{C}\_i \right)^2 + \sum\_{i=0}^{S^2} \mathbb{1}\_i^{obj} \sum\_{c \in classes} \left( p\_i(c) - \hat{p}\_i(c) \right)^2$$

where *S*<sup>2</sup> denotes the number of grid cells used for recognition; the indicator for a grid cell is one if an LP character is assigned to it and zero otherwise. *λcoord* is a constant that weights the localization terms of the loss function. The first and second lines denote the localization loss, which computes the error in the position of the bounding box for accurate box detection: *xi* and *yi* denote the horizontal and vertical coordinates of the box in the *i*-th input image, and *wi* and *hi* denote its width and height. The LP recognition module calculates the sum of squared errors for *xi*, *yi*, *wi*, and *hi* between the predicted bounding box and the ground truth. *Ci* denotes the confidence of the class in the *i*-th image grid box; it is a probability between zero and one that is determined when a character is detected in the box. When no character is detected, the *λnoobj*-weighted term contributes to the LP recognition loss. *pi(c)* denotes the confidence of the detected class *c* in the *i*-th image.

**Step 2 process.** Step 2 is constructed to associate the irrelevant SR modules with the LP recognition module based on the weight freezing method. The SR and LP recognition losses are summed to strengthen collaborative correlations with each other. To generate images that are robust to LP recognition, the summed loss is backpropagated to the SR module, as shown in Figure 7b. In this process, if the parameters of the LP recognition module are changed, the end-to-end learning process cannot be performed properly. Therefore, weights for the LP recognition module are frozen during the training process. Using this process, the SR module is trained so that it reduces both the SR and LP recognition losses. Then, we obtain SR weights that can restore the super-resolved image to improve LP recognition accuracy. The summed loss is calculated as

$$Loss\_{Total} = \alpha \times Loss\_{SR} + Loss\_{recognition} \tag{9}$$

where α is a hyper-parameter that scales the SR loss. In Step 2, α is set to 0.1 to equalize the scales between *LossSR* and *Lossrecognition*.

**Step 3 process.** Step 3 is designed by converting LP recognition module freezing to SR module freezing, as shown in Figure 7c. Although the SR module reconstructs the image to improve LP recognition accuracy, the LP recognition module is not yet adapted to the enhanced super-resolved image. To address this issue, the super-resolved image is used to train the LP recognition module using *LossTotal* with *α* = 0. Through this, the LP recognition module presents superior recognition accuracy to other existing approaches.
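The weight-freezing pattern of Steps 2 and 3 can be sketched in PyTorch as follows; the two single-layer modules and the squared-activation "recognition loss" are hypothetical stand-ins for the real SR and LP modules and for the Eq. (8) loss:

```python
import torch
import torch.nn as nn

def set_trainable(module: nn.Module, trainable: bool) -> None:
    # Weight-freezing switch used in Steps 2 and 3
    for p in module.parameters():
        p.requires_grad = trainable

# hypothetical stand-ins for the SR and LP recognition modules
sr_module = nn.Conv2d(3, 3, kernel_size=3, padding=1)
lp_module = nn.Conv2d(3, 3, kernel_size=3, padding=1)

# Step 2: freeze LP, train SR with Loss_Total = alpha * Loss_SR + Loss_recognition (Eq. 9)
set_trainable(lp_module, False)
set_trainable(sr_module, True)
alpha = 0.1

lr_img = torch.rand(1, 3, 8, 8)
hr_img = torch.rand(1, 3, 8, 8)
sr_out = sr_module(lr_img)
loss_sr = torch.abs(hr_img - sr_out).mean()
loss_rec = lp_module(sr_out).pow(2).mean()   # placeholder for the Eq. (8) recognition loss
(alpha * loss_sr + loss_rec).backward()      # gradients reach the SR module only

# Step 3 swaps the roles (and sets alpha = 0):
# set_trainable(lp_module, True); set_trainable(sr_module, False)
```

Because the frozen module's parameters have `requires_grad = False`, backpropagating the summed loss updates only the unfrozen module, which is exactly the behavior Steps 2 and 3 rely on.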

As shown in Algorithm 3, a pseudocode capable of gradual end-to-end learning based on weight freezing is implemented.


```
12: gradLP = Backpropagate (LP, Lossrecognition)
13: WLP ← WLP − gradLP
14: end for
15: return WLP
(Step 2) Freeze LP with WLP and train SR.
SR loss calculation
16: For p = 1 to N do:
17: S = SR(L)
18: LossSR = |H − S|1 as in (7)
19: end for
Recognition loss calculation
20: For p = 1 to N do:
21: S = Bicubic_interpolation(S, 256)
22: labelpred = LP(S)
23: Lossrecognition = Loss(labelpred, labelGT) as in (8)
24: end for
Total loss calculation
25: For p = 1 to N do:
26: LossTotal = α × LossSR + Lossrecognition as in (9)
27: end for
SR weight update via total loss backpropagation
28: For p = 1 to N do:
29: gradSR = Backpropagate (SR, LossTotal)
30: WSR ← WSR − gradSR
31: end for
32: return WSR
(Step 3) Freeze SR with WSR and train LP.
SR loss calculation
33: For p = 1 to N do:
34: S = SR(L)
35: LossSR = |H − S|1 as in (7)
36: end for
LP loss calculation
37: For p = 1 to N do:
38: S = Bicubic_interpolation(S, 256)
39: labelpred = LP(S)
40: Lossrecognition = Loss(labelpred, labelGT) as in (8)
41: end for
LP weight update via LP loss backpropagation
42: For p = 1 to N do:
43: gradLP = Backpropagate (LP, Lossrecognition)
44: WLP ← WLP − gradLP
45: end for
46: return WLP
```
**Optimizer.** Our HIFA-LPR model utilizes the adaptive moment (Adam) optimizer [30] to search for a minimum of our loss function *LossTotal* with the following iterative operations:

$$m^{(n+1)} = \beta\_1 m^{(n)} + (1 - \beta\_1) \nabla Loss\_{Total}(W^{(n)}),\tag{10}$$

$$v^{(n+1)} = \beta\_2 v^{(n)} + (1 - \beta\_2) \, \nabla Loss\_{Total}(W^{(n)}) \bullet \nabla Loss\_{Total}(W^{(n)}),\tag{11}$$

$$W^{(n+1)} = W^{(n)} - \frac{h}{\sqrt{v^{(n+1)}} + \omega} \, m^{(n+1)},\tag{12}$$

where ∇*LossTotal* denotes the gradient of the loss function *LossTotal*, *β*<sup>1</sup> and *β*<sup>2</sup> denote the exponential decay rates for the moment estimates, *m(n)* denotes the estimate of the first moment of the gradient, *v(n)* denotes the estimate of the second moment, *W(n)* denotes the parameter vector before the update, *W(n+1)* denotes the vector updated by the Adam optimizer, *h* denotes the step size of the optimization process, • denotes the element-wise product, and *ω* is a small constant that prevents division by zero in Equation (12). *h* is an important parameter for optimization because it balances the speed and convergence of the proposed model.
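Equations (10)–(12), taken as written (without the bias-correction terms of the original Adam formulation), can be sketched directly; the quadratic toy loss below is an assumption used only to exercise the update:

```python
import numpy as np

def adam_step(w, m, v, grad, h=1e-2, beta1=0.9, beta2=0.999, omega=1e-8):
    # Eqs. (10)-(12), as presented in the text (no bias correction)
    m = beta1 * m + (1 - beta1) * grad            # Eq. (10): first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad     # Eq. (11): second-moment estimate
    w = w - h / (np.sqrt(v) + omega) * m          # Eq. (12): parameter update
    return w, m, v

# sanity check: minimize Loss(w) = w^2, whose gradient is 2w, starting from w = 1
w, m, v = np.array([1.0]), np.zeros(1), np.zeros(1)
for _ in range(2000):
    w, m, v = adam_step(w, m, v, grad=2.0 * w)
```

The per-coordinate division by the root of the second-moment estimate is what makes *h* an effective, dimensionless step size, which is why its choice governs the speed/convergence trade-off noted above.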

#### **4. Experiments and Analysis**

*4.1. Experimental Setup*

**Dataset.** To verify the effectiveness of our model, we utilize the UFPR [1] and Greek vehicle [2] datasets. The UFPR dataset, which consists of Brazilian vehicle LPs, is organized into 1800 training sets, 900 validation sets, and 1800 test sets. To accurately measure the performance of our framework, we reorganize the UFPR [1] dataset into 3600 training sets and 900 validation sets. We organize the Greek vehicle dataset [2] into 280 training sets and 65 validation sets; we annotated these sets ourselves. To simulate low-quality legacy conditions, we downscale the HR images to LR images using the built-in resize function of MATLAB for each scale factor (×3, ×4).
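The LR-image synthesis can be sketched in PyTorch as follows. Note this is an approximation of the paper's MATLAB resize: the MATLAB and PyTorch bicubic implementations use slightly different kernels and antialiasing, so it is a sketch rather than an exact reproduction of the preprocessing:

```python
import torch
import torch.nn.functional as F

def make_lr(hr: torch.Tensor, scale: int) -> torch.Tensor:
    # Bicubic downscaling of an HR batch (N, C, H, W) by an integer scale factor
    h, w = hr.shape[-2:]
    return F.interpolate(hr, size=(h // scale, w // scale),
                         mode="bicubic", align_corners=False)

# an HR LP crop of 96 x 32 pixels becomes a 24 x 8 LR input for scale factor x4
hr = torch.rand(1, 3, 96, 32)
lr = make_lr(hr, 4)
```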

**Metric.** To analyze the performance of the SR modules, we use the peak signal-to-noise ratio (PSNR), which evaluates the difference between the original image and the super-resolved image. To quantify the recognition accuracy, the mean average precision (mAP) is used.
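For reference, PSNR is computed from the mean squared error against the peak intensity (255 for 8-bit images); a minimal sketch:

```python
import numpy as np

def psnr(reference: np.ndarray, test: np.ndarray, peak: float = 255.0) -> float:
    # Peak signal-to-noise ratio in dB between two images of the same shape
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")          # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

# constant error of 10 gray levels -> MSE = 100 -> about 28.13 dB
a = np.full((8, 8), 100.0)
b = np.full((8, 8), 110.0)
```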

**Environments.** Our framework is implemented in PyTorch 1.8.0, and we use Python 3.8.3, CUDA 11.2, and cuDNN 8.2.0. Our experiments are performed with an AMD Ryzen 5 5600X 6-core processor, 32 GB of memory, and an NVIDIA RTX 3080 GPU. The 2D-DCT and 2D-IDCT are implemented using the built-in functions torch.fft.rfft and torch.fft.irfft, respectively. Our HIFA-LPR model is trained with the Adam optimizer [30] with *β*<sup>1</sup> = 0.9 and *β*<sup>2</sup> = 0.999, as in Equations (10)–(12). The training batch size is set to 16, the number of epochs to 200, *ω* to 10<sup>−8</sup>, and the learning rate to 10<sup>−4</sup>.

#### *4.2. Experimental Results*

We compare our model with other approaches that combine SR modules [8,11,12] and LP recognition modules [14,29]. Among the recent LP recognition modules, we exclude those that do not provide source code [21,23,25–28]. We set the LR-image-based LP recognition module trained with LR LP images as the baseline. For a fair comparison, each SR module is trained and validated on the same datasets. In addition, Yolov5 [29] is trained with the corresponding HR LP images. Then, also for a fair comparison, each LP recognition module is fine-tuned on its SR results, as in Step 3 of our gradual end-to-end learning method. Tables 1 and 2 show the comparison between our HIFA-LPR model and other existing approaches. According to each training step, our model is denoted as HIFA-LPR (Step 1, 2, or 3). Our model outperforms the other existing approaches, as shown in Tables 1 and 2. In particular, for HIFA-LPR (Step 3) the PSNR is 0.8 dB higher and the mAP is 19.7% higher than those of HIFA-LPR (Step 1) for a scale factor of ×3. Moreover, for the scale factor ×4, the PSNR of HIFA-LPR (Step 3) is 0.14 dB higher and the mAP is 26.5% higher than those of HIFA-LPR (Step 1). These experimental results indicate that our proposed gradual end-to-end learning method is superior to individual learning.


**Table 1.** Comparison between the HIFA-LPR model and other existing approaches for scale factor (×3) on the UFPR dataset.

**Table 2.** Comparison between the HIFA-LPR model and other existing approaches for scale factor (×4) on the UFPR dataset.


Figure 8 shows experimental results from the SR modules with the scale factor (×4) on the UFPR dataset. As shown in Figure 8b–d, the SR modules cannot properly improve the image quality of the LR image in Figure 8a. As shown in Figure 8e, SwinIR presents better image quality than the other SR modules. As illustrated in Figure 8h, our HIFA-LPR model provides enhanced edges and textures, which are closely related to LP recognition accuracy. As shown in Figure 9, the LP recognition results are used to qualitatively compare our model and existing approaches. As shown in Figure 9b–e, the LP recognition results obtained from existing approaches include many missing characters as well as several incorrect predictions. Although the result of HIFA-LPR (Step 3) includes one missing character, it outperforms the other approaches in terms of recognition accuracy, as shown in Figure 9h. In addition, Figure 10 shows the mAP results for the numeric classes (0–9) for the bicubic method and HIFA-LPR (Steps 1, 2, and 3), respectively. We focus only on the numeric classes for detailed observation. The mAP for the numerical classes of HIFA-LPR (Step 3) is 21.6% higher than that of HIFA-LPR (Step 1). As the gradual end-to-end learning process progresses step by step, the mAP increases, which demonstrates the effectiveness of the gradual end-to-end learning method.

**Figure 8.** SR results on the LP image of the UFPR dataset for scale factor (×3): (**a**) input low-quality legacy image (24 × 8), (**b**) bicubic (96 × 32), (**c**) MZSR, (**d**) DRN, (**e**) SwinIR, (**f**) HIFA-LPR (Step 1), (**g**) HIFA-LPR (Step 2), (**h**) HIFA-LPR (Step 3), (**i**) HR image.

**Figure 9.** LP recognition results on (**a**) input low-quality legacy image (24 × 8), (**b**) bicubic image (96 × 32), (**c**) MZSR result, (**d**) DRN result, (**e**) SwinIR result, (**f**) HIFA-LPR (Step 1) result, (**g**) HIFA-LPR (Step 2) result, (**h**) HIFA-LPR (Step 3) result, (**i**) HR image. "\_" denotes a missing character. Red character denotes an incorrect prediction. Blue character denotes a correct prediction. "GT" denotes ground truth. "Pred" denotes the prediction result.

Tables 3 and 4 show the comparison between our HIFA-LPR model and other approaches on the Greek vehicle dataset. As shown in Table 3, for HIFA-LPR (Step 3) the mAP is increased by 1.5%, while the PSNR is decreased by 2.93 dB compared with SwinIR for scale factor ×3. As shown in Table 4, although the PSNR of HIFA-LPR (Step 3) is 3.6 dB lower than that of SwinIR, the mAP of our model is increased by 1.4%. As shown in Figure 11b–e, the SR modules also cannot properly enhance the quality of the LR image in Figure 11a. On the other hand, our HIFA-LPR model produces enhanced edge components, as shown in Figure 11h, which helps improve LP recognition performance. Figure 12 shows the results of the LP recognition module using SR results for scale factor ×4 on the Greek vehicle dataset. As illustrated in Figure 12, HIFA-LPR (Step 3) accurately predicts all characters, while the other approaches yield several missing characters and incorrect predictions. These experimental results indicate that the proposed HIFA-LPR model achieves high recognition performance because the model performs SR to improve recognition accuracy, despite PSNR degradation. In addition, Figure 13 shows the mAP results for the numeric classes (0–9) for the bicubic method and HIFA-LPR (Steps 1, 2, and 3), respectively. The mAP for the numerical classes of HIFA-LPR (Step 3) is 18.6% higher than that of HIFA-LPR (Step 1).

**Figure 10.** mAP comparison results for only numbers (0–9) on the UFPR dataset for scale factor (×4). (**a**–**d**) represent the mAP results on bicubic results, HIFA-LPR (Step 1) results, HIFA-LPR (Step 2) results, and HIFA-LPR (Step 3) results, respectively.

**Figure 11.** SR results on the LP image of the Greek vehicle dataset for scale factor (×4): (**a**) input low-quality legacy image (24 × 8), (**b**) Bicubic (96 × 32), (**c**) MZSR, (**d**) DRN, (**e**) SwinIR, (**f**) HIFA-LPR (Step 1), (**g**) HIFA-LPR (Step 2), (**h**) HIFA-LPR (Step 3), (**i**) HR image.


**Table 3.** Comparison between the HIFA-LPR model and other existing approaches for scale factor (×3) on the Greek vehicle dataset.

**Table 4.** Comparison between the HIFA-LPR model and other existing approaches for scale factor (×4) on the Greek vehicle dataset.


**Figure 12.** LP recognition results on the Greek vehicle dataset with (**a**) input low-quality legacy image (28 × 8), (**b**) bicubic result (112 × 32), (**c**) MZSR result, (**d**) DRN result, (**e**) SwinIR result, (**f**) HIFA-LPR (Step 1) result, (**g**) HIFA-LPR (Step 2) result, (**h**) HIFA-LPR (Step 3) result. (**i**) HR image. "\_" denotes a missing character. Red character denotes an incorrect prediction. Blue character denotes a correct prediction. "GT" denotes ground truth. "Pred" denotes the prediction result.

**Figure 13.** mAP comparison results for only numbers (0–9) on the Greek vehicle dataset for scale factor (×4). (**a**–**d**) represent the mAP results on bicubic results, HIFA-LPR (Step 1) results, HIFA-LPR (Step 2) results, and HIFA-LPR (Step 3), respectively.

#### *4.3. Ablation Study*

**The effect of holistic feature extraction.** In this section, we verify the effectiveness of the holistic-image-feature-extraction-based SR module by comparison with the patch-extraction-based SR module. For a fair comparison, each module is trained and validated using the same datasets and LP recognition module. As shown in Table 5, the holistic-extraction-based SR presents better mAP results than the patch-extraction-based SR due to the elimination of grid patterns during the gradual end-to-end learning process.


**Table 5.** Comparison between the patch extraction-based SR and the holistic extraction-based SR for scale factor (×4) on the UFPR dataset.

**The effect of the LP recognition module based on high-frequency augmentation.** In this section, we investigate the effect of our LP recognition module. As mentioned above, the HAB extracts the desired high-frequency components according to the hyper-parameter *ε* and augments the extracted high-frequency components. We compare our LP recognition module with Tesseract-OCR [14] and Yolov5 [29], which is our baseline module. Since OpenALPR [13] only provides a cloud demo service for a single image, we provide only the visual recognition results of OpenALPR, as shown in Figure 14. For a fair comparison, each LP module is trained on the same SR results. As shown in Table 6, our LP recognition module with high-frequency augmentation outperforms the other modules because the HAB effectively exploits the high-frequency components that are closely related to LP recognition. As shown in Figure 14, while our LP recognition module recognizes all characters, the other modules produce missing or misrecognized characters.
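The HAB's extract-amplify-add behavior can be illustrated as follows. This is only a sketch: the paper implements the transform with a 2D-DCT (via torch.fft), whereas this illustration uses an FFT-based high-pass mask for brevity, and the `eps`/`gain` values are assumptions:

```python
import numpy as np

def high_frequency_augment(feat: np.ndarray, eps: float = 0.25, gain: float = 2.0) -> np.ndarray:
    # Illustrative HAB-style operation: isolate frequency components above a
    # normalized cutoff controlled by eps, amplify them, and add them back.
    spec = np.fft.fftshift(np.fft.fft2(feat))       # low frequencies at the center
    h, w = feat.shape
    yy, xx = np.mgrid[0:h, 0:w]
    radius = np.hypot(yy - h / 2, xx - w / 2) / np.hypot(h / 2, w / 2)
    high = np.where(radius > eps, spec, 0.0)        # keep high frequencies only
    hf = np.real(np.fft.ifft2(np.fft.ifftshift(high)))
    return feat + gain * hf                          # emphasized edges/textures

# a constant feature map has no high frequencies, so it passes through unchanged
flat = np.ones((8, 8))
out = high_frequency_augment(flat)
```

Edges and character strokes live in the retained high-frequency band, which is why amplifying it helps the recognition head.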

**Figure 14.** LP recognition results on same SR result (80 × 60). (**a**) Tesseract-OCR result, (**b**) OpenALPR result (**c**) Yolov5 result, (**d**) HIFA-LPR result, (**e**) HR image. "\_" denotes a missing character. Red character denotes an incorrect prediction. Blue character denotes a correct prediction. "GT" denotes ground truth. "Pred" denotes the prediction result.

**Table 6.** Comparison between the LP recognition module based on high-frequency augmentation and other modules for scale factor (×4) on the UFPR dataset.


**Extension to other countries' LP images.** Our HIFA-LPR model can be extended to LP images of other countries that include different language characters, such as Korean, by building LP datasets and redefining the character classes. To verify our framework, we conduct an additional experiment on the Korean LP dataset. Our model shows that the PSNR is increased by about 1.7 dB and the mAP by about 16.4% compared with SwinIR-based LP recognition. Note that our model also surpasses the other approaches on the Korean LP dataset.

**Limitations.** Our HIFA-LPR model may have limitations in practical applications. First, our experiments assume that the LR image is downscaled with a known bicubic kernel. Hence, the SR restoration performance may degrade in real-world settings where the blur kernel is unknown. Second, our HIFA-LPR model requires more training time due to the gradual stepwise learning. However, the inference time of our model is the same as that of other combination methods. The inference time of the proposed HIFA-LPR model is 3.2 ms when an HR LP image of 168 × 168 pixels is given as input; the corresponding LR input is 42 × 42 pixels for a scale factor of ×4, and the LP recognition module's input size is 168 × 168 pixels. Moreover, the training time of our HIFA-LPR model can be reduced by optimization techniques such as network weight compression or weight sharing.

#### **5. Conclusions**

This study focuses on LP recognition under low-quality legacy conditions. For this, we propose the HIFA-LPR model, trained with a gradual end-to-end learning method based on weight freezing. This method consists of three steps. In Step 1, the SR and LP recognition modules are trained independently to stabilize the training process. In Step 2, the SR module is trained with a combined loss function while the LP recognition module weights are frozen, to strengthen the collaborative correlations between the modules. In Step 3, the LP recognition module is trained with super-resolved images while the SR module weights are frozen, to obtain higher LP recognition accuracy. Through this method, we enhance the collaborative correlations between the modules. To optimally train our model, we propose the holistic feature extraction method, which prevents grid pattern generation in the training process. To exploit high-frequency information, we propose an LP recognition module based on high-frequency augmentation. Our LP recognition module extracts only the desired high frequencies and enhances the high-frequency components of the feature map. The experimental results show that our HIFA-LPR model provides the best performance in terms of mAP among various existing approaches. Although our HIFA-LPR model is intended for LP recognition, it can be extended to other object recognition tasks by building the related datasets and redefining the object classes. In addition, our HIFA-LPR model can be considered for real-world scene text recognition applications such as smart parking and autonomous driving. In the smart parking task, our method can be applied to parking in designated areas and to cracking down on illegal parking. Our HIFA-LPR model can also be applied to character recognition on traffic signs to assist autonomous driving.

**Author Contributions:** Conceptualization, S.B.Y.; methodology, S.-J.L., J.-S.Y. and S.B.Y.; software, S.-J.L.; validation, J.-S.Y.; formal analysis, S.-J.L. and J.-S.Y.; investigation, E.J.L. and S.B.Y.; resources, S.-J.L. and S.B.Y.; data curation, S.-J.L. and J.-S.Y.; writing—original draft preparation, S.-J.L. and J.-S.Y.; writing—review and editing, E.J.L. and S.B.Y.; visualization, S.B.Y.; supervision, S.B.Y.; project administration, S.B.Y.; funding acquisition, S.B.Y. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (NRF-2020R1G1A1100798). This work was supported by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2020-0-00004, Development of Previsional Intelligence based on Long-term Visual Memory Network). This work was partly supported by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2021-0-02068, Artificial Intelligence Innovation Hub).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **An Accelerated Convex Optimization Algorithm with Line Search and Applications in Machine Learning**

**Dawan Chumpungam 1, Panitarn Sarnmeta <sup>2</sup> and Suthep Suantai 1,3,\***


**Abstract:** In this paper, we introduce a new line search technique and employ it to construct a novel accelerated forward–backward algorithm for solving convex minimization problems of the form of the sum of two convex functions, one of which is smooth, in a real Hilbert space. We establish weak convergence of the proposed algorithm to a solution without the Lipschitz assumption on the gradient of the objective function. Furthermore, we analyze its performance by applying it to classification problems on various data sets and compare it with other line search algorithms. Based on the experiments, the proposed algorithm outperforms the other line search algorithms.

**Keywords:** forward–backward algorithm; line search; accelerated algorithm; convex minimization problems; data classification; machine learning

**MSC:** 65K05; 90C25; 90C30

#### **1. Introduction**

*The convex minimization problem* in the form of the sum of two convex functions plays a very important role in machine learning. This problem has been analyzed and studied by many authors because of its applications in various fields such as data science, computer science, statistics, engineering, physics, and medical science. Some examples of these applications are signal processing, compressed sensing, medical image reconstruction, digital image processing, and data prediction and classification; see [1–8].

As we know, in machine learning, and especially in data prediction and classification problems, the main objective is to minimize loss functions. Many loss functions can be viewed as convex functions; thus, by employing convex minimization, one can find the minimum of such functions, which in turn solves data prediction and classification problems. Many works have implemented this strategy; see [9–11] and the references therein for more information. In this work, we apply an *extreme learning machine* together with the *least absolute shrinkage and selection operator* to solve classification problems; more details will be discussed in a later section. First, we introduce the convex minimization problem, which can be formulated as follows:

$$\min\_{x \in H} \{ f(x) + g(x) \},\tag{1}$$

where *f* : *H* → R ∪ {+∞} is proper, convex, and differentiable on an open set containing *dom*(*g*), and *g* : *H* → R ∪ {+∞} is a proper, lower semicontinuous convex function defined on a real Hilbert space *H*.

**Citation:** Chumpungam, D.; Sarnmeta, P.; Suantai, S. An Accelerated Convex Optimization Algorithm with Line Search and Applications in Machine Learning. *Mathematics* **2022**, *10*, 1491. https:// doi.org/10.3390/math10091491

Academic Editors: Francois Rivest and Abdellah Chehri

Received: 16 March 2022 Accepted: 27 April 2022 Published: 30 April 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

A solution of (1) is in fact a fixed point of the operator *proxαg*(*I* − *α*∇*f*), i.e.,

$$x^\* = prox\_{\alpha g}(I - \alpha \nabla f)(x^\*),\tag{2}$$

where *α* > 0 and *proxαg*(*I* − *α*∇*f*)(*x*) = arg min*y*∈*H*{*g*(*y*) + (1/2*α*)(*x* − *α*∇*f*(*x*)) − *y*<sup>2</sup>}, which is known as the *forward–backward operator*. In order to solve (1), the *forward–backward algorithm* [12] was introduced as follows:

$$x\_{n+1} = \underbrace{prox\_{\alpha\_n g}}\_{\text{backward}} \underbrace{(I - \alpha\_n \nabla f)}\_{\text{forward}}(x\_n), \text{ for all } n \in \mathbb{N}, \tag{3}$$

where *αn* is a positive number. If ∇*f* is *L*-Lipschitz continuous and *αn* ∈ (0, 2/*L*), then a sequence generated by (3) converges weakly to a solution of (1). There are several techniques that can improve the performance of (3). For instance, we could utilize an inertial step, which was first introduced by Polyak [13] to solve smooth convex minimization problems. Since then, several works have included an inertial step in their algorithms to accelerate the convergence behavior; see [14–19] for examples.
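For concreteness, when *g* = *λ*‖·‖₁ (whose proximal operator is the closed-form soft-thresholding map) and *f* is a least-squares term, scheme (3) becomes the classical iterative shrinkage–thresholding method; a minimal sketch on synthetic data (the problem instance is an assumption for illustration):

```python
import numpy as np

def soft_threshold(x: np.ndarray, t: float) -> np.ndarray:
    # Closed-form proximal operator of t * ||.||_1
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def forward_backward(A: np.ndarray, b: np.ndarray, lam: float, n_iter: int = 500) -> np.ndarray:
    # Scheme (3) with f(x) = 0.5 * ||Ax - b||^2 and g(x) = lam * ||x||_1
    L = np.linalg.norm(A, 2) ** 2           # Lipschitz constant of grad f
    alpha = 1.0 / L                         # constant step size in (0, 2/L)
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)                            # forward (gradient) step
        x = soft_threshold(x - alpha * grad, alpha * lam)   # backward (proximal) step
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
x_true = np.array([1.0, 0.0, -2.0, 0.0, 0.5])
b = A @ x_true
x_hat = forward_backward(A, b, lam=0.1)   # close to x_true up to the l1 shrinkage bias
```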

One of the most famous forward–backward-type algorithms that implements an inertial step is the *fast iterative shrinkage–thresholding algorithm* (FISTA) [20]. It is defined as the following Algorithm 1.

#### **Algorithm 1.** FISTA.


where *L* is a Lipschitz constant of ∇*f*.

The term *xn* + *θn*(*xn* − *xn*−1) is known as an inertial term with an inertial parameter *θn*. It has been shown that FISTA performs better than (3). Later, other forward–backward-type algorithms were introduced and studied by many authors; see, for instance, [2,8,18,21,22]. However, most of these works assume the Lipschitz continuity of ∇*f*, which is difficult to verify or compute in general. Therefore, in this paper, we focus on another approach in which ∇*f* is not necessarily Lipschitz continuous.
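The inertial update of FISTA can be sketched compactly for the same l1-regularized least-squares setting (the data below are illustrative, not from the paper):

```python
import numpy as np

def soft_threshold(x: np.ndarray, t: float) -> np.ndarray:
    # prox of t * ||.||_1
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def fista(A: np.ndarray, b: np.ndarray, lam: float, n_iter: int = 300) -> np.ndarray:
    # FISTA for f(x) = 0.5 * ||Ax - b||^2 and g(x) = lam * ||x||_1
    L = np.linalg.norm(A, 2) ** 2           # Lipschitz constant of grad f
    x = np.zeros(A.shape[1])
    y = x.copy()
    t = 1.0
    for _ in range(n_iter):
        grad = A.T @ (A @ y - b)
        x_new = soft_threshold(y - grad / L, lam / L)    # forward-backward step at y
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        theta = (t - 1.0) / t_new                        # inertial parameter theta_n
        y = x_new + theta * (x_new - x)                  # inertial term x_n + theta_n (x_n - x_{n-1})
        x, t = x_new, t_new
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
x_true = np.array([1.0, 0.0, -2.0, 0.0, 0.5])
b = A @ x_true
x_fista = fista(A, b, lam=0.1)
```

The extrapolation point *y* carries momentum from the previous iterate, which is exactly the acceleration mechanism the inertial step provides over plain scheme (3).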

In 2016, Cruz and Nghia [23] introduced a line search technique as the following Algorithm 2.


They asserted that Line Search 1 stops after finitely many steps and proposed the following Algorithm 3.

**Algorithm 3.** Algorithm with Line Search 1.

1: **Input** Given *x*0 ∈ *dom*(*g*), *δ* ∈ (0, 1/2), *σ* > 0, and *θ* ∈ (0, 1), for all *n* ∈ N,

$$x\_{n+1} = prox\_{\gamma\_n g}(I - \gamma\_n \nabla f)(x\_n),$$

where *γ<sup>n</sup>* := Line Search 1(*xn*, *δ*, *σ*, *θ*).

They also showed that the sequence {*xn*} defined by Algorithm 3 converges weakly to a solution of (1) under Assumptions A1 and A2 where:


It is noted that the *L*-Lipschitz continuity of ∇*f* is not necessarily assumed. Moreover, if ∇*f* is *L*-Lipschitz continuous, then A2 is satisfied.
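Since Algorithm 2 (Line Search 1) is not reproduced above, the following backtracking sketch assumes a stopping test in the spirit of this line of work, shrinking the step until the gradient variation is controlled by the displacement; the problem instance and the parameter values *δ*, *σ*, *θ* are illustrative:

```python
import numpy as np

def line_search_gamma(x, grad_f, prox_g, sigma=1.0, delta=0.4, theta=0.5, max_iter=60):
    # Assumed backtracking rule: shrink gamma by theta until
    #   gamma * ||grad_f(J) - grad_f(x)|| <= delta * ||J - x||,
    # where J = prox_g(x - gamma * grad_f(x), gamma).
    gamma = sigma
    gx = grad_f(x)
    for _ in range(max_iter):
        J = prox_g(x - gamma * gx, gamma)
        if gamma * np.linalg.norm(grad_f(J) - gx) <= delta * np.linalg.norm(J - x):
            break
        gamma *= theta
    return gamma

# Algorithm 3-style iteration on a small synthetic lasso instance
rng = np.random.default_rng(1)
A = rng.standard_normal((15, 4))
x_true = np.array([1.0, -1.0, 0.0, 2.0])
b = A @ x_true
lam = 0.05
grad_f = lambda x: A.T @ (A @ x - b)                                      # f = 0.5||Ax - b||^2
prox_g = lambda v, g: np.sign(v) * np.maximum(np.abs(v) - g * lam, 0.0)   # g = lam * ||x||_1

x = np.zeros(4)
for _ in range(500):
    gamma = line_search_gamma(x, grad_f, prox_g)
    x = prox_g(x - gamma * grad_f(x), gamma)
```

No global Lipschitz constant of ∇*f* is supplied anywhere: the backtracking loop discovers an admissible step size locally at each iterate, which is the point of replacing the fixed step *αn* ∈ (0, 2/*L*) with a line search.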

In 2019, Kankam et al. [3] proposed the new line search as the following Algorithm 4.


They also asserted that Line Search 2 stops after finitely many steps and proposed the following Algorithm 5.


A weak convergence result for this algorithm was obtained under Assumptions A1 and A2. Although Algorithms 3 and 5 obtained weak convergence results without the Lipschitz assumption on ∇*f*, neither algorithm utilizes an inertial step. Therefore, it is interesting to investigate improvements to their convergence behavior using this technique.

Motivated by the works mentioned earlier, we aim to introduce a new line search technique and prove that it is well defined. Then, we employ it to construct a novel forward–backward algorithm that utilizes an inertial step to improve its performance beyond the other line search algorithms. We prove a weak convergence theorem for the proposed algorithm without the Lipschitz assumption on ∇*f* and apply it to solve classification problems on various data sets. We also compare its performance with Algorithms 3 and 5 to show that the proposed algorithm performs better.

This work is organized as follows: In Section 2, we recall some important definitions and lemmas used in later sections. In Section 3, we introduce a new line search technique and algorithm for solving (1). Then, we analyze the convergence and complexity of the proposed algorithm under Assumptions A1 and A2. In Section 4, we apply the proposed algorithm to solve data classification problems and compare its performance with other algorithms. Finally, the conclusion of this work is presented in Section 5.

#### **2. Preliminaries**

In this section, some important definitions and lemmas, which will be used in later sections, are presented.

Let {*xn*} be a sequence in *H* and *x* ∈ *H*. We denote by *xn* → *x* and *xn* ⇀ *x* the strong and weak convergence of {*xn*} to *x*, respectively. Let *f* : *H* → R ∪ {+∞} be a proper lower semicontinuous and convex function. We denote *dom*(*f*) = {*x* ∈ *H* : *f*(*x*) < +∞}. A *subdifferential* of *f* at *x* is defined by

$$\partial f(x) := \{ u \in H : \langle u, y - x \rangle + f(x) \le f(y), \text{ for all } y \in H \}.$$

A *proximal operator proxα f* : *H* → *dom*(*f*) is defined as follows:

$$prox\_{\alpha f}(x) = (I + \alpha \partial f)^{-1}(x),$$

where *I* is an identity mapping and *α* is a positive number. It is well known that this operator is single-valued, nonexpansive, and

$$\frac{x - \operatorname{prox}_{\alpha f}(x)}{\alpha} \in \partial f(\operatorname{prox}_{\alpha f}(x)), \text{ for all } x \in H \text{ and } \alpha > 0; \tag{4}$$
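For a concrete instance of relation (4), consider $f = \|\cdot\|_1$, whose proximal operator is the well-known soft-thresholding map. The short sketch below (a minimal Python illustration, not part of the paper) evaluates the prox and checks componentwise that $(x - \operatorname{prox}_{\alpha f}(x))/\alpha$ lies in the subdifferential of $\|\cdot\|_1$ at the prox point.

```python
import numpy as np

def prox_l1(x, alpha):
    # prox_{alpha f} for f = ||.||_1 is componentwise soft thresholding
    return np.sign(x) * np.maximum(np.abs(x) - alpha, 0.0)

def in_l1_subdifferential(u, p, tol=1e-12):
    # u is a subgradient of ||.||_1 at p iff u_i = sign(p_i) where p_i != 0
    # and |u_i| <= 1 where p_i == 0
    ok_nonzero = np.all(np.abs(u[p != 0] - np.sign(p[p != 0])) <= tol)
    ok_zero = np.all(np.abs(u[p == 0]) <= 1.0 + tol)
    return bool(ok_nonzero and ok_zero)

x = np.array([3.0, -0.5, 0.2, -4.0])
alpha = 1.0
p = prox_l1(x, alpha)      # soft thresholding at level 1
u = (x - p) / alpha        # relation (4): u must be a subgradient at p
assert in_l1_subdifferential(u, p)
```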

see [23] for more details. Next, we present some important lemmas for this work.

**Lemma 1** ([24])**.** *Let ∂ f be a subdifferential of f . Then, the following hold:*


**Lemma 2** ([25])**.** *Let f* , *g* : *H* → R ∪ {+∞} *be proper lower semicontinuous convex functions with dom*(*g*) ⊆ *dom*(*f*) *and* $J(x, \alpha) = \operatorname{prox}_{\alpha g}(x - \alpha \nabla f(x))$. *Then, for any x* ∈ *dom*(*g*) *and α*<sup>2</sup> ≥ *α*<sup>1</sup> > 0, *we have*

$$\frac{\alpha_2}{\alpha_1}\|x - J(x, \alpha_1)\| \ge \|x - J(x, \alpha_2)\| \ge \|x - J(x, \alpha_1)\|.$$
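Lemma 2 can be sanity-checked numerically. The sketch below (an illustration only; the choices $f(x) = \|x\|^2/2$ and $g = \|\cdot\|_1$ are assumptions made for the test) verifies both inequalities for random points and random step sizes $\alpha_2 \ge \alpha_1 > 0$.

```python
import numpy as np

def prox_l1(x, a):
    return np.sign(x) * np.maximum(np.abs(x) - a, 0.0)

def J(x, a):
    # forward-backward operator J(x, a) = prox_{a g}(x - a * grad_f(x))
    # with f(x) = ||x||^2 / 2 (so grad_f(x) = x) and g = ||.||_1
    return prox_l1(x - a * x, a)

rng = np.random.default_rng(0)
for _ in range(100):
    x = rng.normal(size=5)
    a1, a2 = np.sort(rng.uniform(0.01, 0.99, size=2))  # a2 >= a1 > 0
    r1 = np.linalg.norm(x - J(x, a1))
    r2 = np.linalg.norm(x - J(x, a2))
    assert (a2 / a1) * r1 >= r2 - 1e-9   # left inequality of Lemma 2
    assert r2 >= r1 - 1e-9               # right inequality of Lemma 2
```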

**Lemma 3** ([26])**.** *Let H be a real Hilbert space. Then, for all a*, *b*, *c* ∈ *H and ζ* ∈ [0, 1], *the following hold:*


**Lemma 4** ([8])**.** *Let* {*an*} *and* {*bn*} *be sequences of non-negative real numbers such that*

$$a_{n+1} \le (1 + b_n)a_n + b_n a_{n-1}, \text{ for all } n \in \mathbb{N}.$$

*Then, the following holds:*

$$a_{n+1} \le K \cdot \prod_{j=1}^{n} (1 + 2b_j), \text{ where } K = \max\{a_1, a_2\}.$$

*Moreover, if* $\sum_{n=1}^{+\infty} b_n < +\infty$, *then* $\{a_n\}$ *is bounded.*
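The bound in Lemma 4 is easy to probe numerically. The sketch below (an illustration; the summable choice $b_n = 1/n^2$ and the starting values are assumptions) runs the recursion at equality, the worst case allowed by the hypothesis, and checks the product bound and the resulting boundedness.

```python
# b_n = 1/n^2 is summable; run the recursion of Lemma 4 at equality
# (the worst case allowed by the hypothesis) and check the product bound
b = [1.0 / (n * n) for n in range(1, 51)]
a = [1.0, 2.0]                       # a_1, a_2
K = max(a[0], a[1])
prod = 1.0 + 2.0 * b[0]              # product for j = 1
for n in range(2, 51):
    a.append((1.0 + b[n - 1]) * a[-1] + b[n - 1] * a[-2])
    prod *= 1.0 + 2.0 * b[n - 1]     # extend the product to j = n
    assert a[-1] <= K * prod + 1e-9  # a_{n+1} <= K * prod_{j=1}^{n} (1 + 2 b_j)
assert a[-1] < 20.0                  # {a_n} stays bounded, since sum b_n < inf
```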

**Lemma 5** ([26])**.** *Let* {*αn*}, {*βn*} *and* {*γn*} *be sequences of non-negative real numbers such that*

$$\alpha_{n+1} \le (1 + \gamma_n)\alpha_n + \beta_n, \text{ for all } n \in \mathbb{N}.$$

*If* $\sum_{n=1}^{+\infty} \gamma_n < +\infty$ *and* $\sum_{n=1}^{+\infty} \beta_n < +\infty$, *then* $\lim_{n \to +\infty} \alpha_n$ *exists.*

**Lemma 6** ([27], Opial)**.** *Let* {*xn*} *be a sequence in a Hilbert space H. If there exists a nonempty subset* Ω *of H such that the following hold:*


#### **3. Main Results**

In this section, we define a new line search technique and a new accelerated algorithm with the new line search for solving (1). We denote by *S*<sup>∗</sup> the set of all solutions of (1) and suppose that *f* , *g* : *H* → R ∪ {+∞} are two convex functions that satisfy Assumptions A1 and A2, and *dom*(*g*) is closed. Furthermore, we also suppose that *S*<sup>∗</sup> ≠ ∅.

We first introduce a new line search technique as the following Algorithm 6.


We first show that Line Search 3 terminates after finitely many steps.

**Lemma 7.** *Line Search 3 stops at finitely many steps.*

**Proof.** If *x* ∈ *S*∗, then *x* = *L*(*x*, *σ*) = *S*(*x*, *σ*), so Line Search 3 stops after zero steps. If *x* ∉ *S*∗, suppose by contradiction that, for all *n* ∈ N, at least one of the following holds:

$$\frac{\sigma\theta^{n}}{2}\big(\|\nabla f(S(x,\sigma\theta^{n})) - \nabla f(L(x,\sigma\theta^{n}))\| + \|\nabla f(L(x,\sigma\theta^{n})) - \nabla f(x)\|\big)$$

$$\ge \delta\big(\|S(x,\sigma\theta^{n}) - L(x,\sigma\theta^{n})\| + \|L(x,\sigma\theta^{n}) - x\|\big), \tag{5}$$

or

$$\sigma\theta^{n}\|\nabla f(L(x,\sigma\theta^{n})) - \nabla f(x)\| \ge 4\delta\|L(x,\sigma\theta^{n}) - x\|. \tag{6}$$

Then, from these assumptions, we can find a subsequence {*σθnk*} of {*σθn*} such that (5) or (6) holds. First, we show that

$$\{\|\nabla f(L(x, \sigma\theta^{n})) - \nabla f(x)\|\} \quad \text{and} \quad \{\|\nabla f(S(x, \sigma\theta^{n})) - \nabla f(L(x, \sigma\theta^{n}))\|\}$$

are bounded. It follows from Lemma 2 that

$$\|L(x, \sigma\theta^{n}) - x\| \le \|L(x, \sigma) - x\|,$$

for all *n* ∈ N. In combination with A2, we conclude that $\{\|\nabla f(L(x, \sigma\theta^{n})) - \nabla f(x)\|\}$ is bounded. Next, we prove that $\{\|\nabla f(S(x, \sigma\theta^{n})) - \nabla f(L(x, \sigma\theta^{n}))\|\}$ is bounded. Since $\operatorname{prox}_{\gamma g}$ is nonexpansive for any *γ* > 0, we have

$$\begin{aligned} \|S(x, \sigma\theta^{n}) - L(x, \sigma\theta^{n})\| &= \|\operatorname{prox}_{\sigma\theta^{n} g}(L(x, \sigma\theta^{n}) - \sigma\theta^{n}\nabla f(L(x, \sigma\theta^{n}))) - \operatorname{prox}_{\sigma\theta^{n} g}(x - \sigma\theta^{n}\nabla f(x))\| \\ &\le \|(L(x, \sigma\theta^{n}) - \sigma\theta^{n}\nabla f(L(x, \sigma\theta^{n}))) - (x - \sigma\theta^{n}\nabla f(x))\| \\ &\le \|L(x, \sigma\theta^{n}) - x\| + \sigma\theta^{n}\|\nabla f(L(x, \sigma\theta^{n})) - \nabla f(x)\| \\ &\le \|L(x, \sigma\theta^{n}) - x\| + \sigma\|\nabla f(L(x, \sigma\theta^{n})) - \nabla f(x)\|, \end{aligned}$$

for all *n* ∈ N; hence, $\{S(x, \sigma\theta^{n}) - L(x, \sigma\theta^{n})\}$ is bounded. Again, it follows from A2 that $\{\|\nabla f(S(x, \sigma\theta^{n})) - \nabla f(L(x, \sigma\theta^{n}))\|\}$ is bounded. To complete the proof, we consider the only two possible cases and derive a contradiction.

Case 1: Suppose that there exists a subsequence $\{\sigma\theta^{n_k}\}$ of $\{\sigma\theta^{n}\}$ such that (5) holds, for all *k* ∈ N. Then, it follows that $\|S(x, \sigma\theta^{n_k}) - L(x, \sigma\theta^{n_k})\| \to 0$ and $\|L(x, \sigma\theta^{n_k}) - x\| \to 0$, as *k* → +∞. Since ∇*f* is uniformly continuous, we obtain:

$$\|\nabla f(S(x, \sigma\theta^{n_k})) - \nabla f(L(x, \sigma\theta^{n_k}))\| \to 0 \quad \text{and} \quad \|\nabla f(L(x, \sigma\theta^{n_k})) - \nabla f(x)\| \to 0,$$

as *k* → +∞. Therefore, it follows from (5) that $\frac{L(x,\sigma\theta^{n_k}) - x}{\sigma\theta^{n_k}} \to 0$, as *k* → +∞. By (4), we obtain

$$\frac{x - \sigma\theta^{n_k}\nabla f(x) - L(x, \sigma\theta^{n_k})}{\sigma\theta^{n_k}} \in \partial g(L(x, \sigma\theta^{n_k})).$$

Thus, $\frac{x - L(x,\sigma\theta^{n_k})}{\sigma\theta^{n_k}} - \nabla f(x) \in \partial g(L(x, \sigma\theta^{n_k}))$. Since $L(x, \sigma\theta^{n_k}) \to x$, as *k* → +∞, we obtain from Lemma 1 that $0 \in \nabla f(x) + \partial g(x) \subseteq \partial(f + g)(x)$. Hence, *x* ∈ *S*∗, which is a contradiction.

Case 2: Suppose that there is a subsequence $\{\sigma\theta^{n_k}\}$ of $\{\sigma\theta^{n}\}$ satisfying (6), for all *k* ∈ N. Then, $\|L(x, \sigma\theta^{n_k}) - x\| \to 0$, as *k* → +∞. Again, from the uniform continuity of ∇*f*, we have

$$\|\nabla f(L(x, \sigma\theta^{n_k})) - \nabla f(x)\| \to 0,$$

as *k* → +∞. From (6), we conclude that

$$\frac{\|L(x, \sigma\theta^{n_k}) - x\|}{\sigma\theta^{n_k}} \to 0,$$

as *k* → +∞. By the same argument as in Case 1, we can show that 0 ∈ *∂*(*f* + *g*)(*x*), and hence, *x* ∈ *S*∗, a contradiction. Therefore, we conclude that Line Search 3 stops after finitely many steps, and the proof is complete.
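Although the listing of Line Search 3 (Algorithm 6) appears above, the backtracking structure implied by conditions (5) and (6) can be sketched as follows: starting from γ = σ, shrink by θ until both inequalities fail. The code below is an illustrative reconstruction, not the authors' implementation; the toy *f*, *g*, and the callables `grad_f` and `prox_g` are assumptions.

```python
import numpy as np

def line_search3_sketch(x, delta, sigma, theta, grad_f, prox_g, max_iter=100):
    # return the first gamma = sigma * theta**n for which both stopping
    # inequalities (the negations of (5) and (6)) hold, where
    # L(x, gamma) = prox_{gamma g}(x - gamma * grad_f(x)) and S = L applied at L
    gamma = sigma
    for _ in range(max_iter):
        L = prox_g(x - gamma * grad_f(x), gamma)
        S = prox_g(L - gamma * grad_f(L), gamma)
        lhs5 = 0.5 * gamma * (np.linalg.norm(grad_f(S) - grad_f(L))
                              + np.linalg.norm(grad_f(L) - grad_f(x)))
        rhs5 = delta * (np.linalg.norm(S - L) + np.linalg.norm(L - x))
        cond6 = gamma * np.linalg.norm(grad_f(L) - grad_f(x)) \
                <= 4.0 * delta * np.linalg.norm(L - x)
        if lhs5 <= rhs5 and cond6:
            return gamma
        gamma *= theta
    return gamma

# toy instance: f(x) = ||A x - b||^2 / 2, g = ||.||_1 (assumed for illustration)
A = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([1.0, -1.0])
grad_f = lambda x: A.T @ (A @ x - b)
prox_g = lambda x, g: np.sign(x) * np.maximum(np.abs(x) - g, 0.0)
gamma = line_search3_sketch(np.zeros(2), 0.1, 1.0, 0.5, grad_f, prox_g)
assert 0.0 < gamma <= 1.0
```

By Lemma 7, such a loop terminates after finitely many shrinking steps whenever ∇*f* is uniformly continuous on bounded sets.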

We propose a new inertial algorithm with Line Search 3 as the following Algorithm 7.

**Algorithm 7.** Inertial algorithm with Line Search 3.

1: **Input:** Given $x_0, x_1 \in dom(g)$, $\alpha_n \in [0, 1]$, $\beta_n \ge 0$, $\sigma > 0$, $\theta \in (0, 1)$ and $\delta \in (0, \frac{1}{8})$, for $n \in \mathbb{N}$, compute
$$y_n = x_n + \beta_n(x_n - x_{n-1}), \quad z_n = P_{dom(g)}(y_n), \quad w_n = \operatorname{prox}_{\gamma_n g}(z_n - \gamma_n \nabla f(z_n)),$$
$$x_{n+1} = (1 - \alpha_n)w_n + \alpha_n \operatorname{prox}_{\gamma_n g}(w_n - \gamma_n \nabla f(w_n)),$$
where $\gamma_n :=$ Line Search 3$(z_n, \delta, \sigma, \theta)$, and $P_{dom(g)}$ is the metric projection onto $dom(g)$.

The diagram of Algorithm 7 can be seen in Figure 1.

**Figure 1.** Diagram of Algorithm 7.
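The update rules of Algorithm 7 translate directly into code. The sketch below is a simplified illustration with two assumptions: $dom(g) = \mathbb{R}^d$ (so the projection $P_{dom(g)}$ is the identity), and a constant step γ that is safe for the toy quadratic *f* used here, in place of the value returned by Line Search 3 (which the paper requires precisely because no Lipschitz constant is assumed in general).

```python
import numpy as np

def prox_l1(x, g):
    return np.sign(x) * np.maximum(np.abs(x) - g, 0.0)

def algorithm7_sketch(x0, x1, grad_f, prox_g, gamma, n_iter=300,
                      alpha=0.5, beta=lambda n: 1.0 / n**2):
    # direct transcription of the updates of Algorithm 7, assuming
    # dom(g) = R^d (so P_{dom(g)} is the identity) and a constant step
    # gamma instead of Line Search 3 (both simplifications of this sketch)
    x_prev, x = x0, x1
    for n in range(1, n_iter + 1):
        y = x + beta(n) * (x - x_prev)                # inertial extrapolation
        z = y                                          # z_n = P_{dom(g)} y_n
        w = prox_g(z - gamma * grad_f(z), gamma)       # forward-backward step
        v = prox_g(w - gamma * grad_f(w), gamma)       # second FB step
        x_prev, x = x, (1.0 - alpha) * w + alpha * v   # relaxed update
    return x

# toy LASSO instance (an assumption made for illustration)
rng = np.random.default_rng(1)
A = rng.normal(size=(20, 5))
b = rng.normal(size=20)
lam = 0.1
grad_f = lambda x: A.T @ (A @ x - b)
prox_g = lambda x, g: prox_l1(x, lam * g)
F = lambda x: 0.5 * np.linalg.norm(A @ x - b) ** 2 + lam * np.linalg.norm(x, 1)

gamma = 0.9 / np.linalg.norm(A.T @ A, 2)   # safe constant step for this quadratic f
x = algorithm7_sketch(np.zeros(5), np.zeros(5), grad_f, prox_g, gamma)
assert F(x) <= F(np.zeros(5))              # the objective does not get worse
```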

Next, we prove the following lemma, which will play a crucial role in our main theorems.

**Lemma 8.** *Let γ<sup>n</sup>* := *Line Search 3*(*zn*, *δ*, *σ*, *θ*). *Then, for all n* ∈ N *and x* ∈ *dom*(*g*)*, the following hold:*

$$\begin{aligned} (I)\quad & \|z_n - x\|^2 - \|w_n - x\|^2 \ge 2\gamma_n[(f+g)(w_n) - (f+g)(x)] + (1-8\delta)\|w_n - z_n\|^2;\\ (II)\quad & \|z_n - x\|^2 - \|v_n - x\|^2 \ge 2\gamma_n[(f+g)(w_n) + (f+g)(v_n) - 2(f+g)(x)] \\ &\quad + (1-8\delta)(\|w_n - z_n\|^2 + \|v_n - w_n\|^2), \end{aligned}$$

*where* $v_n = \operatorname{prox}_{\gamma_n g}(w_n - \gamma_n \nabla f(w_n))$.

**Proof.** First, we show that (*I*) is true. From (4), we know that

$$\frac{z_n - w_n}{\gamma_n} - \nabla f(z_n) \in \partial g(w_n), \quad \text{for all } n \in \mathbb{N}.$$

Moreover, it follows from the definition of the subdifferential $\partial g(w_n)$ and the convexity of *f* that

$$g(x) - g(w_n) \ge \left\langle \frac{z_n - w_n}{\gamma_n} - \nabla f(z_n), x - w_n \right\rangle,$$

$$f(x) - f(z_n) \ge \langle \nabla f(z_n), x - z_n \rangle \quad \text{and} \quad f(z_n) - f(w_n) \ge \langle \nabla f(w_n), z_n - w_n \rangle,$$

for all *n* ∈ N. Consequently,

$$\begin{aligned} f(x) - f(z_n) + g(x) - g(w_n) &\ge \frac{1}{\gamma_n}\langle z_n - w_n, x - w_n\rangle + \langle \nabla f(z_n), w_n - z_n\rangle \\ &= \frac{1}{\gamma_n}\langle z_n - w_n, x - w_n\rangle + \langle \nabla f(z_n) - \nabla f(w_n), w_n - z_n\rangle \\ &\quad + \langle \nabla f(w_n), w_n - z_n\rangle \\ &\ge \frac{1}{\gamma_n}\langle z_n - w_n, x - w_n\rangle - \|\nabla f(z_n) - \nabla f(w_n)\|\|w_n - z_n\| \\ &\quad + \langle \nabla f(w_n), w_n - z_n\rangle \\ &\ge \frac{1}{\gamma_n}\langle z_n - w_n, x - w_n\rangle - \frac{4\delta}{\gamma_n}\|w_n - z_n\|^2 + f(w_n) - f(z_n), \end{aligned}$$

for all *n* ∈ N. It follows that

$$\frac{1}{\gamma_n}\langle z_n - w_n, w_n - x\rangle \ge (f+g)(w_n) - (f+g)(x) - \frac{4\delta}{\gamma_n}\|w_n - z_n\|^2, \quad \text{for all } n \in \mathbb{N}.$$

From Lemma 3, we have $\langle z_n - w_n, w_n - x\rangle = \frac{1}{2}(\|z_n - x\|^2 - \|z_n - w_n\|^2 - \|w_n - x\|^2)$, and hence,

$$\frac{1}{2\gamma_n}(\|z_n - x\|^2 - \|z_n - w_n\|^2 - \|w_n - x\|^2) \ge (f+g)(w_n) - (f+g)(x) - \frac{4\delta}{\gamma_n}\|w_n - z_n\|^2,$$

for all *n* ∈ N. Then, it follows that, for any *x* ∈ *dom*(*g*),

$$\|z_n - x\|^2 - \|w_n - x\|^2 \ge 2\gamma_n[(f+g)(w_n) - (f+g)(x)] + (1-8\delta)\|w_n - z_n\|^2,$$

and (*I*) is proven. Next, we show (*II*). From (4), we have that

$$\frac{z_n - w_n}{\gamma_n} - \nabla f(z_n) \in \partial g(w_n), \quad \text{and} \quad \frac{w_n - v_n}{\gamma_n} - \nabla f(w_n) \in \partial g(v_n).$$

Then,

$$g(x) - g(w_n) \ge \left\langle \frac{z_n - w_n}{\gamma_n} - \nabla f(z_n), x - w_n \right\rangle, \quad \text{and}$$

$$g(x) - g(v_n) \ge \left\langle \frac{w_n - v_n}{\gamma_n} - \nabla f(w_n), x - v_n \right\rangle, \quad \text{for all } n \in \mathbb{N}.$$

Moreover,

$$f(x) - f(z_n) \ge \langle \nabla f(z_n), x - z_n \rangle,$$

$$f(x) - f(w_n) \ge \langle \nabla f(w_n), x - w_n \rangle,$$

$$f(z_n) - f(w_n) \ge \langle \nabla f(w_n), z_n - w_n \rangle, \quad \text{and}$$

$$f(w_n) - f(v_n) \ge \langle \nabla f(v_n), w_n - v_n \rangle, \quad \text{for all } n \in \mathbb{N}.$$

The above inequalities imply

$$\begin{aligned} &f(x) - f(z_n) + f(x) - f(w_n) + g(x) - g(w_n) + g(x) - g(v_n) \\ &\ge \frac{1}{\gamma_n}\langle z_n - w_n, x - w_n\rangle + \langle\nabla f(z_n), w_n - z_n\rangle + \frac{1}{\gamma_n}\langle w_n - v_n, x - v_n\rangle + \langle\nabla f(w_n), v_n - w_n\rangle \\ &= \frac{1}{\gamma_n}\langle z_n - w_n, x - w_n\rangle + \langle\nabla f(z_n) - \nabla f(w_n), w_n - z_n\rangle + \langle\nabla f(w_n), w_n - z_n\rangle \\ &\quad + \frac{1}{\gamma_n}\langle w_n - v_n, x - v_n\rangle + \langle\nabla f(w_n) - \nabla f(v_n), v_n - w_n\rangle + \langle\nabla f(v_n), v_n - w_n\rangle \\ &\ge \frac{1}{\gamma_n}\langle z_n - w_n, x - w_n\rangle + \frac{1}{\gamma_n}\langle w_n - v_n, x - v_n\rangle - \|\nabla f(w_n) - \nabla f(z_n)\|\|w_n - z_n\| \\ &\quad + \langle\nabla f(w_n), w_n - z_n\rangle - \|\nabla f(v_n) - \nabla f(w_n)\|\|v_n - w_n\| + \langle\nabla f(v_n), v_n - w_n\rangle \\ &\ge \frac{1}{\gamma_n}\langle z_n - w_n, x - w_n\rangle + \frac{1}{\gamma_n}\langle w_n - v_n, x - v_n\rangle \\ &\quad - (\|\nabla f(w_n) - \nabla f(z_n)\| + \|\nabla f(v_n) - \nabla f(w_n)\|)(\|w_n - z_n\| + \|v_n - w_n\|) \\ &\quad + \langle\nabla f(w_n), w_n - z_n\rangle + \langle\nabla f(v_n), v_n - w_n\rangle \\ &\ge \frac{1}{\gamma_n}\langle z_n - w_n, x - w_n\rangle + \frac{1}{\gamma_n}\langle w_n - v_n, x - v_n\rangle + \langle\nabla f(w_n), w_n - z_n\rangle + \langle\nabla f(v_n), v_n - w_n\rangle \\ &\quad - \frac{2\delta}{\gamma_n}(\|w_n - z_n\| + \|v_n - w_n\|)^2 \\ &\ge \frac{1}{\gamma_n}\langle z_n - w_n, x - w_n\rangle + \frac{1}{\gamma_n}\langle w_n - v_n, x - v_n\rangle + f(v_n) - f(z_n) - \frac{4\delta}{\gamma_n}(\|w_n - z_n\|^2 + \|v_n - w_n\|^2), \end{aligned}$$

for all *x* ∈ *dom*(*g*) and *n* ∈ N. Hence,

$$\begin{aligned} &\frac{1}{\gamma_n}\langle z_n - w_n, w_n - x\rangle + \frac{1}{\gamma_n}\langle w_n - v_n, v_n - x\rangle \\ &\ge (f+g)(w_n) + (f+g)(v_n) - 2(f+g)(x) - \frac{4\delta}{\gamma_n}\|w_n - z_n\|^2 - \frac{4\delta}{\gamma_n}\|v_n - w_n\|^2. \end{aligned}$$

Moreover, from Lemma 3, we have, for all *n* ∈ N,

$$\langle z_n - w_n, w_n - x\rangle = \frac{1}{2}(\|z_n - x\|^2 - \|z_n - w_n\|^2 - \|w_n - x\|^2), \quad \text{and}$$

$$\langle w_n - v_n, v_n - x\rangle = \frac{1}{2}(\|w_n - x\|^2 - \|w_n - v_n\|^2 - \|v_n - x\|^2).$$

As a result, we obtain

$$\begin{aligned} &\frac{1}{2\gamma_n}(\|z_n - x\|^2 - \|z_n - w_n\|^2 - \|w_n - v_n\|^2 - \|v_n - x\|^2) \\ &\ge (f+g)(w_n) + (f+g)(v_n) - 2(f+g)(x) - \frac{4\delta}{\gamma_n}\|w_n - z_n\|^2 - \frac{4\delta}{\gamma_n}\|v_n - w_n\|^2, \end{aligned}$$

for all *x* ∈ *dom*(*g*), and *n* ∈ N. Therefore,

$$\begin{aligned} \|z_n - x\|^2 - \|v_n - x\|^2 &\ge 2\gamma_n[(f+g)(w_n) + (f+g)(v_n) - 2(f+g)(x)] \\ &\quad + (1-8\delta)(\|w_n - z_n\|^2 + \|v_n - w_n\|^2), \end{aligned}$$

for all *x* ∈ *dom*(*g*) and *n* ∈ N, and hence, (*II*) is proven.

Next, we prove the weak convergence result of Algorithm 7.

**Theorem 9.** *Let* {*xn*} *be a sequence generated by Algorithm 7. Suppose that the following hold:*

*B1.* $\sum_{n=1}^{+\infty} \beta_n < +\infty$;

*B2. There exists* $\gamma > 0$ *such that* $\gamma_n \ge \gamma$, *for all* $n \in \mathbb{N}$.

*Then,* {*xn*} *converges weakly to some point in S*∗*.*

**Proof.** Let *x*<sup>∗</sup> ∈ *S*∗; obviously, *x*<sup>∗</sup> ∈ *dom*(*g*). The following are direct consequences of Lemma 8:

$$\begin{aligned} \|z_n - x^*\|^2 - \|w_n - x^*\|^2 &\ge 2\gamma_n[(f+g)(w_n) - (f+g)(x^*)] + (1-8\delta)\|w_n - z_n\|^2 \\ &\ge (1-8\delta)\|w_n - z_n\|^2, \end{aligned} \tag{7}$$

and

$$\begin{aligned} \|z_n - x^*\|^2 - \|v_n - x^*\|^2 &\ge 2\gamma_n[(f+g)(w_n) + (f+g)(v_n) - 2(f+g)(x^*)] \\ &\quad + (1-8\delta)(\|w_n - z_n\|^2 + \|v_n - w_n\|^2) \\ &\ge (1-8\delta)(\|w_n - z_n\|^2 + \|v_n - w_n\|^2), \end{aligned} \tag{8}$$

where $v_n = \operatorname{prox}_{\gamma_n g}(w_n - \gamma_n \nabla f(w_n))$. Then, we have

$$\begin{aligned} \|x_{n+1} - x^*\| &\le (1-\alpha_n)\|w_n - x^*\| + \alpha_n\|v_n - x^*\| \\ &\le (1-\alpha_n)\|w_n - x^*\| + \alpha_n\|z_n - x^*\| \\ &\le \|z_n - x^*\|. \end{aligned} \tag{9}$$

Next, we show that $\lim_{n\to+\infty}\|x_n - x^*\|$ exists. Since $P_{dom(g)}$ is nonexpansive, we have

$$\begin{aligned} \|x_{n+1} - x^*\| &\le \|z_n - x^*\| \\ &= \|P_{dom(g)}y_n - P_{dom(g)}x^*\| \\ &\le \|y_n - x^*\| \\ &\le \|x_n - x^*\| + \beta_n\|x_n - x_{n-1}\| \\ &\le (1+\beta_n)\|x_n - x^*\| + \beta_n\|x_{n-1} - x^*\|, \quad \text{for all } n \in \mathbb{N}. \end{aligned} \tag{10}$$

By using Lemma 4, we have that {*xn*} is bounded. Consequently, $\sum_{n=1}^{+\infty}\beta_n\|x_n - x_{n-1}\| < +\infty$, and

$$\|y_n - x_n\| = \beta_n\|x_n - x_{n-1}\| \to 0, \text{ as } n \to +\infty.$$

By (10) together with Lemma 5, we conclude that $\lim_{n\to+\infty}\|x_n - x^*\|$ exists. Since $x_n \in dom(g)$, for all *n* ∈ N, we obtain

$$\|y_n - z_n\| \le \|y_n - x_n\|, \quad \text{for all } n \in \mathbb{N},$$

which implies that $\lim_{n\to+\infty}\|y_n - z_n\| = 0$. Consequently, $\lim_{n\to+\infty}\|x_n - z_n\| = 0$, and hence, $\lim_{n\to+\infty}\|x_n - x^*\| = \lim_{n\to+\infty}\|z_n - x^*\|$. Now, we will show that $\lim_{n\to+\infty}\|x_n - w_n\| = 0$. To do this, we consider the following two cases.

Case 1. If $\limsup_{n\to+\infty}\alpha_n = c < 1$, then from (9), we obtain

$$\limsup_{n\to+\infty}\|w_n - x^*\| = \limsup_{n\to+\infty}\|x_n - x^*\| = \limsup_{n\to+\infty}\|z_n - x^*\|.$$

Therefore, we obtain from (7) that $\lim_{n\to+\infty}\|w_n - z_n\| = 0$. As a result, we have $\lim_{n\to+\infty}\|x_n - w_n\| = 0$.

Case 2. If $\limsup_{n\to+\infty}\alpha_n = 1$, then it follows from (9) that

$$\limsup_{n\to+\infty}\|v_n - x^*\| = \limsup_{n\to+\infty}\|x_n - x^*\| = \limsup_{n\to+\infty}\|z_n - x^*\|.$$

It follows from (8) that $\lim_{n\to+\infty}\|w_n - z_n\| = 0$, and hence, $\lim_{n\to+\infty}\|x_n - w_n\| = 0$.

We claim that every weak-cluster point of {*xn*} belongs to *S*∗. To prove this claim, let *w* be a weak-cluster point of {*xn*}. Then, there exists a subsequence $\{x_{n_k}\}$ of {*xn*} such that $x_{n_k} \rightharpoonup w$, and hence, $w_{n_k} \rightharpoonup w$. Next, we show that *w* ∈ *S*∗. From A2, we know that ∇*f* is uniformly continuous, so $\lim_{k\to+\infty}\|\nabla f(w_{n_k}) - \nabla f(z_{n_k})\| = 0$. From (4), we also have

$$\frac{z_{n_k} - \gamma_{n_k}\nabla f(z_{n_k}) - w_{n_k}}{\gamma_{n_k}} \in \partial g(w_{n_k}), \text{ for all } k \in \mathbb{N}.$$

Hence,

$$\frac{z_{n_k} - w_{n_k}}{\gamma_{n_k}} - \nabla f(z_{n_k}) + \nabla f(w_{n_k}) \in \partial g(w_{n_k}) + \nabla f(w_{n_k}) = \partial(f+g)(w_{n_k}), \text{ for all } k \in \mathbb{N}.$$

By letting *k* → +∞ in the above inclusion, we conclude from Lemma 1 that 0 ∈ *∂*(*f* + *g*)(*w*), and hence, *w* ∈ *S*∗. It follows directly from Lemma 6 that {*xn*} converges weakly to a point in *S*∗, and the proof is now complete.

If we set *β<sup>n</sup>* = 0, for all *n* ∈ N, in Algorithm 7, we obtain the following Algorithm 8.


The diagram of Algorithm 8 can be seen in Figure 2.

**Figure 2.** Diagram of Algorithm 8.

We next prove the complexity of Algorithm 8.

**Theorem 10.** *Let* {*xn*} *be a sequence generated by Algorithm 8. Suppose that there exists* $\gamma > 0$ *such that* $\gamma_n \ge \gamma$, *for all* $n \in \mathbb{N}$; *then* {*xn*} *converges weakly to a point in* *S*∗. *In addition, if* $\delta \in (0, \frac{1}{16})$, *then the following also holds:*

$$(f+g)(x_n) - \min_{x\in H}(f+g)(x) \le \frac{1}{2\gamma}\frac{[d(x_0, S^*)]^2}{n}, \tag{11}$$

*for all n* ∈ N.

**Proof.** The weak convergence of {*xn*} is guaranteed by Theorem 9. It remains to show that (11) is true. Let $v_n = \operatorname{prox}_{\gamma_n g}(w_n - \gamma_n \nabla f(w_n))$ and *x*<sup>∗</sup> ∈ *S*∗.

We first show that $(f+g)(x_{k+1}) \le (f+g)(x_k)$, for all *k* ∈ N. We know that $x_k = z_k$ in Lemma 8, so for any *x* ∈ *dom*(*g*) and *k* ∈ N, we have:

$$\|x_k - x\|^2 - \|w_k - x\|^2 \ge 2\gamma_k[(f+g)(w_k) - (f+g)(x)] + (1-8\delta)\|w_k - x_k\|^2, \tag{12}$$

and

$$\begin{aligned} \|x_k - x\|^2 - \|v_k - x\|^2 &\ge 2\gamma_k[(f+g)(w_k) + (f+g)(v_k) - 2(f+g)(x)] \\ &\quad + (1-8\delta)(\|w_k - x_k\|^2 + \|v_k - w_k\|^2). \end{aligned} \tag{13}$$

Putting *x* = *xk* in (12) and (13), we obtain

$$-\|w_k - x_k\|^2 \ge 2\gamma_k[(f+g)(w_k) - (f+g)(x_k)] + (1-8\delta)\|w_k - x_k\|^2, \tag{14}$$

and

$$\begin{aligned} -\|v_k - x_k\|^2 &\ge 2\gamma_k[(f+g)(w_k) + (f+g)(v_k) - 2(f+g)(x_k)] \\ &\quad + (1-8\delta)(\|w_k - x_k\|^2 + \|v_k - w_k\|^2), \end{aligned} \tag{15}$$

respectively. Substituting *x* with *wk* in (13), we obtain

$$\begin{aligned} \|x_k - w_k\|^2 - \|v_k - w_k\|^2 &\ge 2\gamma_k[(f+g)(v_k) - (f+g)(w_k)] \\ &\quad + (1-8\delta)(\|w_k - x_k\|^2 + \|v_k - w_k\|^2). \end{aligned} \tag{16}$$

By summing (15) and (16), we obtain

$$(16\delta - 1)\|x_k - w_k\|^2 + (16\delta - 4)\|v_k - w_k\|^2 \ge 4\gamma_k[(f+g)(v_k) - (f+g)(x_k)]. \tag{17}$$

It follows from (14) and (17) that

$$(f+g)(w_k) \le (f+g)(x_k) \quad \text{and} \quad (f+g)(v_k) \le (f+g)(x_k),$$

respectively, for all *k* ∈ N. Hence,

$$(f+g)(x_{k+1}) - (f+g)(x_k) \le (1-\alpha_k)(f+g)(w_k) + \alpha_k(f+g)(v_k) - (f+g)(x_k) \le 0, \tag{18}$$

for all *k* ∈ N. Hence, {(*f* + *g*)(*xk*)} is a non-increasing sequence. Now, put *x* = *x*<sup>∗</sup> in (12) and (13), then we obtain

$$\|w_k - x^*\|^2 - \|x_k - x^*\|^2 \le 2\gamma_k[(f+g)(x^*) - (f+g)(w_k)], \tag{19}$$

and

$$\begin{aligned} \|v_k - x^*\|^2 - \|x_k - x^*\|^2 &\le 2\gamma_k[2(f+g)(x^*) - (f+g)(w_k) - (f+g)(v_k)] \\ &\le 2\gamma_k[(f+g)(x^*) - (f+g)(v_k)]. \end{aligned} \tag{20}$$

Inequalities (19) and (20) imply that

$$\begin{aligned} \|x_{k+1} - x^*\|^2 - \|x_k - x^*\|^2 &\le (1-\alpha_k)\|w_k - x^*\|^2 + \alpha_k\|v_k - x^*\|^2 - \|x_k - x^*\|^2 \\ &\le 2\gamma_k(1-\alpha_k)[(f+g)(x^*) - (f+g)(w_k)] \\ &\quad + 2\gamma_k\alpha_k[(f+g)(x^*) - (f+g)(v_k)] \\ &= 2\gamma_k(f+g)(x^*) - 2\gamma_k[(1-\alpha_k)(f+g)(w_k) + \alpha_k(f+g)(v_k)] \\ &\le 2\gamma_k[(f+g)(x^*) - (f+g)(x_{k+1})], \end{aligned}$$

for all *k* ∈ N. Since *γ<sup>k</sup>* ≥ *γ*, we obtain

$$\begin{aligned} 0 \ge (f+g)(x^*) - (f+g)(x_{k+1}) &\ge \frac{1}{2\gamma_k}(\|x_{k+1} - x^*\|^2 - \|x_k - x^*\|^2) \\ &\ge \frac{1}{2\gamma}(\|x_{k+1} - x^*\|^2 - \|x_k - x^*\|^2), \end{aligned} \tag{21}$$

for all *k* ∈ N. Summing the above inequality over *k* = 0, 1, ..., *n* − 1 and telescoping, we obtain

$$n(f+g)(x^*) - \sum_{k=1}^{n}(f+g)(x_k) \ge \frac{1}{2\gamma}(\|x_n - x^*\|^2 - \|x_0 - x^*\|^2),$$

for all *n* ∈ N. Since $\{(f+g)(x_k)\}$ is non-increasing, we have $\sum_{k=1}^{n}(f+g)(x_k) \ge n(f+g)(x_n)$, and hence,

$$n(f+g)(x^*) - n(f+g)(x_n) \ge \frac{1}{2\gamma}(\|x_n - x^*\|^2 - \|x_0 - x^*\|^2),$$

for all *n* ∈ N. Hence,

$$(f+g)(x_n) - (f+g)(x^*) \le \frac{1}{2\gamma}\frac{\|x_0 - x^*\|^2}{n}. \tag{22}$$

Since *x*<sup>∗</sup> is arbitrarily chosen from *S*∗, we obtain

$$(f+g)(x_n) - \min_{x\in H}(f+g)(x) \le \frac{1}{2\gamma}\frac{[d(x_0, S^*)]^2}{n},$$

for all *n* ∈ N, and the proof is now complete.

#### **4. Some Applications on Data Classification**

In this section, we apply Algorithms 3, 5, 7, and 8 to solve some classification problems based on a learning technique called *extreme learning machine (ELM)* introduced by Huang et al. [28]. It is formulated as follows:

Let $\{(x_k, t_k) : x_k \in \mathbb{R}^n, t_k \in \mathbb{R}^m, k = 1, 2, \ldots, N\}$ be a set of *N* samples, where *xk* is an *input* and *tk* is a *target*. A simple mathematical model of the ELM output for single-hidden-layer feedforward networks (SLFNs) with *M* hidden nodes and activation function *G* is defined by

$$\rho_j = \sum_{i=1}^{M} \eta_i G(\langle w_i, x_j\rangle + b_i),$$

where *wi* is the weight that connects the *i*-th hidden node and the input node, *η<sup>i</sup>* is the weight connecting the *i*-th hidden node and the output node, and *bi* is the bias. The hidden layer output matrix **H** is defined by

$$\mathbf{H} = \begin{bmatrix} G(\langle w_1, x_1\rangle + b_1) & \cdots & G(\langle w_M, x_1\rangle + b_M) \\ \vdots & \ddots & \vdots \\ G(\langle w_1, x_N\rangle + b_1) & \cdots & G(\langle w_M, x_N\rangle + b_M) \end{bmatrix}.$$

The main objective of ELM is to calculate an optimal weight $\eta = [\eta_1^T, \ldots, \eta_M^T]^T$ such that $\mathbf{H}\eta = \mathbf{T}$, where $\mathbf{T} = [t_1^T, \ldots, t_N^T]^T$ is the training target. If the *Moore–Penrose generalized inverse* $\mathbf{H}^\dagger$ of $\mathbf{H}$ exists, then $\eta = \mathbf{H}^\dagger\mathbf{T}$ is the solution. However, in general, $\mathbf{H}^\dagger$ may not exist or may be difficult to compute. To avoid these difficulties, we transform the problem into a convex minimization problem and use our proposed algorithm to find the solution $\eta$ without $\mathbf{H}^\dagger$.
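The construction of the hidden-layer output matrix **H** can be sketched as follows. This is an illustration only: the random weight initialization and the dimensions other than *M* = 30 (which matches the experiments) are assumptions.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def hidden_matrix(X, W, b, G=sigmoid):
    # H[j, i] = G(<w_i, x_j> + b_i): N samples (rows of X), M hidden nodes
    return G(X @ W + b)

rng = np.random.default_rng(42)
N, n, M = 8, 4, 30                  # M = 30 hidden nodes, as in the experiments
X = rng.normal(size=(N, n))         # inputs x_1, ..., x_N as rows
W = rng.normal(size=(n, M))         # column i holds the weight w_i
b = rng.normal(size=M)              # biases b_1, ..., b_M
H = hidden_matrix(X, W, b)
assert H.shape == (N, M)
```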

In machine learning, a model can overfit in the sense that it is very accurate on a training set but inaccurate on a testing set; in other words, it cannot be used to predict unknown data. In order to prevent overfitting, the *least absolute shrinkage and selection operator (LASSO)* [29] is used. It can be formulated as follows:

$$\text{Minimize: } \|\mathbf{H}\eta - \mathbf{T}\|_2^2 + \lambda\|\eta\|_1, \tag{23}$$

where *λ* is a regularization parameter. If we set $f(x) := \|\mathbf{H}x - \mathbf{T}\|_2^2$ and $g(x) := \lambda\|x\|_1$, then problem (23) reduces to problem (1). Hence, we can use our algorithm as a learning method to find the optimal weight *η* and solve classification problems.
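This splitting gives a closed-form gradient for the smooth part and a closed-form prox (soft thresholding) for the nonsmooth part. The fragment below illustrates the reduction on a tiny synthetic **H** and **T** (both assumptions); plain forward-backward steps with a constant step size are used here only to keep the illustration short.

```python
import numpy as np

# the splitting that casts (23) into the form of (1):
#   f(eta) = ||H @ eta - T||_2^2 (smooth), g(eta) = lam * ||eta||_1 (prox-friendly)
H = np.array([[0.2, 0.9], [0.7, 0.3], [0.5, 0.5]])   # tiny synthetic H (assumption)
T = np.array([1.0, 0.0, 0.5])
lam = 0.1

grad_f = lambda e: 2.0 * H.T @ (H @ e - T)           # gradient of the smooth part
prox_g = lambda e, g: np.sign(e) * np.maximum(np.abs(e) - lam * g, 0.0)
obj = lambda e: np.linalg.norm(H @ e - T) ** 2 + lam * np.linalg.norm(e, 1)

# a few plain forward-backward steps with a constant step (kept simple on purpose)
eta = np.zeros(2)
gamma = 0.9 / (2.0 * np.linalg.norm(H.T @ H, 2))
for _ in range(500):
    eta = prox_g(eta - gamma * grad_f(eta), gamma)
assert obj(eta) < obj(np.zeros(2))   # the learned weight improves the LASSO objective
```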

In the experiments, we aim to classify three data sets from https://archive.ics.uci.edu (accessed on 15 November 2021):

**Iris data set** [30]. Each sample in this data set has four attributes, and the set contains three classes with 50 samples for each type.

**Heart disease data set** [31]. This data set contains 303 samples each of which has 13 attributes. In this data set, we classified two classes of data.

**Wine data set** [32]. In this data set, we classified three classes of 178 samples. Each sample contains 13 attributes.

In all experiments, we used the sigmoid activation function and *M* = 30 hidden nodes. We calculate the accuracy of the output data by:

$$\text{accuracy} = \frac{\text{correctly predicted data}}{\text{all data}} \times 100\%.$$

We chose control parameters for each algorithm as seen in Table 1.


**Table 1.** Chosen parameters of each algorithm.

In our experiments, the inertial parameters *β<sup>n</sup>* for Algorithm 7 were chosen as follows:

$$\beta_n = \begin{cases} 0.95, & \text{if } n \le 1000, \\ \frac{1}{n^2}, & \text{if } n \ge 1001. \end{cases}$$

In the first experiment, we chose the regularization parameter *λ* = 0.1 for all algorithms and data sets. Then, we used 10-fold cross-validation and utilized *Average ACC* and *ERR*% for evaluating the performance of each algorithm.

$$\text{Average ACC} = \frac{1}{N} \sum\_{i=1}^{N} \frac{x\_i}{y\_i} \times 100\%,$$

where $N$ is the number of folds ($N = 10$), $x_i$ is the number of data correctly predicted at fold $i$, and $y_i$ is the number of all data at fold $i$.

Let $\text{err}_{Lsum}$ be the sum of errors over all 10 training sets, $\text{err}_{Tsum}$ the sum of errors over all 10 testing sets, $L_{sum}$ the total number of data in all 10 training sets, and $T_{sum}$ the total number of data in all 10 testing sets. Then,

$$\text{ERR}\% = (\text{err}\_{L\%} + \text{err}\_{T\%}) / 2,$$

where $\text{err}_{L\%} = \frac{\text{err}_{Lsum}}{L_{sum}} \times 100\%$ and $\text{err}_{T\%} = \frac{\text{err}_{Tsum}}{T_{sum}} \times 100\%$.
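The two evaluation metrics can be written directly from their definitions; this is a small sketch with our own function names, taking per-fold counts as plain lists:

```python
def average_acc(correct, totals):
    """Average ACC over N folds: mean of the per-fold accuracy percentages x_i / y_i * 100%."""
    return sum(c / t * 100.0 for c, t in zip(correct, totals)) / len(correct)

def err_percent(err_train, n_train, err_test, n_test):
    """ERR% = (err_L% + err_T%) / 2, with errors and counts pooled over all folds."""
    err_L = sum(err_train) / sum(n_train) * 100.0
    err_T = sum(err_test) / sum(n_test) * 100.0
    return (err_L + err_T) / 2.0
```

Note the difference: Average ACC averages per-fold accuracies, while ERR% pools errors across all folds before taking percentages.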

With these evaluation tools, we obtained the results for each data set as seen in Tables 2–4.

**Table 2.** The performance of each algorithm in the first experiment at the 200th iteration with 10-fold cv. on the Iris data set.



**Table 3.** The performance of each algorithm in the first experiment at the 200th iteration with 10-fold cv. on the Heart disease data set.

**Table 4.** The performance of each algorithm in the first experiment at the 200th iteration with 10-fold cv. on the Wine data set.


As seen in Tables 2–4, with the same regularization *λ* = 0.1, Algorithms 7 and 8 perform better than Algorithms 3 and 5 in terms of accuracy, while the computation times are relatively close among the four algorithms.

In the second experiment, the regularization parameters *λ* for each algorithm and data set were chosen using 10-fold cv. We compared the error of each model and data set with various *λ*, then chose the *λ* that gives the lowest error (*ERR*%) for the particular model and data set. Hence, the parameter *λ* varies depending on the algorithm and data set. The choice of parameters *λ* can be seen in Table 5.

**Table 5.** Chosen *λ* of each algorithm.
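The λ-selection procedure described above is an ordinary cross-validated grid search. The sketch below assumes a caller-supplied `train_and_err` callback standing in for training one of the paper's algorithms on a fold and returning its error rate; the fold-splitting details are our own assumptions:

```python
import numpy as np

def select_lambda(X, T, lambdas, train_and_err, n_folds=10, seed=0):
    """Return the lambda with the lowest mean error over n_folds cross-validation folds.

    train_and_err(X_tr, T_tr, X_te, T_te, lam) is assumed to train a model with
    regularization lam and return the fold's error rate (placeholder for the
    paper's Algorithms 3, 5, 7, and 8).
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    folds = np.array_split(idx, n_folds)
    best_lam, best_err = None, float("inf")
    for lam in lambdas:
        fold_errs = []
        for k in range(n_folds):
            test_idx = folds[k]
            train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != k])
            fold_errs.append(train_and_err(X[train_idx], T[train_idx],
                                           X[test_idx], T[test_idx], lam))
        mean_err = float(np.mean(fold_errs))
        if mean_err < best_err:
            best_lam, best_err = lam, mean_err
    return best_lam, best_err

# Toy usage: a dummy error function whose minimum sits at lam = 0.3.
X_demo = np.arange(20, dtype=float).reshape(10, 2)
T_demo = np.zeros((10, 1))
dummy = lambda Xtr, Ttr, Xte, Tte, lam: abs(lam - 0.3)
lam_star, err_star = select_lambda(X_demo, T_demo, [0.1, 0.3, 1.0], dummy)
```

Because λ is tuned per algorithm and per data set, each cell of Table 5 comes from one such search.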


With the chosen *λ*, we also evaluated the performance of each algorithm using 10-fold cross-validation and similar evaluation tools as in the first experiment. The results can be seen in the following Tables 6–8.


**Table 6.** The performance of each algorithm in the second experiment at the 200th iteration with 10-fold cv. on the Iris data set.

With the chosen regularization parameters *λ* as in Table 5, we see that the *ERR*% of each algorithm in Tables 6–8 is lower than that of Tables 2–4. We can also see that Algorithms 7 and 8 perform better than Algorithms 3 and 5 in terms of accuracy in all experiments conducted.

In Figure 3, we show the *ERR*% of each algorithm in the second experiment. As the figure shows, Algorithms 7 and 8 have lower *ERR*%, which means they perform better than Algorithms 3 and 5.

**Table 7.** The performance of each algorithm in the second experiment at the 200th iteration with 10-fold cv. on the Heart disease data set.


From Tables 6–8, we notice that the computational time of Algorithms 7 and 8 is about 30% longer than that of Algorithm 3 for the same number of iterations. However, Figure 3 shows that at the 120th iteration, both Algorithms 7 and 8 already achieve a lower *ERR*% than Algorithm 3 does at the 200th iteration. Therefore, the time Algorithms 7 and 8 need to match or exceed the accuracy of Algorithm 3 is actually lower, since 120 iterations can be computed much faster than 200.


**Table 8.** The performance of each algorithm in the second experiment at the 200th iteration with 10-fold cv. on the Wine data set.

**Figure 3.** ERR% of each algorithm and data set of the second experiment.

#### **5. Conclusions**

We introduced a new line search technique and employed it to construct new algorithms, namely Algorithms 7 and 8. Algorithm 7 additionally utilizes an inertial step to accelerate its convergence behavior. Both algorithms converge weakly to a solution of (1) without the Lipschitz assumption on *f*. The complexity of Algorithm 8 was also analyzed. We then applied the proposed algorithms to the classification of the Iris, Heart disease, and Wine data sets and evaluated their performance against other line search algorithms, namely Algorithms 3 and 5. We observed from our experiments that Algorithm 7 achieved the highest accuracy on all data sets under the same number of iterations. Moreover, Algorithm 8, which is not an inertial algorithm, also performed better than Algorithms 3 and 5. Furthermore, Figure 3 shows that at a lower number of iterations, the proposed algorithms were already more accurate than the other algorithms at a higher iteration number.

Based on the experiments on various data sets, we conclude that the proposed algorithms perform better than the previously established ones. Therefore, in future work, we plan to apply the proposed algorithms to predict and classify data of patients with non-communicable diseases (NCDs) collected from Sriphat Medical Center, Faculty of Medicine, Chiang Mai University, Thailand. We aim to develop a tool for screening and preventing non-communicable diseases, to be used in hospitals in Chiang Mai, Thailand.

**Author Contributions:** Writing—original draft preparation, P.S.; software and editing, D.C.; supervision, review and funding acquisition, S.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** The NSRF via the Program Management Unit for Human Resources & Institutional Development, Research and Innovation (Grant Number B05F640183).

**Data Availability Statement:** All data can be obtained from https://archive.ics.uci.edu (accessed on 15 November 2021).

**Acknowledgments:** This research has received funding support from the NSRF via the Program Management Unit for Human Resources & Institutional Development, Research and Innovation (Grant Number B05F640183). This research was also supported by Chiang Mai University.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**

