Article

An Artificial-Immune-System-Based Algorithm Enhanced with Deep Reinforcement Learning for Solving Returnable Transport Item Problems

1 Laboratoire Genie Industriel, CentraleSupelec, Paris Saclay University, 3 Rue Joliot-Curie, 91192 Gif-sur-Yvette, France
2 Complex Systems and Interactions, Ecole Centrale of Casablanca, Ville Verte, Bouskoura 27182, Morocco
3 Laboratoire Ingénierie, Management Industriel et Innovation (LIMII), Hassan First University, 577 Route de Casablanca, Settat 26000, Morocco
4 HEC Management School, University of Liege, 14 Rue Louvrex, 4000 Liege, Belgium
* Author to whom correspondence should be addressed.
Sustainability 2022, 14(10), 5805; https://doi.org/10.3390/su14105805
Submission received: 24 March 2022 / Revised: 1 May 2022 / Accepted: 2 May 2022 / Published: 11 May 2022
(This article belongs to the Special Issue New Trends in Sustainable Supply Chain and Logistics Management)

Abstract
This paper proposes a new approach, i.e., virtual pooling, for optimising returnable transport item (RTI) flows in a two-level closed-loop supply chain. The supply chain comprises a set of suppliers delivering their products loaded on RTIs to a set of customers. RTIs are of various types. The objective is to model a deterministic, multi-supplier, multi-customer inventory routing problem with pickup and delivery of multi-RTI. The model includes inventory-level constraints, the availability of empty RTIs to suppliers, and the minimisation of the total cost, including inventory holding, screening, maintenance, transportation, sharing, and purchasing costs for new RTIs. Furthermore, suppliers with common customers coordinate to virtually pool their inventory of empty RTIs held by customers so that, when loaded RTIs are delivered to customers, each may benefit from this visit to pick up the empty RTIs, regardless of ownership. To handle the combinatorial complexity of the model, a new artificial-immune-system-based algorithm coupled with deep reinforcement learning is proposed. The algorithm combines the strong global search ability of artificial immune systems with the strong self-adaptability of deep reinforcement learning into a goal-driven search, all tailored to the suggested mathematical model. Computational experiments on randomly generated instances highlight the performance of the proposed approach. From a managerial point of view, the results stress that this new approach allows for economies of scale and a cost reduction of about 40% at the level of all involved parties. In addition, a sensitivity analysis on the unit cost of transportation and the procurement of new RTIs is conducted, highlighting the benefits and limits of the proposed model compared to dedicated and physical pooling modes.

1. Introduction

Returnable transport items (RTIs) are all reusable assets used to facilitate product shipping, storing, handling, and protection in the supply chain [1]. RTIs cover reusable drums, pallets, crates, rolls, boxes, and barrels [2,3,4]. Along with globalised supply chains, the use of RTIs has become more popular in recent decades as they eliminate the waste that one-way secondary packaging may generate [5]. The use of RTIs has been shown to be an enabler for better ergonomics and productivity while facilitating automation, better inventory control, and improved quality [3,4,6,7]. Furthermore, their operational benefits help reduce the disposal costs of packaging material and improve productivity [8]. These assets usually flow in a closed-loop supply chain between players [5,9]. Loaded RTIs are received and unloaded at a given level of the supply chain. Either the empty RTIs can be collected and returned to the sender, or the receiver can reuse them to ship his own products, in which case they continue to flow downstream in the supply chain. Therefore, there exist two flows of RTIs that must be managed [10]: forward flows, which correspond to the forward distribution of goods loaded on RTIs, and reverse flows, which correspond to the collection and return of empty RTIs to their owners. This paper aims to optimise both forward and reverse flows of RTIs in a two-level closed-loop supply chain.
Managing such assets has become a primary concern of supply chain managers, along with managing warehouses, machines, and vehicles [7,9]. Indeed, it has become very pressing for companies to effectively package products and guarantee to have them in the proper quantity, at the right place, and at the right time. To avoid shortages, many companies frequently tend to invest in more RTIs, resulting in higher holding and purchasing costs [3,11]. Moreover, supply chain players experience RTI losses with rates varying from 3 to 20% [12]. This mismanagement lengthens turnaround times and pushes players to overinvest in these assets, leading to inefficient budgetary practices: companies buy new RTIs to replace the lost ones and recruit additional staff to handle them [2,9,12].
According to [7,13,14], RTI management can be divided into two modes depending on the ownership of empty RTIs: a dedicated mode (private RTIs) and a shared mode (public RTIs). The dedicated mode (DM) refers to the case where RTIs are owned by players (suppliers, for example) who use them exclusively to deliver their products without considering sharing them with others. They are responsible, in general, for collecting, refurbishing, and managing the inventory of their specific assets. In this system, RTIs received by a partner are shipped back to their specific owner. In the shared mode (SM), players agree to share their RTIs within a “pooled” system. A service provider company manages this shared system, and running such a pool is its core business [15]. In this pool, empty RTIs are physically stored and can be used by all players without any obligation for these assets to return to their starting point at their next movement [15]. RTI pools can be categorised into two types: “rented” and “open” pools [15]. The “rented” pool is based on a one-owner pool model: RTIs are owned by one company that rents and provides the supply chain players with the empty RTIs they need. In this case, the company manages and oversees its RTI pool’s day-to-day operations and services. The “open” pool is based on a changing-owner pool model: all partners store their RTIs in a pool, and when an RTI is used, its ownership is transferred to the receiving partner, who must return similar RTIs of comparable quality (1:1 exchange concept). In both cases, a pooling system involves a pooler responsible for supplying ready-to-use RTIs to all partners, collecting them from downstream levels, refurbishing damaged ones, and holding inventory within its facilities until new RTI orders are placed [16].
The literature review (see Section 2) shows that most papers exclusively address DM and SM and highlight each mode's benefits on the overall supply chain performance. However, both modes may not always be profitable and practicable. Compared to the SM, the DM may be easier to implement, and it does not create resource dependency, as each player is always free to manage and use his inventory of empty RTIs [7]. On the other hand, the SM is typically less expensive, as it may offer cost benefits through the shared use of RTIs among tier suppliers [13,14]. However, the prerequisites of commonly serviceable RTIs for various materials from several suppliers are hard to meet [13]. Moreover, the SM requires advanced decision-making on where to locate pooler facilities, how to set facilities' capacities, and how to distribute transportation flows (i.e., delivery, pickup, inventory balancing, and supply) across the network, which may imply additional managerial costs (i.e., transportation, inventory holding) and a need for solid information system support [17]. The SM may also establish a resource dependency, as each player is not always free to pick up the empty RTIs needed to deliver his products. This is particularly true for complex supply chains, which include multiple origins and destinations and multiple RTIs that flow within, and in which constraints such as variable demands, vehicle capacity, and shortage are to be considered. This paper proposes a new approach to overcome the above shortcomings of both modes in a closed-loop supply chain. Specifically, we consider the case of a two-level closed-loop supply chain comprising a set of suppliers delivering products to joint customers. We assume that each supplier owns RTIs that can be held in either his or the customer's inventory. In addition, each supplier is responsible, as in DM, for collecting, refurbishing, and managing his inventory.
We also assume that the suppliers coordinate their logistics operations so that, while delivering loaded RTIs to customers, each supplier may benefit from this visit to pick up empty RTIs regardless of ownership. This resembles classic lateral transshipment, which relies on authorising the virtual pooling of finished-product inventory between members belonging to the same echelon of a supply chain [18]. This practice usually takes place to re-balance the entire system's stock levels in scenarios where one location faces a shortage while others have residual stock in hand. Accordingly, instead of calling upon a pooler or a leasing company to acquire the needed quantities of RTIs, this paper suggests that suppliers arrange to "virtually" pool/share their stock of identically substitutable RTIs: there is no need for a real, physical pool to store RTIs as in SM. As such, we conserve the ownership of RTIs as in the DM and allow the shared use of RTIs as in SM (Table 1). Moreover, each supplier buys, when needed, and adds new RTIs to the whole system. Therefore, the order may be filled, the customers receive what they want, and the partners free up space in their inventory and reduce idle stock. It is mutually beneficial for all parties. Consequently, suppliers can sidestep the shortage of empty RTIs at their levels and reduce the costs of transportation, inventory holding, and the procurement of new RTIs. Such a strategy creates a valuable partnership but implies additional logistics operations that must be optimised.
Our paper has three main contributions. First, we develop a new mathematical formulation of the RTI pickup and delivery problem in a closed-loop supply chain consisting of a set of suppliers shipping their products to a set of common customers (e.g., plants, retailers) and using a set of RTIs, i.e., a multi-supplier multi-customer inventory routing problem with the pickup and delivery of multi-type shared RTIs (IRPPDS). We assume that supply chain partners adopt a vendor-managed inventory (VMI) policy: their operations are coordinated to organise deliveries and pickups to fulfil customers' demands. Thus, we address a multi-supplier, multi-customer, multi-RTI inventory routing problem that is hard to solve due to its inherent combinatorial complexity. Suggesting an efficient way to cope with this complexity by developing a new solving approach is the second contribution of this paper. Indeed, we use a matheuristic that hybridises an artificial-immune-system-based metaheuristic and a mathematical programming algorithm. Furthermore, thanks to its generality and flexibility, this matheuristic uses deep reinforcement learning techniques that were initially proposed by [19] for successfully solving dynamic and stochastic inventory routing problems. The performance of the approach is compared to the one developed in [19] and to two pure metaheuristics. Finally, broad experimental campaigns are conducted on large-sized instances. These experiments stress that the resolution approach is very competitive compared to other existing metaheuristics: it leads to better quality solutions and reduces computational time. Furthermore, we evaluate the cost reduction enabled by the virtual pooling of RTIs compared to DM and SM.
The remainder of the paper is organised as follows. Section 2 presents an overview of related works. After a detailed definition of the problem in Section 3, the mathematical formulation is provided in Section 4. Section 5 describes the proposed resolution approach and explains the hybridisation scheme used to integrate the mathematical model, the artificial-immune-system-based algorithm and the deep reinforcement learning technique. Section 6 provides the computational results and presents the matheuristic performance analysis compared to three resolution approaches. Finally, Section 7 summarises the main findings and provides perspectives for further research.

2. Related Work

This section reviews the research streams most related to our work. The objective is to position our contributions with respect to papers on the inventory routing problem (IRP) with pickup and delivery and on RTI management modes, and to highlight our contribution to the resolution approaches applied to solve similar problems.
The vehicle routing problem (VRP) calls for determining the optimal set of routes to be performed by a fleet of vehicles to serve a given set of customers [20]. In the literature, three different variants related to the structure of pickup and delivery and the number of origins and destinations are to be distinguished [21]: one-to-one (1-1), in which a request is originated at one location and destined for another location; one-to-many-to-one (1-M-1), in which each customer receives a delivery originating from a common depot and sends a pickup quantity to the depot; and finally, many-to-many (M-M), in which a commodity may be picked up at one of many locations and also delivered to one of many locations [22,23,24,25]. The IRP calls for inventory management, vehicle routing, and delivery scheduling decision-making problems [26]. Our paper’s most relevant research stream addresses IRP with pickup and delivery (IRPPD). According to [27], this problem has three variations regarding vehicle routing: (1) VRP with simultaneous pickup and delivery (SPD), in which products are delivered whilst others are simultaneously sent back to the origin; (2) VRP with backhauls, where all deliveries must be undertaken before any pickup on each route; (3) VRP with mixed pickup and delivery, which can be characterised as a particular case of the VRP with SPD in which customers may have pickup or delivery demands. Some recent applications of the VRP/IRP with pickups and deliveries can be found in [4,7,28,29,30,31]. IRP problems have been intensively studied in the literature, and the reader is referred to [26] for a thorough overview of more related papers. Furthermore, for the more recent papers on decision support models for RTIs, the reader is referred to the review by [5,32], which provides a systematic literature review of decision models in managing closed-loop supply chains, including RTIs. 
Along with developing decision support models, significant research efforts have also been devoted to investigating RTI management strategies in both the dedicated and the shared modes [14]. Most related works address the management of RTIs as part of a VMI policy and develop decision support models for cost reduction under stochastic or deterministic environments for the dedicated mode. Applications can be found in [1,2,14]. In [4,33,34], the authors propose models for inventory routing problems with simultaneous pickups and deliveries for a single-supplier, single-RTI, multi-customer (1-M-1) closed-loop supply chain. The models consider the maintenance costs of the reused RTIs returned from customers and the cost of buying new ones. In [33], scheduled pickups and the supply of new RTIs are integrated as alternatives to sidestep the shortage of empty RTIs at the supplier level. Finally, in [11], a decentralised two-stage supply chain with a Retailer Stackelberg game is studied. The authors develop an analytical model to determine lot-sizing and pricing decisions for the product and its secondary packaging. As for the shared mode, most related work has studied different scenarios for the pooling or rental of RTIs with the help of mathematical modelling and simulation. The authors of [35] investigate a lot-sizing problem and assignment strategy that minimises the pallet management cost under environmental constraints. The authors of [36] study the pallet allocation problem under stochastic supply scenarios and customer priority, while those of [6] study a fresh fruit and vegetable supply chain and develop a mathematical model to select the best packaging (reusable/disposable) and minimise holding and handling costs. The authors of [37] analyse the effects of pallet service conditions and repair facilities on the economic and environmental performance of a pallet pooling system. In their paper, a new RTI procurement decision is also taken into consideration.
The authors of [38] analyse the reverse logistics of plastic pallets in Canada, focusing on recovery options, such as reusing, remanufacturing, and recycling. A mathematical model is developed to determine the best locations in a pallet reverse logistics network and optimise the distribution flows between the network players. The authors of [16] analyse the transportation operations of a pallet pooling company serving a set of retailers. A pooler company is assumed to be responsible for supplying, collecting, and refurbishing pallets. Buying/selling and pooling management strategies are assessed and compared through what-if analysis. The authors of [39] study the service centres’ location problem considering a pallet pool mode. By integrating the forward and reverse flow of pallets, the objective is to minimise the total cost, including fixed construction, inventory, delivery, and recovery costs. The authors of [7] develop a mixed-integer program model for planning the distribution and vehicle routing for a single type of RTI and in a single period. They consider a pooler company responsible for dispatching leased empty containers to its customers and collecting the customers’ surplus empty containers. In their model, minimising procurement, storage, and maintenance costs is not considered. The authors of [40] use a simulation-based approach to model sharing a single RTI between two producers in a closed-loop supply chain. The results show that collaboration can lead to economies of scale and cost reduction. They also highlight the need for a third party to manage the entire system to promise mutual benefits for the concerned parties. On the other hand, the routing decisions are not optimised in their simulation model. Moreover, the model is not generic and realistic, as it considers a simple supply chain and only one type of RTI that flows in.
As for combinatorial complexity, VRP/IRP with pickup and delivery problems are well known to be NP-hard [41,42]. To tackle this complexity, approximation algorithms or metaheuristics are used. The most commonly encountered metaheuristics are either stochastic algorithms such as simulated annealing (SA) or ones based on artificial intelligence, such as the artificial immune system (AIS), the genetic algorithm (GA), particle swarm optimisation (PSO), and ant colony (AC) algorithms. Though AIS-based algorithms are a relatively new complex-problem-solving approach compared to other metaheuristics, the inherent characteristics of the immune memory, the vaccination process, the self-recognition ability of the antibody, and the diversity of immunity give them a high level of flexibility and a good balance between global and local search [43]. Furthermore, AIS has demonstrated efficient convergence compared to other algorithms on large instances. The authors of [44,45,46,47,48,49] reported that AIS has a higher convergence rate than GA, PSO, AC, and SA. For all these reasons, AIS is used to solve our model on large-sized instances.
To further enhance the convergence speed of AIS, we use machine learning (ML) techniques. Indeed, metaheuristics, through their iterative search processes, generate a lot of data that can be turned into explicit knowledge if coupled with ML models. These data concern the decision and objective spaces visited during the search process, the sequence of solutions or trajectories, successive populations of solutions, moves, recombinations, local optima, elite solutions, and bad solutions [50]. ML techniques can help analyse these data, extract valuable knowledge, and enhance metaheuristics' search performance. Thus, metasearch techniques become "data-driven", "well informed", and therefore "smarter". In this respect, ML has been used to address discrete optimisation problems, with a focus on the travelling salesman problem and the VRP. Data-driven metaheuristics have proven advantageous in convergence speed, solution quality, and robustness. The ML methodologies for decision problems typically addressed by operations research (OR) are mainly found in reinforcement learning (RL), learning to search, and multi-armed bandits. The authors of [19,51,52,53,54,55] illustrate the recent successes achieved by RL on problems typically addressed by OR. For instance, the authors of [19] develop a matheuristic enhanced by RL techniques to solve a dynamic and stochastic IRP. The authors of [56,57] introduce ML in the solution processes of inventory and location problems. Finally, the authors of [58,59,60] use an RL-based technique to solve a VRP. To the best of our knowledge, our paper is the first that combines RL with AIS to solve a multi-supplier multi-customer multi-RTI IRP with pickup and delivery in a closed-loop supply chain [61,62].
This review shows that despite the extensive literature on RTIs related to IRP, there is a lack of efficient tools and techniques to solve complex combinatorial problems such as closed-loop, multi-product, multi-period inventory routing problems with deliveries and pickups of multiple types of RTIs. As already mentioned, our research makes three main scientific contributions. Firstly, we develop a mathematical model to address the deterministic, multi-supplier, multi-customer (M-M) inventory routing problem, considering the delivery and return flows of multiple RTIs which are virtually pooled between a given number of suppliers. Secondly, we use a new artificial-immune-system-based algorithm and combine its strong global search capability with RL's strong self-adaptability and goal-driven performance, all tailored to the mathematical model. Thirdly, computational experiments on specially designed instances highlight the performance of the proposed algorithm. From a managerial point of view, the results stress that this new approach allows for economies of scale and cost reduction at the level of all the involved parties. Furthermore, a sensitivity analysis on the unit cost of transportation and the procurement of new RTIs is conducted and highlights the benefits and limits of the proposed model compared to other RTI management modes.

3. Mathematical Formulation

This section presents the mathematical models developed for IRPPDS, DM, and SM.

3.1. Mathematical Model for IRPPDS

We examine a multi-supplier, multi-customer, multi-RTI closed-loop supply chain. A set of m suppliers distribute different types of products using a set of r types of RTIs to a set of n common customers over a finite planning horizon. Each supplier delivers RTIs loaded with products to a set of customers. Each customer uses these products in his production process and constitutes an inventory of empty RTIs. The supplier then collects those empty RTIs to be reused for future productions and deliveries at his level. We assume that all supply chain players adopt a centralised management policy to synchronise operations according to each player’s requirements, optimise deliveries and pickups, and meet customers’ expectations.
The planning horizon is defined by a discrete and finite set of periods (days). Each player has a storage zone separated into two areas: an area for the inventory of empty RTIs (E) and another for the inventory of loaded RTIs (L). Each of these inventory areas is characterised by an initial inventory level and a maximum holding capacity. Initial inventories of loaded and empty RTIs are supposed to be positive and known at the beginning of the planning horizon. Deliveries and pickups are carried out by a set of homogeneous fleets of vehicles. Each vehicle can transport loaded or empty RTIs, or both, with a determined capacity in terms of the number of RTIs, without distinction between empty and loaded RTIs (foldable RTIs are not considered). It is assumed that each constructed route starts from a supplier to visit a set of customers, and no route is built between suppliers. Furthermore, customers are visited by each supplier independently of other suppliers' planned routes. Since vehicles have a limited capacity, multiple routes per supplier are allowed. We assume that a vehicle can perform at most one pickup and delivery per period, each route starts and finishes at its supplier, and split pickups/deliveries are not allowed. In each period, the sequence of events is as follows. First, each supplier prepares the quantity of loaded RTIs to be shipped by considering the current inventory. He uses his empty RTIs and those of other suppliers to load products on the appropriate type of RTIs. Then, each supplier visits each customer in each period to deliver the required quantity of products (in terms of loaded RTIs) for his production. The available inventory of empty RTIs at each level of the supply chain is checked. Depending on the demand that he must satisfy in the next period, each supplier picks up empty RTIs belonging to him.
If these are not sufficient, he picks up other RTIs belonging to the other suppliers as long as these latter have sufficient inventory to meet demands for the next period. After pickups are performed, the empty RTIs are subject to quality control at each supplier location. Damaged RTIs are disposed of, serviceable RTIs are repaired, and undamaged RTIs are transferred to the inventory of empty RTIs. All the RTIs present in the inventory (repaired/cleaned) at the end of each period can be reused in the next period. Moreover, we assume that, in addition to the virtual pooling of empty RTIs, a supplier can purchase empty RTIs that he may need to fulfil future demands. In this case, buying RTIs is permitted in each period, and each RTI is available for use in the following one.
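As a reading aid, the per-period sequence of events above can be illustrated with a small simulation sketch. Everything here is an illustrative assumption, not the paper's model: a single RTI type, two suppliers sharing one customer, a fixed 10% disposal rate at quality control, and all function and variable names are made up for this sketch.

```python
def run_period(state, demand, pickup_target):
    """One period of the event sequence, seen from supplier 's1' (illustrative).

    state: empty-RTI counts for supplier s1, supplier s2, and customer c.
    demand: loaded RTIs customer c requires this period.
    pickup_target: empties s1 wants available for the next period.
    """
    # 1. Load products on s1's own empties first, then on borrowed ones
    #    (virtual pooling: s2's empties are used, at a sharing cost).
    use_own = min(demand, state["s1_empty"])
    borrowed = demand - use_own
    assert borrowed <= state["s2_empty"], "pooled stock must cover the loan"
    state["s1_empty"] -= use_own
    state["s2_empty"] -= borrowed
    # 2. Deliver loaded RTIs; the customer unloads them into its empty stock.
    state["c_empty"] += demand
    # 3. On the same visit, pick up empties regardless of ownership.
    picked = min(pickup_target, state["c_empty"])
    state["c_empty"] -= picked
    # 4. Quality control at the supplier: 10% disposed of (assumed rate).
    serviceable = picked - picked // 10
    state["s1_empty"] += serviceable
    # 5. Buy new RTIs if still short; they become usable next period.
    purchased = max(0, pickup_target - serviceable)
    state["s1_empty"] += purchased
    return state, {"borrowed": borrowed, "purchased": purchased}
```

For example, with 5 own empties, a demand of 8, and a pickup target of 8, supplier s1 borrows 3 of s2's empties, collects all 8 empties at the customer, and ends the period with 8 serviceable empties without purchasing any.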
The objective of the IRPPDS model is to determine, for each level of the supply chain and over the finite planning horizon, the quantity of loaded RTIs to be delivered by each supplier and the quantity of empty RTIs to be picked up from each customer and shared. The demand is assumed to be deterministic but time-varying. Such planning considers the inventory-level constraints (no shortages, backlogs, or overstocking are allowed), the availability of empty RTIs to suppliers, and the minimisation of the total cost, including inventory holding, maintenance, transportation, sharing, and the purchasing of new RTIs.
To model the IRPPDS, we introduce the following notation. We consider: a set $N = \{i \mid i = 1, \dots, n\}$ of $n$ customers; a set $P = \{0_p \mid 0_p = 0_1, \dots, 0_m\}$ of $m$ suppliers; a set $N_p = \{i \mid i = 0_p, 1, \dots, n\}$ that gathers the $n$ customers and the node $0_p$ representing supplier $p$; a set $R = \{r \mid r = 1, \dots, u\}$ of $u$ types of RTIs used to carry the different types of products; and a set $V = \{v \mid v = 1, \dots, k\}$ of $k$ homogeneous vehicles, each with a capacity of $Q$ in terms of the number of RTIs. Accordingly, loaded and empty RTIs occupy the same volume, as in the case of boxes and containers. We also consider a horizon $T = \{t \mid t = 1, \dots, l\}$ of $l$ periods. Each supplier $p$ and customer $i$ incurs a holding cost for loaded RTIs (L) and empty RTIs (E): $H_p^{L,r}$, $h_i^{L,r}$, $H_p^{E,r}$ and $h_i^{E,r}$ (€ per unit), respectively. $I_{p0}^{L,r}$, $L_{i0}^{L,r}$, $I_{p0}^{E,r}$ and $L_{i0}^{E,r}$ represent the initial inventory levels of loaded and empty RTIs of type $r$ at supplier $p$ and customer $i$, respectively. $C_p^{L}$, $c_i^{L}$, $C_p^{E}$ and $c_i^{E}$ represent the maximum holding capacities for loaded and empty RTIs at supplier $p$ and customer $i$, respectively. At the beginning of the planning horizon, each supplier $p$ receives information on the demand $D_{pit}^{r}$ (expressed in terms of loaded RTIs) to satisfy for each customer $i \in N$, each period $t \in T$, and each RTI type $r$. The distance between actors $i \in N_p$ and $j \in N_p$ is denoted by $d_{ij}^{p}$. The fixed cost of transportation is $a$ (€ per km), and $b$ is the variable cost of transportation (€ per weight unit and per km). The weights of a loaded and an empty RTI of type $r$ are $w^{L,r}$ and $w^{E,r}$, respectively. The cost of buying an RTI is $e^{r}$ (€ per unit). The sharing cost incurred by supplier $p$ is $s^{r}$ per unit of unowned empty RTIs of type $r$ (belonging to other suppliers $p'$) used at his level to deliver products; this cost represents the utilisation cost of an unowned RTI. Finally, $g^{r}$ is the maintenance cost per RTI of type $r$ used by the suppliers to deliver products, including inspection and cleaning costs. The model's notation is summarised in Table 2.
The IRPPDS in a multi-supplier, multi-customer, multi-RTI closed-loop supply chain is modelled as follows:
$$\min \sum_{i \in N} \sum_{t \in T} \sum_{r \in R} \left( h_i^{L,r} L_{it}^{L,r} + h_i^{E,r} L_{it}^{E,r} \right) + \sum_{p \in P} \sum_{t \in T} \sum_{r \in R} \left( H_p^{L,r} I_{pt}^{L,r} + H_p^{E,r} I_{pt}^{E,r} \right) + \sum_{p \in P} \sum_{t \in T} \sum_{r \in R} e^{r} n_{t}^{p,r} + \sum_{p \in P} \sum_{t \in T} \sum_{p' \in P} \sum_{r \in R} g^{r} F_{pt}^{p',r} + \sum_{i \in N} \sum_{p \in P} \sum_{p' \in P} \sum_{t \in T} \sum_{r \in R} s^{r} W_{ipt}^{p',r} + \sum_{p \in P} \sum_{t \in T} \sum_{i \in N_p} \sum_{j \in N_p} \left[ a \sum_{v \in V} x_{ijvt}^{p} + \sum_{r \in R} b \left( w^{L,r} X_{ijt}^{p,r} + w^{E,r} E_{ijt}^{p,r} \right) \right] d_{ij}^{p} \tag{1}$$
subject to:
$$L_{pit}^{L,r} = L_{pi,t-1}^{L,r} + \sum_{p' \in P} Q_{pit}^{p',r} - D_{pit}^{r} \quad \forall i \in N,\ t \in T,\ p \in P,\ r \in R \tag{2}$$

$$I_{pt}^{L,r} = I_{p,t-1}^{L,r} - \sum_{i \in N} \sum_{p' \in P} Q_{pit}^{p',r} + \sum_{p' \in P} F_{pt}^{p',r} \quad \forall t \in T,\ p \in P,\ r \in R \tag{3}$$

$$L_{it}^{E,r} = L_{i,t-1}^{E,r} - \sum_{p \in P} Z_{it}^{p,r} + \sum_{p \in P} D_{pit}^{r} - \sum_{p \in P} \sum_{p' \in P} W_{ipt}^{p',r} \quad \forall i \in N,\ t \in T,\ r \in R \tag{4}$$

$$I_{pt}^{E,r} = I_{p,t-1}^{E,r} + \sum_{i \in N} Z_{it}^{p,r} - \sum_{p' \in P} F_{pt}^{p',r} + n_{t}^{p,r} + \sum_{i \in N} \sum_{p' \in P} W_{ipt}^{p',r} \quad \forall p \in P,\ t \in T,\ r \in R \tag{5}$$

$$\sum_{i \in N_p,\, i \neq j} \left( X_{ijt}^{p,r} - X_{jit}^{p,r} \right) = \sum_{p' \in P} Q_{pjt}^{p',r} \quad \forall j \in N,\ p \in P,\ t \in T,\ r \in R \tag{6}$$

$$\sum_{i \in N_p,\, i \neq j} \left( E_{jit}^{p,r} - E_{ijt}^{p,r} \right) = Z_{jt}^{p,r} + \sum_{p' \in P} W_{jpt}^{p',r} \quad \forall j \in N,\ p \in P,\ t \in T,\ r \in R \tag{7}$$

$$0 \le \sum_{p \in P} \sum_{r \in R} L_{pit}^{L,r} \le c_i^{L} \quad \forall i \in N,\ t \in T \tag{8}$$

$$0 \le \sum_{r \in R} I_{pt}^{L,r} \le C_p^{L} \quad \forall p \in P,\ t \in T \tag{9}$$

$$0 \le \sum_{p \in P} \sum_{r \in R} L_{pit}^{E,r} \le c_i^{E} \quad \forall i \in N,\ t \in T \tag{10}$$

$$0 \le \sum_{r \in R} I_{pt}^{E,r} \le C_p^{E} \quad \forall p \in P,\ t \in T \tag{11}$$

$$\sum_{p \in P} \sum_{r \in R} \left( X_{ijt}^{p,r} + E_{ijt}^{p,r} \right) \le Q \sum_{p \in P} \sum_{v \in V} x_{ijvt}^{p} \quad \forall i, j \in N_p,\ t \in T \tag{12}$$

$$\sum_{i \in N_p} \sum_{v \in V} x_{ijvt}^{p} \le 1 \quad \forall j \in N,\ p \in P,\ t \in T \tag{13}$$

$$\sum_{i \in N_p,\, i \neq j} x_{ijvt}^{p} = \sum_{i \in N_p,\, i \neq j} x_{jivt}^{p} \quad \forall v \in V,\ j \in N_p,\ p \in P,\ t \in T \tag{14}$$

$$\sum_{j \in N} x_{0_p jvt}^{p} \le 1 \quad \forall v \in V,\ p \in P,\ t \in T \tag{15}$$
The objective function (1) minimises inventory costs at the level of each customer and supplier, the costs of purchasing new RTIs, the cost of the maintenance of RTIs, the sharing cost of RTIs undertaken by each supplier, and finally, the fixed and variable cost of transportation for pickup and delivery. Constraints (2) define the conditions for the conservation of the inventory levels of loaded RTIs owned by supplier p at the level of each customer i. Constraints (3) state that at the level of each supplier p, the inventory level of loaded RTIs at the end of period t is equal to the inventory level at the beginning of the period minus the quantities of loaded RTIs delivered to all customers and plus the quantities of empty RTIs that were loaded by supplier p in period t. Constraints (4) indicate that the inventory level for customer i at the end of period t of empty RTIs, held by supplier p, is equal to the inventory level of empty RTIs at the beginning of the period minus the quantity picked up by each supplier p plus the RTIs that have been emptied after demand has been satisfied minus the quantity of empty RTIs belonging to each supplier p that other suppliers have collected. Constraints (5) indicate that at the level of each supplier p, the inventory level of empty RTIs at the end of period t is equal to the inventory level at the beginning of the period plus the quantity of his empty RTIs collected from all customers plus the quantity of empty RTIs belonging to other suppliers that have been collected from customers by supplier p, minus the quantity of empty RTIs that have been loaded in period t plus the quantity of purchased RTIs. Constraints (6) ensure that the quantities of loaded RTIs owned by supplier p are delivered to customer j. 
Constraints (7) show that the flow of empty RTIs belonging to supplier p outgoing from node j is equal to the quantity of empty RTIs belonging to supplier p collected by supplier p, plus the quantity of empty RTIs belonging to other suppliers collected by supplier p, minus the inflow from all customers. Constraints (8)–(11) set the bounds on the inventory levels of loaded and empty RTIs at the level of each supplier p and customer i. Constraints (12) stipulate that the quantities delivered and collected between two nodes i and j must not exceed the capacity of the vehicles on the arc (i, j). Constraints (13)–(15) express the conditions for the construction of tours. Constraints (13) indicate that at most one vehicle is used to visit node j. Constraints (14) guarantee the continuity of a tour. Constraints (15) ensure that vehicles leave the supplier at most once per period or remain at the depot. Finally, non-negativity and binary constraints are imposed on the decision variables.

3.2. Mathematical Model for DM

For the DM model, there is no pooling of empty RTIs between the suppliers. That is, $W_{ipt}^{p'r} = F_{pt}^{p'r} = 0$ if $p \ne p'$, $\forall p, p' \in P,\ i \in N,\ t \in T,\ r \in R$. Each supplier manages, independently of the other suppliers, the deliveries of his loaded RTIs to customers, the pickups of empty ones from customers, and their inventories at his facility and at the customers' locations. Accordingly, the mathematical model is solved for each supplier independently, and the costs to minimise include the inventory holding of empty and loaded RTIs, the transportation cost for deliveries and pickups, maintenance, and the procurement of new RTIs. The formulation of the DM model, $\forall p \in P$, is as follows:
$$\min \sum_{i \in N} \sum_{t \in T} \sum_{r \in R} \left( h_i^{Lr} L_{it}^{Lr} + h_i^{Er} L_{it}^{Er} \right) + \sum_{t \in T} \sum_{r \in R} \left( H_p^{Lr} I_{pt}^{Lr} + H_p^{Er} I_{pt}^{Er} \right) + \sum_{t \in T} \sum_{r \in R} e^{r} n_{t}^{p,r} + \sum_{t \in T} \sum_{r \in R} g^{r} F_{pt}^{r} + \sum_{t \in T} \sum_{i \in N_p} \sum_{j \in N_p} \left[ a \sum_{v \in V} x_{ijvt}^{p} + \sum_{r \in R} b \left( w^{Lr} X_{ijt}^{pr} + w^{Er} E_{ijt}^{pr} \right) d_{ij}^{p} \right] \tag{16}$$
The objective function minimises inventory costs for the supplier p and each customer, the costs of purchasing new RTIs, the maintenance cost of RTIs, and finally, the fixed and variable transportation costs for pickup and delivery.
It is subject to:
$$L_{pit}^{Lr} = L_{pi,t-1}^{Lr} + Q_{pit}^{r} - D_{pit}^{r} \qquad \forall i \in N,\ t \in T,\ r \in R \tag{17}$$
$$I_{pt}^{Lr} = I_{p,t-1}^{Lr} - \sum_{i \in N} Q_{pit}^{r} + F_{pt}^{r} \qquad \forall t \in T,\ r \in R \tag{18}$$
$$L_{it}^{Er} = L_{i,t-1}^{Er} - \sum_{p \in P} Z_{it}^{pr} + \sum_{p \in P} D_{pit}^{r} \qquad \forall i \in N,\ t \in T,\ r \in R \tag{19}$$
$$I_{pt}^{Er} = I_{p,t-1}^{Er} + \sum_{i \in N} Z_{it}^{pr} - F_{pt}^{r} + n_{t}^{pr} \qquad \forall t \in T,\ r \in R \tag{20}$$
$$\sum_{\substack{i \in N_p \\ i \ne j}} \left( X_{ijt}^{pr} - X_{jit}^{pr} \right) = Q_{pjt}^{r} \qquad \forall j \in N,\ t \in T,\ r \in R \tag{21}$$
$$\sum_{\substack{i \in N_p \\ i \ne j}} \left( E_{jit}^{pr} - E_{ijt}^{pr} \right) = Z_{jt}^{pr} \qquad \forall j \in N,\ t \in T,\ r \in R \tag{22}$$
$$0 \le \sum_{p \in P} \sum_{r \in R} L_{pit}^{Lr} \le c_i^{L} \qquad \forall i \in N,\ t \in T \tag{23}$$
$$0 \le \sum_{r \in R} I_{pt}^{Lr} \le C_p^{L} \qquad \forall t \in T \tag{24}$$
$$0 \le \sum_{p \in P} \sum_{r \in R} L_{pit}^{Er} \le c_i^{E} \qquad \forall i \in N,\ t \in T \tag{25}$$
$$0 \le \sum_{r \in R} I_{pt}^{Er} \le C_p^{E} \qquad \forall t \in T \tag{26}$$
$$\sum_{p \in P} \sum_{r \in R} \left( X_{ijt}^{pr} + E_{ijt}^{pr} \right) \le Q \sum_{p \in P} \sum_{v \in V} x_{ijvt}^{p} \qquad \forall i, j \in N_p,\ t \in T \tag{27}$$
$$\sum_{i \in N_p} \sum_{v \in V} x_{ijvt}^{p} \le 1 \qquad \forall j \in N,\ t \in T \tag{28}$$
$$\sum_{\substack{i \in N_p \\ i \ne j}} x_{ijvt}^{p} = \sum_{\substack{i \in N_p \\ i \ne j}} x_{jivt}^{p} \qquad \forall v \in V,\ j \in N_p,\ t \in T \tag{29}$$
$$\sum_{j \in N} x_{0_p j v t}^{p} \le 1 \qquad \forall v \in V,\ t \in T \tag{30}$$
with:
  • $Q_{pit}^{r}$: Quantity of loaded RTIs of type r owned by supplier p and delivered to customer i in period t.
  • $F_{pt}^{r}$: Quantity of empty RTIs of type r owned by supplier p and filled with products at his facility in period t.

3.3. Mathematical Model for SM

In the SM model, a pooler company manages the inventory, pickups, and procurement of empty RTIs. On the other hand, each supplier is responsible for delivering loaded RTIs and managing the corresponding inventory. Furthermore, empty RTIs are delivered directly from customers to a set of centres (pooler facilities) managed by the company, rather than to suppliers as in the DM and IRPPDS models. The centres are assumed to be located near the suppliers. To determine the location of these centres, we solve a multi-period weighted clustering problem (MPC). The clustering consists of grouping supplier nodes into clusters so as to minimise the total distance between suppliers. The centroid of each cluster of suppliers represents the centre in which the empty RTIs of these suppliers are stored, cleaned, and repaired. When needed, the centre sends empty RTIs to suppliers so that they can produce and deliver their products to customers. As for costs, two additional costs are considered: inventory holding at each centre $\iota$ and a pooling cost. The latter covers the management of the centres by the pooler company and each unowned RTI used by a supplier (which is assumed, for simplification, to be equivalent to the sharing cost in IRPPDS). The constraints of IRPPDS for the inventory and routing of pickups of empty RTIs from customers to centres and from the centres to suppliers are rewritten accordingly. In the following, the formulation of the SM model is presented.

3.3.1. Multi-Period Clustering Problem

To determine the location of the centres, we first solve an MPC. To do so, we define the binary variables $\theta_p^{\iota}$, which take the value 1 if supplier p belongs to cluster $\iota$ ($\iota \in K = \{1, \dots, \kappa_m\}$), and 0 otherwise, together with the binary variables $\epsilon_{pp'}^{\iota}$, which take the value 1 if suppliers p and $p'$ belong to the same cluster $\iota$. The MPC can then be modelled as follows:
$$\min \sum_{p, p' \in P} \sum_{\iota \in K} d_{pp'} \, \epsilon_{pp'}^{\iota} \tag{31}$$
subject to:
$$\sum_{\iota \in K} \theta_p^{\iota} = 1 \qquad \forall p \in P \tag{32}$$
$$\sum_{p \in P} \sum_{r \in R} \sum_{t'=1}^{t} D_{p \iota t'}^{r} \, \theta_p^{\iota} \le t\,Q \qquad \forall \iota \in K,\ t \in T \tag{33}$$
$$\epsilon_{pp'}^{\iota} \le \theta_p^{\iota}, \quad \epsilon_{pp'}^{\iota} \le \theta_{p'}^{\iota} \qquad \forall \iota \in K,\ p, p' \in P,\ p \ne p' \tag{34}$$
$$\epsilon_{pp'}^{\iota} \ge \theta_p^{\iota} + \theta_{p'}^{\iota} - 1 \qquad \forall \iota \in K,\ p, p' \in P,\ p \ne p' \tag{35}$$
$$\epsilon_{pp'}^{\iota}, \theta_p^{\iota} \in \{0, 1\} \qquad \forall \iota \in K,\ p, p' \in P,\ p \ne p' \tag{36}$$
The objective function (31) minimises the distance between suppliers (p, p') belonging to the same cluster $\iota$. Constraints (32) ensure that each supplier is assigned to a unique cluster. Constraints (33) state that the aggregate quantity of empty RTIs in each cluster, in terms of demands over the planning horizon, must fit into the available capacity, tQ, where Q is the vehicle capacity. Constraints (34) and (35) state that the distance $d_{pp'}$ between suppliers p and $p'$ is included in the objective function if and only if suppliers p and $p'$ are assigned to the same cluster. Constraints (36) define the binary nature of the decision variables.
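The linearisation in constraints (34) and (35) forces $\epsilon_{pp'}^{\iota}$ to behave as the product $\theta_p^{\iota} \theta_{p'}^{\iota}$ of two binaries. A minimal Python check of this standard linearisation (illustrative only, not part of the authors' code):

```python
from itertools import product

def epsilon_from_theta(theta_p: int, theta_q: int) -> int:
    """Smallest epsilon satisfying the linearisation (34)-(35):
    epsilon <= theta_p, epsilon <= theta_q, epsilon >= theta_p + theta_q - 1.
    Since epsilon has a positive coefficient in the minimised objective,
    it settles at this lower bound."""
    return max(theta_p + theta_q - 1, 0)

# The linearisation reproduces the logical AND of the two assignments.
for tp, tq in product((0, 1), repeat=2):
    assert epsilon_from_theta(tp, tq) == tp * tq
```

Because the distances $d_{pp'}$ are non-negative and $\epsilon$ is minimised, constraint (35) is the binding one; (34) guards the other direction.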

3.3.2. SM Model

In the SM model, two costs are considered:
  • Inventory holding at each centre $\iota$: $\sum_{\iota \in K} \sum_{t \in T} \sum_{r \in R} H_{\iota}^{Er} L_{\iota t}^{Er}$;
  • Pooling cost for each unowned RTI used by a supplier (which is equivalent to the sharing cost in IRPPDS).
The constraints of IRPPDS for the inventory and routing of pickups of empty RTIs from customers to centres and from the centres to suppliers are rewritten as follows. Conservation of inventory levels and flows of empty RTIs at the level of each supplier p, customer i, and centre ι (with θ p ι and ϵ p p ι already determined by solving MPC):
$$L_{it}^{Er} = L_{i,t-1}^{Er} - \sum_{\iota \in K} \sum_{p \in P} \theta_p^{\iota} Z_{it}^{pr} + \sum_{\iota \in K} \sum_{p \in P} \theta_p^{\iota} D_{pit}^{r} \qquad \forall i \in N,\ t \in T,\ r \in R \tag{37}$$
$$I_{pt}^{Er} = I_{p,t-1}^{Er} + \sum_{\iota \in K} \theta_p^{\iota} R_{pt}^{\iota r} + \sum_{p' \in P} \sum_{i \in N} \sum_{\iota \in K} \epsilon_{pp'}^{\iota} W_{ipt}^{p'r} - \sum_{p' \in P} \sum_{\iota \in K} \epsilon_{pp'}^{\iota} F_{pt}^{p'r} \qquad \forall p \in P,\ t \in T,\ r \in R \tag{38}$$
$$L_{\iota t}^{Er} = L_{\iota,t-1}^{Er} + \sum_{i \in N} \sum_{p \in P} \theta_p^{\iota} Z_{it}^{pr} - \sum_{p \in P} \theta_p^{\iota} R_{pt}^{\iota r} - \sum_{p, p' \in P} \sum_{i \in N} \epsilon_{pp'}^{\iota} W_{ipt}^{p'r} - \sum_{p \in P} \sum_{i \in N} Z_{it}^{pr} + n_{t}^{\iota r} \qquad \forall \iota \in K,\ t \in T,\ r \in R \tag{39}$$
$$\sum_{\substack{i \in N_p^{\iota} \\ i \ne j}} \left( E_{jit}^{\iota r} - E_{ijt}^{\iota r} \right) = Z_{jt}^{pr} \qquad \forall j \in N_p^{\iota},\ t \in T,\ r \in R,\ p \in P : \theta_p^{\iota} = 1,\ \iota \in K \tag{40}$$
$$\sum_{\substack{p' \in P \\ p' \ne p}} \left( E_{p'pt}^{\iota r} - E_{pp't}^{\iota r} \right) = \theta_p^{\iota} R_{pt}^{\iota r} + \sum_{i \in N} \sum_{p' \in P} \epsilon_{pp'}^{\iota} W_{ipt}^{p'r} \qquad \forall p \in P,\ \iota \in K,\ t \in T,\ r \in R \tag{41}$$
$$0 \le \sum_{r \in R} L_{\iota t}^{Er} \le c_{\iota}^{E} \qquad \forall \iota \in K,\ t \in T \tag{42}$$
$$\sum_{p \in P} \sum_{r \in R} X_{ijt}^{p,r} \le Q \sum_{p \in P} \sum_{v \in V} x_{ijvt}^{p} \qquad \forall i, j \in N_p,\ i \ne j,\ t \in T \tag{43}$$
$$\sum_{r \in R} E_{ijt}^{\iota,r} \le Q \sum_{v \in V} y_{ijvt}^{\iota} \qquad \forall i, j \in N_p^{\iota},\ i \ne j,\ \iota \in K,\ t \in T \tag{44}$$
$$\sum_{r \in R} E_{pp't}^{\iota,r} \le Q \sum_{v \in V} y_{pp'vt}^{\iota} \qquad \forall t \in T,\ p, p' \in P,\ p \ne p',\ \iota \in K : \epsilon_{pp'}^{\iota} = 1 \tag{45}$$
$$\sum_{i \in N_p^{\iota}} \sum_{v \in V} y_{ijvt}^{\iota} \le 1 \qquad \forall j \in N,\ t \in T,\ \iota \in K \tag{46}$$
$$\sum_{\substack{i \in N_p^{\iota} \\ i \ne j}} y_{ijvt}^{\iota} = \sum_{\substack{i \in N_p^{\iota} \\ i \ne j}} y_{jivt}^{\iota} \qquad \forall v \in V,\ j \in N_p^{\iota},\ t \in T \tag{47}$$
$$\sum_{j \in N} y_{0_{\iota} j v t}^{\iota} \le 1 \qquad \forall v \in V,\ t \in T,\ \iota \in K \tag{48}$$
$$\sum_{p' \in P_p^{\iota} \cup \{0_{\iota}\}} \sum_{v \in V} y_{p'pvt}^{\iota} \le 1 \qquad \forall p \in P_p^{\iota},\ t \in T,\ \iota \in K \tag{49}$$
$$\sum_{\substack{p' \in P_p^{\iota} \cup \{0_{\iota}\} \\ p' \ne p}} y_{p'pvt}^{\iota} = \sum_{\substack{p' \in P_p^{\iota} \cup \{0_{\iota}\} \\ p' \ne p}} y_{pp'vt}^{\iota} \qquad \forall v \in V,\ p \in P_p^{\iota},\ t \in T \tag{50}$$
$$\sum_{p \in P_p^{\iota}} y_{0_{\iota} p v t}^{\iota} \le 1 \qquad \forall v \in V,\ t \in T,\ \iota \in K \tag{51}$$
with:
  • $N_p^{\iota}$: set of customers whose supplier p belongs to the cluster of centre $\iota$ (node $0_{\iota}$).
  • $P_p^{\iota}$: set of suppliers belonging to the cluster of centre $\iota$.
  • $R_{pt}^{\iota,r}$: quantity of empty RTIs of type r belonging to supplier p and sent to centre $\iota$, to which supplier p belongs.
  • $E_{ijt}^{\iota,r}$: quantity of empty RTIs of type r transported from node i to node j in period t and sent to centre $\iota$.
  • $E_{pp't}^{\iota,r}$: quantity of empty RTIs of type r transported from supplier p to supplier $p'$ in period t and sent by centre $\iota$.
  • $L_{\iota t}^{E,r}$: inventory level of empty RTIs of type r at centre $\iota$ in period t.
  • $y_{ijvt}^{\iota}$: binary variable equal to 1 if node j is visited right after node i by vehicle v, 0 otherwise.
  • $y_{0_{\iota} j v t}^{\iota}$: binary variable equal to 1 if customer j is visited by vehicle v from node (centre) $0_{\iota}$, 0 otherwise.
  • $y_{pp'vt}^{\iota}$: binary variable equal to 1 if supplier $p'$ is visited right after supplier p by vehicle v, 0 otherwise.
  • $y_{0_{\iota} p v t}^{\iota}$: binary variable equal to 1 if supplier p is visited by vehicle v from node (centre) $0_{\iota}$, 0 otherwise.

4. Resolution Approach

The DM, SM, and IRPPDS models described in the previous section are NP-hard. To tackle their combinatorial complexity, a resolution approach is proposed.
We aim at determining over a given planning horizon the required quantities of RTIs to allow for supplying the needed quantities of products from a set of suppliers to a set of customers. We also seek to construct the optimal routes for pickups and deliveries of RTIs. Since the construction of the routes is the most complex part of the problem, we first use an appropriate heuristic to determine those routes. Once constructed, we solve a modified version of the three MILPs described in Section 3 to determine the other decision variables related, for example, to the quantities transported, delivered, and collected. Each of these versions is a min-cost network flow problem that is easier to solve. Regarding IRPPDS, its modified version is called FMILP, where the routing decision variables, x i j v t p , are fixed:
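The decomposition described above can be written as a generic loop (an illustrative Python skeleton only; `build_routes` and `solve_fmilp` are placeholder names for the route-construction heuristic and the solver call on the fixed-route MILP, not the authors' actual interfaces):

```python
def matheuristic(instance, build_routes, solve_fmilp, max_iter=100):
    """Generic route-then-flow decomposition (a sketch, not the exact code):
    1. build candidate routes with a heuristic,
    2. fix the routing variables x and solve the easier min-cost-flow FMILP,
    3. keep the best (routes, flows) pair found over the iterations."""
    best_cost, best_solution = float("inf"), None
    for _ in range(max_iter):
        routes = build_routes(instance)              # e.g. 2-opt construction
        cost, flows = solve_fmilp(instance, routes)  # remaining decision variables
        if cost < best_cost:
            best_cost, best_solution = cost, (routes, flows)
    return best_cost, best_solution
```

In the paper, step 1 is driven by the AIS-DQL search over antibodies rather than independent random restarts; the skeleton only shows the fix-and-solve structure.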
FMILP:
$$\min \sum_{i \in N} \sum_{t \in T} \sum_{r \in R} \left( h_i^{Lr} L_{it}^{Lr} + h_i^{Er} L_{it}^{Er} \right) + \sum_{p \in P} \sum_{t \in T} \sum_{r \in R} \left( H_p^{Lr} I_{pt}^{Lr} + H_p^{Er} I_{pt}^{Er} \right) + \sum_{p \in P} \sum_{t \in T} \sum_{r \in R} e^{r} n_{t}^{p,r} + \sum_{p \in P} \sum_{t \in T} \sum_{p' \in P} \sum_{r \in R} g^{r} F_{pt}^{p'r} + \sum_{i \in N} \sum_{p \in P} \sum_{p' \in P} \sum_{t \in T} \sum_{r \in R} s^{r} W_{ipt}^{p'r} \tag{52}$$
subject to Constraints (2)–(11).
We use a matheuristic to construct routes and improve the final solution as described above. The matheuristic hybridises the FMILP with an artificial-immune-system-based algorithm and a deep Q-learning process into a global solving scheme called AIS-DQL. The overview of the matheuristic AIS-DQL is presented in Figure 1.
These steps are described in detail in the following subsections.

4.1. Artificial Immune System

Artificial-immune-system-based algorithms are bio-inspired metaheuristics that imitate the principles and processes of immune system functioning [63]. The algorithms are typically modelled after the immune system’s characteristics of learning and memory for use in problem solving. They imitate antigen recognition, antigen and antibody binding, and the antibody production process. Furthermore, they abstractly use the diversity and memory mechanism of the immune system. Therefore, they can ensure individual diversity while maintaining a high affinity, thereby avoiding premature phenomena and showing a strong global search ability. In this paper, antigens correspond to the input data of the problem, and the antibodies correspond to the routes to construct or the different suppliers. Their structure, depicted in Figure 2, consists of sequences of possible nodes to be visited in each route and for each supplier and each period.
As depicted in Figure 1, AIS starts with an initialisation phase. A population of random routing solutions, representing a pool of antibodies (routes), is initially generated. The routes are built using a 2-opt local search algorithm [64]. Each member of the initial pool then undergoes proliferation and maturation processes: each initial solution is cloned, i.e., copied, according to its affinity. The rate of proliferation is chosen to be directly proportional to the affinity, such that the higher the affinity, the more offspring there are. For this purpose, selection, hypermutation (HM), and receptor editing (RE) operators are used.

4.1.1. Affinity and Cloning Selection

Each time an antibody (a set of routing decisions) is generated, it is used as an input to solve the FMILP. Therefore, the corresponding feasible objective function (OF) and the remaining decision variables of the model are computed. The affinity $f_{\iota}$ of an antibody $\iota$ is computed from the corresponding objective function $OF_{\iota}$ as $f_{\iota} = \frac{1}{OF_{\iota}}$. Thus, an antibody with a higher affinity value has a lower total cost. Hence, as an antibody's cloning rate is proportional to its affinity, lower-cost antibodies produce more clones in the next generation than higher-cost ones. The probability, PS, of selecting an antibody to be cloned depends on its affinity: if $f_{\iota}$ is the affinity of an antibody $\iota$ in the population, its selection probability is $PS_{\iota} = \frac{f_{\iota}}{\sum_{\varsigma} f_{\varsigma}}$.
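The affinity and cloning-selection probabilities are a direct transcription of the two formulas above (a minimal sketch, assuming strictly positive objective values):

```python
def affinities(objective_values):
    """Affinity f = 1 / OF: a lower total cost gives a higher affinity."""
    return [1.0 / of for of in objective_values]

def selection_probabilities(objective_values):
    """Roulette-wheel cloning selection: PS_i = f_i / sum_j f_j."""
    f = affinities(objective_values)
    total = sum(f)
    return [fi / total for fi in f]
```

For example, two antibodies with total costs 10 and 40 receive selection probabilities 0.8 and 0.2, so the cheaper solution is cloned four times as often.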

4.1.2. Affinity Maturation

Since the algorithm needs to thoroughly explore and exploit the search space to obtain a good solution, exploration and exploitation are balanced by varying the capability of the evolution operators. These operators apply random perturbations to the genes of the current population to generate the next generation's population. The variation in the antibodies is performed through the HM and RE mechanisms. The HM mechanism ensures that higher-affinity antibodies are hyper-mutated at a slower rate: the rate $HM_{\iota}$ for an antibody $\iota$ is defined as $HM_{\iota} = e^{-\omega f_{\iota}}$, where $\omega$ is the decay control factor. A new population is created after hyper-mutation, and each antibody undergoes various affinity changes. Antibodies are therefore re-ranked based on the affinity assessment.
After cloning and mutation processes, a percentage of the antibodies in the current population is eliminated (the worst ϕ % of the population) and replaced by the randomly generated antibodies. This mechanism, which is a vertebrate immune system mechanism, is called receptor editing [44]. This mechanism generates new antibodies that correspond to the new search area of the search space. Exploring new search areas may help the algorithm to escape from local optima. The new antibody population then becomes the next generation of antibodies.
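A minimal sketch of the two maturation operators. The decaying exponential form $e^{-\omega f}$ for the hypermutation rate is our reading (the negative exponent matches the stated behaviour that higher-affinity antibodies mutate more slowly), and the replacement rule substitutes the worst $\phi\%$ of the population with random antibodies:

```python
import math
import random

def hypermutation_rate(affinity, omega=1.0):
    """HM = exp(-omega * f): assumed decaying form, so that a higher
    affinity yields a slower mutation rate."""
    return math.exp(-omega * affinity)

def receptor_editing(population, affinity_fn, phi, new_antibody_fn, rng=random):
    """Drop the worst phi% of the population (by affinity) and replace them
    with freshly generated random antibodies."""
    population = sorted(population, key=affinity_fn, reverse=True)
    n_replace = int(len(population) * phi / 100)
    survivors = population[:len(population) - n_replace]
    return survivors + [new_antibody_fn(rng) for _ in range(n_replace)]
```

Here `affinity_fn` and `new_antibody_fn` are hypothetical hooks standing in for the problem-specific affinity evaluation and random route generation.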
Finally, if a generation’s objective function value does not improve over that of the previous generation, convergence is assumed to be achieved, and it is possible to retrieve the best equivalent antibody as the best solution, and the algorithm stops.

4.2. AIS Enhanced with Deep Q-Learning

In this section, we highlight AIS limitations and present an RL technique used to overcome them.

4.2.1. AIS and RL

According to [43], although many results have proved the convergence of AISs to a global optimum, a Markov-chain analysis shows weak convergence of AIS algorithms. Indeed, because of the single-point random mutation of the antibody, in which a given antibody selects a gene bit and changes its value randomly to one of the other selectable values, AIS converges slowly. Moreover, it cannot retain locally excellent gene blocks carried by low-affinity antibodies, since those antibodies are discarded because of their other, poor gene blocks. As a result, the search speed is low. From this stems the idea of using RL to tackle this problem. Indeed, since random searching leads to slow evolution and weak AIS convergence efficiency, the environmental feedback signals and updated action policy of deep Q-learning are used to construct an algorithm with strong self-adaptability and goal-driven performance.
In this paper, RL is employed to assist in analysing data on the moves and recombinations that have been performed to construct solutions to the problem. The goal is to extract meaningful information from these data to direct and improve the AIS's search performance and speed. Indeed, just like a human being, the agent that represents the antibody (a solution to the problem) learns on its own to acquire successful strategies that result in the largest long-term rewards. RL is a paradigm of learning by trial and error based entirely on rewards or penalties. The agent constructs and learns its information directly from the moves it makes using operators such as HM and RE. RL is thus used to assist AIS in determining the optimal actions to take, in terms of the best moves for each operator.

4.2.2. Q-Learning

Q-learning is a self-adaptive, off-policy RL method characterised by strong environmental feedback signals [65]. The fundamental idea is to use the feedback signal to adjust an agent's action policy so as to make the best decision when interacting with the environment (i.e., the antibody space). The agent (i.e., the antibody) arrives in different states based on actions (i.e., AIS operators). Actions determine positive and negative rewards. The concept behind Q-learning is to put the agent through a series of state-action combinations, observe the rewards, and adjust the predictions of a table (called a Q-table) toward those rewards until the table properly predicts the best policy. As a result, the "Q" stands for quality, which indicates how effective a particular action is in earning a possible reward.
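A single tabular Q-learning step can be written as follows (an illustrative sketch with a dictionary-of-dictionaries Q-table; the learning rate `alpha` is an assumption, as the paper does not report it):

```python
def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Off-policy Q-learning update:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[next_state].values()) if Q.get(next_state) else 0.0
    Q.setdefault(state, {}).setdefault(action, 0.0)
    Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])
    return Q[state][action]
```

The `max` over the next state's actions is what makes the method off-policy: the update bootstraps on the greedy action regardless of which action the agent actually takes next.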

4.2.3. Deep Q-Learning

Q-learning is a relatively simple and effective algorithm. However, it may be time-consuming: the amount of memory required to save and update the Q-table grows with the number of states, and the amount of time required to visit each state to construct the appropriate Q-table becomes impractical. In this paper, these Q-values are therefore estimated using neural networks, an approach known as deep Q-learning (DQ). Accordingly, the state is the input, and the output is the Q-value of all potential actions. Once the network is trained, selecting the right action means comparing each action's possible rewards and choosing the best one.

4.2.4. Deep Q-Learning Architecture

DQ begins by estimating random Q-values to explore the environment, as shown in Figure 3. DQ refines its Q-value estimates using a dual-action paradigm, with a present action having a current Q-predicted value and a target action having a target Q-value. If the predicted and target values were produced by the same network and weights, the target would fluctuate with each update; the Q-target values are therefore stabilised by employing a second network that is not trained. After a pre-determined number of iterations, referred to as C-iterations, the learned weights from the Q-predicted network are copied to the Q-target network. The DQ design thus has two neural networks (Q-predicted and Q-target) and an experience replay agent, as shown in Figure 3. During Q-network training, the experience replay agent interacts with the environment to generate data. These data contain all the moves of the AIS operators, recorded as ⟨$s_t$, a, R, $s_{t'}$⟩ tuples (see the notation below Equation (53)). To reduce variance and guarantee the algorithm's stability, a batch is then sampled at random from these data so that it mixes older and more recent experiences. This batch of training data is fed to the Q-predicted and Q-target networks: the Q-predicted network takes the current state and move of each sample and predicts the Q-value of that move. The Q-predicted value, the Q-target value, and the observed sample reward are then used to compute the loss for training the Q-network (see Equation (53)).
$$Loss = \left[ R_{t+1} + \gamma \max_{a'} \theta^{T} Q(s_{t'}, a') - \theta^{T} Q(s_t, a) \right]^{2} \tag{53}$$
where:
  • $\gamma$: discount-rate parameter measuring the weight of future rewards.
  • $a, a'$: current and future action, respectively.
  • $s_t, s_{t'}$: current and future state, respectively.
  • $R_{t+1}$: future reward.
  • $Q(s_t, a)$: learned action-value function.
  • $\theta^{T}$: transpose of the network weight matrix.
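The loss of Equation (53) reduces to a squared temporal-difference error. A scalar sketch in plain Python (illustrative simplification: the frozen target network's values at the next state are passed in as a list rather than computed by a network):

```python
def dqn_loss(q_pred, q_target_next, reward, gamma=0.99):
    """Squared TD error of Eq. (53): q_pred is Q(s_t, a) from the online
    (Q-predicted) network; q_target_next holds the frozen Q-target network's
    values for all actions at the next state s_t'."""
    td_target = reward + gamma * max(q_target_next)
    return (td_target - q_pred) ** 2
```

Averaging this quantity over the sampled replay batch gives the training loss minimised by gradient descent on the Q-predicted network only.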
Finally, as for the AIS memory, a set of the best antibodies, i.e., those with the highest affinity, is stored, together with the best moves obtained so far. Instead of starting from scratch every time the algorithm is run to solve the model for a given antigen that is similar to antigens (instances) already solved, we use this genetic memory to rapidly obtain the best solutions and the optimal policies for the antibodies. Similar antigens are selected using the K-nearest neighbours algorithm [66].
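The retrieval of similar antigens from the genetic memory can be sketched as a plain K-nearest-neighbours lookup (illustrative only; encoding an instance as a numeric feature vector is our assumption, the paper does not specify the features):

```python
import math

def k_nearest_antigens(memory, features, k=3):
    """Return the k stored (features, best_solution) entries whose feature
    vectors are closest (Euclidean distance) to the new instance's features,
    to warm-start the AIS population and policies."""
    ranked = sorted(memory, key=lambda entry: math.dist(entry[0], features))
    return ranked[:k]
```

The retrieved entries' antibodies and Q-policies then seed the initial pool instead of purely random routes.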

5. Implementation and Experimental Analysis

This section presents the experimental design adopted for this study and the analysis of the computational results. All the optimisation steps were carried out on a personal computer (MacBook Pro, macOS Catalina, CPU 3.3 GHz Quad-Core Intel Core i7, 8 GB of RAM).

Experimental Design and Parameters Tuning

The MILP developed for the multi-supplier, multi-customer, multi-RTI closed-loop supply chain was first solved to optimality for small-and-medium-size instances using the Branch-and-Cut solver of CPLEX 12.9 (academic version). The objective was to check the model’s validity, representativeness, and exact solving approach limitations.
To implement the matheuristic, we used Python 3.7 and Pytorch interfaced with CPLEX. The approach was first tested on the same instances optimally solved using CPLEX. The objective was to assess its performance. We ran the AIS algorithm without the learning process (AIS) and compared the improvement provided by the deep Q-learning when coupled with the AIS algorithm (AIS-DQL). We also compared the algorithm’s performance with a pure genetic algorithm (GA) and its improved learning version (GA-DQL). GA is also a population-based metaheuristic that mimics the principle of natural genetics to find a solution. The algorithm is known for its strong global search. The algorithm starts with an initial set of random solutions called a population. Each individual in the population is called a chromosome, representing a solution to the problem at hand. The best parents (best chromosomes having the highest affinity) are selected from the current generation and considered for a two-point crossover operation to form their offspring. The mutation process is also integrated as it helps to obtain new information randomly for the genetic search process and ultimately helps avoid getting trapped at local optima. In this paper, the chromosomes also represent the routing decisions and are decoded as the antibodies of AIS. For a thorough description of GA-DQL, the reader is referred to [19].
The tests were performed in 20 replications for the 40 generated instances to evaluate the stability of the algorithms, and the average value of the objective function is presented. A statistical analysis using ANOVA was also conducted to assess the eventual randomness of the differences between the obtained results (see Table 3). These results stress that for all resolution approaches under consideration, p-value > 0.05 , which means that there is no significant difference between the algorithms and the solutions obtained using CPLEX. Table 4 reports the algorithm parameters tuned so that a trade-off between the algorithm’s performance and speed is satisfied.
The instances had a number of suppliers varying from 5 to 25, a number of customers varying from 6 to 24, and a number of RTI types varying from 2 to 10. The planning horizon of deliveries and pickups was five days, corresponding to a workweek. Customer demands were randomly generated between 5 and 70 loaded RTIs. For each instance, suppliers' and customers' locations were randomly chosen in the Euclidean space between (0, 0) and (1000, 1000). Moreover, we considered initial inventory levels and unit costs for transportation, holding, and maintenance of the same scale as in [4], which considers a 1-M-1 IRP for a single type of RTI. As the unit cost of an RTI may range from a few euros for plastic boxes to 1300 euros for stillages, according to the study conducted by [3], we considered a randomly generated purchase cost varying between 3 and 1000 euros. Finally, we considered a unit sharing cost varying between 2 and 10 euros per type of RTI.
In the remainder of the paper, we refer to the instances using the following notation: (number of RTIs) R, (number of suppliers) S, (number of customers) C, (number of vehicles) V, and (number of periods) T; e.g., 1R2S5C2V5T refers to the instance where one type of RTI is shared and used to ship the products of two suppliers to a set of five customers, transported by two vehicles over a planning horizon of 5 days.
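A small helper for decoding this instance notation (illustrative only, not part of the authors' tooling):

```python
import re

def parse_instance(name):
    """Decode a tag such as '1R2S5C2V5T' into its five instance counts."""
    match = re.fullmatch(r"(\d+)R(\d+)S(\d+)C(\d+)V(\d+)T", name)
    keys = ("rtis", "suppliers", "customers", "vehicles", "periods")
    return dict(zip(keys, map(int, match.groups())))
```

For example, `parse_instance("1R2S5C2V5T")` yields one RTI type, two suppliers, five customers, two vehicles, and five periods.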

6. Computational Experiments

First, the three models, developed for the DM, SM, and IRPPDS modes, were solved using CPLEX. The objective was to compare the benefits and limitations of each mode on the performance of the overall supply chain. Then, given their combinatorial complexity, the three models were solved using four approaches: AIS and GA with and without DQL. The performance of each of these approaches was analysed by comparing it to the solutions obtained using CPLEX on small instances. The benefits of DQL on the performance of the methods were also highlighted. Given the contribution of DQL, the three models were solved on large instances using the AIS-DQL and GA-DQL approaches.

6.1. Results on Small Instances Solved Using CPLEX

The SM, DM, and IRPPDS models were first solved using the Branch-and-Cut solver of CPLEX until reaching optimality. We first considered solving the models with only one type of RTI for a number of customers varying from 6 to 24 in a planning horizon that corresponds to a week of 5 days. We also conducted additional experiments in which we considered a number of RTIs varying from 2 to 10, and finally a number of suppliers varying between 5 and 25. The objective was to provide partial insights regarding the benefits of RTI sharing, the representativeness of the results, and the run time needed to solve this problem. Table 5 summarises the computational results for each instance under consideration. It reports the breakdown of the total cost (TC), namely: transportation (T), inventory of the suppliers (I-S), inventory of the customers (I-C), inventory at the centres for SM (I-K), maintenance (M), procurement of new RTIs (P), and sharing (S). Table 5 also provides the saving (%) between total costs for SM and IRPPDS regarding the total cost of DM and the CPU time in seconds. The saving, noted CS, is computed as follows:
$$CS = \frac{\text{TotalCost}_{DM} - \text{TotalCost}_{SM\ \text{or}\ IRPPDS}}{\text{TotalCost}_{DM}} \times 100.$$
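The saving formula translates directly into a one-line helper (illustrative):

```python
def cost_saving(total_cost_dm, total_cost_alt):
    """CS (%) of SM or IRPPDS relative to the dedicated mode DM."""
    return (total_cost_dm - total_cost_alt) / total_cost_dm * 100
```

For instance, an alternative mode costing 60 against 100 for DM yields a saving of 40%.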
From Table 5, we can see that, as expected, compared to DM, both SM and IRPPDS reduce total costs. Moreover, IRPPDS can help achieve significant cost savings: with IRPPDS, the average total cost was reduced by 40%, against 17% for SM. Indeed, in DM, each supplier needs to manage his inventory, his deliveries, and the pickups of his empty RTIs from customers. As no shortage is permitted, if his inventory of empty RTIs is not sufficient to meet customer demand, he buys the needed quantity, and a procurement cost is then incurred. Furthermore, regarding transportation costs, as each supplier can only use his own RTIs, which cannot be shared among suppliers, the cost of picking them up from customers is incurred.
In SM, by contrast, as empty RTIs are owned and centrally managed by a pooler company, procurement costs could be reduced thanks to the risk pooling effect. Transportation costs (which include a variable cost that depends on the quantity of RTIs transported) are slightly reduced. Indeed, deliveries incurred by the suppliers remain the same as in DM, but not for the pickups of empty RTIs from customers. These empty RTIs are later transported instead to RTI centres owned by the pooler company, which are assumed to be located near suppliers, and they are transported to the suppliers when required. However, in SM, since the requests of RTI are not balanced between the suppliers, the pooler company must buy the needed quantities and ship them to the suppliers, which increases transportation and procurement costs.
As for IRPPDS, the transportation and procurement costs are significantly reduced. Indeed, in this configuration, the supply chain is centrally managed, and each supplier has his RTIs held at his inventory/customers and picks up empty ones from customers when vehicles visit customers to deliver the required products. In addition, as each supplier can also benefit from this visit to pick up not only his RTIs but also the RTIs of other suppliers, vehicle fill rates are improved (as transportation cost includes a variable cost that depends on the quantity of RTIs transported), and each supplier no longer needs to buy the RTIs he may need to meet his customers’ demand. Orders are thus satisfied by any RTI, and a procurement order is only triggered if needed, which reduces its relative cost compared to DM.
Moreover, in IRPPDS, the risk pooling is maintained, as each supplier who buys RTIs adds them to the system pool. Furthermore, the additional management costs of SM, including pooling, inventory, and transportation at the level of the pooler company's centres, are no longer incurred. In addition, as shown in Figure 4, compared to IRPPDS, the quantity of new RTIs bought in SM may represent, for some instances, up to 70% of the available inventory of empty RTIs. More RTIs are purchased at each centre to meet the needs of the suppliers it serves. Finally, from Table 5 and Figure 4, we notice that IRPPDS becomes more attractive as the number of RTI types and suppliers increases. Indeed, in SM, when the number of suppliers increases, more centres are needed, especially in different and distant geographical areas, making it challenging to reduce the logistics and procurement costs of centres serving clusters housing a significant number of suppliers. As a result, the demands for empty RTIs are not balanced, and procurement, inventory, and transportation costs increase as the number of visits from customers to these centres and from these centres to suppliers increases.

6.2. First Insights into the Effectiveness of the Resolution Approach on Small Instances

As we can see, solving exactly the three models under consideration is very combinatorially complex, and the CPU time increases drastically with the number of suppliers. As described in the experimental design, the resolution approach AIS-DQL was compared to other metaheuristics to assess its performance: GA, AIS, and GA-DQL. Table 6 gives the results of the comparison. The gap regarding total cost is computed as:
$$Gap = \frac{\text{TotalCost}_{Metaheuristic} - \text{TotalCost}_{CPLEX}}{\text{TotalCost}_{CPLEX}} \times 100.$$
From Table 6, we can see that for all the instances under consideration, AIS-DQL, compared to AIS, GA-DQL, and GA, finds solutions with minor gaps. On average, GA provided solutions with a gap of 12.6%, AIS of 9.4%, GA-DQL of 4.8%, and AIS-DQL of 0.1%. Indeed, AIS-DQL was more stable, as it was less sensitive to small changes (perturbations) in the input data and in the instances' size. Moreover, AIS-DQL considerably reduced the computational time. GA and AIS may have similar mutation mechanisms, but AIS's immune memory makes it more robust and stable. Furthermore, AIS learning temporarily enlarges the population by cloning high-affinity antibodies while eliminating low-affinity ones, which proved valuable: the goal is to solve the problem using minimal resources and time. The algorithm's response efficiency was therefore greatly enhanced by the memory of the first and best antibodies obtained for different and similar antigens, and it was capable of providing the best solutions, with a high affinity for a given instance, after only a few iterations. Indeed, our algorithm ensures that both the speed and accuracy of the immune response become progressively higher after each model resolution. In addition, combined with the deep reinforcement learning technique and KNN, the immune memory further strengthens the interaction with the environment, resulting in a continuous improvement of the algorithm's ability, and of its prior knowledge of similar problems, to solve the model for a given instance.

6.3. Extra Experiments on Large Instances Solved Using GA-DQL and AIS-DQL

To obtain more insights into the effectiveness of IRPPDS and AIS-DQL, we further ran tests on large instances and compared the results to those obtained using GA-DQL. We solved the DM, SM, and IRPPDS. Then, we computed the total cost and the corresponding savings. We present the results obtained within a CPU time of less than half an hour. The computational results are summarised in Table 7. Table 7 also reports the difference (Diff) between the total costs computed using GA-DQL and AIS-DQL as follows:
$\mathrm{Diff} = \dfrac{TotalCost_{\mathrm{GA\text{-}DQL}} - TotalCost_{\mathrm{AIS\text{-}DQL}}}{TotalCost_{\mathrm{GA\text{-}DQL}}} \times 100.$
As expected, AIS-DQL finds feasible solutions to large-sized problems within a reasonable time. AIS-DQL obtained better solutions than GA-DQL, by an average of 14%, and in less time, with an average CPU time of 479 s for DM, 557 s for SM, and 556 s for IRPPDS, against 743 s, 644 s, and 697 s, respectively, for GA-DQL. As for the results, for all the instances under consideration, IRPPDS reduced the total cost compared to DM, with an average saving of 35% (against 16% for SM). Moreover, the benefits of virtual pooling were highlighted when the number of RTIs and suppliers increased. Furthermore, when the demands to be satisfied required several types of RTIs, the benefits of SM were smaller than those of IRPPDS and even DM (consistent with the results of [14]), and even more so as the number of suppliers increased. Indeed, even if SM can reduce the long-distance transportation of empty RTIs compared to DM, when empty RTIs were not balanced, SM's transportation cost increased, since it includes the costs incurred from the customers to the pooler's centres and from these centres to the suppliers. In addition, even if the centres are located near suppliers (which is often the case in the automotive industry), it would be challenging to balance the quantity of empty RTIs among all suppliers. Therefore, SM may work in favour of or against any supplier, regardless of location, the demands to be met, or the number of centres.
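Both the Gap (Table 6) and Diff (Table 7) columns are instances of the same relative-difference computation; a small helper makes this explicit. The function name is illustrative; the numeric example reuses the 1R2S6P40V5T/IRPPDS row of Table 6 (GA: 204,308 € vs. CPLEX: 187,095 €).

```python
def relative_diff(cost_a, cost_b, base):
    """Percentage difference (cost_a - cost_b) / base * 100.

    Gap (Table 6):  relative_diff(metaheuristic_cost, cplex_cost, cplex_cost)
    Diff (Table 7): relative_diff(ga_dql_cost, ais_dql_cost, ga_dql_cost)
    """
    return (cost_a - cost_b) / base * 100.0

# Example from Table 6: GA on instance 1R2S6P40V5T, model IRPPDS.
gap = relative_diff(204_308, 187_095, 187_095)  # approximately 9.2%
```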

6.4. Sensitivity Analysis on Unit Cost

Considering that the performances may depend on the different unit costs, a sensitivity analysis was conducted, and the results are given in this section. Without loss of generality, we ran tests on the instance 10R20S30C. In each test, we considered three scenarios. The first scenario represents the case where the unit cost of sharing is significantly lower than the smallest unit cost of procurement ( s r = 5 ). The second one corresponds to the case where the unit cost of sharing is equal to a given unit cost of procurement ( s r = 700 ). The third scenario represents the case where the unit cost of sharing is significantly higher than the greatest unit cost of procurement ( s r = 1800 ). Figure 5 and Figure 6 depict the variation in cost reduction (CR) for different values of the cost parameters. It is worth noting that the variable transportation and procurement costs were chosen for the sensitivity analysis due to their significant contributions to the total costs.
From Figure 5, we see that, from a cost-reduction perspective, IRPPDS generally has clear advantages over DM in the three scenarios under consideration. Moreover, we notice that a change in the procurement cost has the most significant impact on the cost reduction achieved by IRPPDS. For the lower unit cost of sharing, as the procurement cost increases, the performance advantages of IRPPDS increase significantly. Indeed, authorising virtual pooling reduces inventory holding costs for empty RTI owners (lowering the idle stock of unused empty RTIs), while suppliers who use these RTIs can meet more demands when no, or fewer, RTIs need to be bought. On the other hand, when the procurement cost is small compared to the unit cost of sharing, the saving is smaller, and the advantages of DM and IRPPDS are comparable (this is even more evident in scenario 3). Thus, the advantages of IRPPDS lessen when the cost incurred by sharing cannot be offset by the saving it brings in terms of reduced procurement costs. Therefore, IRPPDS becomes profitable with a higher procurement cost. Moreover, as shown in Figure 6, significant savings are achieved when the sharing cost is smaller. In addition, when the transportation cost increases, IRPPDS is more profitable than DM. Indeed, more empty and loaded RTIs can be transported in a period (high fill rates), while fewer customers are visited and fewer RTIs are bought in the next period. However, the predominance of IRPPDS may weaken when the sharing cost increases (this is even more evident in scenario 3, with the gap tending to zero). Indeed, with higher unit costs of sharing and transportation, it would be preferable and more cost-effective to have low fill rates (i.e., not to accept loading unowned RTIs, sending them back to their suppliers for further reuse) and to buy the needed RTIs rather than pay for shared ones. Consequently, DM may be more profitable than IRPPDS.
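The scenario logic above can be sketched as a simple sweep over the unit sharing cost. This is a hedged illustration, not the paper's experimental code: `solve_model` stands in for the AIS-DQL solver applied to an instance such as 10R20S30C, and the scenario values are those given in the text.

```python
def cost_reduction(dm_cost, irppds_cost):
    """CR (%) achieved by IRPPDS relative to the dedicated mode (DM)."""
    return (dm_cost - irppds_cost) / dm_cost * 100.0

def sensitivity_sweep(solve_model, sharing_costs=(5, 700, 1800)):
    """Resolve DM and IRPPDS for each unit-sharing-cost scenario and record CR.

    `solve_model(model_name, s_r)` is a hypothetical stand-in returning the
    total cost of a model for a given unit sharing cost s_r.
    """
    results = {}
    for s_r in sharing_costs:
        dm_cost = solve_model("DM", s_r)          # DM is insensitive to s_r,
        irppds_cost = solve_model("IRPPDS", s_r)  # passed only for a uniform interface
        results[s_r] = cost_reduction(dm_cost, irppds_cost)
    return results
```

With costs behaving as the text describes, the recorded CR shrinks as `s_r` grows, reproducing the convergence of IRPPDS toward DM in scenario 3.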

7. Conclusions and Perspectives

This paper considered a deterministic, multi-supplier, multi-customer, multi-RTI inventory routing problem with delivery and pickup in a collaborative supply chain in which empty RTI inventories are virtually pooled among suppliers. We developed an MILP and solved it using CPLEX. Experiments showed that the virtual pooling of RTIs significantly reduces new RTI procurement costs as well as inventory and transportation costs compared to the dedicated and shared modes. Moreover, to handle the combinatorial complexity of the problem, we developed an artificial-immune-system-based algorithm coupled with deep reinforcement learning tailored to the mathematical program. We implemented our resolution approach using Python and PyTorch and compared it to the CPLEX solver and three metaheuristics, namely, AIS without deep learning and GA with and without deep learning. Both variants of GA and AIS coupled with DQL proved competitive. However, the AIS variant outperformed GA thanks to its immune memory, which continuously improved the algorithm's speed and stability in solving the model. AIS-DQL even obtained optimal solutions for some instances and feasible solutions with a tiny gap within a small amount of time. Using AIS-DQL, we solved the model for large instances of up to 700 suppliers, 34 customers, and 31 types of RTIs. A sensitivity analysis of unit costs was also conducted. These results highlight how virtual pooling can be preferable to the dedicated and shared modes.
While the benefits of the model and the effectiveness of AIS-DQL were demonstrated using randomly generated instances, it would be beneficial to further assess their effectiveness on real data. Moreover, several possible extensions may be investigated. For example, one could study the integration of cross-docks in the RTI flows, as in the case of automotive supply chains. The idea is to combine and consolidate, when advantageous, numerous smaller RTI loads provided by different suppliers and to deliver them downstream. Future research may also investigate the case of stochastic demands, as room to further exploit and assess the limits of the resolution approach, as well as the relative power of all parties in decision making and the maximisation and allocation of profit. One way to address the latter may rely on the degree of commitment of the players. Indeed, as many supply chains experience high loss and damage rates of RTIs (which can be tracked using, for instance, RFID tags), the pool manager can reduce the costs incurred by “good” users and increase those of the “bad” ones, or offer the latter training on the use of these RTIs so that they can improve on their weak points, reduce environmental impacts, and increase the competitiveness of the whole system. Furthermore, decisions related to fleet composition and fuel consumption are to be considered in future work.

Author Contributions

Conceptualization, F.E.A., F.R., E.S. and S.L.; methodology, F.E.A. and S.L.; software, F.E.A.; validation, E.S.; investigation, F.E.A., F.R. and S.L.; resources, F.E.A.; data curation, F.E.A.; writing—original draft preparation, F.E.A., F.R., E.S. and S.L.; writing—review and editing, F.E.A., F.R., E.S. and S.L.; visualization, F.E.A. and S.L.; supervision, F.R., E.S. and S.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Cobb, B. Inventory control for returnable transport items in a closed-loop supply chain. Transp. Res. Part E Logist. Transp. Rev. 2016, 86, 53–68. [Google Scholar] [CrossRef]
  2. Kim, T.; Glock, C. On the use of RFID in the management of reusable containers in closed-loop supply chains under stochastic container return quantities. Transp. Res. Part E Logist. Transp. Rev. 2014, 64, 12–27. [Google Scholar] [CrossRef]
  3. Limbourg, S.; Martin, A.; Paquay, C. Optimal Returnable Transport Items Management. 2016. Available online: https://hdl.handle.net/2268/200983 (accessed on 1 May 2022).
  4. Iassinovskaia, G.; Limbourg, S.; Riane, F. The inventory-routing problem of returnable transport items with time windows and simultaneous pickup and delivery in closed-loop supply chains. Int. J. Prod. Econ. 2017, 183, 570–582. [Google Scholar] [CrossRef] [Green Version]
  5. Glock, C.H. Decision support models for managing returnable transport items in supply chains: A systematic literature review. Int. J. Prod. Econ. 2017, 183, 561–569. [Google Scholar] [CrossRef]
  6. Bortolini, M.; Galizia, F.G.; Mora, C.; Botti, L.; Rosano, M. Bi-objective design of fresh food supply chain networks with reusable and disposable packaging containers. J. Clean. Prod. 2018, 184, 375–388. [Google Scholar] [CrossRef]
  7. Liu, G.; Li, L.; Chen, J.; Ma, F. Inventory sharing strategy and optimization for reusable transport items. Int. J. Prod. Econ. 2020, 228, 107742. [Google Scholar] [CrossRef]
  8. Twede, D.; Clarke, R. Supply Chain Issues in Reusable Packaging. J. Mark. Channels 2005, 12, 7–26. [Google Scholar] [CrossRef]
  9. Sarkar, B.; Ullah, M.; Kim, N. Environmental and economic assessment of closed-loop supply chain with remanufacturing and returnable transport items. Comput. Ind. Eng. 2017, 111, 148–163. [Google Scholar] [CrossRef]
  10. Talaei, M.; Farhang Moghaddam, B.; Pishvaee, M.S.; Bozorgi-Amiri, A.; Gholamnejad, S. A robust fuzzy optimization model for carbon-efficient closed-loop supply chain network design problem: A numerical illustration in electronics industry. J. Clean. Prod. 2016, 113, 662–673. [Google Scholar] [CrossRef]
  11. Meherishi, L.; Narayana, S.A.; Ranjani, K.S. Integrated product and packaging decisions with secondary packaging returns and protective packaging management. Eur. J. Oper. Res. 2021, 292, 930–952. [Google Scholar] [CrossRef]
  12. TrackX. A Return on Returnables. Technical Report. 2017. Available online: https://www.omniq.com/wp-content/uploads/2020/11/Quest-TrackX-WP-RTI-Assets.pdf (accessed on 1 May 2022).
  13. Na, B.; Sim, M.K.; Lee, W.J. An Optimal Purchase Decision of Reusable Packaging in the Automotive Industry. Sustainability 2019, 11, 6579. [Google Scholar] [CrossRef] [Green Version]
  14. Zhang, Q.; Segerstedt, A.; Tsao, Y.C.; Liu, B. Returnable packaging management in automotive parts logistics: Dedicated mode and shared mode. Int. J. Prod. Econ. 2015, 168, 234–244. [Google Scholar] [CrossRef]
  15. GS1 Global Office. Reusable Transport Items within GS1 EANCOM; Technical Report; GS1 Global Office: Brussels, Belgium, 2014. [Google Scholar]
  16. Accorsi, R.; Baruffaldi, G.; Manzini, R.; Pini, C. Environmental Impacts of Reusable Transport Items: A Case Study of Pallet Pooling in a Retailer Supply Chain. Sustainability 2019, 11, 3147. [Google Scholar] [CrossRef] [Green Version]
  17. Govindan, K.; Soleimani, H.; Kannan, D. Reverse logistics and closed-loop supply chain: A comprehensive review to explore the future. Eur. J. Oper. Res. 2015, 240, 603–626. [Google Scholar] [CrossRef] [Green Version]
  18. Paterson, C.; Kiesmüller, G.; Teunter, R.; Glazebrook, K. Inventory models with lateral transshipments: A review. Eur. J. Oper. Res. 2011, 210, 125–136. [Google Scholar] [CrossRef] [Green Version]
  19. Achamrah, F.E.; Riane, F.; Limbourg, S. Solving inventory routing with transshipment and substitution under dynamic and stochastic demands using genetic algorithm and deep reinforcement learning. Int. J. Prod. Res. 2021, 1–18. [Google Scholar] [CrossRef]
  20. Toth, P.; Vigo, D. The Vehicle Routing Problem; SIAM: Philadelphia, PA, USA, 2002. [Google Scholar]
  21. Berbeglia, G.; Cordeau, J.F.; Gribkovskaia, I.; Laporte, G. Static pickup and delivery problems: A classification scheme and survey. TOP Off. J. Span. Soc. Stat. Oper. Res. 2007, 15, 1–31. [Google Scholar] [CrossRef]
  22. Andersson, H.; Christiansen, M.; Fagerholt, K. The Maritime Pickup and Delivery Problem with Time Windows and Split Loads. INFOR Inf. Syst. Oper. Res. 2011, 49, 79–91. [Google Scholar] [CrossRef]
  23. Rais, A.; Alvelos, F.; Carvalho, M.S. New mixed integer-programming model for the pickup-and-delivery problem with transshipment. Eur. J. Oper. Res. 2014, 235, 530–539. [Google Scholar] [CrossRef]
  24. Chen, H.K.; Chou, H.W.; Hsueh, C.F.; Yu, Y.J. The paired many-to-many pickup and delivery problem: An application. TOP 2014, 23, 220–243. [Google Scholar] [CrossRef]
  25. Li, B.; Krushinsky, D.; Van Woensel, T.; Reijers, H. An Adaptive Large Neighborhood Search Heuristic for the Share-a-Ride Problem. Comput. Oper. Res. 2015, 66, 170–180. [Google Scholar] [CrossRef] [Green Version]
  26. Coelho, L.C.; Laporte, G. Improved solutions for inventory-routing problems through valid inequalities and input ordering. Int. J. Prod. Econ. 2014, 155, 391–397. [Google Scholar] [CrossRef]
  27. Parragh, S.N.; Doerner, K.F.; Hartl, R.F. A survey on pickup and delivery problems. J. Betr. 2008, 58, 21–51. [Google Scholar] [CrossRef]
  28. Tarantilis, C.D.; Anagnostopoulou, A.K.; Repoussis, P.P. Adaptive Path Relinking for Vehicle Routing and Scheduling Problems with Product Returns. Transp. Sci. 2012, 47, 356–379. [Google Scholar] [CrossRef]
  29. Archetti, C.; Christiansen, M.; Grazia Speranza, M. Inventory routing with pickups and deliveries. Eur. J. Oper. Res. 2018, 268, 314–324. [Google Scholar] [CrossRef]
  30. Van der Heide, G.; Buijs, P.; Roodbergen, K.J.; Vis, I.F.A. Dynamic shipments of inventories in shared warehouse and transportation networks. Transp. Res. Part E Logist. Transp. Rev. 2018, 118, 240–257. [Google Scholar] [CrossRef]
  31. Archetti, C.; Speranza, M.G.; Boccia, M.; Sforza, A.; Sterle, C. A branch-and-cut algorithm for the inventory routing problem with pickups and deliveries. Eur. J. Oper. Res. 2020, 282, 886–895. [Google Scholar] [CrossRef]
  32. Meherishi, L.; Narayana, S.A.; Ranjani, K.S. Sustainable packaging for supply chain management in the circular economy: A review. J. Clean. Prod. 2019, 237, 117582. [Google Scholar] [CrossRef]
  33. Achamrah, F.E.; Bouras, A.; Riane, F.; Darmoul, S. Returnable Transport Items Management: A New Approach to Sidestep Shortage. In Proceedings of the 7th IEEE International Conference on Advanced Logistics and Transport, Marrakech, Morocco, 14–16 June 2019; pp. 92–97. [Google Scholar]
  34. Singh, S.K.; Gupta, Y.; Mishra, A.; Darla, S. Inventory Routing Problem with Simultaneous Pickup and Delivery of Returnable Transport Items with Consideration of Renting and Repairing. Int. J. Eng. Res. Technol. 2017, 6, 92–97. [Google Scholar]
  35. Ech-Charrat, M.R.; Amechnoue, K.; Zouadi, T. Dynamic Planning of Reusable Containers in a Close-loop Supply Chain under Carbon Emission Constrain. Int. J. Supply Oper. Manag. 2017, 4, 279–297. [Google Scholar] [CrossRef]
  36. Ren, J.; Liu, B.; Wang, Z. An optimization model for multi-type pallet allocation over a pallet pool. Adv. Mech. Eng. 2017, 9, 1687814017705841. [Google Scholar] [CrossRef] [Green Version]
  37. Tornese, F.; Pazour, J.A.; Thorn, B.K.; Roy, D.; Carrano, A.L. Investigating the environmental and economic impact of loading conditions and repositioning strategies for pallet pooling providers. J. Clean. Prod. 2018, 172, 155–168. [Google Scholar] [CrossRef]
  38. Hassanzadeh Amin, S.; Wu, H.; Karaphillis, G. A perspective on the reverse logistics of plastic pallets in Canada. J. Remanuf. 2018, 8, 153–174. [Google Scholar] [CrossRef] [Green Version]
  39. Zhou, K.; Song, R. Location Model of Pallet Service Centers Based on the Pallet Pool Mode. In Proceedings of the 2019 IEEE 3rd Information Technology, Networking, Electronic and Automation Control Conference (ITNEC), Chengdu, China, 15–17 March 2019; pp. 1185–1189. [Google Scholar] [CrossRef]
  40. Achamrah, F.E.; Riane, F.; Bouras, A.; Sahin, E. Collaboration Mechanism for Shared Returnable Transport Items in Closed Loop Supply Chains. In Proceedings of the 9th International Conference on Operations Research and Enterprise Systems, Valletta, Malta, 22–24 February 2020. [Google Scholar] [CrossRef]
  41. Karp, R. Reducibility among Combinatorial Problems; Springer: Boston, MA, USA, 1972; Volume 40, pp. 85–103. [Google Scholar] [CrossRef]
  42. Papadimitriou, C.; Steiglitz, K. On the Complexity of Local Search for the Traveling Salesman Problem. SIAM J. Comput. 1977, 6, 76–83. [Google Scholar] [CrossRef] [Green Version]
  43. Bernardino, H.S.; Barbosa, H.J.C. Grammar-Based Immune Programming for Symbolic Regression BT—Artificial Immune Systems; Springer: Berlin/Heidelberg, Germany, 2009; pp. 274–287. [Google Scholar]
  44. De Castro, L.N.; Von Zuben, F.J. Artificial Immune Systems: Part II—A Survey of Applications. 2000. Available online: https://www.dca.fee.unicamp.br/~vonzuben/tr_dca/trdca0200.pdf (accessed on 1 May 2022).
  45. Wong, E.Y.C.; Lau, H.Y.K.; Mak, K.L. Immunity-based evolutionary algorithm for optimal global container repositioning in liner shipping. OR Spectrum 2010, 32, 739–763. [Google Scholar] [CrossRef] [Green Version]
  46. Tiwari, M.K.; Prakash; Kumar, A.; Mileham, A.R. Determination of an optimal assembly sequence using the psychoclonal algorithm. Proc. Inst. Mech. Eng. Part B J. Eng. Manuf. 2005, 219, 137–149. [Google Scholar] [CrossRef]
  47. Panigrahi, B.K.; Yadav, S.R.; Agrawal, S.; Tiwari, M.K. A clonal algorithm to solve economic load dispatch. Electr. Power Syst. Res. 2007, 77, 1381–1389. [Google Scholar] [CrossRef]
  48. Pierrard, T.; Coello Coello, C.A. A Multi-Objective Artificial Immune System Based on Hypervolume BT—Artificial Immune Systems; Springer: Berlin/Heidelberg, Germany, 2012; pp. 14–27. [Google Scholar]
  49. Navarro, M.; Herath, P.; Villarrubia, G.; Prieto-Castrillo, F.; Venyagamoorthy, G. An Evaluation of a Metaheuristic Artificial Immune System for Household Energy Optimization. Complexity 2018, 2018, 9597158. [Google Scholar] [CrossRef] [Green Version]
  50. Talbi, E.G. Machine Learning into Metaheuristics: A Survey and Taxonomy of Data-Driven Metaheuristics. 2020. Available online: https://hal.inria.fr/hal-02745295/file/ACM-CR.pdf (accessed on 1 May 2022).
  51. Bello, I.; Pham, H.; Le, Q.; Norouzi, M.; Bengio, S. Neural Combinatorial Optimization with Reinforcement Learning. arXiv 2016, arXiv:1611.09940. [Google Scholar]
  52. Dai, H.; Khalil, E.; Yuyu, Z.; Dilkina, B.; Song, L. Learning Combinatorial Optimization Algorithms over Graphs. In Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
  53. Han, M.; Senellart, P.; Bressan, S.; Wu, H. Routing an autonomous taxi with reinforcement learning. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, Indianapolis, IN, USA, 24–28 October 2016; pp. 2421–2424. [Google Scholar]
  54. Kaempfer, Y.; Wolf, L. Learning the multiple traveling salesmen problem with permutation invariant pooling networks. arXiv 2018, arXiv:1803.09621. [Google Scholar]
  55. Huang, D.; Mao, Z.; Fang, K.; Chen, L. Solving the shortest path interdiction problem via reinforcement learning. Int. J. Prod. Res. 2021, 1–18. [Google Scholar] [CrossRef]
  56. Ahmadian, S. Approximation Algorithms for Clustering and Facility Location Problems. Ph.D. Thesis, University of Waterloo, Waterloo, ON, Canada, 2017. Available online: http://hdl.handle.net/10012/11640 (accessed on 1 May 2022).
  57. OroojlooyJadid, A.; Hajinezhad, D. A review of cooperative multi-agent deep reinforcement learning. arXiv 2019, arXiv:1908.03963. [Google Scholar]
  58. Lu, H.; Zhang, X.; Yang, S. A Learning-Based Iterative Method for Solving Vehicle Routing Problems. 2019. Available online: https://openreview.net/forum?id=BJe1334YDH (accessed on 1 May 2022).
  59. Duan, L.; Zhan, Y.; Hu, H.; Gong, Y.; Wei, J.; Zhang, X.; Xu, Y. Efficiently solving the practical vehicle routing problem: A novel joint learning approach. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Virtual Event, 6–10 July 2020; pp. 3054–3063. [Google Scholar]
  60. Achamrah, F.E.; Riane, F.; Aghezzaf, E.H. Bi-level programming for modelling inventory sharing in decentralized supply chains. Transp. Res. Procedia 2022, 62, 517–524. [Google Scholar] [CrossRef]
  61. Nakib, A.; Hilia, M.; Heliodore, F.; Talbi, E.G. Design of metaheuristic based on machine learning: A unified approach. In Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Lake Buena Vista, FL, USA, 29 May–2 June 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 510–518. [Google Scholar]
  62. Seyyedabbasi, A.; Aliyev, R.; Kiani, F.; Gulle, M.U.; Basyildiz, H.; Shah, M.A. Hybrid algorithms based on combining reinforcement learning and metaheuristic methods to solve global optimization problems. Knowl. Based Syst. 2021, 223, 107044. [Google Scholar] [CrossRef]
  63. Talbi, E.G. Metaheuristics: From Design to Implementation; John Wiley & Sons: Hoboken, NJ, USA, 2009; Volume 74. [Google Scholar]
  64. Chiang, C.W.; Lee, W.P.; Heh, J.S. A 2-Opt based differential evolution for global optimization. Appl. Soft Comput. 2010, 10, 1200–1207. [Google Scholar] [CrossRef]
  65. Alom, M.Z.; Taha, T.M.; Yakopcic, C.; Westberg, S.; Sidike, P.; Nasrin, M.S.; Hasan, M.; Van Essen, B.C.; Awwal, A.A.S.; Asari, V.K. A State-of-the-Art Survey on Deep Learning Theory and Architectures. Electronics 2019, 8, 292. [Google Scholar] [CrossRef] [Green Version]
  66. Mohtashami, A. A Novel Dynamic Genetic Algorithm-Based Method for Vehicle Scheduling in Cross Docking Systems with Frequent Unloading Operation. Comput. Ind. Eng. 2015, 90, 221–240. [Google Scholar] [CrossRef]
Figure 1. Overview of the implementation of the AIS-DQL matheuristic.
Figure 2. Structure representing the route of Supplier i (antibodies).
Figure 3. DQ architecture adapted from [19].
Figure 4. Ratio of quantity of RTIs bought over the inventory level of empty RTIs made available in SM and IRPPDS.
Figure 5. Saving in terms of total costs for the various unit costs of procurement.
Figure 6. Variation in cost reduction for the various unit costs of transportation.
Table 1. Characteristics of RTIs management strategies.
| | DM | SM | Virtual Pooling Mode |
| Owner of RTIs | Each supplier | All suppliers or a pooler company | Each supplier |
| Management of empty RTIs (collection, refurbishing, …) | Each supplier | One pooler company | All suppliers |
| Storage of empty and shared RTIs | - | In dedicated facilities | At suppliers’ level |
Table 2. Model’s notation summary.
Sets
$N$: Set of $n$ customers.
$P$: Set of $m$ suppliers.
$R$: Set of $u$ RTI types.
$V$: Set of $k$ vehicles.
$T$: Set of $l$ periods.
Parameters
$a$: Fixed cost of transportation (€ per km).
$b$: Variable cost of transportation (€ per weight per km).
$H_p^{L,r}, h_i^{L,r}, H_p^{E,r}, h_i^{E,r}$: Cost of holding inventory of loaded and empty RTIs of type $r$, respectively, for each supplier $p$ and customer $i$.
$e_r$: Cost of buying a new RTI of type $r$ (€ per unit).
$s_r$: Cost of sharing incurred by each supplier, proportional to the number of unowned empty RTIs of type $r$ used at its level to deliver products (€ per unit of unowned RTI used).
$g_r$: Cost of maintenance of one RTI of type $r$ (€ per RTI loaded).
$w_L^r, w_E^r$: Weights of a loaded and an empty RTI of type $r$, respectively.
$Q$: Capacity of a vehicle in terms of the number of RTIs.
$d_{ij}^p$: Distance between nodes $i$ and $j \in N_p$.
$D_{pit}^r$: Demand of customer $i$ for period $t$, loaded on an RTI of type $r$ and satisfied by supplier $p$.
$I_{p0}^{L,r}, L_{i0}^{L,r}, I_{p0}^{E,r}, L_{i0}^{E,r}$: Initial inventory levels of loaded and empty RTIs of type $r$, respectively, for supplier $p$ and customer $i$.
$C_p^L, c_i^L, C_p^E, c_i^E$: Maximum holding capacities for loaded and empty RTIs, respectively, for supplier $p$ and customer $i$.
Decision variables
$x_{ijvt}^p$: Binary variable stating whether vehicle $v$ of supplier $p$ visits node $j$ immediately after node $i$ in period $t$.
$F_{pp't}^r$: Quantity of empty RTIs of type $r$ owned by supplier $p$ that have been filled with products by supplier $p'$ in period $t$; this quantity also includes the case $p = p'$ (the supplier uses its own RTIs).
$I_{pt}^{L,r}$: Inventory level of loaded RTIs of type $r$ held by supplier $p$ at the end of period $t$.
$L_{pit}^{L,r}$: Inventory level of RTIs of type $r$ filled with the product of supplier $p$ and held by customer $i$ at the end of period $t$.
$Q_{pp'it}^r$: Quantity of loaded RTIs of type $r$ owned by supplier $p$ and delivered by supplier $p'$ to customer $i$ in period $t$.
$X_{ijt}^{p,r}$: Quantity of loaded RTIs of type $r$ filled with a product of supplier $p$ and transported from node $i$ to node $j$ in period $t$.
$L_{it}^{E,r}$: Inventory level of empty RTIs of type $r$ held by customer $i$ at the end of period $t$.
$I_{pt}^{E,r}$: Total quantity of empty RTIs of type $r$ held by supplier $p$ at the end of period $t$.
$Z_{it}^{p,r}$: Quantity of empty RTIs of type $r$ owned by supplier $p$ and collected from customer $i$ in period $t$.
$W_{pp'it}^r$: Quantity of empty RTIs of type $r$ owned by supplier $p$ and collected from customer $i$ by supplier $p'$ in period $t$.
$E_{ijt}^{p,r}$: Quantity of empty RTIs of type $r$ collected by supplier $p$ and transported from node $i$ to node $j$ in period $t$.
$n_{pt}^r$: Quantity of RTIs of type $r$ bought by supplier $p$ in period $t$.
Table 3. Statistical analysis using ANOVA.
| Resolution Approach | F | p-Value |
| GA | 2.16 | 0.14 |
| AIS | 1.57 | 0.22 |
| GA-DQL | 0.91 | 0.34 |
| AIS-DQL | 0.49 | 0.49 |
Table 4. Values of the tuning parameters.
| Tuned Parameter | Value |
| Population size (GA/AIS) | 200 |
| Maximum iteration number (GA/AIS) | 200 |
| Crossover probability (GA/AIS) | 0.81 |
| Mutation probability (GA/AIS) | 0.46 |
| Selection probability | 0.80 |
| Receptor editing rate | 0.28 |
Table 5. Computational results for DM, SM, and IRPPDS on small and medium instances solved using CPLEX.
Instances | Model | T (€) | I-S (€) | I-C (€) | I-K (€) | M (€) | P (€) | S (€) | TC (€) | CS (%) | CPU (s)
1R2S6P40V5TDM106,899138614280141280,2240390,078-424
SM105,3095921368751133122,7712371233,29440629
IRPPDS84,85411881308014198,739865187,09552451
1R2S12P40V5TDM315,279370617210299907,08501,228,090-5050
SM245,866203213592390331737,2688940998,186196445
IRPPDS228,259356211900294415,4802998651,784475265
1R2S18P40V5TDM519,475306749880366831,05101,358,947-8776
SM471,330171026801765351731,67855041,215,0181112,115
IRPPDS402,425253219110352611,92048351,023,975259331
1R2S24P40V5TDM853,0124136804006853,280,78104,146,653-24,314
SM711,3003230406128906882,758,06377613,487,9941631,701
IRPPDS552,8934744304606961,886,92961072,454,4154124,399
2R2S5P40V5TDM267,3341013460701521,306,99801,580,105-473
SM203,25462929577461711,188,76032401,399,75711591
IRPPDS158,77192820110154781,6311326944,82140496
4R2S5P40V5TDM575,795205511,26905021,719,31002,308,932-1309
SM508,1171177582914324261,493,78469282,017,692131626
IRPPDS413,9151859514104571,056,45053241,483,147361316
6R2S5P40V5TDM984,677388611,40106451,953,35002,953,958-3013
SM601,250199313,83430075781,608,98512,6812,242,328244131
IRPPDS571,2453372655205621,499,51214,7112,095,952294405
8R2S5P40V5TDM1,196,050355919,33707992,649,39803,869,143-5423
SM1,027,848209917,12337387342,270,434221,003,344,076147346
IRPPDS704,453330711,69307651,726,95915,4462,462,623365968
10R2S5P40V5TDM1,536,319737622,048010053,674,09705,240,844-7981
SM1,450,099384423,893914611222,852,57245,7354,386,4111610,675
IRPPDS878,014677711,19909672,263,59818,6313,179,185398687
1R5S5P40V5TDM223,327169834730257414,5560643,312-8084
SM213,9349473137964241340,1873415562,8251313,686
IRPPDS157,568159117540233245,5551440408,141378969
1R10S5P60V5TDM470,2663753605205441,040,46401,521,078-22,526
SM465,720174458791869537487,7394204967,6933633,882
IRPPDS383,377266158560505209,1262136603,6616024,005
1R15S5P40V5TDM1,018,250662012,5470798882,49301,920,708-32,387
SM995,463489863644941845677,58318,5441,708,6391141,501
IRPPDS715,381614549660823612,77579221,348,0123034,055
1R20S5P40V5TDM1,595,794950021,299013101,929,22303,557,125-55,543
SM1,419,87964139549796711391,724,05525,9883,194,9891067,511
IRPPDS893,61810,265793301181754,92293031,677,2225355,135
1R25S5P40V5TDM2,251,17511,30628,742014881,758,97604,051,687-67,115
SM2,044,658653025,480538713881,510,89017,2123,611,5451185,413
IRPPDS1,439,15711,08714,999015361,394,43813,9322,875,1482963,468
Table 6. Assessment of the performance of GA, AIS, GA-DQL, and AIS-DQL compared to CPLEX on relatively small and medium instances.
Instance | Model | CPLEX: TC (€), CPU (s) | GA: TC (€), CPU (s), Gap (%) | AIS: TC (€), CPU (s), Gap (%) | GA-DQL: TC (€), CPU (s), Gap (%) | AIS-DQL: TC (€), CPU (s), Gap (%)
1R2S6P40V5TDM390,078424396,31981.6395,929281.5396,319151.6390,07820.0
SM233,294629257,557910.4254,757229.2245,892105.4234,461310.5
IRPPDS187,095451204,30899.2199,443416.6200,379147.1187,09540.0
1R2S12P40V5TDM1,228,09050501,331,2502568.41,312,8282406.91,251,4244211.91,228,090250.0
SM998,18664451,044,1024274.61,038,1132054.01,032,1243073.41,003,177210.5
IRPPDS651,7845265677,2044013.9689,587575.8682,4184614.7651,784210.0
1R2S18P40V5TDM1,358,94787761,498,9198410.31,382,049321.71,394,280892.61,358,947180.0
SM1,215,01812,1151,364,46652712.31,362,0366212.11,273,339574.81,219,879770.4
IRPPDS1,023,97593311,135,58882410.91,087,4619256.21,054,69448331,023,975500.0
1R2S24P40V5TDM4,146,65324,3144,486,67913948.24,470,0928847.84,457,65216747.54,146,653470.0
SM3,487,99431,7013,857,72198910.63,864,69750810.83,808,8894169.23,508,922210.6
IRPPDS2,454,41524,3992,947,752231320.12,923,2082519.12,648,3144557.92,454,4151630.0
2R2S5P40V5TDM1,580,1054731,922,9881221.71,783,9393712.91,685,972306.71,581,68510.1
SM1,399,7575911,542,5324010.21,511,738128.01,426,353201.91,402,557420.2
IRPPDS944,8214961,037,413189.8993,95245.2963,717352945,76630.1
4R2S5P40V5TDM2,308,93213092,673,7437515.82,542,1347610.12,403,59844.12,308,932120.0
SM2,017,69216262,360,70039172,209,373419.52,031,816510.72,017,692220.0
IRPPDS1,483,14713161,739,73112417.31,576,585216.31,566,203795.61,483,14750.0
6R2S5P40V5TDM2,953,95830133,202,090638.43,190,275428.03,140,0572276.32,953,95860.0
SM2,242,32841312,679,5835819.52,684,067619.72,419,47287.92,249,055530.3
IRPPDS2,095,95244052,290,876709.32,292,9713799.42,179,7902404.02,098,04820.1
8R2S5P40V5TDM3,869,14354234,604,28053819.04,016,1705273.83,927,1803921.53,873,012290.1
SM3,344,07673463,691,86052510.43,410,9576042.03,390,8932091.43,347,420410.1
IRPPDS2,462,62359682,856,643350162,570,978464.42,561,128314.02,462,623440.0
10R2S5P40V5TDM5,240,84479815,801,61473310.75,848,78267011.65,382,3472932.75,246,085280.1
SM4,386,41110,6754,618,8913195.34,614,5044005.24,496,0714152.54,425,888320.9
IRPPDS3,179,18586873,795,94744719.43,808,66420619.83,344,5031775.23,179,185600.0
1R5S5P40V5TDM643,3128084702,4974219.2714,7206311.1656,178302.0643,955270.1
SM562,82513,686665,25927618.2613,479549.0602,786387.1567,32860.8
IRPPDS408,1418969454,26172611.3422,4267023.5436,7111007.0408,54970.1
1R10S5P60V5TDM1,521,07822,5261,718,81842213.01,630,59617327.21,522,5993370.11,522,5991680.1
SM967,69333,8821,065,43030710.11,065,43098010.11,047,0434758.2969,628520.2
IRPPDS603,66124,005719,56428519.2705,68050816.9644,1062956.7604,2651600.1
1R15S5P40V5TDM1,920,70832,3872,214,576227215.32,212,656143315.22,058,99928717.21,920,708480.0
SM1,708,63941,5011,942,72252513.71,905,132124511.51,802,6143955.51,717,182220.5
IRPPDS1,348,01234,0551,419,45729475.31,443,72115987.11,376,3209492.11,349,3602890.1
1R20S5P40V5TDM3,557,12555,5434,197,408165618.04,030,223500513.33,841,6951488.03,557,1251030.0
SM3,194,98967,5113,600,75391012.73,466,5634068.53,252,4995101.83,201,379120.2
IRPPDS1,677,22255,1351,893,58490212.91,893,584531212.91,769,4694515.51,677,2222960.0
1R25S5P40V5TDM4,051,68767,1154,590,561335813.34,517,631202211.54,221,85816364.24,051,6875240.0
SM3,611,54585,4134,174,946314015.64,113,54940013.93,795,7332115.13,611,545420.0
IRPPDS2,875,14863,4683,346,672406416.43,197,165375011.22,955,652702.82,875,1483850.0
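The Gap (%) values in Table 6 are consistent with the usual relative deviation of a heuristic's total cost from the CPLEX reference, rounded to one decimal. A minimal sketch (the function name is ours, not from the paper):

```python
def gap_percent(tc_heuristic: int, tc_cplex: int) -> float:
    """Relative deviation of a heuristic's total cost from the CPLEX cost, in %."""
    return round(100.0 * (tc_heuristic - tc_cplex) / tc_cplex, 1)

# First row of Table 6 (instance 1R2S6P40V5T, DM mode):
print(gap_percent(396_319, 390_078))  # GA vs CPLEX -> 1.6
print(gap_percent(390_078, 390_078))  # AIS-DQL reaches the CPLEX cost -> 0.0
```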
Table 7. Computational results for large instances obtained using GA-DQL and AIS-DQL.
| Instance | AIS-DQL DM TC (€) | DM CPU (s) | SM TC (€) | SM CPU (s) | IRPPDS TC (€) | IRPPDS CPU (s) | CS SM (%) | CS IRPPDS (%) | GA-DQL DM TC (€) | DM CPU (s) | SM TC (€) | SM CPU (s) | IRPPDS TC (€) | IRPPDS CPU (s) | CS SM (%) | CS IRPPDS (%) | Diff DM (%) | Diff SM (%) | Diff IRPPDS (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 15R15S15C90V5T | 1,055,305 | 330 | 979,362 | 274 | 806,191 | 217 | 7 | 24 | 1,217,822 | 527 | 1,074,615 | 318 | 902,934 | 276 | 12 | 26 | 13 | 9 | 11 |
| 15R20S15C120V5T | 2,698,516 | 824 | 2,407,032 | 475 | 1,827,526 | 254 | 11 | 32 | 3,192,344 | 1,207 | 2,919,561 | 550 | 2,077,897 | 316 | 9 | 35 | 15 | 18 | 12 |
| 15R30S15C190V5T | 5,826,310 | 44 | 5,075,948 | 1,070 | 3,877,637 | 980 | 13 | 33 | 6,659,472 | 70 | 5,079,089 | 1,203 | 4,552,346 | 1,259 | 24 | 32 | 13 | 0 | 15 |
| 15R40S15C250V5T | 13,140,456 | 181 | 12,054,126 | 335 | 7,920,967 | 391 | 8 | 40 | 15,715,985 | 300 | 14,922,266 | 372 | 9,006,139 | 473 | 5 | 43 | 16 | 19 | 12 |
| 15R50S15C350V5T | 27,632,333 | 513 | 19,565,564 | 395 | 17,761,072 | 329 | 29 | 36 | 31,031,110 | 829 | 27,865,271 | 463 | 20,673,888 | 410 | 10 | 33 | 11 | 30 | 14 |
| 15R60S15C450V5T | 64,334,836 | 135 | 58,442,222 | 165 | 42,490,648 | 256 | 9 | 34 | 72,248,021 | 227 | 71,156,121 | 197 | 49,884,021 | 329 | 2 | 31 | 11 | 18 | 15 |
| 15R70S15C550V5T | 143,862,932 | 97 | 134,278,723 | 156 | 80,904,270 | 156 | 7 | 44 | 172,347,793 | 151 | 134,553,700 | 169 | 97,004,220 | 198 | 22 | 44 | 17 | 0 | 17 |
| 15R80S15C650V5T | 301,974,530 | 717 | 268,660,125 | 922 | 200,455,201 | 1,145 | 11 | 34 | 339,721,346 | 1,155 | 286,184,629 | 1,146 | 225,311,646 | 1,454 | 16 | 34 | 11 | 6 | 11 |
| 15R100S15C750V5T | 641,210,086 | 398 | 606,987,781 | 443 | 429,635,068 | 414 | 5 | 33 | 721,361,347 | 615 | 612,861,772 | 471 | 514,273,176 | 536 | 15 | 29 | 11 | 1 | 16 |
| 15R200S15C2000V5T | 1,291,975,415 | 106 | 832,110,939 | 694 | 663,928,176 | 751 | 36 | 49 | 1,536,158,768 | 165 | 1,135,671,285 | 852 | 794,722,027 | 927 | 26 | 48 | 16 | 27 | 16 |
| 15R300S15C4000V5T | 2,721,138,343 | 728 | 2,256,821,880 | 803 | 2,082,860,769 | 665 | 17 | 23 | 3,091,213,158 | 1,146 | 2,503,447,969 | 870 | 2,447,361,404 | 827 | 19 | 21 | 12 | 10 | 15 |
| 15R400S15C6000V5T | 5,791,824,472 | 268 | 4,014,243,737 | 268 | 3,126,883,840 | 237 | 31 | 46 | 6,718,516,388 | 409 | 4,960,776,575 | 289 | 3,592,789,532 | 292 | 26 | 47 | 14 | 19 | 13 |
| 15R600S15C8000V5T | 11,431,199,715 | 384 | 8,504,611,093 | 357 | 7,582,649,005 | 178 | 26 | 34 | 13,603,127,661 | 624 | 9,546,143,904 | 391 | 8,507,732,184 | 217 | 30 | 37 | 16 | 11 | 11 |
| 31R20S34C400V5T | 4,428,883 | 740 | 3,736,512 | 845 | 3,445,812 | 971 | 16 | 22 | 5,053,356 | 1,105 | 4,036,051 | 954 | 3,979,913 | 1,182 | 20 | 21 | 12 | 7 | 13 |
| 31R40S34C900V5T | 10,976,187 | 301 | 7,719,378 | 569 | 7,089,787 | 839 | 30 | 35 | 13,028,734 | 475 | 9,160,248 | 690 | 8,507,744 | 1,033 | 30 | 35 | 16 | 16 | 17 |
| 31R60S34C1300V5T | 23,233,219 | 917 | 16,916,596 | 395 | 14,939,103 | 312 | 27 | 36 | 27,368,732 | 1,388 | 25,748,719 | 424 | 16,731,795 | 392 | 6 | 39 | 15 | 34 | 11 |
| 31R80S34C2500V5T | 53,970,019 | 595 | 43,722,965 | 622 | 28,413,814 | 603 | 19 | 47 | 62,605,222 | 890 | 57,142,014 | 714 | 31,993,955 | 745 | 9 | 49 | 14 | 23 | 11 |
| 31R110S34C4000V5T | 120,114,263 | 331 | 97,779,299 | 724 | 65,665,251 | 938 | 19 | 45 | 138,972,202 | 492 | 103,217,419 | 858 | 77,288,000 | 1,188 | 26 | 44 | 14 | 5 | 15 |
| 31R130S34C5200V5T | 265,313,611 | 515 | 251,439,010 | 284 | 210,348,468 | 214 | 5 | 21 | 315,457,883 | 796 | 299,527,277 | 346 | 243,583,526 | 277 | 5 | 23 | 16 | 16 | 14 |
| 31R150S34C6000V5T | 572,377,195 | 294 | 506,511,145 | 531 | 410,627,455 | 495 | 12 | 28 | 681,701,239 | 456 | 569,930,183 | 608 | 473,453,456 | 614 | 16 | 31 | 16 | 11 | 13 |
| 31R200S34C9000V5T | 1,227,011,538 | 942 | 1,134,046,280 | 346 | 959,955,051 | 172 | 8 | 22 | 1,425,787,407 | 1,481 | 1,312,205,027 | 385 | 1,136,586,780 | 221 | 8 | 20 | 14 | 14 | 16 |
| 31R300S34C14000V5T | 2,529,324,214 | 561 | 2,398,032,183 | 563 | 1,745,537,723 | 435 | 5 | 31 | 3,030,130,408 | 869 | 2,645,201,253 | 697 | 2,049,261,287 | 556 | 13 | 32 | 17 | 9 | 15 |
| 31R400S34C18000V5T | 5,169,582,108 | 319 | 4,520,814,515 | 483 | 3,311,432,130 | 511 | 13 | 36 | 5,789,931,961 | 516 | 5,245,182,129 | 556 | 3,751,852,603 | 640 | 9 | 35 | 11 | 14 | 12 |
| 31R500S34C20000V5T | 11,050,906,466 | 244 | 8,845,362,928 | 547 | 6,925,392,214 | 728 | 20 | 37 | 13,006,916,910 | 376 | 12,963,475,117 | 614 | 7,964,201,046 | 925 | 0 | 39 | 15 | 32 | 13 |
| 31R600S34C24500V5T | 23,471,937,843 | 1,027 | 18,265,998,726 | 1,128 | 14,568,743,068 | 1,110 | 22 | 38 | 26,734,537,203 | 1,598 | 23,081,746,819 | 1,420 | 17,278,529,279 | 1,412 | 14 | 35 | 12 | 21 | 16 |
| 31R700S34C29000V5T | 49,800,010,407 | 933 | 32,665,152,291 | 1,090 | 26,308,338,166 | 1,160 | 34 | 47 | 57,270,011,968 | 1,439 | 56,207,283,103 | 1,185 | 31,096,455,712 | 1,430 | 2 | 46 | 13 | 42 | 15 |
| Average | 4,489,886,892 | 479 | 3,287,441,168 | 557 | 2,646,228,016 | 556 | 17 | 35 | 5,185,154,402 | 743 | 4,685,635,081 | 644 | 3,092,258,327 | 697 | 14 | 35 | 14 | 16 | 14 |
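The CS (%) and Diff (%) columns in Table 7 are consistent with, respectively, the saving of each sharing mode relative to the dedicated mode (DM) and the relative excess of the GA-DQL total cost over the AIS-DQL one, each rounded to the nearest integer percent. A sketch under that reading (the function names are ours, not from the paper):

```python
def cost_saving_percent(tc_mode: int, tc_dm: int) -> int:
    """Cost saving of a sharing mode (SM or IRPPDS) relative to DM, in %."""
    return round(100.0 * (tc_dm - tc_mode) / tc_dm)

def diff_percent(tc_ga_dql: int, tc_ais_dql: int) -> int:
    """Relative excess of the GA-DQL total cost over the AIS-DQL one, in %."""
    return round(100.0 * (tc_ga_dql - tc_ais_dql) / tc_ga_dql)

# First row of Table 7 (instance 15R15S15C90V5T):
print(cost_saving_percent(979_362, 1_055_305))  # AIS-DQL, SM vs DM -> 7
print(cost_saving_percent(806_191, 1_055_305))  # AIS-DQL, IRPPDS vs DM -> 24
print(diff_percent(1_217_822, 1_055_305))       # Diff on the DM column -> 13
```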
Achamrah, F.E.; Riane, F.; Sahin, E.; Limbourg, S. An Artificial-Immune-System-Based Algorithm Enhanced with Deep Reinforcement Learning for Solving Returnable Transport Item Problems. Sustainability 2022, 14, 5805. https://doi.org/10.3390/su14105805