Article

A Q-Learning-Based Artificial Bee Colony Algorithm for Distributed Three-Stage Assembly Scheduling with Factory Eligibility and Setup Times

School of Automation, Wuhan University of Technology, Wuhan 430070, China
*
Author to whom correspondence should be addressed.
Machines 2022, 10(8), 661; https://doi.org/10.3390/machines10080661
Submission received: 7 July 2022 / Revised: 29 July 2022 / Accepted: 4 August 2022 / Published: 5 August 2022
(This article belongs to the Section Industrial Systems)

Abstract

The assembly scheduling problem (ASP) and the distributed assembly scheduling problem (DASP) have attracted much attention in recent years; however, the transportation stage is often neglected in previous works. Factory eligibility means that some products cannot be manufactured in all factories; although it exists extensively in many real-life manufacturing processes, it has hardly been considered. In this study, a distributed three-stage ASP with a DPm→1 layout, factory eligibility and setup times is studied, and a Q-learning-based artificial bee colony algorithm (QABC) is proposed to minimize total tardiness. To obtain high-quality solutions, a Q-learning algorithm is implemented with eight states based on population quality evaluation, eight actions defined by global search and neighborhood search, a new reward and an adaptive ε-greedy selection, and is applied to dynamically select the search operator; two employed bee swarms are obtained by population division, and an employed bee phase with adaptive migration between them is added; a new scout phase based on a modified restart strategy is also presented. Extensive experiments are conducted. The computational results demonstrate that the new strategies of QABC are effective and that QABC is a competitive algorithm for the considered problem.

1. Introduction

Scheduling is an important decision-making process in manufacturing and service industries and has been widely studied since 1954. As a typical scheduling problem, ASP is an effective way to balance batch production and production flexibility and has attracted much attention. After the pioneering works of Lee et al. [1] and Potts et al. [2], many related works have followed. In recent years, Framinan et al. [3] gave a unified notation for ASP and provided a full review of previous works and future topics. Komaki et al. [4] implemented a consolidated survey of ASP and proposed salient research opportunities.
Two-stage ASP, which consists of a fabrication stage and an assembly stage, has been widely studied, and various methods such as exact algorithms, heuristics and meta-heuristics have been used to solve it. Since meta-heuristics perform better than exact algorithms ([5,6]) on large-scale scheduling problems and often produce better results than heuristics, they have become the main approach for solving two-stage ASP; examples include the genetic algorithm (GA [7,8]), tabu search (TS [9]), particle swarm optimization (PSO [9]), the grey wolf optimizer [10], differential evolution (DE [11]) and the imperialist competitive algorithm (ICA [12]).
However, real-life assembly production typically consists of three sequential stages: fabrication, transportation and assembly. Ignoring the collection and transfer of parts or components is unreasonable, so it is necessary to deal with a three-stage ASP with a transportation stage between the fabrication stage and the assembly stage.
The related results on three-stage ASP are limited. Christos and George [13] first handled the problem and showed that it is NP-hard. Hatami et al. [14] presented a mathematical model, a TS and a simulated annealing (SA) algorithm for the problem with sequence-dependent setup times (SDST). Maleki-Darounkolaei et al. [15] proposed an SA-based meta-heuristic for the problem with SDST and blocking times. Maleki-Darounkolaei and Seyedi [16] developed a variable neighborhood search (VNS) algorithm and a well-known SA for the same problem. Shoaardebili and Fattahi [17] provided two multi-objective meta-heuristics based on SA and GA to solve the problem with SDST and machine availability. For three-stage ASP with a DPm→1 layout, in which m dedicated parallel machines exist at the fabrication stage and one assembly machine at the assembly stage, Komaki et al. [18] and Campos et al. [19] presented an improved discrete cuckoo optimization algorithm and a general VNS heuristic, respectively.
With the further development of economic globalization, production has shifted from a single factory to multiple factories, and distributed scheduling in multiple factories has attracted much attention [20,21,22,23,24,25]. DASP is the extended version of ASP in multi-factory environments, and a number of works have been obtained on DASP with various processing constraints. Some constructive heuristics and meta-heuristics have been developed for DASP with the no-idle constraint [26,27,28,29,30]. Gonzalez-Neira et al. [31] studied a biased-randomized simheuristic for the distributed assembly permutation flowshop problem with stochastic processing times. Li et al. [32] developed a fuzzy distributed assembly flow shop scheduling problem and presented a novel ICA with empire cooperation. Shao and Shao [33] investigated a distributed assembly blocking flowshop scheduling problem and proposed a constructive heuristic algorithm and a product-based insertion process. They also designed a constructive heuristic and a water wave optimization algorithm with problem-specific knowledge to solve the same problem [34]. Yang and Xu [35] dealt with DASP with flexible assembly and batch delivery and presented seven algorithms using four heuristics, a VNS and two iterated greedy (IG) algorithms. Yang et al. [36] proposed a scatter search-based memetic algorithm to solve the distributed assembly permutation flowshop scheduling problem with no-wait, no-idle and due date constraints. Zhang et al. [37] studied a matrix-cube-based estimation of distribution algorithm to address the energy-efficient distributed assembly permutation flow-shop scheduling problem.
DASP with setup times is also often considered. Song and Lin [38] presented a genetic programming hyper-heuristic algorithm, and Hatami et al. [39] proposed two constructive heuristics, a VNS and an IG, for the problem with SDST and makespan. Regarding DASP with a DPm→1 layout and setup times, Xiong et al. [40] developed a hybrid GA with reduced VNS and a hybrid discrete DE with reduced VNS. Deng et al. [41] presented a mixed integer linear programming model and a competitive memetic algorithm. Zhang and Xing [42] proposed a memetic social spider optimization algorithm by adopting two improvement techniques: a problem-specific local search and a self-adaptive restart strategy. Lei et al. [43] designed a cooperated teaching-learning-based optimization algorithm with class cooperation.
As stated above, DASP with various processing constraints, such as no-idle and setup times, has been considered; however, some constraints, such as factory eligibility, are seldom investigated. Factory eligibility means that not all factories are eligible for each product; that is, at least one product cannot be produced by all factories. It is the extended version of machine eligibility [44,45,46] and often exists in real-life multi-factory production environments. For example, a large Chinese electronic display company consists of several factories located in different cities, and some products cannot be manufactured in all of them. Qin et al. [46] studied an integrated production and distribution scheduling problem with factory eligibility and third-party logistics in hybrid flowshops and proposed three heuristics and an adaptive human-learning-based GA; however, DASP with factory eligibility has hardly been investigated, and DASP with factory eligibility combined with other constraints such as setup times has received even less attention. In the real world, multiple factories, factory eligibility and setup times often exist simultaneously, and considering them together yields schedules of high application value; thus, it is necessary to deal with DASP with factory eligibility and setup times.
In recent years, the integration of reinforcement learning (RL) with meta-heuristics has become a new topic, and some results have been produced for production scheduling. Chen et al. [47] solved flexible job shop scheduling by a self-learning GA with a Q-learning algorithm used to adaptively adjust key parameters of the GA. Cao et al. [48] presented a cuckoo search (CS) with RL and surrogate modeling for a semiconductor final testing scheduling problem with multi-resource constraints. Cao et al. [49] developed a knowledge-based CS whose knowledge base is built on an RL algorithm for flexible job shop scheduling with sequencing flexibility; in these two papers, the parameters of CS are also adjusted by RL. Oztop et al. [50] dealt with a no-idle flowshop scheduling problem by a novel general VNS with a Q-learning algorithm used to determine the parameters of the VNS. Ma and Zhang [51] provided an improved ABC algorithm based on a Q-learning algorithm. Lin et al. [52] applied a Q-learning-based hyper-heuristic (QHH) algorithm to solve a semiconductor final testing scheduling problem; in QHH, a Q-learning algorithm autonomously selects a heuristic from a heuristic set. Karimi-Mamaghan et al. [53] proposed an efficient IG algorithm for the permutation flowshop scheduling problem that adaptively selects perturbation operators using a Q-learning algorithm. These integrations of RL and meta-heuristics are mainly used to adaptively adjust parameter settings or to select a search operator [54,55]; as a result, the performance of the meta-heuristic can be improved, so adding RL to a meta-heuristic is an effective way to solve scheduling problems such as DASP with factory eligibility.
As shown above, meta-heuristics including GA, PSO and VNS are frequently applied to solve ASP and DASP. ABC has been successfully applied to various production scheduling problems in a single factory [56,57,58,59,60] and in multiple factories [61,62,63,64,65]; however, it has seldom been used to solve DASP. Compared with meta-heuristics such as GA, ABC is simple and easy to implement; moreover, it has been applied successfully to single-factory and distributed scheduling [64,65,66,67] with permutation-based representations, and the solution of DASP is also represented as a permutation of products, so ABC is suitable for solving DASP. In addition, an RL algorithm can be integrated easily with ABC because of these features, and the performance of ABC can thereby be improved effectively. It can thus be concluded that it is beneficial to apply ABC, integrated with RL, to solve DASP [68].
In this study, the transportation stage, factory eligibility and setup times are adopted in a distributed three-stage ASP, and an effective path is given to integrate the Q-learning algorithm and ABC. The main contributions can be summarized as follows. (1) A distributed three-stage ASP with a DPm→1 layout, factory eligibility and setup times is considered. (2) A Q-learning-based artificial bee colony (QABC) algorithm is proposed to minimize total tardiness. A Q-learning algorithm is implemented with eight states based on population quality evaluation, eight actions defined by global search and neighborhood search, a new reward and an adaptive ε-greedy selection. Unlike previous works [47,48,49,50], the Q-learning algorithm is applied to dynamically select a search operator. Population division, an employed bee phase with adaptive migration and a new scout phase based on a modified restart strategy are also added. (3) Extensive experiments are conducted to test the performance of QABC by comparing it with other methods from the literature. Computational results demonstrate that the new strategies, including Q-learning, are effective and efficient, and that QABC provides promising results for the considered problem.
The remainder of the paper is organized as follows. The problem description is given in Section 2, followed by an introduction to ABC and Q-learning in Section 3. Section 4 presents the proposed QABC. Numerical experiments on QABC are reported in Section 5, and the conclusions and some topics for future research are summarized in the final section.

2. Problem Description

The distributed three-stage ASP with a DPm→1 layout, factory eligibility and setup times is described as follows. Notations used for this problem are shown in Table 1.
There are n products and F factories in a factory set 𝓕. Factory eligibility means that each product i has an available factory set F_i, F_i ⊆ 𝓕. Each factory f has m dedicated parallel machines M_1f, M_2f, …, M_mf for fabrication, a transportation machine TM_f and an assembly machine AM_f. TM_f works only within factory f and is assumed to have sufficient capacity, so all components of any product can be transferred at one time. In a transportation, TM_f moves the components of just one product i from the fabrication machine of the last finished component to AM_f. All components of each product are transported by TM_f once.
Each product has m components. When product i is allocated to factory f ∈ F_i, its m components are first processed on M_1f, M_2f, …, M_mf at the fabrication stage; they are then collected by TM_f and transferred to AM_f; finally, the product is obtained by assembling all of its components.
Setup times are anticipatory and can start when a machine is available; they are required at all three stages. For product i transferred by TM_f, the setup time stt_if is used to load and unload product i.
Factory eligibility indicates that not all factories are eligible for each product; that is, at least one product i has a set F_i ⊂ 𝓕.
All products are available at time 0; each machine can fabricate, transport or assemble at most one product at a time; each product can be fabricated, transported or assembled on at most one machine at a time; no interruptions or breakdowns are considered; once a product is assigned to a factory, it cannot be transferred to another factory.
The problem can be divided into a factory assignment sub-problem and a scheduling sub-problem. The two sub-problems are strongly coupled: factory assignment notably affects the results of the scheduling sub-problem, and optimal solutions can be obtained only after the solutions to the two sub-problems are effectively combined.
The goal of the problem is to minimize total tardiness TT when all constraints are met:

$$TT = \sum_{i=1}^{n} T_i \qquad (1)$$

where T_i is the tardiness of product i.
An illustrative example with six products (n = 6), three factories (F = 3) and three machines (m = 3) at the fabrication stage of each factory is shown in Table 2. For factory set 𝓕 = {1, 2, 3}, product i can be produced by any factory in F_i ⊆ 𝓕, with F_1 = {2}, F_2 = {1, 3}, F_3 = {1, 2}, F_4 = {1, 2, 3}, F_5 = {3}, F_6 = {1, 2, 3}; in Table 2, "—" indicates that a product cannot be processed in the corresponding factory. For example, since F_1 = {2}, product 1 cannot be assigned to factories 1 and 3, so pt_112 = 31, pt_122 = 26, and so on. A Gantt chart of a schedule of the example is shown in Figure 1, in which T_1 = 6, T_2 = 0, T_3 = 37, T_4 = 0, T_5 = 0 and T_6 = 34; the total tardiness of factories 1, 2 and 3 is 37, 6 and 34, respectively, and the corresponding TT is 77.

3. Introduction to ABC and Q-Learning

In this study, ABC is integrated with an RL algorithm named Q-learning; thus, both ABC and the Q-learning algorithm are introduced below.

3.1. ABC

In ABC, a food source represents a feasible solution to the problem, and a bee acts as a search agent. All bees are categorized into three groups: employed bees, onlooker bees and scouts. In general, an employed bee exploits a food source, an onlooker bee waits in the hive and decides which food source to choose, and a scout carries out a random search for new food sources.
ABC begins with a randomly generated initial population P of N solutions, and then three phases, called the employed bee phase, onlooker bee phase and scout phase, are executed sequentially.
In the employed bee phase, each employed bee produces a candidate source x_b′ from x_b = (x_b1, x_b2, …, x_bD), x_b ∈ P, by

$$x'_{b\omega} = x_{b\omega} + \phi\,(x_{b\omega} - x_{c\omega}) \qquad (2)$$

where D is the number of dimensions, φ is a real random number in the range [−1, 1], and x_c ∈ P is a randomly selected solution, b, c ∈ {1, 2, …, N}, b ≠ c, ω ∈ {1, 2, …, D}.
A greedy selection is then applied: if fit(x_b′) < fit(x_b), then x_b′ substitutes for x_b, where fit(x_b) denotes the fitness of x_b.
In the onlooker bee phase, each onlooker bee chooses a food source by roulette selection based on the probability prob_b:

$$prob_b = \frac{fit(x_b)}{\sum_{v=1}^{N} fit(x_v)} \qquad (3)$$

Once an onlooker bee selects a food source x_b, a new solution x_b′ is obtained, and the above greedy selection is applied to decide whether x_b is replaced with x_b′.
In the above two phases, a counter trial_b is maintained for each x_b. Initially, trial_b = 0. If the newly obtained x_b′ fails to update x_b, then trial_b = trial_b + 1; otherwise, trial_b = 0.
In the scout phase, if trial_b of a food source exceeds a threshold Limit, the corresponding employed bee turns into a scout, which randomly produces a solution to substitute for the food source.
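To make the three phases concrete, the following minimal Python sketch runs the canonical continuous ABC described above on a toy objective. The paper's implementation is in C and operates on the two-string encoding of Section 4; the sphere function and all parameter values here are illustrative assumptions only.

```python
import random

# Minimal sketch of the canonical ABC loop (continuous minimization).
# The sphere objective, N, D, LIMIT and MAX_GEN are illustrative assumptions.
def sphere(x):
    return sum(v * v for v in x)

N, D, LIMIT, MAX_GEN = 20, 10, 50, 200
pop = [[random.uniform(-5, 5) for _ in range(D)] for _ in range(N)]
fit = [sphere(x) for x in pop]
trial = [0] * N

def mutate(b):
    # Equation (2): perturb one dimension of x_b relative to a random x_c.
    c = random.choice([i for i in range(N) if i != b])
    w = random.randrange(D)
    cand = pop[b][:]
    cand[w] += random.uniform(-1, 1) * (pop[b][w] - pop[c][w])
    return cand

def greedy(b, cand):
    # Greedy selection: keep the candidate only if it improves fitness.
    f = sphere(cand)
    if f < fit[b]:
        pop[b], fit[b], trial[b] = cand, f, 0
    else:
        trial[b] += 1

for _ in range(MAX_GEN):
    for b in range(N):                      # employed bee phase
        greedy(b, mutate(b))
    # Roulette weights: 1/(1+f) turns minimization into maximization of fitness.
    total = sum(1.0 / (1.0 + f) for f in fit)
    for _ in range(N):                      # onlooker bee phase
        r, acc = random.uniform(0, total), 0.0
        for b in range(N):
            acc += 1.0 / (1.0 + fit[b])
            if acc >= r:
                greedy(b, mutate(b))
                break
    for b in range(N):                      # scout phase
        if trial[b] > LIMIT:
            pop[b] = [random.uniform(-5, 5) for _ in range(D)]
            fit[b], trial[b] = sphere(pop[b]), 0

print(min(fit))
```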

3.2. Introduction to Q-Learning Algorithm

RL is a learning approach that can be applied to a wide variety of complex problems. RL has been extensively considered and has been successfully applied to solve many problems [47,48,49,50,51,69,70].
The Q-learning algorithm [71] is the most commonly used model-free RL algorithm. It enables an agent in a Markov environment to learn to select optimal actions from its experience. The main components of Q-learning include a learning agent, an environment, states, actions and rewards; an illustration is shown in Figure 2. The Q-learning algorithm has a simple structure and is easily implemented. It has been successfully integrated with meta-heuristics such as GA, CS and QHH for production scheduling [47,48,52]. Its simplest form is defined by

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\left[r_{t+1} + \gamma \max_a Q(s_{t+1}, a) - Q(s_t, a_t)\right] \qquad (4)$$

where α is the learning rate, γ is the discount factor, r_{t+1} is the reward received from the environment by taking action a_t in state s_t, and max_a Q(s_{t+1}, a) is the biggest Q value in the Q-table at state s_{t+1}.
Action selection is performed based on the Q-table. Initially, all elements of the Q-table are zero, which means that the agent has no learning experience. ε-greedy selection is often used and is expressed as follows: if a random number rand < ε, then randomly select an action a; otherwise, select the action a that maximizes the Q value, that is, a = argmax_a Q(s_t, a).
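As a concrete illustration, the sketch below implements the tabular update of Equation (4) together with the basic ε-greedy rule in Python. The state and action counts are chosen to match the eight states and eight actions used later in QABC; the parameter values are placeholders.

```python
import random

N_STATES, N_ACTIONS = 8, 8          # matches the 8 states / 8 actions used in QABC
ALPHA, GAMMA, EPSILON = 0.1, 0.8, 0.9

# Q-table initialized to zero: the agent starts with no experience.
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

def select_action(state):
    # epsilon-greedy: explore with probability epsilon, otherwise exploit.
    if random.random() < EPSILON:
        return random.randrange(N_ACTIONS)
    return max(range(N_ACTIONS), key=lambda a: Q[state][a])

def update(state, action, reward, next_state):
    # Equation (4): one-step Q-learning update.
    best_next = max(Q[next_state])
    Q[state][action] += ALPHA * (reward + GAMMA * best_next - Q[state][action])
```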

4. QABC for Distributed Three-Stage ASP with Factory Eligibility and Setup Times

This study contributes an effective integration of the Q-learning algorithm and ABC to implement dynamic selection of the search operator. Moreover, the population is divided into two employed bee swarms and an onlooker bee swarm, and a new scout phase based on a modified restart strategy is also applied. The details of QABC are given below.

4.1. Representation and Search Operators

4.1.1. Solution Representation

Because the problem has two sub-problems, a two-string representation is used, in which a solution is denoted by a factory assignment string [θ_1, θ_2, …, θ_n] and a scheduling string [q_1, q_2, …, q_n], where θ_i ∈ F_i is the factory allocated to product i and q_i is a real number in [0, 1] corresponding to product i.
The scheduling string is a random-key string: suppose that products i, i+1, …, j are manufactured in the same factory, that is, θ_i = θ_{i+1} = … = θ_j; the product permutation is then determined by sorting all q_l, l ∈ [i, j], i < j, in ascending order. If q_i = q_j, then product i is placed before product j because i is smaller than j.
The decoding procedure is shown in Algorithm 1, and a compact code sketch of the decoding logic is given after Algorithm 1. For the example in Table 2, a possible solution is composed of the factory assignment string [2, 3, 1, 2, 3, 1] and the scheduling string [0.98, 0.43, 0.32, 0.21, 0.72, 0.67]. For factory 1, products 3 and 6 are assigned to it according to the factory assignment string, and their permutation [3, 6] is obtained because q_3 < q_6; that is, product 3 starts, followed by product 6. Take product 3 as an example: its three components are first processed on M_1f, M_2f and M_3f, and then they are collected by TM_f and transferred to AM_f for assembly. The corresponding schedule is illustrated in Figure 1.
Algorithm 1: Decoding procedure
Input: factory assignment string [ θ 1 , θ 2 , ⋯, θ n ] ; scheduling string [ q 1 , q 2 , ⋯, q n ]
Output: Permutations of all factories
 1: for f = 1 to F do
 2:    Find all products allocated to factory f according to factory assignment string
 3:    Determine permutation of all products in factory f by sorting q l in ascending order
 4:    Start with the first product on the permutation, handle the fabrication of all of its components, transfer all of its components to A M f and assemble them.
 5: end for
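The assignment-and-ordering part of Algorithm 1 can be expressed compactly. The Python sketch below is a simplified illustration that omits the timing computations of the fabrication, transportation and assembly stages; the function and variable names are our own.

```python
def decode(theta, q):
    """Decode a two-string solution into per-factory product permutations.

    theta: factory assignment string, theta[i] = factory of product i
    q:     scheduling string of random keys in [0, 1]
    """
    n = len(theta)
    factories = {}
    for i in range(n):
        factories.setdefault(theta[i], []).append(i)
    # Sort each factory's products by random key; ties broken by product index.
    return {f: sorted(prods, key=lambda i: (q[i], i)) for f, prods in factories.items()}

# Example from Table 2 (products numbered 0..5 here instead of 1..6):
theta = [2, 3, 1, 2, 3, 1]
q = [0.98, 0.43, 0.32, 0.21, 0.72, 0.67]
print(decode(theta, q))   # factory 1 -> [2, 5], i.e., products 3 and 6 in 1-based numbering
```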

4.1.2. Search Operators

In this study, a search operator is made up of a global search between two solutions, reassignment, inversion and a neighborhood search.
The global search between solutions x, y is shown below. A solution z is produced by a uniform crossover of both the factory assignment strings and the scheduling strings of x, y, and greedy selection is applied: if z is better than x, then x is replaced with z. Figure 3 describes the process of a uniform crossover of the above two strings. In Figure 3a, a string Θ of random numbers [0.67, 0.78, 0.13, 0.69, 0.28, 0.91] is generated, and a new factory assignment string [2, 1, 1, 3, 3, 1] is then produced according to the elements of Θ: for example, the first element is 0.67 > 0.5, so the first gene of z is taken from y; the third element is 0.13 ≤ 0.5, so the third gene of z is taken from x.
Total tardiness is related to each factory, so the uniform crossover acts on both strings of x, y simultaneously.
The reassignment operator acts on the factory assignment string of a solution x in the following way: randomly select β = ⌊μ × n⌉ genes and replace each chosen gene θ_l by a randomly decided factory in F_l; a new solution z is obtained, and greedy selection is executed, where μ is a random decimal in the range (0, 1] and ⌊u⌉ indicates the closest integer to u. An example of the reassignment operator is shown in Figure 4. If μ = 0.45, then β = 3, and three products 2, 4 and 6 are randomly selected. θ_2 = 1 is obtained by choosing randomly from F_2; θ_4 = 3 and θ_6 = 2 are generated similarly.
Inversion is described as follows: for the scheduling string of a solution x, randomly decide τ_1, τ_2 with τ_1 < τ_2 and invert the genes between positions τ_1 and τ_2. A new solution z is produced, and greedy selection is applied.
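Under the definitions above, the three string operators can be sketched as follows in Python; the fitness evaluation and greedy selection are left out, and the closest-integer rounding of β follows the stated rule.

```python
import random

def uniform_crossover(x_theta, x_q, y_theta, y_q):
    # Global search: each gene of both strings is taken from y when rand > 0.5, else from x.
    z_theta, z_q = [], []
    for i in range(len(x_theta)):
        if random.random() > 0.5:
            z_theta.append(y_theta[i]); z_q.append(y_q[i])
        else:
            z_theta.append(x_theta[i]); z_q.append(x_q[i])
    return z_theta, z_q

def reassignment(theta, eligible):
    # eligible[i] is the available factory set F_i of product i.
    n = len(theta)
    beta = max(1, round(random.uniform(0, 1) * n))   # closest integer to mu * n
    z = theta[:]
    for i in random.sample(range(n), beta):
        z[i] = random.choice(list(eligible[i]))
    return z

def inversion(q):
    # Invert the random keys between two random positions tau1 < tau2.
    t1, t2 = sorted(random.sample(range(len(q)), 2))
    z = q[:]
    z[t1:t2 + 1] = reversed(z[t1:t2 + 1])
    return z
```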
Eight neighborhood structures N_1–N_8 are used to construct the neighborhood searches. The factory with the maximum total tardiness is defined as the critical factory f*. Positions are decided based on the product permutation of a factory.
Neighborhood structure N_1 is described below: stochastically select a product i from the critical factory f*, insert i into a randomly decided position of f*, and reassign the q_i of each product according to the new product permutation of f*. For the above solution of the example, the critical factory f* is 1; product 3 is inserted into the position of product 6, the new permutation is [6, 3], and so q_6 = 0.32 and q_3 = 0.67.
N_2 is obtained when a randomly chosen factory substitutes for f* in N_1. N_3 swaps two randomly selected products of f*. N_4 differs from N_3 in that a stochastically chosen factory is used.
N_5 acts on f* in the following way: a product i with T_i > 0 is randomly selected from f*; supposing that i is at position τ_1 of the product permutation of f*, insert i into a randomly decided position τ_2 < τ_1. N_6 is produced when a randomly selected factory is substituted for f* in N_5.
N_7 is shown below: randomly find a product i with T_i > 0 in f*, stochastically choose a factory f ∈ F_i, remove i from the critical factory and insert it into a randomly decided position of factory f. An example of N_7 is shown in Figure 5, in which f* = 2 and i = 4 with T_4 > 0 is selected stochastically, and θ_4 is replaced by factory 1, which is randomly chosen from F_4. N_8 follows the same pattern as N_2, N_4 and N_6: a randomly chosen factory substitutes for the critical factory f* in N_7.
The above neighborhood structures are proposed because of the following feature of the problem: a new position for product i in the critical factory f*, or a movement of product i from f* to another factory, is very likely to diminish total tardiness.
Seven neighborhood searches are constructed by different combinations of the neighborhood structures. NS_1 contains the four structures N_1, N_3, N_5, N_7 related to the critical factory f*. NS_2 consists of N_2, N_4, N_6, N_8. In NS_3, the six insertion-related structures N_1, N_2, N_5, N_6, N_7, N_8 are applied. NS_4 is composed of the two swap-based structures N_3, N_4. NS_5 is established by N_1, N_2, N_3 and N_4; N_5, N_6, N_7 and N_8 are used in NS_6; and NS_7 contains all eight structures for a comprehensive effect.
The procedure of each NS_φ is given in Algorithm 2. Seven search operators are defined, each of which is composed of a global search, reassignment, inversion and one NS_φ, φ ∈ {1, 2, …, 7}. w_φ is the number of neighborhood structures in NS_φ: w_1 = 4, w_2 = 4, w_3 = 6, w_4 = 2, w_5 = 4, w_6 = 4, w_7 = 8.
Algorithm 2: NS_φ
Input: x, R_1
Output: updated solution x
 1: let Iter = 0
 2: while Iter < R_1 do
 3:     randomly decide a usage sequence of all neighborhood structures of NS_φ
 4:     suppose that the obtained sequence is g_1, g_2, …, g_{w_φ}
 5:     let h = 1
 6:     while h ≤ w_φ do
 7:         produce a new solution z ∈ N_{g_h}(x)
 8:         if fit(z) < fit(x) then
 9:             x = z
 10:         else
 11:             h = h + 1
 12:         end if
 13:         Iter = Iter + 1
 14:     end while
 15: end while
 16: return updated solution x

4.2. Q-Learning Algorithm

In this study, the Q-learning algorithm is integrated with ABC to dynamically select the search operator. To this end, population evaluation results are used to describe the state s_t, and the search operators described above are used to define the actions a_t; as a result, action selection yields a dynamic selection of the search operator.

4.2.1. State and Action

Three indices are used to evaluate population quality: trial* of the elite solution x*, the evolution quality Evo_t of population P and the diversity index D_t. Initially, trial* = 0; if the elite solution x* is updated, then trial* = 0; otherwise, trial* = trial* + 1, where trial* is defined similarly to trial_b in Section 3.1.

$$Evo_t = \frac{\sum_{x_i \in P} u_{it}}{N} \qquad (5)$$

$$D_t = \frac{1}{N} \sum_{x_i \in P} \frac{v_{it}}{N-1} \qquad (6)$$

where u_it = 1 if trial_i = 0 on generation t and u_it = 0 otherwise, and v_it = |{x_j | j ≠ i, fit(x_j) ≠ fit(x_i)}| on generation t.
Eight states are defined by using the three indices, as shown in Table 3. trial* = 0 means that the elite solution x* is updated on generation t. The elite solution x* never deteriorates because of greedy selection, so trial* is 0 or positive. μ_1, μ_2 are integers with μ_2 > μ_1; μ_1 = 20 and μ_2 = 50 are obtained by experiments. For Evo_t and D_t, two cases exist: Evo_t ≥ D_t and Evo_t < D_t.
For the instance 250 × 5 × 20 described in Section 5, Figure 6 shows the percentage of occurrence of the four cases of trial* and the two cases of Evo_t and D_t in the whole search process of QABC, and Figure 7 presents a pie chart of the percentages of the eight states. All states occur in the search process of QABC, so it is reasonable to define eight states.
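Since the eight states arise from crossing the four cases of trial* with the two cases of Evo_t versus D_t, a state index can be computed as in the sketch below. The exact partition of trial* by the thresholds μ_1 and μ_2 is our reading of Table 3 and should be treated as an assumption.

```python
MU1, MU2 = 20, 50   # thresholds obtained by experiments in the paper

def encode_state(trial_star, evo_t, d_t):
    # Four cases of trial*: 0, (0, MU1], (MU1, MU2], > MU2 (assumed partition).
    if trial_star == 0:
        row = 0
    elif trial_star <= MU1:
        row = 1
    elif trial_star <= MU2:
        row = 2
    else:
        row = 3
    # Two cases of Evo_t vs D_t.
    col = 0 if evo_t >= d_t else 1
    return row * 2 + col    # state index in {0, ..., 7}
```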
In QABC, population P is divided into two employed bee swarms EB_1, EB_2 and an onlooker bee swarm OB. Initially, EB_1, EB_2 and OB are empty. Randomly select β_1 × N solutions from population P and add them to EB_1, then stochastically choose β_2 × N solutions from the remaining part of P and include them in EB_2; finally, OB consists of the remaining solutions of P. β_1, β_2 ∈ [0.25, 0.4] based on experiments.
The seven search operators are directly defined as actions a_1, a_2, …, a_7: a_φ is composed of a global search, reassignment, inversion and NS_φ. Once an action a_φ, φ ≤ 7, is chosen, it acts on EB_1, EB_2 and OB. Action a_8 is defined by randomly selecting a search operator a_φ, φ ≤ 7, for each of EB_1, EB_2 and OB, so when a_8 is selected, EB_1, EB_2 and OB may apply different search operators.
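To make the action semantics concrete, the fragment below sketches how a selected action index could be dispatched to the three swarms; apply_operator is a hypothetical helper standing in for the composite operator a_φ (global search, reassignment, inversion and NS_φ).

```python
import random

def dispatch_action(action, swarms, apply_operator):
    # Actions a_1..a_7 (indices 0..6): one operator applied to all three swarms.
    # Action a_8 (index 7): an operator drawn independently for each swarm.
    for swarm in swarms:             # swarms = [EB1, EB2, OB]
        phi = action if action < 7 else random.randrange(7)
        apply_operator(phi, swarm)
```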

4.2.2. Reward and Adaptive Action Selection

The elite solution x* is the output of QABC, and its improvement is very important for QABC. When trial* = 0, that is, x* is updated, a positive reward should be given; moreover, the bigger Evo_t + D_t + Ime_t is, the bigger the reward should be. When trial* > 0, the elite solution is kept invariant; in this case, a negative reward should be given. Based on the above analyses, the reward r_{t+1} is defined by
$$r_{t+1} = \begin{cases} e^{(4 \times Ime_{t+1} + Evo_{t+1} + D_{t+1})/3} & \text{if } trial^* = 0 \\ -e^{\,Evo_{t+1} + D_{t+1}} & \text{otherwise} \end{cases} \qquad (7)$$
Let A and B denote fit(x*) on generations t and t + 1, respectively; then

$$Ime_{t+1} = (A - B)/A \qquad (8)$$
In ε-greedy action selection, the learner explores with probability ε and exploits the historical experience with probability 1 − ε by choosing the action with the highest Q value. ε plays a key role in the trade-off between exploration and exploitation, and some adaptive methods have been used [72,73].
In this study, a new adaptive ε-greedy action selection is proposed, in which ε is adaptively changed according to trial* and the currently selected action a_t:
$$\varepsilon \leftarrow \begin{cases} \max\{\varepsilon_0,\ \varepsilon(1-\varepsilon)\} & \text{if } (trial^* = 0 \text{ and } a_t = \arg\max_a Q(s_t, a)) \text{ or } (trial^* > 0 \text{ and } a_t \neq \arg\max_a Q(s_t, a)) \\ \min\{\varepsilon(1+\varepsilon),\ 1-\varepsilon_0\} & \text{if } (trial^* > 0 \text{ and } a_t = \arg\max_a Q(s_t, a)) \text{ or } (trial^* = 0 \text{ and } a_t \neq \arg\max_a Q(s_t, a)) \end{cases} \qquad (9)$$
where ε_0 = 0.01. Obviously, ε ∈ [0.01, 0.99].
If trial* = 0 and a_t = argmax_a Q(s_t, a), that is, the action with the biggest Q(s_t, a) leads to a new x*, then ε should be reduced to enlarge the probability of exploitation; if trial* = 0 and a_t ≠ argmax_a Q(s_t, a), that is, a randomly chosen a_t results in a new x*, then ε should be increased for a larger probability of exploration. The other two cases can be explained in the same way.
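A compact Python rendering of this adaptation rule, under our reconstruction of Equation (9), might look as follows; the update factors are taken from that reconstruction and should be treated as assumptions.

```python
EPS0 = 0.01

def adapt_epsilon(eps, elite_updated, took_greedy_action):
    # Reduce eps (more exploitation) when the outcome confirms the current policy:
    # the greedy action improved the elite, or a random action failed to improve it.
    if elite_updated == took_greedy_action:
        return max(EPS0, eps * (1 - eps))
    # Otherwise increase eps (more exploration); eps stays within [0.01, 0.99].
    return min(eps * (1 + eps), 1 - EPS0)
```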
For the instance 250 × 5 × 20, Figure 8 shows the updating processes of state and action; the stopping condition is reached at t = 323, and Figure 8 describes the changes of state and action in the whole process of the Q-learning algorithm. It can be found that population P can keep a state for many generations; for example, the population is in state 6 between generations 162 and 183. Moreover, the action often changes among a_1 to a_8. An example of the update process of the Q-table is given in Table 4. Suppose s_t = 8, s_{t+1} = 2, a_t = 6, α = 0.1, γ = 0.8 and r_{t+1} = 1.38 according to Equation (7). As shown in Table 4(a), Q(8, 2) = 0.879 before updating; after the Q-table is updated by Equation (4), Q(8, 2) = 0.653 is obtained, as shown in Table 4(b).
The selection of a search operator also exists in hyper-heuristics, in which a low-level heuristic (LLH) is often selected by a random method, a choice function or tabu search; however, such selection is often time-consuming. Lin et al. [52] applied a Q-learning algorithm to select an LLH from a set of LLHs. Our Q-learning algorithm differs from the work of Lin et al. [52] in two aspects: (1) fitness proportion is used to depict the state in [52], while population evaluation is applied to describe the state in this study; (2) Lin et al. [52] employed the Q-learning algorithm as the high-level strategy, which is a part of the hyper-heuristic, whereas in QABC the Q-learning algorithm is only adopted to select the search operator and does not substitute any phase of ABC, so the three phases still exist and are not replaced with Q-learning.

4.3. Three Phases of QABC

On each generation t, two employed bee swarms EB_1, EB_2 are obtained by population division, and the employed bee phase with adaptive migration between them is shown in Algorithm 3, where mig is an integer counter and δ is a migration parameter.
If the migration condition is met, the worst solution of EB_o, o = 1, 2, is replaced with the best solution of EB_{3−o}; as a result, the worst solutions are deleted, and the best solutions are reproduced.
A simple tournament selection is applied in the onlooker bee phase, and a detailed description is shown in Algorithm 4.
As shown above, when an action a_φ, φ ≤ 7, is selected by the Q-learning algorithm, the corresponding search operator, composed of a global search, reassignment, inversion and NS_φ, is used for EB_1, EB_2 and OB; when action a_8 is chosen, a search operator from a_1, a_2, …, a_7 is randomly selected for each of EB_1, EB_2 and OB.
In general, when trial_b > Limit, the employed bee of x_b becomes a scout. In this study, when the condition on the elite solution x* is met, a new scout phase based on a modified restart strategy [74] is executed; this strategy has been proven capable of avoiding premature convergence. The new scout phase is described in Algorithm 5, where el* is an integer counter.
In Algorithm 5, when global search, reassignment or inversion is performed on x_b, the newly obtained solution directly substitutes for x_b; that is, greedy selection is not used in the scout phase.
Algorithm 3: Employed bee phase
Input: EB_1, EB_2
 1: for o = 1 to 2 do
 2:     for each solution x ∈ EB_o do
 3:         execute the chosen search operator of EB_o on x
 4:     end for
 5:     update the best and worst solutions of EB_o
 6: end for
 7: if mig > δ then
 8:     for o = 1 to 2 do
 9:         replace the worst solution of EB_o with the best solution of EB_{3−o}
 10:     end for
 11:     mig = 0
 12: else
 13:     mig = mig + 1
 14: end if
Algorithm 4: Onlooker bee phase
 1: for each solution x ∈ OB do
 2:     randomly select v ∈ EB_1 and y ∈ EB_2
 3:     if fit(v) < fit(y) then
 4:         x′ = v
 5:     else
 6:         x′ = y
 7:     end if
 8:     if fit(x′) < fit(x) then
 9:         x = x′
 10:     end if
 11:     execute the chosen search operator of OB on x
 12: end for
Algorithm 5: Scout phase
Input: el*, Limit
 1: if el* > Limit then
 2:     sort all solutions of P in ascending order of TT
 3:     construct five sets ψ_ϱ, ϱ ∈ {1, 2, …, 5}, where ψ_ϱ = {0.2(ϱ−1)N, 0.2(ϱ−1)N + 1, …, 0.2ϱN}
 4:     for each solution x_b, b ∈ ψ_2 do
 5:         randomly select a solution x_ϱ, ϱ ∈ ψ_1
 6:         execute global search between x_b and x_ϱ
 7:     end for
 8:     for each solution x_b, b ∈ ψ_3 do
 9:         apply the reassignment operator on x_b
 10:     end for
 11:     for each solution x_b, b ∈ ψ_4 do
 12:         perform the inversion operator on x_b
 13:     end for
 14:     for each solution x_b, b ∈ ψ_5 do
 15:         randomly generate a solution
 16:     end for
 17:     el* = 0
 18: else
 19:     el* = el* + 1
 20: end if
 21: for each solution x_b ∈ P do
 22:     update x* if x_b is better than x*
 23: end for

4.4. Algorithm Description

Algorithm 6 gives the detailed steps of QABC, and Figure 9 describes its flow chart, in which t indicates the number of generations, and it also denotes the number of iterations of the Q-learning algorithm.
Algorithm 6: QABC
 1: let mig, el*, trial_b and trial* be 0, t = 1
 2: randomly produce an initial population P
 3: initialize the Q-table
 4: while the termination condition is not met do
 5:     divide P into EB_1, EB_2 and OB
 6:     select action a_t by the Q-learning algorithm
 7:     execute the employed bee phase by Algorithm 3
 8:     perform the onlooker bee phase by Algorithm 4
 9:     apply the scout phase by Algorithm 5
 10:     execute reinforcement search on x*
 11:     update state and Q-table
 12:     t = t + 1
 13: end while
The reinforcement search of the elite solution x* is described below. Repeat the following steps R_2 times: execute the global search between x* and y (y ∈ P, y ≠ x*), and then apply reassignment and inversion on x* sequentially; for each operator, when a new solution z is obtained, x* is updated if z is better than x*.
QABC has the following features: (1) the Q-learning algorithm is adopted with eight states based on population evaluation, eight actions and a new adaptive action selection strategy; (2) population P is divided into three swarms EB_1, EB_2, OB, and the Q-learning algorithm is used to dynamically select a search operator for these swarms; (3) the employed bee phase with adaptive migration and a new scout phase based on the modified restart strategy are implemented.
In the Q-learning algorithm, the eight actions mean that there are eight different search operators, one of which is dynamically chosen; that is, the three swarms can evolve with different operators. As a result, the exploration ability is intensified, and the possibility of falling into local optima diminishes greatly. Moreover, migration and restart can maintain high population diversity. These features may lead to good performance.

5. Computational Experiments

Extensive experiments were conducted to test the performance of QABC on the distributed three-stage ASP with a DPm→1 layout, factory eligibility and setup times. All experiments were coded in C using Code::Blocks 16.01 and run on a desktop computer with an Intel i5-10210 CPU (2.10 GHz) and 8 GB RAM.

5.1. Test Instances and Comparative Algorithms

A total of 92 instances are used, defined by F ∈ {2, 3, 4, 5, 6}, n ∈ {6, 10, 20, 50, 100, 200, 250, 300, 500} and m ∈ {3, 5, 10, 20}. For each instance, denoted as n × F × m: pt_ikf, tt_if, at_if ∈ [1, 100]; spt_ikf, stt_if, sat_if ∈ [1, 20]; and d_i ∈ [m × F × p̄t_i, n × p̄t_i], where

$$\bar{pt}_i = \frac{\sum_{f=1}^{F}\sum_{k=1}^{m}(pt_{ikf} + spt_{ikf})}{F \times m} + \frac{\sum_{f=1}^{F}(tt_{if} + at_{if} + stt_{if} + sat_{if})}{F}$$

The elements of F_i are randomly selected from 𝓕, and F_i contains at least one factory. The above times and due dates are integers and follow uniform distributions on the above intervals.
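Under the stated distributions, an instance generator is straightforward. The following Python sketch is illustrative only (the paper's experiments were coded in C), and all array names are our own; note that the due-date interval is non-empty only when n ≥ m × F.

```python
import random

def generate_instance(n, F, m):
    # Uniform integer data as described above; indices are 0-based here.
    pt  = [[[random.randint(1, 100) for _ in range(F)] for _ in range(m)] for _ in range(n)]
    spt = [[[random.randint(1, 20)  for _ in range(F)] for _ in range(m)] for _ in range(n)]
    tt  = [[random.randint(1, 100) for _ in range(F)] for _ in range(n)]
    at  = [[random.randint(1, 100) for _ in range(F)] for _ in range(n)]
    stt = [[random.randint(1, 20)  for _ in range(F)] for _ in range(n)]
    sat = [[random.randint(1, 20)  for _ in range(F)] for _ in range(n)]
    due, elig = [], []
    for i in range(n):
        # Mean processing load of product i over all factories and machines.
        fab = sum(pt[i][k][f] + spt[i][k][f] for k in range(m) for f in range(F)) / (F * m)
        rest = sum(tt[i][f] + at[i][f] + stt[i][f] + sat[i][f] for f in range(F)) / F
        p_bar = fab + rest
        lo, hi = int(m * F * p_bar), int(n * p_bar)   # non-empty only when n >= m * F
        due.append(random.randint(lo, hi))
        # Eligible factory set: a random non-empty subset of all factories.
        k = random.randint(1, F)
        elig.append(set(random.sample(range(F), k)))
    return pt, spt, tt, at, stt, sat, due, elig
```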
As stated above, distributed ASP with factory eligibility has not been considered before, so there are no existing algorithms tailored to the problem for comparison.
For the distributed heterogeneous flowshop scheduling problem, Chen et al. [75] presented a probability model-based memetic algorithm (PMMA) with search operators and a local intensification operator, Li et al. [64] proposed a discrete artificial bee colony (DABC) with neighborhood search operators, a new acceleration method and a population update method, and Meng and Pan [65] designed an enhanced artificial bee colony (NEABC) by using a collaboration mechanism and restart strategy.
PMMA [75], DABC [64] and NEABC [65] have been successfully applied to solve the above distributed flowshop scheduling problems; moreover, these algorithms can be directly used to solve the distributed three-stage ASP with factory eligibility after transportation and assembly are added to the decoding process, and thus they are chosen as comparative algorithms.
Two variants named ABC1 and ABC2 are constructed. ABC1 is obtained by removing the Q-learning algorithm from QABC. ABC2 is produced by further removing population division, migration, restart and reinforcement search from ABC1 and implementing the scout phase as in Section 3.1. When the Q-learning algorithm is removed, the search operator of P is fixed; we tested all seven search operators, and the two variants with a_1 performed better than those with the other operators.

5.2. Parameter Settings

In this study, the stopping condition is defined by CPU time. We found through experiments that QABC converges fully on all instances within 0.5 × n seconds; moreover, all comparative algorithms, ABC1 and ABC2, also converge fully within this time, so we set 0.5 × n seconds as the stopping condition for all algorithms.
With respect to the parameters of the Q-learning algorithm, we directly use an initial ε of 0.9 and a learning rate α = 0.1, following Wang et al. [76]. The remaining parameters of QABC, namely N, R_1, R_2, Limit, δ and the discount factor γ, are tuned by the Taguchi method [77] on instance 250 × 5 × 10. The levels of each parameter are shown in Table 5. The results for Avg and the S/N ratio are given in Figure 10, where Avg is the average value of the 10 elite solutions in 10 runs, Avg = (Σ_{g=1}^{10} elite_g)/10, elite_g represents the elite solution of the gth run, and the S/N ratio is defined as −10 log_10(Avg²).
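For example, assuming the smaller-the-better form reconstructed above, Avg = 1000 yields S/N = −10 log_10(10^6) = −60 dB, so a larger (less negative) S/N ratio corresponds to a smaller Avg:

```python
import math

def sn_ratio(avg):
    # Smaller-the-better S/N ratio used in the Taguchi analysis (reconstructed form).
    return -10 * math.log10(avg ** 2)

print(sn_ratio(1000.0))   # -> -60.0 dB
```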
As shown in Figure 10, when the levels of N, R_1, R_2, Limit, δ and γ are 2, 2, 2, 2, 3 and 3, respectively, QABC produces a smaller Avg and a bigger S/N ratio than with other combinations of levels, so the suggested settings are N = 150, R_1 = 40, R_2 = 50, Limit = 150, δ = 100 and γ = 0.8.
The parameters of ABC1 and ABC2 are directly taken from QABC. Except for the stopping condition, the parameters of PMMA, DABC and NEABC are chosen from [64,65,75]. We also found that these settings of the comparative algorithms result in better performance than other settings.

5.3. Results and Analyses

QABC is compared with ABC1, ABC2, PMMA, DABC and NEABC. Each algorithm is run 10 times on each instance. Table 6, Table 7 and Table 8 show the computational results, where Min indicates the smallest total tardiness in 10 runs, Min = min_{g=1,…,10} elite_g, and STD is the standard deviation of the 10 elite solutions, $STD = \sqrt{\sum_{g=1}^{10}(elite_g - Avg)^2/10}$. QA, A1, A2, PM, DA and NE denote QABC, ABC1, ABC2, PMMA, DABC and NEABC, respectively. Figure 11 displays the mean plot with a 95% confidence interval for all algorithms, and Figure 12 describes convergence curves for instances 200 × 5 × 5 and 500 × 6 × 5. Table 9 shows the results of paired-sample t-tests, in which t-test (A, B) judges whether algorithm A gives a better sample mean than B; at a significance level of 0.05, there is a significant difference between A and B in the statistical sense if the p-value is less than 0.05.
As shown in Table 6, Table 7 and Table 8, QABC performs significantly better than ABC1 on most instances: the Min of QABC is smaller than that of ABC1 by at least 10% on 31 instances, the Avg of QABC is less than that of ABC1 by at least 200 on more than 35 instances, and the STD of QABC is smaller than that of ABC1 on nearly all instances. Table 9 shows notable performance differences between QABC and ABC1 in a statistical sense, Figure 11 depicts the notable differences between the STD of the two algorithms, and Figure 12 reveals that QABC converges significantly better than ABC1.
It can be found from Table 6 that ABC1 produces a better Min than ABC2 on 54 of 92 instances. As shown in Table 7, the Avg of ABC1 is less than or equal to that of ABC2 on 84 of 92 instances. Table 8 shows that ABC2 performs better than ABC1 on STD on 64 instances. Figure 12 and Table 9 also reveal that ABC1 performs better than ABC2.
Although some new parameters such as δ and γ are introduced by the new strategies such as Q-learning and migration, the above analyses of QABC, ABC1 and ABC2 demonstrate that the Q-learning algorithm, migration and the new scout phase have genuinely positive impacts on the performance of QABC; thus, these new strategies are effective and reasonable.
As shown in Table 6, Table 7 and Table 8, QABC and PMMA converge to the same best solution on most instances with n < 100, and QABC never generates a worse Min than PMMA on instances with n ≥ 100; moreover, QABC produces Avg and STD smaller than or equal to those of PMMA on almost all instances, so QABC performs better than PMMA. The statistical results in Table 9 support this conclusion, and Figure 11 and Figure 12 show the performance differences between the two algorithms in STD and Min, respectively.
When QABC is compared with DABC, it can be seen from Table 6, Table 7 and Table 8 that QABC has a smaller Min than DABC on 80 instances, a smaller Avg on 85 instances and a smaller STD on 85 instances; moreover, the performance differences between QABC and DABC increase with n × F × m. The convergence curves in Figure 12 and the results in Table 9 demonstrate the performance difference in Min between QABC and DABC, the differences in Avg are validated by the statistical results in Table 9, and Figure 11 and Table 9 show that QABC significantly outperforms DABC in STD.
It can be concluded from Table 6, Table 7 and Table 8 that QABC performs significantly better than NEABC: QABC produces a smaller Min than NEABC by at least 20% on about 39 instances, a better Avg by at least 20% on more than 58 instances and a better or equal STD on nearly all instances. The same conclusion follows from Table 9; Figure 11 shows the significant difference in STD, and Figure 12 demonstrates the notable convergence advantage of QABC.
As stated above, the inclusion of the Q-learning algorithm, the migration between the two employed bee swarms and the modified restart strategy in the scout phase really improves the performance of QABC. The Q-learning algorithm results in a dynamic adjustment of the search operators in the employed bee phase and the onlooker bee phase; as a result, the search operator is not fixed but varies dynamically, and the exploration ability is improved. Migration leads to full use of the best solutions of EB_1 and EB_2, and the restart strategy makes the population evolve with higher diversity. These features lead to better search efficiency. Based on the above analyses, it can be concluded that QABC can effectively solve the distributed three-stage ASP with factory eligibility and setup times.

6. Conclusions

DASP has attracted some attention in recent years; however, distributed three-stage ASP with actual production constraints is seldom investigated. In this study, a distributed three-stage ASP with a DPm→1 layout, factory eligibility and setup times is considered, and an effective QABC algorithm is developed to minimize total tardiness. In QABC, a Q-learning algorithm is implemented with eight states, eight actions, a new reward and an effective adaptive ε-greedy action selection, and it is adopted to dynamically decide the search operator for EB_1, EB_2 and OB, which are obtained by population division. Adaptive migration between EB_1 and EB_2 and a modified restart strategy are executed in the employed bee phase and the scout phase, respectively. A number of experiments are conducted, and the experimental results validate that the strategies of QABC are reasonable and effective and that QABC performs very competitively on the considered problem.
Distributed three-stage ASP will remain our main topic in the near future. We will focus on distributed three-stage ASP with other constraints, such as fuzzy processing times and stochastic breakdowns. We are also interested in other distributed scheduling problems, such as distributed flexible job shop scheduling and distributed hybrid flow shop scheduling. Swarm intelligence optimization and RL will also remain a focus of our attention, and we will try to develop more effective combination modes and innovative strategies. We will also pay attention to multi-objective optimization in distributed production networks.

Author Contributions

Conceptualization, J.W. and D.L.; methodology, J.W.; software, J.W.; validation, J.W., D.L. and M.L.; formal analysis, J.W.; investigation, M.L.; resources, J.W.; data curation, J.W.; writing—original draft preparation, J.W.; writing—review and editing, D.L.; visualization, J.W.; supervision, D.L.; project administration, D.L.; funding acquisition, D.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (grant number 61573264), and supported by “the Fundamental Research Funds for the Central Universities” (grant number 225211002).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Lee, C.Y.; Cheng, T.C.E.; Lin, B.M.T. Minimizing the makespan in the 3-machine assembly-type flowshop scheduling problem. Manag. Sci. 1993, 39, 612–625. [Google Scholar] [CrossRef]
  2. Potts, C.N.; Sevast’Janov, S.V.; Strusevich, V.A.; Van Wassenhove, L.N.; Zwaneveld, C.M. The two-stage assembly scheduling problem: Complexity and approximation. Oper. Res. 1995, 43, 346–355. [Google Scholar] [CrossRef]
  3. Framinan, J.M.; Perez-Gonzalez, P. The 2-stage assembly flowshop scheduling problem with total completion time: Efficient constructive heuristic and metaheuristic. Comput. Oper. Res. 2017, 88, 237–246. [Google Scholar] [CrossRef]
  4. Komaki, G.M.; Sheikh, S.; Malakooti, B. Flow shop scheduling problems with assembly operations: A review and new trends. Int. J. Prod. Res. 2018, 57, 2926–2955. [Google Scholar] [CrossRef]
  5. Daneshamooz, F.; Fattahi, P.; Hosseini, S.M.H. Mathematical modeling and two efficient branch and bound algorithms for job shop scheduling problem followed by an assembly stage. Kybernetes 2021, 50, 3222–3245. [Google Scholar] [CrossRef]
  6. Zhang, Z.; Gong, X.; Song, X.L.; Yin, Y.; Lev, B.; Chen, J. A column generation-based exact solution method for seru scheduling problems. Omega 2022, 108, 102581. [Google Scholar] [CrossRef]
  7. Mohammad, Y.; Sahar, I. Integrated decision making for parts ordering and scheduling of jobs on two-stage assembly problem in three level supply chain. J. Manuf. Syst. 2018, 46, 137–151. [Google Scholar] [CrossRef]
  8. Saeedeh, A.B.; Mohammad, M.M.; Mohammad, N. Bi-level genetic algorithms for a two-stage assembly flow-shop scheduling problem with batch delivery system. Comput. Ind. Eng. 2018, 126, 217–231. [Google Scholar] [CrossRef]
  9. Allahverdi, A.; Al-Anzi, F.S. Evolutionary heuristics and an algorithm for the two-stage assembly scheduling problem to minimize makespan with setup times. Int. J. Prod. Res. 2006, 44, 4713–4735. [Google Scholar] [CrossRef]
  10. Komaki, G.M.; Kayvanfar, V. Grey wolf optimizer algorithm for the two-stage assembly flow shop scheduling problem with release time. J. Comput. Sci. 2015, 8, 109–120. [Google Scholar] [CrossRef]
  11. Fawaz, S.A.; Ali, A. A self-adaptive differential evolution heuristic for two-stage assembly scheduling problem to minimize maximum lateness with setup times. Eur. J. Oper. Res. 2007, 182, 80–94. [Google Scholar] [CrossRef]
  12. Hamed, K.; Mohammad, A.M.; Mohammad, R. The two stage assembly flow-shop scheduling problem with batching and delivery. Eng. Appl. Artif. Intel. 2017, 63, 98–107. [Google Scholar] [CrossRef]
  13. Christos, K.; George, J.K. The three-stage assembly flowshop scheduling problem. Comput. Oper. Res. 2001, 28, 689–904. [Google Scholar] [CrossRef]
Figure 1. A schedule of the example.
Figure 2. An illustration of Q-learning.
Figure 3. Example of a global search. (a) Factory assignment string; (b) Scheduling string.
Figure 4. Example of the reassignment operator.
Figure 5. Example of $N_7$.
Figure 6. Four cases of $trial^*$ and two cases of $Evo_t$ and $D_t$.
Figure 7. Percentages of the eight states.
Figure 8. The update process of state and action.
Figure 9. The flowchart of QABC.
Figure 10. Mean $Avg$ and mean S/N ratio.
Figure 11. Mean plot with 95% confidence interval for all algorithms.
Figure 12. Convergence curves of two instances.
Table 1. A summary of notations.

$i, j$ | Product indexes, $i, j \in \{1, 2, \ldots, n\}$
$k$ | Fabrication machine and component index, $k \in \{1, 2, \ldots, m\}$
$f$ | Factory index, $f \in \mathcal{F}$
$\mathcal{F}$ | Factory set, $\mathcal{F} = \{1, 2, \ldots, F\}$
$n$ | Number of products
$m$ | Number of fabrication machines and of components of each product
$F$ | Number of factories
$F_i$ | Feasible factory set of product $i$, $F_i \subseteq \mathcal{F}$
$M_{kf}$ | The $k$th fabrication machine of factory $f$
$TM_f$ | Transportation machine of factory $f$
$AM_f$ | Assembly machine of factory $f$
$pt_{ikf}$ | Processing time of the $k$th component of product $i$ on machine $M_{kf}$
$tt_{if}$ | Transportation time of product $i$ on $TM_f$
$as_{if}$ | Assembly time of product $i$ on $AM_f$
$spt_{ikf}$ | Setup time of the $k$th component of product $i$ on $M_{kf}$
$stt_{if}$ | Setup time of product $i$ on $TM_f$
$sas_{if}$ | Setup time of product $i$ on $AM_f$
$d_i$ | Due date of product $i$
$C_i$ | Completion time of product $i$
$T_i$ | Tardiness of product $i$, $T_i = \max\{C_i - d_i, 0\}$
$TT$ | Total tardiness of all products
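Spelled out with these notations, the criterion the algorithms minimize follows directly from the definitions of $T_i$ and $TT$ above; a compact statement:

```latex
% Total tardiness objective, using only the Table 1 definitions.
T_i = \max\{\, C_i - d_i,\ 0 \,\}, \qquad i = 1, \ldots, n,
\qquad \min \ TT = \sum_{i=1}^{n} T_i .
```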
Table 2. An illustrative example (each cell lists the values for factories $f = 1, 2, 3$; "—" marks a factory outside $F_i$).

$i$ | 1 | 2 | 3 | 4 | 5 | 6
$F_i$ | {2} | {1, 3} | {1, 2} | {1, 2, 3} | {3} | {1, 2, 3}
$pt_{i1f}$ | —, 31, — | 52, —, 84 | 14, 11, — | 88, 38, 66 | —, —, 55 | 18, 27, 64
$pt_{i2f}$ | —, 26, — | 54, —, 67 | 21, 77, — | 30, 27, 82 | —, —, 97 | 51, 79, 8
$pt_{i3f}$ | —, 60, — | 54, —, 16 | 39, 9, — | 64, 50, 42 | —, —, 30 | 10, 66, 47
$tt_{if}$ | —, 22, — | 65, —, 57 | 39, 79, — | 16, 17, 86 | —, —, 42 | 13, 95, 65
$as_{if}$ | —, 45, — | 47, —, 51 | 89, 51, — | 88, 68, 3 | —, —, 41 | 88, 59, 50
$spt_{i1f}$ | —, 9, — | 19, —, 19 | 7, 6, — | 19, 19, 19 | —, —, 3 | 17, 4, 18
$spt_{i2f}$ | —, 17, — | 4, —, 8 | 10, 2, — | 12, 11, 14 | —, —, 7 | 13, 18, 19
$spt_{i3f}$ | —, 18, — | 17, —, 7 | 20, 20, — | 4, 15, 6 | —, —, 7 | 20, 4, 16
$stt_{if}$ | —, 7, — | 8, —, 9 | 13, 16, — | 11, 2, 8 | —, —, 7 | 7, 4, 19
$sas_{if}$ | —, 12, — | 13, —, 5 | 9, 15, — | 15, 3, 11 | —, —, 2 | 20, 10, 8
$d_i$ | 204 | 357 | 150 | 245 | 228 | 448
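To make the example concrete, the eligibility sets and due dates above can be encoded and checked mechanically. A minimal sketch; the dictionary layout, helper names, sample assignment and sample completion times are ours for illustration, not data from the paper:

```python
# Eligibility sets F_i and due dates d_i copied from Table 2.
F_i = {1: {2}, 2: {1, 3}, 3: {1, 2}, 4: {1, 2, 3}, 5: {3}, 6: {1, 2, 3}}
d = {1: 204, 2: 357, 3: 150, 4: 245, 5: 228, 6: 448}

def is_feasible(assignment):
    """Factory eligibility: every product must go to a factory in its F_i."""
    return all(f in F_i[i] for i, f in assignment.items())

def total_tardiness(completion):
    """TT = sum_i max(C_i - d_i, 0), per the definitions of Table 1."""
    return sum(max(completion[i] - d[i], 0) for i in d)

assignment = {1: 2, 2: 1, 3: 2, 4: 3, 5: 3, 6: 1}   # a hypothetical assignment
print(is_feasible(assignment))                       # True
print(total_tardiness({1: 250, 2: 300, 3: 149,
                       4: 260, 5: 228, 6: 500}))     # 46 + 15 + 52 = 113
```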
Table 3. The eight state representations.

State | State description
$s_1$ | $trial^* = 0$ and $Evo_t \ge D_t$
$s_2$ | $trial^* = 0$ and $Evo_t < D_t$
$s_3$ | $0 < trial^* \le \mu_1$ and $Evo_t \ge D_t$
$s_4$ | $0 < trial^* \le \mu_1$ and $Evo_t < D_t$
$s_5$ | $\mu_1 < trial^* \le \mu_2$ and $Evo_t \ge D_t$
$s_6$ | $\mu_1 < trial^* \le \mu_2$ and $Evo_t < D_t$
$s_7$ | $trial^* > \mu_2$ and $Evo_t \ge D_t$
$s_8$ | $trial^* > \mu_2$ and $Evo_t < D_t$
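Because the table is a cross of three $trial^*$ intervals with the comparison of $Evo_t$ against $D_t$, the state index reduces to a direct computation. A minimal sketch; the function name and passing $\mu_1$, $\mu_2$ as plain parameters are our choices:

```python
def q_state(trial_star, evo_t, d_t, mu1, mu2):
    """Map (trial*, Evo_t, D_t) to one of the eight states s1..s8 of Table 3.

    The trial* interval picks a pair of states; Evo_t >= D_t selects the
    odd state of the pair, Evo_t < D_t the even one.
    """
    if trial_star == 0:
        base = 1
    elif trial_star <= mu1:
        base = 3
    elif trial_star <= mu2:
        base = 5
    else:
        base = 7
    return base if evo_t >= d_t else base + 1

assert q_state(0, 5, 3, mu1=2, mu2=4) == 1   # s1: trial* = 0 and Evo_t >= D_t
assert q_state(3, 1, 3, mu1=2, mu2=4) == 6   # s6: mu1 < trial* <= mu2, Evo_t < D_t
```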
Table 4. The update process of the Q-table.

(a)

 | $a_1$ | $a_2$ | $a_3$ | $a_4$ | $a_5$ | $a_6$ | $a_7$ | $a_8$
$s_1$ | 0.722 | 0.697 | 0.011 | 0.283 | −0.116 | 0.329 | 0.245 | 0.462
$s_2$ | −0.175 | −0.241 | −0.335 | −0.218 | −0.553 | −0.150 | −0.256 | −0.233
$s_3$ | −0.346 | −0.297 | −0.239 | −0.311 | −0.361 | −0.382 | −0.303 | −0.272
$s_4$ | −0.107 | −0.213 | −0.113 | −0.116 | −0.319 | −0.118 | −0.127 | −0.237
$s_5$ | −0.111 | −0.207 | −0.111 | −0.112 | −0.114 | −0.157 | 0.085 | −0.163
$s_6$ | −0.560 | −0.554 | −0.579 | −0.459 | −0.472 | −0.588 | −0.468 | −0.473
$s_7$ | −0.203 | −0.209 | −0.518 | −0.400 | −0.746 | −0.212 | −0.600 | −0.385
$s_8$ | −0.703 | −0.663 | −0.858 | −0.665 | −0.673 | 0.879 | −0.704 | −0.694

(b)

 | $a_1$ | $a_2$ | $a_3$ | $a_4$ | $a_5$ | $a_6$ | $a_7$ | $a_8$
$s_1$ | 0.722 | 0.697 | 0.011 | 0.283 | −0.116 | 0.329 | 0.245 | 0.462
$s_2$ | −0.175 | −0.241 | −0.335 | −0.218 | −0.553 | −0.150 | −0.256 | −0.233
$s_3$ | −0.346 | −0.297 | −0.239 | −0.311 | −0.361 | −0.382 | −0.303 | −0.272
$s_4$ | −0.107 | −0.213 | −0.113 | −0.116 | −0.319 | −0.118 | −0.127 | −0.237
$s_5$ | −0.111 | −0.207 | −0.111 | −0.112 | −0.114 | −0.157 | 0.085 | −0.163
$s_6$ | −0.560 | −0.554 | −0.579 | −0.459 | −0.472 | −0.588 | −0.468 | −0.473
$s_7$ | −0.203 | −0.209 | −0.518 | −0.400 | −0.746 | −0.212 | −0.600 | −0.385
$s_8$ | −0.703 | −0.663 | −0.858 | −0.665 | −0.673 | 0.653 | −0.704 | −0.694
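Sub-tables (a) and (b) differ in a single cell: $Q(s_8, a_6)$ drops from 0.879 to 0.653, which is exactly the one-entry footprint of a Q-learning step. A minimal sketch of the update and the ε-greedy choice it feeds; the learning rate, reward and table contents below are illustrative toy values, not the paper's calibrated settings:

```python
import random

def q_update(Q, s, a, r, s_next, alpha, gamma):
    """One Q-learning step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    Only the visited (s, a) cell changes, as between sub-tables (a) and (b)."""
    Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])

def epsilon_greedy(Q, s, eps):
    """Pick a random action with probability eps, else the current best one."""
    if random.random() < eps:
        return random.randrange(len(Q[s]))
    return max(range(len(Q[s])), key=lambda a: Q[s][a])

# Illustrative 8x8 table holding only the Table 4(a) value in cell (s8, a6).
Q = [[0.0] * 8 for _ in range(8)]
Q[7][5] = 0.879
q_update(Q, s=7, a=5, r=-1.0, s_next=0, alpha=0.1, gamma=0.8)  # toy r, alpha, gamma
print(round(Q[7][5], 3))  # 0.691 with these toy values; the paper's step yields 0.653
```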
Table 5. Levels of the parameters.

Parameters | Level 1 | Level 2 | Level 3
$N$ | 120 | 150 | 180
$R_1$ | 20 | 40 | 50
$R_2$ | 30 | 50 | 70
$Limit$ | 100 | 150 | 200
$\delta$ | 50 | 75 | 100
$\gamma$ | 0.4 | 0.6 | 0.8
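Figure 10 summarizes this calibration by the mean $Avg$ and the mean S/N ratio per level. For a smaller-the-better response such as $Avg$, the standard Taguchi statistic is $-10 \log_{10}\!\big(\tfrac{1}{r}\sum_i y_i^2\big)$; a minimal sketch of that formula (the sample values are illustrative, not experiment data):

```python
import math

def sn_smaller_the_better(values):
    """Taguchi S/N ratio for a smaller-the-better response such as Avg:
    S/N = -10 * log10(mean(y_i^2)). A larger S/N indicates a better,
    more robust parameter level."""
    return -10.0 * math.log10(sum(v * v for v in values) / len(values))

print(sn_smaller_the_better([827.0, 857.0]))  # ≈ -58.5 dB
```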
Table 6. Computational results of $Min$ by six algorithms (QA = QABC, A1 = ABC1, A2 = ABC2, PM = PMMA, DA = DABC, NE = NEABC; two instances per row).

$n \times F \times m$ | QA | A1 | A2 | PM | DA | NE || $n \times F \times m$ | QA | A1 | A2 | PM | DA | NE
6 × 2 × 3 | 827 | 827 | 827 | 827 | 827 | 827 || 100 × 4 × 20 | 2753 | 3097 | 2974 | 3068 | 3580 | 3244
6 × 2 × 5 | 568 | 568 | 568 | 568 | 568 | 568 || 100 × 5 × 5 | 1964 | 1964 | 2105 | 2003 | 2286 | 2251
10 × 2 × 3 | 576 | 576 | 576 | 576 | 592 | 576 || 100 × 5 × 10 | 1179 | 1179 | 1179 | 1179 | 1252 | 1280
10 × 2 × 5 | 808 | 808 | 808 | 808 | 808 | 808 || 100 × 5 × 20 | 1468 | 1675 | 1849 | 1553 | 2272 | 2157
6 × 3 × 3 | 66 | 66 | 66 | 66 | 66 | 157 || 100 × 6 × 5 | 1088 | 1088 | 1088 | 1088 | 1211 | 1106
6 × 3 × 5 | 92 | 92 | 92 | 92 | 92 | 232 || 100 × 6 × 10 | 1008 | 1018 | 1103 | 1052 | 1153 | 1265
10 × 3 × 3 | 73 | 73 | 73 | 73 | 73 | 90 || 100 × 6 × 20 | 913 | 913 | 957 | 913 | 1046 | 1006
10 × 3 × 5 | 112 | 112 | 112 | 112 | 112 | 237 || 200 × 2 × 5 | 140 | 251 | 729 | 242 | 344 | 209
20 × 2 × 5 | 1973 | 1973 | 1973 | 1973 | 2066 | 1973 || 200 × 2 × 10 | 718 | 4677 | 1419 | 788 | 502 | 1162
20 × 2 × 10 | 1459 | 1459 | 1459 | 1459 | 1470 | 1477 || 200 × 2 × 20 | 4516 | 9205 | 8122 | 4855 | 8060 | 4974
20 × 2 × 20 | 915 | 915 | 922 | 915 | 971 | 915 || 200 × 3 × 5 | 429 | 475 | 451 | 495 | 507 | 891
20 × 3 × 5 | 1741 | 1741 | 1741 | 1741 | 1762 | 1741 || 200 × 3 × 10 | 2608 | 3108 | 3315 | 3167 | 3057 | 3672
20 × 3 × 10 | 111 | 111 | 111 | 111 | 126 | 154 || 200 × 3 × 20 | 509 | 509 | 525 | 617 | 509 | 509
20 × 3 × 20 | 768 | 768 | 768 | 768 | 780 | 768 || 200 × 4 × 5 | 1630 | 1638 | 1808 | 2016 | 2075 | 2072
20 × 4 × 5 | 1228 | 1228 | 1228 | 1228 | 1361 | 1239 || 200 × 4 × 10 | 576 | 626 | 635 | 690 | 785 | 723
20 × 4 × 10 | 680 | 680 | 680 | 680 | 724 | 753 || 200 × 4 × 20 | 879 | 988 | 1075 | 1098 | 1068 | 1309
20 × 4 × 20 | 1354 | 1354 | 1359 | 1354 | 1410 | 1402 || 200 × 5 × 5 | 3845 | 4135 | 4304 | 4225 | 4181 | 4867
20 × 5 × 5 | 684 | 684 | 684 | 684 | 757 | 779 || 200 × 5 × 10 | 886 | 959 | 1012 | 924 | 1098 | 1044
20 × 5 × 10 | 1228 | 1228 | 1228 | 1820 | 1866 | 1341 || 200 × 5 × 20 | 5928 | 6516 | 6344 | 6384 | 7097 | 6605
20 × 5 × 20 | 950 | 950 | 950 | 950 | 984 | 1071 || 200 × 6 × 5 | 3321 | 3429 | 3436 | 3537 | 3718 | 3934
20 × 6 × 5 | 321 | 321 | 321 | 321 | 321 | 798 || 200 × 6 × 10 | 911 | 911 | 911 | 927 | 1002 | 979
20 × 6 × 10 | 1217 | 1217 | 1217 | 1217 | 1293 | 1368 || 200 × 6 × 20 | 694 | 716 | 745 | 695 | 780 | 838
20 × 6 × 20 | 949 | 949 | 949 | 956 | 949 | 1212 || 250 × 3 × 5 | 5275 | 7604 | 9537 | 7427 | 6089 | 7459
50 × 2 × 5 | 6382 | 6501 | 6531 | 6550 | 6914 | 6524 || 250 × 3 × 10 | 2723 | 3384 | 5204 | 3413 | 3256 | 3640
50 × 2 × 10 | 5781 | 5781 | 5821 | 5781 | 5949 | 5958 || 250 × 3 × 20 | 769 | 1722 | 5390 | 1428 | 955 | 1721
50 × 2 × 20 | 2617 | 2732 | 2757 | 2780 | 2922 | 3150 || 250 × 4 × 5 | 1106 | 1218 | 1247 | 1349 | 1267 | 1477
50 × 3 × 5 | 733 | 733 | 746 | 733 | 1240 | 733 || 250 × 4 × 10 | 4523 | 5462 | 5269 | 6244 | 5995 | 5457
50 × 3 × 10 | 3883 | 3943 | 4004 | 3967 | 4466 | 4265 || 250 × 4 × 20 | 2423 | 3015 | 3413 | 2984 | 4287 | 2907
50 × 3 × 20 | 5118 | 5296 | 5367 | 5318 | 5531 | 5626 || 250 × 5 × 5 | 5952 | 6706 | 6853 | 6975 | 6819 | 8730
50 × 4 × 5 | 934 | 934 | 959 | 943 | 1136 | 955 || 250 × 5 × 10 | 1656 | 2099 | 2270 | 2338 | 2589 | 2577
50 × 4 × 10 | 1044 | 1048 | 1151 | 1051 | 1274 | 1075 || 250 × 5 × 20 | 280 | 340 | 352 | 418 | 340 | 393
50 × 4 × 20 | 541 | 541 | 561 | 541 | 618 | 581 || 250 × 6 × 5 | 7367 | 7918 | 9303 | 8402 | 8014 | 9628
50 × 5 × 5 | 570 | 570 | 573 | 570 | 663 | 570 || 250 × 6 × 10 | 2028 | 2505 | 2447 | 2308 | 2398 | 2408
50 × 5 × 10 | 591 | 591 | 592 | 591 | 666 | 621 || 250 × 6 × 20 | 3939 | 4518 | 5354 | 4585 | 4272 | 5169
50 × 5 × 20 | 840 | 840 | 840 | 840 | 867 | 856 || 300 × 5 × 5 | 644 | 817 | 1596 | 1330 | 712 | 1356
50 × 6 × 5 | 552 | 552 | 562 | 552 | 592 | 552 || 300 × 6 × 5 | 5682 | 6509 | 7374 | 7141 | 6091 | 7840
50 × 6 × 10 | 102 | 102 | 102 | 102 | 110 | 102 || 300 × 6 × 10 | 2205 | 2584 | 3181 | 3639 | 2314 | 3291
50 × 6 × 20 | 675 | 675 | 762 | 696 | 813 | 851 || 300 × 6 × 20 | 3094 | 3883 | 4542 | 5175 | 3750 | 4703
100 × 2 × 5 | 4205 | 4457 | 4419 | 4387 | 4937 | 5064 || 500 × 5 × 5 | 720 | 738 | 738 | 1396 | 730 | 920
100 × 2 × 10 | 1250 | 1421 | 1491 | 1284 | 1427 | 1364 || 500 × 5 × 10 | 1485 | 1642 | 1617 | 3235 | 1972 | 2120
100 × 2 × 20 | 3221 | 3679 | 4316 | 3306 | 6075 | 3547 || 500 × 5 × 20 | 117 | 121 | 140 | 484 | 166 | 294
100 × 3 × 5 | 673 | 677 | 676 | 673 | 1138 | 691 || 500 × 6 × 5 | 1098 | 1883 | 3966 | 2008 | 924 | 2257
100 × 3 × 10 | 3687 | 4521 | 4631 | 3944 | 6109 | 4505 || 500 × 6 × 10 | 1039 | 1154 | 1230 | 1883 | 1161 | 1643
100 × 3 × 20 | 737 | 737 | 576 | 750 | 806 | 862 || 500 × 6 × 20 | 1542 | 2296 | 2499 | 6505 | 1969 | 3679
100 × 4 × 5 | 826 | 842 | 885 | 836 | 915 | 872
100 × 4 × 10 | 903 | 928 | 1009 | 933 | 1221 | 1253
Table 7. Computational results of $Avg$ by six algorithms (layout and abbreviations as in Table 6).

$n \times F \times m$ | QA | A1 | A2 | PM | DA | NE || $n \times F \times m$ | QA | A1 | A2 | PM | DA | NE
6 × 2 × 3 | 827.0 | 827.0 | 827.0 | 827.0 | 827.0 | 857.0 || 100 × 4 × 20 | 2883.9 | 3266.3 | 3393.1 | 3278.2 | 4108.8 | 3562.2
6 × 2 × 5 | 568.0 | 568.0 | 568.0 | 568.0 | 568.0 | 568.0 || 100 × 5 × 5 | 1989.1 | 1993.5 | 2343.0 | 2110.1 | 2681.4 | 2451.4
10 × 2 × 3 | 576.0 | 576.0 | 576.0 | 576.0 | 592.0 | 576.0 || 100 × 5 × 10 | 1179.0 | 1182.3 | 1246.1 | 1197.6 | 1394.7 | 1353.8
10 × 2 × 5 | 808.0 | 808.0 | 808.0 | 808.0 | 808.0 | 818.8 || 100 × 5 × 20 | 1560.0 | 1970.0 | 2047.4 | 1677.0 | 2627.2 | 2408.2
6 × 3 × 3 | 66.0 | 66.0 | 66.0 | 66.0 | 66.0 | 157.0 || 100 × 6 × 5 | 1088.0 | 1097.6 | 1120.8 | 1098.5 | 1312.9 | 1194.7
6 × 3 × 5 | 92.0 | 92.0 | 92.0 | 92.0 | 92.0 | 232.0 || 100 × 6 × 10 | 1027.3 | 1114.2 | 1254.6 | 1119.6 | 1444.2 | 1390.1
10 × 3 × 3 | 73.0 | 73.0 | 73.0 | 74.0 | 74.1 | 90.0 || 100 × 6 × 20 | 913.0 | 916.3 | 1067.5 | 989.8 | 1336.9 | 1136.8
10 × 3 × 5 | 112.0 | 112.0 | 112.0 | 112.0 | 112.0 | 252.5 || 200 × 2 × 5 | 229.9 | 617.5 | 1170.2 | 478.5 | 788.2 | 621.1
20 × 2 × 5 | 1973.0 | 1973.0 | 1994.9 | 2006.9 | 2119.2 | 2022.2 || 200 × 2 × 10 | 1323.1 | 7161.0 | 2500.5 | 1332.7 | 1122.1 | 1956.3
20 × 2 × 10 | 1459.0 | 1459.0 | 1476.9 | 1459.3 | 1558.3 | 1747.5 || 200 × 2 × 20 | 4894.6 | 10,303.7 | 9547.8 | 5896.1 | 9525.1 | 11,185.9
20 × 2 × 20 | 915.0 | 917.1 | 948.0 | 927.4 | 1066.7 | 1125.0 || 200 × 3 × 5 | 466.7 | 595.6 | 754.4 | 908.8 | 1179.8 | 1297.8
20 × 3 × 5 | 1741.0 | 1741.0 | 1756.7 | 1749.4 | 1782.4 | 1759.8 || 200 × 3 × 10 | 2781.9 | 3396.3 | 3594.0 | 3779.4 | 4182.0 | 4171.5
20 × 3 × 10 | 111.0 | 111.0 | 121.0 | 122.5 | 190.8 | 283.7 || 200 × 3 × 20 | 515.4 | 550.2 | 699.5 | 762.2 | 1180.4 | 780.2
20 × 3 × 20 | 768.0 | 768.0 | 775.2 | 770.8 | 822.7 | 837.1 || 200 × 4 × 5 | 1691.9 | 1822.6 | 2011.9 | 2120.5 | 2885.7 | 2294.5
20 × 4 × 5 | 1228.0 | 1229.6 | 1253.8 | 1342.3 | 1438.4 | 1517.2 || 200 × 4 × 10 | 582.4 | 726.6 | 801.7 | 773.8 | 989.8 | 924.8
20 × 4 × 10 | 680.0 | 682.3 | 716.1 | 707.8 | 826.7 | 810.2 || 200 × 4 × 20 | 940.6 | 1163.5 | 1144.2 | 1398.7 | 1415.8 | 1761.5
20 × 4 × 20 | 1354.0 | 1356.4 | 1374.0 | 1378.0 | 1503.0 | 1577.5 || 200 × 5 × 5 | 4020.5 | 4451.3 | 4718.9 | 4423.4 | 4987.8 | 5332.8
20 × 5 × 5 | 684.0 | 684.0 | 712.4 | 731.1 | 832.1 | 1087.5 || 200 × 5 × 10 | 943.4 | 1100.1 | 1173.7 | 1042.2 | 1472.0 | 1343.5
20 × 5 × 10 | 1228.0 | 1230.6 | 1248.6 | 1836.5 | 1956.1 | 1527.4 || 200 × 5 × 20 | 6063.3 | 6806.9 | 6574.8 | 6766.8 | 8226.4 | 6850.5
20 × 5 × 20 | 950.0 | 950.0 | 1002.3 | 998.4 | 1124.6 | 1238.0 || 200 × 6 × 5 | 3379.4 | 3630.1 | 3697.0 | 3760.3 | 4100.8 | 4166.8
20 × 6 × 5 | 321.0 | 321.0 | 322.9 | 323.9 | 352.0 | 921.4 || 200 × 6 × 10 | 911.0 | 945.3 | 983.1 | 965.8 | 1079.0 | 1039.6
20 × 6 × 10 | 1217.0 | 1217.0 | 1247.5 | 1270.3 | 1350.2 | 1584.2 || 200 × 6 × 20 | 694.0 | 781.1 | 874.5 | 791.2 | 997.0 | 982.3
20 × 6 × 20 | 949.0 | 949.0 | 963.9 | 977.9 | 1037.4 | 1321.9 || 250 × 3 × 5 | 6102.7 | 8676.4 | 10,756.0 | 8596.2 | 8927.3 | 9505.3
50 × 2 × 5 | 6429.7 | 6588.7 | 6952.9 | 6683.9 | 7610.3 | 7466.0 || 250 × 3 × 10 | 2868.1 | 4335.8 | 6686.2 | 4983.9 | 5523.4 | 4614.3
50 × 2 × 10 | 5807.2 | 5826.7 | 6050.0 | 5883.7 | 6316.6 | 6233.1 || 250 × 3 × 20 | 1011.5 | 3123.9 | 7648.0 | 2737.6 | 2319.9 | 2871.2
50 × 2 × 20 | 2686.2 | 2855.6 | 2984.5 | 2924.3 | 3522.9 | 3390.1 || 250 × 4 × 5 | 1181.9 | 1449.6 | 1410.9 | 1755.4 | 2008.3 | 1654.7
50 × 3 × 5 | 733.0 | 733.0 | 803.9 | 757.0 | 1506.8 | 886.5 || 250 × 4 × 10 | 4812.8 | 6459.4 | 6521.1 | 6892.6 | 8250.8 | 6260.2
50 × 3 × 10 | 3953.2 | 3996.5 | 4151.5 | 4052.1 | 5196.0 | 4543.2 || 250 × 4 × 20 | 2530.1 | 3466.9 | 4405.0 | 4082.0 | 6169.1 | 3463.3
50 × 3 × 20 | 5195.2 | 5368.4 | 5554.3 | 5448.7 | 5840.8 | 6284.7 || 250 × 5 × 5 | 6286.4 | 7597.5 | 7951.9 | 7856.9 | 8185.4 | 9400.7
50 × 4 × 5 | 934.6 | 937.8 | 1016.3 | 1047.2 | 1377.8 | 1041.3 || 250 × 5 × 10 | 1837.5 | 2481.4 | 2449.6 | 2590.9 | 3776.1 | 3206.7
50 × 4 × 10 | 1044.8 | 1087.7 | 1250.5 | 1153.5 | 1447.9 | 1420.6 || 250 × 5 × 20 | 297.1 | 412.8 | 437.3 | 571.4 | 579.8 | 571.6
50 × 4 × 20 | 546.9 | 556.3 | 609.4 | 600.5 | 714.9 | 737.5 || 250 × 6 × 5 | 7883.2 | 8967.5 | 10,104.2 | 9249.0 | 8856.2 | 9934.1
50 × 5 × 5 | 572.7 | 585.8 | 663.8 | 597.7 | 774.3 | 706.9 || 250 × 6 × 10 | 2131.0 | 2659.5 | 2593.0 | 2848.1 | 3428.7 | 2789.1
50 × 5 × 10 | 591.4 | 597.4 | 646.0 | 609.2 | 780.6 | 712.3 || 250 × 6 × 20 | 4125.0 | 5173.7 | 6546.1 | 6109.5 | 5405.4 | 5694.0
50 × 5 × 20 | 840.0 | 840.0 | 895.6 | 844.2 | 1055.6 | 965.7 || 300 × 5 × 5 | 812.0 | 1184.3 | 2051.2 | 1742.4 | 1071.6 | 2107.3
50 × 6 × 5 | 552.0 | 555.7 | 600.4 | 570.7 | 764.3 | 608.4 || 300 × 5 × 10 | 2992.9 | 3683.0 | 4604.2 | 5320.9 | 4739.7 | 5623.3
50 × 6 × 10 | 102.0 | 102.0 | 113.7 | 108.4 | 192.2 | 167.8 || 300 × 5 × 20 | 215.4 | 421.8 | 433.5 | 807.1 | 491.8 | 824.4
50 × 6 × 20 | 704.9 | 733.0 | 885.4 | 783.8 | 1027.9 | 975.3 || 300 × 6 × 5 | 6159.2 | 7175.1 | 7883.8 | 8126.9 | 6898.7 | 8684.9
100 × 2 × 5 | 4379.3 | 4867.7 | 5445.6 | 4763.3 | 7439.9 | 5717.6 || 300 × 6 × 10 | 2390.9 | 3114.0 | 3429.9 | 4060.5 | 3559.3 | 3856.0
100 × 2 × 10 | 1323.8 | 1535.0 | 1909.0 | 1326.0 | 2517.1 | 1979.2 || 300 × 6 × 20 | 3437.0 | 4312.8 | 5199.9 | 5807.9 | 4993.6 | 5342.2
100 × 2 × 20 | 3352.8 | 4482.3 | 5221.7 | 3449.9 | 7557.2 | 3973.4 || 500 × 5 × 5 | 727.1 | 824.4 | 915.9 | 2684.7 | 1023.1 | 1389.4
100 × 3 × 5 | 673.4 | 699.9 | 760.5 | 692.1 | 1480.7 | 772.3 || 500 × 5 × 10 | 1538.5 | 1910.8 | 1806.6 | 4082.8 | 2648.9 | 2362.5
100 × 3 × 10 | 3886.9 | 4755.2 | 5054.5 | 4505.6 | 8002.7 | 5265.6 || 500 × 5 × 20 | 119.8 | 154.7 | 159.3 | 1499.4 | 212.2 | 632.8
100 × 3 × 20 | 737.0 | 766.1 | 638.0 | 813.0 | 1373.7 | 1044.6 || 500 × 6 × 5 | 1344.5 | 2625.9 | 5230.5 | 4003.6 | 1651.6 | 3380.1
100 × 4 × 5 | 842.7 | 875.2 | 955.0 | 867.8 | 1265.4 | 1183.3 || 500 × 6 × 10 | 1091.1 | 1448.6 | 1433.9 | 2430.0 | 1984.4 | 2006.9
100 × 4 × 10 | 909.6 | 1076.1 | 1223.3 | 1050.7 | 1707.3 | 1508.5 || 500 × 6 × 20 | 1763.4 | 2545.8 | 3108.4 | 7247.6 | 2578.6 | 4670.5
Table 8. Computational results of $STD$ by six algorithms (layout and abbreviations as in Table 6).

$n \times F \times m$ | QA | A1 | A2 | PM | DA | NE || $n \times F \times m$ | QA | A1 | A2 | PM | DA | NE
6 × 2 × 3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 24.5 || 100 × 4 × 20 | 87.4 | 125.5 | 269.8 | 109.5 | 400.0 | 253.4
6 × 2 × 5 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 || 100 × 5 × 5 | 25.4 | 40.5 | 248.9 | 83.8 | 389.4 | 138.3
10 × 2 × 3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 || 100 × 5 × 10 | 0.0 | 6.5 | 74.7 | 24.4 | 74.1 | 61.6
10 × 2 × 5 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 20.6 || 100 × 5 × 20 | 68.2 | 175.9 | 172.4 | 94.0 | 386.7 | 232.6
6 × 3 × 3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 || 100 × 6 × 5 | 0.0 | 4.8 | 26.8 | 7.3 | 101.6 | 64.0
6 × 3 × 5 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 || 100 × 6 × 10 | 23.6 | 72.9 | 82.0 | 74.2 | 232.3 | 157.2
10 × 3 × 3 | 0.0 | 0.0 | 0.0 | 2.0 | 2.0 | 0.0 || 100 × 6 × 20 | 0.0 | 8.0 | 65.2 | 45.9 | 227.7 | 107.0
10 × 3 × 5 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 14.6 || 200 × 2 × 5 | 75.2 | 240.7 | 408.9 | 219.4 | 307.7 | 413.5
20 × 2 × 5 | 0.0 | 0.0 | 23.1 | 52.1 | 29.0 | 58.2 || 200 × 2 × 10 | 318.6 | 1478.1 | 1137.2 | 348.3 | 924.0 | 488.7
20 × 2 × 10 | 0.0 | 0.0 | 23.0 | 0.9 | 85.5 | 160.5 || 200 × 2 × 20 | 222.6 | 767.6 | 719.0 | 777.7 | 1580.5 | 3866.4
20 × 2 × 20 | 0.0 | 3.2 | 19.0 | 10.3 | 54.9 | 108.8 || 200 × 3 × 5 | 22.7 | 101.3 | 183.6 | 364.0 | 555.2 | 363.4
20 × 3 × 5 | 0.0 | 0.0 | 10.5 | 6.7 | 34.3 | 8.9 || 200 × 3 × 10 | 131.2 | 177.9 | 206.2 | 456.8 | 782.1 | 315.2
20 × 3 × 10 | 0.0 | 0.0 | 12.3 | 8.0 | 45.8 | 63.4 || 200 × 3 × 20 | 19.2 | 34.5 | 136.5 | 94.8 | 475.6 | 168.9
20 × 3 × 20 | 0.0 | 0.0 | 9.1 | 5.7 | 16.8 | 66.7 || 200 × 4 × 5 | 54.5 | 83.1 | 114.4 | 99.1 | 735.9 | 146.1
20 × 4 × 5 | 0.0 | 4.8 | 21.6 | 45.0 | 57.4 | 147.6 || 200 × 4 × 10 | 12.6 | 72.4 | 109.7 | 47.3 | 185.5 | 154.1
20 × 4 × 10 | 0.0 | 6.9 | 27.5 | 28.0 | 79.9 | 71.4 || 200 × 4 × 20 | 47.6 | 130.0 | 71.7 | 211.0 | 235.2 | 313.6
20 × 4 × 20 | 0.0 | 7.2 | 7.4 | 20.4 | 68.1 | 160.1 || 200 × 5 × 5 | 76.5 | 222.4 | 337.3 | 159.5 | 528.0 | 278.9
20 × 5 × 5 | 0.0 | 0.0 | 32.9 | 47.4 | 60.9 | 269.7 || 200 × 5 × 10 | 41.4 | 113.6 | 105.8 | 78.7 | 244.5 | 185.8
20 × 5 × 10 | 0.0 | 5.2 | 20.2 | 19.3 | 78.5 | 113.8 || 200 × 5 × 20 | 68.0 | 233.9 | 161.8 | 164.4 | 773.4 | 177.9
20 × 5 × 20 | 0.0 | 0.0 | 50.6 | 41.4 | 83.4 | 211.0 || 200 × 6 × 5 | 51.1 | 134.4 | 203.7 | 134.4 | 377.8 | 196.2
20 × 6 × 5 | 0.0 | 0.0 | 5.7 | 6.1 | 29.2 | 110.5 || 200 × 6 × 10 | 0.0 | 30.5 | 35.8 | 30.0 | 67.8 | 69.6
20 × 6 × 10 | 0.0 | 0.0 | 25.8 | 31.0 | 44.5 | 110.8 || 200 × 6 × 20 | 0.0 | 62.1 | 79.9 | 84.7 | 165.5 | 84.8
20 × 6 × 20 | 0.0 | 0.0 | 19.7 | 14.6 | 55.0 | 101.9 || 250 × 3 × 5 | 360.9 | 822.7 | 788.0 | 1280.3 | 1986.1 | 1513.4
50 × 2 × 5 | 41.1 | 73.8 | 218.0 | 89.9 | 443.4 | 557.0 || 250 × 3 × 10 | 143.6 | 598.7 | 645.9 | 913.7 | 1567.2 | 486.9
50 × 2 × 10 | 41.5 | 44.2 | 176.7 | 89.9 | 364.1 | 220.1 || 250 × 3 × 20 | 159.5 | 900.1 | 880.8 | 837.4 | 1070.6 | 832.2
50 × 2 × 20 | 66.1 | 63.2 | 174.7 | 106.3 | 434.3 | 237.6 || 250 × 4 × 5 | 45.6 | 132.8 | 111.3 | 256.0 | 490.7 | 158.2
50 × 3 × 5 | 0.0 | 0.0 | 52.8 | 49.3 | 170.6 | 127.5 || 250 × 4 × 10 | 215.0 | 542.4 | 610.8 | 370.7 | 1650.3 | 447.6
50 × 3 × 10 | 27.7 | 31.7 | 97.7 | 95.5 | 431.7 | 165.3 || 250 × 4 × 20 | 70.5 | 306.7 | 506.9 | 524.3 | 1126.1 | 436.9
50 × 3 × 20 | 55.2 | 66.8 | 136.5 | 120.3 | 403.6 | 420.5 || 250 × 5 × 5 | 279.2 | 533.1 | 561.0 | 528.0 | 1367.1 | 374.7
50 × 4 × 5 | 1.8 | 2.7 | 36.6 | 87.6 | 131.2 | 53.7 || 250 × 5 × 10 | 79.5 | 261.9 | 126.0 | 228.2 | 577.8 | 379.7
50 × 4 × 10 | 1.6 | 27.2 | 88.6 | 52.8 | 111.8 | 564.7 || 250 × 5 × 20 | 24.5 | 48.4 | 64.5 | 149.4 | 147.4 | 151.1
50 × 4 × 20 | 6.1 | 15.4 | 30.0 | 30.8 | 93.6 | 100.4 || 250 × 6 × 5 | 316.7 | 804.8 | 475.2 | 639.2 | 754.1 | 276.4
50 × 5 × 5 | 5.2 | 20.9 | 57.8 | 28.3 | 51.2 | 85.4 || 250 × 6 × 10 | 49.9 | 144.5 | 119.0 | 246.5 | 656.5 | 252.5
50 × 5 × 10 | 0.7 | 11.5 | 42.8 | 21.0 | 114.8 | 74.3 || 250 × 6 × 20 | 154.1 | 431.4 | 591.4 | 989.7 | 742.5 | 232.1
50 × 5 × 20 | 0.0 | 0.0 | 60.3 | 7.7 | 124.8 | 69.4 || 300 × 5 × 5 | 111.9 | 266.0 | 381.1 | 360.2 | 164.2 | 383.7
50 × 6 × 5 | 0.0 | 4.4 | 41.7 | 17.8 | 118.5 | 66.5 || 300 × 5 × 10 | 179.6 | 303.0 | 293.7 | 586.7 | 570.1 | 584.0
50 × 6 × 10 | 0.0 | 0.0 | 12.8 | 8.5 | 56.2 | 60.0 || 300 × 5 × 20 | 35.8 | 121.8 | 67.8 | 76.3 | 184.8 | 143.2
50 × 6 × 20 | 31.7 | 60.3 | 87.3 | 45.1 | 123.9 | 72.7 || 300 × 6 × 5 | 251.3 | 436.7 | 390.7 | 672.2 | 581.5 | 445.9
100 × 2 × 5 | 141.0 | 193.5 | 666.2 | 322.0 | 1503.3 | 648.7 || 300 × 6 × 10 | 91.3 | 333.0 | 136.4 | 349.3 | 948.2 | 273.4
100 × 2 × 10 | 40.3 | 73.0 | 358.9 | 29.1 | 994.4 | 542.5 || 300 × 6 × 20 | 215.3 | 518.7 | 307.2 | 464.7 | 797.5 | 415.7
100 × 2 × 20 | 62.6 | 402.7 | 521.6 | 85.7 | 1132.7 | 370.3 || 500 × 5 × 5 | 4.2 | 72.0 | 129.1 | 754.3 | 397.6 | 229.8
100 × 3 × 5 | 1.2 | 36.6 | 66.9 | 35.9 | 183.5 | 62.3 || 500 × 5 × 10 | 31.6 | 147.8 | 224.0 | 762.2 | 368.7 | 185.0
100 × 3 × 10 | 113.8 | 167.2 | 512.7 | 436.3 | 1079.3 | 490.0 || 500 × 5 × 20 | 4.6 | 20.0 | 12.8 | 742.0 | 41.7 | 471.6
100 × 3 × 20 | 0.0 | 27.7 | 109.1 | 51.0 | 330.1 | 145.7 || 500 × 6 × 5 | 192.9 | 635.0 | 634.1 | 1014.8 | 393.3 | 513.9
100 × 4 × 5 | 13.9 | 19.9 | 49.8 | 33.2 | 341.1 | 172.6 || 500 × 6 × 10 | 44.6 | 206.3 | 103.4 | 246.2 | 527.6 | 335.5
100 × 4 × 10 | 4.7 | 111.8 | 150.5 | 95.0 | 378.2 | 155.7 || 500 × 6 × 20 | 119.0 | 195.2 | 389.1 | 752.8 | 592.2 | 538.5
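Each ($Min$, $Avg$, $STD$) triple in Tables 6 through 8 condenses repeated independent runs of one algorithm on one instance. A minimal aggregation sketch; the run values are placeholders, and choosing the population rather than the sample standard deviation is our assumption:

```python
import statistics

def summarize(tt_per_run):
    """Collapse the total-tardiness values of repeated runs on one instance
    into the (Min, Avg, STD) triple reported in Tables 6-8."""
    return (min(tt_per_run),
            statistics.mean(tt_per_run),
            statistics.pstdev(tt_per_run))   # population STD; sample STD is a variant

print(summarize([827, 827, 857]))  # (827, 837, 14.14...)
```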
Table 9. Results of paired sample t-test.

t-Test | p-value ($Min$) | p-value ($Avg$) | p-value ($STD$)
t-test (QABC, ABC1) | 0.0001 | 0.0000 | 0.0000
t-test (QABC, ABC2) | 0.0000 | 0.0000 | 0.0000
t-test (QABC, PMMA) | 0.0000 | 0.0000 | 0.0000
t-test (QABC, DABC) | 0.0000 | 0.0000 | 0.0000
t-test (QABC, NEABC) | 0.0000 | 0.0000 | 0.0000
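The comparisons in Table 9 are paired-sample t-tests: QABC and one competitor are paired instance by instance on the same metric. A minimal sketch using SciPy's ttest_rel; the four-value arrays are tiny stand-ins for the real per-instance columns:

```python
# Paired-sample t-test as in Table 9. Requires SciPy.
from scipy.stats import ttest_rel

qabc_avg  = [827.0, 568.0, 576.0, 808.0]   # placeholder per-instance Avg values
other_avg = [857.0, 568.0, 576.0, 818.8]

t_stat, p_value = ttest_rel(qabc_avg, other_avg)
print(t_stat, p_value)   # a small p-value means the mean difference is significant
```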