Article

Flexible Job Shop Dynamic Scheduling and Fault Maintenance Personnel Cooperative Scheduling Optimization Based on the ACODDQN Algorithm

1
College of Mechanical Engineering, Zhejiang University of Technology, Hangzhou 310023, China
2
Haitian Plastics Machinery Group Limited Company, Ningbo 315801, China
3
Ningbo Yongxin Optics Co., Ltd., Ningbo 315040, China
4
College of Computer Science, Zhejiang University of Technology, Hangzhou 310023, China
*
Author to whom correspondence should be addressed.
Mathematics 2025, 13(6), 932; https://doi.org/10.3390/math13060932
Submission received: 29 January 2025 / Revised: 6 March 2025 / Accepted: 6 March 2025 / Published: 11 March 2025

Abstract

In order to address the impact of equipment fault diagnosis and repair delays on production schedule execution in the dynamic scheduling of flexible job shops, this paper proposes a multi-resource, multi-objective dynamic scheduling optimization model, which aims to minimize delay time and completion time. It integrates the scheduling of the workpieces, machines, and maintenance personnel to improve the response efficiency of emergency equipment maintenance. To this end, a self-learning Ant Colony Algorithm based on deep reinforcement learning (ACODDQN) is designed in this paper. The algorithm searches the solution space by using the ACO, prioritizes the solutions by combining the non-dominated sorting strategies, and achieves the adaptive optimization of scheduling decisions by utilizing the organic integration of the pheromone update mechanism and the DDQN framework. Further, the generated solutions are locally adjusted via the feasible solution optimization strategy to ensure that the solutions satisfy all the constraints and ultimately generate a Pareto optimal solution set with high quality. Simulation results based on standard examples and real cases show that the ACODDQN algorithm exhibits significant optimization effects in several tests, which verifies its superiority and practical application potential in dynamic scheduling problems.

1. Introduction

With the accelerated transformation of global manufacturing toward automation and intelligence, the Flexible Job Shop Scheduling Problem (FJSP) [1] has become an important research direction in the field of production management optimization. The deep integration of new-generation information technology, especially cloud computing [2], the Internet of Things [3], digital twins [4], and other key technologies, has driven manufacturing systems toward a high degree of automation and flexibility [5]. However, the traditional static flexible job shop scheduling model faces many dynamic challenges in practical applications, mainly including random arrivals and process changes at the order level, abnormal disturbances and parameter fluctuations at the equipment level, and real-time response requirements at the system level. In a dynamic environment, the traditional FJSP model often shows limitations such as response lag and optimization failure, which makes it challenging to meet the demand for efficient scheduling in modern manufacturing systems. Therefore, the dynamic flexible job shop scheduling problem (DFJSP) [6] has gradually become a key research topic in the field of intelligent manufacturing.
The flexible job shop scheduling problem (FJSP) has been studied extensively, covering a wide range of types such as standard FJSPs, distributed FJSPs, and uncertain FJSPs. These studies usually decompose the FJSP into two main subproblems, namely operation sequencing (OS) and machine assignment (MA) [7]. Existing studies show that the optimization objectives of the FJSP mainly focus on minimizing completion time. However, traditional optimization objectives are often difficult to meet given the complexity of the actual production environment, so the study of dynamic FJSP (DFJSP) problems with multiple resource constraints and multi-objective optimization is of great significance [8]. In recent years, scholars have conducted in-depth research on the dynamic scheduling problem of multi-objective flexible job shops and proposed a variety of optimization methods. For example, Yue et al. [9] proposed a two-stage double-depth Q-network (TS-DDQN) algorithm to solve the dynamic scheduling problem that includes new workpiece insertion and machine failures, aiming to optimize the total delay time and machine utilization; Zhang et al. [10] designed a two-stage algorithm based on a convolutional network for the dynamic flexible job shop scheduling problem that takes machine failures into account, aiming to minimize completion time and improve scheduling robustness; Gao et al. [11] combined the improved Jaya algorithm with local search heuristics to optimize the rescheduling process of the DFJSP, achieving the minimization of completion time and the improvement of scheduling stability; Luan et al. [12] proposed an improved chimpanzee optimization algorithm for solving a multi-objective flexible job shop scheduling problem; Lv et al. [13] used the AGE-MOEA algorithm to solve a dynamic scheduling problem under emergency order insertion and multiple machine failures, optimizing completion time and total energy consumption; and Yuan et al. [14] used a deep reinforcement learning algorithm to solve a multi-objective dynamic flexible job shop scheduling problem considering the insertion of new workpieces, achieving the minimization of multiple objectives such as completion time.
As research has deepened, scholars have gradually come to recognize the important role of human factors in production scheduling; especially in complex dynamic scheduling environments, the synergy between human and machine resources is crucial to optimizing scheduling performance. Therefore, incorporating human factors into the scheduling system can not only produce a more accurate and comprehensive scheduling model, but also improve the robustness of rescheduling, thus effectively improving production management efficiency [15]. Based on this, some scholars have begun to explore the multi-objective flexible job shop dynamic scheduling (DFJSP) problem involving the dual resources of humans and machines. For example, Zhang et al. [16] used an improved non-dominated sorting genetic algorithm (INSGA-II) to solve the energy-saving optimization problem of a flexible job shop considering both machine and worker scheduling, with optimization objectives including total energy consumption, completion time, and delay time; Li et al. [17] proposed a genetic algorithm based on the Improved Hybridized Producer–Consumer Framework (IPFGA) for solving a multi-objective DFJSP problem that considers worker scheduling; Sun et al. [18] proposed a two-level nested Ant Colony Algorithm for the DFJSP problem involving both worker and machine tool scheduling to optimize the quality of critical jobs and minimize completion time; and Mokhtari et al. [19] used a hybrid artificial bee colony (HABCO) algorithm to study the human–computer interface.
Although existing studies have considered the factors of worker scheduling, few studies have delved into the problem of scheduling maintenance personnel after a machine failure, especially in dynamic scheduling scenarios, where the occurrence of machine failures often leads to significant changes in the production scheduling plan, and the timely response and reasonable scheduling of the maintenance personnel are of great importance for production recovery.
In terms of solution methods for the dynamic scheduling problem of flexible job shops, current research focuses on two main categories: local search heuristic algorithms and meta-heuristic algorithms [20]. Local search heuristic algorithms, such as Variable Neighborhood Search (VNS) and Tabu Search (TS), optimize the quality of the solution through neighborhood search, while meta-heuristic algorithms, such as the Genetic Algorithm (GA), Particle Swarm Optimization (PSO), the Artificial Bee Colony (ABC) Algorithm, and the Ant Colony Optimization (ACO) Algorithm, carry out a global search through swarm intelligence or bio-inspired mechanisms to enhance the scheduling optimization effect [21]. Due to their powerful global search capability, meta-heuristic algorithms have been widely used to solve DFJSP problems. Long et al. [22] proposed a self-learning artificial bee colony (SLABC) algorithm for solving dynamic scheduling problems considering job insertion; Chen et al. [23] proposed a modified Ant Colony Optimization (PACO) Algorithm for solving the DFJSP problem; and Liang et al. [24] devised a genetic algorithm (GA) to cope with the dynamic scheduling problem considering machine failure interference. Although these heuristic-based algorithms have been successful to a certain extent, with the increasing complexity of the flexible job shop scheduling problem, particularly when multiple resources and multiple objectives are involved, traditional algorithms can hardly satisfy the demands of intelligent production in terms of solution speed and accuracy. Therefore, it is imperative to explore more efficient algorithms to cope with complex dynamic scheduling problems.
In recent years, deep reinforcement learning (DRL) has been widely applied to solve DFJSP problems as an optimization method with high potential. For example, Chen et al. [25] used the Rainbow Deep Q-Network (Rainbow DQN) to solve a flexible job shop dynamic scheduling problem considering shop floor heterogeneity and workpiece insertion, aiming to minimize the total weighted delay and total energy consumption; Peng et al. [26] designed a two-stage Efficient Modulo Algorithm (EMA) for solving a distributed job shop scheduling problem with flexible machining time, minimizing energy consumption and processing time; Su et al. [27] proposed a graph reinforcement learning (GRL) approach to solve the multi-objective flexible job shop scheduling problem (MOFJSP), generating a high-quality Pareto solution set by decomposing the multi-objective problem into preference-based sub-problems; Liu et al. [28] used an actor-critic reinforcement learning algorithm to solve job shop scheduling problems; Wang et al. [29] used the PPO algorithm to solve the dynamic job shop scheduling problem considering machine failures and workpiece reworking; Luo et al. [30] constructed a bi-hierarchical deep Q-network (THDQN) to solve the complex and variable dynamic job shop scheduling problem, optimizing the average machine utilization rate and the total tardiness rate; Liu et al. [31] used a two-loop deep Q-network approach to solve a flexible job shop dynamic scheduling problem considering emergency order insertion; Palacio et al. [32] used a Q-learning algorithm to optimize a real manufacturing scenario of assembling light switches, achieving makespan minimization; Gui et al. [33] used the DDPG algorithm to train a policy network, compounding single dispatch rules into an optimal dispatch rule and optimizing average machine utilization and lateness; and Zhang et al. [34] designed a dual double-depth Q-network algorithm to solve the dynamic scheduling problem of a flexible job shop with consideration of transportation time. Although deep reinforcement learning (DRL) has made some progress in the dynamic scheduling of flexible job shops, these algorithms face efficiency and accuracy challenges when dealing with complex scheduling problems. Therefore, this paper proposes an innovative scheduling method combining deep reinforcement learning and heuristic algorithms, aiming to solve multi-objective, multi-resource, and human–machine collaborative scheduling problems and provide an efficient solution for dynamic scheduling in intelligent production. Table 1 summarizes the differences between the above studies and this study.
In this paper, the ACO and DDQN algorithms are combined to propose an efficient method to solve the multi-resource and multi-objective flexible job shop dynamic scheduling problem (MMO-DFJSP), namely the Adaptive Ant Colony Algorithm based on reinforcement learning (ACODDQN). The algorithm combines the global search capability of the Ant Colony Optimization (ACO) Algorithm with the adaptive learning characteristics of the double-depth Q-network (DDQN), aiming to cope with the scheduling optimization problem in complex dynamic environments, so as to achieve the dual minimization of delay time and completion time. Specifically, the main contributions of this paper include (1) constructing a multi-resource, multi-objective dynamic scheduling optimization model covering workpieces, machines, and maintenance personnel, which entirely takes into account the complexity of the production system and achieves the minimization of both delay time and completion time; (2) designing four kinds of composite scheduling rules based on workpieces, machines, and maintenance personnel, which improve the flexibility of the scheduling scheme through adaptive scheduling strategies and enhance the ability to cope with complex production environments. (3) A self-learning Ant Colony Algorithm based on deep reinforcement learning (ACODDQN) is proposed, which combines the global search capability of the Ant Colony Optimization (ACO) Algorithm with the intelligent decision-making mechanism of the double-depth Q-network (DDQN), significantly improves the algorithm’s adaptivity and optimization capability in dynamic scheduling problems, and provides more efficient scheduling decision support for complex production systems.
The rest of the paper is organized as follows: Section 2 describes the multi-objective flexible job shop dynamic scheduling problem and constructs a mathematical model. Section 3 introduces the ACODDQN algorithm. Section 4 further analyzes extended test instances and demonstrates the algorithm on examples. Section 5 concludes the paper.

2. Problem Description and Model Construction

2.1. Problem Description

The multi-constrained flexible job shop dynamic scheduling problem can be described as follows: the job shop has $n$ workpieces, $m$ machines, and $w$ maintenance personnel; each workpiece contains $n_i$ operations; each operation has a corresponding set of eligible machines and a corresponding set of processing times; and the skill of each maintenance worker differs, so the time needed for fault maintenance of the same machine also differs. Maintenance activities must not be carried out while a machine is processing, and processing must not be carried out while maintenance is in progress. Under the workpiece, machine, and maintenance personnel constraints, the optimal machine combination is determined for each workpiece, and the most suitable maintenance worker is assigned to each faulty machine.
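To make the resource structure concrete, here is one possible plain-data encoding of such an instance (our illustrative sketch, not the paper's representation; all numbers are hypothetical):

```python
# Hypothetical plain-data encoding of an MMO-DFJSP instance (ours, not the
# paper's): each operation maps its eligible machines to processing times,
# and each maintenance worker maps machines to fault repair times.
instance = {
    "jobs": [
        # job 0 has two operations; O_{0,1} can run on machine 0 (3 time
        # units) or machine 1 (5 units), O_{0,2} only on machine 1
        [{0: 3, 1: 5}, {1: 2}],
        # job 1 has a single operation
        [{0: 4, 2: 6}],
    ],
    # repair times differ per worker and per machine, as assumed in the text
    "workers": [
        {0: 2, 1: 3, 2: 4},
        {0: 5, 1: 1, 2: 2},
    ],
}
n, m, w = len(instance["jobs"]), 3, len(instance["workers"])
```

Any scheduler over this structure must respect the constraints above: an operation may only be placed on a machine that appears in its dictionary, and a faulty machine may be repaired by any worker, at that worker's specific repair time.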
The multi-constrained, multi-objective dynamic scheduling problem is studied based on workpiece, machine, and maintenance personnel. The research goal is to minimize the completion and delay times. In order to better solve the problem, the following hypotheses are given:
(1)
All workpieces, machines, and workers are available at time 0;
(2)
There are order constraints between the operations of a job;
(3)
The fault maintenance time of each maintenance worker for each machine is known;
(4)
At any given time, a machine can process only one operation, and a maintenance worker can carry out only one maintenance activity;
(5)
Once an operation has started, it cannot be interrupted until completion unless a machine failure occurs;
(6)
When the machine fails, the machine will be shut down immediately;
(7)
The repair time of the damaged machine is known;
(8)
The machine processing and maintenance process cannot be interrupted;
(9)
Transport time during workpiece processing is not considered.

2.2. Model Establishment

The parameter settings are shown in Table 2. The mathematical model of MMO-DFJSP is established as follows.
The objective function is as follows:
Minimum delay time:
$f_1 = \min DT_{sum} = \min \sum_{i=1}^{n} DT_i$
Minimize maximum completion time:
$f_2 = \min ( \max_i C_i )$
Constraint conditions:
Workpieces are processed in a sequential order:
$(t_{ijk})_f \geq (t_{i(j-1)g})_f + t_{ijk} \quad (i = 1, \dots, n;\; j = 2, \dots, n_i;\; k, g = 1, \dots, m)$
The machine can perform only one process task at a time:
$(t_{psk})_f \geq (t_{ijk})_f + X_{ijpsk} \times t_{ijk} + t_{cmk} \quad (i, p = 1, \dots, n;\; j, s = 1, \dots, n_i;\; k = 1, \dots, m;\; X_{ijpsk} \in \{0, 1\})$
Maintenance personnel can only perform one maintenance activity at a time:
$(t_{ijk})_f \geq (t_{wk})_f + X_{wkg} \times t_{cmk} \quad (i = 1, \dots, n;\; j = 1, \dots, n_i;\; k = 1, \dots, m;\; w = 1, \dots, q;\; X_{wkg} \in \{0, 1\})$
Each operation can only be prioritized on one machine:
$\sum_{k=1}^{m} X_{ijpsk} = 1 \quad (i, p = 1, \dots, n;\; j, s = 1, \dots, n_i)$
Maintenance personnel can only repair one machine at a time:
$\sum_{k=1}^{m} X_{wkg} \leq 1 \quad (w = 1, \dots, q;\; g = 1, \dots, m)$
Failure occurs when machine $k$ is selected to process the operation:
$X_{ijk} \leq X_{ijpsk} \quad (i, p = 1, \dots, n;\; j, s = 1, \dots, n_i;\; k = 1, \dots, m)$
Maintenance personnel can only repair malfunctioning machines:
$X_{wkg} \leq \sum_{i=1}^{n} \sum_{j=1}^{n_i} X_{ijk} \quad (w = 1, \dots, q;\; g, k = 1, \dots, m)$
Starting moment of machining on machine $k$ for process $j$ of workpiece $i$:
$(t_{ijk})_s = \max \{ (t_{i(j-1)g})_f, p_k \} \quad (i = 1, \dots, n;\; j = 1, \dots, n_i;\; k, g = 1, \dots, m)$
Time worker $w$ starts a maintenance activity on machine $k$:
$(t_{wk})_s = \max \{ m_w, m_k \}$
Completion time of the workpiece:
$C_i = \sum_{j=1}^{n_i} \sum_{k=1}^{m} t_{ijk} + \sum_{j=1}^{n_i} \sum_{k=1}^{m} (X_{ijk} \times t_{wk})$
Deadline completion time of job $i$:
$\overline{t_{ij}} = \operatorname{mean}_{k \in m} \, t_{ijk}$
$D_i = (t_{ijk})_s + \left( \sum_{j=1}^{n_i} \overline{t_{ij}} \right) \times UP_i$
Total delay time of job $i$:
$DT_i = \max \{ C_i - D_i, 0 \}$
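As a sanity check on the objective definitions, the two objectives can be computed from per-job completion times and due dates as follows (an illustrative sketch with made-up values; `objectives` is our own helper, not from the paper):

```python
# Illustrative sketch (not the paper's code): computing the two objectives
# f1 (total delay time) and f2 (makespan) from per-job completion times C_i
# and due-date completion times D_i as defined in the model above.

def objectives(completion, due):
    """Return (f1, f2): total delay time and maximum completion time."""
    # DT_i = max(C_i - D_i, 0): a job contributes delay only if it is late
    delays = [max(c - d, 0) for c, d in zip(completion, due)]
    f1 = sum(delays)          # total delay time DT_sum
    f2 = max(completion)      # makespan, max_i C_i
    return f1, f2

# Example: three jobs with hypothetical completion times and due dates
C = [12.0, 20.0, 17.0]
D = [15.0, 18.0, 17.0]
f1, f2 = objectives(C, D)  # -> (2.0, 20.0)
```

A scheduler then seeks solutions that trade off `f1` against `f2` on the Pareto front, since no single schedule generally minimizes both.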

2.3. Dynamic Scheduling Strategy

An event-driven rescheduling strategy is proposed to solve the problem of production schedule disruption caused by equipment failure in flexible job shop scheduling (FJSP). Taking machine failure as the event-driven triggering condition, it identifies the faulty machine and the time of failure in real time and dynamically adjusts the production schedule in combination with the scheduling cycle. The self-learning Ant Colony Algorithm (ACODDQN) is used to optimize the allocation of unfinished parts and maintenance personnel. The specific process is shown in Figure 1.

3. ACODDQN for MMO-DFJSP

The dynamic scheduling of flexible job shops is an NP-hard problem, and the Ant Colony Optimization (ACO) Algorithm [35,36] has been widely used to solve it. However, ACO suffers from slow convergence and poor robustness in large-scale and dynamic environments. Deep reinforcement learning (DRL), especially the double-depth Q-network (DDQN) [37,38], has demonstrated its potential for complex scheduling problems by learning the optimal policy through interaction with the environment, but it, too, faces slow convergence. To cope with these challenges, this paper proposes a self-learning Ant Colony Algorithm (ACODDQN) that combines ACO and DDQN, integrating the advantages of global search and local optimization to solve complex scheduling problems more efficiently.

3.1. ACO Algorithm

The Ant Colony Optimization (ACO) Algorithm is a colony intelligence bionic algorithm derived from the process of simulating the foraging behavior of ants. In solving the dynamic flexible job shop scheduling problem (DFJSP), ACO explores the scheduling solution space by mimicking the behavior of ants looking for food. Each ant selects the machine and execution order of each process in the solution space, thus forming a complete scheduling solution. The core idea of ACO is to guide the search process through pheromone updating and collaboration among ants to gradually approach the global optimal solution. The specific algorithm flow is shown in Figure 2.
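The transition rule described here, combining pheromone concentration with heuristic information, can be sketched as follows (alpha, beta, and all values are illustrative assumptions, not the paper's implementation):

```python
# Minimal sketch of the standard ACO transition rule: the probability of
# moving to node j is proportional to tau[j]^alpha * eta[j]^beta over the
# not-yet-visited nodes. All parameter values here are illustrative.

def transition_probabilities(tau, eta, allowed, alpha=1.0, beta=2.0):
    """P(j) proportional to tau[j]**alpha * eta[j]**beta for j in allowed."""
    weights = {j: (tau[j] ** alpha) * (eta[j] ** beta) for j in allowed}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}

tau = [1.0, 2.0, 1.0]   # pheromone on edges to nodes 0..2
eta = [0.5, 0.5, 1.0]   # heuristic desirability (e.g. 1 / processing time)
p = transition_probabilities(tau, eta, allowed=[1, 2])
# node 1: 2.0 * 0.25 = 0.5; node 2: 1.0 * 1.0 = 1.0 -> p = {1: 1/3, 2: 2/3}
```

Roulette-wheel selection then draws the next node from this distribution, so high-pheromone, high-desirability moves are favored without being deterministic.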

3.2. DDQN Algorithm

The DDQN algorithm is a value-function-based deep reinforcement learning algorithm that aims to solve the overestimation problem in the traditional DQN algorithm. By introducing a target network mechanism, DDQN first selects an action using the online network and then uses the target network to estimate the target Q-value of that action. By decoupling action selection from target Q-value computation, this approach effectively avoids overestimation when solving the multi-objective flexible job shop scheduling problem, thus reducing model bias.
When solving the large-scale multi-objective flexible job shop scheduling problem, the DDQN algorithm optimizes the scheduling strategy through the agent's continuous learning. Its pseudo-code, shown in Algorithm 1, describes the key steps and update rules of the algorithm.
Algorithm 1. DDQN Algorithm
Input: D — empty replay buffer; θ — initial network parameters; θ⁻ — copy of θ; Nb — training batch size; Nr — replay buffer maximum size; N⁻ — target network replacement frequency.
Output: parameters of the trained network
1: for episode e ∈ {1,2,…,M} do
2:   Initialize frame sequence x ← ()
3:   for t ∈ {0,1,…} do
4:     Set state s ← x, sample action a ~ πb
5:     Sample next frame xt from environment ε given (s,a), receive reward r, append xt to x
6:     if |x| > Nf then delete oldest frame from x end
7:     Set s′ ← x, add transition tuple (s,a,r,s′) to D, replacing the oldest tuple if |D| ≥ Nr
8:     Sample a minibatch of Nb tuples (s,a,r,s′) ~ Unif(D)
9:     Construct target values, one for each of the Nb tuples
10:    Define amax(s′; θ) = argmax_a′ Q(s′, a′; θ)
11:    yj = r if s′ is terminal; otherwise yj = r + γ·Q(s′, amax(s′; θ); θ⁻)
12:    Do a gradient descent step with loss ‖yj − Q(s,a;θ)‖²
13:    Replace target parameters θ⁻ ← θ every N⁻ steps
14:  end
15: end
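The decoupling in the target-construction step is the essence of DDQN: the online parameters select the greedy action while the target parameters evaluate it. A minimal numeric sketch, using tiny Q-tables in place of the two networks (all values made up):

```python
# Sketch of the decoupled DDQN target: the online parameters (theta) pick
# the argmax action, the target parameters (theta-minus) evaluate it.
# Dict-based Q-tables stand in for the two networks; values are made up.

def ddqn_target(r, s_next, q_online, q_target, gamma=0.9, terminal=False):
    if terminal:
        return r
    a_max = max(q_online[s_next], key=q_online[s_next].get)  # select with theta
    return r + gamma * q_target[s_next][a_max]               # evaluate with theta-minus

q_online = {"s1": {"a0": 1.0, "a1": 3.0}}   # online net prefers a1
q_target = {"s1": {"a0": 2.5, "a1": 2.0}}   # target net values a1 lower
y = ddqn_target(r=1.0, s_next="s1", q_online=q_online, q_target=q_target)
# y = 1.0 + 0.9 * 2.0 = 2.8; maximizing over the target net directly
# (vanilla DQN style) would instead give 1.0 + 0.9 * 2.5 = 3.25
```

Because the evaluating network did not choose the action, noise in either network is less likely to be compounded into a systematically optimistic target.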

3.3. ACODDQN Algorithm

The core idea of the ACODDQN algorithm is to combine the Ant Colony Optimization (ACO) Algorithm with the double-depth Q-network (DDQN), which performs an iterative search in the solution space through the pheromone-guided and updating mechanism and gradually approaches the optimal scheduling scheme. Meanwhile, the DDQN algorithm optimizes the parameters in the ACO through reinforcement learning to cope with the dynamic changes and challenges in the complex flexible job shop scheduling problem.

3.3.1. ACODDQN Algorithm Framework

The ACODDQN algorithm framework is shown in Figure 3. The steps of the ACODDQN algorithm are detailed below:
Step 1: Initialize algorithm parameters. To provide the base settings for algorithm execution, initialize the pheromone matrix in the ACO Algorithm and the DDQN parameters.
Step 2: The ant colony searches the solution space. The Ant Colony Algorithm explores the solution space, selects the next visited node by transferring the probability, constructs the path, and keeps searching until it traverses all the nodes and records the ants’ exploration path.
Step 3: State space construction. Record the feasible solution for the ant exploration path in the state space of the DDQN algorithm.
Step 4: Pheromone update with non-dominated sorting. Rank the solutions by non-dominated sorting and assign different ranks to update the pheromone, strengthen the selection of high-quality solutions, and optimize the pheromone guidance.
Step 5: Iteration with DDQN training. Determine whether the preset number of iterations has been reached. If not, use DDQN to train the ant colony path and select N feasible solutions to continue the search.
Step 6: Feasible solution division. Assign valid feasible solutions based on uniform weights to ensure the diversity of the solution set.
Step 7: Monte Carlo feasibility validation. Verify the feasibility of the solutions using Monte Carlo simulation methods to ensure that they satisfy the scheduling constraints.
Step 8: Output Pareto optimal solution set. Output a valid Pareto optimal solution set that provides a balanced solution to the multi-objective scheduling problem.
Figure 3. Flowchart of the ACODDQN algorithm.

3.3.2. Ant Colony Search Solution Space

In the ACODDQN algorithm, ants explore the solution space by simulating natural foraging behavior. The process can be described as follows:
(1) Initialization: Set the initial number of ants to 0 and gradually increase it. Each ant represents a solution explorer and adopts a roulette selection strategy to choose the next node among the unvisited nodes. The probability of selection is jointly determined by pheromone concentration and heuristic information;
(2) Node selection and taboo table update: The ants perform node selection based on the roulette selection strategy and add the visited nodes to the taboo table to avoid repeated visits. The taboo table is updated instantly after each node selection to ensure the diversity of path exploration;
(3) Path completion and recording: The ants select nodes until all nodes are visited and path exploration is completed. After path completion, path G and its quality metrics (e.g., path length or objective value) are recorded, and the path is used as part of the current solution to provide a reference for subsequent optimization and decision-making.
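Steps (1)–(3) can be sketched as follows (a minimal illustration; the `weight` function stands in for the pheromone–heuristic product, and all names are our own):

```python
# Sketch of ant path construction: roulette-wheel node selection with a
# taboo (visited) table updated immediately after every step. The weight
# callable stands in for pheromone * heuristic; the fixed seed is only
# for reproducibility of this illustration.

import random

def build_path(n_nodes, weight, start=0, rng=None):
    rng = rng or random.Random(0)
    path, taboo = [start], {start}
    while len(path) < n_nodes:
        allowed = [j for j in range(n_nodes) if j not in taboo]
        w = [weight(path[-1], j) for j in allowed]
        nxt = rng.choices(allowed, weights=w, k=1)[0]  # roulette selection
        path.append(nxt)
        taboo.add(nxt)  # taboo table updated instantly, no repeated visits
    return path

path = build_path(4, weight=lambda i, j: 1.0)
```

Each completed path is then recorded together with its quality metrics (objective values) and fed into the non-dominated sorting of the next subsection.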

3.3.3. Non-Dominated Sorting

In the previous section, the use of an ant colony to explore the solution space has been proposed. Next, non-dominated ordering will be used to evaluate the merits of the generated solutions, so as to ensure that the solution set can effectively cover the Pareto front, which is designed as follows:
(1) Solution set generation: In each round, a set of solutions is generated through the ant exploration process, each produced by an ant under the roulette selection strategy. Specifically, the ants choose workpiece scheduling decisions (workpiece $i$ is assigned to machine $k$), machine assignment (the order in which machine $k$ is used), and repairer scheduling (repairer $w$ repairs machine $k$). Each solution corresponds to a specific scheduling scheme involving the start and end times of tasks and the assignment of repair tasks.
(2) Dominance relations and ordering: The dominance relations between each pair of solutions are computed, and a non-dominated ordering of the solution set is performed to classify the solution set into multiple classes. The solutions in the solution set are ranked based on the Pareto front and their congestion is evaluated, which is mainly ranked for each objective, and the distance d between the solution and the neighboring solutions is calculated as follows:
$\text{Crowding degree } Q = \sum_i \frac{d_i}{d_{max}}$
where $d_i$ is the distance to the neighboring solutions on objective $i$ and $d_{max}$ is the maximum distance on that objective dimension.
(3) Selection of optimal solution: The optimal solution is selected based on the non-dominated ranking and the degree of congestion. To ensure the quality and diversity of solutions, preference is given to solutions with a low dominance rank (closer to the Pareto front) and higher congestion.
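A minimal sketch of extracting the first non-dominated front for the two minimization objectives (delay time, completion time); `first_front` and the sample values are our own illustration, not the paper's implementation:

```python
# Pareto non-dominated filtering for minimization objectives.

def dominates(a, b):
    """a dominates b: no worse on every objective, strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def first_front(solutions):
    """Solutions not dominated by any other solution (Pareto rank 1)."""
    return [s for s in solutions
            if not any(dominates(o, s) for o in solutions if o != s)]

# (DT_sum, Cmax) pairs for three candidate schedules; values are made up
sols = [(2.0, 20.0), (0.0, 25.0), (3.0, 21.0)]
front = first_front(sols)
# (3.0, 21.0) is dominated by (2.0, 20.0); the other two are incomparable
```

A full non-dominated sort repeats this filtering on the remaining solutions to assign rank 2, rank 3, and so on; within a rank, the crowding degree above breaks ties in favor of more isolated solutions.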

3.3.4. Pheromone Update Mechanism

In the ACODDQN algorithm, the definition of state space and action space provides the basis for learning and decision-making for intelligence. However, the pheromone update mechanism becomes crucial in optimizing the scheduling strategy further and improving the algorithm’s search efficiency and convergence speed. The pheromone is not only related to the quality of path selection, but also needs to be updated with the Q-value in the DDQN algorithm to achieve an effective combination of ACO and DDQN. This mechanism is designed to enhance the role of pheromones in guiding the DDQN learning process and optimize the overall search strategy.
First, to adapt the pheromone update to the dynamic search progress of the algorithm, we designed the adaptive evaporation rate mechanism. This mechanism dynamically adjusts the evaporation rate according to the algorithm’s current search progress to make the pheromone evaporation more reasonable. Its calculation formula is as follows:
$p_t = p_0 \times \left( 1 - \frac{t}{T_{\max}} \right)$
where $p_0$ is the initial evaporation rate, $T_{\max}$ is the maximum number of iterations, and $t$ is the current iteration number.
Next, we design the adaptive pheromone increment mechanism to better guide the ants in selecting quality paths. This mechanism adjusts the pheromone increment through the Q-value learning progress in the DDQN algorithm to improve the quality of path selection. Its increment calculation formula is as follows:
$\Delta \tau_{ij}(t) = \frac{Q}{L_{ij}} \times \left( 1 + a \times Q(s,a) \right)$
where $Q(s,a)$ is the Q-value of the DDQN at state $s$ and action $a$; $a$ is a constant used to adjust the degree of influence of the Q-value on the pheromone increment; and $L_{ij}$ is the path quality (length).
In order to further enhance the pheromone guidance, the mechanism of forward and reverse pheromone updating was adopted. During forward updating, ants tend to choose the path with a higher pheromone concentration. After each Q-value update, the pheromone will be adjusted in the reverse direction according to the change in the Q-value. Its updating formula is as follows:
$\tau_{ij}(t+1) = (1 - p_t) \times \tau_{ij}(t) + \Delta \tau_{ij}(t) \times \left( 1 + \beta \times \Delta Q(s,a) \right)$
where $\Delta Q(s,a)$ is the change in the Q-value of state $s$ and action $a$ in the DDQN and $\beta$ is a constant used to regulate the effect of Q-value changes on pheromone renewal.
In summary, the final pheromone update mechanism can be expressed as follows:
$\tau_{ij}(t+1) = (1 - p_t) \times \tau_{ij}(t) + \frac{Q}{L_{ij}} \times \left( 1 + a \times Q(s,a) \right) \times \left( 1 + \beta \times \Delta Q(s,a) \right)$
Through the above design, the pheromone updating mechanism can dynamically reflect the superior and inferior paths in the search process and simultaneously combine with the feedback mechanism of the Q-value to further optimize the search strategy and accelerate convergence.
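A numeric sketch of the combined update (the constants `a`, `beta`, and all inputs below are illustrative, not tuned values from the paper):

```python
# Sketch of the pheromone update mechanism: adaptive evaporation p_t,
# a Q-value-scaled increment Q/L * (1 + a*Q(s,a)), and a reverse
# adjustment by the Q-value change delta_q. All constants are made up.

def evaporation(p0, t, t_max):
    """Adaptive evaporation rate: decays linearly with search progress."""
    return p0 * (1 - t / t_max)

def pheromone_update(tau, p0, t, t_max, Q, L, q_sa, dq_sa, a=0.1, beta=0.1):
    p_t = evaporation(p0, t, t_max)
    increment = (Q / L) * (1 + a * q_sa)          # adaptive increment
    return (1 - p_t) * tau + increment * (1 + beta * dq_sa)

tau_next = pheromone_update(tau=1.0, p0=0.5, t=50, t_max=100,
                            Q=1.0, L=10.0, q_sa=2.0, dq_sa=0.5)
# p_t = 0.25; increment = 0.1 * 1.2 = 0.12
# tau_next = 0.75 * 1.0 + 0.12 * 1.05 = 0.876
```

Note how evaporation shrinks as `t` approaches `t_max`, so early iterations forget paths quickly (exploration) while late iterations preserve accumulated pheromone (exploitation).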

3.3.5. DDQN Framework

To further optimize the scheduling strategy, the DDQN algorithmic framework is designed in this section to enhance the learning capability and improve the optimization of the solutions generated by the ACO Algorithm.
(1)
State set
In the solution space exploration phase of the ACO Algorithm (Section 3.3.2), the ants generate the initial scheduling solution through a roulette wheel selection strategy. To ensure the effectiveness of subsequent optimization, the ACODDQN framework introduces a state space design. The quality of scheduling decisions relies on accurately perceiving the current production state. By computing real-time shop floor processing information, we construct state feature vectors reflecting the production environment to minimize the delay and completion times. The state features include the average completion time, the standard deviation of completion time, the average delay time, and the standard deviation of delay time of the workpiece.
$C_{ave} = \frac{1}{N} \sum_{i=1}^{N} C_i$
$C_{std} = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} \left( C_i - C_{ave} \right)^2 }$
$DT_{ave} = \frac{1}{N} \sum_{i=1}^{N} DT_i$
$DT_{std} = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} \left( DT_i - DT_{ave} \right)^2 }$
Equations (21)–(24) represent the four state features: Equation (21) is the average completion time of all workpieces, Equation (22) the standard deviation of the completion times, Equation (23) the average delay time, and Equation (24) the standard deviation of the delay times.
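The four features can be computed directly from a snapshot of the shop floor (an illustrative sketch; the sample values are hypothetical):

```python
# State feature vector per Equations (21)-(24): mean and (population)
# standard deviation of completion and delay times.

import math

def state_features(C, DT):
    """Return [C_ave, C_std, DT_ave, DT_std] for the current shop state."""
    n = len(C)
    c_ave = sum(C) / n
    c_std = math.sqrt(sum((c - c_ave) ** 2 for c in C) / n)
    dt_ave = sum(DT) / n
    dt_std = math.sqrt(sum((d - dt_ave) ** 2 for d in DT) / n)
    return [c_ave, c_std, dt_ave, dt_std]

# hypothetical snapshot: completion and delay times of three workpieces
features = state_features(C=[10.0, 14.0, 12.0], DT=[0.0, 2.0, 1.0])
```

Keeping the state low-dimensional in this way lets a single Q-network generalize across instances of different sizes, since the feature count does not grow with the number of workpieces.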
(2)
Action state
In the aforementioned non-dominated sorting (Section 3.3.3), we ensured the distribution of different solutions along the Pareto front by sorting the generated solution set. To cope with complex production environments, the DDQN framework must further optimize action selection for the initial solutions generated by the ACO Algorithm. To this end, we designed composite scheduling rules in the DDQN framework for the selection of workpieces, machines, and maintenance workers: one rule for workpieces, two for machines, and two for maintenance workers, whose combinations form four composite scheduling rules, as shown in Table 3, aiming to optimize dynamic scheduling and the cooperative scheduling of maintenance workers for machine failures.
The workpiece scheduling rule J1 follows the first-in-first-out (FIFO) principle.
Machine scheduling rule M1 selects the earliest available machine and M2 selects the machine with the longest idle time.
Maintenance worker scheduling rule W1 selects the maintenance worker with the longest idle time and W2 selects the available worker with the shortest maintenance time.
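The five elementary rules above can be sketched as selectors over simple records. The dictionary fields (`arrival`, `available_at`, `idle_since`, `repair_time`) are hypothetical names chosen for illustration; the paper does not specify a data layout:

```python
def j1_fifo(queue):
    """J1: first-in-first-out -- the job that arrived earliest."""
    return min(queue, key=lambda job: job["arrival"])

def m1_earliest_available(machines):
    """M1: the machine that becomes free soonest."""
    return min(machines, key=lambda m: m["available_at"])

def m2_longest_idle(machines, now):
    """M2: the machine that has been idle the longest."""
    return max(machines, key=lambda m: now - m["idle_since"])

def w1_longest_idle(workers, now):
    """W1: the maintenance worker idle the longest."""
    return max(workers, key=lambda w: now - w["idle_since"])

def w2_shortest_repair(workers, machine_id):
    """W2: the available worker with the shortest repair time
    for the failed machine."""
    return min(workers, key=lambda w: w["repair_time"][machine_id])
```

The four composite rules of Table 3 would then be combinations such as (J1, M1, W1), (J1, M1, W2), (J1, M2, W1), and (J1, M2, W2), with the DDQN choosing one composite rule as its action.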
(3)
Reward
In the pheromone update mechanism (Section 3.3.4), the concentration of pheromones reflects the quality of the current solution and guides the generation of subsequent solutions. In the DDQN framework, the reward function is designed to enhance the scheduling optimization further. In order to solve the multi-objective optimization problem (minimizing the delay time and completion time), this paper proposes a systematic reward function design scheme combining a weighted reward function, multi-objective optimization, and a penalty mechanism, aiming to balance the optimization effects of different objectives and ensure the feasibility and efficiency of the solution.
Firstly, in order to balance the two objectives of delay time $DT(s,a)$ and completion time $C_{\max}(s,a)$ in the optimization process, this paper uses the weighted sum method to design the reward function in the following form:
$$R^{*}(s,a) = w_{1} \times DT(s,a) + w_{2} \times C_{\max}(s,a)$$
where $w_{1}$ and $w_{2}$ are the weighting coefficients of delay time and completion time, respectively; satisfying $w_{1} + w_{2} = 1$ ensures a balanced contribution of the two objectives in the optimization process.
In order to avoid the impact of imbalance between the objectives due to different magnitudes, the delay time and completion time are normalized in this paper and mapped to a uniform range of values. The normalized reward function is of the following forms:
$$DT(s,a) = \frac{DT_{\max}}{DT}$$
$$C_{\max}(s,a) = \frac{C_{\max}^{\max}}{C_{\max}}$$
$$R(s,a) = w_{1} \times DT(s,a) + w_{2} \times C_{\max}(s,a)$$
In order to ensure the feasibility of the solution process and to avoid the generation of invalid solutions, a penalty mechanism is designed in this paper. If the scheduling scheme increases the delay or completion time, the penalty mechanism guides the model away from non-compliant solutions through negative rewards. The following formula calculates the penalty term:
$$R_{penalty}(s,a) = -\alpha \times \left( DT(s,a) + C_{\max}(s,a) \right)$$
where $\alpha$ is a penalty coefficient that adjusts the strength of the negative reward to ensure that the model avoids ineffective or inefficient scheduling schemes.
Combining the above designs, the final reward function takes the following form:
$$R(s,a) = w_{1} \times DT(s,a) + w_{2} \times C_{\max}(s,a) + R_{penalty}(s,a)$$
This function can balance the two objectives of delay time and completion time in the optimization process and, at the same time, improve the quality of the scheduling solution by constraining the solutions that do not meet the constraints through a penalty mechanism.
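A minimal sketch of this reward computation follows. Two points are assumptions on my part: the normalization divides the historical maxima by the current values (so smaller delay/completion times give larger rewards), and the penalty term fires only when the action worsens either objective relative to the previous step:

```python
def reward(dt, c_max, prev_dt, prev_cmax, dt_max, cmax_max,
           w1=0.5, w2=0.5, alpha=0.1):
    """Weighted, normalized reward with a penalty term.
    dt_max / cmax_max are the largest values observed so far."""
    dt_norm = dt_max / dt        # normalized delay-time term
    c_norm = cmax_max / c_max    # normalized completion-time term
    r = w1 * dt_norm + w2 * c_norm
    # Penalty when the action worsens either objective (assumed trigger).
    if dt > prev_dt or c_max > prev_cmax:
        r -= alpha * (dt_norm + c_norm)
    return r
```

With $w_1 = w_2 = 0.5$ the two objectives contribute equally; shifting the weights lets a planner prioritize on-time delivery over makespan or vice versa.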

3.3.6. Feasible Solution Optimization

In the ACODDQN algorithm, a diversity partition mechanism of uniform weight distribution is adopted to ensure the diversity of the solution set and avoid local optimization. First, the feasible solutions are divided into subsets based on uniform weight allocation to ensure that the solution set is evenly distributed in the target space (such as delay and completion time). By calculating the crowding degree, the solution with a high crowding degree is preferentially retained to avoid local clustering. The dynamic weight adjustment formula is as follows:
$$w_{new} = w_{old} + \alpha \times \left( d_{target} - d_{current} \right)$$
where $d_{target}$ is the density of the target distribution, $d_{current}$ is the density of the current solution set, and $\alpha$ is the adjustment coefficient.
Finally, through the elite retention mechanism combined with non-dominated sorting, the top N high-quality solutions are retained to enter the next iteration to ensure the quality and diversity of the solution set. This mechanism effectively improves the algorithm’s global search ability.
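The crowding computation and the weight update can be sketched as follows; the crowding measure below is the standard NSGA-II-style crowding distance over the two objectives, used here as a stand-in for the paper's crowding degree:

```python
def crowding_distance(front):
    """Crowding over a front of (delay, makespan) pairs; boundary
    solutions get infinite distance so they are always retained."""
    n = len(front)
    dist = [0.0] * n
    for m in range(2):                        # two objectives
        order = sorted(range(n), key=lambda i: front[i][m])
        dist[order[0]] = dist[order[-1]] = float("inf")
        span = front[order[-1]][m] - front[order[0]][m] or 1.0
        for k in range(1, n - 1):             # gap between the neighbours
            dist[order[k]] += (front[order[k + 1]][m]
                               - front[order[k - 1]][m]) / span
    return dist

def adjust_weight(w_old, d_target, d_current, alpha=0.1):
    """Dynamic weight update: w_new = w_old + alpha*(d_target - d_current)."""
    return w_old + alpha * (d_target - d_current)
```

Solutions with larger crowding distance sit in sparser regions of the objective space, so retaining them first counteracts local clustering of the solution set.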

3.3.7. Pareto Optimal Solution Set Generation and Output

In the ACODDQN algorithm, the solution of level 1 is extracted by non-dominant sorting, and the redundant solution is eliminated by combining the crowding sorting to ensure the universality and uniform distribution of the Pareto frontier. The Pareto optimal solution set is stored in a tree structure, where the key is the target vector (e.g., delay time, completion time), and the value is the corresponding scheduling scheme. Through this process, ACODDQN effectively outputs a set of Pareto optimal solutions that meet the needs of multi-objective optimization.
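A compact sketch of this extraction and storage step follows. Python has no built-in balanced tree, so a dictionary keyed by the objective vector stands in for the tree structure described above; iterating over its sorted keys walks the front in objective order. The schedule labels are placeholders:

```python
def non_dominated(solutions):
    """Level-1 (Pareto) solutions for two minimized objectives:
    keep s unless some other solution is at least as good in both
    objectives and different from s."""
    def dominates(o, s):
        return o[0] <= s[0] and o[1] <= s[1] and o != s
    return [s for s in solutions if not any(dominates(o, s) for o in solutions)]

# (delay time, completion time) pairs, illustrative values only.
solutions = [(3, 5), (2, 6), (4, 4), (5, 5), (3, 7)]
archive = {obj: f"schedule-{i}"            # key: objective vector
           for i, obj in enumerate(non_dominated(solutions))}
front_in_order = sorted(archive)           # Pareto front in objective order
```

Crowding-based pruning (as sketched in the previous section) would then thin this archive wherever keys cluster too densely.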

4. Experiment and Discussion

4.1. Extension Examples

The simulation experiments are implemented in MATLAB R2020a. Because DFJSP problems lack a general benchmark that considers maintenance personnel resources, this study verifies the performance of the ACODDQN algorithm by extending the existing MK01–MK15 [39] benchmark instances. The proposed model combines dynamic production scheduling with the repair of faulty machines by maintenance personnel. The machine failure and maintenance personnel parameters added to the MK01–MK15 test set are as follows:
(1) There are $k$ machines, $w$ maintenance personnel, and a production cycle $C$; for each machine $k$, its downtime is determined;
(2) Each maintenance worker's repair time follows $T_{repair} \sim N(\mu_{w}, \sigma_{w})$. If $T_{k} \le T_{current}$, the faulty machine $k$ is assigned to maintenance worker $w$, with $T_{repair,w,k} \sim N(\mu_{w}, \sigma_{w})$;
(3) The dynamic scheduling of the whole system is carried out via real-time simulation to ensure that the maintenance personnel can deal with machine failures as they occur while meeting both the maintenance personnel's capabilities and the machines' maintenance requirements.
Considering the above factors, the extended example is MWK01-15, in which the processing time, maintenance time, and workpiece urgency fluctuate within a specific range. The extended example is shown in Table 4.
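The stochastic parameters added to the benchmark can be sampled as sketched below; assuming (for illustration) a single failure instant per machine within the cycle and normally distributed repair durations truncated at zero:

```python
import random

def sample_downtime(num_machines, cycle, rng):
    """One failure instant per machine within the production cycle C
    (a single failure per machine is a simplifying assumption)."""
    return {k: rng.uniform(0.0, cycle) for k in range(num_machines)}

def sample_repair_time(mu_w, sigma_w, rng):
    """Repair duration T_repair ~ N(mu_w, sigma_w), truncated at zero
    so a sampled duration is never negative."""
    return max(0.0, rng.gauss(mu_w, sigma_w))

rng = random.Random(42)                     # seeded for reproducibility
downtimes = sample_downtime(6, 100.0, rng)  # e.g., a 6-machine instance
repair = sample_repair_time(15.0, 3.0, rng)
```

Seeding the generator makes each extended instance reproducible across the 20 independent runs used in the experiments.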

4.2. Parameter Settings

For the ACODDQN algorithm, most of the parameters can take values within a fixed range, so their optimal values need to be determined. In this paper, we use Taguchi experiments to determine the optimal parameter values for the ACODDQN algorithm, which mainly involve the relevant parameters of the ACO and DDQN components, with five levels set for each factor. The parameter design is shown in Table 5.
In Table 5, $m$ represents the ant colony size; $\rho$ is the pheromone volatilization rate; $\alpha$ is the learning rate; $\gamma$ is the discount factor; and $\varepsilon$ is the greed rate.
The orthogonal test in this paper uses five levels and five factors, so the L25(5^5) orthogonal table is chosen. According to the orthogonal test parameter table, MWK10 was selected as the main example, and the relevant parameter values of the ACODDQN algorithm were varied. Each parameter setting was run 20 times to obtain 20 solution sets; these were then combined, and their Pareto front was taken as the optimal solution under that set of parameters. The HV index of the solution set serves as the comprehensive evaluation index of the orthogonal experiment. To observe the results more clearly, the HV values of the five levels of the same factor were summed and averaged, defined as that factor's influence on the example, and the resulting graph is drawn in Figure 4. Figure 4 shows that a larger colony size ($m$ = 175) significantly increased the HV value, helping to explore the solution space fully. A low pheromone volatilization rate ($\rho$ = 0.1) maintains a high HV value and avoids premature convergence. A smaller learning rate ($\alpha$ = 0.03) guarantees stable convergence. The discount factor ($\gamma$ = 0.85) balances short- and long-term goals to improve performance. A low greed rate ($\varepsilon$ = 0.1) keeps the algorithm stable and avoids over-reliance on the current optimal solution. The final optimal parameters are as follows: $m$ = 175, $\rho$ = 0.1, $\alpha$ = 0.03, $\gamma$ = 0.85, $\varepsilon$ = 0.1.

4.3. Evaluation Indicators

The SP, IGD, and HV [40] indexes are used to comprehensively evaluate the uniformity, convergence, and diversity of the multi-objective solution set.
$$SP = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} \left( \bar{d} - d_{i} \right)^{2}}$$
where $d_{i}$ is the Manhattan distance from solution $i$ to its nearest neighbouring solution, $\bar{d}$ is the mean of all $d_{i}$, and $N$ is the population size. The smaller the SP value, the more uniform the distribution of the solutions and the better their diversity.
$$IGD(P, P^{*}) = \frac{\sum_{x \in P^{*}} dis(x, P)}{\left| P^{*} \right|}$$
where $P$ is the solution set of the ACODDQN algorithm, $P^{*}$ is a uniformly sampled set of the true Pareto front, and $dis(x, P)$ is the Euclidean distance from a point $x \in P^{*}$ to its nearest point in $P$. Smaller IGD values represent better overall quality of the solution set.
$$HV = \delta \left( \bigcup_{i=1}^{\left| S \right|} v_{i} \right)$$
where $\delta$ denotes the Lebesgue measure, used to measure volume; $\left| S \right|$ denotes the number of non-dominated solutions; and $v_{i}$ denotes the hypervolume formed by the reference point and the $i$-th solution in the solution set. Larger HV values indicate better homogeneity and diversity of the algorithm's results.
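The three indicators can be sketched for the bi-objective case as follows; the HV routine below is a simple two-dimensional sweep (valid only for two minimized objectives and a reference point worse than every solution), which suffices for the delay-time/completion-time setting here:

```python
import math

def spacing(front):
    """SP: sqrt( sum (d_bar - d_i)^2 / (N-1) ), d_i being the Manhattan
    distance from each solution to its nearest other solution."""
    d = [min(sum(abs(a - b) for a, b in zip(p, q))
             for q in front if q is not p) for p in front]
    d_bar = sum(d) / len(d)
    return math.sqrt(sum((d_bar - di) ** 2 for di in d) / (len(d) - 1))

def igd(approx, true_front):
    """IGD: mean Euclidean distance from each true-front point to its
    nearest neighbour in the approximation set."""
    return sum(min(math.dist(x, p) for p in approx)
               for x in true_front) / len(true_front)

def hv_2d(front, ref):
    """HV for two minimized objectives: area dominated by the front and
    bounded by the reference point, via a left-to-right sweep."""
    area, prev_f2 = 0.0, ref[1]
    for f1, f2 in sorted(front):            # ascending first objective
        area += (ref[0] - f1) * (prev_f2 - f2)
        prev_f2 = f2
    return area
```

An evenly spaced front gives SP = 0, a front matching the reference set gives IGD = 0, and HV grows as the front pushes toward the origin.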

4.4. The Experimental Results of the Extended Example

4.4.1. Comparison with the Composite Scheduling Rule

This paper compares the ACODDQN algorithm with the four composite scheduling rules in detail. The 15 extended examples were each run 20 times independently, the evaluation indexes were averaged, and the optimal results are marked in bold (Table 6). As seen from Table 6, the ACODDQN algorithm performs best in SP, IGD, and HV, outperforming the four composite scheduling rules in 93.3%, 93.3%, and 100% of the examples for the three indexes, respectively, which shows that it has clear advantages in convergence, uniformity, and diversity over the other scheduling rules. The average time of 20 runs of the ACODDQN algorithm and the four composite scheduling rules on the 15 extended examples is shown in Table 7, with the shortest time marked in bold. Table 7 shows that the ACODDQN algorithm has the shortest running time across example scales, demonstrating both strong solution accuracy and a significant advantage in computational efficiency.

4.4.2. Comparing Algorithms

In order to verify the performance of the proposed ACODDQN algorithm for solving the MMO-DFJSP model, 15 extended examples were used to compare the ACODDQN algorithm with MOPSO [41], IGWO [42], NSGA-II [43], DDQN [44], PPO [45], and GA [46]. Each example was run 20 times independently, the evaluation indexes were averaged, and the optimal results are marked in bold (Table 8 and Table 9). As can be seen from Table 8, in both the SP and IGD indexes, the ACODDQN algorithm outperformed the other six algorithms in thirteen experiments (86.7%). As can be seen from Table 9, in the HV index, the ACODDQN algorithm outperformed the other six algorithms in thirteen experiments (93%), indicating obvious advantages in convergence, diversity, and coverage. To show the average performance of the SP, IGD, and HV indexes more intuitively, the indexes of three extended examples, MWK02, MWK05, and MWK10, are drawn as box plots (Figure 5). As can be seen from Figure 5, compared with the other six algorithms, the ACODDQN algorithm has the lowest mean values of the SP and IGD indexes and the highest mean value of the HV index, with the smallest fluctuation ranges, indicating that ACODDQN is superior to the other algorithms in both solution quality and stability. In addition, the average time of twenty runs of the ACODDQN algorithm and the six comparison algorithms on the fifteen extended examples was recorded, with the shortest time marked in bold (Table 10). Table 10 shows that, as the scale of the extended examples grows, the running-time gap between the ACODDQN algorithm and the other six algorithms gradually widens, and ACODDQN always has the shortest running time, which fully demonstrates its excellent solving efficiency.
To further analyze the performance advantages of the ACODDQN algorithm in solving dynamic scheduling problems, four extended examples, MWK01, MWK02, MWK04, and MWK10, were selected, and the key parameters ($m$, $i$, $w$, and $UP_{i}$) of each example are shown. The ACODDQN algorithm and the six other algorithms were each run twenty times independently on these four examples, the objective values were averaged, and the results are drawn in Figure 6. As seen from Figure 6, across the four extended examples, the ACODDQN algorithm shows significant advantages on both optimization objectives, minimizing delay time and completion time, making it superior to the comparison algorithms. This result demonstrates that the ACODDQN algorithm can effectively reduce delay time in production scheduling and significantly shorten the overall completion time, thereby improving the system's overall scheduling efficiency.

4.5. Case Experiments

4.5.1. Case Description

In order to verify the effectiveness of the ACODDQN algorithm in solving multi-objective dynamic scheduling problems, this study selected a manufacturing flexible processing plant as an example. In the production process of this workshop, machine failure will affect the overall scheduling. Figure 7 shows the plant’s assembly process flow chart. According to the flow, this paper simplifies the workpiece processing information to a 10 × 6 flexible job scheduling problem, as shown in Table 11. Table 12 provides the maintenance time information for machines that can be serviced by maintenance personnel. A 10 × 6 × 3 multi-objective flexible job shop scheduling problem is formed by incorporating maintenance personnel scheduling into a flexible job shop scheduling problem. This example further verifies the effectiveness and advantages of the ACODDQN algorithm in solving dynamic scheduling problems.

4.5.2. Case Solving and Analysis

In this study, treating machine faults as dynamic disturbance factors affecting the production process, the ACODDQN algorithm and six comparison algorithms are used to solve the production scheduling problem based on the scheduling information of the workpieces, machine tools, and maintenance personnel. The optimization goal is to minimize the delay time and completion time. In the experiment, the ACODDQN algorithm and the six comparison algorithms were each run for 100 iterations, and performance was evaluated by plotting each algorithm's objective-value curve over the iterations (Figure 8). The results show that, compared with the other six algorithms, the ACODDQN algorithm not only achieves the best initial solution for minimizing delay and completion times, but also reaches the optimal solution with the fastest convergence speed, indicating stronger solving ability and stability on complex production scheduling problems.
To further demonstrate the response of the ACODDQN algorithm's rescheduling mechanism when machine failures occur, a Gantt chart of the assembly manufacturing process solved by the ACODDQN algorithm is drawn. The Gantt chart of the initial schedule shows a total completion time of 150 min (Figure 9); within the 60–90 min window, machine 2 was temporarily shut down due to a power outage, while machine 3 stopped working due to the wear of mechanical components. At this point, the ACODDQN algorithm's rescheduling mechanism reschedules the operations O10.4 and O2.3 affected by the faults. Specifically, right-shift rescheduling and full rescheduling policies are applied: the unfinished operation O2.3 interrupted by the failure of machine 3 is shifted right to the available machine 2, while the not-yet-started operation O10.4 originally assigned to machine 2 is scheduled on the available machine 3. After the failures, all operations are reasonably rescheduled to achieve the minimum-completion-time target. Finally, as shown in Figure 10, the Gantt chart after fault repair has a total completion time of 145 min, 5 min less than the initial schedule, ensuring the regular operation of the factory. At the same time, maintenance workers 1 and 3 are reasonably scheduled to repair machine 2 and machine 3, respectively; the maintenance arrangement is shown in Figure 11. The results show that although machine faults interrupt production, the production process can be resumed quickly, and the completion time is even shorter than in the initial scheme, thanks to the reasonable scheduling and fault maintenance arrangement of the ACODDQN algorithm, further verifying its effectiveness and advantages in dynamic scheduling problems.

5. Conclusions

This study proposes a self-learning Ant Colony Algorithm based on deep reinforcement learning (ACODDQN) for the disruption of production plan execution caused by equipment failure in the dynamic flexible job shop scheduling problem (DFJSP). The algorithm integrates the advantages of Ant Colony Optimization (ACO) and the Double Deep Q-Network (DDQN): it explores the solution space through the ACO search mechanism and combines non-dominated sorting to screen high-quality solutions, improving the global quality of the solutions and convergence efficiency. Meanwhile, the pheromone updating mechanism dynamically adjusts the search strategy based on environmental feedback, keeping the optimization direction adaptive and avoiding local optima. The introduction of the DDQN framework further enhances the algorithm's learning ability, giving it better decision stability and adaptability when coping with stochasticity and unexpected events (e.g., machine failures) in the scheduling environment. In addition, the algorithm locally adjusts candidate solutions through the feasible solution optimization mechanism to ensure that they satisfy the constraints, and it generates the Pareto optimal solution set under the multi-objective optimization framework, providing high-quality solutions for complex dynamic scheduling problems. The experimental results show that the ACODDQN algorithm exhibits good adaptability and solution performance in minimizing completion and delay times.
This study does not consider the impact of the travel time of maintenance personnel and other potential wastes on the scheduling results, and future studies can take these factors into account to better reflect the actual scheduling scenario.

Author Contributions

Conceptualization, X.X.; methodology, Y.S.; software, Z.C.; formal analysis, J.C.; writing—review and editing, J.Z.; project administration, J.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Zhejiang Science and Technology Plan Project, grant number 2024C01208.

Data Availability Statement

The data presented in this study are available upon request from the corresponding authors.

Conflicts of Interest

Author Jun Cao was employed by the company Haitian Plastics Machinery Group Limited Company. Author Yiping Shao was employed by the company Ningbo Yongxin Optics Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Niu, H.; Wu, W.; Zhang, T.; Shen, W.; Zhang, T. Adaptive salp swarm algorithm for solving flexible job shop scheduling problem with transportation time. J. Zhejiang Univ. 2023, 57, 1267–1277. [Google Scholar]
  2. Guo, H.Z.; Wang, Y.T.; Liu, J.J.; Liu, C. Multi-UAV Cooperative Task Offloading and Resource Allocation in 5G Advanced and Beyond. IEEE Trans. Wirel. Commun. 2024, 23, 347–359. [Google Scholar] [CrossRef]
  3. Tariq, A.; Khan, S.A.; But, W.H.; Javaid, A.; Shehryar, T. An IoT-Enabled Real-Time Dynamic Scheduler for Flexible Job Shop Scheduling (FJSS) in an Industry 4.0-Based Manufacturing Execution System (MES 4.0) (vol 12, pg 49653, 2024). IEEE Access 2024, 12, 118941. [Google Scholar] [CrossRef]
  4. Wang, J.; Liu, Y.; Ren, S.; Wang, C.; Ma, S. Edge computing-based real-time scheduling for digital twin flexible job shop with variable time window. Robot. Comput.-Integr. Manuf. 2023, 79, 102435. [Google Scholar] [CrossRef]
  5. Cheng, D.; Lou, S.; Zheng, H.; Hu, B.; Hong, Z.; Feng, Y.; Tan, J. Industrial Exoskeletons for Human-Centric Manufacturing: Challenges, Progress, and Prospects. Comput. Integr. Manuf. Syst. 2024, 30, 4179. [Google Scholar]
  6. Jiang, Q.; Wei, J. Real-time Scheduling Method for Dynamic Flexible Job Shop Scheduling. J. Syst. Simul. 2024, 36, 1609–1620. [Google Scholar]
  7. Gen, M.; Lin, L.; Ohwada, H. Advances in Hybrid Evolutionary Algorithms for Fuzzy Flexible Job-shop Scheduling: State-of-the-Art Survey. In Proceedings of the ICAART (1), Online, 4–6 February 2021; pp. 562–573. [Google Scholar]
  8. Dauzère-Pérès, S.; Ding, J.; Shen, L.; Tamssaouet, K. The flexible job shop scheduling problem: A review. Eur. J. Oper. Res. 2024, 314, 409–432. [Google Scholar] [CrossRef]
  9. Yue, L.; Peng, K.; Ding, L.S.; Mumtaz, J.; Lin, L.B.; Zou, T. Two-stage double deep Q-network algorithm considering external non-dominant set for multi-objective dynamic flexible job shop scheduling problems. Swarm Evol. Comput. 2024, 90, 13. [Google Scholar] [CrossRef]
  10. Zhang, G.H.; Lu, X.X.; Liu, X.; Zhang, L.T.; Wei, S.W.; Zhang, W.Q. An effective two-stage algorithm based on convolutional neural network for the bi-objective flexible job shop scheduling problem with machine breakdown. Expert Syst. Appl. 2022, 203, 12. [Google Scholar] [CrossRef]
  11. Gao, K.; Yang, F.; Li, J.; Sang, H.; Luo, J. Improved jaya algorithm for flexible job shop rescheduling problem. IEEE Access 2020, 8, 86915–86922. [Google Scholar] [CrossRef]
  12. Luan, F.; Tang, B.; Li, Y.; Liu, S.Q.; Yang, X.; Masoud, M.; Feng, B. Solving multi-objective green flexible job shop scheduling problem by an improved chimp optimization algorithm. J. Intell. Fuzzy Syst. 2024, 46, 7697–7710. [Google Scholar] [CrossRef]
  13. Lü, Y.; Xu, Z.; Li, C.; Li, L.; Yang, M. Comprehensive Energy Saving Optimization of Processing Parameters and Job Shop Dynamic Scheduling Considering Disturbance Events. J. Mech. Eng. 2022, 58, 242–255. [Google Scholar]
  14. Yuan, E.D.; Wang, L.J.; Song, S.J.; Cheng, S.L.; Fan, W. Dynamic scheduling for multi-objective flexible job shop via deep reinforcement learning. Appl. Soft Comput. 2025, 171, 13. [Google Scholar] [CrossRef]
  15. Jimenez, S.H.; Trabelsi, W.; Sauvey, C. Multi-Objective Production Rescheduling: A Systematic Literature Review. Mathematics 2024, 12, 3176. [Google Scholar] [CrossRef]
  16. Zhang, H.; Xu, J.; Tan, B.; Xu, G. Dual Resource Constrained Flexible Job Shop Energy-saving Scheduling Considering Delivery Time. J. Syst. Simul. 2023, 35, 734–746. [Google Scholar]
  17. Li, X.; Xing, S. Dynamic scheduling problem of multi-objective dual resource flexible job shop based on improved genetic algorithm. In Proceedings of the 43rd Chinese Control Conference, CCC 2024, Kunming, China, 28–31 July 2024; pp. 2046–2051. [Google Scholar]
  18. Sun, A.; Song, Y.; Yang, Y.; Lei, Q. Dual Resource-constrained Flexible Job Shop Scheduling Algorithm Considering Machining Quality of Key Jobs. China Mech. Eng. 2022, 33, 2590–2600. [Google Scholar]
  19. Mokhtari, G.; Abolfathi, M. Dual Resource Constrained Flexible Job-Shop Scheduling with Lexicograph Objectives. J. Ind. Eng. Res. Prod. Syst. 2021, 8, 295–309. [Google Scholar]
  20. Jiang, B.; Ma, Y.J.; Chen, L.J.; Huang, B.D.; Huang, Y.Y.; Guan, L. A Review on Intelligent Scheduling and Optimization for Flexible Job Shop. Int. J. Control Autom. Syst. 2023, 21, 3127–3150. [Google Scholar] [CrossRef]
  21. Turkyilmaz, A.; Senvar, O.; Unal, R.; Bulkan, S. A research survey: Heuristic approaches for solving multi objective flexible job shop problems. J. Intell. Manuf. 2020, 31, 1949–1983. [Google Scholar] [CrossRef]
  22. Long, X.; Zhang, J.; Zhou, K.; Jin, T. Dynamic self-learning artificial bee colony optimization algorithm for flexible job-shop scheduling problem with job insertion. Processes 2022, 10, 571. [Google Scholar] [CrossRef]
  23. Chen, F.; Xie, W.; Ma, J.; Chen, J.; Wang, X. Textile Flexible Job-Shop Scheduling Based on a Modified Ant Colony Optimization Algorithm. Appl. Sci. 2024, 14, 4082. [Google Scholar] [CrossRef]
  24. Liang, Z.Y.; Zhong, P.S.; Zhang, C.; Yang, W.L.; Xiong, W.; Yang, S.H.; Meng, J. A genetic algorithm-based approach for flexible job shop rescheduling problem with machine failure interference. Eksploat. I Niezawodn. 2023, 25, 13. [Google Scholar] [CrossRef]
  25. Chen, Y.; Liao, X.J.; Chen, G.Z.; Hou, Y.J. Dynamic Intelligent Scheduling in Low-Carbon Heterogeneous Distributed Flexible Job Shops with Job Insertions and Transfers. Sensors 2024, 24, 2251. [Google Scholar] [CrossRef]
  26. Peng, N.; Zheng, Y.; Xiao, Z.; Gong, G.; Huang, D.; Liu, X.; Zhu, K.; Luo, Q. Multi-objective dynamic distributed flexible job shop scheduling problem considering uncertain processing time. Clust. Comput. 2025, 28, 185. [Google Scholar] [CrossRef]
  27. Su, C.; Zhang, C.; Wang, C.; Cen, W.; Chen, G.; Xie, L. Fast Pareto set approximation for multi-objective flexible job shop scheduling via parallel preference-conditioned graph reinforcement learning. Swarm Evol. Comput. 2024, 88, 101605. [Google Scholar] [CrossRef]
  28. Liu, C.-L.; Chang, C.-C.; Tseng, C.-J. Actor-critic deep reinforcement learning for solving job shop scheduling problems. IEEE Access 2020, 8, 71752–71762. [Google Scholar] [CrossRef]
  29. Wang, L.; Hu, X.; Wang, Y.; Xu, S.; Ma, S.; Yang, K.; Liu, Z.; Wang, W. Dynamic job-shop scheduling in smart manufacturing using deep reinforcement learning. Comput. Netw. 2021, 190, 107969. [Google Scholar] [CrossRef]
  30. Luo, S.; Zhang, L.; Fan, Y. Dynamic multi-objective scheduling for flexible job shop by deep reinforcement learning. Comput. Ind. Eng. 2021, 159, 107489. [Google Scholar] [CrossRef]
  31. Liu, Y.; Shen, X.; Gu, X.; Peng, T.; Bao, J.; Zhang, D. Dual-System Reinforcement Learning Approach for Dynamic Scheduling in Flexible Job Shops. J. Shanghai Jiao Tong Univ. 2022, 56, 1262–1275. [Google Scholar] [CrossRef]
  32. Palacio, J.C.; Jiménez, Y.M.; Schietgat, L.; Van Doninck, B.; Nowé, A. A Q-Learning algorithm for flexible job shop scheduling in a real-world manufacturing scenario. Procedia CIRP 2022, 106, 227–232. [Google Scholar] [CrossRef]
  33. Gui, Y.; Tang, D.; Zhu, H.; Zhang, Y.; Zhang, Z. Dynamic scheduling for flexible job shop using a deep reinforcement learning approach. Comput. Ind. Eng. 2023, 180, 109255. [Google Scholar] [CrossRef]
  34. Zhang, L.; Yan, Y.; Yang, C.; Hu, Y. Dynamic flexible job-shop scheduling by multi-agent reinforcement learning with reward-shaping. Adv. Eng. Inform. 2024, 62, 102872. [Google Scholar] [CrossRef]
  35. Huang, X.; Zhang, X.; Ai, Y. ACO integrated approach for solving flexible job-shop scheduling with multiple process plans. Comput. Integr. Manuf. Syst. 2018, 24, 558–569. [Google Scholar]
  36. Zhang, G.; Yan, S.; Lu, X.; Zhang, H. Improved Hybrid Multi-Objective Ant Colony Algorithm for Flexible Job Shop Scheduling Problem with Transportation and Setup Times. Appl. Res. Comput. 2023, 40, 3690–3695. [Google Scholar] [CrossRef]
  37. Lu, S.J.; Wang, Y.Q.; Kong, M.; Wang, W.Z.; Tan, W.M.; Song, Y.X. A Double Deep Q-Network framework for a flexible job shop scheduling problem with dynamic job arrivals and urgent job insertions. Eng. Appl. Artif. Intell. 2024, 133, 22. [Google Scholar] [CrossRef]
  38. Meng, F.; Guo, H.; Yan, X.; Wu, Y.; Zhang, D.; Luo, L. Solving Flexible Job Shop Joint Scheduling Problem Based on Multi-Agent Reinforcement Learning. Comput. Integr. Manuf. Syst. 2024, 30, 1–29. [Google Scholar] [CrossRef]
  39. Meng, L.L.; Zhang, C.Y.; Ren, Y.P.; Zhang, B.; Lv, C. Mixed-integer linear programming and constraint programming formulations for solving distributed flexible job shop scheduling problem. Comput. Ind. Eng. 2020, 142, 13. [Google Scholar] [CrossRef]
  40. Tang, H.; Xiao, Y.; Zhang, W.; Lei, D.; Wang, J.; Xu, T. A DQL-NSGA-III algorithm for solving the flexible job shop dynamic scheduling problem. Expert Syst. Appl. 2024, 237, 121723. [Google Scholar] [CrossRef]
  41. Zain, M.Z.B.; Kanesan, J.; Chuah, J.H.; Dhanapal, S.; Kendall, G. A multi-objective particle swarm optimization algorithm based on dynamic boundary search for constrained optimization. Appl. Soft Comput. 2018, 70, 680–700. [Google Scholar] [CrossRef]
  42. Nadimi-Shahraki, M.H.; Taghian, S.; Mirjalili, S. An improved grey wolf optimizer for solving engineering problems. Expert Syst. Appl. 2021, 166, 25. [Google Scholar] [CrossRef]
  43. Yuan, Y.; Xu, H. Multi objective Flexible Job Shop Scheduling Using Memetic Algorithms. IEEE Trans. Autom. Sci. Eng. 2015, 12, 336–353. [Google Scholar] [CrossRef]
  44. Luo, S. Dynamic scheduling for flexible job shop with new job insertions by deep reinforcement learning. Appl. Soft Comput. 2020, 91, 17. [Google Scholar] [CrossRef]
  45. Zhang, M.; Lu, Y.; Hu, Y.X.; Amaitik, N.; Xu, Y.C. Dynamic Scheduling Method for Job-Shop Manufacturing Systems by Deep Reinforcement Learning with Proximal Policy Optimization. Sustainability 2022, 14, 5177. [Google Scholar] [CrossRef]
  46. Li, X.; Gao, L. An effective hybrid genetic algorithm and tabu search for flexible job shop scheduling problem. Int. J. Prod. Econ. 2016, 174, 93–110. [Google Scholar] [CrossRef]
Figure 1. Rescheduling strategy.
Figure 2. Flowchart of the ACO Algorithm.
Figure 4. Main parameters SNR line chart.
Figure 5. Boxplots of average SP, IGD, and HV metrics for MWK02, MWK05, and MWK10 instances.
Figure 6. Comparison of average objective values for ACODDQN and six algorithms across MWK01, MWK02, MWK04, and MWK10 instances.
Figure 7. Assembly and processing flowchart.
Figure 8. Objective value iteration diagram of the algorithm: (a) Minimize the completion time iteration curve; (b) minimize the delay time iteration curve.
Figure 9. Initial scheduling Gantt chart.
Figure 10. Gantt chart of rescheduling after failure.
Figure 11. Gantt chart of maintenance worker scheduling.
Table 1. Summary of the existing literature.

| Work | State | Dynamic Event | Objective | Algorithm | Problem |
|---|---|---|---|---|---|
| Chen et al. (2024) [25] | Discrete | Job random arrival | Total tardiness; total energy consumption | Rainbow DQN | FJSP |
| Peng et al. (2025) [26] | Discrete | Random processing time; variable production | Makespan; total energy consumption | Efficient memetic algorithm (EMA) | FJSP |
| Su et al. (2024) [27] | Discrete | Random processing time | Makespan; maximum load; machine workload | GRL | FJSP |
| Liu et al. (2020) [28] | Discrete | Machine breakdowns; random processing time | Makespan | Deep deterministic policy gradient | JSP |
| Wang et al. (2021) [29] | Discrete | Machine breakdowns; random processing time | Makespan | PPO | JSP |
| Luo et al. (2021) [30] | Continuous | Random job insertion | Makespan; total tardiness; average machine utilization | THDQN | JSP |
| Liu et al. (2020) [31] | Discrete | Machine breakdowns; emergency orders | Makespan | Actor-Critic | JSP |
| Palacio et al. (2022) [32] | Discrete | Machine breakdowns; emergency orders | Makespan | Q-learning | JSP |
| Gui et al. (2023) [33] | Continuous | Random processing time; job random arrival | Mean tardiness | Deep deterministic policy gradient | FJSP |
| Zhang et al. (2024) [34] | Discrete | Job random arrival | Mean tardiness | Dueling double deep Q-network | FJSP |
| This paper | Continuous | Job random arrival; random processing time; machine breakdowns | Minimum tardiness; makespan | ACODDQN | FJSP |
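Several entries in Table 1, including the DDQN component of this paper's ACODDQN, build on the double-DQN target: the online network selects the greedy next action and the target network evaluates it, which reduces the overestimation bias of plain Q-learning. A minimal sketch with illustrative values (not taken from the paper):

```python
def ddqn_target(q_online_next, q_target_next, reward, gamma, done):
    """Double-DQN target: the online net picks argmax_a Q(s', a),
    the target net supplies the value estimate of that action."""
    a_star = q_online_next.index(max(q_online_next))
    return reward + (0.0 if done else gamma * q_target_next[a_star])

q_online = [1.0, 3.0, 2.0]   # online-net estimates for s' -> selects action 1
q_target = [0.5, 2.0, 4.0]   # target-net value of action 1 is 2.0
print(ddqn_target(q_online, q_target, reward=1.0, gamma=0.9, done=False))  # → 2.8
```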
Table 2. Main variables.

| Notation | Description |
|---|---|
| Indexes: | |
| i | Job index, i ∈ {1, 2, …, n} |
| j | Operation index, j ∈ {1, 2, …, n_i} |
| k | Machine index, k ∈ {1, 2, …, m} |
| w | Maintenance personnel index, w ∈ {1, 2, …, q} |
| Parameters: | |
| n | Total number of jobs |
| m | Total number of machines |
| n_i | Number of operations of job i |
| q | Number of maintenance personnel |
| J_i | The i-th job |
| O_ij | The j-th operation of job J_i |
| (t_ijk)^s | Start time of operation O_ij on machine k |
| (t_ijk)^f | End time of operation O_ij on machine k |
| t_ij | Processing time of operation O_ij |
| (t_wk)^s | Start time of maintenance worker w's repair of machine k |
| (t_wk)^f | End time of maintenance worker w's repair of machine k |
| t_wk | Time required for maintenance worker w to repair machine k |
| p_k | Most recent time at which machine k can process |
| m_k | Most recent time at which machine k can be repaired |
| m_w | Most recent time at which worker w is available for repair |
| T_k | Continuous working time of machine k |
| C | Production cycle time |
| D_i | Deadline (due date) of job i |
| DT_i | Delay time of job i |
| C_i | Completion time of job i |
| UP_i | Urgency of job i, UP_i ∈ [1, 3] |
| Decision variables: | |
| X_ijpsk | 1 if machine k processes operation O_ij with priority, 0 otherwise |
| X_wkg | 1 if maintenance worker w repairs machine k first, 0 otherwise |
| X_ijk | 1 if operation O_ij fails on machine k, 0 otherwise |
Table 3. Combined strategies of four composite scheduling rules.

| Composite Scheduling Rule | Job Scheduling Rule | Machine Scheduling Rule | Maintenance Worker Scheduling Rule |
|---|---|---|---|
| Rule1 | J1 | M1 | W1 |
| Rule2 | J1 | M2 | W1 |
| Rule3 | J1 | M1 | W2 |
| Rule4 | J1 | M2 | W2 |
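Each composite rule pairs one job-selection rule with one machine rule and one worker rule, so a single dispatch decision fixes all three resources at once. The concrete definitions of J1/M1–M2/W1–W2 are not given in this excerpt, so the rules below (earliest due date for jobs, earliest-available machine, earliest-available worker) are illustrative assumptions:

```python
def composite_rule(jobs, machines, workers):
    """Sketch of a composite dispatching rule; the J1/M1/W1 definitions
    here are assumptions, not the paper's actual rules."""
    # Job rule (assumed J1): unfinished job with the earliest due date
    job = min((j for j in jobs if j["remaining_ops"] > 0), key=lambda j: j["due"])
    # Machine rule (assumed M1): eligible machine that is free soonest
    eligible = [m for m in machines if job["next_op"] in m["can_process"]]
    machine = min(eligible, key=lambda m: m["free_at"])
    # Worker rule (assumed W1): maintenance worker available soonest
    worker = min(workers, key=lambda w: w["free_at"])
    return job["id"], machine["id"], worker["id"]

jobs = [{"id": "J1", "due": 40, "remaining_ops": 2, "next_op": "O12"},
        {"id": "J2", "due": 25, "remaining_ops": 1, "next_op": "O23"}]
machines = [{"id": "m1", "free_at": 5, "can_process": {"O12", "O23"}},
            {"id": "m2", "free_at": 3, "can_process": {"O23"}}]
workers = [{"id": "W1", "free_at": 0}, {"id": "W2", "free_at": 7}]
print(composite_rule(jobs, machines, workers))  # → ('J2', 'm2', 'W1')
```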
Table 4. Parameters of numerical instances.

| Instance | n | n_i | m | q | t_ij | t_wk | UP_i |
|---|---|---|---|---|---|---|---|
| MWK01 | 10 | 6 | 10 | 3 | [1, 25] | [1, 18] | [1, 3] |
| MWK02 | 10 | 6 | 10 | 5 | [1, 25] | [1, 18] | [1, 3] |
| MWK03 | 15 | 9 | 8 | 3 | [1, 19] | [1, 20] | [1, 3] |
| MWK04 | 15 | 14 | 15 | 5 | [1, 35] | [1, 20] | [1, 3] |
| MWK05 | 15 | 15 | 10 | 5 | [4, 29] | [1, 24] | [1, 3] |
| MWK06 | 15 | 9 | 4 | 3 | [1, 19] | [2, 24] | [1, 3] |
| MWK07 | 20 | 5 | 10 | 3 | [1, 21] | [2, 29] | [1, 3] |
| MWK08 | 20 | 7 | 10 | 5 | [1, 21] | [1, 31] | [1, 3] |
| MWK09 | 20 | 9 | 15 | 3 | [4, 27] | [1, 27] | [1, 3] |
| MWK10 | 20 | 12 | 15 | 5 | [5, 39] | [4, 30] | [1, 3] |
| MWK11 | 30 | 9 | 6 | 3 | [10, 29] | [2, 28] | [1, 3] |
| MWK12 | 30 | 7 | 10 | 5 | [10, 29] | [2, 28] | [1, 3] |
| MWK13 | 30 | 15 | 15 | 5 | [10, 29] | [1, 27] | [1, 3] |
| MWK14 | 30 | 9 | 15 | 5 | [10, 29] | [1, 24] | [1, 3] |
| MWK15 | 30 | 11 | 25 | 5 | [10, 29] | [3, 16] | [1, 3] |
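Each Table 4 row fixes the instance size (n jobs, n_i operations per job, m machines, q maintenance workers) and the sampling intervals for processing times t_ij, repair times t_wk, and urgency UP_i. The paper's exact generator is not shown in this excerpt; the sketch below assumes uniform integer sampling and simplifies t_ij to one time per operation rather than per eligible machine:

```python
import random

def make_instance(n, n_i, m, q, t_range, tw_range, seed=0):
    """Sample an MWK-style instance (illustrative; uniform sampling assumed)."""
    rng = random.Random(seed)
    # processing time of each operation (simplified: one value per operation)
    proc = [[rng.randint(*t_range) for _ in range(n_i)] for _ in range(n)]
    # repair time of each worker on each machine
    repair = [[rng.randint(*tw_range) for _ in range(m)] for _ in range(q)]
    urgency = [rng.randint(1, 3) for _ in range(n)]
    return {"processing": proc, "repair": repair, "urgency": urgency}

# MWK01-style instance: n=10, n_i=6, m=10, q=3, t_ij in [1, 25], t_wk in [1, 18]
inst = make_instance(10, 6, 10, 3, (1, 25), (1, 18))
assert all(1 <= t <= 25 for row in inst["processing"] for t in row)
```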
Table 5. Orthogonal test parameters.

| Parameter | Level 1 | Level 2 | Level 3 | Level 4 | Level 5 |
|---|---|---|---|---|---|
| m | 100 | 125 | 150 | 175 | 200 |
| ρ | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 |
| α | 0.01 | 0.02 | 0.03 | 0.04 | 0.05 |
| γ | 0.7 | 0.75 | 0.8 | 0.85 | 0.9 |
| ε | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 |
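Figure 4 reports signal-to-noise (SNR) line charts for these parameter levels. For minimization objectives, Taguchi-style tuning typically uses the smaller-the-better SNR, preferring the level with the larger SNR; the exact formula used in the paper is not shown here, so the standard definition below is an assumption:

```python
import math

def snr_smaller_better(values):
    """Taguchi smaller-the-better SNR: -10 * log10(mean(y^2))."""
    return -10.0 * math.log10(sum(v * v for v in values) / len(values))

# Hypothetical objective values observed at two levels of one parameter;
# the level with the larger SNR is preferred.
level1 = [0.20, 0.25, 0.22]
level2 = [0.40, 0.38, 0.45]
assert snr_smaller_better(level1) > snr_smaller_better(level2)
```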
Table 6. Comparative analysis of average SP, IGD, and HV scores for ACODDQN and composite scheduling rules.

| Instance | SP ACODDQN | SP Rule1 | SP Rule2 | SP Rule3 | SP Rule4 | IGD ACODDQN | IGD Rule1 | IGD Rule2 | IGD Rule3 | IGD Rule4 | HV ACODDQN | HV Rule1 | HV Rule2 | HV Rule3 | HV Rule4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MWK01 | 0.001 | 0.156 | 0.244 | 0.223 | 0.278 | 0.001 | 0.173 | 0.283 | 0.2492 | 0.249 | 0.317 | 0.127 | 0.073 | 0.098 | 0.012 |
| MWK02 | 0.006 | 0.178 | 0.307 | 0.299 | 0.418 | 0.008 | 0.214 | 0.318 | 0.324 | 0.415 | 0.412 | 0.221 | 0.084 | 0.178 | 0.061 |
| MWK03 | 0.018 | 0.217 | 0.317 | 0.264 | 0.348 | 0.015 | 0.231 | 0.321 | 0.279 | 0.377 | 0.426 | 0.201 | 0.101 | 0.165 | 0.081 |
| MWK04 | 0.087 | 0.124 | 0.245 | 0.199 | 0.268 | 0.111 | 0.278 | 0.387 | 0.294 | 0.476 | 0.578 | 0.312 | 0.088 | 0.211 | 0.045 |
| MWK05 | 0.017 | 0.188 | 0.276 | 0.264 | 0.412 | 0.027 | 0.215 | 0.298 | 0.278 | 0.465 | 0.319 | 0.117 | 0.064 | 0.071 | 0.012 |
| MWK06 | 0.168 | 0.248 | 0.324 | 0.345 | 0.398 | 0.174 | 0.141 | 0.364 | 0.304 | 0.534 | 0.468 | 0.121 | 0.041 | 0.072 | 0.019 |
| MWK07 | 0.075 | 0.126 | 0.208 | 0.167 | 0.345 | 0.108 | 0.147 | 0.212 | 0.188 | 0.378 | 0.371 | 0.146 | 0.073 | 0.119 | 0.038 |
| MWK08 | 0.001 | 0.148 | 0.274 | 0.211 | 0.442 | 0.003 | 0.167 | 0.289 | 0.241 | 0.454 | 0.314 | 0.146 | 0.069 | 0.112 | 0.013 |
| MWK09 | 0.078 | 0.124 | 0.278 | 0.225 | 0.354 | 0.115 | 0.197 | 0.307 | 0.259 | 0.394 | 0.617 | 0.412 | 0.217 | 0.316 | 0.110 |
| MWK10 | 0.001 | 0.124 | 0.279 | 0.188 | 0.317 | 0.001 | 0.121 | 0.297 | 0.226 | 0.324 | 0.407 | 0.201 | 0.062 | 0.123 | 0.032 |
| MWK11 | 0.112 | 0.176 | 0.236 | 0.185 | 0.318 | 0.128 | 0.189 | 0.243 | 0.198 | 0.337 | 0.509 | 0.347 | 0.124 | 0.173 | 0.081 |
| MWK12 | 0.112 | 0.098 | 0.247 | 0.164 | 0.324 | 0.117 | 0.197 | 0.378 | 0.241 | 0.489 | 0.462 | 0.217 | 0.114 | 0.162 | 0.046 |
| MWK13 | 0.224 | 0.288 | 0.398 | 0.327 | 0.412 | 0.245 | 0.317 | 0.408 | 0.387 | 0.489 | 0.586 | 0.317 | 0.114 | 0.265 | 0.072 |
| MWK14 | 0.115 | 0.228 | 0.364 | 0.308 | 0.378 | 0.167 | 0.291 | 0.378 | 0.322 | 0.408 | 0.496 | 0.328 | 0.102 | 0.267 | 0.041 |
| MWK15 | 0.116 | 0.317 | 0.402 | 0.379 | 0.409 | 0.218 | 0.347 | 0.466 | 0.461 | 0.507 | 0.617 | 0.318 | 0.106 | 0.217 | 0.071 |
Table 7. Average running time (T/s) of the ACODDQN algorithm and the four composite scheduling rules.

| T/s | MWK01 | MWK02 | MWK03 | MWK04 | MWK05 | MWK06 | MWK07 | MWK08 | MWK09 | MWK10 | MWK11 | MWK12 | MWK13 | MWK14 | MWK15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ACODDQN | 52 | 55 | 68 | 63 | 67 | 73 | 81 | 59 | 76 | 68 | 72 | 75 | 86 | 89 | 95 |
| Rule1 | 62 | 69 | 77 | 98 | 107 | 112 | 134 | 176 | 136 | 116 | 136 | 199 | 279 | 185 | 221 |
| Rule2 | 72 | 85 | 103 | 136 | 175 | 178 | 196 | 335 | 229 | 207 | 346 | 319 | 514 | 346 | 389 |
| Rule3 | 69 | 77 | 79 | 104 | 120 | 157 | 180 | 246 | 161 | 176 | 196 | 208 | 364 | 264 | 251 |
| Rule4 | 89 | 109 | 145 | 180 | 214 | 267 | 349 | 368 | 346 | 380 | 426 | 657 | 616 | 765 | 796 |
Table 8. Comparison of average SP and IGD scores between ACODDQN and six algorithms.

| Instance | SP ACODDQN | SP IGWO | SP NSGA-II | SP PPO | SP MOPSO | SP DDQN | SP GA | IGD ACODDQN | IGD IGWO | IGD NSGA-II | IGD PPO | IGD MOPSO | IGD DDQN | IGD GA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MWK01 | 0.065 | 0.131 | 0.211 | 0.074 | 0.251 | 0.083 | 0.327 | 0.018 | 0.226 | 0.266 | 0.185 | 0.325 | 0.203 | 0.350 |
| MWK02 | 0.050 | 0.335 | 0.431 | 0.332 | 0.497 | 0.347 | 0.525 | 0.239 | 0.408 | 0.430 | 0.393 | 0.384 | 0.396 | 0.411 |
| MWK03 | 0.059 | 0.181 | 0.225 | 0.051 | 0.371 | 0.102 | 0.245 | 0.111 | 0.263 | 0.292 | 0.152 | 0.329 | 0.196 | 0.367 |
| MWK04 | 0.036 | 0.211 | 0.294 | 0.047 | 0.336 | 0.119 | 0.406 | 0.198 | 0.386 | 0.456 | 0.124 | 0.561 | 0.213 | 0.536 |
| MWK05 | 0.024 | 0.220 | 0.388 | 0.164 | 0.447 | 0.229 | 0.438 | 0.195 | 0.345 | 0.472 | 0.247 | 0.497 | 0.346 | 0.521 |
| MWK06 | 0.049 | 0.147 | 0.199 | 0.025 | 0.281 | 0.088 | 0.334 | 0.146 | 0.257 | 0.280 | 0.197 | 0.313 | 0.213 | 0.282 |
| MWK07 | 0.041 | 0.154 | 0.253 | 0.072 | 0.331 | 0.102 | 0.293 | 0.134 | 0.212 | 0.273 | 0.177 | 0.395 | 0.193 | 0.369 |
| MWK08 | 0.080 | 0.362 | 0.476 | 0.198 | 0.532 | 0.299 | 0.554 | 0.156 | 0.446 | 0.501 | 0.202 | 0.540 | 0.340 | 0.465 |
| MWK09 | 0.090 | 0.176 | 0.292 | 0.104 | 0.334 | 0.133 | 0.345 | 0.189 | 0.454 | 0.521 | 0.192 | 0.589 | 0.221 | 0.581 |
| MWK10 | 0.176 | 0.355 | 0.471 | 0.279 | 0.517 | 0.351 | 0.504 | 0.288 | 0.438 | 0.473 | 0.344 | 0.502 | 0.367 | 0.393 |
| MWK11 | 0.237 | 0.567 | 0.669 | 0.315 | 0.712 | 0.472 | 0.694 | 0.399 | 0.578 | 0.672 | 0.347 | 0.734 | 0.486 | 0.674 |
| MWK12 | 0.109 | 0.324 | 0.377 | 0.269 | 0.407 | 0.312 | 0.427 | 0.156 | 0.356 | 0.398 | 0.277 | 0.426 | 0.341 | 0.455 |
| MWK13 | 0.205 | 0.331 | 0.341 | 0.247 | 0.411 | 0.267 | 0.431 | 0.232 | 0.342 | 0.359 | 0.261 | 0.435 | 0.279 | 0.410 |
| MWK14 | 0.167 | 0.378 | 0.381 | 0.222 | 0.415 | 0.271 | 0.365 | 0.188 | 0.394 | 0.416 | 0.264 | 0.507 | 0.289 | 0.448 |
| MWK15 | 0.116 | 0.345 | 0.388 | 0.269 | 0.416 | 0.305 | 0.430 | 0.136 | 0.367 | 0.411 | 0.297 | 0.465 | 0.324 | 0.402 |
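Tables 6 and 8 score Pareto fronts with SP (uniformity of spacing, lower is better) and IGD (closeness to a reference front, lower is better). The paper's exact metric variants are not spelled out in this excerpt, so the sketch below uses the common definitions: Euclidean distance for IGD, Manhattan nearest-neighbour distances for SP:

```python
import math

def igd(reference, front):
    """Inverted generational distance: mean distance from each reference
    point to its nearest point in the obtained front (lower is better)."""
    return sum(min(math.dist(r, p) for p in front) for r in reference) / len(reference)

def spacing(front):
    """SP metric: standard deviation of nearest-neighbour (Manhattan)
    distances along the front (lower = more uniform spacing)."""
    d = [min(sum(abs(a - b) for a, b in zip(p, q)) for q in front if q != p)
         for p in front]
    mean = sum(d) / len(d)
    return math.sqrt(sum((x - mean) ** 2 for x in d) / (len(d) - 1))

ref = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
assert igd(ref, ref) == 0.0   # a front matching the reference has IGD 0
assert spacing(ref) == 0.0    # equally spaced points give SP 0
```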
Table 9. Comparison of average HV scores between ACODDQN and six algorithms.

| HV | MWK01 | MWK02 | MWK03 | MWK04 | MWK05 | MWK06 | MWK07 | MWK08 | MWK09 | MWK10 | MWK11 | MWK12 | MWK13 | MWK14 | MWK15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ACODDQN | 0.611 | 0.347 | 0.371 | 0.624 | 0.581 | 0.390 | 0.281 | 0.513 | 0.327 | 0.142 | 0.417 | 0.191 | 0.137 | 0.288 | 0.131 |
| IGWO | 0.234 | 0.013 | 0.191 | 0.012 | 0.214 | 0.017 | 0.098 | 0.311 | 0.059 | 0.038 | 0.119 | 0.046 | 0.076 | 0.124 | 0.056 |
| NSGA-II | 0.107 | 0.004 | 0.051 | 0.01 | 0.205 | 0.012 | 0.083 | 0.137 | 0.035 | 0.023 | 0.084 | 0.008 | 0.036 | 0.107 | 0.035 |
| PPO | 0.552 | 0.135 | 0.262 | 0.261 | 0.476 | 0.390 | 0.136 | 0.528 | 0.182 | 0.117 | 0.323 | 0.129 | 0.124 | 0.214 | 0.112 |
| MOPSO | 0.004 | 0.006 | 0.032 | 0.004 | 0.365 | 0.012 | 0.081 | 0.134 | 0.027 | 0.017 | 0.042 | 0.005 | 0.026 | 0.082 | 0.032 |
| DDQN | 0.495 | 0.111 | 0.214 | 0.207 | 0.101 | 0.312 | 0.123 | 0.352 | 0.192 | 0.054 | 0.317 | 0.092 | 0.101 | 0.195 | 0.081 |
| GA | 0.003 | 0.001 | 0.018 | 0.006 | 0.093 | 0.002 | 0.095 | 0.115 | 0.030 | 0.029 | 0.044 | 0.025 | 0.021 | 0.131 | 0.035 |
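HV (hypervolume, higher is better) measures the objective-space volume a front dominates relative to a reference point. For the two objectives used here (completion time and delay time), it reduces to a sum of rectangle areas; the reference point and normalization used in the paper are not given in this excerpt, so the values below are illustrative:

```python
def hypervolume_2d(front, ref):
    """Hypervolume of a 2-objective minimization front w.r.t. reference
    point `ref` (larger is better). Assumes all points dominate `ref`."""
    pts = sorted(front)                  # ascending in the first objective
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in pts:
        if f2 < prev_f2:                 # skip dominated points
            hv += (ref[0] - f1) * (prev_f2 - f2)
            prev_f2 = f2
    return hv

front = [(0.2, 0.8), (0.5, 0.4), (0.9, 0.1)]
print(round(hypervolume_2d(front, (1.0, 1.0)), 4))  # → 0.39
```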
Table 10. Average running time (T/s) of ACODDQN and six algorithms.

| T/s | MWK01 | MWK02 | MWK03 | MWK04 | MWK05 | MWK06 | MWK07 | MWK08 | MWK09 | MWK10 | MWK11 | MWK12 | MWK13 | MWK14 | MWK15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ACODDQN | 52 | 55 | 68 | 63 | 67 | 73 | 81 | 59 | 76 | 68 | 72 | 75 | 86 | 89 | 95 |
| IGWO | 56 | 57 | 94 | 72 | 88 | 147 | 153 | 68 | 162 | 105 | 160 | 163 | 162 | 161 | 188 |
| NSGA-II | 73 | 76 | 108 | 96 | 100 | 172 | 200 | 91 | 212 | 124 | 196 | 206 | 212 | 193 | 228 |
| PPO | 81 | 104 | 162 | 136 | 142 | 236 | 241 | 124 | 294 | 180 | 246 | 270 | 294 | 239 | 305 |
| MOPSO | 96 | 82 | 121 | 109 | 110 | 210 | 159 | 99 | 164 | 132 | 189 | 178 | 164 | 168 | 197 |
| DDQN | 102 | 105 | 122 | 121 | 117 | 227 | 235 | 130 | 265 | 129 | 214 | 239 | 265 | 205 | 265 |
| GA | 152 | 166 | 183 | 172 | 160 | 246 | 267 | 143 | 302 | 135 | 236 | 251 | 301 | 246 | 368 |
Table 11. Processing machine information (processing time of operation O_ij on machines m1–m6; "-" = machine not eligible).

| Ji | Oij | m1 | m2 | m3 | m4 | m5 | m6 |
|---|---|---|---|---|---|---|---|
| J1 | O11 | - | - | - | 22 | 16 | - |
| J1 | O12 | 12 | - | 21 | 29 | 16 | - |
| J1 | O13 | 19 | 21 | - | 28 | 18 | - |
| J1 | O14 | 17 | - | - | - | 23 | 17 |
| J2 | O21 | 21 | 28 | - | - | 19 | 16 |
| J2 | O22 | - | 15 | 21 | - | 20 | - |
| J2 | O23 | 25 | 19 | 18 | 53 | - | 26 |
| J3 | O31 | 24 | - | 20 | 22 | 22 | 25 |
| J3 | O32 | - | 21 | - | 23 | 23 | - |
| J3 | O33 | 23 | 17 | - | 19 | - | - |
| J3 | O34 | - | 32 | 28 | - | 29 | 21 |
| J4 | O41 | - | 18 | 16 | 21 | - | - |
| J4 | O42 | - | 21 | - | 24 | 26 | 17 |
| J4 | O43 | - | 27 | - | 18 | 20 | 22 |
| J5 | O51 | 23 | 24 | 17 | - | 21 | 17 |
| J5 | O52 | 30 | - | 12 | 23 | 25 | 26 |
| J5 | O53 | - | 22 | - | 18 | - | 19 |
| J5 | O54 | 21 | 26 | 23 | - | 27 | - |
| J6 | O61 | - | 27 | 16 | - | 15 | - |
| J6 | O62 | - | 13 | 25 | 28 | - | 26 |
| J6 | O63 | 26 | 26 | - | 27 | 20 | 13 |
| J7 | O71 | - | 20 | 22 | 17 | - | 18 |
| J7 | O72 | 15 | - | 19 | - | 28 | - |
| J7 | O73 | - | 14 | - | 24 | 16 | 27 |
| J8 | O81 | 28 | 25 | - | 21 | - | 20 |
| J8 | O82 | - | 19 | 28 | - | 19 | - |
| J8 | O83 | 20 | - | 22 | - | 18 | - |
| J9 | O91 | - | 17 | 25 | 24 | - | 29 |
| J9 | O92 | 20 | - | 21 | 25 | 22 | - |
| J9 | O93 | - | 18 | - | 16 | - | 18 |
| J9 | O94 | 21 | - | 25 | - | 29 | 23 |
| J10 | O101 | - | 20 | - | 18 | 24 | - |
| J10 | O102 | 15 | 18 | 19 | 18 | 16 | 22 |
| J10 | O103 | 14 | - | 12 | - | 16 | - |
| J10 | O104 | - | 23 | 24 | - | 27 | - |
| J10 | O105 | 22 | - | 17 | 18 | - | 19 |
Table 12. Repairable machine information (repair time t_wk of worker W_i on machine m_i; "-" = worker cannot repair).

| Wi | m1 | m2 | m3 | m4 | m5 | m6 |
|---|---|---|---|---|---|---|
| W1 | - | 12 | - | 18 | - | 15 |
| W2 | 8 | 13 | 16 | - | 19 | 24 |
| W3 | 9 | 15 | 12 | 16 | 22 | - |
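Given Table 12's per-worker repair times, a simple maintenance dispatch picks, among the workers able to repair the failed machine, the one who can finish the repair earliest. The mapping below is read from the table, but blank-cell positions are partly ambiguous in the extracted layout, so treat it as illustrative; the dispatch rule itself is a sketch, not the paper's ACODDQN policy:

```python
# Repair times t_wk from Table 12 (missing key = worker cannot repair
# that machine). Cell positions are partly ambiguous in the extraction,
# so this mapping is illustrative.
repair_time = {
    "W1": {"m2": 12, "m4": 18, "m6": 15},
    "W2": {"m1": 8, "m2": 13, "m3": 16, "m5": 19, "m6": 24},
    "W3": {"m1": 9, "m2": 15, "m3": 12, "m4": 16, "m5": 22},
}

def dispatch_worker(machine, free_at, now=0):
    """Pick the able worker minimizing repair finish time max(now, free) + t_wk."""
    candidates = [(max(now, free_at[w]) + t[machine], w)
                  for w, t in repair_time.items() if machine in t]
    if not candidates:
        raise ValueError(f"no worker can repair {machine}")
    finish, worker = min(candidates)
    return worker, finish

# machine m3 fails at t=10; W1 cannot repair m3, W3 is busy until t=25
worker, finish = dispatch_worker("m3", {"W1": 0, "W2": 4, "W3": 25}, now=10)
print(worker, finish)  # → W2 26
```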
Lu, J.; Zhang, J.; Cao, J.; Xu, X.; Shao, Y.; Cheng, Z. Flexible Job Shop Dynamic Scheduling and Fault Maintenance Personnel Cooperative Scheduling Optimization Based on the ACODDQN Algorithm. Mathematics 2025, 13, 932. https://doi.org/10.3390/math13060932