Article

An Enhanced Multi-Objective Evolutionary Algorithm with Reinforcement Learning for Energy-Efficient Scheduling in the Flexible Job Shop

by Jinfa Shi, Wei Liu * and Jie Yang
School of Management and Economics, North China University of Water Resources and Electric Power, Zhengzhou 450046, China
* Author to whom correspondence should be addressed.
Processes 2024, 12(9), 1976; https://doi.org/10.3390/pr12091976
Submission received: 30 August 2024 / Revised: 10 September 2024 / Accepted: 10 September 2024 / Published: 13 September 2024
(This article belongs to the Section Manufacturing Processes and Systems)

Abstract

The study of the flexible job shop scheduling problem (FJSP) is of great importance in the context of green manufacturing. In this paper, with the optimization objectives of minimizing the maximum completion time and the total machine energy consumption, an improved multi-objective evolutionary algorithm with decomposition (MOEA/D) based on reinforcement learning is proposed. First, three initialization strategies are used to generate the initial population in fixed proportions, and four variable neighborhood search strategies are combined to strengthen the local search capability of the algorithm. Second, a parameter adaptation strategy based on Q-learning is proposed to guide the population toward the best-performing parameters and to increase diversity. Finally, the performance of the proposed algorithm is analyzed and evaluated by comparing Q-MOEA/D with IMOEA/D and NSGA-II on Kacem and BRdata benchmark instances of different sizes and on a production example from automotive engine cooling system manufacturing. The results show that the Q-MOEA/D algorithm outperforms the other two algorithms in solving the energy-efficient scheduling problem for flexible job shops.

1. Introduction

The job shop scheduling problem (JSP) lies at the core of production management in job shop systems: it plays an important role in manufacturing and production decision-making, and an efficient schedule helps an enterprise improve production efficiency. The multi-objective flexible job shop scheduling problem (MO-FJSP) enables the simultaneous optimization of multiple indicators in intelligent manufacturing. By taking issues such as machine energy consumption into account when sequencing operations, it aligns better with practical production requirements. As a result, numerous scholars in China and abroad have conducted extensive research on this subject.
The MO-FJSP plays a key role in active production scheduling and distributed scheduling. With the development of artificial intelligence and swarm intelligence optimization algorithms, a variety of new heuristic algorithms have been widely applied to solve the MO-FJSP, such as the grey-wolf optimizer and NSGA-III. Wei et al. [1] proposed a multi-objective hybrid evolutionary algorithm and designed a multi-dimensional encoding and decoding scheme to represent feasible solutions of the MO-FJSP. Chen et al. [2] proposed a hybrid adaptive differential evolution algorithm that combines an elite selection strategy based on the Pareto dominance relation with a simulated annealing algorithm to improve performance. To address the impact of auxiliary processing tasks on traditional low-carbon scheduling in flexible workshops, Jin et al. [3] proposed an improved NSGA-II algorithm based on dominance strength, which optimizes the maximum completion time, carbon emissions, and machine energy consumption while solving the model. Pei et al. [4] improved the backtracking search algorithm by combining dynamic control of the mutation operation with changes to the individual search magnitude factor, preventing the population from falling into a local optimum during the iterative process.
In recent years, research on reinforcement learning has gained attention among scholars. By incorporating reinforcement learning mechanisms into various optimization algorithms, agents interact with the environment iteratively, allowing the algorithms to autonomously adapt to different solving environments. Du et al. [5] used deep reinforcement learning to learn and select heuristic local search operators, significantly enhancing the performance of estimation of distribution algorithms. Zeng et al. [6] proposed a self-learning taboo search algorithm (DSLTS) based on deep reinforcement learning to address the dynamic adjustment of key parameters in flexible job shop scheduling algorithms. Lin et al. [7] employed the DQN algorithm to autonomously adjust the parameters of the grey-wolf optimization algorithm, accelerating its convergence. Lu et al. [8] addressed the flexible job shop scheduling problem in the context of green manufacturing. They established a multi-objective integer programming model aimed at minimizing the completion time, machine energy consumption, and workshop energy consumption, and proposed an improved fast non-dominated sorting genetic algorithm based on Q-learning to solve it. Pan et al. [9] introduced a feedback mechanism to adjust the population sizes of dual populations based on evolutionary performance, enhancing the convergence of the dominant population.
Integrating RL into the MOEA/D enables the algorithm to adaptively select the optimal neighborhood search strategies or adjust the algorithm parameters. For instance, RL can learn to choose different weight vectors at various stages of the search or dynamically adjust the scope of neighborhood searches during the process to enhance the algorithm’s global search capability and local search ability. By combining the MOEA/D with RL, intelligent scheduling strategies can be developed that optimize multiple scheduling objectives simultaneously. These strategies not only improve production efficiency but also significantly reduce energy consumption, leading to greener and more sustainable manufacturing practices.
In light of this, this study proposes an improved multi-objective evolutionary algorithm with decomposition based on Q-learning (Q-MOEA/D), aimed at solving the energy-saving scheduling problem in flexible job shops. By incorporating the strategy optimization of reinforcement learning algorithms, not only is the search efficiency of the algorithm enhanced, but its adaptability and robustness in volatile environments are also strengthened. Moreover, by considering energy saving as one of the key objectives of scheduling optimization, this study not only focuses on improving production efficiency but also commits to achieving green manufacturing, responding to the global call for energy conservation, emission reduction, and sustainable development.
The main contributions of this paper are as follows: (1) A combination of three initialization strategies is designed to generate the initial population in a certain proportion, which helps to accelerate the population convergence and improve the accuracy of the final solution. (2) A self-learning tuning strategy for parameters based on reinforcement learning is proposed to help MOEA/D select appropriate T and improve the diversity of optimal solutions. (3) A variable neighborhood search combining four local search strategies is designed. (4) The proposed algorithm is applied to a production example of an engine plant, which is used to verify the effectiveness of the algorithm.

2. Basic Theory

2.1. The MOFJSP Model

2.1.1. Description of the Problem

The MO-FJSP requires both assigning operations of the jobs $J_1, J_2, \ldots, J_n$ to machines $M_1, M_2, \ldots, M_m$ ($k \in \{1, 2, \ldots, m\}$) and sequencing the operations on the machines. Each job $J_i$ ($i \in \{1, 2, \ldots, n\}$) consists of $p_i$ operations $O_{ij}$ ($j \in \{1, 2, \ldots, p_i\}$), and the machining time of each operation is determined by the performance of the selected machine. The goal of scheduling is to optimize the desired performance metrics while satisfying the machining constraints. In addition, the following constraints must be satisfied during machining:
  • At any given time, a machine can only process one job.
  • At any given time, a machine can only process one operation of the same job.
  • Once an operation of a job begins, it cannot be interrupted.
  • All jobs have equal priority.
  • There are no precedence constraints between operations of different jobs, but there are precedence constraints between operations of the same job.
  • All jobs can be processed at time zero.

2.1.2. Mathematical Modelling

This paper focuses on the simultaneous optimization of two evaluation metrics, namely the maximum completion time $C_m$ and the total machine energy consumption $W_t$. These two objectives are commonly used in practical production settings. The maximum completion time is crucial to the production cycle of each job, while the machine energy consumption relates to the overall utilization rate of machines in the workshop.
Minimizing the maximum completion time: The completion time refers to the time when the last operation of each job is finished. Among these completion times, the maximum is known as the maximum completion time. It is the fundamental metric for evaluating scheduling plans, reflecting the production efficiency of the workshop. It is also one of the most widely used performance indicators in MOFJSP research.
$$\min C_m = \max_{1 \le i \le n}\left(\max_{1 \le j \le p_i} C_{ij}\right) \tag{1}$$
Minimizing total machine energy consumption: Variations in the processing times across different machines result in a diverse energy consumption profile, which is contingent upon the scheduling strategy employed. Specifically, when the maximum completion time is held constant, prioritizing the reduction in total energy consumption becomes a strategic objective. This approach not only aligns with sustainability goals but also contributes to cost efficiency within manufacturing operations. Thus, minimizing the aggregate energy consumption of all machines emerges as a critical component in the multi-objective optimization framework, particularly in scenarios where energy conservation is a key performance criterion.
$$\min W_t = \sum_{i=1}^{n} \sum_{j=1}^{p_i} \sum_{k=1}^{m} T_{ijk} \tag{2}$$
$$S_{ij} + T_{ijk} = C_{ij}, \quad \forall i, j, k \tag{3}$$
$$C_{ij} \le S_{i,j+1}, \quad \forall i, j \tag{4}$$
$$S_{ij} + T_{ijk} \le S_{hg} + L\left(1 - \beta_{ijhgk}\right), \quad \forall i, h, j, g, k \tag{5}$$
$$\sum_{k=1}^{m} \alpha_{ijk} = 1, \quad \forall i, j \tag{6}$$
$$S_{ij} \ge 0, \quad C_{ij} \ge 0 \tag{7}$$
In Equation (1), $p_i$ represents the number of operations for job $i$, and $C_{ij}$ represents the completion time of operation $O_{ij}$. In Equation (2), $T_{ijk}$ represents the processing time of operation $O_{ij}$ on machine $k$. Equation (3) indicates that the processing of an operation cannot be interrupted, where $S_{ij}$ represents the start time of operation $O_{ij}$. Equation (4) represents the constraint on the sequence of operations within a job. Equation (5) restricts a machine from processing multiple operations simultaneously. Here, $\beta_{ijhgk}$ is a decision variable: if operation $O_{ij}$ precedes operation $O_{hg}$ on machine $k$, then $\beta_{ijhgk} = 1$; otherwise, $\beta_{ijhgk} = 0$. $L$ is a sufficiently large positive number. Equation (6) indicates that each operation is processed on exactly one machine, where $\alpha_{ijk}$ is a decision variable: if operation $O_{ij}$ is processed on machine $k$, then $\alpha_{ijk} = 1$; otherwise, $\alpha_{ijk} = 0$. Equation (7) states that the start and completion times must be non-negative.
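To make the two objectives concrete, the following Python sketch evaluates a makespan and an energy total for a given schedule. The data layout (a completion-time table, a processing-time table, and a machine assignment per operation) and the per-machine unit energy coefficient are illustrative assumptions, not structures taken from the paper.

```python
# Illustrative sketch (not the authors' code): evaluating the two objectives
# for an already-decoded schedule. C[i][j] is the completion time of O_ij,
# T[i][j][k] the processing time of O_ij on machine k, and assign[i][j] the
# machine chosen for O_ij.

def makespan(C):
    """Equation (1): maximum completion time over the last operation of each job."""
    return max(job[-1] for job in C)

def total_processing_energy(T, assign, unit_energy):
    """Processing energy: time on the chosen machine times that machine's
    unit energy consumption (unit_energy[k] is an assumed coefficient)."""
    total = 0.0
    for i, ops in enumerate(assign):
        for j, k in enumerate(ops):
            total += T[i][j][k] * unit_energy[k]
    return total

# Tiny example: 2 jobs, 2 operations each, 2 machines.
C = [[3.0, 7.0], [4.0, 9.0]]                       # completion times C_ij
T = [[[3.0, 5.0], [4.0, 6.0]], [[4.0, 4.0], [5.0, 3.0]]]
assign = [[0, 0], [1, 1]]                          # machine index per operation
print(makespan(C))                                 # 9.0
print(total_processing_energy(T, assign, [1.0, 1.5]))
```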

2.2. The MOEA/D

The MOEA/D is a decomposition-based multi-objective evolutionary algorithm proposed by Zhang and Li [10] in 2007. It is one of the effective methods for solving multi-objective optimization problems, especially complex problems with multiple conflicting objectives.
The MOEA/D decomposes a multi-objective optimization problem into a series of single-objective sub-problems and, using neighborhood structure information, optimizes these sub-problems in parallel through evolutionary operations such as selection, crossover, and mutation. Its features include efficient parallel processing, the flexibility to choose an appropriate decomposition strategy based on problem characteristics, and broad applicability. Through its update strategy, the MOEA/D iteratively improves the best solution of each sub-problem in every generation, although its performance can be challenged by parameter settings and large-scale problems. The algorithm has proven remarkably effective and adaptable in solving complex multi-objective problems in a variety of domains.
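The paper does not state which scalarization its MOEA/D uses; as a minimal sketch of the decomposition idea, the snippet below applies the weighted Tchebycheff approach that is common in MOEA/D implementations, with an assumed two-objective setting (makespan, energy).

```python
# Minimal sketch of a weighted Tchebycheff scalarization, as often used in MOEA/D.
# f: objective vector of a solution, w: weight vector of one sub-problem,
# z: reference (ideal) point. Smaller scalarized values are better.

def tchebycheff(f, w, z):
    return max(wi * abs(fi - zi) for fi, wi, zi in zip(f, w, z))

def try_update(neighbor_f, offspring_f, w, z):
    """A neighboring sub-problem accepts an offspring if it improves its own
    scalarized value (one common update rule, assumed here)."""
    return tchebycheff(offspring_f, w, z) < tchebycheff(neighbor_f, w, z)

# Example with two objectives (makespan, energy):
z = [51.0, 3.8]                                   # best values observed so far
print(tchebycheff([53.0, 4.1], [0.5, 0.5], z))    # 1.0
```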

2.3. Reinforcement Learning

Reinforcement learning (RL) is a machine learning method that mimics the cognitive and trial-and-error processes of human learning [11]. Unlike supervised and unsupervised learning, RL possesses the ability to learn independently, starting from scratch. This approach is particularly suitable for solving sequential decision-making problems, where traditional dynamic programming methods often struggle when faced with large state spaces or unknown state transition matrices or reward functions [12]. In contrast, RL does not require prior knowledge of the state transition probability matrix nor exhaustively searching the state space, making it effective in handling complex sequential decision-making problems.
The core idea of RL is to learn how to make optimal decisions through the interaction between an agent and an environment [13,14]. In RL, the agent takes actions in the environment, and the environment provides feedback in the form of a reward signal. The agent’s objective is to explore different strategies and find an optimal policy that maximizes the cumulative reward obtained over a series of actions. The interaction process between the RL agent and the environment is illustrated in Figure 1.
When solving a problem with reinforcement learning, the problem must first be described as a Markov Decision Process (MDP), in which, given the present state, the conditional probability distribution of future states does not depend on past states. The process is commonly described by the quintuple $(S, A, R, P, \gamma)$, where $S$ denotes the state space and $A$ denotes the action space. $R$ is the reward obtained for choosing action $a$ in the current state $s$ and transferring to the next state $s'$:
$$R(s, a) = \mathbb{E}\left[R_{t+1} \mid S_t = s, A_t = a\right] \tag{8}$$
$P$ is the state transition probability,
$$P_{ss'}^{a} = P\left(S_{t+1} = s' \mid S_t = s, A_t = a\right) \tag{9}$$
and $\gamma \in [0, 1]$ is the discount factor.

3. The Q-MOEA/D Algorithm

3.1. Coding and Decoding Mechanisms

Encoding refers to representing the solution of a problem with a code, aligning the problem’s state space with the encoding space of an algorithm. The purpose of encoding is to enable genetic operations, such as crossover and mutation, similar to those seen in the biological realm. The most common encoding method for the MO-FJSP is MSOS encoding, which consists of two parts: the machine selection (MS) and the operations sequencing (OS). It is shown in Figure 2:
In the OS segment, each gene is directly encoded with a job number. The order in which the job numbers appear indicates the sequential processing order between operations of the respective job. For example, when compiling the chromosome from left to right, the i -th occurrence of job number j represents the i -th operation of job j , and the total number of occurrences of each job number is equal to the total number of operations for that particular job. In the MS segment, the job numbers and their corresponding operations are arranged sequentially. Each integer represents the order of the currently selected machine for the respective operation from the set of available machines.
Decoding involves traversing the OS vector from left to right to obtain the processing sequence. Subsequently, the MS vector is traversed from left to right to determine the machine selection for each operation. Each operation is scheduled and processed on the corresponding machine. The start time and completion time of each operation are calculated, along with the waiting time at each processing position. This ensures that the start times adhere to the constraints of the MO-FJSP model. Finally, the total machine energy consumption and maximum completion time can be obtained from the current solution.
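A minimal sketch of one way to decode an MSOS chromosome is shown below. The greedy earliest-start rule, the dictionary-based data layout, and the small example instance are assumptions for illustration, not the authors' exact decoding procedure.

```python
# Illustrative MSOS decoding sketch (assumed layout, not the authors' code).
# os_vec: job numbers, each job i appearing p_i times; ms_vec: for each (job,
# operation) in fixed job order, an index into that operation's machine list.

def decode(os_vec, ms_vec, proc, machines):
    """proc[(i, j, k)] = processing time of O_ij on machine k;
    machines[(i, j)] = list of eligible machines for O_ij."""
    op_machine = {}                 # (i, j) -> chosen machine, from the MS part
    for pos, op in enumerate(sorted(machines)):
        op_machine[op] = machines[op][ms_vec[pos]]

    machine_free = {}               # machine -> time it becomes free
    job_free = {}                   # job -> completion time of its last operation
    completion = {}
    seen = {}                       # how many operations of each job decoded so far
    for i in os_vec:
        j = seen.get(i, 0)          # next operation index of job i
        seen[i] = j + 1
        k = op_machine[(i, j)]
        start = max(machine_free.get(k, 0.0), job_free.get(i, 0.0))
        finish = start + proc[(i, j, k)]
        machine_free[k] = finish
        job_free[i] = finish
        completion[(i, j)] = finish
    return completion, max(job_free.values())

# Example: 2 jobs with 2 operations each, machines M0/M1.
machines = {(0, 0): [0, 1], (0, 1): [0], (1, 0): [1], (1, 1): [0, 1]}
proc = {(0, 0, 0): 3, (0, 0, 1): 4, (0, 1, 0): 2, (1, 0, 1): 5, (1, 1, 0): 3, (1, 1, 1): 2}
os_vec = [0, 1, 0, 1]
ms_vec = [0, 0, 0, 1]               # picks M0, M0, M1, M1 in job/operation order
print(decode(os_vec, ms_vec, proc, machines)[1])   # makespan of this schedule
```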

3.2. Population Initialization

A well-initialized population can reduce computational efforts by narrowing down the search space and accelerating the attainment of optimal solutions. To achieve a high-quality initialization population, this study proposes a hybrid initialization strategy consisting of three heuristic initialization rules:
Rule 1: Minimum Processing Time: Randomly generate the OS segment and select the machine with the minimum processing time for each operation. Choosing the minimum processing time allows for proximity to the lower bound of C m .
Rule 2: Minimum Machine Energy Consumption: Randomly generate the OS segment and select the machine with the minimum energy consumption from the current scheduling candidate machines for each operation. If two machines have the same energy consumption, select the one with the shorter processing time.
Rule 3: Random Initialization: Randomly generate the OS segment and choose a random machine from the candidate machines for each operation.
The specific steps of the hybrid population initialization strategy proposed in this paper are as follows:
Step 1: Generate a subpopulation P 1 , comprising 1/10 of the total population, using Rule 1.
Step 2: Generate another subpopulation P 2 , comprising 1/10 of the total population, using Rule 2.
Step 3: Generate a subpopulation P 3 , comprising 8/10 of the total population, using Rule 3.
Step 4: Merge the populations $P = P_1 \cup P_2 \cup P_3$ to obtain the final population. This approach combines the fast construction of well-converged subpopulations through heuristic rules with random initialization to ensure population diversity. It effectively balances convergence and diversity, resulting in a high-quality initial population.
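The hybrid initialization can be sketched as follows; the 1/10, 1/10, 8/10 split follows Steps 1 to 4 above, while the data structures, the random OS generation, and the tie-breaking of Rule 2 via tuple comparison are illustrative assumptions.

```python
import random

# Sketch of the hybrid population initialization (Rules 1-3, Steps 1-4).
# machines[(i, j)] lists eligible machines for O_ij; proc[(i, j, k)] and
# energy[(i, j, k)] are assumed processing-time and energy tables.

def random_os(num_ops_per_job):
    os_vec = [i for i, p in enumerate(num_ops_per_job) for _ in range(p)]
    random.shuffle(os_vec)
    return os_vec

def ms_by_rule(machines, key):
    # key(i, j, k) returns the value to minimize for operation O_ij on machine k
    return [min(range(len(machines[op])), key=lambda idx: key(*op, machines[op][idx]))
            for op in sorted(machines)]

def initialize(pop_size, num_ops_per_job, machines, proc, energy):
    pop = []
    for _ in range(pop_size // 10):            # Rule 1: minimum processing time
        pop.append((random_os(num_ops_per_job),
                    ms_by_rule(machines, lambda i, j, k: proc[(i, j, k)])))
    for _ in range(pop_size // 10):            # Rule 2: minimum energy, ties by time
        pop.append((random_os(num_ops_per_job),
                    ms_by_rule(machines, lambda i, j, k: (energy[(i, j, k)], proc[(i, j, k)]))))
    while len(pop) < pop_size:                 # Rule 3: random machine assignment
        pop.append((random_os(num_ops_per_job),
                    [random.randrange(len(machines[op])) for op in sorted(machines)]))
    return pop
```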

3.3. Population Renewal Strategies

In the crossover operation strategy, two improved crossover operators are utilized: Improved Process-based Order Crossover (IPOX) and Machine-based Multi-Point Reserve Crossover (MPX).
IPOX (Improved Process-based Order Crossover): As shown in Figure 3, consider a case with 5 jobs and 12 operations. Let the parent chromosomes be denoted as $P_1$ and $P_2$, and the offspring chromosomes generated through crossover as $C_1$ and $C_2$. First, randomly partition all jobs into two non-empty sets, $A$ and $B$; for example, $A = \{J_1, J_3\}$ and $B = \{J_2, J_4, J_5\}$. Then, copy the genes of $P_1$ belonging to $A$ into $C_1$ at their original positions, maintaining the original order, and likewise copy the genes of $P_2$ belonging to $B$ into $C_2$ at their original positions. Next, fill the remaining positions of $C_1$ with the genes of $P_2$ belonging to $B$, and fill the remaining positions of $C_2$ with the genes of $P_1$ belonging to $A$, preserving the order in which they appear.
MPX (Machine-based Multi-Point Reserve Crossover): As depicted in Figure 4, consider parent machine chromosomes $M_1$ and $M_2$; after the crossover, the offspring chromosomes $C_3$ and $C_4$ are generated. First, generate an array $C$ with the same length as the parent machine chromosomes, filled randomly with 0 and 1, and record the positions where 1 appears. Copy the genes of $M_1$ at those positions into $C_4$, filling the remaining positions of $C_4$ from $M_2$. Similarly, copy the genes of $M_2$ at those positions into $C_3$, filling the remaining positions of $C_3$ from $M_1$.
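The two crossover operators can be sketched as below. This is an illustrative Python rendering of the descriptions above, assuming OS vectors of job numbers and MS vectors of machine indices with the same length in both parents.

```python
import random

def pox_child(keep_parent, fill_parent, keep_set):
    """Keep the genes of keep_set at their positions in keep_parent; fill the
    remaining positions with the other jobs in the order of fill_parent."""
    fill = iter(g for g in fill_parent if g not in keep_set)
    return [g if g in keep_set else next(fill) for g in keep_parent]

def ipox(p1, p2, jobs):
    """IPOX on OS vectors; jobs is the set of all job numbers."""
    jobs = sorted(jobs)
    set_a = set(random.sample(jobs, random.randint(1, len(jobs) - 1)))
    set_b = set(jobs) - set_a
    c1 = pox_child(p1, p2, set_a)     # A-jobs from P1, B-jobs in P2's order
    c2 = pox_child(p2, p1, set_b)     # B-jobs from P2, A-jobs in P1's order
    return c1, c2

def mpx(m1, m2):
    """MPX on MS vectors: a random 0/1 mask decides which parent each gene
    of the two children is copied from."""
    mask = [random.randint(0, 1) for _ in m1]
    c3 = [a if bit else b for bit, a, b in zip(mask, m2, m1)]   # 1 -> M2, 0 -> M1
    c4 = [a if bit else b for bit, a, b in zip(mask, m1, m2)]   # 1 -> M1, 0 -> M2
    return c3, c4

# Example: 3 jobs with two operations each.
c1, c2 = ipox([1, 2, 2, 1, 3, 3], [3, 1, 2, 3, 2, 1], {1, 2, 3})
```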
OS mutation operator: A random two-point exchange operator is used to mutate the OS vector. Two positions on the OS vector are randomly selected and their genes are exchanged, and the mutated solution replaces the original one. The mutation probability is $P_m$.
MS mutation operator: A random machine-replacement strategy is applied to the MS vector. Several operations are randomly selected and reassigned, within their sets of eligible machines, to a machine different from the current one. The mutation probability is $P_m$.
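A corresponding sketch of the two mutation operators, again under assumed data structures (here the eligible-machine lists are indexed by chromosome position, and the number of reassigned operations is an assumed parameter):

```python
import random

def mutate_os(os_vec, p_m):
    """Random two-point swap on the OS vector, applied with probability p_m."""
    child = list(os_vec)
    if random.random() < p_m:
        a, b = random.sample(range(len(child)), 2)
        child[a], child[b] = child[b], child[a]
    return child

def mutate_ms(ms_vec, machines, p_m, n_moves=2):
    """Reassign a few operations to a different eligible machine with probability p_m.
    machines[idx] is the list of eligible machines for the operation at position idx;
    n_moves is an assumed number of reassigned operations."""
    child = list(ms_vec)
    if random.random() < p_m:
        for idx in random.sample(range(len(child)), min(n_moves, len(child))):
            options = [m for m in range(len(machines[idx])) if m != child[idx]]
            if options:
                child[idx] = random.choice(options)
    return child
```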

3.4. Neighborhood Search

The MOEA/D algorithm exhibits excellent global search capability, enabling it to quickly obtain optimized solutions. The neighborhood search strategy, in contrast, focuses on refining local solutions, thereby enhancing the algorithm's local search ability and preventing it from getting trapped in local optima. Nonetheless, excessive local search can decrease the algorithm's search efficiency [15]. In this study, four types of neighborhood actions are utilized, and a sketch of how they can be combined follows their descriptions below.
For the job-based neighborhood structure:
Neighborhood Action 1: Randomly select two adjacent positions on the chromosome, exchange the corresponding sections of job chromosomes, and then randomly insert them into a chosen position on the chromosome.
Neighborhood Action 2: Randomly remove several jobs from the chromosome, and then sequentially insert them into the job encoding segment based on the order of removal.
For the machine-based neighborhood structure:
Neighborhood Action 3: For any machine encoding position in the chromosome, cyclically select a different machine from the feasible machines for the current position’s job. If the newly generated solution improves upon the original one (dominates the original solution or is mutually non-dominated), it is retained.
Neighborhood Action 4: For any machine encoding position in the chromosome, select the machine with the lowest energy consumption among the feasible machines for the current position’s job.
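The sketch below shows how such actions can be organized into a simple variable neighborhood search loop. The Pareto-dominance acceptance test generalizes the criterion stated for Neighborhood Action 3, and the action callables themselves are assumed to be supplied by the caller; this is an illustration, not the authors' exact procedure.

```python
import random

def dominates(f1, f2):
    """True if objective vector f1 Pareto-dominates f2 (both objectives minimized)."""
    return all(a <= b for a, b in zip(f1, f2)) and any(a < b for a, b in zip(f1, f2))

def variable_neighborhood_search(solution, evaluate, actions, max_rounds=4):
    """Apply neighborhood actions in turn; keep a neighbor if it dominates the
    incumbent or is mutually non-dominated with it."""
    best, best_f = solution, evaluate(solution)
    for _ in range(max_rounds):
        action = random.choice(actions)        # one of Neighborhood Actions 1-4
        neighbor = action(best)
        neighbor_f = evaluate(neighbor)
        if not dominates(best_f, neighbor_f):  # neighbor is not worse in the Pareto sense
            best, best_f = neighbor, neighbor_f
    return best
```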

3.5. Self-Learning MOEA/D-Based Q-Learning

The proposed method leverages reinforcement learning (RL), specifically the Q-learning algorithm, to enhance the decision-making process within the multi-objective evolutionary algorithm based on decomposition (MOEA/D). This integration is crucial for optimizing energy consumption within the flexible job shop scheduling problem (FJSSP). The primary goal of applying RL in this context is to dynamically adjust the algorithm’s parameters to better achieve energy-saving objectives.

3.5.1. Introduction to the Q-Learning Algorithm

Q-learning is a reinforcement learning algorithm for solving Markov Decision Processes (MDPs) with discrete state and action spaces. Its core idea is to guide the agent's decision-making by learning an action-value function (the Q-value function), whose value represents the expected long-term cumulative reward obtained by performing an action in a given state.
The Q-value is represented by a function $Q(s, a)$, where $s$ denotes the current state and $a$ denotes the action taken. It reflects the agent's estimate of the long-term reward it expects to obtain after executing action $a$ in state $s$, comprising the immediate reward of the current action together with the expected future rewards.
Q-learning gradually updates the Q-value through continuous interaction with the environment in order to eventually find the optimal action-value function. The update rule is based on the Bellman equation:
$$Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha\left[R_{t+1} + \gamma \max_{a} Q(S_{t+1}, a) - Q(S_t, A_t)\right] \tag{10}$$
The core idea of Q-learning is to continuously approximate the optimal action value function through such an iterative updating process, enabling the agent to learn an optimal strategy in its interaction with the environment. This updating rule enables the Q-learning algorithm to find the optimal strategy in an unknown environment and maximize the cumulative reward.
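A minimal sketch of the tabular update in Equation (10); the state and action encodings are placeholders here and are made concrete for Q-MOEA/D in Sections 3.5.2, 3.5.3 and 3.5.4.

```python
def q_update(q_table, state, action, reward, next_state, alpha, gamma):
    """Tabular Q-learning update (Equation (10)).
    q_table[state][action] holds the current Q-value estimate."""
    best_next = max(q_table[next_state].values())
    q_table[state][action] += alpha * (reward + gamma * best_next - q_table[state][action])

# Example with two states and two actions:
q = {"s0": {"a0": 0.0, "a1": 0.0}, "s1": {"a0": 0.0, "a1": 0.0}}
q_update(q, "s0", "a1", reward=10.0, next_state="s1", alpha=0.1, gamma=0.8)
print(q["s0"]["a1"])   # 1.0
```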

3.5.2. Action Definitions

In multi-objective optimization algorithms, the Pareto front is the set of optimal solutions calculated by the MOEA/D, and it reflects the algorithm’s capability. To enhance the diversity of the Pareto front, Q-learning is used to guide the population in selecting the optimal parameter T. Increasing the diversity of the entire population can improve the diversity of the genetic factors. Therefore, each iteration’s genetic factor is abstracted as an agent, reflecting the success of parameter selection. The four candidate values T = 5, 10, 15, and 20 are defined as actions [16].

3.5.3. Status Definitions

The change in the agent state gives Q-learning feedback to judge whether the action performed this time improves the diversity of the PF. To represent the agent state, two metrics, IGD and DV, are used to define convergence and diversity:
$$\mathrm{IGD}(PF_{\mathrm{true}}, PF_{\mathrm{known}}) = \frac{\sum_{i=1}^{n} |d_i|}{n} \tag{11}$$
$$\mathrm{DV} = \frac{\sum_{i=1}^{N-1} \left| d_i - \bar{d} \right|}{(N-1)\,\bar{d}} \tag{12}$$
$$\Delta IGD = IGD_{i-1} - IGD_{i} \tag{13}$$
$$\Delta DV = DV_{i-1} - DV_{i} \tag{14}$$
IGD (Inverted Generational Distance) reflects the convergence of the solution set. Points are taken uniformly from the true Pareto front $PF_{\mathrm{true}}$, and for each of these points the closest point in the solution set $PF_{\mathrm{known}}$ obtained by the algorithm is found; the distances between these point pairs are summed and averaged. In Equation (11), $n$ denotes the number of points in $PF_{\mathrm{true}}$ and $d_i$ denotes the nearest Euclidean distance in the objective space from the $i$-th point of the true front to the known front. The smaller this value is, the better the comprehensive performance of the algorithm.
DV (Diversity) reflects the distribution of solutions in the solution set: a good solution set should not only cover all possible optimal solutions but also be evenly distributed, providing the decision-maker with a wider range of choices. In Equation (12), $d_i$ denotes the Euclidean distance between two adjacent points of $PF_{\mathrm{known}}$, and $\bar{d}$ denotes the mean value of the $d_i$. The larger this value is, the better the distribution of the solutions.
Based on the signs of $\Delta IGD$ and $\Delta DV$ during the solution process, there are four combinations, i.e., four agent states: state I: $\Delta IGD > 0$ and $\Delta DV > 0$; state II: $\Delta IGD > 0$ and $\Delta DV \le 0$; state III: $\Delta IGD \le 0$ and $\Delta DV > 0$; and state IV: $\Delta IGD \le 0$ and $\Delta DV \le 0$.

3.5.4. Definition of Incentives

Reward definition: In order to guide the algorithm to choose parameters that enhance the distribution of the frontier surface, a reward is given when the PF is updated and $\Delta DV > 0$, which means the DV index has become larger and the neighboring solutions are more uniformly distributed. $\Delta IGD > 0$ indicates better convergence and diversity of the algorithm. Taking both into account, the following reward values are set:
$$R = \begin{cases} 10, & \Delta IGD > 0 \ \text{and} \ \Delta DV > 0 \\ 10, & \Delta IGD \le 0 \ \text{and} \ \Delta DV > 0 \\ 2, & \Delta IGD > 0 \ \text{and} \ \Delta DV \le 0 \\ 0, & \Delta IGD \le 0 \ \text{and} \ \Delta DV \le 0 \end{cases} \tag{15}$$
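Putting Sections 3.5.2, 3.5.3 and 3.5.4 together, the sketch below maps ΔIGD and ΔDV to one of the four states, applies the reward of Equation (15), and selects the neighborhood size T from {5, 10, 15, 20} with the ε-greedy rule of Algorithm 1; the function and variable names are illustrative.

```python
import random

ACTIONS = [5, 10, 15, 20]                  # candidate neighborhood sizes T
STATES = ["I", "II", "III", "IV"]

def state_of(d_igd, d_dv):
    if d_igd > 0 and d_dv > 0:
        return "I"
    if d_igd > 0:
        return "II"
    if d_dv > 0:
        return "III"
    return "IV"

def reward_of(d_igd, d_dv):                # Equation (15)
    if d_dv > 0:
        return 10
    if d_igd > 0:
        return 2
    return 0

def select_T(q_table, state, epsilon):
    """Epsilon-greedy choice of T: with probability epsilon explore randomly,
    otherwise take the action with the largest Q value (as in Algorithm 1)."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_table[state][a])

q_table = {s: {a: 0.0 for a in ACTIONS} for s in STATES}
print(select_T(q_table, state_of(0.02, -0.01), epsilon=0.9))
```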

3.6. Algorithmic Process

The improved MOEA/D based on Q-learning (Q-MOEA/D) solves the MO-FJSP; the pseudo-code framework of the algorithm is shown in Algorithm 1.
The specific steps of the algorithm are as follows:
Step 1: Input the information regarding the workpieces and equipment. Initialize the algorithm parameters: the proportion of individuals with different generation methods in the initial population, maximum number of iterations, population size, initial crossover probability, mutation probability, learning rate, discount factor, ε-greedy algorithm parameters, local search probability, and MOEA/D neighborhood range. Initialize weight vectors, neighborhoods, and the reinforcement-learning Q-table.
Step 2: Generate the initial population. The individual encoding includes two parts: job scheduling and machine selection. Generate the corresponding machining time for each machine separately. Calculate the objective function values for the initial population. Initialize the reference points, where the reference points are formed by taking the minimum values of each sub-objective function across all individuals.
Step 3: Initialize the IGD (inverted generational distance) and DV (diversity value). Determine the current state based on ΔIGD and ΔDV. Use an ε-greedy strategy to select an action within the MOEA/D neighborhood range.
Step 4: For each individual in the current population $P_t$, perform crossover and mutation operations to generate an offspring population $C_t$. Calculate the objective function values for the offspring population $C_t$ and update the neighborhoods. Merge the parent population $P_t$ and the offspring population $C_t$ to form the new-generation population. Extract the non-dominated solutions from the current population.
Step 5: Calculate the convergence and diversity indicators (ΔIGD and ΔDV) for the current population solutions. Determine the next state and select the action with the maximum reward. Update the Q-table according to Equation (10). Check whether the maximum number of iterations has been reached; if so, end the iteration; otherwise, return to Step 3.
Algorithm 1 Q-MOEA/D
Input: population number popnum, learning rate α, exploration rate ε, discount factor γ, number of iterations N
Output: non-dominated solutions (PF)
1:  Initialize Q-table, population, weight vectors, and neighborhoods
2:  for i ← 1 to N do
3:      IGD_i ← IGD_{i+1}, DV_i ← DV_{i+1}
4:      S ← (ΔIGD, ΔDV)
5:      if random number < ε then
6:          Select a random action A for scheduling
7:      else
8:          Select the action A with the maximum Q value for state S
9:      end if
10:     for j ← 1 to popnum do
11:         Select the mating pool from the neighborhood of x_j
12:         Apply crossover with probability P_c
13:         Apply mutation with probability P_m
14:         Evaluate offspring solutions using decomposition
15:         Update the neighborhood with better solutions
16:     end for
17:     PF_known ← PF_i
18:     IGD_{i+1} ← IGD(PF_known), DV_{i+1} ← DV(PF_known)
19:     ΔIGD ← IGD_i − IGD_{i+1}, ΔDV ← DV_i − DV_{i+1}
20:     S′ ← (ΔIGD, ΔDV)
21:     Update the Q value for (S, A) using the Q-learning update rule:
22:         Q(S, A) ← Q(S, A) + α[R + γ max_T Q(S′, T) − Q(S, A)]
23: end for

4. Simulation Experiment Analysis

4.1. Experimental Design

In order to test the performance of the proposed Q-MOEA/D algorithm in solving the MO-FJSP considering energy consumption, two benchmark sets with different characteristics are used, namely the Kacem instances [17] (Kacem01–Kacem04) and the BRdata instances [18] (mk01–mk10), for a total of 14 instances. The Kacem dataset represents totally flexible scheduling, while the BRdata dataset represents partially flexible scheduling. Together, these test cases cover production environments of different scales, so that the performance of the algorithm can be evaluated on small-, medium-, and large-scale instances. The test cases contain information about the machines available for each operation and their respective processing times (min). The energy consumption information is randomly generated: the unit processing energy consumption of each machine (KJ/min) is randomly generated within the range [0.5, 2], and the idle energy consumption (KJ/min) within the range [0.1, 0.3].
The experiments were conducted on a 64-bit Windows 11 Home Edition operating system with an AMD Ryzen 7 5800H processor with Radeon Graphics running at 3.20 GHz and 16 GB of memory, using MATLAB 2023a. To ensure the validity of the comparisons, all algorithms were run independently 20 times on each instance, and the averaged results were used to evaluate the convergence and diversity of the proposed algorithm.

4.2. Experimental Parameters

The Q-MOEA/D algorithm contains four important parameters: the population size pop-num, discount factor γ, learning rate α, and greedy factor ε. To evaluate the effect of each parameter and obtain the optimal parameter combination, a Taguchi experiment is carried out on the mk01 instance. The settings of each parameter are shown in Table 1.
According to the number of parameters and factor levels, an orthogonal design of size L16(4^4) is chosen, i.e., 16 trials, 4 parameters, and 4 factor levels. With the maximum number of iterations set to 100, the algorithm is run independently 20 times under each parameter combination and the mean HV of the 20 runs is calculated; the test results are shown in Table 2. From these results, the optimal parameter configuration is pop-num = 100, γ = 0.8, α = 0.1, ε = 0.9.

4.3. Algorithm Comparison Test

To validate the effectiveness of the proposed algorithm, two comparative algorithms are chosen: the improved decomposition-based multi-objective evolutionary algorithm IMOEA/D [19] and the dominance-based multi-objective optimization algorithm NSGA-II [20]. To analyze the distribution and convergence of the solution sets obtained by the proposed and comparative algorithms, the hypervolume (HV) [21], Spread, and inverted generational distance (IGD) metrics are used. The IGD metric was introduced in Section 3.5.3. HV computes the volume of the hypercube enclosed by the solution set and a reference point:
$$\mathrm{HV}(X, P) = \bigcup_{x \in X} v(x, P) \tag{16}$$
where X is the set of non-dominated solutions obtained by the algorithm; P is the reference point of the solution set, setting the coordinates of P to (1.1,1.1); x is the normalized solution; and v denotes the volume of the hypercube surrounded by all x in the solution set and the reference point P.
$$\mathrm{Spread}(P, P^*) = \frac{\sum_{i=1}^{|P^*|} d(P, P_i^*) + \sum_{X \in P} \left| d(X, P) - \bar{d} \right|}{\sum_{i=1}^{|P^*|} d(P, P_i^*) + \left(|P| - |P^*|\right)\bar{d}}, \quad d(X, P) = \min_{Y \in P,\, Y \neq X} \left\| F(X) - F(Y) \right\|, \quad \bar{d} = \frac{1}{|P|} \sum_{X \in P} d(X, P) \tag{17}$$
where P* denotes the true front; since the true front cannot be obtained in advance for the shop scheduling problem, P* is replaced by the non-dominated set extracted from the union of the solutions obtained by all compared algorithms. The Euclidean distance is denoted by d. The smaller the Spread value, the more uniformly the obtained front is distributed, indicating better distributivity.
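For the two objectives used here, the HV of a normalized non-dominated set with reference point (1.1, 1.1) can be computed by sorting the points and summing rectangles, as in the simple sketch below; this is an illustration only, not the faster algorithm of [21].

```python
def hypervolume_2d(points, ref=(1.1, 1.1)):
    """Hypervolume of a 2-objective non-dominated set (minimization): the area
    dominated by the points and bounded above by the reference point."""
    pts = sorted(p for p in points if p[0] <= ref[0] and p[1] <= ref[1])
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in pts:
        if f2 < prev_f2:                       # skip dominated or duplicate points
            hv += (ref[0] - f1) * (prev_f2 - f2)
            prev_f2 = f2
    return hv

# Example with normalized objective vectors (makespan, energy):
print(hypervolume_2d([(0.2, 0.8), (0.5, 0.4), (0.9, 0.1)]))   # 0.57
```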
The HV, Spread, and IGD values for the algorithm comparison are presented in Table 3, where the best average value for each instance corresponds to the largest HV, the smallest IGD, and the smallest Spread. According to the results in Table 3, the Q-MOEA/D algorithm outperforms the other algorithms in most of the instances. Figure 5 compares the PF solutions for three instances of different sizes: Kacem04, mk02, and mk05. Because the Q-MOEA/D algorithm is combined with reinforcement learning, the obtained solutions show good convergence and distribution and are significantly better than those of the other two algorithms. Figure 6 shows three Gantt charts representing the total completion time and total energy consumption for the Kacem04, mk02, and mk05 instances, respectively.
Based on the analysis presented in Table 3, it is evident that the Q-MOEA/D algorithm has achieved superior results in terms of the average hypervolume (HV) and inverted generational distance (IGD) of the solution sets across various standard test cases. It fell short of achieving optimality only in the Kacem01 and mk08 test cases. Overall, the Q-MOEA/D algorithm has exhibited commendable diversity and distribution within its solution sets. Further examination of Figure 5 and Figure 6 reveals that the Q-MOEA/D algorithm has demonstrated optimal convergence of the non-dominated solution set and exhibited a well-distributed presence within the solution space.

5. Engineering Case

DF Automotive’s engine plant produces four parts, namely the adapter plate bracket, fan bracket, flange, and adapter flange, for the engine cooling system, and a schematic diagram of the engine cooling system is shown in Figure 7 [22]. The parts in the boxes are the production parts of the company.
There are eight machines in the factory shop for machining these parts, i.e., two milling machines (M1, M2), two drilling machines (M3, M4), one machining center (M5), one manual lathe (M6), and two computer numerical control (CNC) lathes (M7, M8), with the specific parameters of the machining process as shown in Table 4.
Using the same experimental environment and algorithm parameter settings as in Section 4 of this paper, the Q-MOEA/D is used with the IMOEA/D and NSGA-II to solve the production example, and each algorithm is run 20 times to obtain the average value; the results of the production example are shown in Table 5.
As can be seen from Table 5, when the Q-MOEA/D is applied to the production case, the solution results of the two indicators are better than those of several other algorithms. Since both the Q-MOEA/D and IMOEA/D adopt a mixed initialization population with multi-heuristic strategies, they can reasonably balance the machine energy consumption, improve the machine utilization rate, and reduce the waste of ineffective energy consumption. The Gantt chart of the scheduling results of the Q-MOEA/D solving this production instance is plotted, and the scheduling Gantt chart of the production instance is shown in Figure 8.
As can be seen in Figure 8, drilling machine 2 (M4) is not used throughout the whole process, while the machining center (M5) is occupied almost all the time, which reflects a characteristic of the actual production environment: the machining center (M5) is preferred many times because of its versatility and high machining efficiency, whereas drilling machine 2 (M4) has no machining advantage over drilling machine 1 (M3) and is therefore not selected in the current round of scheduling. In summary, the Q-MOEA/D is validated in a real machining environment, balancing production capacity while optimizing productivity and achieving energy-efficient scheduling for flexible job machining.

6. Summary

This study presents an enhanced multi-objective evolutionary algorithm with decomposition (MOEA/D) integrated with reinforcement learning (RL) for energy-efficient scheduling in flexible job shops. The proposed method has demonstrated promising results in balancing production efficiency and energy consumption. In this section, we discuss the feasibility of our approach in real-time scheduling scenarios and potential areas for future enhancement.

6.1. Feasibility for Real-Time Scheduling Scenarios

The integration of the MOEA/D with RL has shown potential for real-time scheduling applications. The adaptive nature of RL allows the algorithm to respond quickly to changes in job shop conditions, such as machine breakdowns or sudden changes in job orders. However, the computational complexity of the algorithm could pose challenges in real-time environments where rapid decision-making is crucial.
To address this, we propose further research into optimizing the computational efficiency of the algorithm. This could involve developing more-efficient state representation schemes, exploring parallel computing techniques, or implementing incremental learning methods that require less-frequent updates to the model. Additionally, the development of a hybrid system that combines the strengths of our approach with other real-time scheduling methods could be beneficial.

6.2. Potential Areas for Further Enhancement

While our method has shown effectiveness in optimizing energy consumption and production efficiency, there are several areas where further enhancement could be pursued:
Dynamic Scheduling Conditions: Future work could focus on enhancing the algorithm’s capability to handle dynamic scheduling conditions, such as real-time job arrivals and machine availability changes.
Integration with IoT and Big Data: With the advent of Industry 4.0, integrating our method with Internet of Things (IoT) devices and leveraging big data analytics could provide more accurate and real-time data for decision-making.
Multi-Criteria Decision-Making: Expanding the algorithm to include additional objectives, such as cost minimization, quality optimization, or risk management, could provide a more comprehensive optimization solution.
In conclusion, the proposed MOEA/D-RL approach has demonstrated significant potential for energy-efficient scheduling in flexible job shops. While the method shows promise, continuous research and development are necessary to enhance its capabilities and expand its applicability in real-time and dynamic scheduling scenarios.

Author Contributions

Conceptualization, J.S. and W.L.; Methodology, J.S. and W.L.; Software, W.L.; Data curation, W.L. and J.Y.; Writing—original draft preparation, W.L.; validation, W.L. and J.Y.; Writing—review and editing, W.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research and the APC were funded by the National Natural Science Foundation of China (71371172), the Research and Practice on Higher Education Teaching Reform in Henan Province (2021SJGLX016) and the Philosophy and Social Science Planning Project of Henan Province (2022BJJ066).

Data Availability Statement

Data will be made available on request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Wei, G.; Ye, C. Energy-efficient scheduling of multi-objective dual-resource flexible job shop considering transfer time. Comput. Integr. Manuf. Syst. 2024, 1–29. [Google Scholar] [CrossRef]
  2. Chen, Y.; Liu, Y.; Zhou, Y. Hybrid adaptive differential evolutionary algorithm for solving multi-objective flexible job shop scheduling problems. Manuf. Technol. Mach. Tools 2023, 171–177. [Google Scholar] [CrossRef]
  3. Jin, Z.; Ji, W.; Su, X.; Tang, L. Flexible shop-floor low-carbon scheduling incorporating NSGA-II with dominant intensity. Mod. Manuf. Eng. 2023, 6–14. [Google Scholar] [CrossRef]
  4. Pei, X.; Dai, Y. Improved backtracking search algorithm for solving multi-objective flexible job shop scheduling problem. Oper. Res. Manag. Sci. 2023, 32, 9–15. [Google Scholar]
  5. Du, Y.; Li, J.; Chen, X.; Duan, P.; Pan, Q. Knowledge-based reinforcement learning and estimation of distribution algorithm for flexible job shop scheduling problem. IEEE Trans. Emerg. Top. Comput. Intell. 2022, 7, 1036–1050. [Google Scholar] [CrossRef]
  6. Zeng, L.; Ding, L.; Guan, Z. Flexible job shop scheduling based on deep self-learning taboo search. Comput. Integr. Manuf. Syst. 2024, 1–21. [Google Scholar] [CrossRef]
  7. Lin, C.; Cao, Z.; Zhou, M. Learning-based grey wolf optimizer for stochastic flexible job shop scheduling. IEEE Trans. Autom. Sci. Eng. 2022, 19, 3659–3671. [Google Scholar] [CrossRef]
  8. Lu, X.; Han, X. Reinforcement learning based improved NSGA-II for solving the energy-saving scheduling problem of flexible job shop. Mod. Manuf. Eng. 2023, 22–35. [Google Scholar]
  9. Pan, Z.; Lei, D.; Wang, L. A bi-population evolutionary algorithm with feedback for energy-efficient fuzzy flexible job shop scheduling. IEEE Trans. Syst. Man Cybern. Syst. 2021, 52, 5295–5307. [Google Scholar] [CrossRef]
  10. Zhang, Q.; Li, H. MOEA/D: A multiobjective evolutionary algorithm based on decomposition. IEEE Trans. Evol. Comput. 2007, 11, 712–731. [Google Scholar] [CrossRef]
  11. Chen, Z.; Chiappalupi, D.; Lin, T.; Yang, Y.; Beyer, J.; Pfister, H. RL-LABEL: A Deep Reinforcement Learning Approach Intended for AR Label Placement in Dynamic Scenarios. IEEE Trans. Vis. Comput. Graph. 2023, 30, 1347–1357. [Google Scholar]
  12. Qin, T.; Du, S.; Chang, Y.; Wang, C. ChatGPT working principle, key technology and future development trend. J. Xi’an Jiaotong Univ. 2024, 58, 1–12. [Google Scholar]
  13. Jiang, H.; Chen, T.; Cao, J.; Bi, J.; Lu, G.; Zhang, G.; Rong, X.; Li, Y. Stable skill improvement of quadruped robot based on privileged information and curriculum guidance. Robot. Auton. Syst. 2023, 170, 104550. [Google Scholar] [CrossRef]
  14. Chen, L. Research on channel state estimation algorithm for wireless communication networks based on reinforcement learning. Mod. Electron. Technol. 2023, 46, 159–162. [Google Scholar] [CrossRef]
  15. Lei, D.; Li, M.; Wang, L. A two-phase meta-heuristic for multiobjective flexible job shop scheduling problem with total energy consumption threshold. IEEE Trans. Cybern. 2018, 49, 1097–1109. [Google Scholar] [CrossRef]
  16. Li, R.; Gong, W.; Lu, C.; Wang, L. A learning-based memetic algorithm for energy-efficient flexible job-shop scheduling with type-2 fuzzy processing time. IEEE Trans. Evol. Comput. 2022, 27, 610–620. [Google Scholar] [CrossRef]
  17. Kacem, I.; Hammadi, S.; Borne, P. Approach by localization and multiobjective evolutionary optimization for flexible job-shop scheduling problems. IEEE Trans. Syst. Man Cybern. Part C 2002, 32, 1–13. [Google Scholar] [CrossRef]
  18. Brandimarte, P. Routing and scheduling in a flexible job shop by tabu search. Ann. Oper. Res. 1993, 41, 157–183. [Google Scholar] [CrossRef]
  19. Li, R.; Gong, W. Improved decomposition-based multi-objective evolutionary algorithm for solving bi-objective fuzzy flexible job shop scheduling problem. Control Theory Appl. 2022, 39, 31–40. [Google Scholar]
  20. Deb, K. A fast and elitist multi-objective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. 2002, 6, 182–197. [Google Scholar]
  21. While, L.; Hingston, P.; Barone, L.; Huband, S. A faster algorithm for calculating hypervolume. IEEE Trans. Evol. Comput. 2006, 10, 29–38. [Google Scholar] [CrossRef]
  22. Yin, L.; Li, X.; Gao, L.; Lu, C.; Zhang, Z. A novel mathematical model and multi-objective method for the low-carbon flexible job shop scheduling problem. Sustain. Comput. Inform. Syst. 2017, 13, 15–30. [Google Scholar] [CrossRef]
Figure 1. RL Interaction Process.
Figure 2. Coding method.
Figure 3. Example of IPOX crossover.
Figure 4. MPX crossover example.
Figure 5. Comparison results for Pareto front.
Figure 6. The Gantt chart for solution.
Figure 7. Schematic diagram of an engine cooling system.
Figure 8. Gantt chart for production instances.
Table 1. Parameterization.

Parameter   Rank 1   Rank 2   Rank 3   Rank 4
pop-num     70       80       90       100
γ           0.8      0.85     0.9      0.95
α           0.1      0.15     0.2      0.25
ε           0.8      0.85     0.9      0.95
Table 2. Orthogonal table and HV mean value.

Combination   Pop-Num   γ      α      ε      HV Mean Value
1             70        0.8    0.1    0.8    1.2063
2             70        0.85   0.15   0.85   1.2083
3             70        0.9    0.2    0.9    1.2063
4             70        0.95   0.25   0.95   1.2044
5             80        0.8    0.15   0.9    1.2056
6             80        0.85   0.1    0.95   1.2083
7             80        0.9    0.25   0.8    1.2058
8             80        0.95   0.2    0.85   1.2038
9             90        0.8    0.2    0.95   1.2071
10            90        0.85   0.25   0.9    1.2071
11            90        0.9    0.1    0.85   1.2069
12            90        0.95   0.15   0.8    1.2075
13            100       0.8    0.25   0.85   1.2077
14            100       0.85   0.2    0.8    1.2078
15            100       0.9    0.15   0.95   1.2067
16            100       0.95   0.1    0.9    1.2061
Table 3. HV, Spread, and IGD results for comparison with other algorithms.

            Q-MOEA/D                        IMOEA/D                          NSGA-II
Instance    HV        IGD       Spread      HV         IGD       Spread      HV         IGD       Spread
Kacem01     0.8851    0.01055   0.9125      0.82209    0.093395  0.9435      0.88631    0.057375  1.0145
Kacem02     1.1744    0         0.9356      0.51563    0.57449   1.02        0.53425    0.51616   0.9527
Kacem03     1.0893    0.00592   0.9611      0.65483    0.41631   0.9478      0.43792    0.50712   0.9763
Kacem04     1.1759    0         0.9136      0.91573    0.24258   0.9235      0.051462   1.0485    0.9488
mk01        0.95688   0.0056    1.0017      0.42235    0.45995   0.9853      0.88897    0.066392  0.9784
mk02        1.032     0         0.9978      0.21576    0.67148   0.9874      0.55505    0.3377    1.0026
mk03        1.1986    0         1.0048      0.11574    1.1134    1.0112      0.23589    0.87935   1.0245
mk04        1.0254    0         0.9564      0.78535    0.20557   0.9744      0.69446    0.23298   0.9845
mk05        1.1317    0         0.9741      0.27731    0.63545   0.9878      0.38293    0.56656   1.0000
mk06        1.1963    0         0.9023      0.11091    0.97606   0.9456      0.22955    0.74107   0.9231
mk07        1.004     0         0.9954      0.52767    0.29211   0.9648      0.58824    0.2422    0.9784
mk08        1.0741    0.12495   0.9969      0.18562    1.002     0.9799      1.210      1.0374    –
mk09        1.1185    0         1.0016      0.045981   1.0155    0.9956      0.42816    0.43823   1.0212
mk10        1.1847    0         0.9786      0.040788   1.0989    0.9836      0.48734    0.50705   1.0001
Table 4. Parameters required for processing.
JobOperationProcessing Time (min)/Energy Consumption (KJ)
Miller 1Miller 2Drilling Machine 1Drilling Machine 2Machining Centre 1Manual Lathe 1CNC1CNC2
Adapter plate bracket1–18/1.208/1.26/1.20
1–212/1.8012/1.809/1.80
1–36.0/0.906.0/0.907.0/1.40
1–49.0/1.359.0/1.354.0/0.80
1–53.0/0.453.0/0.452.7/0.54
1–64.0/1.004.0/1.003.0/0.60
1–73.0/0.753.0/0.752.0/0.40
1–83.0/0.753.0/0.752.0/0.40
1–95.0/1.255.0/1.253.8/0.76
1–104.0/1.004.0/1.002.2/0.44
1–116.0/1.506.0/1.504.3/0.86
1–124.2/0.634.2/0.633.7/0.74
1–132.0/032.0/0.301.8/0.36
Fan bracket2–13.0/0.453.0/0.452.0/0.40
2–27.0/1.057.0/1.051.8/0.36
2–35.0/0.755.0/0.752.5/0.50
2–43.0/0.453.0/0.451.8/0.36
2–53.0/0.453.0/0.451.8/0.36
2–63.0/0.753.0/0.751.8/0.36
2–72.0/0.502.0/0.501.0/0.20
2–82.0/0.502.0/0.501.0/0.20
2–95.0/1.002.0/0.672.0/0.67
2–103.2/0.641.0/0.331.0/0.33
2–112.0/0.401.0/0.331.0/0.33
2–123.0/0.753.0/0.752.2/0.44
2–133.0/0.753.0/0.752.0/0.40
2–145.0/0.755.0/0.751.8/0.36
2–153.0/0.453.0/0.451.5/0.30
Fan hub3–11.0/0.30
3–20.5/0.15
3–32.5/0.75
3–40.5/0.15
3–52.0/0.672.0/0.67
3–61.0/0.331.0/0.33
3–71.0/0.331.0/0.33
3–81.5/0.501.5/0.50
3–91.0/0.331.0/0.33
3–103.0/1.003.0/1.00
3–112.0/0.672.0/0.67
3–122.0/0.672.0/0.67
3–132.0/0.672.0/0.67
3–141.2/0.401.2/0.40
3–151.0/0.251.0/0.2
Adapter flange4–11.5/0.45
4–23.5/1.05
4–33.0/1.003.0/1.00
4–42.0/0.672.0/0.67
4–50.8/0.270.8/0.27
4–60.5/0.170.5/0.17
4–70.5/0.170.5/0.17
4–80.5/0.170.5/0.17
4–90.5/0.170.5/0.17
4–100.5/0.170.5/0.17
4–112.2/0.732.2/0.73
4–121.5/0.501.5/0.50
4–132.0/0.502.0/0.501.5/0.30
4–141.0/0.251.0/0.251.0/0.20
Machine no-load energy consumption (KJ/min)0.10.10.150.150.130.210.230.23
Table 5. Production example solution results.

Algorithm    Maximum Completion Time (min)    Total Energy Consumption (KJ)
Q-MOEA/D     51.5                             3.8621
IMOEA/D      52.2                             4.144
NSGA-II      53                               4.1572
