Sustainability · Article · Open Access · 24 November 2021

A Q-Learning Rescheduling Approach to the Flexible Job Shop Problem Combining Energy and Productivity Objectives

LS2N UMR CNRS 6004, IUT de Nantes, Nantes University, 2 Avenue du Pr. J. Rouxel, 44470 Carquefou, France
* Author to whom correspondence should be addressed.
This article belongs to the Special Issue Sustainable Manufacturing and Supply Chain in the Context of Industry 4.0: Challenges and Opportunities

Abstract

The flexible job shop problem (FJSP) has been widely studied in recent decades due to its dynamic and uncertain nature. Responding to a system's perturbations intelligently, with minimal variation in energy consumption, is an important matter. Thanks to the development of artificial intelligence and machine learning, many researchers are using these techniques to solve the rescheduling problem in a flexible job shop. Reinforcement learning, a popular approach in artificial intelligence, is often used for rescheduling. This article presents a Q-learning rescheduling approach to the flexible job shop problem that combines energy and productivity objectives in a context of machine failure. First, a genetic algorithm is adopted to generate the initial predictive schedule; then, rescheduling strategies are developed to handle machine failures. As the system should be capable of reacting quickly to unexpected events, a multi-objective Q-learning algorithm is proposed and trained to select the rescheduling method that minimizes both the makespan and the variation in energy consumption. The approach is evaluated on benchmark instances.

1. Introduction

Energy consumption control is a growing concern in all industrial sectors. Controlling energy consumption and realizing energy savings are goals of many manufacturing enterprises. Therefore, the scheduling of a manufacturing production system must now take into account sustainability and energy management [1]. To implement such measures, researchers have focused on developing more energy-efficient scheduling approaches that balance energy consumption and system stability. In addition, manufacturing systems are dynamic environments in which several perturbations can arise. Such disturbances have negative impacts on energy consumption and system robustness, and they make the scheduling process much more difficult. In the literature, many researchers solve the job shop problem (JSP) under different types of perturbations using metaheuristics such as genetic algorithms [2] or particle swarm optimization [3]. Other researchers use rescheduling approaches that repair the initial disrupted schedule, such as dispatching rules.
Recently, many researchers have designed reactive, dynamic, and robust rescheduling approaches using artificial intelligence. These learning-based approaches acquire knowledge of the manufacturing system that is then used in the decision-making process, so the rescheduling can adapt to the system's disruptions at any time. Research on reducing energy consumption in job shops has focused on optimizing energy consumption in the predictive phase, when building the initial schedule. The main contribution of this article is, first, a new approach in which energy consumption reduction is taken into account in both the predictive and the reactive phases. Second, the approach integrates a multi-objective machine learning algorithm so that it can react more quickly to disruptions by rapidly selecting the best rescheduling method. In the predictive phase, a genetic algorithm builds the initial schedule, taking into consideration both energy consumption and completion time. Then, to obtain a responsive and energy-efficient production system, a multi-objective Q-learning algorithm is developed. This algorithm selects, in real time, the rescheduling strategy that minimizes both the completion time and the energy consumption, depending on energy availability.
The remainder of this article is organized as follows: the next section provides a literature review on energy-aware scheduling and rescheduling methods, as well as rescheduling approaches using artificial intelligence techniques. Section 3 contains the FJSP problem formulation and the description of rescheduling methods. The Q-learning algorithm and selection of the optimal rescheduling approach are described in Section 4. The experiments and the evaluation of the approach on FJSP benchmarks are presented in Section 5. Finally, a conclusion and some future directions are provided.

3. Dynamic Flexible Job Shop Scheduling with Energy Consumption Optimization

The FJSSP has been widely researched in recent decades due to its complexity. On top of that, dynamic events can occur frequently and randomly in job shop systems, which further increases this complexity. Many metaheuristics have been proposed in the literature to solve this problem. In this section, a solution to the FJSSP considering energy consumption optimization is proposed, followed by rescheduling methods that handle the dynamic nature of the system.

3.1. Description of FJSSP

In the FJSSP, there are n jobs that should be processed on m machines. Each job consists of a predetermined sequence of $n_j$ operations which should be processed in a certain order. The objective of the FJSSP is to assign each operation to a suitable machine and to arrange the sequence of operations on each machine [36].
We define the notations used in this article to model the FJSSP:
  • $J = \{J_1, \ldots, J_n\}$ is the set of n independent jobs to be scheduled.
  • $O_{ij}$ is operation i of job j.
  • $M = \{M_1, \ldots, M_m\}$ is the set of m machines. We denote by $P_{ijk}$ the processing time of operation $O_{ij}$ when executed on machine $M_k$.
The FJSSP is a generalization of the job shop scheduling problem in which an operation can be processed on several machines, usually with varying costs. Hereafter is a list of characteristics of the FJSP:
  • Jobs are independent and no priorities are assigned to any job type.
  • Operations of different jobs are independent.
  • Each machine can process only one operation at a time.
  • Each operation is processed without interruption on one machine from its set of eligible machines.
  • There are no precedence constraints among operations of different jobs.
Two assumptions are considered in this work:
  • All machines are available at time 0.
  • Transportation times are neglected.
An example of an FJSSP instance with 3 jobs and 4 machines is presented in Table 2, which gives the eligible machines and processing times for each operation.
Table 2. An instance of FJSSP.
A full description of the mathematical mixed integer programming (MIP) formulation of the FJSP considering energy consumption has been proposed in [37].

3.2. Genetic Algorithm (GA)

In this article, we propose to use a classical GA for the initial solving of FJSSP [38]. It is an optimization method based on an evolutionary process. The performance validation of the proposed algorithm is detailed in Section 5.1.
The aim of the FJSSP is to find a feasible schedule that minimizes the makespan and the energy consumption at the same time. Therefore, makespan and energy consumption are integrated into one objective function F using a weighted sum approach. The relative importance of each objective can be modified in F, which represents the fitness of the GA. Since the values of energy consumption and makespan are not on the same scale, both measures have to be normalized [39]. As presented in Equation (1), the makespan is divided by MaxMakespan, the maximum makespan value for the given problem, and the energy consumption is divided by MaxEnergy, the sum of the energy needed to execute all tasks of the problem. λ is the weight that reflects the importance of each objective, λ ∈ [0, 1]. In this work, this weight is set statically; a dynamic evolution of λ is out of the scope of this article, and future work may consider an agent that monitors energy availability and triggers a rescheduling order when a threshold is reached.
$F = \lambda \times \frac{makespan}{MaxMakespan} + (1 - \lambda) \times \frac{energy}{MaxEnergy}$    (1)
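For concreteness, a minimal Python sketch of the normalized weighted fitness of Equation (1) is given below; the function name, the way MaxMakespan and MaxEnergy are passed in, and the example values are assumptions made here for illustration, not the authors' implementation.

```python
def weighted_fitness(makespan, energy, lam, max_makespan, max_energy):
    """Scalarized fitness of Equation (1): lower is better.

    lam = 1 optimizes makespan only, lam = 0 optimizes energy only.
    """
    return lam * makespan / max_makespan + (1 - lam) * energy / max_energy


# Arbitrary illustrative values, not taken from the paper's instances
print(weighted_fitness(makespan=50, energy=2800, lam=0.5,
                       max_makespan=100, max_energy=5000))
```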
A flow chart illustrating the process of the genetic algorithm is represented in Figure 1. The overall structure of GA can be described in the following steps:
Figure 1. Genetic algorithm process.
  • Encoding: Each chromosome represents a solution for the problem. The genes of the chromosomes describe the assignment of operations to the machines, and the order in which they appear in the chromosome describes the sequence of operations.
  • Tuning: The GA includes some tuning parameters that greatly influence the algorithm performance such as the size of population, the number of generations, etc. Despite recent research efforts, the selection of the algorithm parameters remains empirical to a large extent. Several typical choices of the algorithm parameters are reported in [40,41].
  • Initial population: A set of initial solutions is generated randomly.
  • Fitness evaluation: A fitness function is computed for each individual; this value indicates the quality of the solution represented by the individual.
  • Selection: At each iteration, the best chromosomes are chosen to produce their progeny.
  • Offspring generation: The new generation is obtained by applying genetic operators such as crossover and mutation.
  • Stop criterion: When a fixed number of generations is reached, the algorithm ends and the best chromosome, with its corresponding schedule, is given as output. Otherwise, the algorithm iterates over the fitness evaluation, selection, and offspring generation steps.

3.3. Disturbances in FJSSP

The FJSSP is subject to a large variety of disturbances. These perturbations are random and uncertain and bring instability to the initial schedule. In this work, one of the most common and frequent disruptions in production scheduling is considered: machine failures. These events are handled using rescheduling methods, discussed in the next section, that try to maintain the stability of the system.
To simulate a machine failure [3], we have to select the following (a sampling sketch is given after the assumptions below):
  • The moment when the failure occurs (rescheduling time). Failures occur randomly, following a uniform distribution between 0 and the makespan of the original schedule generated with the GA.
  • The machine failing.
  • The breakdown duration, which follows a uniform distribution between 25% and 50% of the makespan.
To simplify the problem, some assumptions about machine failures are considered:
  • There is only one broken-down machine at a time.
  • The time taken to transfer a job from the broken-down machine to a properly functioning machine is neglected.
  • Machine maintenance is immediate after the failure.
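A minimal sketch of how such a failure could be sampled is shown below; the function name and its inputs are illustrative assumptions, while the distributions follow the description above.

```python
import random

def sample_machine_failure(makespan, machine_ids, rng=random):
    """Sample one machine failure (illustrative; follows the distributions above)."""
    failure_time = rng.uniform(0, makespan)                     # uniform over the schedule horizon
    machine = rng.choice(machine_ids)                           # the machine that fails
    duration = rng.uniform(0.25 * makespan, 0.50 * makespan)    # 25%-50% of the makespan
    return failure_time, machine, duration


print(sample_machine_failure(makespan=42, machine_ids=[0, 1, 2, 3, 4, 5]))
```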

3.4. Rescheduling Strategies

One question arises when dealing with system disturbances or changed production circumstances: what kind of rescheduling methodology should be used to produce a new schedule for the disturbance scenario? In the literature, many rescheduling methodologies have been reported. Researchers classify these methods into two categories: (i) repairing a schedule that has been disrupted and (ii) creating a schedule that is more robust with respect to disruptions [42,43].
Three methods are commonly used to repair a schedule that is no longer feasible due to disruptions: right shifting rescheduling, partial rescheduling, and total rescheduling. They are defined as follows [24]:
  • Right shifting rescheduling (RSR): postpone each remaining operation by the amount of time needed to make the schedule feasible.
  • Partial rescheduling (PR): reschedule only the operations affected directly or indirectly by the disturbances and preserve the original schedule as much as possible.
  • Total rescheduling (TR): reschedule the entire set of operations that are not processed before the rescheduling point.
The choice of the most appropriate methodology depends on the nature of the perturbation and is generally made by experts. The rescheduling methods have different advantages and drawbacks: RSR and PR can respond quickly to machine breakdowns, whereas TR can offer high-performance rescheduling but at excessive computational cost. In this work, the targeted rescheduling strategy is the one that minimizes both the makespan and the energy consumption.
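As an illustration of the simplest of these policies, the following sketch applies right shifting rescheduling to a flat schedule representation; the dictionary-based schedule format and the function name are assumptions made here for illustration, not the authors' data structures.

```python
def right_shift_reschedule(schedule, failure_time, breakdown_duration):
    """Right shifting rescheduling (RSR): postpone every operation that has not
    started before the failure by the breakdown duration (simplified sketch).

    `schedule` is assumed to be a list of dicts with 'start' and 'end' times.
    """
    repaired = []
    for op in schedule:
        if op["start"] >= failure_time:
            op = {**op, "start": op["start"] + breakdown_duration,
                        "end": op["end"] + breakdown_duration}
        repaired.append(op)
    return repaired


demo = [{"op": "O11", "machine": 0, "start": 0, "end": 3},
        {"op": "O21", "machine": 0, "start": 25, "end": 30}]
print(right_shift_reschedule(demo, failure_time=20, breakdown_duration=6))
```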

4. Proposed Multi Objective Q-Learning Rescheduling Approach

The proposed Q-learning-based rescheduling is described in Figure 2. The system is composed of two modes:
Figure 2. Proposed reschedule decision-making approach under machine failure.
  • An offline mode: first, the predictive schedule is obtained using a genetic algorithm; this schedule represents the environment of the Q-learning agent. By interacting with this schedule and simulating machine-failure experiments, the agent learns how to select the optimal rescheduling solution for different states of the system.
  • An online mode: when a machine failure occurs, the state of the system at the time of the interruption is delivered to the Q-learning agent. It responds by selecting the optimal rescheduling decision for this particular type of failure.
A key aspect of RL is that the agent has to learn a proper behavior, meaning that it modifies or acquires new behaviors and skills incrementally [44]. The Q-learning algorithm was also extended to consider several criteria (multi-objective Q-learning). The next sections detail this algorithm.

4.1. Q-Learning Terminologies

In order to be more accurate in the description of the algorithm, some terminologies of Q-learning are recalled below [45]:
  • Agent: The agent interacts with its environment, selects its own actions, and learns from the consequences of those actions;
  • States: The set of environmental states S is defined as the finite set $\{s_1, \ldots, s_N\}$, where N is the size of the state space;
  • Actions: The set of actions A is defined as the finite set $\{a_1, \ldots, a_K\}$, where K is the size of the action space. Actions can be used to control the system's state;
  • Reward function: The reward function specifies rewards for being in a state or doing some action in a state.
To sum up, the agent makes decisions from experience: it takes an action in a particular state and evaluates its consequences based on a reward. This process is repeated until the agent becomes able to choose the best decision.
Q-learning is a value-based learning algorithm; it updates the value function based on the Bellman equation. The 'Q' stands for the quality of an action. The agent maintains a table of values Q(s, a), updated over time according to Equation (2):
$Q(s_t, a_t) = (1 - \alpha)\, Q(s_t, a_t) + \alpha \left( r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) \right)$    (2)
where $r_{t+1}$ is the reward received when the agent transitions from state $s_t$ to state $s_{t+1}$, α is the learning rate (0 < α ≤ 1), which represents the extent to which the Q-values are updated at each iteration, and γ is the discount factor (0 ≤ γ ≤ 1), which determines the importance given to future rewards.
The algorithm of Q-learning is detailed in Algorithm 1.
Algorithm 1 Q-Learning
Initialize Q(s, a) randomly for all s ∈ S, a ∈ A
Repeat for each episode:
  Initialize s
  Repeat for each step of the episode:
    Choose an action a in state s using a policy derived from Q (ε-greedy)
    Take action a and observe the reward r and the next state s′
    Update
      Q(s, a) ← (1 − α) Q(s, a) + α (r + γ max_a′ Q(s′, a′))
    s ← s′
  until s is terminal
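For concreteness, a minimal tabular implementation of Algorithm 1 is sketched below on a toy environment; the environment, the state/action sizes, and the hyperparameter values are placeholders and not those of the rescheduling problem.

```python
import random

N_STATES, N_ACTIONS = 5, 3
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.8   # illustrative values only

def toy_step(state, action):
    """Toy environment: action 1 moves toward the terminal state and pays +1."""
    reward = 1.0 if action == 1 else -1.0
    next_state = min(state + 1, N_STATES - 1) if action == 1 else state
    done = next_state == N_STATES - 1
    return next_state, reward, done

Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

for episode in range(200):
    s, done = 0, False
    while not done:
        # epsilon-greedy: exploit with probability EPSILON, otherwise explore
        if random.random() < EPSILON:
            a = max(range(N_ACTIONS), key=lambda x: Q[s][x])
        else:
            a = random.randrange(N_ACTIONS)
        s_next, r, done = toy_step(s, a)
        Q[s][a] = (1 - ALPHA) * Q[s][a] + ALPHA * (r + GAMMA * max(Q[s_next]))
        s = s_next

print(Q)
```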

4.2. Multi-Objective Q-Learning

In this case, the agent has to optimize two objective functions at the same time. The reward therefore changes from a scalar value to a vector whose dimension equals the number of objective functions:
$R(s, a) = [R_1(s, a), R_2(s, a), \ldots, R_m(s, a)]$
where m is the number of objective functions.
The same holds for the action-state value Q(s, a), which also becomes an m-dimensional vector, defined as follows:
$Q(s, a) = [Q_1(s, a), Q_2(s, a), \ldots, Q_m(s, a)]$
where every value corresponds to a reward value from the reward vector.
In this article, a multi-objective Q-learning with a single-policy approach is used, which means that the multi-objective function is reduced to a single function that fairly represents the importance of all objectives. For the single-policy approach, many methods have been proposed. The most well-known is the weighted sum approach, where a scalarizing function is applied to Q(s, a) to obtain a scalar value $\overline{Q}(s, a)$ that considers all the objective functions. The linear scalarizing function is used here and is defined as follows:
$\overline{Q}(s, a) = \sum_{i=1}^{m} w_i \, Q_i(s, a)$
where $0 \leq w_i \leq 1$ is the weight that specifies the importance of each objective function and must satisfy $\sum_{i=1}^{m} w_i = 1$.
The algorithm of the multi-objective Q-learning is detailed in Algorithm 2.
Algorithm 2 Multi-Objective Q-Learning
Initialize the Q-vector Q(s, a) = [Q_1(s, a), Q_2(s, a)] randomly for all s ∈ S, a ∈ A
Repeat for each episode:
  Initialize s
  Repeat for each step of the episode:
    Choose an action a in state s using a policy derived from the scalarized Q̄ (ε-greedy)
    Take action a and observe the rewards R_1 and R_2 and the next state s′
    Update
      Q_1(s, a) ← (1 − α) Q_1(s, a) + α (R_1 + γ max_a′ Q_1(s′, a′))
      Q_2(s, a) ← (1 − α) Q_2(s, a) + α (R_2 + γ max_a′ Q_2(s′, a′))
    s ← s′
  until s is terminal
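Continuing the toy example above, a hedged sketch of the multi-objective variant keeps one Q-value per objective and scalarizes them with the weights w_i only when selecting actions; the two toy reward signals and all hyperparameters are invented for illustration.

```python
import random

N_STATES, N_ACTIONS = 5, 3
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.8
W = (0.5, 0.5)                         # weights of the two objectives, summing to 1

def toy_step(state, action):
    """Toy environment returning two rewards (delay-like and energy-like, made up)."""
    r1 = 1.0 if action == 1 else -1.0
    r2 = 1.0 if action == 2 else -0.5
    next_state = min(state + 1, N_STATES - 1) if action != 0 else state
    return next_state, (r1, r2), next_state == N_STATES - 1

Q = [[[0.0, 0.0] for _ in range(N_ACTIONS)] for _ in range(N_STATES)]

def scalarized(q_vec):
    """Linear scalarization Q-bar(s, a) = sum_i w_i * Q_i(s, a)."""
    return sum(w * q for w, q in zip(W, q_vec))

for episode in range(200):
    s, done = 0, False
    while not done:
        if random.random() < EPSILON:
            a = max(range(N_ACTIONS), key=lambda x: scalarized(Q[s][x]))
        else:
            a = random.randrange(N_ACTIONS)
        s_next, rewards, done = toy_step(s, a)
        for i, r in enumerate(rewards):        # one Bellman update per objective
            best_next = max(Q[s_next][x][i] for x in range(N_ACTIONS))
            Q[s][a][i] = (1 - ALPHA) * Q[s][a][i] + ALPHA * (r + GAMMA * best_next)
        s = s_next

print(Q[0])
```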

4.3. State Space Definition

The state space is the set of all possible situations the agent can encounter. We have to choose the number of states that will yield the optimal solution and define these states. In this article, two indicators are used to establish the state space:
  • s1: indicates the moment when the perturbation happens, e.g., in the beginning, the middle or in the end of the schedule. For this purpose, the initial makespan was divided into 3 intervals, so s1 can take the values 0, 1 or 2.
  • s2: defined by the indicator SD, which is the ratio of the duration of the operation directly affected by the machine breakdown to the total processing time of the remaining operations on the failed machine. The formula is as follows:
$SD = \frac{O_{aff}}{RT} \times 100$
where $O_{aff}$ is the duration of the operation directly affected by the machine breakdown and RT is the total processing time of the remaining operations on the failed machine. s2 is an integer between 0 and 9, depending on the value of SD.
The couple (s1, s2) represents the state of the system at a particular time, given the rescheduling time, the failed machine, and the breakdown duration. In total there are 30 states, where 0 ≤ s1 ≤ 2 and 0 ≤ s2 ≤ 9 (s1 and s2 are integers).
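A minimal sketch of this state encoding is given below; the exact binning of SD into ten buckets is not detailed in the text, so the `int(sd // 10)` mapping used here is an assumption.

```python
def encode_state(failure_time, makespan, affected_op_duration, remaining_time_on_machine):
    """Return the state (s1, s2) of the system at the moment of a machine failure."""
    # s1: position of the failure in the schedule (beginning / middle / end)
    s1 = min(int(3 * failure_time / makespan), 2)
    # s2: bucketed ratio SD = O_aff / RT * 100 (bucketing assumed, not from the paper)
    sd = 100.0 * affected_op_duration / remaining_time_on_machine
    s2 = min(int(sd // 10), 9)
    return s1, s2


print(encode_state(failure_time=20, makespan=42, affected_op_duration=5,
                   remaining_time_on_machine=18))
```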

4.4. Actions and Reward Space Definition

The agent encounters one of the 30 states, and it takes an action. The action in this case is one of the rescheduling methods:
  • Action 0: Partial rescheduling (PR)
  • Action 1: Total rescheduling (TR)
  • Action 2: Right shifting rescheduling (RSR)
The definition of the reward plays an important role in the algorithm since the Q-learning agent is reward-driven, meaning that it selects the best action by evaluating the reward. In this work, the reward is a vector of two scalars:
$R(s, a) = [R_1(s, a), R_2(s, a)]$
where $R_1(s, a)$ depends on the delay time (the longer the delay, the smaller the reward) and $R_2(s, a)$ depends on the difference in energy consumption between the initial schedule and the schedule after rescheduling (the bigger this difference, the smaller the reward). The rewards are set between −5 and 5, depending on how much delay and how much energy consumption deviation the action causes.
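Since the exact mapping from delay and energy deviation to the [−5, 5] range is not specified in the text, the linear clipping used in the following sketch is purely an assumption for illustration.

```python
def shaped_reward(value, worst_case):
    """Map a non-negative deviation (delay or extra energy) to a reward in [-5, 5].

    Zero deviation gives +5, the assumed worst case gives -5 (linear in between).
    """
    ratio = min(max(value / worst_case, 0.0), 1.0)
    return 5.0 - 10.0 * ratio


# r1 from the delay, r2 from the energy consumption deviation (illustrative worst cases)
r1 = shaped_reward(value=12, worst_case=30)     # delay of 12 time units
r2 = shaped_reward(value=150, worst_case=500)   # 150 kWh more than the initial schedule
print(r1, r2)
```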

5. Experiments and Results

In order to evaluate the performance of the proposed model, benchmark problems are used. To the best of the authors' knowledge, there are currently no benchmarks in the literature considering energy in an FJSSP. Therefore, instances had to be created to test and validate this work. The choice was made to extend classical problems from the literature to support energy consumption. The chosen problems are taken from Brandimarte [46]: 10 problems (mk1 to mk10) in which the number of jobs ranges from 10 to 20, the number of machines from 6 to 15, and the number of operations per job from 5 to 15. An energy consumption value was added randomly for every operation, following a uniform distribution between 1 and 100. Thus, for each instance, the machining energy consumption and the idle power of the machines are specified as inputs.
In this article, the makespan is expressed in time units and the energy consumption in kWh.
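The following sketch shows one way such a benchmark extension could be generated; the data structure holding the instance and the integer-valued draw are assumptions (the paper only states a uniform distribution between 1 and 100).

```python
import random

def add_energy_to_instance(instance, low=1, high=100, seed=0):
    """Attach a random energy consumption to every (operation, machine) alternative.

    `instance` is assumed to map job -> list of operations, each operation being a
    dict of eligible machine -> processing time.
    """
    rng = random.Random(seed)
    extended = {}
    for job, operations in instance.items():
        extended[job] = [{m: (t, rng.randint(low, high)) for m, t in op.items()}
                         for op in operations]
    return extended


toy_instance = {"J1": [{0: 3, 1: 5}, {1: 4}], "J2": [{0: 2, 2: 6}]}
print(add_energy_to_instance(toy_instance))
```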

5.1. Predictive Schedule Based on GA

Initially, the predictive scheduling scheme is computed with the GA. The proposed method was implemented in Python using the Distributed Evolutionary Algorithms in Python (DEAP) framework. The parameters of the GA are set as follows: the size of the initial population is 50 and the number of generations is 500.
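A heavily simplified DEAP sketch with these parameter values is shown below; the chromosome encoding (one machine index per operation), the toy processing-time and energy tables, and the decoder are illustrative assumptions and much simpler than a real FJSP decoder.

```python
import random
from deap import base, creator, tools, algorithms

# Toy data (assumed): 15 operations, 4 machines, random times and energies
random.seed(0)
N_OPS, N_MACHINES, LAM = 15, 4, 0.5
TIME = [[random.randint(1, 10) for _ in range(N_MACHINES)] for _ in range(N_OPS)]
ENERGY = [[random.randint(1, 100) for _ in range(N_MACHINES)] for _ in range(N_OPS)]
MAX_MK, MAX_EN = sum(map(max, TIME)), sum(map(max, ENERGY))

def evaluate(ind):
    """Decode a machine assignment and return the scalarized fitness of Eq. (1).
    Makespan is approximated by the load of the busiest machine (a simplification)."""
    load = [0] * N_MACHINES
    energy = 0
    for op, m in enumerate(ind):
        load[m] += TIME[op][m]
        energy += ENERGY[op][m]
    return (LAM * max(load) / MAX_MK + (1 - LAM) * energy / MAX_EN,)

creator.create("FitnessMin", base.Fitness, weights=(-1.0,))
creator.create("Individual", list, fitness=creator.FitnessMin)
toolbox = base.Toolbox()
toolbox.register("attr_machine", random.randrange, N_MACHINES)
toolbox.register("individual", tools.initRepeat, creator.Individual,
                 toolbox.attr_machine, n=N_OPS)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)
toolbox.register("evaluate", evaluate)
toolbox.register("mate", tools.cxTwoPoint)
toolbox.register("mutate", tools.mutUniformInt, low=0, up=N_MACHINES - 1, indpb=0.1)
toolbox.register("select", tools.selTournament, tournsize=3)

pop = toolbox.population(n=50)                          # population size used in the paper
algorithms.eaSimple(pop, toolbox, cxpb=0.8, mutpb=0.2, ngen=500, verbose=False)
print(tools.selBest(pop, 1)[0].fitness.values)
```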
To validate the GA, a comparison with other methods from the literature was made: the PSO proposed in [47] and the TS proposed in [48]. The makespan obtained by these different algorithms on the Brandimarte instances is presented in Table 3. The weight of the objective function of the genetic algorithm is set to 1, to give importance to the makespan rather than to energy reduction.
Table 3. Results in terms of makespan (in time units) of the Brandimarte instances for different algorithms.
As can be seen from Table 3, the proposed GA gives results similar to the PSO and TS algorithms when the weight is set to 1. Therefore, we consider this proposition satisfactory.
In the next step, more importance is given to energy reduction; therefore, the weight of the objective function is modified. The Gantt chart of the predictive schedule obtained by the GA on Mk01 for different weight values is shown in Figure 3.
Figure 3. The predictive schedule for different weights of the objective function. (a–d) represent the predictive schedule when the weight of the GA objective function is set to 1, 0.5, 0.2, and 0, respectively.
The makespan and energy consumption values for the different cases are reported in Table 4, which shows that the two objectives are antagonistic. When the weight is set to 1, importance is given to the makespan; in this case the GA provides the best makespan (42) but the highest energy consumption (2812). Conversely, when the weight is set to 0, importance is given to energy reduction; the GA then provides the worst makespan (73) but the best energy consumption (2229). It may be noted that as the weight decreases, the makespan increases but the energy consumption decreases.
Table 4. Makespan (MK in time units) and energy consumption (EC in kWh) calculation example on MK01 instance.

5.2. Rescheduling Strategies

To illustrate the difference between the rescheduling methods presented in Section 3.4, the predictive schedule of instance MK01 with the weight set to 1 is taken as an example. A random perturbation (machine failure) is applied, assuming that machine 1 breaks down at time t = 20 for a duration t′ = 6. The new schedules obtained by the three rescheduling methods (PR, TR and RSR) are presented in Figure 4, where the red lines represent the start and end times of the machine failure.
Figure 4. Demonstration of the initial scheme, PR scheme, TR scheme and RSR scheme. (a) illustrates the predictive schedule; (b–d) illustrate the reactive schedules provided by the three rescheduling methods PR, TR and RSR, respectively.
The operations directly affected by the machine failure are $O_{5,6}$, $O_{6,2}$, $O_{6,6}$, $O_{6,10}$, and $O_{6,3}$; these operations are executed by the broken-down machine. In PR, $O_{5,6}$, $O_{6,2}$, and $O_{6,10}$ are postponed until after the breakdown, while $O_{6,6}$ and $O_{6,3}$ are executed on machines 4 and 5, respectively, with different processing times (Figure 4b). In TR, all the remaining jobs are rescheduled with the GA after the breakdown (Figure 4c). As for RSR, all the remaining jobs are postponed by the breakdown duration (Figure 4d). The performance of the rescheduling methods is described in Table 5.
Table 5. The makespan (time units) and energy consumption (kWh) calculation for rescheduling methods on MK01 instance.
As can be seen from Table 5, the three rescheduling methods give different results. Both the makespan and the energy consumption increase due to the machine failure, which affects a set of operations. In terms of makespan, TR gives the best result (42), but in terms of energy consumption, RSR gives the best result (2887). This result can be explained by the date of the failure, which happened close to the end of the initial schedule.

5.3. Rescheduling Based on Q-Learning

To test the performance of the proposed Q-learning algorithm, we designed simulation experiments of machine failures. The parameters are set as follows:
  • α = 1: a learning rate of 1 means that the old value is completely discarded; the model converges quickly and no large number of episodes is required;
  • γ = 0: the agent considers only immediate rewards. In each episode, one state is evaluated (the initial state of the system at a particular time, given the rescheduling time, the failed machine and the breakdown duration);
  • ε = 0.8: the balance factor between exploration and exploitation. Exploration refers to searching over the whole sample space, while exploitation refers to exploiting the promising areas already found. In the proposed model, 80% is given to exploitation, so in 80% of cases the agent chooses the action with the biggest Q-value and in 20% of cases it randomly chooses an action to explore its environment further;
  • The number of episodes is 1000, so that the model converges (see the sketch after this list).
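With α = 1 and γ = 0, the update of Equation (2) collapses to simply overwriting Q(s, a) with the latest reward; the short sketch below makes that simplification explicit (the state and action counts match Sections 4.3 and 4.4, everything else is illustrative).

```python
import random

N_STATES, N_ACTIONS = 30, 3          # 30 states (s1, s2) flattened, 3 rescheduling actions
ALPHA, GAMMA, EPSILON = 1.0, 0.0, 0.8
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

def update(state, action, reward):
    # General rule: Q = (1 - a)Q + a(r + g * max Q(s', .)); with a = 1 and g = 0
    # it reduces to overwriting the entry with the latest reward.
    Q[state][action] = (1 - ALPHA) * Q[state][action] + ALPHA * reward

def select_action(state):
    if random.random() < EPSILON:                           # 80% exploitation
        return max(range(N_ACTIONS), key=lambda a: Q[state][a])
    return random.randrange(N_ACTIONS)                       # 20% exploration

update(state=7, action=1, reward=4.2)                        # illustrative values
print(select_action(7))
```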
In each episode the Q-table is updated depending on the value of the rewards (Figure 5).
Figure 5. Q-table initialization and update.

5.3.1. The Single Objective Q-Learning

Two types of Q-learning algorithm are proposed in this article: the single objective Q-learning and multi-objective Q-learning.
The aim of the single-objective Q-learning is to minimize the makespan, which means minimizing the delay time. The curves of the reward and the delay time over the first 50 episodes are shown in Figure 6. It can be seen that the longer the delay time, the lower the reward value.
Figure 6. The evolution of reward value and delay time along episodes.
To show how the Q-values are updated over the episodes, the state (0.7) is taken as an example. Figure 7 describes the variation of the Q-value of each action. The agent first selects action 0 and gets a positive reward, so its Q-value increases. After a few episodes, action 0 is chosen again because it has the biggest Q-value, but it receives a negative reward; its Q-value thus decreases, giving action 1 the chance to be selected. After that, action 1 is chosen in every episode because it receives a positive reward each time, so its Q-value keeps increasing. Action 2 is selected in the 100th and 800th episodes due to the ε-greedy policy, in which the agent still has a 20% probability of exploring, but its Q-value decreases because it receives negative rewards.
Figure 7. Q-value prediction of state (0.6).

5.3.2. The Multi-Objective Q-Learning

The goal of the multi-objective Q-learning approach is to minimize the makespan and the energy consumption at the same time. In this case, two rewards are considered: $R_1$, which depends on the delay time, and $R_2$, which depends on the energy consumption deviation. Figure 8 describes the variation of the rewards over the first 50 episodes. It can be seen that $R_1$ increases when the delay time decreases and $R_2$ increases when the energy consumption deviation decreases.
Figure 8. The change of rewards, delay time and energy consumption variation along episodes.
This time, the state (1.9) is taken as an example, and the weight of the objective function of the multi-objective Q-learning algorithm is set to 0.5 (meaning that makespan and energy consumption have the same importance). Throughout the episodes, action 1 receives positive rewards and its Q-value increases, so it is selected most of the time; on the other hand, actions 0 and 2 receive negative rewards, so their Q-values decrease and they are chosen only in the exploration phase. The Q-value prediction of the state (1.9) is presented in Figure 9.
Figure 9. Q-value prediction of state (1.9).

5.4. Models Validation

The results of the optimal rescheduling methods for the Brandimarte [46] instances and the solutions given by the Q-learning agent are reported in Appendix A. In Table 6, an extract of Appendix A corresponding to the instance MK01 is taken as an example. The first column is the name of the instance, followed by its size and its level of flexibility. The fourth column gives the weight of the objective function of the GA and of the multi-objective Q-learning. The fifth column reports the makespan and energy consumption of the predictive schedule. The sixth column defines different types of machine failures by their failure time, the identifier of the failing machine, and the failure duration. Next come the state definition, then the rescheduling methods and their performance. The last column presents the evaluated Q-learning approach by giving the makespan (MK) and the energy consumption (EC) of the rescheduling solution selected by the single-objective and the multi-objective Q-learning.
Table 6. Performance measurement of the predictive and reactive schedule in MK01 instance.
In the predictive schedule, when the weight decreases, the makespan increases but the energy consumption decreases. This is expected, because more importance is given to energy consumption each time the weight is decreased. After randomly simulating different types of failures, it can be seen that the Q-learning is able to choose the best rescheduling method each time: the single-objective Q-learning selects the method that minimizes the makespan, while the multi-objective Q-learning selects the method that minimizes both the makespan and the energy consumption, depending on the value of the weight of the objective function.
When this weight is set to 1, the single-objective and multi-objective Q-learning give the same results: they both choose the method that minimizes the makespan regardless of the energy consumption. From Table 6, in the case of MK01, TR proved to have the highest performance and was selected by both algorithms. When the same importance is given to energy consumption, i.e., the weight is set to 0.5, the selected method changes to make a compromise between the two objectives, and there is a difference between the results of the single-objective and multi-objective Q-learning. Taking the state (0.9) as an example, PR and TR give makespans of 56 and 57 and energy consumptions of 2890 and 2724, respectively, so PR is selected by the single-objective Q-learning because it yields the minimum makespan, but TR is selected by the multi-objective Q-learning because it gives a better result than PR in terms of energy consumption.
Table 7. CPU time comparison.
By further decreasing the weight to 0.2, more prominence is given to energy consumption. Taking the example of the state (0.4), PR and TR give makespans of 75 and 79 and energy consumptions of 2797 and 2757, respectively. Here, PR is selected by the single-objective Q-learning because it minimizes the makespan, but TR is selected by the multi-objective Q-learning because it better optimizes the energy consumption, which is given more importance. Once the weight is set to 0, the multi-objective Q-learning selects the method that optimizes the energy consumption regardless of the makespan, as in state (0.9), where PR gave the best makespan (91) and was therefore selected by the single-objective Q-learning, but TR was selected by the multi-objective Q-learning because it gave the best energy consumption (2612).
Considering all the instances of the Brandimarte benchmark in Appendix A, we can also deduce that right shifting rescheduling turned out to have the worst performance; this is due to the postponement of the remaining tasks, which increases both the makespan and the energy variation. Another observation is that TR generally has the best performance for early failures, whereas PR gives better results when the failures occur in the middle or at the end of the schedule, especially for instances with high flexibility. The results of RSR also improve at the end of the schedule because the number of postponed operations is smaller.
The Q-learning algorithm not only selects the optimal rescheduling method but also responds immediately to the perturbation. Table 7 compares the CPU time needed to execute the three rescheduling methods (PR, TR, RSR) and select the best one with the time spent by the Q-learning algorithm to select the best method from the Q-table. The reported values were obtained on a laptop computer with an Intel Core i5-8250U processor at 1.8 GHz and 12 GB of memory. The offline training of the Q-learning algorithm can take minutes or even hours depending on the instance size, but in online execution the learning-based selection of the optimal solution takes only one millisecond, compared with traditional rescheduling, which can exceed one minute. This millisecond corresponds to the computation of the system state after the perturbation and the lookup of the method with the highest Q-value in the corresponding row of the Q-table, whereas executing the three rescheduling methods and selecting the best one can take several seconds, or even minutes for large instances.
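The online decision therefore amounts to a single table lookup, as in the hedged sketch below (the state encoding and the Q-table layout follow the earlier sketches, which are themselves assumptions, not the authors' implementation).

```python
ACTIONS = {0: "PR", 1: "TR", 2: "RSR"}

def online_decision(q_table, s1, s2):
    """Pick the rescheduling method with the highest Q-value for state (s1, s2).

    q_table is assumed to be indexed as q_table[s1 * 10 + s2][action].
    """
    row = q_table[s1 * 10 + s2]
    best_action = max(range(len(row)), key=lambda a: row[a])
    return ACTIONS[best_action]


q_table = [[0.0, 1.5, -2.0] for _ in range(30)]   # dummy trained table
print(online_decision(q_table, s1=0, s2=9))        # -> "TR"
```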

6. Conclusions

This work deals with the flexible job shop scheduling problem under uncertainties. A multi-objective Q-learning rescheduling approach is proposed to solve the FJSSP under machine failures. Two key performance indicators are used to select the best schedule: the makespan and the energy consumption. The idea was not only to maintain effectiveness but also to improve energy efficiency. The approach is hybrid and combines predictive and reactive phases. The originality of this work is to combine AI and scheduling techniques in order to rapidly solve a bi-objective rescheduling problem (makespan and energy consumption) in the context of the FJSP.
First, a genetic algorithm was developed to provide an initial predictive schedule that minimizes the makespan and the energy consumption simultaneously. In this predictive phase, different types of machine failures were simulated and classical rescheduling policies (RSR, TR, PR) were executed to repair the predictive schedule and find new solutions. Based on these results, the Q-learning agent is trained. To consider energy consumption even in the rescheduling process, a multi-objective Q-learning algorithm was proposed, with a weighting parameter used to make a tradeoff between the makespan and the energy consumption. In the reactive phase, the Q-learning agent is tested on new machine disruptions and seeks the best action to take given the current state. The main goal of using AI tools is to be able to react quickly to failures by rapidly selecting the best rescheduling policy for the state of the environment. In order to assess the performance of the developed approach, the Brandimarte [46] benchmark was extended to support energy consumption. On this new benchmark, the Q-learning based rescheduling approach was tested on unexpected machine failures and evaluated on its selection of the best rescheduling strategy.
The results of this study show that the approach is effective in responding quickly and accurately to unexpected machine failures. The Q-learning algorithm provided appropriate strategy choices based on the state of the environment, with various balances between the objectives of energy consumption and productivity. The learning phase was therefore efficient enough to enable these choices. The genetic algorithm and the Q-learning algorithm proved their efficiency on the extended classical Brandimarte instances used in this work. Nevertheless, the approach leaves users the possibility to integrate their own choice of algorithm according to their specific context.
Future work will take into consideration other types of disruptions, such as new job insertions, variations in energy availability, and urgent job arrivals. Another perspective is the evaluation of other types of learning techniques in order to compare them with the Q-learning algorithm. From a broader perspective, this work contributes to the development of efficient rescheduling approaches for the control of future industrial systems. Such systems are meant to integrate more and more flexibility, and the performance evaluation on an FJSP shows the compatibility of the approach with this objective. This work also contributes to the integration of multi-objective rescheduling strategies in industry, which is especially relevant for sustainability concerns.

Author Contributions

Conceptualization, R.N. and M.N.; Funding acquisition, O.C.; Investigation, M.N.; Methodology, R.N. and M.N.; Software, R.N.; Supervision, M.N.; Validation, M.N.; Visualization, O.C.; Writing—original draft, R.N.; Writing—review & editing, M.N. and O.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the PULSAR Academy sponsored by the University of Nantes and the Regional Council of Pays de la Loire, France.

Institutional Review Board Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1. Performance evaluation of the Q-learning approach on the Brandimarte benchmark.
Columns: Instance; Size; p (flexibility); Weight of BF; Predictive schedule: MK (time units), EC (kWh); Machine failure: failure time, broken-down machine, failure duration; State of the system; Reactive schedule: PR, TR, RSR, each with MK (time units) and EC (kWh); Q-learning: single-objective, multi-objective.
MK0110 × 6214230463520(0.5)463064453115613160TRTR
16419(1.9)603128553243663180TRTR
8117(0.6)573099503190583142TRTR
23314(1.7)573101563218583142TRTR
13510(0.4)463058453028523106TRTR
13620(0.9)563098543204593148TRTR
0.549283711112(0.5)542872582826612909TRTR
7523(0.9)562890572724762999PRTR
22222(1.9)622950562968652993TRTR
5212(0.3)542935542853552939TRTR
11112(0.6)542872582826612909PRPR
13413(0.2)502839542816542867PRPR
0.252267231215(1.9)642702672711672672PRPR
4220(0.4)752797782757752800TRPR
10414(0.0)522673582670592714PRPR
10121(0.6)642728682632732798TRPR
20222(1.7)722769762773752820PRPR
6526(0.9)652727682704742804PRTR
079255423620(0.9)9126499926121022692PRTR
1526(0.3)7925607925741022686PRPR
31237(1.8)92266811027061162776PRPR
3224(1.6)88263910026661062689PRPR
16234(0.6)98270011027441162776PRPR
30620(1.9)792564792605982668PRPR
MK0210 × 63.5132317315112(1.7)463234453223453263TRTR
4216(0.7)453216473330493263PRPR
1869(1.8)403205373296433239TRTR
1612(0.3)443223463071443245PRPR
1024(0.9)493232523386513287PRPR
249(0.4)383191373282433239TRTR
0.53724795617(0.6)492525482334562593TRTR
17611(1.9)422494452334502557PRTR
25613(2.9)452497462384502557PRTR
1019(0.7)442503472187462533PRTR
1869(1.6)422490402342462490TRTR
5411(0.3)382487422288502557PRTR
0.249199223214(1.7)592035622014652088PRTR
16123(0.9)532018541996642082PRTR
1616(0.4)552017501935602058TRTR
11118(0.7)632014521983672100TRTR
24220(1.9)642062572071722130PRTR
5618(0.6)602040581940662040TRTR
049196421416(1.9)561990521996662066TRPR
35320(2.9)662010682045712030PRPR
2415(0.5)552000641990652060PRTR
10419(0.6)612035551992692084TRTR
10520(0.9)602038601985682087TRTR
22114(1.6)521995541981642054PRPR
MK0315 × 8312068846113470(1.8)255912023991352799430TRTR
45666(0.4)254926224690422729374TRTR
55259(0.6)250906322192632689342TRTR
75253(1.7)250907821988242729374TRTR
1265(0.3)221983923890012469166PRPR
57882(0.8)269927623791603019606TRTR
0.5227751583867(1.8)278778725472013098171TRTR
182488(2.9)310790529678743178235PRPR
66277(0.6)244761824972093028115PRTR
44180(0.4)304801430775163178235PRTR
94466(1.4)266779124273872978075PRPR
97367(1.4)264796924374262767907PRPR
0.2231720094298(1.9)273740826372753358032TRTR
29476(0.5)284759829172223007832PRTR
131111(0.6)355804236881183558192PRPR
983116(1.8)337790727873273498136TRTR
170488(2.9)304754428274973137856TRTR
401116(0.7)334795835077423538176PRTR
02536574152697(1.9)328704033669523487239PRTR
64467(0.4)282679032569003257150PRPR
1051103(1.8)341708133870803697502TRTR
438121(0.7)296701027668163587414TRTR
308104(0.6)278698329969163617438PRTR
86373(1.5)297684628868053347222PRTR
MK0415 × 8216752066431(0.6)10254278452141025486TRTR
1317(0.1)745249775398845334PRPR
49327(2.9)11053989453471095470TRTR
30217(1.3)675206725315845342PRPR
11219(0.3)675206755342875366PRPR
1726(0.4)835324875495935422PRPR
0.573487243326(1.9)964976874891995080TRTR
34425(1.7)714999685054985072TRTR
3123(0.4)955015935023995080TRTR
28618(1.8)985007844976955048TRTR
3620(0.3)844974854723945040PRTR
36228(1.4)734886784930804886PRPR
0.276456240435(1.9)10647389247241124850TRTR
7127(0.4)103477910747231044786PRTR
42721(1.7)9546358854791014579PRTR
21330(0.7)10947509046151094826PRTR
30137(1.8)110474210548101134858TRPR
11625(0.5)8746218546001034778PRTR
090440637432(1.7)107451010245721264658TRPR
23241(0.7)9444599644621314734PRPR
33339(1.9)113452810745591294679TRPR
8736(0.5)135461112145801304726TRTR
3528(0.8)96449210544881214654PRTR
20724(0.4)108449010345181144598TRPR
MK0515 × 41.51179557730281(0.5)260586622761212865925TRTR
116350(1.9)224570222556762305781PRPR
84248(1.5)229574120657772295777TRTR
124448(2.8)229574921656392305781TRTR
28348(0.3)234576621054962345797TRTR
5378(0.4)257585523459112575889TRTR
0.51864977134179(2.9)257524323152482625309TRTR
57367(0.5)256519724751772565257PRPR
77286(1.8)262522723451622735325TRTR
49387(0.6)276527725253842765337TRTR
122465(1.9)246520224052162555253TRTR
13464(0.4)257524722351202575261TRTR
0.2197483489251(1.5)241499021648822525054TRTR
2355(0.3)256503023249562545062TRTR
43271(0.5)261505821249252745142TRTR
159480(2.9)280515627451122805166TRTR
15262(1.8)243498221848882605086TRTR
105457(1.6)247502724349582555066TRTR
02234751171492(2.9)311501529450503115103TRTR
15358(0.3)284498028650492895007PRRR
19177(0.5)257490124749112995055TRPR
93366(1.5)287499827049502955039TRTR
111468(1.7)287500226849222915023TRTR
1402104(1.9)281500228449902845139PRTR
MK0610 × 15318681086730(0.5)116835911486461218458TRTR
57733(1.9)116831710783171198438TRTR
25825(0.3)106823510783171148388PRPR
37826(1.7)10482029585631078318TRTR
18843(0.7)143847111585971308548TRTR
35643(1.6)10682429984211188428TRTR
0.599800457533(1.8)127815611780391358364TRTR
25747(0.7)143835914176691478484TRTR
3641(0.3)131819312177491418424TRTR
54249(1.9)135888512078001408414TRTR
83146(2.9)142821213981641458346TRTR
29450(0.8)130826513377281538534PRTR
0.211474351 851(1.8)143763013872541627915TRTR
6731(0.3)147774814971401507795PRTR
91532(1.9)161784315374381718005TRTR
78834(2.9)131754712873701507795TRTR
34935(0.5)121752813470711457725PRTR
26951(0.7)239765823974591647935PRTR
0141656426964(0.6)148680716368852067214PRPR
66551(1.8)150671615967461867014PRPR
36160(0.7)172693018168752027147PRTR
94739(2.9)167670216267531856916TRPR
30261(0.9)159688116067001967114PRTR
49944(1.7)155682215866431846994PRTR
MK0720 × 531164559943159(0.5)220580320057022265909PRTR
112577(2.9)242589122158412445999TRTR
8573(0.4)228586120858342375964TRTR
65275(1.8)217587219656562405979TRTR
52475(0.7)244594224558752445999PRPR
1558(0.3)214549522256332235894PRPR
0.518946995186(0.5)270492022846952805154TRTR
86484(1.9)274495024849322745124TRTR
77254(1.5)243498220646242585044TRTR
59184(0.7)243489923445692735119TRTR
145189(2.9)272485925449642855179TRTR
94148(1.7)233479920845642484994TRTR
0.2220434581562(1.5)285457724842772904695TRTR
157194(2.9)288449327545533174830TRTR
39392(0.5)307475027342673124805TRTR
87278(1.7)253451825743662994740PRTR
352102(0.8)276465829444983394890PRTR
110480(1.8)299469628845633004745TRTR
0236409744261(0.7)253421627240922974407PRTR
793111(1.9)285438129042903504667PRTR
51399(0.9)267431927141983324577PRTR
55477(0.5)297435531042283264547PRTR
1724104(2.9)316429832544523414517PRPR
99172(1.5)302433126941783084457TRTR
MK0820 × 101.5152313,2552927250(1.9)61313,95660414,40577515,523PRPR
1257192(0.7)57913,68358213,25071514,983PRPR
941153(0.3)68114,73569314,97468114,677PRPR
2423185(1.8)58413,80957713,75570114,938TRTR
869207(0.5)55913,57956713,71272715,091PRPR
2383151(1.7)56813,68455513,45867214,596TRTR
0.552412,499815258(0.8)49513,85240113,45148714,596TRTR
2162189(1.9)29212,90229312,97937213,642PRPR
1069139(0.5)28012,69927312,58737113,552TRTR
107227(0.6)43414,04634013,58149114,632TRTR
41810152(2.9)40413,04839313,22642013,495TRTR
423196(0.4)35913,48133013,01345814,335TRTR
0.254312,3653377159(1.9)61912,84859512,87268213,616TRTR
1325226(0.8)64613,37763213,34877314,435TRTR
2018174(1.6)63113,19858912,97672013,958TRTR
1311184(0.4)71714,00973413,68372814,030PRTR
3201158(1.8)68913,46769913,17370913,859PRTR
153147(0.3)59212,88958112,55069013,688TRTR
056112,3201949260(1.9)59012,81058412,94978514,336TRPR
2910146(0.3)75013,72071413,66172213,769TRTR
1264260(0.9)60713,06261212,78982114,660PRTR
21410140(1.4)69413,46466713,40470313,598TRTR
43010204(2.9)78213,39674413,42078213,876TRPR
863263(0.8)68913,80964013,24482614,687TRTR
MK0920 × 103134213,9001892132(1.8)46414,96541314,42956715,250TRTR
244797(2.9)51814,43348814,40453114,890TRTR
6810107(0.2)37214,12438214,25944114,890PRPR
50994(0.4)37714,25937914,04442414,720PRPR
115197(1.5)41314,53347814,34142314,810PRPR
112991(0.5)46714,21245114,17644214,900TRTR
0.536212,7882154144(1.9)50413,81343813,16650714,238TRTR
115690(0.4)36912,84138212,56644513,518PRTR
141691(1.6)36912,88437312,64246213,788PRTR
2612102(2.9)44313,63744213,38944213,798TRTR
1225175(1.7)45813,58345213,43452914,458TRTR
2910181(0.6)72613,63569312,21381514,618TRTR
0.236712,4372288134(1.9)50113,26048313,23650613,827TRTR
341097(0.2)37812,52939312,56644813,247PRPR
439169(0.7)45513,25848613,00953814,147PRTR
184693(1.5)40512,76041212,31445213,287PRTR
2458177(2.9)53713,46951413,41354914,257TRTR
929142(0.6)44113,01243512,49551013,867TRTR
043412,3221188126(0.4)54813,35852813,45156213,062TRTR
18710192(1.7)52013,03145712,62262814,262TRTR
462185(0.6)51413,15449113,57961214,102TRTR
1861193(1.8)55513,58554113,30962714,252TRTR
131215(0.5)56913,72956314,03465114,492TRTR
2441158(1.9)53213,33052713,19958813,862TRTR
MK1020 × 151.5129213,70718148(1.8)36514,40035614,37642115,126TRTR
57979(0.4)34214,15533013,92036714,631TRTR
889132(0.7)39614,63036714,33653115,236TRTR
2031130(2.9)41514,43636614,33142915,214TRTR
41186(0.3)34514,05032614,24637914,664TRTR
1194139(1.7)36314,40034514,09541915,104TRTR
0.529712,710107146(0.5)42013,94640913,08245314,426TRTR
2122135(2.9)31913,49439313,62943614,239TRTR
122686(1.7)37013,23532212,72239013,733TRTR
1713128(0.4)30712,78731112,34035913,392PRTR
1574138(1.9)39113,66736812,98344414,327TRTR
913125(0.7)37213,32735912,53841413,997TRTR
0.231611,82683150(0.4)35212,22338512,33447413,564PRPR
125883(1.6)35412,25235011,92140612,816TRTR
1237156(1.9)41012,80240112,61048413,674TRTR
506150(0.6)40312,70540012,04946913,509TRTR
1515123(1.8)42712,85238812,24945013,300PRPR
2543156(2.9)45712,51643812,58246313,296TRPR
034411,483541091(0.7)37511,84837011,74743812,517TRTR
728126(0.5)40512,11744011,75847312,902PRTR
1621102(1.6)41011,99937811,73245112,553PRTR
2727136(2.9)45111,83843512,24148512,750TRPR
1128143(0.8)43612,44142212,17649413,133TRTR
1784169(1.9)43812,38142912,13551413,183TRTR

References

  1. Giret, A.; Trentesaux, D.; Prabhu, V. Sustainability in Manufacturing Operations Scheduling: A State of the Art Review. J. Manuf. Syst. 2015, 37, 126–140. [Google Scholar] [CrossRef]
  2. Zhang, L.; Li, X.; Gao, L.; Zhang, G. Dynamic Rescheduling in FMS That Is Simultaneously Considering Energy Consumption and Schedule Efficiency. Int. J. Adv. Manuf. Technol. 2016, 87, 1387–1399. [Google Scholar] [CrossRef]
  3. Nouiri, M.; Bekrar, A.; Trentesaux, D. Towards Energy Efficient Scheduling and Rescheduling for Dynamic Flexible Job Shop Problem. IFAC-Pap. 2018, 51, 1275–1280. [Google Scholar] [CrossRef]
  4. Masmoudi, O.; Delorme, X.; Gianessi, P. Job-Shop Scheduling Problem with Energy Consideration. Int. J. Prod. Econ. 2019, 216, 12–22. [Google Scholar] [CrossRef]
  5. Liu, Y.; Dong, H.; Lohse, N.; Petrovic, S. A Multi-Objective Genetic Algorithm for Optimisation of Energy Consumption and Shop Floor Production Performance. Int. J. Prod. Econ. 2016, 179, 259–272. [Google Scholar] [CrossRef] [Green Version]
  6. Kemmoe, S.; Lamy, D.; Tchernev, N. Job-Shop like Manufacturing System with Variable Power Threshold and Operations with Power Requirements. Int. J. Prod. Res. 2017, 55, 6011–6032. [Google Scholar] [CrossRef]
  7. Raileanu, S.; Anton, F.; Iatan, A.; Borangiu, T.; Anton, S.; Morariu, O. Resource Scheduling Based on Energy Consumption for Sustainable Manufacturing. J. Intell. Manuf. 2017, 28, 1519–1530. [Google Scholar] [CrossRef]
  8. Mokhtari, H.; Hasani, A. An Energy-Efficient Multi-Objective Optimization for Flexible Job-Shop Scheduling Problem. Comput. Chem. Eng. 2017, 104, 339–352. [Google Scholar] [CrossRef]
  9. Gong, X.; De Pessemier, T.; Martens, L.; Joseph, W. Energy-and Labor-Aware Flexible Job Shop Scheduling under Dynamic Electricity Pricing: A Many-Objective Optimization Investigation. J. Clean. Prod. 2019, 209, 1078–1094. [Google Scholar] [CrossRef] [Green Version]
  10. Chen, X.; Li, J.; Han, Y.; Sang, H. Improved Artificial Immune Algorithm for the Flexible Job Shop Problem with Transportation Time. Meas. Control 2020, 53, 2111–2128. [Google Scholar] [CrossRef]
  11. Salido, M.A.; Escamilla, J.; Barber, F.; Giret, A. Rescheduling in Job-Shop Problems for Sustainable Manufacturing Systems. J. Clean. Prod. 2017, 162, S121–S132. [Google Scholar] [CrossRef] [Green Version]
  12. Caldeira, R.H.; Gnanavelbabu, A.; Vaidyanathan, T. An Effective Backtracking Search Algorithm for Multi-Objective Flexible Job Shop Scheduling Considering New Job Arrivals and Energy Consumption. Comput. Ind. Eng. 2020, 149, 106863. [Google Scholar] [CrossRef]
  13. Xu, B.; Mei, Y.; Wang, Y.; Ji, Z.; Zhang, M. Genetic Programming with Delayed Routing for Multiobjective Dynamic Flexible Job Shop Scheduling. Evol. Comput. 2021, 29, 75–105. [Google Scholar] [CrossRef]
  14. Luo, J.; El Baz, D.; Xue, R.; Hu, J. Solving the Dynamic Energy Aware Job Shop Scheduling Problem with the Heterogeneous Parallel Genetic Algorithm. Future Gener. Comput. Syst. 2020, 108, 119–134. [Google Scholar] [CrossRef]
  15. Tian, S.; Wang, T.; Zhang, L.; Wu, X. An Energy-Efficient Scheduling Approach for Flexible Job Shop Problem in an Internet of Manufacturing Things Environment. IEEE Access 2019, 7, 62695–62704. [Google Scholar] [CrossRef]
  16. Nouiri, M.; Trentesaux, D.; Bekrar, A. EasySched: Une Architecture Multi-Agent Pour l’ordonnancement Prédictif et Réactif de Systèmes de Production de Biens En Fonction de l’énergie Renouvelable Disponible Dans Un Contexte Industrie 4.0. arXiv 2019, arXiv:1905.12083. [Google Scholar] [CrossRef] [Green Version]
  17. Bishop, C.M. Pattern Recognition and Machine Learning (Information Science and Statistics); Springer: Berlin, Germany, 2007. [Google Scholar]
  18. Shahzad, A.; Mebarki, N. Learning Dispatching Rules for Scheduling: A Synergistic View Comprising Decision Trees, Tabu Search and Simulation. Computers 2016, 5, 3. [Google Scholar] [CrossRef] [Green Version]
  19. Wang, C.L.; Rong, G.; Weng, W.; Feng, Y.P. Mining Scheduling Knowledge for Job Shop Scheduling Problem. IFAC-Pap. 2015, 48, 800–805. [Google Scholar] [CrossRef]
  20. Zhao, M.; Gao, L.; Li, X. A Random Forest-Based Job Shop Rescheduling Decision Model with Machine Failures. J. Ambient. Intell. Humaniz. Comput. 2019, 1–11. [Google Scholar] [CrossRef]
  21. Li, Y.; Carabelli, S.; Fadda, E.; Manerba, D.; Tadei, R.; Terzo, O. Machine Learning and Optimization for Production Rescheduling in Industry 4.0. Int. J. Adv. Manuf. Technol. 2020, 110, 2445–2463. [Google Scholar] [CrossRef]
  22. Pereira, M.S.; Lima, F. A Machine Learning Approach Applied to Energy Prediction in Job Shop Environments. In Proceedings of the IECON 2018-44th Annual Conference of the IEEE Industrial Electronics Society, Washington, DC, USA, 21–23 October 2018; pp. 2665–2670. [Google Scholar]
  23. Li, Y.; Chen, Y. Neural Network and Genetic Algorithm-Based Hybrid Approach to Dynamic Job Shop Scheduling Problem. In Proceedings of the 2009 IEEE International Conference on Systems, Man and Cybernetics, San Antonio, TX, USA, 11–14 October 2009; pp. 4836–4841. [Google Scholar]
  24. Wang, C.; Jiang, P. Manifold Learning Based Rescheduling Decision Mechanism for Recessive Disturbances in RFID-Driven Job Shops. J. Intell. Manuf. 2018, 29, 1485–1500. [Google Scholar] [CrossRef]
  25. Mihoubi, B.; Bouzouia, B.; Gaham, M. Reactive Scheduling Approach for Solving a Realistic Flexible Job Shop Scheduling Problem. Int. J. Prod. Res. 2021, 59, 5790–5808. [Google Scholar] [CrossRef]
  26. Adibi, M.A.; Shahrabi, J. A Clustering-Based Modified Variable Neighborhood Search Algorithm for a Dynamic Job Shop Scheduling Problem. Int. J. Adv. Manuf. Technol. 2014, 70, 1955–1961. [Google Scholar] [CrossRef]
  27. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 2018. [Google Scholar]
  28. Riedmiller, S.; Riedmiller, M. A Neural Reinforcement Learning Approach to Learn Local Dispatching Policies in Production Scheduling. In Proceedings of the IJCAI, Stockholm, Sweden, 31 July–6 August 1999; Volume 2, pp. 764–771. [Google Scholar]
  29. Chen, X.; Hao, X.; Lin, H.W.; Murata, T. Rule Driven Multi Objective Dynamic Scheduling by Data Envelopment Analysis and Reinforcement Learning. In Proceedings of the 2010 IEEE International Conference on Automation and Logistics, Hong Kong and Macau, China, 16–20 August 2010; pp. 396–401. [Google Scholar]
  30. Gabel, T.; Riedmiller, M. Distributed Policy Search Reinforcement Learning for Job-Shop Scheduling Tasks. Int. J. Prod. Res. 2012, 50, 41–61. [Google Scholar] [CrossRef]
  31. Zhao, M.; Li, X.; Gao, L.; Wang, L.; Xiao, M. An Improved Q-Learning Based Rescheduling Method for Flexible Job-Shops with Machine Failures. In Proceedings of the 2019 IEEE 15th International Conference on Automation Science and Engineering (CASE), Vancouver, BC, Canada, 22–26 August 2019; pp. 331–337. [Google Scholar]
  32. Shahrabi, J.; Adibi, M.A.; Mahootchi, M. A Reinforcement Learning Approach to Parameter Estimation in Dynamic Job Shop Scheduling. Comput. Ind. Eng. 2017, 110, 75–82. [Google Scholar] [CrossRef]
  33. Luo, S. Dynamic Scheduling for Flexible Job Shop with New Job Insertions by Deep Reinforcement Learning. Appl. Soft Comput. 2020, 91, 106208. [Google Scholar] [CrossRef]
  34. Bouazza, W.; Sallez, Y.; Beldjilali, B. A Distributed Approach Solving Partially Flexible Job-Shop Scheduling Problem with a Q-Learning Effect. IFAC 2017, 50, 15890–15895. [Google Scholar] [CrossRef]
  35. Wang, Y.-F. Adaptive Job Shop Scheduling Strategy Based on Weighted Q-Learning Algorithm. J. Intell. Manuf. 2020, 31, 417–432. [Google Scholar] [CrossRef]
  36. Trentesaux, D.; Pach, C.; Bekrar, A.; Sallez, Y.; Berger, T.; Bonte, T.; Leitão, P.; Barbosa, J. Benchmarking Flexible Job-Shop Scheduling and Control Systems. Control. Eng. Pract. 2013, 21, 1204–1225. [Google Scholar] [CrossRef] [Green Version]
  37. Nouiri, M.; Bekrar, A.; Trentesaux, D. An Energy-Efficient Scheduling and Rescheduling Method for Production and Logistics Systems. Int. J. Prod. Res. 2020, 58, 3263–3283. [Google Scholar] [CrossRef]
  38. Mirjalili, S. Genetic algorithm. In Evolutionary Algorithms and Neural Networks; Springer: Berlin/Heidelberg, Germany, 2019; pp. 43–55. [Google Scholar]
  39. Nouiri, M.; Bekrar, A.; Jemai, A.; Trentesaux, D.; Ammari, A.C.; Niar, S. Two Stage Particle Swarm Optimization to Solve the Flexible Job Shop Predictive Scheduling Problem Considering Possible Machine Breakdowns. Comput. Ind. Eng. 2017, 112, 595–606. [Google Scholar] [CrossRef]
  40. Yuan, B.; Gallagher, M. A hybrid approach to parameter tuning in genetic algorithms. In Proceedings of the 2005 IEEE Congress on Evolutionary Computation, Edinburgh, UK, 2–4 September 2005; Volume 2. [Google Scholar]
  41. Angelova, M.; Pencheva, T. Tuning genetic algorithm parameters to improve convergence time. Int. J. Chem. Eng. 2011, 2011, 646917. [Google Scholar] [CrossRef] [Green Version]
  42. Vieira, G.E.; Herrmann, J.W.; Lin, E. Rescheduling Manufacturing Systems: A Framework of Strategies, Policies, and Methods. J. Sched. 2003, 6, 39–62. [Google Scholar] [CrossRef]
  43. Qiao, F.; Wu, Q.; Li, L.; Wang, Z.; Shi, B. A Fuzzy Petri Net-Based Reasoning Method for Rescheduling. Trans. Inst. Meas. Control. 2011, 33, 435–455. [Google Scholar] [CrossRef]
  44. François-Lavet, V.; Henderson, P.; Islam, R.; Bellemare, M.G.; Pineau, J. An Introduction to Deep Reinforcement Learning. In Foundations and Trends in Machine Learning; University of California: Berkeley, CA, USA, 2018; Volume 11, pp. 219–354. [Google Scholar]
  45. Li, Y. Deep Reinforcement Learning: An Overview. arXiv Preprint 2017, arXiv:1701.07274. [Google Scholar]
  46. Brandimarte, P. Routing and Scheduling in a Flexible Job Shop by Tabu Search. Ann. Oper. Res. 1993, 41, 157–183. [Google Scholar] [CrossRef]
  47. Nouiri, M. Implémentation d’une Méta-Heuristique Embarquée Pour Résoudre Le Problème d’ordonnancement Dans Un Atelier Flexible de Production. Ph.D. Thesis, Ecole Polytechnique de Tunisie, Carthage, Tunisia, 2017. [Google Scholar]
  48. Bożejko, W.; Uchroński, M.; Wodecki, M. Parallel Hybrid Metaheuristics for the Flexible Job Shop Problem. Comput. Ind. Eng. 2010, 59, 323–333. [Google Scholar] [CrossRef]
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
