*Article* **Rescheduling of Distributed Manufacturing System with Machine Breakdowns**

**Xiaohui Zhang 1 , Yuyan Han 2 , Grzegorz Królczyk 3 , Marek Rydel 4 , Rafal Stanislawski 4 and Zhixiong Li 3, \***


**Abstract:** This study explores the dynamic scheduling problem from the perspective of operational research optimization. The goal is to propose a rescheduling framework for distributed manufacturing systems subject to random machine breakdowns as the production disruption. We establish a mathematical model that describes the scheduling of the distributed blocking flowshop. To realize dynamic scheduling, we adopt an "event-driven" policy and propose a two-stage "predictive-reactive" method: initial solution pre-generation and rescheduling. In the first stage, a global initial schedule is generated considering only the deterministic problem, i.e., optimizing the maximum completion time of the static distributed blocking flowshop scheduling problem. In the second stage, after a breakdown occurs, the rescheduling mechanism is triggered to seek a new schedule such that both the maximum completion time and the stability measure of the system are optimized. At the breakdown node, the operations of each job are classified and a hybrid rescheduling strategy consisting of "right-shift repair + local reorder" is performed. For the local reorder, we design a discrete memetic algorithm (DMA), which embeds the differential evolution concept in its search framework. To test the effectiveness of DMA, comparisons with mainstream algorithms are conducted on instances of different scales. The statistical results show that the average relative percentage deviations (ARPDs) obtained by DMA are improved by 88%.

**Keywords:** distributed manufacturing; rescheduling; memetic algorithm

#### **1. Introduction**

With the advancement of economic globalization and the intensification of mergers between enterprises, the emergence of large-scale or concurrent production makes the pattern of distributed manufacturing necessary [1,2]. Distributed manufacturing decentralizes tasks into factories or workshops from different geographical locations. This pattern can help the manufacturers raise productivity, reduce cost, control risks, and adjust marketing policies more flexibly [3]. As an important part of distributed manufacturing, scheduling directly affects the efficiency and competitiveness of enterprises. Generally speaking, to solve such problems, a problem-specific model with production constraints should be first established to describe the scheduling problem considered. Then, optimization methods (e.g., mathematical programming, intelligent optimization, etc.) of operational research are developed to search for an optimal solution. For systems with large-scale and high complexity, mathematical programming such as integer programming, branch and bound, dynamic programming, or cut plane can rarely find an optimal solution (ranking) in the

**Citation:** Zhang, X.; Han, Y.; Królczyk, G.; Rydel, M.; Stanislawski, R.; Li, Z. Rescheduling of Distributed Manufacturing System with Machine Breakdowns. *Electronics* **2022**, *11*, 249. https://doi.org/10.3390/ electronics11020249

Academic Editors: Darius Andriukaitis and Peter Brida

Received: 5 December 2021 Accepted: 10 January 2022 Published: 13 January 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

target space within an acceptable time: their enumeration-based concept guarantees optimality in principle, but their efficiency decreases sharply as the number of jobs/tasks to be scheduled increases.

At present, most studies use intelligent optimization algorithms to approximate the optimal solutions of scheduling problems. Intelligent optimization algorithms, also called evolutionary optimization algorithms or metaheuristics, derive their design principles from the behaviors, functions, rules, and action mechanisms observed in biological, physical, chemical, social, artistic, and other systems or fields. Guided by the characteristics of a specific problem, they refine a corresponding feature model and design an intelligent iterative search process. That is, these algorithms do not rely on analytical properties of the problem, but obtain near-optimal solutions through continuous iterations of global and local search. When an intelligent algorithm is applied to a scheduling problem, the schedule can be expressed as a permutation model in the form of an encoding, which further compresses the solution space into a very flat space in which a large number of different permutations (schedules) correspond to the same objective value. Hence, a permutation-model-based algorithm can search many more candidate schedules in the target space within tens of milliseconds to tens of seconds, thereby obtaining solutions better than those of traditional mathematical programming methods.

The object of this study is the distributed blocking flowshop scheduling problem (DBFSP) [4]. Figure 1 illustrates DBFSP, which considers *f* parallel factories with identical machine configurations and technological processes [5]. Jobs can be assigned to any factory, and each job follows the same blocking manufacturing procedure [6]. Although the machines configured in each distributed factory are the same, the processing time of each operation of each job is assumed to differ; consequently, the processing tasks assigned to each distributed factory and their completion times also differ. The idea of solving DBFSP is to reasonably allocate the jobs to factories through optimization algorithms and then sequence the jobs within each distributed factory, so as to optimize the manufacturing objectives of the whole work order. Researchers have made great efforts to solve DBFSP in a static environment; the existing research has mainly focused on the construction of mathematical models and the design of optimization algorithms. Zhang et al. [7] established two different mathematical models using forward and reverse recursion approaches, and a hybrid discrete differential evolution (DDE) algorithm was proposed to minimize the maximum completion time (makespan). Zhang et al. [8] constructed a mixed-integer model for DBFSP and developed a discrete fruit fly optimization algorithm (DFOA) with a speed-up mechanism to minimize the global makespan. Additionally, Shao et al. [9] proposed a hybrid enhanced discrete fruit fly optimization algorithm (HEDFOA) to optimize the makespan, in which a new assignment rule and an insertion-based improvement procedure were developed to initialize the common central location of the different fruit fly swarms. Li et al. [10] investigated a special case of DBFSP in which a transport robot is embedded in each factory; the loading and unloading times handled by the robot are considered and differ across jobs.
An improved iterated greedy (IIG) algorithm was proposed to improve productivity. Moreover, Zhao et al. [11] proposed an ensemble discrete differential evolution (EDE) algorithm, in which three initialization heuristics that consider front delay, blocking time, and idle time were designed; the mutation, crossover, and selection operators were redesigned so that the EDE algorithm executes in the discrete domain.

The above research on DBFSP has formed a fairly complete body of work, but it assumes that no explicit disruptions occur during the manufacturing process. In fact, a series of uncertainties often arises during manufacturing [12]. These uncertainties, which are sudden and uncontrollable, can strongly change the state of the system and continuously affect scheduling activities [13]. As a result, the original static schedules are no longer suitable for real-time scheduling. To eliminate the impact of sudden uncertainties, rescheduling operations are generally performed in response to disruptions [14,15]. Rescheduling refers to the procedure of modifying the existing schedule to obtain a new feasible one after uncertain events occur [16]. One of the most important rescheduling

strategies for the traditional flowshop is "predictive-reactive" scheduling [17]. "Predictive-reactive" scheduling defines a two-stage "event-driven" scheduling operation: the first stage generates an initial schedule that provides a baseline reference for other manufacturing activities such as procurement and distribution of raw materials [18]. When disruptions occur, the second stage explicitly quantifies them, constructs a disruption management model with the information gathered by cyber-physical smart manufacturing technology [19–21], adjusts the initial schedule, and makes an effective trade-off between the initial optimization objective and the disturbance objective [22]. Since little literature exists on the rescheduling of DBFSP, we review only the rescheduling strategies and algorithms developed for the traditional single flowshop. To realize rescheduling, a suitable strategy should be determined in advance according to the scenario. Framinan et al. [23] discussed the problem of high system tension caused by continuous rescheduling of multi-stage flow production; a rescheduling strategy based on estimating machine availability after disruptions was described, and a reordering algorithm based on the critical path was proposed. Katragjini et al. [24] analyzed eight types of uncertainties and designed rescheduling strategies through the classification of job status, considering completed, in-process, and unprocessed operations. Iris et al. [25] designed a recoverable strategy taking into account the uncertainty of crane arrival at the ship and fluctuations in loading and unloading speeds; the rescheduling strategy used a proactive baseline with reactive costs as the objective. Ma et al. [26] took the overmatch time (the difference between the real manufacturing time and the estimated time of the initial schedule) as one of the objectives in a rescheduling model for handling production emergencies in parallel flowshops. Li et al.
[27] discussed both machine breakdown and processing change interruptions for a hybrid flowshop. The authors proposed a hybrid fruit fly optimization algorithm (HFOA) with processing-delay, cast-break erasing, and right-shift strategies to minimize different rescheduling objectives in a steelmaking-foundry system. Li et al. [28] also considered five types of interruption events in the flowshop, namely machine breakdown, new job arrival, job cancellation, job processing change, and job release time change. A rescheduling strategy based on job status was designed for each event, and a discrete teaching and learning optimization (DTLO) algorithm was proposed to optimize makespan and stability. Valledor et al. [29] applied Pareto optimality to solve the multi-objective flowshop rescheduling problem with makespan, total weighted tardiness, and steadiness as objectives. Three classes of disruptions (appearance of new jobs, machine faults, and changes in operational times) were discussed and an event management model was constructed; a restarted iterated Pareto greedy (RIPG) metaheuristic was used to find the optimal Pareto front.

**Figure 1.** An example of DBFSP with the Gantt chart.

From the above review, it can be concluded that current research focuses mostly on the rescheduling of a single flowshop with various constraints; little literature has considered rescheduling from the distributed manufacturing perspective. Though Industry 4.0 wireless networks [30,31] have developed quickly in recent years, they are involved more in distributed information interconnection than in decision making for scheduling. Likewise, big data-driven technology [32,33] may provide real-time decisions or scheduling rules for small-scale manufacturing, but it has not yet formed a sound system. Moreover, because big data technology relies strongly on large amounts of historical data, it is difficult to apply to new products given the highly discrete, stochastic, and distributed properties of scheduling problems. Therefore, with the in-depth application of distributed manufacturing, distributed rescheduling strategies and approaches need to be formulated prudently so that effective references can be provided for modern decision-makers.

On the other hand, the objects of job shop scheduling are usually individual jobs, products, or other resources in the manufacturing process. Such resources have typical discrete characteristics: they need to be marked and expressed through special information carriers, and new combinations (rankings) are obtained by constantly updating those carriers. These characteristics are similar to the optimization process of intelligent algorithms based on the permutation model. Therefore, intelligent algorithms based on the evolution concept are well suited to solving scheduling problems.

According to the above analysis and the good applicability of intelligent algorithms, we use an intelligent algorithm to reschedule the distributed blocking flowshop scheduling problem in a dynamic environment (DDBFSP). In the last decade, the application of intelligent algorithms to scheduling problems has been extensively investigated. The memetic algorithm (MA), also called the Lamarckian evolutionary algorithm, is attracting increasing attention. The concept of the "meme", referring to contagious information patterns, was proposed by Dawkins in 1976 [34]. "Memes" are similar to genes in GA, but with differences: memetic evolution is characterized by Lamarckism, while genetic evolution is characterized by Darwinism. Meanwhile, neural system-based memetic information is more malleable than genetic information, so memes are more likely to change and spread more quickly than genes. In evolutionary computing, MA can combine various global and local strategies to construct different search frameworks, which possess the characteristics of GA but with a stronger merit-seeking ability. MA has been widely applied to many engineering problems, such as vehicle path planning [35], home care routing [36], the bin packing problem [37], broadcast resource allocation [38], and production scheduling optimization [39–41]. Until now, MA has not been applied to solve DDBFSP, so it is of significance to extend MA as a solver for DDBFSP.

In summary, this paper aims to optimize DDBFSP with both makespan and a stability measure as the objectives. Machine breakdown is defined as the disruption and is assumed to happen stochastically in any distributed factory. To handle such dynamic events, a problem-specific disruption management model is constructed. A rescheduling framework is proposed that includes a job status-oriented classification strategy and a reordering algorithm, the discrete memetic algorithm (DMA). In DMA, differential evolution (DE) operators are embedded to execute the neighborhood search, and a simulated annealing (SA)-based reference local search framework is designed to help the algorithm escape from local optima. Finally, the effectiveness of DMA is validated through comparative experiments. The expected effect of rescheduling is to largely maintain the optimization level of the original manufacturing objective (makespan) while ensuring the stability of the newly generated schedules.

The remainder of the paper is organized as follows. Section 2 states DDBFSP and constructs its mathematical model and objective function. Section 3 designs the corresponding rescheduling framework. Section 4 elaborates the details of the DMA reordering algorithm. Section 5 verifies the performance of DMA and analyzes the results. Section 6 summarizes the research content of this paper.

#### **2. Method**

In this section, the mathematical models for DBFSP with optimization objectives in both static and dynamic environments are proposed. The classifications of job status after breakdown events are also introduced.

#### *2.1. Statement of DBFSP in Static Environment*

As can be seen in Figure 1, DBFSP not only needs to consider the correlation between processing task characteristics and blocking constraints, but also the coupling between global scheduling and the local scheduling of each distributed factory, which makes the solving process more complex. As illustrated in Figure 1, a set of jobs *J* = {*J<sup>j</sup>* | *j* = 1, 2, . . . , *n*} will be assigned to a set of factories *F* = {*F<sup>k</sup>* | *k* = 1, 2, . . . , *f* }, each of which contains a set of machines *M* = {*M<sup>i</sup>* | *i* = 1, 2, . . . , *m*}. The blocking constraint means that no buffers are allowed between two adjacent machines. Therefore, a job can only be released to its next operation when the subsequent machine is free; otherwise, the job must be blocked on the current machine. We assume the processing time of each job is stochastic and job-dependent. After a job is assigned to a factory, it is not allowed to move to other factories.

Assume *n<sup>k</sup>* (*n<sup>k</sup>* ≤ *n*) jobs are assigned to factory *k*, and denote the job sequence in this factory by *π<sup>k</sup>* , where *π<sup>k</sup>* (*l*) represents the *l*-th job in *π<sup>k</sup>* . The operation *Oπ<sup>k</sup>* (*l*),*<sup>i</sup>* has a processing time *Pπ<sup>k</sup>* (*l*),*i* . Let *Sπ<sup>k</sup>* (*l*),0 be the start time of *π<sup>k</sup>* (*l*) on the first machine of factory *k*, and let *Dπ<sup>k</sup>* (*l*),*<sup>i</sup>* be the departure time of operation *Oπ<sup>k</sup>* (*l*),*<sup>i</sup>* from machine *i*. The recursive formulas of DBFSP can be derived as follows:

$$S\_{\pi\_k(1),0} = D\_{\pi\_k(1),0} = 0 \tag{1}$$

$$D\_{\pi\_k(1),i} = D\_{\pi\_k(1),i-1} + P\_{\pi\_k(1),i}, \quad i = 1, 2, \ldots, m \tag{2}$$

$$S\_{\pi\_k(l),0} = D\_{\pi\_k(l),0} = D\_{\pi\_k(l-1),1}, \quad l = 2, 3, \ldots, n\_k \tag{3}$$

$$D\_{\pi\_k(l),i} = \max\left\{ D\_{\pi\_k(l),i-1} + P\_{\pi\_k(l),i},\; D\_{\pi\_k(l-1),i+1} \right\}, \quad l = 2, 3, \ldots, n\_k, \quad i = 1, 2, \ldots, m-1 \tag{4}$$

$$D\_{\pi\_k(l),m} = D\_{\pi\_k(l),m-1} + P\_{\pi\_k(l),m}, \quad l = 2, 3, \ldots, n\_k \tag{5}$$

In the above equations, Equations (1) and (2) calculate the start and departure times of the first job *π<sup>k</sup>* (1) from machine 1 to the last machine *m* in factory *k*. Equations (3) and (4) calculate the start and departure times of job *π<sup>k</sup>* (*l*) from machine 1 to machine *m* − 1. Equation (5) gives the departure time of *π<sup>k</sup>* (*l*) from machine *m*. If we take makespan *C*(*π<sup>k</sup>* ) as the optimization objective, *C*(*π<sup>k</sup>* ) of factory *k* can be expressed as:

$$C(\pi\_k) = D\_{\pi\_k(n\_k),m} \tag{6}$$

As a result, the global makespan for DBFSP is defined through the comparison between *C*(*π<sup>k</sup>* ) of all distributed factories:

$$C\_{\max}(\Pi) = \max\_{k=1,2,\ldots,f} C(\pi\_k) \tag{7}$$

For the detailed recursive process and a worked example of solving DBFSP, refer to our previous work [8].
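For concreteness, the recursion of Equations (1)-(7) for a single factory can be sketched in Python as follows. This is a minimal illustration; the function names and data layout are our own and not part of the original formulation:

```python
def makespan_blocking(perm, p):
    """Departure-time recursion of Equations (1)-(6) for one factory.

    perm : processing order of the jobs assigned to this factory
    p    : p[j][i] = processing time of job j on machine i (0-indexed)
    Returns the factory makespan C(pi_k).
    """
    n, m = len(perm), len(p[0])
    # d[l][i]: index i = 0 holds the start time of the l-th job on machine 1;
    # i >= 1 holds its departure time from machine i.
    d = [[0.0] * (m + 1) for _ in range(n)]
    for i in range(1, m + 1):                        # Eqs. (1)-(2): first job
        d[0][i] = d[0][i - 1] + p[perm[0]][i - 1]
    for l in range(1, n):
        d[l][0] = d[l - 1][1]                        # Eq. (3): wait for machine 1
        for i in range(1, m):                        # Eq. (4): blocking constraint
            d[l][i] = max(d[l][i - 1] + p[perm[l]][i - 1], d[l - 1][i + 1])
        d[l][m] = d[l][m - 1] + p[perm[l]][m - 1]    # Eq. (5): last machine
    return d[n - 1][m]                               # Eq. (6)


def global_makespan(factory_perms, p):
    """Eq. (7): the global makespan is the maximum over all factories."""
    return max(makespan_blocking(perm, p) for perm in factory_perms)
```

For example, with three jobs of processing times (3, 2), (1, 4), (2, 2) on two machines, the inner `max` in Eq. (4) captures the blocking of a job on machine 1 while its predecessor still occupies machine 2.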

#### *2.2. Statement of DDBFSP*

When characterizing the machine breakdown event of DBFSP in a dynamic and stochastic environment, the following questions should be addressed: (1) In which factory does the breakdown happen, and when (the probability of breakdown)? (2) Which machine in that factory breaks down (the distributivity of the breakdown)? (3) When will the machine resume?

In fact, machine breakdowns are difficult to simulate, since a probabilistic model of breakdown occurrence can hardly cover the real manufacturing situation. Moreover, the recovery time is mostly predicted based on a priori knowledge, which cannot guarantee accuracy. To account for randomness and distributivity, this paper triggers machine breakdowns in a randomly selected distributed factory at time *t*. The breakdown time is defined to follow a discrete uniform distribution expressed as follows:

$$E(B\_{k,i}) = rand() \,\%\, P(T\_i) + P\_{\pi\_k(l),i}, \quad i = 1, 2, \ldots, m, \; k = 1, 2, \ldots, f, \; l = 1, 2, \ldots, n\_k \tag{8}$$

where *rand*() represents a random function over [0, *ω*] and *ω* denotes the maximum constant of the system; "%" is the remainder operator; *P*(*T<sup>i</sup>* ) represents the total processing time of all jobs on machine *i*. Equation (8) confines the breakdown in factory *k* to the processing span of the jobs, i.e., [*Pπ<sup>k</sup>* (*l*),*i* , *Cπ<sup>k</sup>* (*l*),*i* ].

To maintain the distribution of breakdowns and the convenience of experimental statistics, this study assumes that *β* random breakdowns occur in each distributed factory during manufacturing. Additionally, to maintain the randomness of breakdown occurrence, each machine has the same probability of breaking down. To ensure a stochastic dynamic environment, the durations of breakdowns are generated randomly and uniformly over the interval [0, *ω*] and are determined immediately after the event. Moreover, other constraints on the breakdown event in DDBFSP are defined in Section 2.4.
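Under the assumptions above (uniform machine choice, uniform occurrence time over the factory's processing span, and repair duration drawn from [0, *ω*]), the breakdown trigger for one factory might be sketched as follows; all names and the exact sampling layout are hypothetical:

```python
import random

def generate_breakdowns(beta, m, omega, horizon, seed=None):
    """Hypothetical sketch of the random breakdown trigger for one factory.

    beta    : number of breakdowns triggered in this factory
    m       : number of machines, each equally likely to fail
    omega   : maximum repair duration (the system constant of Eq. (8))
    horizon : processing span of the factory; occurrence times fall inside it
    Returns (machine, B_s, B_e) triples sorted by occurrence time B_s.
    """
    rng = random.Random(seed)
    events = []
    for _ in range(beta):
        machine = rng.randrange(m)           # uniform machine choice
        b_s = rng.uniform(0, horizon)        # uniform occurrence time
        b_e = b_s + rng.uniform(0, omega)    # repair duration drawn from [0, omega]
        events.append((machine, b_s, b_e))
    return sorted(events, key=lambda e: e[1])
```

Sorting by occurrence time matches the event-driven policy: each event is handled in chronological order, and the schedule repaired after one event becomes the baseline for the next.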


#### *2.3. Optimization Objectives of DDBFSP*

In DBFSP, it is generally necessary to consider a production efficiency-related objective, e.g., makespan. In a dynamic environment, however, stability becomes an important practical metric for manufacturing systems from the decision point of view. If rescheduling is performed considering only production efficiency-related indicators, it may generate new schedules that deviate significantly from the initial plan, which in turn affects other planning activities such as material management and manpower planning. Therefore, in the rescheduling phase, the stability of the new schedule should be considered in addition to makespan. In this study, the initial schedule of a distributed factory before a breakdown occurs is denoted by *B* (baseline), and the schedule after rescheduling is denoted by *B*∗. The goal of rescheduling is to optimize both the makespan and the stability of the distributed factory at each breakdown node. The first objective (makespan) of the DDBFSP is expressed as follows:

$$f\_1 = \mathbb{C}\_{\text{max}}(B\*)\tag{9}$$

The second objective (stability) of the DDBFSP is derived as follows.

$$f\_2 = \min \{ \sum\_{i=1}^{m} \sum\_{l=1}^{n\_k} Z\_{\pi\_k(l),i} \}, \quad k = 1, 2, \dots, f, \ n\_k = 1, 2, \dots, n \tag{10}$$

where the binary decision variable *Zπ<sup>k</sup>* (*l*),*<sup>i</sup>* indicates whether the relative position of a job differs between *B* and *B*∗: *Zπ<sup>k</sup>* (*l*),*<sup>i</sup>* = 1 represents that the position of job *π<sup>k</sup>* (*l*) on machine *i* in factory *k* has changed in the new schedule *B*∗, and *Zπ<sup>k</sup>* (*l*),*<sup>i</sup>* = 0 otherwise.
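One simple reading of Equation (10) for a permutation flowshop can be sketched as follows: a job whose relative position changes between *B* and *B*∗ contributes on all *m* machines. The function name and this per-machine counting convention are our assumptions, not the paper's implementation:

```python
def stability_measure(baseline, new_order, m):
    """Sketch of f2 in Eq. (10): count Z = 1 for every machine of a job
    whose relative position differs between the baseline sequence B and
    the new sequence B*. In a permutation flowshop the job order is the
    same on every machine, so a moved job is moved on all m machines."""
    moved = sum(1 for b_job, n_job in zip(baseline, new_order) if b_job != n_job)
    return moved * m
```

An unchanged schedule therefore yields f2 = 0, the most stable outcome.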

To simplify the optimization process and avoid redundant calculations, a weighting mechanism is applied to combine both objective functions:

$$f(B\*) = w\_1 f\_1 + w\_2 f\_2 \tag{11}$$

In Equation (11), *w*<sup>1</sup> and *w*<sup>2</sup> represent the weight coefficients of *f*<sup>1</sup> and *f*2, respectively. Since *f*<sup>1</sup> and *f*<sup>2</sup> have different dimensions and distribution ranges, to avoid the results being dominated by the objective with the larger or smaller range, the normalization method

proposed in [24] is applied so that the value of each objective falls in the interval [0, 1]. The normalization function is defined as follows:

$$f(B\*) = w\_1 N(f\_1) + w\_2 N(f\_2) \tag{12}$$

where:

$$N(f\_1) = \frac{f\_1(B\*) - low(f\_1)}{up(f\_1) - low(f\_1)} \tag{13}$$

$$N(f\_2) = \frac{f\_2(B\*) - low(f\_2)}{up(f\_2) - low(f\_2)} \tag{14}$$

In Equations (13) and (14), *up*(·) and *low*(·) represent the upper and lower bounds of *f*<sup>1</sup> and *f*2, obtained from the two extreme rankings of the jobs at the breakdown node. For the specific calculation procedure, refer to [24].
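The weighted normalized objective of Equations (11)-(14) can be sketched as follows. The function name, the `bounds` layout, and the default weights are illustrative assumptions; the bounds themselves would come from the two extreme rankings as in [24]:

```python
def rescheduling_objective(f1, f2, bounds, w1=0.5, w2=0.5):
    """Weighted sum of normalized makespan f1 and stability f2, Eqs. (11)-(14).

    bounds : {"f1": (low, up), "f2": (low, up)} taken from the two extreme
             job rankings at the breakdown node (computed as in [24]).
    The weights w1 = w2 = 0.5 are illustrative defaults, not paper values.
    """
    def normalize(value, low, up):
        if up == low:          # degenerate bound: the objective cannot vary
            return 0.0
        return (value - low) / (up - low)   # Eqs. (13)-(14)
    return w1 * normalize(f1, *bounds["f1"]) + w2 * normalize(f2, *bounds["f2"])
```

The degenerate-bound guard avoids a division by zero when the two extreme rankings give identical objective values.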

#### *2.4. Statement of Job Status after Breakdown Event*

After a machine breaks down, the jobs are categorized to construct the event management model:

$$C^\*\_{\pi\_k(l),i} - S^\*\_{\pi\_k(l),i} - P^\*\_{\pi\_k(l),i} + (1 - Y\_{\pi\_k(l),i,1})\omega \ge 0 \tag{15}$$

$$C^\*\_{\pi\_k(l),i} - S^\*\_{\pi\_k(l),i} - P^\*\_{\pi\_k(l),i} - (1 - Y\_{\pi\_k(l),i,1})\omega \le 0 \tag{16}$$

$$C^\*\_{\pi\_k(l),i} - S^\*\_{\pi\_k(l),i} - P^\*\_{\pi\_k(l),i} - B\_{e,i} + B\_{s,i} + (1 - Y\_{\pi\_k(l),i,2})\omega \ge 0 \tag{17}$$

$$C^\*\_{\pi\_k(l),i} - S^\*\_{\pi\_k(l),i} - P^\*\_{\pi\_k(l),i} - B\_{e,i} + B\_{s,i} - (1 - Y\_{\pi\_k(l),i,2})\omega \le 0 \tag{18}$$

$$C^\*\_{\pi\_k(l),i} - \max\left\{ S^\*\_{\pi\_k(l),i},\, B\_{e,i} \right\} - P^\*\_{\pi\_k(l),i} + (1 - Y\_{\pi\_k(l),i,3})\omega \ge 0 \tag{19}$$

$$C^\*\_{\pi\_k(l),i} - \max\left\{ S^\*\_{\pi\_k(l),i},\, B\_{e,i} \right\} - P^\*\_{\pi\_k(l),i} - (1 - Y\_{\pi\_k(l),i,3})\omega \le 0 \tag{20}$$

$$\sum\_{g=1}^{3} Y\_{\pi\_k(l),i,g} = 1, \quad i = \{1, 2, \ldots, m\}, \; k = \{1, 2, \ldots, f\}, \; l = \{1, 2, \ldots, n\_k\} \tag{21}$$

$$Y\_{\pi\_k(l),i,g} \in \{0, 1\}, \quad i = \{1, 2, \ldots, m\}, \; k = \{1, 2, \ldots, f\}, \; g = \{1, 2, 3\} \tag{22}$$

In the above equations, *C* ∗ *πk* (*l*),*i* denotes the completion time of job *π<sup>k</sup>* (*l*) on machine *i* in *B*∗. *S* ∗ *πk* (*l*),*i* and *P* ∗ *πk* (*l*),*i* represent the corresponding start time and processing time, respectively. *Bs*,*<sup>i</sup>* and *Be*,*<sup>i</sup>* denote the occurrence (start) time and the end (recovery) time of the breakdown on machine *i*. Equations (15)-(20) define the three statuses in which an operation of a job can be when a breakdown occurs: Equations (15) and (16) determine that the operation was completed before the breakdown occurs; Equations (17) and (18) determine that the operation is being processed when the breakdown occurs; Equations (19) and (20) determine that the operation was originally scheduled to start after the machine is recovered. Equation (21) states that an operation can only be in exactly one status when a breakdown occurs. Equation (22) defines the binary decision variables for the three cases: (1) *Yπ<sup>k</sup>* (*l*),*i*,1 = 1 means the operation is completed before the breakdown occurs; (2) *Yπ<sup>k</sup>* (*l*),*i*,2 = 1 denotes that the operation overlaps with the machine breakdown node; (3) *Yπ<sup>k</sup>* (*l*),*i*,3 = 1 means the operation begins after the machine is resumed.
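The three statuses encoded by the indicator variables above can be computed directly from the timing of one operation against the breakdown window; a minimal sketch with illustrative names:

```python
def classify_operation(start, finish, b_s, b_e):
    """Classify one operation of the baseline schedule against a breakdown
    window [b_s, b_e] on its machine, mirroring the three statuses encoded
    by Y in Equations (15)-(22). Names and interface are illustrative.
    """
    if finish <= b_s:
        return 1   # case 1: completed before the breakdown occurs
    if start < b_s:
        return 2   # case 2: in process when the machine fails; split and resumed
    return 3       # case 3: starts only after the machine is recovered
```

For instance, with a breakdown window [8, 11], an operation spanning [0, 5] is case 1, [6, 10] is case 2, and [9, 12] is case 3.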

For a better understanding, an example is presented in Figure 2 to illustrate the classification of operation status at the moment a machine breaks down. In case 1 of Figure 2, machine 2 of factory *k* breaks down at time 55, at which time the operations *Oπ<sup>k</sup>* (1),1, *Oπ<sup>k</sup>* (1),2 and *Oπ<sup>k</sup>* (2),1 have already completed processing. Their start and completion times are not affected and do not need to be adjusted in the rescheduling phase. In case 2 of Figure 2, machine 1 breaks down at node 55 and operation *Oπ<sup>k</sup>* (3),1 is being processed at this time. The breakdown divides *Oπ<sup>k</sup>* (3),1 into two parts, the finished part and the remaining part; the remaining part is completed after the machine is recovered. Hence, the start time of *Oπ<sup>k</sup>* (3),1 remains unchanged in the rescheduling phase, but its finish time is affected by both

the breakdown time and the recovery time. In case 3 of Figure 2, machine 1 breaks down at node 38, which is before the startup of jobs *J*<sup>3</sup> and *J*4. Therefore, the start and finish times of *J*<sup>3</sup> and *J*<sup>4</sup> are affected by both the breakdown time and the recovery time.


**Figure 2.** Classification of job status at machine breakdowns.

#### **3. Rescheduling Framework for DDBFSP**

Since breakdowns occur during the manufacturing procedure, by which time each job has already been fixed to a certain factory for processing, moving jobs between factories is unrealistic. Hence, each individual factory is defined as the decision subject in response to the events.

#### *3.1. Rescheduling Strategy*

We envision a stochastic and dynamic distributed scheduling environment. A two-stage "predictive-reactive" method is proposed for DDBFSP: initial solution pre-generation and rescheduling. In the first stage, the initial schedule for DDBFSP is generated by considering a static environment without machine breakdowns. After a breakdown occurs, the initial schedule may no longer be optimal. Therefore, in the second stage, "event-driven" rescheduling is triggered to evaluate the breakdown, and a new schedule is provided in response to the event. Since the new schedule will be executed until the next breakdown occurs, the rescheduling strategy proposed in this study has a dual objective: on the one hand, to adapt to and minimize the impact of the event; on the other hand, to generate a schedule that gives a good trade-off between scheduling quality and stability after the machine is recovered.

#### *3.2. Rescheduling Method*

According to the classification of operation status in Section 2.4, we propose a hybrid rescheduling method: "right-shift repair + local reorder". At the breakdown node, no adjustment is made to the completed operations; for the directly affected operations, the right-shift strategy is adopted for a local repair; for the jobs that have not yet started processing, the reordering algorithm is performed to seek a better partial schedule. The proposed "right-shift repair + local reorder" method is described as follows:

First, determine the jobs that are to be rescheduled. According to the constraints of flowshop manufacturing, once the processing sequence of the jobs on the first machine is determined, their sequence on the other machines must be the same. Therefore, we mark the jobs *π<sup>k</sup>* (*l*) at the breakdown node by using the first machine as the reference. Suppose that a breakdown event occurs at time *Bs*,*<sup>i</sup>* while job *π<sup>k</sup>* (*l*) is processed on machine *i*; then the jobs in this factory whose first operation has been started are counted in the set *Nc*, and the jobs whose first operation has not been started are counted in the set of unprocessed jobs *Nn*.

Subsequently, the jobs in *N<sup>c</sup>* are further divided by taking *π<sup>k</sup>* (*l*) as the midpoint: the jobs sequenced before *π<sup>k</sup>* (*l*) are included in the set *Nc*,*<sup>c</sup>* = {*π<sup>k</sup>* (1), . . . , *π<sup>k</sup>* (*l* − 1)}; the remaining jobs of *N<sup>c</sup>* are included in the set *Nc*,*n*. The rescheduling system maintains the same order and time points for the operations of the jobs in *Nc*,*c*. For the operations of the jobs in *Nc*,*n*, the unaffected operations remain unchanged; the affected and other unprocessed operations are adjusted using the right-shift repair method [42], which shifts the start times to the right by the matching number of time units. The right-shift repair is essentially a FIFO-based heuristic.

At the breakdown node, none of the jobs in *N<sup>n</sup>* have started processing, and their initial order may no longer be optimal after the recovery. Thus, we propose an improved algorithm to reorder these jobs. Eventually, the new schedule is merged into the global schedule and executed as the baseline until the next breakdown occurs.
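The partition of a factory's sequence into the started set *N<sup>c</sup>* and the unprocessed set *N<sup>n</sup>* can be sketched as follows, using the first machine as the reference as described above; the function name and data layout are illustrative:

```python
def partition_jobs(order, first_op_start, b_s):
    """Split a factory's job sequence at a breakdown occurring at time b_s.

    order          : current job sequence pi_k of the factory
    first_op_start : start time of each job's first operation (machine 1),
                     used as the reference per the method above
    Returns (N_c, N_n): jobs whose first operation has started (kept as-is
    or right-shifted) and jobs not yet started (handed to the reorder step).
    """
    n_c = [j for j in order if first_op_start[j] < b_s]
    n_n = [j for j in order if first_op_start[j] >= b_s]
    return n_c, n_n
```

With the Figure 3-style sequence J2 → J1 → J5 → J3 → J4, illustrative machine-1 start times, and a breakdown at time 8, the sketch yields N_c = {J2, J1} and N_n = {J5, J3, J4}.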

To describe the proposed "right-shift repair + local reorder" method more clearly, Figure 3 shows a comparative example of different rescheduling methods at the time of a breakdown in one distributed factory. As shown in Figure 3a, the initial schedule of the factory has a desired makespan of 36. During manufacturing, Machine 2 breaks down at time *B<sub>s,i</sub>* = 8 and is assumed to recover at *B<sub>e,i</sub>* = 11. At this point, the jobs that have already been processed on Machine 1 are *J*<sub>2</sub> and *J*<sub>1</sub> (marked in gray), and these two jobs are counted in the set *N<sub>c</sub>*. In contrast, jobs *J*<sub>3</sub>, *J*<sub>4</sub> and *J*<sub>5</sub> are counted in *N<sub>n</sub>*. The framework adjusts the affected operations in *N<sub>c</sub>* based on the breakdown information and reorders the jobs in *N<sub>n</sub>*. As seen in Figure 3b, when only the right-shift method is applied to the affected operations (marked in yellow), the processing order remains *J*<sub>2</sub> → *J*<sub>1</sub> → *J*<sub>5</sub> → *J*<sub>3</sub> → *J*<sub>4</sub> and the makespan is delayed to 41. In Figure 3c, using the "right-shift repair + local reorder" method, jobs *J*<sub>3</sub>, *J*<sub>4</sub> and *J*<sub>5</sub> (marked in green) are reordered by the reordering algorithm. The processing order changes to *J*<sub>2</sub> → *J*<sub>1</sub> → *J*<sub>3</sub> → *J*<sub>4</sub> → *J*<sub>5</sub> and the makespan is 37, which absorbs 4 units of the recovery time. This example demonstrates that the proposed rescheduling method is more efficient and flexible than a single rule-based (right-shift) heuristic.

#### *3.3. Rescheduling Procedure*

We illustrate the proposed optimization procedure for DDBFSP in Figure 4. First, with makespan as the optimization objective, a global schedule for DBFSP in a static environment is generated using DFOA [8]; then, each distributed factory executes its manufacturing tasks according to its initial schedule, and breakdowns are triggered following the discrete generation mechanism proposed in Section 2.2. Each time a breakdown happens, rescheduling is implemented by the affected distributed factory, with makespan and stability as optimization objectives. According to the process status at the breakdown node, the jobs are classified and the corresponding methods (right-shift repair or the reordering algorithm) are performed. The rescheduling results of the different job sets are integrated, and the updated schedule under a single breakdown serves as the initial schedule until the next event occurs. The above procedure is repeated until the termination condition of the breakdown trigger is met, and the final schedule is output.


**Figure 3.** Comparison of different rescheduling strategies.

**Figure 4.** Proposed rescheduling procedure for DDBFSP.

#### **4. Reordering Algorithm-DMA**

Since the right-shift repair was introduced in [42], this section focuses on the proposed reordering algorithm, DMA, for the jobs in the set *N<sub>n</sub>*.

#### *4.1. Introduction of Standard MA*

MA was initially defined as an improvement of GA; it can combine different global and local strategies to construct various search frameworks and has stronger flexibility and merit-seeking ability than GA. The flow diagram of the basic MA is shown in Figure 5. MA starts by initializing the population, operates on memes with evolutionary ideas, generates new individuals using generating functions (e.g., crossover and mutation operators), and finally forms new populations using updating functions (e.g., selection operators).

Although the standard MA has a strong optimization-seeking capability, it suffers from problems such as insufficient global exploration [43]. On the other hand, DE has proven its powerful search capability since it was proposed. Inspired by this, we embed the DE operators into MA and propose a discrete MA (DMA). DMA mainly contains the following parts: population initialization, DE (mutation and crossover), and local search.

#### *4.2. Population Initialization*

In the initialization phase, we use the well-known job-sequence-based encoding [9]: the position of each job in the sequence represents its processing order on the corresponding machines. An initialization method, WPNEH (weighted-position NEH), which considers the weighted positions of the jobs, is proposed under the premise that the reordered jobs are determined.

Step 1: Use the PFT\_NEH(x) heuristic [9] and the initial schedule to generate two seeds *π<sup>P</sup>* and *π<sup>B</sup>*. The total weighted position of a single job is defined as follows:

$$\varphi_j = \chi_1 \times \phi_j(\pi^P) + \chi_2 \times \phi_j(\pi^B) \tag{23}$$

In Equation (23), *ϕ<sub>j</sub>*(*π<sup>P</sup>*) and *ϕ<sub>j</sub>*(*π<sup>B</sup>*) represent the absolute positions of job *j* in the two seeds, and *χ*<sub>1</sub> and *χ*<sub>2</sub> represent the weight coefficients of the two seeds. The values of *χ*<sub>1</sub> and *χ*<sub>2</sub> are defined adaptively by the population size *Ps* and the order *l* (*l* = 1, 2, . . . , *Ps*) of the newly generated individual in the whole population:

$$\chi\_1 = \frac{l}{Ps - 1} \tag{24}$$

$$\chi\_2 = 1 - \frac{l}{Ps - 1} \tag{25}$$

Step 2: Arrange all jobs in decreasing order of their total weighted positions *φ<sub>j</sub>* to obtain a sequence *π*<sup>0</sup>.

Step 3: Create a new empty sequence *π<sup>emp</sup>*. The jobs in *π*<sup>0</sup> are inserted into every position of *π<sup>emp</sup>* in turn; the solutions obtained from each insertion are evaluated based on Equations (11) and (12) to determine the optimal order. Continue until all jobs have been inserted.

Step 4: Let *l* = *l* + 1, and continue steps 1–3 until all *Ps* individuals are generated. The initialization procedure of WPNEH is shown as Algorithm 1.
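The steps above can be sketched as follows. The evaluator below is a plain (non-blocking) permutation-flowshop makespan standing in for Equations (11) and (12); the weights follow Equations (24) and (25) as printed (which assumes *Ps* > 1); and all function names are illustrative, not the paper's implementation.

```python
def makespan(seq, p):
    """Plain permutation-flowshop makespan, used here as a stand-in for the
    blocking objective of Equations (11) and (12). p[j][i] is the processing
    time of job j on machine i."""
    m = len(p[0])
    c = [0.0] * m
    for j in seq:
        for i in range(m):
            c[i] = max(c[i], c[i - 1] if i else 0.0) + p[j][i]
    return c[-1]

def wpneh_population(seed_p, seed_b, p, ps):
    """WPNEH sketch: weight the position of each job in the two seeds
    (Eqs. 23-25, with the weights as printed), sort jobs by decreasing
    weighted position, then build each individual by NEH insertion."""
    pop = []
    for l in range(1, ps + 1):
        x1 = l / (ps - 1)                                 # Eq. (24)
        x2 = 1.0 - x1                                     # Eq. (25)
        w = {j: x1 * seed_p.index(j) + x2 * seed_b.index(j) for j in seed_p}
        order = sorted(seed_p, key=lambda j: -w[j])       # decreasing phi_j
        seq = []
        for j in order:                                   # NEH: best insertion
            seq = min((seq[:k] + [j] + seq[k:] for k in range(len(seq) + 1)),
                      key=lambda s: makespan(s, p))
        pop.append(seq)
    return pop
```

Each call produces *Ps* permutations, one per weight setting, so the population starts from diverse but quality-biased orderings.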



#### *4.3. DE Operation*

DE [44] drives the search through synergistic and competitive behaviors among the individuals of the population. Its overall structure is similar to GA, but the evolution principle is quite different: DE generates new individuals by perturbing the difference vectors of two or more individuals. This provides more operating space and enhances the global search capability when the individuals differ significantly from each other in the early stage of the search. In DMA, the two key operators of DE (mutation and crossover) are embedded to perform the global search.

#### 4.3.1. Mutation

First, the weighted difference vector of two individuals is computed; this weighted difference is then added to a third individual. Specifically, three individuals are randomly selected from the population *POP*: the best one is denoted *π<sub>a</sub>*, and the other two are denoted *π<sub>b</sub>* and *π<sub>c</sub>*. The mutation operator can be expressed as:

$$\pi_V = \pi_a \oplus \kappa \otimes (\pi_b - \pi_c) \tag{26}$$

where *κ* is the mutation scaling factor used to control the magnitude of the differences; "−" yields the difference vector between *π<sub>b</sub>* and *π<sub>c</sub>*, and "⊗" applies the weighting:

$$\pi_b - \pi_c = \Delta \Leftrightarrow \delta(j) = \begin{cases} \pi_b(j), & \text{if } \pi_b(j) \neq \pi_c(j) \\ 0, & \text{otherwise} \end{cases}, \quad j = 1, 2, \dots, n \tag{27}$$

$$\kappa \otimes \Delta = \Phi \Leftrightarrow \varphi(j) = \begin{cases} \delta(j), & \text{if } rand() < \kappa \\ 0, & \text{otherwise} \end{cases} \tag{28}$$

In Equations (27) and (28), ∆ = [*δ*(1), *δ*(2), . . . , *δ*(*n*)] and Φ = [*φ*(1), *φ*(2), . . . , *φ*(*n*)] are two temporary vectors used for the calculation. "⊕" means the mutation individual *π<sub>V</sub>* is obtained by adding Φ to *π<sub>a</sub>* (the best of the three selected individuals, denoted *π<sub>best</sub>*):

$$\pi_V = \pi_{best} \oplus \Phi \tag{29}$$

The generation process of *π<sup>V</sup>* is described as follows.

Step 1: Select *πa*, set *j* = 1.

Step 2: If *φ*(*j*) = 0, set *j* = *j* + 1 and go to step 3; otherwise, generate a random number in (0, 1). If *rand*() < *κ*, update *π<sub>a</sub>* by swapping the job at position *j* with the job *φ*(*j*); else, insert the job *φ*(*j*) into all other positions of *π<sub>a</sub>*, take the optimal solution, and update *π<sub>a</sub>*. Let *j* = *j* + 1.

Step 3: If *j* ≤ *n*, return to step 2; otherwise, return *π<sup>V</sup>* = *πa*.

The mutation procedure of DMA is sketched in Algorithm 2.
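The vector operations of Equations (27)-(29) can be sketched as follows. The random numbers are injected as arguments so the computation is reproducible, and only the swap branch of Step 2 is shown (the insertion branch would additionally need the makespan evaluator); the function names are ours.

```python
def diff(pb, pc):
    """Difference vector Delta of Eq. (27): keep pb(j) where the sequences differ."""
    return [b if b != c else 0 for b, c in zip(pb, pc)]

def scale(kappa, delta, rands):
    """Scaling Phi of Eq. (28): keep delta(j) when the matching random number
    falls below kappa. `rands` is injected to keep the example reproducible."""
    return [d if r < kappa else 0 for d, r in zip(delta, rands)]

def apply_phi(pa, phi, kappa, rand):
    """Sketch of the oplus step (Eq. 29) covering only the swap branch of
    Step 2; positions where rand() >= kappa are left unchanged here."""
    pa = list(pa)
    for j, job in enumerate(phi):
        if job and rand() < kappa:
            i = pa.index(job)
            pa[j], pa[i] = pa[i], pa[j]   # swap the job at position j with phi(j)
    return pa
```

Applied to the data of Example 4-1 below, `diff` and `scale` reproduce ∆ and Φ exactly; with both positions taking the swap branch, `apply_phi` yields [*J*<sub>6</sub>, *J*<sub>5</sub>, *J*<sub>4</sub>, *J*<sub>2</sub>, *J*<sub>1</sub>, *J*<sub>3</sub>], whereas the paper's example takes the insertion branch at *j* = 6.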


#### 4.3.2. Crossover

When solving discrete scheduling problems based on job sequences, a probability factor is usually used to determine the crossed jobs [38]. In this study, we improve the determination of the crossed jobs by eliminating the crossover probability factor and propose a random crossover operator. First, select two jobs randomly from the mutated individual *π<sub>V</sub>*; second, put these two jobs and the jobs between them into a temporary set *N<sub>temp</sub>*; third, determine the positions in *π<sub>d</sub>* of all jobs from *N<sub>temp</sub>*, remove these jobs from *π<sub>d</sub>* and keep their positions vacant. Finally, insert the jobs of *N<sub>temp</sub>* into *π<sub>d</sub>* in their original order to obtain a new sequence *π*′. The crossover process acting on *π<sub>V</sub>* is symmetric. When the crossover of the two parents is completed, the new individuals are evaluated and the better one is retained. The crossover operation is sketched in Algorithm 3.

#### **Algorithm 3 Crossover Operation**

**Input**: mutation individual *π<sub>V</sub>*, target individual *π<sub>d</sub>*, temporary job set *N<sub>temp</sub>* **Output**: new individual *π<sub>new</sub>*

01: # Crossover operation on *π<sub>d</sub>*
02: Randomly select two jobs from *π<sub>V</sub>*
03: Put the two jobs and the jobs between them into *N<sub>temp</sub>*
04: Remove the jobs of *N<sub>temp</sub>* from *π<sub>d</sub>*, keeping their positions vacant
05: Insert the jobs of *N<sub>temp</sub>* into the vacant positions in their original order to obtain *π*′
06: # Crossover operation on *π<sub>V</sub>*
07: Repeat lines 02-05 with the roles of *π<sub>V</sub>* and *π<sub>d</sub>* exchanged to obtain *π*″
08: # Selection
09: Evaluate new solutions:
10: **If** *f*(*π*″) < *f*(*π*′)
11: Let *π<sub>new</sub>* = *π*″, return *π<sub>new</sub>*
12: **Else** let *π<sub>new</sub>* = *π*′, return *π<sub>new</sub>*

To facilitate understanding, **Example 4-1** presents the procedure of DE operation in detail.

#### **Example 4-1**

**Mutation:** Three individuals are randomly selected from the initial population: *π<sub>a</sub>* = [*J*<sub>6</sub>, *J*<sub>3</sub>, *J*<sub>2</sub>, *J*<sub>4</sub>, *J*<sub>1</sub>, *J*<sub>5</sub>], *π<sub>b</sub>* = [*J*<sub>1</sub>, *J*<sub>4</sub>, *J*<sub>6</sub>, *J*<sub>2</sub>, *J*<sub>5</sub>, *J*<sub>3</sub>] and *π<sub>c</sub>* = [*J*<sub>3</sub>, *J*<sub>4</sub>, *J*<sub>2</sub>, *J*<sub>1</sub>, *J*<sub>5</sub>, *J*<sub>6</sub>]. Equation (27) yields ∆ = *π<sub>b</sub>* − *π<sub>c</sub>* = [*J*<sub>1</sub>, 0, *J*<sub>6</sub>, *J*<sub>2</sub>, 0, *J*<sub>3</sub>]. A set of random numbers [0.7, 0.6, 0.9, 0.4, 0.1, 0.3] is generated and, with the mutation scaling factor *κ* = 0.5, Equation (28) gives Φ = [0, 0, 0, *J*<sub>2</sub>, 0, *J*<sub>3</sub>]. Hence, for *j* = 4 and *j* = 6, we have *φ*(4) = *J*<sub>2</sub> and *φ*(6) = *J*<sub>3</sub>. For *j* = 4, a uniform random number 0.2 (<0.5) is generated in (0, 1), so the jobs *π<sub>a</sub>*(4) = *J*<sub>4</sub> and *φ*(4) = *J*<sub>2</sub> in *π<sub>a</sub>* are swapped, giving *π<sub>a</sub>* = [*J*<sub>6</sub>, *J*<sub>3</sub>, *J*<sub>4</sub>, *J*<sub>2</sub>, *J*<sub>1</sub>, *J*<sub>5</sub>]; for *j* = 6, a random number 0.7 (>0.5) is generated, so the job *φ*(6) = *J*<sub>3</sub> is inserted behind *π<sub>a</sub>*(6) = *J*<sub>5</sub>, giving *π<sub>a</sub>* = [*J*<sub>6</sub>, *J*<sub>4</sub>, *J*<sub>2</sub>, *J*<sub>1</sub>, *J*<sub>5</sub>, *J*<sub>3</sub>].

**Crossover**: The mutation individual *π<sub>V</sub>* = [*J*<sub>6</sub>, *J*<sub>4</sub>, *J*<sub>2</sub>, *J*<sub>1</sub>, *J*<sub>5</sub>, *J*<sub>3</sub>] and the target individual *π<sub>d</sub>* = [*J*<sub>5</sub>, *J*<sub>3</sub>, *J*<sub>2</sub>, *J*<sub>4</sub>, *J*<sub>6</sub>, *J*<sub>1</sub>] are crossed as parents. First, two jobs *J*<sub>2</sub> and *J*<sub>5</sub> are randomly selected from *π<sub>V</sub>*, and the set *N<sub>temp</sub>* = [*J*<sub>2</sub>, *J*<sub>1</sub>, *J*<sub>5</sub>] is formed. For *π<sub>d</sub>*, the jobs of *N<sub>temp</sub>* are removed, giving *π<sub>d</sub>* = [*X*, *J*<sub>3</sub>, *X*, *J*<sub>4</sub>, *J*<sub>6</sub>, *X*]; the new solution is obtained by reinserting *N<sub>temp</sub>* = [*J*<sub>2</sub>, *J*<sub>1</sub>, *J*<sub>5</sub>] into the vacant positions of *π<sub>d</sub>*. The derivation of the new solution for *π<sub>V</sub>* is analogous, and the result is *π<sub>V</sub>* = [*J*<sub>6</sub>, *J*<sub>4</sub>, *J*<sub>5</sub>, *J*<sub>2</sub>, *J*<sub>1</sub>, *J*<sub>3</sub>].
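The crossover mechanism of the example can be sketched as follows. The two boundary jobs are passed explicitly rather than drawn at random so the result is reproducible; the function name is an illustrative assumption.

```python
def block_crossover(donor, receiver, ja, jb):
    """Random crossover sketch: take the block spanned by jobs ja and jb
    (inclusive) from `donor` in donor order, then write its jobs back into
    the positions they occupy in `receiver`."""
    i, j = sorted((donor.index(ja), donor.index(jb)))
    block = donor[i:j + 1]                     # N_temp, in donor order
    members = set(block)
    it = iter(block)
    return [next(it) if job in members else job for job in receiver]
```

For the example's *π<sub>V</sub>* as donor and *π<sub>d</sub>* as receiver with jobs *J*<sub>2</sub> and *J*<sub>5</sub>, it returns [*J*<sub>2</sub>, *J*<sub>3</sub>, *J*<sub>1</sub>, *J*<sub>4</sub>, *J*<sub>6</sub>, *J*<sub>5</sub>].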

#### *4.4. Job Block-Based Random Reference Local Search*

As mentioned above, the two key operations of DE improve individuals through the vector differences of the population. As the algorithm iterates, the differences between individuals become smaller, which tends to trap the algorithm in local optima. Therefore, DMA needs to be equipped with a local search framework to enhance its exploitation capability. Reference local search (RLS) has long proved to be an effective local search algorithm and is often used to enhance the exploitation of metaheuristics [9]. RLS first generates a random reference sequence *π<sub>r</sub>* = [*π<sub>r</sub>*(1), *π<sub>r</sub>*(2), . . . , *π<sub>r</sub>*(*n<sub>r</sub>*)] and uses it to guide the direction of the local search; subsequently, the jobs *π<sub>r</sub>*(*j*) are sequentially removed and inserted into all remaining positions to obtain new solutions. The best of these is compared with the incumbent solution of the population and, if better, replaces it. The insertion process is repeated until all jobs are traversed. Though RLS has a strong local exploitation capability, it still has some problems. On the one hand, RLS uses single-job insertion, which may destroy the good information of incumbent solutions and lose other good solutions. On the other hand, the fixed order of the reference sequence and the fixed insertion process of the jobs result in a fixed local search path. If *π<sub>r</sub>*(*j*) stays constant for a long time, a large number of repeated searches occur, which directly affects the search efficiency of the algorithm.

Bożejko et al. [45] pointed out that, in job sequencing scheduling problems, compound moves (insertion and swap) based on job blocks can retain excellent sequence information during the evolution of an algorithm, expanding the neighborhood structure and search space beyond single-job insertion and swap operations. Inspired by this idea, we hybridize RLS with job-block compound moves and propose a random reference local search based on job blocks (BRRLS): first, generate a reference sequence *π<sub>r</sub>* = [*π<sub>r</sub>*(1), *π<sub>r</sub>*(2), . . . , *π<sub>r</sub>*(*n<sub>r</sub>*)] randomly, where *n<sub>r</sub>* represents the number of jobs to be rescheduled; second, select two jobs *J<sub>a</sub>* and *J<sub>b</sub>* randomly, construct the job block (containing *J<sub>a</sub>*, *J<sub>b</sub>* and the jobs between them) and remove its jobs from the individual *π* that needs local search; then, insert the job block into all possible positions of *π*, evaluate the generated solutions, and select the optimal one. Repeat the above procedure (each time selecting two previously unselected jobs) until all jobs are traversed. The BRRLS process is sketched in Algorithm 4.

#### **Algorithm 4 BRRLS Procedure**

**Input**: job set *Nn*, individual *π*, temporary set *Ntemp*, temporary set Λ **Output**: new individual *πnew*


01: Generate a random reference sequence *π<sub>r</sub>* = [*π<sub>r</sub>*(1), . . . , *π<sub>r</sub>*(*n<sub>r</sub>*)] of the jobs in *N<sub>n</sub>*
02: **While** unselected jobs remain in *π<sub>r</sub>* **Do**
03: Randomly select two previously unselected jobs *J<sub>a</sub>* and *J<sub>b</sub>*
04: Put *J<sub>a</sub>*, *J<sub>b</sub>* and the jobs between them into *N<sub>temp</sub>*
05: Remove the jobs of *N<sub>temp</sub>* from *π*
06: Insert the block *N<sub>temp</sub>* into all possible positions of *π* and add the resulting solutions to Λ
07: **End While**
08: Evaluate the individuals in Λ, return the optimal solution to *π<sub>new</sub>*
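The core block-insertion move can be sketched as follows; this is a minimal illustration of the neighborhood generation only (evaluation and acceptance are omitted), and the function name is our own.

```python
def block_insert_moves(perm, ja, jb):
    """Job-block neighbourhood sketch: remove the block spanned by jobs ja
    and jb (inclusive) and insert it at every possible position, returning
    all candidate sequences (the original ordering appears among them)."""
    i, j = sorted((perm.index(ja), perm.index(jb)))
    block, rest = perm[i:j + 1], perm[:i] + perm[j + 1:]
    return [rest[:p] + block + rest[p:] for p in range(len(rest) + 1)]
```

Because the block moves as a unit, good adjacent-job pairs survive the move, which is the property [45] attributes to compound moves.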

To further improve the algorithmic performance, a simulated annealing-like (SA) mechanism [46] is introduced as an acceptance criterion for BRRLS, which guides DMA to receive a certain percentage of poor solutions during the search to avoid being trapped in local optimums. The idea of SA is to compare the neighborhood solution *π* ′ obtained by BRRLS with the incumbent solution *π*. If *f*(*π* ′ ) is better than *f*(*π*), SA replaces *π* with *π* ′ , otherwise, the decision of whether to accept *π* ′ is based on a reception probability *µ*: A random number *rand*() satisfying a uniform distribution is generated between (0, 1); if *rand*() < *µ*, *π* is replaced by the worse neighborhood solution obtained by the search. *µ* is expressed as follows:

$$\mu = e^{-\left(f(\pi') - f(\pi)\right)/\text{Temp}}\tag{30}$$

where *Temp* represents the temperature constant:

$$Temp = T_0 \frac{\sum_{j=1}^{n_k} \sum_{i=1}^{m} P_{\pi_k(j),i}}{10 \times m \times n}, \quad k \in \{1, 2, \dots, f\} \tag{31}$$

In Equation (31), *T*<sup>0</sup> is the temperature adjustment parameter preset by SA. It can be seen from Equation (31) that the closer *f*(*π* ′ ) is to *f*(*π*), the closer the value *µ* is to 1 and *π* ′ will be accepted with a higher probability. Conversely, if *f*(*π* ′ ) is much worse than *f*(*π*), the value *µ* will be close to 0, and the solution *π* ′ will be dropped with a higher probability. Hence, the SA-based reception mechanism ensures that the population does not deviate from the current search position, but can additionally absorb a certain percentage of non-quality solutions to avoid the algorithm from falling into local optimums.
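A minimal sketch of this reception criterion follows; the uniform random number is passed in as an argument so the decision is reproducible, and the function name is illustrative.

```python
import math

def sa_accept(f_new, f_old, temp, rand):
    """SA-style reception criterion of Eq. (30): always accept an improving
    solution; accept a worse one when the uniform random number `rand` falls
    below mu = exp(-(f_new - f_old) / temp)."""
    if f_new < f_old:
        return True
    return rand < math.exp(-(f_new - f_old) / temp)
```

With `temp = 1.0`, a solution worse by 2 units is accepted with probability exp(−2) ≈ 0.135, matching the behavior described above: near-ties are usually accepted, strongly worse solutions rarely.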

#### *4.5. Update Strategy of the Population*

To maintain the diversity of individuals, the following strategy is applied to update the population: first, new individuals are generated using the mutation and crossover operators; subsequently, a local search is performed on these individuals and they are updated. Finally, the incumbent solutions are replaced by the newly generated solutions, and the uniqueness of the new solutions is ensured, completing a single iteration over the whole population.

#### *4.6. Flowchart of DMA*

According to previous descriptions, Algorithm 5 presents the flowchart of DMA.

The flow diagram of DMA is sketched in Figure 6. In general, DMA contains population initialization, DE operation, and local search. In the initialization phase, the population is generated using the WPNEH method; in the DE phase, the discrete differential mutation and crossover operators are executed to obtain the child individuals; in the local search phase, BRRLS is performed, and the simulated annealing mechanism is adopted as the reception criterion. Finally, the population is updated and the optimal solution is output.

#### **Algorithm 5 Flowchart of DMA**

**Input**: population size *Ps*, mutation scaling factor *κ*, reordered job number *n<sub>r</sub>*

**Output**: best solution *π*

01: Generate the initial population using WPNEH # Algorithm 1
02: Evaluate the population and record the best solution *π*
03: **While** the termination criterion is not met **Do**
04: **For** each target individual *π<sub>d</sub>* in the population **Do**
05: Generate the mutation individual *π<sub>V</sub>* # Algorithm 2
06: Perform crossover on *π<sub>V</sub>* and *π<sub>d</sub>* # Algorithm 3
07: Perform BRRLS on the new individual # Algorithm 4
08: Accept or reject the result by the SA criterion # Equations (30) and (31)
09: **End For**
10: Update the population, ensuring the uniqueness of new solutions
11: Update the best solution *π*
12: **End While**

**Figure 6.** Flow diagram of DMA.

#### **5. Experimental Comparison and Analysis**

#### *5.1. Experimental Settings*

Since little literature and few public benchmarks have been developed for DDBFSP, we apply DFOA to its benchmark [8] to obtain test instances. These test instances are used as the initial schedules for each distributed factory. To fulfill different experimental requirements, the variable intervals of the DPFSP benchmark are set as *n* ∈ {50, 100, 200}, *m* ∈ {5, 10, 20} and *f* ∈ {2, 3, 4}. There are 27 combinations of parameters, each containing 10 instances. The termination criterion of the algorithm is set to *T*<sub>max</sub> = 90 × *n* × *m* milliseconds.

The breakdown events are simulated according to the mechanism introduced in Section 2.2. When an event occurs on machine *i*, the trigger node is first limited to the interval [*P<sub>πk(l),i</sub>*, *C<sub>πk(l),i</sub>*] to ensure the timeliness of the breakdown. Since DMA acts only on the jobs that have not started processing, we compress the breakdown interval to [*P<sub>πk(l),i</sub>*, *C<sub>πk(nk−2),1</sub>*], i.e., from the start time of processing to the completion time of the penultimate job on the first machine, to ensure a feasible execution space for DMA.

The experiments are conducted on a PC with an Intel(R) Core(TM) i7-8700 CPU and 16 GB RAM, and all programs are implemented in Python. To balance the objective functions (makespan and stability), both weight coefficients *w*<sub>1</sub> and *w*<sub>2</sub> are set to 0.5. The algorithm is repeated 10 times for each breakdown in each factory, and the average relative percent deviation (ARPD) is used as the metric to evaluate the mean quality of the obtained solutions. Since DMA is implemented for each distributed factory, the sum of the ARPDs of all factories is first calculated, and the mean value is defined as the ARPD for a single DDBFSP case. The experiments are conducted from three perspectives: parameter calibration (Section 5.2), effectiveness of the algorithmic components (Section 5.3), and comparison with other intelligent algorithms (Section 5.4).
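The paper does not restate its exact ARPD formula in this section, so the sketch below uses the standard definition from the flowshop literature (e.g., as used alongside [9]); treating this as the paper's metric is an assumption.

```python
def arpd(makespans, best):
    """Conventional ARPD: mean over repetitions of the relative percent
    deviation 100 * (C - C_best) / C_best from the best known makespan.
    Standard literature definition, stated here as an assumption."""
    return sum(100.0 * (c - best) / best for c in makespans) / len(makespans)
```

For instance, two runs with makespans 110 and 120 against a best-known value of 100 give an ARPD of 15.0.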


#### *5.2. Parameter Calibration*

DMA contains three key parameters: population size *Ps*, mutation scaling factor *κ*, and temperature adjustment parameter *T*<sub>0</sub>. The design of experiments (DOE) [9] method is used for parameter calibration, and in total 36 sets of initial schedules with factory number *f* = 3 were generated. The number of random breakdowns for each distributed factory is set to 2. The key parameters and the ARPD values are defined as the control factors and response variables, respectively. The candidate values of the parameters are *Ps* = {30, 50, 80, 100}, *κ* = {0.4, 0.7, 0.9} and *T*<sub>0</sub> = {0.1, 0.2, 0.3, 0.4}, which generate 48 configuration combinations. We use analysis of variance (ANOVA) to analyze the statistical results, as shown in Table 1.


As can be seen in Table 1, the *p*-values of *Ps*, *κ* and *T*<sub>0</sub> are all below the 0.05 significance level, which means all three parameters have an important impact on the performance of DMA. Among them, the *F*-ratio of *κ* is the largest, indicating that *κ* has the greatest impact on DMA. Moreover, Table 1 shows that the *p*-values of the pairwise parameter interactions are greater than 0.05, which means the interactions do not have a significant effect on DMA, and the parameters can be selected directly from the main effects plot in Figure 7.

**Figure 7.** Main effects plot of the parameters for DMA.

From Figure 7, it can be observed that the performance of DMA decreases with the increment of *κ*, and DMA obtains the best results when *κ* = 0.4. A relatively overlarge mutation scaling factor increases the randomness of the search and degrades the mutation. The effect of *Ps* ranks second: the performance of DMA first improves as *Ps* increases and starts to decrease after reaching the optimum. This indicates that increasing *Ps* appropriately enhances the diversity of the population; however, an overlarge *Ps* consumes too much running time per iteration, which compresses the number of iterations and lowers the probability of obtaining the optimal solution. The effect of *T*<sub>0</sub> on the algorithmic performance ranks third; the main effects plot shows that the performance fluctuates as *T*<sub>0</sub> grows, and DMA obtains the best results when *T*<sub>0</sub> = 0.4. Based on the above analysis, the parameters of DMA are set as *Ps* = 80, *κ* = 0.4, *T*<sub>0</sub> = 0.4.

#### *5.3. Effectiveness of the Proposed Algorithmic Component*

DMA contains three important components: WPNEH initialization, DE operators, and BRRLS. To verify their effectiveness, we mask one corresponding part of DMA each time, and generate three types of variant algorithms from DMA: (1) DMA\_RI: random initialization, which is used to verify the effectiveness of WPNEH initialization; (2) DMA\_ND: without neighborhood search, which is used to verify the effectiveness of DE operators; (3) DMA\_NL: without BRRLS, which is used to verify the effectiveness of BRRLS. The parameter settings and instances used in Section 5.2 were adopted. As the randomness of breakdown has a large impact on the results, to ensure the fairness of the comparison, only one complete set of breakdowns is simulated, and all variant algorithms are tested under this scenario. Table 2 shows the comparison results.


**Table 2.** Comparison results between DMA and its variant algorithms.

From Table 2, we can observe that DMA outperforms the other variants in all scenarios. Specifically, DMA outperforms DMA\_RI, showing that the WPNEH initialization strategy provides a better search starting point for DMA; DMA outperforms DMA\_ND, indicating that the DE operators effectively improve the performance of DMA. The results of DMA\_NL are inferior to those of DMA, which means the proposed BRRLS and the SA-based reception criterion have an important influence on the optimization. BRRLS retains the "greedy search" idea of RLS but improves the selection of jobs in the reference sequence; the random selection of job blocks varies the local search step length, which makes the solutions obtained by the local search more diverse and helps DMA jump out of local optima. On the other hand, Table 2 shows that the differences between the compared algorithms are not significant when the instance scale is relatively small (e.g., *n* = 50). If few processing tasks (jobs) are assigned to a distributed factory, the corresponding reorder execution space is also small, and the compared algorithms are more likely to obtain optimal or near-optimal solutions in the given time. As the problem size grows, the differences between the algorithms start to manifest: the performance of each variant degrades more on large-scale instances, while DMA remains relatively stable.

To verify whether the differences are significant, we conducted a statistical significance test, with the algorithm as the control variable and the ARPD value as the response variable. Figure 8 shows the mean plot with 95% confidence intervals. As can be seen, the ARPD values of the algorithms are ranked from top to bottom as DMA\_ND, DMA\_RI, DMA\_NL, and DMA. The confidence intervals of no two algorithms overlap, which indicates that DMA is significantly better than its variants. The experimental results reveal that the proposed optimization strategies for each search phase jointly ensure the performance of DMA.


**Figure 8.** Mean plots with 95% confidence intervals for DMA and its variant algorithms.

#### *5.4. Comparison with Other Intelligent Algorithms*

In this section, DMA is compared with algorithms for solving traditional flowshop rescheduling problems: (1) iterative local insertion search (ILS) [25]; (2) iterated greedy algorithm (IG) [25]; (3) improved migrating birds optimization (IMBO) [47]; (4) the discrete teaching-learning-based optimization (DTLO) algorithm [29]. We strictly follow the literature to implement all the comparison algorithms, which share the same data structures, objective function, and termination criterion. To ensure fairness, we apply the same rescheduling process, i.e., the comparison algorithms share the same initial schedules, the same time-node distribution of breakdown events, and the same rescheduling strategy; the algorithms are only used to reorder unprocessed jobs as a way to test their optimization efficiency. The calculation of ARPD is consistent with the previous sections. The parameter combinations of the comparison algorithms were pre-tuned, and the specific settings are shown in Table 3. According to [25], as a single local search algorithm, ILS stops immediately after reaching a local optimum, so no special parameters or termination conditions need to be set for it.


**Table 3.** Parameter determination of the compared algorithms.

Table 4 shows the statistical results of different instances under a single breakdown (*β* = 1); the optimal values are bolded. It can be seen that DMA outperforms the other algorithms in all instances of the distributed scenarios, with DTLO, IG, IMBO, and ILS in 2nd, 3rd, 4th, and 5th place, which indicates that DMA is more suitable as a reordering algorithm. Specifically, ILS contains only a local insertion search framework and lacks key structures such as population initialization and neighborhood search; it can hardly balance exploration and exploitation and therefore performs worst. As metaheuristics, DTLO, IG, and IMBO are more competitive on small-scale instances. For example, when the number of distributed factories is *f* = 4 and the total number of jobs is *n* = 50, their results differ little from those of DMA. The main reason is that when the initial schedule is small, fewer processing tasks are assigned to each distributed factory; the search space shrinks, so each algorithm finds the best solutions within the given time more easily, reducing the variability of the comparison results. The same phenomenon appears for *β* = 2 and *β* = 3, as shown in Tables 5 and 6, where DMA also performs best among all compared algorithms. As the number of breakdowns increases, the performance of the compared algorithms on small-scale instances is not affected much, while the performance on large-scale instances (*n* = 200) degrades to different degrees: the errors of single rescheduling accumulate with the number of breakdowns, causing a deterioration effect that lowers algorithmic performance. In comparison, the statistical results of DMA under different breakdown scenarios differ little, which again shows that DMA is more stable and robust on both small- and large-scale instances.

**Table 4.** Comparison results of different algorithms by *β* = 1.


To further verify the superiority of DMA, the differences between the compared algorithms are observed through statistical tests. ANOVA is used to describe the mean plot with a 95% confidence interval of the results obtained by the algorithms for different *f*, as shown in Figure 9.

It can be observed that the ARPD of DMA falls below that of other algorithms, and none of the confidence intervals overlap. It indicates again that the optimization performance of DMA is significantly better than its comparators. Moreover, it can be seen from Figure 9 that the performance of each algorithm gradually becomes better as *f* increases.

This is mainly because the reordering algorithm does not focus on the assignment of jobs to plants in the initial schedule, rather on the reordering within the distributed factories. An increase in *f* leads to a decrease in assigned jobs and a corresponding decrease in their computational complexity and, therefore, an increase in the optimization-seeking efficiency of an algorithm.

Moreover, we compared and analyzed the performances under different numbers of breakdowns (*β*) based on the statistical results. Figure 10 shows the performance curves of the comparative algorithms. It can be seen that ARPD values increase as *β* gradually increases, representing that all the algorithms are affected by the frequency of breakdowns. Compared with other algorithms, ARPDs of DMA are improved by at least 88%. Moreover, ARPD values of ILS, IG, IMBO, and DTLO algorithms fluctuate more and show an increasing trend with the increase in *β*. Comparatively, ARPD values of DMA fluctuate the least, which indicates that the robustness of DMA under different scenarios is better than the compared algorithms.

In summary, DMA has a good performance for local reordering. Its innovation and advantages can be summarized as follows: (1) WPNEH initialization method provides a better initial population and high-quality search starting point for DMA; (2) the mutation and crossover operators based on DE provides an excellent neighborhood search capability; (3) BRRLS provides stronger local exploitation; (4) the SA-based reception criterion can help DMA jump out of the local optimum effectively. Among them, (2), (3) and (4) balance the exploration and exploitation of DMA.


**Table 5.** Comparison results of different algorithms by *β* = 2.


**Table 6.** Comparison results of different algorithms by *β* = 3.

**Figure 9.** Mean plot with 95% confidence interval of the compared algorithms on different scenarios.


**Figure 10.** Performance comparison with different breakdown numbers.

#### **6. Conclusions**

Building rescheduling optimization models and designing effective optimization methods that fit the characteristics of distributed manufacturing are significant for advancing the dynamic scheduling theory of distributed manufacturing. This study investigated a rescheduling strategy and algorithm for the DDBFSP, in which machine breakdown events are considered as the disruption at the manufacturing site. First, the mathematical model of the DDBFSP, including the event simulation mechanism, is constructed, with makespan and stability as the objectives. The goal of this study is to optimize both objectives when a stochastic breakdown occurs in any of the distributed factories. We apply an "event-driven" policy in response to the disruption and propose a two-stage "predictive-reactive" rescheduling strategy. In the first stage, a static environment (DBFSP) without machine breakdowns is considered, and global initial schedules are generated; in the second stage, after a machine breakdown occurs, the initial schedule is locally optimized by a hybrid repair policy based on "right-shift repair + local reorder", and the DE-based DMA reordering algorithm is proposed for the local reorder operation. For DMA, a WPNEH initialization method is designed to generate a high-quality population. In the neighborhood search phase, DE is embedded to improve the neighborhood structure and expand the target space through mutation and crossover operators; in the local search phase, the BRRLS framework is proposed to perturb high-quality solutions. To maintain diversity, BRRLS is combined with the SA mechanism to accept worse solutions with a certain probability. To obtain the best performance of DMA, the DOE method is used to calibrate three key parameters. The effectiveness of the proposed optimization strategy for DMA is verified through comparative experiments. Finally, DMA is compared with other algorithms on different test instances, and statistical analysis using ANOVA confirms its superiority.

Although the proposed rescheduling strategy has proven effective, shortcomings remain. In this study, we considered only the breakdown event as the disruption, whereas real-life manufacturing suffers from far more than one type of disruption. Other common disruptions, such as job cancellations, and their interaction mechanisms should be investigated in depth. Therefore, future work will concentrate on constructing a more refined model that can manage several disruptions simultaneously.

This study explores the dynamic scheduling problem from the perspective of operational research optimization. With the development of Industry 4.0 networks and big data, other artificial intelligence technologies play increasingly important roles in smart manufacturing. Combining data-driven technology with intelligent algorithms could exploit their respective advantages and create more advanced optimization frameworks. For example, intelligent optimization can provide a large amount of historical scheduling data, which can be aggregated with other industrial information as a sample source for data-driven machine learning. The scheduling decision-making function could then be deployed hierarchically and decoupled according to different scenarios and environments, making rational use of computing resources and improving the flexibility and stability of the system.

**Author Contributions:** Conceptualization, Z.L.; methodology, X.Z.; software, Y.H.; validation, Z.L., X.Z. and M.R.; formal analysis, G.K.; investigation, R.S.; resources, G.K.; data curation, X.Z.; writing original draft preparation, X.Z. and Y.H.; writing—review and editing, Z.L.; visualization, M.R.; supervision, G.K.; project administration, R.S.; funding acquisition, Z.L. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research is funded by the Natural Science Foundation of Xuzhou, China (KC21070), National Natural Science Foundation of China (61803192), and the Narodowego Centrum Nauki, Poland (No. 2020/37/K/ST8/02748 & No. 2017/25/B/ST8/00962).

**Data Availability Statement:** All data can be requested from the corresponding author.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


*Article* **Coal Thickness Prediction Method Based on VMD and LSTM**

**Yaping Huang 1, \* , Lei Yan 1 , Yan Cheng 2 , Xuemei Qi 1, \* and Zhixiong Li 3**


**Abstract:** Changes in coal seam thickness have an important influence on safe and efficient coal mining, so it is essential to predict coal thickness accurately. However, the accuracy of borehole interpolation and of the BP neural network is not high. Variational mode decomposition (VMD) has a strong denoising ability, and the long short-term memory neural network (LSTM) is especially suitable for predicting complex sequences. This paper presents a coal thickness prediction method combining VMD and LSTM. Firstly, empirical mode decomposition (EMD) and VMD are used to denoise simple signals, and the denoising advantage of the VMD method is verified. Then, a wedge-shaped coal thickness model is constructed, and seismic forward modeling and analysis are carried out. The results show that coal thickness prediction based on seismic attributes is feasible. After VMD denoising of the original 3D seismic data, VMD-LSTM is used to predict coal thickness, and the results are compared with those of the traditional BP neural network. The proposed method achieves high accuracy and basically coincides with the coal seam information exposed by existing boreholes: the minimum absolute error of the predicted coal thickness is only 0.08 m, and the maximum absolute error is 0.48 m. This indicates that VMD-LSTM predicts coal thickness with high accuracy and can serve as a new method for coal thickness prediction.

**Keywords:** VMD; LSTM; coal thickness; seismic attribute

#### **1. Introduction**

In the construction and production of large-scale mines, every aspect of safe coal production requires an accurate account of changes in coal thickness [1]. When the coal thickness of a working face changes dramatically, the reserves per unit length of the face change greatly; if the same mining speed is maintained, the monthly coal output becomes unbalanced and mine production is strongly affected. According to statistics, when the actual coal thickness is 10–20% thinner than the designed thickness, coal output decreases by 35–40% [2]. In general, the thicker the coal seam, the more gas is formed during coalification and the greater the probability of gas leakage during mining. The degree of impact risk is also closely related to coal thickness and its changes: numerous field observations and in situ stress measurements have found that in areas of changing coal thickness, the in situ stress field is anomalous and stress concentration is high, which easily induces strong mine earthquakes and rock bursts [3]. Therefore, coal thickness is essential data in coal mine design and mining [4]. Accurate coal thickness prediction can not only improve the economic benefit of coal mines but also provide a strong geological guarantee for safe mining [5].

**Citation:** Huang, Y.; Yan, L.; Cheng, Y.; Qi, X.; Li, Z. Coal Thickness Prediction Method Based on VMD and LSTM. *Electronics* **2022**, *11*, 232. https://doi.org/10.3390/electronics11020232

Academic Editors: Darius Andriukaitis, Yongjun Pan and Peter Brida

Received: 8 December 2021; Accepted: 10 January 2022; Published: 12 January 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

With the development of coal mining, seismic exploration of coal fields must not only identify the structures in the mining area but also characterize changes in coal seam thickness [6]. However, because most coal seams are thin layers, their vertical seismic resolution is very low; they are difficult to distinguish in seismic records, which cannot meet the requirements for solving coal thickness, so quantitative prediction of coal thickness has long been a recognized problem in geophysics. The traditional method compares and interpolates borehole data, and the accuracy of the coal thickness it provides is limited. In particular, when the coal seam is missing, stripped, or bifurcated, the prediction error of coal thickness is large [7].

In recent years, scholars have conducted extensive research on coal thickness prediction and achieved significant progress. Some use seismic attributes to predict coal thickness with good results. Zou et al. [8] used the staggered-grid finite difference method to establish a wedge model for studying coal thickness; the results show that when the coal seam is thin, the amplitude attribute correlates with coal thickness, and this relationship can be used to predict coal thickness qualitatively. Guo et al. [9] analyzed the correlation between seismic attributes at boreholes and coal thickness; they obtained regression equations between optimally selected seismic attributes and coal seam thickness and achieved good application results in coal thickness prediction. Lin et al. [10] combined logging data with 3D seismic data and their interpretation results to conduct wave impedance inversion under multi-well constraints; the wave impedance layer corresponding to the coal seam was determined through horizon calibration, its thickness was extracted and matched with the coal thickness exposed by boreholes, and the distribution of coal thickness in the 3D seismic survey area was obtained. Other scholars have introduced geostatistics into coal thickness prediction. Du et al. [11] used the cokriging method to predict coal thickness, taking the drilled coal seam thickness and seismic amplitude as regionalized variables, which reduced the prediction error and significantly improved accuracy. Cheng et al. [12] combined the variogram and the kriging method to fully exploit both well data and seismic attribute data; they assigned weight coefficients to the two data types and computed a weighted average to estimate coal thickness. The in-seam wave is a guided wave propagating within the coal seam. It has the advantages of long propagation distance, rich information about the coal seam, roof, and floor, strong energy, and obvious dispersion characteristics, so it can be used to study coal thickness. Liu et al. [13] found that the periodicity of the refracted P-wave within the in-seam wave is proportional to coal thickness, so coal thickness can be predicted from the refracted P-wave period. Li et al. [14] applied transmission-mode Love-wave CT tomography at the mining face, selecting appropriate acquisition frequencies according to the dispersion curve and the negative correlation between wave velocity and thickness; combined with the coal thickness exposed by tunnels, the wave velocity contour map was converted into a coal thickness contour map, and this quantitative prediction method proved highly accurate. With the development of artificial intelligence, more and more AI technology is being applied to coal thickness prediction. The maturing of deep neural network technology has prompted mining researchers to study data-driven geological prediction methods [15,16]. Sun et al. [17] used non-linear BP neural network technology to predict coal thickness by extracting seismic attributes in the wavelet domain of different frequency bands according to the varying characteristics of coal thickness. Wu et al. [18] combined the least squares support vector machine (LSSVM) with the kriging method to predict coal thickness; they used the strong non-linear fitting and generalization capabilities of LSSVM to adaptively fit the experimental variogram, overcoming the difficulty of solving the traditional variogram and its strong subjectivity and greatly improving prediction accuracy. Guo et al.
[19] proposed a coal thickness prediction method combining a genetic algorithm (GA) with the simultaneous iterative reconstruction technique (SIRT), which improved the accuracy of coal thickness inversion. However, the above methods basically predict coal thickness directly from seismic data, with limited discussion of the practical reality that seismic data usually contain noise. The substantial noise in seismic data is a practical problem that must be faced when predicting coal seam thickness.

Variational mode decomposition (VMD) is a non-recursive modal decomposition method proposed by Dragomiretskiy in 2014 [20]. The VMD algorithm decomposes the signal according to the center frequency, divides the signal around a certain center frequency into one component, and repeats this process to achieve the decomposition of the original signal. When VMD is used to process non-stationary signals, it can effectively avoid the mode-aliasing effect and the endpoint effect caused by the empirical mode decomposition (EMD) algorithm, and the intrinsic mode function (IMF) is redefined as an amplitude modulation–frequency modulation signal. It has a solid mathematical basis. More importantly, the VMD method can effectively attenuate a large amount of random noise contained in seismic data. The long short-term memory neural network (LSTM) was proposed by Hochreiter et al. in 1997 and further improved in the following years [21]. LSTM is designed to learn long-term memory, and the model can overcome the inherent problems of the recurrent neural network (RNN), namely "vanishing gradients" and "exploding gradients". LSTM is one of the most successful recurrent neural networks, and its prediction accuracy is high.

Based on the good denoising ability of VMD and the good predictive ability of LSTM for complex sequences, this paper proposes a VMD-LSTM method for coal thickness prediction. Firstly, the denoising effect of the VMD method is verified. Then, a coal thickness wedge model is used to show that seismic attributes can predict coal thickness. Finally, the VMD-LSTM method is applied to coal thickness prediction on actual seismic data. The prediction results have high accuracy and basically coincide with the coal seam information exposed by existing boreholes, which shows that the proposed method can serve as a new approach to coal thickness prediction.

#### **2. Basic Principle of VMD and LSTM for Predicting Coal Thickness**

#### *2.1. Basic Principles of VMD*

In the late 1990s, Huang introduced the EMD algorithm, which is widely used today to recursively decompose a signal into modes of unknown but separate spectral bands [22]. However, because it lacks a rigorous mathematical foundation, its robustness is low, which leaves room for development and improvement. VMD is an adaptive, non-recursive decomposition method that decomposes a signal into the sum of a finite number of component signals [23]. It is a decomposition algorithm based on the Wiener filter, the Hilbert transform, and heterodyne demodulation, and it has a solid mathematical and theoretical foundation. VMD decomposes the signal according to center frequency: it gathers the signal around a certain center frequency into one component and repeats this process until the original signal is fully decomposed [24]. When VMD is used to process non-stationary signals, it effectively avoids the mode-aliasing and endpoint effects of the EMD algorithm, and the IMF is redefined as an amplitude-modulated–frequency-modulated signal.

#### 2.1.1. Construction of Variational Problems

The goal of VMD is to decompose a real-valued input signal into a discrete number of IMFs. The IMF expression is as follows:

$$u\_k(t) = A\_k(t)\cos(\varphi\_k(t)) \tag{1}$$

In Equation (1), $A_k(t)$ is the non-negative envelope, and $\varphi_k(t)$ is the phase of the signal.

For each mode, compute the associated analytic signal by means of the Hilbert transform in order to obtain a unilateral frequency spectrum:

$$\left[\delta(t) + \frac{j}{\pi t}\right] * u\_k(t) \tag{2}$$

In Equation (2), *δ*(*t*) is a Dirac delta function, *j* is an imaginary unit, and ∗ is the convolution symbol.

For each mode, shift the mode's frequency spectrum to "baseband" by mixing with an exponential tuned to the respective estimated center frequency.

$$\left[\left(\delta(t) + \frac{j}{\pi t}\right) * u\_k(t)\right] e^{-jw\_k t} \tag{3}$$

In Equation (3), $e^{-jw_k t}$ is the complex exponential tuned to the estimated center frequency $w_k$.

According to the estimated bandwidth of IMF, the variational constraint equation is obtained:

$$\min\_{\{u\_k\},\{w\_k\}} \left\{ \sum\_{k} \left\| \partial\_t \left[ \left(\delta(t) + \frac{j}{\pi t}\right) * u\_k(t) \right] e^{-jw\_k t} \right\|\_2^2 \right\} \quad \text{s.t.} \quad \sum\_{k} u\_k = f \tag{4}$$

Here, $\{u_k\} := \{u_1, u_2, \dots, u_K\}$ and $\{w_k\} := \{w_1, w_2, \dots, w_K\}$ are shorthand notations for the set of all modes and their center frequencies, respectively, and $\sum_k := \sum_{k=1}^{K}$ is understood as the summation over all modes.

#### 2.1.2. Solution of Variational Problem

To obtain the optimal solution, the augmented Lagrangian expression is formed by introducing a quadratic penalty factor *α* and a Lagrange multiplier *γ*. The purpose of *α* is to ensure signal reconstruction accuracy in the presence of noise, and the purpose of *γ* is to enforce the constraint strictly during the solution process.

$$L(\{u\_k\},\{w\_k\},\gamma) = \alpha \sum\_{k=1}^{K} \left\| \partial\_t \left[ \left(\delta(t) + \frac{j}{\pi t}\right) * u\_k(t) \right] e^{-jw\_k t} \right\|\_2^2 + \left\| f(t) - \sum\_{k=1}^{K} u\_k(t) \right\|\_2^2 + \left\langle \gamma(t),\ f(t) - \sum\_{k=1}^{K} u\_k(t) \right\rangle \tag{5}$$

The alternating direction method of multipliers (ADMM) is used to solve the formula. In order to achieve the purpose of complete signal decomposition, the central frequency of each IMF is divided according to the frequency domain characteristics of the original signal, and then the central frequency of IMF is updated. The updated IMF expression is as follows:

$$u\_k^{n+1} = \underset{u\_k}{\operatorname{argmin}} \left\{ \alpha \left\| \partial\_t \left[ \left(\delta(t) + \frac{j}{\pi t}\right) * u\_k(t) \right] e^{-jw\_k t} \right\|\_2^2 + \left\| f(t) - \sum\_{i} u\_i(t) + \frac{\gamma(t)}{2} \right\|\_2^2 \right\} \tag{6}$$

Fourier transform is used to transform the formula from the time domain to the frequency domain, and then the non-negative frequency domain interval is integrated to obtain the optimal solution. Among them, the frequency domain calculation criterion expression of each IMF is as follows:

$$\hat{u}\_k^{n+1}(\omega) = \frac{\hat{f}(\omega) - \sum\_{i \neq k} \hat{u}\_i(\omega) + \frac{\hat{\gamma}(\omega)}{2}}{1 + 2\alpha(\omega - \omega\_k)^2} \tag{7}$$

The updated iteration formula of IMF center frequency is:

$$\omega\_k^{n+1} = \frac{\int\_0^{\infty} \omega \left| \hat{u}\_k(\omega) \right|^2 \mathrm{d}\omega}{\int\_0^{\infty} \left| \hat{u}\_k(\omega) \right|^2 \mathrm{d}\omega} \tag{8}$$

In Equations (7) and (8), $\hat{u}_k^{n+1}(\omega)$ is equivalent to a Wiener filter applied to the residual $\hat{f}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega)$, and $\omega_k^{n+1}$ is the center of gravity of the power spectrum of the corresponding mode.
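The update cycle of Equations (7) and (8) can be sketched in a few lines of NumPy. This is a simplified illustration that omits the Lagrange multiplier term and the usual boundary mirroring; `vmd_sketch` and its parameter values are assumptions for illustration, not the reference implementation:

```python
import numpy as np

def vmd_sketch(f, K=2, alpha=2000.0, n_iter=200):
    """Minimal VMD update loop following Eqs. (7) and (8).
    The Lagrange-multiplier term and boundary mirroring are omitted
    for brevity, so this is a sketch rather than a full VMD."""
    T = len(f)
    freqs = np.fft.fftfreq(T)                  # normalized frequency axis
    f_hat = np.fft.fft(f)
    u_hat = np.zeros((K, T), dtype=complex)    # mode spectra
    omega = np.linspace(0.05, 0.25, K)         # initial center frequencies
    half = slice(0, T // 2)                    # non-negative frequencies
    for _ in range(n_iter):
        for k in range(K):
            # Eq. (7): Wiener-filter update against the other modes' residual
            residual = f_hat - u_hat.sum(axis=0) + u_hat[k]
            u_hat[k] = residual / (1.0 + 2.0 * alpha * (freqs - omega[k]) ** 2)
            # Eq. (8): center of gravity of the mode's power spectrum
            power = np.abs(u_hat[k][half]) ** 2
            omega[k] = np.sum(freqs[half] * power) / (np.sum(power) + 1e-12)
    modes = np.real(np.fft.ifft(u_hat, axis=1))
    return modes, omega

# Two bin-aligned tones; the recovered center frequencies should settle
# near 32/512 and 100/512 cycles per sample.
t = np.arange(512)
f1, f2 = 32 / 512, 100 / 512
sig = np.cos(2 * np.pi * f1 * t) + 0.5 * np.cos(2 * np.pi * f2 * t)
modes, omega = vmd_sketch(sig, K=2)
```

On this two-tone input, each mode's center frequency is pulled toward the nearest tone, which is exactly the "divide the signal around a center frequency" behavior described above.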

#### *2.2. Basic Principles of LSTM*

Hochreiter et al. proposed LSTM on the basis of the RNN. An RNN takes sequence data as model input, learns how a series of changes affects the final output, and optimizes the network by modifying the weight matrix with the computed gradients, thereby learning from time series. However, RNNs are problematic in practice: during training, they suffer from the well-documented "vanishing gradients" and "exploding gradients" issues [25]. These difficulties become pronounced when the dependencies in the target subsequence span a large number of samples, requiring the window of the unrolled RNN to be commensurately wide to capture such long-range dependencies [26]. LSTM largely solves the vanishing and exploding gradient problems and can learn efficiently through memory cells and "gates" that retain useful information over long horizons. The structure of an LSTM is similar to a traditional neural network, with one input layer, one or more hidden layers, and one output layer. The hidden layer contains many neurons called memory cells, each of which has three "gates" that maintain and adjust the state of the memory cell: the forget gate, the input gate, and the output gate. The essence of an LSTM gate is a fully connected layer: the input vector *x* is multiplied by a weight matrix *W*, a bias *b* is added, and the result is passed through an activation function, yielding a real-valued vector with entries in (0, 1). The calculation formula is as follows:

$$g(x) = \sigma(Wx + b) \tag{9}$$

The activation function of the gates in the hidden layer of LSTM is the sigmoid function, whose range is (0, 1). The output of a gate is multiplied elementwise by the input vector to determine how much information passes. When the gate output is 0, the product is 0 and no information passes; when the output is 1, the input vector passes unchanged and the gate is fully open. Because the sigmoid never reaches exactly 0 or 1, the gate is in practice always partially open.

LSTM controls the unit state *c* with the forget gate and the input gate. The forget gate determines how much of the previous unit state $c_{t-1}$ is retained in the current state $c_t$, and the input gate determines how much of the current network input $x_t$ is saved into $c_t$. The output gate controls how much of the unit state $c_t$ contributes to the current LSTM output $h_t$.

Updating a cell can be broken down into the following steps [24].

(1) Calculate the value of the forget gate:

$$f\_t = \sigma(W\_f \cdot [h\_{t-1}, x\_t] + b\_f) \tag{10}$$

In Equation (10), $W_f$ is the weight matrix of the forget gate, $[h_{t-1}, x_t]$ denotes the concatenation of the two vectors into one longer vector, $b_f$ is the offset of the forget gate, and $\sigma$ is the sigmoid function.

(2) Calculate the value of the input gate:

$$i\_t = \sigma(W\_i \cdot [h\_{t-1}, x\_t] + b\_i) \tag{11}$$

In Equation (11), $W_i$ is the weight matrix of the input gate, and $b_i$ is the offset of the input gate.

(3) Calculate the candidate state $\tilde{c}_t$, which describes the unit state implied by the current input and is computed from the previous output and the current input:

$$\tilde{c}\_t = \tanh(W\_c \cdot [h\_{t-1}, x\_t] + b\_c) \tag{12}$$

(4) Calculate the cell state at the current time $c_t$:

$$c\_t = f\_t \odot c\_{t-1} + i\_t \odot \tilde{c}\_t \tag{13}$$

In Equation (13), ⊙ denotes element-wise multiplication, and $c_t$ combines the current and long-term memories.

(5) Calculate the value of the output gate:

$$o\_t = \sigma(W\_o \cdot [h\_{t-1}, x\_t] + b\_o) \tag{14}$$

In Equation (14), $W_o$ is the weight matrix of the output gate, and $b_o$ is the offset of the output gate.

(6) Calculate the current cell output of LSTM:

$$h\_t = o\_t \odot \tanh(c\_t) \tag{15}$$
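Equations (10)–(15) amount to the following single-cell forward pass, sketched here in NumPy with random placeholder weights (the names `lstm_cell`, `W`, and `b` are illustrative, not a trained model):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x_t, h_prev, c_prev, W, b):
    """One LSTM cell update following Eqs. (10)-(15); each weight matrix
    maps the concatenated vector [h_{t-1}, x_t] to one gate."""
    z = np.concatenate([h_prev, x_t])            # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])           # forget gate, Eq. (10)
    i_t = sigmoid(W["i"] @ z + b["i"])           # input gate, Eq. (11)
    c_tilde = np.tanh(W["c"] @ z + b["c"])       # candidate state, Eq. (12)
    c_t = f_t * c_prev + i_t * c_tilde           # new cell state, Eq. (13)
    o_t = sigmoid(W["o"] @ z + b["o"])           # output gate, Eq. (14)
    h_t = o_t * np.tanh(c_t)                     # cell output, Eq. (15)
    return h_t, c_t

rng = np.random.default_rng(0)
n_in, n_hidden = 3, 4
W = {k: rng.standard_normal((n_hidden, n_hidden + n_in)) for k in "fico"}
b = {k: np.zeros(n_hidden) for k in "fico"}
h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for _ in range(5):                               # run a short input sequence
    h, c = lstm_cell(rng.standard_normal(n_in), h, c, W, b)
```

Because the output gate lies in (0, 1) and tanh in (−1, 1), every entry of *h* stays strictly inside (−1, 1) regardless of the cell state magnitude.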

#### *2.3. Coal Thickness Prediction Process of VMD-LSTM*

Firstly, the wedge model of coal thickness is constructed and simulated by seismic forward modeling; sensitive seismic attributes of coal thickness, such as instantaneous amplitude, instantaneous frequency, and relative wave impedance, are extracted, and the feasibility of predicting coal thickness from seismic attributes is analyzed. Secondly, the seismic data are decomposed by the VMD method into IMFs with different characteristics and frequencies, the correlation coefficients between the seismic attributes of the IMFs and coal thickness are calculated, and the appropriate IMFs are selected so as to remove random noise effectively and highlight the effective signal. Finally, the LSTM network is used to predict coal thickness. The specific flow chart is shown in Figure 1.
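The IMF selection step, keeping only modes that correlate strongly with a reference trace, might be sketched as follows with Pearson correlation on synthetic stand-in traces; the 0.9 threshold and the trace names are assumptions for illustration:

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation coefficient between two traces."""
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

rng = np.random.default_rng(1)
reference = np.sin(np.linspace(0, 8 * np.pi, 400))      # stand-in clean trace
imf_good = reference + 0.05 * rng.standard_normal(400)  # signal-dominated IMF
imf_noise = rng.standard_normal(400)                    # noise-dominated IMF

# Keep only IMFs whose correlation with the reference exceeds the threshold.
selected = [imf for imf in (imf_good, imf_noise)
            if pearson(imf, reference) > 0.9]
```

A signal-dominated IMF passes the threshold while a noise-dominated one is discarded, mirroring the selection described above.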

**Figure 1.** Flow chart of coal thickness prediction by VMD and LSTM.

#### **3. Numerical Calculation**

#### *3.1. Simple Signal Test*

Formula (16) is adopted to synthesize the signal, and Gaussian white noise with a signal-to-noise ratio (SNR) of 30 dB is added to form a noisy signal (Figure 2). The EMD and VMD methods are then used to denoise this simple signal to test their denoising ability. The denoising results are shown in Figures 3 and 4. Figure 3 shows an obvious mode-aliasing problem in the EMD decomposition, which cannot effectively separate the random noise in the signal. Figure 4 shows that VMD decomposes the random noise much better: IMF1 corresponds to the main components of the signal, and its correlation coefficient with the noise-free signal reaches 0.9998. The result shows that VMD can be used for signal denoising with good effect.

$$\begin{cases} y\_1 = \cos(30\pi t) \\ y\_2 = 0.5\sin(30\pi t) \\ y = y\_1 + y\_2 \end{cases} \tag{16}$$
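A minimal sketch of this test setup, synthesizing the signal of Equation (16) and adding Gaussian white noise at a 30 dB SNR (the sampling grid and the helper name `add_awgn` are assumptions, not taken from the paper):

```python
import numpy as np

def add_awgn(signal, snr_db, rng):
    """Add Gaussian white noise at a target signal-to-noise ratio (dB)."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise, noise

rng = np.random.default_rng(42)
t = np.linspace(0.0, 1.0, 1000, endpoint=False)   # 1 s sampled at 1 kHz (assumed)
y = np.cos(30 * np.pi * t) + 0.5 * np.sin(30 * np.pi * t)   # Eq. (16)
noisy, noise = add_awgn(y, snr_db=30.0, rng=rng)

# Realized SNR of this noise draw, for comparison with the 30 dB target.
snr = 10 * np.log10(np.mean(y ** 2) / np.mean(noise ** 2))
```

Scaling the noise variance from the measured signal power is what makes the prescribed SNR hold regardless of the signal's amplitude.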


**Figure 2.** Simple signal and noisy signal: (**a**) simple signal; (**b**) noisy signal; (**c**) noise.

**Figure 3.** EMD decomposition results of noisy signal: (**a**) noisy signal; (**b**) IMF 1; (**c**) IMF 2; (**d**) IMF 3.

**Figure 4.** VMD decomposition results of noisy signal: (**a**) noisy signal; (**b**) IMF 1; (**c**) IMF 2.

The instantaneous amplitude and frequency of the IMF1 obtained by VMD denoising are calculated (Figure 5). As shown in Figure 5, the correlation coefficients of the instantaneous amplitude and instantaneous frequency between the noise-free signal and IMF1 are high, reaching 0.9861 and 0.9709, respectively.

**Figure 5.** Comparison of instantaneous attributes: (**a**) instantaneous amplitude of the noise-free signal and IMF1 of VMD; (**b**) instantaneous frequency of the noise-free signal and IMF1 of VMD.

The application of VMD to the simple signal shows that VMD has good denoising ability: the VMD-processed noisy signal accurately retains the effective components of the noise-free signal, and the instantaneous amplitude and instantaneous frequency extracted from IMF1 correlate highly with the corresponding instantaneous amplitude and instantaneous frequency of the noise-free signal.
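The instantaneous amplitude and frequency used above come from the analytic signal. A minimal FFT-based Hilbert transform sketch (the 1 kHz sampling rate and 50 Hz test tone are assumptions for illustration):

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via the frequency-domain Hilbert transform:
    zero out negative frequencies and double the positive ones."""
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    h[1:(n + 1) // 2] = 2.0
    if n % 2 == 0:
        h[n // 2] = 1.0          # keep the Nyquist bin unscaled
    return np.fft.ifft(X * h)

fs = 1000.0
t = np.arange(0, 1.0, 1.0 / fs)
x = 2.0 * np.cos(2 * np.pi * 50.0 * t)             # 50 Hz tone, amplitude 2

z = analytic_signal(x)
inst_amp = np.abs(z)                               # instantaneous amplitude
phase = np.unwrap(np.angle(z))
inst_freq = np.diff(phase) * fs / (2 * np.pi)      # instantaneous frequency (Hz)
```

For this pure tone the envelope recovers the amplitude 2 and the phase derivative recovers 50 Hz, which is exactly the attribute pair extracted from IMF1 above.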

#### *3.2. VMD Decomposition of Coal Thickness Wedge Model*

Firstly, a wedge model, commonly used in the study of coal thickness, is constructed. The model consists of three layers: sandstone, mudstone, and a coal seam. The coal thickness varies from 0 m to 40 m, and the velocity, density, and thickness of each layer are taken from [5]. A Ricker wavelet with a main frequency of 50 Hz was used for forward modeling; the receiver spacing was 10 m, with 40 receivers in total. The forward modeling section is shown in Figure 6a. Random noise at 40 dB was added to the forward record of the wedge model, as shown in Figure 6b. As can be seen from Figure 6, the seismic records are significantly affected by the noise, and the continuity of the in-phase axis deteriorates, which is not conducive to the subsequent prediction of coal thickness. VMD was applied to the noisy signals. The experiments found that when the number of decomposed IMFs *K* exceeded 2, the center frequencies became very close to each other, so *K* was set to 2. The VMD decomposition results of the wedge model seismic records are shown in Figure 6c,d. The correlation coefficients between the noise-free signal and the IMF1 seismic traces were calculated; the maximum reaches 0.9659 and the average 0.9258. This shows that VMD has strong denoising ability and that the IMF1 decomposed by VMD is essentially consistent with the noise-free synthetic seismic record.

**Figure 6.** Wedge model seismic record and VMD decomposition result: (**a**) forward seismic record of the wedge-shaped model; (**b**) seismic record of the model after adding random noise; (**c**) IMF1 after VMD denoising of noisy signals; (**d**) IMF2 after VMD denoising of noisy signal.

#### *3.3. Seismic Attributes Analysis of Wedge Model Seismic Records*

According to existing research, seismic attributes such as instantaneous amplitude, instantaneous frequency, and relative wave impedance correlate to some degree with coal thickness. Therefore, these three attributes are extracted and their relationships with coal thickness are analyzed. Figure 7 shows the relationship between the instantaneous amplitude, instantaneous frequency, and relative wave impedance of the seismic records of the noise-free wedge model and coal thickness. When the coal seam is thinner than 10 m, the instantaneous amplitude attribute increases markedly with coal thickness; in other words, it has a good positive correlation with coal thickness. When the coal thickness exceeds 10 m, the instantaneous amplitude gradually decreases and then changes gently. When the coal is thinner than 20 m, the instantaneous frequency attribute gradually decreases toward its minimum as coal thickness increases, showing a good negative correlation with coal thickness; as thickness continues to increase, the instantaneous frequency gradually rises and stabilizes. When the coal is thinner than 20 m, the relative wave impedance attribute increases gradually with coal thickness, showing a good positive correlation; beyond 20 m it decreases and stabilizes. In conclusion, when the coal is thin, the relationships between these three attributes and coal thickness are simple, so coal thickness can be predicted from seismic attributes.

**Figure 8.** Relationship between the instantaneous amplitude, instantaneous frequency, and relative wave impedance of the IMFs and coal thickness.

#### **4. Application of VMD-LSTM Method in Coal Thickness Prediction**

#### *4.1. Geological Survey of the Working Area*

The working area is located in the Ordos Basin. The coal-bearing strata in the mine field are the Carboniferous Taiyuan Formation and the Lower Permian Shanxi Formation. The total thickness of the coal-bearing strata revealed by drilling is 118.26–192.88 m, with an average of 142.74 m. The #4 coal seam lies in the upper part of the second rock member of the Lower Permian Shanxi Formation. Its natural thickness is 0–5.65 m, with an average of 3.82 m; the thickness in the working area is 0.86–3.79 m, 2.44 m on average. The structure of the coal seam is complex, with 0–9 parting layers, generally three, most of which lie in the middle of the seam. The coefficient of variation of the coal thickness is 32%, and the seam gradually thickens from west to east. The #4 coal seam is relatively stable, can be correlated reliably, and is minable over most of the mining area. The thickness of the #4 coal seam in the working area varies from 2.30 to 4.70 m.

#### *4.2. Coal Thickness Prediction and Result Analysis*

Seismic data processing mainly adopted conventional steps such as preprocessing, filtering, deconvolution, velocity analysis, normal moveout correction and static correction, stacking, and migration. The focus of this paper is to use the VMD method to attenuate the noise in the post-stack migrated 3D seismic data so as to provide seismic data with a high SNR for coal thickness prediction. The VMD-LSTM method was then used to predict the thickness of the #4 coal seam in this area. The parameters of the LSTM are as follows: the solver is 'adam', the gradient threshold is 1, the maximum number of epochs is 500, and the execution environment is 'cpu'.
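VMD splits a trace into K band-limited modes by the ADMM iteration of Dragomiretskiy and Zosso; denoising then keeps the coherent modes and discards the rest. The sketch below is a minimal single-trace illustration under simplified assumptions (uniform center-frequency initialization, no boundary mirroring); the parameter values are illustrative, not those used for the field data:

```python
import numpy as np

def vmd(signal, K=2, alpha=2000.0, tau=0.0, tol=1e-7, max_iter=500):
    """Minimal variational mode decomposition of a 1-D real signal.

    Returns (modes, centers): K time-domain modes and their sorted
    center frequencies in cycles/sample."""
    n = len(signal)
    freqs = np.fft.fftshift(np.fft.fftfreq(n))      # -0.5 .. 0.5 cycles/sample
    f_hat = np.fft.fftshift(np.fft.fft(signal))
    f_hat_plus = f_hat.copy()
    f_hat_plus[: n // 2] = 0                        # keep positive frequencies only
    u_hat = np.zeros((K, n), dtype=complex)         # mode spectra
    omega = 0.5 * np.arange(K) / K                  # uniform initial centers
    lam = np.zeros(n, dtype=complex)                # Lagrangian dual variable
    for _ in range(max_iter):
        u_prev = u_hat.copy()
        for k in range(K):
            others = u_hat.sum(axis=0) - u_hat[k]
            # Wiener-like update: narrow-band filter around omega[k]
            u_hat[k] = (f_hat_plus - others + lam / 2) / (
                1.0 + 2.0 * alpha * (freqs - omega[k]) ** 2)
            power = np.abs(u_hat[k, n // 2:]) ** 2
            omega[k] = np.sum(freqs[n // 2:] * power) / (np.sum(power) + 1e-30)
        lam = lam + tau * (u_hat.sum(axis=0) - f_hat_plus)
        diff = np.sum(np.abs(u_hat - u_prev) ** 2) / (np.sum(np.abs(u_prev) ** 2) + 1e-30)
        if diff < tol:
            break
    order = np.argsort(omega)
    modes = np.array([2.0 * np.real(np.fft.ifft(np.fft.ifftshift(u_hat[k])))
                      for k in order])
    return modes, omega[order]
```

On a two-tone test signal, the recovered center frequencies converge to the two tone frequencies and the modes sum back to the input; for field data the decomposition is applied trace by trace before attribute extraction.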

Figure 9 shows the original seismic record and the seismic record after VMD denoising of the #4 coal seam. It can be seen that the #4 coal seam is between 240 ms and 290 ms of the seismic record. It can be seen from the comparison between Figure 9a,b that the original seismic record is greatly affected by noise, and the seismic event of the #4 coal seam is not clear. After VMD denoising, the seismic event of the #4 coal seam is clearer, and the energy is significantly enhanced, which is conducive to the subsequent coal thickness prediction.

**Figure 9.** Original seismic record and seismic record after VMD denoising of #4 coal seam: (**a**) original seismic record of #4 coal seam; (**b**) seismic record after VMD denoising of #4 coal seam.

Instantaneous amplitude, instantaneous frequency, and relative wave impedance attributes of the seismic record after VMD denoising were extracted (Figures 10–12). The relationship between coal thickness and the instantaneous amplitude attribute derived from the wedge model suggests that the coal seam in the western part of the area may be thinner, while the coal seam in the eastern part may be thicker. The instantaneous frequency and relative wave impedance attributes show similar characteristics, although the response of each seismic attribute differs considerably. A single seismic attribute reflects the possible distribution of the coal seam, but it cannot quantitatively predict the coal thickness.

In order to study the variation of coal thickness in this area more accurately, based on the coal thickness information and seismic attributes of 10 known boreholes, the instantaneous amplitude, instantaneous frequency, and relative wave impedance attributes of the traces beside the boreholes were used as the inputs of the training samples, and the known coal thicknesses were used as the outputs. The coal thickness predicted by the traditional BP neural network method is shown in Figure 13. An LSTM network was also trained; it has six layers: an input layer, an output layer, and four hidden layers. To prevent over-fitting, a dropout layer was introduced into the model, with the dropout rate set to 0.1. The root mean square error was used as the loss function, and the optimizer was Adam. After satisfactory training results were obtained, the coal thickness of the whole area was predicted, as shown in Figure 14.

Comparing Figures 13 and 14 shows that the LSTM prediction results have better regularity. Figure 14 shows that the coal seam in the southeast is thick, between 3 and 5 m, while the coal seam in the southwest is thin, about 3 m. The predicted coal thickness is basically consistent with the trend that the coal seam in this area gradually thickens from west to east and from south to north; this regularity is not obvious in Figure 13. The A10 borehole reveals a coal thickness of 4.7 m; Figure 14 shows that the coal seam around A10 is thicker and the surrounding area is stable, whereas the prediction in Figure 13 deviates there. The A9 borehole reveals a coal thickness of 2.3 m; Figure 14 likewise shows that the coal seam around A9 is thinner over a certain surrounding range. Analysis of the predictions at the 10 boreholes shows that the absolute error of the BP neural network ranges from 0.35 m to 1.27 m, whereas the absolute error of the LSTM prediction ranges from only 0.08 m to 0.48 m. These results on field data show that the coal thickness prediction method proposed in this paper has high accuracy, and the predicted thickness provides important guidance and reference value for later coal mining.
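For reference, the recurrence that each hidden LSTM layer applies to a sequence of attribute samples can be sketched in numpy. This is a background illustration of the standard LSTM cell equations, not the trained network; the hidden size and gate weights below are illustrative placeholders:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step; W, U, b stack the input, forget, cell, output gates."""
    hidden = h_prev.shape[0]
    z = W @ x + U @ h_prev + b                 # pre-activations of all 4 gates
    i = sigmoid(z[0 * hidden:1 * hidden])      # input gate
    f = sigmoid(z[1 * hidden:2 * hidden])      # forget gate
    g = np.tanh(z[2 * hidden:3 * hidden])      # candidate cell state
    o = sigmoid(z[3 * hidden:4 * hidden])      # output gate
    c = f * c_prev + i * g                     # new cell state
    h = o * np.tanh(c)                         # new hidden state
    return h, c
```

Stacking four such hidden layers, inserting dropout (rate 0.1) for regularization, and minimizing the root mean square error with the Adam optimizer corresponds to the training setup described above.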

**Figure 10.** Instantaneous amplitude attribute of #4 coal seam.

**Figure 11.** Instantaneous frequency attribute of #4 coal seam.

**Figure 12.** Relative wave impedance amplitude attribute of #4 coal seam.

**Figure 13.** Coal thickness prediction results based on BP neural network.

**Figure 14.** Coal thickness prediction results based on the VMD-LSTM method.

#### **5. Conclusions**


frequency attribute has a good negative correlation with the coal thickness, which indicates that the seismic attribute is feasible for coal thickness prediction.


**Author Contributions:** Y.H., L.Y., and X.Q. conceived and designed the algorithms; Y.H., X.Q., Z.L., and L.Y. performed the algorithms; Y.H., X.Q., and Y.C. analyzed the data; Y.H. and L.Y. wrote the paper. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Fundamental Research Funds for the Central Universities (Grant Number 2018QNA45) and a project funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data presented in this study are available on request from the corresponding author.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**

