*Article* **Efficient Dynamic Cost Scheduling Algorithm for Financial Data Supply Chain**

**Alia Al Sadawi 1, Abdulrahim Shamayleh 1,2,\* and Malick Ndiaye 1,2**


**Abstract:** The financial data supply chain is vital to the economy, especially for banks, as it affects their customer service level; it is therefore crucial to manage the scheduling of the financial data supply chain to elevate the efficiency of the banking sector's performance. The primary tool used in the data supply chain is data batch processing, which requires efficient scheduling. This work investigates the problem of scheduling the processing of tasks with non-identical sizes and different priorities on a set of parallel processors. An iterative dynamic scheduling algorithm (DCSDBP) was developed to address the data batching process. The objective is to minimize different cost types while satisfying constraints such as resource availability, customer service level, and task dependency relations. The algorithm proved its effectiveness by allocating tasks with higher priority and weight while taking into consideration the customers' Service Level Agreement, time, and different types of cost, leading to a lower total cost of the batching process. The developed algorithm proved effective when tested on an illustrative network. In addition, a sensitivity analysis is conducted by varying the model parameters for networks of different sizes and complexities to study their impact on the total cost and the problem under study.

**Keywords:** financial data; supply chain management; data batching; scheduling; batching cost; parallel processing; optimization; multi-processing commitment

#### **1. Introduction**

In today's world, economies and commodity markets swing rapidly; personal, organizational, and business networks are becoming more interconnected, instrumented, and intelligent [1]. One important aspect of the business world is supply chain management, which works towards satisfying customers' requirements through the efficient use of resources, reducing cost and increasing profit margins for companies [2]. With a highly competitive global market, supply chains have to be more responsive to customers' needs. A supply chain represents all stages of a process; researchers usually focus on materials that are manufactured into an end product, but the supply chain concept extends beyond physical assets. In the banking sector, for instance, the main supply chain is one of financial data that needs to be processed using the available resources, namely software and hardware processors.

One of the main instruments used in business networks for the management of the financial data supply chain is data batch processing (DBP). It plays a critical role in the daily operations of most organizations across different business fields. Data batch processing can be defined as the execution of data files as input batches using the available resources and the gathering of the resulting files as output batches, while satisfying file priorities, predecessor constraints, and time constraints [3,4]. The execution is carried out on a mainframe computer that runs at a scheduled time, or on an as-needed basis, with minimal or no interaction with a computer operator [3,4]. The main feature of data batch processing is its ability to handle an enormous amount of

**Citation:** Al Sadawi, A.; Shamayleh, A.; Ndiaye, M. Efficient Dynamic Cost Scheduling Algorithm for Financial Data Supply Chain. *Algorithms* **2021**, *14*, 211. https:// doi.org/10.3390/a14070211

Academic Editor: Frank Werner

Received: 18 June 2021 Accepted: 12 July 2021 Published: 14 July 2021


**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

data files, which makes it attractive and essential to many organizations that constantly deal with heavy business activity [5]. It contributes the major part of the workload on mainframe computers, which often run a large number of business workflows with complex interrelations, requiring careful scheduling and prioritizing to ensure that all batch jobs run in the correct order and meet strict deadlines [3,4,6].

Our research considers the case of processing end-of-day (EOD) operations, which include highly important and frequently used activities and services such as preparing customers' bank statements, credit card bills, and fraud detection data; merging the day's transactions into master files; sorting data files for optimal processing; providing daily, weekly, monthly, and annual reports, invoices, and bank statements; performing periodic payroll calculations; and applying interest to financial accounts [4]. Banks seek to elevate their level of customer service by optimizing their data supply chain scheduling. This supply chain consists of the bank, a service provider that arranges the processing of data, and a processor-providing company that rents and leases processors and software to service providers. In addition, the data supply chains in our research adopt lot streaming, a technique that splits a given data job, each consisting of similar files, into tasks to allow overlapping of successive operations in processing systems, thereby reducing the processing makespan.
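Lot streaming can be illustrated with a short sketch. The function below is a hypothetical illustration (not taken from the paper) that splits a job into near-equal sublots so that downstream operations can start as soon as the first sublot finishes:

```python
def split_job(job_size: int, sublots: int) -> list[int]:
    """Split a job of `job_size` similar files into near-equal sublots
    (lot streaming), so successive operations can overlap."""
    base, extra = divmod(job_size, sublots)
    # The first `extra` sublots each take one additional file.
    return [base + 1 if s < extra else base for s in range(sublots)]
```

For example, a job of 10 files streamed in 3 sublots yields sublots of sizes 4, 3, and 3, letting the second processing stage begin after only 4 files instead of all 10.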

The proposed work addresses the financial data supply chain scheduling problem from an operational perspective by considering the scheduling at an individual job level. The problem can be viewed as a single-stage supply chain scheduling problem in which jobs are arranged to be processed by mainframe processors that can be modeled as a series of flow shop machines. Processors are leased and rented from a manufacturing and supplying company as needed by a third-party company that performs the data scheduling. Data arrives raw and unprocessed from banks at a service provider, which needs to schedule it for processing with the available resources (software and hardware) in the most efficient way. After processing, jobs must be returned to the banks, which are charged for the provided service.

To address the financial data supply chain, this research covers all aspects of data batch processing, as it is the tool that manages the scheduling of banks' financial data supply chain. The scheduling aspects of DBP are job priorities, precedence relationships, and constraints, in addition to cost. Cost consideration in data batch processing is vital because of the extremely high cost of the resources involved. However, previous research has overlooked cost despite its importance, focusing instead on the effectiveness of the scheduling component of the process. The high cost of the software and hardware resources used in the batch process, and the patent rights of the companies that own the platforms, make it not only beneficial but essential for DBP users to reduce the associated cost. Therefore, this research intends to bridge this gap and provide a comprehensive study covering the important aspects of data batch processing. The main contributions of this work can be summarized as follows:


The rest of the paper is organized as follows: in Section 2, the background related to this work is presented; the data batch algorithm is presented in Section 3; the illustrative example and sensitivity analysis are presented in Sections 4 and 5, respectively; finally, conclusions are presented in Section 6.

#### **2. Literature Review**

Supply chain management has another perspective when associated with the financial sector. An alternative concept appears that relates to information or data and supply chain known as Financial Information Supply Chain Management. The researchers in [7] emphasized the need to improve financial operating processes in order to optimize financial data supply chain management by reducing processing time, improving efficiency, and decreasing associated costs of equipment and computers.

Other studies reached aligned outcomes, such as the case study by [8] on the design, development, and implementation of an inter-organizational system by the Reserve Bank of Australia (the central bank) and other authorities to remodel data reporting by financial institutions in Australia. The research concluded that the complexity of data consumption patterns led to increased interdependence within the financial information supply chain, which requires developing data exchanges and commodity-like IT infrastructures.

Also, a study by [9] stated that multiple governments have implemented Integrated Financial Management Information Systems to improve effectiveness and streamline business processes. The authors determined the need for automated financial operations to improve data supply chain efficiency.

The authors of [7] believe that in order to improve financial performance, we need to utilize a tool from lean manufacturing. They state that although financial operations are not the same as manufacturing operations, they share more similarities than might be acknowledged. These similarities entitle the financial data supply chain to adopt batch processing and scheduling from lean manufacturing.

Batch processing is widely used because of its ability to support business-critical functions. It is mainly applied when dealing with large, repeated jobs that can be carried out at a prescribed time. The batch process is complex, posing a challenge to allocating the needed resources efficiently. In this section, literature related to the approaches used in data batching is presented, along with related work applying data batching in other fields.

The scheduling problem has been studied by many researchers, since it is a critical issue in batch process operation and performance improvement. Page et al. [10] considered eight common heuristics along with a genetic algorithm (GA) evolutionary strategy to dynamically schedule tasks to processors in a heterogeneous distributed system; they found that using multiple heuristics to generate schedules provides more efficient schedules than using each heuristic on its own. Méndez et al. [11] classified and presented the different state-of-the-art optimization methods associated with the batch scheduling problem. Osman et al. [12] proposed a model for data batch scheduling; they presented a dynamic, iterative framework to assign the required tasks to available resources, taking into consideration the predecessors, constraints, and priority of each job. Al Sadawi et al. [13] studied the data batch problem with the goal of minimizing costs and satisfying the customer service level agreement. Lim and Chao [14] used fuzzy inference systems to model the preferences of users and decide on the priority of each schedule, with the goal of providing users with more efficient throughput. Xhafa and Abraham [15] developed heuristic and metaheuristic methods to deal with scheduling in grid technologies, which is considered more complicated than scheduling in classical parallel and distributed systems. Aida [16] evaluated the performance of multiple job scheduling algorithms to investigate the effect of job size characteristics on job scheduling in a parallel computer system. Stoica et al. [17] described a scheduler based on the microeconomic paradigm for online scheduling of a set of parallel jobs in a multiprocessor system; users were granted control over the performance of their jobs through a savings account containing an amount of money used to run the jobs. Stoica [18] considered the problem of scheduling an online set

of jobs on a parallel computer with identical processors, using simulation to compare three microeconomic policies with three variable partitioning policies. Islam et al. [19] demonstrated higher revenue and better performance using their proposed scheduling heuristic, called Normalized Urgency, which prioritizes jobs based on their urgency and processing times.

A study by Damodaran and Vélez-Gallego [20] developed a simulated annealing algorithm to evaluate the performance of batch systems in terms of total completion time, with the goal of minimizing the processing time, and Mehta et al. [21] proposed a parallel query scheduling algorithm that divides the workload into batches and exploits common operations within the queries in a batch, resulting in significant savings compared to single-query scheduling techniques. Grigoriev et al. [22] developed a two-phase LP rounding technique used to assign resources to jobs and jobs to machines, where a maximum number of units of a resource may be used to speed up the jobs, and the available amount of units of that resource must not be exceeded at any time. Also, Bouganim et al. [23] studied the performance of execution plans in data integration systems, proposing an execution strategy that reduces query response time by concurrently executing several query fragments to overlap data delivery delays with the processing of these fragments. Ngubiri and van Vliet [24] proposed a new approach for evaluating the fairness of parallel job schedulers, in which jobs are not expected to have the same performance in a fair setup when they do not have the same resource requirements and arrive when the queue and system are in different states. Arpaci-Dusseau and Culler [25] used a proportional-share scheduler as a building block and showed that extensions to this scheduler for improving response time can still fairly allocate resources to a mix of sequential, interactive, and parallel jobs in a distributed environment.

From an economic point of view, Ferguson et al. [26] adopted the human economic model and implemented it on resource allocation in a computer network system which resulted in limiting the complexity of resource sharing algorithms by decentralizing the control of resources. Additionally, Kuwabara et al. [27] presented a market-based approach, where resources are allocated to activities through buying and selling of resources between agents and resource allocation in a multi-agent system. Chun and Culler [28] presented a performance analysis of market-based batch schedulers for clusters of workstations using user-centric performance metrics as the basis for system evaluation. Also, Sairamesh et al. [29] proposed a new methodology based on economic models to provide Quality of Service (QoS) guarantees to competing traffic classes in packet networks. Yeo et al. [30] outlined a taxonomy that describes how market-based resource management systems can support utility-driven cluster computing; the taxonomy is used to survey existing market-based resource management systems to better understand how they can be utilized.

Islam et al. [31] designed a framework that provides an admission control mechanism that only accepts jobs whose requested deadlines can be met and, once accepted, guarantees these deadlines. However, the framework is completely blind to the revenue these jobs can fetch for the supercomputer center. They analyzed the impact of job opportunity cost on the overall revenue of the supercomputer center and attempted to minimize it through predictive techniques. Mutz and Wolski [32] presented a novel implementation of the Generalized Vickrey Auction that uses dynamic programming to schedule jobs and computes payments in pseudo-polynomial time. Mutz et al. [33] proposed and evaluated the application of the Expected Externality Mechanism as an approach to solving the problem of efficiently and fairly allocating resources in a number of different computational settings based on economic principles. Tests indicated that the mechanism meets its theoretical predictions in practice and can be implemented in a computationally tractable manner.

Lavanya et al. [34] proposed two task scheduling algorithms for heterogeneous systems. Their offline and online scheduling algorithms aimed at reducing the overall makespan of task allocation in cloud computing environments. The algorithms' simulation proved that they outperformed standard algorithms in terms of makespan and

cloud utilization. Minimizing makespan was the objective of the research conducted by Muter [35] which tackled single and parallel batch processing machine scheduling. The author presented a reformulation for parallel batch processing machines and proposed an exact algorithm to solve this problem. Also, Jia et al. [36] developed a mathematical model and a fuzzy ant colony optimization (FACO) algorithm to schedule parallel non-identical size jobs with fuzzy processing times. The batch processing utilized machines with different capacities and aimed at minimizing the makespan. Additionally, Li [37] proposed two fast algorithms and a polynomial time approximation scheme (PTAS) to tackle the problem of scheduling n jobs on m parallel batching machines with inclusive processing set restrictions and non-identical capacities. The research aimed at finding a non-preemptive schedule to minimize makespan.

Another study, by Josephson and Ramesh [38], aimed at creating a task scheduling process by examining various real-time scheduling algorithms. The study presented a new algorithm for task scheduling in a multiprocessor environment. The authors used the TORSCHE toolbox to develop the real-time scheduling and utilized features of particle swarm optimization. The proposed algorithm succeeded in executing a maximum number of processes in minimum time. Also, Ying et al. [39] investigated the Distributed No-idle Permutation Flowshop Scheduling Problem (DNIPFSP) with the objective of minimizing the makespan. The authors proposed an Iterated Reference Greedy (IRG) algorithm that was compared with a state-of-the-art Iterated Greedy (IG) algorithm, as well as a Mixed Integer Linear Programming (MILP) model, on two benchmark problems, showing promising results.

An energy-efficient flexible job shop scheduling problem (EFJSP) with transportation was the core of the research by Li and Lei [40]. The authors developed an imperialist competitive algorithm with feedback to minimize makespan, total tardiness, and total energy consumption. The conducted experiments provided promising computational results, which proved the effectiveness of the proposed algorithm. Furthermore, a study addressing the distributed unrelated parallel machines scheduling problem, aiming at minimizing makespan in a heterogeneous production network, was proposed by Lei et al. [41]. A novel imperialist competitive algorithm with memory was developed by the authors, and experiments were conducted to test its performance, with the computational results proving the effectiveness of the algorithm. A study aimed at optimizing the trade-off between the total cost of tardiness and batch delivery was conducted by Rahman et al. [42]. To achieve this goal, the authors proposed three new metaheuristic algorithms: a Differential Evolution with different mutation strategy variations, a Moth Flame Optimization, and a Lévy-Flight Moth Flame Optimization algorithm. The algorithms were validated through an industrial case study.

Luo [43] tackled the dynamic flexible job shop scheduling problem under new job insertions. The goal of the research was to minimize the total tardiness; therefore, the authors proposed a deep Q-network (DQN). The developed DQN was trained using deep Q-learning, and numerical experiments confirmed the superiority and generality of the DQN. Also, Yun et al. [44] suggested a genetic-algorithm-based, energy-efficient, design-time task scheduling algorithm for an asymmetric multiprocessor system. The proposed algorithm adaptively applies different generation strategies to solution candidates based on their completion time and energy consumption; experiments showed that it minimized energy consumption compared to existing methods. Finally, Saraswati et al. [45] aimed at minimizing the total tardiness in batch completion time using the metaheuristic approach of simulated annealing, implemented in Python. The research case study scheduled batches on parallel independent machines, where the results of data processing demonstrated reduced total tardiness.

As concluded from the above survey, none of the articles found in the literature covered all aspects of data batch processing and scheduling. While some researchers concentrated on fulfilling the time constraint, others aimed at scheduling the maximum number of tasks effectively; however, none of the studies found in the literature considered cost during the scheduling process of data batches. This work tackles the highly effective and widely used data batching process, covering all of its aspects. It focuses on scheduling a set of required jobs to be processed in a batch using the available resources (i.e., processors) as per the predecessors, job priorities, and constraints stated in the SLA, while a batch job's predecessors and priorities are specified by the client depending on the type of tasks handled. This work represents a major contribution to the literature, since it comprises all aspects of DBP, including cost.

#### **3. Data Batch Algorithm**

This research tackles financial data supply chain management for the banking and financial sector, which mainly revolves around processing financial data using the available resources, usually software and hardware processors. A major tool used in the management of the financial data supply chain is data batch processing (DBP). Our study covers all aspects of data batch processing (job priorities, precedence task relations, and time constraints) in addition to different types of cost. It is vital to emphasize the importance of the cost aspect in data batch processing, since the costs of the utilized resources are very high; yet previous research has neglected DBP cost despite its significance. This research is therefore particularly relevant, since it includes different types of DBP cost, in addition to all other aspects, in a single scheduling algorithm.

The DBP on which the financial sector relies heavily has been bounded by the IT systems that drive banking businesses, leaving banks in severe need of rapid, efficient, and effective processes so that they can focus on their clients holistically. They use DBP widely in daily operations such as processing end-of-day (EOD) jobs, which include highly important and frequently used activities and services such as preparing employees' payroll, interest rate calculations, customers' bank statements, credit card bills, fraud detection data, and many others [4]. Banks usually outsource DBP to a third-party company (service provider) that takes charge of arranging and processing the data and aggregating the output, as shown in Figure 1. The service provider leases processors and software from a provider to perform the batch process for the bank based on a Service Level Agreement (SLA). The service provider usually operates after closing hours, so there are no further data entries or online interventions.

**Figure 1.** Data batching service network.

DBP begins with scheduling data, gathered in batches by type, to the available processors. The data files are scheduled while taking into consideration job priorities, predecessors, and other constraints, meaning that jobs are not processed simultaneously but rather as per their schedule. The objective is to execute the tasks using the available resources to improve utilization, reduce cost, and increase profit while meeting the SLA. Each job consists of one or more executable data files and is considered the surface layer of batch systems. A job consists of multiple tasks; therefore, the network of tasks is a detailed network view of the jobs. In other words, a job is defined as a collection of tasks used to perform a computation. The developed dynamic scheduling algorithm considers the cost types associated with data batch scheduling, which are: the server and software leasing cost; the rental cost for additional resources needed in case of overload and extra work; the penalty cost of failing to execute the batch process as per the SLA; and the opportunity cost, representing the cost of idling a resource for any period of time due to inefficient task allocation.
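For bookkeeping, the four cost types can be grouped in a small container. The class and field names below are illustrative assumptions, not the paper's notation:

```python
from dataclasses import dataclass

@dataclass
class BatchCosts:
    """Hypothetical grouping of the four cost types considered by the algorithm."""
    leasing: float      # servers and software leasing cost (leased processors)
    rental: float       # rental cost of extra processors under overload
    penalty: float      # SLA penalty cost per time unit of delay
    opportunity: float  # cost of an idle resource per time unit

    def total(self, lease_t: float, rent_t: float,
              delay_t: float, idle_t: float) -> float:
        """Total cost given time spent in each state (a simplified sum)."""
        return (self.leasing * lease_t + self.rental * rent_t
                + self.penalty * delay_t + self.opportunity * idle_t)
```

The actual objective function (Equation (11) below) weighs these components per file and per processor; this container only sketches the cost categories involved.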

The data batching dynamic algorithm is presented next. The algorithm efficiently decides on resource allocation for the service provider to minimize the total operational cost. The service provider batch processes data for its clients one at a time, according to a specific service agreement with each client.

#### *3.1. Dynamic Cost Scheduling Algorithm for Data Batch Processing (DCSDBP)*

Usually, given current business practice with such techniques, certain market-based rules govern the implementation of these scheduling processes. Therefore, the following assumptions underpinning the proposed algorithm are applied:


#### 3.1.1. Indices


#### 3.1.2. Problem Parameters


#### 3.1.3. Problem Variables


#### 3.1.4. Problem Decision Variables


The DCSDBP algorithm is illustrated in Figure 2 and the step-by-step description of the algorithm is as follows.

**Figure 2.** Dynamic Cost Scheduling Algorithm for Data Batch Processing (DCSDBP).

Step 1: Preparatory and Initialization Stage

This step involves preparing the initial data, which consists of the subset of data files that are ready for processing and the availability of leased and rented processors. We also set:


$$I^T = \{\}\tag{1}$$

Set

$$P\_k = 1 \,\forall \, k \in K \tag{2}$$

Equation (2) indicates that the leased processors are ready, since the binary variable $P\_k$ is set to 1.

$$W\_r = 1 \,\, \forall \, r \in R \tag{3}$$

Equation (3) indicates that the rented processors are ready, since the binary variable $W\_r$ is set to 1.

Set

$$V^T = 0\tag{4}$$

Equation (4) indicates that the number of extra rented processors acquired at this time period, $V^T$, is zero.

$$q\_i^T = 0 \,\forall \, i \in I \tag{5}$$

Equation (5) indicates that a data file $i$ has never been processed, since $q\_i^T$ is set to zero.

Set

$$T = 0\tag{6}$$

Equation (6) indicates the beginning of the time loop of the algorithm where the time *T* is set to zero.
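As a minimal sketch, the initialization in Equations (1)-(6) could be expressed as follows; the container and key names are hypothetical:

```python
def init_state(n_files: int, k_leased: int, r_rented: int) -> dict:
    """Step 1 sketch: mark all processors ready, zero the processed
    counts, and start the time loop (Eqs. (1)-(6))."""
    return {
        "available": set(),       # I^T, the set of ready files, Eq. (1)
        "P": [1] * k_leased,      # leased processors ready, Eq. (2)
        "W": [1] * r_rented,      # rented processors ready, Eq. (3)
        "V": 0,                   # extra processors acquired so far, Eq. (4)
        "q": [0] * n_files,       # times each file has been processed, Eq. (5)
        "T": 0,                   # time loop counter, Eq. (6)
    }
```

Each iteration of the algorithm then updates this state as files are scheduled and processors are rented.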

Step 2: Set of Files Available for Processing

We assign parameters and weights to the subset of data files available for processing, $I^T$, based on their precedence obtained from the dependency matrix.

If:

$$\sum\_{j=1}^{J} l\_{ij}^{T} = 1 \,\, \forall \, i \tag{7}$$

Then set:

$$f\_i^T = \mathbf{1} \tag{8}$$

$$I^T = I^T + \{i\}\tag{9}$$

$$\boldsymbol{\alpha}\_i^T = \sum\_{\theta=1}^I l\_{\theta i}^T \forall \ i \tag{10}$$

$\alpha\_i^T$ is the weight of each data file. It is calculated from the precedence/dependency matrix by counting the number of files that depend on file $i$.

This step will indicate the set of data files available for processing which will serve as input for step 3 to start the scheduling of available files on the available processors.
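A sketch of this step follows, under the assumption that the dependency matrix entry `dep[i][j] = 1` means file `i` depends on file `j` (the paper's exact matrix convention is not reproduced here):

```python
def available_files(dep: list[list[int]], done: set[int]) -> list[int]:
    """Files whose every predecessor has already been fully processed
    (the set I^T of Step 2). `done` holds indices of finished files."""
    n = len(dep)
    return [i for i in range(n)
            if i not in done
            and all(dep[i][j] == 0 or j in done for j in range(n))]

def weights(dep: list[list[int]]) -> list[int]:
    """alpha_i: the number of files that depend on file i (cf. Eq. (10))."""
    n = len(dep)
    return [sum(dep[t][i] for t in range(n)) for i in range(n)]
```

For a three-file network where files 1 and 2 both depend on file 0, only file 0 is available initially, and file 0 carries the highest weight because two files wait on it.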

#### Step 3: Allocation of Files to Processors

In this step, the algorithm allocates files to processors. The model presented in this step will allocate files with the objective function (11) of minimizing data file allocation cost while taking into consideration priority, weight and criticality of each file included in each subset at any time *T*.

$$\begin{aligned} \text{Min}\left(Z^T\right) = {} & \sum\_{i=1}^{I} \sum\_{k=1}^{K} \left(C\_{sv} + \frac{C\_{sf} + C\_h}{BW}\,\beta\_i\,\alpha\_i^T\right) X\_{ik}^T + \sum\_{k=1}^{K} \frac{C\_{sf} + C\_h}{BW} \left(1 - \sum\_{i=1}^{I} X\_{ik}^T\right) \\ & + \sum\_{i=1}^{I} \sum\_{r=1}^{V} \left(C\_{esv} + C\_{esf} + C\_{eh}\,\beta\_i\,\alpha\_i^T\right) Y\_{ir}^T + \sum\_{r=1}^{V} \left(C\_{esf} + C\_{eh}\right) \left(1 - \sum\_{i=1}^{I} Y\_{ir}^T\right) \\ & + A^T \ast C\_p \ast (T - SLA) \end{aligned} \tag{11}$$

The first term considers the basic processor leasing cost, weight $\alpha\_i^T$, and priority $\beta\_i$ at any time unit $T$. The second term considers the opportunity cost of not utilizing the leased processors at any time unit $T$. The third term handles the cost of renting additional processors, weight $\alpha\_i^T$, and priority $\beta\_i$ at any time unit $T$. The fourth term considers the opportunity cost of not utilizing the additionally rented processors at any time $T$. The last term is concerned with the penalty cost of exceeding the $SLA$. The product $\beta\_i\,\alpha\_i^T$ used in terms 1 and 3 ensures that files with higher priority and weight are scheduled first. $BW$ is the available time during which processing can take place; it is used in Equation (11) to obtain the fixed cost of resources per unit time.

Subject to:

$$\sum\_{k=1}^{K} X\_{ik}^T + \sum\_{r=1}^{V} Y\_{ir}^T \le M\_i f\_i^T \,\, \forall \, i \in I^T \text{, where } M\_i = \min \left\{ e\_i,\; n\_i - q\_i^T,\; K + V \right\} \tag{12}$$

The constraint in Equation (12) governs the allocation of leased and additionally rented processors and their availability, ensuring that the total allocation for any file $i$ at a certain time $T$ exceeds neither the file's multiprocessing limit $e\_i$, nor its required remaining processing $n\_i - q\_i^T$, nor the total number of available basic and extra processors ($K + V$).

$$q\_i^T \le n\_i \,\, \forall \, i \in I \tag{13}$$

The constraint in Equation (13) ensures that the number of times a file is processed is less than or equal to its required processing.

$$\sum\_{i=1}^{I^T} X\_{ik}^T \le P\_k^T \,\, \forall \, k \in K \tag{14}$$

The constraint in Equation (14) ensures that at most one data file is allocated to a single leased processor.

$$\sum\_{i=1}^{I^T} Y\_{ir}^T \le W\_r^T \,\, \forall \, r \in R \tag{15}$$

The constraint in Equation (15) ensures that at most one data file is allocated to a single additionally rented processor.

$$X\_{ik}^T = 0 \text{ or } 1 \,\, \forall \, i \in I \text{ and } k \in K \tag{16}$$

The constraint in Equation (16) declares that the decision variable $X\_{ik}^T$ is binary, meaning that file $i$ is either assigned to a leased processor or not.

$$Y\_{ir}^T = 0 \text{ or } 1 \,\, \forall \, i \in I \text{ and } r \in R \tag{17}$$

The constraint in Equation (17) declares that the decision variable $Y\_{ir}^T$ is binary, meaning that file $i$ is either assigned to an additionally rented processor or not.
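The allocation model above is an integer program. As a rough illustration of the role of the product $\beta\_i\,\alpha\_i^T$ in Equation (11), the greedy sketch below assigns the files with the highest priority-times-weight first, one per processor; it is a simplification for intuition, not the paper's exact model, and the field names are hypothetical:

```python
def allocate(files: list[dict], n_processors: int) -> dict:
    """Greedy sketch: rank files by priority * weight (beta_i * alpha_i^T)
    and assign at most one file per processor, highest rank first."""
    ranked = sorted(files, key=lambda f: f["beta"] * f["alpha"], reverse=True)
    # Processor indices 0..n_processors-1; unassigned files wait for the next cycle.
    return {f["id"]: k for k, f in enumerate(ranked[:n_processors])}
```

With two processors, a file of priority 3 and weight 2 is scheduled before a file of priority 1 and weight 1, mirroring the objective's preference for high $\beta\_i\,\alpha\_i^T$.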

#### Step 4: Update Utilized Extra Processors

At this stage, we check whether an additional processor should be rented at this time unit to avoid delay and penalty cost. This step involves calculating the critical path duration for the remaining unprocessed activities in the data file network at each time unit and comparing it to the remaining time until the end of the batch process time agreed on in the $SLA$. A trade-off between the cost of renting an additional processor and the penalty cost is made, and accordingly it is decided whether to rent a new processor or incur a penalty cost. Renting a new processor is subject to the condition that the total number of data files available for processing is higher than the total number of available basic and extra processors.

Critical path duration

Calculate $U^T$ = duration of the remaining critical path at time $T$:

$$ES\_i^T = \text{Max}\left\{ES\_j^T + \left(n\_i - q\_i^T\right)\right\} \,\, \forall \, i \text{ and } j \in I \text{, where } (i \neq j) \text{ and } j \text{ is a predecessor of } i \tag{18}$$

$$ES\_i^T = 0 \;\text{for}\; i = 0 \tag{19}$$

$$LS\_i^T = \text{Min}\left\{LS\_j^T - (n\_i - q\_i^T)\right\} \;\forall\; i, j \in I \text{ where } i \neq j \text{ and } i \text{ precedes } j \tag{20}$$

$$S\_i^T = LS\_i^T - ES\_i^T \tag{21}$$

$$U^T = \sum\_{i=1}^{I} (n\_i - q\_i^T) \;\forall\; i \in I \text{ where } S\_i^T = 0 \tag{22}$$

In the above equations, the slack *S<sub>i</sub><sup>T</sup>* of each file *i* is calculated, and the remaining critical path duration *U<sup>T</sup>* of the network is obtained by summing the remaining durations of the files whose slack equals zero (*S<sub>i</sub><sup>T</sup>* = 0).
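The forward and backward passes above can be sketched in code. The following is a minimal illustration under assumed data structures (a predecessor map, total durations `n`, and processed units `q`); all function and variable names are ours, not the paper's, and the network is assumed acyclic.

```python
from collections import deque

def remaining_critical_path(preds, n, q):
    """Slack (Eq. 21) and remaining critical-path duration U^T (Eq. 22).
    preds[i]: set of predecessor files of i; n[i]: required units; q[i]: units done."""
    dur = {i: n[i] - q[i] for i in n}            # remaining duration of each file
    # Kahn topological order of the precedence network
    indeg = {i: len(preds[i]) for i in preds}
    queue = deque(i for i, d in indeg.items() if d == 0)
    order = []
    while queue:
        i = queue.popleft()
        order.append(i)
        for j in preds:
            if i in preds[j]:
                indeg[j] -= 1
                if indeg[j] == 0:
                    queue.append(j)
    es = {}                                       # forward pass (Eqs. 18-19)
    for i in order:
        es[i] = max((es[j] + dur[j] for j in preds[i]), default=0)
    finish = max(es[i] + dur[i] for i in order)   # earliest batch completion
    succs = {i: [j for j in preds if i in preds[j]] for i in preds}
    ls = {}                                       # backward pass (Eq. 20)
    for i in reversed(order):
        ls[i] = min((ls[j] for j in succs[i]), default=finish) - dur[i]
    slack = {i: ls[i] - es[i] for i in order}     # Eq. 21
    u = sum(dur[i] for i in order if slack[i] == 0)  # Eq. 22: critical files only
    return u, slack
```

For a three-file network where file 2 depends on file 1 and file 3 is independent, files 1 and 2 form the critical path and file 3 carries all the slack.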

Checking critical path duration against *SLA*:

If

$$U^T \geq SLA - T \tag{23}$$

Then

$$TC\_p = C\_p * (U^T - (SLA - T)) \tag{24}$$

Else

$$TC\_p = 0 \tag{25}$$

The above equations calculate the penalty cost *TCp* for all cases of critical path duration against *SLA.*


If

$$U^T \geq SLA - T \tag{26}$$

Then

$$T\_H = (C\_{esf} + C\_{eh} + C\_{esv}) * U^T \tag{27}$$

Else

$$T\_H = 0 \tag{28}$$

The above equations calculate the extra processor renting cost *TH* in case of a delay.

Amount of processing to completion:

$$D^T = \sum\_{i \in I^T,\; e\_i > 1} (n\_i - q\_i^T) \; + \sum\_{i \in I^T,\; e\_i = 1} 1 \tag{29}$$

Decision on renting extra processor

If

$$TC\_p \ge T\_H \text{ and } D^T > (K + V) \tag{30}$$

Then

$$V^T = V^T + 1\tag{31}$$

$$T^r = T \tag{32}$$

$$V^T \le R \tag{33}$$

In the above equations, the number of rented extra processors *V<sup>T</sup>* is increased by one if the penalty cost *TCp* is greater than or equal to the extra processor renting cost *TH* and the amount of remaining ready processing exceeds the number of processors already at hand, subject to the upper limit of *R* rentable processors in Equation (33). The time unit at which the processor is rented is recorded as *T<sup>r</sup>* in Equation (32).

In this step, we calculate the maximum completion time using CPM without considering splitting because the decision to multi-process a job has not been taken yet. Multi-processing a job means it can be split, if needed, to minimize the completion time of the batch process. CPM is not the only tool used in this study to make the decision. Looking at Equations (29) and (30), the decision to rent an additional processor is based on another criterion: *D<sup>T</sup>* was introduced which, as per Equation (29), ensures that no extra processor is rented unless there is an available task ready to be allocated to it. This guarantees that no additional processor will be rented unless it will be used.
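The renting trade-off of Equations (23)–(33) can be condensed into a single predicate. The sketch below is illustrative; the parameter names mirror the paper's notation (*U<sup>T</sup>*, *Cp*, *Cesf*, *Ceh*, *Cesv*, *D<sup>T</sup>*, *K*, *V*, *R*), but the function itself is our assumption, not the authors' LINGO code.

```python
def rent_extra_processor(u, t, sla, cp, cesf, ceh, cesv, d, k, v, r):
    """True if one extra processor should be rented at time unit t."""
    if u >= sla - t:                     # Eqs. (23)/(26): projected SLA breach
        tc_p = cp * (u - (sla - t))      # Eq. (24): penalty for the overrun
        t_h = (cesf + ceh + cesv) * u    # Eq. (27): cost of an extra processor
    else:
        tc_p = t_h = 0                   # Eqs. (25)/(28): no breach, no costs
    # Eq. (30): the penalty must outweigh renting, and the remaining ready
    # work D^T must exceed the processors at hand; Eq. (33) caps rentals at R.
    return tc_p > 0 and tc_p >= t_h and d > (k + v) and v < r
```

For example, with a high penalty rate, a breached critical path, and more ready work than processors, the predicate returns `True`; with no projected breach it returns `False`.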

#### Step 5: Update the Availability of Files

Each data file processing parameter is incremented by the number of times it was processed. When a file is fully processed, then it is removed from the subset for the following time unit and the rest of the process.

Update the *f<sub>i</sub><sup>T</sup>* matrix:

$$q\_i^T = q\_i^T + \sum\_{k=1}^K X\_{ik}^T + \sum\_{r=1}^R Y\_{ir}^T \;\forall\; i \in I^T \tag{34}$$

If

$$q\_i^T < n\_i \tag{35}$$

Then

$$f\_i^{T+\Delta} = 1 \tag{36}$$

Else

$$f\_i^{T+\Delta} = 0 \tag{37}$$

And if

$$q\_i^T = n\_i \tag{38}$$

Then

$$l\_{ij}^{T+\Delta} = 0 \;\forall\; j \in I \tag{39}$$

$$\sum\_{i=1}^{I} l\_{ji}^{T+\Delta} = 0 \;\forall\; j \tag{40}$$

This step updates the availability of files to determine whether any file needs further processing or all files are completed: a fully processed file is removed from the ready subset, and its outgoing precedence parameters are cleared so that its successor files become ready.
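The availability update of Step 5 can be sketched as follows. The data structures and names are ours for illustration: `served[i]` counts the processors that worked on file *i* this time unit, and `l[i][j] = 1` while file *i* still blocks successor *j*.

```python
def update_availability(q, n, served, l):
    """Increment processed units (Eq. 34), set availability flags
    (Eqs. 35-37), and release successors of completed files (Eqs. 38-40)."""
    for i, count in served.items():
        q[i] += count                   # Eq. (34): units processed this time unit
    f = {}
    for i in q:
        f[i] = 1 if q[i] < n[i] else 0  # Eqs. (35)-(37): availability flag
        if q[i] >= n[i]:                # file i is fully processed:
            for j in l.get(i, {}):
                l[i][j] = 0             # Eqs. (38)-(40): unblock its successors
    return f
```

After the update, any file whose incoming precedence parameters all equal zero is eligible for the ready subset of the next time unit.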

#### Step 6: Check Termination Condition

When all files are fully processed, the algorithm stops; otherwise it proceeds to Step 7.

If

$$\sum\_{i=1}^{I} q\_i^T = \sum\_{i=1}^{I} n\_i \tag{41}$$

Then the algorithm terminates.


#### Step 7: Update Clock

Increment iteration clock by one time unit until the DCSDBP algorithm allocates all files.

$$A^T = 0 \tag{44}$$

$$T = T + 1 \tag{45}$$

Go to Step 2.

Steps 2 through 7 of the DCSDBP will be repeated until all tasks are allocated to available resources and there are no more jobs waiting in the queue.
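Steps 2 through 7 can be condensed into a toy loop. The sketch below keeps only ready-set construction, weight-ordered allocation, and the clock update; renting, SLA handling, and multiprocessing are omitted, the precedence network is assumed acyclic, and all names are illustrative rather than the authors' LINGO implementation.

```python
def toy_dcsdbp(n, preds, weight, k):
    """n[i]: required units; preds[i]: predecessor files; k leased processors."""
    q = {i: 0 for i in n}                                  # processed units
    t = 0
    while any(q[i] < n[i] for i in n):                     # Step 6 (Eq. 41)
        ready = [i for i in n if q[i] < n[i] and
                 all(q[j] >= n[j] for j in preds[i])]      # Step 2: subset I^T
        ready.sort(key=weight, reverse=True)               # Step 3: weight/priority
        for i in ready[:k]:                                # one file per processor
            q[i] += 1                                      # Step 5 (Eq. 34)
        t += 1                                             # Step 7 (Eqs. 44-45)
    return t                                               # batch completion time
```

With three unit-length files where file 3 depends on files 1 and 2, two processors finish the batch in 2 time units; a single processor serializes everything, as precedence constraints dictate.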

The Total batch process cost is updated as follows:

$$\begin{array}{l} \text{TBC} = \left(C\_{sf} + C\_h\right) * K + \sum\_{T=1}^{END} \sum\_{i=1}^{I^T} \sum\_{k=1}^{K} C\_{sv} * X\_{ik}^T \\ \qquad + \sum\_{T=1}^{BW} \sum\_{i=1}^{I^T} \sum\_{k=1}^{K} \frac{C\_{sf} + C\_h}{BW} * \left(1 - X\_{ik}^T\right) \\ \qquad + \sum\_{r=1}^{V} \left(C\_{esf} + C\_{eh}\right) (END - T^r) + \sum\_{T=1}^{END} \sum\_{i=1}^{I^T} \sum\_{r=1}^{V} C\_{esv} * Y\_{ir}^T \\ \qquad + \sum\_{r=1}^{V} \sum\_{i=1}^{I^T} \sum\_{T=T^r}^{END} \left(C\_{esf} + C\_{eh}\right) (1 - Y\_{ir}^T) + A^{END} * C\_p * (T^{END} - SLA) \end{array} \tag{46}$$

The total cost and the completion time are calculated at the end of the DCSDBP algorithm. The total batch process cost (*TBC*) = leased processors fixed hardware cost + leased processors fixed software cost + leased processors variable software cost + leased processors opportunity cost + rented processors fixed software cost + rented processors hardware cost + rented processors variable software cost + rented processors opportunity cost + penalty cost.

#### **4. Illustrative Example**

In this section, we present a numerical example to illustrate the proposed DCSDBP algorithm. Consider a network of 15 data files with the precedence relations as shown in Figure 3.

**Figure 3.** Illustrative example.

The precedence relationships are fixed relationships provided by the client that the service provider must use in processing the files; therefore, there cannot be any deadlocks. Table 1 presents the different parameters' values; these values are either given directly, such as a file's multiprocessing *ei*, processing time *ni*, and priority *βi*, or derived from the files' precedence relations, such as a file's weight *α<sub>i</sub><sup>T</sup>* and precedence parameter *l<sub>ij</sub><sup>T</sup>*. The following data were also used: there are two available leased processors; the maximum number of processors that can be rented is 5; *SLA* = 18 time units; batch window BW = 22 time units; leased processor fixed software cost *Csf* = \$10; leased processor hardware cost *Ch* = \$100; leased processor variable software cost per time unit *Csv* = \$2; rented processor fixed software cost *Cesf* = \$12.5; rented processor hardware cost *Ceh* = \$125; rented processor variable software cost per time unit *Cesv* = \$2.5; and the penalty cost per time unit in case of exceeding the *SLA* is *Cp* = \$200.

**Table 1.** Model parameters at *T=0*.


The algorithm was programmed using LINGO 15.0 x64. The LINGO output shows that the files were processed in 20 time units, 4 extra processors were rented, and the batch process exceeded the *SLA* by 2 time units, which means a penalty cost was imposed. Once the batch process starts, the program sets all initialization conditions, which means the ready files subset *I<sup>T</sup>* is empty. At the beginning of each time unit *T*, the precedence parameters *l<sub>ij</sub><sup>T</sup>* of all files are checked to determine which files are ready to be processed. All files having precedence parameter *l<sub>ij</sub><sup>T</sup>* = 1 are considered ready and inserted into the ready files subset *I<sup>T</sup>*. It is worth mentioning that files 0 and 16 are start and end files with processing time *ni* = 0, which means they won't be allocated to any processor. The reason for their existence is to start and end the network for critical path calculation purposes.
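The stated 2-time-unit overrun translates directly into the penalty term of Equation (46). As a worked fragment with this example's inputs (*Cp* = \$200 per time unit, *SLA* = 18, completion at *T* = 20):

```python
# Penalty term of Eq. (46): Cp * (T_END - SLA) when the SLA is exceeded.
cp, sla, t_end = 200, 18, 20
penalty = cp * max(0, t_end - sla)  # two time units of delay beyond the SLA
print(penalty)                      # 400
```

That is, the 2-unit delay contributes \$400 to the total batch process cost.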

The algorithm started at *T* = 0 with files 1, 2, 3, 4, 6, 9, and 10 ready for processing; therefore, they were inserted into the ready files subset *I<sup>T</sup>*. At *T* = 0, files 4 and 6 were processed by leased processors 2 and 1, respectively, because these files have the highest calculated weight *α<sub>i</sub><sup>T=0</sup>* and predetermined priority *βi*, while the rest of the ready files were shifted to the next time unit. At *T* = 1, the values of the precedence parameter *l<sub>ij</sub><sup>T</sup>* are updated for all files, so the ones that have not been processed or have not finished processing still have the value of 1 and are consequently still included in the ready files subset *I<sup>T</sup>*, while file 6, which has a processing time *ni* of 1, has the value *l<sub>ij</sub><sup>T</sup>* = 0 and no longer exists in the subset *I<sup>T</sup>* since it is completed. Also, an additional processor *V<sup>T</sup>* was rented and utilized at that time unit because the criteria for renting a new processor were satisfied. It was found that the critical path of the remaining network activities exceeds the *SLA*, so the program needs to take action to try to avoid the delay. The decision to rent a new processor was made since the cost of renting a new processor *TH* was found to be less than the penalty cost *TCp* at that time unit, and also since the number of files ready for the next time unit exceeds the total number of available processors. During *T* = 1, file 4 was processed by leased processors 1 and 2, as well as by the additionally rented processor 1.

At *T* = 2, another additional processor was rented based on the above-explained mechanism. File 4 continued to be processed by leased processor 1 and rented processors 1 and 2 since it has a multiprocessing of *ei* = 3. Also, file 2 was processed by leased processor 2. At *T* = 3, a third additional processor was rented, and file 1 was processed by leased processors 1 and 2 and rented processor 3 due to its multiprocessing criteria. File 4 was processed by rented processors 1 and 2, and by that it is completed. At *T* = 4, the fourth additional processor was rented and used to process file 1 along with rented processors 1 and 3, while rented processor 2 and leased processor 1 were used to process file 3. File 5 was processed by leased processor 2. *T* = 5 had the same file allocations as *T* = 4. It is noted that the program did not rent any more extra processors from *T* = 5 onwards, which means it utilized 4 out of the 5 processors available for renting, based on the renting mechanism. The same steps were executed on all files until the end of processing at *T* = 19, when all files were processed.

The batch process cost calculation was based on Equation (46), using the input values listed earlier in this section and the allocation results from the LINGO output; Table 2 presents the detailed costs. Table 2 shows the cost of allocating files to leased processors, which includes hardware and software fixed costs as well as software variable cost. Similarly, the rented processors allocation cost, which includes hardware and software fixed costs as well as software variable cost, is shown. Then the opportunity costs associated with each processor type are calculated. Penalty cost is determined at the end of the batch process, and total cost is found by summing all costs for each time unit. In Table 2, column (1) represents the time unit *T*, column (2) shows the total number of existing leased processors *K*, column (3) identifies how many leased processors are actually acquired at each time unit *T*, and column (4) calculates the leased processors variable cost *Csv*, while the fixed software *Csf* and hardware *Ch* costs are calculated at the end because they are not related to time. Leased processors opportunity cost is determined in column (5) as per Equation (27). For rented processors, column (6) shows the number of additionally rented processors *V<sup>T</sup>* at time unit *T*, and column (7) represents the number of rented processors actually acquired at each time unit *T*. In column (8), based on model assumption 14, which states that rented processors are paid for from the time unit *T<sup>r</sup>* they are rented onwards, the rented processors total allocation cost is calculated using all types of costs (extra fixed hardware cost *Ceh*, extra fixed software cost *Cesf*, and extra variable software cost *Cesv*).
It is worth mentioning that in case of an additional processor being rented but not utilized due to unavailability of ready files, the total allocation cost will equal fixed hardware cost *Ceh* plus fixed software cost *Cesf* while the variable software cost *Cesv* will not exist since the processor is not being utilized. Rented processors opportunity cost is found in column (9) using fixed hardware cost *Ceh* plus fixed software cost *Cesf*. Finally, column (10) sums all the above costs for each time unit *T*.

At the end of Table 2, leased fixed costs are added; they represent the fixed hardware cost *Ch* plus the fixed software cost *Csf* for each leased processor *k*. Also, as mentioned above, penalty cost is calculated based on the number of time units the processing is delayed beyond the *SLA*. Since leased processors' fixed hardware cost *Ch* and fixed software cost *Csf* are calculated per unit time by dividing them by *BW*, a remaining opportunity cost for basic processors exists in case the batch process time *END* is less than the batch window *BW*. It represents the opportunity cost of the leased processors for the time units between the end of the batch process *END* and *BW*.

From Table 2, we can see that from *T* = 0 until *T* = 17, the basic allocation cost reflects the utilization of both leased processors, which explains the zero value of the opportunity cost of leased processors during the same period. However, at *T* = 18 and 19, only one leased processor was utilized, which resulted in a variable allocation cost for one processor and an opportunity cost for the other. It can also be noticed that from *T* = 0 up to *T* = 10, every rented processor was utilized, resulting in zero opportunity cost. After *T* = 10, some rented processors were not utilized due to the unavailability of ready files, such as at *T* = 11, where 3 out of 4 rented processors were utilized. Also, at *T* = 12, 13, 16, 17, 18, and 19, none of the rented processors were utilized due to the unavailability of ready files.


**Table 2.** DBP cost summary.

The above illustrative example shows the effectiveness of the developed algorithm in allocating files to processors. The algorithm managed to allocate highly prioritized and weighted files before the ones with lower priority and weight. Also, the algorithm advises renting the necessary number of extra processors to accomplish the batch process goal while trying to minimize cost. In the illustrated example, the program rented 4 extra processors to achieve the minimum real batch process time, which is 20 time units. Although that exceeds the SLA-specified time of 18 time units, it is the minimum possible execution time for this network due to network logic and predecessor relations.

The illustrative example shows that the batch processing is performed under a set of assumptions and constraints, and the problem makes the best decision at each time unit. Decisions are made dynamically based on the current status and previous decisions. This decision process ascertains that the best decision is taken at each iteration. We do not define the global network (of all feasible assignments of jobs to servers) to perform a direct optimization on it; rather, we generate the subnetwork of feasible assignments at each iteration (Step 3 of the algorithm). This sequence of optimum decision-making ensures that the solution at the end is global: not being optimal would mean that a better solution exists at some stage in the optimization process, which would contradict Step 3 as built.

#### **5. Sensitivity Analysis**

This research was motivated by working with a company in the field; therefore, the assumptions and constraints applied in the algorithm are based on practice. The initial results obtained from the illustrative example were promising and can readily be extended to cover other networks.

Sensitivity analysis was performed in two parts. The first part was conducted on the illustrative example network by changing different parameters. In part two, the algorithm performance was tested using networks with varying sizes and complexities.

#### *5.1. Parameters Variation Analysis*

Different parameters were tested; the resulting total cost and process time are summarized in Figures 4 and 5, respectively.

**Figure 4.** Result of sensitivity analysis on DBP total cost.

**Figure 5.** Result of sensitivity analysis on DBP batch time.

#### 5.1.1. Varying Number of Processors Available to Rent

In the case of having four processors available for rent in the batch process, the results were the same as in the case of five available processors, since the same number of processors was actually rented as in the illustrative example results shown earlier. However, having three processors available to rent resulted in a longer batch process time and a higher total cost, and the SLA was exceeded by more time units. Having two processors available for rent resulted in further delay of the batch process, increased the total cost, and the SLA was exceeded by 6 time units. The batch process had the highest delay and total cost in the last case of having only one processor available to rent. These results show that the algorithm rents additional processors only when needed in order to perform the batch process with minimum execution time and total cost.

#### 5.1.2. Changing SLA Value

Different values of the SLA were tested: SLA = 16, 17, 19, and 20. The results show that the lower the SLA value, the higher the penalty cost or the need to rent additional processors to avoid delay, and the opposite holds for a higher SLA. It is recommended that, when deciding on the SLA time between the client and the service provider, both sides study the data batch network carefully to decide on the right SLA terms that serve both sides and achieve the goal of the batch process.

#### 5.1.3. Varying Penalty Cost per Time Unit

Different penalty cost values were assumed and the algorithm was tested accordingly. The results showed that the lowest batch process cost is obviously obtained when the penalty cost is set to zero. In this case, there is no trade-off between the cost of renting an additional processor and any other cost; however, the batch needed more time to run, since the completion time doubled. In the case of a penalty cost of \$5, the same results were obtained as in the case of zero penalty cost. Three extra processors instead of four were used with a penalty cost of \$8, and the total batch process time was higher; however, the total cost was lower. In the cases of penalty costs of \$20 and \$100 per time unit, the same number of extra processors was rented as in the case of a penalty cost of \$200 per time unit, but with a lower total batch cost. Increasing the penalty cost further to \$500 does not affect the way files are allocated to servers, since the program cannot squeeze the batch time further even if the trade-off between renting more extra processors and the penalty cost goes in favor of utilizing another rented processor, simply because of the network's activity relations. In other words, due to precedence constraints, jobs are processed only when they are available. Therefore, even if more resources are available, the duration will not be reduced, since the resources will not be utilized; i.e., paying more money for resources may not reduce the completion time.

#### 5.1.4. Changing the Processor Rental Costs

When the renting cost of an additional processor was increased to 2.5 times the leased processor cost, the program rented the same number of processors to finish the batch process because the penalty cost was still relatively higher; this only increased the total batch cost while the same number of additional processors was utilized. Increasing the renting cost to 5 times the leased processor cost did not stop the program from renting additional processors either, for the same reason; again, the total batch cost increased while the same number of additional processors was utilized. To test the efficiency of the algorithm, the case of equal costs for leased and rented processors was examined; the algorithm rented only what it needed from the available additional processors to finish the batch process. Finally, when the cost ratio was set to 0.5, the total cost decreased but the files-to-processors allocation remained the same.

The above findings and analysis prove that when the algorithm reaches the optimum solution, it does not utilize any unnecessary resources since it is part of the objective function to minimize all types of costs while trying to meet SLA deadline and satisfy all priorities and constraints.

#### *5.2. Jobs Network Size Analysis*

In the second part of the sensitivity analysis, the developed algorithm was tested by varying the number of jobs and the network complexity. Different network sizes and relationship complexities were generated using the random network generator RanGen2 [46]. Data sets for network sizes of 15, 25, 50, and 100 jobs were generated. Each network size was also generated with varying precedence complexity, measured by the "I2 index," an indicator of the closeness of a network to a serial or parallel network based on the number of progressive levels. If I2 = 0, the network activities are all in parallel, meaning there are no relations between them, while if I2 = 1, the network activities are all in series [47]. The average program run time of the complete DCSDBP algorithm for each network is shown in Table 3. The results demonstrate that the algorithm is efficient, since the program run time is relatively low and acceptable.


**Table 3.** Run times for variant network sizes with different complexities.

#### **6. Conclusions**

The financial data supply chain is of huge importance to the banking sector. It impacts financial institutions' performance and customer service; therefore, it is of great necessity to manage the scheduling of the financial data supply chain. The main tool utilized in the financial data supply chain is data batch processing. Batch scheduling and processing are extremely important because they are widely used in service industries to track tasks and data continuously. However, the lack of an efficient scheduling solution for data batch scheduling creates a major issue for the financial sector. The goal of this work is to develop an iterative dynamic cost scheduling for DBP (DCSDBP) algorithm that includes all aspects of DBP. Different types of costs associated with the batch process were taken into consideration in developing the iterative scheduling algorithm. While the algorithm worked towards minimizing these costs, it aimed at the same time to allocate files based on their weight and priority without violating network predecessor relations. Also, the algorithm tries to satisfy the time limit specified in the SLA. The developed algorithm proved its effectiveness in allocating data files to available resources while satisfying priority and predecessor constraints, in addition to maintaining the minimum possible cost, keeping in mind the SLA time limit. After coding the developed algorithm in LINGO, a number of networks were used to test it. It was concluded from the results that it is more effective to include all types of costs along with priority, weight, predecessor, and time factors, which led to a more effective allocation and a lower total batch process cost. It can also be seen from the results that renting more processors does not necessarily mean that the batch process will be performed in a shorter time, because network logic, or predecessor relations, governs the process's total time. The decision of whether or not to rent a new processor, and when to do so, is very important, since it affects the whole batch process in terms of file allocations to processors and total processing cost and time. Our research has positive implications for the performance and customer service of financial institutions and banks that choose to adopt it. It optimizes the scheduling of the data batch process, which translates into a more efficient and reliable financial data supply chain through better management. The algorithm was developed under certain assumptions; while we tried to generalize it to cover batch processing, there are some limitations that could be addressed in future research. For instance, it was assumed that resource costs are the same for all leased processors; this could be generalized to assume different costs as a future research direction. Also, additional research could be done on cases where renting additional processors has varying costs. Future researchers might also work on developing the additional processors renting mechanism to allow more than one processor to be rented per time unit in case that serves the total completion time and maintains a low cost. In addition, one of our basic assumptions is that processor reservation is not allowed; this can be researched further to test the case where processor reservation is allowed, how it can be implemented, and its impact on the different aspects of the batch process such as total process time, total cost, and basic and extra processor utilization.

**Author Contributions:** A.A.S.: conceptualization, methodology, software programming, validation, and writing the original draft. A.S. and M.N.: conceptualization, methodology, and finalizing the manuscript. All authors have read and agreed to the published version of the manuscript.

**Funding:** The authors received no specific funding for this work.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable. This article does not contain any studies with human participants or animals performed by any of the authors.

**Data Availability Statement:** The data presented in this study are available on request from the corresponding author.

**Conflicts of Interest:** The authors have no conflict of interest to declare.

#### **References**


*Article* **Parallel Hybrid Particle Swarm Algorithm for Workshop Scheduling Based on Spark**

**Tianhua Zheng, Jiabin Wang \* and Yuxiang Cai**

College of Engineering, Huaqiao University, Quanzhou 362000, China; 19014084012@stu.hqu.edu.cn (T.Z.); 19014084001@stu.hqu.edu.cn (Y.C.)

**\*** Correspondence: fatwang@hqu.edu.cn

**Abstract:** In hybrid mixed-flow workshop scheduling, there are problems such as mass production, mass manufacturing, mass assembly, and mass synthesis of products. To solve these problems, a parallelized hybrid particle swarm algorithm built on the Spark platform is proposed. Compared with existing intelligent algorithms, the parallel hybrid particle swarm algorithm is more conducive to reaching the global optimal solution. In a loader manufacturing workshop, with the optimization goal of minimizing the maximum completion time, the parallelized hybrid particle swarm algorithm is applied. The results show that, for relatively large batches, the parallel hybrid particle swarm algorithm can effectively obtain the scheduling plan and avoid falling into a local optimal solution. Compared with the serial version, parallelization improves algorithm efficiency by 2–4 times; the larger the batches, the more pronounced the improvement in computational efficiency.

**Keywords:** hybrid mixed-flow workshop; hybrid particle swarm algorithm; algorithm parallelization; computational efficiency

#### **1. Introduction**

The traditional shop scheduling model takes single shop scheduling as the goal, but, in actual discrete manufacturing [1], the job shop and the flow shop are closely connected. The production process includes parts processing, component assembly and product assembly. In this production environment, optimizing one of the workshops leads to a mismatch between the progress of parts processing and subsequent component-assembly and final assembly workshops, resulting in a large amount of inventory and prolonging the product cycle, affecting the production process. Therefore, in the face of the problem of hybrid mixed-flow workshop scheduling, it is necessary to establish integrated scheduling of multiple workshops from the perspective of overall optimization.

The current solutions to the hybrid workshop scheduling problem include exact calculations for low-complexity, small-scale problems [2–4] and heuristic algorithms. For exact calculation, the computation time increases exponentially with the complexity of the workshop scheduling problem, so its application value is limited. Heuristic algorithms, on the other hand, perform well on today's workshop scheduling problems and are therefore widely used. Smutnicki [5] proposed an approximation algorithm based on tabu search, with the goal of minimizing processing time, and studied a mixed flow shop with a limited intermediate buffer area. Wang et al. [6] proposed a multi-objective genetic algorithm to study the integrated scheduling problem of a flow shop with buffers. Seidgar et al. [7] considered the coordination trade-off model of maximum process time and average completion time and used intelligent algorithms to study the optimization of two-stage flow-shop scheduling with assembly tasks. Na et al. [8] proposed an evolutionary algorithm using three-segment coding to study the production planning and scheduling of mixed-flow products in flexible workshops with processing and assembly tasks. Zhang et al. [9] used an optimized genetic algorithm to solve the problems of minimum total completion time and long equipment idle time for single-piece and small-batch hybrid workshop scheduling. Teymourian [10], facing the assembly-job mixed-flow shop scheduling problem, added the ant colony algorithm to the artificial immune algorithm to change the antibody, avoid falling into a local minimum, and obtain a better scheduling plan. Lou et al. [11] proposed an immune clonal algorithm when studying the optimization of hybrid workshop scheduling and achieved an effective solution. Hu et al. [12] proposed a genetic algorithm with multi-population parallelism and population screening and updating in a phased convergence manner to study the hybrid mixed-flow workshop scheduling problem. Li et al. [13] investigated hybrid mixed-flow workshop scheduling by proposing a hybrid genetic algorithm with the goal of minimizing cache-area inventory. Lu et al. [14] proposed a game particle swarm optimization algorithm to study hybrid mixed-flow workshop scheduling with the goals of parts-shop uniformity and minimum inventory. Wang [15] proposed an immune genetic algorithm to study hybrid mixed-flow workshop scheduling with the objective of minimizing the maximum completion time. Tang et al. [16] proposed an improved immune genetic algorithm, introducing a multi-agent negotiation mechanism and a simulated annealing algorithm, to study the mixed scheduling problem of job shop and flow shop.

**Citation:** Zheng, T.; Wang, J.; Cai, Y. Parallel Hybrid Particle Swarm Algorithm for Workshop Scheduling Based on Spark. *Algorithms* **2021**, *14*, 262. https://doi.org/10.3390/a14090262

Academic Editor: Frank Werner

Received: 7 August 2021; Accepted: 23 August 2021; Published: 30 August 2021

Intelligent algorithms are widely used to solve complex engineering problems. Nejah et al. [17] introduced the advantages and disadvantages of different intelligent algorithms for 3D indoor deployment problems and evaluated their performance on such problems. Mnasri et al. [18] introduced the application and analysis of existing hybrid intelligent algorithms for deploying sensor nodes in wireless sensor networks. The particle swarm optimization (PSO) algorithm stands out among intelligent algorithms for advantages such as high solution accuracy and fast convergence speed. Zhao et al. [19] proposed an improved particle swarm algorithm with a decreasing disturbance index for the multi-objective job shop scheduling problem. Mansour et al. [20] addressed shop scheduling with congestion constraints by combining a local search algorithm based on probabilistic perturbation with a particle swarm algorithm; experiments show that the improved algorithm quickly obtains the best solution. Jamrus et al. [21] proposed a hybrid genetic particle swarm optimization algorithm for flexible job shop scheduling; experiments show that it delivers high solution quality and good practicability. Since the particle swarm optimization algorithm has a wide range of applications in solving practical problems, this paper also uses an improved particle swarm algorithm to solve the problem.

Spark [22,23] is a memory-based distributed computing framework. Compared with Hadoop, which is suitable for offline batch processing of files but not for iterative operations, Spark does not need to write intermediate results to disk when handling iterative problems; this compensates for the inefficiency of Hadoop MapReduce, which reads from disk every time it processes an iteration. In Spark's parallel programming model, the input data are decomposed into multiple batch-processing fragments, converted into RDDs (resilient distributed datasets) and encapsulated in them; parallel operations on the RDD realize parallel data processing [24]. The basic idea of the parallel particle swarm algorithm is to convert the particle swarm into an RDD and initialize it as multiple small populations of the same size; after these small populations are processed in parallel, a feasible solution is obtained [25,26]. Following this parallelization idea, a parallel hybrid particle swarm optimization algorithm is proposed for the hybrid mixed-flow workshop scheduling problem.
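The parallelization idea above — split the swarm into equal-sized sub-populations, process them in parallel, and collect the best result — can be sketched as follows. The sketch is illustrative: Python's built-in `map` stands in for Spark's `RDD.map` so it runs without a cluster, and the "particles" are plain numbers rather than schedules.

```python
import random

def split_into_subpopulations(swarm, n_parts):
    """Partition the swarm into n_parts equal-sized sub-populations,
    mirroring how an RDD distributes data across partitions."""
    size = len(swarm) // n_parts
    return [swarm[i * size:(i + 1) * size] for i in range(n_parts)]

def evaluate_subpopulation(subpop, fitness):
    """Return the best particle of one sub-population; with Spark this
    body would run inside rdd.map() on each partition."""
    return min(subpop, key=fitness)

# Toy swarm: each "particle" is a candidate makespan-like value.
random.seed(0)
swarm = [random.randint(100, 200) for _ in range(20)]
subpops = split_into_subpopulations(swarm, 4)

# Plain map stands in for sc.parallelize(subpops).map(...).collect().
local_bests = list(map(lambda s: evaluate_subpopulation(s, lambda p: p), subpops))
global_best = min(local_bests)
```

Because each sub-population is evaluated independently, the per-partition work distributes naturally across Spark executors.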

Existing intelligent algorithms adopt three-stage coding [13,15] to solve the hybrid mixed-flow workshop scheduling problem, which is only applicable to small batches. Nowadays, the scale and complexity of workshop scheduling are constantly increasing, and for relatively large batches the existing three-stage-coding intelligent algorithms easily fall into local optima. Therefore, for large batches, three-stage coding is not used here; instead, each workshop is coded independently for independent scheduling. Independent scheduling optimization of each workshop, however, makes the running time of the algorithm too long, so the algorithm is combined with Spark to parallelize it and reduce its running time. In this paper, a hybrid mixed-flow workshop scheduling model is established and, for the large-batch case, a Spark-based parallel hybrid particle swarm optimization algorithm is proposed that avoids falling into local optima and obtains a workshop scheduling scheme effectively and quickly. Solving the hybrid mixed-flow workshop scheduling problem in this way has both theoretical significance and application value.

#### **2. Problem Description and Modeling**

The hybrid mixed-flow workshop is composed of three parts: the first part is the parts-processing workshop, which is produced in batches; the second part is the flow shop of the component-assembly workshop, which is assembled in units; the third part is the flow shop for the final assembly of the product, which is assembled in units, as shown in Figure 1 below.

**Figure 1.** Hybrid mixed-flow workshop.

The parts-processing workshop consists of *j* machines processing *i* parts; the component-assembly workshop is composed of *k* assembly stations producing *x* components; the product-assembly workshop consists of *s* assembly stations producing *y* products. For the convenience of research, the following assumptions are given [13]:

(1) At the beginning, all equipment and assembly stations are ready to perform production tasks at any time.


The objective function is to minimize the maximum completion time and the model is as follows:

$$G = \min(E\_{i,j} + E\_{x,k} + E\_{y,s})\tag{1}$$

In Equation (1), $E_{i,j}$ represents the maximum completion time when all parts *i* are processed on the *j* machines in the parts-processing workshop; $E_{x,k}$ represents the maximum completion time when all components *x* are assembled at the *k* stations of the component-assembly workshop; $E_{y,s}$ represents the maximum completion time when all products *y* complete assembly at the *s* workstations of the product-assembly workshop.

In the actual production process, the parts-processing workshop must meet process constraints and equipment constraints: parts are processed on the corresponding machines, in the corresponding operations, in accordance with both. During assembly, the component-assembly workshop and the product-assembly workshop operate at the corresponding assembly positions in accordance with process and station constraints, and each operation must be completed before the next one begins. The completion time of a workpiece at a station on the assembly line equals its processing time at the current station plus the maximum of the completion time of the previous workpiece at this station and the completion time of this workpiece at the previous station. The constraints are as follows.

Parts-processing workshop:

$$\text{Equipment constraints}: \ E\_{i,j} - t\_{i,j} + r \times c\_{i,h,j} \ge E\_{i,h} \tag{2}$$

$$\text{Process constraints}: \ E\_{g,i} - t\_{i,g} + r \times d\_{i,g,j} \ge t\_{g,i} \tag{3}$$

$$r \to +\infty$$

Component-assembly workshop and product-assembly workshop:

$$\text{Station constraint}: \ E\_{x,k} - t\_{x,k} + r(1 - c'\_{x,h,k}) \ge E\_{x,h} \tag{4}$$

$$\text{Process constraints}: \ E\_{g,k} - E\_{x,k} + r(1 - d'\_{x,g,k}) \ge t\_{g,k} \tag{5}$$

$$\text{Time constraint}: \ E\_{x,k} = t\_{x,k} + \max(E\_{x-1,k}, E\_{x,k-1}) \tag{6}$$

$$r \to +\infty$$

In Equations (2) and (3), $c_{i,h,j}$ takes the values 0 and 1: 0 means that machine $M_h$ is placed before $M_j$ to process part $N_i$, and 1 means otherwise. Likewise, $d_{i,g,j}$ takes the values 0 and 1: 0 means that workpiece $N_i$ is placed before workpiece $N_g$ for processing on machine $M_j$, and 1 means otherwise. $E_{i,j}$ is the completion time of part $N_i$ on machine $M_j$; $t_{i,j}$ is the time required to process part $N_i$ on machine $M_j$. In Equations (4)–(6), $c'_{x,h,k}$ takes the values 0 and 1: 1 means that station $h$ is placed before station $k$ in assembling workpiece $x$, and 0 means otherwise. $d'_{x,g,k}$ takes the values 0 and 1: 1 means that workpiece $x$ is placed before $g$ at station $k$, and 0 means otherwise.
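The time constraint (6) is the standard flow-line recurrence: a workpiece finishes at a station only after both the previous workpiece at that station and the same workpiece at the previous station are done. A minimal sketch of computing completion times with this recurrence (the processing times are made up for illustration):

```python
def completion_times(t):
    """t[x][k]: processing time of workpiece x at station k.
    Returns E with E[x][k] = t[x][k] + max(E[x-1][k], E[x][k-1]),
    following Equation (6); out-of-range terms are treated as 0."""
    n, m = len(t), len(t[0])
    E = [[0] * m for _ in range(n)]
    for x in range(n):
        for k in range(m):
            prev_workpiece = E[x - 1][k] if x > 0 else 0
            prev_station = E[x][k - 1] if k > 0 else 0
            E[x][k] = t[x][k] + max(prev_workpiece, prev_station)
    return E

# Two workpieces crossing three stations (illustrative times, in seconds).
t = [[3, 2, 4],
     [2, 5, 1]]
E = completion_times(t)
makespan = E[-1][-1]  # completion time of the last workpiece at the last station
```

The maximum completion time of a workshop, as used in the objective (1), is exactly this last entry of the table.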

At present, intelligent algorithms handle the hybrid mixed-flow workshop scheduling model with three-level coding [15] for unified scheduling and solving. However, as the batches of parts, assembly components and products grow larger, this coding easily falls into local optima. To solve this problem, each workshop is coded and its scheduling optimized independently, which increases the complexity and the running time of the algorithm. The proposed algorithm is therefore parallelized to reduce its running time and improve its efficiency.

#### **3. Parallelized Hybrid Particle Swarm Algorithm Based on Spark**

#### *3.1. Parallel Hybrid Particle Swarm Algorithm*

The particle swarm algorithm is a simulation of bird predation. In the process of solving, the solution of each particle corresponds to the position of the particle. The particle swarm algorithm has two attributes, speed and position. Speed represents the speed of movement and position represents the direction of movement.

The shop scheduling problem is a discrete optimization problem, whereas the solution space of the standard particle swarm algorithm lies in continuous domains. Because particles in the traditional particle swarm algorithm stagnate during updates and fall into local optima, the genetic algorithm and the particle swarm algorithm are combined to overcome these shortcomings, yielding a parallelized hybrid particle swarm algorithm. The algorithm flow is shown in Figure 2.

The pseudocode of Algorithm 1 is as follows; its final step (line 30) outputs the global optimal value and the corresponding production sequencing of the N workpieces.

**Figure 2.** The main process of parallelized hybrid particle swarm optimization.

#### *3.2. Detailed Design of the Algorithm*

#### 3.2.1. Coding Scheme Design

The research problem is hybrid mixed-flow workshop scheduling, which contains both job shops and flow shops; after comparing the coding methods used in genetic coding, a common coding scheme is designed for the three workshops. Once the coding design is completed, each workshop is independently optimized and dispatched. First, the minimum production ratio of the number of products is determined according to actual needs; from it, the minimum production ratios for the product-assembly workshop, the component-assembly workshop and the parts-processing workshop are determined, and each workshop is then coded independently. Within a workshop, letters and numbers represent products, components and parts, with the same letter denoting the same product, component or part. Suppose the quantities to put into production of products P, Q and R are 2, 1 and 2; the required quantities of components X and Y are 2 and 3; and the required quantities of parts A, B and C are 3, 1 and 3. Then the coding in the product-assembly workshop can be (P1, P2, Q1, R1, R2); the coding of the component-assembly workshop is (X1, X2, Y1, Y2, Y3); and the coding of the parts-processing workshop is (A1, A2, A3, B1, C1, C2, C3).

#### 3.2.2. Crossover and Mutation

The number of workpieces is input to generate N (the total number of particles) random workpiece sequences. After crossover and mutation with the global optimal value and the particle's own optimal value, respectively, the particles that produce a better objective value are filtered and updated. In this step, the crossover and mutation operations can be regarded as random walks on the permutation group of the workpiece orderings: a mutation is a single-step walk (one exchange) and a crossover is a walk formed by a combination of several basic exchanges. This step is analogous to the velocity update in classic PSO: whether to perform a walk is simply True or False in this algorithm, which simulates the weighting factor [27] in classic PSO, while crossing with the local (personal) and global optimal values, respectively, simulates the two velocity terms. Both mutation and crossover have a degree of randomness, which gives a single particle the possibility of jumping out of a local optimal solution.
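The walk operations described above can be sketched on a small permutation. This is an illustrative interpretation, not the paper's exact operators: the mutation is a single swap, and the crossover keeps a random slice of the particle and fills the remainder in the order of the guide (pbest or gbest) — one common way to "walk toward" another permutation.

```python
import random

def swap_mutation(perm, rng):
    """Single-step walk: exchange two randomly chosen positions."""
    p = perm[:]
    i, j = rng.sample(range(len(p)), 2)
    p[i], p[j] = p[j], p[i]
    return p

def order_crossover(perm, guide, rng):
    """Walk toward `guide`: keep a random slice of `perm` in place and
    fill the remaining positions in the order they appear in `guide`."""
    n = len(perm)
    a, b = sorted(rng.sample(range(n), 2))
    kept = set(perm[a:b])
    filler = [g for g in guide if g not in kept]
    return filler[:a] + perm[a:b] + filler[a:]

rng = random.Random(1)
gbest = ['A1', 'B1', 'C1', 'D1']        # global best sequence
particle = ['D1', 'C1', 'B1', 'A1']     # current particle
child = order_crossover(particle, gbest, rng)   # crossover toward gbest
mutant = swap_mutation(child, rng)              # then a mutation step
```

Both operators return valid permutations of the same workpieces, so every particle remains a feasible sequence after the update.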

#### 3.2.3. Parallelization of Hybrid Particle Swarm Algorithm

PySpark is the Python library of the Spark API provided by Spark, and parallelization is achieved through it. First, the PSO encoding is converted into a parallel RDD; then the objective-function evaluation is applied to all particles through the Map operation provided by Spark. The objective values of all particles are collected to obtain the optimal result.
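The pattern above is: parallelize the particles, map the objective over them, and reduce to the best. With PySpark this is essentially `sc.parallelize(particles).map(objective).min()`; the sketch below uses plain Python so it runs without a Spark installation, and the objective is a made-up stand-in (counting adjacent out-of-order pairs) rather than a real makespan decoder.

```python
def objective(perm):
    """Stand-in objective: number of adjacent out-of-order pairs.
    A real implementation would decode perm into a schedule and
    return its makespan."""
    return sum(1 for a, b in zip(perm, perm[1:]) if a > b)

particles = [
    [3, 1, 2, 4],
    [1, 2, 3, 4],
    [4, 3, 2, 1],
]

# Spark equivalent: sc.parallelize(particles).map(objective).min()
scores = list(map(objective, particles))
best = min(zip(scores, particles))  # (best score, best particle)
```

Because the objective evaluation of each particle is independent, the map step is embarrassingly parallel, which is what makes the Spark formulation pay off at scale.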

#### **4. Instance Verification**

#### *4.1. Example of Hybrid Mixed-Flow Workshop Scheduling*

Now, we take the loader manufacturing workshop as an example [15] to verify the model and algorithm. The production system is composed of the parts-processing workshop, component-assembly workshop and product-assembly workshop. The four products produced are Q1, Q2, Q3, and Q4. The corresponding parts and component demand matrix of the products are shown in Table 1, below.

The parts-processing workshop mainly produces eight kinds of self-made parts. The set of parts is {A, B, C, D, E, F, G, H} and the set of machines in the workshop is {M1, M2, ··· , M10}. The parts in batches are processed on the machine. The processing time and process sequence are shown in Table 2 below and the time unit is s.


**Table 1.** Parts and component demand matrix of the products.

Note: "/" means that the product has no relationship with the required parts.


**Table 2.** Parts-processing time and process sequence.

The component-assembly workshop is mainly responsible for the assembly of components X and Y. The assembly time and steps are shown in Table 3 below and the unit is s.



The final assembly line of the product has 33 assembly stations and the corresponding assembly time and procedures for products Q1, Q2, Q3 and Q4 are shown in Table 4 below and the unit is s.


**Table 4.** Product final assembly process and time.

During the planning period, the tasks for the production of products Q1, Q2, Q3 and Q4 are divided into 320 units, 160 units, 320 units and 320 units. The minimum production ratio is 2:1:2:2. According to the known conditions, it can be known that the required parts X and Y are divided into 480 and 640 and the minimum production ratio is 3:4. The required parts A–H are 480, 640, 480, 640, 480, 640, 480 and 640, respectively, and the minimum production ratio is 3:4:3:4:3:4:3:4. Calculate according to the parallelized particle swarm algorithm, set the size of the population to 20 and the number of iterations to 300.
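The minimum production ratios above follow from dividing each batch quantity by the greatest common divisor of all quantities in the group; for example:

```python
from math import gcd
from functools import reduce

def min_production_ratio(quantities):
    """Divide each batch quantity by the GCD of all quantities."""
    g = reduce(gcd, quantities)
    return [q // g for q in quantities]

products = min_production_ratio([320, 160, 320, 320])   # Q1:Q2:Q3:Q4
components = min_production_ratio([480, 640])           # X:Y
```

This reproduces the ratios 2:1:2:2 for the products and 3:4 for the components stated above.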

The experiments run on a 64-bit stand-alone Windows 10 operating system with 32 GB of memory, 10 cores and 20 threads. Parallel computing of the algorithm is realized in Spark's local[N] mode. The optimal plans obtained for the scheduling of parts, components and products are, respectively, {E1, G1, F1, D1, G2, D2, F2, B1, F3, C1, F4, A1, H1, A2, E2, C2, B2, A3, D3, H2, H3, G3, B3, D4, H4, B4, C3, E3}; {X1, X2, Y1, Y2, Y3, Y4, X3}; {Q3, Q1, Q1, Q3, Q2, Q4, Q4}; the total completion time is 15,508 s. Using the immune genetic algorithm (IA) [15], the total completion time is 15,679 s; using the PSO algorithm, it is 15,660 s. Figure 3 shows the evolution curve of the proposed algorithm.

**Figure 3.** Algorithm evolution curve.

In this paper, the parallel hybrid particle swarm optimization (PHPSO), IA and PSO algorithms are run with the same number of iterations, 50. Cbest is the optimal value of the run, Aver is the average value and dev is the relative deviation [28]; dev1 compares PHPSO with IA and dev2 compares PHPSO with PSO. If dev is positive, the solution obtained by the compared algorithm is better; if dev is negative, the solution obtained by PHPSO is better. Table 5 shows the algorithm comparison.

**Table 5.** Algorithm comparison.


It can be seen from Table 5 that PHPSO finds the optimal value within 50 iterations while the IA and PSO algorithms cannot, indicating that the PHPSO algorithm has a good ability to find the optimal solution. The average value obtained by PHPSO over 50 iterations is also smaller, indicating strong global search ability and strong convergence, avoiding the tendency to fall into local optima.

The parallel hybrid particle swarm optimization algorithm is compared with the immune genetic algorithm (IA) and PSO, where A stands for the IA or PSO algorithm. The comparison of the maximum completion times of the parts-, component- and product-assembly workshops is shown in Table 6. The deviation between the compared algorithm's solution and the PHPSO solution is

$$Dev = \left[ (A - C\_{PHPSO}) / C\_{PHPSO} \right] \times 100\% \tag{7}$$
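Equation (7) can be evaluated directly; for example, with the total completion times reported above (IA 15,679 s, PSO 15,660 s, PHPSO 15,508 s):

```python
def dev(a, c_phpso):
    """Relative deviation of Equation (7) between a compared
    algorithm's completion time A and the PHPSO completion time."""
    return (a - c_phpso) / c_phpso * 100

dev_ia = dev(15679, 15508)   # PHPSO vs. IA, in percent
dev_pso = dev(15660, 15508)  # PHPSO vs. PSO, in percent
```

Both deviations are positive and about 1%, i.e., the compared algorithms finished roughly 1% later than PHPSO on this instance.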


**Table 6.** Comparison of results.

From Table 6, we can see that the PHPSO algorithm used in this article obtains better solutions than the IA algorithm and the PSO algorithm in the parts-processing workshop and the product-assembly workshop. The analysis of the results shows that the parallelized hybrid particle swarm optimization algorithm can achieve overall optimization.

#### *4.2. Computing Performance*

To test the parallelization performance of the hybrid particle swarm algorithm, the population is set to 20 and the number of iterations to 40, and the running times of the serial and parallel versions of the algorithm are compared. The tested data are as follows: when the number of products is 7, the product ratio is [2:1:2:2]; when the number of products is 10, the product ratio is [2:3:2:3]; when the number of products is 14, the product ratio is [3:4:5:2]. Compared with serial execution, parallel execution improves computing speed by a factor of 2–4, and the speed-up becomes more pronounced as the number of input products increases, reflecting the advantage of the Spark platform in processing large amounts of data. The running time of the algorithm is shown in Figure 4 below.

**Figure 4.** Algorithm running time.

#### *4.3. Results Discussion*

For large batches of the hybrid mixed-flow workshop scheduling problem, the algorithm in this paper can effectively solve the job shop scheduling problem and avoid falling into local optima. Combined with the Spark platform, the parallel design of the algorithm is realized; compared with serial operation, it improves the algorithm's efficiency. As batches grow larger, the demand data become more complex and the efficiency gain becomes more obvious, in line with the advantages of the Spark platform for big-data processing. This paper also has limitations: it performs single-objective optimization, whereas actual job shop scheduling involves many other factors, such as inventory cost, so future research should apply the proposed algorithm to multi-objective job shop scheduling.

#### **5. Conclusions**

In this paper, a parallel hybrid particle swarm optimization algorithm is proposed for the hybrid mixed-flow workshop scheduling problem. The Spark platform is combined with an intelligent algorithm to solve workshop scheduling in high-volume situations, which can provide a reference for large-scale data processing in workshop scheduling.

In the future, the parallel hybrid particle swarm algorithm should be applied to the multi-objective shop scheduling problem, for example by combining it with intelligent algorithms designed for multi-objective optimization so that it can address multi-objective workshop scheduling.

**Author Contributions:** Conceptualization, T.Z.; methodology, T.Z.; software, Y.C.; validation, T.Z.; formal analysis, T.Z.; investigation, Y.C.; data curation, J.W.; writing—original draft preparation, T.Z.; writing—review and editing, J.W. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**

