1. Introduction
A heterogeneous computing system (HCS) is a system that incorporates different types of processing units (PUs). Such systems have become ubiquitous in both scientific and industrial applications, not only because they provide parallel processing and high performance powered by large numbers of PUs, but also because of the high efficiency and scalability derived from the complementarity of diverse types of PUs [1]. For example, more than half of the top 10 supercomputing systems in the world employ CPU-accelerator heterogeneous architectures to maximize performance and efficiency [2]. Beyond optimizations in hardware organization and architecture, the efficiency of an HCS heavily depends on the effective utilization of the PUs inside it. Hence, extensive efforts have been devoted to task scheduling approaches [3,4,5,6,7,8,9,10,11,12,13], which are commonly regarded as software techniques for improving system efficiency. Generally, a parallel application to be run on an HCS can be decomposed into a set of lightweight tasks with precedence constraints, which can be described by a directed acyclic graph (DAG), or workflow. Traditional workflow scheduling schemes concentrate on minimizing the total completion time, namely the makespan, without violating the precedence constraints.
However, a single metric based on time efficiency has proven insufficient for evaluating modern HCSs, since their power consumption has become a critical issue due to the high cost of energy and the associated negative environmental impacts. According to a report by the Natural Resources Defense Council (NRDC) in the USA, data centers consumed 91 billion kWh of electricity in 2013, comparable to the output of 34 large coal-fired power plants [14]. Furthermore, global data centers are predicted to consume 5% of the world's electricity production and to cause 3.2% of worldwide carbon emissions by 2025 [15]. On the other hand, a large portion of PUs tend to have relatively low average utilization, spending most of their time in the 10–50% utilization range [16], which results in a massive waste of electricity and resources. Therefore, a growing number of workflow scheduling approaches have been developed to accommodate the need for energy reduction coupled with makespan minimization [17,18,19,20,21,22,23,24,25]. With the aid of the dynamic voltage and frequency scaling (DVFS) technique incorporated into common processors, schedulers can reduce energy consumption at the expense of processing speed, which may increase the overall completion time of the application. Given the conflict between these two interests, the trade-off between makespan and energy consumption, i.e., bi-objective optimization, remains a challenge for energy-efficient workflow schedulers.
Since its general form is already NP-complete [26], the problem of bi-objective workflow scheduling is even more complicated, as the scheduling algorithm must additionally consider frequency selection and the makespan-energy trade-off on top of task-to-processor mapping and precedence constraint satisfaction. Following the experience of traditional time-efficient workflow scheduling schemes, the methodologies for solving the scheduling problem can be divided into two major groups, namely heuristic and metaheuristic [4,8,9]. Heuristic-based algorithms normally have high runtime efficiency, since a set of efficient rule-based policies narrows the search down to an extremely limited portion of the solution space. These rules have a significant effect on the results, but they rarely perform consistently across a wide range of problems. In contrast, metaheuristic-based algorithms are less efficient because of the high computational cost of the incorporated combinatorial process, but they have demonstrated robust performance in various scheduling problems due to their ability to search larger regions of the solution space [8,9,10,11,12,13]. Furthermore, the metaheuristic group can be classified into two subcategories: single solution-based (e.g., tabu search, simulated annealing, and local search) and population-based (e.g., genetic algorithms and particle swarm optimization) [27]. Single solution-based methods exploit solutions along a promising trajectory from a single starting point, whereas population-based methods concurrently track a set of seeding schedules to explore a larger portion of the solution space. Each of the two methods has its own strengths and weaknesses, but they are highly complementary. To this end, the memetic algorithm (MA) appears to be a natural choice, yet it is rarely applied to the problem of energy-efficient workflow scheduling. Formally, a memetic algorithm is a population-based metaheuristic composed of an evolutionary framework and a set of local search algorithms that are activated within the generation cycle of the outer framework [28].
In this article, a memetic algorithm for workflow scheduling on a DVFS-enabled HCS, named MA-DVFS, is proposed to optimize the makespan and energy consumption of executing a parallel application. Moreover, to avoid the extreme points of the Pareto front (e.g., low energy consumption with a large makespan, and vice versa), as well as to produce a quality-guaranteed solution, MA-DVFS introduces a baseline that also serves as a seeding point during the bi-objective optimization search. The overall scheme involves three major phases. The first phase is task prioritizing, which has been shown to have a significant effect on the quality of schedules [3,4]. Each task permutation satisfying the precedence constraints indicates an independent portion of the solution space, which can be explored by a population-based method. The second phase is inspired by the fact that minimizing makespan usually helps with energy reduction; thus, an earliest finish time (EFT)-based heuristic is utilized to provide a time-efficient candidate solution within the given portion. Based on this candidate, a local search method is applied with a certain probability to exploit better solutions in the third phase. To accommodate bi-objective optimization, the improved non-dominated sorting genetic algorithm (NSGA-II) [29] is employed as the evolutionary framework. The main contributions of this article are listed below.
A memetic algorithm for energy-efficient workflow scheduling is proposed to integrate the abilities of exploration and exploitation at a relatively low time complexity, so that the search for optimal schedules spreads both widely and deeply.
A novel local search algorithm incorporating a pruning technique is developed to accelerate the exploitation process. Furthermore, it is shown that launching the local search with a low probability is sufficient for a stable result.
A baseline solution generated by a time-efficient scheduling algorithm is introduced as a good seed, as well as a direction for the evolutionary search, ensuring that the bi-objective optimization produces quality-guaranteed schedules.
Extensive simulations are conducted to validate the proposed algorithm through comparisons with related algorithms on both randomly generated workflows and workflows of real-world applications. The experimental results reveal the superior performance and high efficiency of the proposed algorithm.
The rest of this article is organized as follows: Section 2 briefly reviews related work and existing approaches. Section 3 describes the system model, the application model, and the energy model used in this article. Section 4 details the proposed algorithm, while Section 5 gives the experimental results and analyses. Conclusions and suggestions for future work are provided in Section 6.
4. The MA-DVFS Algorithm
This section presents the detailed description of the proposed algorithm, including the overall algorithm flow and the concepts of the memetic algorithm.
4.1. The Algorithm Flow
The main flowchart of the proposed MA-DVFS is shown in Figure 2 and can be divided into three parts. First, a multi-objective evolutionary algorithm, e.g., NSGA-II, is employed as the main framework to explore new regions of the solution space and to evaluate the individuals in each generation. Second, an EFT-based heuristic is utilized to identify a time-efficient schedule within the specified portion of the solution space. Third, a local search method integrated with a pruning technique is adopted with a low probability to exploit energy-efficient schedules in the given region. After a number of generations, the best solution is reported.
From the perspective of the final schedule, the task priority queue is produced by the evolutionary algorithm and indicates a feasible region in the solution space, as shown in Figure 3. The processor/frequency selections are determined by the EFT-based heuristic together with the local search method. The overall algorithm thus combines population-based, heuristic-based, and local search methods to coordinate exploration and exploitation for bi-objective workflow scheduling.
4.2. Encoding Scheme and Search Space Analysis
Encoding of individuals is one of the most fundamental and important steps in an evolutionary algorithm. In the scenario of energy-efficient workflow scheduling, each solution consists of three segments, namely the task, processor, and frequency segments, as illustrated in Figure 4. The task segment contains a permutation of the integers 1 to n, representing a valid task priority queue, which must be one of the topological orders of the workflow [9]. Each gene in the processor segment can be an arbitrary index ranging from one to m, while each gene in the frequency segment is restricted to the interval from one to the number of frequency levels of the corresponding candidate processor. Hence, each column of the individual represents one element of the schedule. For example, the first column in Figure 4 denotes a (task, processor, frequency) triple, indicating that the corresponding task is scheduled to run on the selected processor at the selected frequency.
In fact, the encoding process reveals a huge search space. Task prioritizing has at most n! possibilities when all the tasks are independent. Moreover, there can be F^n possible assignments in the processor/frequency selection phase, where F represents the total number of frequency levels over all processors in the HCS. Thus, the search space of the energy-efficient workflow scheduling problem is on the order of n! × F^n, which challenges the search ability and efficiency of conventional evolutionary algorithms.
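For illustration, a minimal Python sketch of such a three-segment individual is given below; the Individual class and its decode method are hypothetical names introduced here, not part of the paper.

from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Individual:
    """Three-segment encoding: one column of the schedule per task."""
    tasks: List[int]   # permutation of task indices; must be a topological order of the workflow
    procs: List[int]   # procs[i]: processor chosen for tasks[i], an index in [0, m)
    freqs: List[int]   # freqs[i]: frequency level selected on that processor

    def decode(self) -> List[Tuple[int, int, int]]:
        """Return the schedule as (task, processor, frequency-level) triples, column by column."""
        return list(zip(self.tasks, self.procs, self.freqs))

# A toy individual for a 4-task workflow on 2 processors with 3 frequency levels each.
ind = Individual(tasks=[0, 2, 1, 3], procs=[1, 0, 1, 0], freqs=[2, 0, 1, 2])
print(ind.decode())   # [(0, 1, 2), (2, 0, 0), (1, 1, 1), (3, 0, 2)]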
4.3. Population Initialization
The evolutionary algorithm starts from an initial population, the quality of which is critical for the search process and the final result. The initial population consists of p individuals; in each of them, the task segment can be randomly generated under the precedence constraints, while the remaining segments are filled by an EFT-based heuristic and the local search method described later. Specifically, the task segments in the initial population are chosen from the set of topological orders for diversification. With good uniform coverage, the individuals can be well spread over the whole feasible solution space [9], as demonstrated in Figure 3. However, the quality of the initial population cannot be guaranteed if it is generated in a totally random manner. In this case, a good seeding schedule can be introduced to improve the population quality and the convergence speed. Meanwhile, it can be utilized as a reference point in the solution space to guide the search process.
Based on the above idea, the process of population initialization is depicted in Algorithm 1. First, a seeding individual is generated by the HEFT algorithm and added into the initial population (Line 1). Then, a task priority queue is randomly chosen from the set of topological orders, with a tabu list used to avoid duplication (Line 3), and each task in the queue is allocated to a processor by the EFT-based heuristic to generate a schedule (Line 4). After p − 1 rounds of repeats, the rest of the individuals of the initial population are generated.
Algorithm 1. Population initialization.
Input: G, H, population size p
Output: the initial population
1: Generate a seeding individual by HEFT and add it to the initial population
2: for i = 2 to p do
3:    Randomly generate a task priority queue by topological sorting with a tabu list
4:    Allocate tasks to processors by the EFT-based heuristic to generate a schedule
5:    Add the schedule to the initial population
6: end for
7: return the initial population
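As an illustration only, the following Python sketch mirrors this initialization step. It assumes the workflow is given by its task count and precedence edges; random_topological_order, eft_allocate, and seed_order are hypothetical stand-ins for the paper's topological sorting, EFT-based heuristic, and HEFT-derived seed, respectively.

import random
from collections import defaultdict
from typing import Dict, List

def random_topological_order(n: int, edges: List[tuple]) -> List[int]:
    """Kahn's algorithm with a random pick among ready tasks, so repeated calls
    sample different topological orders of the same workflow."""
    indeg = [0] * n
    succ: Dict[int, List[int]] = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    ready = [t for t in range(n) if indeg[t] == 0]
    order: List[int] = []
    while ready:
        t = random.choice(ready)            # random choice keeps the population diverse
        ready.remove(t)
        order.append(t)
        for s in succ[t]:
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)
    return order

def init_population(n, edges, pop_size, seed_order, eft_allocate):
    """Algorithm-1-style initialization: one seeding task order from HEFT plus
    randomly generated topological orders, each completed by the EFT heuristic."""
    orders = [list(seed_order)]
    tabu = {tuple(seed_order)}              # tabu list avoids duplicated task orders
    attempts = 0
    while len(orders) < pop_size and attempts < 50 * pop_size:
        attempts += 1
        order = random_topological_order(n, edges)
        if tuple(order) not in tabu:
            tabu.add(tuple(order))
            orders.append(order)
    return [eft_allocate(o) for o in orders]  # fill the processor/frequency segments

# Example: a diamond-shaped workflow with 4 tasks and a trivial (identity) allocator.
print(init_population(4, [(0, 1), (0, 2), (1, 3), (2, 3)], 2, [0, 1, 2, 3], lambda o: o))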
4.4. Fitness Evaluation and Pareto Archive
Fitness in an evolutionary algorithm represents how close a given solution is to the optimum. In single-objective optimization, a fitness function (also known as the evaluation function) is defined to evaluate a solution. In time-efficient workflow scheduling, the makespan can be used for fitness evaluation, where a smaller makespan implies better fitness. In the bi-objective scenario, a fitness function can also be defined by normalization, but its coefficients need to be tested and tuned. Instead, MA-DVFS applies the non-dominated sorting of NSGA-II to evaluate the fitness of each solution. Non-dominated sorting divides a solution set into a number of disjoint ranks by comparing solutions objective by objective. The non-dominated comparison operator is defined as follows.
Definition 12. A solution is said to be better than (to dominate) another solution if and only if it is no worse in both objectives and strictly better in at least one of them, where the two objectives are the makespan and the energy consumption, respectively.
After the non-dominated comparison, solutions of a smaller rank are better than those of a larger rank, and solutions of the same rank are viewed as equally important. The solutions of the smallest rank comprise the Pareto set, which is the ultimate goal of bi-objective optimization.
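A minimal sketch of this comparison, assuming each solution is summarized by its (makespan, energy) pair, could look as follows.

from typing import Tuple

Objectives = Tuple[float, float]   # (makespan, energy); both objectives are minimized

def dominates(a: Objectives, b: Objectives) -> bool:
    """True if solution a is better than (dominates) b: no worse in both
    objectives and strictly better in at least one of them."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

print(dominates((10.0, 50.0), (12.0, 50.0)))   # True: same energy, smaller makespan
print(dominates((10.0, 50.0), (9.0, 60.0)))    # False: neither solution dominates the other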
In order to obtain quality-guaranteed schedules, as well as to accelerate the convergence speed, we introduce a Pareto archive to store and maintain the schedules that are better than the seeding solution during each generation cycle. The population evaluation process is shown in Algorithm 2. First, the current population is evaluated by non-dominated sorting (Line 1). Then, the schedules that are better than the seeding solution are regarded as good candidates and stored in a candidate set (Lines 2–7). Finally, each schedule in the candidate set is added to the Pareto archive if it is not already there (Line 8). Note that when updating the archive, the non-dominated sorting order is always maintained; hence, if the size of the archive is limited, the solutions in its tail can be dropped.
Algorithm 2. Population evaluation.
Input: the current population, the seeding solution, the Pareto archive
Output: the evaluated population, the updated Pareto archive
1: Apply the non-dominated sorting of NSGA-II to the current population to generate its ranks
2: Initialize an empty candidate set
3: for each individual in the population do
4:    if the individual is better than the seeding solution then
5:       Add it to the candidate set
6:    end if
7: end for
8: Update the Pareto archive with the candidate set to obtain a new archive
9: return the evaluated population and the updated Pareto archive
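To make the bookkeeping concrete, the sketch below (reusing the dominates helper above) shows one possible implementation of the candidate filtering and archive update in Lines 2–8; keeping the archive free of dominated or duplicated entries and sorted by objective values is an illustrative choice rather than the paper's exact data structure.

def update_archive(archive, candidates, seed_obj, objectives, max_size=None):
    """Store candidates that are better than the seeding solution (Lines 2-7) and
    merge them into the archive, discarding dominated or duplicated entries (Line 8)."""
    for s in candidates:
        obj = objectives(s)
        if not dominates(obj, seed_obj):                  # keep only schedules better than the seed
            continue
        if any(objectives(a) == obj for a in archive):    # skip schedules that already exist
            continue
        if any(dominates(objectives(a), obj) for a in archive):
            continue                                      # dominated by an archived schedule
        archive = [a for a in archive if not dominates(obj, objectives(a))]
        archive.append(s)
    archive = sorted(archive, key=objectives)             # maintain a ranked order
    return archive if max_size is None else archive[:max_size]   # drop the tail if size-limited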
4.5. Evolutionary Operations
To produce offspring for the next generation, evolutionary algorithms apply a series of evolutionary operations, including crossover, mutation, and selection, to the current population. In MA-DVFS, the evolutionary framework is mainly used to explore diverse task priority queues. Accordingly, these operations are applied to the task segment, as is common in population-based methods for time-efficient workflow scheduling [9,24].
4.5.1. Crossover Operator
Crossover emulates generational alternation by producing offspring from selected parents. As previously mentioned, the crossover operator is applied to the task segment of the schedule. The operator must therefore produce valid offspring, meaning that the new task priority queues are also topological orders of the workflow. To this end, we use a topological-order-preserving heuristic to generate offspring, as shown in Algorithm 3. First, two individuals are selected from the parent population and a crossover point is randomly chosen (Line 2). Then, for the first offspring, the genes to the left of the crossover point are cloned from the first parent (Line 3), while the rest are inherited from the second parent in their original topological order (Lines 4–8); the second offspring is generated in the same way with the roles of the parents exchanged (Lines 9–14). Figure 5 demonstrates a sample run of the crossover operator. We use a single crossover point in MA-DVFS since it has been proven to be topological-order preserving [9] and is simple yet sufficient for task priority exploration.
Algorithm 3. Crossover operator.
Input: two parent individuals
Output: two offspring individuals
1: Initialize two empty offspring
2: Choose a random crossover point
3: Clone the genes to the left of the crossover point from the first parent into the first offspring
4: for each j from 1 to n do
5:    if the j-th gene of the second parent does not exist in the first offspring yet then
6:       Append it to the tail of the first offspring
7:    end if
8: end for
9: Clone the genes to the left of the crossover point from the second parent into the second offspring
10: for each j from 1 to n do
11:    if the j-th gene of the first parent does not exist in the second offspring yet then
12:       Append it to the tail of the second offspring
13:    end if
14: end for
15: return the two offspring
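A Python sketch of this single-point, order-preserving crossover on the task segment is shown below; the children remain valid topological orders whenever the parents are, because the tail genes keep their relative order from the other parent.

import random
from typing import List, Tuple

def order_crossover(p1: List[int], p2: List[int]) -> Tuple[List[int], List[int]]:
    """Single-point crossover on task priority queues, in the style of Algorithm 3:
    the head is cloned from one parent and the tail is filled with the remaining
    tasks in the order they appear in the other parent."""
    n = len(p1)
    k = random.randint(1, n - 1)                   # crossover point
    def make_child(head_parent: List[int], tail_parent: List[int]) -> List[int]:
        child = head_parent[:k]
        used = set(child)
        child += [t for t in tail_parent if t not in used]
        return child
    return make_child(p1, p2), make_child(p2, p1)

# Example with two topological orders of the same small workflow.
print(order_crossover([0, 1, 2, 3, 4], [0, 2, 1, 4, 3]))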
4.5.2. Mutation Operator
Mutation is analogous to biological mutation and is used to maintain genetic diversity. The mutation operator changes a gene with a certain probability, thereby helping the search escape from local optima. For a task priority queue, mutation must be performed without violating the precedence constraints. To this end, once a mutation point is chosen, the closest predecessor and successor of the corresponding task can be determined according to Definitions 2 and 3, and only the positions between them (excluding the mutation point itself) are considered as candidate new positions, as shown in Figure 6. Once a new position is selected, the two genes are interchanged to produce a new individual.
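A sketch of this precedence-aware swap is given below. The candidate window for the selected gene is bounded by the positions of its closest predecessor and closest successor; the final validity re-check after the swap is a defensive addition of this sketch, not a step taken from the paper.

import random
from typing import Dict, List, Set

def is_topological(order: List[int], preds: Dict[int, Set[int]]) -> bool:
    """Check that every task appears after all of its predecessors."""
    pos = {t: i for i, t in enumerate(order)}
    return all(pos[p] < pos[t] for t, ps in preds.items() for p in ps)

def swap_mutation(order: List[int], preds: Dict[int, Set[int]], rate: float = 0.1) -> List[int]:
    """Precedence-aware swap mutation on the task segment."""
    if random.random() >= rate:
        return order
    pos = {t: i for i, t in enumerate(order)}
    i = random.randrange(len(order))
    task = order[i]
    lo = max((pos[p] for p in preds.get(task, ())), default=-1) + 1   # after the closest predecessor
    succs = [t for t, ps in preds.items() if task in ps]
    hi = min((pos[s] for s in succs), default=len(order)) - 1          # before the closest successor
    candidates = [j for j in range(lo, hi + 1) if j != i]
    if not candidates:
        return order
    j = random.choice(candidates)
    mutated = order[:]
    mutated[i], mutated[j] = mutated[j], mutated[i]                    # interchange the two genes
    return mutated if is_topological(mutated, preds) else order        # defensive validity re-check

# Workflow 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3 expressed as predecessor sets.
preds = {1: {0}, 2: {0}, 3: {1, 2}}
print(swap_mutation([0, 1, 2, 3], preds, rate=1.0))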
4.5.3. Selection Operator
The selection operator is used to select individuals from the parent population for breeding the next generation. Its primary objective is to emphasize good individuals and eliminate bad ones; this normally requires the population to be evaluated by a fitness function, which MA-DVFS does not use. Instead, we adopt a tournament strategy [34] for selecting candidates: the fitness of the population is measured by the non-dominated sorting described in Section 4.4, and the individuals with the best fitness are selected.
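A sketch of such a binary tournament is shown below; it assumes that the non-dominated rank (and, as in NSGA-II, a crowding distance used as a tiebreaker) has already been computed by the sorting step of Section 4.4.

import random

def binary_tournament(population, rank, crowding):
    """Pick two individuals at random and keep the one with the smaller
    non-dominated rank; ties are broken by the larger crowding distance."""
    a, b = random.sample(range(len(population)), 2)
    if rank[a] != rank[b]:
        winner = a if rank[a] < rank[b] else b
    else:
        winner = a if crowding[a] >= crowding[b] else b
    return population[winner]

# Toy usage: four individuals spread over two Pareto ranks.
pop = ["s0", "s1", "s2", "s3"]
rank = [0, 1, 0, 1]
crowding = [0.7, 0.2, 0.4, 0.9]
print(binary_tournament(pop, rank, crowding))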
4.6. Local Search
Since a candidate schedule, which indicates a feasible region in the solution space, has been determined by the population-based metaheuristic and the EFT-based heuristic, the goal of the next step is to find the best solution in this region. This is a classic combinatorial optimization problem with F^n possible solutions for a fixed task priority queue. We leverage a local search technique to tackle it.
Specifically, a hill climbing method is employed to find a local optimum. Hill climbing is a local optimization method that starts from a given solution and seeks improvements by incrementally modifying its configuration. The local search with pruning is described in Algorithm 4. For a given schedule (Line 1), each processor/frequency pair is examined for every task in the order of the task priorities (Lines 2–17). Furthermore, two pruning steps, which consider makespan (Lines 6–8) and energy (Lines 9–11), respectively, are incorporated into each iteration to narrow down the search space, thus significantly reducing the runtime cost of the algorithm. Since the best solution found so far is recorded during the search (Lines 12–14), a local optimum is finally obtained (Line 18). For example, when the algorithm is applied to the workflow shown in Figure 7 and the processors listed in Table 1, it improves on HEFT by 15.9% in energy saving (with no makespan improvement), while the incorporated pruning technique accelerates the local search process by 69.2%.
Algorithm 4. Local search with pruning.
Input: a solution
Output: a local optimal solution
1: Initialize the best solution with the input solution
2: for each task in the solution from left to right do
3:    for each processor in H do
4:       for each frequency level of the processor do
5:          Reallocate the task to the processor at the frequency level to formulate a new solution
6:          if the makespan-based pruning condition holds then
7:             break
8:          end if
9:          if the energy-based pruning condition holds then
10:            break
11:         end if
12:         if the new solution is better than the best solution then
13:            Record the new solution as the best solution
14:         end if
15:      end for
16:   end for
17: end for
18: return the best solution
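The sketch below illustrates the core hill-climbing pass over processor/frequency pairs. Since the exact pruning conditions of Lines 6–11 are not spelled out here, the sketch omits them and simply accepts any reallocation that dominates the current best; the evaluate callback and the solution layout are assumptions of this sketch.

def dominates(a, b):   # as in the earlier sketch: no worse in both objectives, better in at least one
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def local_search(solution, processors, evaluate):
    """Hill climbing over processor/frequency pairs in the spirit of Algorithm 4, without
    its pruning tests. `solution` is a (tasks, procs, freqs) triple, `processors` lists the
    number of frequency levels per processor, and `evaluate` returns (makespan, energy)."""
    def reassign(sol, idx, proc, freq):
        tasks, procs, freqs = sol
        procs, freqs = list(procs), list(freqs)
        procs[idx], freqs[idx] = proc, freq
        return (tasks, procs, freqs)

    best, best_obj = solution, evaluate(solution)
    for idx in range(len(solution[0])):              # follow the task priority order
        for proc, levels in enumerate(processors):   # every candidate processor ...
            for freq in range(levels):               # ... at every frequency level
                cand = reassign(best, idx, proc, freq)
                obj = evaluate(cand)
                if dominates(obj, best_obj):         # accept only improving moves
                    best, best_obj = cand, obj
    return best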
4.7. The Overall Algorithm
The overall algorithm is described in Algorithm 5. The initial population is generated by Algorithm 1 and evaluated in the next step to initialize the Pareto archive (Line 2). At this point, the seeding solution provided by HEFT has already been included in the archive to ensure the quality of the final result. The algorithm then enters the main loop of iterations, as population-based methods typically do. In each generation, a series of evolutionary operations, including selection, crossover, and mutation as described above, is applied to the current individuals to generate a new population (Line 5). The EFT-based heuristic used in HEFT is then utilized to allocate tasks to processors without considering frequency assignments (Lines 6–8); the underlying intuition is that reducing the total execution time usually results in a more energy-efficient schedule, since an occupied HCS consumes energy for every second it runs. After that, the new population is evaluated and the Pareto archive is updated accordingly by Algorithm 2 (Line 9). Then, the local search method defined in Algorithm 4 is launched with a certain sampling rate on a random subset of the population (Lines 10–14). Finally, the routine replacement strategy of NSGA-II is applied to form the new generation (Line 15). The evolution process terminates when the maximum number of generations is reached, and the final solution is reported by popping the first item of the Pareto archive (Line 17).
Algorithm 5. The MA-DVFS algorithm.
Input: G, H, the algorithm parameters
Output: the final schedule
1: Call Algorithm 1 to generate the initial population
2: Call Algorithm 2 to evaluate the initial population and initialize the Pareto archive
3: g ← 0
4: while g++ is less than the maximum number of generations do
5:    Apply selection, crossover, and mutation to generate a new population
6:    for each individual in the new population do
7:       Allocate the tasks of the individual to processors among H by the EFT-based heuristic
8:    end for
9:    Call Algorithm 2 to evaluate the new population and update the Pareto archive
10:   if the sampling condition is met then
11:      Randomly select an individual from the new population
12:      Call Algorithm 4 to perform local search and produce an improved individual
13:      Update the population with the improved individual
14:   end if
15:   Combine and sort the parent and offspring populations to select individuals for the next generation
16: end while
17: Pop the first item of the Pareto archive as the final schedule
18: return the final schedule
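Putting the pieces together, the generational loop could be wired up as in the sketch below; population_init, evaluate_and_archive, evolve, eft_allocate, local_search, and nsga2_replace are hypothetical callables standing in for Algorithms 1–4 and the NSGA-II machinery.

import random

def ma_dvfs(pop_size, generations, ls_rate, population_init, evaluate_and_archive,
            evolve, eft_allocate, local_search, nsga2_replace):
    """High-level MA-DVFS loop in the style of Algorithm 5; all helpers are injected."""
    parents = population_init(pop_size)                       # Algorithm 1 (HEFT seed included)
    archive = evaluate_and_archive(parents, [])               # Algorithm 2 initializes the archive
    for _ in range(generations):
        offspring = evolve(parents)                           # selection, crossover, mutation
        offspring = [eft_allocate(ind) for ind in offspring]  # EFT-based processor mapping
        archive = evaluate_and_archive(offspring, archive)    # Algorithm 2 updates the archive
        if random.random() < ls_rate:                         # low-probability local search
            i = random.randrange(len(offspring))
            offspring[i] = local_search(offspring[i])         # Algorithm 4
        parents = nsga2_replace(parents, offspring, pop_size) # NSGA-II replacement
    return archive[0]                                         # first archived schedule (assumed non-empty)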
According to these procedures, MA-DVFS is able to provide a quality-guaranteed schedule, benefiting from the seeding schedule and the Pareto archive. Furthermore, with the aid of the non-dominated sorting in NSGA-II, MA-DVFS keeps approaching the Pareto front to achieve bi-objective optimization. The time complexity of MA-DVFS is analyzed as follows. Each individual needs to execute the evolutionary operations together with the EFT-based allocation, the time complexity of which is O(e × m), where g is the number of generations, p is the population size, e is the number of edges of the workflow, and m is the number of processors. Taking the local search into account, the overall time complexity of the algorithm is on the order of O(g × p × e × m).
6. Conclusions
In this article, a novel memetic algorithm for energy-efficient workflow scheduling on DVFS-enabled heterogeneous computing systems, MA-DVFS, is proposed by incorporating NSGA-II with an EFT-based heuristic and a local search method. Although the proposed local search method is already able to provide good suboptimal schedules through single solution-based exploitation, it is further enhanced by the genetic framework, which explores more task priority queues. Furthermore, it has been shown that a low sampling rate for the local search is sufficient to provide high-quality solutions. Experimental results demonstrate the superior performance of the proposed scheme over related algorithms in terms of makespan and energy saving, as well as its higher runtime efficiency compared with the population-based competitor.
In future work, the performance of the proposed algorithm can be evaluated on larger-scale HCSs, as well as on workflows extracted from other applications. Incorporating other local search methods, such as tabu search [13] and variable neighborhood search [8], can also be considered.