**1. Introduction**

There are a number of server farms equipped with hundreds of processors. The cost of the energy used for cooling and running a machine for around three years surpasses the hardware cost of the machine [1]. Consequently, major integrated-chip manufacturers such as Intel and AMD are producing dynamic speed scaling (DSS) enabled multiprocessor/multi-core machines and software such as Intel's SpeedStep [2], which supports the operating system in managing energy by varying the execution speed of processors. The chip maker Tilera forecasted that the number of processors/cores will double every eighteen months [3], which will increase the energy demand to a great extent. Data centers consume 1.5% of the total electricity usage in the United States [4]. To avoid such critical circumstances, the current issue in scheduling is to attain a good quality of service by generating an optimal schedule of jobs while saving energy, which is a conflicting and complicated problem [5].

The power *P* consumed by a processor running at speed *s* is *sV*<sup>2</sup>, where *V* is the voltage [6]. The traditional power function is *P* = *s*<sup>α</sup> (α ≥ 2 for CMOS-based chips [7,8]). There are two types of speed models: the unbounded speed model, in which the processor's speed can range over [0, ∞); and the bounded speed model, in which the speed of a processor can range from zero to some maximum speed η, i.e., [0, η]. DSS plays a vital role in energy management, wherein a processor can regulate its speed to save energy. A few quality-of-service metrics are slowdown, throughput, makespan, flow time and weighted flow time. At low speed, the processor finishes jobs slower and saves energy, whereas at high speed, the processor finishes jobs faster but consumes more energy, as shown in Figure 1. To get a better quality of service with low energy consumption, the objective should be to minimize the sum of flow time and energy; if importance or priority is attached to jobs, the objective should be to minimize the sum of importance-based flow time and energy (IbFt+E). The objective of minimizing IbFt+E has a natural explanation, as it can be considered in monetary terms [9].

**Figure 1.** Performance and speed curve.
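The speed/energy trade-off described above can be made concrete for the traditional power function. The following sketch (illustrative only, not from the paper) computes the flow time, the energy and their sum for a single job of size *w* run at a constant speed *s*; the job names and values are hypothetical:

```python
# Illustrative sketch of the flow-time/energy trade-off for one job of
# size w run at constant speed s, using the traditional power function
# P = s**alpha. Energy = power * time = s**alpha * (w / s).

def flow_time(w, s):
    """Time the job spends in the system when run at constant speed s."""
    return w / s

def energy(w, s, alpha=2):
    """Energy consumed while executing the job at speed s."""
    return s ** alpha * (w / s)

def flow_plus_energy(w, s, alpha=2):
    """The combined objective: flow time plus energy."""
    return flow_time(w, s) + energy(w, s, alpha)

w = 10.0
for s in (0.5, 1.0, 2.0):
    print(f"s={s}: flow={flow_time(w, s):.1f}, "
          f"energy={energy(w, s, 2):.1f}, total={flow_plus_energy(w, s, 2):.1f}")
```

Running this shows the conflict from Figure 1: at *s* = 0.5 the energy is low but the flow time is high, at *s* = 2.0 the reverse holds, and an intermediate speed minimizes the sum.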

In multiprocessor systems, three different policies are required: the first policy is job selection, which decides the next job to be executed on every processor; the second is speed scaling, which decides every processor's execution speed at all times; the third is job assignment, which indicates to which processor a new job should be assigned. In a *c*-competitive online scheduling algorithm, for each input the cost incurred is at most *c* times the cost of the optimal offline algorithm [9]. In non-clairvoyant scheduling, the size of a job is unknown at its arrival time, as in the UNIX operating system, where jobs arrive with no information about their processing requirements. Unlike the online mode, in the offline mode the whole job progression is known in advance. No online algorithm can attain a constant competitiveness with the same maximum speed as the optimal offline algorithm [10].
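The three policies can be seen as independent plug-in points of a multiprocessor scheduler. The skeleton below is an assumed structure for illustration (the concrete policies shown — join-the-shortest-queue assignment, round-robin selection, and a load-dependent speed rule — are placeholders, not the paper's algorithm):

```python
# Minimal scheduler skeleton showing the three policy plug-in points:
# job assignment, job selection and speed scaling. All concrete rules
# here are illustrative placeholders.

class Processor:
    def __init__(self, pid):
        self.pid = pid
        self.queue = []  # active jobs assigned to this processor

    def select_job(self):
        # Job-selection policy: round robin over the local queue.
        if not self.queue:
            return None
        job = self.queue.pop(0)
        self.queue.append(job)  # rotate to the back for next time
        return job

    def speed(self):
        # Speed-scaling policy: speed grows with the number of active
        # jobs (a common heuristic; real rules are algorithm-specific).
        return len(self.queue) ** 0.5

def assign(processors, job):
    # Job-assignment policy: join the shortest queue.
    target = min(processors, key=lambda p: len(p.queue))
    target.queue.append(job)
    return target

procs = [Processor(i) for i in range(3)]
for job in ("j1", "j2", "j3", "j4"):
    assign(procs, job)
print([len(p.queue) for p in procs])  # -> [2, 1, 1]
```

Each of the algorithms surveyed below differs only in how these three slots are filled.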

Motwani et al. [10] commenced the study of non-clairvoyant scheduling algorithms. Yao et al. [11] initiated the theoretical study of speed scaling scheduling algorithms and proposed the average rate heuristic (AVR), with a competitive ratio of at most 2<sup>α−1</sup>α<sup>α</sup> under the traditional power function. Koren et al. [12] presented an optimal online scheduling algorithm *D<sup>over</sup>* for an overloaded uniprocessor system with competitive ratio 1/(1 + √*k*)<sup>2</sup> for the objective of maximizing the throughput, where *k* is the importance ratio. The competitiveness of shortest remaining processing time (SRPT) for a multiprocessor system is *O*(min(log *n*/*m*, log σ)), where *m* is the number of processors, *n* is the total number of jobs and σ represents the ratio of the minimum to the maximum job size [13]. Kalyanasundaram et al. [14] presented the idea of resource augmentation. If the resources are augmented and (2 + Δ)-speed processors are used, then the competitive ratio of Equi-partition lies between (2/3)(1 + Δ) and 2 + 4/Δ [15]. Multilevel feedback queue, a randomized algorithm, is *O*(log *n*)-competitive with *n* jobs [16,17]. The first algorithm with a non-trivial guarantee is *O*(log<sup>2</sup> σ)-competitive [18], where σ is the ratio of the minimum to the maximum job size. Different algorithms with different objectives have been proposed over a span of time [19–27].

Chen et al. [19] proposed algorithms with different approximation bounds for processors with/without constraints on the maximum processor speed. The concept of merging the dual objectives of energy used and total flow time into the single objective of energy used plus total flow time was proposed by Albers et al. [20]. Bansal et al. [21] proposed an algorithm that uses highest density first (HDF) for job selection with the traditional power function. Lam et al. [22] proposed a multiprocessor algorithm for homogeneous processors in which the job assignment policy is a variant of round robin. Random dispatching can provide a (1 + Δ)-speed *O*(1/Δ<sup>3</sup>)-competitive non-migratory algorithm [23]. Chan et al. [24] proposed an *O*(1)-competitive algorithm using sleep management for the objective of minimizing the flow time plus energy. Albers et al. [25] studied an offline problem in polynomial time and proposed a fully combinatorial algorithm that relies on repeated maximum flow computation. Gupta et al. [26] proved that highest density first, weighted shortest elapsed time first and weighted late arrival processor sharing are not *O*(1)-speed *O*(1)-competitive for the objective of minimizing the weighted flow time, even with fixed variable-speed processors in a heterogeneous multiprocessor setting. Chan et al. [27] studied an online clairvoyant sleep management algorithm, scheduling with arrival-time-alignment (SATA), which is (1 + Δ)-speed *O*(1/Δ<sup>2</sup>)-competitive for the objective of minimizing the flow time plus energy. For a detailed survey, refer to [28–34].

In this paper, the problem of online non-clairvoyant (ON-C) DSS scheduling is studied, and an algorithm, multiprocessor with bounded speed (MBS), is proposed with the objective of minimizing IbFt+E. On the basis of a potential function analysis, MBS is *O*(1)-competitive. The notations used in this paper are listed in Table 1.




**Table 1.** Notations used in this paper.

The organization of the paper is as follows. In Section 2, some related non-clairvoyant algorithms are explained and their competitive values are compared to that of the proposed algorithm MBS. Section 3 presents the preliminary definitions and information for the proposed work. In Section 4, the proposed algorithm, its flow chart and its potential function analysis are presented. In Section 5, the processing of a set of jobs is simulated using MBS and the best identified algorithm to observe the working of MBS. Section 6 provides the conclusion and the future scope of the work.

#### **2. Related Work**

Gupta et al. [35] gave an online clairvoyant scheduling algorithm GKP (proposed by Gupta, Krishnaswamy and Pruhs) for the objective of minimizing the weighted flow time plus energy. Under the traditional power function, GKP is *O*(α<sup>2</sup>)-competitive without resource augmentation for power-heterogeneous processors. GKP uses highest density first (HDF) for the selection of jobs on each processor; the speed of any processor scales such that the power of a processor equals the fractional weight of unfinished jobs; jobs are assigned in such a way as to give the least increase in the projected future weighted flow time. Gupta et al. [35] used a local competitiveness analysis to prove their work. Fox et al. [36] considered the problem of scheduling parallelizable jobs in the non-clairvoyant speed scaling setting for the objective of minimizing the weighted flow time plus energy, and they used a potential function analysis to prove it. Fox et al. presented weighted latest arrival processor sharing with energy (WLAPS+E), which schedules the late-arriving jobs, and every job uses a number of machines proportional to the job's weight. WLAPS+E spares some machines to save energy. WLAPS+E is (1 + 6Δ)-speed (5/Δ<sup>2</sup>)-competitive, where 0 < Δ ≤ 1/6. Thang [37] studied the online clairvoyant scheduling problem for the objective of minimizing the weighted flow time plus energy in the unbounded speed model using the traditional power function. Thang gave an algorithm (ALGThang) for unrelated machines and proved that ALGThang is 8(1 + α/ln α)-competitive. In ALGThang, the speed of any processor depends on the total weight of pending jobs on that machine, and any new job is assigned to a processor that minimizes the total weighted flow time.

Im et al. [38] proposed an ON-C scheduling algorithm, SelfishMigrate-Energy (SM-E), for the objective of minimizing the weighted flow time plus energy on unrelated machines. Using the traditional power function, SM-E is *O*(α<sup>2</sup>)-competitive. In SM-E, a virtual queue is maintained on every processor, where new or migrated jobs are added at the tail; the jobs migrate selfishly until an equilibrium is reached. Im et al. simulate sequential best response (SBR) dynamics and migrate each job to the machine provided by the Nash equilibrium. The scheduling policy applied on every processor is a variant of weighted round robin (WRR), wherein a larger speed is allotted to jobs residing at the tail of the queue (like Latest Arrival Processor Sharing (LAPS) and Weighted Latest Arrival Processor Sharing (WLAPS)). Bell et al. [39] proposed an online deterministic clairvoyant algorithm, dual-classified round robin (DCRR), for the multiprocessor system using the traditional power function. The motive of the (24α log αP + α<sup>α</sup>2<sup>α−1</sup>)-competitive DCRR is to schedule the jobs so that they can be completed within their deadlines using minimum energy, i.e., the objective is to maximize the throughput and minimize the energy consumption. In DCRR, the sizes and the maximum densities (= size/(deadline − release time)) of jobs are known, and the classification of jobs depends on both the size and the maximum density. The competitive ratio of DCRR is high, as it considers jobs with deadlines and uses a variation of round robin with speed scaling.

Azar et al. [40] gave an ON-C scheduling algorithm NC-PAR (Non-Clairvoyant for Parallel Machines) for identical parallel machines, wherein job migration is not permitted. Using the traditional power function, NC-PAR is (α + 1)<sup>α−1</sup>-competitive for the objective of minimizing the weighted flow time plus energy in the unbounded speed model. In NC-PAR, a global queue of unassigned jobs is maintained in First In First Out (FIFO) order. A new job is assigned to a machine when a machine becomes free. In NC-PAR, jobs have uniform density (i.e., *weight*/*size* = 1) and are not immediately allotted to processors at release time. The speed of a processor using NC-PAR is based on the total remaining weight of the active jobs. In the non-clairvoyant model with known arbitrary weights, no results are known [40].

An ON-C multiprocessor speed scaling scheduling algorithm, MBS, is proposed and studied against an offline adversary with the objective of minimizing IbFt+E. The speed of a processor using MBS is proportional to the sum of the importance of all active jobs on that processor. In MBS, the processor's maximum speed can be (1 + Δ/3*m*)η (i.e., the range of speed is from zero to (1 + Δ/3*m*)η), whereas the processor's maximum speed using Opt (the optimal algorithm) is η, where *m* is the number of processors and 0 < Δ ≤ (3α)<sup>−1</sup> is a constant. In MBS, a new job is assigned to an idle processor (if available) or to the processor having the minimum sum of the ratio of importance to executed size over all jobs on that processor; the policy for job selection is weighted/importance-based round robin, and each active job receives a share of the processor speed equal to the ratio of its importance to the total importance of jobs on that processor. In this paper, the performance of MBS is analysed using a competitive analysis, i.e., a worst-case comparison of MBS and the optimal offline scheduling algorithm. MBS is (1 + Δ/3*m*)-speed (9/8 + 3Δ/8)·(1 + (1 + Δ/3*m*)<sup>α</sup>) = *O*(1)-competitive; i.e., the value of the competitive ratio *c* for *m* = 2, α = 2 is 2.442 and for *m* = 2, α = 3 is 2.399. The detailed results for different values of *m*, with Δ = (3α)<sup>−1</sup> and α = 2 & 3, are shown in Table 2. The comparison of results is given along with the summary of results in Table 3.
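The stated bound can be evaluated directly. The sketch below plugs Δ = (3α)<sup>−1</sup> into *c* = (9/8 + 3Δ/8)·(1 + (1 + Δ/3*m*)<sup>α</sup>) and reproduces the two values quoted above for *m* = 2:

```python
# Evaluate the MBS competitive-ratio bound
#   c = (9/8 + 3*Delta/8) * (1 + (1 + Delta/(3*m))**alpha)
# with Delta = 1/(3*alpha), as stated in the text.

def mbs_competitive_ratio(m, alpha):
    delta = 1.0 / (3 * alpha)
    speed_ratio = 1 + delta / (3 * m)  # sr = 1 + Delta/(3m)
    return (9 / 8 + 3 * delta / 8) * (1 + speed_ratio ** alpha)

for m, alpha in [(2, 2), (2, 3), (4, 2)]:
    print(f"m={m}, alpha={alpha}: c={mbs_competitive_ratio(m, alpha):.3f}")
# m=2, alpha=2 gives c = 2.442; m=2, alpha=3 gives c = 2.399
```

Note that, for a fixed α, the bound is monotonically decreasing in *m*, since the term Δ/(3*m*) shrinks as processors are added.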



Note: the speed ratio *sr* = (maximum speed of a processor using MBS)/(maximum speed of a processor using Opt).

On the basis of the values in Table 2, it can be observed that, for the proposed algorithm MBS, as the number of processors increases the speed ratio and the competitive ratio decrease. The data in Table 3 describe the competitive values of different scheduling algorithms. The competitive ratios of some clairvoyant and non-clairvoyant algorithms are considered at α = 2 and α = 3. A lower competitive value represents a better algorithm. The value of competitiveness is least for the proposed algorithm MBS.


**Table 3.** Summary of Results.

#### **3. Definitions and Notations**

An ON-C job scheduling problem on a multiprocessor in the bounded speed setting is considered, where jobs arrive over time, a job's importance/weight is known at its release time and the size of a job is revealed only on the job's completion. The processor's speed using Opt can vary dynamically from 0 to the maximum speed η, i.e., over [0, η]. The jobs are sequential in nature, and unrestricted pre-emption is permitted without penalty. The traditional power function *P* = *s*<sup>α</sup> is considered, where α > 1 is a fixed constant. If *s* is the processor's speed, then the processor executes *s* units of work per unit time. An active job *j* has a release time less than or equal to the current time *t* and is not yet completely executed. The flow time *F*(*j*) of job *j* is the time duration from the release of *j* until its completion. The total importance-based flow time is *F* = Σ<sub>*j*∈*I*</sub> *imp*(*j*)·*F*(*j*). Amortized analysis is used for algorithms where an occasional operation is very slow but most other operations are faster. In an amortized analysis, we analyse a sequence of operations and guarantee a worst-case average time that is lower than the worst-case time of a particular expensive operation.
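The importance-based flow time definition can be illustrated with a tiny worked example (the two jobs below and their times are hypothetical, chosen only to exercise the formula *F* = Σ<sub>*j*∈*I*</sub> *imp*(*j*)·*F*(*j*)):

```python
# Hypothetical two-job instance: F(j) = completion - release, and the
# total importance-based flow time is sum of imp(j) * F(j).

jobs = [
    # (name, release time, completion time, importance)
    ("a", 0.0, 4.0, 2.0),  # F(a) = 4.0, contributes 2.0 * 4.0 = 8.0
    ("b", 1.0, 3.0, 1.0),  # F(b) = 2.0, contributes 1.0 * 2.0 = 2.0
]

total = sum(imp * (done - rel) for _, rel, done, imp in jobs)
print(total)  # -> 10.0
```

Adding the energy term of the power function to such a total yields the IbFt+E objective that MBS minimizes.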
