An N-Modular Redundancy Framework Incorporating Response-Time Analysis on Multiprocessor Platforms

Baek, Jaemin; Baek, Jeonghyun; Yoo, Jeeheon; Baek, Hyeongboo

doi:10.3390/sym11080960

Open AccessArticle

An N-Modular Redundancy Framework Incorporating Response-Time Analysis on Multiprocessor Platforms

¹

Agency for Defense Development (ADD), Defense Satellite Systems PMO, Daejeon 34063, Korea

²

Agency for Defense Development (ADD), The 3rd R&D Institute-3rd Directorate, Daejeon 34063, Korea

³

Department of Computer Science and Engineering, Incheon National University (INU), Incheon 22012, Korea

^*

Author to whom correspondence should be addressed.

Symmetry 2019, 11(8), 960; https://doi.org/10.3390/sym11080960

Submission received: 8 July 2019 / Revised: 22 July 2019 / Accepted: 27 July 2019 / Published: 31 July 2019

Download

Browse Figures

Versions Notes

Abstract

:

A timing constraint and a high level of reliability are the fundamental requirements for designing hard real-time systems. To support both requirements, the N modular redundancy (NMR) technique as a fault-tolerant real-time scheduling has been proposed, which executes identical copies for each task simultaneously on multiprocessor platforms, and a single correct one is voted on, if any. However, this technique can compromise the schedulability of the target system during improving reliability because it produces N identical copies of each job that execute in parallel on multiprocessor platforms, and some tasks may miss their deadlines due to the enlarged computing power required for completing their executions. In this paper, we propose task-level N modular redundancy (TL-NMR), which improves the system reliability of the target system of which tasks are scheduled by any fixed-priority (FP) scheduling without schedulability loss. Based on experimental results, we demonstrate that TL-NMR maintains the schedulability, while significantly improving average system safety compared to the existing NMR.

Keywords:

hard real-time systems; schedulability; reliability; N modular redundancy; multiprocessor platform; fixed-priority scheduling

1. Introduction

Most embedded systems are known as real-time, of which timing constraints are one of the system design requirements. One of the most important timing constraints in real-time systems is to complete executions within their corresponding deadlines [1]. Therefore, the correctness of a real-time system depends on the time in which the correct logical and functional output is generated. One class of real-time systems with strict timing constraints is the hard real-time system. If the timing constraints of hard real-time systems are not met, the consequences can be life threatening or cause serious economic losses. Therefore, hard real-time system designers must ensure that all timing constraints are met before the target system is actually implemented.

In addition to meeting the timing constraints of a hard real-time system, the systems functional accuracy should be guaranteed [2]. An output produced in a timely manner is not reliable when the system is out of the specified functional correctness. This deviation occurs in a computer system when the system is defective. Therefore, hard real-time systems must meet all timing constraints of tasks while they are not defective. Timing constraints in real-time applications can be met through proper job scheduling, and fault tolerance can be achieved by obtaining the required level of reliability.

Although many fault-tolerant techniques have been previously implemented using hardware [3], recent software-based techniques such as rollback, checkpointing, and re-execution have been proposed, which were initially designed for uniprocessor systems [4,5,6]. Checkpoints using the rollback technology manage each checkpoint, where the state of the system is stored in stable storage, and the system status is restored to the latest checkpoint if a transient error is detected [4]. The re-execution technique runs a job multiple times and selects the correct output obtained during multiple runs [5,6]. If all outputs are incorrect during the multiple runs, it re-executes the job to improve system reliability. Under the re-execution technique, tasks execute multiple times, which increases the possibility of deadline misses due to prolonged execution times. Therefore, existing research considering the re-execution technique aims at improving the system reliability of mixed-critical systems or energy-sensitive real-time systems; however, the schedulability of the system has inevitably been compromised.

In multi-processor domains, errors can be tolerated by taking advantage of the functionality of multiple processors [7,8,9]. The most common approach is the primary-backup approach in which the backup of a task executes if its primary does not execute successfully [7]. Backup overloading approach schedules a backup copy of the primary job in a time-overlapping manner for operational efficiency [8]. Another efficient overloading algorithm on multiple processors has been proposed through dynamic logical grouping between copies of tasks [9]. Regarding a hardware approach, Cirinei et al. proposed a dynamic reconfiguration of a multiprocessor hardware platform with a balance between performance and fault tolerance through concurrent replication [10]. N modular redundancy (NMR) executes identical copies for each task simultaneously on multiprocessor platforms, and a single correct output is voted if any. Because this technique produces N identical copies of each job, which execute in parallel, some tasks may miss their deadlines owing to enlarged computing power required for completing their execution [4,5,6].

Although NMR to make the target system tolerant to a transient fault is an effective approach for real-time scheduling, it can compromise the schedulability of the target system while improving reliability [4]. Because this technique produces N identical copies of each job, which are executed in parallel, some tasks may miss their deadlines owing to enlarged computing power required for completing their executions. This is due to the N modular redundancy techniques limited capability of forcing the same N number of copies for all tasks, where the number N is determined without schedulability analysis [4,6].

In this study, we propose a task-level N modular redundancy (TL-NMR) technique, which improves the system reliability of the target system in which tasks are scheduled by any fixed-priority (FP) scheduling without schedulability loss. The TL-NMR framework determines the number of copies (not the same number of copies for all the tasks) of each job

J_{k}^{q}

of a task

τ_{k}

while ensuring that every copied job can complete its execution before its absolute deadline by effectively using a new response-time analysis (RTA) proposed in this paper. Then,

N_{k}

copies of each job

J_{k}^{q}

execute simultaneously on multiple processors under the given FP scheduler, and a single correct output (if any) is voted on. Based on the experimental results, we demonstrate that TL-NMR maintains schedulability while significantly improving the average system safety, compared to the existing NMR.

The remainder of this paper is organized as follows. Section 2 presents our system model including task and reliability models. Section 3 introduces our proposed fault-tolerant scheduling framework called TL-NMR. Section 5 evaluates RTA for TL-NMR with various performance metrics. Section 6 concludes this study.

2. System Model

In this section, we present the system model of our target system, which includes task and reliability models.

2.1. Task Model

We consider the Liu and Layland task model [1] for m identical processors in a hard real-time system, in which a set

τ

of tasks

τ_{k}

(

1 \leq k \leq | τ |

) are denoted by three tuples of parameters, i.e.,

τ_{k} = (T_{k}, C_{k}, D_{k})

. For a given period

T_{k}

, worst-case execution time

C_{k}

, and relative deadline

D_{k}

, a task

τ_{k}

is supposed to invoke a series of jobs whose arrivals (also called release times) are at least

T_{k}

time units away from each other. In addition, each job of

τ_{k}

is required to execute for at most

C_{k}

time units to complete its execution and such an execution should be ended within

D_{k}

time units.

The q-th job

J_{k}^{q}

is invoked by a task

τ_{k}

at its release time

r_{k}^{q},

and its execution should be completed before its absolute deadline

d_{k}^{q}

as a timing constraint. We let the q-th job

J_{k}^{q}

’s finishing time be

f_{k}^{q}

, i.e.,

f_{k}^{q}

should be less than or equal to

d_{k}^{q}

to satisfy the timing constraint; then the job

J_{k}^{q}

is said to be schedulable. Consequently, a task

τ_{k}

is schedulable if every job of

τ_{k}

is schedulable, and a task set

τ

is schedulable when every task

τ_{k}

is schedulable. When the TL-NMR technique is applied,

N_{k}

copies are produced for each job

J_{k}^{q}

. Here, each copy (i.e., each copied job) is denoted by

J_{k}^{q, p}

, where

1 \leq p \leq N_{k}

holds, and a task

τ_{k}

is schedulable if every job

J_{k}^{q, p}

of

τ_{k}

is schedulable. We assume that each single job cannot execute in parallel on multiple processors simultaneously. All copied jobs

J_{k}^{q, p}

can have different finishing times

f_{k}^{q, p}

depending on the scheduling algorithm; however, they have the same release time

r_{k}^{q}

and absolute deadline

d_{k}^{q}

.

The response time

R_{k}

of a task

τ_{k}

is given by

{max}_{J_{k}^{q} \in τ} (f_{k}^{q} - r_{k}^{q})

(

m a x_{J_{k}^{q, p} \in τ} (f_{k}^{q, p} - r_{k}^{q})

when the TL-NMR technique is applied). By the definition of the response time

R_{k}

, a task set

τ

is guaranteed to be schedulable if

R_{i} \leq D_{i}

holds. We target a constrained-deadline task system in which

C_{k} \leq D_{k} \leq T_{k}

holds for every task

τ_{k} \in τ

. We consider a global preemptive work-conserving scheduling algorithm in which a job can migrate from one processor to another and a lower priority job’s execution can be hindered by a higher priority one. We assume quantum-based time where a time unit describes a quantum length of one.

2.2. Reliability Model

Among various types of faults, we consider the transient fault for our system model, which occurs for a short time without damaging the hardware devices. Reliability is the degree of tolerance for target systems against transient faults, which has been addressed by a number of existing studies, regarding a task (called task reliability) and a system (called system reliability) [6]. Task reliability is defined as the possibility that a single job of the task can execute without a transient fault. For a given average fault, arrival rate

γ

and exponential distribution, the task reliability

Υ_{k}

of a task

τ_{k}

is given by [6].

\begin{matrix} Υ_{k} = e^{- γ C_{k}} . \end{matrix}

(1)

We assume that at most one transient fault can occur for a single job and it does not change the worst-case execution time

C_{k}

of a task

τ_{k}

as a common assumption [11].

In the existing NMR, each job’s N identical copies are executed on multiple processors simultaneously (sharing the same release time

r_{k}^{q, p}

and absolute deadline

d_{k}^{q, p}

), and their results are voted to produce a single correct output (with no transient faults) if any. Unlike the existing NMR technique, our proposed TL-NMR allows different tasks to execute different number of copies for each execution. We denote the number of such copies by

N_{k}

(each

τ_{k}

can have different values of

N_{k}

). We describe the method to determine

N_{k}

in Section 3. By the definition of reliability,

1 - Υ_{k}

implies the possibility that a job

J_{k}^{q}

does not successfully execute owing to a transient fault. In addition, TL-NMR selects a correct job (if any) among

N_{k}

identical jobs. Thus,

Υ_{k}

under the TL-NMR technique can be re-formulated as follows:

\begin{matrix} Υ_{k} = 1 - {(1 - e^{- γ C_{k}})}^{N_{k}} . \end{matrix}

(2)

Then, system reliability

Υ (τ)

is defined as the average of task reliability

Υ_{k}

in a task set

τ

and is calculated as

\begin{matrix} Υ (τ) = \frac{\sum_{τ_{k} \in τ} Υ_{k}}{| τ |} \end{matrix}

(3)

For designing a hard real-time system, the reliability of the system should be maintained at a high level and every single execution of a task should be finished within its corresponding absolute deadline. The system safety

Ω (τ)

to quantify system reliability and schedulability simultaneously is defined as the product of system reliability

Υ (τ)

and schedulability (one if schedulable and zero otherwise) of

τ

. Thus,

Ω (τ)

is

Υ (τ)

if

R_{k} \leq D_{k}

for all

τ_{k} \in τ,

and zero otherwise.

3. TL-NMR Framework

In this section, we propose TL-NMR, which improves the system reliability of a target system in which tasks are scheduled by any FP scheduling without schedulability loss. The TL-NMR framework determines

N_{k}

, which represents the individual number of copies of each job

J_{k}^{q}

of a task

τ_{k}

by effectively using a new RTA proposed in this paper. Then,

N_{k}

copies of each job

J_{k}^{q}

execute simultaneously on multiple processors under the given FP scheduler, and a single correct output (if any) is voted.

Because a real-time scheduling algorithm exploiting the existing NMR technique schedules N copies of each job, it requires N times

C_{k}

(i.e.,

N \cdot C_{k}

) for each

τ_{k}

. Thus, some jobs (although they satisfy their absolute deadlines under vanilla (It indicates a scheduling algorithm that does not incorporate NMR.) FP scheduling) may miss their absolute deadlines owing to prolonged execution times. Figure 1 illustrates the scenario where a schedulable task set under vanilla RM scheduling RM scheduling (assigns a higher priority to that task

τ_{k}

which has a small

T_{k}

.) becomes unschedulable owing to NMR.

τ_{2}

and

τ_{3}

have the same priority owing to their same

T_{i}

; however, we assume that

τ_{2}

has a higher priority than

τ_{3}

according to a simple tie-breaking rule that a task with a smaller task index has a higher priority. As shown in Figure 1, jobs of three tasks

τ_{1}

,

τ_{2}

, and

τ_{3}

are schedulable under vanilla RM on

m = 3

processors (Figure 1a). Considering RM scheduling that exploits NMR for

N = 2

, two identical copies of each job are scheduled (Figure 1a). The first job

J_{3}^{1, 1}

of a job

J_{3}^{1}

starts its execution at time instant

t = 2

, and it is preempted at

t = 4

owing to higher priority jobs. Then, it completes its execution at

t = 8

. However, the second job

J_{3}^{1, 2}

of the job

J_{3}^{1}

begins its execution at

t = 6

because all the three processors are occupied by higher priority jobs. Then, it misses its absolute deadline at

t = 8

because it does not execute completely for

C_{k} = 4

.

We can easily observe that

τ

in Figure 1b becomes schedulable if one copied job of any task is not considered for scheduling (e.g.,

N_{k} = 1

for any

τ_{k}

). TL-NMR is capable of assigning

N_{k}

for each task, while it does not make a schedulable task set unschedulable under any FP scheduling. To achieve this goal, we need to address the following questions:

How to determine $N_{k}$ of each task $τ_{k}$ ?
How to guarantee schedulability of $τ_{k}$ for a given $N_{k}$ ?

To address question 1, TL-NMR effectively assigns the value of

N_{k}

using

N_{k}

-assignment algorithm in conjunction with a new RTA for TL-NMR so that

R_{k}

of every task is never greater than

D_{k}

. With

N_{k}

of every task

τ_{k}

, TL-NMR makes

N_{k}

identical copies of each job and schedules a task set

τ

according to the given FP scheduling algorithm. To address question 2, we develop a new RTA for TL-NMR to guarantee that there is no deadline miss while

N_{k}

identical copies of each job are scheduled by the given FP scheduling algorithm.

Algorithm 1 presents the operation of TL-NMR. At the beginning,

N_{k}

for every task

τ_{k}

is set to one. Then,

N_{k}

of each task is determined (Lines 2–8). For every task

τ_{k}

in

τ

, schedulability is tested with a value of

N_{k} + 1

using RTA for TL-NMR, and

N_{k} + 1

is assigned for

τ_{k}

if

τ_{k}

is deemed schedulable by the test (Lines 4–6). This procedure is conducted

m - 1

times (Lines 2–8) because at most m copies of each job are allowed by TL-NMR on an m processor platform. Thereafter, a given task set

τ

is scheduled (Lines 9–17). For every time instant t, a job

J_{k}^{q}

of a task

τ_{k}

is inserted in a ready queue Q whenever

J_{k}^{q}

is released (Lines 9–11). Released jobs in Q are scheduled according to the base FP scheduling algorithm (Line 13). Finally,

J_{k}^{q}

is removed from Q when the execution of all the copied jobs

J_{k}^{q, p}

of

J_{k}^{q}

is completed.

Algorithm 1 TL-NMR

1:: $N_{k} \leftarrow 1$ for all tasks $τ_{k} \in τ$
2:: for from 1 to $m - 1$ do
3:: for $τ_{k}$ in $τ$ do
4:: if $τ$ is deemed schedulable with $N_{k} + 1$ by a given new RTA then
5:: $N_{k} \leftarrow N_{k} + 1$
6:: end if
7:: end for
8:: end for
9:: for Every time instance t do
10:: if $J_{k}^{q}$ is released by $τ_{k}$ then
11:: Insert $J_{k}^{q}$ into Q
12:: end if
13:: Schedule jobs ( $N_{k}$ copied jobs $J_{k}^{q, p}$ for each job $J_{k}^{q}$ ) in Q according to a given FP scheduling
14:: if All copied jobs $J_{k}^{q, p}$ of $J_{k}^{q}$ finish its execution then
15:: Delete $J_{k}^{q}$ from Q
16:: end if
17:: end for

4. New RTA for TL-NMR

Because our goal is to improve reliability while guaranteeing schedulability under TL-NMR, we should be able to judge whether the task set

τ

is schedulable with the given values of

N_{k}

for every task

τ_{k}

while

N_{k}

is assigned by TL-NMR. Hence, we develop a new RTA that can be incorporated into TL-NMR.

The response time

R_{k}

of a task

τ_{k}

can be upper-bounded by the summation of the worst case execution time

C_{k}

and the worst case time instance hindering the execution of a job

J_{k}^{q}

of

τ_{k}

. Because

C_{k}

is given, RTA for TL-NMR upper bounds the latter by exploiting the notion of interference in an interval [

r_{k}^{q}, r_{k}^{q} + ℓ

) (where ℓ is limited to

D_{k}

).

The interference

I_{k} (r_{k}^{q}, r_{k}^{q} + ℓ)

on a copied job

J_{k}^{q, p}

in the interval [

r_{k}^{q}, r_{k}^{q} + ℓ

) is defined as the cumulative length of all the intervals in which

J_{k}^{q, p}

is ready to be executed but cannot be scheduled on any processor owing to m higher priority jobs [12].

When interference occurs on

J_{k}^{q, p}

at a certain time instance, at least m higher priority jobs exist to hinder the execution of

J_{k}^{q, p}

. Thus, we need to calculate how much of the execution of the higher priority job contributes to the interference

I_{k} (r_{k}^{q}, r_{k}^{q} + ℓ)

on

J_{k}^{q}

to upper bound

I_{k} (r_{k}^{q}, r_{k}^{q} + ℓ)

. An important property of schedules under TL-NMR is that the execution of a copied job

J_{k}^{q, p}

can be hindered by jobs of not only other tasks

τ_{i}

but also by other copied jobs (of

τ_{k}

) sharing the same release time

r_{k}^{q}

and absolute deadline

d_{k}^{q}

. For example,

J_{3}^{1, 2}

in Figure 1b cannot execute in the interval [2, 4) owing to

J_{2}^{1, 1}

,

J_{2}^{1, 2},

(of

τ_{2}

) and

J_{3}^{1, 1}

(of

τ_{3}

) occupying three processors. Hence, we first let

J_{i}^{p}

be the set of the p-th copied jobs of

τ_{i}

. Then, we define the interference

I_{k \leftarrow i} (r_{k}^{q}, r_{k}^{q} + ℓ)

of

J_{i}^{p}

on

J_{k}^{q, p}

and the interference

I_{k \leftarrow k} (r_{k}^{q}, r_{k}^{q} + ℓ)

of the other copied job

J_{k}^{q, g}

(i.e., except the job of interest

J_{k}^{q, p}

) of

τ_{k}

on

J_{k}^{q, p}

in an interval [

r_{k}^{q}, r_{k}^{q} + ℓ

).

The interference

I_{k \leftarrow i} (r_{k}^{q}, r_{k}^{q} + ℓ)

of

J_{i}^{p}

on

J_{k}^{q, p}

in the interval [

r_{k}^{q}, r_{k}^{q} + ℓ

) is defined as the cumulative length of all the intervals in which

J_{k}^{q, p}

is ready to be executed but cannot be scheduled on any processor while jobs of

J_{i}^{p}

execute.

The interference

I_{k \leftarrow k} (r_{k}^{q}, r_{k}^{q} + ℓ)

of the other copied job

J_{k}^{q, g}

(i.e., except the job of interest

J_{k}^{q, p}

) of

τ_{k}

on

J_{k}^{q, p}

in the interval [

r_{k}^{q}, r_{k}^{q} + ℓ

) is defined as the cumulative length of all the intervals in which

J_{k}^{q, p}

is ready to be executed but cannot be scheduled on any processor while

J_{k}^{q, g}

executes.

To upper bound

I_{k \leftarrow i} (r_{k}^{q}, r_{k}^{q} + ℓ)

of

J_{i}^{p}

on

J_{k}^{q, p}

, RTA for TL-NMR uses the notion of the workload of

J_{i}^{p}

in an interval of length ℓ, which is defined as the amount of computation time required for

J_{i}^{p}

in ℓ [13]. The upper part of Figure 2 illustrates the scenario in which the workload of a task

τ_{i}

is maximized under preemptive work-conversing TL-NMR scheduling. Let us assume that

p = 1

. The first job

J_{i}^{q, 1}

of

τ_{i},

in the upper side of Figure 2, starts its execution at the beginning of the interval ℓ and completes its execution at

d_{i}^{q}

, thereby executing for

C_{i}

time units without any interference or delay. Thereafter, the following jobs (i.e.,

J_{i}^{q + 1, 1}

and

J_{i}^{q + 2, 1}

) are released and scheduled subsequently. Considering the number of executions of jobs fully executing for

C_{i}

, the other jobs executing for a portion of

C_{i}

, and the number of copies

N_{i}

, the upper-bounded workload

W_{i} (ℓ)

is calculated by

\begin{matrix} I_{k \leftarrow i} (r_{k}^{q}, r_{k}^{q} + ℓ) = W_{i} (ℓ) = F_{i} (ℓ) \cdot C_{i} + m i n (C_{i}, ℓ + D_{i} - C_{i} - F_{i} (ℓ) \cdot T_{i}), \end{matrix}

(4)

where

F_{i} (ℓ)

is the number of jobs executing for

C_{i}

calculated by

\begin{matrix} F_{i} (ℓ) = ⌊\frac{ℓ + D_{i} - C_{i}}{T_{i}}⌋ . \end{matrix}

(5)

Because the other copied jobs of

τ_{k}

share the same release time

r_{k}^{q}

and absolute deadline

d_{k}^{q}

, ℓ is limited to

D_{k}

. The interference

I_{k \leftarrow k} (r_{k}^{q}, r_{k}^{q} + ℓ)

of the other copied job

J_{k}^{q, g}

of

τ_{k}

on

J_{k}^{q, p}

in the interval [

r_{k}^{q}, r_{k}^{q} + ℓ

) can be upper-bounded by

\begin{matrix} I_{k \leftarrow k} (r_{k}^{q}, r_{k}^{q} + ℓ) = m i n (C_{k}, ℓ), \end{matrix}

(6)

as shown in the lower part of Figure 2. As seen in Figure 2,

N_{i}

times of

J_{i}^{p}

and

N_{k} - 1

times of

J_{k}^{q, g}

can contribute to

I_{k} (r_{k}^{q}, r_{k}^{q} + ℓ)

. In addition, because a job cannot execute in a time instant if m other higher priority jobs execute, using upper-bounded

I_{k \leftarrow i} (r_{k}^{q}, r_{k}^{q} + ℓ)

and

I_{k \leftarrow k} (r_{k}^{q}, r_{k}^{q} + ℓ)

,

I_{k} (r_{k}^{q}, r_{k}^{q} + ℓ)

is upper-bounded by

\begin{matrix} I_{k} (r_{k}^{q}, r_{k}^{q} + ℓ) = ⌊\frac{1}{m} \cdot (\sum_{τ_{i} \in τ - {τ_{k}}} N_{i} \cdot (I_{k}^{i} (r_{k}^{q}, r_{k}^{q} + ℓ)) + (N_{k} - 1) \cdot I_{k \leftarrow k} (r_{k}^{q}, r_{k}^{q} + ℓ))⌋ . \end{matrix}

(7)

Therefore, if

I_{k} (r_{k}^{q}, r_{k}^{q} + ℓ)

is not larger than

ℓ - C_{k}

,

J_{k}^{q, p}

can finish its execution at or before

r_{k}^{q, p} + ℓ

, where ℓ is limited to

D_{k}

.

Although (7) safely upper-bounds

I_{k} (r_{k}^{q}, r_{k}^{q} + ℓ)

, the value derived by (7) is highly overestimated because it includes the amount of execution of

J_{i}^{p}

and

J_{k}^{q, g},

which can be performed in parallel with

J_{k}^{q, p}

. As seen in Figure 3, a portion of the execution of

J_{i}^{p}

is performed in parallel with a job

J_{k}^{q, 1}

of interest, which cannot contribute to

I_{k} (r_{k}^{q}, r_{k}^{q} + ℓ)

. This phenomenon will also occur with

J_{k}^{q, 2}

or

J_{k}^{q, 3}

if

C_{k}

is larger than

ℓ - C_{k}

. Thus, RTA for TL-NMR limits the amount of

I_{k \leftarrow i} (r_{k}^{q}, r_{k}^{q} + ℓ)

of

J_{i}^{p}

(similarly

I_{k \leftarrow k} (r_{k}^{q}, r_{k}^{q} + ℓ)

of

J_{k}^{q, g}

), which potentially contributes to

I_{k} (r_{k}^{q}, r_{k}^{q} + ℓ

and

ℓ - C_{k} + 1

. Here,

I_{k} (r_{k}^{q}, r_{k}^{q} + ℓ

is more tightly upper-bounded than (7) and is given by

\begin{matrix} I_{k} (r_{k}^{q}, r_{k}^{q} + ℓ) = ⌊ \frac{1}{m} & \cdot (\sum_{τ_{i} \in τ - {τ_{k}}} N_{i} \cdot (m i n (I_{k}^{i} (r_{k}^{q}, r_{k}^{q} + ℓ), ℓ - C_{k} + 1)) \\ + (N_{k} - 1) \cdot m i n (I_{k \leftarrow k} (r_{k}^{q}, r_{k}^{q} + ℓ), ℓ - C_{k} + 1)) ⌋ . \end{matrix}

(8)

Based on this reasoning, RTA for TL-NMR tests the schedulability of

τ_{k}

as follows.

Theorem 1.

A task

τ_{k} \in τ

is schedulable under TL-NMR, if a copied job

J_{k}^{q, p}

satisfies the following for any ℓ that holds

C_{k} \leq ℓ \leq D_{k}

.

\begin{matrix} C_{k} + I_{k} (r_{k}^{q}, r_{k}^{q} + ℓ) \leq ℓ . \end{matrix}

(9)

Proof.

Suppose that

J_{k}^{q, p}

cannot complete its execution in [

r_{k}^{q}, r_{k}^{q} + ℓ

) even if Equation (9) holds. By the definition of

I_{k} (r_{k}^{q}, r_{k}^{q} + ℓ)

,

J_{k}^{q, p}

’s execution is hindered by higher priority jobs in [

r_{k}^{q}, r_{k}^{q} + ℓ

) for at most

I_{k} (r_{k}^{q}, r_{k}^{q} + ℓ)

. In addition, the worst case execution time for

J_{k}^{q, p}

to complete its execution is

C_{k}

. Thus,

J_{k}^{q, p}

can complete its execution in

C_{k} + I_{k} (r_{k}^{q}, r_{k}^{q} + ℓ)

after the release of

r_{k}^{q}

. By Equation (9),

C_{k} + I_{k} (r_{k}^{q}, r_{k}^{q} + ℓ) \leq ℓ

holds, and ℓ is less than or equal to

D_{k}

. Therefore

J_{k}^{q, p}

is schedulable if Equation (9) holds, which contradicts the supposition. □

The remaining issue is to find a value of ℓ and an upper bound

I_{k}^{i} (r_{k}^{j}, r_{k}^{j} + ℓ)

. RTA for TL-NMR works as follows. Initially, ℓ is set to

C_{k}

and RTA tests whether the inequality holds. If the inequality holds, the task is deemed schedulable. Otherwise, RTA resets ℓ to the previous value of the LHS of the inequality, until the inequality holds or

ℓ > D_{k}

;

ℓ > D_{k}

represents that

τ_{k}

is deemed unschedulable. If the inequality holds,

τ_{k}

is deemed schedulable and the value of ℓ satisfying the inequality is

R_{k}

, i.e.,

R_{k} \leq D_{k}

holds.

Figure 4 illustrates how RTA judges the schedulability of

τ_{k}

scheduled by TL-NMR.

5. Results

In this section, we evaluate the performance of the proposed TL-NMR compared to that of the existing fault-tolerant techniques. For performance metrics, we measure the number of randomly generated task sets that are deemed schedulable (called schedulable ratio) and the average of considered task sets’ system safety (called average system safety). Task sets for our evaluation are randomly generated based on a popular task set generation method that has been exploited in a number of existing studies regarding real-time scheduling [14,15,16]. The task set generation method used in our study has two parameters such that the number of processors

m \in {2, 4, 8, 16},

and the input parameter

α

or

1 / β

\in {0.1, 0.3, 0.5, 0.7, 0.9}

for individual task use (

C_{i} / T_{i}

) distribution. If

α

is given, a value for

C_{i} / T_{i}

is uniformly selected in [0, 0.5) and [0.5, 1) with probability

α

and

1 - α

, respectively, according to a given bimodal distribution. On the other hand, when

1 / β

is given, the value is selected according to the exponential distribution whose probability density function is

β \cdot e x p (- β \cdot x)

. Then, the parameters of a task in a task set are determined as follows.

T_{i}

is uniformly determined in

[1, 1000]

,

C_{i}

is chosen by the bimodal or exponential parameter with

T_{i}

already determined, and

D_{i}

is uniformly chosen in

[C_{i}, T_{i}]

. Ten thousand task sets are randomly generated for each value of m.

Next, we discuss the performance (i.e., schedulable ratio and average system safety) and properties of the following techniques:

TL-NMR-RM: RM scheduling incorporating the proposed TL-NMR is discussed in Section 3, and
xMR-RM: RM scheduling incorporating NMR in which x identical copies are executed to improve reliability (e.g., 3M-RM represents the scheduler where three identical copies of each job are executed under RM scheduling) proposed in the existing studies [4,5,6].

Please note that although we conducted experiments for well performing FP scheduling such as earliest quasi-deadline first and deadline monotonic, we do not explicitly discuss the performance of such scheduling because they show a trend similar to RM scheduling. 1MR-RM may also represent RM scheduling without NMR because a single copy of each job of each task is executed under vanilla RM scheduling.

Figure 5 presents the experimental results regarding the schedulable ratio and average system safety of the considered techniques. Figure 5a,b shows the schedulable ratio of the considered techniques according to varying task set use

(\sum_{τ_{k} \in τ}^{} C_{k} / T_{k})

for

m = 2

and

m = 16

, respectively. TOT represents the number of generated task sets whose task set use is represented by the axis of task set use. For example, the number of generated task sets whose task set use is about 1.5 is approximately 430 (represented by z-axis). Furthermore, Figure 5c–f plots the average system safety of task sets of the considered techniques for

γ = 0.001

and

γ = 0.01

, respectively.

From the figures, we make the following observations:

TL-NMR-RM maintains the same performance in schedulable ratio (i.e., the number of task sets deemed schedulable) compared to 1MR-RM for both $m = 2$ and 16 (Figure 5a,b),
TL-NMR-RM improves the average system safety more for all task set use compared to 1MR-RM, while higher number of job-copies in NMR decreases both the schedulable ratio and average system safety (Figure 5c–f), and
TL-NMR-RM better outperforms 1MR-RM for greater values of $γ$ (Figure 5c–f).

Observation 1 is due to the key property of TL-NMR. That is, it improves the system reliability without schedulability loss by determining the individual number of copies of each task (in conjunction with the proposed RTA in Section 4) according to Algorithm 1. Thus, a schedulable task set under 1MR-RM (i.e., vanilla RM) never become unschedulable under TL-NMR-RM.

Observation 2 is the natural consequence stemming from Observation 1 in that the system reliability is enhanced owing to increased

N_{k}

(as Equation (2) indicates), while the schedulability is not changed under TL-NMR-RM compared to that in 1MR-RM. However, the average system safety decreases for the higher number of job-copies in NMR. This counterintuitive phenomenon happens because the schedulability plays a more important role to improve the average system safety than to increase the system reliability. Recall that the system safety is zero when the task set is unschedulable. System reliability is inevitably improved if

N_{k}

of each task increases, but the schedulability is not guaranteed when such an increase of

N_{k}

is conducted not in conjunction with schedulability analysis such as RTA. Thus, the higher number of copies in NMR aggravates the average system safety by compromising schedulability even though it may improve reliability.

Observation 3 highlights the advantage of TL-NMR such that the system reliability under TL-NMR increases more compared to that under 1MR-RM when the arrival rate of transient error increases. As Equation (2) indicates, the system reliability is disproportional to

γ

but proportional to

N_{k}

. Thus, the average system safety under 1MR-RM (and the other xMR-RM series) sharply decreases with increasing

γ,

while TL-NMR-RM makes up for such a degradation by increasing

N_{k}

without schedulability loss.

6. Conclusions

In this study, we proposed a TL-NMR technique that improves the system reliability of a target system in which tasks are scheduled by any FP scheduling without schedulability loss. TL-NMR overcomes a limitation of the existing NMR, i.e., some tasks may miss their deadlines while multiple copies of tasks are executed in parallel to improve system reliability. The TL-NMR framework determines the number of copies of each job

J_{k}^{q}

of a task

τ_{k}

while ensuring that every copied job completes its execution before its absolute deadline in conjunction with an RTA framework proposed in this paper. Through our experiments, we demonstrated that TL-NMR maintains schedulability while it improves average system safety compared to the existing vanilla RM scheduling technique. The performance gap between TL-NMR and vanilla RM becomes larger for systems where transient error occurs frequently (i.e., for a larger value of

γ

). For future work, we plan to extend our work to consider multiple shared resources and derive resource-locking protocols [17,18] to improve the analytical capability.

Author Contributions

Conceptualization, J.B. (Jaemin Baek), J.B. (Jeonghyun Baek) and H.B.; Software, H.B.; data curation, J.B. (Jaemin Baek) and J.B. (Jeonghyun Baek); writing-original draft preparation, J.B. (Jaemin Baek), J.B. (Jeonghyun Baek), J.Y. and H.B.; writing-review and editing, J.Y.; Supervision, H.B.; project administration, H.B.; funding acquisition, H.B.

Funding

This work was supported by the National Research Foundation of Korea(NRF) grant funded by the Korea government(MSIT) (No. 2019R1F1A1059663).

Conflicts of Interest

The authors declare no conflict of interest.

References

Liu, C.; Layland, J. Scheduling Algorithms for Multi-programming in A Hard-Real-Time Environment. J. ACM 1973, 20, 46–61. [Google Scholar] [CrossRef]
Ekpo, S.; George, D. A system-based design methodology and architecture for highly adaptive small satellites. In Proceedings of the IEEE International Systems Conference, San Diego, CA, USA, 5–8 April 2010; pp. 516–519. [Google Scholar]
Randell, B. System structure for software fault tolerance. IEEE Trans. Softw. Eng. 1975, SE-1, 220–232. [Google Scholar] [CrossRef]
Malhotra, S.; Narkhede, P.; Shah, K.; Makaraju, S.; Shanmugasundaram, M. A review of fault tolerant scheduling in multicore systems. Int. J. Sci. Technol. Res. 2015, 4, 132–136. [Google Scholar]
Yu, X.B.; Zhao, J.S.; Zheng, C.W.; Hu, X.H. A Fault-Tolerant Scheduling Algorithm using Hybrid Overloading Technology for Dynamic Grouping based Multiprocessor Systems. Int. J. Comput. Commun. Control. 2012, 7, 990–999. [Google Scholar] [CrossRef]
Zhou, J.; Yin, M.; Li, Z.; Cao, K.; Yan, J.; Wei, T.; Chen, M. Fault-Tolerant Task Scheduling for Mixed-Criticality Real-Time Systems. J. Circuits Syst. Comput. 2017, 26, 1–17. [Google Scholar] [CrossRef]
Ghosh, S.; Melhem, R.; Mosse, D. Fault-tolerance through scheduling of aperiodic tasks in hard real-time multiprocessor systems. IEEE Trans. Parallel Distrib. Syst. 1997, 8, 272–284. [Google Scholar] [CrossRef] [Green Version]
Manimaran, G.; Murthy, C.S.R. A fault-tolerant dynamic scheduling algorithm for multiprocessor real-time systems and its analysis. IEEE Trans. Parallel Distrib. Syst. 1998, 9, 1137–1152. [Google Scholar] [CrossRef]
Al-Omari, R.; Somani, A.K.; Manimaran, G. Efficient overloading techniques for primary-backup scheduling in real-time systems. J. Parallel Distrib. Comput. 2004, 64, 629–648. [Google Scholar] [CrossRef]
Cirinei, M.; Bini, E.; Lipari, G.; Ferrari, A. A Flexible Scheme for Scheduling Fault-Tolerant Real-Time Tasks on Multiprocessors. In Proceedings of the IEEE International Parallel and Distributed Processing Symposium, Rome, Italy, 26–30 March 2007; pp. 1–8. [Google Scholar]
Aminzadeh, S.; Ejlali, A. A comparative study of system-level energy management methods for fault-tolerant hard real-time systems. IEEE Trans. Comput. 2011, 60, 1228–1299. [Google Scholar] [CrossRef]
Bertogna, M.; Cirinei, M.; Lipari, G. Improved Schedulability Analysis of EDF on Multiprocessor Platforms. In Proceedings of the Euromicro Conference on Real-Time Systems (ECRTS), Balearic Islands, Spain, 6–8 July 2005; pp. 209–218. [Google Scholar]
Bertogna, M.; Cirinei, M. Response-Time Analysis for globally scheduled Symmetric Multiprocessor Platforms. In Proceedings of the IEEE Real-Time Systems Symposium (RTSS), Tucson, AZ, USA, 3–6 December 2007. [Google Scholar]
Bertogna, M.; Cirinei, M.; Lipari, G. Schedulability Analysis of Global Scheduling Algorithms on Multiprocessor Platforms. IEEE Trans. Parallel Distrib. Syst. 2009, 20, 553–566. [Google Scholar] [CrossRef]
Baker, T.P. Comparison of Empirical Success Rates of Global vs. Partitioned Fixed-Priority EDF Scheduling for Hard Real-Time; Technical Report TR–050601; Department of Computer Science, Florida State University: Tallahassee, FL, USA, 2005. [Google Scholar]
Andersson, B.; Bletsas, K.; Baruah, S. Scheduling Arbitrary-Deadline Sporadic Task Systems on Multiprocessor. In Proceedings of the IEEE International Conference on Embedded and Real-Time Computing Systems and Applications, Barcelona, Spain, 30 November–3 December 2008; pp. 197–206. [Google Scholar]
Sha, L.; Rajkumar, R.; Lehoczky, J.P. Priority Inheritance Protocols: An Approach to Real-Time Synchronization. IEEE Trans. Comput. 1990, 39, 1175–1185. [Google Scholar] [CrossRef]
Easwaran, A.; Adersson, B. Resource Sharing in Global Fixed-Priority Preemptive Multiprocessor Scheduling. In Proceedings of the IEEE Real-Time Systems Symposium (RTSS), Washington, DC, USA, 1–4 December 2009; pp. 377–386. [Google Scholar]

Figure 1. Schedules under vanilla RM and RM incorporating NMR (

N = 2

) for an example task set

τ = {τ_{1} (T_{1} = 4, C_{1} = 2, D_{1} = 4), τ_{2} = τ_{3} (8, 4, 8)}

on

m = 3

processors.

Figure 1. Schedules under vanilla RM and RM incorporating NMR (

N = 2

) for an example task set

τ = {τ_{1} (T_{1} = 4, C_{1} = 2, D_{1} = 4), τ_{2} = τ_{3} (8, 4, 8)}

on

m = 3

processors.

Figure 2. Worst case scenario in which workload of

J_{i}^{p}

(set of p-th copied jobs of

τ_{k}

) and execution of

J_{k}^{q, g}

(the other copied job

J_{k}^{q, g}

of

τ_{k}

except

J_{k}^{q, p}

) are maximized in an interval [

r_{k}^{q}, r_{k}^{q} + ℓ

) under preemptive work-conserving TL-NMR scheduling for

N_{i} = 2

and

N_{k} = 3

.

Figure 2. Worst case scenario in which workload of

J_{i}^{p}

(set of p-th copied jobs of

τ_{k}

) and execution of

J_{k}^{q, g}

(the other copied job

J_{k}^{q, g}

of

τ_{k}

except

J_{k}^{q, p}

) are maximized in an interval [

r_{k}^{q}, r_{k}^{q} + ℓ

) under preemptive work-conserving TL-NMR scheduling for

N_{i} = 2

and

N_{k} = 3

.

Figure 3. Scenario in which the amount of interference

I_{k \leftarrow i} (r_{k}^{q}, r_{k}^{q} + ℓ)

of

J_{i}^{p}

that can contribute to

I_{k} (r_{k}^{q}, r_{k}^{q} + ℓ)

on job

J_{k}^{q, p}

is limited to

ℓ - C_{k} + 1

for a task set

τ = {τ_{i}, τ_{k}}

, and

N_{i} = 2

and

N_{k} = 3

.

Figure 3. Scenario in which the amount of interference

I_{k \leftarrow i} (r_{k}^{q}, r_{k}^{q} + ℓ)

of

J_{i}^{p}

that can contribute to

I_{k} (r_{k}^{q}, r_{k}^{q} + ℓ)

on job

J_{k}^{q, p}

is limited to

ℓ - C_{k} + 1

for a task set

τ = {τ_{i}, τ_{k}}

, and

N_{i} = 2

and

N_{k} = 3

.

Figure 4. Flux diagram illustrating how RTA judges the schedulability of

τ_{k}

scheduled by TL-NMR.

Figure 4. Flux diagram illustrating how RTA judges the schedulability of

τ_{k}

scheduled by TL-NMR.

Figure 5. Evaluation results of considered *-RM series for

m = 2

and 16.

Figure 5. Evaluation results of considered *-RM series for

m = 2

and 16.

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Baek, J.; Baek, J.; Yoo, J.; Baek, H. An N-Modular Redundancy Framework Incorporating Response-Time Analysis on Multiprocessor Platforms. Symmetry 2019, 11, 960. https://doi.org/10.3390/sym11080960

AMA Style

Baek J, Baek J, Yoo J, Baek H. An N-Modular Redundancy Framework Incorporating Response-Time Analysis on Multiprocessor Platforms. Symmetry. 2019; 11(8):960. https://doi.org/10.3390/sym11080960

Chicago/Turabian Style

Baek, Jaemin, Jeonghyun Baek, Jeeheon Yoo, and Hyeongboo Baek. 2019. "An N-Modular Redundancy Framework Incorporating Response-Time Analysis on Multiprocessor Platforms" Symmetry 11, no. 8: 960. https://doi.org/10.3390/sym11080960

APA Style

Baek, J., Baek, J., Yoo, J., & Baek, H. (2019). An N-Modular Redundancy Framework Incorporating Response-Time Analysis on Multiprocessor Platforms. Symmetry, 11(8), 960. https://doi.org/10.3390/sym11080960

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

An N-Modular Redundancy Framework Incorporating Response-Time Analysis on Multiprocessor Platforms

Abstract

1. Introduction

2. System Model

2.1. Task Model

2.2. Reliability Model

3. TL-NMR Framework

4. New RTA for TL-NMR

5. Results

6. Conclusions

Author Contributions

Funding

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI