Article

Towards Failure-Aware Inference in Harsh Operating Conditions: Robust Mobile Offloading of Pre-Trained Neural Networks

Wenjing Liu, Zhongmin Chen and Yunzhan Gong

1 State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China
2 College of Data Science and Application, Inner Mongolia University of Technology, Hohhot 010080, China
3 Weihai Wenlong Power (Group) Co., Ltd., State Grid Corporation of China, Weihai 264423, China
* Author to whom correspondence should be addressed.
Electronics 2025, 14(2), 381; https://doi.org/10.3390/electronics14020381
Submission received: 18 December 2024 / Revised: 11 January 2025 / Accepted: 17 January 2025 / Published: 19 January 2025
(This article belongs to the Special Issue New Advances in Distributed Computing and Its Applications)

Abstract

Pre-trained neural networks such as GPT-4 and Llama2 have revolutionized intelligent information processing, but their deployment in industrial applications faces challenges, particularly in harsh environments. To address these challenges, model offloading, which distributes the computational load of pre-trained models across edge devices, has emerged as a promising solution. While this approach enables the use of more powerful models, it faces significant difficulties in harsh environments, where reliability, connectivity, and resilience are critical. This paper introduces failure-resilient inference in mobile networks (FRIM), a framework that ensures robust offloading and inference without model retraining or reconstruction. FRIM leverages graph theory to optimize partition redundancy and incorporates an adaptive failure detection mechanism for mobile inference with efficient fault tolerance. Experimental results on DNN models (AlexNet, ResNet, VGG-16) show that FRIM improves inference performance and resilience, enabling more reliable mobile applications in harsh operating environments.

1. Introduction

Emerging pre-trained neural networks such as GPT-4 and Llama2 have revolutionized intelligent information processing, with a transformative impact on AI, science, and society. These networks are used as backbone models for efficient mobile inference, which benefits from improved model generalization and supports more intelligent industrial applications [1]. For such applications, deploying the learning model in the cloud may be impractical; collaborative local devices are preferred, since continuously transmitting large volumes of raw data from end devices to a cloud model incurs high transmission latency and ultimately degrades application performance [2]. Compared with the rapid and remarkable progress of pre-trained neural networks, the computation and storage resources of devices in the field remain extremely limited and become a bottleneck for deploying large neural network models [3]. To address the computational limitations of end devices, model offloading [4] has been proposed as a promising approach: the computation of a pre-trained neural network is divided into partitions that are offloaded to edge devices along the path from the edge to the cloud [5]. Although industrial devices increasingly rely on offloading to exploit more powerful models for complex inference, such approaches face significant challenges in harsh operating conditions.
Existing systems in harsh operating conditions face multiple environmental and connectivity challenges, which result in limited functionality, high costs, and difficult deployment. Mobile inference solutions for harsh and hazardous areas must be resilient enough to withstand extreme conditions, and designing such equipment requires specialized knowledge to meet stringent safety standards. As a result, these solutions are expensive to produce, limiting the options available on the market. Connectivity is another common issue for industries operating in remote, harsh, and hazardous areas, as many systems rely on connectivity that may simply not be available in these environments.
Robust offloading of pre-trained models, as well as inference on them, is vital for failure-resilient inference on mobile devices in harsh operating conditions. In recent years, failure resilience has been studied for DNN-based inference [6,7,8,9,10]. Hadidi et al. [6] provide robustness through coding-based distributed inference; however, this method increases computational complexity and communication overhead and thus hurts inference efficiency. In [7,8], Yousefpour et al. introduced skip hyper-connections in distributed DNNs, which provide failure resilience for distributed DNN inference: by building hyper-connections, intermediate layers of the distributed DNN can be skipped without significant loss of inference accuracy. These hyper-connections between non-contiguous devices holding model partitions make distributed DNNs more resilient to physical failures by providing an alternative inference path when a device holding a partition fails. However, this method requires changing the structure of the DNN model, which makes the model more complicated; worse, after the skip hyper-connections are added, the DNN model must be retrained, which is infeasible for large models with billions of parameters. Ref. [9] proposes Conflict-resilient Incremental Offloading of Deep Neural Networks at Edge (CIODE) to improve the efficiency of DNN inference in smart edge environments. Ref. [10] implements replicas of model partitions but does not consider redundancy adaptation. CONTINUER [11] models trade-offs among accuracy, end-to-end latency, and downtime thresholds to handle a device failure, but it cannot adapt to various inference scenarios and increases the cost and complexity of the inference process. Khan et al. [12] proposed a resilient DNN with an adaptive architecture to reduce resource consumption; they present an empirical method that identifies the connections in ResNet that can be dropped without significant impact on the model’s performance. How to provide failure resilience quickly and accurately without redesigning the pre-trained model structure remains an urgent challenge.
To address the aforementioned problems, we present failure-resilient inference in mobile networks (FRIM), with robust offloading of DNN partitions, a promising paradigm that guarantees the failure resilience of DNN inference on mobile devices without additional model reconstruction or retraining (see Figure 1). Specifically, we provide a theoretical analysis for adaptive offloading of DNN partitions that optimizes partition redundancy: the offloading of DNN partitions is modeled with graph theory, and redundant partitions are deployed so that inference can continue in the face of device failures. Moreover, these redundant partitions can perform inference in parallel and thus improve inference efficiency. In contrast to state-of-the-art methods, our solution requires no model reconstruction or retraining while improving both the failure resilience and the performance of mobile inference.
The primary contributions are as follows:
  • We model the offloading of DNN partitions with graph theory and adapt the offloading redundancy to guarantee failure resilience while enhancing inference performance.
  • As a part of FRIM, we design an adaptive failure detection mechanism to locate partition failures. By incorporating a detection mechanism into the mobile inference process, FRIM can detect and mitigate failures efficiently and accurately.
  • We implement FRIM and other baselines in a mobile computing environment and evaluate FRIM on three well-known DNN models (AlexNet, ResNet, and VGG-16). Experimental results demonstrate that FRIM can adapt to different scales of device failures while enhancing inference efficiency, thus supporting intelligence-empowered mobile applications in harsh operating conditions.
The rest of this paper is organized as follows. Section 2 reviews related work. Section 3 introduces preliminaries. Section 4 presents FRIM, including adaptive model partition offloading and a failure-resilient inference scheme. Section 5 describes our experiments and discusses the results. Section 6 concludes the paper.

2. Related Work

2.1. DNN Computation Offloading

Computation offloading is a paradigm that allows multiple clients to run neural network models collaboratively. Mohammed et al. [13] propose a technique to divide a DNN into multiple partitions that can be processed locally by end devices or distributed to one or more powerful nodes, such as in fog networks. To minimize the execution delay between the client and the edge server, PerDNN [14] uses the server’s GPU statistics to distribute DNN models. Xu et al. [15] focus on distributing DNNs over one or more cloudlets to minimize energy consumption or maximize the number of served inference requests. A tensor-based Lyapunov DNN offloading control strategy [16] offloads DNN computational tasks across the cloud–fog–edge continuum from an overall perspective. To optimize the partitioning process, Wang et al. [17] presented an adaptive DNN partition scheme for inference acceleration. Almeida et al. [18] propose a distributed inference framework that jointly adapts the partition deployment and the transferred data to the environment. Duan et al. [19] further proposed an optimal partition algorithm for tree-structured DNNs; although multiple DNNs are considered, concurrent inference on these models is not discussed. Liu et al. [20] formally defined the Cooperative Partitioning and Offloading (CPO) problem, with the objective of minimizing average inference latency. Liao et al. [21] proposed a Partition Point Retain algorithm to exclude layers unsuitable as partition points, followed by an Optimal Partition Point algorithm that finds the partition point with minimal cost for each mobile DNN deployment. Kakolyris et al. [22] formulate a range of Pareto-optimal DNN partitioning strategies that balance performance and energy consumption. Duan et al. [23] addressed the joint optimization of DNN partitioning and task offloading scheduling with a layer-level scheduling algorithm for heterogeneous DNN inference jobs.
Among state-of-the-art solutions, adaptive methods are the most promising direction for advanced DNN offloading in heterogeneous environments. CIODE [9] can dynamically select collaborative targets among trusted edge clusters and collaborate with multiple targets for inference. Zhou et al. [24] leverage the Particle Swarm Optimization (PSO) algorithm to dynamically update the fused-layer path length, the fused layer’s size, and the path offloading strategy in search of the optimal solution. In [25], sequential DNN task offloading is formulated as an infinite-horizon Markov decision process and the Bellman optimality equation is provided, aiming to minimize the average latency of DNN tasks. DistrEdge [26] uses deep reinforcement learning to adapt to the heterogeneity of devices, network conditions, and the nonlinear features of DNNs. Qin et al. [27] propose a mobility-aware computation offloading and task migration approach (MCOTM) based on trajectory and resource prediction. Younis et al. [28] introduce the Energy-Latency-aware Task Offloading and Approximate Computing (ETORS) problem and use the Dual-Decomposition Method (DDM) to decompose it into sub-problems. Lin et al. [29] introduce a deep deterministic policy gradient method based on a greedy strategy (DDPGG) that simultaneously optimizes dynamic scheduling, device association, and task allocation for UAVs. Guo et al. [30] introduce a methodology called EASTER, designed to learn robust distribution strategies for transformer models against device failures, considering the trade-off between robustness (i.e., maintaining model functionality under failures) and resource utilization (memory usage and computation). Guo et al. [31] also present a partitioning method, called RobustDiCE, for robust distribution and inference of CNN models over multiple edge devices. In [32], an end-edge collaborative inference framework of CFNNs for big data-driven IoT, named DisCFNN, is proposed.
When a DNN model is partitioned and deployed among mobile devices, the failure of any device invalidates the model partitions placed on it and can thus cause a catastrophic outage. Although DNN computation offloading has been studied extensively, failure resilience must be taken into account when deploying DNNs in harsh operating conditions.

2.2. Failure-Resilient DNN Computation Offloading and Distributed Inference

Among failure-resilient approaches to neural network computation offloading, the work of Hadidi et al. [6] was the first to address the robustness of DNN-based inference by optimizing the offloading of DNNs. However, this coding-based approach is too complex to improve inference efficiency in practice.
DeepFogGuard [7] uses skip hyper-connections to achieve failure-resilient distributed DNN inference. Its follow-up work, ResiliNet [8], similarly achieves failure resilience by redesigning the neural network model and modifying the training procedure. However, both DeepFogGuard and ResiliNet require redesigning the neural network model, and it is not feasible to redesign and retrain every model in practice; a failure-resilience paradigm without model modification is highly desirable. Hou et al. [33] proposed an edge-computing-aided IoV architecture to tolerate potential failures of subtasks. Whaiduzzaman et al. [34] adopted a fog-based fault-tolerant framework of microservices that can effectively recover from any single failure. By incorporating an advanced lock mechanism into computation offloading, CIODE [9] handles concurrency conflicts and improves the robustness of DNN applications. Li et al. [35] propose ResPipe, a resilient model-distributed DNN training mechanism against delayed or failed devices. CONTINUER [11] leverages trade-offs among accuracy, end-to-end latency, and downtime thresholds to handle a device failure. However, these two approaches increase the cost and complexity of the inference process. Khan et al. [12] proposed a resilient DNN with an adaptive architecture to reduce resource consumption; their empirical study identifies the connections in ResNet that can be dropped without significant impact on the model’s performance in case of resource shortage. However, these existing studies only consider offloading the inference model; the scheduling of redundant model partition copies is not discussed. In [10], a preliminary scheme using replicas of model partitions is implemented, but redundancy adaptation is not considered. A study of robust inference with rational redundancy is therefore highly desirable.

3. Problem Statement and Analysis

In this section, the typical offloading and inference strategy is formulated and analyzed. We focus on the problem caused by the failures of partitions (layers) of pre-trained neural network models after these partitions are offloaded to edge devices.

3.1. Formulation

The purpose of partition offloading of pre-trained models is to support distributed inference with large neural network models and to balance the overhead of inference tasks [36,37]. In this paper, we consider distributed collaborative inference in a harsh industrial environment. In our system model, edge devices can request collaborative inference from each other. In addition to requesting collaborative inference from other edge nodes, each edge device also monitors the others at regular intervals to detect whether they have failed. If a failure is detected, the edge device offloads copies of the model partitions deployed on the failed device to another device, which inevitably incurs recovery time for the affected inference tasks. To alleviate the impact of failed devices, we aim to minimize this recovery time. DNN partitioning follows a popular method [13], which partitions the DNN model adaptively.
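To make the system model concrete, the following minimal sketch shows how a pre-trained network could be cut into sequential partitions whose copies are then offloaded to different devices. The use of PyTorch/torchvision and the chosen cut points are illustrative assumptions on our part; the paper's testbed uses Caffe and the adaptive partitioning method of [13].

```python
# Minimal sketch (assumed PyTorch setup, not the paper's Caffe implementation):
# cut a pre-trained network into sequential partitions for offloading.
import torch
import torchvision.models as models

def split_into_partitions(layers, cut_points):
    """Return sub-networks split at the given layer indices, e.g. [4, 9]."""
    layers = list(layers)
    bounds = [0] + list(cut_points) + [len(layers)]
    return [torch.nn.Sequential(*layers[a:b]) for a, b in zip(bounds, bounds[1:])]

# Illustrative cut points for AlexNet's convolutional feature extractor.
alexnet = models.alexnet(weights=None)          # pre-trained weights would be loaded here
partition_a, partition_b, partition_c = split_into_partitions(alexnet.features, [4, 9])

x = torch.randn(1, 3, 224, 224)                 # dummy input for one inference request
out = partition_c(partition_b(partition_a(x)))  # each call could run on a different device
print(out.shape)
```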
An execution graph of the pre-trained neural network (see Figure 2) is generated to guide the inference task over the neural network distributed on edge devices; it captures the inference process and the overhead of collaborative inference among devices in a wireless industrial network. In this execution graph, the neural network is divided into three partitions, A, B, and C, which, together with the corresponding computational load, are transferred between an edge device and a central server to collaboratively complete the inference task. An inference task consists of several states, indicated by yellow circles, i.e., S1, S2, ..., S11. Each edge between circles has a weight describing the corresponding time delay: an edge between two adjacent states of the same device for the same model partition represents that device's partition execution delay, while an edge between adjacent states of different devices represents the transmission delay between those devices. Note that only the states visited during the inference process are marked yellow; states not included in the run-time inference are marked blue. Initially, partitions A and B are deployed at the edge device and partition C at the central server, represented by the partition distribution [A B], [C]. In the second round of this inference process, partition B is additionally offloaded to the central server to improve inference performance: state S5 transmits to S6, and partition B is executed at the central server, reaching state S7. The partition distribution becomes [A B], [B C], and the edge device keeps its copy of partition B as a redundant copy. This redundant copy of partition B is the necessary redundancy for an inference task in a harsh operating condition. In this way, dynamic partition offloading balances lower cost against failure-resilient inference in harsh operating conditions.
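As an illustration of how such an execution graph can be used, the sketch below builds a small weighted graph of inference states and finds the minimum-delay execution path. The state names, delay values, and the use of networkx are illustrative assumptions, not the paper's implementation.

```python
# Illustrative execution graph in the spirit of Figure 2: nodes are inference
# states, edge weights are execution or transmission delays (all values made up).
import networkx as nx

g = nx.DiGraph()
g.add_edge("S1", "S2", weight=4.0)   # partition A executed on the edge device
g.add_edge("S2", "S3", weight=3.5)   # partition B executed on the edge device
g.add_edge("S2", "S6", weight=1.2)   # transmit A's output to the central server
g.add_edge("S6", "S7", weight=1.0)   # partition B executed on the server (redundant copy)
g.add_edge("S3", "S8", weight=1.2)   # transmit B's output to the server
g.add_edge("S7", "S8", weight=0.0)   # B's output is already on the server
g.add_edge("S8", "S9", weight=2.0)   # partition C executed on the server

path = nx.shortest_path(g, "S1", "S9", weight="weight")
delay = nx.shortest_path_length(g, "S1", "S9", weight="weight")
print(path, delay)   # the minimum-delay path guides which partition copy to use
```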

3.2. Redundancy Modeling for Efficient Failure Resilience

Intentionally offloading duplicate copies of the model partitions provides the redundancy needed for failure-resilient inference, especially in harsh operating conditions: redundant copies of model partitions are offloaded to edge devices before any of them fail. How to balance computational and communication cost against this intentional redundancy is a crucial problem for robust mobile offloading of pre-trained neural networks. To model and optimize this process, we use a directed graph to represent the offloaded partition copies of a pre-trained neural network model (see the graph of the collaborative devices in Figure 3).
In this setting, an inference task is cancelled only if all copies of some partition become unavailable. As depicted in Figure 3, each circle represents a partition copy offloaded to a device, and we use a directed graph to depict failure-resilient inference. In a harsh environment, several devices may fail at the same time. All devices within communication range have links with other devices. Since devices usually communicate through mesh networking or a MANET (mobile ad hoc network), the link between any two devices becomes unstable in harsh operating conditions, affected by factors such as the distance between devices and the antenna types needed to establish a reliable, low-latency communication network. Hence, we analyze the impact of failed or offline devices by formulating the above system in a harsh operating condition.
Let m be the total number of devices with offloaded partition copies. Note that some devices cannot link to each other due to their limited wireless communication range. When a pre-trained neural network is separated into partitions, each pair of adjacent partitions induces a bipartite graph G_0. The copies of the former partition are represented by a set V_1 of size n_1, and the copies of the subsequent partition by a set V_2 of size n_2. Note that t below indexes the copies within V_2 relative to V_2, not by their absolute values. Let
V_1 = \{1, 2, \ldots, n_1\}, \quad V_2 = \{n_1 + 1, n_1 + 2, \ldots, n_1 + n_2\}, \quad 1 \le s \le n_1, \; 1 \le t \le n_2 \qquad (1)
and the possible links between V_1 and V_2,
E = \{\, ij : i \in V_1,\ j \in V_2 \,\} \qquad (2)
Considering the impact of failed or offline devices in a specific time slot, there are \binom{|E|}{Q} candidate graphs with vertex set V = V_1 \cup V_2 that have exactly Q edges, all drawn from E and none outside E. Let K_{s,t} denote a complete bipartite subgraph of G_0 that includes s vertices holding copies of the former partition and t vertices holding copies of the subsequent partition. The expected number of such subgraphs is
E(K_{s,t}) = \binom{n_1}{s} \binom{n_2}{t} \binom{|E| - st}{Q - st} \binom{|E|}{Q}^{-1} \qquad (3)
where the first factor is the number of ways the s copies of the former partition in K_{s,t} can be chosen, the second factor is the number of ways the t copies of the subsequent partition can be chosen, and, treating K_{s,t} as a single component, the third factor is the number of choices for such a component, i.e., the ways the Q − st edges outside K_{s,t} can be chosen, normalized by the total number \binom{|E|}{Q} of candidate graphs. Here,
\binom{|E| - st}{Q - st} \binom{|E|}{Q}^{-1} = \prod_{i=0}^{st-1} \frac{Q - i}{n_1 n_2 - i} < \left( \frac{Q}{n_1 n_2} \right)^{st} \qquad (4)
Hence,
E(K_{s,t}) < \frac{1}{s!\, t!}\, n_1^{s}\, n_2^{t} \left( \frac{Q}{n_1 n_2} \right)^{st} \qquad (5)
Therefore, when s copies of the former partition and t copies of the subsequent partition fail, the probability that the inference is not interrupted after the former partition (partition i) is executed satisfies
P_i^{A} > \frac{ n_1^{s} n_2^{t} - \frac{1}{s!\,t!}\, n_1^{s} n_2^{t} \left( \frac{Q}{n_1 n_2} \right)^{st} }{ n_1^{s} n_2^{t} } = 1 - \frac{1}{s!\,t!} \left( \frac{Q}{n_1 n_2} \right)^{st} \qquad (6)
Here, we name P_i^A the partition-availability, which identifies the availability of partition i.
As a special case, when s copies of the former partition fail but no copy of the subsequent partition fails,
P_i^{A} = \frac{n_1 - s}{n_1} \qquad (7)
On the other hand, when no copy of the former partition fails but t copies of the subsequent partition fail,
P_i^{A} = \frac{n_2 - t}{n_2} \qquad (8)
To achieve robust mobile offloading under harsh operating conditions, the key objective is to reduce redundant offloading while ensuring the failure resilience of all DNN partitions. We use the partition-availability of the different partitions to guide the mobile offloading of pre-trained neural networks in adaptive failure-aware offloading and inference.
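The partition-availability given by Formulas (6)-(8) can be evaluated directly. The sketch below is a minimal illustration under the assumption that all n_1 · n_2 links are possible and Q of them are alive; the helper name and the example values are ours, not from the paper.

```python
# Sketch of the partition-availability bound in Formulas (6)-(8); the helper
# name and the example values below are illustrative assumptions.
from math import factorial

def partition_availability(n1, n2, Q, s, t):
    """Lower bound on the probability that inference continues past partition i
    when s of its n1 copies and t of the next partition's n2 copies have failed,
    with Q of the n1*n2 possible links alive."""
    if s == 0 and t == 0:
        return 1.0
    if t == 0:                      # Formula (7): only former-partition copies fail
        return (n1 - s) / n1
    if s == 0:                      # Formula (8): only subsequent-partition copies fail
        return (n2 - t) / n2
    bound = 1.0 - (Q / (n1 * n2)) ** (s * t) / (factorial(s) * factorial(t))
    return max(0.0, bound)          # Formula (6), clamped to a valid probability

# Example: 3 copies of each partition, 7 live links, one failed copy on each side.
print(partition_availability(n1=3, n2=3, Q=7, s=1, t=1))
```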

4. Adaptive Failure-Aware Offloading and Inference

In practical industrial scenarios, adverse conditions in harsh and hazardous areas will interfere with inference tasks based on pre-trained neural network models. Once any edge device crashes, the related execution path of the inference task will fail, and then the corresponding inference tasks will be delayed. Considering the wireless environment of edge devices in practical industrial scenarios, network congestion or jitters will exacerbate this problem. To guarantee the effectiveness and efficiency of inference tasks in harsh operating conditions, a failure-resilience strategy should be taken into account.
State-of-the-art failure-resilience strategies exploit redundancy to counterbalance the effects of partition failures; redundancy usually means deploying extra partition copies [38]. The edge devices are monitored to identify failures during the mobile inference process, and redundant model partitions are leveraged to compensate for failures. The failure of an edge device causes the failure of the neural network partitions offloaded to it, so we offload copies of neural network partitions to edge devices as redundancy. Here, offloading refers to transferring computation or data from one device to another. Due to the heterogeneity of edge devices, different devices require specific failure-resilience configurations. To optimize the redundancy for each edge device, we add partition copies on demand of the inference and keep adjusting their number adaptively. On this basis, we propose failure-resilient inference in mobile networks (FRIM) to guarantee inference performance in scenarios with failed devices. We take three phases to detect and alleviate the impact of partition failures:
  • We divide a pre-trained model into partitions and offload their copies to edge devices. Specifically, extra copies of partitions are offloaded to available edge devices to provide failure-resilience.
  • Once the detection scheme identifies a failure of the offloaded partitions, we adapt the number of partition copies according to the resulting change in their statistical dependability (i.e., partition availability).
  • Finally, we use a random scheduling strategy to execute distributed inference tasks.

4.1. Failure Detection

To discover device failures and adapt model partition offloading, we incorporate a heartbeat mechanism into a statistical model to detect failures in time. In the learning stage, heartbeat packets are sent among devices periodically, and the delay of the response packets indicates the state of the corresponding devices [39]. Furthermore, since a Monte Carlo hypothesis test can estimate the underlying distribution from fewer samples [40], we reuse the measured heartbeat response delays in a hypothesis test to determine a threshold for the regular response delay of heartbeat packets. In the failure detection stage, a failure is identified once the average response delay of successive heartbeat packets exceeds this threshold. The whole detection process includes six steps:
  • Periodically, each device sends heartbeat packets to every other device and accumulates the average response time of each device;
  • Repeat step 1 k times, and we can obtain k records of average delay;
  • To perform a Monte Carlo hypothesis test, we randomly select a delay record from the above k average delays, record its value, and put it back (i.e., sampling with replacement). Repeating this k times yields a new group of k delay records.
  • Repeat step 3 until g groups of delay records are obtained in total;
  • In this way, we can obtain the confidence interval of response delay according to the distribution of the above g record groups;
  • Finally, from the confidence interval, take the value that the regular response delay exceeds only with a small probability (at the chosen confidence level) as the threshold for failure detection.
In this way, our failure detection mechanism can detect and locate failures at runtime. With its support, a copy of the same partition held on another device can compensate for the failed device.
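The following sketch illustrates the idea behind the six-step procedure: resampling the observed heartbeat response delays (a bootstrap-style Monte Carlo test) yields a delay threshold, and a device whose recent average delay exceeds it is flagged as failed. All function names, delay values, and the confidence setting are illustrative assumptions.

```python
# Sketch of the heartbeat-based detector: bootstrap resampling of observed
# average response delays yields a threshold; a device whose recent average
# delay exceeds it is flagged as failed. Values and names are illustrative.
import random
import statistics

def delay_threshold(delays, groups=1000, confidence=0.99):
    """Resample the k observed average delays (with replacement) 'groups' times
    and return the delay below which a 'confidence' fraction of group means fall."""
    k = len(delays)
    group_means = sorted(
        statistics.mean(random.choices(delays, k=k)) for _ in range(groups)
    )
    return group_means[int(confidence * (groups - 1))]

observed = [12.1, 11.8, 12.5, 13.0, 11.9, 12.2, 12.8, 12.4]   # k delay records (ms)
threshold = delay_threshold(observed)

recent_average = 25.4                          # average delay of the latest heartbeats
print(threshold, recent_average > threshold)   # True would indicate a detected failure
```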

4.2. Adaptive Redundancy of Failure-Resilient Model Offloading

Compared with conventional computation offloading for pre-trained neural networks, our scheme is tailored to harsh scenarios in which devices may fail. We leverage redundancy to counterbalance the impact of device failures: if some devices fail under harsh operating conditions, our scheme can adapt collaborative computing across multiple devices using redundant partitions of the neural network and still perform inference efficiently. It is essential to note that partition copies are not offloaded randomly. When a device failure is detected, the partitions offloaded on that device can no longer support model inference, and their redundant copies are used instead. We therefore need to maintain the necessary redundant copies of the partitions; otherwise, we must initiate the offloading of a new copy of the same partitions to other devices, which delays the inference process. Note that unnecessary redundancy for specific partitions occupies the limited resources on devices and degrades inference efficiency. Therefore, we offload partition copies adaptively to achieve efficient and failure-resilient inference. Specifically, for model partitions with copies, we treat each original partition and its copies as a unit. After a device fails, the partition copies on this device, as well as their connected partition copies, can no longer be used. In this case, we recalculate the number of copies for each partition and adapt the offloading of partition copies using Formulas (6)–(8). In the example illustrated in Figure 4, after device D4 fails, not only do the partition copies on D4 become unavailable, but some successors or precursors of these partition copies also become unavailable, following lines 10–23 of Algorithm 1. When a device fails, the algorithm triggers a series of checks on the connectivity and availability of related partition copies: when D4 fails, the system checks whether there are successor or precursor relationships between the partition copies on D4 and those on other devices, and, in the absence of an alternative execution path, the related partition copies on other devices become unavailable. In this example, the copy of partition B on D5 also becomes unavailable after D4 fails, because D5 acted as a successor or precursor of the partition copies on D4, and without a proper execution path its copy of partition B can no longer be used for inference.
More specifically, we implement an algorithm to adapt the offloading of the redundant copies of the partitions (see Algorithm 1).
Algorithm 1 Adaptive failure-resilient offloading
Require:
    d[]: the set of available devices.
    ϵ: the threshold for offloading a new copy of partition i to the next device with available resources.
 1: if device d_j is detected to have failed then
 2:    for each i in PSet_j do    // PSet_j is the set of partition copies on d_j
 3:        Update(P_i^A);
 4:        if P_i^A < ϵ then    // ϵ is the threshold for offloading new partition copies
 5:            Announce(d_j, Partition i);    // d_j announces its failed copy of Partition i
 6:            Offload(Partition i, d[]);    // offload a copy of Partition i to another available device
 7:        end if
 8:    end for
 9: end if
10: for each d_k in d[] do
11:    if RAnnounce(d_k, d_j, Partition i) && Contain(d_k, Partition i) then    // d_k received an announcement and contains Partition i
12:        if HasLink(d_k, d_j) then
13:            l = i − 1;
14:            if !HasSuccessor(d_k, Partition l) then
15:                Remove(d_k, Partition l);    // remove isolated precursors
16:            end if
17:            l = i + 1;
18:            if !HasPrecursor(d_k, Partition l) then
19:                Remove(d_k, Partition l);    // remove isolated successors
20:            end if
21:        end if
22:    end if
23: end for
The threshold ϵ for offloading new partition copies in Algorithm 1 is an empirical threshold configured according to the failure probabilities of the devices. Different devices have different failure probabilities and thus different empirical thresholds.

4.3. Distributed Robust Inference with Duplicated Partition Copies

The execution delay of the partition copies changes over time. As depicted by the execution graph in Figure 5, each edge between two different states is associated with a specific delay. If a copy of the i-th model partition has been offloaded to the local edge device, that device can execute the partition copy without any transmission delay, and the delay T_i for the copy of partition i equals the local execution delay T_i^exe. Otherwise, a copy of this partition is offloaded to another high-performance device, and the total delay T_i includes the transmission time of the input, T_i^in; the transmission time of the output, T_i^out; and the execution delay on the new device, T_i^exe. Hence, the inference delay of the i-th partition is
T_i = T_i^{in} + T_i^{exe} + T_i^{out} \qquad (9)
Considering the significant resource consumption on an edge device, a device usually cannot execute multiple partition copies simultaneously. To schedule the offloaded partition copies and facilitate robust inference in a rational way, we design a protocol for distributed inference with duplicated partition copies and execution paths. The detailed design is presented as follows:
  • Execution Graph Update for Devices with Copies of the i-th Partition. Each device containing a copy of the i-th partition updates its execution graph, which covers the devices containing copies of the (i−1)-th partition and the devices containing copies of the (i+1)-th partition.
  • Shortest Path Calculation for i-th Partition Execution. After a device completes the computation of the i-th partition, it calculates the first and second shortest paths on its execution graph to identify two copies of the (i+1)-th partition and the corresponding devices, and then confirms the available computation resources on these devices by exchanging messages.
    • If the computation resources on any of these devices are insufficient for executing the (i+1)-th partition, another device on the next shortest path of the execution graph is checked to confirm whether it has enough resources to execute the (i+1)-th partition.
    • This process continues until two devices, D_x and D_y, are selected for executing the (i+1)-th partition.
    • Once D_x and D_y are confirmed, the parameters of the i-th model partition are sent to both of them.
  • Handling Execution Delays. The device that completes its computation of the (i+1)-th partition first broadcasts a message announcing its execution delay for the (i+1)-th partition. Without loss of generality, assume D_x completes first. Once the other device, D_y, receives this message from D_x, D_y stops its computation of the (i+1)-th partition.
  • Updating the Execution Graph on Other Devices. Any other device containing copies of the i-th or the (i+2)-th partition updates its execution graph according to the received message.
In this way, the overall inference delay in this collaborative environment can be calculated using Formula (10):
T = \sum_{i \in I} \min T_i + \sum_{j \in J} \min T_j^{exe} \qquad (10)
where I is the set of partitions executed on local devices, and J is the set of partitions offloaded to other devices.
For example, consider the execution sequence along the red path (from S2 to S16) shown in Figure 5. Initially, device D1 executes partition A. The output of partition A is then transmitted to device D2 for collaborative execution of partition B, and the output of partition B is sent back to D1 to continue with partition C. The total sum of weights along the red path is the overall delay of this collaborative inference. For the second inference task, D2 and D3 are selected to execute partition B. Since D2 completes the computation of partition B earlier, D2 broadcasts a message announcing its execution delay of partition B. After D1 and D3 receive the message from D2, D1 updates the execution delay of partition B in its execution graph, while D3 stops its computation of partition B. Finally, D1 and D2 are selected to execute the copies of partition C, and D2 is the first to complete the computation of partition C, which finishes the second inference task. In this way, our approach capitalizes on performance improvements promptly and minimizes the cold-start time of interrupted inference in harsh operating conditions.
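The duplicated-execution step of this protocol (two devices racing to compute the same partition, with the slower one cancelled once the faster one announces completion) can be sketched as follows. The device names, simulated delays, and the thread-based simulation are illustrative assumptions rather than the paper's implementation.

```python
# Sketch of the duplicated-execution step: two devices race on the same partition
# and the slower one stops once the faster one announces completion. Device
# names and delays are made up; threads merely simulate the two remote devices.
import threading

def run_partition_copy(device, exec_delay, done, winner):
    """Simulate executing one copy of the (i+1)-th partition on 'device'."""
    if done.wait(timeout=exec_delay):       # another device already announced completion
        return                              # stop the redundant computation early
    winner.setdefault("device", device)     # the first finisher records itself
    done.set()                              # broadcast completion to the other copy

done, winner = threading.Event(), {}
threads = [
    threading.Thread(target=run_partition_copy, args=("Dx", 0.8, done, winner)),
    threading.Thread(target=run_partition_copy, args=("Dy", 1.3, done, winner)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("partition executed by", winner["device"])   # the faster device, Dx, wins the race
```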

5. Evaluation

5.1. Experimental Setup

5.1.1. Testbed

We implement FRIM with Caffe [41] on Ubuntu 16.04 LTS and Android 13. Three moto X40 mobile phones running Android 13 are used, namely D1, D2, and D3. These mobile devices operate within a building, are connected to a TP-LINK AX1800M wireless router, and communicate with each other using the boost.asio network library. In addition, a remote edge server, D4, is deployed in another subnet and is used to initiate inference requests at random. The detailed information for these devices is listed in Table 1.

5.1.2. Implementation

To evaluate the performance of the proposed failure-resilient inference solution with mobile offloading (FRIM), three pre-trained neural networks popular in industrial IoT networks are used as the payloads offloaded to mobile devices: AlexNet [42], ResNet [43], and VGG-16 [44]. All of these DNN models are pre-trained on the ImageNet dataset [45], the most widely used benchmark for image classification and object detection, which contains 14,197,122 annotated images in 21,841 classes. The proposed scheme (FRIM) and five baselines are implemented as follows:
  • No-config: Offloading DNN models with no failure resilience.
  • CIODE [9]: Robust offloading strategy of DNN models on multiple end devices. It is robust to deadlock and network jitter through an advanced lock mechanism.
  • DFG [7]: It takes additional skip hyper-connections for failure resiliency of distributed DNN inference.
  • Early exit [11]: When a partition fails at an early-exit point, the inference task will be terminated before this partition.
  • EDGESER [46]: A follow-up work that incorporates skip connections and early-exit technology into pre-trained-neural-network-based inference tasks.
  • FRIM: Our proposed scheme, which adapts the offloading of DNN models of various topologies and enhances the failure resilience of distributed DNN inference.
With human-guided supervision, we let the mobile devices perform object recognition inference tasks in practical operating conditions and evaluate the inference accuracy and efficiency of the different strategies with and without device failures. Specifically, the failure resilience of FRIM and the baselines is evaluated in different scenarios in Section 5.2, and the impact of different failed devices on inference performance is evaluated in Section 5.3.

5.2. Resilience Evaluation of Inference Tasks

We compare FRIM with No-config (pre-trained-model-based inference without failure resilience), CIODE, DFG, EDGESER, and early exit to evaluate the failure awareness and resilience of FRIM. In detail, the inference accuracy of these schemes is compared under different numbers of failures.
As depicted in Figure 6, FRIM maintains high accuracy for all three pre-trained models. When a random failure occurs, the accuracy of No-config and CIODE drops to about 7–9% because they have no failure-resilience configuration. In contrast, DFG, early exit, EDGESER, and FRIM are designed with failure resilience. As the number of failures increases, the inference accuracy of DFG, early exit, and EDGESER drops sharply, whereas FRIM maintains accurate inference. This is because DFG has a limited number of skip hyper-connections, which are added only between every two neural network layers; once successive neural network layers fail, the intermediate data of DFG are lost. The accuracy of early exit and EDGESER drops by more than 57%. In contrast, FRIM offloads copies of the pre-trained model to candidate nodes before a failure is detected. Thanks to the alternative execution paths, FRIM’s intermediate data are not lost, thereby preserving inference accuracy.
FRIM has thus been experimentally shown to improve the failure resilience of pre-trained-model-based inference tasks. We now evaluate the impact of different failures on the inference performance of FRIM in more detail. We first perform inference with D1, D2, or D3 failing individually. Then, to evaluate the impact of two failed nodes, we let {D1, D2}, {D2, D3}, or {D1, D3} fail. As shown in Table 2, when a single failure occurs, the accuracy of No-config and CIODE drops significantly regardless of whether D1, D2, or D3 fails. If two nodes fail, the accuracy of all schemes except FRIM, including DFG, early exit, and EDGESER, also drops to varying extents. This is because the failure resilience of DFG and EDGESER relies on skip hyper-connections between a few neural network layers: when two nodes holding consecutive layers fail, there may be no complete execution path for DFG, early exit, or EDGESER, and the loss of intermediate data is inevitable. Meanwhile, the early-exit strategy may yield incomplete or inaccurate inference results. In contrast, FRIM avoids these problems: it offloads copies of the pre-trained model to candidate nodes before a failure is detected, so there is always at least one complete execution path and no intermediate data are lost, thereby preserving inference accuracy.

5.3. Inference Efficiency Analysis

To demonstrate the enhanced inference efficiency of FRIM, we compare its inference efficiency with that of CIODE, DFG, EDGESER, and early exit. As in the experimental setup for evaluating the impact of different failures on inference performance, we deploy the partitions of the pre-trained model and schedule device failures randomly. Figure 7 illustrates the latency of ten inference queries for FRIM and the other baselines on the considered benchmarks (AlexNet, ResNet-32, and VGG-16); each point represents the latency of one inference task. Due to adverse conditions such as network jitter, the inference latency of FRIM, CIODE, DFG, EDGESER, and early exit fluctuates around its average, initially increasing and then decreasing. FRIM achieves the shortest latency because, unlike the other schemes, it completes inference tasks without interruption. The inference latency of EDGESER is longer than that of DFG and early exit because EDGESER spends additional time supporting more redundant failure resilience. Benefiting from the neural network partitions and their copies offloaded to candidate nodes, FRIM performs inference tasks continuously without interference. On the other hand, the total latency for AlexNet is higher than for ResNet-32 but still lower than for VGG-16, since AlexNet has two independent paths that all schemes can parallelize efficiently. ResNet-32 achieves the best inference efficiency thanks to its residual computation.

6. Conclusions

In this paper, we introduced failure-resilient inference in mobile networks (FRIM), a design that improves the failure resilience of distributed DNN inference in edge environments. In contrast to state-of-the-art methods, our solution requires no additional model redesign or retraining while improving the failure resilience of distributed neural networks. In this work, we add copies of DNN partitions as redundancy, adding them on demand and adjusting their number adaptively to optimize redundancy. As part of FRIM, we design a failure detection mechanism to detect and locate failures. To select the best deployment node for DNN partitions, we propose a DNN partition offloading strategy based on a delay improvement matrix and a deployment matrix. Moreover, we propose two guidelines for adaptively adjusting the number of DNN partition copies based on the availability of each partition. Finally, we implement FRIM and the baselines in a real edge computing environment composed of edge devices with different hardware configurations. Our experiments show that FRIM maintains performance and execution efficiency while adapting to varying numbers of failures, and ablation experiments show that FRIM handles various failure scenarios. FRIM is not only applicable to edge computing environments but also has broad potential in other resource-constrained, failure-prone intelligent systems, such as autonomous vehicles, IoT networks, and remote healthcare, where reliability, low latency, and efficient resource usage are critical. FRIM thus opens up new possibilities for robust, real-time distributed inference in a wide range of industrial applications and for future advances in fault-tolerant systems.

Author Contributions

Conceptualization, W.L.; methodology, W.L.; validation, Z.C.; investigation, W.L.; data curation, Z.C.; writing—original draft preparation, W.L.; writing—review and editing, Z.C.; supervision, Y.G. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the National Science Foundation of China (61962045, 61502255, 61650205), the Program for Young Talents of Science and Technology in Universities of Inner Mongolia Autonomous Region (NJYT23104), the Natural Science Foundation of Inner Mongolia Autonomous Region (2023LHMS06023), and the Basic Scientific Research Expenses Program of Universities directly under Inner Mongolia Autonomous Region (JY20220273, JY20240002, JY20240061).

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

Author Zhongmin Chen was employed by the company State Grid Corporation of China, Weihai Wenlong Power (Group) Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Tang, S.; Chen, L.; He, K.; Xia, J.; Fan, L.; Nallanathan, A. Computational intelligence and deep learning for next-generation edge-enabled industrial IoT. IEEE Trans. Netw. Sci. Eng. 2022, 10, 2881–2893. [Google Scholar] [CrossRef]
  2. Huang, Y.; Qiao, X.; Dustdar, S.; Zhang, J.; Li, J. Toward decentralized and collaborative deep learning inference for intelligent iot devices. IEEE Netw. 2022, 36, 59–68. [Google Scholar] [CrossRef]
  3. Shuvo, M.M.H. Edge AI: Leveraging the Full Potential of Deep Learning. In Recent Innovations in Artificial Intelligence and Smart Applications; Springer: Berlin/Heidelberg, Germany, 2022; pp. 27–46. [Google Scholar]
  4. Wu, Y.; Wu, J.; Chen, L.; Liu, B.; Yao, M.; Lam, S.K. Share-Aware Joint Model Deployment and Task Offloading for Multi-Task Inference. IEEE Trans. Intell. Transp. Syst. 2024, 25, 5674–5687. [Google Scholar] [CrossRef]
  5. Xu, Y.; Mohammed, T.; Di Francesco, M.; Fischione, C. Distributed Assignment with Load Balancing for DNN Inference at the Edge. IEEE Internet Things J. 2022, 10, 1053–1065. [Google Scholar] [CrossRef]
  6. Hadidi, R.; Cao, J.; Ryoo, M.S.; Kim, H. Robustly executing DNNs in IoT systems using coded distributed computing. In Proceedings of the 56th Annual Design Automation Conference 2019, Las Vegas, NV, USA, 2–6 June 2019; pp. 1–2. [Google Scholar]
  7. Yousefpour, A.; Devic, S.; Nguyen, B.Q.; Kreidieh, A.; Liao, A.; Bayen, A.M.; Jue, J.P. Guardians of the deep fog: Failure-resilient dnn inference from edge to cloud. In Proceedings of the First International Workshop on Challenges in Artificial Intelligence and Machine Learning for Internet of Things, New York, NY, USA, 10–13 November 2019; pp. 25–31. [Google Scholar]
  8. Yousefpour, A.; Nguyen, B.Q.; Devic, S.; Wang, G.; Kreidieh, A.; Lobel, H.; Bayen, A.M.; Jue, J.P. ResiliNet: Failure-Resilient Inference in Distributed Neural Networks. arXiv 2020, arXiv:2002.07386. [Google Scholar]
  9. Chen, Z.; Xu, Z.; Wan, J.; Tian, J.; Liu, L.; Zhang, Y. Conflict-Resilient Incremental Offloading of Deep Neural Networks to the Edge of Smart Environment. Math. Probl. Eng. 2021, 2021, 9985006. [Google Scholar] [CrossRef]
  10. Sen, T.; Shen, H. Fault Tolerant Data and Model Parallel Deep Learning in Edge Computing Networks. In Proceedings of the 2024 IEEE 21st International Conference on Mobile Ad-Hoc and Smart Systems (MASS), Seoul, Republic of Korea, 23–25 September 2024; pp. 460–468. [Google Scholar] [CrossRef]
  11. Majeed, A.A.; Kilpatrick, P.; Spence, I.; Varghese, B. CONTINUER: Maintaining Distributed DNN Services During Edge Failures. In Proceedings of the 2022 IEEE International Conference on Edge Computing and Communications (EDGE), Barcelona, Spain, 10–16 July 2022; pp. 143–152. [Google Scholar] [CrossRef]
  12. Khan, F.M.; Baccour, E.; Erbad, A.; Hamdi, M. Adaptive ResNet Architecture for Distributed Inference in Resource-Constrained IoT Systems. In Proceedings of the 2023 International Wireless Communications and Mobile Computing (IWCMC), Marrakesh, Morocco, 19–23 June 2023; pp. 1543–1549. [Google Scholar]
  13. Mohammed, T.; Joe-Wong, C.; Babbar, R.; Di Francesco, M. Distributed inference acceleration with adaptive DNN partitioning and offloading. In Proceedings of the IEEE INFOCOM 2020-IEEE Conference on Computer Communications, Toronto, ON, Canada, 6–9 July 2020; pp. 854–863. [Google Scholar]
  14. Jeong, H.J.; Lee, H.J.; Shin, K.Y.; Yoo, Y.H.; Moon, S.M. Perdnn: Offloading deep neural network computations to pervasive edge servers. In Proceedings of the 2020 IEEE 40th International Conference on Distributed Computing Systems (ICDCS), Singapore, 29 November–1 December 2020; pp. 1055–1066. [Google Scholar]
  15. Xu, Z.; Zhao, L.; Liang, W.; Rana, O.F.; Zhou, P.; Xia, Q.; Xu, W.; Wu, G. Energy-aware inference offloading for DNN-driven applications in mobile edge clouds. IEEE Trans. Parallel Distrib. Syst. 2020, 32, 799–814. [Google Scholar] [CrossRef]
  16. Chen, Y.; Yang, L.T.; Cui, Z. Tensor-Based Lyapunov Deep Neural Networks Offloading Control Strategy with Cloud-Fog-Edge Orchestration. IEEE Trans. Ind. Inform. 2023; early access. [Google Scholar]
  17. Wang, N.; Duan, Y.; Wu, J. Accelerate cooperative deep inference via layer-wise processing schedule optimization. In Proceedings of the 2021 International Conference on Computer Communications and Networks (ICCCN), Athens, Greece, 19–22 July 2021; pp. 1–9. [Google Scholar]
  18. Almeida, M.; Laskaridis, S.; Venieris, S.I.; Leontiadis, I.; Lane, N.D. Dyno: Dynamic onloading of deep neural networks from cloud to device. ACM Trans. Embed. Comput. Syst. 2022, 21, 1–24. [Google Scholar] [CrossRef]
  19. Duan, Y.; Wu, J. Computation offloading scheduling for deep neural network inference in mobile computing. In Proceedings of the 2021 IEEE/ACM 29th International Symposium on Quality of Service (IWQOS), Tokyo, Japan, 25–28 June 2021; pp. 1–10. [Google Scholar]
  20. Liu, K.; Liu, C.; Yan, G.; Lee, V.C.; Cao, J. Accelerating DNN Inference With Reliability Guarantee in Vehicular Edge Computing. IEEE/ACM Trans. Netw. 2023, 31, 3238–3253. [Google Scholar] [CrossRef]
  21. Liao, Z.; Hu, W.; Huang, J.; Wang, J. Joint multi-user DNN partitioning and task offloading in mobile edge computing. Ad Hoc Netw. 2023, 144, 103156. [Google Scholar] [CrossRef]
  22. Kakolyris, A.K.; Katsaragakis, M.; Masouros, D.; Soudris, D. RoaD-RuNNer: Collaborative DNN partitioning and offloading on heterogeneous edge systems. In Proceedings of the 2023 Design, Automation & Test in Europe Conference & Exhibition (DATE), Antwerp, Belgium, 17–19 April 2023; pp. 1–6. [Google Scholar]
  23. Duan, Y.; Wu, J. Optimizing Job Offloading Schedule for Collaborative DNN Inference. IEEE Trans. Mob. Comput. 2023, 23, 3436–3451. [Google Scholar] [CrossRef]
  24. Zhou, H.; Li, M.; Wang, N.; Min, G.; Wu, J. Accelerating Deep Learning Inference via Model Parallelism and Partial Computation Offloading. IEEE Trans. Parallel Distrib. Syst. 2022, 34, 475–488. [Google Scholar] [CrossRef]
  25. Wang, F.; Cai, S.; Lau, V.K. Sequential offloading for distributed dnn computation in multiuser mec systems. IEEE Internet Things J. 2023, 10, 18315–18329. [Google Scholar] [CrossRef]
  26. Hou, X.; Guan, Y.; Han, T.; Zhang, N. Distredge: Speeding up convolutional neural network inference on distributed edge devices. In Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS), Lyon, France, 30 May–3 June 2022; pp. 1097–1107. [Google Scholar]
  27. Qin, W.; Chen, H.; Wang, L.; Xia, Y.; Nascita, A.; Pescapè, A. MCOTM: Mobility-aware computation offloading and task migration for edge computing in industrial IoT. Future Gener. Comput. Syst. 2024, 151, 232–241. [Google Scholar] [CrossRef]
  28. Younis, A.; Maheshwari, S.; Pompili, D. Energy-Latency Computation Offloading and Approximate Computing in Mobile-Edge Computing Networks. IEEE Trans. Netw. Serv. Manag. 2024, 21, 3401–3415. [Google Scholar] [CrossRef]
  29. Lin, N.; Bai, L.; Hawbani, A.; Guan, Y.; Mao, C.; Liu, Z.; Zhao, L. Deep Reinforcement Learning-Based Computation Offloading for Servicing Dynamic Demand in Multi-UAV-Assisted IoT Network. IEEE Internet Things J. 2024, 11, 17249–17263. [Google Scholar] [CrossRef]
  30. Guo, X.; Jiang, Q.; Shen, Y.; Pimentel, A.D.; Stefanov, T. EASTER: Learning to Split Transformers at the Edge Robustly. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 2024, 43, 3626–3637. [Google Scholar] [CrossRef]
  31. Guo, X.; Jiang, Q.; Pimentel, A.D.; Stefanov, T. RobustDiCE: Robust and Distributed CNN Inference at the Edge. In Proceedings of the 2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC), Incheon, Republic of Korea, 22–25 January 2024; pp. 26–31. [Google Scholar] [CrossRef]
  32. Hu, Y.; Xu, X.; Duan, L.; Bilal, M.; Wang, Q.; Dou, W. End-Edge Collaborative Inference of Convolutional Fuzzy Neural Networks for Big Data-Driven Internet of Things. IEEE Trans. Fuzzy Syst. 2024, 33, 203–217. [Google Scholar] [CrossRef]
  33. Hou, X.; Ren, Z.; Wang, J.; Cheng, W.; Ren, Y.; Chen, K.C.; Zhang, H. Reliable Computation Offloading for Edge-Computing-Enabled Software-Defined IoV. IEEE Internet Things J. 2020, 7, 7097–7111. [Google Scholar] [CrossRef]
  34. Whaiduzzaman, M.; Barros, A.; Shovon, A.R.; Hossain, M.R.; Fidge, C. A resilient fog-IoT framework for seamless microservice execution. In Proceedings of the 2021 IEEE International Conference on Services Computing (SCC), Chicago, IL, USA, 5–10 September 2021; pp. 213–221. [Google Scholar]
  35. Li, P.; Koyuncu, E.; Seferoglu, H. Respipe: Resilient model-distributed dnn training at edge networks. In Proceedings of the ICASSP 2021–2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada, 6–11 June 2021; pp. 3660–3664. [Google Scholar]
  36. Jeong, H.J.; Lee, H.J.; Shin, C.H.; Moon, S.M. IONN: Incremental offloading of neural network computations from mobile devices to edge servers. In Proceedings of the ACM Symposium on Cloud Computing, Carlsbad, CA, USA, 11–13 October 2018; pp. 401–411. [Google Scholar]
  37. Shin, K.Y.; Jeong, H.J.; Moon, S.M. Enhanced Partitioning of DNN Layers for Uploading from Mobile Devices to Edge Servers. In Proceedings of the 3rd International Workshop on Deep Learning for Mobile Systems and Applications, Seoul, Republic of Korea, 21 June 2019; pp. 35–40. [Google Scholar]
  38. Peercy, M.; Banerjee, P. Fault tolerant VLSI systems. Proc. IEEE 1993, 81, 745–758. [Google Scholar] [CrossRef]
  39. Banerjee, M.; Borges, C.; Choo, K.K.R.; Lee, J.; Nicopoulos, C. A hardware-assisted heartbeat mechanism for fault identification in large-scale iot systems. IEEE Trans. Dependable Secur. Comput. 2020, 19, 1254–1265. [Google Scholar] [CrossRef]
  40. Xu, Z.; Chen, B.; Wang, N.; Zhang, Y.; Li, Z. Elda: Towards efficient and lightweight detection of cache pollution attacks in ndn. In Proceedings of the 2015 IEEE 40th Conference on Local Computer Networks (LCN), Clearwater Beach, FL, USA, 26–29 October 2015; pp. 82–90. [Google Scholar]
  41. Guo, T. Cloud-based or on-device: An empirical study of mobile deep inference. In Proceedings of the 2018 IEEE International Conference on Cloud Engineering (IC2E), Orlando, FL, USA, 17–20 April 2018; pp. 184–190. [Google Scholar]
  42. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012, 25, 1097–1105. [Google Scholar] [CrossRef]
  43. He, K.; Zhang, X.; Ren, S.; Sun, J. Identity mappings in deep residual networks. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 630–645. [Google Scholar]
  44. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
  45. Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. Imagenet large scale visual recognition challenge. Int. J. Comput. Vis. 2015, 115, 211–252. [Google Scholar] [CrossRef]
  46. Majeed, A.A. Strategies for Maintaining Efficiency of Edge Services. Ph.D. Thesis, Queen’s University Belfast, Belfast, UK, 2023. [Google Scholar]
Figure 1. A pre-trained model-based mobile inference with redundancy.
Figure 2. An execution graph of neural network models with redundant offloading.
Figure 3. An example for partition offloading with redundancy.
Figure 4. An example for adapting redundancy.
Figure 5. Execution graph with redundant model partitions.
Figure 6. Average accuracy of FRIM and other strategies under different numbers of failures.
Figure 7. Latency for ten inference queries on FRIM and other strategies.
Table 1. Detailed information for deployed devices.

| ID | D1, D2, D3 | D4 |
|---|---|---|
| CPU | Kryo 3.2 GHz | i7-9700 (3.00 GHz) |
| RAM | 12 GB | 16 GB |
| GPU | Adreno 740, 680 MHz | NVIDIA GeForce RTX 3090 |
| Wireless network | 802.11ac (2.4/5 GHz), Bluetooth 5.0 | 802.11ac (2.4/5 GHz), Bluetooth 5.0 |
| Wired network | Gigabit Ethernet | Gigabit Ethernet |
| Idle power | 4 W | 95 W |
| Full load power | 6 W | 120 W |
| Average power | 5 W | 100 W |
Table 2. Accuracy degradation under various failure scenarios. All values are accuracy (%).

| Model | Failure Devices | No-Config | CIODE | DFG | Early Exit | EDGESER | FRIM |
|---|---|---|---|---|---|---|---|
| AlexNet | D1 | 8.59 | 9.01 | 85.24 | 75.36 | 87.71 | 87.66 |
| AlexNet | D2 | 8.01 | 9.22 | 82.50 | 72.23 | 86.50 | 87.55 |
| AlexNet | D3 | 7.99 | 8.85 | 84.42 | 73.86 | 87.43 | 86.31 |
| AlexNet | D1, D2 | 7.92 | 7.82 | 7.98 | 7.54 | 8.07 | 87.42 |
| AlexNet | D2, D3 | 7.79 | 7.85 | 7.64 | 7.43 | 7.79 | 86.59 |
| AlexNet | D1, D3 | 7.88 | 7.61 | 71.13 | 60.36 | 82.11 | 87.55 |
| ResNet | D1 | 8.33 | 7.73 | 93.10 | 84.20 | 94.67 | 95.47 |
| ResNet | D2 | 8.14 | 8.21 | 93.65 | 81.56 | 94.50 | 94.75 |
| ResNet | D3 | 7.26 | 8.03 | 92.83 | 80.36 | 95.03 | 95.64 |
| ResNet | D1, D2 | 8.07 | 7.16 | 8.01 | 7.96 | 8.17 | 94.67 |
| ResNet | D2, D3 | 7.84 | 8.07 | 8.18 | 7.82 | 7.59 | 94.81 |
| ResNet | D1, D3 | 7.13 | 7.94 | 82.19 | 71.34 | 90.23 | 94.61 |
| VGG-16 | D1 | 8.41 | 8.69 | 91.46 | 82.42 | 91.74 | 92.67 |
| VGG-16 | D2 | 7.36 | 7.79 | 92.32 | 80.23 | 92.61 | 92.71 |
| VGG-16 | D3 | 8.20 | 8.71 | 91.55 | 83.56 | 92.60 | 92.65 |
| VGG-16 | D1, D2 | 7.12 | 8.26 | 8.31 | 8.28 | 9.03 | 92.16 |
| VGG-16 | D2, D3 | 8.55 | 7.36 | 8.41 | 8.32 | 8.49 | 91.56 |
| VGG-16 | D1, D3 | 8.13 | 8.07 | 82.12 | 71.24 | 89.71 | 91.69 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
