Towards Failure-Aware Inference in Harsh Operating Conditions: Robust Mobile Offloading of Pre-Trained Neural Networks
Abstract
1. Introduction
- We model the offloading of DNN partitions using graph theory and adapt the offloading redundancy to guarantee failure resilience while also enhancing inference performance.
- As a part of FRIM, we design an adaptive failure detection mechanism to locate partition failures. By incorporating a detection mechanism into the mobile inference process, FRIM can detect and mitigate failures efficiently and accurately.
- We implement FRIM and other baselines in a mobile computing environment, and evaluate FRIM on three well-known DNN models (Alex-Net, Res-Net, and VGG-16). Experimental results demonstrate that FRIM can adapt to different scales of device failures while enhancing inference efficiency, thereby benefiting intelligence-empowered mobile applications in harsh operating conditions.
2. Related Work
2.1. DNN Computation Offloading
2.2. Failure-Resilient DNN Computation Offloading and Distributed Inference
3. Problem Statement and Analysis
3.1. Formulation
3.2. Redundancy Modeling for Efficient Failure Resilience
4. Adaptive Failure-Aware Offloading and Inference
- We divide a pre-trained model into partitions and offload their copies to edge devices. Specifically, extra copies of partitions are offloaded to available edge devices to provide failure-resilience.
- Once the detection scheme identifies a failure among the offloaded partitions, we adapt the number of partition copies according to the difference in statistical dependability between devices.
- Finally, we use a random scheduling strategy to execute distributed inference tasks.
4.1. Failure Detection
- Periodically, a device sends heartbeat packets to each other device and accumulates the average response time of that device;
- Repeat step 1 k times to obtain k records of average delay;
- To perform a Monte Carlo hypothesis test, we randomly select a delay record from the above k average delays, record its value, and put it back (i.e., we sample with replacement). Repeating this k times yields a new group of delay records, whose size is k;
- Repeat step 3 g times in total, and we can obtain g groups of delay records;
- In this way, we can obtain a confidence interval for the response delay from the distribution of the above g record groups;
- Finally, the upper bound of this confidence interval, beyond which the response delay should not fall at the chosen confidence level, is used as the threshold for failure detection.
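The resampling steps above amount to a bootstrap estimate of the delay threshold. The following minimal sketch shows one way to implement it; the function name, parameters, and quantile choice are illustrative assumptions, not the paper's exact procedure:

```python
import random

def failure_threshold(delays, g=1000, alpha=0.95):
    """Estimate a response-delay threshold via a Monte Carlo (bootstrap)
    test: resample the k measured average delays with replacement g times,
    then take the upper alpha-quantile of the resampled group means as
    the failure-detection threshold."""
    k = len(delays)
    group_means = []
    for _ in range(g):
        # Sample k records with replacement ("record its value and put it back").
        resample = [random.choice(delays) for _ in range(k)]
        group_means.append(sum(resample) / k)
    group_means.sort()
    # A heartbeat whose response delay exceeds this value is flagged as a failure.
    return group_means[int(alpha * (g - 1))]
```

A device whose heartbeat delay repeatedly exceeds the returned threshold would then be treated as failed.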
4.2. Adaptive Redundancy of Failure-Resilient Model Offloading
Algorithm 1: Adaptive failure-resilient offloading.
Require: the set of available devices; the threshold for offloading a new copy of partition i to the next device with available resources.
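A minimal sketch of the adaptive-redundancy idea follows. It assumes a per-device reliability estimate and a per-partition resilience threshold; the names (`free_memory`, `reliability`, `adapt_redundancy`) are hypothetical and stand in for the quantities Algorithm 1 operates on:

```python
def adapt_redundancy(partitions, devices, reliability, threshold):
    """Keep offloading an extra copy of each partition to the next device
    with free resources until the estimated probability that at least one
    copy survives exceeds `threshold`.  `reliability[d]` is the statistical
    dependability (success probability) of device d."""
    placement = {p: [] for p in partitions}
    for p in partitions:
        fail_all = 1.0  # probability that every placed copy fails
        for d in devices:
            if fail_all <= 1.0 - threshold:
                break  # resilience target met for this partition
            if d.free_memory >= p.size:
                placement[p].append(d)
                d.free_memory -= p.size
                # Independent failures: one more copy multiplies in (1 - r_d).
                fail_all *= 1.0 - reliability[d]
    return placement
```

With two devices of reliability 0.9 each, a single copy gives 0.9 survival probability, while two copies give 0.99, so a 0.95 target forces a second copy.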
4.3. Distributed Robust Inference with Duplicated Partition Copies
- Execution Graph Update for Devices with Copies of the Partition. Each device that holds a copy of a partition updates its execution graph to include the devices holding copies of that partition as well as those holding copies of the adjacent partitions.
- Shortest Path Calculation for Partition Execution. After a device completes the computation of a partition, it calculates the first and second shortest paths on its execution graph to identify two copies of the next partition and the corresponding devices, and then confirms the available computation resources on those devices by exchanging messages.
- If the computation resources on either of these devices are insufficient for executing the partition, the device on the next shortest path of the execution graph is checked to confirm whether it has enough resources to perform the partition.
- This process continues until two devices are confirmed for executing the partition, after which the parameters of the model partition are sent to both of them.
- Handling Execution Delays. When a device completes its computation of the partition, it broadcasts a message announcing its execution delay for the partition. Without loss of generality, we assume one of the two devices completes first; once the other device receives this message, it stops its own computation of the partition.
- Updating Execution Graph for Other Devices. Any other device containing copies of the partition updates its execution graph according to the received message.
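The shortest-path selection step above can be sketched with a standard Dijkstra search over the execution graph. The graph representation (device names, delay-weighted adjacency dict) and the helper names are illustrative assumptions:

```python
import heapq

def dijkstra(graph, src):
    """Single-source shortest delays over the execution graph.
    `graph` maps device -> {neighbor: link delay}."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def pick_executors(graph, src, copy_holders, k=2):
    """Return the k nearest devices holding a copy of the next partition,
    ranked by shortest-path delay from the device that just finished."""
    dist = dijkstra(graph, src)
    ranked = sorted((dist[d], d) for d in copy_holders if d in dist)
    return [d for _, d in ranked[:k]]
```

With k = 2 this yields the two candidate devices on the first and second shortest paths; if one lacks resources, the caller would re-query with the next-ranked device, as described above.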
5. Evaluation
5.1. Experimental Setup
5.1.1. Testbed
5.1.2. Implementation
- No-config: Offloading DNN models with no failure resilience.
- CIODE [9]: Robust offloading strategy of DNN models on multiple end devices. It is robust to deadlock and network jitter through an advanced lock mechanism.
- DFG [7]: It takes additional skip hyper-connections for failure resiliency of distributed DNN inference.
- Early exit [11]: When a partition fails, the inference task terminates at the early-exit point before that partition.
- EDGESER [46]: A follow-up work that incorporates skip connections and early-exit techniques into inference tasks based on pre-trained neural networks.
- FRIM: Our approach, which adapts to DNN models of various topologies and enhances the failure resilience of distributed DNN inference.
5.2. Resilience Evaluation of Inference Tasks
5.3. Inference Efficiency Analysis
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Tang, S.; Chen, L.; He, K.; Xia, J.; Fan, L.; Nallanathan, A. Computational intelligence and deep learning for next-generation edge-enabled industrial IoT. IEEE Trans. Netw. Sci. Eng. 2022, 10, 2881–2893. [Google Scholar] [CrossRef]
- Huang, Y.; Qiao, X.; Dustdar, S.; Zhang, J.; Li, J. Toward decentralized and collaborative deep learning inference for intelligent iot devices. IEEE Netw. 2022, 36, 59–68. [Google Scholar] [CrossRef]
- Shuvo, M.M.H. Edge AI: Leveraging the Full Potential of Deep Learning. In Recent Innovations in Artificial Intelligence and Smart Applications; Springer: Berlin/Heidelberg, Germany, 2022; pp. 27–46. [Google Scholar]
- Wu, Y.; Wu, J.; Chen, L.; Liu, B.; Yao, M.; Lam, S.K. Share-Aware Joint Model Deployment and Task Offloading for Multi-Task Inference. IEEE Trans. Intell. Transp. Syst. 2024, 25, 5674–5687. [Google Scholar] [CrossRef]
- Xu, Y.; Mohammed, T.; Di Francesco, M.; Fischione, C. Distributed Assignment with Load Balancing for DNN Inference at the Edge. IEEE Internet Things J. 2022, 10, 1053–1065. [Google Scholar] [CrossRef]
- Hadidi, R.; Cao, J.; Ryoo, M.S.; Kim, H. Robustly executing DNNs in IoT systems using coded distributed computing. In Proceedings of the 56th Annual Design Automation Conference 2019, Las Vegas, NV, USA, 2–6 June 2019; pp. 1–2. [Google Scholar]
- Yousefpour, A.; Devic, S.; Nguyen, B.Q.; Kreidieh, A.; Liao, A.; Bayen, A.M.; Jue, J.P. Guardians of the deep fog: Failure-resilient dnn inference from edge to cloud. In Proceedings of the First International Workshop on Challenges in Artificial Intelligence and Machine Learning for Internet of Things, New York, NY, USA, 10–13 November 2019; pp. 25–31. [Google Scholar]
- Yousefpour, A.; Nguyen, B.Q.; Devic, S.; Wang, G.; Kreidieh, A.; Lobel, H.; Bayen, A.M.; Jue, J.P. ResiliNet: Failure-Resilient Inference in Distributed Neural Networks. arXiv 2020, arXiv:2002.07386. [Google Scholar]
- Chen, Z.; Xu, Z.; Wan, J.; Tian, J.; Liu, L.; Zhang, Y. Conflict-Resilient Incremental Offloading of Deep Neural Networks to the Edge of Smart Environment. Math. Probl. Eng. 2021, 2021, 9985006. [Google Scholar] [CrossRef]
- Sen, T.; Shen, H. Fault Tolerant Data and Model Parallel Deep Learning in Edge Computing Networks. In Proceedings of the 2024 IEEE 21st International Conference on Mobile Ad-Hoc and Smart Systems (MASS), Seoul, Republic of Korea, 23–25 September 2024; pp. 460–468. [Google Scholar] [CrossRef]
- Majeed, A.A.; Kilpatrick, P.; Spence, I.; Varghese, B. CONTINUER: Maintaining Distributed DNN Services During Edge Failures. In Proceedings of the 2022 IEEE International Conference on Edge Computing and Communications (EDGE), Barcelona, Spain, 10–16 July 2022; pp. 143–152. [Google Scholar] [CrossRef]
- Khan, F.M.; Baccour, E.; Erbad, A.; Hamdi, M. Adaptive ResNet Architecture for Distributed Inference in Resource-Constrained IoT Systems. In Proceedings of the 2023 International Wireless Communications and Mobile Computing (IWCMC), Marrakesh, Morocco, 19–23 June 2023; pp. 1543–1549. [Google Scholar]
- Mohammed, T.; Joe-Wong, C.; Babbar, R.; Di Francesco, M. Distributed inference acceleration with adaptive DNN partitioning and offloading. In Proceedings of the IEEE INFOCOM 2020-IEEE Conference on Computer Communications, Toronto, ON, Canada, 6–9 July 2020; pp. 854–863. [Google Scholar]
- Jeong, H.J.; Lee, H.J.; Shin, K.Y.; Yoo, Y.H.; Moon, S.M. Perdnn: Offloading deep neural network computations to pervasive edge servers. In Proceedings of the 2020 IEEE 40th International Conference on Distributed Computing Systems (ICDCS), Singapore, 29 November–1 December 2020; pp. 1055–1066. [Google Scholar]
- Xu, Z.; Zhao, L.; Liang, W.; Rana, O.F.; Zhou, P.; Xia, Q.; Xu, W.; Wu, G. Energy-aware inference offloading for DNN-driven applications in mobile edge clouds. IEEE Trans. Parallel Distrib. Syst. 2020, 32, 799–814. [Google Scholar] [CrossRef]
- Chen, Y.; Yang, L.T.; Cui, Z. Tensor-Based Lyapunov Deep Neural Networks Offloading Control Strategy with Cloud-Fog-Edge Orchestration. IEEE Trans. Ind. Inform. 2023; early access. [Google Scholar]
- Wang, N.; Duan, Y.; Wu, J. Accelerate cooperative deep inference via layer-wise processing schedule optimization. In Proceedings of the 2021 International Conference on Computer Communications and Networks (ICCCN), Athens, Greece, 19–22 July 2021; pp. 1–9. [Google Scholar]
- Almeida, M.; Laskaridis, S.; Venieris, S.I.; Leontiadis, I.; Lane, N.D. Dyno: Dynamic onloading of deep neural networks from cloud to device. ACM Trans. Embed. Comput. Syst. 2022, 21, 1–24. [Google Scholar] [CrossRef]
- Duan, Y.; Wu, J. Computation offloading scheduling for deep neural network inference in mobile computing. In Proceedings of the 2021 IEEE/ACM 29th International Symposium on Quality of Service (IWQOS), Tokyo, Japan, 25–28 June 2021; pp. 1–10. [Google Scholar]
- Liu, K.; Liu, C.; Yan, G.; Lee, V.C.; Cao, J. Accelerating DNN Inference With Reliability Guarantee in Vehicular Edge Computing. IEEE/ACM Trans. Netw. 2023, 31, 3238–3253. [Google Scholar] [CrossRef]
- Liao, Z.; Hu, W.; Huang, J.; Wang, J. Joint multi-user DNN partitioning and task offloading in mobile edge computing. Ad Hoc Netw. 2023, 144, 103156. [Google Scholar] [CrossRef]
- Kakolyris, A.K.; Katsaragakis, M.; Masouros, D.; Soudris, D. RoaD-RuNNer: Collaborative DNN partitioning and offloading on heterogeneous edge systems. In Proceedings of the 2023 Design, Automation & Test in Europe Conference & Exhibition (DATE), Antwerp, Belgium, 17–19 April 2023; pp. 1–6. [Google Scholar]
- Duan, Y.; Wu, J. Optimizing Job Offloading Schedule for Collaborative DNN Inference. IEEE Trans. Mob. Comput. 2023, 23, 3436–3451. [Google Scholar] [CrossRef]
- Zhou, H.; Li, M.; Wang, N.; Min, G.; Wu, J. Accelerating Deep Learning Inference via Model Parallelism and Partial Computation Offloading. IEEE Trans. Parallel Distrib. Syst. 2022, 34, 475–488. [Google Scholar] [CrossRef]
- Wang, F.; Cai, S.; Lau, V.K. Sequential offloading for distributed dnn computation in multiuser mec systems. IEEE Internet Things J. 2023, 10, 18315–18329. [Google Scholar] [CrossRef]
- Hou, X.; Guan, Y.; Han, T.; Zhang, N. Distredge: Speeding up convolutional neural network inference on distributed edge devices. In Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS), Lyon, France, 30 May–3 June 2022; pp. 1097–1107. [Google Scholar]
- Qin, W.; Chen, H.; Wang, L.; Xia, Y.; Nascita, A.; Pescapè, A. MCOTM: Mobility-aware computation offloading and task migration for edge computing in industrial IoT. Future Gener. Comput. Syst. 2024, 151, 232–241. [Google Scholar] [CrossRef]
- Younis, A.; Maheshwari, S.; Pompili, D. Energy-Latency Computation Offloading and Approximate Computing in Mobile-Edge Computing Networks. IEEE Trans. Netw. Serv. Manag. 2024, 21, 3401–3415. [Google Scholar] [CrossRef]
- Lin, N.; Bai, L.; Hawbani, A.; Guan, Y.; Mao, C.; Liu, Z.; Zhao, L. Deep Reinforcement Learning-Based Computation Offloading for Servicing Dynamic Demand in Multi-UAV-Assisted IoT Network. IEEE Internet Things J. 2024, 11, 17249–17263. [Google Scholar] [CrossRef]
- Guo, X.; Jiang, Q.; Shen, Y.; Pimentel, A.D.; Stefanov, T. EASTER: Learning to Split Transformers at the Edge Robustly. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 2024, 43, 3626–3637. [Google Scholar] [CrossRef]
- Guo, X.; Jiang, Q.; Pimentel, A.D.; Stefanov, T. RobustDiCE: Robust and Distributed CNN Inference at the Edge. In Proceedings of the 2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC), Incheon, Republic of Korea, 22–25 January 2024; pp. 26–31. [Google Scholar] [CrossRef]
- Hu, Y.; Xu, X.; Duan, L.; Bilal, M.; Wang, Q.; Dou, W. End-Edge Collaborative Inference of Convolutional Fuzzy Neural Networks for Big Data-Driven Internet of Things. IEEE Trans. Fuzzy Syst. 2024, 33, 203–217. [Google Scholar] [CrossRef]
- Hou, X.; Ren, Z.; Wang, J.; Cheng, W.; Ren, Y.; Chen, K.C.; Zhang, H. Reliable Computation Offloading for Edge-Computing-Enabled Software-Defined IoV. IEEE Internet Things J. 2020, 7, 7097–7111. [Google Scholar] [CrossRef]
- Whaiduzzaman, M.; Barros, A.; Shovon, A.R.; Hossain, M.R.; Fidge, C. A resilient fog-IoT framework for seamless microservice execution. In Proceedings of the 2021 IEEE International Conference on Services Computing (SCC), Chicago, IL, USA, 5–10 September 2021; pp. 213–221. [Google Scholar]
- Li, P.; Koyuncu, E.; Seferoglu, H. Respipe: Resilient model-distributed dnn training at edge networks. In Proceedings of the ICASSP 2021–2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada, 6–11 June 2021; pp. 3660–3664. [Google Scholar]
- Jeong, H.J.; Lee, H.J.; Shin, C.H.; Moon, S.M. IONN: Incremental offloading of neural network computations from mobile devices to edge servers. In Proceedings of the ACM Symposium on Cloud Computing, Carlsbad, CA, USA, 11–13 October 2018; pp. 401–411. [Google Scholar]
- Shin, K.Y.; Jeong, H.J.; Moon, S.M. Enhanced Partitioning of DNN Layers for Uploading from Mobile Devices to Edge Servers. In Proceedings of the 3rd International Workshop on Deep Learning for Mobile Systems and Applications, Seoul, Republic of Korea, 21 June 2019; pp. 35–40. [Google Scholar]
- Peercy, M.; Banerjee, P. Fault tolerant VLSI systems. Proc. IEEE 1993, 81, 745–758. [Google Scholar] [CrossRef]
- Banerjee, M.; Borges, C.; Choo, K.K.R.; Lee, J.; Nicopoulos, C. A hardware-assisted heartbeat mechanism for fault identification in large-scale iot systems. IEEE Trans. Dependable Secur. Comput. 2020, 19, 1254–1265. [Google Scholar] [CrossRef]
- Xu, Z.; Chen, B.; Wang, N.; Zhang, Y.; Li, Z. Elda: Towards efficient and lightweight detection of cache pollution attacks in ndn. In Proceedings of the 2015 IEEE 40th Conference on Local Computer Networks (LCN), Clearwater Beach, FL, USA, 26–29 October 2015; pp. 82–90. [Google Scholar]
- Guo, T. Cloud-based or on-device: An empirical study of mobile deep inference. In Proceedings of the 2018 IEEE International Conference on Cloud Engineering (IC2E), Orlando, FL, USA, 17–20 April 2018; pp. 184–190. [Google Scholar]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012, 25, 1097–1105. [Google Scholar] [CrossRef]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Identity mappings in deep residual networks. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 630–645. [Google Scholar]
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
- Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. Imagenet large scale visual recognition challenge. Int. J. Comput. Vis. 2015, 115, 211–252. [Google Scholar] [CrossRef]
- Majeed, A.A. Strategies for Maintaining Efficiency of Edge Services. Ph.D. Thesis, Queen’s University Belfast, Belfast, UK, 2023. [Google Scholar]
| ID | D1, D2, D3 | D4 |
|---|---|---|
| CPU | Kryo 3.2 GHz | i7-9700 (3.00 GHz) |
| RAM | 12 GB | 16 GB |
| GPU | Adreno 740 (680 MHz) | NVIDIA GeForce RTX 3090 |
| Wireless network | 802.11ac (2.4/5 GHz), Bluetooth 5.0 | 802.11ac (2.4/5 GHz), Bluetooth 5.0 |
| Wired network | Gigabit Ethernet | Gigabit Ethernet |
| Idle power | 4 W | 95 W |
| Full-load power | 6 W | 120 W |
| Average power | 5 W | 100 W |
Accuracy (%) under different failed devices:

| Model | Failed Devices | No-Config | CIODE | DFG | Early Exit | EDGESER | FRIM |
|---|---|---|---|---|---|---|---|
| Alex-Net | D1 | 8.59 | 9.01 | 85.24 | 75.36 | 87.71 | 87.66 |
| Alex-Net | D2 | 8.01 | 9.22 | 82.50 | 72.23 | 86.50 | 87.55 |
| Alex-Net | D3 | 7.99 | 8.85 | 84.42 | 73.86 | 87.43 | 86.31 |
| Alex-Net | D1, D2 | 7.92 | 7.82 | 7.98 | 7.54 | 8.07 | 87.42 |
| Alex-Net | D2, D3 | 7.79 | 7.85 | 7.64 | 7.43 | 7.79 | 86.59 |
| Alex-Net | D1, D3 | 7.88 | 7.61 | 71.13 | 60.36 | 82.11 | 87.55 |
| Res-Net | D1 | 8.33 | 7.73 | 93.10 | 84.20 | 94.67 | 95.47 |
| Res-Net | D2 | 8.14 | 8.21 | 93.65 | 81.56 | 94.50 | 94.75 |
| Res-Net | D3 | 7.26 | 8.03 | 92.83 | 80.36 | 95.03 | 95.64 |
| Res-Net | D1, D2 | 8.07 | 7.16 | 8.01 | 7.96 | 8.17 | 94.67 |
| Res-Net | D2, D3 | 7.84 | 8.07 | 8.18 | 7.82 | 7.59 | 94.81 |
| Res-Net | D1, D3 | 7.13 | 7.94 | 82.19 | 71.34 | 90.23 | 94.61 |
| VGG-16 | D1 | 8.41 | 8.69 | 91.46 | 82.42 | 91.74 | 92.67 |
| VGG-16 | D2 | 7.36 | 7.79 | 92.32 | 80.23 | 92.61 | 92.71 |
| VGG-16 | D3 | 8.20 | 8.71 | 91.55 | 83.56 | 92.60 | 92.65 |
| VGG-16 | D1, D2 | 7.12 | 8.26 | 8.31 | 8.28 | 9.03 | 92.16 |
| VGG-16 | D2, D3 | 8.55 | 7.36 | 8.41 | 8.32 | 8.49 | 91.56 |
| VGG-16 | D1, D3 | 8.13 | 8.07 | 82.12 | 71.24 | 89.71 | 91.69 |
Share and Cite
Liu, W.; Chen, Z.; Gong, Y. Towards Failure-Aware Inference in Harsh Operating Conditions: Robust Mobile Offloading of Pre-Trained Neural Networks. Electronics 2025, 14, 381. https://doi.org/10.3390/electronics14020381