1. Introduction
The swift advancement of the computing and storage resources of Internet of Things (IoT) devices has enabled significant progress in a wide range of applications, leading to an era in which the computing and processing of data are flowing from the cloud to the edge [1,2]. Many applications that were previously executed in the cloud are now delivered by edge devices such as tablets, wearable IoT devices, or smartphones [3]. According to recent literature [4,5], the number of connected devices is expected to reach 150 billion by the year 2025, and almost 70% of computing workloads are expected to be executed at the edge of the network in the coming years. Edge computing performs data processing at or near the edge devices, typically with the help of an edge server, which offers benefits such as reduced computational latency and energy consumption. Considerable research effort has therefore been devoted to optimization tools that minimize energy consumption and latency and maximize the performance of edge computing applications.
In addition, Artificial Intelligence (AI), defined as a powerful vehicle for improving machines' capacity to interpret, learn, and perform tasks [6], has already expanded into a wide range of applications, including reinforcement learning and computer vision. Real-time applications such as autopilot-assisted driving and, in a few years, connected and autonomous vehicles depend on the fast processing of incoming data, which makes Cloud AI a poor fit. Edge AI is a promising way to address these problems [7,8]: it moves not only the inference but also the training of AI models to the edge, and by avoiding the transmission of data to the cloud it mitigates latency, privacy, and traffic load problems. Training a deep learning model requires intensive computing resources; nonetheless, the rapid evolution of the hardware and processing power of edge devices can now provide the resources required to develop an AI application at the edge [9,10].
The core concept behind edge computing is to provide cloud-like functionality closer to the end device and the user. In other words, an end device should be capable of transferring (offloading) a computationally intensive task to a nearby edge device in order to reduce, for example, latency and power consumption. This approach, called task offloading, enables lower latency, lower energy consumption, and better reliability, and it has attracted considerable attention from both academia and industry. Many studies addressing task offloading were published in recent years; for example, Wang et al. in 2017 and Ai Yuan et al. in 2018 attempted to classify the various types of task offloading. This body of literature focuses in particular on architecture, resource allocation, communications, and algorithms [11,12,13,14].
In this paper, we study cooperative task execution in an edge-assisted object detection system. In more detail, we use the official pre-trained YOLOv5 [15] object detector with the COCO dataset as the workload. The workload is offloaded to every end device in order to obtain, per device, the object detector evaluation metrics, such as mAP, precision, and recall, as well as execution time and other measurements. We chose the YOLO object detector and the COCO dataset because they are standard benchmarks for real-time object detection systems. Moreover, we study two natural criteria for deciding how to compress the images and how to split the image set across the available end devices, and we search experimentally for the configuration that best trades off execution accuracy against execution time. In our testbed, we highlight optimal trade-offs between the E2E execution time and the execution accuracy of our object detection system, and we show the effect of image compression on transmission delay and, consequently, on execution time. Our main contribution, derived from our experimental results and measurements, is a statistical model based on these performance metrics that can be used according to the needs of the user: the user can tailor the system's operation to minimize the execution time or to maximize the accuracy of the edge-assisted object detector. The delay and accuracy trade-off in edge computing has been explored in the past (see, for example, [16] and the references therein), but our work is the first to study this problem in the context of task splitting from servers to end devices. It is worth pointing out that this is a new type of offloading architecture, in which a larger device (here, a server) splits and outsources its load to many smaller devices (here, the end nodes).
The remainder of this paper is structured as follows. In the next section, we present related work on task execution and deep learning applications aimed at optimizing edge computing systems. In Section 3, we present the architecture and system evaluation of our testbed. In Section 4, we focus on the design of the equations and task allocation policies that were used to obtain the results. In Section 5, we present and discuss the results on the performance of our proposed system. We conclude in the last section, acknowledging limitations and proposing future research directions.
2. Related Works
Previous research by Kuanishbay et al. focused on optimizing the methods for offloading computations in edge computing networks [17]. More specifically, the authors studied six optimization methods, including convex optimization, Lyapunov optimization, heuristic techniques, machine learning, and game theory. For each of these methods, they examined the objective functions (minimizing energy consumption, minimizing latency, and minimizing system utility and system cost), the application areas (for example, MEC or MCC), the types of offloading, and the evaluation methodology (simulation or real-world datasets). In summary, the paper provides an overview of optimization methods for computation offloading in edge computing networks that can support scholars in their research on computational offloading.
A comprehensive survey providing a thorough examination of computation offloading in MEC, covering objectives such as minimizing delay, reducing energy consumption, maximizing revenue, and maximizing system utility, as well as applications and offloading approaches, was presented by Chuan Feng et al. [18]. In more detail, the authors reviewed the benefits of computation offloading techniques in MEC networks and categorized existing works from various perspectives according to their offloading objectives. They provide an analytical comparison of computation offloading techniques, including mathematical solvers, heuristic algorithms, Lyapunov optimization, game theory, Markov Decision Processes, and Reinforcement Learning, and they discuss the advantages and disadvantages of each method. The paper concludes by highlighting the open challenges in computation offloading within MEC networks that still need to be studied.
Wang Xiaofei et al. [19] have also provided a comprehensive survey of edge computing and deep learning. The survey first discusses the fundamentals of the technology and presents several edge computing paradigms: cloudlets and micro data centers, mobile edge computing (MEC), fog computing, and collaborative end-edge-cloud computing. In addition, potential hardware chips for edge AI applications are described. Lastly, the authors discuss the challenges of executing DL applications on edge devices, focusing on inference, training, and optimization. The contribution of this paper is to provide insight into identifying the most suitable edge computing architecture for optimizing deep learning performance for both training and inference, while also addressing challenges such as energy consumption, latency, and network communications.
Another promising line of research was introduced by Yuanming Shi et al. [20], which emphasizes the difficulties and solutions of communication in edge AI. The work summarizes communication algorithms for the distributed training of AI models deployed on edge nodes, such as zeroth-order, first-order, second-order, and federated optimization algorithms. The authors also classify the system architectures of edge AI systems, including partition-based and model-partition-based edge training systems. In addition, the paper reviews works on computation offloading and inference at the network edge. Finally, different edge AI systems are introduced, with discussions focusing on the communication challenges and solutions in these systems.
The implementation of deep learning algorithms in edge computing has garnered increasing attention in recent years. Jiasi Chen et al. [21] presented an analytical review that provides an overview of deep learning applications used in edge computing, such as computer vision, natural language processing, and augmented reality. The work first introduces the benefits of bringing data processing closer to the network edge, in other words, to the edge nodes or the edge devices. The authors then provide context for measuring the performance of deep learning models and outline frameworks for training and inference. The overall purpose of the article is to introduce edge computing and deep learning and to present methods for accelerating and evaluating deep learning inference on edge devices.
Earlier studies also focused on object recognition applications at the network edge. Chu-Fu et al. [22] suggested a cooperative framework for image object recognition that integrates the processing of AIoT devices. They formulated a task-offloading network optimization problem for this environment and solved it using a heuristic algorithm based on particle swarm optimization (PSO). The testbed used in that paper was a practical surveillance system for monitoring human flow, and a technique for detecting erroneous data was also introduced. The encouraging results suggest that the proposed approach could be applied in a real-world surveillance system.
Scholars have also discussed how edge computing technology can improve the performance of the IoT. Wei Yu et al. [23] conducted an analytical survey investigating the relationship between edge computing and the Internet of Things and how the integration of edge computing can enhance the overall performance of IoT networks. More specifically, the study lists the benefits of edge computing and divides the various edge computing architectures into several categories, comparing their performance analytically in terms of response time, computation capacity, and storage space. The performance of IoT devices in the cloud versus at the edge is also compared, and an analytical treatment of the edge computing architecture's performance, task scheduling, security, and privacy is presented. The study outlines the advantages of edge computing with respect to transmission, storage, and computation, and discusses new challenges such as system integration, resource management, security, and privacy.
Additional studies aiming at a more complete understanding of the key differences between AI for Edge and AI on Edge were introduced by Shuiguang Deng et al. [24]. The authors define AI for Edge as a research direction that provides solutions to optimization problems in edge computing, whereas AI on Edge is a research direction studying how to run AI models on the edge; in other words, it is a framework designed to carry out both training and inference with a particular objective in mind, such as optimal algorithm performance, cost savings, or privacy. Moreover, the authors categorize research efforts in edge computing by topology, content, and services, and describe how to build an edge computing application with intelligence.
Researchers have also proposed an intelligent and cooperative approach to edge computing in IoT networks, which seeks a mutually beneficial integration of AI and edge computing [25]. The authors' focus was on redesigning the AI-related modules of edge computing to enable the distribution of AI's core functions from the cloud to the edge. In more detail, they introduced a new approach to intelligent and cooperative edge computing in an IoT system that enables collaboration between edge computing and AI, and they presented different scenarios for this approach. In the first scenario, edge AI enables collaborative effort between the cloud and the edge; this is accomplished through a novel data schema in which data are kept locally to ensure privacy. A second scenario builds a smart edge by leveraging locally adaptive AI: a reservation-based approach for computation offloading that adapts to AI predictions of time sequences. The simulation results indicated that this method not only allows the execution of the complete AI model (rather than a simplified version) but also achieves significantly lower latency and higher caching ratios at the edge. These results imply that it is viable to establish an edge computing ecosystem, as a component of an intelligent IoT system, based on services and interfaces that are helpful to developers.
In addition, Zhou Zhi et al. [26] have published on edge intelligence. In more detail, this work offers the background of edge AI, that is, the architectures, frameworks, and deep learning technologies used at the network edge. The paper introduces the basic concepts of AI with a focus on deep learning, the most popular branch of AI, and also describes the motivation for and definitions of edge intelligence. The work is essentially an analytical review of the architectures, systems, and frameworks for training edge intelligence models, and it concludes by identifying future directions and challenges of edge intelligence.
In 2019, En Li et al. [27] developed a framework named Edgent, which exploits device-edge synergy for collaborative inference in deep neural networks (DNNs) with the aim of achieving high inference accuracy and low latency. The authors implemented the framework on a desktop PC and a Raspberry Pi, evaluated it on real-world data, and demonstrated its efficacy. Edgent is also able to compress existing models in order to accelerate DNN inference.
To address the issue of task offloading, Firdose Saeik et al. offered a thorough examination of how the edge and the cloud can combine their computational resources [28]. The discussion revolves around artificial intelligence, mathematical optimization, and control theory approaches, which can be used to meet the diverse constraints, limitations, and dynamics of end-to-end application execution. Unlike other studies, this work on task offloading makes a twofold contribution. First, it presents a survey on task offloading that covers three sub-fields, namely optimization algorithms, artificial intelligence techniques, and control theory. Second, it categorizes the discussed techniques according to various factors, such as the objective functions, the granularity level, the utilization of edge and cloud infrastructures, and the incorporation of mobility depending on the type of edge devices.
As far as collaborative task execution is concerned, Galanopoulos et al. developed a framework that enables collaborative task execution on edge devices [29]. Task execution is driven by an auction algorithm that aims to optimize accuracy and execution delay. The authors implemented a facial recognition application on a Raspberry Pi and evaluated the system's performance in terms of accuracy and execution delay; the proposed algorithm outperforms several benchmark policies.
Among the many object recognition models, one of the most popular is the YOLO network, which can support real-time applications at the network edge. Haogang Feng et al. [30] presented a benchmark evaluation and analysis of the YOLO object detector on edge-class devices. In that work, the YOLO object detector was deployed on an NVIDIA Jetson Xavier, an NVIDIA Jetson Nano, and a Raspberry Pi 4 in conjunction with a USB neural network compute processor. Four versions of the YOLO object detector were tested on the three edge devices using videos with different input window sizes. The findings indicated that the Jetson Nano offers the best balance of performance and cost compared with the other two devices; in particular, the Jetson Nano reaches 15 FPS when running the YOLOv4-tiny model.
A recent study by Galanopoulos et al. developed an object detection system at the network edge and identified the system-level trade-offs between E2E latency and execution accuracy [16]. The authors developed methods that optimize the system's transmission delay and highlight the impact of the image encoding rate and the size of the neural network. Their results were based on a real-time object recognition application and showed that image-level compression, as well as the neural network size, are key parameters affecting both object recognition accuracy and end-to-end latency.
A mobile augmented reality system for real-time object detection was introduced by Luyang Liu et al. [31]. The system achieves accurate object detection for commodity AR/MR devices at a frame rate of 60 fps. The authors demonstrated that low-latency offloading techniques can effectively decrease both the offloading latency and the bandwidth consumed. They prototyped an E2E system running on mobile devices; the results showed that, by using smart offloading techniques and fast object detection methods to sustain detection accuracy, the system can reduce the false detection rate by 27–38% and increase the detection rate by 20–34%.
Cheng Dai et al. [32] presented an enhanced object detection system for an edge-based Internet of Vehicles (IoV) system, aiming to reduce latency. They proposed a new method for object detection based on video key-frames, which can achieve a compression ratio of 60% with an error rate of only 10%. The proposed method also accelerates the processing of frames in the edge IoV system.
Finally, Dong-Jin Shin et al. [33] presented an analytical deep learning performance evaluation using the YOLO object detector on the NVIDIA Jetson platform. The authors used common deep learning frameworks such as TensorRT and TensorFlow-Lite on embedded and mobile devices to obtain an analytical performance evaluation. The evaluation was performed on the NVIDIA Jetson AGX Xavier platform, measuring CPU and GPU utilization, latency, energy consumption, and accuracy. The evaluated model was the YOLOv4 object detector using the COCO and PASCAL VOC datasets. The findings indicated that, when deploying a deep learning model on a mobile device, the TensorFlow-Lite framework is the best solution, whereas on an embedded system with built-in Tensor cores the most effective framework is TensorRT.
All in all, the studied literature provides evidence that the processing of computing workloads is moving from the cloud to the edge and that AI workloads in edge computing architectures are a rapidly emerging research field. Moreover, the research community has put great effort into designing optimization algorithms to improve the performance of edge computing systems. The most common approach to task offloading is to transfer the workload to local servers, which then perform the computation. Our work is, as far as we know, the first study that explores executing workloads by splitting the task from servers to end devices.
4. Design of Algorithms and Decision Variables
The goal of our experiments is to develop an edge-assisted object detection system that is tailored to the needs of the user, either minimizing the E2E execution time or maximizing the accuracy of the system. Hence, in this paper, we follow an experimental approach in which we use our testbed to evaluate an exhaustive list of possible configurations, i.e., different splitting decisions and compression values, with respect to the accuracy and delay criteria. This analysis sheds light on the dependencies and conflicts between these two criteria and allows one to select, offline, a strategy that optimizes the criterion of one's preference. To make this concrete, we also introduce the respective optimization problems, albeit without providing analytical expressions for the metrics, for the reasons explained above. The aim of this research is to find the optimal trade-off between the metrics in order to improve the E2E execution time and the accuracy of the system. The key decisions are how to split and offload the set of images across the end devices; this decision is taken at the server, based on the solution of our properly defined optimization problems. In the experiments, we demonstrate that the parameters which affect the operation of the system are the encoding rate q (compression value) and the batch sizes (splitting decisions) of the images delivered to every device. More specifically, based on our measurements (Table 1, Table 2, Table 3 and Table 4 and Figure 2, Figure 3, Figure 4, Figure 5, Figure 6, Figure 7, Figure 8, Figure 9 and Figure 10), we argue that the parameters which affect the accuracy and execution time are the encoding delay T_enc, the transmission delay T_tx, and the inference delay T_inf; all of these depend on the decision variables, namely the encoding rate q and the percentage p_i of the total number of images transmitted to each device i. Therefore, the decision variables of our system are the encoding rate q and the splitting fractions p_1, p_2, p_3.
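To make the server-side splitting and compression step concrete, the sketch below shows one way the offloading decision could be applied to an image list. It is a minimal illustration under our own assumptions: the function names and device fractions are hypothetical, and Pillow's JPEG quality parameter is used as a stand-in for the encoding rate q; it is not the exact implementation used in the testbed.

```python
from io import BytesIO
from PIL import Image  # Pillow, assumed to be available on the edge server

def split_batches(image_paths, fractions):
    """Split the image list into consecutive batches, one per end device.

    fractions: splitting decisions (p_1, p_2, p_3); they must sum to 1.
    """
    assert abs(sum(fractions) - 1.0) < 1e-6
    batches, start = [], 0
    for p in fractions:
        end = start + round(p * len(image_paths))
        batches.append(image_paths[start:end])
        start = end
    batches[-1].extend(image_paths[start:])  # keep any rounding leftovers
    return batches

def encode_jpeg(path, q):
    """Re-encode one image as JPEG with quality q (the encoding rate)."""
    buf = BytesIO()
    Image.open(path).convert("RGB").save(buf, format="JPEG", quality=q)
    return buf.getvalue()

# Hypothetical usage: q = 50 and (p_1, p_2, p_3) = (0.7, 0.3, 0.0)
# batches = split_batches(coco_image_paths, [0.7, 0.3, 0.0])
# payloads = [[encode_jpeg(p, 50) for p in batch] for batch in batches]
# each payload list would then be transmitted to its end device
```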
It should be noted that our functions and parameters are defined in Table 5; these parameters are used to construct our optimization problems.
In Equation (6), we compute the three per-device completion times and keep the largest one; that is, offloading a workload to every device yields three time values, and the E2E execution time is determined by the device that finishes processing its images last. In this work, we formulate two optimization problems. The first, Q1, minimizes the E2E execution time; the second, Q2, maximizes accuracy. Using the definitions in Equations (4)–(6), the optimization problems can be written as sketched below.
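Since the original equation blocks are not reproduced here, the following LaTeX sketch gives one plausible reading of the two problems, using the notation introduced above (p_i for the splitting fractions, q for the encoding rate, and T_target for the user-requested target execution time); it is an assumption-based reconstruction, and the paper's original analytical forms may differ.

```latex
% A hedged reconstruction of the two optimization problems (Q1, Q2).
% T_i(q, p_i) is assumed to be the completion time of device i, i.e., its
% encoding, transmission, and inference delays combined, and A(q, p) the
% resulting detection accuracy (e.g., mAP).
\begin{align}
  T_{\mathrm{E2E}}(q, \mathbf{p}) &= \max_{i \in \{1,2,3\}} T_i(q, p_i)
      && \text{(cf. Equation (6))} \\[4pt]
  \textbf{Q1:}\quad & \min_{q,\,\mathbf{p}} \; T_{\mathrm{E2E}}(q, \mathbf{p})
      && \text{s.t. } \textstyle\sum_i p_i = 1,\; p_i \ge 0 \\[4pt]
  \textbf{Q2:}\quad & \max_{q,\,\mathbf{p}} \; A(q, \mathbf{p})
      && \text{s.t. } T_{\mathrm{E2E}}(q, \mathbf{p}) \le T_{\mathrm{target}},\;
         \textstyle\sum_i p_i = 1,\; p_i \ge 0
\end{align}
```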
Taking into account our evaluation results for YOLOv5 on the COCO dataset in Table 1, Table 2, Table 3 and Table 4, as well as Figure 9 and Figure 10, we observe that the edge TPU inference machine processes the workload faster than the other two inference machines. Intuitively, then, answering research question Q1 means sending most of the images to the edge TPU and fewer images to the other two inference machines. On the other hand, answering Q2 requires offloading the specific batch sizes that maximize accuracy.
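The following minimal sketch illustrates the E2E time model described above: the batch handed to each device is processed in parallel, so the E2E execution time is the maximum of the three per-device completion times (cf. Equation (6)). The per-image delay figures are hypothetical placeholders, not measurements from our testbed.

```python
# Assumed per-image delay (encode + transmit + infer), per device, in seconds.
# These numbers are illustrative placeholders only.
PER_IMAGE_DELAY_S = {
    "edge_tpu": 0.045,       # i = 1: edge TPU inference machine
    "tflite_224": 0.110,     # i = 2: TensorFlow-Lite, 224x224x3 input
    "tflite_320": 0.150,     # i = 3: TensorFlow-Lite, 320x320x3 input
}

def e2e_time(num_images, fractions):
    """E2E execution time for a splitting decision (p_1, p_2, p_3)."""
    times = [
        fractions[i] * num_images * delay
        for i, delay in enumerate(PER_IMAGE_DELAY_S.values())
    ]
    return max(times)  # the slowest device determines the E2E time

# Hypothetical usage with 5000 validation images:
print(e2e_time(5000, (0.7, 0.3, 0.0)))   # split favouring the edge TPU
print(e2e_time(5000, (1/3, 1/3, 1/3)))   # even split, bottlenecked by device 3
```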
5. Results and Discussion
In this section, we present the results that solve the two optimization problems “manually”, i.e., by finding the optimal values of the decision variables p_1, p_2, p_3 and q. We define i = 1 as the edge TPU inference machine, i = 2 as the TensorFlow-Lite inference machine with a 224 × 224 × 3 NN input size, and i = 3 as the TensorFlow-Lite inference machine with a 320 × 320 × 3 NN input size. T_target is the target execution time requested by the user.
The Pareto plot in Figure 11 depicts the most efficient solutions for our two optimization problems; every solution in the plot combines three p_i values. Regarding the encoding rate, Figure 6, Figure 7 and Figure 8 clearly show that the optimal value is q = 50. To answer Q1, the Pareto-optimal solution attains a minimum E2E execution time of 110 s, with splitting fractions p_1 = 0.7, p_2 = 0.3, p_3 = 0. For Q2, we set T_target = 400 s, which means that no feasible solution may exceed 400 s. According to the Pareto plot, the best solution for Q2 achieves an accuracy of 0.53 with an E2E execution time of 342 s; the splitting fractions that achieve the maximum accuracy are p_1 = 0, p_2 = 0.8, p_3 = 0.2.
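For illustration, the sketch below shows how such a “manual” selection over an exhaustively measured configuration grid could be carried out. The candidate tuples and most of their (time, accuracy) values are hypothetical placeholders rather than our measured data, although the Q1 and Q2 outcomes mirror the values reported above.

```python
# Each configuration is (p1, p2, p3, q) -> (E2E time in s, accuracy).
# The numbers below are hypothetical placeholders for measured results.
measured = {
    (0.7, 0.3, 0.0, 50): (110.0, 0.41),
    (0.5, 0.3, 0.2, 50): (190.0, 0.47),
    (0.0, 0.8, 0.2, 50): (342.0, 0.53),
    (0.0, 0.5, 0.5, 50): (420.0, 0.55),
}

def pareto_front(points):
    """Keep configurations not dominated in both time (lower) and accuracy (higher)."""
    front = {}
    for cfg, (t, acc) in points.items():
        dominated = any(t2 <= t and a2 >= acc and (t2, a2) != (t, acc)
                        for t2, a2 in points.values())
        if not dominated:
            front[cfg] = (t, acc)
    return front

front = pareto_front(measured)

# Q1: minimize the E2E execution time.
q1_cfg = min(front, key=lambda c: front[c][0])

# Q2: maximize accuracy subject to T_E2E <= T_target.
T_TARGET = 400.0
feasible = {c: v for c, v in front.items() if v[0] <= T_TARGET}
q2_cfg = max(feasible, key=lambda c: feasible[c][1])

print("Q1 choice:", q1_cfg, front[q1_cfg])
print("Q2 choice:", q2_cfg, feasible[q2_cfg])
```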
The main idea of this paper was to develop cooperative task execution for an edge-assisted object detection system. In our edge computing system, we used You Only Look Once (YOLO) version 5. We first evaluated the performance of YOLOv5 on every end device using metrics such as precision, recall, and mAP on the COCO dataset. The goal of our work was to offload the workload (batches of images) to the three end devices in our testbed and to obtain either the minimum execution time or the maximum accuracy. We observed that the decision variables which affect the E2E execution time and the execution accuracy are the encoding rate q (compression value) and the batch sizes (splitting decisions) sent to every device.
More specifically, we explored the optimal trade-offs between the parameters and decision variables in order to solve the optimization problems “manually”. We found that the key parameters affecting an edge-assisted object detection system are the encoding rate (compression value), together with the neural network input size, the numerical precision (INT8, float32), and the batch sizes (splitting decisions) sent to every end device.
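To give a feel for how the neural network input size influences inference delay, the short sketch below times the official pre-trained YOLOv5s model, loaded via torch.hub, at different input resolutions. The model variant, test image URL, and host machine are illustrative assumptions; this does not reproduce the edge TPU or TensorFlow-Lite measurements of our testbed.

```python
import time
import torch

# Load the official pre-trained YOLOv5s model from the Ultralytics repository.
model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)

img = "https://ultralytics.com/images/zidane.jpg"  # example image from the repo docs
model(img, size=640)  # warm-up call so the first timed run is not penalized

for size in (224, 320, 640):          # candidate NN input sizes
    start = time.perf_counter()
    results = model(img, size=size)   # inference at the given input size
    elapsed = time.perf_counter() - start
    print(f"input {size}: {elapsed * 1000:.1f} ms, {len(results.xyxy[0])} detections")
```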
Indeed, the effect of image compression on object recognition accuracy has recently gained much attention from researchers. The authors in [37] showed the significant impact of image compression, or more precisely of the encoding rate q, on object detection accuracy. The system-level trade-offs between E2E latency and deep learning accuracy were recently introduced in [16]. We tried to learn from these lessons and add value to the relevant literature.
The contribution of our work is an edge-assisted object detection system that offloads the workload from the edge server to the end devices according to the needs of the user, rather than offloading the workload from the end devices to the edge server and the cloud [32,37]. This approach has not been previously explored. In our work, we highlight the optimal trade-offs between E2E execution time and accuracy in an edge computing architecture, trade-offs that other researchers have also studied; our work, however, goes beyond this by providing a real-time edge-assisted object detection system that is tailor-made to the needs of the user by offloading the workload to the three devices. Our proposed offloading strategies can easily be implemented on end devices, which means that the latency of our system can be reduced even further.
A natural next step in this line of work is to devise algorithms that select the configuration decisions (splitting and compression) in a dynamic fashion. Another future research direction is to investigate more effective image compression methods, such as autoencoders, in order to further reduce transmission delay, and to study ways of improving accuracy with other deep learning techniques such as distillation. It would therefore be useful to examine how autoencoder-based image compression and distillation affect the accuracy and execution time of an edge-assisted object detection system.
The limitation of this work is, as noted above, that the proposed system cannot make the splitting and compression decisions dynamically; the user has to consider all the feasible solutions and manually choose the optimal decision variables to obtain the minimum execution time or the maximum accuracy.