4.2.2. Edge-AI Levels

Besides the well-known Cloud Intelligence (CI), in which DNN models are trained and inferenced entirely in the cloud, EI, as described in [110], can be classified into the six levels depicted in Figure 5. The quantity of data sent up to the cloud tends to decrease as the level of EI increases, reducing the required communication bandwidth and the transmission delay; however, this comes at the cost of increased computational latency and energy consumption at the network's edge (including IoT nodes). The EI level is therefore application-dependent and must be carefully chosen based on several criteria: latency, energy efficiency, privacy, and communication bandwidth cost.
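To make this trade-off concrete, the following minimal sketch (not taken from [110]; all function names and numeric values are hypothetical placeholders) models per-request latency and edge-side energy as a function of how much data is uplinked and how much computation stays at the edge. Higher EI levels correspond to a smaller uplinked payload and a larger edge compute time.

```python
# Illustrative first-order model of the EI-level trade-off.
# All parameter values below are hypothetical, for illustration only.

def end_to_end_cost(bytes_uplinked, uplink_mbps, edge_compute_s,
                    edge_power_w, cloud_compute_s=0.0):
    """Return (latency_s, edge_energy_j) for one inference request.

    bytes_uplinked : payload sent towards the cloud (raw data or
                     intermediate features); shrinks as the EI level rises.
    edge_compute_s : time spent computing on the device/edge; grows
                     as the EI level rises.
    """
    transmission_s = (bytes_uplinked * 8) / (uplink_mbps * 1e6)
    latency = transmission_s + edge_compute_s + cloud_compute_s
    edge_energy = edge_compute_s * edge_power_w
    return latency, edge_energy

# Hypothetical comparison over a 5 Mbit/s uplink:
# cloud inference (raw image uplinked) vs. on-device inference (label only).
print(end_to_end_cost(bytes_uplinked=600_000, uplink_mbps=5,
                      edge_compute_s=0.005, edge_power_w=2.0,
                      cloud_compute_s=0.02))
print(end_to_end_cost(bytes_uplinked=100, uplink_mbps=5,
                      edge_compute_s=0.15, edge_power_w=2.0))
```

Even with such a simplistic model, it is clear that the optimal level depends on the link rate, the payload size, and the edge device's compute capability, which is why the choice is application-dependent.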

Inference and training are the two main computing stages of an NN. Depending on the Edge-AI level (as illustrated in Figure 5), the computational load is distributed between the IoT node and the edge layer, both of which consequently require increased computational power. In recent years, AI-specific hardware accelerators have enabled high-performance inference at the edge of the network, particularly in embedded and resource-constrained devices. For example, in [111], Karras et al. present an FPGA-based SoC architecture to accelerate the execution of ML algorithms at the edge. The system offers a high degree of flexibility and supports the dynamic deployment of ML algorithms, demonstrating efficient and competitive performance for accelerating AI-based inference at the edge. Another example is presented in [112] by Kim et al., who propose a co-scheduling method to accelerate the convolution layer operations of CNN inference at the edge by exploiting parallelism across the CNN output channels. The developed FPGA-based prototype achieved an overall performance improvement of up to 200% and an energy reduction between 14.9% and 49.7%. Finally, in [113], the authors introduce NeuroPipe, a hardware management method that enables energy-efficient acceleration of DNNs on edge devices. The system incorporates a dedicated hardware accelerator for neural processing. The proposed method enables the embedded CPU to operate at lower frequencies and voltages while executing faster inferences for the same energy consumption; the reported results show an 11.4% reduction in energy consumption for the same performance.
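The core idea behind output-channel co-scheduling can be sketched as follows. This is a simplified illustration, not the implementation of [112]: the tensor shapes, the split ratio, and the use of Python threads in place of an actual CPU/FPGA pair are assumptions made only to show that the output channels of a convolution layer can be computed independently and in parallel, then recombined without changing the result.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def conv2d(x, w):
    """Naive 'valid' 2-D convolution. x: (C_in, H, W), w: (C_out, C_in, K, K)."""
    c_out, c_in, k, _ = w.shape
    h_out, w_out = x.shape[1] - k + 1, x.shape[2] - k + 1
    y = np.zeros((c_out, h_out, w_out))
    for o in range(c_out):
        for i in range(h_out):
            for j in range(w_out):
                y[o, i, j] = np.sum(x[:, i:i + k, j:j + k] * w[o])
    return y

x = np.random.rand(8, 32, 32)      # input feature map (hypothetical size)
w = np.random.rand(16, 8, 3, 3)    # filter bank with 16 output channels

# Split the filter bank along the output-channel dimension; each half is
# handled by a separate worker (standing in for an accelerator and a CPU).
split = 10                          # hypothetical ratio: 10 channels to the "accelerator"
with ThreadPoolExecutor(max_workers=2) as pool:
    part_a = pool.submit(conv2d, x, w[:split])   # "accelerator" share
    part_b = pool.submit(conv2d, x, w[split:])   # "CPU" share
    y = np.concatenate([part_a.result(), part_b.result()], axis=0)

# The partitioned computation matches the unpartitioned layer exactly.
assert np.allclose(y, conv2d(x, w))
```

Because each output channel depends only on the input feature map and its own filters, the split ratio can be tuned to balance the load between heterogeneous compute units, which is the property the co-scheduling approach exploits.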

**Figure 5.** Edge-AI Levels and model inference computation architectures: on-device, edge-based, and joint.
