#### *4.2.3. Embedded ML*

Conventional IoT devices are ubiquitous and low-cost but natively resource-constrained, which limits their use in ML tasks; nevertheless, data generated at the edge are increasingly used to support applications that run ML models. Until now, edge ML has focused predominantly on mobile inference, but several embedded ML solutions have recently been developed to operate on ultra-low-power devices, which are typically characterized by their hard resource constraints [97]. A new field of ML, known as TinyML, has been put forward to enable inference at the edge endpoints. ML inference at the edge can reduce overall computational resource needs, increase privacy within applications, and enhance system responsiveness. TinyML, so named because its ML inference consumes under a milliwatt of power, overcomes the power limitations of such devices, enabling low-power and low-cost distributed machine intelligence. TinyML is an open-source ML framework specifically designed for resource-constrained embedded devices; it is fully compatible with several low-cost, globally accessible hardware platforms and was designed to streamline the development of embedded ML applications [114].
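As a concrete illustration of a typical TinyML workflow, the sketch below defines a deliberately small Keras model and converts it to an int8-quantized TensorFlow Lite flatbuffer of the kind deployed on microcontrollers. This is a minimal sketch under assumed conditions, not a pipeline from [114]: the architecture, input shape, and calibration data are placeholders.

```python
import numpy as np
import tensorflow as tf

# Deliberately tiny placeholder model (e.g., for audio spectrogram frames);
# the architecture is illustrative and keeps the flatbuffer well under 100 kB.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(49, 10, 1)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(4, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# model.fit(...) would train on real sensor data before conversion.

def representative_data():
    # Calibration samples for post-training int8 quantization (random here).
    for _ in range(100):
        yield [np.random.rand(1, 49, 10, 1).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_model = converter.convert()
print(f"quantized model size: {len(tflite_model) / 1024:.1f} kB")
with open("model_int8.tflite", "wb") as f:
    f.write(tflite_model)
```

The resulting `.tflite` file is typically embedded as a C byte array and executed on-device with the TensorFlow Lite for Microcontrollers runtime.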

TinyML technologies and applications target battery-operated devices, spanning the hardware, algorithms, and software needed for on-device inference and data analytics at the edge. In [115], MLCommons, an open engineering consortium, presented the MLPerf™ Tiny Inference v0.5 benchmark. This inference benchmark suite targets ML use cases on embedded devices by measuring how rapidly a trained NN can process new data on ultra-low-power devices. Embedded ML is a new field in which AI-based analytics of sensor data is carried out in real time, close to where the data are collected. The benchmark presented in [115] focuses on use cases that rely on tiny NNs (i.e., models smaller than 100 kB) to analyze sensor data, such as audio and video, to provide intelligence at the edge of the network. The benchmark consists of four ML tasks that involve microphone and camera sensors on different embedded devices:

- Keyword spotting: detecting spoken keywords in audio captured by a microphone;
- Visual wake words: binary image classification (e.g., whether a person is present);
- Image classification: small-image classification on the CIFAR-10 dataset;
- Anomaly detection: detecting anomalous machine operating sounds.

This benchmark aims to measure ML performance in embedded systems that operate at the microwatt level, including cameras, wearables, smart sensors, and other IoT devices that demand a certain level of intelligence. The objective is thus to measure the performance of such constrained systems so that their efficiency can be improved over time. Results are reported per embedded ML approach, together with the hardware and software used. Table 3 compares the benchmark results for distinct embedded hardware running a trained model, measuring the processing latency in milliseconds (i.e., how fast a system can process an input to produce a valid result) and the corresponding energy consumed in μJ [116].
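As an illustration of the latency half of such a measurement, the sketch below times single-inference runs of a converted `.tflite` model with the TensorFlow Lite interpreter on a host machine. The model path is hypothetical (it matches the conversion sketch above); a real MLPerf Tiny submission would run an equivalent loop on the target microcontroller, with energy measured by external instrumentation.

```python
import time
import numpy as np
import tensorflow as tf

# Load the quantized model (hypothetical path; see the conversion sketch above).
interpreter = tf.lite.Interpreter(model_path="model_int8.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Random int8 input standing in for one preprocessed sensor sample.
sample = np.random.randint(-128, 128, size=inp["shape"], dtype=np.int8)

latencies_ms = []
for _ in range(100):
    interpreter.set_tensor(inp["index"], sample)
    start = time.perf_counter()
    interpreter.invoke()  # one inference = one timed unit, as in the benchmark
    latencies_ms.append((time.perf_counter() - start) * 1000.0)

print(f"median latency: {np.median(latencies_ms):.3f} ms")
prediction = interpreter.get_tensor(out["index"])  # retrieve the model output
```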

#### *4.3. Edge-AI Computational Cost*

Computational needs for AI are growing rapidly. Recent numbers show that the compute used in large AI training runs is doubling every 3.5 months and, since 2012, has increased by more than 300,000 times [117]. In recent years, considerable effort has been put into increasing AI accuracy and, especially with DL, accuracy has improved at a steady pace. This increase in accuracy has been decisive in making AI a reality in real-world applications; however, running such high-accuracy models requires ever more computational resources. In the short and medium term, AI will face major challenges that put its sustainability and ecological footprint into perspective. Due to the explosion of its use in several application domains, pressure on computational resources is already increasing, not only to train these models but also to run them, as they become more accurate but also computationally heavier.

Because of this, novel and more sustainable practices regarding AI implementation and deployment are yet to come. In [118], Schwartz et al. introduced the concepts of Red and Green AI as a way to clarify and distinguish the two major current approaches to AI.

Red AI is known for relying on large models and datasets, and its performance is typically evaluated through accuracy, which is usually obtained through massive processing power. In this context, the relation between model performance and model complexity is known to be logarithmic (i.e., an exponentially bigger model is required for a linear improvement in performance [119]). Furthermore, the quantity of training data and the number of tuning experiments exhibit the same exponential growth [118]. In each of these cases, a small performance improvement comes at a greatly increased computational cost.
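To make the logarithmic relation explicit, consider a simple illustrative model (our notation, not drawn from [118,119]): if accuracy grows with compute budget $C$ as

$$ a(C) \approx \alpha \log C, $$

for some constant $\alpha$, then raising accuracy by a fixed increment $\delta$ requires a new budget $C' = C\,e^{\delta/\alpha}$; that is, every additional linear gain in accuracy multiplies the compute, and hence the cost and emissions, by a constant factor.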

Green AI, on the other hand, focuses on achieving results without increasing, and preferably while lowering, computational costs. Unlike Red AI, which entails rapidly increasing computing costs and, as a result, a rising carbon footprint, Green AI has the opposite effect [118]. In Green AI, efficiency is usually prioritized over accuracy when evaluating performance. As a result, Green AI focuses on model efficiency, which covers the amount of work necessary to produce a given result with AI, the amount of work required to train a model, and, if appropriate, the sum over all tuning experiments. Efficiency may be assessed using a variety of metrics, including carbon emissions, power consumption, elapsed real time, number of parameters, and so on.
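Of the metrics listed above, the parameter count is the cheapest to report. The snippet below is a minimal sketch of that bookkeeping for a placeholder Keras model, together with a rough multiply-accumulate count for its dense layers; a faithful cost account for real networks would require per-layer operation counting across all layer types.

```python
import tensorflow as tf

# Placeholder model; any Keras model works for the parameter count.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(32,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Number of trainable parameters: one of the efficiency metrics listed above.
print(f"trainable parameters: {model.count_params()}")

# Rough multiply-accumulate count for dense layers only (kernel shape = in x out).
macs = sum(
    int(layer.kernel.shape[0]) * int(layer.kernel.shape[1])
    for layer in model.layers
    if isinstance(layer, tf.keras.layers.Dense)
)
print(f"approx. MACs per forward pass (dense layers): {macs}")
```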


**Table 3.** MLPerf™ Tiny Inference v0.5 benchmark results. Data from [115].

#### *4.4. Measuring Edge-AI Energy Consumption and Carbon Footprint*

The overall cost of using AI can be obtained by accounting for the resources involved in all processing stages, namely the energy consumed and the resulting CO2 emissions.
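A first-order estimate multiplies the measured energy by the carbon intensity of the electricity grid that powered the computation. The sketch below assumes exactly that; the default intensity is a placeholder (roughly a recent global average) and should be replaced by the local grid's published figure.

```python
def co2e_grams(energy_kwh: float, grid_intensity_g_per_kwh: float = 475.0) -> float:
    """First-order carbon estimate: energy used times grid carbon intensity.

    The default intensity is a placeholder (roughly a global average);
    real assessments should use the local grid's published value.
    """
    return energy_kwh * grid_intensity_g_per_kwh

# Example: a training job drawing 300 W on average for 10 hours.
energy_kwh = 0.300 * 10  # kW x hours
print(f"{co2e_grams(energy_kwh):.0f} gCO2e")
```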
