*2.3. Research Gap*

The literature review presented in this section has evidenced the current research gap: there are no earlier reference architectures for DAIaaS and no implementations of skin disease diagnosis on fog and edge. This is the first research in which a reference architecture for DAIaaS is proposed and implemented, and a healthcare disease diagnosis service is developed and studied in great detail, with a catalog containing several AI and Tiny AI services supported on multiple software, hardware, and networking platforms, as well as several use cases evaluated using multiple benchmarks. The services are designed to enable different use cases, such as a patient at home taking images of a skin lesion and performing the diagnosis themselves with the help of a service, or a travelling medical professional requesting a diagnosis from a fog device or the cloud. The users of the service can be patients, medical professionals, the patient's family members, or any other stakeholder. Similarly, the services can be used by someone who has the disease diagnosis model, the image, or both, by requesting the missing resource (image or model) from other providers. The novelty and high impact of this research lie in the developed reference architecture; the service catalog offering many services; the potential for implementing innovative use cases through the edge, fog, and cloud; their evaluation on many software, hardware, and networking platforms; and a detailed description of the architecture and case study.

Commenting on the specific application we have selected for this paper, i.e., skin disease diagnosis (this comment applies to similar applications), it is important to note that having an accurate disease diagnosis model is not enough; deploying the model for real-time usage is an essential part of AI system development. This includes where and how the model is to be installed. First, both model size and complexity influence the processing or inference time, especially on resource-constrained devices. In addition, the emerging trend of virtual and mobile services, including the healthcare services required as a result of the COVID-19 pandemic, demands innovative and flexible architectures to support them. Therefore, the development of quick and accurate diagnosis methods for physicians must intrinsically consider in their design the distributed architectures on which these diagnosis methods will be deployed.

### **3. Imtidad Reference Architecture, Methodology, and Service Catalog**

This section describes our proposed Imtidad reference architecture for creating distributed AI services over the cloud, fog, and edge layers and describes the service catalog, service use cases, and the service evaluation benchmarks. The section is organized as follows. The reference architecture overview is provided and elaborated in Section 3.1. A series of use cases (e.g., a user takes a photo of a lesion on their skin and instantaneously attempts to diagnose it using their preferred service from the service catalog) are outlined in Section 3.2. An implementation of the reference architecture using a service catalog, designed as part of this research, is described in Section 3.3. A description of execution platforms is provided in Section 3.4. The metrics that have been used to evaluate and compare the services (service energy consumption and service values) are defined and explained in Section 3.5.

### *3.1. Reference Architecture and Methodology Overview*

The Imtidad reference architecture is proposed as a blueprint and procedure for decoupling applications and AI and for streamlining the design and deployment of distributed AI services over the cloud, fog, and edge layers. Figure 2 depicts the Imtidad reference architecture for the skin disease diagnosis case study. The figure can be considered an instantiation or refinement of the Imtidad reference architecture for a given application, in this case skin disease diagnosis. The architecture lists all the services required to create new DAIaaS services, from the selection of the application to service production and operations. Each of the rectangular blocks (e.g., Service Design) in the figure can be considered a component or a service, and these services can independently and asynchronously talk to each other to create services and service catalogs.

**Figure 2.** Imtidad reference architecture.

Figure 3 depicts a sequential workflow diagram for creating a skin disease diagnosis catalog. It is created by refining the Imtidad reference architecture. The service development and deployment process begins with the selection of an application domain, in this case, skin disease diagnosis. A dataset is required for the selected application so that the designed model can be trained and validated. The dataset acquisition process includes dataset validation and pre-processing in preparation for training. Then, Deep Learning models are designed, trained, optimized, and validated. First, the TensorFlow (TF) model is generated; then an optimized version is created, which, in this case, was the TensorFlow Lite (TFLite) model. Use cases are determined considering possible scenarios and business models. After that, different types of services may be designed to provide support in a series of scenarios. A service catalog is created to communicate and present the various service models to users (see Table 3 and Section 3.2 for details). In addition, service providers need a way to benchmark services, by developing evaluation metrics such as service values, energy consumption, and response time. Several execution platforms and networks are selected, and the designed services are deployed. When the services are ready for operation, the users can choose one of the services from the catalog and send their diagnosis request. External opinion might be required for validation; in this case, a healthcare professional's opinion can be used to validate the predicted diagnosis. Validation can be done by users, service designers and providers, or a third party such as auditors.

**Figure 3.** Workflow Diagram for creating a skin disease diagnosis catalog refined from Imtidad Reference Architecture.

### *3.2. Service Use Cases*

Use cases are identified considering possible scenarios and business models for provisioning distributed AI services and skin disease diagnosis services over the cloud, fog, and edge layers. These have been used to design a variety of services that suit different conditions and requirements. The services are listed in a service catalog for the user to select one of them and use it to diagnose a lesion image. The design of skin disease diagnosis services involves and concerns all parties, including patients, patients' families, medical professionals, and even service providers. Patients and medical professionals are the direct users of the system, and they are looking for instantaneous results and services available all the time and everywhere, while service providers aim for user satisfaction by providing high QoS while at the same time protecting their product and copyrights.

Local services in smartphones, where the model and image classification tasks are performed locally on the user device, guarantee a real-time response with no requirement for an Internet connection, and preserve the user's privacy as the images stay on the user's device. This kind of service can be used by patients or doctors anywhere using their own smartphones. However, this will only work if the user's device has the resources needed to store and run AI models, and model accuracy may be compromised when the model is converted into its Tiny version. On the other hand, remote services in smartphones extend the service capability and enable collaboration between edge devices. Services from nearby devices can be used when the users' devices are either unable to process the image locally or the users are looking for more accurate results. In this case, users can collaborate and provide services to each other without having to share their models. In addition, DL model service providers may want to keep their models' copyrights and not share them, while at the same time guaranteeing service availability. To accomplish that, the service provider can provide a secure device (smartphone) in the facility (e.g., a clinic) or with the medical professional to carry anywhere. In this case, skin images will be sent to the local device over the local network, not through the Internet, which provides some level of privacy for the users.

**Table 3.** The Imtidad service catalog.


Mobile devices (smartphones) are limited in their capabilities; therefore, devices such as laptops, the NVIDIA Jetson nano, and the Raspberry Pi can be used in the edge or fog layers to run more complicated models or serve a large number of users simultaneously. These devices can be provided by service providers and placed in hospitals, clinics, or even homes to serve medical professionals and other users. Devices at the edge or fog layers increase service availability and the level of user privacy and security. Nevertheless, they are incomparable with the cloud, where resources are almost unlimited. The cloud is the original service provisioning platform for AI applications, though services provided from the cloud have a higher latency and more congested networks. Services in the cloud can be used in case other local services at the edge or fog layers are busy or absent. Moreover, the DL model service can reside in the cloud, and data or local models can be uploaded to it for model retraining to improve the global model accuracy.

### *3.3. Service Catalog*

The service catalog lists all diagnosis services with their characteristics for the users to choose from. The diagnosis services are responsible for image classification. A total of 22 services are produced from combinations of various types of services, devices, and models (see Table 3) that suit different purposes. For each service, the service type, layer, device, network, and model are listed. There are four different skin disease diagnosis service types, namely, local mobile service, remote mobile service, gRPC service, and containerized gRPC service. These services can be run on different layers of the network architecture, including cloud, fog, and edge. Seven different devices that vary in their capabilities are used for evaluation: Google Cloud virtual machines (VMs), a laptop, an NVIDIA Jetson nano, two Raspberry Pis (4 GB and 8 GB), and two mobile devices (Samsung Galaxy S9 and Samsung Galaxy Note 4). Wi-Fi local area networks (LAN) and the Internet wide area network (WAN) are both considered, including fiber and cellular networks. An Internet connection is required for cloud communications, but all other levels are deployed in the local network, which means that their traffic goes through a Wi-Fi modem. Nevertheless, they may be deployed farther than this, on a base station or on other LANs close to the user. The four developed models (A, ALite, B, and BLite) are considered for all devices, though only ALite and BLite are possible for some devices due to device capability limitations. This service catalog is designed for our specific case study to show a practical example of service catalogs. This means that all sorts of devices and networks could be used to design the user's services, and they are not limited to what is specified here. Table 4 lists the acronyms and their definitions that have been used throughout the paper for the 22 services in the service catalog.
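To make the catalog structure concrete, the sketch below shows one plausible in-code representation of catalog entries and a lookup over them. The field names, entry values, and `find_services` helper are illustrative assumptions, not the paper's actual schema.

```python
from dataclasses import dataclass

# Hypothetical representation of one Imtidad service catalog entry;
# field names and example values are illustrative, not the paper's exact schema.
@dataclass(frozen=True)
class CatalogEntry:
    service_type: str   # "Local Mobile", "Remote Mobile", "gRPC", or "Containerized gRPC"
    layer: str          # "cloud", "fog", or "edge"
    device: str         # e.g., "Jetson Nano", "Raspberry Pi 8G"
    network: str        # e.g., "Wi-Fi", "Fiber", "4G"
    model: str          # "A", "ALite", "B", or "BLite"

catalog = [
    CatalogEntry("gRPC", "edge", "Jetson Nano", "Wi-Fi", "ALite"),
    CatalogEntry("gRPC", "fog", "Laptop", "Wi-Fi", "B"),
    CatalogEntry("Containerized gRPC", "cloud", "Google Cloud Run", "Fiber", "B"),
    CatalogEntry("Local Mobile", "edge", "Galaxy S9", "None", "BLite"),
]

def find_services(entries, **criteria):
    """Return catalog entries matching all given field=value criteria."""
    return [e for e in entries
            if all(getattr(e, k) == v for k, v in criteria.items())]

edge_services = find_services(catalog, layer="edge")
```

A user (or the diagnosis request service) could filter the catalog this way, e.g., to list only edge services or only services running a Lite model.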


**Table 4.** The acronyms used for the services and service definitions.



### *3.4. Devices and Hardware Platforms*

Seven different execution platforms are adopted in the service catalog. Google Cloud Run is selected for the cloud services; it is a serverless platform that facilitates running invocable Docker container images via requests or events. Services are the main resources of Cloud Run, and each has a unique and permanent URL. Services are created by deploying a container image on infrastructure that is fully managed and optimized by Google. Service configuration includes the maximum allocated memory limit, the number of assigned virtual CPUs (vCPUs), and the maximum number of concurrent requests. An HP Pavilion laptop has been used as the fog node in our experiments. It comprises an Intel® Core™ i7-8550U CPU and 8 GB of memory. The CPU has a total of 4 cores and 8 threads, with a base frequency of 1.80 GHz and a maximum single-core turbo frequency of 4.00 GHz. Two types of single-board computers have been used: the NVIDIA Jetson nano and the Raspberry Pi. The NVIDIA Jetson nano is a platform designed by NVIDIA to run AI applications at the edge. The Jetson Developer Kit used is equipped with a 128-core NVIDIA Maxwell™ architecture-based GPU, a Quad-core ARM® A57 CPU, and 4 GB of 64-bit memory. Figure 4 summarizes the Jetson nano specifications and shows a picture of the device. The Raspberry Pi is a tiny and low-cost single-board computer. Several generations of the Raspberry Pi have been released over the years. In this research, two Raspberry Pi 4 Model Bs have been used. Both boards have the same Quad-core ARM Cortex-A72 processor, but one has 4 GB of memory and the other has 8 GB. Figure 5 summarizes the Raspberry Pi specifications and shows a picture of the device. Two Samsung smartphones have been used, a Galaxy S9 and a Galaxy Note 4. The Samsung Galaxy S9 comes with an ARM Mali-G72 GPU and an Octa-Core CPU (Quad-Core Mongoose M3 and Quad-Core ARM Cortex-A55); the Samsung Galaxy Note 4 comes with an ARM Mali-T760 GPU and an Octa-Core CPU (Quad-core ARM Cortex-A57 and Quad-core ARM Cortex-A53); both have 4 GB of memory. Figure 6 summarizes the smartphones' specifications and provides pictures of both smartphones. A full depiction of the Imtidad testbed is given in Section 4.


**Figure 4.** NVIDIA Jetson Nano.


**Figure 5.** Raspberry Pi 4 Model B.



**Figure 6.** Samsung Galaxy Smartphones.

These platforms can be located in different layers: cloud, fog, or edge. The main difference between these layers is where the processing occurs. The cloud is located far away from the users, in one or more datacenters, and is accessed through an Internet connection over a Wide Area Network (WAN). On the other hand, the fog is located near the users and the edge, on the same Local Area Network (LAN) or a nearby LAN, and it does not require an Internet connection. Fog devices might be located in streets, base stations, houses, cafes, hospitals, etc., to serve local users, while the cloud is designed to serve a large number of users. The cloud provides resources on demand and can scale up easily. Though the cloud and the fog might have the same type of CPUs, the cloud can increase the number of allocated CPUs on request or under high demand, while fog resources are limited. In our case study, the cloud is the Google datacenter, specifically the Google Cloud Run platform. For the Containerized gRPC Service, two CPUs are allocated with an 8 GB memory limit and 80 concurrent requests at a time. The fog is the HP Pavilion laptop with an Intel® Core™ i7-8550U CPU and 8 GB of memory. Other devices on the LAN, such as the NVIDIA Jetson nano and the Raspberry Pi, can also be referred to as fog, but for simplicity we only refer to the laptop as the fog.

### *3.5. Service Evaluation*

To provide a way to evaluate the various services in the service catalog, service energy consumption and service values have been used as evaluation metrics. The estimated service energy consumption (*εt*) for each task is calculated as an aggregated value of the data transfer energy consumption and the device energy consumption (Equation (1)).

$$
\varepsilon_t = (\varepsilon_n \cdot d \cdot t) + (\eta \cdot p) \tag{1}
$$

The first part of Equation (1) calculates the data transfer energy consumption, where *εn* is the estimated energy of a gigabyte transfer on a network of type *n*. The Andrae and Edler [76] energy consumption estimations of wired fixed access networks, wireless access networks, and Wi-Fi for 2020 have been used in the calculation. The energy consumption averages used are 0.195 kWh/GB, 0.5435 kWh/GB, and 0.12 kWh/GB for the network types Fiber, 4G, and Wi-Fi, respectively. The term *d* is the size of the transferred data for each task, including both request and response packets. The term *t* is the average network time, which is calculated as the difference between the response and processing times. The second part of Equation (1) calculates the processing energy consumption for the service device, where *η* is the estimated device processing energy, which varies depending on the type of device and its specification (see Table 3 for the devices' energy-related data). The term *p* is the average processing time for each request. The terms *d*, *t*, and *p* are all averages of data collected from the experiments.
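As a concrete sketch, the energy estimate of Equation (1) can be written as a small function. This is a hedged illustration: it assumes the equation's first term is a plain product of *εn*, *d*, and *t*, and the units of *η* and *p* (and hence of the result) follow whatever convention the paper's Table 3 uses; the function name and parameter names are our own.

```python
# Network energy intensities (kWh/GB), the 2020 Andrae and Edler
# estimates quoted in the text for Fiber, 4G, and Wi-Fi.
NETWORK_ENERGY_KWH_PER_GB = {"fiber": 0.195, "4g": 0.5435, "wifi": 0.12}

def service_energy(network: str, d: float, t: float, eta: float, p: float) -> float:
    """Estimated service energy eps_t for one task, per Equation (1).

    network -- network type: "fiber", "4g", or "wifi"
    d       -- average transferred data per task in GB (request + response)
    t       -- average network time per task (response time minus processing time)
    eta     -- estimated device processing energy rate (device-specific, Table 3)
    p       -- average processing time per task
    """
    eps_n = NETWORK_ENERGY_KWH_PER_GB[network]
    return (eps_n * d * t) + (eta * p)

# Illustrative numbers only: 1 MB over Wi-Fi, 2 s network time,
# a device rate of 0.01, and 1 s of processing.
example = service_energy("wifi", 0.001, 2.0, 0.01, 1.0)
```

In practice, *d*, *t*, and *p* would be the experiment averages described above, and *η* would come from the device's energy-related data in Table 3.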

Relative values are calculated to compare two absolute values to each other, which in turn provides a better way to compare services than absolute values such as response time, processing time, and energy consumption. Two relative values are computed, the service energy value (eValue) and the service speed value (sValue), as a way to benchmark different services in terms of their accuracy, energy consumption, and speed (response time). The service eValue provides an accuracy-to-energy relative value, considering model accuracy and service energy consumption. Equation (2) is used to calculate the service eValue, where *εt* is the estimated service energy for each task calculated using Equation (1) and *a* is the model accuracy, which represents the percentage of true disease predictions. The model accuracy is discussed in detail, for each model, in Section 4.3. The service sValue provides an accuracy-to-speed relative value considering model accuracy and service response time. The service sValue is calculated using Equation (3), where *rt* is the average response time for each task and *a* is the model accuracy.

$$\text{eValue} = \frac{a}{\varepsilon_t} \tag{2}$$

$$\text{sValue} = \frac{a}{r_t} \tag{3}$$

Note that the purpose of computing the service values is to define a method for benchmarking services, and it can be considered independent of the parametric values in the equations, such as *εt*, *εn*, and *η*, as they can be replaced by more accurate and specific values.
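The two relative values are simple ratios, as the sketch below shows; the numbers in the usage example are illustrative only, not measurements from the paper.

```python
def e_value(accuracy: float, eps_t: float) -> float:
    """Service energy value (Equation (2)): accuracy per unit of service energy."""
    return accuracy / eps_t

def s_value(accuracy: float, r_t: float) -> float:
    """Service speed value (Equation (3)): accuracy per unit of response time."""
    return accuracy / r_t

# Illustrative numbers only: a model with 85% accuracy, an estimated
# service energy of 0.01 per task, and a 2-second average response time.
ev = e_value(0.85, 0.01)   # higher is better: more accuracy per unit energy
sv = s_value(0.85, 2.0)    # higher is better: more accuracy per unit time
```

Because both values put accuracy in the numerator, a service that is slightly less accurate but much cheaper in energy (or much faster) can still score higher, which is exactly the trade-off the benchmark is meant to expose.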

### **4. System Architecture and Design (Skin Lesion Diagnosis Services)**

This section describes the design of the proposed distributed skin disease diagnosis services. Figure 7 depicts the Imtidad testbed, including its hardware and software devices and platforms. The testbed consists of one NVIDIA Jetson nano board, two Raspberry Pi boards, two Samsung smartphones, one HP Pavilion laptop, and access to the Google Cloud Run platform. All of these are connected through a wireless connection and equipped with the required software platforms. The white box at the bottom lists the software platforms used in the Imtidad testbed. The specifications of each device have been discussed in detail in Section 3.4; the rest of this section explains the whole system architecture and its components in detail.

This section is organized as follows. First, an overview of the system is provided and elaborated in Section 4.1, then each service is discussed in detail in the rest of the section. Section 4.2 discusses available skin datasets and the selected dataset for model training. The DL model service and model design and evaluation are described in Section 4.3. The following sections discuss each service as follows: Section 4.4 the mobile local service, Section 4.5 the mobile remote service, Section 4.6 the gRPC service, Section 4.7 the containerized gRPC service, and Section 4.8 the diagnosis request service.

### *4.1. System Overview*

The case study presented in this paper focuses on the classification of the diagnoses of common pigmented skin lesions through Deep Learning-based analysis of multi-source dermatoscopic images, to elaborate our distributed Deep Learning DL-as-a-service reference architecture. A service catalog, containing 22 different services, has been designed and implemented to investigate the proposed Imtidad reference architecture. These services belong to four service classes (or service types) that are distinguished by their varying communication and software platforms (containerized gRPC, gRPC, Android, and Android Nearby). The Android service class is referred to as "Mobile Local" and the Android Nearby service class as "Mobile Remote". The services are executed on a range of platforms or devices (both terms, platforms and devices, are used interchangeably according to the context), including Google Cloud (Compute Node), an HP Pavilion Laptop, an NVIDIA Jetson nano, a Raspberry Pi Model B (8 GB), a Raspberry Pi Model B (4 GB), a Samsung Galaxy S9, and a Samsung Galaxy Note 4. These devices can exist in one or more of the three distributed system layers: cloud, fog, and edge. Service performance has been evaluated on fiber, cellular, Wi-Fi, and Bluetooth networks, although the designed services are IP-based and can use any IP-based network. The 22 distributed AI services are based on four different Deep Learning models for skin cancer diagnosis: two of these are standard Deep Learning models, called Deep Learning "Model A" and "Model B"; the other two are lighter versions of models A and B, called "ALite" and "BLite". The lighter models are Tiny AI models created using the Google TensorFlow Lite platform. The performance of all four models has been evaluated on all the devices, except for the Raspberry Pi Model B (4 GB) and the mobile devices, which were unable to execute the standard models (A and B) due to device resource limitations.
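The device-capability constraint described above can be captured in a small lookup, sketched below. The function name and the exact device strings are illustrative assumptions; the rule itself (constrained devices run only the Lite models) is from the text.

```python
# Illustrative mapping of catalog devices to the DL models they can run:
# per the text, the Raspberry Pi Model B (4 GB) and the mobile devices
# can execute only the Tiny (Lite) models.
ALL_MODELS = ["A", "ALite", "B", "BLite"]
LITE_ONLY_DEVICES = {
    "Raspberry Pi Model B (4 GB)",
    "Samsung Galaxy S9",
    "Samsung Galaxy Note 4",
}

def supported_models(device: str) -> list:
    """Return the DL models a given device can execute."""
    if device in LITE_ONLY_DEVICES:
        return [m for m in ALL_MODELS if m.endswith("Lite")]
    return list(ALL_MODELS)
```

A catalog generator could call this when deciding which device/model combinations to publish as services.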

**Figure 7.** Imtidad testbed: devices and platforms.

The developed system follows a service-based design architecture rather than a component-based architecture. As services are self-contained, loosely coupled, reusable, and programming-language-independent components, they provide flexibility and are easy to deploy on various platforms. Figure 8 shows the system architecture, consisting of six different services: the DL model service, mobile local service, mobile remote service, gRPC service, containerized gRPC service, and diagnosis request service. The arrows linking the services show the communication among them. The DL model service is responsible for designing, implementing, training, retraining, and optimizing DL models using TensorFlow. It provides two types of models: the TF\_model and the TFLite\_model. Four different types of services have been designed that provide skin image diagnosis (classification) services, namely, the mobile local service, mobile remote service, gRPC service, and containerized gRPC service, which are explained in detail in later sections. The diagnosis request service is used by users to request a skin disease diagnosis from one of the diagnosis services. The user takes a skin image or selects one from their drive. Then, one of the services is selected from the provided service catalog, and a request is sent to it. Depending on the service type, a connection is established with the provider and the image is sent to the provider for classification (diagnosis). When the results are sent back, they are presented to the user.

**Figure 8.** System architecture and design (skin lesion diagnosis services).

Algorithm 1 is the master algorithm for creating new DAI services following the proposed reference architecture (see Figure 2). The algorithm comprises a list of six services that are designed and instantiated. They are shown in Figure 8, in addition to dataset acquisition and service catalog creation. The parametrization of services is used to show the instantiation of services on different devices. For instance, mobile local services are only instantiated on mobile devices while gRPC services are instantiated on various devices including PCs, laptops, Jetson Nanos, and Raspberry Pis.

**Algorithm 1:** The Master Algorithm: Create\_Services(skin\_disease\_diagnosis)

**Input**: ServiceClass skin\_disease\_diagnosis

**Output**: service\_catalog
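Since the body of Algorithm 1 is not reproduced here, the following is a hedged pure-Python sketch of what `Create_Services` does according to the surrounding text: instantiate each diagnosis service type on its eligible devices and collect the results into a service catalog. All device lists and dictionary keys are illustrative, not the paper's actual code.

```python
# Hedged sketch of the master algorithm (Algorithm 1): mobile services are
# instantiated only on mobile devices, gRPC services on PCs/laptops/Jetson
# Nanos/Raspberry Pis, and containerized gRPC services on the cloud.
MOBILE_DEVICES = ["Galaxy S9", "Galaxy Note 4"]
GRPC_DEVICES = ["Laptop", "Jetson Nano", "Raspberry Pi 8G", "Raspberry Pi 4G"]
CLOUD_DEVICES = ["Google Cloud Run"]

def create_services(application: str) -> list:
    """Instantiate diagnosis services per device class and build a catalog."""
    catalog = []
    for device in MOBILE_DEVICES:
        catalog.append({"app": application, "type": "Local Mobile", "device": device})
        catalog.append({"app": application, "type": "Remote Mobile", "device": device})
    for device in GRPC_DEVICES:
        catalog.append({"app": application, "type": "gRPC", "device": device})
    for device in CLOUD_DEVICES:
        catalog.append({"app": application, "type": "Containerized gRPC", "device": device})
    return catalog

service_catalog = create_services("skin_disease_diagnosis")
```

The real catalog (Table 3) also records the model, layer, and network per service; this sketch shows only the parametrized instantiation pattern.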


Algorithm 2 is a generalized algorithm for the four types of skin image diagnosis (classification) services: the mobile local service, mobile remote service, gRPC service, and containerized gRPC service. It explains the service provisioning procedure followed by the diagnosis services. The main function is get\_diagnosis, which is called by the diagnosis request service. It takes a skin image as input and returns a list of probabilities for each class of skin disease.
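The get\_diagnosis interface can be sketched as below. The inner `classify` function here is a stand-in for the actual TF/TFLite model inference, and the number and names of the disease classes are placeholders, not those of the paper's models.

```python
# Minimal sketch of the diagnosis-service entry point in Algorithm 2.
# CLASSES and classify() are illustrative stand-ins for the real model.
CLASSES = ["class_%d" % i for i in range(7)]  # placeholder class names

def classify(image_bytes: bytes) -> list:
    # Stand-in for DL model inference: returns uniform probabilities.
    # A real service would run the TF or TFLite model on the image here.
    return [1.0 / len(CLASSES)] * len(CLASSES)

def get_diagnosis(image_bytes: bytes) -> dict:
    """Called by the diagnosis request service with a skin image;
    returns a per-class probability for each skin disease class."""
    probs = classify(image_bytes)
    return dict(zip(CLASSES, probs))

result = get_diagnosis(b"fake-image-bytes")
```

In the four concrete service types, only the transport differs (local call, Android Nearby, gRPC, or containerized gRPC); the provisioning logic around get\_diagnosis stays the same.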


