#### *6.4. Stage 2: YOLOv4 Image Detection*

After the images were processed, the second stage of scanning commenced using the YOLOv4 algorithm, which could handle and detect objects in images at high speed. Objects were easier to identify and detect in the pre-processed images because the image content had been segmented into consistent data aggregates. As shown in Figure 4, every object detector began by compressing and processing the images with a convolutional neural network backbone, whose output was then used to make predictions at the endpoint of the image classification. To detect objects, several bounding boxes had to be constructed around image regions, which required concatenating the convolutional feature layers of the backbone and converging all of the backbone's feature layers at the neck.

**Figure 4.** YOLOv4 image detection architecture.

The YOLOv4 system utilized image resizing, non-maximal suppression, and a single convolutional neural network to identify objects, generating multiple bounding boxes and class probabilities simultaneously. Although the system was efficient at detecting objects, it could have difficulty precisely localizing smaller objects.
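The non-maximal suppression step mentioned above can be sketched as follows. This is a minimal greedy implementation in plain Python/NumPy for illustration, not the authors' code; the boxes, scores, and IoU threshold are made-up inputs:

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximal suppression: keep the highest-scoring box,
    drop any remaining box overlapping it above iou_thresh, repeat."""
    order = list(np.argsort(scores)[::-1])
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep

boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # the two heavily overlapping boxes collapse to one
```

The greedy pass is why duplicate detections of the same object are merged into a single bounding box before classification.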

The input images were divided into an S × S grid, with each grid cell responsible for detecting an object if the object's centroid fell within that cell. Using information from the entire image, each grid cell predicted B bounding boxes and a confidence score for each box. These confidence scores represented the likelihood that an object was present in the box, as well as the accuracy of the object class prediction. The confidence score was defined as:
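The grid-cell assignment can be sketched as follows. The grid size S = 7 and the 448 × 448 input resolution are illustrative assumptions borrowed from the original YOLO formulation, not values stated in the text:

```python
S = 7            # grid dimension (S x S); illustrative assumption
W, H = 448, 448  # input image size; illustrative assumption

def responsible_cell(cx, cy, w=W, h=H, s=S):
    """Return the (row, col) of the grid cell responsible for an object
    whose bounding-box centroid lies at pixel coordinates (cx, cy)."""
    col = min(int(cx / w * s), s - 1)  # clamp so cx == w stays in the grid
    row = min(int(cy / h * s), s - 1)
    return row, col

print(responsible_cell(224, 64))  # centroid near the top middle -> (1, 3)
```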

$$conf = Pr(class_i \mid obj) \times Pr(obj) \times IoU_{pred}^{truth} \tag{1}$$

where

$$Pr(obj) \in [0, 1] \tag{2}$$

Here, *Pr(obj)* denotes the likelihood that there is an object in the grid cell, and *Pr(class_i|obj)* denotes the likelihood that a particular class appears, given the presence of an object in the cell.
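Equations (1) and (2) can be transcribed directly; the probability and IoU values below are made-up inputs for illustration:

```python
def confidence(p_class_given_obj, p_obj, iou_pred_truth):
    """Class-specific confidence score from Equation (1):
    conf = Pr(class_i | obj) * Pr(obj) * IoU(pred, truth)."""
    assert 0.0 <= p_obj <= 1.0  # Equation (2): Pr(obj) lies in [0, 1]
    return p_class_given_obj * p_obj * iou_pred_truth

# A cell that is sure an object is present (Pr(obj) = 1.0), is 80% sure
# of its class, and whose predicted box overlaps the truth at IoU 0.9:
print(confidence(0.8, 1.0, 0.9))
```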

#### *6.5. Stage 3: K-Means–YOLOv4 Clustering*

YOLOv4 used the Bag of Specials, a set of techniques that add minimal delay to inference times while significantly enhancing performance. The algorithm evaluated various activation functions; as features flowed through the network, the activation functions were altered, as depicted in Figure 5. Conventional activation functions, such as ReLU, were not always sufficient to push feature creation to its optimal limit, which led to the development of novel techniques in the literature that slightly improve on this method.
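One widely cited ReLU alternative in the YOLOv4 literature is the smooth Mish activation; the comparison below is an illustrative sketch, not a claim about which function this particular system selected:

```python
import math

def relu(x):
    """ReLU clips all negative inputs to exactly zero."""
    return max(0.0, x)

def mish(x):
    """Mish activation: x * tanh(softplus(x)). Unlike ReLU, it is smooth
    everywhere and passes a small negative signal instead of clipping."""
    return x * math.tanh(math.log1p(math.exp(x)))

for x in (-2.0, -0.5, 0.0, 2.0):
    print(f"x={x:+.1f}  relu={relu(x):+.4f}  mish={mish(x):+.4f}")
```

The non-zero negative response is one reason smoother activations can "push feature creation" further than ReLU at negligible inference cost.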

**Figure 5.** Proposed solution architecture.

Stage 3 is summarized in Algorithm 4. As shown in the algorithm, the YOLOv4 detector received the clustered image before its layers were initialized on it. Because the clustered images had clustered pixels, the layers performed better at recognizing the objects, contents, and features of the images.

#### **Algorithm 4** *K*-Means–YOLOv4 Classifier

**Require:** Image Dataset
**Input:** Random Centroid Points
**Start:** Clustering Pixels
**while** *pixels* ≠ *end* **do**
&emsp;**Select:** Neural Engine Core
&emsp;**Assign:** Processing to Core
&emsp;**Calculate:** Mean Value
&emsp;**Set:** Pixel to Cluster
**end while**
**Run:** YOLO's Backbone on Clustered Image
**if** Image Contains (COVID) **then**
&emsp;**Flag:** Image as Affected
**else**
&emsp;**Flag:** Image as non-Affected
**end if**
**Output:** Classified Image
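The pixel-clustering loop of Algorithm 4 can be sketched as a single-channel *k*-means in NumPy. This is a minimal illustration of the pre-clustering idea, not the on-device neural-engine implementation described above, and the tiny 3 × 3 "image" is made-up data:

```python
import numpy as np

def kmeans_pixels(image, k=4, iters=10, seed=0):
    """Cluster grayscale pixel intensities into k groups and replace every
    pixel by its cluster mean, producing the 'clustered image' that the
    YOLOv4 backbone then consumes."""
    rng = np.random.default_rng(seed)
    pixels = image.reshape(-1, 1).astype(float)
    # Random centroid points, as in the algorithm's Input step.
    centroids = rng.choice(pixels.ravel(), size=k, replace=False).reshape(k, 1)
    for _ in range(iters):
        # Assign each pixel to its nearest centroid, then update the means.
        labels = np.argmin(np.abs(pixels - centroids.T), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = pixels[labels == j].mean()
    return centroids[labels].reshape(image.shape)

image = np.array([[10, 12, 200], [11, 198, 202], [9, 13, 199]], dtype=float)
clustered = kmeans_pixels(image, k=2)
print(np.unique(clustered))  # only two intensity levels remain
```

Collapsing the image to a handful of intensity levels is what makes the "consistent data aggregates" that the detector's layers recognize more easily.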

#### **7. Performance Evaluation and Datasets**

Performance metrics were crucial for evaluating both classification and detection techniques. In addition, the experiment environment, datasets, and data preparation used to assess these metrics were equally important. Therefore, this section provides a detailed explanation of the performance metrics, datasets, and environment, as they related to the obtained results and the implementation of the proposed solution.

The dataset consisted of a diverse set of information, which was classified into four distinct categories. The dataset was split into a training set (70%) and a testing set (30%); the training set was used to train the machine-learning algorithms, and the testing set was used to evaluate their performance. The machine-learning algorithms were applied for classification, using features extracted through the feature-engineering process. The proposed algorithm was compared to various categorization approaches and was found to be highly effective on X-ray images in the experiments.

The proposed solution was implemented using the Dart ARM-based programming language, which is suitable for resource-constrained mobile devices, along with specialized deep-learning code for machine-learning engines on mobile devices. For iOS devices, the Swift programming language was utilized, which is known for its ease of use and safety features, while Kotlin (the native Android language) was employed for Android devices. This approach allowed the solution to be implemented easily on different mobile devices and platforms, providing a more versatile and widely accessible solution.
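The 70%/30% train/test split described above can be sketched as follows; the file names are placeholders, not the actual dataset:

```python
import random

def split_dataset(samples, train_frac=0.7, seed=42):
    """Shuffle a dataset and split it into training and testing subsets,
    matching the 70%/30% split used in the experiments."""
    shuffled = samples[:]                     # copy; leave the input intact
    random.Random(seed).shuffle(shuffled)     # fixed seed for repeatability
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

images = [f"xray_{i:03d}.png" for i in range(100)]  # placeholder names
train, test = split_dataset(images)
print(len(train), len(test))  # 70 30
```

Shuffling before cutting ensures both subsets sample all four categories rather than whatever order the files were stored in.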

The *k*-means–YOLOv4 approach was evaluated on mobile devices equipped with machine-learning engines, including an iPhone 11 Pro Max with a dedicated 16-core machine-learning processor and a Samsung S22 with a system-on-a-chip featuring a 16-bit floating-point neural processing unit (NPU). The testing dataset was divided into two categories: X-ray images and CT-scan images.
