**6. Proposed Solution**

The proposed solution was designed with two main objectives: medical data classification and medical image detection. Each model is described in detail in this section.

#### *6.1. Advanced Parallel K-Means Clustering*

In order to implement the modified parallel *k*-means clustering on the mobile execution unit and the SoC, the algorithm had to be modified to take advantage of a multi-core general-purpose processor and a multi-core neural engine. Each operating system offered a unique set of utilities for parallel operation. The iOS environment, owing to its use of Objective-C, provides an additional tool called dispatch queues alongside the standard tools, such as processes and threads. Although iOS is a multi-tasking operating system, it did not allow multiple processes for a single program, so only a single process was available.

The Android OS, on the other hand, was constrained by its Java and Kotlin programming languages, whose restricted hardware access and lack of pointer support made it difficult to fully utilize the system hardware. A thread is a lightweight process: threads share memory with their parent process, whereas separate processes do not share memory with one another. This sharing led to issues when two threads simultaneously modified the same resource, such as a variable, producing inconsistent outcomes. In the iOS environment, threads were a finite resource, as on any POSIX-compliant system: only 64 threads could be active at once for a single process. While this is a large number, there were legitimate reasons to want to exceed this limit.

The overall processing of the on-device parallel clustering, as shown in Figure 1, consisted of two jobs: managing the dataset and the clustering execution, and performing the parallel *k*-means clustering itself. The general-purpose processor cores were responsible for managing the clustering on the neural engine cores. After executing the *k*-means clustering on a sub-block of the data, each core sent its centroid point-value to the general-purpose cores. The general-purpose cores then evaluated whether the centroid value was less than the centroid threshold. If it was less, a signal was sent to the execution mechanism to process the clustering again.

Figure 2 shows a flowchart of advanced parallel *k*-means clustering on the neural engine and general-purpose cores.
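The coordination loop described above can be sketched as follows. This is a minimal Python illustration rather than the on-device implementation: worker threads stand in for the neural-engine cores, the main thread plays the role of the general-purpose cores, and the centroid-threshold test is interpreted as a convergence check on centroid movement. All function names are hypothetical.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def kmeans_subblock(block, centroids, iters=10):
    """Refine the shared centroids against one sub-block of the data."""
    for _ in range(iters):
        # assign each point to its nearest centroid
        d = np.linalg.norm(block[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # recompute centroids; keep the old one if a cluster is empty
        centroids = np.array([
            block[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
            for k in range(len(centroids))
        ])
    return centroids

def parallel_kmeans(data, k=3, n_cores=4, threshold=1e-3, max_rounds=20):
    """Main thread (general-purpose cores) partitions the data, dispatches
    sub-blocks to workers (neural-engine cores), combines the returned
    centroids, and repeats until centroid movement falls below threshold."""
    rng = np.random.default_rng(0)
    centroids = data[rng.choice(len(data), k, replace=False)]
    blocks = np.array_split(data, n_cores)
    for _ in range(max_rounds):
        with ThreadPoolExecutor(max_workers=n_cores) as pool:
            results = list(pool.map(lambda b: kmeans_subblock(b, centroids), blocks))
        new_centroids = np.mean(results, axis=0)  # combine per-core centroids
        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if shift < threshold:  # centroids stable: signal the workers to stop
            break
    return centroids
```

Averaging the per-block centroids is one simple combination strategy; the paper's execution mechanism may combine them differently.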

#### *6.2. Advanced Classification Solution*

Pre-processing medical data with advanced parallel *k*-means clustering was a useful technique for improving the classification performance of logistic-regression algorithms. *K*-means clustering is an unsupervised machine-learning algorithm that partitions a dataset into a specified number of clusters. By using advanced parallel techniques, the data can be processed more efficiently and quickly.

Pre-processing the medical data with *k*-means clustering improved the accuracy and precision of the logistic-regression algorithms by making the data simpler to classify. The *k*-means algorithm divided the data into clusters based on similar characteristics, such as age or sex. This helped reduce the noise and complexity of the data, making it easier for the logistic-regression algorithm to classify the data accurately.

In addition to improving the accuracy and precision of the classification process, pre-processing the medical data with *k*-means clustering also reduced the computational resources required to operate the logistic regression algorithm. By reducing the size and complexity of the dataset, it was possible to operate the logistic regression algorithm more efficiently and quickly.
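The pre-processing step described above can be sketched as follows: cluster the records, then append each record's cluster label as an extra feature for the downstream classifier. This is an illustrative sketch, not the paper's implementation; the function name and the plain (non-parallel) *k*-means loop are assumptions.

```python
import numpy as np

def add_cluster_feature(X, k=2, iters=20, seed=0):
    """Run k-means on the feature matrix X and append each row's cluster
    label as a new column, giving the classifier a coarse grouping signal."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # assign rows to nearest centroid, then recompute centroids
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        centroids = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
            for c in range(k)
        ])
    return np.hstack([X, labels[:, None].astype(float)])
```

The augmented matrix is then fed to the logistic-regression stage in place of the raw features.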

After clustering the data using *k*-means clustering, the next step in the process was to perform the logistic-regression classification. The steps for performing parallel logistic-regression classification were the following:


**Figure 2.** On-device parallel clustering flowchart.

In Algorithm 2, the input is the data *D* and the number of processors or devices *n* to be used for parallelization. The output is the trained logistic-regression model *M*. The data were pre-processed and split into training and testing sets. The parallelization method was chosen, and the training data were then partitioned into smaller chunks. A separate logistic-regression model was trained on each chunk of data, and the models were combined to form the final model. The model was then evaluated on the testing data, and the results were returned.



The algorithm had two input parameters. The first was the clustered dataset, which included a new feature extracted by the clustering process. The second input was the number of chunks into which the dataset would be partitioned. The number of partitions depended on the number of neural engine cores available, with each chunk trained on a single core. The standard CPU cores handled general tasks, such as data partitioning; reading and writing data for the neural engine cores; combining models (M1, M2, ..., Mn); and evaluating models.
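The chunked training scheme of Algorithm 2 can be sketched as follows. This is a simplified Python illustration under stated assumptions: gradient-descent logistic regression stands in for whatever solver the paper uses, worker functions stand in for neural-engine cores, and the combination step is a plain average of the per-chunk weight vectors. All names are hypothetical.

```python
import numpy as np

def train_logreg(X, y, lr=0.1, epochs=200):
    """Gradient-descent logistic regression on one chunk of data."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        z = np.clip(Xb @ w, -30, 30)           # avoid overflow in exp
        p = 1.0 / (1.0 + np.exp(-z))
        w -= lr * Xb.T @ (p - y) / len(y)      # average log-loss gradient
    return w

def parallel_logreg(X, y, n_chunks=4):
    """Partition the training data into n_chunks, train one model per chunk
    (one per neural-engine core), and combine the models M1..Mn by
    averaging their weight vectors."""
    Xs = np.array_split(X, n_chunks)
    ys = np.array_split(y, n_chunks)
    models = [train_logreg(Xi, yi) for Xi, yi in zip(Xs, ys)]  # M1, ..., Mn
    return np.mean(models, axis=0)
```

In this sketch the chunks are trained sequentially for clarity; dispatching each `train_logreg` call to a separate core parallelizes the loop without changing the combination step.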

#### Classification Pre-Processing

Using *k*-means clustering as a pre-processing step could potentially improve the performance of the logistic-regression classification in several ways:


In the proposed parallel logistic regression, the weighted-combination method assisted in forming the final logistic-regression model from individual models that had been trained by each processor or device. An overview of the process is provided:


The weighted-combination method can be an effective way to leverage the power of multiple processors or devices to train logistic-regression models in parallel. By assigning weights to each individual model, the final model can benefit from the strengths of each model while mitigating their weaknesses.
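One way to realize such a weighted combination is to weight each per-chunk model by a validation score, so that stronger models contribute more to the final weight vector. This sketch assumes each model is a logistic-regression weight vector and uses validation accuracy as the weighting signal; both choices are assumptions, not the paper's specification.

```python
import numpy as np

def weighted_combine(models, val_accuracies):
    """Combine per-chunk logistic-regression weight vectors into a final
    model, weighting each by its (normalized) validation accuracy."""
    scores = np.asarray(val_accuracies, dtype=float)
    weights = scores / scores.sum()            # normalize to sum to 1
    return np.average(np.asarray(models), axis=0, weights=weights)
```

With equal validation accuracies this reduces to the simple average; unequal accuracies shift the final model toward the better-performing chunks.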
