*Article* **Robust Classification and Detection of Big Medical Data Using Advanced Parallel** *K***-Means Clustering, YOLOv4, and Logistic Regression**

**Fouad H. Awad <sup>1,\*</sup>, Murtadha M. Hamad <sup>1</sup> and Laith Alzubaidi <sup>2,3,4,\*</sup>**


**Abstract:** Big-medical-data classification and image detection are crucial tasks in healthcare, as they can assist with diagnosis, treatment planning, and disease monitoring. Logistic regression and YOLOv4 are popular algorithms for these tasks. However, both techniques have limitations and performance issues when applied to big medical data. In this study, we presented a robust approach for big-medical-data classification and image detection using logistic regression and YOLOv4, respectively. To improve the performance of these algorithms, we proposed the use of advanced parallel *k*-means pre-processing, a clustering technique that identified patterns and structures in the data. Additionally, we leveraged the acceleration capabilities of a neural engine processor to further enhance the speed and efficiency of our approach. We evaluated our approach on several large medical datasets and showed that it could accurately classify large amounts of medical data and perform detection on medical images. Our results demonstrated that the combination of advanced parallel *k*-means pre-processing and the neural engine processor significantly improved the performance of logistic regression and YOLOv4, making them more reliable for use in medical applications. This new approach offers a promising solution for medical data classification and image detection and may have significant implications for the field of healthcare.

**Keywords:** medical data; medical imaging; data classification; image detection; YOLOv4; logistic regression; machine learning; AI; deep learning

**1. Introduction**

The advancement of digital medical technology, coupled with the exponential growth of medical data, has turned biomedical research into a data-intensive science and given rise to the "big-data" phenomenon [1]. In the era of big data, data have become a strategic resource and a key driver of innovation, transforming not only the way biomedical research is conducted but also the ways in which people live and think [2]. To capitalize on this, the relevant departments in the medical industry should focus on collecting and managing medical health data and use this information as a foundation for later developments, through the integration, analysis, and application that employing big data in the medical field requires [3].

Big-medical-data classification and image detection are essential elements of healthcare that play a critical role in the storage, organization, and analysis of medical information [4]. The effective classification of medical data enables the efficient retrieval and examination of patient records, which can aid in the diagnosis and treatment of illnesses. It can also assist in identifying trends and patterns in patient health data, enabling healthcare professionals to recognize potential risk factors and take preventative measures. Furthermore, medical data classification has facilitated the advancement of new treatments and therapies by allowing researchers to analyze large datasets and uncover potential correlations and trends [5].

**Citation:** Awad, F.H.; Hamad, M.M.; Alzubaidi, L. Robust Classification and Detection of Big Medical Data Using Advanced Parallel *K*-Means Clustering, YOLOv4, and Logistic Regression. *Life* **2023**, *13*, 691. https://doi.org/10.3390/life13030691

Academic Editor: Daniele Giansanti

Received: 30 January 2023 Revised: 24 February 2023 Accepted: 28 February 2023 Published: 3 March 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

COVID-19 data classification has involved organizing and labeling data related to the coronavirus pandemic, such as information about confirmed cases, deaths, and vaccination rates. These types of data have often been used to track the spread of the virus and inform public health decisions. Image detection techniques have been used to identify COVID-19-related images, such as X-ray scans showing lung abnormalities associated with the virus. These techniques have helped healthcare professionals and researchers better understand and track the spread of the virus.

However, several challenges have been associated with COVID-19 data classification and image detection. One major challenge has been ensuring the accuracy and reliability of the data being used, since errors and biases in the data have affected the results. Additionally, there have been privacy concerns related to collecting and using personal health data. There have also been technical challenges in developing and implementing image detection algorithms, such as difficulties in obtaining a sufficiently large dataset for training. Overall, addressing these challenges is crucial for using data classification and image detection techniques effectively to understand and combat the COVID-19 pandemic.

In this study, an efficient and high-performance solution to enhance the accuracy of medical data classification and image detection was proposed. Advanced *k*-means clustering was merged with both the classification and detection techniques to improve their performance and accuracy [6]. To evaluate the performance of medical data classification, a large medical dataset was used. Furthermore, to evaluate the effectiveness of the detection technique, a dataset comprising COVID-19 X-ray and CT images was utilized. The results indicated that the proposed models significantly improved the performance of classification and detection. The proposed model's contributions were the following:


This paper is divided into seven sections. The introduction addresses the significance of medical data classification and medical image detection. Section 2 discusses various data classification and image detection algorithms, including their advantages and limitations. Section 3 addresses the current challenges and features of solutions for processing large amounts of medical data and images. Section 4 presents the proposed solution. Section 5 outlines the methodology and performance metrics used to evaluate the proposed solution. The implementation and results of the proposed solution are presented in Section 6. Section 7 concludes the paper.
