(d) Prescriptive Analytics

Prescriptive analysis bases decisions on a wide range of practical alternatives. It enables decision-makers in an organization to diagnose emerging opportunities or problems and to identify the best course of action in time to act on the analysis, while also taking into account the consequences and expected outcomes of each decision [37]. This method automatically synthesizes Big Data and provides insights into a large number of possible outcomes before a decision is made. The decision-maker can use this information to support their actions. Prescriptive analytics advises on what should be done, what the best outcome will be, and how it can be achieved.
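As a rough illustration of recommending a course of action from expected outcomes, the sketch below scores each alternative by its expected value and picks the highest. The alternative names, probabilities, and benefit scores are hypothetical placeholders, and the expected-value criterion is only one possible decision rule; the source does not prescribe a specific one.

```python
def expected_value(outcomes):
    """Expected value of a list of (probability, value) pairs."""
    return sum(p * v for p, v in outcomes)


def recommend(alternatives):
    """Return (name, expected value) of the highest-scoring alternative."""
    scored = {name: expected_value(outs) for name, outs in alternatives.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]


# Hypothetical alternatives; each outcome is (probability, benefit score).
alternatives = {
    "option_a": [(0.7, 10.0), (0.3, -5.0)],
    "option_b": [(0.5, 12.0), (0.5, 0.0)],
    "option_c": [(0.9, 4.0), (0.1, -2.0)],
}

best, score = recommend(alternatives)
print(best, score)
```

In practice the probabilities and outcome values would come from the predictive models and domain constraints rather than being entered by hand.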

#### **3. Artificial Intelligence in Medical Field**

The use of artificial intelligence in medical research has the potential to lead to extremely sophisticated e-Health [38]. Machine learning (ML) is recognized as one of the most important scientific fields that can be integrated into the diagnosis, prognosis, and even treatment of diseases with the help of clinical decision support systems [39]. A further benefit of machine learning techniques in healthcare is that they remove human involvement to some degree, which reduces the likelihood of human error. This is particularly relevant for task automation, since tedious routine work is where humans make the most errors [17]. Deep learning, a subfield of machine learning, is a more sophisticated method that enables computers to automatically extract, analyze, and grasp relevant information from unstructured data by mimicking human thinking and learning [40]. Due to the volume of data generated for each patient, machine learning techniques have enormous potential in the healthcare field. The algorithms listed below are commonly used in health informatics.

• K-Nearest Neighbor Algorithm

The k-nearest neighbor (k-NN) technique is a non-parametric algorithm, which means that the data set itself determines the model's structure. This is one reason why it is widely used: it does not rely on theoretical assumptions about the data distribution [41]. It also belongs to the so-called "lazy" algorithms, which means that it has no explicit training phase: the training data are simply stored, and all of them are used in the prediction ("test") phase. As a result, training is fast, whereas prediction is slower and more expensive in both time and memory.
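The "lazy" behavior described above can be sketched in a few lines: nothing is fitted in advance, and every stored point is compared with the query at prediction time. This is a minimal illustration using Euclidean distance and majority voting; the function name `knn_predict`, the toy data, and the labels are assumptions made for the example.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # No model is built beforehand: all stored points are scanned per query,
    # which is why prediction is the expensive step.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Made-up toy data: two features per sample, two classes.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["low", "low", "low", "high", "high", "high"]

prediction_near_low = knn_predict(points, labels, (1.1, 1.0), k=3)
prediction_near_high = knn_predict(points, labels, (5.1, 5.0), k=3)
print(prediction_near_low, prediction_near_high)
```

Because each query scans the full training set, the cost of a prediction grows with the number of stored samples, which matches the time and memory trade-off noted above.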

• Support Vector Machines (SVM)

Support vector machines (SVMs) are a group of techniques used for classification and regression. They belong to the family of generalized linear classifiers. Typically, the training and testing data for a classification task consist of data instances, and each instance in the training set includes a target value (the class label) and a number of other attributes. SVM classification is an example of fully supervised learning: the known labels help determine whether or not the system is on the right track [42]. According to [43], the SVM classifier performs better than other machine learning classifiers. One application is arrhythmic beat classification, which is used for anomaly detection in electrocardiograms. Figure 5 depicts a support vector machine model in two dimensions.

**Figure 5.** SVM model in two dimensions.
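To make the two-dimensional case concrete, the following sketch trains a linear SVM by stochastic sub-gradient descent on the regularized hinge loss. It is an illustration under stated assumptions, not the method used in the cited studies: the function names, the hyperparameters (`lam`, `lr`, `epochs`), and the toy data are all hypothetical, and labels are assumed to be +1 or -1.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Fit (w, b) for a linear SVM; labels in y must be +1 or -1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:
                # Inside the margin or misclassified: the hinge term is active.
                w = [wj - lr * (lam * wj - yi * xj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Correct with enough margin: only the regularizer shrinks w.
                w = [wj - lr * lam * wj for wj in w]
    return w, b


def svm_predict(w, b, x):
    """Return +1 or -1 depending on the side of the separating line."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Made-up, linearly separable toy data in two dimensions.
X_train = [(-2.0, -1.0), (-1.0, -2.0), (-2.0, -2.0),
           (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
y_train = [-1, -1, -1, 1, 1, 1]

w, b = train_linear_svm(X_train, y_train)
print(w, b, svm_predict(w, b, (3.0, 3.0)))
```

The learned pair `(w, b)` defines the separating line `w[0]*x1 + w[1]*x2 + b = 0`; kernel SVMs and dedicated solvers generalize this idea to data that are not linearly separable.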

• K-Means Clustering Techniques

Data clustering has been shown to be a useful technique for identifying structures in medical datasets. The k-means partitioning algorithm is one of the most popular and widely used clustering algorithms, and it belongs to the larger class of unsupervised learning techniques, which do not require labeled data [44]. Clustering a dataset using k-means is simple. Given a predetermined number k of clusters (groups) to be formed, the fundamental idea is to find k centroids, one for each cluster, and to link each element to the closest centroid.
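The centroid idea can be sketched with the standard iterative (Lloyd-style) procedure: assign each point to its closest centroid, then move each centroid to the mean of its members, and repeat until the assignments stop changing. The function name `kmeans`, the explicit `initial_centroids` argument (used here to keep the example deterministic), and the toy points are assumptions for illustration.

```python
import math


def kmeans(points, initial_centroids, max_iters=100):
    """Cluster `points` around k centroids; k = len(initial_centroids)."""
    centroids = [tuple(c) for c in initial_centroids]
    assignments = []
    for _ in range(max_iters):
        # Assignment step: link each point to its closest centroid.
        new_assignments = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # assignments are stable, so the centroids are too
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignments


# Made-up toy data with two visibly separate groups (k = 2).
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, assignments = kmeans(points, [points[0], points[3]])
print(centroids, assignments)
```

The result depends on the initial centroids, so practical implementations usually run several random initializations and keep the best clustering.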

• Artificial Neural Networks

Artificial neural networks are simplified representations of the brains of living things, particularly humans. Their function and structure are similar to those of biological neural networks in the brain. They attempt to combine the function of the human brain with a strictly abstract mathematical way of thinking, thus distinguishing artificial intelligence from biology and the classical function of computers [45]. Figure 6 depicts the fundamental structure of the algorithm.

**Figure 6.** A neural network's basic structure.

Inspired by the structure of the biological neuron, scientists have created an equivalent model, the so-called artificial neuron. A biological neuron receives input signals in the form of electrical impulses in its dendrites, processes them, and then transmits them to neighboring neurons via the axon and synapses. The primary goal of using artificial neural networks is to solve specific problems or to work autonomously in certain processes, such as image recognition. The opacity of artificial neural networks is a critical concern, especially in safety-critical applications where the ability to comprehend and interpret decisions is paramount. Due to the black-box nature of neural networks, it can be challenging to identify potential sources of error or bias, which hinders our understanding of the underlying mechanisms behind decisions. Although generating explanations or using more interpretable models has been proposed to address this issue, these approaches may reduce accuracy or increase complexity. Therefore, it is essential for researchers and practitioners to weigh the trade-offs involved in using neural networks in safety-critical contexts and to ensure that their use is justifiable and appropriately evaluated.
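The artificial neuron mentioned above is commonly modeled as a weighted sum of its inputs followed by an activation function, loosely mirroring how a biological neuron receives, processes, and transmits signals. The sketch below shows one such neuron and a tiny two-layer forward pass. The sigmoid activation, the layer sizes, and all weights are arbitrary assumptions chosen for illustration; no training is performed.

```python
import math


def sigmoid(z):
    """Squash a real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of inputs, then activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)


def forward(inputs, hidden_layer, output_neuron):
    """Pass inputs through one hidden layer and a single output neuron."""
    hidden = [neuron(inputs, w, b) for w, b in hidden_layer]
    w_out, b_out = output_neuron
    return neuron(hidden, w_out, b_out)


# Hypothetical weights: two hidden neurons, each given as (weights, bias).
hidden_layer = [([0.5, -0.5], 0.0), ([1.0, 1.0], -1.0)]
output_neuron = ([1.0, -1.0], 0.0)

output = forward([1.0, 1.0], hidden_layer, output_neuron)
print(output)
```

Even in this small network the output is a nested composition of weighted sums and nonlinearities, which hints at why the decisions of larger networks are hard to interpret.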

• Application of Machine Learning in Healthcare

Recent publications contain a considerable amount of research on diagnosing, predicting, or identifying diseases. A variety of diseases are now diagnosed using different machine learning (ML) algorithms, thanks to improvements in processing power and substantial studies on the subject [46]. The authors in [47] proposed a computational approach that relies on the SVM algorithm to predict Alzheimer's disease using gene and protein sequencing information. According to their results, the accuracy of their technique for Alzheimer's disease detection was 85.7%. U. Ahmed et al. [48] designed a framework consisting of two types of models: an SVM model and an ANN model. These models examine the dataset to predict whether a diabetes diagnosis is positive or negative. The prediction accuracy of their fused technique was 94.87%. S. Thapa et al. [46] suggested a method for detecting Parkinson's disease patients based on feature selection and support vector machines. Based on the experiment's findings, TSVM can be a better classifier for binary classification problems such as Parkinson's disease delineation. To track the characteristics of brain tumors and improve detection efficiency, the authors in [49] developed a convolutional neural network-based model and MRI detection technology. This model's main function is to segment and recognize MRIs; it employs a convolutional layer to improve recognition efficiency. Zheng et al. [50] used a fusion of k-means and SVMs to identify breast cancer. K-means was used in the experiment to identify the different hidden patterns of cancerous and benign tumors. H. K. van der Burgh et al. [51] merged clinical information from individuals with amyotrophic lateral sclerosis (a condition that results in the loss of neurons that regulate voluntary muscles) with MRI images.
Using deep neural networks and these data, the researchers were able to predict survival. M. Ghiasi et al. [52] designed a classification and regression tree (CART) model, based on a decision tree learning algorithm, to detect coronary heart disease. When compared to the reported targets, the results of the CART models showed the highest possible accuracy for coronary heart disease diagnosis (100%). D. Brinati et al. [53] created an interactive decision tree model to help clinicians identify COVID-19-positive patients using blood test analysis and machine learning instead of a polymerase chain reaction (PCR) test. Their research demonstrated the feasibility and utility of using blood test analysis and machine learning as an alternative to PCR testing. The authors in [54] built a system based on the electronic medical record to help doctors categorize and prioritize patients in the emergency department. Their system uses image data transformation as an input and a convolutional neural network algorithm as a classifier to select patients who should go to the emergency department. The model is reported to achieve a good performance of 0.86%.

In summary, Table 3 depicts in detail the applications of different machine learning techniques in healthcare analytics.

As shown in Table 3, machine learning methods can be used for a variety of applications, such as disease diagnosis, patient risk stratification, drug discovery, and resource optimization. The choice of algorithm depends on the specific use case and the type of data being analyzed. Some algorithms, such as logistic regression and decision trees, are well-suited for binary classification tasks, while others, such as clustering and neural networks, can be used for unsupervised learning and more complex tasks. Although machine learning algorithms can be powerful tools for healthcare analysis, it is important to consider their limitations and potential biases. Machine learning algorithms should be validated and tested to ensure their accuracy and reliability in real-world healthcare settings.
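One common way to validate a model before deployment is k-fold cross-validation, in which the data are split into k folds and each fold is held out once for testing. The sketch below is a minimal illustration: the helper names, the simple nearest-class-mean classifier, and the one-dimensional toy data are assumptions made for the example and do not come from the studies above.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) for k contiguous, unshuffled folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size


def cross_validate(X, y, fit, predict, k=5):
    """Return the accuracy obtained on each held-out fold."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return scores


def fit_nearest_mean(X, y):
    """Placeholder classifier: store the mean feature value of each class."""
    return {
        label: sum(x for x, lab in zip(X, y) if lab == label) / y.count(label)
        for label in set(y)
    }


def predict_nearest_mean(means, x):
    """Predict the class whose stored mean is closest to x."""
    return min(means, key=lambda label: abs(x - means[label]))


# Made-up one-dimensional data, interleaved so every fold sees both classes.
X = [1.0, 9.0, 1.5, 8.5, 2.0, 8.0, 0.5, 9.5, 1.2, 8.8]
y = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

scores = cross_validate(X, y, fit_nearest_mean, predict_nearest_mean, k=5)
print(scores)
```

Real healthcare data would normally be shuffled or stratified before splitting, and accuracy alone is rarely sufficient; metrics such as sensitivity and specificity are usually reported as well.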


**Table 3.** Recent uses of machine learning in the healthcare field.

Open access to epidemiological, management, and clinical data in the health sector greatly strengthens the capacity of researchers. It should help increase the volume of data and improve the quality of scientific research, as well as the scientific reach of institutions and the research community. In fact, the dominant trend in healthcare, and the one that promises the most significant innovations, is data-driven patient care. Recording and collating all of a patient's information provides a more accurate picture of the care being delivered and, more generally, of population health management. It can also reduce inappropriate drug prescriptions and, in many cases, save lives.

#### **4. Big Data Technology Stack in Healthcare**

Once the fundamental issues regarding the use, collection, and management of big data in healthcare have been understood, it is appropriate to explore the tools that technology provides for using the data. As in most areas of software, big data users can choose between open-source software and commercial solutions, which require financial resources. In either case, the chosen platform must manage data entry, processing, storage, and retrieval, as well as provide data analysis capabilities. This section presents the main options available.

#### *4.1. Infrastructure and Virtualization*

To store and process huge amounts of health data efficiently, hardware resources are required, ranging from highly scalable storage systems to computing resources for data centers and HPC systems. This area comprises three subareas: cloud and grid solutions, data centers, and HPC systems [78]. Cloud solutions give the user the illusion of virtually infinite computing and storage resources and thus allow companies and researchers to acquire them easily. Cloud solutions hide the details of the underlying hardware and rely on technologies for implementing large data centers. Data centers are needed both for building cloud infrastructures and for companies that provide computing and storage resources in-house. Data centers primarily use commodity hardware in order to scale horizontally in a cost-effective manner.
