*Article* **Evaluation of Machine Learning Algorithms for Early Diagnosis of Deep Venous Thrombosis**

**Eduardo Enrique Contreras-Luján 1, Enrique Efrén García-Guerrero 1, Oscar Roberto López-Bonilla 1, Esteban Tlelo-Cuautle 2, Didier López-Mancilla <sup>3</sup> and Everardo Inzunza-González 1,\***


**Abstract:** Deep venous thrombosis (DVT) is a disease that must be diagnosed quickly, as it can be fatal. Several ways to diagnose it are currently available, including clinical scoring, D-dimer testing, and ultrasonography. Recently, scientists have focused efforts on using machine learning (ML) and neural networks for disease diagnosis, progressively increasing the accuracy and efficacy. Patients with suspected DVT have no apparent symptoms. Pattern recognition techniques aid timely diagnosis, and well-trained ML models help to make and validate good decisions. The aim of this paper is to propose several ML models for a more efficient and reliable DVT diagnosis and to implement them on an edge device for the development of instruments that are smart, portable, reliable, and cost-effective. The dataset was obtained from a state-of-the-art article. It is divided into 85% for training and cross-validation and 15% for testing. The input data in this study are the Wells criteria, the patient's age, and the patient's gender. The output data correspond to the patient's diagnosis. This study includes the evaluation of several classifiers: Decision Trees (DT), Extra Trees (ET), K-Nearest Neighbor (KNN), Multi-Layer Perceptron Neural Network (MLP-NN), Random Forest (RF), and Support Vector Machine (SVM). Finally, the implementation of these ML models on a high-performance embedded system is proposed to develop an intelligent system for early DVT diagnosis that is reliable, portable, open source, and low cost. The performance of the different ML algorithms was evaluated, and KNN achieved the highest accuracy of 90.4% and a specificity of 80.66% on both a personal computer (PC) and a Raspberry Pi 4 (RPi4). The accuracy of all trained models on the PC and the Raspberry Pi 4 is greater than 85%, while the area under the curve (AUC) values are between 0.81 and 0.86. In conclusion, as compared to traditional methods, the best ML classifiers are effective at predicting DVT in an early and efficient manner.

**Keywords:** DVT; early diagnosis; artificial intelligence; machine-learning; smart system; embedded system; edge computing; edge device

#### **1. Introduction**

Deep venous thrombosis (DVT) is a disorder in which blood clots form within the veins, obstructing the flow of blood through the circulatory system, and it affects people of all ages [1]. The cause of the disease is unknown; however, it is thought to be caused by a combination of variables, including genetic factors. Genetic factors are also thought to have a role in the diagnosis of the disorder. In the field of engineering, there are two major challenges: patients suspected of DVT have no visible symptoms, so failing to diagnose it could be fatal and the first test (D-dimer blood test) is useless without symptoms; and ultrasound has high certainty but comes at a high cost and requires many instruments [2,3]. DVT is a disease that must be recognized as soon as possible because the implications might be fatal for the patient. Several scientists have created various techniques and methods to diagnose the problem over the years, beginning in the 1970s with the development of ultrasonography [4], which marked a breakthrough in the timely diagnosis of clots in the lower limbs of the human body. Philip Wells, a renowned scientist, has stated on numerous occasions that technology, which has advanced rapidly in recent years, will support the early diagnosis of diseases in the future. This, combined with new trends in computing equipment, will enable great advances in science and human health.

**Citation:** Contreras-Luján, E.E.; García-Guerrero, E.E.; López-Bonilla, O.R.; Tlelo-Cuautle, E.; López-Mancilla, D.; Inzunza-González, E. Evaluation of Machine Learning Algorithms for Early Diagnosis of Deep Venous Thrombosis. *Math. Comput. Appl.* **2022**, *27*, 24. https://doi.org/10.3390/mca27020024

Academic Editors: Marcela Quiroz, Luis Gerardo de la Fraga, Adriana Lara, Leonardo Trujillo and Oliver Schütze

Received: 29 December 2021 Accepted: 2 March 2022 Published: 4 March 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Venous thromboembolism (VTE), the third most common vascular illness worldwide, is a complex condition influenced by various genetic and non-genetic risk factors [5]. The pathogenesis of VTE involves Virchow's triad, which comprises hypercoagulability, reduced blood flow or stasis, and damage to blood vessels due to disease or injury [6]. Blood clots can form if the patient's blood flow changes or slows down somewhere in the body [7], putting the patient's life and health at risk. The annual incidence is 1 to 3 per 1000 people, and the prevalence of this condition varies with age. VTE can present as DVT or pulmonary embolism (PE) in some cases [1,8–10]; thrombosis can also develop in other veins, such as the hepatic, cerebral sinus, retinal, and mesenteric veins. Approximately one-third of VTE patients develop a PE, while two-thirds exclusively have DVT [11].

The Primary Care Unit (PCU) is the backbone of any health care system. The record of previous epidemics demonstrates the critical significance of PCU and necessitates PCU specialists' engagement in procedural decision making [12]. The PCU serves as the entry point to the Health System (HS), which is described as the primary level of health care. The "Health Unit Clinics" (Health Units that constitute Primary Health Care) are defined by their commitment to health promotion and protection, disease prevention, diagnosis, treatment, rehabilitation, harm reduction, and health maintenance on an individual and collective level, to provide comprehensive care that has a positive impact on the health status of communities [13]. They provide primary care services across the board, including the evaluation and diagnosis of acute illnesses and ongoing treatment for patients with chronic conditions [14].

Nowadays, numerous ways to diagnose the condition are available, such as statistical analysis and clinician scoring [15], D-dimer blood tests [16], infrared imaging [17], ultrasonography, and even deep learning and Neural Networks (NN) [11,18]. Many countries, including the United States, Italy, the United Kingdom, Germany, and Canada, have pioneered artificial intelligence (AI) work in the diagnosis and prediction of DVT, with accuracy and effectiveness steadily increasing over time as algorithms improve and more data become available from real cases. However, although progress has been made in the development of NN in terms of debugging code and developing new algorithms, these models have not been implemented outside of a computer. It should be noted that other types of prediction and analysis have very good effectiveness and accuracy, but the analysis is very expensive due to the difficulty of repeatability and reliability, since most diagnoses require two or more types of analysis.

On the other hand, it is well known that ultrasound is the standard test for the diagnosis of DVT and one of the most accurate, and, recently, ML techniques have also been used for the diagnosis of DVT [19]. However, the accuracy of the examinations improves with the experience and training that a sonographer gains over their working life, so it is not constant and is not very high at first [20,21]. There is also research [22] that strives to combine Deep Learning (DL) and magnetic resonance imaging, with promising outcomes. Recently, it has been shown that artificial neural network analysis can improve risk stratification of patients presenting with suspected DVT; the authors showed that an NN is able to diagnose DVT without the use of ultrasound, with a low false negative rate [23]. A new ML model was developed for the efficient, less intrusive, and reliable diagnosis of DVT. This is based on pattern recognition techniques that help with rapid diagnosis as well as well-trained machine learning models that help with decision making and with validating whether or not someone is suffering from this ailment.

In recent years, the field of data science has pioneered the development of hardware and software for the application of Artificial Neural Networks (ANNs) in clinical analysis, which can be useful for the diagnosis of DVT and other diseases in general, for example, through ML models such as Decision Trees, Support Vector Machine (SVM), and Neural Networks [24–26]. Nowadays, there are alternative methods of DVT diagnosis, some of which use AI. For example, in [6], ML models for venous thromboembolism (VTE) risk assessment in China are compared to the Padua model, with the Random Forest (RF) model having a higher specificity and sensitivity than the Padua model. The authors in [27] reported an automatic diagnosis model that uses ML to predict the important risk factors of VTE, based on patient data collected in the medical ward at King Chulalongkorn Memorial Hospital in Thailand. Other efforts are dedicated to the prediction of VTE with ML techniques in young and middle-aged inpatients; for example, the authors of [28] developed VTE risk classifiers using models based on multi-kernel learning and random optimization [29]. However, a drawback is that these systems are expensive, big, heavy, and have moderate energy consumption.

On the other hand, edge computing can minimize the reaction time, increase the data processing capacity, ensure data security (since it is closer to end-users, it provides greater privacy) [30], be easy to design, and be cheap [31]. It has excellent application value and features such as high reliability, superior energy savings [32], low latency, and high real-time processing, increasing the overall data quality and utilization performance under the premise of efficient processing [33]. Accordingly, one can take advantage of edge devices such as Raspberry Pi 4 (RPi4), which are very useful for solving real-world problems across various fields of application [34–39]. In this paper, the well-known RPi4 is used as the edge-computing device to develop the ML models and to evaluate their performance in diagnosing DVT. The cost–benefit of a clinical pre-examination based on ML is noted in the research [7], reducing the expenses of medical units and labor acquired using the standard method. The authors of [40] describe the development of a device for the treatment of DVT that uses Bluetooth communication with a mobile app and sensors within the system to collect data for statistical analysis.

The ML algorithms have advanced in the early diagnosis of DVT and other applications [41–43], moving from binary Decision Trees developed by the team of [44] to more sophisticated algorithms that integrate image analysis by AI [18] and that are very complex, using up to 68 variables to reach a final verdict on this disease [45]. In some investigations with very large datasets, the predictors have an area under the receiver operating characteristic curve (AU-ROC) of 0.83 to 0.85 [46].

For the reasons stated above, the goal of this research is to propose several ML models that are trained using a dataset of patients with the condition. The dataset is collected from the state of the art [10] to support sound judgment and clinical analysis when determining the diagnosis of DVT in a patient with the symptomatology of the condition, with the purpose of providing a timely response and thus saving many lives. In this research, the well-known Raspberry Pi 4 (RPi4) is employed as the edge-computing device to develop ML models and assess their performance in diagnosing DVT. This is to facilitate the development of smart, portable, reliable, and cost-effective instrumentation. All of this is possible thanks to pattern recognition algorithms that provide accurate diagnoses and well-trained ML algorithms that determine whether or not a patient has the condition. The assumption is that ML algorithms will outperform today's standard approaches as a means of early diagnosis for diagnostic aid in the health sector and primary care.

The paper is organized as follows. Section 2 presents the materials and methods used to develop the ML models. In Section 3, the scoring and performance metrics of every ML model are shown; furthermore, the usage scenario is described, and the performance of the PC and the RPi4 is compared. Finally, in Section 4, the conclusions and future work are summarized.

#### **2. Materials and Methods**

#### *2.1. Machine Learning Algorithm Training*

In this paper, we propose six ML algorithms to evaluate the occurrence of DVT in a patient: Decision Trees (DT), K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Random Forest (RF), Multi-Layer Perceptron Neural Network (MLP-NN), and Extra Trees (ET). All of these ML models can be found in the Scikit Learn library [47,48]. The Scikit Learn library is built on top of NumPy, and it can be used for any kind of project. It has many tools that can be used for both the pre- and post-processing of data. The flow chart for performing the ML algorithm training and testing is shown in Figure 1. First, the appropriate libraries or toolboxes are imported, such as Scikit Learn, Pandas, and Seaborn. Next, the features dataset is loaded, and the input data (features) and output data are separated. Then, the dataset is randomly divided, with 85% used for training and cross-validation, and the remaining 15% used for testing. Next, the data are scaled between 0 and 1 to produce optimum results. Then, the ML algorithms are trained. Finally, the ML model is scored, i.e., the confusion matrix and performance metrics are used to evaluate the ML models.
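The workflow described above can be sketched in a few lines of Scikit Learn code. This is a minimal illustration, not the authors' script: the data are random placeholders (the real dataset comes from [10]), the label rule is a toy assumption, and the KNN settings are the ones reported for that classifier in Section 2.3.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(42)

# Placeholder data (assumption): 1000 cases with 9 binary Wells items,
# an age-group code in 1..9, and a binary gender flag (11 features).
n_cases = 1000
wells = rng.integers(0, 2, size=(n_cases, 9))
age = rng.integers(1, 10, size=(n_cases, 1))
gender = rng.integers(0, 2, size=(n_cases, 1))
X = np.hstack([wells, age, gender]).astype(float)
# Toy label rule so the example has some signal; it is not a clinical rule.
y = (wells.sum(axis=1) >= 6).astype(int)

# 85% for training and cross-validation, 15% held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, random_state=42
)

# Scale every feature to [0, 1]; the scaler is fitted on the training part only.
scaler = MinMaxScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)

# Train one of the six classifiers and score it on the held-out 15%.
model = KNeighborsClassifier(n_neighbors=50, weights="distance", algorithm="ball_tree")
model.fit(X_train_s, y_train)
y_pred = model.predict(X_test_s)

test_accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])  # rows: true class, columns: predicted
print(test_accuracy)
print(cm)
```

Fitting the scaler on the training portion only is a design choice of this sketch; it keeps information from the test split out of the pre-processing step.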

In this paper, a PC and an RPi4 are used to train the ML models for DVT diagnosis, with the goal of testing their performance on both platforms and confirming that the RPi4's scoring parameters and performance metrics are as trustworthy as those on a PC. Table 1 compares the RPi4's primary technical specifications to those of a PC. While the hardware of the PC (laptop) obviously outperforms that of the RPi4, it is vital to prove experimentally that the results produced with the RPi4 are competitive with those obtained with a PC. Additionally, the RPi4 is significantly less expensive than a PC, which would significantly lower manufacturing costs in large-scale production of intelligent devices, for example, in the manufacture of hundreds or millions of smart instruments.


**Table 1.** RPi4 versus PC technical specifications comparison.

The dataset for this study was taken from [10]. Since these data had been used previously, and only 59 real cases from a public hospital had been obtained, a data augmentation algorithm was devised. It is used to construct a dataset of 10,000 synthetic examples, which will be used for later training, validation, and testing of the proposed ML models. The process shown in Figure 1 was followed, and each stage is detailed below.

**Figure 1.** Proposed methodology for early diagnosis of DVT.

Data augmentation is a technique that is frequently used in machine learning to increase the size of the dataset used in the learning process [10]. It entails producing new instances from the original dataset while maintaining the data's pattern. It is mostly used in medical contexts to augment the image collection for image-based diagnosis; see, for example, [49–52]. In this paper, Algorithm (1) reported in [10] was used. It performs the data augmentation to generate each case of the synthetic dataset used for training and validation of the proposed ML models. The first task is to calculate the percentage of positive and negative cases for each risk level of DVT, in addition to the percentage with which each factor of the Wells Score was observed in the real cases to which we had access. To calculate the percentage of suspected cases of DVT in each risk level proposed by Wells, historical data were used, which report that, of all the cases observed, 19% were diagnosed as DVT, while the remaining 81% had a different diagnosis. Furthermore, among the cases classified as Low Risk, only 5% were diagnosed as positive for DVT, compared with 17% of Medium Risk cases and 53% of High Risk cases.
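To illustrate how the per-risk percentages above can drive synthetic labels, the following sketch draws a diagnosis for each case with the positive rate quoted for its risk level. It is not Algorithm (1) of [10]; the function names and the mix of risk levels are hypothetical and serve only as an example.

```python
import random

# Share of DVT-positive diagnoses per Wells risk level, as quoted in the text.
POSITIVE_RATE = {"low": 0.05, "medium": 0.17, "high": 0.53}


def sample_diagnosis(risk_level, rng):
    """Return 1 (DVT) or 0 (no DVT) using the rate quoted for the risk level."""
    return 1 if rng.random() < POSITIVE_RATE[risk_level] else 0


def synthesize_labels(risk_levels, seed=0):
    """Draw one synthetic diagnosis per case (hypothetical helper)."""
    rng = random.Random(seed)
    return [sample_diagnosis(level, rng) for level in risk_levels]


# Hypothetical mix of risk levels, chosen only for the demonstration.
levels = ["low"] * 4000 + ["medium"] * 4000 + ["high"] * 2000
labels = synthesize_labels(levels)

shares = {}
for name, start, stop in [("low", 0, 4000), ("medium", 4000, 8000), ("high", 8000, 10000)]:
    shares[name] = sum(labels[start:stop]) / (stop - start)
    print(name, round(shares[name], 3))
```

With enough synthetic cases, the observed share of positives in each group approaches the target rate, which is the property the augmentation has to preserve.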

The Wells Criteria, as shown in Table 2, are used to train the ML algorithms for the prediction of DVT, in which the trained models are expected to perform well in order to reach a high accuracy in the prediction of this condition.


**Table 2.** Wells criteria for predicting deep vein thrombosis (DVT), taken from [10,53–55].

#### *2.2. Pre-Processing Data*

Two criteria are taken into account that are not covered by the Wells rubric. The first is age, which is encoded as a number from 1 to 9, each of which corresponds to one of the age groups listed in Table 3 [10]. The second factor is gender, which is encoded as 0 for males and 1 for females. They are included to help better stratify the risk of patients with suspected DVT, as is done in [23], so that the collected data may be pre-processed and the ML models can detect the illness without difficulty. Since the input data are binary, that is, they are regarded as 1 (for positive comorbidity) or 0 (for negative comorbidity), working with them in a computer system is simple, with the age range being the main exception, as weighted in Table 3.

**Table 3.** Age factor pre-processing, taken from [10].


The dataset is stored in Comma Separated Values (CSV) format in American Standard Code for Information Interchange (ASCII), so that it can be processed more easily in the Python environment and can be saved and loaded quickly. The data from the Excel file are fed into the software on the computer, using the Jupyter Notebook platform with Python, with each column header referring to one DVT comorbidity. After that, using the Seaborn pairplot command, plots are generated between all pairs of variables so that each of the values may be discriminated. The age values of 1–9 must be normalized to values between 0 and 1 for good NN training.

Equation (1) is used to normalize the data in the age column so that this factor is not the most determinant or the one with the most weight within the ML model used. In this way, the values of every clinical characteristic are kept between 0 and 1; the age is a floating-point value, while the others are integers.

$$Norm_{Age} = \frac{Age - Min_{Age}}{Max_{Age} - Min_{Age}} \tag{1}$$
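As a small worked example of Equation (1), the sketch below applies the min-max scaling to the nine age-group codes. The function name is hypothetical, and the minimum and maximum are assumed to be the codes 1 and 9 stated in the text.

```python
def normalize_age(age_code, min_age=1, max_age=9):
    """Min-max scaling of Equation (1) applied to an age-group code."""
    if not min_age <= age_code <= max_age:
        raise ValueError("age code outside the expected range")
    return (age_code - min_age) / (max_age - min_age)


codes = list(range(1, 10))
normalized = [normalize_age(code) for code in codes]
print(normalized)
```

Code 1 maps to 0, code 9 maps to 1, and the intermediate codes are spaced 0.125 apart.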

A data description is created to note specific properties of each column of information as well as the primary statistics of the values in the constructed dataset. It shows that the dataset is strongly imbalanced, as it contains 7562 negative cases and 2438 positive cases.

The *train_test_split* function divides the dataset into 85% for training and 15% for testing. The 11 input factors are the clinical characteristics of the Wells criteria (*cancer, immobilization, surgery, pain, leg swelling, ankle swelling, edema, superficial veins, and previously diagnosed DVT*) together with age and gender, and the output is the DVT diagnosis.

The K-fold cross-validation (with K = 5) is used to evaluate the performance of the ML models and perform a comparative analysis to select the model that best fits the DVT classification problem [56].

An *early-stopping* function has been constructed so that if the accuracy does not change by at least 0.01 over 5 epochs, the model's training is ended. In this way, the training does not take too long, and the accuracy of the ML model during training does not vary much.
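The stopping rule can be sketched as follows. This is one possible reading of the criterion (stop when the best accuracy of the last 5 epochs exceeds the best earlier accuracy by less than 0.01); the function names are hypothetical and the accuracy values are made up for the demonstration.

```python
def should_stop(accuracy_history, min_delta=0.01, patience=5):
    """True when the last `patience` epochs improved accuracy by less than min_delta."""
    if len(accuracy_history) <= patience:
        return False  # not enough epochs yet to compare against
    best_before = max(accuracy_history[:-patience])
    best_recent = max(accuracy_history[-patience:])
    return best_recent - best_before < min_delta


def epochs_run(accuracies, min_delta=0.01, patience=5):
    """Replay per-epoch accuracies and return how many epochs run before stopping."""
    history = []
    for accuracy in accuracies:
        history.append(accuracy)
        if should_stop(history, min_delta, patience):
            break
    return len(history)


plateau = [0.70, 0.80, 0.85, 0.852, 0.853, 0.853, 0.854, 0.854, 0.90]
improving = [0.5 + 0.02 * i for i in range(10)]
print(epochs_run(plateau))    # stops once the plateau has lasted 5 epochs
print(epochs_run(improving))  # never stops early
```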

#### *2.3. Hyperparameters of the ML Models*

The use of an NN with table properties was first suggested during the creation of neural networks. It has an input layer with 11 predictors (cancer, immobility, surgery, pain, leg swelling, ankle swelling, edema, superficial veins, previously diagnosed DVT, age, and gender). There is no magic formula for selecting the optimum number of hidden layers and neurons. However, some rules of thumb are available for calculating the number of hidden layers and neurons. A rough approximation can be obtained by the geometric pyramid rule proposed by [57,58]. In this case, four hidden layers (32–64–32–16) were found to give the best performance metrics, and an output layer with the DVT diagnosis is proposed, as shown in Table 4. Since the computer is binary, it is suggested that the number of neurons per layer be a power of two, 2*<sup>N</sup>*, for optimal processing time, where *N* is an integer. The number of neurons in the first hidden layer should be greater than the number of inputs, ascending in each hidden layer in powers of two until a maximum of 2*<sup>N</sup>* is reached and then descending in powers of two until the last hidden layer has a number of neurons slightly greater than the number of input neurons.

**Table 4.** Proposed sequential model (NN) for DVT diagnosis.


This ANN model consists of an input layer of 11 neurons, four hidden layers of 32, 64, 32, and 16 neurons, and an output layer representing the diagnostic result. The input layer's activation function is *relu*, while the hidden layers' activation function is *tanh*. The learning rate of the classifier is constant and equal to 0.001, the maximum number of iterations is 400, and *Adam* is used as the optimizer.
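A configuration along these lines can be sketched with Scikit Learn's `MLPClassifier`. Two caveats apply: this class uses a single activation function for all hidden layers, so *tanh* is used throughout and the *relu* mentioned for the input layer is not reproduced, and the training data below are random placeholders rather than the dataset from [10]. The random seed is also an assumption.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Placeholder data (assumption): 300 cases with 11 features scaled to [0, 1].
X = rng.random((300, 11))
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)  # toy label rule, not clinical

mlp = MLPClassifier(
    hidden_layer_sizes=(32, 64, 32, 16),  # four hidden layers as in Table 4
    activation="tanh",                    # applied to every hidden layer
    solver="adam",
    learning_rate_init=0.001,
    max_iter=400,
    random_state=42,                      # assumption, for reproducibility
)
mlp.fit(X, y)

layer_shapes = [weights.shape for weights in mlp.coefs_]
print(layer_shapes)
print(mlp.out_activation_)
```

The weight shapes confirm the 11–32–64–32–16–1 layout; for a binary target, `MLPClassifier` uses a single logistic output unit.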

The same ANN model is also trained on a Raspberry Pi 4; despite its hardware limitations, the hyperparameters of the classifiers are the same as on the PC, and only the results are slightly different, as discussed in Section 3. Furthermore, it is suggested that ML models be used in DVT diagnosis to compare each of the models, including SVM, KNN, DT, ET, and RF classifiers. The hyperparameters dealt with by the SVM classifier are as follows: a random state of 42, a *C parameter* of 1.0, a "linear" classifier kernel, a *degree* of 3, a *gamma* "scale", and a *random state* of 3. The hyperparameters for the KNN classifier are as follows: the number of neighbors is set to 50, the *weights* are set to "distance", and the *algorithm* is set to "ball tree".

The Decision Tree classifier *criteria* used is "entropy", the *splitter* is "random", the *minimum sample divisors* is 2, the *minimum leaf samples* are given by 1, the *maximum features* are given by "auto", the *max features* are 80, and there is a *random state* of 42.

The Extra Trees classifier is employed with a *random state* of 42 and 200 *estimators*. The *criterion* utilized is "gini", the *minimum sample divisors* is 2, the *minimum leaf samples* is 1, the *maximum of features* is "auto", and the *max features* are given by 80.

Finally, the RF classifier has 480 *estimators*, with "gini" as the *criterion*, 2 as the *minimum sample divisors*, 1 as the *minimum number of leaf samples*, "auto" as the *maximum number of features*, true *Bootstrap*, and 42 as the *random state*. All these hyperparameters are shown in Table 5 for every simulation in this paper.


**Table 5.** Hyperparameters of ML models.

To improve the performance of each classifier, the hyperparameters must be optimized. This can be accomplished using the GridSearchCV tool, which implements the standard estimator API. By "fitting" it to a dataset, all possible combinations of parameter values are evaluated, and only the best combination is kept. The chosen parameters are those that maximize the score on the held-out data; if an explicit scoring function is given, it is used instead of the estimator's default score [47,48]. The following hyperparameters were tuned: for the Random Forest and Extra Trees classifiers, Number of Estimators, Criterion, and Max Features; for the KNN classifier, Number of Neighbors, Weights, and Algorithm; for the Decision Tree classifier, Criterion, Splitter, Max Features, and Max Depth; and for the SVM classifier, C, Kernel, and Degree (when using *rbf*, *poly*, and *sigmoid*). The candidate values differed for each classifier. The optimization procedure is based on the "GridSearch" (GS) algorithm, which methodically evaluates all possible combinations of hyperparameters. Its main disadvantage is that it needs a significant amount of time and computation [59].
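As an illustration of this search, the sketch below tunes the three KNN hyperparameters named above with `GridSearchCV` and 5-fold cross-validation. The candidate values and the random placeholder data are assumptions chosen for the example; the grids actually used are not listed in the text.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Placeholder data (assumption): 300 cases with 11 features in [0, 1].
X = rng.random((300, 11))
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)  # toy label rule, not clinical

# Hypothetical candidate values for the three tuned KNN hyperparameters.
param_grid = {
    "n_neighbors": [5, 25, 50],
    "weights": ["uniform", "distance"],
    "algorithm": ["ball_tree", "kd_tree"],
}

search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

n_candidates = len(search.cv_results_["params"])  # 3 * 2 * 2 combinations
print(n_candidates)
print(search.best_params_)
print(search.best_score_)
```

Every combination is trained and scored on each of the 5 folds, so the cost grows with the product of the candidate list lengths, which is the time disadvantage noted above.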

The hardware used for the development of these experiments has the following specifications: AMD Ryzen 7 2.9 GHz CPU, 8 GB 3200 MHz DDR4 RAM, NVIDIA GeForce GTX 1660 Ti 6000 MB GPU, a Windows 10 operating system, and Python software with Anaconda 3. Similarly, a Raspberry Pi 4 SBC with the following specifications was used to study the behavior of the ML models in various contexts and system architectures: 1.5 GHz Broadcom BCM2711 CPU, 4 GB LPDDR4-3200 RAM, Raspbian OS, and Python software with the Thonny IDE. The following libraries were utilized in this paper: Seaborn, which aids in statistical data visualization within Python; Pandas, which is a library to perform data analysis; NumPy, the numerical computing library for array manipulation; and Matplotlib, which is Python's library for static, animated, and interactive visualizations. Scikit-Learn was used to create the ML models as well as custom neural networks.

#### **3. Results**

A two-class classification confusion matrix is developed to track the progress of the ML model trained, allowing the metrics to be validated and the process to be more dependable within the rubric by separating it into negative and positive DVT classifications, respectively.

Each cell pairs the true diagnosis with the diagnosis predicted by the ML algorithm. A True Negative (TN) occurs when both the true diagnosis and the ML prediction are negative. A False Negative (FN) occurs when the ML prediction is negative but the true diagnosis is positive for DVT. A False Positive (FP) occurs when the algorithm predicts DVT while the actual diagnosis is negative. Finally, a True Positive (TP) occurs when both the actual diagnosis and the ML model prediction are positive for the condition, as shown in Table 6.

**Table 6.** Confusion matrix of two-class classification, taken from [10,60].


For each ML model trained, the values of Accuracy, F1 Score, Precision, Recall, Specificity, and the area under the curve (AUC) are acquired and printed using sklearn metrics. In the case of the Multi-Layer Perceptron NN (MLP-NN), Accuracy, Precision, Recall, and the ROC curve were obtained. The Accuracy (2), F1 score (3), Specificity (4), and Recall (5) values are calculated using the following equations taken from [24,61].

$$Accuracy = \frac{True\ Positive + True\ Negative}{True\ Positive + False\ Positive + True\ Negative + False\ Negative} \tag{2}$$

$$F1 - Score = 2 \cdot \frac{Precision \cdot Recall}{Precision + Recall} \tag{3}$$

$$\text{Specificity} / \text{Precision} = \frac{\text{True Negative}}{\text{False Positive} + \text{True Negative}} \tag{4}$$

$$\text{Sensitivity} / \text{Recall} = \frac{\text{True Positive}}{\text{False Negative} + \text{True Positive}} \tag{5}$$
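The metrics can be computed directly from the four cells of the confusion matrix, as in the sketch below. Accuracy divides the correct predictions by all four cells, and Specificity and Recall follow Equations (4) and (5). Precision is computed with the usual definition TP/(TP + FP), which is an assumption here because the text does not give a separate formula for the Precision term used in Equation (3). The counts are made up for the demonstration.

```python
def binary_metrics(tp, tn, fp, fn):
    """Compute the two-class metrics from the confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)      # standard definition (assumption, see lead-in)
    recall = tp / (tp + fn)         # sensitivity, Equation (5)
    specificity = tn / (tn + fp)    # Equation (4)
    f1 = 2 * precision * recall / (precision + recall)  # Equation (3)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts for 100 test cases.
metrics = binary_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(name, round(value, 4))
```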

These actions are carried out for metrics acquisition on both the PC and the Raspberry Pi 4. Table 7 shows the scoring parameters obtained for each ML model (SVM, KNN, Decision Tree, Extra Trees, Random Forest); the ANN Multi-Layer Perceptron (MLP) model was registered in the same way. Each entry includes the metrics (Accuracy, F1 Score, Precision/Specificity, and Recall/Sensitivity) as well as the values of each cell of the two-class confusion matrix (True Positive, True Negative, False Positive, and False Negative), which show how these results are obtained.

**Table 7.** Scoring parameters of the ML algorithms evaluated in this study using 15% of the separated data for testing.


In terms of prediction model validation, there are two basic approaches utilized as selection criteria for a prediction model: (i) the hold-out model and (ii) the K-fold cross-validation model. Both have the feature of utilizing a subset of the dataset for training and keeping a portion for validation. The K-fold cross-validation is a method that is utilized as a selection criterion for a prediction/classification model [56]. Essentially, it entails using a subset of the dataset to construct the model and leaving another portion of the dataset to validate it. The K-fold cross-validation procedure runs K times and averages the classification results over the iterations. As shown in Figure 2, it entails partitioning the dataset into K segments and selecting a different segment to test the model in each of the K runs, while the remaining K-1 segments are used to train the ML model [10,56]. The average values computed in the loop are the performance metrics supplied by K-fold cross-validation. This method is computationally expensive, but it does not waste a lot of data (unlike setting an arbitrary validation set), which is a big plus in applications such as inverse inference when the number of samples is small [47,48].
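The partitioning step can be sketched in plain Python as follows. The helper names are hypothetical, no shuffling is applied (an assumption made to keep the example short), and `score_fn` stands for any routine that trains on the first index list and scores on the second.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs; fold sizes differ by at most one."""
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def cross_val_mean(score_fn, n_samples, k=5):
    """Average the scores returned by score_fn over the K folds."""
    scores = [score_fn(train, test) for train, test in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)


folds = list(k_fold_indices(10, k=5))
for train_idx, test_idx in folds:
    print(test_idx, train_idx)

# Dummy score: fraction of the data held out in each fold.
mean_score = cross_val_mean(lambda train, test: len(test) / (len(train) + len(test)), 10, k=5)
print(mean_score)
```

Each sample appears in exactly one test segment, so every case is used for validation once and for training K-1 times.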

**Figure 2.** Scheme of the K-fold cross-validation for the proposed ML models, inspired from [47,48].

Table 8 shows the average results of K-fold cross-validation corresponding to each ML model. In this study, we use K = 5 folds, which assisted in validating all the scoring parameters of each ML model. The Accuracy, F1 score, Precision, Recall, and ROC-AUC were the metrics obtained through this cross-validation. The ML model with the best overall performance was KNN, with the best scoring parameters, followed by RF in second place. The Extra Trees model is in third place, MLP-NN is in fourth place, the Decision Trees are in fifth place, and the SVM classifier is last.

**Table 8.** Average results of K-fold cross-validation with K = 5 of the ML models trained on PC.


Regarding the final evaluation (testing), Tables 7 and 9 show the outcomes of the performance metrics when the ML models were tested with the 15% of the dataset that was randomly split from the original dataset; i.e., this 15% of the data was kept separate from the dataset used for training and therefore had no influence during training or cross-validation. As indicated in Table 7, the best ML model for this problem on the PC is KNN, which has the highest metrics. MLP-NN is second, Random Forest (RF) is third, Extra Trees is fourth, Decision Trees is fifth, and the SVM model is sixth. The accuracy values of the ML models are very good, ranging from 85.4% (SVM) to 90.4% (KNN). The specificity is comparatively poor in the case of the SVM, which is the lowest at 69.37%, but the other models far exceed it, with the KNN classifier reaching 80.66%. KNN also compares favorably with the angiologist physician's metrics of 73.82% accuracy and 71.43% specificity [10], proving to be a better technique to detect DVT.

Despite its limited hardware, the Raspberry Pi 4 achieves good results, attaining the same metrics as the PC with no differences. This demonstrates the strength of the System on Chip (SoC), which is well suited to moving this type of diagnosis to smart devices. Despite the embedded system's limited computational capability, excellent metrics for a portable intelligent system are attained, outperforming the Wells technique as conventionally applied.

Receiver Operating Characteristic (ROC) curves can be generated from the aforementioned data; they indicate how well a model can distinguish between two classes. They are a key tool for assessing an ML model's performance and are employed in binary classification problems, i.e., problems with two distinct output classes. The ROC curve depicts the relationship between the model's True Positive Rate (TPR) and False Positive Rate (FPR) as the decision threshold varies.
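As a minimal sketch of how such a curve and its area can be computed, the code below sweeps the decision threshold over the predicted scores and integrates the resulting (FPR, TPR) points with the trapezoidal rule. The function names and the four toy scores are assumptions made for illustration; scikit-learn's `roc_curve` and `auc` provide the equivalent functionality.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points, sweeping the threshold from high to low.

    Assumes labels are 0/1 and that both classes are present.
    """
    positives = sum(1 for label in y_true if label == 1)
    negatives = len(y_true) - positives
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every sample that shares this score, so ties yield one point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def trapezoid_auc(points):
    """Area under a piecewise-linear curve given as (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy example; labels and scores are illustrative only.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
curve = roc_points(y_true, scores)
roc_auc = trapezoid_auc(curve)  # 0.75 for this toy example
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation of the two classes.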

The ROC curves on the PC and on the Raspberry Pi 4 show a similar response. As shown in Figure 3, the KNN model has the largest area under the curve, which is consistent with the scoring metrics in Table 7, followed by the Random Forest (RF) classifier.

Similarly, it is possible to obtain Precision–Recall (PR) curves, which are a useful measure of prediction success when classes are highly imbalanced. In information retrieval, precision is a measure of the relevance of the results, whereas recall is a measure of how many of the truly relevant results are returned. The PR curve depicts the trade-off between Precision and Recall at various thresholds. A high area under the curve indicates both high Recall and high Precision, with high Precision corresponding to a low False Positive Rate and high Recall corresponding to a low False Negative Rate. High scores for both show that the classifier is returning accurate results (high Precision) as well as the majority of all positive results (high Recall).
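The trade-off can be made concrete with a short sketch that computes one (recall, precision) point per threshold. The function names and toy scores are assumptions introduced here. The area is summarized as average precision (a step-wise sum), which is one common choice; the text does not state which PR-AUC summary was used, so this should not be read as the authors' exact computation.

```python
def precision_recall_points(y_true, scores):
    """Return (recall, precision) points, sweeping the threshold from high to low.

    Assumes labels are the integers 0/1 and at least one positive is present.
    """
    positives = sum(y_true)
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = []
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Group tied scores so that each distinct threshold gives one point.
        while i < len(ranked) and ranked[i][0] == threshold:
            tp += ranked[i][1]
            fp += 1 - ranked[i][1]
            i += 1
        points.append((tp / positives, tp / (tp + fp)))
    return points


def average_precision(points):
    """Step-wise area: sum of (recall increase) x (precision at that step)."""
    total, previous_recall = 0.0, 0.0
    for recall, precision in points:
        total += (recall - previous_recall) * precision
        previous_recall = recall
    return total


# Toy example; labels and scores are illustrative only.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
pr_curve = precision_recall_points(y_true, scores)
ap = average_precision(pr_curve)
```

Lowering the threshold can only raise recall, whereas precision may fall as more false positives are admitted, which is the trade-off the curve visualizes.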

Figure 4 shows that the PR curves on the Raspberry Pi 4 and the PC are similar. In both cases, the KNN classifier has the best PR curve of all classifiers, with the largest area under the curve, followed by the Extra Trees classifier.

**Figure 3.** ROC curves. (**a**) ROC curve on PC, and (**b**) ROC curve on Raspberry Pi 4.

**Figure 4.** PR curves. (**a**) PR curve on PC, and (**b**) PR curve on Raspberry Pi 4.

The next step is to obtain the performance metrics used to evaluate the ML models, for which we rely on the metrics of the scikit-learn library. The performance metrics are the Area Under the Curve (AUC) computed with the trapezoidal method, Cohen's Kappa coefficient, the Hamming Loss, and the Matthews correlation coefficient. They were obtained on both a PC and a Raspberry Pi 4 and are listed in Table 9.
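For a binary classifier, the last three of these metrics can be written directly in terms of the confusion-matrix counts. The following is a hedged, dependency-free sketch with function names and toy labels chosen here for illustration; scikit-learn exposes the same quantities as `cohen_kappa_score`, `hamming_loss`, and `matthews_corrcoef`.

```python
from math import sqrt


def binary_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for 0/1 labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, fp, tn, fn


def hamming_loss(y_true, y_pred):
    """Fraction of labels predicted incorrectly (lower is better)."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(tp, fp, tn, fn):
    """Agreement between prediction and truth, corrected for chance."""
    n = tp + fp + tn + fn
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    return (observed - expected) / (1 - expected) if expected != 1 else 0.0


def matthews_corrcoef(tp, fp, tn, fn):
    """Correlation between predicted and true labels, in [-1, 1]."""
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


# Toy labels; purely illustrative, not taken from the DVT test set.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
counts = binary_counts(y_true, y_pred)   # (3, 1, 3, 1)
toy_hamming = hamming_loss(y_true, y_pred)
toy_kappa = cohen_kappa(*counts)
toy_mcc = matthews_corrcoef(*counts)
```

In single-label binary classification, the Hamming Loss equals one minus the accuracy, which agrees with the reported 90.40% accuracy and 9.60% Hamming Loss of the KNN model.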

According to Table 9, the KNN classifier is the best binary ML classifier on the PC in terms of performance metrics, followed by Random Forest (RF) in second place, the Extra Trees classifier in third, the MLP-NN model in fourth, the Decision Trees classifier in fifth, and the SVM classifier in last place. The AUC ranges from 81.04% with the SVM model to 86.80% with the KNN classifier. The Hamming Loss, for which lower values are better, ranges from 14.53% for the SVM to 9.60% for the KNN model.

**Table 9.** Performance metrics of the six ML algorithms evaluated in this study using 15% of the separated data for testing.


For its part, the Raspberry Pi 4 achieved good results, which match those obtained on the PC, as expected in theory when the same parameters are used for each classifier during training and when computing the performance metrics and scores.

The average training time of each proposed classifier was obtained on both the PC and the Raspberry Pi 4. Figure 5 shows the average time of ten training runs of each model, with their respective characteristics, on both the computer and the SoC, in order to analyze the cost–benefit of each classifier. Owing to its greater processing power, the PC is faster than the Raspberry Pi 4. On the PC, the Decision Trees model is the fastest, followed closely by KNN and, further behind, Extra Trees, all with a training time of less than a second; next comes the SVM, before reaching 2 s, then the RF classifier, and finally the MLP-NN, which is the slowest of all with a time greater than 30 s. Training on the Raspberry Pi 4 takes two to five times longer due to the embedded system's processing limitations, with times of 0.02 s for Decision Trees, 0.1 s for KNN, 2.81 s for Extra Trees, 3.12 s for SVM, 6.89 s for RF, and finally 175.85 s for MLP-NN.

**Figure 5.** ML algorithms training time.
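Averaging the wall-clock time of repeated training runs can be sketched as follows. The helper `average_training_time` and the placeholder workload `dummy_train` are hypothetical names introduced for illustration; the timer actually used in the study is not stated, and `time.perf_counter` is simply a reasonable standard-library choice.

```python
import time
from statistics import mean


def average_training_time(train, runs=10):
    """Call `train()` `runs` times and return (mean seconds, list of durations)."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        train()
        durations.append(time.perf_counter() - start)
    return mean(durations), durations


def dummy_train():
    """Placeholder workload standing in for a real model-fitting call."""
    return sum(i * i for i in range(10_000))


avg_seconds, run_seconds = average_training_time(dummy_train, runs=10)
```

Running the same script unchanged on the PC and on the Raspberry Pi 4 gives directly comparable averages, since only the hardware differs.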

#### *Usage Scenario*

These ML models could be implemented on an embedded system such as the Raspberry Pi 4 (RPi4) to develop a smart DVT diagnostic system. This could be integrated with sensors for color (RGB), heart rate (BPM), and temperature (°C), as well as a graphical user interface (GUI) that may include questions based on the Wells criteria. Furthermore, the physician can acquire the raw Wells criteria for each patient to be diagnosed with the proposed system. The smart system will hold a trained ML model into which the selected patient's data are entered, and it will provide a diagnosis prediction for the patient's condition. The prospective apparatus proposed by this research is shown in Figure 6. As discussed before and shown in Table 1, the RPi4 is less expensive than a PC, which would result in significant cost savings in the large-scale production of intelligent devices, such as the manufacture of hundreds or even millions of smart instruments.

**Figure 6.** Suggested usage scenario. (**a**) Block diagram of proposed system and (**b**) proposed system outline.

#### **4. Conclusions**

Multiple ML classifiers were assessed for the prediction of DVT in the lower limbs according to the Wells criteria. They were evaluated with different scoring and performance metrics to help establish the reliability of each one, and the results of each model were subjected to cross-validation. The experimental results show that the KNN model is the best in terms of performance and scoring metrics, with the highest accuracy (90.40%), the highest specificity (80.66%), ROC-AUC (86.80%), and PR-AUC (86.16%), and that it is second in terms of execution time (0.01904 s). The MLP-NN model is the slowest in terms of execution time (30.08 s) but gives the second-best accuracy (89.40%). Among the models trained on the Raspberry Pi 4, the KNN classifier has the same scoring and performance metrics as on the PC; the main difference is the execution time, as it takes 0.0951 s to train the model, which again makes it the second fastest. In practical terms, it is acceptable to wait a little longer for a portable result. The MLP-NN classifier, second in accuracy, has an execution time of 175.8485 s on the Raspberry Pi 4, making it the slowest. The accuracy of all trained models on PC and Raspberry Pi 4 is greater than 85%, while the AUC values are between 81% and 86%. In conclusion, compared to traditional methods, the best ML classifiers were effective at predicting DVT diagnosis in a timely and efficient manner.

**Author Contributions:** Conceptualization, E.E.G.-G. and E.I.-G.; Data curation, D.L.-M.; Formal analysis, E.T.-C.; Funding acquisition, O.R.L.-B.; Investigation, E.E.C.-L. and E.I.-G.; Methodology, E.E.C.-L.; Project administration, O.R.L.-B.; Resources, O.R.L.-B.; Software, E.E.C.-L. and E.I.-G.; Supervision, E.E.G.-G. and E.I.-G.; Validation, E.E.G.-G., E.T.-C. and D.L.-M.; Visualization, E.E.G.-G. and O.R.L.-B.; Writing—original draft, E.E.C.-L.; Writing—review and editing, E.T.-C. and E.I.-G. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Universidad Autónoma de Baja California (UABC) through the 22nd internal call with grant number 679.

**Data Availability Statement:** The dataset, pair plot, and trained ML models are shared as supplementary material. The trained ML models can be loaded using the joblib library. Furthermore, supplemental material is available at: http://doi.org/10.17632/dpyngbvc47.1 (accessed on 14 February 2022), and https://data.mendeley.com/datasets/dpyngbvc47/1 (accessed on 14 February 2022).

**Acknowledgments:** The authors would like to thank INAOE and UDG for accepting researcher E. Inzunza-González to carry out his sabbatical stay. The authors are very grateful to Conacyt for the scholarship awarded to E.E.C.-L. Thanks are given to PRODEP for supporting the academic groups to increase their degree of consolidation. The authors appreciate the reviewers' remarks and recommendations to enhance the paper.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

#### **References**

