#### **1. Introduction**

Thoracic pain is one of the most relevant indicators in people with cardiovascular disease (CVD) who are at risk of a heart attack. Despite this relevance, chest pain may also indicate a pathology unrelated to CVD. In 2015, the World Health Organization (WHO) recorded 17.7 million deaths related to CVD, of which 42.8% were due to coronary heart disease and 36.15% to cerebrovascular accidents [1]. In 2017, the World Heart Federation reported that 77% of deaths in Mexico were due to non-communicable diseases (NCD), and that 24% of these were caused by CVD [2]. In 2018, the National Institute of Statistics and Geography (INEGI) reported 149,368 deaths from CVD in Baja California, of which ischemic diseases represented 72.7% and hypertensive diseases 15.9%; the rest were split between pulmonary vascular disorders and acute rheumatic fever, among others [3].

Since CVDs account for a large percentage of deaths in Baja California, we analyzed a database provided by Medica Norte with information from 258 patients and the following variables: Edad, Género, Fumador, HTA, Dyslipidemia, Diabetes, ERC (Cr basal), Suma FRCV, C. Isquémica previa, PPT, Rangos PPT, Tipo dolor, TnT Ingreso, TnT Curva (4 h), ECG, Tipo Alteración, TC > 100, IC, Alta precoz, UDT, Ingreso, Ergometría, Eco stress, Cate, Angio TAC, IAM, and Revascularización (see Appendix A). Using Orange, we analyzed the data to find which biochemical markers or habits are most closely related to thoracic pain of cardiac origin, in order to locate more accurately the risk factors involved in the development of a cardiac event and to rule out as emergencies those patients with chest pain who do not meet the conditions established for the development of CVD. Based on these results, we propose secondary parameters to take into account in emergency rooms to avoid possible deaths caused by thoracic pain.

**Citation:** Rojas-Mendizabal, V.; Castillo-Olea, C.; Gómez-Siono, A.; Zuñiga, C. Assessment of Thoracic Pain Using Machine Learning: A Case Study from Baja California, Mexico. *Int. J. Environ. Res. Public Health* **2021**, *18*, 2155. https://doi.org/10.3390/ijerph18042155

Academic Editor: Tim Hulsen

Received: 26 December 2020; Accepted: 10 February 2021; Published: 23 February 2021

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

For this analysis, two troponin-based variables were considered. Troponin regulates cardiac muscle contraction and is released into the blood when the muscle is damaged by a heart attack, so it can be used as a bioindicator [4]. According to a 2019 study, troponin has a positive predictive value of 62% and a negative predictive value of 93% for cardiac lesions [5]. The first variable was TnT Ingreso, the troponin level measured in the patient's blood on arrival at the emergency room; the second was TnT Curva (4 h), the troponin level measured in the blood of admitted patients four hours later.
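The predictive values quoted above are ratios of confusion-matrix counts. The following is a minimal sketch of that arithmetic; the function name `predictive_values` and the counts are hypothetical, chosen only so that the ratios match the quoted percentages, and they are not data from this study or from [5].

```python
def predictive_values(tp, fp, tn, fn):
    """Return (PPV, NPV) computed from confusion-matrix counts."""
    if tp + fp == 0 or tn + fn == 0:
        raise ValueError("need at least one positive and one negative test result")
    ppv = tp / (tp + fp)  # share of positive tests that correspond to a true lesion
    npv = tn / (tn + fn)  # share of negative tests that are truly lesion-free
    return ppv, npv


# Hypothetical counts for illustration only (not taken from the study or from [5]).
ppv, npv = predictive_values(tp=31, fp=19, tn=93, fn=7)
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

A high NPV with a moderate PPV means a negative troponin result is more informative for ruling out a cardiac lesion than a positive result is for confirming one.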

When a patient arrives at the emergency room with chest pain, they are evaluated with an exam known as PreTest Probability (PPT), which helps choose the most accurate method of analysis to determine the type of pain. The PreTest considers variables such as gender, age, and symptoms such as typical angina, atypical angina, or non-anginal pain. Depending on the values of these variables, a percentage is established that falls into one of four ranges, and this range determines the probability that the pain is due to CVD [6].

Conventional predictive methods for assessing the etiology of thoracic pain include SCORE (Systematic Coronary Risk Evaluation), the ASCVD (AtheroSclerotic Cardio-Vascular Disease) Risk Estimator, and Framingham. The SCORE method is adapted from the 2016 guide for CVD prevention and was developed by a project of the same name; it calculates risk factors to predict possible CVD at 10 years in European patients [7]. The ASCVD Risk Estimator evaluates the patient's risk of atherosclerosis, since this disease affects the arteries and causes CVD. The Framingham method, which dates back to 1948, is the oldest and most widely used; it calculates CVD risk by assigning a value to variables related to the patient's condition and then summing these values to indicate the risk of developing CVD within 10 years [8].

Machine learning, deep learning, and artificial intelligence have become meaningful tools for the healthcare industry. Their classification and pattern recognition capabilities enable image processing for the diagnosis of treatable diseases. In addition, predictions from algorithms based on mathematical models are used as decision-making tools: they use databases to classify diseases related to a specific system, and correlations between variables to find factors associated with a high risk of mortality and chronic disease. These tools work by simulating the functioning of the human brain, and their greatest advantage is their capacity to process big data. The technology offers supervised learning methods (Random Forest, Support Vector Machine, and Artificial Neural Network), unsupervised learning methods (capable of finding patterns in unlabeled data and clustering them), and hybrid methods based on trial and error (Reinforcement Learning) [9–11].

#### **2. Materials and Methods**

For the data analysis in this paper, we used Orange version 3.23. This software offers a visual programming environment for data analysis, from statistics to machine learning, built on interconnected "widgets" that define the flow the data must follow and the functions applied to it. To analyze the database provided by Clinic Medical Norte and find the secondary variables for deciding whether a thoracic pain has a cardiac origin, we used 17 widgets.

Five machine learning algorithms available in the Orange data mining toolkit [12] were employed in this study: k-nearest neighbor (kNN), decision tree, support vector machine (SVM), random forest, and logistic regression. To evaluate the classification models, we used a 10-fold cross-validation strategy: the original samples were randomly partitioned into ten subsamples of equal size, and in each round a single subsample was retained as validation data for testing while the remaining nine were used for training. For this analysis, we used the following tools of Orange:

- Data
- Models
- Evaluation
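The 10-fold partitioning described above can be sketched in plain Python. This is an illustration of the general technique under stated assumptions, not the implementation used by Orange: the function names `k_fold_indices` and `cross_validation_splits` and the fixed seed are hypothetical, and the sample count of 256 follows the database description. Because 256 is not divisible by 10, the folds here differ in size by at most one sample.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Randomly partition range(n_samples) into k folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Taking every k-th index spreads any remainder over the first folds.
    return [indices[i::k] for i in range(k)]


def cross_validation_splits(n_samples, k=10, seed=0):
    """Yield (train, validation) index lists; each fold validates exactly once."""
    folds = k_fold_indices(n_samples, k, seed)
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


splits = list(cross_validation_splits(n_samples=256, k=10))
for train, validation in splits:
    print(len(train), len(validation))
```

In each round a model would be fitted on the `train` indices and scored on the `validation` indices, and the ten scores would then be averaged.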

#### *2.1. Description of the Database*

The database (provided by the Clinic Medical Norte) contains 27 data items from 256 patients (See Appendix B). The average age of the participants included in this study is 60 years. Table 1 presents the assessment criteria used in the patients of the Clinic Medical Norte.

**Table 1.** Assessment criteria used for patients.


Note. Adapted from *Prehospital Medical Emergency Manual* (p. 334), by A. Pacheco-Rodríguez, A. Serrano-Moraza, J. Ortega-Carnicer, F. Hermoso-Gadeo, 2001 [13], Madrid, España: Aran Ediciones. Copyright 2001 by Aran Editions.

#### *2.2. Machine Learning Models for Thoracic Pain Evaluation*

Figure 1 shows the thoracic pain management guide. To create these models, we used the variables that provide post-disease information, such as medications. Furthermore, in accordance with the clinical practice guideline, the variables used for diagnosis were eliminated [14,15].

**Figure 1.** Thoracic pain management guide.

To identify the most influential variables in the models created, the variables were ranked by assigning each one a score, with lower scores indicating greater importance. For this analysis, we considered a sample of 256 patients and used two machine learning techniques: tree classification and cross-validation. For statistical analysis, the "distributions" tool from Orange was used.
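One common way to produce such a ranking, assumed here for illustration and not necessarily the scoring that Orange applied in this study, is the information gain a decision tree uses to choose its splits. The sketch below ranks categorical variables so that rank 1 is the most informative, matching the convention that lower scores indicate greater importance. The function names, the feature names `factor_a` and `factor_b`, and the toy records are all hypothetical; they are not patient data.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, feature, target):
    """Entropy reduction in `target` obtained by splitting `rows` on `feature`."""
    base = entropy([row[target] for row in rows])
    remainder = 0.0
    for value in {row[feature] for row in rows}:
        subset = [row[target] for row in rows if row[feature] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


def rank_features(rows, features, target):
    """Map each feature to a rank; rank 1 is the most informative feature."""
    ordered = sorted(
        features, key=lambda f: information_gain(rows, f, target), reverse=True
    )
    return {feature: rank for rank, feature in enumerate(ordered, start=1)}


# Hypothetical toy records for illustration only, NOT patient data from the study.
rows = [
    {"factor_a": 1, "factor_b": 1, "outcome": 1},
    {"factor_a": 1, "factor_b": 0, "outcome": 1},
    {"factor_a": 0, "factor_b": 1, "outcome": 0},
    {"factor_a": 0, "factor_b": 0, "outcome": 0},
]
ranks = rank_features(rows, ["factor_b", "factor_a"], target="outcome")
print(ranks)
```

In this toy example `factor_a` separates the outcome perfectly and `factor_b` carries no information, so `factor_a` receives rank 1.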

#### **3. Results**

The database provided by Clinic Medical Norte comprises 256 patients, of whom 35.66% had an acute myocardial infarction (AMI, recorded in the database as IAM). Of those who suffered an IAM, 63.04% had dyslipidemia, 50% suffered from chronic kidney disease (CKD), 71.74% had diabetes, 36.96% had hypertension, 72.42% were smokers, and 54.35% were men.

#### *3.1. Tree Classification*

As mentioned before, the target in this model was IAM. The decision tree suggested six factors to determine whether a person with thoracic pain was at risk of an IAM; these factors matched those currently considered in the emergency room when a patient arrives with chest pain. The second target examined was the Risk Factors for Cardiovascular Disease (FRCV) variable, treated as a categorical variable indicating whether the patient suffered from a disease or had a negative result in clinical tests. The resulting decision tree revealed the proposed secondary factors for evaluating whether or not a thoracic pain has a cardiac origin. Table 2 shows the results of both tree classification analyses.

**Table 2.** Comparison between Acute Myocardial Infarction (AMI) and Risk Factors for Cardiovascular Disease (FRCV) used as a target for Tree classification.

