Proceeding Paper

Electronic Health Records Exploitation Using Artificial Intelligence Techniques †

by Carla Guerra Tort 1,*, Vanessa Aguiar Pulido 2, Victoria Suárez Ulloa 3, Francisco Docampo Boedo 4, José Manuel López Gestal 4 and Javier Pereira Loureiro 1

1 CITIC-Research Center of Information and Communication Technologies, University of A Coruña, 15071 A Coruña, Spain
2 Department of Computer Science, University of Miami, Coral Gables, FL 33146, USA
3 Institute for Biomedical Research of A Coruña (INIBIC)-Fundación Profesor Novoa Santos, 15006 A Coruña, Spain
4 Instituto Médico Quirúrgico San Rafael, 15009 A Coruña, Spain
* Author to whom correspondence should be addressed.
† Presented at the 3rd XoveTIC Conference, A Coruña, Spain, 8–9 October 2020.
Proceedings 2020, 54(1), 60; https://doi.org/10.3390/proceedings2020054060
Published: 9 September 2020
(This article belongs to the Proceedings of 3rd XoveTIC Conference)

Abstract

The exploitation of electronic health records (EHRs) has multiple uses, from predictive tasks and clinical decision support to pattern recognition. Artificial Intelligence (AI) makes it possible to extract knowledge from EHR data in a practical way. In this study, we aim to construct a Machine Learning model from EHR data to make predictions about patients. Specifically, we focus our analysis on patients suffering from respiratory problems and try to predict whether those patients will have a relapse in less than 6, 12 or 18 months. The main objective is to identify the characteristics that seem to increase the relapse risk. At the same time, we propose an exploratory analysis in search of hidden patterns in the data. These patterns will help us to classify patients according to their specific conditions with respect to certain clinical variables.

1. Introduction

The electronic health record (EHR) is the digital version of a patient’s medical history. In an EHR system, data are stored in a collection of tables in which each record corresponds to a patient’s healthcare episode. EHRs constitute a rich source of information, including demographic data (age, gender, address, etc.), administrative data and a wide range of clinical information (clinical notes, diagnoses, procedures and treatments, lab tests, medical imaging, etc.) [1,2,3]. The knowledge extracted from EHRs can be used in clinical decision support, epidemiological and predictive tasks, population care improvement and pattern recognition [2,4]. For this reason, the exploitation of EHRs has attracted the interest of researchers in recent years [5,6]. Nevertheless, EHRs have some characteristics that make this goal hard to achieve: heterogeneity, noise, incompleteness, redundancy and the inconsistent representation of data are some of the challenges to cope with. In this context, exploratory analysis and preprocessing steps play a fundamental role [7,8].
Artificial Intelligence (AI) has become a key tool for EHR exploitation. Machine Learning and Deep Learning have been successfully used to identify new risk factors, patterns and medical associations [6,9]. In addition, recent studies show the potential of these techniques to make better predictions than traditional methods [9,10,11].
In this project, we propose the use of AI to exploit and extract value from EHR data. More concretely, we focus our study on the analysis of relapse rates in patients suffering from the most prevalent diagnoses in our data set. We define a relapse as the return of a disease some time after its apparent resolution. We will construct a Machine Learning model to predict whether a patient will have a recurrence in less than 6, 12 or 18 months (depending on the diagnosis). This model will allow us to identify the characteristics that seem to increase the relapse risk in those patients. At the same time, we will carry out an exploratory analysis in search of hidden patterns in the data. We hope the results will help us to classify patients according to their specific conditions.
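
As a rough illustration of how such a relapse label might be derived from episode records, the following Python sketch marks an episode as followed by a relapse when the same patient has a later episode with the same diagnosis inside the chosen window. The record layout (patient identifier, diagnosis group, date), the function name and the 30.44-day average month are assumptions made for this example, not details taken from the study. In particular, the sketch treats any later episode with the same diagnosis as a relapse and does not check that the disease had apparently resolved in between.

```python
from datetime import date

# Average month length in days; an approximation chosen for this sketch.
DAYS_PER_MONTH = 30.44


def label_relapses(episodes, horizon_months):
    """Label each episode with whether a relapse follows within the horizon.

    `episodes` is a list of (patient_id, diagnosis, date) tuples (a
    hypothetical layout). Returns (patient_id, diagnosis, date, relapsed)
    tuples, where `relapsed` is True when the same patient has a later
    episode with the same diagnosis within `horizon_months`.
    """
    horizon_days = horizon_months * DAYS_PER_MONTH

    # Group episode dates by (patient, diagnosis).
    by_key = {}
    for patient, diagnosis, when in episodes:
        by_key.setdefault((patient, diagnosis), []).append(when)

    labelled = []
    for (patient, diagnosis), dates in by_key.items():
        # Collapse same-day duplicates and order the episodes in time.
        dates = sorted(set(dates))
        for i, when in enumerate(dates):
            nxt = dates[i + 1] if i + 1 < len(dates) else None
            relapsed = nxt is not None and (nxt - when).days <= horizon_days
            labelled.append((patient, diagnosis, when, relapsed))
    return labelled
```

Calling `label_relapses(episodes, 6)`, `label_relapses(episodes, 12)` or `label_relapses(episodes, 18)` would produce the binary target for each of the three windows mentioned above; which window applies to which diagnosis is not specified in the text and would have to be supplied separately.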

2. Data Set Description

Anonymized patient data were extracted from the San Rafael Hospital database. Records range from January 2000 to January 2020. Main diagnoses and procedures are encoded in both ICD-9 and ICD-10, so the data are divided into two sets according to the coding system. The ICD-9 set consists of 156,362 records and 89,211 patients. The ICD-10 set consists of 32,069 records and 25,013 patients. More information about the sets is given in Table 1. Demographic and clinical features act as predictive variables.

3. Present Work

Currently, the study is in the preprocessing phase. The most frequent diagnoses have been identified through a descriptive study of the data set. Table 2 shows these main diagnoses and the associated counts. Among the most prevalent diagnoses, we have selected those related to respiratory problems. We discarded the traumatology and varicose vein diagnoses because they were not considered relevant to this specific research.
After selecting the ICD-9 and ICD-10 codes of interest, the two sets will be unified in order to obtain a larger and more complete data set. Null and missing values will be removed to ensure data quality. In addition, the Machine Learning models will be defined. Once we obtain a clean data set, the next steps will allow us to identify the most explanatory predictive variables for the chosen diagnoses. For this task, we will apply Principal Component Analysis (PCA) [12].
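
A descriptive count of this kind could be sketched in Python as follows. The input layout (one `(patient_id, diagnosis_group)` pair per record) and both function names are hypothetical and chosen only for illustration; the study does not describe how the counts were actually computed.

```python
from collections import defaultdict


def diagnosis_summary(records):
    """Count records and distinct patients per diagnosis group.

    `records` is an iterable of (patient_id, diagnosis_group) pairs
    (a hypothetical layout). Returns (group, n_records, n_patients)
    tuples sorted by number of records, most frequent first.
    """
    n_records = defaultdict(int)
    patients = defaultdict(set)
    for patient_id, group in records:
        n_records[group] += 1
        patients[group].add(patient_id)
    rows = [(g, n_records[g], len(patients[g])) for g in n_records]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


def select_groups(rows, keep):
    """Keep only the summary rows whose group is in `keep`."""
    return [row for row in rows if row[0] in keep]
```

For example, `select_groups(diagnosis_summary(records), {"Respiratory infections"})` would retain only the respiratory group from the ranked summary.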
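
The planned steps can be sketched in plain Python under several assumptions that are not stated in the text: records are dictionaries with a `code` field, the two coding systems are unified through a hand-made mapping from codes to a shared group label, "removing missing values" is read as dropping any record with a missing field, and the first principal component is found by power iteration on the sample covariance matrix. All names below are hypothetical. Features are centred but not standardized here, which a real analysis would probably want to do, and power iteration assumes the covariance matrix is non-zero and the start vector is not orthogonal to the leading component.

```python
import math


def unify(icd9_rows, icd10_rows, code_to_group):
    """Merge both coding sets, tagging each row with a shared group label.

    Rows whose code is absent from `code_to_group` are discarded, which
    mirrors keeping only the codes of interest.
    """
    merged = []
    for row in icd9_rows + icd10_rows:
        group = code_to_group.get(row["code"])
        if group is not None:
            merged.append({**row, "group": group})
    return merged


def drop_missing(rows, fields):
    """Keep rows in which every listed field is present and not None."""
    return [r for r in rows if all(r.get(f) is not None for f in fields)]


def first_component(matrix, iterations=200):
    """Unit vector of the first principal component of a numeric matrix.

    `matrix` is a list of rows (one per record, at least two rows).
    """
    n, d = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n for j in range(d)]
    centred = [[row[j] - means[j] for j in range(d)] for row in matrix]
    # Sample covariance matrix of the centred features.
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0] * d
    for _ in range(iterations):
        # Multiply by the covariance matrix and renormalize.
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v
```

The absolute values of the entries of the returned vector indicate how strongly each feature contributes to the direction of greatest variance, which is one way to read "most explanatory" variables from a PCA.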

Author Contributions

All the authors have contributed to the conceptualization of the paper and the design of the research; methodology and AI models definition, V.A.P. and V.S.U.; data acquisition, F.D.B. and J.M.L.G.; writing (original draft preparation), C.G.T.; writing (review and editing), J.P.L. All authors have read and agreed to the published version of the manuscript.

Funding

Centro de Investigación de Galicia CITIC is funded by the Consellería de Educación, Universidades e Formación Profesional of the Xunta de Galicia and by the European Union (European Regional Development Fund, FEDER Galicia 2014-2020 Program) through grant ED431G 2019/01. This work was partially supported by the Spanish Ministry of Science (Challenges of Society 2019) under PID2019-104323RB-C33.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Pham, T.; Tran, T.; Phung, D.; Venkatesh, S. Predicting healthcare trajectories from medical records: A deep learning approach. J. Biomed. Inform. 2017, 69, 218–229.
  2. Yadav, P.; Steinbach, M.; Kumar, V.; Simon, G. Mining Electronic Health Records (EHRs): A Survey. ACM Comput. Surv. 2018, 50, 85:1–85:40.
  3. Martínez-Romero, M.; Vázquez-Naya, J.M.; Pereira, J.; Pereira, M.; Pazos, A.; Baños, G. The iOSC3 system: Using ontologies and SWRL rules for intelligent supervision and care of patients with acute cardiac disorders. Comput. Math. Methods Med. 2013.
  4. Shickel, B.; Tighe, P.J.; Bihorac, A.; Rashidi, P. Deep EHR: A Survey of Recent Advances in Deep Learning Techniques for Electronic Health Record (EHR) Analysis. IEEE J. Biomed. Health Inform. 2018, 22, 1589–1604.
  5. Marier, A.; Olsho, L.E.W.; Rhodes, W.; Spector, W.D. Improving prediction of fall risk among nursing home residents using electronic medical records. J. Am. Med. Inform. Assoc. 2016, 23, 276–282.
  6. Panahiazar, M.; Taslimitehrani, V.; Pereira, N.; Pathak, J. Using EHRs and Machine Learning for Heart Failure Survival Analysis. Stud. Health Technol. Inform. 2015, 216, 40–44.
  7. Yue, L.; Dongyuan, T.; Weitong, C.; Xuming, H.; Minghao, Y. Deep learning for heterogeneous medical data analysis. World Wide Web 2020, 1–23.
  8. Miotto, R.; Li, L.; Kidd, B.A.; Dudley, J.T. Deep Patient: An Unsupervised Representation to Predict the Future of Patients from the Electronic Health Records. Sci. Rep. 2016, 6, 26094.
  9. Weng, S.F.; Reps, J.; Kai, J.; Garibaldi, J.M.; Qureshi, N. Can machine-learning improve cardiovascular risk prediction using routine clinical data? PLoS ONE 2017, 12, e0174944.
  10. Kawaler, E.; Cobian, A.; Pessig, P.; Cross, D.; Yale, S.; Craven, M. Learning to Predict Post-Hospitalization VTE Risk from EHR Data. In Proceedings of the 12th AMIA Annual Symposium, Chicago, Illinois, USA, 3–7 November 2012; pp. 436–445.
  11. Wong, N.C.; Lam, C.; Patterson, L.; Shayegan, B. Use of machine learning to predict early biochemical recurrence after robot-assisted prostatectomy. BJU Int. 2019, 123, 51–57.
  12. Maćkiewicz, A.; Ratajczak, W. Principal components analysis (PCA). Comput. Geosci. 1993, 19, 303–342.

Table 1. Numeric description of ICD-9 and ICD-10 sets.

Set       Records    Patients   Diagnoses   Procedures
ICD-9     156,362    89,211     4147        1581
ICD-10    32,069     25,013     2691        2555

Table 2. Most prevalent diagnoses in the data set.

Diagnosis                 Records   Patients   Relapses
Dorsopathies              10,177    8228       1250
Varicose veins            9700      7981       1568
Arthropathies             9437      8137       1116
Respiratory infections    7722      4349       1466
Rheumatism                6601      5943       534