Next Article in Journal
Practical Grammar Compression Based on Maximal Repeats
Next Article in Special Issue
Decision Support System for Fitting and Mapping Nonlinear Functions with Application to Insect Pest Management in the Biological Control Context
Previous Article in Journal
A New Way to Store Simple Text Files
Previous Article in Special Issue
An Unknown Radar Emitter Identification Method Based on Semi-Supervised and Transfer Learning
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

A Case Study for a Big Data and Machine Learning Platform to Improve Medical Decision Support in Population Health Management

by
Fernando López-Martínez
1,*,†,‡,
Edward Rolando Núñez-Valdez
1,‡,
Vicente García-Díaz
1,‡ and
Zoran Bursac
2,‡
1
Department of Computer Science, Oviedo University, 33003 Oviedo, Spain
2
Department of Biostatistics, Florida International University, Miami, FL 33199, USA
*
Author to whom correspondence should be addressed.
Current address: Sanitas Medical Center Corporate Offices, 8400 NW 33rd St, Doral, FL 33122, USA.
These authors contributed equally to this work.
Algorithms 2020, 13(4), 102; https://doi.org/10.3390/a13040102
Submission received: 28 February 2020 / Revised: 18 April 2020 / Accepted: 21 April 2020 / Published: 23 April 2020
(This article belongs to the Special Issue Algorithms in Decision Support Systems)

Abstract

:
Big data and artificial intelligence are currently two of the most important and trending pieces for innovation and predictive analytics in healthcare, leading the digital healthcare transformation. Keralty organization is already working on developing an intelligent big data analytic platform based on machine learning and data integration principles. We discuss how this platform is the new pillar for the organization to improve population health management, value-based care, and new upcoming challenges in healthcare. The benefits of using this new data platform for community and population health include better healthcare outcomes, improvement of clinical operations, reducing costs of care, and generation of accurate medical information. Several machine learning algorithms implemented by the authors can use the large standardized datasets integrated into the platform to improve the effectiveness of public health interventions, improving diagnosis, and clinical decision support. The data integrated into the platform come from Electronic Health Records (EHR), Hospital Information Systems (HIS), Radiology Information Systems (RIS), and Laboratory Information Systems (LIS), as well as data generated by public health platforms, mobile data, social media, and clinical web portals. This massive volume of data is integrated using big data techniques for storage, retrieval, processing, and transformation. This paper presents the design of a digital health platform in a healthcare organization in Colombia to integrate operational, clinical, and business data repositories with advanced analytics to improve the decision-making process for population health management.

1. Introduction

Colombia’s health system is formed by the public sector and the private sector. The general social security system has two plans, contributory and subsidized. The contributory regime covers salaried workers, pensioners, and independent workers, with the subsidized plan covering anyone who cannot pay. Enrollment coverage increased from 96.6% in 2014 to 97.6% in 2015 [1].
The National Health Authority’s primary purpose in Colombia is to improve the quality of healthcare and strengthening supervision, surveillance, and control of the health system. The 2015 Statutory Health Law No. 1751 places the responsibility for guaranteeing the right to health with the health system and recognizes health as a fundamental social right and makes it the state’s responsibility to pursue an approach in health promotion and disease prevention [2].
The health sector in Colombia supports all initiatives for implementing new technologies to prevent cardiovascular diseases, disabilities, and high-cost hospitalization cases [3]. There is a remarkable need to improve the prediction of the risk of conditions for the population through the integration and unification of massive volumes of data and the implementation of effective advance analytic solutions to improve the decision-making process and population health management in Colombia’s population [4].
Keralty organization is formed by a group of insurance and health services companies with a global presence, which together develops an integral health model, whose purpose is to produce health and well-being to people throughout their lives. The organization is committed to keeping its users healthy and autonomous, focusing on prevention, identification, and management of health risks, control, and care of disease and dependency [5]. The organization is a leader in Colombia by providing integrated health services and is recognized for their human, scientific, technical, and ethical approach [6].
This paper presents how we can obtain value from a large volume of heterogeneous data generated by different data sources in healthcare, and the architecture implemented. The development of proper advanced data analytics methods such as machine learning and big data analytics to perform meaningful real-time analysis on the data to predict clinical complications before it happens and to support the decision-making process are challenging but much needed to handle the complexity of the data-driven problems we are currently facing.

1.1. Related Work

Several initiatives in Europe, Asia, and North America aim to develop healthcare digital platforms with collaborative access tools to allow the exchange and sharing of information and knowledge wherever and whenever needed throughout the attention process. This type of frameworks and architectures will allow maximum quality and efficiency for patient’s care, and to provide appropriate attention to the patient’s condition and risks.
Castilla and Leon, for example, implemented a digitalization of health services as a tool to increase the efficiency of the services and increase the security in the attention to patient [7]. A healthcare cyber-physical system assisted by cloud and big data is being developed in the department of computer science at Pace University in New York [8]. This system consists of a data integration layer, a data management layer, and a data-analytics service layer to improve the functioning of the healthcare system. In France, a group of researchers implemented a wearable knowledge as a service platform to cleverly manage heterogeneous data coming from wearable devices to assist the physicians in supervising the patient health [9]. Another interesting work was presented at the International Conference on Computational Intelligence and Data Science (ICCIDS 2018). The authors proposed a hybrid four-layer healthcare model to improve disease diagnostic [10]. In India, a centralized architecture for an end to end integration of healthcare systems deployed in the cloud environment was developed using fog computing [11].
Medical organizations are investing more and more in developing a healthcare platform that integrates data, applications, business processes, and user interfaces to gain knowledge and useful insights for clinical decisions, drug recommendation systems, and better disease diagnoses. Some other examples of big data applications in healthcare can be found in healthcare monitoring, where data captured from wearable devices can assist providers in managing symptoms of patients online and adjust their prescriptions [12]. An analytical platform called “MedAware” has been developed to detect errors in medical prescriptions and clinical errors, reducing the hospital admission and readmission in real-time [13]. In the healthcare prediction field, a healthcare system called “Gemini (Generalizable Medical Information analysis and Integration system)” was developed to collect, process, and analyze large volumes of clinical data and apply machine learning algorithms for performing predictive analytics [14]. Other platforms have been implemented for genomics data analytics to generate predictions based on DNA molecular changes and mutations [15]. Another type of healthcare platform is related to the healthcare knowledge system, defined as the combination of clinical data and physician expertise to support clinical decision-making and diagnosis [16].

1.2. Why Big Data and Machine Learning?

Big data and machine learning are redefining healthcare goals for the future. Healthcare data are impacting the way disease research is performed, and the level of complexity in population health management is increasing as the traditional fee for service approach is transformed into the value-based care model [17,18].
Population health management is basically the aggregation of patient health data from multiple data sources, and the analysis and transformation into actionable insights to generate informed decisions to improve clinical and financial outcomes [19].
Big data technologies will allow us to bring large volumes of structured and unstructured data from disparate data sources into a data repository to be examined and analyzed. Machine learning models will assist in discovering insights from complex datasets with capabilities such as finding unseen patterns, making new predictions, and analyzing trends on health data. Machine learning is being used in a variety of clinical domains with the analysis of hundreds of clinical parameters resulting in effective and efficient models to improve the outcomes and quality of medical care models [20].
The implementation of this platform shows the enormous potential in using big data to individualize medical treatment, the opportunity for improving the lives of the patients, delivering better medical care, and reduced waste at an operational level [21]. Other chances for big data in healthcare for Keralty organization are:
  • A physician would know before prescribing whether the patient is at high-risk to become dependent and different treatment plans can be selected based on this information.
  • Psychosocial and clinical medical data could inform about the development of a chronic illness that can be properly diagnosed.
  • The organization can use big data to understand how they are performing, the opportunities to improve clinical care, and their capacity to redesign care delivery to their patients.
  • Using the platform’s analytics component to improve the quality of care and patient experience at the lowest possible cost is core to the organization.
  • Capturing streaming data and wearable data can provide to healthcare providers real-time insights about a patient’s health that will allow them to improve their decision-making process for treatment and medication.
  • Big data analysis can help the organization to deliver information that is evidence-based and can improve the efficiency, understanding, and implementation of the best practices associated with any disease.
In addition to the big data technologies used to build the platform, another essential component is the advanced analytic module of the platform. This module contains several machine learning algorithms to support clinical diagnosis. However, the organization should feel confident in these models and how they can be applied to specific use cases. These first models will alert providers to changes in high-risk conditions such as sepsis and hypertensive patients.
The main objective of this paper is to present the developed platform and its components to allow Keralty organization to derive better and more actionable insights from their data, i.e., to derive meaningful information from all these data in a way that allows them to improve care and lower costs needed for value-based reimbursement and business objectives while providing the highest quality care for population health management [22]. The goal is to be aligned with the triple aim framework developed by the Institute for Healthcare Improvement that describes an approach to optimizing healthcare system performance.The implementation of this platform intends to resolve several problems in health services to assist patients and their families in managing their health by providing better access to healthcare services [23].

2. Proposed Digital Health Platform

Keralty organization currently have several information systems such as Health Information Systems (HIS), Lab Information Systems (LIS), Radiology Information Systems (RIS), Enterprise Resource Planning (ERP), and Customer Relationship Management (CRM), among others, in their ambulatory care centers, hospitals, and home care, which support their integrated health model. The information from these systems was not consolidated on a single platform, and its access and availability generated an operative load, which obstructs all health management processes and the support of clinical decisions for physicians. Consequently, we proposed the design and implementation of a healthcare, clinical, and business data repository with advanced analytic capabilities to consume machine learning prediction models to improve the decision-making process and population health management at the organization. The digital health platform conceptual framework is shown in Figure 1.
The implementation of the platform was an ambitious project that required integrating health information from disparate sources, building numerous technological and functional components, and the definition of IT management processes robust enough to support interoperability with other systems. The digital health information platform included patient-related data, Electronic Health Records (EHR), diagnostic reports, prescriptions, medical images, pharmacy records, research data, operational data, financial data, and human resources data.
This project was innovative and pioneered the designing and building of a comprehensive health digital platform for a healthcare organization in Colombia, with the patient being at the center of it and all of its information aggregated and summarized based on the standardized enterprise data repository. This information can be accessed quickly and intuitively when and where it is needed, hiding all technical complexity and providing longitudinal process management tools, as well as tools for decision support for professionals. The difference of this platform with other implementations was the development of a medical portal with a patient 360 view that uses data from the enterprise data repository to generate real-time early warning scores, patient surveillance, open API for hospitals integration, prediction of health risk patterns, high-risk markers, co-morbidity detection to predict critical diseases, early diagnosis of diseases, treatment comparison with medical guidelines, and measurement of efficiency of specific drugs to provide the best quality of care.
The Digital Healthcare platform architecture can ingest data from over 50 different source systems at the granular level, including claims, clinical, financial, administrative, wearables, genomics, and socioeconomic data. Few platforms today can integrate that many heterogeneous data sources successfully. The platform can consume machine learning models on-demand without the need for further development. The data logic models are on top of the raw data and can be accessed, reused, and updated through open APIs without the need for clinical and business logic changes. The platform was able to integrate successfully structured and unstructured data. It is commonly seen that this type of platforms in the market is built to either integrate structured data or unstructured but few cases successfully integrate both. Open microservices APIs were created for operations such as authorization, identity management, interoperability, and data pipeline management. These microservices enable the development of third-party applications to interoperate with the platform.

2.1. Platform Architecture

The initial approach was to build a big data processing pipeline with a Microsoft Azure lambda architecture to support real-time and batch analytics. This approach is shown in Figure 2. This architecture has different mechanisms to consume data depending on the source and timing needed to generate insights. In addition, with this approach, we can have professionals with different skills working in parallel to build the platform.
The architecture contains a batch layer, a real-time layer, and a serving layer. The batch layer is in charge of persistent storage and is able to scale horizontally. The real-time layer process streaming data and performs dynamic computation. The serving layer query data on the repositories and consume the prediction models.
From the infrastructure point of view, the platform offers the flexibility of being implemented in a hybrid environment, namely the cloud and the local data processing center, through the use of virtualization techniques, containers, and load balancing systems. The design of the infrastructure was prepared to provide a flexible set of resources that can be used on-demand and based on the specific workload requirements. The infrastructure deployment relied heavily on automation to provide fluid operations.

2.2. Data Repository

An enterprise-wide staging repository for the big data analytics platform was considered. The data lake allows capturing data of any volume, type, and ingestion speed in one single place for storing heterogeneous data. This staging area included capabilities such as security, scalability, reliability, and availability. The data can be passed, processed directly from the staging area, or can be ingested to an enterprise data warehouse for historical load, preparation, and serve for BI and machine learning needs. This data warehouse repository has a scale-out architecture and massively parallel processing (MPP) engine.
Data models were developed to cover clinical, social, and healthcare program domains. Each model performs validations and processing on the data received, decoupling the processing and administration of the data from the source. These data models can also be extended to store additional attributes specific to the implementation, allowing these models to subscribe to certain types of messages, using the mapping and filtering options provided by the data processing pipelines. Once these subscriptions are created, the model will be loaded with all relevant messages to those who are subscribed and stored in the data lake.
For data storage, the data are loaded into a data warehouse with a daily refresh. This healthcare data repository contains a highly normalized data model for fast and efficient querying and analysis. This repository is read-only.

2.3. Integration and Interoperability

The platform provides a mechanism to integrate data from heterogeneous sources, define workflows to ingest data from different data stores, and transform and process data to data stores to be consumed by BI applications. A cloud-based data integration service is used to create these data-driven workflows and orchestrate all automation, transformation, and data movement in the platform. The main tasks this integration service should perform are: creation and scheduling of data pipelines to ingest data from different data sources, processing and transformation of the data, and store data in data stores such as data lakes or data warehouses.
Azure Data Factory automates and orchestrates the entire data integration process from end to end in the platform. We built the ETL (extract, transform, and load) pipelines with this Azure component. The data are extracted from the source locations, transformed from its source format to the target Azure data lake’s schema, and loaded into Azure data lake and the data warehouse, where they can be used for analytics and reporting. Azure Data Factory defines control flows that execute various tasks in the transform and load process.
We used the mechanism called mapping data flows, combining control flows and data flows to build the data transformations with an easy-to-use visual user interface. These data flows are then executed as activities within Azure Data Factory pipelines. Data Factory is certified by HIPAA (Health Insurance Portability and Accountability Act), which protects the data while they are in use with Azure. In the data flow, we created transformation streams where we define the source data and create the graph with the transformations, schema operations such as derived column, aggregate, surrogate keys and selects, and the output settings.

2.4. Data Security and Privacy Model

In terms of security, the platform guarantees authentication, access control, and encryption capabilities. The security mechanisms of the platform can provide protection, alert monitoring, and support the OAuth 2.0 protocol for authentication with REST interfaces. ACLs are enabled on folders, subfolders, and files. The platform also provides encryption mechanisms to protect the data. All these capabilities are accompanied by the implementation of enterprise security policies and regulatory compliance requirements.

2.5. Stream Analytics

The platform can handle mission-critical real-time data and offer end to end streaming pipelines with continuous integration and continuous delivery (CI-CD) services. Other capabilities such as in-memory processing, data encryption, and support of international security standards including HIPAA (Health Insurance Portability and Accountability Act), HITRUST (Health Information Trust Alliance), and GDPR (General Data Protection Regulation).

2.6. Advanced Analytics

The analytic data component consists of two areas: The first area is the BI models we develop for tactical, operational, and strategic decisions. The second area comprehends several prediction models that need to be developed. Currently, there are two prediction models developed by the authors of this paper to support population health management, specifically the diagnosis of sepsis and hypertension prediction [24,25]. These insights assist providers in the detection and tracking of chronic diseases. The machine learning component is used to build, test, consume, and deploy predictive analytic models on-demand and as requested for the organization. The platform provides self-service dashboards and visualizations that use data from the repositories to drive the decision-making process. The machine learning application layer is one of the essential layers of this platform.
Once the data are integrated, aggregated, and normalized in the system, the platform offers a tool to provide knowledge management through the business intelligence interface providing data analysis, design, and training of machine learning models, as well as development and management of results-based care indicators or population health management. The platform provides a tool where clinicians, researchers, and scientists can mine the data and get valuable information.
Machine learning models can be trained and customized in preconfigured data domains, allowing the storage of the results for future use. Data researchers and scientists can develop advanced tools to obtain information and value of the data stored in the solution, taking advantage of the model design, training, and validation component. We briefly present the predictive models implemented in the platforms.
  • Machine Learning Classification for a Hypertensive Population: This prediction model evaluates the association between gender, race, BMI (Body Mass Index), age, smoking, kidney disease, and diabetes using logistic regression. Data were collected from NHANES datasets from 2007 to 2016 to train and test the model, a dataset of 19,709 samples with (83%) non-hypertensive individuals and (17%) hypertensive individuals. The results show a sensitivity of 77%, a specificity of 68%, precision on the positive predicted value of 32% in the test sample, and a calculated AUC of 0.73 (95% CI [0.70–0.76]). The model used to estimate the probability that a person will belong to the hypertensive or non-hypertensive class is:
    p = e ( β 0 + β 1 g e n d e r + β 2 a g e + β 3 r a c e + β 4 b m i + β 5 k i d n e y + β 6 s m o k e + β 7 d i a b e t e s ) 1 + e ( β 0 + β 1 g e n d e r + β 2 a g e + β 3 r a c e + β 4 b m i + β 5 k i d n e y + β 6 s m o k e + β 7 d i a b e t e s )
We used the logistic regression classification model in this experiment to evaluate the importance of the risk factor variables and their relationship with the prevalence of hypertension among a nationally representative sample of adults ≥20 years in the United States (n = 19,759). The distribution of the samples by hypertensive patients, gender, and race is shown in Table 1.
We computed chi-square test between each independent variable and the dependent variable to indicate the strength of evidence that there is some association between the variables. Chi-square was selected due to the categorical form of the data used in the model, and it is considered one of the best methods to estimate the dependency between the class and the features when the feature can take a fixed number of possible values that belong to a group or nominal category.
Table 2 shows the p-value for each variable; the null hypothesis is reject for any p ≤ 0.05, while the null hypothesis is not rejected when p > 0.05. p-values for the variables GENDER, BMIRANGE_1, BMIRANGE_3, and KIDNEY_2 are not statistically significant at 0.05 alpha level; the clinical importance of these variables in the model for interpretation allows us to include them. We ran the model with and without the variables, and there were no significant changes in the accuracy score, positive predicted value rate, and true positive rate.
The training dataset was derived from a random sampling of 70% (13,831) of the extracted study population and the test sampling the remaining 30% (5928) to evaluate the model on the ground-truth that was never used for training. We ran the logistic regression model on the entire dataset to verify the accuracy score of the model.
The Logistic Regression model uses the logit function to express the relationship of the risk factors as:
l o g i t ( p ) = l n ( p 1 p ) = β 0 + β 1 X 1 + β 2 X 2 + + β i X i
The probability of success can be expressed as:
p = e ( β 0 + β 1 X 1 + β 2 X 2 + + β i X i ) 1 + e ( β 0 + β 1 X 1 + β 2 X 2 + + β i X i )
where p is the predicted probability of having hypertension, X i are the risk factors or independent variables, and β i are the coefficients that are estimated by using the method of maximum likelihood and allow us to calculate the odds that, for every unit increase in X i , the odds of having hypertension changes by e β .
  • A neural network approach to predict early neonatal sepsis: We developed a non-invasive neural network classification model for early neonatal sepsis detection. The data used in this study are from Crecer’s Hospital center in Cartagena-Colombia. A dataset of 555 neonates with (66%) of negative cases and (34%) of positive cases was used to train and test the model. The study results show a sensitivity of 80.32%, a specificity of 90.4%, precision on the positive predicted value of 83.1% in the test, sample and a calculated area under the curve of 0.925 (95% Confidence Interval [91.4–93.06]). The neural network architecture can be seen in Figure 3.
Table 3 shows the parameters of the architecture. Labels X1–X7 are informative only, and the input size is 27 variables.
The model used an anonymous dataset from a private medical institution in Cartagena, Colombia, from 2016 to 2017. Demographic, laboratory data, blood pressure, and body measures data were part of the dataset. This dataset includes cases of live newborns of ages inferior to 72 h with a diagnosis of early neonatal sepsis by clinical criteria and laboratory blood cultures. Control cases were part of the dataset including all newborns healthy by clinical diagnosis and who returned healthy for a follow up at 72 h.
This retrospective study includes 186 cases and 368 controls based on a case-control relationship of 1:2 with a 95% trust factor and power of 80%. Bivariate analysis and logistic regression were performed to detect the variables associated with early sepsis, and the statistical significance was considered at the alpha level of 0.05.
This model considered nine sociodemographic, fourteen obstetric, nine neonatal, and four maternal infectious related pathology variables. Table 4 shows the quantitative sociodemographic variables, Table 5 shows the qualitative sociodemographic variables, Table 6 shows the quantitative neonatal variables, Table 7 shows the qualitative neonatal variables, Table 8 shows the quantitative obstetric variables, Table 9 shows the qualitative obstetric variables, and Table 10 shows the qualitative maternal infections of the cases and controls.
A bivariate chi-square test with correction was performed to the qualitative variables to find a statistical association between the independent variable and the possibility to develop early neonatal sepsis. For continuous variables, the Mann–Whitney U test was performed. From this statistical analysis, it is essential to show that we did not find significant statistical evidence for the variables age, start of marital status at younger than 18 years old, gender, APGAR (Appearance, Pulse, Grimace, Activity, and Respiration) value less than 7 after 1 and 5 min, the number of pregnancies, and the type of birth. Prenatal control is not associated with the case of sepsis; however, assisting to five prenatal controls are associated with the protection to avoid the appearance of early neonatal sepsis. There was no evidence with the variables IUGR (Intrauterine Growth Restriction) background and multiple pregnancies. Twenty-seven (27) variables were selected as input variables for our artificial neural network architecture.
In terms of computational timing, It is difficult to evaluate the complexity and timing of a machine learning algorithm. However, based on the algorithmic complexity, we can measure the time performance in terms of its training time complexity using big O notation because the classification time of the models can vary depending on the stress in the computational performance and power. In terms of timing, the classification prediction with the trained models is less than 1 s. The time complexity of the logistic regression could be expressed as O ( ( f + 1 ) c s E ) , where f is the number of features (+1 because of bias), c is the number of possible outputs, s is the number of samples, and E is the number of epochs to run. For the neural network approach, O ( p n l 1 + n l 1 n l 2 + ) , where p is the number of features and n l i is the number of neurons at layer i in a neural network [26].

3. Actual Platform Benefits

The implementation of the platform became the digital healthcare ecosystem for the organization. The organization can populate workflow information systems with critical decision-making insights, accurate and reliable healthcare data that significantly increased the value of the healthcare outcome to patients and care providers. This platform delivers significant benefits to the organization, such as physicians having an intelligent application that can be configured to their preferences and optimized to their disciplines, patients receiving more personalized care, an improvement in healthcare workflow and patient care, and personalized care for physicians and patients.
We describe in the following subsections several use cases that effectively present the change and digital transformation of the organization with the implementation of the platform.

3.1. Reduce Total Cost of Care for Care Coordination

With a robust data analytic component, the organization was able to prioritize opportunities for improvement and to improve the way care is coordinated and delivered throughout its network of hospitals and medical facilities. The results include a considerable increase in financial results in just six months.
The organization uses the platform to generate timely, meaningful, and actionable data to drive change and improve the quality of care for patients. The organization uses the data for risk-stratification of the network’s population, prioritization of the care coordination activities, and prevention activity’s interventions. Risk stratification was completed for all patients, enabling care managers to identify individuals at various risk levels for unnecessary services and high-cost utilization, improving patient outcomes and experience. The analytical component also reduces unnecessary visits, facilitates access to specialty care and community-based services, and achieves healthcare outcomes. Other benefits include 3% increase in the detection of high-risk patients with primary care, 20% increase in the number of patients with ongoing care managed, and 10% percent reduction in emergency department utilization per member among care managed patients.

3.2. Self-Service Analytic

As described in this paper, the healthcare platform combines and standardizes data across different source systems to provide actionable insights in a single platform. The platform integrates data from different sources, such as claims data, cost data, financial data, clinical data, and other patient data. With self-service analytics, the organization increases the number of users accessing the analytic component, improving data visibility and providing actionable insights to improve patient outcomes.

3.3. Reduce Deaths from Sepsis

The organization improved sepsis mortality rates and improving care outcomes by using the advanced analytic component of the platform. Sepsis impacts almost 1.7 million adults in the U.S. and is responsible for nearly 270,000 annual deaths. One-third of all hospital deaths are patients with sepsis [27]. The machine learning prediction model used in the platform was developed by one of the authors of this paper, as described before. It is still too early to mention the results of the utilization of this feature. However, the goal of the organization is to reduce its sepsis mortality rate, the costs of the creation of its sepsis care transformation team, and the implementation of an evidence-based sepsis care practice.

3.4. Discussion and Limitations

The digital health platform helps Keralty organization with closing the gaps between multiple datasets, improving clinical benefits, improving patient’s lives, supporting better decision-making to manage larger populations, and improving overall health outcomes. However, the need for algorithms with high accuracy in medical diagnosis is still a challenge that needs to be improved precisely and efficiently [28]. The increasing complexity of building end-to-end platforms to integrate disparate systems and to apply machine learning techniques in specific areas such as computer vision, natural language processing, reinforcement learning, and other generalized methods present many challenges when forming the interdisciplinary team needed and the set of technological components used for the implementation.
Some challenges should be considered in the design and implementation of machine learning projects for healthcare. One of the most critical challenges requires algorithms that can answer causal questions. These questions are beyond classical machine learning algorithms because they require a formal model of interventions [29]. To address this type of question from the analytical component of the platform, we need to learn from data differently and to gain knowledge in causal models to understand how machine learning algorithms need to be trained. Another challenge is to create reliable outcomes from heterogeneous data sources with the participation of SME (Subject Matter Experts) who understand the disease; the machine learning predictive accuracy and correct clinical interpretation depend on the criteria and context of the disease. Providers and machine learning engineers should work together on model interpretability and applicability. Machine learning implementation is not an easy task; the selection of predictive features and optimization of hyperparameters is another challenge that needs to be mastered to implement models that provide useful insights [30]. The success and meaningful use of these algorithms, and their integration into the platform depends on the accuracy of the models and their interpretability.

4. Results of Advanced Analytics

After training and testing the logistic regression model for predicting hypertension, we generated some evaluation metrics to evaluate the classifier. Table 11 shows the confusion matrix with the classification results, include the true positive value (730), true negative value (3407), false negative (216), and false positive value (1575). The classification report in Table 12 shows the calculated precision and sensitivity.
The test sampling of 5928 contains 4982 (84%) non-hypertensive and 946 (16%) hypertensive patients. The model shows a sensitivity of 730/946 = 77% and a specificity of 3407/4982 = 68%. The precision of the model was 730/2305 = 32% and the negative predicted value 3407/3623 = 94%. The false negative rate of the model was 216/946 = 22%. The model was better at identifying individuals who will not develop hypertension than those who will develop hypertension.
For the neural network approach to predict early neonatal sepsis, Table 13 shows the confusion matrix with the classification results of actual class label vs. the predicted ones, including the true positive value (49), true negative value (95), false negative (12), and false positive value (10).
The classification report in Table 14 shows the precision and sensitivity. The sensitivity of the model is moderately acceptable due to the imbalanced testing dataset, and there is still a high number of false negatives.
A sensitivity of 80.3% and a specificity of 90.4% show that the model might be useful for detecting positive cases, and the true negative rate shows that the model is also efficient at identifying negative cases. The high precision value of 83.1% and the AUC of 0.925 confirm the adequacy of the model as a preliminary screening tool. The percentage of positive cases shows that the model works better than random guessing and the conditional probability of negative test results is considerably low. The accuracy of 86.74% shows that the model correctly identifies negative cases and positive cases based on the characteristics of the dataset and the small number of cases examined.

5. Comparison with Other Platforms

A review of several healthcare platforms shows that the architecture presented in this paper covers all the categories from integration, interoperability, security care, and advanced analytics. Generally, other implementations only focused on one specific area, as shown in Table 15 and taken from the International Conference on Computational Intelligence and Data Science (ICCIDS 2018) and a healthcare frameworks review proposed in the Journal of King Saud University [31].
We designed and implemented a healthcare platform using big data technologies with actionable insights to augment human decision-making at the organization impacting the population’s health, public health, and to capture social determinants of health. This platform comprehends all the features we use in the comparison. Raghupathi et al. reported a conceptual architecture to present big data analytic outlines in healthcare with no predictive analytic capabilities and no patient 360 view. Patel et al. designed a big data architecture platform to improve data aggregation in the healthcare industry and to provide a reduction in healthcare cost, predicting analytic, preventive care, and drug discovery capabilities but without patient 360 view capabilities. Chawla et al. presented a patient-centric healthcare framework—Collaborative Assessment and Recommendation Engine (CARE)—to improve patient-centric treatment and diagnosis without real-time monitoring and 360 view capabilities. Abinaya et al. implemented a fascinating e-Health service application for diagnosing heart diseases. Balladini et al. designed a real-time architecture of big data for Francisco Lopez Lima Hospital in Argentina to process physiological data. This platform did not include predictive analytic and patient 360 view. Belle et al. implemented a genomic data processing platform that provides image analytic and signal processing of psychological data. Mezghani et al. designed a big data platform for integrating heterogeneous wearable data in healthcare for real-time monitoring and diagnosis. Lastly, Chen et al. presented a real-time big data platform to improve communication and collaboration between patients and providers, increasing the quality of care that clinical teams can provide.

6. Conclusions and Future Work

This paper provides details of an optimized and secure healthcare platform that revolutionizes the healthcare industry in Colombia by providing better information to patients and care teams. The use of this technology reduces the costs associated with healthcare.
The proposed digital health platform allows us to address population health challenges, to understand better patient’s health, and to find hidden patterns that traditional data analytics fail to find. The organization can use unified patient-generated data, financial data, and socioeconomic data to detect patterns and to discover a group of patients who share similar health behavior. The analysis of clinical and non-clinical data allows predicting patient’s health with better accuracy. The platform also allows better health discoveries and actions based on treatment history for individuals and groups of patients.
Keralty organization recognized that better care coordination was required for patients receiving care. The organization wanted to improve quality outcomes, provider engagement and recruitment, and its own economic health. To meet these objectives, the organization focuses on clinician engagement and organizational alignment, ensuring widespread access to meaningful, actionable data, and the use of the healthcare analytics platform to inform decisions and drive improvement. Keralty believes the use of machine learning will be one of the most important, life-saving technologies ever introduced to the organization. We believe the opportunities are virtually limitless for the platform to improve and accelerate clinical, workflow, and financial outcomes.
More future work needs to be done on the platform to continue improving all the benefits for the entire organization. Tools for performing knowledge discovery process will be added to the ecosystem. The organization is planning to start the implementation of prescriptive analytics models to assist the organization in making smarter decisions in population health management. The architecture team will look at the possibility of implementing Map/Reduce-based computations for processing data with high scalability and to execute low latency and high concurrency analytical queries on top of Hadoop clusters.

Author Contributions

Conceptualization, F.L.-M., V.G.-D. and E.R.N.-V.; Methodology, F.L.-M., V.G.-D. and E.R.N.-V.; Software, F.L.-M.; Validation, F.L.-M., V.G.-D., Z.B. and E.R.N.-V.; Formal Analysis, F.L.-M.; Investigation, F.L.-M., V.G.-D., Z.B. and E.R.N.-V.; Resources, F.L.-M.; Data Curation, F.L.-M. and Z.B.; Writing—Original Draft Preparation, F.L.-M., V.G.-D. and E.R.N.-V.; Writing—Review and Editing, F.L.-M., V.G.-D., Z.B. and E.R.N.-V.; Visualization, F.L.-M., Z.B. and E.R.N.-V.; Supervision, F.L.-M. and V.G.-D.; Project Administration, V.G.-D. and E.R.N.-V.; Funding Acquisition, F.L.-M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Acknowledgments

This document presents an independent study supported by the company Sanitas USA. The points of view expressed are those of the authors and not necessarily those of Sanitas USA. We thank Ivan Murcia VP of Healthcare Services at Sanitas USA and Santiago Thovar, CIO at Keralty who provided insight and expertise that greatly assisted the study.

Conflicts of Interest

The authors declare no conflict of interest. The founders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
ACLAccess Control List
BIBusiness Intelligence
CRMCustomer Relationship Management
EHRElectronic Health Record
ERPEnterprise Resource Planning
GDPRGeneral Data Protection Regulation
HISHospital Information System
HIPAAHealth Insurance Portability and Accountability Act
HITRUSTHealth Information Trust Alliance
LISLab Information System
MPPMassive Parallel Computing
RISRadiology Information System
RESTRepresentational State Transfer

References

  1. Glassman, A.; Giuffrida, A.; Escobar, M.L.; Giedion, U. Chapter 1 Colombia: After a Decade of Health System Reform. In From Few to Many; Inter-American Development Bank: Washington, DC, USA, 2009; Volume 1, pp. 1–13. [Google Scholar]
  2. Ruíz, F.; Gaviria, A.; Norman, J. Plan Decenal de Salud Pública. Bogotá 2020, in press. [Google Scholar]
  3. Legido, H.; Lopez, P.A.; Balabanova, D.; Perel, P.; Lopez-Jaramillo, P.; Nieuwlaat, R.; Schwalm, J.D.; McCready, T.; Yusuf, S.; McKee, M. Patients’ knowledge, attitudes, behaviour and health care experiences on the prevention, detection, management and control of hypertension in Colombia: A qualitative study. PLoS ONE 2015, 10, e122112. [Google Scholar] [CrossRef] [Green Version]
  4. Lopez, F.E.; Bonfante, M.C.; Arteta, I.G.; Baldiris, R.E. IoT and big data in public health: A case study in Colombia. In Protocols and Applications for the Industrial Internet of Things; IGI Global: Hershey, PA, USA, 2018; pp. 309–321. ISBN 978-1-5225-3806-6. [Google Scholar]
  5. Dennis, R.J.; Caraballo, L.; García, E.; Rojas, M.X.; Rondon, M.A.; Pérez, A.; Aristizabal, G.; Peñaranda, A.; Barragan, A.M.; Ahumada, V. Prevalence of asthma and other allergic conditions in Colombia 2009–2010: A cross-sectional study. BMC Pulm. Med. 2012, 12, 12. [Google Scholar] [CrossRef] [Green Version]
  6. About Keralty. Available online: https://www.keralty.com/en/about-keralty (accessed on 27 January 2020).
  7. León, G.R. Digitalización de Historia Clínica. Available online: https://contrataciondelestado.es/wps/wcm/connect/3236c434-7ce1-484f-bb50-b8942bdc7d66/DOC20190314132936Estandar_digitalizacion_SACYL-+9.pdf?MOD=AJPERES (accessed on 27 January 2020).
  8. Zhang, Y.; Qiu, M.; Tsai, C.W.; Hassan, M.M.; Alamri, A. Health-CPS: Healthcare cyber-physical system assisted by cloud and big data. IEEE Syst. J. 2017, 11, 88–95. [Google Scholar] [CrossRef]
  9. Mezghani, E.; Exposito, E.; Drira, K.; Da Silveira, M.; Pruski, C. A Semantic Big Data Platform for Integrating Heterogeneous Wearable Data in Healthcare. J. Med. Syst. 2015, 39. [Google Scholar] [CrossRef]
  10. Kaur, P.; Sharma, M.; Mittal, M. Big Data and Machine Learning Based Secure Healthcare Framework. Proc. Procedia Comput. Sci. 2018, 132, 1049–1059. [Google Scholar] [CrossRef]
  11. Thota, C.; Sundarasekar, R.; Manogaran, G.; Varatharajan, R.; Priyan, M.K. Centralized Fog Computing security platform for IoT and cloud in healthcare system. In Fog Computing: Breakthroughs in Research and Practice; IGI Global: Hershey, PA, USA, 2018; pp. 365–378. ISBN 978-1-5225-5650-3. [Google Scholar]
  12. Edet, R.; Afolabi, B. Prospects and Challenges of Population Health with Online and other Big Data in Africa. Adv. J. Soc. Sci. 2019, 6, 57–63. [Google Scholar] [CrossRef] [Green Version]
  13. MedAware—Using AI to Eliminate Prescription Errors—Digital Innovation and Transformation. Available online: https://digital.hbs.edu/platform-digit/submission/medaware-using-ai-to-eliminate-prescription-errors/ (accessed on 8 March 2020).
  14. Ling, Z.J.; Tran, Q.T.; Fan, J.; Koh, G.C.H.; Nguyen, T.; Tan, C.S.; Yip, J.W.L.; Zhang, M. GEMINI: An integrative healthcare analytics system. Proc. VLDB Endow. 2014, 7, 1766–1771. [Google Scholar] [CrossRef]
  15. Manogaran, G.; Thota, C.; Lopez, D.; Vijayakumar, V.; Abbas, K.M.; Sundarsekar, R. Big Data Knowledge System in Healthcare; Springer: Cham, Switzerland, 2017; pp. 133–157. [Google Scholar] [CrossRef] [Green Version]
  16. Bates, D.W.; Saria, S.; Ohno-Machado, L.; Shah, A.; Escobar, G. Big data in health care: Using analytics to identify and manage high-risk and high-cost patients. Health Affairs 2014, 33, 1123–1131. [Google Scholar] [CrossRef] [Green Version]
  17. Farooqi, M.M.; Shah, M.A.; Wahid, A.; Akhunzada, A.; Khan, F.; ul Amin, N.; Ali, I. Big Data in Healthcare: A Survey. Appl. Intell. Technol. Healthc. 2019, 143–152. [Google Scholar] [CrossRef]
  18. Hulsen, T.; Jamuar, S.S.; Moody, A.R.; Karnes, J.H.; Varga, O.; Hedensted, S.; Spreafico, R.; Hafler, D.A.; McKinney, E.F. From big data to precision medicine. Front. Media 2019, 6, 34. [Google Scholar] [CrossRef] [Green Version]
  19. Hatzigeorgiou, M.N.; Joshi, M.S. Population Health Systems: The Intersection of Care Delivery and Health Delivery. Popul. Health Manag. 2019, 22, 467–469. [Google Scholar] [CrossRef]
  20. Koti, M.S.; Alamma, B.H. Predictive analytics techniques using big data for healthcare databases. In Proceedings of the Smart Innovation, Systems and Technologies; Springer Science and Business Media: Singapore, 2019; Volume 105, pp. 679–686. [Google Scholar] [CrossRef]
  21. Dash, S.; Shakyawar, S.K.; Sharma, M.; Kaushik, S. Big data in healthcare: Management, Analysis and Future prospects. J. Big Data 2019, 6, 54. [Google Scholar] [CrossRef] [Green Version]
  22. Puaschunder, J.M. Big Data, Algorithms and Health Data. SSRN Electron. J. 2019. [Google Scholar] [CrossRef]
  23. Moreira, M.W.; Rodrigues, J.J.; Korotaev, V.; Al-Muhtadi, J.; Kumar, N. A Comprehensive Review on Smart Decision Support Systems for Health Care. Inst. Electr. Electron. Eng. 2019, 13, 3536–3545. [Google Scholar] [CrossRef]
  24. López-Martínez, F.; Núñez-Valdez, E.R.; Lorduy Gomez, J.; García-Díaz, V. A neural network approach to predict early neonatal sepsis. Comput. Electr. Eng. 2019, 76, 379–388. [Google Scholar] [CrossRef]
  25. López-Martínez, F.; Schwarcz.MD, A.; Núñez-Valdez, E.R.; García-Díaz, V. Machine Learning Classification Analysis for a Hypertensive Population as a Function of Several Risk Factors. Expert Syst. Appl. 2018, 110, 206–215. [Google Scholar] [CrossRef]
  26. Singh, A. Foundations of Machine Learning. SSRN Electron. J. 2019, 486. [Google Scholar] [CrossRef]
  27. Rhee, C.; Dantes, R.; Epstein, L.; Murphy, D.J.; Seymour, C.W.; Iwashyna, T.J.; Kadri, S.S.; Angus, D.C.; Danner, R.L.; Fiore, A.E.; et al. Incidence and trends of sepsis in US hospitals using clinical vs claims data, 2009–2014. JAMA J. Am. Med. Assoc. 2017, 318, 1241–1249. [Google Scholar] [CrossRef]
  28. Mahindrakar, P.; Hanumanthappa, M. Data Mining In Healthcare: A Survey of Techniques and Algorithms with Its Limitations and Challenges. Int. J. Eng. Res. Appl. 2013, 3, 937–941. [Google Scholar]
  29. Ghassemi, M.; Naumann, T.; Schulam, P.; Beam, A.L.; Chen, I.Y.; Ranganath, R. A Review of Challenges and Opportunities in Machine Learning for Health 2018. arXiv 2018, arXiv:1806.00388. [Google Scholar]
  30. Waring, J.; Lindvall, C.; Umeton, R. Automated machine learning: Review of the state-of-the-art and opportunities for healthcare. Artif. Intell. Med. 2020, 104, 101822. [Google Scholar] [CrossRef]
  31. Palanisamy, V.; Thirunavukarasu, R. Implications of big data analytics in developing healthcare frameworks—A review. J. King Saud Univ. Comput. Inf. Sci. 2019, 31, 415–425. [Google Scholar] [CrossRef]
  32. Raghupathi, W.; Raghupathi, V. Big data analytics in healthcare: Promise and potential. Health Inf. Sci. Syst. 2014, 2. [Google Scholar] [CrossRef] [PubMed]
  33. Patel, S.; Patel, A. A Big Data Revolution in Health Care Sector: Opportunities, Challenges and Technological Advancements. Int. J. Inf. Sci. Tech. 2016, 6, 155–162. [Google Scholar] [CrossRef]
  34. Chawla, N.V.; Davis, D.A. Bringing big data to personalized healthcare: A patient-centered framework. J. Gen. Intern. Med. 2013, 28. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  35. Abinaya, K. Data Mining with Big Data e-Health Service Using Map Reduce. IJARCCE 2015, 4, 123–127. [Google Scholar] [CrossRef]
  36. Balladini, J.; Rozas, C.; Frati, F.; Vicente, N.; Orlandi, C. Big Data Analytics in Intensive Care Units: Challenges and applicability in an Argentinian Hospital. J. Comput. Sci. Technol. 2015, 15, 61–67. [Google Scholar]
  37. Belle, A.; Thiagarajan, R.; Soroushmehr, S.M.R.; Navidi, F.; Beard, D.A.; Najarian, K. Big data analytics in healthcare. BioMed Res. Int. 2015, 2015. [Google Scholar] [CrossRef] [Green Version]
  38. Chen, D.; Chen, Y.; Brownlow, B.N.; Kanjamala, P.P.; Arredondo, C.A.G.; Radspinner, B.L.; Raveling, M.A. Real-time or near real-time persisting daily healthcare data into HDFS and elasticsearch index inside a big data platform. IEEE Trans. Ind. Inform. 2017, 13, 595–606. [Google Scholar] [CrossRef]
Figure 1. Conceptual Framework—Keralty Health Portal.
Figure 1. Conceptual Framework—Keralty Health Portal.
Algorithms 13 00102 g001
Figure 2. Azure Big Data and Machine Learning Lambda Architecture.
Figure 2. Azure Big Data and Machine Learning Lambda Architecture.
Algorithms 13 00102 g002
Figure 3. Multilayer Perceptron Architecture.
Figure 3. Multilayer Perceptron Architecture.
Algorithms 13 00102 g003
Table 1. Number of samples by hypertensive class, gender, and race.
Table 1. Number of samples by hypertensive class, gender, and race.
Hypertension, Adults 20 and over—2007–2016
ClassGenderRacen
Non HypertensiveFemaleMexican American1269
Non-Hispanic Black1674
Non-Hispanic White3674
Other Hispanic951
Other Race—Including Multi-Racial864
MaleMexican American1255
Non-Hispanic Black1599
Non-Hispanic White3714
Other Hispanic774
Other Race—Including Multi-Racial843
HypertensiveFemaleMexican American205
Non-Hispanic Black420
Non-Hispanic White662
Other Hispanic149
Other Race—Including Multi-Racial114
MaleMexican American214
Non-Hispanic Black478
Non-Hispanic White670
Other Hispanic138
Other Race—Including Multi-Racial132
Total19,799
Table 2. Chi2 test and p-value for the independent variables.
Table 2. Chi2 test and p-value for the independent variables.
Chi-Squared between Each Indicator Variable and the Baseline for the Model
FeatureDescriptionDummyp-ValueScore
GENDERMaleGENDER_10.14164462.160001
FemaleGENDER_20.14502682.123795
AGERANGE20–30AGERANGE_10.0000001560.890568
31–40AGERANGE_20.0000001299.675698
41–50AGERANGE_30.000000198.221463
51–60AGERANGE_40.000003521.520345
61–70AGERANGE_50.0000001342.879412
71–80AGERANGE_60.00000011037.137074
RACEMexican AmericanRACE_10.00677977.330429
Other HispanicRACE_20.02757564.854409
Non-Hispanic WhiteRACE_30.04559123.996636
Non-Hispanic BlackRACE_40.000000191.264812
Other RaceRACE_50.000027817.562718
BMIRANGEUnderweight = <18.5BMIRANGE_10.67303610.178071
Normal weight = 18.5–24.9BMIRANGE_20.00003317.234712
Overweight = 25–29.9BMIRANGE_30.91745720.010741
Obesity = BMI of 30 or greaterBMIRANGE_40.000636211.666854
KIDNEYYesKIDNEY_10.000000158.963059
NoKIDNEY_20.18728891.738816
SMOKEYesSMOKE_10.00217599.394891
NoSMOKE_20.00534617.758468
DIABETESYesDIABETES_10.0000001217.214128
NoDIABETES_20.000000139.351672
BorderlineDIABETES_30.000005120.798905
Table 3. Model architecture parameters.
Table 3. Model architecture parameters.
Model Architecture Parameters
ParameterValue
Input Dimension27
Num Output classes2
Num Hidden Layers3
Hidden Layer1 Dimension674
Activation Func Layer1Relu
Hidden Layer2 Dimension336
Activation Func Layer2Relu
Hidden Layer3 Dimension168
Activation Func Layer3Relu
Minibatch size8
Num samples to train388
Num minibatches to train48
Loss Functioncross entropy with softmax
Eval ErrorClassification error
Learner for parametersmomentum sgd
Eval MetricsConfusion Matrix, AUC
Table 4. Quantitative sociodemographic variables in cases (186) and controls (369).
Table 4. Quantitative sociodemographic variables in cases (186) and controls (369).
Quantitative Socio Demographic VariableCasesControls
MeanMedianSDRIQMeanMedianSDRIQp-Value
Age23.9323.54.9920–2624.22236.1919–280.793
Onset of sexual activity16.06160.94515–1715.6160.97115–160.0001
Table 5. Qualitative sociodemographic variables in cases (186) and controls (369).
Table 5. Qualitative sociodemographic variables in cases (186) and controls (369).
Qualitative Socio Demographic VariableCategoriesCasesControlsX2p-Value
N%N%
Teen MotherYes158.16918.710.880.001
No17191.930081.3
Health RegimenGovernment18398.434994.64.510.041
Commercial31.6205.4
OriginRural4222.651.471.870.00001
Urban14477.436498.6
Marital StatusMarried or in common law married12868.810127.487.640.00001
Single, divorced or widow5831.226872.6
Level of educationElementary School8646.28021.735.570.00001
High School10053.828978.3
Start of Marital status life younger than 18 yoYes17895.735796.70.390.531
No84.3123.3
Start of Marital status life younger than 16 yoYes4725.314739.811.540.001
No13974.722260.2
Table 6. Quantitative Neonatal variables in cases (186) and controls (369).
Table 6. Quantitative Neonatal variables in cases (186) and controls (369).
Quantitative Neonatal VariableCasesControls
MeanMedianSDRIQMeanMedianSDRIQp-Value
New born weight in grams2639.92768.5546.52500–30203202.43224412.12950–35000.0001
APGAR after 1 min of birth7.738.00.6118.08.098.00.5988.00.0001
Table 7. Qualitative Neonatal variables in cases (186) and controls (369).
Table 7. Qualitative Neonatal variables in cases (186) and controls (369).
Qualitative Neonatal VariableCategoriesCasesControlsX2p-Value
N%N%
PrematureYes10053.8256.8156.40.0001
No8646.234493.2
GenderMale10958.620254.70.7481.672
Female7741.416745.3
Less than 1500 gramsYes115.920.515.60.00001
No17594.136799.5
Less than 2500 gramsYes4423.792.464.440.00001
No14276.336097.6
APGAR less than 7 after 1 min of birthYes21.130.80.0950.999
No18498.936699.2
APGAR less than 7 after 5 minYes42.292.40.0450.999
No18297.836097.6
Respiratory distressYes8947.8277.3122.80.0001
No9752.234292.7
Table 8. Quantitative Obstetric variables in cases (186) and controls (369).
Table 8. Quantitative Obstetric variables in cases (186) and controls (369).
Quantitative Obstetric VariableCasesControls
MeanMedianSDRIQMeanMedianSDRIQp-Value
Gestational age at the time of birth35.636.03.4734–3938.439.01.6238–390.0001
Number of prenatal controls4.085.01.833.75–5.04.325.01.834–5.00.002
Number of pregnacies1.771.01.151.0–2.01.61.01.151–2.00.076
Number of births1.041.01.030–10.71.01.030–10.0001
Numbers of C-sections0.651.00.680–10.761.00.680–10.029
Table 9. Qualitative Obstetric variables in cases (186) and controls (369).
Table 9. Qualitative Obstetric variables in cases (186) and controls (369).
Qualitative Obstetric VariableCategoriesCasesControlsX2p-Value
N%N%
Type of birthVaginal9852.716243.93.8330.05
C-Section8847.320756.1
IUGR BackgroundYes52.7133.50.2750.6
No18197.335696.5
Assistance for prenatal controlYes16588.731886.20.7020.402
No2111.35113.8
Assistance for at least 4 prenatal controlYes14075.330181.63.010.083
No4624.76818.4
Assistance for at least 5 prenatal controlYes10556.525468.88.3010.004
No8143.511531.2
Premature rupture of membrane with more than 18 hoursYes9551.1174.6165.70.00001
No9148.935295.4
ChorioamnionitisYes2312.430.836.960.00001
No16387.636699.2
Premature membrane rupture with more than 6 hoursYes16186.619452.661.960.0001
No2513.417547.4
Multiple PregnaciesYes21.1102.70.390.353
No18498.935997.3
Table 10. Qualitative maternal infections variables in cases (186) and controls (369).
Table 10. Qualitative maternal infections variables in cases (186) and controls (369).
Qualitative Maternal Infections VariablesCategoriesCasesControlsX2p-Value
N%N%
Maternal FeverYes6736.04010.850.380.0001
No11964.032989.2
Yeast InfectionsYes3116.7154.125.830.0001
No15583.335495.9
Sexualy transmitted disease historyYes2714.571.934.240.0001
No15985.536298.1
Urinary Tract InfectionsYes115.992.44.290.0381
No17594.136097.6
Table 11. Confusion matrix.
Table 11. Confusion matrix.
Predicted
Non-HypertensiveHypertensive
TrueNon-Hypertensive34071575
Hypertensive216730
Table 12. Classification report.
Table 12. Classification report.
Classification Report
PrecisionRecallf1-ScoreSupport
Non-Hypertensive0.940.680.794982
Hypertensive0.320.770.45946
avg/total0.840.70.745928
Table 13. Confusion matrix.
Table 13. Confusion matrix.
Predicted
Non-SepsisSepsis
TrueNon-Sepsis9510
Sepsis1249
Table 14. Classification report.
Table 14. Classification report.
Classification Report
True PositiveFalse NegativePrecisionAccuracy
49120.830.867
False PositiveTrue NegativeRecallf1-score
10950.8030.817
Positive Label: 1Negative Label: 0
Table 15. Comparison of healthcare big data platforms.
Table 15. Comparison of healthcare big data platforms.
Author and YearPatient CentricPredictive AnalysisReal Time MonitoringImprove TreatmentInteroperabilityWorkflow and RulesPop HealthPatient 360
Our Health PlatformYesYesYesYesYesYesYesYes
Raghupathi et al. (2014) [32]YesNoYesYesPartialNoPartialNo
Patel et al. (2016) [33]YesYesYesYesPartialPartialYesNo
Chawla et al. (2013) [34]YesYesNoYesPartialPartialYesNo
Abinaya et al. (2015) [35]PartialYesPartialYesYesPartialNoNo
Balladini et al. (2015) [36]YesNoYesYesPartialYesNoNo
Belle et al. (2015) [37]PartialNoYesYesPartialYesNoNo
Mezghani et al. (2015)PartialNoYesYesYesPartialNoNo
Chen et al. (2017) [38]YesPartialYesYesYesPartialYesNo

Share and Cite

MDPI and ACS Style

López-Martínez, F.; Núñez-Valdez, E.R.; García-Díaz, V.; Bursac, Z. A Case Study for a Big Data and Machine Learning Platform to Improve Medical Decision Support in Population Health Management. Algorithms 2020, 13, 102. https://doi.org/10.3390/a13040102

AMA Style

López-Martínez F, Núñez-Valdez ER, García-Díaz V, Bursac Z. A Case Study for a Big Data and Machine Learning Platform to Improve Medical Decision Support in Population Health Management. Algorithms. 2020; 13(4):102. https://doi.org/10.3390/a13040102

Chicago/Turabian Style

López-Martínez, Fernando, Edward Rolando Núñez-Valdez, Vicente García-Díaz, and Zoran Bursac. 2020. "A Case Study for a Big Data and Machine Learning Platform to Improve Medical Decision Support in Population Health Management" Algorithms 13, no. 4: 102. https://doi.org/10.3390/a13040102

APA Style

López-Martínez, F., Núñez-Valdez, E. R., García-Díaz, V., & Bursac, Z. (2020). A Case Study for a Big Data and Machine Learning Platform to Improve Medical Decision Support in Population Health Management. Algorithms, 13(4), 102. https://doi.org/10.3390/a13040102

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop