Revolutionizing Utility of Big Data Analytics in Personalized Cardiovascular Healthcare

Sharma, Praneel; Sharma, Pratyusha; Sharma, Kamal; Varma, Vansh; Patel, Vansh; Sarvaiya, Jeel; Tavethia, Jonsi; Mehta, Shubh; Bhadania, Anshul; Patel, Ishan; Shah, Komal

doi:10.3390/bioengineering12050463

Open Access Review

by Praneel Sharma ¹, Pratyusha Sharma ², Kamal Sharma ³,*, Vansh Varma ⁴, Vansh Patel ⁵, Jeel Sarvaiya ⁵, Jonsi Tavethia ⁵, Shubh Mehta ⁵, Anshul Bhadania ⁵, Ishan Patel ⁶ and Komal Shah ⁷

¹ Department of Information and Communication Technology, Dhirubhai Ambani Institute of Information and Communication Technology (DAIICT), Gandhinagar 382007, Gujarat, India

² Department of Computer Science & Engineering, Ahmedabad University, Ahmedabad 380009, Gujarat, India

³ Department of Cardiology, SAL Hospital, Ahmedabad 380054, Gujarat, India

⁴ GMERS Medical College and Hospital, Valsad 396001, Gujarat, India

⁵ BJ Medical College, Civil Hospital, Ahmedabad 380016, Gujarat, India

⁶ Department of Biology, Nova Southeastern University, Fort Lauderdale, FL 33328, USA

⁷ Indian Institute of Public Health, Gandhinagar 382042, Gujarat, India

* Author to whom correspondence should be addressed.

Bioengineering 2025, 12(5), 463; https://doi.org/10.3390/bioengineering12050463

Submission received: 19 March 2025 / Revised: 21 April 2025 / Accepted: 23 April 2025 / Published: 27 April 2025

(This article belongs to the Special Issue Smart Applications and Technology for Cardiovascular Disease Management)


Abstract

The term “big data analytics (BDA)” refers to the computational techniques used to study complex datasets that are too large for common data processing software. It encompasses approaches such as data mining (DM), machine learning (ML), and predictive analytics (PA) to find patterns, correlations, and insights in massive datasets. Cardiovascular diseases (CVDs) are attributed to a combination of risk factors, including a sedentary lifestyle, obesity, diabetes, dyslipidaemia, and hypertension. We searched PubMed, Google, and Cochrane for published research to evaluate existing BDA models that have been used for CVD prediction. We critically analyse the pitfalls and advantages of various BDA models that use artificial intelligence (AI), ML, and artificial neural networks (ANNs). By integrating wide-ranging data sources, such as genomic, proteomic, and lifestyle data, BDA could help elucidate the complex biological mechanisms behind CVD and support risk stratification in risk-exposed individuals. Predictive modelling is proposed to support the development of personalized medicine, particularly in pharmacogenomics, where understanding genetic variation might help to guide drug selection and dosing and thereby improve patient outcomes. In summary, incorporating BDA into cardiovascular research and treatment represents a paradigm shift in our approach to CVD prevention, diagnosis, and management. By leveraging big data, researchers and clinicians can gain deeper insights into disease mechanisms, improve patient care, and ultimately reduce the burden of cardiovascular disease on individuals and healthcare systems.

Keywords:

big data analytics; cardiovascular diseases; personalized medicine

1. Introduction

The term “big data analytics (BDA)” refers to the computational techniques used to study complex datasets that are too large for common data processing software. It encompasses approaches such as data mining (DM), machine learning (ML), and predictive analytics (PA) to find patterns, correlations, and insights in massive datasets [1]. BDA has potential applications in healthcare, where it could transform patient care and disease management by enabling personalized treatment plans and improving clinical outcomes [2]. Cardiovascular diseases (CVDs) are attributed to a combination of risk factors, including a sedentary lifestyle, obesity, diabetes, dyslipidaemia, and hypertension [3]. With the massive datasets generated by electronic health records (EHRs) and wearable health technologies, data science can identify trends and outcomes and personalize interventions. Beyond enabling a better assessment of treatment efficacy, BDA can reduce selection bias and, by integrating wide-ranging data sources such as genomic, proteomic, and lifestyle data, could help elucidate the complex biological mechanisms behind CVD and support risk stratification in risk-exposed individuals [3]. Predictive modelling can help in the development of personalized medicine, particularly in pharmacogenomics, where understanding genetic variation helps to guide drug selection and dosing and thereby improves patient outcomes [4].

The objective of this article is to evaluate and compare the BDA algorithms currently available, together with AI, ML, and PA, to describe their pitfalls, and to examine how they can be further integrated to refine CVD diagnostics and management.

To summarize, incorporating BDA into cardiovascular research and treatment represents a paradigm shift in our approach to CVD prevention, diagnosis, and management. By leveraging the power of big data, researchers and clinicians can gain deeper insights into disease mechanisms, improve patient care, and ultimately reduce the burden of cardiovascular disease on individuals and healthcare systems.

2. Big Data in Healthcare

The term “big data” refers to volumes of data that are too large to handle using conventional software or web-based systems. Douglas Laney described big data along three dimensions, volume, velocity, and variety, known as the “three Vs” [5]. Volume refers to the sheer quantity of data generated and stored. Variety refers to the range of structured and unstructured data types that a company or system can gather, such as transaction-level data, video, audio, text, or log files. Velocity describes the speed at which data are collected and made accessible for further analysis. These three pillars now serve as the accepted definition of big data [6].

BDA is a complex process of reviewing large and varied amounts of data to identify hidden relationships, patterns, market trends, and other relevant information. This field uses different methods and tools, including data mining (DM), predictive modelling (PM), machine learning (ML), and artificial intelligence (AI), to process and analyse enormous amounts of data [7].

2.1. Enhancing CVD Treatment and Research Through Big Data Analytics

BDA uses large datasets, including genomic information, electronic health records (EHRs), and real-time health monitoring data, to provide more comprehensive and detailed knowledge of cardiovascular illnesses. Table 1 summarizes ways in which BDA can be applied in healthcare.

Predictive algorithms can estimate how an individual patient is likely to respond to a given treatment, which supports the development of customized therapeutic strategies that maximize benefits and reduce side effects [8]. Furthermore, BDA can identify new therapeutic targets and biomarkers by revealing patterns in large datasets. Continuous monitoring of patient data using wearable technology and wireless mobile apps also makes real-time adjustments to treatment regimens possible. This dynamic strategy can maximize therapy efficacy and improve patient outcomes by enabling prompt interventions and adjustments based on ongoing data [9].

2.2. Challenges

BDA and data management in healthcare face several obstacles. Because health information is sensitive, data security and privacy are important issues. Combining data from various sources raises compatibility (interoperability) problems, which make it more difficult to synthesize accurate insights. Furthermore, the enormous amount of data requires substantial processing and storage capacity, which is frequently expensive [10].

3. Applications of Big Data Analytics in Cardiovascular Diseases

BDA in the context of cardiovascular diseases (CVD) is a revolutionary tool for population health management, drug safety monitoring, personalized treatment, predictive modelling, and quality of care.

The combination of ML and AI has greatly improved predictive modelling. For example, retinal fundus photographs have been proposed for identifying cardiovascular risk factors, strengthening risk assessment and enabling early intervention [11]. This exemplifies how non-invasive imaging combined with BDA can improve the predictive accuracy of cardiovascular risk assessment. Such models analyse demographic, clinical, and risk-factor variables to stratify patients according to their risk profiles, with the aim of reducing the incidence of cardiovascular events [12].
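To illustrate how a model of this kind might turn demographic and clinical variables into a risk tier, the following Python sketch computes a logistic risk probability and assigns each patient to a low, intermediate, or high tier. The variables, coefficients, intercept, and cut-offs are hypothetical placeholders chosen only for illustration; they are not taken from any published or validated score, including those cited above.

```python
import math
from dataclasses import dataclass


@dataclass
class Patient:
    age: float          # years
    systolic_bp: float  # mmHg
    total_chol: float   # mmol/L
    smoker: bool
    diabetic: bool


# Hypothetical coefficients for illustration only; not a validated risk model.
INTERCEPT = -9.0
COEFS = {"age": 0.07, "systolic_bp": 0.015, "total_chol": 0.2,
         "smoker": 0.6, "diabetic": 0.5}


def risk_probability(p: Patient) -> float:
    """Logistic model: probability = 1 / (1 + exp(-linear predictor))."""
    linear = (INTERCEPT
              + COEFS["age"] * p.age
              + COEFS["systolic_bp"] * p.systolic_bp
              + COEFS["total_chol"] * p.total_chol
              + COEFS["smoker"] * float(p.smoker)
              + COEFS["diabetic"] * float(p.diabetic))
    return 1.0 / (1.0 + math.exp(-linear))


def risk_tier(prob: float, low_cut: float = 0.05, high_cut: float = 0.20) -> str:
    """Stratify a probability into tiers using hypothetical cut-offs."""
    if prob < low_cut:
        return "low"
    if prob < high_cut:
        return "intermediate"
    return "high"


if __name__ == "__main__":
    cohort = [Patient(40, 120, 5.0, False, False),
              Patient(55, 130, 5.0, False, False),
              Patient(70, 160, 6.5, True, True)]
    for patient in cohort:
        prob = risk_probability(patient)
        print(f"{prob:.3f} -> {risk_tier(prob)}")
```

In practice the coefficients would be estimated from cohort data (for example, by fitting a logistic regression) and the cut-offs would come from clinical guidance; both steps are outside the scope of this sketch.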

Various studies have proposed cardiovascular disease prediction models that use different types of input data, as summarized in Table 2 [13].

Big data can be used to merge genetic information, lifestyle factors, and medical history to enable human–computer interaction and offer customized recommendations that enhance patient adherence and results [20]. This method is further improved by pharmacogenetics, which takes into account how genes affect drug responses [21].
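As a concrete illustration of pharmacogenetics, the Python sketch below maps a CYP2C19 genotype to a metabolizer phenotype and then to a drug-selection flag for clopidogrel, a prodrug that depends on CYP2C19 for activation. The allele table covers only the *1, *2, *3, and *17 alleles and paraphrases, in simplified form, our reading of published pharmacogenomic guidance such as that of the Clinical Pharmacogenetics Implementation Consortium (CPIC). The tables and message strings are assumptions made for illustration and are not a substitute for the current guideline or for clinical judgement.

```python
from typing import Dict, Tuple

# Simplified subset of CYP2C19 star alleles and their assumed function.
ALLELE_FUNCTION: Dict[str, str] = {
    "*1": "normal",
    "*2": "no",
    "*3": "no",
    "*17": "increased",
}

# Phenotype for each (alphabetically sorted) pair of allele functions; simplified.
PHENOTYPE: Dict[Tuple[str, ...], str] = {
    ("normal", "normal"): "normal metabolizer",
    ("increased", "normal"): "rapid metabolizer",
    ("increased", "increased"): "ultrarapid metabolizer",
    ("no", "normal"): "intermediate metabolizer",
    ("increased", "no"): "intermediate metabolizer",
    ("no", "no"): "poor metabolizer",
}


def cyp2c19_phenotype(allele_a: str, allele_b: str) -> str:
    """Map a diplotype such as ('*1', '*2') to a metabolizer phenotype."""
    pair = tuple(sorted((ALLELE_FUNCTION[allele_a], ALLELE_FUNCTION[allele_b])))
    return PHENOTYPE[pair]


def clopidogrel_flag(phenotype: str) -> str:
    """Illustrative drug-selection flag; not clinical advice."""
    if phenotype in ("intermediate metabolizer", "poor metabolizer"):
        return "reduced activation expected: consider an alternative antiplatelet agent"
    return "standard clopidogrel response expected"


if __name__ == "__main__":
    for diplotype in [("*1", "*1"), ("*1", "*2"), ("*2", "*3"), ("*1", "*17")]:
        phenotype = cyp2c19_phenotype(*diplotype)
        print("/".join(diplotype), phenotype, "|", clopidogrel_flag(phenotype))
```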

Studies on critical medications for CVD in low- and middle-income countries emphasize the importance of data-driven strategies, drawing on data from a variety of sources, for reducing healthcare inequities [22]. The ability to merge and analyse data from mobile health technologies and biometric sensors is transforming population health management by enabling the real-time tracking of cardiovascular health indicators, early intervention, and patient self-care, including safety surveillance of prescription drugs in elderly individuals [23]. Including empirical data in safety monitoring has improved the understanding of drug efficacy across patient demographics. Schaffer et al. (2022) [23] examined the hazards associated with polypharmacy in patients treated with cardiovascular medicines and the optimization of their treatment plans and safety (see also Jurgens et al., 2022 [24]). Big data help healthcare providers measure and improve the quality of cardiac care: analysing information about patient outcomes allows healthcare organizations to evaluate the effectiveness of treatments and identify areas for improvement. The potential of AI in personalized heart care has been emphasized for its role in improving care quality through data-driven insights [25].

4. Predictive Modelling and Risk Assessment

Technological advances in ML, AI, and statistics, when applied sequentially, can identify patients likely to experience cardiovascular events and help suggest appropriate therapies that may improve their survival.

An important application of ML is the artificial neural network (ANN), depicted in Figure 1, which has been applied to symptoms and signs, drug dosing, cardiac imaging analysis, ECG signal interpretation, and the detection and management of coronary artery disease [26]. Put simply, scientifically and statistically significant input variables (excluding outliers) are entered in the input layer; a weighted analysis in the hidden layer determines the contribution of each variable; and a further weighted analysis in the output layer yields the output, namely the presence or absence of cardiovascular disease (Figure 1). One study showed that machine learning can effectively predict atrial fibrillation (AF) from EHRs and cardiac magnetic resonance imaging (CMRI), illustrating how such models can offer patients precise predictions based on imaging and clinical findings [27]. The QRISK4 and PREVENT models leverage EHR data combined with social determinants of health to predict 10- to 30-year cardiovascular risk. These models exclude race as a predictor and instead use social deprivation indices (income, education, housing) to address disparities [28]. A comparative cross-sectional study of ML and conventional survival analysis showed that ML algorithms predicted cardiovascular risk better than the conventional statistical approach. However, ML models may produce biased risk estimates if they fail to account for censoring in survival data [29].
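The input, hidden, and output layer flow described above can be sketched in a few lines of Python. The network below has three inputs (assumed to be standardized variables, for example age, systolic blood pressure, and LDL-C expressed as z-scores), two hidden units with ReLU activation, and one sigmoid output interpreted as the probability of cardiovascular disease. All weights, biases, and the 0.5 decision threshold are hypothetical; a real model would learn its weights from training data (typically by backpropagation), which this forward-pass sketch does not show.

```python
import math
from typing import Callable, List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def relu(z: float) -> float:
    return max(0.0, z)


def dense(inputs: Sequence[float], weights: List[List[float]],
          biases: List[float], activation: Callable[[float], float]) -> List[float]:
    """One fully connected layer: activation(weighted sum of inputs + bias) per unit."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


# Hypothetical parameters: 3 inputs -> 2 hidden units -> 1 output.
W_HIDDEN = [[0.8, 0.5, 1.2], [-0.4, 0.9, 0.3]]
B_HIDDEN = [-1.0, 0.2]
W_OUT = [[1.5, -0.7]]
B_OUT = [-0.5]


def predict_cvd(features: Sequence[float], threshold: float = 0.5) -> Tuple[float, bool]:
    """Return (probability of CVD, predicted presence) for standardized features."""
    hidden = dense(features, W_HIDDEN, B_HIDDEN, relu)
    prob = dense(hidden, W_OUT, B_OUT, sigmoid)[0]
    return prob, prob >= threshold


if __name__ == "__main__":
    print(predict_cvd([1.0, 1.0, 1.0]))     # above-average values on all inputs
    print(predict_cvd([-1.0, -1.0, -1.0]))  # below-average values on all inputs
```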

Another interesting example is a system for predicting heart attacks that uses user input to generate comprehensive risk scores. This approach shows how AI-powered apps can use machine learning-based health predictors to manage cardiovascular health by assessing many risk factors simultaneously. More importantly, big data from national administrative databases have been used to predict cardiovascular risk with deep learning (DL) techniques, reflecting the importance of substantial data for improving the precision of risk evaluation [29].

ML algorithms can be applied to EHR data to predict the occurrence of cardiovascular disorders [30]. Additionally, wearable technology, such as smartwatches with built-in sensors, and mobile applications enable the continuous monitoring of patient data, which further improves the potential for real-time risk assessment and can support early intervention [31].
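A minimal sketch of real-time monitoring on a wearable data stream is a rolling-window check: the Python below keeps the most recent heart-rate readings and raises an alert when their mean exceeds a threshold. The window length of 5 readings and the 100 bpm threshold are hypothetical values chosen for illustration, not clinical criteria, and real devices use far more sophisticated signal processing.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple


def rolling_alerts(readings: Iterable[float], window: int = 5,
                   threshold: float = 100.0) -> Iterator[Tuple[int, float]]:
    """Yield (index, rolling mean) whenever the mean of the last `window`
    readings exceeds `threshold`. Readings are processed one at a time,
    as they would arrive from a device."""
    recent: deque = deque(maxlen=window)  # oldest reading drops out automatically
    for i, value in enumerate(readings):
        recent.append(value)
        if len(recent) == window:
            mean = sum(recent) / window
            if mean > threshold:
                yield i, mean


if __name__ == "__main__":
    heart_rate = [72, 75, 74, 78, 80, 110, 115, 120, 118, 122, 76]  # toy data, bpm
    for index, mean in rolling_alerts(heart_rate):
        print(f"alert at reading {index}: rolling mean {mean:.1f} bpm")
```

Because the function is a generator, it can consume an unbounded stream and react as each new reading arrives, which is the property that real-time risk assessment depends on.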

However, big data analysis has limitations in its application to cardiovascular health. Despite high accuracy, ML models can add complexity and obscure interpretation, making it very difficult for physicians to understand the factors behind the predictions [32]. At the data level, demographic biases and underrepresentation can be mitigated by resampling, that is, rebalancing the training data by oversampling minority groups or undersampling majority groups.
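The resampling strategy just described can be sketched with Python's standard library. The functions below rebalance a list of records by a demographic group key, either by randomly duplicating records from smaller groups (random oversampling) or by randomly discarding records from larger groups (random undersampling). The record layout and the `group` field name are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def _group(records: Sequence[dict], key: str) -> Dict[object, List[dict]]:
    groups: Dict[object, List[dict]] = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)
    return groups


def random_oversample(records: Sequence[dict], key: str, seed: int = 0) -> List[dict]:
    """Duplicate randomly drawn records from smaller groups until every group
    is as large as the largest one."""
    rng = random.Random(seed)
    groups = _group(records, key)
    target = max(len(rows) for rows in groups.values())
    balanced: List[dict] = []
    for rows in groups.values():
        balanced.extend(rows)
        balanced.extend(rng.choice(rows) for _ in range(target - len(rows)))
    rng.shuffle(balanced)
    return balanced


def random_undersample(records: Sequence[dict], key: str, seed: int = 0) -> List[dict]:
    """Keep a random subset of each group, sized to the smallest group."""
    rng = random.Random(seed)
    groups = _group(records, key)
    target = min(len(rows) for rows in groups.values())
    balanced = [r for rows in groups.values() for r in rng.sample(rows, target)]
    rng.shuffle(balanced)
    return balanced


if __name__ == "__main__":
    data = [{"id": i, "group": "A"} for i in range(8)] + \
           [{"id": i, "group": "B"} for i in range(8, 10)]
    print(len(random_oversample(data, "group")), len(random_undersample(data, "group")))
```

Resampling of this kind should be applied to the training split only, because duplicated records leaking into a test set would inflate performance estimates.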

Table 3 summarizes various models used for cardiac data analysis, detailing datasets, algorithms, features, and accuracy rates. Techniques like Learning Vector Quantization (LVQ), Recurrent Neural Networks (RNN), Multilayer Perceptron (MLP), and Classification and Regression Tree (CART) are applied to datasets such as UCI, Cleveland, and Framingham. Accuracy rates range from 71% to 98%, highlighting the efficiency of different approaches like LVQ and Multinomial Logistic Regression (MLR).
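Because Table 3 compares models by accuracy, it may help to recall how accuracy relates to sensitivity and specificity. The Python sketch below computes all three from binary labels. The toy example assumes a 5% disease prevalence and shows that a model that never predicts disease still reaches 95% accuracy while detecting no cases, which is why accuracy figures are best read alongside other metrics.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity (recall for the disease class) and specificity
    from binary labels, where 1 = disease and 0 = no disease."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
    }


if __name__ == "__main__":
    truth = [1] * 5 + [0] * 95   # hypothetical cohort with 5% prevalence
    always_negative = [0] * 100  # a trivial model that never predicts disease
    print(classification_metrics(truth, always_negative))
```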

5. Personalized Medicine

Personalized medicine, or precision medicine, is a revolutionary change in healthcare that concentrates on customizing medical treatments according to the unique characteristics of each patient. This method differs from the traditional “one-size-fits-all” approach. In the field of personalized medicine, the utilization of BDA has created opportunities for innovative and targeted treatment approaches. These tactics rely on combining genomics, precision medicine, and advanced data analytics to enhance the precision of diagnoses and outcomes. The Apple Heart Study analysed data from 400,000 participants using smartwatches to detect irregular pulses. This wearable-derived data, when integrated with EHRs for validating atrial fibrillation detection, achieved 84% concordance with clinical ECG findings [37].

5.1. Role of Genomics and Precision Medicine

Personalized medicine has paved the way for FDA-approved treatments targeting the specific characteristics of individuals, including not only a person’s genetic makeup but also the genetic profile of their tumour. Patients with many cancers now have access to molecular testing, allowing physicians to select treatments that improve survival chances while decreasing exposure to adverse effects [38]. Nevertheless, greater integration of faster processors, larger memory, highly advanced algorithms and methodologies, and cloud computing may become critical for managing the growing volume of clinical information [39].

A patient’s genetic information can reveal propensities for specific cardiovascular diseases, such as coronary artery disease or atrial fibrillation. For example, certain gene variants cause differences in how individuals metabolize selected medications. This is the basis of pharmacogenomics, in which medication choice and dosing are tailored to the patient’s genotype, aiming to maximize efficacy while minimizing adverse effects. Integrated genomic-epidemiology studies of cardiovascular conditions involving many genes and single nucleotide polymorphisms (SNPs) have identified genetic contributions to elevated cholesterol and other lipid disorders, hypertension, and blood pressure, which underlie atherosclerosis and myocardial infarction [40]. On average, a SNP occurs roughly once every 1000 base pairs of the approximately 3 billion base pairs in the human genome sequence, and across human populations there are likely more than 10,000,000 SNPs, providing ample variation for SNP genotyping [41].
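Assuming roughly one SNP per 1000 base pairs, as stated above, a quick order-of-magnitude check for a single genome is:

$$
\frac{3 \times 10^{9}\ \text{bp}}{10^{3}\ \text{bp per SNP}} = 3 \times 10^{6}\ \text{SNPs per genome}
$$

The population-wide total of more than 10 million is larger than this per-genome count because different individuals carry different subsets of variant sites.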

Network analysis performed from a precision medicine perspective can cluster individuals based purely on endophenotype. It can help optimize medication and behavioural changes that improve health, as well as disease prediction and prognosis, disease biomarker identification, enrichment of clinical trial enrolment, and the development of interventions that can be optimally tailored to the patient [42].
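One simple way to cluster individuals by endophenotype with a network is sketched below in Python: each individual is a node, an edge links two individuals whose endophenotype profiles lie within a distance threshold, and clusters are the connected components of the resulting graph (found here with a union-find structure). This threshold-graph approach is a swapped-in illustration, not necessarily the method used in [42]; the profile values, which stand for hypothetical standardized measurements, and the threshold are assumptions.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence


def similarity_clusters(profiles: Dict[str, Sequence[float]], eps: float) -> List[List[str]]:
    """Cluster individuals as connected components of a patient-similarity
    network in which an edge joins two profiles within Euclidean distance eps."""
    parent = {pid: pid for pid in profiles}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in combinations(profiles, 2):
        if math.dist(profiles[a], profiles[b]) <= eps:
            parent[find(b)] = find(a)  # merge the two components

    clusters: Dict[str, List[str]] = {}
    for pid in profiles:
        clusters.setdefault(find(pid), []).append(pid)
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    endophenotypes = {  # hypothetical z-scores, e.g. LDL-C, systolic BP, hs-CRP
        "P1": (1.0, 0.2, 0.1), "P2": (1.1, 0.3, 0.0),
        "P3": (-0.5, 1.5, 1.2), "P4": (-0.4, 1.4, 1.3),
        "P5": (0.0, -1.5, 0.0),
    }
    print(similarity_clusters(endophenotypes, eps=0.5))
```

The threshold controls granularity: a small value yields many tight clusters, and a large one merges everyone into a single component.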

The BDA approach can overcome the limits of traditional genome-wide association studies (GWAS) in understanding genetic variation and its interactions with environmental factors. Large-scale GWAS have identified many genetic loci significantly associated with blood pressure and hypertension, and data from the Exome Aggregation Consortium have shown that some mutations, such as those in the BMPR2 gene that are associated with pulmonary arterial hypertension, have variable penetrance. By integrating multiple data sources and employing novel analytical frameworks, big data analysis advances precision medicine towards targeted approaches and treatment modalities for cardiovascular diseases [43].
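At each variant, a GWAS typically tests whether allele frequencies differ between cases and controls. The Python sketch below shows one common version of that per-variant step, a Pearson chi-square test with one degree of freedom on a 2 × 2 table of allele counts, together with the allelic odds ratio; the counts are invented for illustration. Real GWAS pipelines usually use regression models with covariates such as ancestry principal components, and apply a genome-wide significance threshold, conventionally 5 × 10⁻⁸, to account for the millions of tests performed.

```python
import math
from typing import Tuple


def allelic_test(case_alt: int, case_ref: int,
                 ctrl_alt: int, ctrl_ref: int) -> Tuple[float, float, float]:
    """Pearson chi-square (1 df) on a 2x2 table of allele counts.

    Returns (chi-square statistic, p-value, allelic odds ratio)."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    return chi2, p_value, odds_ratio


if __name__ == "__main__":
    # Invented counts: 1000 cases and 1000 controls, i.e. 2000 alleles each.
    chi2, p, odds = allelic_test(case_alt=600, case_ref=1400, ctrl_alt=500, ctrl_ref=1500)
    print(f"chi2={chi2:.2f}, p={p:.2e}, OR={odds:.2f}, genome-wide significant: {p < 5e-8}")
```

With these counts the association is nominally significant but falls far short of the genome-wide threshold, which illustrates why GWAS need very large samples.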

5.2. Tailoring Treatments Based on Individual Data

Nanobiotechnology and stem-cell therapy are among the treatment modalities for cardiovascular disease being developed as personalized medicine.

Nanoparticles can serve dual functions, offering both molecular imaging and therapeutic delivery. Magnetic nanoparticles (MNPs), most commonly used as MRI contrast agents, have been reported to detect high-risk atherosclerotic plaque and to reveal active vascular inflammation or thrombosis [44]. Such nanoparticles may be modified with therapeutic agents, such as plasmid DNA for gene delivery, or used for targeted therapy verified through imaging modalities [45]. Additionally, perfluorocarbon nanoparticles could combine molecular imaging with localized drug delivery, providing an enhanced targeting approach for therapeutic interventions. Work on lncRNA AK083884 and on nanozyme-enhanced tyramine signal amplification probes has shown protection from viral myocarditis, and the genetic implications may have future value in preventing the disease [46,47].

In addition, personalized cell therapy offers the possibility of deriving induced pluripotent stem cells (iPSCs) from a patient’s somatic cells, allowing the creation of cardiomyocytes and bioengineered pacemakers adapted to individual disease profiles. These advances show how individualized data can transform therapeutic strategies into precision treatment options in cardiovascular care [48].

Model-efficiency techniques in AI include low-rank adaptation (LoRA) and quantization, among others. LoRA adapts a large pretrained model to a new task by training small low-rank matrices that are added to its frozen weights, whereas quantization stores model weights at lower numerical precision to reduce memory use and computation. For example, AI models that predict how patients respond to CVD treatments could use LoRA to be fine-tuned efficiently on extensive datasets, which may enhance clinical decision-making. Together, LoRA and quantization could make individualized treatment tools that are not only precise but also widely deployable, improving the management of CVDs across settings [48].
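To make the two techniques concrete, the Python sketch below shows (i) the LoRA update, in which a frozen weight matrix W of size d × k is adapted by adding a scaled product of two small trainable matrices B (d × r) and A (r × k), so that only r(d + k) parameters are trained instead of d·k; and (ii) symmetric 8-bit quantization, which stores each weight as an integer in [−127, 127] together with one floating-point scale. The matrix sizes, rank, scaling factor alpha, and example values are illustrative assumptions; production systems use optimized tensor libraries rather than nested Python lists.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of an (m x n) and an (n x p) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def lora_merge(w: Matrix, b: Matrix, a: Matrix, alpha: float, r: int) -> Matrix:
    """Merged LoRA weight: W' = W + (alpha / r) * (B @ A)."""
    delta = matmul(b, a)
    scale = alpha / r
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))] for i in range(len(w))]


def trainable_params(d: int, k: int, r: int) -> Tuple[int, int]:
    """(full fine-tuning parameters, LoRA parameters) for one d x k weight matrix."""
    return d * k, r * (d + k)


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization: q = round(v / s) with s = max|v| / 127."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    return [max(-127, min(127, round(v / scale))) for v in values], scale


def dequantize(q: List[int], scale: float) -> List[float]:
    return [x * scale for x in q]


if __name__ == "__main__":
    print(trainable_params(4096, 4096, 8))  # full vs LoRA parameter counts
    merged = lora_merge([[1.0, 0.0], [0.0, 1.0]], [[1.0], [2.0]], [[0.5, 0.5]], alpha=2.0, r=1)
    print(merged)
    q, s = quantize_int8([0.5, -1.27, 0.0, 1.27])
    print(q, dequantize(q, s))
```

For a 4096 × 4096 layer with rank 8, LoRA trains 65,536 parameters instead of about 16.8 million, which is the source of its efficiency.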

Familial hypercholesterolaemia (FH) is a monogenic disease characterized by markedly elevated levels of LDL-C, with early development of atherosclerosis and consequent cardiovascular disease in youth. FH caused by LDLR mutations can be homozygous (HoFH), with functional deleterious mutations in both alleles, or heterozygous (HeFH), with a deficiency in one allele. In patients with residual LDLR activity, the PCSK9-inhibiting monoclonal antibodies Evolocumab and Alirocumab reduce LDL-C levels. In HoFH patients bearing null mutations in both LDLR alleles, ANGPTL3 inhibitors such as Evinacumab reduce LDL-C substantially. Inclisiran, a small interfering RNA (siRNA), inhibits hepatic PCSK9 synthesis and allows a less frequent dosing schedule. The ability of these therapies to individualize treatment based on genetic profiles and to improve outcomes for patients with FH represents another landmark towards personalized medicine [49].

5.3. Using Big Data for Public Health Initiatives

The goal of ML and AI is to develop models that learn from multiple data types to make predictions, similar to human cognitive functions. Multimodal ML combines multiple data types, such as images, text, and speech, to create more robust models; learning representations from multimodal input sources is called multimodal learning [50]. In the context of cardiovascular illness, merging information from several measurement modalities, such as imaging, text, and genetic data, is known as data fusion. Recent evidence highlights the application of multimodal learning and data fusion to improve risk stratification and decision-making for cardiovascular diseases. Researchers combine genomic data, cardiac imaging, and EHRs to identify individuals at high risk of major adverse cardiovascular events (MACE) with unprecedented accuracy. For instance, the UK Biobank and the Million Veteran Program integrate genetic markers and quantified imaging studies with EHRs, which may be used to enhance prognostic and therapy-response prediction for cardiovascular illness [51]. Moreover, AI-driven systems trained on large-scale datasets such as the UK Biobank can offer personalized recommendations for therapeutic interventions by predicting the likelihood of specific responses to treatments.
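Two common ways of fusing modalities can be sketched briefly in Python. Early (feature-level) fusion concatenates each modality's feature vector into one input for a single model, whereas late (decision-level) fusion trains a model per modality and combines their predicted probabilities, here by a weighted average. The modality names, feature values, and weights below are hypothetical and are not drawn from the UK Biobank, the Million Veteran Program, or any cited study.

```python
from typing import Dict, List, Sequence


def early_fusion(features: Dict[str, Sequence[float]], order: Sequence[str]) -> List[float]:
    """Feature-level fusion: concatenate per-modality vectors in a fixed order,
    so every patient's fused vector has the same layout."""
    fused: List[float] = []
    for modality in order:
        fused.extend(features[modality])
    return fused


def late_fusion(probabilities: Dict[str, float], weights: Dict[str, float]) -> float:
    """Decision-level fusion: weighted average of per-modality probabilities."""
    total_weight = sum(weights[m] for m in probabilities)
    return sum(weights[m] * p for m, p in probabilities.items()) / total_weight


if __name__ == "__main__":
    patient = {
        "genetic": [0.8],        # e.g. a standardized polygenic score
        "imaging": [0.3, -0.1],  # e.g. standardized imaging measurements
        "ehr": [1.2, 0.0],       # e.g. standardized clinical variables
    }
    print(early_fusion(patient, order=["genetic", "imaging", "ehr"]))
    per_model = {"genetic": 0.2, "imaging": 0.4, "ehr": 0.6}  # outputs of separate models
    print(late_fusion(per_model, weights={"genetic": 1.0, "imaging": 1.0, "ehr": 2.0}))
```

Early fusion lets one model learn cross-modal interactions but needs every modality for every patient; late fusion tolerates missing modalities more easily but can only combine each model's final outputs.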

5.4. Identifying Trends and Patterns in CVD

Risk Prediction Models: Traditional cardiovascular risk prediction models use only a few clinical parameters, such as age, sex, blood pressure, and cholesterol levels. BDA can make these models more powerful by incorporating increasingly large pools of data, including imaging and electronic medical records (EMRs). Chaves et al. developed an advanced risk prediction framework that combined deep learning techniques with abdominopelvic CT imaging features and EMR data. Their approach outperformed traditional risk scores for the prediction of ischaemic heart disease (IHD). The higher predictive power was likely achieved because integrating imaging with EMR data could reveal interactions and correlations that were not previously obvious [52].

5.5. Improving Disease Detection and Diagnosis

a.: Acute Cardiovascular Disease Detection: Pattern recognition via BDA makes it possible to detect acute cardiovascular diseases at early stages with high accuracy. Zhang et al. [53] proposed a multimodal strategy that fuses ECG, phonocardiograms, echocardiography, Holter monitoring, and biological markers for coronary artery disease (CAD) detection, achieving high diagnostic accuracy by exploiting the complementary information among the different data modalities [52].
b.: Severity Assessment: The appraisal of cardiovascular disease severity can be greatly improved by integrating several imaging techniques [52]; for example, combining echocardiography and cardiac MRI improves the prediction of sudden cardiac death in patients with dilated cardiomyopathy. This multimodal method provides a more complete evaluation of cardiac function and structure and increases the accuracy of severity appraisal [54].
c.: Early Identification of At-Risk Populations: BDA also enables the early identification of populations at risk of CVD by using lifestyle, genetic, and environmental data. Studies have demonstrated the integration of wearable device data, genomics, and social determinants of health to identify individuals at high risk of atrial fibrillation. This proactive approach facilitates timely preventive interventions, which can reduce the burden of disease and healthcare costs.

In the study by Chaves et al., segmentation-only, imaging-only, clinical-only, and fusion machine learning models were developed and validated for a 5-year ischaemic heart disease prediction task. The “Imaging + Clinical Fusion” model identified more people at high risk of 5-year ischaemic heart disease than standard-of-care tools such as the Framingham Risk Score (AUC 0.862 vs. 0.776) [55]. Illustrating the potential of multimodal data fusion for reducing cardiovascular disease risk and for early detection, Ali et al. used data from mobile and medical sensors (e.g., blood pressure, ECG, EMG), electronic medical records (EMRs), and other relevant sources to develop a multimodal-fusion ML model capable of automatically giving recommendations for cardiovascular treatment. The main phases were data collection, feature extraction, feature fusion, and prediction using deep learning techniques. Sensor and EMR data were first fused into a feature matrix, with preprocessing steps such as normalization and feature selection, before deep learning was applied for disease detection and prescription planning. The study demonstrated how generating personalized recommendations from patient data acquired with low-cost wearable sensors or other devices can make cardiovascular disease care more accessible and precise, and it illustrates the disruptive changes that big data analytics and machine learning bring to patient disease management and prescription planning [56].
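The AUC values quoted above (0.862 vs. 0.776) have a convenient interpretation: the AUC is the probability that a randomly chosen person who develops the disease receives a higher predicted risk than a randomly chosen person who does not. The Python sketch below computes it directly from that pairwise definition (equivalent to the Mann–Whitney U statistic scaled to [0, 1]), counting ties as one half; the scores and labels are toy values, not data from the cited studies.

```python
from typing import Sequence


def auc_pairwise(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (case, non-case) pairs in which the case has the
    higher score; tied pairs count as 0.5. O(n_cases * n_noncases)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    non_cases = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for c in cases:
        for n in non_cases:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(non_cases))


if __name__ == "__main__":
    risk = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]  # toy predicted risks
    outcome = [1, 1, 0, 1, 0, 0, 1, 0]                 # toy observed outcomes
    print(auc_pairwise(risk, outcome))
```

For large cohorts a rank-based formula avoids the quadratic loop, but the pairwise version makes the meaning of the statistic explicit.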

6. Drug and Medical Device Safety Surveillance

Tracking adverse events is an essential component of post-market surveillance. The FDA’s Manufacturer and User Facility Device Experience (MAUDE) database is the main source of adverse event reports; however, because it relies on passive reporting, events may be underreported and thorough exposure data are lacking [57]. Research suggests that incorporating unique device identifiers (UDIs) into electronic health records (EHRs) may improve the monitoring of device-related adverse events, which in turn may enhance the identification of safety signals [58].

Furthermore, active surveillance methods, such as the Data Extraction and Longitudinal Trend Analysis (DELTA) network, have demonstrated promise in detecting new safety issues via automated signal detection [59]. Real-time data analysis has emerged as one of the most important tools in medical device safety surveillance in recent years. The FDA’s Sentinel Initiative is one example of this approach, which uses electronic health data to monitor drug and device safety [60]. Emerging technologies, such as natural language processing (NLP) and machine learning algorithms, are now being used to analyse unstructured data within EHRs, such as physician notes and patient narratives, offering enhanced capabilities for detecting safety signals beyond traditional systems like MAUDE [59].
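Automated signal detection in spontaneous-report databases such as MAUDE often relies on disproportionality statistics. As one illustration, the Python sketch below computes the proportional reporting ratio (PRR), which compares how often an event of interest is reported for one product with how often it is reported for all other products, along with an approximate 95% confidence interval. PRR is a swapped-in example of the general idea and is not claimed to be the method used by DELTA or Sentinel; the counts are invented, and the screening rule (PRR of at least 2, at least 3 reports, lower confidence bound above 1) is an assumption for illustration.

```python
import math
from typing import Tuple


def prr_with_ci(a: int, b: int, c: int, d: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Proportional reporting ratio from a 2x2 table of report counts:
        a = target product, event of interest    b = target product, other events
        c = other products, event of interest    d = other products, other events
    Returns (PRR, lower bound, upper bound) using a log-normal approximation."""
    prr = (a / (a + b)) / (c / (c + d))
    se_log = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(prr) - z * se_log)
    upper = math.exp(math.log(prr) + z * se_log)
    return prr, lower, upper


def is_signal(a: int, b: int, c: int, d: int) -> bool:
    """Hypothetical screening rule: PRR >= 2, at least 3 reports, lower bound > 1."""
    prr, lower, _ = prr_with_ci(a, b, c, d)
    return prr >= 2 and a >= 3 and lower > 1


if __name__ == "__main__":
    print(prr_with_ci(20, 180, 500, 19300), is_signal(20, 180, 500, 19300))
    print(prr_with_ci(5, 195, 500, 19300), is_signal(5, 195, 500, 19300))
```

A flagged signal of this kind is only a prompt for clinical review, since passive reports lack the exposure denominators needed to estimate true incidence.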

Case studies, such as those involving endovascular aneurysm repair devices, show how real-time data can inform safety assessments and improve patient outcomes, and they highlight the usefulness of linked registry and claims data for long-term surveillance [61]. For example, registry-based active surveillance, as demonstrated by Resnic et al. [62], successfully linked patient registries with claims data to detect safety issues with endovascular aneurysm repair devices, directly influencing clinical guidelines and improving outcomes. Other studies point to the limitations of current approaches: a survey of medical device manufacturers found that most post-market clinical research relies on single-arm trials and registries and that these methods frequently lack the statistical power required for thorough safety evaluations [63].

Furthermore, the examination of incidents involving ventilators has demonstrated that improving healthcare personnel’s knowledge and training can greatly lower adverse events [64]. In summary, a multimodal strategy that incorporates case studies to guide best practices, real-time data analysis, and the strict monitoring of side effects is needed for efficient drug and medical device safety surveillance.

7. Quality of Care and Performance Measurement

In cardiovascular medicine, comprehensive data analytics supports the ongoing evaluation of quality through various performance metrics. Common indicators include mortality rates, adherence to clinical guidelines, hospital readmission rates, and the timely use of treatments such as percutaneous coronary intervention (PCI) within specified timeframes. Newer indicators have emerged, such as tracking frailty in heart failure patients using EHR data and using machine learning to detect patterns linked to poor outcomes. A systematic review showed that hospitals using data analytics platforms to monitor adherence to clinical protocols in real time achieved lower mortality and readmission rates, illustrating how data-driven monitoring improves overall care quality [64].
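As an example of how such a performance metric can be computed from routinely collected data, the Python sketch below estimates an all-cause 30-day readmission rate from admission records: an index stay counts as readmitted if the same patient is admitted again within 30 days of discharge. The record format and toy data are hypothetical, and formal quality measures apply additional rules (for example, excluding planned readmissions and transfers) that this sketch omits.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List, Tuple

Stay = Tuple[date, date]  # (admission date, discharge date)


def readmission_rate(admissions: Iterable[Tuple[str, date, date]], window_days: int = 30) -> float:
    """Fraction of index stays followed by another admission of the same
    patient within `window_days` of discharge."""
    by_patient: Dict[str, List[Stay]] = defaultdict(list)
    for patient_id, admitted, discharged in admissions:
        by_patient[patient_id].append((admitted, discharged))
    index_stays = 0
    readmitted = 0
    for stays in by_patient.values():
        stays.sort()  # chronological order by admission date
        for i, (_, discharged) in enumerate(stays):
            index_stays += 1
            if i + 1 < len(stays):
                gap = (stays[i + 1][0] - discharged).days
                if 0 <= gap <= window_days:
                    readmitted += 1
    return readmitted / index_stays if index_stays else 0.0


if __name__ == "__main__":
    records = [
        ("A", date(2024, 1, 1), date(2024, 1, 5)),
        ("A", date(2024, 1, 20), date(2024, 1, 25)),  # readmitted after 15 days
        ("B", date(2024, 2, 1), date(2024, 2, 4)),
        ("B", date(2024, 4, 1), date(2024, 4, 3)),    # 57 days later: not counted
        ("C", date(2024, 3, 10), date(2024, 3, 12)),
    ]
    print(readmission_rate(records))
```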

A key study examined the use of neural networks to predict the risk of gestational diabetes mellitus (GDM). The researchers analysed a dataset from the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) using a radial basis function network (RBFNetwork) algorithm. The findings showed that the model could predict GDM, which could allow doctors to intervene early and reduce unfavourable outcomes for mothers and babies [65].
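For readers unfamiliar with radial basis function networks, the Python sketch below shows the forward pass of a minimal one. Each hidden unit responds with a Gaussian bump, exp(−‖x − c‖² / (2σ²)), according to how close an input x lies to that unit's centre c, and the output is a logistic function of a weighted sum of these responses. The two centres, widths, weights, and two-feature inputs (imagined as standardized measurements such as plasma glucose and BMI) are hypothetical; in practice centres are often chosen by clustering and the output weights are fitted to training data, which is not shown here, and this is not the configuration used in the cited study.

```python
import math
from typing import Sequence


def rbf_predict(x: Sequence[float], centres: Sequence[Sequence[float]],
                widths: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Forward pass of a Gaussian RBF network with a logistic output unit."""
    hidden = [math.exp(-math.dist(x, c) ** 2 / (2 * s ** 2))
              for c, s in zip(centres, widths)]
    z = sum(w * h for w, h in zip(weights, hidden)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical parameters: one centre near a high-risk profile, one near a low-risk profile.
CENTRES = [(1.0, 1.0), (-1.0, -1.0)]
WIDTHS = [1.0, 1.0]
WEIGHTS = [3.0, -3.0]
BIAS = 0.0

if __name__ == "__main__":
    print(rbf_predict((1.0, 1.0), CENTRES, WIDTHS, WEIGHTS, BIAS))
    print(rbf_predict((-1.0, -1.0), CENTRES, WIDTHS, WEIGHTS, BIAS))
```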

8. Challenges and Future Directions

By enhancing patient outcomes, predicting outbreaks, and providing deep insights, big data not only supports prevention but also opens doors to an improved quality of life. Ultimately, big data may transform healthcare to deliver improved health to individuals and communities alike [66].

One challenge is that handling the data often proves easier said than done, from the basic tasks of capturing, collecting, and storing it to more complex tasks such as analysis and visualization. There are many issues regarding data structure: rather than being well organized and openly accessible, big data are often fragmented, scattered across sources, and rarely consistent from one source to another. Data security is a major issue in storage and transfer (especially given the costs of securing, storing, and transferring unstructured data), and there is a shortage of managerial and analytical skills, particularly for real-time analytics [67].

The extensive collection of medical data from a variety of sources poses a significant challenge for data scientists regarding its integration and practical application. To advance healthcare, it is crucial to merge bioinformatics, health informatics, and analytics to foster personalized and effective treatments. The arrival of big data has already resulted in remarkable progress in healthcare, spanning from data management to drug discovery for intricate diseases. Instead of replacing specialists, big data will bolster advancements in healthcare, redirecting attention towards personalized care and predictive analytics. Future applications include forecasting health outcomes, predicting epidemics, and discovering new biomarkers and therapies, ultimately improving quality of life.

The ethical considerations associated with big data in healthcare are complex and multidimensional, covering a range of issues that significantly affect individuals, institutions, and society as a whole.

1. Informed Consent: Individuals may consent to the collection and use of their data without fully understanding its potential future applications, particularly because data can be repurposed, aggregated, and shared across diverse platforms [68]. For example, Google’s Project Nightingale collected healthcare data from millions of patients without their consent, leading to public backlash and calls for stricter transparency and consent protocols. By contrast, IBM’s AI ethics initiatives emphasize transparency and explainability, requiring that AI decision-making processes be understandable to stakeholders (7 Essential Data Ethics Examples for Businesses in 2025).
2. Privacy and Confidentiality: Protecting the privacy and confidentiality of sensitive health data is a primary concern in big data applications [69]. Anonymization and de-identification techniques are routinely used, but they are not foolproof. Advances in data linkage and re-identification techniques have shown that individuals can still be identified by combining disparate data sources [70].
3. Data Ownership and Control: Data ownership is a central ethical challenge in the big data landscape [71]. It is often unclear whether health data belong to the patients who generate them, the providers who record them, or the institutions that store and analyse them. This raises questions about the rights of data subjects and whether they should share in the benefits derived from their data, particularly when institutions or corporations derive financial or intellectual gains.
4. Equity and the Big Data Divide: The capacity to harness big data for healthcare innovation is disproportionately concentrated among institutions with advanced technological infrastructure, deep financial resources, and sophisticated analytical expertise. This concentration creates a “big data divide”, in which institutions and populations with fewer resources may be left behind, exacerbating existing health disparities [72].
5. Epistemological Challenges: The vast scale of big data in healthcare can encourage over-reliance on correlation-driven insights, often without a clear understanding of the underlying causal mechanisms. This poses significant epistemological challenges, as decisions based on superficial correlations may lead to erroneous conclusions and suboptimal interventions [73].
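The re-identification risk noted under Privacy and Confidentiality is often assessed with k-anonymity. A released table is k-anonymous if every combination of quasi-identifiers is shared by at least k records. Quasi-identifiers are attributes such as age band, sex, and partial postcode, which do not identify anyone on their own but can be linked to outside data. The sketch below uses invented records and a hypothetical choice of quasi-identifiers to compute k and list the combinations that fall below a target.

```python
from collections import Counter

def equivalence_classes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)

def k_anonymity(records, quasi_identifiers):
    """Return k, the size of the smallest equivalence class (higher is safer)."""
    return min(equivalence_classes(records, quasi_identifiers).values())

def risky_groups(records, quasi_identifiers, k):
    """List quasi-identifier combinations shared by fewer than k records."""
    classes = equivalence_classes(records, quasi_identifiers)
    return [combo for combo, n in classes.items() if n < k]

records = [
    {"age_band": "50-59", "sex": "F", "zip3": "380", "diagnosis": "hypertension"},
    {"age_band": "50-59", "sex": "F", "zip3": "380", "diagnosis": "CAD"},
    {"age_band": "60-69", "sex": "M", "zip3": "382", "diagnosis": "AF"},
    {"age_band": "60-69", "sex": "M", "zip3": "382", "diagnosis": "heart failure"},
    {"age_band": "70-79", "sex": "M", "zip3": "396", "diagnosis": "CAD"},
]
qi = ["age_band", "sex", "zip3"]
print(k_anonymity(records, qi))        # 1
print(risky_groups(records, qi, k=2))  # [('70-79', 'M', '396')]
```

A record that is unique on its quasi-identifiers (k = 1) is the easiest to link to an external source. k-anonymity alone does not prevent attribute disclosure, for example when everyone in a group shares the same diagnosis. Extensions such as l-diversity address this.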

Table 4 presents the key challenges of big data analytics (BDA) in healthcare, along with relevant examples.

9. Future Directions: Balancing Core Considerations

A consistent patient consent framework is essential. Patient consent drives data availability, and evidence suggests that individuals are generally willing to share their health data when asked [80]. The flexibility of HIPAA rules allows consent forms to be combined across research studies, making the process more efficient [81]. Dynamic consent forms tailored to the specific needs of each study and patient record could streamline this process further by including only the relevant elements [82].
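Dynamic consent can be pictured as a per-patient register of purpose-specific decisions that is consulted every time data are extracted. The sketch below uses a hypothetical data model to filter a study dataset to currently consenting patients and to show a withdrawal taking effect. The purpose label, the opt-in default, and the "latest decision wins" rule are assumptions, not requirements of HIPAA or any published standard.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class ConsentRecord:
    patient_id: str
    # Purpose label -> (granted?, date of the most recent decision). Labels are hypothetical.
    decisions: dict = field(default_factory=dict)

    def update(self, purpose: str, granted: bool, when: date) -> None:
        """Record a decision; an older decision never overrides a newer one."""
        previous = self.decisions.get(purpose)
        if previous is None or when >= previous[1]:
            self.decisions[purpose] = (granted, when)

    def allows(self, purpose: str) -> bool:
        """Opt-in model: no recorded decision means no consent."""
        decision = self.decisions.get(purpose)
        return decision is not None and decision[0]

def eligible_records(records, consents, purpose):
    """Keep only records whose patient currently consents to this purpose."""
    return [r for r in records
            if r["patient_id"] in consents and consents[r["patient_id"]].allows(purpose)]

consents = {pid: ConsentRecord(pid) for pid in ("P1", "P2", "P3")}
consents["P1"].update("cvd_risk_model", True, date(2024, 3, 1))
consents["P2"].update("cvd_risk_model", True, date(2024, 3, 1))
consents["P2"].update("cvd_risk_model", False, date(2024, 9, 1))  # later withdrawal
records = [{"patient_id": p, "ldl": v} for p, v in (("P1", 130), ("P2", 160), ("P3", 110))]
print([r["patient_id"] for r in eligible_records(records, consents, "cvd_risk_model")])  # ['P1']
```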

Data de-identification is another crucial security measure. Despite concerns about the ease of re-identification, research suggests that attacks on fully HIPAA-compliant de-identified datasets are relatively uncommon [83]. Big data platforms make de-identified data more useful for research by enabling the statistical analysis of large datasets and longitudinal studies without disclosing private health information [83]. Where data segmentation is inadequate, healthcare institutions should establish reliable data-use rules that protect individuals from potential harm. Encryption and authorization-verification processes further enhance data security while fostering transparency and trust in big data research.
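The sketch below illustrates the de-identification measures described above. It applies a few rules in the spirit of the HIPAA Safe Harbor method:

- dropping direct identifiers;
- keeping only the year of dates;
- truncating ZIP codes to three digits;
- pooling ages of 90 and over.

It also replaces the medical record number with a random token held in a separate lookup table. A patient's records can therefore still be linked longitudinally, while the token itself is not derived from patient information. This is a simplified illustration, not a complete Safe Harbor implementation. The method covers 18 identifier types and imposes further conditions; for example, three-digit ZIP areas with 20,000 or fewer residents must be replaced with 000. The field names are hypothetical.

```python
import secrets

# Partial, illustrative list of direct identifiers (Safe Harbor names 18 identifier types).
DIRECT_IDENTIFIERS = {"name", "phone", "email", "street_address"}

class Pseudonymizer:
    """Assigns each MRN a random token; the lookup table must be stored securely and separately."""
    def __init__(self):
        self._table = {}

    def token(self, mrn: str) -> str:
        if mrn not in self._table:
            self._table[mrn] = secrets.token_hex(8)  # random, not derived from the MRN
        return self._table[mrn]

def deidentify(record: dict, pseudonymizer: Pseudonymizer) -> dict:
    """Return a copy with direct identifiers removed and quasi-identifiers generalized."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    out["pid"] = pseudonymizer.token(out.pop("mrn"))
    out["admit_year"] = out.pop("admit_date")[:4]  # ISO date string -> year only
    out["zip3"] = out.pop("zip")[:3]               # small-population (000) rule omitted here
    age = out.pop("age")
    out["age"] = "90+" if age >= 90 else age       # ages 90 and over are pooled
    return out

p = Pseudonymizer()
raw = {"mrn": "MRN-0042", "name": "A. Patient", "phone": "555-0100", "zip": "33328",
       "admit_date": "2024-05-17", "age": 93, "diagnosis": "NSTEMI", "ldl_mgdl": 142}
print(deidentify(raw, p))
```

The lookup table is effectively the re-identification key. In practice it would be encrypted and protected by the same authorization checks described in the text.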

10. Conclusions

The dawn of a new era of BDA holds enormous potential to provide physicians and other healthcare professionals with a personalized, individualized approach to patient treatment based on vast amounts of comprehensive data. It is increasingly apparent that BDA will evolve into a new way of practising medicine.

Biomedical and healthcare tools such as genomics, biometric sensors, and smartphone apps now generate vast amounts of data, creating a need to better understand how to leverage this information. Integrating big data from electronic health records (EHRs), electronic medical records (EMRs), and other sources helps refine prognostic models and treatment strategies. Healthcare analytics firms aim to lower costs, enhance clinical decision support (CDS) systems, and develop effective platforms, while facing challenges related to data privacy and security. The vast data pool from healthcare and biomedical research has led to improved disease diagnosis, treatment, and prevention. Advances in computing, including supercomputers and, prospectively, quantum computing, can accelerate the extraction of actionable insights from big data. Despite infrastructure challenges, BDA continues to drive innovation in clinical practice and personalized healthcare.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflict of interest.

References

Kim, D.S.; Park, J.W. Medical big data: Promise and challenges. Kidney Res. Clin. Pract. 2017, 36, 3–11. Available online: https://www.krcp-ksn.org/journal/view.php?id=10.23876/j.krcp.2017.36.1.3 (accessed on 14 September 2024).
Silverio, A.; Cavallo, P.; De Rosa, R.; Galasso, G. Big health data and cardiovascular diseases: A challenge for research, an opportunity for clinical care. Front. Med. 2019, 6, 36.
American Heart Association. Heart disease and stroke statistics—2019 update: A report from the American Heart Association. Circulation 2019, 139, e56–e528.
Roden, D.M.; Van Driest, S.L.; Wells, Q.S.; Mosley, J.D.; Denny, J.C.; Peterson, J.F. Opportunities and challenges in cardiovascular pharmacogenomics. Circ. Res. 2018, 122, 1176–1190.
Hammad, R.; Barhoush, M.; Abed-Alguni, B.H. A semantic-based approach for managing healthcare big data: A survey. J. Healthc. Eng. 2020, 2020, 8865808.
Di Mauro, M.; Greco, M.; Grimaldi, M. A formal definition of big data based on its essential features. Libr. Rev. 2016, 65, 122. Available online: https://www.academia.edu/23962108/A_formal_definition_of_Big_Data_based_on_its_essential_features (accessed on 14 September 2024).
Batko, K.; Ślęzak, A. The use of big data analytics in healthcare. J. Big Data 2022, 9, 3. Available online: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8733917/ (accessed on 14 September 2024).
Toumieux, P.; Chevalier, L.; Sahuguède, S.; Julien-Vergonjanne, A. Optical wireless connected objects for healthcare. Healthc. Technol. Lett. 2015, 2, 118–122.
Awrahman, B.J.; Aziz Fatah, C.; Hamaamin, M.Y. A review of the role and challenges of big data in healthcare informatics and analytics. Comput. Intell. Neurosci. 2022, 2022, 5317760.
Poplin, R.; Varadarajan, A.V.; Blumer, K.; Liu, Y.; McConnell, M.V.; Corrado, G.S.; Peng, L.; Webster, D.R. Prediction of cardiovascular risk factors from retinal fundus photographs via deep learning. Nat. Biomed. Eng. 2018, 2, 158–164.
Houston Methodist. How Big Data Analytics Can Improve Cardiovascular Medicine; Methodology | Houston Methodist. 2024. Available online: https://read.houstonmethodist.org/how-big-data-analytics-can-improve-cardiovascular-medicine (accessed on 21 August 2024).
Quazi, S.; Malik, J.A. A systematic review of personalized health applications through human-computer interactions (HCI) on cardiovascular health optimization. J. Cardiovasc. Dev. Dis. 2022, 9, 273.
Srinivasan, S.; Gunasekaran, S.; Mathivanan, S.K.; Benjula Anbu Malar, M.B.; Jayagopal, P.; Dalu, G.T. An active learning machine technique-based prediction of cardiovascular heart disease from UCI repository database. Sci. Rep. 2023, 13, 13588.
Sahoo, P.K.; Mohapatra, S.K.; Wu, S.L. SLA-based healthcare big data analysis and computing in cloud networks. J. Parallel Distrib. Comput. 2018, 119, 121–135.
Sahoo, P.K.; Mohapatra, S.K.; Wu, S.L. Analyzing healthcare big data with prediction for future health condition. IEEE Access 2016, 4, 9786–9799.
Manimurugan, S.; Almutairi, S.; Aborokbah, M.M.; Narmatha, C.; Ganesan, S.; Chilamkurti, N.; Alzaheb, R.A.; Almoamari, H. Two-stage classification model for the prediction of heart disease using IoMT and artificial intelligence. Sensors 2022, 22, 476.
Choi, S.Y.; Chung, K. Knowledge process of health big data using MapReduce-based associative mining. Pers. Ubiquitous Comput. 2020, 24, 571–581.
Safa, M.; Pandian, A.; Gururaj, H.; Ravi, V.; Krichen, M. Real-time healthcare big data analytics model for improved QoS in cardiac disease prediction with IoT devices. Health Technol. 2023, 13, 473–483.
Mohapatra, S.K. Healthcare big data analysis with artificial neural network for cardiac disease prediction. Electronics 2024, 13, 163.
Tomlinson, B.; Hu, M.; Waye, M.M.Y.; Chan, P.; Liu, Z.M. Current status of personalized medicine based on pharmacogenetics in cardiovascular medicine. Expert Rev. Precis. Med. Drug Dev. 2016, 1, 5–8.
Bazargani, Y.T.; Ugurlu, M.; de Boer, A.; Leufkens, H.G.M.; Mantel-Teeuwisse, A.K. Selection of essential medicines for the prevention and treatment of cardiovascular diseases in low- and middle-income countries. BMC Cardiovasc. Disord. 2018, 18, 126.
Ozenberger, K.; Alexander, G.C.; Shin, J.I.; Whitsel, E.A.; Qato, D.M. Use of prescription medications with cardiovascular adverse effects among older adults in the United States. Pharmacoepidemiol. Drug Saf. 2022, 31, 1027–1038.
Schaffer, A.L.; Chia, J.; Brett, J.; Pearson, S.A.; Falster, M.O. A nationwide study of multimedicine use in people treated with cardiovascular medicines in Australia. Pharmacotherapy 2022, 42, 828–836.
Jurgens, C.Y.; Lee, C.S.; Aycock, D.M.; Masterson Creber, R.; Denfeld, Q.E.; DeVon, H.A. State of the science: The relevance of symptoms in cardiovascular disease and research: A scientific statement from the American Heart Association. Circulation 2022, 146, e173–e184.
Shahid, N.; Rappon, T.; Berta, W. Applications of artificial neural networks in health care organizational decision-making: A scoping review. PLoS ONE 2019, 14, e0212356.
Dykstra, S.; Satriano, A.; Cornhill, A.K.; Lei, L.Y.; Labib, D.; Mikami, Y. Machine learning prediction of atrial fibrillation in cardiovascular patients using cardiac magnetic resonance and electronic health information. Front. Cardiovasc. Med. 2022, 9, 998558.
Barbieri, S.; Mehta, S.; Wu, B.; Bharat, C.; Poppe, K.; Jorm, L. Predicting cardiovascular risk from national administrative databases using a combined survival analysis and deep learning approach. Int. J. Epidemiol. 2022, 51, 931–944.
Tsai, M.-L.; Chen, K.-F. Harnessing Electronic Health Records and Artificial Intelligence for Enhanced Cardiovascular Risk Prediction: A Comprehensive Review. J. Am. Heart Assoc. 2025, 14, e036946.
Nghiem, N.; Atkinson, J.; Nguyen, B.P.; Tran-Duy, A.; Wilson, N. Predicting high health-cost users among people with cardiovascular disease using machine learning and nationwide linked social administrative datasets. Health Econ. Rev. 2023, 13, 9.
Khawar Hussain, H.; Tariq, A.; Yousaf Gill, A. Role of artificial intelligence in cardiovascular health care. J. World Sci. 2023, 2, 583–591.
Tison, G.H.; Zhang, J.; Delling, F.N.; Deo, R.C. Automated and interpretable patient ECG profiles for disease detection, tracking, and discovery. Circ. Cardiovasc. Qual. Outcomes 2019, 12, e005289.
Iriart, J.A.B. Precision medicine/personalized medicine: A critical analysis of movements in the transformation of biomedicine in the early 21st century. Cad. Saúde Pública 2019, 35, e00153118.
Al Bataineh, A.; Manacek, S. MLP-PSO Hybrid Algorithm for Heart Disease Prediction. J. Pers. Med. 2022, 12, 1208.
Pathan, M.S.; Nag, A.; Pathan, M.M.; Dev, S. Analyzing the impact of feature selection on the accuracy of heart disease prediction. Healthc. Anal. 2022, 2, 100060.
Ozcan, M.; Peker, S. A classification and regression tree algorithm for heart disease modeling and prediction. Healthc. Anal. 2023, 3, 100130.
Verma, L.; Srivastava, S.; Negi, P. A hybrid data mining model to predict coronary artery disease cases using non-invasive clinical data. J. Med. Syst. 2016, 40, 178.
Dhingra, L.S.; Shen, M.; Mangla, A.; Khera, R. Cardiovascular Care Innovation through Data-Driven Discoveries in the Electronic Health Record. Am. J. Cardiol. 2023, 203, 136–148.
Jain, K.K. Personalized management of cardiovascular disorders. Med. Princ. Pract. 2017, 26, 399–414.
Currie, G.; Delles, C. Precision medicine and personalized medicine in cardiovascular disease. In Sex-Specific Analysis of Cardiovascular Function; Kerkhof, P.L.M., Miller, V.M., Eds.; Springer International Publishing: Berlin/Heidelberg, Germany, 2018; pp. 589–605.
Lee, M.S.; Flammer, A.J.; Lerman, L.O.; Lerman, A. Personalized medicine in cardiovascular diseases. Korean Circ. J. 2012, 42, 583–591.
Leopold, J.A.; Loscalzo, J. The emerging role of precision medicine in cardiovascular disease. Circ. Res. 2018, 122, 1302–1315.
Leopold, J.A.; Maron, B.A.; Loscalzo, J. The application of big data to cardiovascular disease: Paths to precision medicine. J. Clin. Investig. 2020, 130, 29–38.
European Society of Cardiology. Heart Patients Set to Receive Treatment Tailored to Their Genetic and Health Information; The ESC Press Office: Sophia Antipolis, France, 2024; Available online: https://www.escardio.org/The-ESC/Press-Office/Press-releases/heart-patients-set-to-receive-treatment-tailored-to-their-genetic-and-health-inf (accessed on 12 January 2025).
Kumar, A.; Jena, P.K.; Behera, S.; Lockey, R.F.; Mohapatra, S.; Mohapatra, S. Multifunctional magnetic nanoparticles for targeted delivery. Nanomed. Nanotechnol. Biol. Med. 2010, 6, 64–69.
Oluwafemidiakhoa. Personalized Medicine 2.0: AI’s Role in Tailoring Treatments; Mr. Plan B Publication: London, UK, 2024; Available online: https://medium.com/mr-plan-publication/personalized-medicine-2-0-ais-role-in-tailoring-treatments-2f2592cdd49c (accessed on 22 August 2024).
Zhang, Y.; Zhu, L.; Li, X.; Ge, C.; Pei, W.; Zhang, M.; Zhong, M.; Zhu, X.; Lv, K. M2 macrophage exosome-derived lncRNA AK083884 protects mice from CVB3-induced viral myocarditis through regulating PKM2/HIF-1α axis mediated metabolic reprogramming of macrophages. Redox Biol. 2024, 69, 103016.
Li, L.; Li, J.; Zhong, M.; Wu, Z.; Wan, S.; Li, X.; Zhang, Y.; Lv, K. Nanozyme-enhanced tyramine signal amplification probe for preamplification-free myocarditis-related miRNAs detection. Chem. Eng. J. 2025, 503, 158093.
Pirillo, A.; Catapano, A.L.; Norata, G.D. Monoclonal antibodies in the management of familial hypercholesterolemia: Focus on PCSK9 and ANGPTL3 inhibitors. Curr. Atheroscler. Rep. 2021, 23, 79.
Gao, J.; Li, P.; Chen, Z.; Zhang, J. A survey on deep learning for multimodal data fusion. Neural Comput. 2020, 32, 829–864.
Bycroft, C.; Freeman, C.; Petkova, D.; Band, G.; Elliott, L.T.; Sharp, K. The UK Biobank resource with deep phenotyping and genomic data. Nature 2018, 562, 203–209.
Amal, S.; Safarnejad, L.; Omiye, J.A.; Ghanzouri, I.; Cabot, J.H.; Ross, E.G. Use of multi-modal data and machine learning to improve cardiovascular disease care. Front. Cardiovasc. Med. 2022, 9, 840262.
Bandera, F.; Baghdasaryan, L.; Mandoli, G.E.; Cameli, M. Multimodality imaging predictors of sudden cardiac death. Heart Fail. Rev. 2020, 25, 427–446.
Zhang, X.; Gu, K.; Miao, S.; Zhang, X.; Yin, Y.; Wan, C.; Yu, Y.; Hu, J.; Wang, Z.; Shan, T.; et al. Automated detection of cardiovascular disease by electrocardiogram signal analysis: A deep learning system. Cardiovasc. Diagn. Ther. 2020, 10, 227–235.
Zambrano Chaves, J.M.; Wentland, A.L.; Desai, A.D.; Banerjee, I.; Boutin, R.D.; Maron, D.J.; Rodriguez, F.; Sandhu, A.T.; Jeffrey, R.B.; Rubin, D.; et al. Opportunistic assessment of ischemic heart disease risk using abdominopelvic computed tomography and medical record data: A multimodal explainable artificial intelligence approach. Sci. Rep. 2023, 13, 21034.
Ali, F.; El-Sappagh, S.; Islam, S.R.; Kwak, D.; Ali, A.; Imran, M.; Kwak, K.-S. A smart healthcare monitoring system for heart disease prediction based on ensemble deep learning and feature fusion. Inf. Fusion 2020, 63, 208–222.
Lalani, C.; Kunwar, E.M.; Kinard, M.; Dhruva, S.S.; Redberg, R.F. Reporting of death in US Food and Drug Administration medical device adverse event reports in categories other than death. JAMA Intern. Med. 2021, 181, 1217–1223.
Rathi, V.K.; Ross, J.S.; Redberg, R.F. Unique device identifiers—Missing in action. JAMA Intern. Med. 2023, 183, 1049–1050.
Resnic, F.S.; Majithia, A.; Marinac-Dabic, D.; Robbins, S.; Ssemaganda, H.; Hewitt, K. Registry-based prospective, active surveillance of medical-device safety. N. Engl. J. Med. 2017, 376, 526–535.
Ball, R.; Robb, M.; Anderson, S.A.; Dal Pan, G. The FDA’s Sentinel Initiative–A comprehensive approach to medical product surveillance. Clin. Pharmacol. Ther. 2016, 99, 265–268.
Wang, X.; Ayakulangara Panickan, V.; Cai, T.; Xiong, X.; Cho, K.; Cai, T. Endovascular aneurysm repair devices as a use case for postmarketing surveillance of medical devices. JAMA Intern. Med. 2023, 183, 1090–1097.
Ross, J.S.; Blount, K.L.; Ritchie, J.D.; Hodshon, B.; Krumholz, H.M. Post-market clinical research conducted by medical device manufacturers: A cross-sectional survey. Med. Devices 2015, 8, 241–249.
Santana, J.; Waheed, S. Analysis of medical device reports involving ventilator-related incidents in a clinical setting. J. Clin. Eng. 2022, 47, 107–115.
Mortazavi, B.J.; Downing, N.S.; Bucholz, E.M.; Dharmarajan, K.; Manhapra, A.; Li, S.-X.; Negahban, S.N.; Krumholz, H.M. Analysis of machine learning techniques for heart failure readmissions. Circ. Cardiovasc. Qual. Outcomes 2016, 9, 629–640.
Moreira, M.W.L.; Rodrigues, J.J.P.C.; Kumar, N.; Al-Muhtadi, J.; Korotaev, V. Evolutionary radial basis function network for gestational diabetes data analytics. J. Comput. Sci. 2018, 27, 410–417. Available online: https://www.sciencedirect.com/science/article/abs/pii/S1877750317304726 (accessed on 19 August 2024).
Abouelmehdi, K.; Beni-Hessane, A.; Khaloufi, H. Big healthcare data: Preserving security and privacy. J. Big Data 2018, 5, 1.
Bainbridge, M. Big data challenges for clinical and precision medicine. In Big Data, Big Challenges: A Healthcare Perspective: Background, Issues, Solutions, and Research Directions; Househ, M., Kushniruk, A.W., Borycki, E.M., Eds.; Springer International Publishing: Berlin/Heidelberg, Germany, 2019; pp. 17–31.
Choudhury, S.; Fishman, J.R.; McGowan, M.L.; Juengst, E.T. Big data, open science and the brain: Lessons learned from genomics. Front. Hum. Neurosci. 2014, 8, 239.
Schadt, E.E. The changing privacy landscape in the era of big data. Mol. Syst. Biol. 2012, 8, 612.
Joly, Y.; Dove, E.S.; Knoppers, B.M.; Bobrow, M.; Chalmers, D. Data sharing in the post-genomic world: The experience of the International Cancer Genome Consortium (ICGC) Data Access Compliance Office (DACO). PLoS Comput. Biol. 2012, 8, e1002549.
Steinsbekk, K.S.; Ursin, L.O.; Skolbekken, J.A.; Solberg, B. We’re not in it for the money: Lay people’s moral intuitions on commercial use of “their” biobank. Med. Health Care Philos. 2013, 16, 151–162.
Krittanawong, C.; Johnson, K.W.; Hershman, S.G.; Tang, W.H.W. Big data, artificial intelligence, and cardiovascular precision medicine. Expert Rev. Precis. Med. Drug Dev. 2018, 3, 305–317.
Callebaut, W. Scientific perspectivism: A philosopher of science’s response to the challenge of big data biology. Stud. Hist. Philos. Biol. Biomed. Sci. 2012, 43, 69–80.
Peddicord, D.; Waldo, A.B.; Boutin, M.; Grande, T.; Gutierrez, L., Jr. A proposal to protect privacy of health information while accelerating comparative effectiveness research. Health Aff. 2010, 29, 2082–2090.
Rumbold, J.M.; Pierscionek, B.K. Ethical challenges in Big Data health research: A review. Healthcare 2023, 5, 34.
Murdoch, T.B.; Detsky, A.S. The inevitable application of big data to health care. JAMA 2013, 309, 1351–1352.
Roski, J.; Bo-Linn, G.W.; Andrews, T.A. Creating value in health care through big data: Opportunities and policy implications. Health Aff. 2014, 33, 1115–1122.
Obermeyer, Z.; Powers, B.; Vogeli, C.; Mullainathan, S. Dissecting racial bias in an algorithm used to manage the health of populations. Science 2019, 366, 447–453.
Mittelstadt, B.D.; Floridi, L. The ethics of Big Data: Current and foreseeable issues in biomedical contexts. Sci. Eng. Ethics 2016, 22, 303–341.
Groves, P.; Kayyali, B.; Knott, D.; Van Kuiken, S. The ‘Big Data’ Revolution in Healthcare: Accelerating Value and Innovation. McKinsey & Company Report. 2016. Available online: https://www.mckinsey.com (accessed on 14 September 2024).
Hodge, J.G., Jr.; Gostin, L.O.; Jacobson, P.D. Legal issues concerning electronic health information. JAMA 1999, 282, 1466.
Emam, K.E.; Jonker, E.; Arbuckle, L.; Malin, B. A systematic review of re-identification attacks on health data. PLoS ONE 2011, 6, e28071. Available online: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0028071 (accessed on 2 December 2011).
Hoffman, S.; Podgurski, A. Balancing privacy, autonomy, and scientific needs in electronic health records research. SMU Law Rev. 2012, 65, 85–144.
Angst, C.M. Protect my privacy or support the common good? Ethical questions about electronic health information exchanges. J. Bus. Ethics 2009, 90, 169–178.

Figure 1. A simple but important application of ML.

Table 1. Types of big data analytics.

Type	Description
Descriptive analytics	Examines past datasets to identify patterns and trends
Predictive analytics	Uses historical data to predict likely outcomes and make evidence-based forecasts
Prescriptive analytics	Combines data from diverse sources with statistical analyses, machine learning algorithms, and data mining techniques to anticipate future outcomes and determine the optimal course of action
Diagnostic analytics	Analyses historical and real-time data to identify the underlying causes of outcomes
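The difference between descriptive and predictive analytics in Table 1 can be illustrated on a toy series. Descriptive analytics summarizes what has already happened, whereas predictive analytics fits a model to historical data and extrapolates from it. The sketch below uses invented monthly admission counts and a deliberately simple ordinary least-squares trend line as the predictive model. Real forecasting would also account for seasonality and uncertainty.

```python
from statistics import mean

admissions = [42, 45, 44, 48, 51, 53]  # hypothetical monthly heart-failure admissions

def describe(series):
    """Descriptive analytics: summarize the past."""
    return {"mean": mean(series), "min": min(series), "max": max(series),
            "change": series[-1] - series[0]}

def fit_trend(series):
    """Predictive analytics (simplest case): least-squares line y = a + b*t."""
    t = list(range(len(series)))
    t_bar, y_bar = mean(t), mean(series)
    b = (sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, series))
         / sum((ti - t_bar) ** 2 for ti in t))
    a = y_bar - b * t_bar
    return a, b

def forecast(series, steps_ahead):
    """Extrapolate the fitted trend beyond the last observed month."""
    a, b = fit_trend(series)
    return a + b * (len(series) - 1 + steps_ahead)

print(describe(admissions))
print(round(forecast(admissions, steps_ahead=1), 1))  # 54.9
```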

Table 2. Comparison of the various models using big data and other resources. “✓” stands for available feature while “×” stands for feature not available.

Related Works	Big Data	MapReduce	Cloud	Cardiac Healthcare Data	ECG Data
Sahoo P.K. et al., 2018 [14]	✓	×	✓	✓	×
Sahoo P.K. et al., 2016 [15]	✓	✓	✓	✓	×
Manimurugan et al., 2022 [16]	✓	×	✓	✓	×
Choi & Chung, 2020 [17]	✓	✓	×	×	×
Safa et al., 2023 [18]	✓	×	×	✓	×
Mohapatra, 2024 [19]	✓	✓	✓	✓	✓
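Several studies in Table 2 rely on MapReduce. The programming model can be sketched on a single machine as follows; this is not Hadoop's API or code from the cited systems. A map step emits (key, value) pairs for each record, a shuffle groups the values by key, and a reduce step aggregates each group. Here, hypothetical records are grouped by risk category to compute the mean resting heart rate.

```python
from collections import defaultdict
from functools import reduce

def map_phase(record):
    """Emit (key, value) pairs; key = risk group, value = (heart rate, count of 1)."""
    yield record["risk_group"], (record["resting_hr"], 1)

def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine partial (sum, count) pairs, then compute the mean for this key."""
    total, count = reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]), values)
    return key, total / count

records = [
    {"patient": "P1", "risk_group": "high", "resting_hr": 88},
    {"patient": "P2", "risk_group": "low", "resting_hr": 64},
    {"patient": "P3", "risk_group": "high", "resting_hr": 92},
    {"patient": "P4", "risk_group": "low", "resting_hr": 70},
]
pairs = (pair for r in records for pair in map_phase(r))
result = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(result)  # {'high': 90.0, 'low': 67.0}
```

Emitting (sum, count) pairs rather than per-record means keeps the combining step associative. Partial results computed on different machines can therefore be merged in any order before the final division.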

Table 3. Some of the models used for cardiac data analysis, with accuracy rates.

Authors	Dataset	Algorithm Type	Analysis	Number of Features	Accuracy (%)
Srinivasan et al., 2023 [13]	UCI repository	Learning vector quantization (LVQ)	Classification	10	98
Al Bataineh & Manacek, 2022 [33]	Heart disease	Multilayer perceptron (MLP) + particle swarm optimization (PSO)	Classification	13	84
Al Bataineh & Manacek, 2022 [33]	Heart disease	Recurrent neural network (RNN) + long short-term memory (LSTM)	Classification	14	95
Pathan M.S. et al., 2022 [34]	Cardiovascular Disease (CVD) and Framingham	MLP, support vector classifier	Classification	12 (CVD)	74 (CVD)
Pathan M.S. et al., 2022 [34]	Cardiovascular Disease (CVD) and Framingham	MLP, support vector classifier	Classification	11 (Framingham)	71 (Framingham)
Ozcan & Peker, 2023 [35]	Cleveland, Hungarian, Switzerland, Long Beach VA, and Statlog datasets	Classification and regression tree (CART)	Classification and regression	11	87
Verma L. et al., 2016 [36]	Department of Cardiology, IGMC	Multinomial logistic regression (MLR)	Classification	26	98
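Under the usual definition, the accuracy values in Table 3 are the proportion of correctly classified cases, (TP + TN) / (TP + FP + TN + FN). Here TP, FP, TN, and FN denote true positives, false positives, true negatives, and false negatives. Because accuracy depends on class balance, it is usually read alongside sensitivity and specificity. The sketch below computes all three from invented counts, not from any of the cited studies.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, and specificity from binary confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # share of true CVD cases detected
        "specificity": tn / (tn + fp),  # share of non-CVD cases correctly cleared
    }

# Hypothetical cohort of 1000 patients: 100 with CVD, 900 without.
model = classification_metrics(tp=40, fp=20, tn=880, fn=60)
always_negative = classification_metrics(tp=0, fp=0, tn=900, fn=100)
print(model["accuracy"], model["sensitivity"])                      # 0.92 0.4
print(always_negative["accuracy"], always_negative["sensitivity"])  # 0.9 0.0
```

In this hypothetical cohort, a classifier that labels everyone as CVD-free already reaches 90% accuracy while detecting no cases. For this reason, accuracy figures are hard to compare across datasets with different prevalences.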

Table 4. Challenges of BDA with examples.

Challenge with BDA Usage	Description	Scientific Evidence/Implications	Citation
Data Privacy and Security	Concerns about data safety, patient identifiers, and data breaches can raise questions of regulatory compliance.	Studies show that unauthorized access to health data can lead to loss of trust, legal consequences, and delays in adopting analytics. Privacy-preserving models (e.g., federated learning) are being explored.	[74]
Integration of Data Sources	BDA draws on heterogeneous sources that combine demographic, clinical, anthropometric, lifestyle, risk-factor, genomic, metabolomic, and imaging data, and these sources vary widely in format.	Research highlights difficulties in achieving interoperability across electronic health records (EHRs), devices, and databases. Standards such as FHIR are being developed to address this.	[75]
Infrastructure Costs	BDA can be costly because it requires high computational power, large storage capacity, and skilled professionals.	Studies estimate significant upfront and ongoing costs for hospitals and research institutions. Cloud-based solutions can ease infrastructure burdens but may raise additional data governance concerns.	[76]
Algorithm Bias and Accuracy	Data gaps, inaccurate datasets, and poorly coded heterogeneous data can produce misleading and biased algorithms. CVD determinants are often context-specific, and a lack of Indigenous data may yield models that are inaccurate for some populations.	Evidence shows that the underrepresentation of certain populations in datasets can lead to biased outcomes. Initiatives to increase diversity in data collection and ethical AI practices are essential.	[77]
Ethical and Legal Challenges	Ambiguities around ownership, consent, and the ethical use of patient data complicate the deployment of analytics in healthcare.	Researchers highlight the importance of clear legal frameworks and ethical guidelines. For example, consent models for the secondary use of data in research remain a contested issue.	[78]
Resource Requirements	BDA is an emerging area that requires expertise from diverse fields (for CVD, both data scientists and clinicians).	Reports emphasize the shortage of professionals trained in both healthcare and data analytics. Education and training programs integrating both domains are critical for capacity building.	[79]
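Table 4 mentions federated learning as a privacy-preserving option. In federated averaging (FedAvg), each site trains on its own data and shares only model parameters. A coordinator then averages these parameters, weighted by each site's sample size. The sketch below is a minimal version built on stated assumptions. Three hypothetical hospitals each hold a few invented (standardized systolic blood pressure, CVD event) pairs and train a one-feature logistic regression by gradient descent. The sketch omits secure aggregation, differential privacy, and the communication and data-heterogeneity issues that real deployments must handle.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def local_update(weights, data, lr=0.5, epochs=20):
    """Train logistic regression (w, b) on one site's data with batch gradient descent."""
    w, b = weights
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in data:
            err = sigmoid(w * x + b) - y
            gw += err * x
            gb += err
        w -= lr * gw / len(data)
        b -= lr * gb / len(data)
    return w, b

def federated_average(updates, sizes):
    """Average site parameters, weighted by each site's number of samples."""
    n = sum(sizes)
    w = sum(u[0] * s for u, s in zip(updates, sizes)) / n
    b = sum(u[1] * s for u, s in zip(updates, sizes)) / n
    return w, b

# Hypothetical sites: x = standardized systolic blood pressure, y = 1 if CVD event.
sites = [
    [(-1.2, 0), (-0.5, 0), (0.3, 1), (1.1, 1)],
    [(-0.9, 0), (0.2, 0), (0.8, 1), (1.5, 1), (-1.4, 0)],
    [(-0.3, 0), (0.6, 1), (1.0, 1)],
]
global_model = (0.0, 0.0)
for _ in range(10):  # communication rounds; raw records never leave a site
    updates = [local_update(global_model, d) for d in sites]
    global_model = federated_average(updates, [len(d) for d in sites])
print(global_model)
```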

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Sharma, P.; Sharma, P.; Sharma, K.; Varma, V.; Patel, V.; Sarvaiya, J.; Tavethia, J.; Mehta, S.; Bhadania, A.; Patel, I.; et al. Revolutionizing Utility of Big Data Analytics in Personalized Cardiovascular Healthcare. Bioengineering 2025, 12, 463. https://doi.org/10.3390/bioengineering12050463


Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
