Article

Integrating Multi-Organ Imaging-Derived Phenotypes and Genomic Information for Predicting the Occurrence of Common Diseases

1
Human Phenome Institute, Fudan University, Shanghai 201203, China
2
Department of Radiology, Ruijin Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai 200025, China
3
Ultrasound Department, Ruijin Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai 200025, China
4
Radiology Department, Jiangyin Affiliated Hospital of Nanjing University of Chinese Medicine, 130 Renmin Middle Road, Jiangyin 214400, China
*
Authors to whom correspondence should be addressed.
These authors contributed equally to this work.
Bioengineering 2024, 11(9), 872; https://doi.org/10.3390/bioengineering11090872
Submission received: 12 June 2024 / Revised: 17 July 2024 / Accepted: 16 August 2024 / Published: 28 August 2024

Abstract

As medical imaging technologies advance, they are playing an increasingly important role in assisting clinical disease diagnosis. Fusing biomedical imaging with multi-modal information can enhance diagnostic precision and comprehensiveness. In particular, integrating multi-organ imaging with genomic information can improve the accuracy of disease prediction because many diseases involve both environmental and genetic determinants. In the present study, we focused on the fusion of imaging-derived phenotypes (IDPs) from different organs, including the brain, heart, lung, liver, spleen, pancreas, and kidney, with disease polygenic risk scores (PRSs) for the prediction of the occurrence of nine common diseases, namely atrial fibrillation, heart failure (HF), hypertension, myocardial infarction, asthma, type 2 diabetes, chronic kidney disease, coronary artery disease (CAD), and chronic obstructive pulmonary disease, in the UK Biobank (UKBB) dataset. For each disease, three prediction models were developed utilizing imaging features, genomic data, and a fusion of both, respectively, and their performances were compared. The results indicated that for seven diseases, the model integrating both imaging and genomic data achieved superior predictive performance compared to models that used only imaging features or only genomic data. For instance, the Area Under Curve (AUC) of HF risk prediction increased from 0.68 ± 0.15 to 0.79 ± 0.12, and the AUC of CAD diagnosis increased from 0.76 ± 0.05 to 0.81 ± 0.06.


1. Introduction

Advanced application technologies based on biomedical imaging have the potential to significantly enhance diagnostic efficiency and accuracy. The fusion of biomedical imaging and genomics information for disease classification is a cutting-edge approach in medical diagnostics [1]. This technique facilitates a more comprehensive integration of genetic and environmental factors contributing to complex diseases, such as cardiovascular disease, diabetes, and liver disease, which are now increasingly studied using multi-omics approaches [2,3,4,5]. Medical imaging provides visual information about the anatomy and functional status within the body [6]. Imaging-derived phenotypes (IDPs) can quantitatively reflect the structure and functional status of organs, serving as excellent biomarkers for disease prediction [7]. On the other hand, polygenic risk scores (PRSs) are widely employed in research due to their demonstrated validity and potential clinical utility in predicting various common diseases [8]. PRSs provide crucial insights into inherent genetic factors that influence disease, aiding significantly in early disease diagnosis [9], and they offer valuable insights into disease progression and facilitate early detection, independent of disease presence or activity levels [10]. Integrating cardiac magnetic resonance (CMR) traits and PRSs can enhance the performance of predicting complex traits and diseases [5]. Integrating PRSs with the QCancer-10 score [11], which is calculated relatively easily from health records, modestly improves risk prediction over the use of the QCancer-10 score alone [12]. Wang et al. developed a classifier incorporating both MRI and PRS features, which achieved the best prediction performance in schizophrenia [13]. However, a significant limitation of previous studies is their reliance on IDPs from a single organ, overlooking the potential benefits of integrating data from multiple organs. This limitation may constrain our understanding of disease mechanisms and reduce the comprehensiveness of treatment strategies. As healthcare systems increasingly embrace genomic medicine, there are significant opportunities to integrate PRSs, which summarize an individual’s genetic predisposition for adverse treatment outcomes and disease complications, together with IDPs into clinical decision-making.
In the present study, we analyzed IDPs from various organs and disease-related PRSs from the UK Biobank (UKBB) [14] dataset to construct disease prediction models. The UKBB is a multi-center dataset, with inpatient record data sourced from Hospital Episode Statistics (HES, England), the Patient Episode Database for Wales (PEDW), or Scottish Morbidity Records (SMR). Additionally, we evaluated the performance gains of fusion models, which integrate both IDPs and PRSs, over models using solely IDPs or PRSs. The diseases considered in our analysis were common ones that affect multiple organs. The process of constructing the disease prediction model is depicted in Figure 1.

2. Materials and Methods

2.1. Study Design

The UKBB is a prospective cohort of approximately 500,000 individuals from the United Kingdom, enrolled between 2006 and 2010. The cohort includes extensive phenotyping, imaging, and multiple genomic data types. The design of the cohort has been detailed in a previous paper [14]. Starting in 2014, about 40,000 participants returned for their first multi-modal imaging visit, which included brain MRI, heart MRI, and abdominal MRI. Representative multi-modal images are illustrated in Figure 2. This comprehensive imaging allows for the evaluation of organ conditions across the entire body. Longitudinal health outcomes for the study participants are tracked via national health datasets. Data from the UKBB (Application Number: 96511) were used for the present analysis.

2.2. Baseline Examination and Sample Collection

Our study focused on the PRSs of diseases and IDPs from multiple organs, all accessible within the UKBB dataset. The participant selection process in our study is illustrated in Figure 3, which details the inclusion and exclusion criteria used to define our study cohorts from the UKBB population. We focused on participants of European descent because they constitute the majority in the UKBB. Additionally, performing PRS calculations requires a homogeneous population to ensure the accuracy and reliability of the genetic associations, minimizing potential biases due to population stratification. Ultimately, 8646 individuals remained with complete IDPs and PRSs, and their baseline characteristics at the time of their first visit are summarized in Table 1 below.
In our study, we used specific IDPs from different organs, including the heart, brain, kidney, liver, lung, pancreas, and spleen, from the UKBB, with the corresponding UKBB Field IDs detailed in Table 2. These IDPs were automatically extracted through deep learning models, and detailed information about the extraction process can be found in the cited references. Cardiac imaging data included phenotypes of cardiac and aortic structure and function [19]. For the kidney, Langner et al. [15] automatically segmented the renal parenchyma and extracted IDPs related to the renal parenchyma volume from abdominal MRI scans. Zhao et al. [16] extracted brain phenotypes from T1-weighted images. Liver phenotypes were provided by Mojtahed et al. [17], using proton density fat fraction images (Field ID 40061), gradient echo images (Field ID 20203), and IDEAL sequence images (Field ID 20254), which facilitated the calculation of liver fat fractions and corrected T1 (cT1) measurements. The volumes, fat fractions, and iron contents of the liver, spleen, and pancreas were extracted from abdominal MRI scans by Liu et al. [18]. Lastly, lung function phenotypes were assessed using spirometry tests, which measured multiple functional parameters including forced vital capacity (FVC), forced expiratory volume in one second (FEV1), peak expiratory flow (PEF), and the FEV1/FVC ratio. These IDPs serve as indicators of the structural and functional condition of the respective organs and play a crucial role in our comprehensive analysis of the associations between organ health and common diseases.
In this study, nine common diseases were included for analysis, namely atrial fibrillation (AF), heart failure (HF), hypertension, myocardial infarction (MI), asthma, type 2 diabetes (T2D), chronic kidney disease (CKD), coronary artery disease (CAD), and chronic obstructive pulmonary disease (COPD), classified according to International Classification of Diseases, 10th Revision (ICD-10) codes. Patients with multiple diseases were included, so a participant could be diagnosed with more than one disease simultaneously. Health was defined relative to the included diseases: participants were classified as belonging to the healthy group if they did not have any of the nine diseases listed above, resulting in 5669 individuals being categorized in the healthy group. Specific ICD-10 codes and numbers of cases related to these diseases are listed in Table 3.
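The labelling rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the prefix table below covers three of the nine diseases and uses ICD-10 category codes chosen for the example (the codes actually used in the study are those in Table 3), and the function names are hypothetical.

```python
# Illustrative ICD-10 category prefixes (an assumption for this sketch);
# the codes actually used in the study are listed in its Table 3.
DISEASE_PREFIXES = {
    "AF": ("I48",),
    "HF": ("I50",),
    "hypertension": ("I10",),
}


def label_participant(icd10_codes, disease_prefixes=DISEASE_PREFIXES):
    """Return the set of study diseases matched by a participant's ICD-10 codes."""
    matched = set()
    for disease, prefixes in disease_prefixes.items():
        # str.startswith accepts a tuple, so any listed prefix counts as a match.
        if any(code.startswith(prefixes) for code in icd10_codes):
            matched.add(disease)
    return matched


def is_healthy(icd10_codes, disease_prefixes=DISEASE_PREFIXES):
    """Healthy is defined relative to the included diseases only."""
    return not label_participant(icd10_codes, disease_prefixes)


# A participant may carry several labels at once.
multi = label_participant(["I480", "I10"])
# A code outside the table does not prevent a "healthy" label.
healthy = is_healthy(["J301"])
```

Because health is defined relative to the included diseases, a participant with an unrelated diagnosis still falls in the healthy group under this rule.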

2.3. PRS Calculation

In this study, we calculated PRSs for nine common diseases. The computation procedure, detailed below, aimed to identify single-nucleotide polymorphisms (SNPs) significantly associated with these diseases while excluding any SNPs in linkage disequilibrium (LD).
Initially, summary statistics for each disease were processed using PLINK [20] software version 1.07. This process utilized a p-value-based clumping method with specific parameters: --clump-p1 = 0.0001, --clump-p2 = 0.01, --clump-r2 = 0.5, and --clump-kb = 250. These settings were selected to identify SNPs strongly associated with the diseases, yet independent of LD effects. European population data from the 1000 Genomes Project [21] served as the reference panel for LD. Subsequent steps involved the extraction and merging of the selected SNP data across different chromosomes, facilitated by bgenix [22] and cat-bgen software tools for handling the original UKBB per-chromosome genetic data files. The dataset was further processed by converting the bgen files to PLINK format, with the --hard-call-threshold parameter set to 0.1.
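To make the roles of the four clumping parameters concrete, the following is a minimal pure-Python sketch of greedy p-value-based clumping. It is not PLINK's implementation: the data layout, the function name `clump`, the toy SNP identifiers, and the precomputed r² lookup are all assumptions made for illustration.

```python
def clump(snps, r2, p1=1e-4, p2=1e-2, r2_min=0.5, kb=250):
    """Greedy clumping sketch.

    snps: list of dicts with keys id, chrom, pos (base pairs), p.
    r2:   dict mapping frozenset({id_a, id_b}) to the pairwise LD r^2
          (assumed precomputed from a reference panel).
    Returns the ids of the index SNPs, strongest association first.
    """
    assigned = set()
    index_snps = []
    for lead in sorted(snps, key=lambda s: s["p"]):
        if lead["p"] > p1:
            break  # remaining SNPs are too weak to seed a clump
        if lead["id"] in assigned:
            continue  # already absorbed into a stronger SNP's clump
        index_snps.append(lead["id"])
        assigned.add(lead["id"])
        for other in snps:
            if other["id"] in assigned:
                continue
            in_window = (other["chrom"] == lead["chrom"]
                         and abs(other["pos"] - lead["pos"]) <= kb * 1000)
            in_ld = r2.get(frozenset((lead["id"], other["id"])), 0.0) >= r2_min
            if in_window and in_ld and other["p"] <= p2:
                assigned.add(other["id"])
    return index_snps


# Toy data (hypothetical identifiers and values).
toy_snps = [
    {"id": "rs1", "chrom": 1, "pos": 100_000, "p": 1e-8},
    {"id": "rs2", "chrom": 1, "pos": 150_000, "p": 1e-5},  # in LD with rs1
    {"id": "rs3", "chrom": 1, "pos": 900_000, "p": 5e-6},  # outside the window
    {"id": "rs4", "chrom": 2, "pos": 100_000, "p": 0.03},  # fails p1
]
toy_r2 = {frozenset(("rs1", "rs2")): 0.8}
index_snps = clump(toy_snps, toy_r2)
```

In this toy example rs2 is absorbed into the clump led by rs1, rs3 seeds its own clump because it lies more than 250 kb away, and rs4 never becomes an index SNP because its p-value exceeds the p1 threshold.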
Quality control for the SNPs was conducted by leveraging imputed genotype quality (INFO) and minor allele frequency (MAF) data provided by the UKBB, utilizing QCTOOL [23] for the calculations. This crucial step involved removing ambiguous SNPs and those with INFO values below 0.4 or MAF values less than 0.005. The analysis specifically targeted samples of European ancestry from the UKBB. Subsequently, PRSs were calculated using the quality-controlled data. This rigorous process supports the generation of reliable PRSs, offering valuable insights into the genetic predisposition towards common diseases in the population.
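The quality-control filter and the scoring step can be illustrated with a short Python sketch. Two assumptions are made here that the text does not spell out: "ambiguous" is taken to mean strand-ambiguous allele pairs (A/T and C/G), and the PRS is computed as the common weighted sum of effect-allele dosages. The field names, function names, and toy values are hypothetical.

```python
# Strand-ambiguous allele pairs (assumed meaning of "ambiguous SNPs").
AMBIGUOUS = {frozenset("AT"), frozenset("CG")}


def passes_qc(snp, min_info=0.4, min_maf=0.005):
    """Keep a SNP only if it is unambiguous and meets the INFO and MAF thresholds."""
    strand_ambiguous = frozenset((snp["a1"], snp["a2"])) in AMBIGUOUS
    return (not strand_ambiguous
            and snp["info"] >= min_info
            and snp["maf"] >= min_maf)


def polygenic_risk_score(dosages, snps):
    """Weighted sum of effect-allele dosages over the SNPs that pass QC."""
    return sum(snp["beta"] * dosages[snp["id"]]
               for snp in snps if passes_qc(snp))


# Toy data (hypothetical values).
toy_snps = [
    {"id": "rs1", "a1": "A", "a2": "G", "info": 0.90, "maf": 0.20, "beta": 0.5},
    {"id": "rs2", "a1": "A", "a2": "T", "info": 0.95, "maf": 0.30, "beta": 1.0},   # ambiguous
    {"id": "rs3", "a1": "C", "a2": "T", "info": 0.30, "maf": 0.10, "beta": 0.25},  # low INFO
    {"id": "rs4", "a1": "G", "a2": "T", "info": 0.80, "maf": 0.05, "beta": -0.25},
]
toy_dosages = {"rs1": 2, "rs2": 1, "rs3": 1, "rs4": 1}
score = polygenic_risk_score(toy_dosages, toy_snps)  # 0.5*2 + (-0.25)*1
```

Only rs1 and rs4 survive the filter in this example, so the score is the sum of their two weighted dosages.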

2.4. Prediction Model

The study employed multi-organ IDPs and PRSs to predict disease outcomes. Our analysis compared the performance of three distinct models: the PRSs-Only Model, which used solely PRSs derived from genetic data; the IDPs-Only Model, which utilized only the IDPs from multiple organs, including quantitative measurements and features reflecting the structure and function of various organs; and the Combined PRSs and IDPs Model, which integrated both PRSs and multi-organ IDPs to enhance predictive power. Logistic regression with L1 norm regularization, implemented via the glmnet package in R (version 4.3.1), served as the prediction model. To address the imbalance between positive and negative samples in our dataset, downsampling was applied using the R package themis. The data were split into a training set and a test set with a 7:3 ratio.
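The data preparation steps can be sketched in Python using only the standard library. This is not the themis or glmnet code used in the study. The text does not state whether downsampling preceded or followed the split, so this sketch assumes one common choice: split 7:3 first, then balance only the training set. The function names and the toy cohort are hypothetical, and model fitting is omitted.

```python
import random


def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows and cut it at the requested fraction."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def downsample(rows, label_key="disease", seed=0):
    """Randomly drop majority-class rows until both classes have equal size."""
    rng = random.Random(seed)
    pos = [r for r in rows if r[label_key] == 1]
    neg = [r for r in rows if r[label_key] == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)
    return balanced


# Toy cohort: 100 participants, 20 of them cases (hypothetical numbers).
rows = [{"id": i, "disease": 1 if i < 20 else 0} for i in range(100)]
train, test = train_test_split(rows)
balanced_train = downsample(train)
```

Balancing only the training portion leaves the test set at its original case/control ratio; if the study balanced before splitting, the two calls would simply be swapped.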
To explore the influence of lifestyle factors on disease prediction, we included the characteristics of smoking and drinking. The disease prediction accuracy was compared with and without lifestyle factors with the fusion model of IDPs and PRSs as the baseline. The process of model development was consistent with the methods previously described.

2.5. Validation Cohort

To validate the results observed in the entire UKBB dataset, the models were tested in HES, the largest center within the UKBB. The model and analysis for this center were the same as those described above.

2.6. Feature Ranking

Feature importance was determined by fitting a random forest model, using participants’ variables as inputs and the predictions of our model as outputs. To evaluate feature importance, the mean decrease in Gini impurity, a metric derived from the random forest algorithm, was utilized. This approach permitted the assessment of the contribution of each feature within our IDPs + PRSs model and generated a ranked order of feature importance for the various IDPs and PRSs. This ranking is essential for understanding how IDPs and PRSs differentially impact the predictive accuracy of our disease prediction models.
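The quantity behind this ranking can be illustrated for a single split. The Python sketch below computes the Gini impurity of a binary label set and the impurity decrease produced by thresholding one feature. A random forest aggregates such decreases over every split that uses a feature, across all trees. The function names and toy values are assumptions for illustration, and this is not the random forest code used in the study.

```python
def gini(labels):
    """Gini impurity of a list of binary (0/1) labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)  # equals 1 - p**2 - (1 - p)**2 for two classes


def gini_decrease(values, labels, threshold):
    """Impurity decrease from splitting the samples at values <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted_children


labels = [0, 0, 1, 1]
informative = gini_decrease([1, 2, 3, 4], labels, threshold=2.5)    # separates the classes
uninformative = gini_decrease([1, 2, 1, 2], labels, threshold=1.5)  # leaves them mixed
```

A feature whose splits separate cases from non-cases accumulates a large total decrease and ranks high, whereas a feature that leaves the classes mixed contributes nothing.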

3. Results

3.1. Prediction Results

To evaluate the performance of disease prediction models, our study focused on three distinct methodologies: models based only on PRSs, models based only on multi-organ IDPs, and models integrating both PRSs and IDPs. The effectiveness of these models was evaluated by AUC, a robust metric for evaluating the predictive accuracy of binary classifiers. The integrated model, combining both PRSs and IDPs, demonstrated significant improvements in predictive performance compared to models using either PRSs or IDPs alone in T2D, AF, CAD, COPD, asthma, MI, and HF. For instance, the combined model of CAD demonstrated superior predictive accuracy with an AUC of 0.81 ± 0.06, compared to 0.76 ± 0.05 for IDPs alone and 0.66 ± 0.06 for PRSs alone. Similarly, the PRS + IDP model (AUC = 0.79 ± 0.12) of HF was superior to both PRS (AUC = 0.63 ± 0.16) and IDP models (AUC = 0.68 ± 0.15). However, in other diseases such as hypertension, the combined model did not perform as well as the other models. The IDP-alone model achieved an AUC of 0.77 ± 0.03, which was more effective than the combined approach (AUC = 0.73 ± 0.03). The detailed ROCs for the common disease prediction models are shown in Figure 4, and the performance metrics for our disease prediction models, including the Pearson correlation between the true and predicted conditions of the diseases, AUC, sensitivity, specificity, and accuracy, are summarized in Table 4. The significance of the differences between the AUCs was calculated using https://www.medcalc.org/calc/comparison_of_independentROCtest.php (accessed on 16 July 2024).
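The evaluation can be illustrated with a small Python sketch using only the standard library. It computes the AUC as the probability that a randomly chosen case scores above a randomly chosen control, and compares two AUCs with the standard z statistic for independent ROC curves. Whether the online calculator uses exactly this formula is an assumption. The function names and all numbers below are hypothetical and are not taken from the study's results.

```python
import math


def auc(scores, labels):
    """AUC as P(case score > control score), counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def compare_independent_aucs(auc1, se1, auc2, se2):
    """z statistic and two-sided p-value for two independent AUCs."""
    z = (auc1 - auc2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Toy predictions: three of the four case/control pairs are ranked correctly.
toy_auc = auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])

# Hypothetical AUCs and standard errors, chosen only to show the arithmetic.
z, p_value = compare_independent_aucs(0.80, 0.03, 0.70, 0.04)
```

The comparison needs a standard error for each AUC as input, so the outcome depends on how those standard errors are estimated.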
Table 5 summarizes the prediction performance for various diseases with and without the inclusion of lifestyle factors. The data indicated that integrating lifestyle factors into the prediction model generally decreased the predictive performance for most diseases. Only CKD, asthma, and CAD demonstrated improved performance, although the enhancements were not statistically significant.

3.2. Validation

A subset of UKBB data facilitated the evaluation of the robustness and generalizability of our models. The AUC results were compared between the entire UKBB dataset and the HES center (Figure 5). In the entire UKBB dataset, the integrated model, combining both PRSs and IDPs, demonstrated significant improvements in predictive performance compared to models using either PRSs or IDPs alone in T2D, AF, CAD, COPD, asthma, MI, and HF. In the HES center, the integrated model performed best in CKD, COPD, AF, HF, hypertension, MI, and T2D.

3.3. Feature Ranking

Figure 6 illustrates the feature importance derived from our models, showcasing the differential contributions of PRSs and multi-organ IDPs across various disease predictions. For AF, left-atrium-related features emerged as the most crucial. For T2D, features related to the liver were important. For hypertension prediction, features related to the thickness of the myocardium and the distensibility of the aorta (Ao) made a great contribution to the prediction model. For CAD and COPD prediction, PRSs contributed more than the IDPs. A change in kidney parenchyma volume is a potential marker for the presence of CKD. Lung function parameters were major contributors in predictive models for both COPD and asthma, emphasizing their recognized impact. For HF prediction, the ejection fraction of the ventricle played the most important role among all the features. Surprisingly, for MI prediction, the fat fraction of the pancreas and the liver iron-corrected T1 value contributed more than cardiac IDPs.

4. Discussion

In recent years, the integration of multi-modal data for disease prediction has become increasingly prevalent, reflecting a broader trend in precision medicine of harnessing diverse datasets for enhanced diagnostic accuracy. For instance, Dolci et al. [24] introduced a deep multi-modal generative data fusion framework for integrating neuroimaging and genomics in classifying Alzheimer’s disease, achieving superior prediction performance despite incomplete data availability. Vanguri et al. [25] demonstrated the enhanced predictive capacity of integrating CT imaging and genomic features to forecast immunotherapy response in advanced non-small-cell lung cancer, achieving superior performance compared to individual biomarkers like tumor mutational burden and PD-L1 expression. However, previous research has predominantly concentrated on a limited number of diseases and single-organ imaging. Our research stands out by incorporating a wide array of diseases and a rich variety of IDPs from multiple organs, enhancing both the predictive power and clinical relevance of our models.
Our investigation focused on the fusion of genomic data with multi-organ IDPs to predict prevalent diseases, utilizing the extensive UKBB dataset. This integration yielded substantial enhancements in predictive accuracy, particularly evidenced by notable increases in the AUC for asthma, COPD, AF, CAD, HF, MI, and T2D. However, not all diseases showed improved prediction. CKD and hypertension did not exhibit notable enhancements in predictive accuracy. This could be attributed to the relatively minor genetic contributions to these conditions, which may diminish the effectiveness of PRSs in their prediction. This insight underscores the importance of considering acquired dietary and lifestyle habits in the treatment of CKD and hypertension.
Our result of feature importance revealed organ-specific contributions to predictive accuracy. Specific features such as those related to the left atrium (LA) were pivotal for AF prediction, while liver-related features significantly influenced the prediction of T2D. These insights not only demonstrate the critical role of targeted organ analysis in enhancing disease prediction models but also suggest that individual organ metrics can be decisive factors in managing and understanding complex diseases. Recognizing these patterns enables a more refined approach to precision medicine, facilitating tailored treatment strategies based on specific organ involvement and genetic profiles.
Our study has several limitations. First, while we included a range of IDPs from the UKBB that have been previously extracted, several potentially valuable features were not considered. Notably, advanced imaging metrics such as cardiac T1 mapping [26] or specific modalities in brain imaging [27], such as functional MRI (fMRI) reflecting brain activity or diffusion MRI (dMRI) reflecting local tissue microstructure, were not included in our analysis. However, incorporating these techniques could potentially provide valuable insights into certain diseases. The inclusion of these additional features in future studies could significantly enhance both the predictive accuracy and clinical relevance of the models. Second, our study only included imaging and genetic data. Integrating additional data types like proteomics [28] and transcriptomics [29], which capture micro-level changes in the body, could enhance predictive models further [30,31,32]. These omics data offer insights into molecular pathways and functional alterations, potentially improving predictive accuracy. Last, logistic regression was selected for building a prediction model because of its interpretability, computational efficiency, and suitability for the data structure. Future research could explore more complex machine learning models to possibly improve outcomes, given a substantial dataset size and sufficient training time.

Author Contributions

Conceptualization, methodology, software, M.L.; validation, C.W.; formal analysis, M.L.; investigation, Y.L.; resources, M.L.; data curation, M.L.; writing—original draft preparation, M.L.; writing—review and editing, Y.L., L.S., M.S., X.H., Q.L., C.W., M.Y., X.R. and J.M.; visualization, M.L.; supervision, C.W.; project administration, C.W.; funding acquisition, C.W. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported in part by the National Natural Science Foundation of China (No. 62331021) and the Shanghai Sailing Program (No. 20YF1402400).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data are available from UK Biobank via a standard application procedure at http://www.ukbiobank.ac.uk/register-apply accessed on 27 August 2023.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Cui, C.; Yang, H.; Wang, Y.; Zhao, S.; Asad, Z.; Coburn, L.A.; Wilson, K.T.; Landman, B.A.; Huo, Y. Deep Multimodal Fusion of Image and Non-Image Data in Disease Diagnosis and Prognosis: A Review. Prog. Biomed. Eng. 2023, 5, 022001.
  2. Ritchie, S.C.; Lambert, S.A.; Arnold, M.; Teo, S.M.; Lim, S.; Scepanovic, P.; Marten, J.; Zahid, S.; Chaffin, M.; Liu, Y.; et al. Integrative Analysis of the Plasma Proteome and Polygenic Risk of Cardiometabolic Diseases. Nat. Metab. 2021, 3, 1476–1483.
  3. Wigger, L.; Barovic, M.; Brunner, A.-D.; Marzetta, F.; Schöniger, E.; Mehl, F.; Kipke, N.; Friedland, D.; Burdet, F.; Kessler, C.; et al. Multi-Omics Profiling of Living Human Pancreatic Islet Donors Reveals Heterogeneous Beta Cell Trajectories towards Type 2 Diabetes. Nat. Metab. 2021, 3, 1017–1031.
  4. Liu, Y.; Méric, G.; Havulinna, A.S.; Teo, S.M.; Åberg, F.; Ruuskanen, M.; Sanders, J.; Zhu, Q.; Tripathi, A.; Verspoor, K.; et al. Early Prediction of Incident Liver Disease Using Conventional Risk Factors and Gut-Microbiome-Augmented Gradient Boosting. Cell Metab. 2022, 34, 719–730.e4.
  5. Zhao, B.; Li, T.; Fan, Z.; Yang, Y.; Shu, J.; Yang, X.; Wang, X.; Luo, T.; Tang, J.; Xiong, D.; et al. Heart-Brain Connections: Phenotypic and Genetic Insights from Magnetic Resonance Images. Science 2023, 380, abn6598.
  6. Littlejohns, T.J.; Holliday, J.; Gibson, L.M.; Garratt, S.; Oesingmann, N.; Alfaro-Almagro, F.; Bell, J.D.; Boultwood, C.; Collins, R.; Conroy, M.C.; et al. The UK Biobank Imaging Enhancement of 100,000 Participants: Rationale, Data Collection, Management and Future Directions. Nat. Commun. 2020, 11, 2624.
  7. Ahmad, A.; Imran, M.; Ahsan, H. Biomarkers as Biomedical Bioindicators: Approaches and Techniques for the Detection, Analysis, and Validation of Novel Biomarkers of Diseases. Pharmaceutics 2023, 15, 1630.
  8. O’Sullivan, J.W.; Ashley, E.A.; Elliott, P.M. Polygenic Risk Scores for the Prediction of Cardiometabolic Disease. Eur. Heart J. 2023, 44, 89–99.
  9. Lewis, C.M.; Vassos, E. Polygenic Risk Scores: From Research Tools to Clinical Instruments. Genome Med. 2020, 12, 44.
  10. Brown, M.A. Polygenic Risk Scores. Semin. Arthritis Rheum. 2024, 64, 152330.
  11. Usher-Smith, J.A.; Harshfield, A.; Saunders, C.L.; Sharp, S.J.; Emery, J.; Walter, F.M.; Muir, K.; Griffin, S.J. External Validation of Risk Prediction Models for Incident Colorectal Cancer Using UK Biobank. Br. J. Cancer 2018, 118, 750–759.
  12. Briggs, S.E.W.; Law, P.; East, J.E.; Wordsworth, S.; Dunlop, M.; Houlston, R.; Hippisley-Cox, J.; Tomlinson, I. Integrating Genome-Wide Polygenic Risk Scores and Non-Genetic Risk to Predict Colorectal Cancer Diagnosis Using UK Biobank Data: Population Based Cohort Study. BMJ 2022, 379, e071707.
  13. Wang, M.; Hu, K.; Fan, L.; Yan, H.; Li, P.; Jiang, T.; Liu, B. Predicting Treatment Response in Schizophrenia With Magnetic Resonance Imaging and Polygenic Risk Score. Front. Genet. 2022, 13, 848205.
  14. Bycroft, C.; Freeman, C.; Petkova, D.; Band, G.; Elliott, L.T.; Sharp, K.; Motyer, A.; Vukcevic, D.; Delaneau, O.; O’Connell, J.; et al. The UK Biobank Resource with Deep Phenotyping and Genomic Data. Nature 2018, 562, 203–209.
  15. Langner, T.; Östling, A.; Maldonis, L.; Karlsson, A.; Olmo, D.; Lindgren, D.; Wallin, A.; Lundin, L.; Strand, R.; Ahlström, H.; et al. Kidney Segmentation in Neck-to-Knee Body MRI of 40,000 UK Biobank Participants. Sci. Rep. 2020, 10, 20963.
  16. Zhao, B.; Ibrahim, J.G.; Li, Y.; Li, T.; Wang, Y.; Shan, Y.; Zhu, Z.; Zhou, F.; Zhang, J.; Huang, C.; et al. Heritability of Regional Brain Volumes in Large-Scale Neuroimaging and Genetic Studies. Cereb. Cortex 2019, 29, 2904–2914.
  17. Mojtahed, A.; Kelly, C.J.; Herlihy, A.H.; Kin, S.; Wilman, H.R.; McKay, A.; Kelly, M.; Milanesi, M.; Neubauer, S.; Thomas, E.L.; et al. Reference Range of Liver Corrected T1 Values in a Population at Low Risk for Fatty Liver Disease—A UK Biobank Sub-Study, with an Appendix of Interesting Cases. Abdom. Radiol. 2019, 44, 72–84.
  18. Liu, Y.; Basty, N.; Whitcher, B.; Bell, J.D.; Sorokin, E.P.; van Bruggen, N.; Thomas, E.L.; Cule, M. Genetic Architecture of 11 Organ Traits Derived from Abdominal MRI Using Deep Learning. eLife 2021, 10, e65554.
  19. Bai, W.; Suzuki, H.; Huang, J.; Francis, C.; Wang, S.; Tarroni, G.; Guitton, F.; Aung, N.; Fung, K.; Petersen, S.E.; et al. A Population-Based Phenome-Wide Association Study of Cardiac and Aortic Structure and Function. Nat. Med. 2020, 26, 1654–1662.
  20. Purcell, S.; Neale, B.; Todd-Brown, K.; Thomas, L.; Ferreira, M.A.R.; Bender, D.; Maller, J.; Sklar, P.; De Bakker, P.I.W.; Daly, M.J.; et al. PLINK: A Tool Set for Whole-Genome Association and Population-Based Linkage Analyses. Am. J. Hum. Genet. 2007, 81, 559–575.
  21. Fairley, S.; Lowy-Gallego, E.; Perry, E.; Flicek, P. The International Genome Sample Resource (IGSR) Collection of Open Human Genomic Variation Resources. Nucleic Acids Res. 2020, 48, D941–D947.
  22. Band, G.; Marchini, J. BGEN: A Binary File Format for Imputed Genotype and Haplotype Data. bioRxiv 2018.
  23. Wigginton, J.E.; Cutler, D.J.; Abecasis, G.R. A Note on Exact Tests of Hardy-Weinberg Equilibrium. Am. J. Hum. Genet. 2005, 76, 887–893.
  24. Dolci, G.; Rahaman, M.A.; Chen, J.; Duan, K.; Fu, Z.; Abrol, A.; Menegaz, G.; Calhoun, V.D. A Deep Generative Multimodal Imaging Genomics Framework for Alzheimer’s Disease Prediction. In Proceedings of the 2022 IEEE 22nd International Conference on Bioinformatics and Bioengineering (BIBE), Taichung, Taiwan, 7–9 November 2022; pp. 41–44.
  25. Vanguri, R.S.; Luo, J.; Aukerman, A.T.; Egger, J.V.; Fong, C.J.; Horvat, N.; Pagano, A.; Araujo-Filho, J.D.A.B.; Geneslaw, L.; Rizvi, H.; et al. Multimodal Integration of Radiology, Pathology and Genomics for Prediction of Response to PD-(L)1 Blockade in Patients with Non-Small Cell Lung Cancer. Nat. Cancer 2022, 3, 1151–1164.
  26. Nauffal, V.; Di Achille, P.; Klarqvist, M.D.R.; Cunningham, J.W.; Hill, M.C.; Pirruccello, J.P.; Weng, L.-C.; Morrill, V.N.; Choi, S.H.; Khurshid, S.; et al. Genetics of Myocardial Interstitial Fibrosis in the Human Heart and Association with Disease. Nat. Genet. 2023, 55, 777–786.
  27. Miller, K.L.; Alfaro-Almagro, F.; Bangerter, N.K.; Thomas, D.L.; Yacoub, E.; Xu, J.; Bartsch, A.J.; Jbabdi, S.; Sotiropoulos, S.N.; Andersson, J.L.R.; et al. Multimodal Population Brain Imaging in the UK Biobank Prospective Epidemiological Study. Nat. Neurosci. 2016, 19, 1523–1536.
  28. Sun, B.B.; Chiou, J.; Traylor, M.; Benner, C.; Hsu, Y.-H.; Richardson, T.G.; Surendran, P.; Mahajan, A.; Robins, C.; Vasquez-Grinnell, S.G.; et al. Plasma Proteomic Associations with Genetics and Health in the UK Biobank. Nature 2023, 622, 329–338.
  29. Li, L.; Chen, Z.; von Scheidt, M.; Li, S.; Steiner, A.; Güldener, U.; Koplev, S.; Ma, A.; Hao, K.; Pan, C.; et al. Transcriptome-Wide Association Study of Coronary Artery Disease Identifies Novel Susceptibility Genes. Basic Res. Cardiol. 2022, 117, 6.
  30. Tsai, P.-C.; Lee, T.-H.; Kuo, K.-C.; Su, F.-Y.; Lee, T.-L.M.; Marostica, E.; Ugai, T.; Zhao, M.; Lau, M.C.; Väyrynen, J.P.; et al. Histopathology Images Predict Multi-Omics Aberrations and Prognoses in Colorectal Cancer Patients. Nat. Commun. 2023, 14, 2102.
  31. Chen, R.J.; Lu, M.Y.; Williamson, D.F.K.; Chen, T.Y.; Lipkova, J.; Noor, Z.; Shaban, M.; Shady, M.; Williams, M.; Joo, B.; et al. Pan-Cancer Integrative Histology-Genomic Analysis via Multimodal Deep Learning. Cancer Cell 2022, 40, 865–878.e6.
  32. Seyed Tabib, N.S.; Madgwick, M.; Sudhakar, P.; Verstockt, B.; Korcsmaros, T.; Vermeire, S. Big Data in IBD: Big Progress for Clinical Practice. Gut 2020, 69, 1520–1532.
Figure 1. Workflow of disease prediction model development. This diagram illustrates the systematic process used in our study, beginning with the collection of raw data from the UK Biobank, including imaging and genetic data. The process involves the extraction of key features, specifically imaging-derived phenotypes (IDPs) and polygenic risk scores (PRSs), which are subsequently employed to develop disease prediction models using a logistic regression model with lasso regularization. The performance of these models is evaluated by AUC for different disease classifiers and by the ranking of features based on their importance in prediction.
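The modelling step in this workflow, a logistic regression with lasso (L1) regularization, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the proximal-gradient solver, the function names (`soft_threshold`, `fit_lasso_logistic`, `predict_proba`), the penalty strength `lam`, the learning rate, and the toy feature values are all assumptions introduced for the example, and the features are assumed to have been standardized beforehand.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink a value towards zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_lasso_logistic(X, y, lam=0.05, lr=0.1, n_iter=2000):
    """Fit L1-penalised logistic regression by proximal gradient descent.

    X: list of feature rows (assumed standardized); y: list of 0/1 labels.
    Returns (weights, bias); the bias term is not penalised.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        # Residuals p_i - y_i drive the gradient of the mean log-loss.
        resid = [sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b) - yi
                 for row, yi in zip(X, y)]
        grad_w = [sum(r * row[j] for r, row in zip(resid, X)) / n
                  for j in range(d)]
        grad_b = sum(resid) / n
        # Gradient step, then soft-thresholding applies the lasso penalty.
        w = [soft_threshold(wj - lr * gj, lr * lam)
             for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict_proba(X, w, b):
    """Predicted disease probability for each row of X."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b) for row in X]


# Toy data (hypothetical): column 0 stands in for an IDP that separates
# cases from controls, column 1 for a second feature added as noise.
X = [[-2.0, 0.3], [-1.0, -0.3], [-1.5, 0.0],
     [1.0, 0.3], [2.0, -0.3], [1.5, 0.0]]
y = [0, 0, 0, 1, 1, 1]
weights, bias = fit_lasso_logistic(X, y)
print("weights:", weights, "bias:", bias)
```

The soft-thresholding step is what allows the lasso to set uninformative coefficients exactly to zero, which is why this family of models doubles as a feature selector when many IDPs are supplied.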
Figure 2. Multi-modal medical imaging. (a) Brain MRI; (b) heart MRI; (c) abdominal MRI.
Figure 3. Inclusion and exclusion flowchart for the study.
Figure 4. ROCs for common disease prediction models. This figure displays the ROCs for models predicting several common diseases, comparing the performance of models based on PRSs (blue line), IDPs (green line), and a combination of both PRSs and IDPs (red line). Each panel represents a different disease, with AUC values indicating the predictive accuracy of each model. CKD: chronic kidney disease; COPD: chronic obstructive pulmonary disease; AF: atrial fibrillation; CAD: coronary artery disease; HF: heart failure; MI: myocardial infarction; T2D: type 2 diabetes.
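The AUC values quoted in the ROC panels can be read as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control. A minimal sketch of that pairwise calculation follows; the function name `auc_from_scores` and the toy labels and scores are assumptions made for illustration, not values or software from the study.

```python
def auc_from_scores(labels, scores):
    """AUC as the fraction of (case, control) pairs ranked correctly.

    A tie between a case and a control counts as half a correct pair.
    """
    cases = [s for l, s in zip(labels, scores) if l == 1]
    controls = [s for l, s in zip(labels, scores) if l == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    correct = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                correct += 1.0
            elif c == k:
                correct += 0.5
    return correct / (len(cases) * len(controls))


# Hypothetical predicted risks for three cases (1) and four controls (0).
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
print(round(auc_from_scores(labels, scores), 3))
```

In this toy example 11 of the 12 case-control pairs are ordered correctly. The pairwise loop is quadratic in sample size, so a rank-based formula would be preferable for cohorts of thousands of participants.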
Figure 5. Comparative ROC analysis of disease prediction models across entire UKBB dataset and HES center. This figure presents the ROC curves for models predicting several common diseases, with panels for each disease split into two comparisons. The left panels utilize data from the whole UKBB, while the right panels are derived from the HES center. Each curve represents the performance of models based on PRSs in blue, IDPs in green, and a combination of both PRSs and IDPs in red. CKD: chronic kidney disease; COPD: chronic obstructive pulmonary disease; AF: atrial fibrillation; CAD: coronary artery disease; HF: heart failure; MI: myocardial infarction; T2D: type 2 diabetes.
Figure 6. Feature importance in prediction models. This figure highlights the top 10 features for each disease prediction model. The color represents the specific organ from which these features are derived. The horizontal axis measures the features’ importance using the mean decrease in Gini, which quantifies their contribution to the predictive accuracy of the models.
Table 1. Population characteristics at the time of first visit with full IDPs and PRSs.

| Characteristics | All (N = 8646) | Men (N = 3808) | Women (N = 4838) |
|---|---|---|---|
| Age (years) | 64.2 ± 7.29 | 64.8 ± 7.4 | 63.7 ± 7.2 |
| Weight (kg) | 74.59 ± 13.82 | 82.54 ± 12.11 | 68.33 ± 11.71 |
| Height (cm) | 168.69 ± 8.99 | 175.98 ± 6.43 | 162.95 ± 6.09 |
| BMI (kg/m²) | 26.25 ± 3.83 | 26.87 ± 3.39 | 25.77 ± 4.07 |
| Leukocytes (×10⁹/L) | 6.53 ± 1.90 | 6.53 ± 2.23 | 6.52 ± 1.59 |
| RBC (×10⁹/L) | 4.50 ± 0.40 | 4.75 ± 0.34 | 4.29 ± 0.32 |
| Hemoglobin (g/L) | 14.14 ± 1.22 | 15.02 ± 0.93 | 13.44 ± 0.95 |
| Blood platelets (×10⁹/L) | 251.21 ± 56.30 | 236.74 ± 51.22 | 262.64 ± 57.49 |
| ALT (U/L) | 22.46 ± 13.02 | 26.68 ± 13.09 | 19.18 ± 11.99 |
| AST (U/L) | 25.26 ± 7.98 | 27.41 ± 8.09 | 23.59 ± 7.48 |
| Dbil (μmol/L) | 1.84 ± 0.80 | 2.01 ± 0.86 | 1.69 ± 0.70 |
| Urea (mmol/L) | 5.26 ± 1.21 | 5.50 ± 1.23 | 5.08 ± 1.17 |
| C-reactive protein (mg/L) | 1.84 ± 3.08 | 1.81 ± 3.08 | 1.86 ± 3.07 |
| GGT (U/L) | 32.51 ± 34.82 | 40.61 ± 35.87 | 26.19 ± 32.62 |
| Lipoprotein (mg/L) | 43.25 ± 48.87 | 44.47 ± 49.71 | 42.30 ± 48.20 |
| TBIL (μmol/L) | 9.37 ± 4.56 | 10.54 ± 5.01 | 8.45 ± 3.39 |
| TGs (mmol/L) | 1.61 ± 0.91 | 1.89 ± 1.04 | 1.39 ± 0.71 |

Values are mean ± standard deviation. RBC: red blood cell count; ALT: alanine aminotransferase; AST: aspartate aminotransferase; Dbil: direct bilirubin; GGT: gamma glutamyltransferase; TBIL: total bilirubin; TGs: triglycerides; CKD: chronic kidney disease; COPD: chronic obstructive pulmonary disease; AF: atrial fibrillation; CAD: coronary artery disease; HF: heart failure; MI: myocardial infarction; T2D: type 2 diabetes.
Table 2. The Field IDs of the different organs’ IDPs.

| Organ | IDPs | Field ID | Reference |
|---|---|---|---|
| Brain | volume of grey matter | 25782–25920, 24360–24409 | [16] |
| Heart | cardiac and aortic structure and function | 24100–24181 | [19] |
| Lung | FVC / FEV1 / PEF / FEV1/FVC ratio / volume | 3062, 3063, 3064, 20150, 20151, 20153, 20154, 20256, 20257, 20258, 21084 | [18] |
| Liver | volume / fat fraction / iron / corrected T1 | 21080, 21088, 21089, 40060, 40061, 40062 | [17,18] |
| Spleen | volume / fat fraction / iron | 21083, 21170, 21173 | [18] |
| Pancreas | volume / fat fraction / iron | 21087, 21090, 21091 | [18] |
| Kidney | volume / kidney distance | 21081, 21082, 21160–21163 | [15] |

FVC: forced vital capacity; FEV1: forced expiratory volume in one second; PEF: peak expiratory flow.
Table 3. The ICD10 code and number of cases of diseases.

| Disease | ICD10 Code and Field ID | Number of cases |
|---|---|---|
| HF | I110, I130, I132, Z941, T862, I500, I501, I509 | 89 |
| MI | I252, I210, I211, I212, I213, I214, I219, I21X, I220, I221 | 226 |
| AF | I480, I481, I482, I483, I484, I489 | 225 |
| CAD | Z955, I252, Z951, I240, I241, I248, I249, I250, I251, I253, I254, I255, I256, I258, I259, I210, I211, I212, I213, I214, I219, I21X, I220, I221, I228, I229, I230, I231, I232, I233, I234, I235, I236, I238 | 497 |
| T2D | E110, E111, E112, E113, E114, E115, E116, E117, E118, E119 | 284 |
| Hypertension | I10, I110, I119, I120, I129, I130, I131, I132, I139, I150, I151, I152, I158, I159, O100, O101, O102, O103, O104, O109 | 1637 |
| COPD | 42016 | 128 |
| Asthma | 42014 | 1093 |
| CKD | 132032 | 244 |

HF: heart failure; MI: myocardial infarction; CKD: chronic kidney disease; COPD: chronic obstructive pulmonary disease; AF: atrial fibrillation; CAD: coronary artery disease; T2D: type 2 diabetes.
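Case definitions like those in Table 3 amount to checking whether any of a participant's recorded ICD-10 codes falls within a disease's code list. The sketch below illustrates this for two of the diseases. The code lists are copied from Table 3, whereas the record layout, the participant IDs, and the function names `is_case` and `count_cases` are hypothetical and do not reflect the actual UK Biobank data format or the authors' pipeline.

```python
# ICD-10 code lists copied from Table 3 (two diseases shown for brevity).
DISEASE_CODES = {
    "AF": {"I480", "I481", "I482", "I483", "I484", "I489"},
    "T2D": {"E110", "E111", "E112", "E113", "E114",
            "E115", "E116", "E117", "E118", "E119"},
}


def is_case(participant_codes, disease):
    """True if any recorded code belongs to the disease's code list."""
    return not DISEASE_CODES[disease].isdisjoint(participant_codes)


def count_cases(records, disease):
    """Count participants with at least one matching code."""
    return sum(is_case(codes, disease) for codes in records.values())


# Hypothetical records: participant ID -> set of recorded ICD-10 codes.
records = {
    "p001": {"I489", "E119"},
    "p002": {"I10"},
    "p003": {"E112"},
}
print(count_cases(records, "AF"), count_cases(records, "T2D"))
```

Storing each code list as a set keeps the membership check independent of the order in which codes were recorded and lets one participant count as a case for several diseases at once.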
Table 4. Prediction performance of different diseases.

| Disease | Model | Cor | AUC | Interval | Sen | Spec | Accuracy |
|---|---|---|---|---|---|---|---|
| CKD | PRS | −0.06 | 0.53 | 0.43~0.63 | 0.42 | 0.73 | 0.42 |
| | IDP | 0.34 | 0.68 | 0.59~0.76 | 0.87 | 0.47 | 0.66 |
| | PRS + IDP | 0.31 | 0.67 | 0.58~0.76 | 0.66 | 0.66 | 0.66 |
| Asthma | PRS | 0.16 | 0.59 | 0.54~0.63 | 0.40 | 0.74 | 0.56 |
| | IDP | 0.34 | 0.70 | 0.66~0.74 | 0.48 | 0.84 | 0.66 |
| | PRS + IDP | 0.40 | 0.73 * | 0.69~0.77 | 0.61 | 0.73 | 0.67 |
| COPD | PRS | 0.54 | 0.82 | 0.73~0.92 | 0.84 | 0.71 | 0.78 |
| | IDP | 0.38 | 0.70 | 0.58~0.82 | 0.61 | 0.79 | 0.70 |
| | PRS + IDP | 0.66 | 0.88 | 0.81~0.96 | 0.79 | 0.89 | 0.83 |
| AF | PRS | 0.40 | 0.74 | 0.66~0.82 | 0.58 | 0.82 | 0.70 |
| | IDP | 0.37 | 0.71 | 0.62~0.79 | 0.76 | 0.58 | 0.68 |
| | PRS + IDP | 0.43 | 0.75 | 0.66~0.83 | 0.46 | 0.95 | 0.70 |
| CAD | PRS | 0.27 | 0.66 | 0.60~0.72 | 0.56 | 0.72 | 0.65 |
| | IDP | 0.44 | 0.76 | 0.71~0.81 | 0.88 | 0.56 | 0.71 |
| | PRS + IDP | 0.50 | 0.81 * | 0.75~0.86 | 0.80 | 0.73 | 0.76 |
| HF | PRS | 0.22 | 0.63 | 0.47~0.79 | 0.74 | 0.55 | 0.66 |
| | IDP | 0.32 | 0.68 | 0.53~0.83 | 0.50 | 0.84 | 0.66 |
| | PRS + IDP | 0.50 | 0.79 | 0.66~0.91 | 0.73 | 0.81 | 0.77 |
| Hypertension | PRS | 0.12 | 0.57 | 0.53~0.61 | 0.50 | 0.61 | 0.56 |
| | IDP | 0.45 | 0.77 | 0.74~0.80 | 0.71 | 0.72 | 0.71 |
| | PRS + IDP | 0.40 | 0.73 | 0.70~0.76 | 0.65 | 0.71 | 0.68 |
| MI | PRS | 0.24 | 0.62 | 0.52~0.71 | 0.71 | 0.49 | 0.60 |
| | IDP | 0.29 | 0.68 | 0.58~0.77 | 0.56 | 0.75 | 0.66 |
| | PRS + IDP | 0.48 | 0.79 | 0.71~0.86 | 0.87 | 0.55 | 0.72 |
| T2D | PRS | 0.27 | 0.65 | 0.57~0.73 | 0.63 | 0.60 | 0.62 |
| | IDP | 0.57 | 0.84 | 0.78~0.90 | 0.71 | 0.86 | 0.78 |
| | PRS + IDP | 0.60 | 0.87 * | 0.81~0.92 | 0.82 | 0.79 | 0.81 |

Cor: Pearson correlation; Sen: sensitivity; Spec: specificity; CKD: chronic kidney disease; COPD: chronic obstructive pulmonary disease; AF: atrial fibrillation; CAD: coronary artery disease; HF: heart failure; MI: myocardial infarction; T2D: type 2 diabetes. “*” indicates that PRSs + IDPs are significantly better than PRSs or IDPs alone.
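The Sen, Spec, and Accuracy columns of Table 4 follow from a confusion matrix once predicted probabilities are dichotomized, and Cor is a Pearson correlation. A minimal sketch is given below. It assumes a probability cutoff of 0.5 and assumes that Cor is computed between the predicted probabilities and the binary outcome; neither detail is stated in the table, and the function names and toy values are illustrative only.

```python
import math


def classification_metrics(labels, probs, cutoff=0.5):
    """Sensitivity, specificity, and accuracy at a given probability cutoff.

    Assumes at least one case and one control are present.
    """
    tp = sum(1 for l, p in zip(labels, probs) if l == 1 and p >= cutoff)
    fn = sum(1 for l, p in zip(labels, probs) if l == 1 and p < cutoff)
    tn = sum(1 for l, p in zip(labels, probs) if l == 0 and p < cutoff)
    fp = sum(1 for l, p in zip(labels, probs) if l == 0 and p >= cutoff)
    return {
        "sensitivity": tp / (tp + fn),   # share of cases correctly flagged
        "specificity": tn / (tn + fp),   # share of controls correctly cleared
        "accuracy": (tp + tn) / len(labels),
    }


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical outcomes (1 = case) and predicted probabilities.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
probs = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
print(classification_metrics(labels, probs))
print(round(pearson(probs, labels), 2))
```

Sensitivity and specificity trade off against each other as the cutoff moves, which is one reason a model with a higher AUC in Table 4 can still show a lower value in one of the two columns.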
Table 5. Prediction performance of different diseases with and without lifestyle factors.

| Disease | Model | Cor | AUC | Interval | Sen | Spec | Accuracy |
|---|---|---|---|---|---|---|---|
| CKD | Baseline | 0.31 | 0.67 | 0.58~0.76 | **0.66** | 0.66 | 0.66 |
| | Baseline + drink | 0.36 | 0.72 | 0.62~0.82 | 0.55 | **0.82** | 0.67 |
| | Baseline + smoke | **0.37** | **0.72** | 0.62~0.82 | 0.62 | 0.76 | **0.69** |
| | Baseline + drink + smoke | 0.31 | 0.68 | 0.58~0.78 | 0.49 | 0.82 | 0.65 |
| Asthma | Baseline | 0.35 | 0.70 | 0.66~0.74 | 0.59 | **0.70** | 0.65 |
| | Baseline + drink | 0.33 | 0.69 | 0.65~0.74 | **0.78** | 0.55 | **0.67** |
| | Baseline + smoke | 0.36 | 0.71 | 0.66~0.75 | 0.70 | 0.61 | 0.65 |
| | Baseline + drink + smoke | **0.38** | **0.72** | 0.67~0.76 | 0.65 | 0.70 | **0.67** |
| COPD | Baseline | **0.71** | **0.91** | 0.84~0.97 | **0.92** | 0.74 | 0.83 |
| | Baseline + drink | 0.55 | 0.84 | 0.72~0.97 | 0.79 | **0.93** | **0.86** |
| | Baseline + smoke | 0.42 | 0.76 | 0.62~0.89 | 0.71 | 0.75 | 0.73 |
| | Baseline + drink + smoke | 0.60 | 0.85 | 0.73~0.98 | 0.88 | 0.85 | **0.86** |
| AF | Baseline | **0.50** | **0.81** | 0.74~0.89 | **0.76** | 0.76 | 0.76 |
| | Baseline + drink | 0.39 | 0.78 | 0.68~0.87 | 0.61 | **0.92** | 0.76 |
| | Baseline + smoke | 0.49 | 0.79 | 0.70~0.88 | 0.64 | 0.88 | 0.74 |
| | Baseline + drink + smoke | 0.39 | 0.75 | 0.65~0.85 | 0.68 | 0.82 | **0.78** |
| CAD | Baseline | 0.46 | 0.77 | 0.72~0.82 | 0.66 | 0.79 | 0.73 |
| | Baseline + drink | **0.51** | **0.80** | 0.75~0.86 | 0.69 | **0.84** | **0.76** |
| | Baseline + smoke | 0.44 | 0.76 | 0.70~0.83 | 0.68 | 0.74 | 0.71 |
| | Baseline + drink + smoke | 0.40 | 0.73 | 0.66~0.79 | **0.69** | 0.69 | 0.69 |
| HF | Baseline | **0.63** | **0.87** | 0.78~0.97 | 0.85 | 0.81 | **0.83** |
| | Baseline + drink | 0.56 | 0.82 | 0.69~0.95 | **0.90** | 0.65 | 0.78 |
| | Baseline + smoke | 0.37 | 0.68 | 0.51~0.86 | 0.42 | **1.00** | 0.73 |
| | Baseline + drink + smoke | 0.24 | 0.58 | 0.40~0.77 | 0.64 | 0.56 | 0.60 |
| Hypertension | Baseline | **0.47** | **0.78** | 0.75~0.81 | 0.67 | **0.76** | **0.72** |
| | Baseline + drink | 0.46 | 0.77 | 0.74~0.81 | 0.77 | 0.65 | 0.71 |
| | Baseline + smoke | 0.45 | 0.77 | 0.73~0.80 | **0.86** | 0.56 | 0.71 |
| | Baseline + drink + smoke | 0.44 | 0.76 | 0.73~0.79 | 0.67 | 0.71 | 0.69 |
| MI | Baseline | 0.49 | **0.80** | 0.73~0.88 | **0.87** | 0.63 | 0.75 |
| | Baseline + drink | 0.48 | 0.78 | 0.69~0.87 | 0.83 | 0.71 | **0.77** |
| | Baseline + smoke | **0.50** | 0.79 | 0.70~0.88 | 0.79 | 0.66 | 0.73 |
| | Baseline + drink + smoke | 0.36 | 0.73 | 0.63~0.82 | 0.64 | **0.75** | 0.70 |
| T2D | Baseline | **0.56** | **0.84** | 0.78~0.90 | 0.67 | **0.86** | 0.76 |
| | Baseline + drink | 0.53 | 0.81 | 0.74~0.89 | 0.85 | 0.72 | **0.78** |
| | Baseline + smoke | 0.50 | 0.79 | 0.71~0.87 | **0.90** | 0.62 | 0.76 |
| | Baseline + drink + smoke | 0.52 | 0.80 | 0.73~0.88 | 0.85 | 0.63 | 0.73 |

Baseline means the model integrating PRSs and IDPs. Cor: Pearson correlation; Sen: sensitivity; Spec: specificity; CKD: chronic kidney disease; COPD: chronic obstructive pulmonary disease; AF: atrial fibrillation; CAD: coronary artery disease; HF: heart failure; MI: myocardial infarction; T2D: type 2 diabetes. Bolded text indicates the best performance for the respective metric.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Liu, M.; Li, Y.; Sun, L.; Sun, M.; Hu, X.; Li, Q.; Yu, M.; Wang, C.; Ren, X.; Ma, J. Integrating Multi-Organ Imaging-Derived Phenotypes and Genomic Information for Predicting the Occurrence of Common Diseases. Bioengineering 2024, 11, 872. https://doi.org/10.3390/bioengineering11090872
