MOGAD: Integrated Multi-Omics and Graph Attention for the Discovery of Alzheimer’s Disease’s Biomarkers

Zhang, Zhizhong; Chen, Yuqi; Wang, Changliang; Guo, Maoni; Cai, Lu; He, Jian; Liang, Yanchun; Wong, Garry; Chen, Liang

doi:10.3390/informatics12030068

by Zhizhong Zhang ¹,†, Yuqi Chen ¹,†, Changliang Wang ²,†, Maoni Guo ², Lu Cai ¹, Jian He ¹, Yanchun Liang ³, Garry Wong ⁴ and Liang Chen ¹,*

¹ Department of Computer Science and Technology, College of Mathematics and Computer, Shantou University, Shantou 515063, China

² Guangzhou National Laboratory, Guangzhou 510799, China

³ School of Computer Science, Zhuhai College of Science and Technology, Zhuhai 519041, China

⁴ Faculty of Health Sciences, University of Macau, Taipa, Macau SAR 999078, China

* Author to whom correspondence should be addressed.

† These authors contributed equally to this work.

Informatics 2025, 12(3), 68; https://doi.org/10.3390/informatics12030068

Submission received: 12 May 2025 / Revised: 28 June 2025 / Accepted: 7 July 2025 / Published: 9 July 2025

Abstract

The selection of appropriate biomarkers in clinical practice aids in the early detection, treatment, and prevention of disease while also assisting in the development of targeted therapeutics. Recently, multi-omics data generated from advanced technology platforms has become available for disease studies. Therefore, the integration of this data with associated clinical data provides a unique opportunity to gain a deeper understanding of disease. However, the effective integration of large-scale multi-omics data remains a major challenge. To address this, we propose a novel deep learning model—the Multi-Omics Graph Attention biomarker Discovery network (MOGAD). MOGAD aims to efficiently classify diseases and discover biomarkers by integrating various omics data such as DNA methylation, gene expression, and miRNA expression. The model consists of three main modules: Multi-head GAT network (MGAT), Multi-Graph Attention Fusion (MGAF), and Attention Fusion (AF), which work together to dynamically model the complex relationships among different omics layers. We incorporate clinical data (e.g., APOE genotype) which enables a systematic investigation of the influence of non-omics factors on disease classification. The experimental results demonstrate that MOGAD achieves a superior performance compared to existing single-omics and multi-omics integration methods in classification tasks for Alzheimer’s disease (AD). In the comparative experiment on the ROSMAP dataset, our model achieved the highest ACC (0.773), F1-score (0.787), and MCC (0.551). The biomarkers identified by MOGAD show strong associations with the underlying pathogenesis of AD. We also apply a Hi-C dataset to validate the biological rationality of the identified biomarkers. Furthermore, the incorporation of clinical data enhances the model’s robustness and uncovers synergistic interactions between omics and non-omics features. 
Thus, our deep learning model is able to successfully integrate multi-omics data to efficiently classify disease and discover novel biomarkers.

Keywords:

multi-omics; disease classification; graph attention; biomarker discovery; deep learning

1. Introduction

Alzheimer’s disease (AD) is a neurodegenerative condition characterized by an insidious onset and progressive development, yet its pathogenesis remains enigmatic [1]. In clinical practice, the identification of appropriate biomarkers for early diagnosis, prevention, detection, and treatment is of great significance in preventing and delaying the progression of disease. Moreover, such biomarkers can contribute to mechanistic studies and provide effective therapeutic targets for future drug development [2]. With the rapid development of high-throughput technologies, researchers can potentially obtain various types of omics data such as proteomics, epigenomics (DNA methylation), and transcriptomics (gene or miRNA expression) from the same sample [3]. These data offer a multi-faceted perspective on the intrinsic structure and function of biological systems. Since single-omics approaches can only capture a fraction of biological complexity, integrating data from multiple omics platforms holds great potential for biomarker discovery [4]. Such discoveries can then accelerate the identification and understanding of complex diseases, shedding light on mechanisms of action and paving the way for effective diagnosis and intervention.

Multi-omics has shown immense potential in the study of various diseases. In cancer research, the importance and application of multi-omics approaches have become increasingly prominent, providing a comprehensive perspective for uncovering the biological mechanisms and clinical characteristics of cancer [5,6]. In 2019, Leon-Mimila P et al. successfully applied integrative multi-omics approaches to explore new mechanisms of cardiovascular disease and discover plasma biomarkers [7]. Multi-omics also plays a significant role in other types of diseases, such as chronic obstructive pulmonary disease [8], ulcerative colitis [9], and inflammatory bowel disease [10]. These studies also addressed current challenges, research gaps, and future directions for identifying novel biomarkers and therapeutic targets, advancing the development of personalized treatment.

Multi-omics also holds significant potential in AD research. Integrating multi-omics facilitates mechanistic studies of AD and enhances our understanding and provides therapeutic strategies for AD and related dementias [11,12,13,14]. Previously, our research identified potential blood biomarkers for Parkinson’s disease (PD) by integrating whole-blood gene expression and DNA methylation data [15]. In 2019, Laura Xicota et al. successfully applied multi-omics approaches to identify biomarkers for amyloid plaque deposition from peripheral blood in a group of asymptomatic individuals at risk of AD [16]. In 2021, Whitaker Cohn et al., through multi-omics analysis, identified disease-related features from exosomes of microglial cells isolated from the brain tissue of AD patients [17].

However, due to the high dimensionality and complexity of multi-omics data, effectively integrating information from different omics platforms and data formats remains a major challenge, which includes handling heterogeneity, batch effects, and ensuring the interpretability of data integration analyses [18,19]. Most traditional methods do not consider the interactions between different omics data types and may bias certain omics categories, while machine learning methods can provide potential solutions [20]. As a result, many innovative multi-omics integration approaches have been proposed, focusing on leveraging interactions between different omics data types. For example, in 2015, Dokyoon Kim et al. proposed a graph-based semi-supervised learning integration method to combine multi-omics data and genomic knowledge to predict cancer clinical outcomes [21]. In 2018, Ricard Argelaguet et al. introduced an unsupervised statistical learning algorithm, MOFA, to identify the main sources of variation in multi-omics data [22]. In 2019, Amrit Singh et al. developed a supervised ensemble classification algorithm, DIABLO, that identifies highly correlated variables across omics layers while modeling inter-layer interactions [23].

In recent years, an increasing number of researchers have applied deep neural networks (DNNs) to multi-omics integration [24]. In 2019, Tianle Ma et al. proposed MAE, a deep learning method for predicting the survival and progression-free survival of cancer patients [25]. In 2021, Xiaoyu Zhang et al. introduced OmiEmbed, a unified end-to-end deep learning framework for high-dimensional multi-omics data. This framework is designed to handle tasks such as dimensionality reduction, tumor type classification, multi-omics integration, clinical feature reconstruction, and survival prediction [26]. In the same year, Olivier B. Poirion et al. presented DeepProg, a novel computational framework that integrates deep learning and machine learning methods. DeepProg showed a high predictive power on liver cancer and breast cancer datasets and outperformed other multi-omics integration methods in risk stratification [27]. However, these models did not leverage the similarities between samples, which may result in the loss of some sample-specific similarities. Also in 2021, Tongxin Wang et al. proposed multi-omics graph convolutional networks (MOGONET) [28], a multi-omics integration method that combines omics-specific learning with cross-omics correlation learning. This method constructs similarity networks for each omics feature of the sample using cosine similarity, and then applies graph convolutional networks to learn the sample similarity network for effective multi-omics data classification. MOGONET performed well in AD patient classification, low-grade glioma grading, renal cancer type classification, and breast cancer invasive subtype classification using gene expression, DNA methylation, and miRNA expression data. It is an effective multi-omics data classification tool with broad application potential.

In 2022, Li X et al. proposed a multi-omics integration model based on graph convolutional network (MoGCN) [29], which integrates 3 different types of omics: copy number variation (CNV), gene expression, and protein expression. The model first calculates similarity networks using the similarity network fusion (SNF) algorithm, reduces the dimensionality of the original matrices via an auto-encoder (AE), and finally performs classification using a graph convolutional network (GCN). MoGCN achieved promising results on a BRCA dataset. In 2023, Gong P et al. introduced the multi-omics attention-driven learning network (MOADLN) model [30]. Similar to MOGONET, MOADLN utilizes three types of omics: DNA methylation, gene expression, and miRNA expression. It employs a multi-head self-attention network for classification and prediction, achieving satisfactory performance on both the ROSMAP and BRCA datasets. Additionally, MOADLN identified the top 20 significant biomarkers associated with disease mechanisms. In 2024, Lan et al. proposed DeepKEGG, a novel interpretable deep learning framework for multi-omics integration, which models sample-wise correlations via pathway self-attention and incorporates a biological hierarchy to improve cancer recurrence prediction and biomarker identification [31]. In the same year, Wang et al. introduced TMO-Net, a cross-modal multi-omics pre-trained network that enables joint representation learning and incomplete omics inference across pan-cancer datasets, serving as a foundation model for interpretable downstream oncology tasks [32]. In 2025, Xu et al. proposed EMitool, an explainable network-based multi-omics integration method for disease subtyping, which achieves clinically relevant stratification without prior clinical information and demonstrates superior performance across 31 TCGA cancer types [33].

These models currently focus only on integrating multi-omics data and do not incorporate relevant clinical data. The inclusion of clinical data alongside omics data has the potential to improve the prediction accuracy of disease biomarkers and enhance our understanding of disease. As a result, researchers have begun applying machine learning algorithms to integrate both omics and other non-omics data. In 2019, Gangcai Xie et al. proposed an algorithm called GDP, which combines deep learning frameworks with the Cox proportional hazards (CPH) model and applies group lasso regularization for survival prediction using both clinical and multi-omics data in cancer prognosis [34]. In the same year, Zhi Huang et al. introduced a deep learning model named SALMON, which integrates multi-omics data and clinical data from patients to predict breast cancer survival. The results demonstrated that the integration of multi-omics data with clinical data provides a better prediction of breast cancer survival outcomes [35].

To our knowledge, deep learning models that integrate multi-omics and non-omics data have not yet been applied in AD research, nor has such validation been extended to include Hi-C data. Moreover, there is a lack of models that simultaneously incorporate clinical data, multi-omics information, and Hi-C validation. Therefore, we propose a novel model, MOGAD (Multi-Omics Graph Attention biomarker Discovery Network), a deep learning model for biomarker discovery that integrates multiple omics data with the similarity between samples. Additionally, MOGAD effectively incorporates non-omics data, leading to improved prediction and diagnostic outcomes. We demonstrate the model’s effectiveness by using MOGAD to predict a sample’s disease status based on their multi-omics data. Moreover, by integrating non-omics data with multi-omics, we observe significantly improved results. Furthermore, we use MOGAD to identify key biomarkers associated with AD, highlighting the model’s ability to extract relevant biological insights. We utilized Hi-C data to analyze the top-ranked biomarkers, comparing their chromatin interaction frequencies between normal and AD samples. This comparison serves to validate the biological relevance and robustness of our biomarker discovery results. To validate the model’s generalizability and effectiveness, we also apply it to multi-class cancer datasets, demonstrating its robustness in other disease contexts. We compared MOGAD with several models from previous studies. The results are presented in Table A1.

2. Materials and Methods

2.1. Datasets

2.1.1. Multi-Omics Data and Clinical Data

AD datasets were obtained from the ROSMAP study. ROSMAP consists of the ROS and MAP studies, both of which are longitudinal clinical-pathological cohort studies conducted by Rush University on AD. ROSMAP contains multi-omics and clinical data related to AD. ROSMAP is a large-scale neuroscience and aging research project aimed at investigating the pathogenesis of AD and related disorders such as cognitive impairment in older adults [36,37,38]. We obtained and integrated DNA methylation, mRNA, and miRNA data from the datasets, along with their clinical information. To ensure data integrity, samples with missing values in any omics modality were excluded, yielding a cohort with complete multi-omics profiles. For ROSMAP, we used the clinical consensus diagnosis of cognitive status at the time of death as the label, categorizing AD⁺ and AD as a single group. Ultimately, the samples were divided into two groups: No cognitive impairment (NCI) and AD, comprising 169 NCI samples and 207 AD samples. Additionally, we collected CERAD score [39], Braak stage [40], and APOE genotype as non-omics features of the samples.

For model testing purposes, we obtained the TCGA Breast Invasive Carcinoma (BRCA) dataset containing multi-omics and clinical data related to breast cancer [41]. For BRCA, we use the same subtype classification system (PAM50) as MOGONET to divide the samples into five categories, LumA, LumB, Normal, Basal, and Her2, with sample sizes of 189, 98, 63, 56, and 25, respectively [42,43].

2.1.2. Other Annotation Data

Gene annotation was based on the latest Ensembl version (Release 114) [44]. miRNA-related information was obtained from miRBase (https://www.mirbase.org/, accessed on 14 September 2024, version 22) [45], and human miRNAs derived from viruses were excluded. Platform information for the Methylation 450 k array was downloaded from GEO, which allowed us to annotate the methylation sites and gene regions. The targets of miRNAs were obtained from miRTarBase (Version 2025) [46].

We obtained the mapping file for the methylation of Cytosine–phosphate–Guanine (CpG) sites to different gene regions from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL13534 (accessed on 9 September 2022). The DNA methylation data were mapped to different gene regions, and 6 regions were selected for this study: Body, 1stExon, TSS1500, 5′UTR, 3′UTR, and TSS200. As each gene was represented by multiple probes, we computed the mean methylation value across all probes per gene and assigned this averaged value to the respective gene, ultimately generating 6 distinct matrices.

Additionally, 2 extra matrices were generated:

We generated an ALLRegions matrix by averaging the values of the same genes across the 6 matrices.
We generated an ALLGeneALLRegions matrix by concatenating the above 7 matrices (after adding region-specific prefixes to their gene names) based on matching sample names.

The above methods were applied to both the ROSMAP and BRCA datasets.
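
The construction of the region matrices can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the function names, the nested-dictionary data layout, and the choice to average a gene only over the regions in which it appears are assumptions, and the region labels are written with ASCII apostrophes as in the GPL13534 annotation.

```python
from collections import defaultdict
from statistics import mean

# The 6 gene regions used in this study (ASCII apostrophes assumed).
REGIONS = ["Body", "1stExon", "TSS1500", "5'UTR", "3'UTR", "TSS200"]


def region_matrices(beta, probe_map):
    """Average probe values per gene within each region.

    beta:      {sample: {probe_id: methylation value}}
    probe_map: {probe_id: [(gene, region), ...]}  (hypothetical layout)
    Returns    {region: {sample: {gene: mean value}}}.
    """
    out = {region: {} for region in REGIONS}
    for sample, probes in beta.items():
        grouped = defaultdict(list)
        for probe, value in probes.items():
            for gene, region in probe_map.get(probe, []):
                if region in out:
                    grouped[(region, gene)].append(value)
        for region in REGIONS:
            out[region][sample] = {}
        for (region, gene), values in grouped.items():
            out[region][sample][gene] = mean(values)
    return out


def all_regions(per_region):
    """ALLRegions: average each gene over the region matrices containing it."""
    samples = next(iter(per_region.values())).keys()
    result = {}
    for sample in samples:
        grouped = defaultdict(list)
        for region in per_region:
            for gene, value in per_region[region][sample].items():
                grouped[gene].append(value)
        result[sample] = {gene: mean(vals) for gene, vals in grouped.items()}
    return result


def all_gene_all_regions(per_region, averaged):
    """ALLGeneALLRegions: concatenate the 7 matrices with region prefixes."""
    blocks = dict(per_region, ALLRegions=averaged)
    return {
        sample: {
            f"{name}_{gene}": value
            for name, matrix in blocks.items()
            for gene, value in matrix[sample].items()
        }
        for sample in averaged
    }
```

Prefixing the gene names keeps features from different regions distinct after concatenation, which is what allows the 7 matrices to be joined on sample name without column collisions.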

2.1.3. Hi-C Data

We incorporated Hi-C data to validate the reliability of biomarker discovery results obtained from the ROSMAP dataset. The Hi-C datasets used in this study were downloaded from http://menglab.pub/hic/ (accessed on 14 September 2024) [47]. Specifically, we utilized sample-merged processed hic files (smp_ad.hic and smp_aged.hic), representing the Hi-C data of AD patients and normal elderly individuals, respectively. We employed Juicer Tools [48] to extract interaction frequencies between chromosomal regions corresponding to each feature.

Since some genes span multiple non-contiguous chromosomal regions, we treated each region as a separate feature. To distinguish them, we appended the suffix “.x” (x = 1, 2, 3, …) to the feature names in the file. As a result, a total of 100 features were constructed for downstream analysis.

Subsequently, interaction frequencies between miRNA feature pairs were extracted from two Hi-C datasets using Juicer Tools, generating symmetric n × n matrices (n: number of features). Hi-C differential matrices were then calculated through a logarithmic transformation of normalized interaction ratios between AD and normal aging samples. To address sparse interactions caused by narrow chromosomal feature ranges, all genomic regions were expanded by ±100,000 base pairs. Data processing employed Knight–Ruiz (KR) normalization at 1 kb resolution. Detailed computational procedures are provided in Appendix A.1.
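
As a rough illustration of the differential step, the sketch below computes an element-wise log ratio between two already-normalized interaction matrices and widens a genomic region by the ±100,000 bp flank. The exact formula is given in Appendix A.1; the base-2 logarithm, the pseudocount, and the function names here are assumptions made only for the example.

```python
import math


def expand_region(start, end, flank=100_000):
    """Widen a genomic interval by `flank` bp on each side (clamped at 0)."""
    return max(0, start - flank), end + flank


def hic_log_ratio(ad, aged, pseudocount=1.0):
    """Element-wise log2 ratio of AD to normal-aging interaction frequencies.

    ad, aged: n x n lists of normalized interaction frequencies.
    The pseudocount (an assumption) avoids division by zero for sparse pairs.
    """
    n = len(ad)
    return [
        [math.log2((ad[i][j] + pseudocount) / (aged[i][j] + pseudocount))
         for j in range(n)]
        for i in range(n)
    ]
```

Positive entries then indicate feature pairs that interact more frequently in AD samples, and negative entries indicate pairs that interact more frequently in normal aging samples.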

2.2. Methods

MOGAD is a classification model that integrates multi-omics data with non-omics data. The MOGAD workflow consists of 6 main components: data preprocessing, construction of similarity networks, pretraining of MGAT, pretraining of MGAF, integration of omics networks, and biomarker discovery. The workflow is illustrated in Figure 1, and the detailed steps are described in the following sections.

Figure 1 illustrates the complete workflow of this study, which comprises 3 parts: (1) Data collection: We collected 3 types of omics data and clinical data. Redundant features and noise were removed from the omics data through feature selection; (2) MOGAD: The omics data and their corresponding cosine similarity matrices were processed by MGAT and MGAF, respectively. The outputs of these 2 modules were then fed into AF to produce the final predictions. When clinical data is involved, it is concatenated with the omics data prior to downstream processing; (3) Biomarker discovery: please refer to the Biomarker Discovery section.

2.2.1. Data Preprocessing

To ensure data quality and comparability across diseases, all omics data were preprocessed. DNA methylation data were preprocessed following MOGONET’s protocol, retaining only probes aligned with the Illumina Infinium HumanMethylation27 Bead Chip [28]. Low-variance and zero-value features were filtered from methylation, mRNA, and miRNA data, followed by z-score normalization. For the ROSMAP dataset, clinically validated non-omics features (CERAD score, Braak stage, APOE genotype) were encoded into numerical variables and processed by z-score normalization. In order to avoid the influence of test samples on preprocessing, all preprocessing was conducted after splitting the data into training and test sets. Detailed preprocessing criteria and parameter thresholds are provided in Appendix A.2.
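
The point about avoiding test-set influence can be illustrated with a short sketch in which the variance filter and the z-score statistics are estimated on the training rows only and then reused for the test rows. The variance threshold and the function names are placeholders; the actual criteria are those in Appendix A.2.

```python
from statistics import mean, pstdev


def fit_preprocessor(train_rows, min_variance=0.0):
    """Select features and estimate z-score statistics from training data only.

    Returns a list of (column index, mean, standard deviation) for every
    feature whose training variance exceeds `min_variance` (a placeholder
    threshold, not the value used in the paper).
    """
    stats = []
    for idx, column in enumerate(zip(*train_rows)):
        mu, sd = mean(column), pstdev(column)
        if sd ** 2 > min_variance:
            stats.append((idx, mu, sd))
    return stats


def transform(rows, stats):
    """Apply the training-set selection and z-scoring to any set of rows."""
    return [[(row[idx] - mu) / sd for idx, mu, sd in stats] for row in rows]
```

For example, `stats = fit_preprocessor(train)` followed by `transform(train, stats)` and `transform(test, stats)` guarantees that no statistic is computed from test samples.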

2.2.2. Construction of Similarity Networks

To capture sample relationships, we constructed similarity networks using omics and non-omics data through cosine similarity analysis. When using omics data alone, each omics type generated a distinct network by calculating pairwise sample similarities. For integrated analysis, non-omics features were concatenated with omics data and weighted by their clinical relevance, forming a composite similarity network. All networks underwent sparsification via adaptive thresholding to retain biologically meaningful connections, following methodologies adapted from MOGONET [28]. Detailed mathematical formulations and parameter settings are provided in Appendix A.3.
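
A minimal sketch of such a network is shown below. It assumes one plausible reading of the thresholding, in the spirit of MOGONET: the similarity cutoff is chosen so that the graph keeps roughly `edges_per_node` edges per sample on average. The parameter name and the tie handling are assumptions; the exact formulation is in Appendix A.3.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_network(rows, edges_per_node=2):
    """Build a sparsified, weighted sample-similarity adjacency matrix.

    The threshold is the similarity of the (edges_per_node * n / 2)-th
    strongest sample pair, so about edges_per_node edges per node survive.
    """
    n = len(rows)
    if n < 2:
        raise ValueError("need at least two samples")
    sim = [[cosine(rows[i], rows[j]) for j in range(n)] for i in range(n)]
    # Each unordered pair once, strongest first.
    pairs = sorted(
        (sim[i][j] for i in range(n) for j in range(i + 1, n)), reverse=True
    )
    keep = min(len(pairs), max(1, edges_per_node * n // 2))
    threshold = pairs[keep - 1]
    adjacency = [
        [sim[i][j] if (i == j or sim[i][j] >= threshold) else 0.0
         for j in range(n)]
        for i in range(n)
    ]
    return adjacency, threshold
```

Because the cutoff is derived from the data rather than fixed in advance, each omics type ends up with a graph of comparable density even when its similarity values are distributed differently.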

2.2.3. Pretraining of MGAT

To capture the features and similarities of each omics sample, we employ an MGAT module. Each omics dataset’s feature matrix and similarity network are input into a corresponding MGAT for training, allowing independent prediction results for each omics type to be learned. The MGAT leverages adaptive attention weights to emphasize samples with high biological similarity, which is enhanced by multi-head mechanisms to capture diverse relational patterns. Intermediate representations from multiple attention heads were concatenated and further refined through a secondary GAT layer for preliminary classification. Training utilized mean squared error (MSE) loss to align predictions with clinical labels. The complete MGAT architecture is visualized in Figure A1 and the details can be found in Appendix A.4.
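
For readers unfamiliar with graph attention, the following sketch shows the forward pass of a generic multi-head graph attention layer over a sample-similarity graph. It follows the standard GAT formulation (LeakyReLU-scored, softmax-normalized attention over neighbors, heads concatenated) and is not the MGAT implementation itself: weights are passed in rather than learned, edge weights are used only as a neighbor mask, and all names are illustrative.

```python
import math


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def gat_head(h, adj, w, a_src, a_dst):
    """One attention head.

    h: n x f sample features; adj: n x n adjacency (non-zero = neighbor);
    w: f x f_out projection; a_src, a_dst: the two halves of the attention
    vector, so that a^T [Wh_i || Wh_j] = a_src . Wh_i + a_dst . Wh_j.
    """
    wh = matmul(h, w)
    n, f_out = len(wh), len(wh[0])
    src = [sum(x * y for x, y in zip(row, a_src)) for row in wh]
    dst = [sum(x * y for x, y in zip(row, a_dst)) for row in wh]
    out = []
    for i in range(n):
        neighbors = [j for j in range(n) if j == i or adj[i][j] > 0]
        scores = [leaky_relu(src[i] + dst[j]) for j in neighbors]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        alpha = [e / total for e in exps]
        out.append(
            [sum(a * wh[j][c] for a, j in zip(alpha, neighbors))
             for c in range(f_out)]
        )
    return out


def multi_head_gat(h, adj, heads):
    """Run several heads and concatenate their outputs per sample."""
    outputs = [gat_head(h, adj, *params) for params in heads]
    return [sum((o[i] for o in outputs), []) for i in range(len(h))]
```

Each head can weight the same neighbors differently, which is how a multi-head layer captures several relational patterns at once; the concatenated output would then be passed to a second layer for classification.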

2.2.4. Pretraining of MGAF

To overcome the limitations of late fusion in capturing cross-omics interactions, we developed an MGAF module for intermediate feature integration. MGAF dynamically learns attention weights to combine encoded representations from multiple omics-specific GATs, enabling synergistic learning of inter-omics relationships during training. This mid-fusion strategy preserves feature-level interactions by projecting multi-omics embeddings into a unified space before classification. The module is exclusively activated for multi-omics inputs; the architecture of the module is provided in Figure A2 and further details can be found in Appendix A.5.
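
The idea of attention-weighted fusion can be illustrated with a generic softmax attention pooling over per-omics embeddings of one sample. This is only one common form of such a mechanism and is not taken from Appendix A.5: the tanh scoring function, the `query` vector (which would be learned in practice), and the function name are assumptions.

```python
import math


def attention_fuse(embeddings, query):
    """Fuse per-omics embeddings of one sample with softmax attention.

    embeddings: list of equal-length vectors, one per omics type.
    query:      scoring vector of the same length (learned in a real model).
    Returns the fused vector and the attention weight of each omics type.
    """
    scores = [
        sum(q * math.tanh(x) for q, x in zip(query, emb)) for emb in embeddings
    ]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(embeddings[0])
    fused = [
        sum(w * emb[d] for w, emb in zip(weights, embeddings)) for d in range(dim)
    ]
    return fused, weights
```

Because the weights are computed from the embeddings themselves, the contribution of each omics type can differ from sample to sample, unlike a fixed late-fusion average.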

2.2.5. Integration of Omics Networks

To harmonize intra- and inter-omics correlations, we designed an AF module that integrates preliminary predictions from MGAT and MGAF through learnable attention weights. This joint training framework dynamically balances omics-specific and cross-omics signals while incorporating L2 regularization to mitigate overfitting. For combined omics/non-omics inputs, non-omics features are weighted by clinical relevance scores before concatenation with omics data, ensuring biologically informed representation learning. The complete MOGAD workflow, including staged training strategies and multi-source data routing, is illustrated in Figure A3 and the details can be found in Appendix A.6.

2.2.6. Biomarker Discovery

The MOGAD model was trained on multi-omics data to identify influential biomarkers through a perturbation-based feature scoring strategy. For each omics feature, its importance was quantified by the decline in prediction performance (F1-score) when the feature was masked in the test data. Features were ranked by their impact scores, with higher scores indicating stronger biomarker potential. To ensure robustness, the scoring process incorporated repeated experiments across 8 methylation region groups and the miRNAs identified across the 8 experiments were then ranked by selection frequency to identify the most recurrent features. Biologically significant features were further prioritized via enrichment analysis. The complete workflow is illustrated in the bottom of Figure 1.
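
The perturbation-based scoring can be sketched as follows for the binary case. The sketch assumes that masking means setting a feature to 0 across all test samples (after z-scoring this is the training mean) and that `predict` is any trained classifier exposed as a function; both are assumptions made for illustration.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1-score computed from true/false positives and negatives."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def perturbation_scores(predict, x_test, y_test, mask_value=0.0):
    """Rank features by the F1 drop observed when each one is masked.

    predict: callable mapping a list of feature rows to predicted labels.
    Returns [(feature index, F1 drop), ...] sorted from most to least important.
    """
    baseline = f1_score(y_test, predict(x_test))
    scores = {}
    for j in range(len(x_test[0])):
        masked = [row[:j] + [mask_value] + row[j + 1:] for row in x_test]
        scores[j] = baseline - f1_score(y_test, predict(masked))
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

A feature the classifier does not rely on produces a drop of 0, while a feature that carries the decision produces a large drop and therefore ranks first.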

2.2.7. Hyperparameter Tuning

For the 3 modules (MGAT, MGAF, and AF), some hyperparameters were determined by performing a grid search based on the parameter settings from the publicly available MOGONET codebase, such as the pretraining learning rates for MGAT and MGAF, as well as the learning rate for AF. Other hyperparameters, including the number of hidden units, the dropout rate, and the number of attention heads, were tuned via a grid search on a widely adopted benchmark dataset. Detailed hyperparameter configurations are provided in Table A2.

3. Results

3.1. Experiment

To validate the effectiveness of our model, we compared its performance on the ROSMAP and BRCA datasets with other existing omics classification models. To assess whether integrating non-omics data enhances performance, we conducted experiments on the ROSMAP dataset using several different types of non-omics data for integration. To evaluate the effectiveness and necessity of omics integration, we designed comparative experiments using only a single omics dataset or combinations of two omics datasets for classification. Moreover, to verify the necessity of each module in our model, we conducted a series of ablation experiments.

To ensure the reliability and reproducibility of the experimental results, we randomly split the dataset into training and testing sets at a 7:3 ratio, generating a total of six different data partition schemes. For each experiment, we conducted six independent repetitions using these six data schemes, and the final results are the average of these six experiments.
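
The evaluation protocol can be sketched as below. The seeds, the unstratified shuffling, and the function names are assumptions; the text specifies only the 7:3 ratio, the six partitions, and the averaging.

```python
import random


def make_splits(n_samples, n_splits=6, train_fraction=0.7, base_seed=0):
    """Create reproducible random train/test index partitions."""
    splits = []
    cut = round(train_fraction * n_samples)
    for k in range(n_splits):
        indices = list(range(n_samples))
        random.Random(base_seed + k).shuffle(indices)  # one seed per partition
        splits.append((sorted(indices[:cut]), sorted(indices[cut:])))
    return splits


def average_over_splits(evaluate, splits):
    """Average a metric returned by evaluate(train_idx, test_idx)."""
    results = [evaluate(train, test) for train, test in splits]
    return sum(results) / len(results)
```

Fixing the partitions once and reusing them for every model ensures that all methods are compared on identical training and test samples.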

Regarding evaluation metrics, for binary classification tasks, we used Accuracy (ACC), F1-score, Area Under the Curve (AUC), and the Matthews Correlation Coefficient (MCC); for multi-class classification tasks, we used Accuracy (ACC), weighted F1-score (F1-weighted), and macro-averaged F1-score (F1-macro) as evaluation criteria.
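
For reference, the binary metrics ACC, F1-score, and MCC can be computed from the confusion-matrix counts as in the sketch below (AUC is omitted because it requires predicted scores rather than labels). The function name and the convention of returning 0 when a denominator is zero are choices made for this example.

```python
import math


def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, F1-score, and Matthews correlation coefficient."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    acc = (tp + tn) / len(pairs)
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "F1": f1, "MCC": mcc}
```

MCC uses all four counts, so it stays informative when the two classes are imbalanced, which is why it is reported alongside ACC and F1.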

3.2. The Performance of Different Models Using Multi-Omics Data

For the ROSMAP dataset, we compared the classification performance of our model, using only omics data, with the following widely cited classification models [15,49,50]: Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Least absolute shrinkage and selection operator (Lasso), Elastic-Net Regularized Generalized Linear Models (Elastic-Net), Random Forest (RF), Decision Tree (DT), Gaussian Naive Bayes (GNB), eXtreme Gradient Boosting (XGBoost), Ridge Classifier (Ridge), Partial Least Squares Regression (PLSR), Deep Neural Network (DNN), MOGONET [28], MOADLN [30], and MoGCN [29]. For the DNN, we used a fully connected network. For Lasso and Elastic-Net, predictions greater than 0.5 were classified as AD, while predictions less than or equal to 0.5 were classified as NCI. For the BRCA dataset, since Lasso, Elastic-Net, Ridge, and PLSR are primarily used for regression tasks and are not commonly applied to multi-class classification tasks, our comparison only includes XGBoost, SVM, RF, KNN, DNN, DT, GNB, and MOGONET. For the DNN, we again chose a fully connected network, similar to the setup used for ROSMAP.

Except for our model, MOGONET, MOADLN, and MoGCN, all other models were trained using the direct concatenation of features from the three omics datasets. MoGCN was utilized only in the BRCA dataset due to its demonstrated superiority in multi-class classification tasks. All experiments were conducted using the same preprocessed data. The experimental results are shown in Table 1 and Table A3.

From Table 1 and Table A3, we observe that some of the traditional machine learning models exhibit commendable performances. For the ROSMAP dataset, Ridge performed the best among the traditional machine learning models, followed closely by Elastic-Net and Lasso. In contrast, for the BRCA dataset, XGBoost performed the best, with RF in second place. Interestingly, in both datasets, there were traditional machine learning models that outperformed the deep learning model DNN. In the ROSMAP dataset, Ridge even surpassed MOGONET and slightly outperformed our model in terms of AUC. Similarly, in the BRCA dataset, XGBoost outperformed MOGONET but was slightly inferior to our model. This phenomenon suggests that deep learning models are not always superior to traditional machine learning models for certain tasks, and that designing task-specific deep learning models tailored to real-world requirements is essential. Our model, except for a slightly lower AUC than Ridge in the ROSMAP dataset, outperformed all other models across all metrics. This demonstrates the superior capability of our model in integrating multi-omics data effectively.

3.3. Comparison of Our Model with Other Models in Terms of Training Speed and Memory

To gain a comprehensive understanding of our model’s performance in terms of speed and memory usage, we measured both and compared the results with those of MOGONET and MOADLN. To control the variables, we used 500 pre-training epochs and 2000 training epochs for all models. Notably, to better control the epoch variable, we modified the experiment by changing the training frequency of MGAT and MGAF from once every five epochs to once every epoch. The results are shown in Table A4.

As shown in Table A4, the three models showed similar time efficiency, and our model did not demonstrate any improvement in terms of time consumption. The peak RAM usage of MOGAD is comparable to that of MOADLN, and both are approximately 300 MB lower than that of MOGONET, suggesting that our model does not impose an additional RAM overhead. However, MOGAD shows significantly higher peak GPU memory usage, reaching several times that of the other models. We attribute the time and GPU memory costs to the computational complexity of the Graph Attention Network (GAT) and the use of multi-head attention mechanisms, which result in a substantially larger computational scale compared to the other models. Although this design increases both training time and GPU memory consumption, the trade-off is justified by the notable performance improvements achieved by our model.

3.4. The Performance of Our Model Under Different Omics Data Types

To validate that our model effectively integrates data from multiple omics, we conducted several comparative experiments. These experiments included using combinations of three omics datasets, combinations of two omics datasets (DNA + mRNA, DNA + miRNA, mRNA + miRNA), and individual omics datasets. The ROSMAP results are shown in Figure 2, and the BRCA results are shown in Figure A4.

From the figures, we observe similar trends across the ROSMAP and BRCA datasets. The combination of all three omics datasets consistently achieved the best performance. Each two-omics combination performed better than either of its contributing single-omics datasets. However, a single-omics dataset not included in a given combination sometimes outperformed that two-omics combination.

This indicates that our model can effectively leverage interactions between multiple omics datasets and demonstrates its scalability to integrate multiple omics. It also highlights that certain single omics datasets may be more informative than the combinations of some other omics datasets. Among the individual omics datasets, mRNA performed the best.

3.5. The Performance of Our Model Using Non-Omics Data with Different Importance Score k

In our data processing, we observed that non-omics data might also influence experimental results. To explore this, we incorporated three non-omics features—APOE genotype, CERAD score, and Braak stage—into the ROSMAP dataset. We introduced a hyperparameter k to quantify the relative importance of each non-omics feature. By multiplying the non-omics features by their respective k values, the model could focus more effectively on these features. Since our model multiplies the non-omics features by k before concatenating them with the omics features and constructing the similarity network, increasing the value of k biases the similarity computation towards the corresponding non-omics feature.
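
The effect of k on the similarity computation can be seen in a small numerical sketch. Two hypothetical samples with unrelated (orthogonal) omics profiles but the same non-omics value become increasingly similar as k grows; the vectors and function names below are illustrative only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def with_clinical(omics, clinical, k):
    """Scale non-omics features by k, then concatenate them with omics features."""
    return omics + [k * c for c in clinical]


# Hypothetical samples: orthogonal omics profiles, identical clinical feature.
sample_a = ([1.0, 0.0], [1.0])
sample_b = ([0.0, 1.0], [1.0])

for k in (0, 2, 4):
    sim = cosine(with_clinical(*sample_a, k), with_clinical(*sample_b, k))
    print(f"k = {k}: cosine similarity = {sim:.3f}")
```

With k = 0 the clinical feature has no influence and the similarity is 0; as k increases, the shared clinical value dominates the dot product, so a very large k would make the network depend almost entirely on the non-omics feature.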

The impact of the hyperparameter k on the model for each of the three non-omics features is shown in Figure A5. The model’s performance varied with k. For APOE genotype and Braak stage, the model exhibited clear performance peaks when k was set to 2 and 4. In contrast, for CERAD score, a noticeable peak occurred only when k was set to 2. These variations suggest that neither neglecting nor excessively emphasizing these features allows the model to fully leverage their utility. Nevertheless, the fluctuations are small: the model’s performance stays within a limited range across k values, which indicates that it incorporates these features robustly.

To demonstrate that our model can effectively integrate non-omics data with existing multi-omics datasets to enhance prediction and diagnostic accuracy, we used the best-performing k values from the aforementioned experiments for the comparison. Specifically, we set k = 2 for APOE genotype and k = 4 for Braak stage and CERAD score. The results of these experiments are shown in Figure 3.

As demonstrated in Figure 3, the integration of non-omics features universally enhances model performance metrics compared to models without non-omics inputs. Among these biological characteristics, Braak stage exhibits the most substantial performance improvement, while APOE genotype demonstrates a more modest yet statistically significant enhancement. Notably, both CERAD score and Braak stage require postmortem brain dissection for acquisition, which confines their clinical application to diagnostic contexts. In contrast, APOE genotype represents a pre-mortem genetic biomarker that can be longitudinally tracked throughout the disease course, thereby enabling the model to function as a predictive diagnostic tool. Although the performance increment from the APOE genotype is comparatively limited, its stable predictive value persists across multiple clinical endpoints.

3.6. The Necessity of Each Module in Our Model

To validate the effectiveness and necessity of each module in our model, we conducted several ablation experiments: (1) removing MGAF and using only the combination of MGAT and AF; (2) removing MGAT and using only the combination of MGAF and AF; and (3) replacing AF with direct concatenation of multiple inputs, followed by a linear layer.

To verify the necessity of learning each omics dataset separately through individual modules rather than simply concatenating all omics data early on, we directly concatenated the features of the three omics before training and treated it as a single omics input.

The findings from the ROSMAP cohort (Figure 4) and BRCA cohort (Figure A6) demonstrate that the removal of any individual fusion module resulted in performance degradation, underscoring the necessity of each component. Notably, removing MGAF had the most significant impact on the model, highlighting the importance of the mid-term fusion of omics data. Additionally, directly concatenating the three omics datasets produced worse results than learning each omics dataset with individual modules. This suggests that learning the features of each omics dataset separately before integrating them improves the overall performance.

3.7. The Performance of Our Model Using DNA Gene Regions and CpGs

Figure 5 and Figure 6 display the performance of DNA gene regions and CpGs combined with mRNA and miRNA, as well as their standalone performance on the ROSMAP dataset. The performance on the BRCA dataset is shown in Figure A7 and Figure A8.

In the BRCA dataset, gene regions generally perform better than CpGs, which indicates that aggregating CpGs into gene regions benefits model performance. In the ROSMAP dataset, most gene regions still outperform the CpG probes, although some perform worse. When comparing single-omics and multi-omics performance within the same dataset, we observe that combining multiple omics reduces the gap between gene regions and CpGs. This suggests that our model can leverage the combination of multiple omics to narrow the performance gap between gene regions and CpGs.

3.8. The Biomarkers Identified by Our Model

To investigate how to better incorporate methylation data into the MOGAD model for the purpose of identifying effective biomarkers, this study integrated eight different types of methylation data: Body, 1stExon, TSS1500, 5′UTR, ALLRegions, 3′UTR, ALLGeneALLRegions, and TSS200. Each type of methylation data was used as the methylation component in multi-omics integration alongside mRNA and miRNA for model training. For each region, the top 100 miRNAs were identified based on model results, and then miRNA features were further selected from two datasets based on their frequency of occurrence across all gene regions. Features that appeared more than four times were retained for further analysis. As a result, 118 miRNA features were identified from the BRCA dataset and 89 from the ROSMAP dataset for subsequent functional enrichment analysis. The enrichment results of ROSMAP miRNA biomarkers are shown in Figure 7, and BRCA miRNA biomarkers are shown in Figure A9.

3.9. Hi-C in Research

To verify the rationality of biomarker discovery, we introduced Hi-C data to perform further analysis on the biomarker discovery results.

Figure 8 illustrates the Hi-C interaction frequency matrices of control and AD samples. We found several miRNAs with large differences between AD and normal samples, such as hsa-miR-548j-5p, hsa-miR-192-5p, hsa-miR-34b-3p, hsa-miR-34b-5p, hsa-miR-1286, hsa-miR-129-5p.1 and hsa-miR-129-2-3p. These differences suggest that the identified features are associated with Alzheimer’s disease, with higher interaction frequencies between the pairs indicating a higher likelihood of disease. In addition to the large differences in interaction frequencies for several feature pairs, there are also significant differences in clustering distribution. These findings support the potential biological function of the selected biomarkers and the rationality of the experimental approach to biomarker discovery.

4. Discussion

In this study, we developed MOGAD, an integrated neural network model capable of both classification prediction and biomarker discovery. On the ROSMAP dataset, the model attained 0.773 (ACC), 0.787 (F1-score), 0.832 (AUC), and 0.551 (MCC); on the BRCA dataset, it achieved 0.874 (ACC), 0.874 (F1-weighted), and 0.861 (F1-macro).

Through comprehensive experiments, we demonstrated MOGAD’s exceptional capabilities in the following: (1) effective integration of multi-omics data; (2) essential contribution of each modular component; (3) successful incorporation of non-omics data; (4) high-performance biomarker identification.

Despite the promising results, our study has several limitations. First, the current research predominantly relies on the ROSMAP dataset for AD, with only partial validation conducted using BRCA data. This limited dataset scope restricts the generalizability of our findings. To address this, future studies should expand the dataset coverage to include a diverse range of diseases, enabling more comprehensive validation and broader applicability of the proposed approach. Second, unlike TMO-Net, our model is unable to handle missing omics data, which significantly constrains its utility in real-world scenarios where incomplete data are common. This limitation underscores the need for further development to enhance the model’s robustness and adaptability to handle data with missing omics. Future research will focus on integrating data imputation techniques or developing algorithms that can effectively process incomplete datasets. Finally, although our model achieves improved accuracy, it does not reduce computational time and demands substantial GPU memory, posing challenges for practical deployment. Optimizing the algorithm to enhance memory efficiency and accelerate computation will be a crucial area for future work. These enhancements are essential to make the model more scalable and accessible for large-scale data analysis in clinical and research settings.

The results show differences in the classification performance of CpG sites and different gene regions across the ROSMAP and BRCA datasets. The patterns of abnormal DNA methylation may vary significantly among different diseases, which directly affects the performance of single-CpG-site and region-integrated analyses [51,52]. The CpG site level retains the original probe-level resolution, with each probe reflecting the methylation status of a specific site, enabling the capture of local and detailed regulatory information. In contrast, averaging over gene regions (such as the gene body and UTRs) may obscure the abnormal methylation signals of some key sites but can provide better interpretability. Without considering sample size limitations and computational costs, retaining high-resolution CpG site features usually leads to better model performance. Nevertheless, if gene region features are carefully constructed, a better balance between interpretability and performance may also be achieved.

For enrichment results (Figure 7), the most significantly enriched Gene Ontology (GO) terms in terms of molecular function (MF), biological process (BP), and cellular component (CC) are the same for both BRCA and ROSMAP, namely DNA-binding transcription factor binding, regulation of protein kinase activity, and focal adhesion, respectively. However, in the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment, Human cytomegalovirus infection was most enriched in BRCA, whereas the PI3K−Akt signaling pathway was most enriched in ROSMAP.

In 2008, N. Bektas et al. found that, compared to normal breast tissue, the transcription factor FOXM1 is overexpressed at both the RNA and protein levels in breast cancer [53], and in 2022, Sun E et al. reviewed the role of NF-κB in the pathogenesis of AD and outlined its potential for AD drug development [54], providing supporting evidence of the link between transcription factor binding and both AD and breast cancer, as exemplified by NF-κB and FOXM1.

In 1997, Imahori K. et al. demonstrated that tau protein kinases I (TPKI) and II (TPKII) are key enzymes responsible for abnormal tau phosphorylation and the formation of paired helical filaments (PHFs), the core component of neurofibrillary tangles in AD [55]. In 2016, J. L. Hsu and M.-C. Hung reviewed the role of tyrosine kinases in breast cancer [56]. Together, these studies indicate a strong connection between protein kinases and both AD and breast cancer.

In 2007, Caltagarone J et al. revealed that the integrin/FAK signaling pathway can induce neuronal cell cycle re-entry and neuronal death, contributing to the development of AD [57], and in 2010, Luo M et al. found that focal adhesion kinase (FAK) is a critical determinant in the initiation, progression, and metastasis of breast cancer [58]. This highlights the significant association between focal adhesion and both AD and breast cancer.

Moreover, in 2021, the PI3K/AKT signaling pathway was shown to be closely related to AD [59], and in 2025, Human cytomegalovirus (HCMV) infection was reported to have a higher prevalence in breast cancer tissues compared to non-cancerous tissues, and is associated with increased breast cancer risk [60].

Due to the limitations of our study, we excluded viral-related miRNA genes obtained from biomarker discovery. Although these were not investigated, there remains a significant connection between viruses and AD. In 1997, R. F. Itzhaki et al. discovered that Herpes simplex virus type 1 (HSV-1) and APOE-ε4 synergistically cause neurological damage and increase the risk of Alzheimer’s disease [61].

5. Conclusions

We propose a novel multi-omics deep learning model that integrates multiple omics data for biomarker discovery and disease prediction. This model leverages non-omics data from samples to improve prediction and diagnostic outcomes. During preprocessing, noise and redundant features were removed from the omics data. For each omics type, we used a dedicated MGAT to learn its features.

Moreover, we hypothesize that combining omics and non-omics data during the construction of similarity networks yields better results. This combined similarity network is then input into the MGAT. Compared to fully connected networks, GATs can utilize the sample similarity network to capture relationships between samples and perform preliminary classification. Additionally, we use another model, MGAF, which compensates for the limitations of MGAT by allowing the mid-term fusion of omics features. MGAF provides an initial fusion of omics data, generating another preliminary classification result.

Finally, we use an attention fusion mechanism to jointly train all outputs from the MGAT and MGAF models to produce the final prediction result, which enables more effective learning of intra-group and inter-group correlations in higher-dimensional label spaces. We train the model using multiple omics data from the ROSMAP and BRCA datasets to predict whether patients have Alzheimer’s disease or breast cancer, and compare its performance with that of existing omics classification models. Our model outperforms the other models on both datasets.

Through the model, we validate the necessity of integrating multiple omics data for analysis, demonstrating that combining multiple omics results in better analytical outcomes. We also perform feature elimination experiments to assess the impact of individual features on the model, and conduct enrichment analysis on the most influential features. Ultimately, we identify potential biomarkers that have a significant impact on Alzheimer’s disease. Furthermore, by incorporating non-omics features, we find that APOE genotype, CERAD score, and Braak stage all play a role in Alzheimer’s disease. Future work includes extending the model framework to support the longitudinal prediction of AD progression, evaluating its effectiveness on larger-scale multi-omics datasets such as UK Biobank, and integrating single-cell multi-omics data to enable higher-resolution analysis and deeper biological insights.

Author Contributions

Conceptualization, L.C. (Liang Chen); methodology, Z.Z., Y.C., and C.W.; software, M.G.; validation, M.G., L.C. (Lu Cai), J.H., Y.L., and G.W.; formal analysis, Z.Z., Y.C., and C.W.; investigation, C.W.; resources, M.G.; data curation, C.W. and L.C. (Lu Cai); writing—original draft preparation, Z.Z., Y.C.; writing—review and editing, C.W., G.W., and L.C. (Liang Chen); visualization, Z.Z., Y.C., and L.C. (Lu Cai); supervision, L.C. (Liang Chen); project administration, Y.L. and G.W.; funding acquisition, Y.L. and L.C. (Liang Chen). All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Natural Science Foundation of Guangdong Province (2025A1515012810), the National Natural Science Foundation of China (62372494, 62002212, 32373177), the National Science and Technology Major Project (2024ZD0529106), the Guangdong Quality Engineering Project (2024001), STU Scientific Research Foundation for Talents (35941918), and Li Ka Shing Foundation Cross-Disciplinary Research Grant (2020LKSFG07D).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The ROSMAP dataset is available at https://adknowledgeportal.synapse.org (accessed on 6 September 2022), including DNA methylation (syn3168763), mRNA (syn3505720), miRNA (syn3387327), and clinical data (syn3191087). The BRCA dataset is available at https://xenabrowser.net/datapages/?cohort=GDC%20TCGA%20Breast%20Cancer%20(BRCA)&removeHub=https%3A%2F%2Fxena.treehouse.gi.ucsc.edu%3A443 (accessed on 20 April 2024), and the Hi-C data are available at http://menglab.pub/hic/ (accessed on 14 September 2024). Our code is available at https://github.com/neko111111/MOGAD (accessed on 14 September 2024).

Acknowledgments

We thank the administrators of the ROSMAP and AD Knowledge Portal, who made this work possible.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Appendix A.1. Hi-C Differential Matrix Construction

Appendix A.1.1. Interaction Frequency Extraction

Interaction matrices were generated from AD (smp_ad.hic) and normal aging (smp_aged.hic) Hi-C datasets using juicer_tools with the following parameters:

(1): Data type: Observed values
(2): Normalization: KR
(3): Resolution: 1 kb

Appendix A.1.2. Differential Matrix Calculation

The differential matrix $X$ was computed as:

$$X = \ln\left(\frac{X_{AD} + E}{X_{normal} + E} + 1\right) \tag{A1}$$

where:

$X_{AD}$: Interaction matrix from AD samples.
$X_{normal}$: Interaction matrix from normal aging samples.
$E$: Stabilization matrix (Dimension: n × n, all elements = 0.001) added to prevent division by zero.
$\ln(x + 1)$: applied to prevent negative results.
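As a rough illustration of Equation (A1), the following NumPy sketch computes a differential matrix from two small contact matrices. The function name `differential_matrix`, the toy values, and the element-wise reading of the division are assumptions made for this example and are not taken from the MOGAD code.

```python
import numpy as np

def differential_matrix(x_ad, x_normal, eps=0.001):
    """Element-wise ln((X_AD + E) / (X_normal + E) + 1), with E filled with eps."""
    x_ad = np.asarray(x_ad, dtype=float)
    x_normal = np.asarray(x_normal, dtype=float)
    if x_ad.shape != x_normal.shape:
        raise ValueError("both interaction matrices must have the same shape")
    ratio = (x_ad + eps) / (x_normal + eps)
    return np.log(ratio + 1.0)

# Toy 2 x 2 interaction-frequency matrices (illustrative values only).
x_ad = np.array([[4.0, 0.0], [0.0, 2.0]])
x_normal = np.array([[2.0, 0.0], [1.0, 2.0]])
diff = differential_matrix(x_ad, x_normal)
```

Under this reading, an entry equal to ln 2 corresponds to identical contact frequencies in the two conditions, and larger values correspond to more contacts in the AD sample.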

Appendix A.1.3. Genomic Region Extension

To mitigate sparsity caused by narrow original feature ranges ([start, end]), chromosomal regions were expanded to [start−100,000, end+100,000], i.e., each feature was extended by 100 kb on both sides (200 kb of added flanking sequence). This adjustment ensured statistically meaningful interaction frequency sampling.
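The region extension can be sketched in a few lines of Python. The helper names, the clamping of the start coordinate at 0, and the mapping of a region to 1 kb bins (the resolution listed in Appendix A.1.1) are assumptions for illustration; the coordinates are made up.

```python
def extend_region(start, end, flank=100_000):
    """Extend [start, end] by `flank` bp on each side, clamping the start at 0."""
    if start > end:
        raise ValueError("start must not exceed end")
    return max(0, start - flank), end + flank

def region_to_bins(start, end, resolution=1_000):
    """Indices of the fixed-size bins overlapped by [start, end]."""
    return range(start // resolution, end // resolution + 1)

# A narrow 80 bp feature becomes a window that spans about 200 bins.
ext_start, ext_end = extend_region(250_000, 250_080)
bins = region_to_bins(ext_start, ext_end)
```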

Appendix A.2. Data Preprocessing

Appendix A.2.1. Omics Data Processing

DNA Methylation:

Probe filtering: Retained probes matching the Illumina Infinium HumanMethylation27 Bead Chip.

Variance threshold: Features with variance <0.001 were removed.

Gene Expression:

Variance threshold: Features with variance <0.1 were removed.

miRNA Data:

Features with all-zero values were excluded; no variance threshold was applied because the miRNA data contain fewer features.

Normalization: All omics datasets underwent Z-score standardization prior to analysis:

$$z = \frac{x - \mu}{\sigma} \tag{A2}$$

where $\mu$ is the mean and $\sigma$ is the standard deviation.
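The filtering and standardization steps above could look roughly as follows in NumPy. The function names are hypothetical, the population variance (`ddof=0`) is an assumption because the text does not specify the estimator, and mapping constant columns to 0 is a choice made here to avoid division by zero.

```python
import numpy as np

def filter_by_variance(x, threshold):
    """Keep the columns (features) whose variance is at least `threshold`."""
    keep = x.var(axis=0) >= threshold
    return x[:, keep], keep

def drop_all_zero(x):
    """Keep the columns that contain at least one non-zero value."""
    keep = np.any(x != 0, axis=0)
    return x[:, keep], keep

def zscore(x):
    """Column-wise z = (x - mu) / sigma; constant columns are mapped to 0."""
    mu = x.mean(axis=0)
    sigma = x.std(axis=0)
    safe_sigma = np.where(sigma == 0, 1.0, sigma)
    return (x - mu) / safe_sigma

# Toy matrix: 4 samples x 3 features; the second feature is nearly constant.
expr = np.array([[1.0, 5.00, 0.0],
                 [2.0, 5.01, 0.0],
                 [3.0, 5.00, 0.0],
                 [4.0, 5.01, 0.0]])
filtered, kept = filter_by_variance(expr, threshold=0.1)
standardized = zscore(filtered)
```

With the gene expression threshold of 0.1, only the first toy feature survives; `drop_all_zero` corresponds to the milder miRNA rule and would keep the first two.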

Appendix A.2.2. Non-Omics Feature Encoding (ROSMAP)

CERAD Score:

Encoding: 0 (No AD), 1 (Possible AD), 2 (Probable AD), 3 (Definite AD).

Braak Stage:

Encoding: 0 (None), 1–6 (Stages 1–6).

APOE Genotype:

Since the APOE ε4 allele is a susceptibility risk factor for AD [62,63], we encode the APOE genotype as follows:

Low-risk group (E2E2, E2E3, E3E3): Assigned 0.

High-risk group (E2E4, E3E4, E4E4): Assigned 1.

Normalization: All non-omics features underwent Z-score standardization.
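A minimal sketch of the encodings listed above is given below. The string labels (for example `"E3E4"` or `"Probable AD"`) and the function names are assumptions for illustration; the raw ROSMAP clinical file may code these fields differently.

```python
# CERAD categories and their integer codes, as listed in the text.
CERAD_CODES = {"No AD": 0, "Possible AD": 1, "Probable AD": 2, "Definite AD": 3}

APOE_HIGH_RISK = {"E2E4", "E3E4", "E4E4"}
APOE_LOW_RISK = {"E2E2", "E2E3", "E3E3"}

def encode_apoe(genotype):
    """Return 1 for genotypes carrying E4 and 0 for the remaining genotypes."""
    if genotype in APOE_HIGH_RISK:
        return 1
    if genotype in APOE_LOW_RISK:
        return 0
    raise ValueError(f"unexpected APOE genotype: {genotype!r}")

def encode_braak(stage):
    """Braak stage is used as an ordinal value: 0 (none) to 6."""
    if stage not in range(7):
        raise ValueError(f"unexpected Braak stage: {stage!r}")
    return int(stage)

encoded = [encode_apoe(g) for g in ["E3E3", "E3E4", "E2E2", "E4E4"]]
```

The encoded columns would then be standardized in the same way as the omics features.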

Appendix A.3. Construction of Similarity Networks

Appendix A.3.1. Omics-Only Networks

(1): Cosine Similarity Calculation:

For samples $i$, $j$ with feature vectors $x_i$, $x_j$:

$$\mathrm{Similarity}(x_i, x_j) = \frac{x_i \cdot x_j}{\|x_i\| \times \|x_j\|} \tag{A3}$$

(2): Threshold-Based Sparsification:

Retain edges only if similarity exceeds threshold $\epsilon$:

$$A_{ij} = \begin{cases} s(x_i, x_j), & \text{if } i \neq j \text{ and } s(x_i, x_j) \geq \epsilon \\ 0, & \text{otherwise} \end{cases} \tag{A4}$$

where $s(\cdot)$ denotes cosine similarity.
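Equations (A3) and (A4) can be sketched with NumPy as shown below. The function names and the toy data are assumptions made for this example, and the guard against zero-norm rows is an addition that the text does not discuss.

```python
import numpy as np

def cosine_similarity_matrix(x):
    """Pairwise cosine similarity between the rows of x (samples x features)."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    unit = x / np.where(norms == 0, 1.0, norms)
    return unit @ unit.T

def threshold_adjacency(sim, eps):
    """A_ij = s_ij if i != j and s_ij >= eps, otherwise 0 (Equation (A4))."""
    adj = np.where(sim >= eps, sim, 0.0)
    np.fill_diagonal(adj, 0.0)
    return adj

# Three samples with two features each.
x = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])
sim = cosine_similarity_matrix(x)
adj = threshold_adjacency(sim, eps=0.5)
```

In this toy case the first and third samples are orthogonal, so no edge connects them, while the middle sample keeps an edge to both.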

Appendix A.3.2. Threshold Determination

The threshold $\epsilon$ is calculated as (A5):

$$k = \frac{\sum_{i,j} I(s(x_i, x_j) \geq \epsilon)}{n} \tag{A5}$$

Here, $I(\cdot)$ is the indicator function, $n$ is the number of nodes, and $k = 2$ was set following MOGONET [28].
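One way to read Equation (A5) is that $\epsilon$ is chosen so that, on average, $k$ similarity values per node pass the threshold. The sketch below follows that reading; the function name, the exclusion of the diagonal, and the counting of ordered pairs are assumptions and may differ from the MOGONET and MOGAD implementations.

```python
import numpy as np

def find_threshold(sim, k):
    """Return eps such that k * n ordered pairs (i != j) have similarity >= eps.

    More pairs can pass if several similarity values tie at eps.
    """
    n = sim.shape[0]
    off_diag = sim[~np.eye(n, dtype=bool)]
    target = int(k * n)
    if not 0 < target <= off_diag.size:
        raise ValueError("k * n must lie between 1 and n * (n - 1)")
    # Sort descending and take the target-th largest similarity.
    return np.sort(off_diag)[::-1][target - 1]

sim = np.array([[1.0, 0.9, 0.2, 0.1],
                [0.9, 1.0, 0.3, 0.4],
                [0.2, 0.3, 1.0, 0.8],
                [0.1, 0.4, 0.8, 1.0]])
eps = find_threshold(sim, k=1)
n_kept = int(np.sum(sim[~np.eye(4, dtype=bool)] >= eps))
```

For this toy matrix with $k = 1$, the two strongest undirected edges (four ordered pairs) are retained.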

Appendix A.3.3. Integrated Omics/Non-Omics Networks

(1): Feature Fusion:

Combined feature vector for sample $i$:

$$x_i' = [x_i^{omics}; \alpha \cdot x_i^{non\text{-}omics}] \tag{A6}$$

where $\alpha$ denotes non-omics feature weights.

(2): Input feature matrix:

$$X' = \left(\mathop{\Big\Vert}\limits_{i=1}^{N} (x_i')^{T}\right)^{T} \tag{A7}$$

where $N$ represents the number of samples, || represents horizontal concatenation, and $T$ represents matrix transposition.

(3): Network Generation:

Compute cosine similarity using $X'$ to construct composite network $A'$.
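The effect of the weight in Equation (A6) on the similarity network can be illustrated with a small NumPy sketch. The function names and the toy values are assumptions; the example only shows that a larger weight pulls samples with the same non-omics value together and pushes samples with different values apart.

```python
import numpy as np

def fuse_features(x_omics, x_non_omics, alpha):
    """Row-wise [x_omics ; alpha * x_non_omics] for every sample (Equation (A6))."""
    x_non_omics = np.asarray(x_non_omics, dtype=float)
    if x_non_omics.ndim == 1:
        x_non_omics = x_non_omics[:, None]
    return np.hstack([x_omics, alpha * x_non_omics])

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Three samples, two standardized omics features, one non-omics feature.
x_omics = np.array([[1.0, 0.2], [0.9, 0.1], [1.0, 0.3]])
apoe = np.array([1.0, -1.0, 1.0])

low = fuse_features(x_omics, apoe, alpha=0.0)   # non-omics feature ignored
high = fuse_features(x_omics, apoe, alpha=4.0)  # non-omics feature emphasized
```

Samples 0 and 1 have similar omics profiles but different non-omics values, so their cosine similarity drops as the weight grows, whereas samples 0 and 2 become more similar.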

Appendix A.4. Pretraining of MGAT

Appendix A.4.1. Input and Architecture

Input: Feature matrix $X_m$ and adjacency matrix $A_m$ (omics $m$)

Architecture: Primary multi-head GAT→concatenate→Secondary GAT

Appendix A.4.2. Attention Weight Computation

For weight computation, we used the Mish activation function. The weight computation formula is as follows:

$$\alpha_{ij} = \frac{\exp(\mathrm{Mish}(a^{T}[W x_i \,\|\, W x_j]))}{\sum_{k \in N_{tr}} \exp(\mathrm{Mish}(a^{T}[W x_i \,\|\, W x_k]))} \tag{A8}$$

where $\alpha_{ij}$ represents the weight coefficient between sample $i$ and sample $j$, $a^{T}$ represents the parameters learned by the model, $W$ is the weight matrix for the corresponding features, $x_i$ denotes the input of sample $i$, and $N_{tr}$ is the total number of training samples.

Appendix A.4.3. Multi-Head Aggregation

The individual GAT aggregates the obtained weight parameters to derive the intermediate representation $x_k'$ for the corresponding sample:

$$x_k' = \mathrm{Mish}\left(\sum_{j \in N_{tr}} \alpha_{ij} W x_j\right) \tag{A9}$$

By concatenating the outputs of multiple GATs, a stable output representation can be obtained:

$$x' = \mathop{\Big\Vert}\limits_{k=1}^{K} x_k' \tag{A10}$$

where || represents horizontal concatenation, $K$ represents the number of attention heads, i.e., the number of GATs.
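The attention computation of Equations (A8) to (A10) can be sketched in plain NumPy. This is an illustrative sketch under stated assumptions rather than the released implementation: weights are random, bias terms are omitted, and, following Equation (A8) as written, the normalization runs over all samples without masking by the adjacency matrix. All function names are hypothetical.

```python
import numpy as np

def mish(z):
    """Mish activation: z * tanh(softplus(z))."""
    return z * np.tanh(np.logaddexp(0.0, z))

def attention_head(x, w, a):
    """One head: coefficients from Equation (A8), aggregation from (A9).

    x: (n, d) inputs, w: (d, h) weight matrix, a: (2h,) attention vector.
    """
    h = x @ w                                   # W x_i for every sample
    half = h.shape[1]
    # a^T [W x_i || W x_j] splits into a source part and a target part.
    scores = (h @ a[:half])[:, None] + (h @ a[half:])[None, :]
    e = mish(scores)
    e = e - e.max(axis=1, keepdims=True)        # numerical stability
    alpha = np.exp(e) / np.exp(e).sum(axis=1, keepdims=True)
    return mish(alpha @ h), alpha

def multi_head(x, heads):
    """Concatenate the outputs of all heads (Equation (A10))."""
    return np.hstack([attention_head(x, w, a)[0] for w, a in heads])

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4))
heads = [(rng.normal(size=(4, 3)), rng.normal(size=6)) for _ in range(3)]
out = multi_head(x, heads)
_, alpha = attention_head(x, *heads[0])
```

With three heads of width 3, each of the five samples receives a 9-dimensional representation, and every row of `alpha` sums to 1.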

Appendix A.4.4. Classification and Loss Function

We use another GAT to process $x'$ and adjacency matrix $A_m$, and obtain the final result of MGAT based on the $m$-th omics data:

$$\hat{Y}_{MGAT} = \mathrm{Softmax}(\mathrm{Linear}(\mathrm{GAT}(x', A_m))) \tag{A11}$$

MSE loss for MGAT:

$$L_{MGAT} = \frac{\sum_{j=1}^{N_{tr}} (\hat{Y}_{MGAT} - y_j)^{2}}{N_{tr}} \tag{A12}$$

where $N_{tr}$ represents the number of training samples and $y_j$ denotes the true label of the $j$-th sample.
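As a small worked example of the Softmax output and the MSE loss in Equation (A12), the sketch below compares class probabilities with one-hot labels. The one-hot encoding of $y_j$ and the function names are assumptions; the text does not state how the labels are represented.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def mse_loss(probs, labels, n_classes):
    """Sum of squared differences to one-hot labels, divided by N_tr."""
    one_hot = np.eye(n_classes)[labels]
    return float(((probs - one_hot) ** 2).sum() / len(labels))

# Three training samples, two classes.
logits = np.array([[2.0, 0.0], [0.0, 0.0], [-1.0, 3.0]])
labels = np.array([0, 1, 1])
probs = softmax(logits)
loss = mse_loss(probs, labels, n_classes=2)
```

The undecided second sample (probabilities 0.5 and 0.5) contributes most of the loss in this toy case.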

Appendix A.4.5. Pseudo-Code for MGAT

Algorithm A1. Multi-head GAT network (MGAT)

Input: Feature matrix for omics m $X_m$, adjacency matrix for omics m $A_m$, and number of attention heads K.
Output: Preliminary prediction results of MGAT $\hat{Y}_{MGAT}$.
1: Initialize weight matrices $W$ and attention vector $a$
2: for each head k = 1 to K do
3:     for each node j in $N_{tr}$ do
4:         $\alpha_{ij} \leftarrow \frac{\exp(\mathrm{Mish}(a^{T}[W x_i \,\|\, W x_j]))}{\sum_{k \in N_{tr}} \exp(\mathrm{Mish}(a^{T}[W x_i \,\|\, W x_k]))}$
5:     end for
6:     $x_k' \leftarrow \mathrm{Mish}\left(\sum_{j \in N_{tr}} \alpha_{ij} W x_j\right)$
7: end for
8: $x' \leftarrow \mathop{\Vert}\limits_{k=1}^{K} x_k'$
9: $\hat{Y}_{MGAT} \leftarrow \mathrm{Softmax}(\mathrm{Linear}(\mathrm{GAT}(x', A_m)))$
10: Compute MSE loss: $L_{MGAT} \leftarrow \frac{\sum_{j=1}^{N_{tr}} (\hat{Y}_{MGAT} - y_j)^{2}}{N_{tr}}$
11: Return $\hat{Y}_{MGAT}$, $L_{MGAT}$

Appendix A.5. Pretraining of MGAF

Appendix A.5.1. Input Encoding

For each omics type $m$, the feature matrix $X_m$ and adjacency matrix $A_m$ are encoded via a dedicated GAT:

$$E_m = \mathrm{GAT}_m(X_m, A_m) \tag{A13}$$

Appendix A.5.2. Attention-Weighted Fusion

Learnable attention score $a_m$ scales each omics representation:

$$E = \mathop{\Big\Vert}\limits_{m=1}^{M} a_m E_m \tag{A14}$$

where $M$ represents the number of omics datasets, and || represents horizontal concatenation.

Appendix A.5.3. Classification and Loss

A final prediction is generated via linear layer and Softmax functions:

$$\hat{Y}_{MGAF} = \mathrm{Softmax}(\mathrm{Linear}(E)) \tag{A15}$$

MSE loss for MGAF:

$$L_{MGAF} = \frac{\sum_{j=1}^{N_{tr}} (\hat{Y}_{MGAF} - y_j)^{2}}{N_{tr}} \tag{A16}$$

where $N_{tr}$ represents the number of training samples and $y_j$ denotes the true label of the $j$-th sample.

Appendix A.5.4. Notice

Notably, MGAF is a module designed for training with multiple omics datasets and is not used when the input consists of a single omics dataset.

Appendix A.5.5. Pseudo-Code for MGAF

Algorithm A2. Multi-Graph Attention Fusion (MGAF)

Input: Feature matrix for omics m $X_m$, adjacency matrix for omics m $A_m$, and total number of omics datasets M.
Output: Preliminary prediction results of MGAF $\hat{Y}_{MGAF}$.
1: Initialize omics-specific GATs: $\mathrm{GAT}_1$, $\mathrm{GAT}_2$, $\dots$, $\mathrm{GAT}_M$
2: Initialize attention weights $a_1$, $a_2$, $\dots$, $a_M$ (learnable parameters)
3: for each omics type m = 1 to M do
4:     $E_m \leftarrow \mathrm{GAT}_m(X_m, A_m)$
5: end for
6: $E \leftarrow \mathop{\Vert}\limits_{m=1}^{M} a_m E_m$
7: $\hat{Y}_{MGAF} \leftarrow \mathrm{Softmax}(\mathrm{Linear}(E))$
8: Compute MSE loss: $L_{MGAF} \leftarrow \frac{\sum_{j=1}^{N_{tr}} (\hat{Y}_{MGAF} - y_j)^{2}}{N_{tr}}$
9: Return $\hat{Y}_{MGAF}$, $L_{MGAF}$

Appendix A.6. Integration of Omics Networks

Appendix A.6.1. Fusion Mechanism

(1): Input Processing:

For each input prediction $X_t$ (MGAT/MGAF outputs):

$$E = \mathop{\Big\Vert}\limits_{t=1}^{T} a_t \cdot \mathrm{ReLU}(\sigma_t(X_t)) \tag{A17}$$

where $T$ represents the number of input matrices, $a_t$ is a learnable attention weight, and $\sigma_t(\cdot)$ denotes the linear layer.

(2): Final Classification:

$$\hat{Y}_{AF} = \mathrm{Softmax}(\mathrm{Linear}(\mathrm{ReLU}(\mathrm{Linear}(E)))) \tag{A18}$$

(A18)
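The forward pass of Equations (A17) and (A18) can be sketched with NumPy as follows. The sketch makes several assumptions: weights are random rather than trained, bias terms are omitted, the attention scores are fixed to 1, and the layer widths are arbitrary. The function and variable names are hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def attention_fusion(inputs, branch_weights, scores, w_hidden, w_out):
    """E = ||_t a_t * ReLU(sigma_t(X_t)), then Softmax(Linear(ReLU(Linear(E))))."""
    parts = [a_t * relu(x_t @ w_t)
             for x_t, w_t, a_t in zip(inputs, branch_weights, scores)]
    e = np.hstack(parts)
    return softmax(relu(e @ w_hidden) @ w_out)

rng = np.random.default_rng(1)
n_samples, n_classes, width = 6, 2, 4
# Four upstream predictions, e.g. three MGAT outputs and one MGAF output.
inputs = [rng.random((n_samples, n_classes)) for _ in range(4)]
branch_weights = [rng.normal(size=(n_classes, width)) for _ in inputs]
scores = np.ones(len(inputs))          # learnable in practice; fixed here
w_hidden = rng.normal(size=(width * len(inputs), 8))
w_out = rng.normal(size=(8, n_classes))
y_hat = attention_fusion(inputs, branch_weights, scores, w_hidden, w_out)
```

In the single-omics scenario the list of inputs would contain only the MGAT output, and the same function applies.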

Appendix A.6.2. Loss Functions

MSE Loss for AF:

$$L_{AF} = \frac{\sum_{j=1}^{N_{tr}} (\hat{Y}_{AF} - y_j)^{2}}{N_{tr}} \tag{A19}$$

where $N_{tr}$ represents the number of training samples and $y_j$ denotes the true label of the $j$-th sample.

Appendix A.6.3. Pseudo-Code for AF

Algorithm A3. Attention Fusion (AF)

Input: Output matrix from MGAT or MGAF $X_t$, number of input matrices T.
Output: Final prediction $\hat{Y}_{AF}$.
1: Initialize linear layers $\sigma_1$, $\sigma_2$, …, $\sigma_T$
2: Initialize learnable attention weights $a_1$, $a_2$, …, $a_T$
3: for each t = 1 to T do
4:     $H_t \leftarrow \mathrm{ReLU}(\sigma_t(X_t))$
5:     $E_t \leftarrow a_t H_t$
6: end for
7: $E \leftarrow \mathop{\Vert}\limits_{t=1}^{T} E_t$
8: $\hat{Y}_{AF} \leftarrow \mathrm{Softmax}(\mathrm{Linear}(\mathrm{ReLU}(\mathrm{Linear}(E))))$
9: Compute MSE loss: $L_{AF} \leftarrow \frac{\sum_{j=1}^{N_{tr}} (\hat{Y}_{AF} - y_j)^{2}}{N_{tr}}$
10: Return $\hat{Y}_{AF}$, $L_{AF}$

Appendix A.7. Common Ground for MGAT, MGAF and AF

Appendix A.7.1. Regularization

To prevent overfitting, this study introduces L2 regularization after each loss function. Taking the loss function of AF as an example, it is formulated as follows:

$$J(W, b) = \frac{1}{N_{tr}} L_{AF} + \frac{\lambda}{2 N_{tr}} \sum_{j=1}^{n_x} \|W\|_2^2 \tag{A20}$$

where $\lambda$ represents the weight decay coefficient, $n_x$ denotes the number of network layers, $W$ is the network weights, and $\|W\|_2^2$ represents the square of the L2 norm of the weights.
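A short numerical example may help to read Equation (A20). The sketch below evaluates the objective exactly as written, interpreting the sum as running over one weight matrix per layer; that interpretation, the function name, and the toy numbers are assumptions.

```python
import numpy as np

def regularized_objective(loss, weights, weight_decay, n_train):
    """loss / N_tr + lambda / (2 * N_tr) * sum of squared L2 norms over layers."""
    penalty = sum(float(np.sum(w ** 2)) for w in weights)
    return loss / n_train + weight_decay / (2.0 * n_train) * penalty

# Two toy layers whose squared norms are 6 and 9.
weights = [np.array([[1.0, -2.0], [0.0, 1.0]]), np.array([[3.0], [0.0]])]
plain = regularized_objective(1.2, weights, weight_decay=0.0, n_train=4)
decayed = regularized_objective(1.2, weights, weight_decay=0.1, n_train=4)
```

Here the penalty term adds 0.1 / 8 × 15 = 0.1875 to the unregularized value of 0.3.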

Appendix A.7.2. Training Strategy

Staged Updates:

Pretrained MGAT/MGAF: Parameters updated every 5 epochs.

AF: Parameters updated every epoch.

Input Scenarios:

Single-omics: AF fuses only MGAT outputs.

Multi-omics: AF fuses MGAT + MGAF outputs.
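The staged updates described above can be pictured as a simple schedule. The generator below is only a sketch: its name is hypothetical, and the choice to update the pretrained branches at epochs 5, 10, and so on (rather than 1, 6, and so on) is an assumption, since the text gives only the interval.

```python
def update_schedule(n_epochs, branch_interval=5):
    """Yield (epoch, update_branches, update_af) for each 1-based epoch."""
    for epoch in range(1, n_epochs + 1):
        # AF is updated every epoch; MGAT/MGAF only every `branch_interval` epochs.
        yield epoch, epoch % branch_interval == 0, True

schedule = list(update_schedule(10))
branch_epochs = [epoch for epoch, branches, _ in schedule if branches]
```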

Appendix A.7.3. Omics/Non-Omics Integration

Same as Appendix A.3.3 (Integrated Omics/Non-Omics Networks).

Appendix A.8. Biomarker Discovery

Appendix A.8.1. How to Calculate Feature Score

Trained MOGAD model parameters were fixed for all downstream evaluations.

For each omics feature (miRNA, mRNA, methylation):

Step 1: Starting from the original data, each omics feature is set to zero in turn to create the perturbed data.

Step 2: The perturbed data are input into the trained model to compute the F1 score $S_{perturbed}$; the F1 score on the original data is $S_{origin}$.

Step 3: The feature score is calculated as $\Delta S = S_{origin} - S_{perturbed}$.
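The three steps can be sketched as a zero-ablation loop. In the example below, `toy_predict` is a hypothetical stand-in for the trained MOGAD model, the binary F1 helper is written out so the block has no dependencies beyond NumPy, and the data are made up.

```python
import numpy as np

def f1_binary(y_true, y_pred):
    """F1 score for the positive class (label 1)."""
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def feature_scores(predict, x, y):
    """Delta S per feature: F1 on the original data minus F1 with that feature zeroed."""
    baseline = f1_binary(y, predict(x))
    scores = np.empty(x.shape[1])
    for j in range(x.shape[1]):
        perturbed = x.copy()
        perturbed[:, j] = 0.0
        scores[j] = baseline - f1_binary(y, predict(perturbed))
    return scores

# Hypothetical stand-in for the trained model: it only looks at feature 0.
def toy_predict(x):
    return (x[:, 0] > 0).astype(int)

x = np.array([[1.0, 0.3], [2.0, -0.5], [-1.0, 0.8], [-2.0, -0.1]])
y = np.array([1, 1, 0, 0])
scores = feature_scores(toy_predict, x, y)
```

Zeroing the feature that the toy model relies on removes all correct positive predictions, so that feature receives the full score, while the unused feature scores 0.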

Appendix A.8.2. Experiment

Methylation data were divided into 8 gene regions, each paired with identical miRNA/mRNA data to form a distinct data group.

Each group underwent 5 independent trials, and final feature scores were averaged across trials.

Appendix A.8.3. Frequency-Based miRNA Filtering

miRNA features that appear more than four times across all gene regions were retained as high-confidence candidates.
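The frequency filter can be sketched with a counter over the per-region top lists. The function name and the placeholder feature names are assumptions; "more than four times" is implemented as at least five occurrences across the eight regions.

```python
from collections import Counter

def frequent_features(top_lists, min_count=5):
    """Features that occur in at least `min_count` of the per-region top lists."""
    counts = Counter(f for features in top_lists for f in set(features))
    return sorted(f for f, c in counts.items() if c >= min_count)

# Hypothetical top lists for 8 methylation regions (feature names are placeholders).
top_lists = [
    ["mir_a", "mir_b"], ["mir_a", "mir_c"], ["mir_a", "mir_b"], ["mir_a"],
    ["mir_a", "mir_b"], ["mir_b", "mir_c"], ["mir_c"], ["mir_d"],
]
selected = frequent_features(top_lists)
```

Only the placeholder feature that appears in five lists passes the filter; one that appears in exactly four does not.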

Appendix A.8.4. Biological Validation

Top-ranked features underwent enrichment analysis.

Appendix B

Table A1. The difference between MOGAD and other models.

Method	MOGONET	MOADLN	MoGCN	MOGAD
Multi-omics types	mRNA, Me, miRNA	mRNA, Me, miRNA	mRNA, CNV, protein	mRNA, Me, miRNA
Clinical Data Integration	FALSE	FALSE	FALSE	TRUE
Core components	GCN	Multi-head Attention	GCN, SNF and AE	GAT
Hi-C Validation	FALSE	FALSE	FALSE	TRUE
Performance (AD ACC)	0.751	0.758	None	0.773
Performance (BRCA ACC)	0.851	0.835	0.793	0.874

mRNA: gene expression data; Me: DNA methylation data; miRNA: miRNA expression data; CNV: copy number variation data; protein: protein expression data.

Table A2. Hyperparameters of each module.

Modules	Pre-Learning Rate	Learning Rate	Hidden Layer	Dropout Rate	Head Number
MGAT (ROSMAP)	5 × 10⁻³	5 × 10⁻⁴	20	0.5	3
MGAF (ROSMAP)	5 × 10⁻³	5 × 10⁻⁴	20	0.5	None
AF (ROSMAP)	None	1 × 10⁻³	16	None	None
MGAT (BRCA)	5 × 10⁻³	5 × 10⁻⁴	50	0.5	3
MGAF (BRCA)	5 × 10⁻³	5 × 10⁻⁴	200	0.1	None
AF (BRCA)	None	1 × 10⁻³	64	None	None

Table A3. Performance of different models on BRCA omics data.

Method	ACC	F1_Weighted	F1_Macro
SVM	0.786	0.767	0.666
KNN	0.644	0.578	0.513
RF	0.832	0.817	0.734
DT	0.780	0.777	0.734
NB	0.436	0.323	0.214
XGBoost	0.856	0.855	0.820
DNN	0.798	0.802	0.761
MOGONET	0.851	0.847	0.804
MOADLN	0.835	0.829	0.793
MoGCN	0.793	0.781	0.750
MOGAD (This study)	0.874	0.874	0.861

Table A4. Comparison of our model with other models in terms of training speed and memory.

Dataset	Metrics	MOGAD	MOGONET	MOADLN
ROSMAP	Time	113.33 s	72.77 s	33.96 s
	Peak GPU memory usage	479.27 MB	145.96 MB	172.64 MB
	Peak RAM usage	2531.36 MB	2828.47 MB	2529.46 MB
BRCA	Time	231.96 s	101.15 s	63.67 s
	Peak GPU memory usage	2377.35 MB	378.11 MB	458.75 MB
	Peak RAM usage	2529.08 MB	2829.73 MB	2528.39 MB

Figure A1 illustrates the overall workflow of MGAT. First, the omics data (Feature matrix) and the generated cosine similarity network (Adjacency matrix) are fed into multiple GAT (Graph Attention Network) heads for training. The outputs from these heads are then concatenated to form a new feature matrix, which is combined with the original cosine similarity network and input into another GAT for further training. Finally, the model produces the final output of MGAT through a Linear layer and Softmax function. One MGAT is required for each omics dataset.

Figure A2 illustrates the overall workflow of MGAF. First, each omics dataset (Feature matrix) and the cosine similarity network (Adjacency matrix) generated from it are fed into their own GAT for training. Then, each GAT output is multiplied by a learnable attention score. Finally, the weighted outputs are concatenated and passed through a Linear layer and Softmax function to obtain the final output of MGAF. Note that MGAF is not used when there is only one omics dataset. The symbol ‘*’ represents multiplication.

Figure A3 illustrates the overall workflow of AF. First, the outputs from MGAT (one or multiple instances) and MGAF (zero or one instance) are input into AF and processed through Linear layers and activation functions. The processed data are then multiplied by the corresponding learnable attention scores and concatenated. Finally, the concatenated results are passed through multiple Linear layers, activation functions, and a Softmax layer to generate the final prediction. The symbol ‘*’ represents multiplication.

Figure A1. Multi-head GAT network.

Figure A2. Multi-graph attention fusion.

Figure A3. Attention fusion.

Figure A4. The performance of MOGAD under different BRCA omics data types.

Figure A5. The performance of our model using non-omics data with different importance score k.

Figure A6. The performance of MOGAD without different modules in BRCA.

Figure A7. The performance of MOGAD under BRCA CpGs and different gene regions with mRNA and miRNA.

Figure A8. The performance of MOGAD under BRCA CpGs and different gene regions.

Figure A9. The GO and KEGG pathway enrichment results of BRCA.

Figure 1. The workflow of our study.

Figure 2. The performance of MOGAD under different ROSMAP omics data types.

Figure 3. The performance of MOGAD using omics data and clinical data with the best k in ROSMAP.

Figure 4. The performance of MOGAD without different modules in ROSMAP.

Figure 5. The performance of MOGAD under ROSMAP CpGs and different gene regions with mRNA and miRNA.

Figure 6. The performance of MOGAD under ROSMAP CpGs and different gene regions.

Figure 7. The GO and KEGG pathway enrichment results of ROSMAP.

Figure 8. Interaction intensity of top miRNA candidates based on Hi-C. Interaction intensities are shown on a logarithmic scale.

Table 1. The performance of different models using ROSMAP omics data.

Method	ACC	F1-Score	AUC	MCC
SVM	0.647	0.692	0.732	0.297
KNN	0.590	0.635	0.627	0.176
Lasso	0.710	0.729	0.785	0.417
Elastic-Net	0.739	0.766	0.824	0.479
RF	0.656	0.692	0.714	0.310
DT	0.583	0.603	0.594	0.166
GNB	0.515	0.490	0.498	0.015
XGBoost	0.702	0.731	0.767	0.403
Ridge	0.760	0.780	0.839	0.521
PLSR	0.590	0.588	0.666	0.186
DNN	0.674	0.697	0.749	0.349
MOGONET	0.751	0.772	0.791	0.505
MOADLN	0.758	0.786	0.800	0.524
MOGAD (This study)	0.773	0.787	0.832	0.551

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
