MOMIC: A Multi-Omics Pipeline for Data Analysis, Integration and Interpretation
Abstract
1. Introduction
2. Materials and Methods
2.1. Core Software
2.2. Implemented Protocols
2.3. Availability and Requirements
- MOMIC server: https://github.com/laumadmar/MOMIC_server.git (accessed on 10 January 2022)
- Collection of notebooks: https://github.com/laumadmar/MOMIC_notebooks.git (accessed on 10 January 2022)
- Documentation: https://laumadmar.github.io/MOMIC_server (accessed on 10 January 2022)
3. Motivation and Existing Alternatives
4. Results
Use Cases: Application on Real Projects
5. Discussion
6. Conclusions and Future Work
Future Work
Supplementary Materials
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Name | Platform | Ease of use | Functionality | Comparison with MOMIC | Availability |
---|---|---|---|---|---|
GenePattern | Web | Easy | Offers a platform for reproducible bioinformatics | Similar to MOMIC in functionality, but the code is closed source, making it impossible to fully customise the pipelines; changing the input parameters does, however, allow minor changes. It focuses mainly on transcriptomics and lacks GWAS analysis and omics integration. GenePattern Notebook also extends JupyterHub and is available via the web and as a local server | https://www.genepattern.org/ (accessed on 1 April 2022) |
Galaxy | Web | Medium | Enables researchers without informatics expertise to perform computational analyses through the web | Serves the same aim but, unlike MOMIC, it does not offer real-time code and visualisation, easy data manipulation or customisation. Developing new tools in Galaxy is not straightforward and requires XML skills | https://usegalaxy.org/ (accessed on 1 April 2022) |
mixOmics | R | Difficult | Offers a wide range of novel multivariate methods for the exploration and integration of biological datasets, with a particular focus on variable selection | This R package can be installed in MOMIC to extend the proposed transcriptomics pipelines. It does not offer protocols for GWAS analysis | http://mixomics.org/ (accessed on 1 April 2022) |
Paintomics 3 | Web | Easy | Offers integrative visualisation of multiple omic datasets onto KEGG pathways | Requires a specific input format and data wrangling. It provides only visualisation, as opposed to MOMIC, which also provides a platform for data analysis. Results obtained from MOMIC could be used as input to this tool if desired. It does not offer protocols for GWAS analysis | http://www.paintomics.org/ (accessed on 1 April 2022) |
Basepair | Web | Easy | Offers interactive NGS analysis pipelines for users with no programming experience | A good alternative for NGS analysis, but a fee is charged per sample. The source code is not exposed, so customisation is limited | https://www.basepairtech.com/ (accessed on 1 April 2022) |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Madrid-Márquez, L.; Rubio-Escudero, C.; Pontes, B.; González-Pérez, A.; Riquelme, J.C.; Sáez, M.E. MOMIC: A Multi-Omics Pipeline for Data Analysis, Integration and Interpretation. Appl. Sci. 2022, 12, 3987. https://doi.org/10.3390/app12083987