Interpretable Autoencoders Trained on Single Cell Sequencing Data Can Transfer Directly to Data from Unseen Tissues
Abstract
1. Introduction
2. Materials and Methods
3. Results
4. Discussion
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Smart-seq2 | Number of Genes | 10x Genomics | Number of Genes | Combined Dataset (Seurat) | Number of Genes |
---|---|---|---|---|---|
Tongue | 3904 | Tongue | 10,757 | Tongue | 498 |
Thymus | 3772 | Thymus | 7808 | Thymus | 726 |
Spleen | 3708 | Spleen | 8798 | Spleen | 848 |
Marrow | 3992 | Marrow | 10,445 | Marrow | 771 |
Liver | 3779 | Liver | 6950 | Liver | 653 |
Kidney | 3762 | Kidney | 11,645 | Kidney | 774 |
Heart | 4130 | Heart_and_Aorta | 8081 | Heart | 745 |
Bladder | 3900 | Bladder | 10,926 | Bladder | 653 |
Mammary | 3969 | Mammary_Gland | 11,495 | Mammary | 829 |
Lung | 3974 | Lung | 10,666 | Lung | 738 |
Trachea | 4022 | Trachea | 10,584 | Trachea | 822 |
Muscle | 3968 | Limb_Muscle | 9202 | Muscle | 834 |
Dataset | Simple Model Mean Loss | Simple Model Hidden Layers | Simple Model Bottleneck Layer | Complex Model Mean Loss | Complex Model Hidden Layers | Complex Model Bottleneck Layer |
---|---|---|---|---|---|---|
Bladder | 0.94610459 | 77 | 44 | 0.94966865 | 101 | 65 |
Heart | 0.95382822 | 99 | 57 | 0.95688325 | 70 | 40 |
Kidney | 0.97204543 | 118 | 42 | 0.97270389 | 95 | 65 |
Liver | 0.95906472 | 106 | 35 | 0.95659369 | 124 | 40 |
Lung | 0.96714902 | 83 | 63 | 0.96536231 | 119 | 62 |
Mammary | 0.97676671 | 65 | 42 | 0.97654885 | 91 | 62 |
Marrow | 0.98160857 | 110 | 53 | 0.98427087 | 67 | 60 |
Muscle | 0.95753598 | 92 | 72 | 0.95977772 | 89 | 62 |
Spleen | 0.97818142 | 98 | 68 | 0.97743446 | 67 | 58 |
Thymus | 0.98370707 | 73 | 63 | 0.98711485 | 88 | 57 |
Tongue | 0.93583191 | 90 | 51 | 0.93667495 | 74 | 70 |
Trachea | 0.97082079 | 100 | 49 | 0.96826011 | 116 | 55 |
Dataset | Cell Types Overlapping with Marrow | |||
---|---|---|---|---|
Lung | T-cell | B-cell | Natural killer cell | Monocyte |
Liver | B-cell | Natural killer cell | ||
Muscle | T-cell | B-cell | ||
Thymus | T-cell | B-cell | ||
Spleen | T-cell | B-cell | ||
Dataset | Cell Types Overlapping with Lung | ||||
---|---|---|---|---|---|
Bladder | Leukocyte | Endothelial cell | |||
Marrow | Monocyte | Macrophage | Natural killer cell | T-cell | B-cell |
Thymus | T-cell | Leukocyte | |||
Trachea | Stromal cell | Leukocyte | Epithelial cell | Endothelial cell | |
Spleen | B-cell | Myeloid cell | Natural killer cell | T-cell | |
Kidney | Leukocyte | Macrophage | Endothelial cell | ||
Liver | B-cell | Leukocyte | Natural killer cell | ||
Mammary | Endothelial cell | T-cell | B-cell | Macrophage | |
Muscle | Endothelial cell | T-cell | B-cell | Macrophage | |
Heart | Endothelial cell | Leukocyte | |||
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Walbech, J.S.; Kinalis, S.; Winther, O.; Nielsen, F.C.; Bagger, F.O. Interpretable Autoencoders Trained on Single Cell Sequencing Data Can Transfer Directly to Data from Unseen Tissues. Cells 2022, 11, 85. https://doi.org/10.3390/cells11010085