Enhancing Single-Cell and Bulk Hi-C Data Using a Generative Transformer Model
Simple Summary
Abstract
1. Introduction
2. Materials and Methods
2.1. Overview of HiCENT
2.2. HiCENT Architecture
2.3. Loss Functions in HiCENT
2.4. Hi-C and scHi-C Datasets Used for Enhancement
2.5. Data Preprocessing
2.6. Implementations of HiCENT and Six Other Methods
2.7. Performance Evaluation Metrics
3. Results
3.1. Hyperparameter Selections of HiCENT
3.2. HiCENT Outperforms Reference Methods in Image-Based Metrics and Hi-C Reproducibility Metrics
3.3. Visual Comparison of Predicted Contact Maps in Hi-C Data
3.4. HiCENT-Enhanced scHi-C Data Facilitates Cell Clustering
4. Discussion
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Down-sampling ratio is given in parentheses in each column header.

Model | SSIM (1/16) | PSNR (1/16) | MSE (1/16) | SSIM (1/32) | PSNR (1/32) | MSE (1/32) |
---|---|---|---|---|---|---|
HiCPlus | 0.8763 | 31.1084 | 0.0008 | 0.8759 | 32.2933 | 0.0006 |
DeepHiC | 0.8979 | 34.5182 | 0.0003 | 0.8838 | 34.0568 | 0.0004 |
HiCNN | 0.8997 | 33.8231 | 0.0004 | 0.8831 | 32.6867 | 0.0006 |
HiCSR | 0.9016 | 30.8811 | 0.0009 | 0.8782 | 33.1212 | 0.0005 |
HiCARN | 0.9097 | 35.1358 | 0.0003 | 0.8969 | 34.3054 | 0.0003 |
HiCENT | 0.9152 | 35.2673 | 0.0003 | 0.9026 | 34.4197 | 0.0003 |

Model | SSIM (1/64) | PSNR (1/64) | MSE (1/64) | SSIM (1/100) | PSNR (1/100) | MSE (1/100) |
---|---|---|---|---|---|---|
HiCPlus | 0.8491 | 30.9400 | 0.0008 | 0.8436 | 30.8603 | 0.0008 |
DeepHiC | 0.8709 | 32.6925 | 0.0005 | 0.8528 | 32.0369 | 0.0007 |
HiCNN | 0.8699 | 32.1255 | 0.0007 | 0.8609 | 32.0144 | 0.0006 |
HiCSR | 0.8676 | 32.3657 | 0.0006 | 0.8616 | 32.0048 | 0.0006 |
HiCARN | 0.8843 | 33.4657 | 0.0005 | 0.8756 | 32.9561 | 0.0005 |
HiCENT | 0.8899 | 33.5145 | 0.0005 | 0.8828 | 33.0139 | 0.0005 |
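
The PSNR and MSE columns above are related through the standard definition PSNR = 10 · log10(MAX² / MSE). The sketch below applies that formula under the assumption that contact maps are scaled to [0, 1], so MAX = 1; this scaling is not stated in the tables and is only a guess that happens to be consistent with the reported magnitudes. The helper name `psnr_from_mse` is hypothetical. Because the tabulated MSE is rounded to four decimals and the reported scores are presumably averaged over many sub-matrices, only approximate agreement with the table should be expected.

```python
import math


def psnr_from_mse(mse: float, max_value: float = 1.0) -> float:
    """Return PSNR in dB for a given MSE.

    Assumes intensities lie in [0, max_value]; max_value = 1.0 is an
    assumption about the normalization, not something the tables state.
    """
    if mse <= 0:
        raise ValueError("mse must be positive")
    return 10.0 * math.log10(max_value ** 2 / mse)


# Rounded MSE values that appear in the tables above.
for mse in (0.0003, 0.0005, 0.0008):
    print(f"MSE={mse:.4f} -> PSNR ~ {psnr_from_mse(mse):.2f} dB")
# MSE=0.0003 -> PSNR ~ 35.23 dB
# MSE=0.0005 -> PSNR ~ 33.01 dB
# MSE=0.0008 -> PSNR ~ 30.97 dB
```

For instance, an MSE of 0.0003 maps to roughly 35.2 dB, close to the 35.2673 reported for HiCENT at 1/16 down-sampling, which suggests (but does not confirm) that a unit intensity range was used.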
Chromosome | HiCPlus | DeepHiC | HiCNN | HiCSR | HiCARN | HiCENT |
---|---|---|---|---|---|---|
Chr4 | 0.8782 | 0.9013 | 0.8928 | 0.8832 | 0.9122 | 0.9162 |
Chr14 | 0.8869 | 0.9102 | 0.9051 | 0.8970 | 0.9195 | 0.9230 |
Chr16 | 0.8650 | 0.8900 | 0.8820 | 0.7311 | 0.9027 | 0.9062 |
Chr20 | 0.8907 | 0.9157 | 0.9117 | 0.9043 | 0.9235 | 0.9275 |
Average | 0.8802 | 0.9043 | 0.8979 | 0.8539 | 0.9145 | 0.9182 |
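
The Average row of the table above can be recomputed from the four per-chromosome rows. The following is a minimal sketch of that arithmetic, assuming the average is a plain unweighted mean over Chr4, Chr14, Chr16, and Chr20 rounded half-up to four decimals; the names `scores` and `column_average` are illustrative. `Decimal` is used so that the rounding is exact rather than subject to binary floating-point error.

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-chromosome scores copied from the table above, in the order
# Chr4, Chr14, Chr16, Chr20. Kept as strings so Decimal parses them exactly.
scores = {
    "HiCPlus": ["0.8782", "0.8869", "0.865", "0.8907"],
    "DeepHiC": ["0.9013", "0.9102", "0.89", "0.9157"],
    "HiCNN": ["0.8928", "0.9051", "0.882", "0.9117"],
    "HiCSR": ["0.8832", "0.897", "0.7311", "0.9043"],
    "HiCARN": ["0.9122", "0.9195", "0.9027", "0.9235"],
    "HiCENT": ["0.9162", "0.923", "0.9062", "0.9275"],
}


def column_average(values):
    """Unweighted mean of decimal strings, rounded half-up to 4 places."""
    total = sum(Decimal(v) for v in values)
    return (total / len(values)).quantize(Decimal("0.0001"), rounding=ROUND_HALF_UP)


averages = {model: column_average(vals) for model, vals in scores.items()}
best = max(averages, key=averages.get)

for model, avg in averages.items():
    print(f"{model}: {avg}")
print("highest average:", best)
```

Under these assumptions the recomputed means match the Average row, with HiCENT highest at 0.9182.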
Chromosome | HiCPlus | DeepHiC | HiCNN | HiCSR | HiCARN | HiCENT |
---|---|---|---|---|---|---|
Chr4 | 0.7953 | 0.8523 | 0.8337 | 0.8716 | 0.8666 | 0.8754 |
Chr14 | 0.8857 | 0.9158 | 0.9098 | 0.9317 | 0.9278 | 0.9340 |
Chr16 | 0.8813 | 0.9144 | 0.8989 | 0.9135 | 0.9223 | 0.9262 |
Chr20 | 0.8776 | 0.9158 | 0.9076 | 0.9255 | 0.9213 | 0.9124 |
Average | 0.8600 | 0.8995 | 0.8875 | 0.9106 | 0.9095 | 0.9120 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Gao, R.; Ferraro, T.N.; Chen, L.; Zhang, S.; Chen, Y. Enhancing Single-Cell and Bulk Hi-C Data Using a Generative Transformer Model. Biology 2025, 14, 288. https://doi.org/10.3390/biology14030288