Scaling for African Inclusion in High-Throughput Whole Cancer Genome Bioinformatic Workflows
Simple Summary
Abstract
1. Introduction
2. WGS Data of African Patient-Derived Tumours
2.1. Cohort Information of African Patients
2.2. Cancer Discoveries from African Genomic Studies
2.3. Challenges of Analysing WGS Data of African Patients
3. Rapid and Scalable HPC Workflow for African Genomic Studies
3.1. SAPCS Workflow Overview
3.2. High-Level Parallelism
3.2.1. Parallelism via Physical Data Chunking for Alignment
3.2.2. Parallelism via Genomic Interval Chunking
3.3. Integration with Workflow Management Tools
4. Emerging Technologies and Resources to Be Integrated to African Genomic Studies
5. Conclusions and Challenges
Supplementary Materials
Author Contributions
Funding
Conflicts of Interest
References
1. Bray, F.; Laversanne, M.; Sung, H.; Ferlay, J.; Siegel, R.L.; Soerjomataram, I.; Jemal, A. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 2024, 74, 229–263.
2. Rubagumya, F.; Carson, L.; Mushonga, M.; Manirakiza, A.; Murenzi, G.; Abdihamid, O.; Athman, A.; Mungo, C.; Booth, C.; Hammad, N. An analysis of the African cancer research ecosystem: Tackling disparities. BMJ Glob. Health 2023, 8, e011338.
3. Drake, T.M.; Knight, S.R.; Harrison, E.M.; Søreide, K. Global inequities in precision medicine and molecular cancer research. Front. Oncol. 2018, 8, 346.
4. Pereira, L.; Mutesa, L.; Tindana, P.; Ramsay, M. African genetic diversity and adaptation inform a precision medicine agenda. Nat. Rev. Genet. 2021, 22, 284–306.
5. Omotoso, O.; Teibo, J.O.; Atiba, F.A.; Oladimeji, T.; Paimo, O.K.; Ataya, F.S.; Batiha, G.E.-S.; Alexiou, A. Addressing cancer care inequities in sub-Saharan Africa: Current challenges and proposed solutions. Int. J. Equity Health 2023, 22, 189.
6. Lawson, D.J.; Van Dorp, L.; Falush, D. A tutorial on how not to over-interpret STRUCTURE and ADMIXTURE bar plots. Nat. Commun. 2018, 9, 3258.
7. Liu, W.; Zheng, S.L.; Na, R.; Wei, L.; Sun, J.; Gallagher, J.; Wei, J.; Resurreccion, W.K.; Ernst, S.; Sfanos, K.S. Distinct genomic alterations in prostate tumors derived from African American men. Mol. Cancer Res. 2020, 18, 1815–1824.
8. Kittles, R.A.; Baffoe-Bonnie, A.B.; Moses, T.Y.; Robbins, C.M.; Ahaghotu, C.; Huusko, P.; Pettaway, C.; Vijayakumar, S.; Bennett, J.; Hoke, G. A common nonsense mutation in EphB2 is associated with prostate cancer risk in African American men with a positive family history. J. Med. Genet. 2006, 43, 507–511.
9. Khani, F.; Mosquera, J.M.; Park, K.; Blattner, M.; O’Reilly, C.; MacDonald, T.Y.; Chen, Z.; Srivastava, A.; Tewari, A.K.; Barbieri, C.E. Evidence for molecular differences in prostate cancer between African American and Caucasian men. Clin. Cancer Res. 2014, 20, 4925–4934.
10. Huang, F.W.; Mosquera, J.M.; Garofalo, A.; Oh, C.; Baco, M.; Amin-Mansour, A.; Rabasha, B.; Bahl, S.; Mullane, S.A.; Robinson, B.D. Exome sequencing of African-American prostate cancer reveals loss-of-function ERF mutations. Cancer Discov. 2017, 7, 973–983.
11. Blattner, M.; Lee, D.J.; O’Reilly, C.; Park, K.; MacDonald, T.Y.; Khani, F.; Turner, K.R.; Chiu, Y.-L.; Wild, P.J.; Dolgalev, I. SPOP mutations in prostate cancer across demographically diverse patient cohorts. Neoplasia 2014, 16, 14-W10.
12. Yuan, J.; Kensler, K.H.; Hu, Z.; Zhang, Y.; Zhang, T.; Jiang, J.; Xu, M.; Pan, Y.; Long, M.; Montone, K.T. Integrative comparison of the genomic and transcriptomic landscape between prostate cancer patients of predominantly African or European genetic ancestry. PLoS Genet. 2020, 16, e1008641.
13. Lindquist, K.J.; Paris, P.L.; Hoffmann, T.J.; Cardin, N.J.; Kazma, R.; Mefford, J.A.; Simko, J.P.; Ngo, V.; Chen, Y.; Levin, A.M. Mutational landscape of aggressive prostate tumors in African American men. Cancer Res. 2016, 76, 1860–1868.
14. Xiao, Q.; Sun, Y.; Dobi, A.; Srivastava, S.; Wang, W.; Srivastava, S.; Ji, Y.; Hou, J.; Zhao, G.-P.; Li, Y. Systematic analysis reveals molecular characteristics of ERG-negative prostate cancer. Sci. Rep. 2018, 8, 12868.
15. Petrovics, G.; Li, H.; Stümpel, T.; Tan, S.-H.; Young, D.; Katta, S.; Li, Q.; Ying, K.; Klocke, B.; Ravindranath, L. A novel genomic alteration of LSAMP associates with aggressive prostate cancer in African American men. EBioMedicine 2015, 2, 1957–1964.
16. International Cancer Genome Consortium. International network of cancer genome projects. Nature 2010, 464, 993.
17. Weinstein, J.N.; Collisson, E.A.; Mills, G.B.; Shaw, K.R.; Ozenberger, B.A.; Ellrott, K.; Shmulevich, I.; Sander, C.; Stuart, J.M. The cancer genome atlas pan-cancer analysis project. Nat. Genet. 2013, 45, 1113–1120.
18. Aaltonen, L.A.; Abascal, F.; Abeshouse, A.; Aburatani, H.; Adams, D.J.; Agrawal, N.; Ahn, K.S.; Ahn, S.-M.; Aikata, H.; Akbani, R.; et al. Pan-cancer analysis of whole genomes. Nature 2020, 578, 82–93.
19. Jiagge, E.; Jin, D.X.; Newberg, J.Y.; Perea-Chamblee, T.; Pekala, K.R.; Fong, C.; Waters, M.; Ma, D.; Dei-Adomakoh, Y.; Erb, G. Tumor sequencing of African ancestry reveals differences in clinically relevant alterations across common cancers. Cancer Cell 2023, 41, 1963–1971.e1963.
20. Brown, L.M.; Hagenson, R.A.; Koklič, T.; Urbančič, I.; Qiao, L.; Strancar, J.; Sheltzer, J.M. An elevated rate of whole-genome duplications in cancers from Black patients. Nat. Commun. 2024, 15, 8218.
21. Johnson, J.A.; Moore, B.J.; Syrnioti, G.; Eden, C.M.; Wright, D.; Newman, L.A. Landmark series: The cancer genome atlas and the study of breast cancer disparities. Ann. Surg. Oncol. 2023, 30, 6427–6440.
22. Van der Auwera, G.A.; Carneiro, M.O.; Hartl, C.; Poplin, R.; Del Angel, G.; Levy-Moonshine, A.; Jordan, T.; Shakir, K.; Roazen, D.; Thibault, J. From FastQ data to high-confidence variant calls: The genome analysis toolkit best practices pipeline. Curr. Protoc. Bioinform. 2013, 43, 11.10.11–11.10.33.
23. McKenna, A.; Hanna, M.; Banks, E.; Sivachenko, A.; Cibulskis, K.; Kernytsky, A.; Garimella, K.; Altshuler, D.; Gabriel, S.; Daly, M. The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010, 20, 1297–1303.
24. Huang, Z.; Rustagi, N.; Veeraraghavan, N.; Carroll, A.; Gibbs, R.; Boerwinkle, E.; Venkata, M.G.; Yu, F. A hybrid computational strategy to address WGS variant analysis in >5000 samples. BMC Bioinform. 2016, 17, 1–12.
25. Meggendorfer, M.; Jobanputra, V.; Wrzeszczynski, K.O.; Roepman, P.; de Bruijn, E.; Cuppen, E.; Buttner, R.; Caldas, C.; Grimmond, S.; Mullighan, C.G. Analytical demands to use whole-genome sequencing in precision oncology. In Seminars in Cancer Biology; Elsevier: Amsterdam, The Netherlands, 2022; Volume 84, pp. 16–22.
26. Jaratlerdsiri, W.; Jiang, J.; Gong, T.; Patrick, S.M.; Willet, C.; Chew, T.; Lyons, R.J.; Haynes, A.-M.; Pasqualim, G.; Louw, M.; et al. African-specific molecular taxonomy of prostate cancer. Nature 2022, 609, 552–559.
27. Jaratlerdsiri, W.; Chan, E.K.; Gong, T.; Petersen, D.C.; Kalsbeek, A.M.; Venter, P.A.; Stricker, P.D.; Bornman, M.R.; Hayes, V.M. Whole-genome sequencing reveals elevated tumor mutational burden and initiating driver mutations in African men with treatment-naïve, high-risk prostate cancer. Cancer Res. 2018, 78, 6736–6746.
28. Moody, S.; Senkin, S.; Islam, S.M.A.; Wang, J.; Nasrollahzadeh, D.; Cortez Cardoso Penha, R.; Fitzgerald, S.; Bergstrom, E.N.; Atkins, J.; He, Y.; et al. Mutational signatures in esophageal squamous cell carcinoma from eight countries with varying incidence. Nat. Genet. 2021, 53, 1553–1563.
29. Van Loon, K.; Mmbaga, E.J.; Mushi, B.P.; Selekwa, M.; Mwanga, A.; Akoko, L.O.; Mwaiselage, J.; Mosha, I.; Ng, D.L.; Wu, W. A Genomic Analysis of Esophageal Squamous Cell Carcinoma in Eastern Africa. Cancer Epidemiol. Biomark. Prev. 2023, 32, 1411–1420.
30. Grande, B.M.; Gerhard, D.S.; Jiang, A.; Griner, N.B.; Abramson, J.S.; Alexander, T.B.; Allen, H.; Ayers, L.W.; Bethony, J.M.; Bhatia, K. Genome-wide discovery of somatic coding and noncoding mutations in pediatric endemic and sporadic Burkitt lymphoma. Blood J. Am. Soc. Hematol. 2019, 133, 1313–1324.
31. Thomas, N.; Dreval, K.; Gerhard, D.S.; Hilton, L.K.; Abramson, J.S.; Ambinder, R.F.; Barta, S.; Bartlett, N.L.; Bethony, J.; Bhatia, K. Genetic subgroups inform on pathobiology in adult and pediatric Burkitt lymphoma. Blood 2023, 141, 904–916.
32. Ansari-Pour, N.; Zheng, Y.; Yoshimatsu, T.F.; Sanni, A.; Ajani, M.; Reynier, J.-B.; Tapinos, A.; Pitt, J.J.; Dentro, S.; Woodard, A. Whole-genome analysis of Nigerian patients with breast cancer reveals ethnic-driven somatic evolution and distinct genomic subtypes. Nat. Commun. 2021, 12, 6946.
33. Hayes, V.M.; Patrick, S.M.; Shirinde, J.; Jaratlerdsiri, W.; Nenzhelele, M.; Radzuma, M.B.; Gheybi, K.; Mokua, W.; Oyaro, M.O.; Moreira, D.M. Health equity research outcomes and improvement Consortium Prostate Cancer Health Precision Africa1K: Closing the health equity gap through rural community inclusion. J. Urol. Oncol. 2024, 22, 144–149.
34. Zhang, R.; Li, C.; Wan, Z.; Qin, J.; Li, Y.; Wang, Z.; Zheng, Q.; Kang, X.; Chen, X.; Li, Y. Comparative genomic analysis of esophageal squamous cell carcinoma among different geographic regions. Front. Oncol. 2023, 12, 999424.
35. Li, M.; Zhang, Z.; Wang, Q.; Yi, Y.; Li, B. Integrated cohort of esophageal squamous cell cancer reveals genomic features underlying clinical characteristics. Nat. Commun. 2022, 13, 5268.
36. Cui, Y.; Chen, H.; Xi, R.; Cui, H.; Zhao, Y.; Xu, E.; Yan, T.; Lu, X.; Huang, F.; Kong, P. Whole-genome sequencing of 508 patients identifies key molecular features associated with poor prognosis in esophageal squamous cell carcinoma. Cell Res. 2020, 30, 902–913.
37. Gong, T.; Jaratlerdsiri, W.; Jiang, J.; Willet, C.; Chew, T.; Patrick, S.M.; Lyons, R.J.; Haynes, A.-M.; Pasqualim, G.; Brum, I.S. Genome-wide interrogation of structural variation reveals novel African-specific prostate cancer oncogenic drivers. Genome Med. 2022, 14, 100.
38. Huang, R.; Bornman, M.R.; Stricker, P.D.; Simoni Brum, I.; Mutambirwa, S.B.; Jaratlerdsiri, W.; Hayes, V.M. The impact of telomere length on prostate cancer aggressiveness, genomic instability and health disparities. Sci. Rep. 2024, 14, 7706.
39. Soh, P.X.; Adams, A.; Bornman, M.R.; Jiang, J.; Stricker, P.D.; Mutambirwa, S.B.; Jaratlerdsiri, W.; Hayes, V.M. Y chromosome variation and prostate cancer ancestral disparities. iScience 2025, 28, 1–10.
40. Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv 2013, arXiv:1303.3997.
41. Chen, Z.; Yuan, Y.; Chen, X.; Chen, J.; Lin, S.; Li, X.; Du, H. Systematic comparison of somatic variant calling performance among different sequencing depth and mutation frequency. Sci. Rep. 2020, 10, 3501.
42. Poplin, R.; Ruano-Rubio, V.; DePristo, M.A.; Fennell, T.J.; Carneiro, M.O.; Van der Auwera, G.A.; Kling, D.E.; Gauthier, L.D.; Levy-Moonshine, A.; Roazen, D. Scaling accurate genetic variant discovery to tens of thousands of samples. bioRxiv 2017. bioRxiv:201178.
43. Cibulskis, K.; Lawrence, M.S.; Carter, S.L.; Sivachenko, A.; Jaffe, D.; Sougnez, C.; Gabriel, S.; Meyerson, M.; Lander, E.S.; Getz, G. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat. Biotechnol. 2013, 31, 213–219.
44. Cameron, D.L.; Baber, J.; Shale, C.; Valle-Inclan, J.E.; Besselink, N.; van Hoeck, A.; Janssen, R.; Cuppen, E.; Priestley, P.; Papenfuss, A.T. GRIDSS2: Comprehensive characterisation of somatic structural variation using single breakend variants and structural variant phasing. Genome Biol. 2021, 22, 1–25.
45. Chen, X.; Schulz-Trieglaff, O.; Shaw, R.; Barnes, B.; Schlesinger, F.; Källberg, M.; Cox, A.J.; Kruglyak, S.; Saunders, C.T. Manta: Rapid detection of structural variants and indels for germline and cancer sequencing applications. Bioinformatics 2015, 32, 1220–1222.
46. Kim, S.; Scheffler, K.; Halpern, A.L.; Bekritsky, M.A.; Noh, E.; Källberg, M.; Chen, X.; Kim, Y.; Beyter, D.; Krusche, P. Strelka2: Fast and accurate calling of germline and somatic variants. Nat. Methods 2018, 15, 591–594.
47. Jones, D.; Raine, K.M.; Davies, H.; Tarpey, P.S.; Butler, A.P.; Teague, J.W.; Nik-Zainal, S.; Campbell, P.J. cgpCaVEManWrapper: Simple execution of CaVEMan in order to detect somatic single nucleotide variants in NGS data. Curr. Protoc. Bioinform. 2016, 56, 15.10.11–15.10.18.
48. Raine, K.M.; Hinton, J.; Butler, A.P.; Teague, J.W.; Davies, H.; Tarpey, P.; Nik-Zainal, S.; Campbell, P.J. cgpPindel: Identifying somatically acquired insertion and deletion events from paired end sequencing. Curr. Protoc. Bioinform. 2015, 52, 15.17.11–15.17.12.
49. Radenbaugh, A.J.; Ma, S.; Ewing, A.; Stuart, J.M.; Collisson, E.A.; Zhu, J.; Haussler, D. RADIA: RNA and DNA integrated analysis for somatic mutation detection. PLoS ONE 2014, 9, e111516.
50. Wilm, A.; Aw, P.P.K.; Bertrand, D.; Yeo, G.H.T.; Ong, S.H.; Wong, C.H.; Khor, C.C.; Petric, R.; Hibberd, M.L.; Nagarajan, N. LoFreq: A sequence-quality aware, ultra-sensitive variant caller for uncovering cell-population heterogeneity from high-throughput sequencing datasets. Nucleic Acids Res. 2012, 40, 11189–11201.
51. Rimmer, A.; Phan, H.; Mathieson, I.; Iqbal, Z.; Twigg, S.R.F.; Wilkie, A.O.M.; McVean, G.; Lunter, G.; WGS500 Consortium. Integrating mapping-, assembly- and haplotype-based approaches for calling variants in clinical sequencing applications. Nat. Genet. 2014, 46, 912–918.
52. Saunders, C.T.; Wong, W.S.; Swamy, S.; Becq, J.; Murray, L.J.; Cheetham, R.K. Strelka: Accurate somatic small-variant calling from sequenced tumor–normal sample pairs. Bioinformatics 2012, 28, 1811–1817.
53. Rausch, T.; Zichner, T.; Schlattl, A.; Stütz, A.M.; Benes, V.; Korbel, J.O. DELLY: Structural variant discovery by integrated paired-end and split-read analysis. Bioinformatics 2012, 28, i333–i339.
54. Layer, R.M.; Chiang, C.; Quinlan, A.R.; Hall, I.M. LUMPY: A probabilistic framework for structural variant discovery. Genome Biol. 2014, 15, R84.
55. Willet, C.E.; Chew, T.; Samaha, G.; Sadsad, R. Fastq-to-bam @ NCI-Gadi. WorkflowHub. 2021. Available online: https://workflowhub.eu/workflows/146?version=1 (accessed on 9 June 2025).
56. Chew, T.; Willet, C.E.; Samaha, G.; Sadsad, R. Germline-ShortV @ NCI-Gadi. WorkflowHub. 2021. Available online: https://workflowhub.eu/workflows/143?version=1 (accessed on 9 June 2025).
57. Chew, T.; Willet, C.E.; Sadsad, R. Somatic-ShortV @ NCI-Gadi. WorkflowHub. 2021. Available online: https://workflowhub.eu/workflows/148?version=1 (accessed on 9 June 2025).
58. Chen, S.; Zhou, Y.; Chen, Y.; Gu, J. fastp: An ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 2018, 34, i884–i890.
59. Tarasov, A.; Vilella, A.J.; Cuppen, E.; Nijman, I.J.; Prins, P. Sambamba: Fast processing of NGS alignment formats. Bioinformatics 2015, 31, 2032–2034.
60. Faust, G.G.; Hall, I.M. SAMBLASTER: Fast duplicate marking and structural variant read extraction. Bioinformatics 2014, 30, 2503–2505.
61. García-Alcalde, F.; Okonechnikov, K.; Carbonell, J.; Cruz, L.M.; Götz, S.; Tarazona, S.; Dopazo, J.; Meyer, T.F.; Conesa, A. Qualimap: Evaluating next-generation sequencing alignment data. Bioinformatics 2012, 28, 2678–2679.
62. Favero, F.; Joshi, T.; Marquard, A.M.; Birkbak, N.J.; Krzystanek, M.; Li, Q.; Szallasi, Z.; Eklund, A.C. Sequenza: Allele-specific copy number and mutation profiles from tumor sequencing data. Ann. Oncol. 2015, 26, 64–70.
63. Gong, T.; Hayes, V.M.; Chan, E.K. Detection of somatic structural variants from short-read next-generation sequencing data. Brief. Bioinform. 2021, 22, bbaa056.
64. Di Tommaso, P.; Chatzou, M.; Floden, E.W.; Barja, P.P.; Palumbo, E.; Notredame, C. Nextflow enables reproducible computational workflows. Nat. Biotechnol. 2017, 35, 316–319.
65. Levy, B.; Kanagal-Shamanna, R.; Sahajpal, N.S.; Neveling, K.; Rack, K.; Dewaele, B.; Olde Weghuis, D.; Stevens-Kroef, M.; Puiggros, A.; Mallo, M. A framework for the clinical implementation of optical genome mapping in hematologic malignancies. Am. J. Hematol. 2024, 99, 642–661.
66. Sakamoto, Y.; Sereewattanawoot, S.; Suzuki, A. A new era of long-read sequencing for cancer genomics. J. Hum. Genet. 2020, 65, 3–10.
67. Rodriguez, I.; Rossi, N.M.; Keskus, A.G.; Xie, Y.; Ahmad, T.; Bryant, A.; Lou, H.; Paredes, J.G.; Milano, R.; Rao, N.; et al. Insights into the mechanisms and structure of breakage-fusion-bridge cycles in cervical cancer using long-read sequencing. Am. J. Hum. Genet. 2024, 111, 544–561.
68. Chan, E.K.; Cameron, D.L.; Petersen, D.C.; Lyons, R.J.; Baldi, B.F.; Papenfuss, A.T.; Thomas, D.M.; Hayes, V.M. Optical mapping reveals a higher level of genomic architecture of chained fusions in cancer. Genome Res. 2018, 28, 726–738.
69. Nurk, S.; Koren, S.; Rhie, A.; Rautiainen, M.; Bzikadze, A.V.; Mikheenko, A.; Vollger, M.R.; Altemose, N.; Uralsky, L.; Gershman, A. The complete sequence of a human genome. Science 2022, 376, 44–53.
70. Rhie, A.; Nurk, S.; Cechova, M.; Hoyt, S.J.; Taylor, D.J.; Altemose, N.; Hook, P.W.; Koren, S.; Rautiainen, M.; Alexandrov, I.A. The complete sequence of a human Y chromosome. Nature 2023, 621, 344–354.
71. Miga, K.H.; Koren, S.; Rhie, A.; Vollger, M.R.; Gershman, A.; Bzikadze, A.; Brooks, S.; Howe, E.; Porubsky, D.; Logsdon, G.A. Telomere-to-telomere assembly of a complete human X chromosome. Nature 2020, 585, 79–84.
72. Liao, W.-W.; Asri, M.; Ebler, J.; Doerr, D.; Haukness, M.; Hickey, G.; Lu, S.; Lucas, J.K.; Monlong, J.; Abel, H.J.; et al. A draft human pangenome reference. Nature 2023, 617, 312–324.
73. Rhie, A.; McCarthy, S.A.; Fedrigo, O.; Damas, J.; Formenti, G.; Koren, S.; Uliano-Silva, M.; Chow, W.; Fungtammasan, A.; Kim, J. Towards complete and error-free genome assemblies of all vertebrate species. Nature 2021, 592, 737–746.
74. Clarke, L.; Zheng-Bradley, X.; Smith, R.; Kulesha, E.; Xiao, C.; Toneva, I.; Vaughan, B.; Preuss, D.; Leinonen, R.; Shumway, M. The 1000 Genomes Project: Data management and community access. Nat. Methods 2012, 9, 459–462.
Consortium or Project | Cancer Type | Country | Cohort Size a | Tissue Fixation b | Coverage of Tumour, Normal (Median/Mean) | Recruitment Time | Recruitment Hospitals |
---|---|---|---|---|---|---|---|
SAPCS [26,27] | PCa | South Africa | 123 | FF | 88.69X, 44.3X (median) | 2013–2018 | Polokwane Urology Clinic, Limpopo; Tshilidzini Hospital, Limpopo; Pretoria’s Steve Biko Academic Hospitals, Gauteng; Dr George Mukhari Academic Hospitals, Gauteng; and Kalafong Academic Hospital, Gauteng |
ESCCAPE [28] | ESCC | Kenya | 68 | FF | 49X, 26X (mean c) | 2014–2020 | Moi Teaching and Referral Hospital, Eldoret |
| | | Malawi | 59 | | | | Queen Elizabeth Central Hospital, Blantyre |
| | | Tanzania | 35 | | | | Kilimanjaro Clinical Research Institute, Moshi |
AfrECC [29] | ESCC | Tanzania | 61 | FFPE | 60X, 30X (targeted coverage; de facto values unavailable) | 2016–2018 | Muhimbili National Hospital, Dar es Salaam |
BLGSP [30,31] | BL | Uganda | 87 | 83 FF, 4 FFPE | 82X, 41X (mean c); 72.6X (mean across sample types c) | Unavailable | Uganda Cancer Institute, Kampala; St Mary’s Hospital, Gulu |
NBCS [32] | BRCA | Nigeria | 97 | FPAX | 103.2X, 35.1X (mean) | 2013–2015 | Lagos State University Teaching Hospital, Lagos |
Cancer Type | Measurement | Values or Odds Ratios | p-Value | Comparison b |
---|---|---|---|---|
Short variants (single-nucleotide variants, and insertions and deletions shorter than 50 bp) | | | | |
PCa | Tumour mutational burden (TMB, mutations per Mb) | 1.197 versus 1.061 | 0.013 | EUR |
PCa | Predicted damaging mutations (count) | 14 versus 11 | 0.022 | EUR |
BRCA | Insertions and deletions (indels) | N/A | 6.5 × 10⁻⁵, 2 × 10⁻⁴ | EUR, AA |
Driver genes | | | | |
BRCA | GATA3 | 6.3-fold | FDR = 0.038 | EUR, AA |
BRCA | Non-coding region, upstream of ZNF217 (frequency) | 42.3% versus 4.3% | FDR = 0.037 | EUR, AA |
BRCA | Non-coding region, spanning SYPL1 (frequency) | 28.9% versus 0% | FDR = 0.097 | EUR, AA |
ESCC | TP53 (frequency) | 72% versus 74.8–87% [34,35,36] | - | EUR, AA |
BL | SIN3A (frequency) | 18.4% versus 9.1% | - | Patients from the USA |
BL | HIST1H1E (frequency) | 9.2% versus 4.5% | - | Patients from the USA |
BL | CHD8 (frequency) | 9.2% versus 4.5% | - | Patients from the USA |
Somatic copy number alteration (SCNA) | | | | |
PCa | Percentage of genome alteration (PGA) | 7.26% versus 2.82% | 0.021 | EUR |
BRCA | Whole-genome duplications (WGD) | 3-fold | FDR = 0.02 | EUR, AA |
Structural variants (SV) | | | | |
PCa | Duplication (relative frequency, count) [37] | 1.6-fold, 2.5-fold | - | EUR |
PCa | A single type hyper-SV frequency [37] a | 2-fold | - | EUR |
PCa | PCAT1 | 9.09-fold | 0.012 | EUR |
PCa | TMPRSS2–ERG | 0.26-fold | 0.0004 | EUR |
Several types of variants combined | | | | |
BRCA | Intra-tumoral heterogeneity (ITH, increase %) | 3.4%, 5.7% | 0.005, 0.00017 | EUR, AA |
PCa | NCOA2 | 5.81-fold | 3.14 × 10⁻⁶ | EUR |
PCa | DDX11L1 | 4.17-fold | 0.0001 | EUR |
PCa | STK19 | 4.65-fold | 0.004 | EUR |
PCa | SETBP1 | 2.80-fold | 0.012 | EUR |
Consortium or Project | Genome | Short Variant Callers (Germline) | Short Variant Callers (Somatic) | Structural Variant Callers |
---|---|---|---|---|
SAPCS | GRCh38 | GATK HaplotypeCaller [42] | GATK Mutect2 [43] | GRIDSS [44], Manta [45] |
ESCCAPE | GRCh37 | Strelka2 [46] | Strelka2 and cgpCaVEMan [47] for SNVs; cgpPindel [48] for indels | BRASS a |
AfrECC | GRCh37 | - | RADIA [49] | - |
BLGSP | GRCh38 | - | Strelka2, GATK Mutect2, LoFreq [50], and SAGE b | GRIDSS, Manta |
NBCS | GRCh37 | Platypus [51] | GATK MuTect and Strelka [52] | Manta, DELLY [53], and LUMPY [54] |
Steps | Sample Type a | CPU/Task | Total Tasks | Batches | CPUs/Batch | Execution Time (h) | Main Algorithm with Version |
---|---|---|---|---|---|---|---|
Pipeline 1: Data pre-processing for variant discovery | | | | | | 14.4 | |
Split FASTQ | Blood | 4 | 20 | 1 | 96 | 0.9 | fastp [58] v0.20.0 |
Split FASTQ | Tumour | 4 | 20 | 1 | 96 | 1.8 | fastp [58] v0.20.0 |
Alignment | Both | 6 | 11,040 | 3 | 3840 | 0.5 | BWA-MEM v0.7.15 |
Merge | Blood | 24 | 20 | 1 | 480 | 0.4 | Sambamba [59] v0.7.1 |
Merge | Tumour | 24 | 20 | 1 | 480 | 0.8 | Sambamba [59] v0.7.1 |
Mask duplicate | Blood | 14 | 20 | 1 | 280 | 1.3 | SAMBLASTER [60] v0.1.24 |
Mask duplicate | Tumour | 14 | 20 | 1 | 280 | 2.6 | SAMBLASTER [60] v0.1.24 |
BQSR recal | Blood | 1 | 640 | 1 | 640 | 0.2 | GATK v4.4.0.0 b BaseRecalibrator |
BQSR recal | Tumour | 1 | 640 | 1 | 640 | 0.3 | GATK v4.4.0.0 b BaseRecalibrator |
BQSR apply | Blood | 2 | 480 | 1 | 960 | 0.3 | GATK ApplyBQSR |
BQSR apply | Tumour | 2 | 480 | 1 | 960 | 0.6 | GATK ApplyBQSR |
qSignature | Blood | 24 | 20 | 1 | 480 | 0.7 | qSignature c v0.1pre (75) |
qSignature | Tumour | 24 | 20 | 1 | 480 | 1.4 | qSignature c v0.1pre (75) |
Qualimap | Blood | 6 | 20 | 2 | 144 | 1.4 | Qualimap [61] v2.2.1 |
Qualimap | Tumour | 6 | 20 | 2 | 144 | 2.8 | Qualimap [61] v2.2.1 |
Sequenza | Pair | 2 | 480 | 1 | 504 | 3.6 | Sequenza [62] v3.0.0 |
Pipeline 2: Germline short variant discovery | | | | | | 8.1 | |
Variant call | Blood | 1 | 64,000 | 1 | 480 | 1.8 | GATK HaplotypeCaller |
Consolidation | Blood | 1 | 3200 | 11 | 144 | 1.3 | GATK GenomicsDBImport |
Joint genotyping | Blood | 1 | 3200 | 1 | 144 | 2 | GATK GenotypeGVCFs |
VQSR | Blood | 16 | 1 | 1 | 16 | 3 | GATK VariantFiltration, MakeSitesOnlyVcf, VariantRecalibrator, CollectVariantCallingMetrics, ApplyVQSR, CollectVariantCallingMetrics |
Pipeline 3: Somatic short variant discovery | | | | | | 3.3 | |
PoN | Blood | 1 | 64,000 | 1 | 2880 | 0.6 | GATK Mutect2 |
Consolidate | Blood | 2 | 3200 | 1 | 96 | 0.3 | GATK GenomicsDBImport |
Create PoN | Blood | 1 | 3200 | 1 | 960 | 1.6 | GATK CreateSomaticPanelOfNormals |
Variant call | Pair | 1 | 64,000 | 1 | 2880 | 0.8 | GATK Mutect2 |
Pipeline 4: Structural variant discovery | | | | | | 23 | |
GRIDSS | Pair | 8 | 20 | 20 | 8 | Range, 10–20 | GRIDSS v2.8.3 |
Manta | Pair | 24 | 20 | 2 | 48 | 3.0 | Manta v1.6.0 |
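
The per-pipeline execution times in the workflow table above appear to follow from the per-step times if blood and tumour samples are processed concurrently, so that elapsed time is set by the slower sample path plus the steps shared by both. The source does not state this rule. It is an assumption made here for illustration, as are the helper name `wall_time_hours` and the choice to take the upper bound of the 10–20 h GRIDSS range. A minimal Python sketch of that consistency check:

```python
from collections import defaultdict

# (pipeline, step, sample type, hours), transcribed from the workflow table.
# GRIDSS is reported as a 10-20 h range; the upper bound is used (assumption).
STEPS = [
    ("Pipeline 1", "Split FASTQ", "Blood", 0.9),
    ("Pipeline 1", "Split FASTQ", "Tumour", 1.8),
    ("Pipeline 1", "Alignment", "Both", 0.5),
    ("Pipeline 1", "Merge", "Blood", 0.4),
    ("Pipeline 1", "Merge", "Tumour", 0.8),
    ("Pipeline 1", "Mask duplicate", "Blood", 1.3),
    ("Pipeline 1", "Mask duplicate", "Tumour", 2.6),
    ("Pipeline 1", "BQSR recal", "Blood", 0.2),
    ("Pipeline 1", "BQSR recal", "Tumour", 0.3),
    ("Pipeline 1", "BQSR apply", "Blood", 0.3),
    ("Pipeline 1", "BQSR apply", "Tumour", 0.6),
    ("Pipeline 1", "qSignature", "Blood", 0.7),
    ("Pipeline 1", "qSignature", "Tumour", 1.4),
    ("Pipeline 1", "Qualimap", "Blood", 1.4),
    ("Pipeline 1", "Qualimap", "Tumour", 2.8),
    ("Pipeline 1", "Sequenza", "Pair", 3.6),
    ("Pipeline 2", "Variant call", "Blood", 1.8),
    ("Pipeline 2", "Consolidation", "Blood", 1.3),
    ("Pipeline 2", "Joint genotyping", "Blood", 2.0),
    ("Pipeline 2", "VQSR", "Blood", 3.0),
    ("Pipeline 3", "PoN", "Blood", 0.6),
    ("Pipeline 3", "Consolidate", "Blood", 0.3),
    ("Pipeline 3", "Create PoN", "Blood", 1.6),
    ("Pipeline 3", "Variant call", "Pair", 0.8),
    ("Pipeline 4", "GRIDSS", "Pair", 20.0),
    ("Pipeline 4", "Manta", "Pair", 3.0),
]


def wall_time_hours(steps):
    """Sum hours per pipeline along the slower of the blood and tumour paths."""
    hours = defaultdict(lambda: defaultdict(float))
    for pipeline, _step, sample_type, h in steps:
        hours[pipeline][sample_type] += h
    totals = {}
    for pipeline, by_type in hours.items():
        # Steps tagged "Both" or "Pair" sit on every path through the pipeline.
        shared = by_type["Both"] + by_type["Pair"]
        # Blood and tumour steps are assumed to run side by side.
        slowest = max(by_type["Blood"], by_type["Tumour"])
        totals[pipeline] = round(shared + slowest, 1)
    return totals


if __name__ == "__main__":
    for pipeline, total in wall_time_hours(STEPS).items():
        print(f"{pipeline}: {total} h")
```

Under these assumptions the sketch reproduces the totals listed on the pipeline rows of the table (14.4, 8.1, 3.3 and 23 h), which suggests that the Pipeline 1 figure reflects the tumour path and not the sum of all rows.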
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Jiang, J.; Samaha, G.; Willet, C.E.; Chew, T.; Hayes, V.M.; Jaratlerdsiri, W. Scaling for African Inclusion in High-Throughput Whole Cancer Genome Bioinformatic Workflows. Cancers 2025, 17, 2481. https://doi.org/10.3390/cancers17152481