Background: Artificial intelligence (AI) is reshaping oncology at every stage of the cancer care pathway, from population-level screening through molecular diagnosis, treatment planning, and post-treatment surveillance. Despite exponential growth in AI oncology publications, now exceeding 5000 peer-reviewed studies annually, a critical and persistent gap separates demonstrated algorithmic performance from genuine patient benefit. Most published evidence derives from retrospective, single-institution studies conducted on curated datasets that differ systematically from real-world clinical deployment conditions. This comprehensive review examines the translational maturity of AI applications across 18 major malignancies, providing an evidence-stratified, cross-cancer assessment of where AI has fulfilled, is approaching, or remains far from fulfilling its transformative potential in oncological care.
Methods: A structured narrative review was conducted across PubMed/MEDLINE, Embase, IEEE Xplore, and the Cochrane Library, supplemented by regulatory grey literature including FDA 510(k) decision summaries, CE Technical Files, and ClinicalTrials.gov. Search terms combined cancer site-specific terminology with AI methodology terms and translational outcome descriptors. Studies were included only if they applied an AI or machine learning method to a defined clinical oncological task, reported a clearly specified performance evaluation, and involved human subjects or human-derived clinical data. Evidence quality was assessed using QUADAS-2, PROBAST, and Cochrane RoB 2. A five-tier translational readiness framework, grounded in the NIH T0–T4 translational spectrum and the CONSORT-AI/SPIRIT-AI guidelines, was defined a priori to enable cross-cancer comparison. A strict distinction was maintained between diagnostic accuracy and clinical utility, the latter defined as demonstrated impact on clinical decision-making or patient-centered outcomes.
Results: Across all 18 malignancies, the translational maturity of AI varied profoundly by cancer type. Breast and prostate cancer (Tier 1) represent the most mature AI ecosystems, with multiple FDA-cleared tools for mammographic screening and digital pathology that have achieved prospective multi-institutional validation; however, randomized evidence demonstrating reduced cancer-specific mortality remains absent. Lung cancer, hepatocellular carcinoma (HCC), and melanoma AI (Tier 2) have reached regulatory milestones but face documented performance disparities across patient subgroups and care settings, including DermaSensor's 20.7% specificity in primary care and HCC model failures in non-viral disease etiologies. Colorectal, glioma, pancreatic, and ovarian cancers (Tier 3) exhibit technical maturity without clinical clarity: colorectal computer-aided detection (CADe) systems increase adenoma detection, yet meta-analyses pooling 18,232 patients across 21 RCTs fail to demonstrate improved advanced neoplasia detection or reduced cancer incidence. Study-level pooled estimates, confidence intervals, and heterogeneity statistics for each cited randomized evidence base are beyond the scope of this cross-cancer narrative review. Gastric, esophageal, cervical, bladder, head and neck, and endometrial cancers (Tier 4) show promising single-institution or geographically restricted results without multi-institutional external validation; this gap is particularly consequential for cervical cancer AI, whose transformative potential in low- and middle-income countries (LMICs) is constrained by absent regulatory frameworks. Hematologic malignancies, sarcoma, and pediatric solid tumors (Tier 5) face structural barriers that cannot be resolved through algorithmic refinement alone: workflow incompatibility in hematopathology, extreme rarity in sarcoma (>70 subtypes, <15,000 US cases annually), and irreducible ethical constraints on pediatric data governance.
Conclusions: Oncological AI has not yet fulfilled its clinical promise. Across all five translational tiers, one finding is consistent: diagnostic accuracy is not a surrogate for patient benefit. AI tools with high sensitivity and specificity have repeatedly failed to demonstrate corresponding reductions in cancer-specific mortality, overdiagnosis, or procedural harm when evaluated against real-world outcomes. At the same time, documented performance disparities across race, ethnicity, disease etiology, and geographic setting reveal that current AI systems risk amplifying the very health inequities they are positioned to address. Bridging this translational gap requires three coordinated systemic shifts: regulatory frameworks that mandate post-market outcome surveillance as a condition of clinical clearance; prospective trial designs that measure patient-centered endpoints rather than diagnostic concordance alone; and sustained infrastructure investment in federated data governance, demographically inclusive training datasets, and LMIC-accessible regulatory pathways. AI holds genuine potential to reduce cancer mortality on a global scale, but only if it is held to the evidentiary and equity standards that the stakes of oncological care demand.