Systematic Review

From Mammogram Analysis to Clinical Integration with Deep Learning in Breast Cancer Diagnosis

Science and Innovation Center “Artificial Intelligence”, Astana IT University, Astana 010000, Kazakhstan
* Authors to whom correspondence should be addressed.
Informatics 2025, 12(4), 106; https://doi.org/10.3390/informatics12040106
Submission received: 26 June 2025 / Revised: 26 September 2025 / Accepted: 30 September 2025 / Published: 2 October 2025
(This article belongs to the Section Medical and Clinical Informatics)

Abstract

Breast cancer is one of the main causes of cancer-related death for women worldwide, and enhancing patient outcomes still depends on early detection. Mammography is the most common imaging technique for breast cancer screening and diagnosis, with high potential for early lesion detection. With an emphasis on the incorporation of deep learning (DL) techniques, this review examines the changing role of mammography in early breast cancer detection. We examine recent advancements in DL-based approaches for mammogram analysis, including classification, segmentation, and lesion detection. We also assess the limitations of traditional mammographic methods and highlight how DL can enhance diagnostic accuracy, reduce false positives and negatives, and support radiologists in clinical decision-making. In addition, we discuss open issues such as interpretability, generalization across populations, and data scarcity. This review summarizes the available evidence to highlight the transformative potential of DL-enhanced mammography in breast cancer screening and to suggest future research avenues toward more reliable, transparent, and clinically useful AI-driven solutions.

1. Introduction

Breast cancer is the most commonly diagnosed cancer worldwide and remains one of the leading causes of cancer-related mortality in women. In 2020, more than 2.3 million women were diagnosed, and approximately 685,000 deaths were recorded globally [1]. Early detection substantially improves treatment outcomes, reduces mortality, and lowers the need for aggressive interventions. Large studies and meta-analyses have shown that regular mammography screening can reduce breast cancer mortality by about 20% in screened populations [2,3].
Breast cancer is a heterogeneous disease with multiple subtypes. It is broadly classified as non-invasive (in situ) or invasive. In situ cancers, such as ductal carcinoma in situ (DCIS), remain confined to the ducts, whereas invasive cancers spread into surrounding tissue [4]. Invasive ductal carcinoma (IDC) accounts for 70–80% of cases, followed by invasive lobular carcinoma (ILC). Other clinically important subtypes include triple-negative breast cancer and inflammatory breast cancer [4]. These subtypes highlight the biological and clinical diversity of breast cancer, influencing detection and management strategies.
The Breast Imaging Reporting and Data System (BI-RADS) provides standardized terminology and assessment categories to guide mammographic interpretation and management. Radiologists assign categories from 0 to 6, with higher numbers indicating greater suspicion of malignancy; for example, BI-RADS 5 denotes a lesion with more than 95% probability of cancer [5]. This system reduces ambiguity, supports consistent communication, and informs decisions on biopsy or follow-up.
Several imaging modalities are used in breast cancer detection. Ultrasound is valuable for dense breast tissue or lesion characterization, magnetic resonance imaging (MRI) offers high sensitivity for high-risk patients, and digital breast tomosynthesis (DBT) improves lesion visibility and reduces false positives compared with standard mammography [6]. Despite these advances, digital 2D mammography remains the primary tool for population-level screening due to its accessibility, cost-effectiveness, and diagnostic value for small calcifications and masses. Its limitations—reduced sensitivity in dense tissue, variability in interpretation, false positives, and false negatives—underscore the need for methodological innovation.
Recent progress in artificial intelligence (AI), particularly deep learning (DL), has advanced image analysis in medicine. Convolutional neural networks (CNNs) have achieved strong performance in classification, detection, and segmentation tasks. Applied to mammography, DL models can serve as decision-support tools by providing consistent image interpretation and assisting radiologists in high-volume screening settings. AI has the potential to reduce diagnostic variability, mitigate reader fatigue, and improve workflow efficiency, although its clinical value remains contingent on validation in real-world settings.
Prior reviews on AI in mammography have provided important insights but with varying scope. Early systematic reviews summarized test-accuracy results and demonstrated that algorithms can approach the performance of individual radiologists under controlled conditions [7,8]. Broader surveys included multiple imaging modalities and tasks such as risk stratification and density assessment [9,10]. More recent syntheses emphasized mammography, reporting external validations and prospective studies [11,12], while others examined risk prediction rather than diagnostic tasks [13].
Building on this literature, our review focuses specifically on image-based diagnostic deep learning for 2D mammography, covering both screening and diagnostic views. We summarize detection, segmentation, and classification methods, including recent vision–language models (VLMs), and examine evidence for clinical integration through prospective deployments. In contrast to broader or cross-modality reviews, this work provides a focused synthesis of methods, validation strategies, clinical applications, and outstanding challenges, updated through mid-2025.
Despite progress, several barriers limit routine clinical implementation of DL in mammography. Evidence remains dominated by retrospective dataset evaluations, with limited large-scale prospective validation. Model interpretability is limited, and black-box designs hinder clinician trust and regulatory approval. Generalizability is also a challenge, as performance varies across populations, imaging technologies, and healthcare settings, raising concerns about bias and equity. Data scarcity, particularly of well-annotated datasets representing diverse populations, exacerbates these issues. Finally, workflow integration remains difficult, requiring AI systems to align with radiologists’ reading protocols and existing infrastructure.
To address these gaps, this review achieves the following:
  • Summarizes state-of-the-art methods for detection, segmentation, and classification;
  • Evaluates DL-based mammography workflows and reported technical performance;
  • Reviews prospective deployments and trials of AI-supported screening;
  • Compares methods and highlights their limitations;
  • Discusses emerging directions such as explainability and multi-task learning.
This systematic review provides an updated assessment of deep learning in mammography diagnosis, combining evidence from clinical studies and AI research. It aims to clarify the current state of the field, the methodological and clinical challenges, and directions for future development.

2. Methods and Materials

2.1. Research Questions

To achieve the purposes of the literature review, research questions (RQs) were formulated to investigate all aspects of existing research on breast cancer diagnosis and detection using deep learning technologies. All research questions are listed in Table 1.

2.2. PRISMA

In this study, the PRISMA strategy is employed. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines provide a structured approach to organizing a literature review and reporting the aims and results of a study [14]. Paper identification, selection, and further examination were performed according to PRISMA standards.
The PRISMA flow diagram is used to demonstrate the literature search stage by stage. It illustrates the paper screening and filtering process with the numbers of papers eliminated after each step. Figure 1 displays the PRISMA flow diagram of our literature review.

2.3. Data Sources and Search Strategy

Searches were performed on IEEE Xplore (1 June 2025), SpringerLink (1 June 2025), Elsevier Scopus (2 June 2025), and MDPI Journals (3 June 2025). The review covers the period from 1 January 2020 to 1 June 2025. Scholarly research articles and conference proceedings were obtained from reputable research databases: IEEE, Springer, Elsevier, MDPI, and a few others. The search was limited to titles, abstracts, and author keywords (or their closest database-specific equivalents), rather than full text, to maximize relevance while avoiding retrieval of articles that mention the terms only peripherally. Topic-related keywords were selected and combined into search queries using AND/OR logical operators; Figure 2 shows the combination of keywords used. The following Boolean search query was applied uniformly, without modification, across all databases (IEEE Xplore, SpringerLink, Elsevier/Scopus, MDPI), ensuring consistency in the retrieval of relevant studies:
(“Mammography” OR “Mammogram”) AND (“Breast cancer” OR “Breast tumor”) AND (“Detection” OR “Diagnosis” OR “Classification” OR “Segmentation”) AND (“Deep learning” OR “Machine learning”).
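For transparency, the query above can be assembled programmatically from the keyword groups shown in Figure 2. The short Python sketch below is only an illustration of this construction and is not part of the original search protocol.

```python
# Minimal sketch: build the Boolean query from the keyword groups in Figure 2
# so the same string can be reused verbatim across databases.
keyword_groups = [
    ["Mammography", "Mammogram"],
    ["Breast cancer", "Breast tumor"],
    ["Detection", "Diagnosis", "Classification", "Segmentation"],
    ["Deep learning", "Machine learning"],
]

query = " AND ".join(
    "(" + " OR ".join(f'"{term}"' for term in group) + ")"
    for group in keyword_groups
)
print(query)
# ("Mammography" OR "Mammogram") AND ("Breast cancer" OR "Breast tumor") AND ...
```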
Figure 3 displays the distribution of the papers with regard to the databases in which they were published. As can be seen, a total of 17, 4, 12, 11, and 3 papers belonged to IEEE, Springer, Elsevier, MDPI, and other sources, respectively.
Table 2 provides an overview of the number of retrieved papers for each examined task, and Figure 4 shows the distribution of works by publication year. To address the research questions, the selected papers underwent extensive investigation: the data modality, the kinds of problems addressed, the models and techniques employed, and current trends were each considered independently. A follow-up discussion and analysis of the limitations, difficulties, and future potential of multimodal learning was then conducted across all modalities combined.

2.4. Exclusion and Inclusion Criteria

In the identification stage, 390 papers were retrieved using the search query that included all relevant keywords. After removing 188 duplicate records, 202 papers remained for retrieval. A total of 16 of these could not be retrieved in full text, leaving 186 studies for title and abstract screening.
During the screening phase, 107 studies were excluded at the title and abstract level because they did not meet one or more of the predefined inclusion criteria listed in Table 3. The breakdown of these exclusions was as follows: studies written in other languages (n = 12), published before 1 January 2020 (n = 18), focused on prognosis, meta-analyses, or clinical decision-making (n = 21), did not use deep learning or machine learning methods (n = 24), used non-mammography imaging modalities such as ultrasound, MRI, or CT (n = 19), or were preprints, retracted papers, review papers, or dissertations (n = 13).
Seventy-nine full-text articles were then assessed for eligibility. At this stage, 32 reports were excluded for the following reasons:
  • Focused on prognosis, meta-analyses, or clinical decision-making without reporting primary algorithmic results on mammograms (n = 23);
  • Lacked sufficient scientific novelty or methodological rigor (n = 9).
A total of 47 studies met the inclusion criteria and were retained for review.
All inclusion and exclusion criteria were defined in accordance with PRISMA guidelines and are summarized in Table 3. The scope of this review was limited to image-based diagnostic deep learning methods for 2D mammography (screening and diagnostic views) focused on detection, segmentation, and classification tasks.
All retrieved records were managed in Zotero for organization and duplicate removal. Screening was performed manually by one primary reviewer, with two additional reviewers cross-checking the results. Discrepancies were resolved by consensus. No automation tools were used in the selection or data extraction process.
Data extraction was performed manually using a structured spreadsheet, capturing study objectives, datasets, deep learning architectures, performance metrics, and task categorization. All extracted data were cross-checked by independent reviewers. Discrepancies were resolved by consensus. No contact with study investigators was made to obtain or confirm data.
Table 4 summarizes the main outcome domains considered in the review and the performance metrics extracted from each study.

2.5. Risk of Bias Within Studies

Based on the QUADAS-2 tool adapted for deep learning diagnostic studies, the risk of bias assessment for the 47 included articles is summarized in Table 5. Overall, nine studies (19.15%) were rated as having a low risk of bias across all four domains.
In the dataset representativeness domain, 23 studies (48.94%) had an unclear risk of bias due to limited information on how datasets were selected or split.
For the index test domain, 35 studies (74.47%) were rated as low risk, supported by the use of cross-validation strategies, reproducible deep learning pipelines, and clearly described architectures.
In the reference standard domain, 30 studies (63.83%) were assessed as low risk, with labels typically derived from radiologist annotations or biopsy-confirmed datasets such as INbreast, CBIS-DDSM, or OPTIMAM.
Regarding flow and timing, 41 studies (87.23%) demonstrated a low risk of bias. These studies applied clearly defined data partitions and ensured consistent application of the index and reference standards.
For the applicability concerns component, 26 studies (55.32%) were considered to have low concern, while others presented moderate to high concerns due to outdated datasets, simulated data, or weak alignment with clinical workflows. These issues impact the generalizability of AI models to real-world screening settings.
Table 6 shows the risk of bias and applicability concerns across the included studies.
To assess the risk of bias due to missing results in the synthesis, we evaluated the potential for selective outcome reporting and publication bias by examining whether studies consistently reported performance metrics across comparable tasks (e.g., AUC, sensitivity, and specificity for detection/classification tasks, or Dice score for segmentation). Studies that failed to report key evaluation metrics or that only reported performance for the best-performing configuration (e.g., best model or dataset fold) without full disclosure of all tested configurations were flagged as potentially at risk for selective reporting.
We also qualitatively assessed publication bias by noting a strong skew toward studies reporting high performance, with few studies discussing negative or suboptimal results. However, no formal statistical methods (e.g., funnel plots or Egger’s test) were applied due to the heterogeneity in datasets, evaluation setups, and outcome definitions across studies. As a result, while we acknowledge the possibility of reporting bias, the lack of comparable quantitative outcomes limited the ability to fully assess its impact.

3. Results

This section presents the primary results of recent deep learning (DL) techniques used for mammography analysis. We examine how state-of-the-art methods have been developed and evaluated for the fundamental tasks of classification, segmentation, and detection. Computer-aided diagnosis (CAD) systems rely on these vision tasks to assist radiologists by automatically identifying lesions, delineating their borders, and determining whether they are malignant. The following subsections highlight the progress made on each task, emphasizing the architectures employed, the datasets used, and the performance attained in both clinical and experimental settings.

3.1. Key Vision Tasks in Mammogram Interpretation

3.1.1. Detection

The task of lesion detection in mammography involves not only predicting the presence of abnormalities but also localizing them within the images (e.g., via bounding boxes or segmentation masks). In the 2020–2025 period, deep learning-based detection methods in mammography have significantly advanced, adapting state-of-the-art object detection frameworks (one-stage and two-stage detectors) and even emerging transformer architectures to the domain. Figure 5 shows an example of detection in mammography. Unlike classification (which outputs a patient- or breast-level diagnosis) or segmentation (which outlines lesion boundaries), detection algorithms specifically aim to pinpoint suspicious lesions (masses, calcifications, etc.) on the mammogram and often provide a confidence score for each finding. Recent works have emphasized improving detection accuracy (sensitivity) at low false positive rates, a critical factor for clinical viability in screening programs [15].
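Because detection results in this subsection are typically reported as sensitivity (TPR) at a given number of false positives per image (FPPI), the following Python sketch illustrates how a single FROC operating point can be computed from per-image detections. The greedy IoU matching rule, the 0.5 IoU threshold, and the data structures are illustrative assumptions, not the evaluation code of any cited study.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def sensitivity_at_fppi(predictions, ground_truths, score_thr, iou_thr=0.5):
    """predictions: one list of (box, score) tuples per image;
    ground_truths: one list of lesion boxes per image.
    Returns (sensitivity, false positives per image) at score_thr."""
    tp, fp = 0, 0
    n_lesions = sum(len(g) for g in ground_truths)
    for preds, gts in zip(predictions, ground_truths):
        matched = set()
        # Greedy matching: highest-scoring predictions claim lesions first.
        for box, score in sorted(preds, key=lambda p: -p[1]):
            if score < score_thr:
                continue
            hits = [i for i, g in enumerate(gts)
                    if i not in matched and iou(box, g) >= iou_thr]
            if hits:
                matched.add(hits[0])
                tp += 1
            else:
                fp += 1
    return tp / max(n_lesions, 1), fp / max(len(predictions), 1)
```

Sweeping score_thr over the range of confidence scores traces the full FROC curve, from which operating points such as "92% TPR at 0.09 FPPI" are read off.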
For the detection of mammogram lesions, various approaches adapt object detection models like RetinaNet, YOLO, and Faster R-CNN. For example, Agarwal et al. [22] presented one of the first deep learning benchmarks on a large mammography dataset (the OPTIMAM database), using a Faster R-CNN architecture to detect breast masses. Their framework achieved about 87% true positive rate (TPR) at ∼0.8 false positives per image (FPPI) on a subset of over 7000 images. Two-stage detectors (R-CNN family) benefit from robust region proposals but can be computationally heavy; nonetheless, they have been effective when sufficient training data are available. On the other hand, one-stage detectors like YOLO have been particularly popular for mammography due to their speed and simplicity. Aly et al. [19] developed a YOLOv3-based system with optimized anchors for detecting breast masses, reporting over 92% TPR at only 0.09 FPPI on INbreast. Similarly, Yan et al. [16] proposed a dual-view detection model where YOLOv3 region proposals from both CC and MLO views are combined by a Siamese network to enforce cross-view consistency, yielding a TPR around 96% at 0.26 FPPI on INbreast. These results underscore that incorporating mammography-specific insights (like using both views of a breast) can substantially improve detection performance. One-stage models have also been extended to detect multiple lesion types; for instance, Baccouche et al. [24] designed a YOLO-based pipeline to simultaneously localize and categorize masses, calcifications, and architectural distortions, achieving high accuracy on a multi-class dataset.
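To make the adaptation of generic object detectors concrete, the sketch below re-heads a COCO-pretrained torchvision Faster R-CNN for single-class mass detection. This is a hypothetical minimal setup, not the exact configuration used by Agarwal et al. [22] or the other works above.

```python
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Hypothetical single-class setup: replace the COCO head so the only
# foreground class is "mass" (num_classes = 2 includes background).
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=2)

model.eval()
with torch.no_grad():
    # Mammograms are single-channel; they are usually replicated to three
    # channels to match the ImageNet-pretrained backbone.
    images = [torch.rand(3, 1024, 832)]   # placeholder tensor, not real data
    outputs = model(images)               # list of dicts: boxes, labels, scores
print(outputs[0]["boxes"].shape, outputs[0]["scores"].shape)
```

In practice the re-headed model is fine-tuned on annotated mammograms before its detections are thresholded and evaluated as in the FROC sketch above.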
Because mammographic lesions vary greatly in size (from tiny microcalcifications to large masses), recent detectors often include multi-scale feature pyramids or anchor-free designs to better handle small lesions. Cao et al. [17] introduced an anchor-free one-stage detector (BMassDNet) for mass detection with a feature pyramid network, which achieved over 93% sensitivity at ∼0.5 FPPI on INbreast. An anchor-free variant of YOLO was also explored by Zhang et al. [18] to address the bias of preset anchor boxes; their model outperformed a conventional anchor-based YOLOv3, reaching 95% TPR at 1.7 FPPI on INbreast. These advances indicate that tuning the detector architecture to mammography’s requirements (small object detection, high class imbalance, etc.) yields tangible benefits. Recent studies have further optimized detection systems by combining multiple deep learning approaches. For example, Manalı et al. [23] proposed a three-channel system combining Support Vector Machine (SVM) and CNNs with decision fusion, achieving an overall accuracy of 99.1% on mammogram images. Another notable trend is the integration of segmentation into detection frameworks to improve localization precision. Mask R-CNN, which adds a segmentation branch to Faster R-CNN, has been applied to mammograms to simultaneously detect lesions and generate pixel-level masks. For example, transfer-learning a Mask R-CNN on the CBIS-DDSM dataset (which provides polygonal lesion annotations) achieved about 80% mean average precision (mAP)—roughly a radiologist-level performance—on that set. Such instance segmentation approaches can better delineate lesion extent compared to plain bounding boxes, which is useful for downstream assessment (e.g., measuring lesion size). Similarly, Su et al. [51] proposed a hybrid “YOLO-Logo” model that fuses YOLO with transformer-based segmentation for masses, showing improved detection and outline accuracy in mammograms. Some frameworks also perform an auxiliary benign/malignant classification on each detected region (turning the system into a CADx tool). This multi-task learning was demonstrated by Baccouche et al. [24], whose YOLO-based detector not only localized lesions but also classified them by type (mass vs. calcification vs. distortion) during the detection process. The inclusion of such diagnostic labeling is closely related to detection and can enhance the clinical utility of the AI system, although it overlaps with the scope of the classification task (discussed in Section 3.1.3).
Transformer-based models have started to influence mammography detection as well. Vision Transformers (ViTs) and transformer-driven detectors (e.g., DEtection TRansformers (DETR)) can model global context, which may be advantageous in breast imaging, where comparing patterns across multiple views or across a large image is important. Chen et al. [20] proposed a Vision Transformer that processes the four mammographic views (L-CC, R-CC, L-MLO, R-MLO) concurrently, achieving a classification AUC exceeding 0.81 on a private dataset. This highlights transformers’ capability to utilize cross-view relationships, offering significant potential for advancing breast cancer detection. For lesion detection specifically, Betancourt et al. [26] integrated a Swin Transformer backbone into a detection pipeline, reporting a sensitivity of 95.7% at 65% mAP on combined CBIS-DDSM and INbreast test sets. Notably, transformer-based detectors only began appearing in the mammography literature around 2022, so this is an emerging area. Early results, such as the Swin-SFTNet model of Kamran et al. [52] (a Swin-Transformer U-Net for “micro-lesion” segmentation), show that transformers can outperform CNN architectures in detecting subtle mammographic findings (evaluated on pixel-level lesion segmentation). We anticipate more transformer-driven detection models in mammography, potentially combining detection and segmentation tasks in a unified framework.
In addition to fully supervised approaches, researchers have investigated methods to reduce annotation demands for breast cancer detection. One promising technique is unsupervised anomaly localization, where models learn the patterns of normal mammograms and identify regions that differ as potential abnormalities. These methods do not require lesion bounding box labels; instead, they rely on modeling normal breast tissue appearance. For example, Park et al. [21] trained a StyleGAN2 generative model on only normal mammograms and then identified anomalies by detecting discrepancies between real images and the generator’s reconstructions. This approach achieved high sensitivity in distinguishing cancer-containing images by highlighting suspicious regions, demonstrating the potential for anomaly detectors to act as a second reader in screening. Other works have used autoencoders or image-to-image translation frameworks to similar effect, detecting lesions as the differences between an input mammogram and its reconstructed “normal” version (with the assumption that the reconstruction will fail to reproduce tumor-specific details). While these unsupervised methods may not pinpoint lesions as precisely as supervised detectors, they are valuable in scenarios with limited annotated data or for identifying novel or atypical lesions that were not present in the training set.
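The reconstruction-based idea can be sketched with a toy convolutional autoencoder trained only on normal mammograms, where the per-pixel reconstruction error serves as the anomaly map. This is a simplified stand-in for the StyleGAN2 and image-to-image approaches described above; the architecture and sizes are chosen purely for illustration.

```python
import torch
import torch.nn as nn

class TinyAutoencoder(nn.Module):
    """Toy convolutional autoencoder trained on normal mammograms only
    (a simplified stand-in for the generative models discussed above)."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

def anomaly_map(model, image):
    """Per-pixel reconstruction error; high values flag regions that the
    'normal-tissue' model cannot reproduce (candidate lesions)."""
    with torch.no_grad():
        recon = model(image)
    return (image - recon).abs()

model = TinyAutoencoder()
x = torch.rand(1, 1, 256, 256)    # placeholder mammogram patch
heatmap = anomaly_map(model, x)   # (1, 1, 256, 256) anomaly scores
```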
Overall, deep learning-based detection in mammography has rapidly matured. Early in this period, studies demonstrated that CNN-based detectors could at least match the performance of traditional computer-aided detection (CAD) systems, while more recent works have pushed toward or even exceeded radiologist-level performance on certain tasks [15]. The combination of improved models (one-stage, two-stage, and transformer-based), multi-view and multi-task learning strategies, and larger high-quality datasets has led to detectors that can identify breast lesions with impressive accuracy. Reducing false positives remains a key challenge to enhance the adoption of deep learning in breast cancer screening, alongside the need to detect minute abnormalities like individual microcalcifications within clusters. Additionally, ensuring that the models generalize effectively across diverse populations and varying imaging conditions is crucial for robust performance. The incorporation of context from multiple views or prior exams, as well as unsupervised pre-training or anomaly detection techniques, are promising strategies to continue advancing the field. In summary, lesion detection has become a cornerstone of AI in mammography, bridging the gap between image-level predictions and clinically actionable findings by indicating where suspicious lesions are located, not just whether they are present. For a comprehensive overview of recent developments in deep learning for breast cancer imaging, including detection and beyond, readers are referred to Carriero et al. [53].
Figure 6 illustrates the YOLO-based pipeline for breast cancer detection in mammograms, as adapted from Prinzi et al. [54]. The workflow begins with dataset preparation from CBIS–DDSM, INbreast, and a proprietary dataset, where images are pre-processed (CLAHE histogram equalization, resizing, and augmentation) and divided into training, validation, and test sets. Subsequently, YOLO-based models are trained with augmented data, leveraging transfer learning across datasets. The final YOLOv5s mass detector outputs predictions on mammograms, producing bounding box detections and corresponding heatmaps that highlight regions of interest.
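A minimal version of the CLAHE-and-resize preprocessing step in such pipelines can be written with OpenCV as follows; the clip limit, tile grid, and target size are illustrative assumptions, not the values used by Prinzi et al. [54].

```python
import cv2
import numpy as np

def preprocess_mammogram(path, size=(640, 640), clip_limit=2.0, tile_grid=(8, 8)):
    """CLAHE contrast enhancement followed by resizing, as in typical
    YOLO-style mammography pipelines; parameter values are illustrative."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=tile_grid)
    enhanced = clahe.apply(img)
    resized = cv2.resize(enhanced, size, interpolation=cv2.INTER_AREA)
    # YOLO-style detectors expect 3-channel float input in [0, 1].
    return np.repeat(resized[..., None], 3, axis=-1).astype(np.float32) / 255.0
```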
A comparative overview of the detection approaches discussed above is provided in Table 7, summarizing their typical strengths, limitations, target tasks, and representative studies.

3.1.2. Segmentation

Segmentation of mammograms is a complex task in computer-aided diagnosis of breast cancer. This complexity is due to the density of breast tissue, which leads to difficulties in identifying abnormal regions [28]. Furthermore, a wide range of abnormalities can be detected on the mammography images, which complicates the process even more. Radiologists routinely analyze mammograms for several suspicious characteristics that may suggest cancer. This encompasses asymmetries in tissue density between the breasts, architectural distortions that change the normal breast structure, shadowy regions or masses with irregular shapes and poorly defined borders, and microcalcifications—minute calcium deposits that may cluster and indicate potential early oncogenic alterations [29]. In the diagnostic process, standard segmentation tasks in mammography include mass segmentation, which identifies localized, often irregularly shaped lesions with varying density and appearance (as illustrated in Figure 7); calcification segmentation, focused on detecting clustered microcalcifications; pectoral muscle segmentation, particularly in mediolateral oblique (MLO) views; and breast boundary segmentation, which differentiates the breast from the background.
Recent advances in breast mass segmentation using mammography images have demonstrated a wide range of methodologies and performance levels, often leveraging convolutional neural networks and hybrid approaches to tackle the challenges of precise tumor localization, especially in dense breast tissues. There is an increasing focus on the use of deep transfer learning (DTL) and advanced U-Net architectures to enhance mass segmentation and classification in mammographic images.
A notable study by Tiryaki [30] proposed a cascaded deep learning pipeline that integrates a novel U-Net model with Xception-encoded weights for mass lesion segmentation, followed by DTL-based classification. This approach was applied to the Breast Cancer Digital Repository (BCDR) [55] dataset. In the segmentation stage, the authors systematically compared various U-Net variants, including basic five-layer U-Nets as well as attention U-Net, residual U-Net, MultiResUnet, DeepLabV3+, and U-Net++. The proposed U-Net++ with Xception encoder achieved a Dice Similarity Coefficient (DSC) of 63.56%, Intersection over Union (IoU) of 54.08%, and AUC of 78.29% for mass segmentation. These results establish a strong benchmark for mass segmentation on the BCDR dataset, supporting the idea that advanced U-Net variants combined with high-performing encoder backbones can significantly improve segmentation performance on complex mammography tasks.
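Since segmentation performance throughout this subsection is reported as the Dice Similarity Coefficient (DSC) and Intersection over Union (IoU), the short sketch below shows how both metrics are computed from binary masks; it is a generic illustration rather than code from any cited study.

```python
import numpy as np

def dice_and_iou(pred, target, eps=1e-7):
    """Dice similarity coefficient and IoU for binary masks (arrays of 0/1)."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    dice = 2.0 * inter / (pred.sum() + target.sum() + eps)
    iou = inter / (union + eps)
    return dice, iou

# Toy example: two overlapping square masks.
pred = np.zeros((256, 256)); pred[100:150, 100:150] = 1
gt = np.zeros((256, 256));   gt[110:160, 110:160] = 1
print(dice_and_iou(pred, gt))
```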
Ghantasala et al. [27] employed a U-Net-based convolutional neural network on the CBIS-DDSM dataset, focusing on both binary and multilabel segmentation. Their analysis showed that segmentation accuracy was significantly higher in images with lower breast density. Binary segmentation achieved a robust Dice score of 81%, while multilabel segmentation lagged behind due to severe class imbalance.
Similarly, Ahmed et al. [34] proposed CoAtNet-Lite, a lightweight hybrid model combining convolutional layers with attention mechanisms. This model stood out for its efficiency, requiring only 22 million parameters—achieved by freezing the last 20 layers—compared to the 25 million used by other models. It reported a precision of 77%, recall of 78%, and an F1-score of 77%, outperforming conventional models like DenseNet-169, ResNet-50, and Inception-V3, as well as classical machine learning models such as Random Forest and Gradient Boosting. Importantly, the model also prioritized clinical interpretability by incorporating LIME and Grad-CAM for visual explanations, enhancing trust in medical predictions.
Demil et al. [35] took a different route by analyzing cost-effectiveness and hardware optimization. Using the MIAS dataset and the NeuroMem NM500 chip, they applied unsupervised segmentation through both statistical and threshold-based methods. Mammograms were divided into 4 × 4 blocks and clustered using the chip’s distance-learning capabilities. The threshold-based approach outperformed the statistical one, achieving tumor detection rates of 83.33% for benign cases and 82.97% for malignant ones, with average region intersection scores exceeding 85%.
Another significant contribution was from Bentaher et al. [40], who introduced a ResNet-UNet segmentation model trained on the INbreast dataset. The model effectively merged ResNet’s strong feature extraction abilities with U-Net’s spatial reconstruction. Evaluation results were promising: 91% precision, 66% recall, 76% Dice coefficient (F1-score), and 62% Intersection over Union (IoU), showcasing the model’s strength in correctly identifying true mass regions while maintaining a reasonable balance between sensitivity and specificity.
Enhanced SegNet, a method built on U-Net and SegNet, pushed these improvements even further [41]. Trained on the CBIS-DDSM dataset, this model achieved an impressive 97.12% precision and 95.30% recall, with an inference time of just 102 milliseconds, indicating its readiness for fast-paced clinical deployment.
Another U-Net variant targets lesion segmentation with an attention-based sampling scheme and multi-scale feature fusion on datasets with minimal data [32]. The authors utilize a soft attention mechanism with similarity weights produced by a multi-layer perceptron. The models were evaluated on the Breast A, B, C, and D datasets and achieved a Dice coefficient of 46.2%.
Another work proposes a novel family of loss functions, Adaptive Sample-Level Prioritizing (ASP) losses, for mass segmentation with an attention U-Net model [31]. These losses incorporate extra image-specific information, such as mass size ratio and breast tissue density, to improve the training of segmentation models. Using D-ASP (Density-ASP), the attention U-Net reached a Dice coefficient of 74.59% on the INbreast dataset, and using R-ASP (Ratio-ASP) it achieved a Dice coefficient of 74.18% on INbreast, both indicating improved performance.
Ali et al. [36] explored a hybrid framework combining M3D-Neural Cellular Automata (M3D-NCA) and Shape-Guided Segmentation (SGS) to refine tumor boundaries. M3D-NCA used 3D convolutions across 2D slices to model spatial and contextual growth, while SGS contributed anatomical shape priors using Superpixel Pooling and Unpooling Modules. This fusion led to a Dice Score of 62.71%, an mIoU of 54.10%, and an accuracy of 84.15%. Notably, it did so with only 14.31 million parameters and 25.33 GFLOPs, outperforming well-known models like UNet, ResUNet, and TransUNet in both accuracy and efficiency.
In a related direction, Farrag et al. [38] modified DeepLabv3+ by adding a double-dilated convolution module. The illustration of the pipeline is depicted in Figure 8. This innovation preserved local resolution while expanding the receptive field. By applying parallel dilations—one low for dense core features and one high for broader context—the model maintained fine-grained information and improved the Dice score to 81%. Moreover, it halved the miss detection rate from 8% to 4%, keeping the false positive rate steady at 20%. Explainability was addressed with Grad-CAM and Occlusion Sensitivity, where Grad-CAM produced more focused heatmaps, as validated through entropy and pixel-flipping analyses.
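The parallel-dilation idea can be sketched as a small PyTorch module with one low-rate and one high-rate dilated convolution whose outputs are fused by a 1×1 convolution. The dilation rates and channel widths below are assumptions for illustration and do not reproduce the authors' exact design.

```python
import torch
import torch.nn as nn

class DoubleDilatedBlock(nn.Module):
    """Two parallel dilated 3x3 convolutions: a low dilation preserves fine,
    dense-core detail while a high dilation widens the receptive field.
    Rates and channel widths are illustrative assumptions."""
    def __init__(self, in_ch, out_ch, low_rate=2, high_rate=8):
        super().__init__()
        self.low = nn.Conv2d(in_ch, out_ch, 3, padding=low_rate, dilation=low_rate)
        self.high = nn.Conv2d(in_ch, out_ch, 3, padding=high_rate, dilation=high_rate)
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * out_ch, out_ch, 1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        # Setting padding equal to dilation keeps the spatial size unchanged.
        return self.fuse(torch.cat([self.low(x), self.high(x)], dim=1))

block = DoubleDilatedBlock(64, 64)
y = block(torch.rand(1, 64, 128, 128))   # output: (1, 64, 128, 128)
```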
Another unique approach came from Patil et al. [42], who used an Optimized Region Growing (ORG) algorithm enhanced by Dragonfly Optimization (DGO) to automatically select seed points and intensity thresholds. This adaptive method outperformed conventional approaches like Fuzzy, Adaptive, and Weighted Region Growing, achieving an IoU of 97.54% and a Rand Index of 95.21%.
Further combining traditional and modern methods, Nour and Boufama [39] developed a hybrid framework that integrated U-Net with an active contour model (ACM). Initially, U-Net performed coarse segmentation, followed by ACM refining the tumor boundaries using localized energy minimization. This model handled tumors with unusual shapes and low contrast particularly well, achieving a Dice coefficient of 81.3%, an IoU of 89.1%, and an overall accuracy of 97.34%, outperforming models like VGG16, VGG19, and DeepLabV3.
Masood et al. [37] contributed a highly effective model based on a modified Swin Transformer (mST) enhanced with Multi-Level Adaptive Feature Fusion (MLAFF). This approach strategically integrated low- and high-level features using spatial attention mechanisms and was evaluated on seven benchmark datasets, including INBreast, DDSM, MIAS, CBIS-DDSM, MIMBCD-UI, KAU-BCMD, and Mammographic Masses. The pipeline combined CNN-based preprocessing with Swin Transformer blocks utilizing Local and Global Transferable Multi-Head Self-Attention. Their MLAFF module significantly improved skip connections, leading to top-tier results: a Dice score of 98.7%, IoU of 94%, F1-score of 91%, precision of 90%, and recall of 96%, marking this framework as one of the most accurate and generalizable across datasets.
Finally, a comprehensive comparison by Hithesh et al. [33] using the CBIS-DDSM dataset evaluated a variety of segmentation models—UNet, UNet++, Attention-UNet, SegNet, and the Segment Anything Model (SAM). The dataset was augmented and balanced to address class imbalance, with preprocessing through Gaussian filtering and CLAHE, followed by Otsu’s thresholding for postprocessing. UNet++ emerged as the best performer with a Dice score of 77%, an IoU of 79%, and a 99% accuracy. Although SAM had strong pretrained capabilities, it underperformed (a Dice score of 39% and an IoU of 41%) due to its lack of domain-specific fine-tuning.
Table 8 provides a summary of the main segmentation approaches discussed, outlining their strengths, limitations, and representative works in mammographic lesion segmentation.

3.1.3. Classification

A crucial step in clinical workflows is the classification of breast tumors, which are usually divided into benign and malignant categories. This helps determine whether additional diagnostic procedures are required. CNNs and other DL techniques have emerged as the most popular method for automating this task in recent years. These models have been used in full-image classification frameworks as well as ROI-based pipelines because they are excellent at learning intricate visual patterns. However, because many methods rely on manually annotated tumor regions or on models fine-tuned with little external testing, their real-world applicability frequently remains in doubt. In this context, StethoNet [46] presents a robust and interpretable DL-based framework for breast cancer classification from full-field mammograms, aiming to improve generalization across different datasets and imaging devices. Instead of relying on ROI annotations, StethoNet uses entire mammograms and combines five pre-trained CNN architectures. The model was trained on the CMMD dataset and tested on two external datasets (INbreast and Vindr-Mammo) without any fine-tuning. Additionally, it employs two innovative strategies for handling the two mammography views (CC and MLO): a soft voting ensemble and a merged-view image approach. Both pathways showed strong performance, with AUC scores of 90.7% on CMMD, 83.9% on INbreast, and 85.7% on Vindr-Mammo. These results surpass many prior methods, particularly those requiring ROI extraction or internal fine-tuning on test sets.
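The soft-voting pathway can be illustrated with a minimal sketch that averages malignancy probabilities over an ensemble of CNNs and over the CC and MLO views of one breast. The member networks and equal weighting are placeholders and do not correspond to the actual StethoNet configuration.

```python
import torch
import torch.nn.functional as F

def soft_vote(models, cc_image, mlo_image):
    """Average the malignancy probability over an ensemble of CNNs and over
    the CC and MLO views of the same breast (simplified stand-in for a
    soft-voting pathway; actual member networks and weights differ)."""
    probs = []
    with torch.no_grad():
        for model in models:
            for view in (cc_image, mlo_image):
                logits = model(view)                 # assumed shape: (1, 2)
                probs.append(F.softmax(logits, dim=1)[:, 1])
    return torch.stack(probs).mean()                 # ensemble-and-view average
```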
Another study by Elkorany et al. [43] presents a hybrid deep learning framework for classifying mammographic image patches into normal, benign, and malignant categories using the MIAS dataset. The proposed system combines three pre-trained convolutional neural networks—Inception-V3, ResNet50, and AlexNet—as feature extractors. To improve efficiency and reduce redundancy, the authors introduce the Term Variance (TV) algorithm as a novel feature selection method in the breast cancer imaging domain. After extracting features from each CNN, TV is used to select high-variance, discriminative features, which are then fed into a Multiclass Support Vector Machine (MSVM) classifier. The feature selection process is applied in two stages: first reducing 8192 features to 3500, then further narrowing down to 600 features. The system achieves a peak classification accuracy of 100% at a 90% training split, outperforming many existing models.
CNNs, by design, are limited in how they view an image—they see in fragments. They focus on local regions, require extensive preprocessing, and struggle with rotation or scale variance. The overall structure of the CNN-based approach is given in Figure 9. Moreover, their computational complexity increases with depth, and they often fail to capture global context, which is vital when assessing full mammographic images. As mammograms can vary greatly across devices and populations, this local view becomes a serious bottleneck. This is where transformers—originally developed for natural language processing—step in, reshaping how images are analyzed. Instead of scanning locally, vision transformers (ViTs) divide an image into patches and allow every patch to attend to every other patch, enabling the model to understand the image as a whole right from the start.
The study titled “Vision-Transformer-Based Transfer Learning for Mammogram Classification” by Ayana et al. [44] builds on this very strength. The authors challenged the dominance of CNNs in mammogram analysis by turning to transformer-based models, aiming to classify mammograms as benign or malignant not from cropped regions but from full images analyzed with holistic attention. The main innovation lies in the models they used: rather than relying on a single transformer type, they evaluated three state-of-the-art architectures, namely the classic Vision Transformer (ViT), the Swin Transformer with windowed attention for better locality, and the Pyramid Vision Transformer (PVT), designed for efficient processing through hierarchical attention.
Building on the earlier discussion of single-input approaches—where mammographic images are analyzed individually using CNN-based or transformer-based models—it becomes evident that while these models can effectively extract local and even global features from a single view, they often miss the opportunity to leverage the rich, complementary information present across multiple standard mammographic projections. In clinical practice, each breast is typically imaged from two angles: the CC and MLO views. Furthermore, exams include both left and right breasts, yielding a total of four images per case. These views are not redundant; rather, they provide diverse visual perspectives that radiologists routinely cross-reference to identify subtle or asymmetric findings. To address this, researchers have begun exploring multi-view and two-stream architectures specifically designed to ingest and integrate information from multiple views simultaneously. In particular, two-stream models process CC and MLO images of the same breast (or sometimes left and right counterparts) through parallel processing streams, each stream dedicated to one view. This setup allows the model to learn distinct feature representations for each projection before fusing them for joint decision-making. Such an approach enhances robustness, as it mirrors the diagnostic strategy used by human experts.
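As a concrete illustration of this two-stream pattern, the sketch below assigns one encoder to each view and concatenates the resulting features before a joint classifier. The ResNet-18 backbones and fusion details are generic assumptions rather than any specific published architecture.

```python
import torch
import torch.nn as nn
import torchvision

class TwoStreamBreastClassifier(nn.Module):
    """Generic two-stream pattern: separate encoders for the CC and MLO views
    of one breast, fused by concatenation before a joint classifier."""
    def __init__(self, num_classes=2):
        super().__init__()
        def make_backbone():
            m = torchvision.models.resnet18(weights=None)
            m.fc = nn.Identity()          # expose 512-d features
            return m
        self.cc_stream = make_backbone()
        self.mlo_stream = make_backbone()
        self.classifier = nn.Sequential(
            nn.Linear(512 * 2, 256), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(256, num_classes),
        )

    def forward(self, cc, mlo):
        feats = torch.cat([self.cc_stream(cc), self.mlo_stream(mlo)], dim=1)
        return self.classifier(feats)

model = TwoStreamBreastClassifier()
# Single-channel mammograms are assumed to be replicated to 3 channels upstream.
logits = model(torch.rand(2, 3, 512, 512), torch.rand(2, 3, 512, 512))  # (2, 2)
```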
The work by Bermudez et al. [49] takes this idea further by incorporating advanced fusion mechanisms using cross-view attention and graph-based relational modeling. Rather than relying solely on concatenation or simple feature merging, the authors propose architectures that learn to model ipsilateral (CC–MLO same-side) and contralateral (left–right) dependencies—a strategy that significantly boosts the interpretability and accuracy of breast cancer detection. Their system is trained and evaluated on the CSAW dataset, using exam-level labels without requiring pixel-level or lesion annotations. This weakly supervised setup is especially valuable for scaling real-world screening solutions. The study finds that transformer-based models with learned cross-view attention consistently outperform standard baselines, especially in handling challenging cases where abnormalities may be subtle or vary in appearance across views. A similar transformer-based multi-view approach was proposed by Ahmed et al. [48]. In short, this transition to multi-view, two-stream modeling marks a crucial evolution in mammogram analysis—moving from isolated image interpretation to holistic exam-level reasoning, reflecting how radiologists approach the task in practice.
Continuing from the discussion on dual-stream and multi-view mammography models, the study by Wu et al. [47] represents a key advancement in real-world deployment of such architectures. Unlike traditional single-view models, this study leverages all four standard mammographic views using a multi-view CNN architecture trained on over 200,000 exams. Among the tested configurations—image-wise, view-wise, side-wise, and joint—the view-wise model, which processes CC and MLO views separately before fusing their outputs, yielded the best results for malignant classification. The pipeline also incorporates patch-level heatmaps to guide attention, combining local region information with global exam-level context through a two-stage architecture. Their approach was not just tested on datasets but validated in a reader study with 14 radiologists, showing that the model alone matched expert-level performance, and when combined with radiologists, it improved accuracy beyond either alone. This demonstrates how intelligent multi-view integration—especially leveraging both CC and MLO projections per breast—can significantly enhance clinical breast cancer screening.
Yamazaki and Ishida [45] propose a solution rooted in GANs (Generative Adversarial Networks): synthesize the missing CC view directly from the MLO image. Their method is based on Complete Representation GAN (CR-GAN) and enhanced with progressive growing and feature matching loss for improved image resolution and structural realism. By using a bi-directional training strategy, the model learns to generate both CC-from-MLO and MLO-from-CC views, improving generalization and robustness. The network is trained on public datasets such as CBIS-DDSM, INbreast, and CMMD, using image patches progressively resized up to 512×512. Although certain challenges remain—especially for reconstructing fine features like calcifications—the generated synthetic views demonstrate the potential to support radiologists in settings where only single-view data is available. Rather than replacing dual-view models, this approach augments them by providing additional synthetic data or filling gaps where full views are missing. It opens up a new branch of mammographic AI research: not just analyzing images, but generating the ones we need.
In an insightful study by Hussain et al. [50], the authors proposed a novel approach termed “Multiview Multimodal Feature Fusion (MMFF)” specifically designed to improve breast cancer classification. This method combines imaging data—specifically four standard mammographic views—with textual metadata from radiological reports, including patient age, breast density, BIRADS scores, family history of cancer, and lesion laterality. The researchers collected a detailed in-house dataset from TecSalud Hospitals in Monterrey, Mexico. It consisted of 3,080 mammographic images from 770 cases, coupled with radiological reports. For feature extraction, the authors used a ResNet50 convolutional neural network enhanced by Squeeze-and-Excitation (SE) blocks for image data, enabling the model to focus selectively on the most informative image features. Simultaneously, they employed an artificial neural network (ANN) to process the structured textual metadata. A late fusion strategy combined these independent feature sets before feeding them into another ANN classifier for final diagnosis. This innovative multimodal approach achieved outstanding performance, significantly surpassing single-modal or simpler multimodal techniques. The MMFF model achieved an accuracy of 96.9%, a precision of 97.7%, and a sensitivity of 91.6%. This research clearly demonstrates the potential of integrating multimodal data in medical diagnostics, offering valuable insights and direction for future developments in breast cancer detection.
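The late-fusion pattern described above can be sketched as follows, with a plain ResNet50 standing in for the SE-enhanced backbone and a small feed-forward network encoding the tabular metadata. The metadata dimensionality and layer sizes are hypothetical and are not taken from the MMFF paper.

```python
import torch
import torch.nn as nn
import torchvision

class LateFusionClassifier(nn.Module):
    """Late fusion of image features and tabular metadata (e.g., age, density,
    BI-RADS). A plain ResNet50 stands in for the SE-enhanced backbone described
    in the text; field names and sizes are hypothetical."""
    def __init__(self, n_meta_features=5, num_classes=2):
        super().__init__()
        self.image_encoder = torchvision.models.resnet50(weights=None)
        self.image_encoder.fc = nn.Identity()                  # 2048-d features
        self.meta_encoder = nn.Sequential(
            nn.Linear(n_meta_features, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Linear(2048 + 32, 128), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(128, num_classes),
        )

    def forward(self, image, metadata):
        fused = torch.cat([self.image_encoder(image),
                           self.meta_encoder(metadata)], dim=1)
        return self.head(fused)

model = LateFusionClassifier()
logits = model(torch.rand(2, 3, 512, 512), torch.rand(2, 5))   # (2, 2)
```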
Table 9 gives an overview of the advantages and disadvantages of the methods proposed by the respective authors, and Table 10 lists additional works for all tasks together with their architectures and results.

3.2. Overview of Publicly Available Datasets

The advancement of deep learning techniques for breast cancer diagnosis has been significantly driven by the availability of diverse publicly accessible mammography datasets. These datasets differ in image modality, resolution, annotation quality, and the breadth of tasks they support, making them foundational resources for developing and benchmarking computer-aided diagnosis (CAD) systems.
An early and widely used resource is the Mammographic Image Analysis Society (MIAS) database [72], comprising 322 digitized-film images from 161 patients with radiologist truth-markings (elliptical ROIs) and basic labels. The Digital Database for Screening Mammography (DDSM) [73] broadened the field by providing 2620 digitized-film exams (legacy LJPEG, often converted to TIFF/DICOM) with pathology-verified labels. Building on DDSM, the curated CBIS–DDSM release [74] standardized formats and annotations (10,239 images; 2620 studies; 1566 participants) and supplies predefined train/test splits for masses and calcifications.
INbreast [75] remains a quality-focused benchmark: 410 full-field digital mammograms (DICOM) with precise polygonal delineations (masses, calcifications, asymmetries, architectural distortions) and BI-RADS density. At the population scale, the OPTIMAM Mammography Image Database (OMI-DB) [76] contains over 2.5 million FFDM images from 173,319 women across multiple vendors, with rich clinical metadata; access is restricted and typically requires a DUA through a data access committee.
More recent releases combine larger cohorts with richer labels. VinDr–Mammo [77] offers 5000 four-view exams (20,000 images) with breast-level BI-RADS assessments and lesion-level annotations; access is credentialed (DUA), and a predefined split (4000/1000 exams) is provided. The CMMD dataset [78] includes 5202 images from 1775 patients, biopsy-confirmed labels, and molecular subtype metadata for a subset and is publicly available via TCIA (CC BY 4.0). The DMID dataset [79] provides 510 digital mammography images with mass segmentation masks and associated labels; the number of patients is not reported in the source description, and no official split is supplied.
The Swedish CSAW ecosystem [80] offers population-scale screening data with restricted-access cohort resources and public subsets oriented to specific tasks (e.g., CSAW–CC for case-control analyses and CSAW–S for segmentation). These subsets include pixel-level tumor annotations and detailed documentation but do not impose a unified split across the entire ecosystem.
Overall, the range of available datasets—from small, high-quality digital collections to massive multi-vendor archives—enables researchers to explore not only the robustness of detection models but also the precision of segmentation approaches and the accuracy of classification systems across varying imaging and clinical conditions. Table 11 summarizes the sample size, modality, annotation type, access/licensing (including DUA/special requests), and any split notes for each dataset to facilitate reproducible use.

3.3. Clinical Integration

The practical use of deep learning (DL) in breast cancer diagnosis is accelerating, as improvements in artificial intelligence (AI) are expected to enhance diagnostic precision, alleviate radiologists’ burden, and improve patient outcomes. Nonetheless, despite swift advancements in research, the shift from algorithm development to routine clinical use remains a complicated, multifaceted task. Numerous large studies and pilot programs illustrate the capability of deep learning algorithms to assist, or potentially substitute for, human readers in the interpretation of mammograms. A prospective trial in Sweden with almost 55,000 women demonstrated that substituting one radiologist with the AI system Lunit INSIGHT MMG yielded a cancer detection rate that was non-inferior to conventional double reading, while preserving recall rates and safety criteria [81]. Separately, a retrospective simulation using over 500,000 mammograms from Stockholm’s screening program indicated that AI-based triage could substantially reduce radiologist workload without compromising sensitivity [80].
Commercial deep learning technologies, such as iCAD’s ProFound AI and ScreenPoint Medical’s Transpara, have received regulatory approvals and are currently utilized in clinical settings throughout Europe, North America, and select regions of Asia. These systems aid radiologists in lesion identification, malignancy assessment, and case prioritization. In a countrywide screening cohort in Germany comprising over 460,000 people, deep learning-assisted readings resulted in a 17.6% enhancement in cancer diagnosis without elevating the false positive rate [82].
Notwithstanding these achievements, numerous barriers continue to hinder widespread clinical integration. A primary concern is model interpretability: many deep learning systems function as “black boxes,” making it difficult for clinicians to understand or trust their outputs. Efforts to improve explainability, such as attention heatmaps and saliency-based visualizations, are crucial for promoting clinical acceptance. Moreover, generalization across varied populations and imaging equipment remains an issue, as many models are developed on restricted, demographically biased datasets. This raises questions about equity and robustness, especially for underserved patient populations [83].
Legal and regulatory factors are also of significant importance. Although several deep learning tools have obtained FDA or CE clearance, ambiguity persists regarding liability in cases of diagnostic errors related to AI-assisted decisions. Furthermore, effective integration largely depends on the ability of DL systems to fit into current radiology workflows, including interoperability with Picture Archiving and Communication System (PACS) and Radiology Information System (RIS) platforms, and on their capacity to facilitate, rather than hinder, radiologists’ daily operations [84].
Several measures can facilitate the wider clinical implementation of deep learning in breast cancer imaging, including prospective, multi-center trials; intuitive, explainable interfaces; and robust systems for ongoing post-deployment performance monitoring. Moreover, radiologist education and the collaborative design of AI with end-users will be crucial to ensure that these tools augment, rather than supplant, human expertise.
In conclusion, although deep learning demonstrates significant potential for enhancing breast cancer detection and alleviating the workload of radiologists, its practical implementation depends on factors beyond mere technological efficacy. Trust, transparency, generalizability, and workflow alignment are essential foundations for guaranteeing that DL systems provide significant benefit in practical clinical environments.

3.4. Limitations and Challenges

While deep learning has brought significant advancements to breast cancer computer-aided diagnosis (CADx), key challenges remain across the three primary tasks: detection, segmentation, and classification. These challenges hinder not only performance but also the clinical applicability and generalizability of current models.
The first step in any CADx pipeline—lesion detection—remains particularly sensitive to the inherent variability of mammographic data. Lesions can be small, subtle, and irregular, often blending into dense breast tissue, making them difficult to localize. Even with powerful object detection models such as Faster R-CNN or YOLO variants, performance is constrained when datasets have limited resolution or lack detailed bounding box annotations. Moreover, many detection models fail to incorporate multi-view mammographic inputs effectively, losing the contextual information that radiologists typically rely on to cross-check findings across CC and MLO views. This often results in missed detections or increased false positives, especially in challenging cases like overlapping tissue structures or faint microcalcifications [25]. The overall limitations of detection across the works are given in Table 12.
Following detection, segmentation aims to precisely outline tumor boundaries. However, despite architectural advancements like U-Net and transformer-based models, segmentation remains one of the most error-prone components in breast cancer CADx [51]. A core difficulty lies in the low contrast of mammograms, particularly in dense breast tissues where tumor margins are poorly defined. This makes accurate boundary detection difficult, even for sophisticated models [26]. Furthermore, many widely used datasets—such as MIAS and CBIS-DDSM—offer only coarse or subjectively labeled annotations, introducing significant label noise. This degrades model learning, especially in pixel-level supervised training. Class imbalance is another common issue: malignant lesions are underrepresented, resulting in biased models that under-segment or completely miss smaller, irregular tumors [80]. Over-segmentation of benign structures or adjacent tissue is also common, particularly in simpler encoder-decoder architectures. Additionally, segmentation models often lack robustness when transferred across datasets due to differences in resolution, imaging hardware, and patient demographics. Compounding these issues, high-performing models tend to be computationally intensive and are challenging to deploy in real-time, resource-limited clinical environments. Key segmentation challenges identified in recent literature, with representative sources, are summarized in Table 13.
Finally, classification, the task of determining whether a lesion is benign or malignant, faces its own set of hurdles. While many models show promise on internal datasets, their real-world utility is often compromised by overfitting, class imbalance, and lack of interpretability. A key limitation is the reliance on unimodal input—typically images—while ignoring rich contextual data found in clinical records or radiological reports. Recent studies have attempted to address this through multimodal architectures that combine image and text features. For instance, Hussain et al. [50] demonstrated how integrating mammographic views with textual metadata significantly improves classification accuracy. A summary of major challenges for lesion classification in mammography, together with representative references, is provided in Table 14.
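As a rough illustration of the multimodal direction, the sketch below fuses a CNN image embedding with a small vector of clinical or report-derived features by simple concatenation. It is not the architecture of Hussain et al. [50] or any other cited work; the ResNet-18 backbone, the 32-dimensional metadata vector, and the late-fusion head are assumptions chosen for brevity.

```python
import torch
import torch.nn as nn
import torchvision

class ImageMetadataFusion(nn.Module):
    """Late fusion of a mammogram embedding with encoded clinical metadata."""

    def __init__(self, meta_dim: int = 32, num_classes: int = 2):
        super().__init__()
        backbone = torchvision.models.resnet18(weights="DEFAULT")
        backbone.fc = nn.Identity()                 # expose the 512-d image embedding
        self.image_encoder = backbone
        self.meta_encoder = nn.Sequential(
            nn.Linear(meta_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
        )
        self.classifier = nn.Linear(512 + 64, num_classes)

    def forward(self, image: torch.Tensor, metadata: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([self.image_encoder(image), self.meta_encoder(metadata)], dim=1)
        return self.classifier(fused)

model = ImageMetadataFusion()
logits = model(torch.rand(2, 3, 224, 224), torch.rand(2, 32))  # benign/malignant logits
```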
Altogether, these limitations across detection, segmentation, and classification point to a broader need: future CADx systems must be not only accurate but also robust, explainable, and adaptable to diverse clinical environments. Solving these challenges will be essential for bridging the gap between research models and deployable diagnostic tools in breast cancer care.

4. Discussion

The integration of artificial intelligence (AI) into computer-aided diagnosis (CADx) for breast cancer is transforming clinical practice by leveraging advanced vision–language models (VLMs) and multimodal deep learning frameworks. These technologies address critical challenges in detection, segmentation, and classification, improving accuracy, efficiency, and interpretability while adapting to diverse clinical settings.
A significant advancement is the use of VLMs trained on paired image–text data, such as radiology reports, which reduces the need for labor-intensive manual annotations. These models enable zero-shot classification and localization of breast lesions by aligning visual features with clinical language, demonstrating robust performance across varied datasets and imaging protocols [47]. For instance, a 2024 study in Nature Medicine reported that AI-supported mammography increased the cancer detection rate by 17.6% compared to traditional methods, highlighting enhanced sensitivity [85]. This generalizability is vital for diverse patient demographics and imaging equipment [86].
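The zero-shot mechanism can be sketched with a generic CLIP checkpoint from the Hugging Face transformers library: the image is scored against candidate text prompts, and the most similar prompt is taken as the label. The checkpoint and prompts below are assumptions for illustration only; the cited studies use domain-specific vision–language models trained on radiology data, which a general-purpose checkpoint does not replace.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.new("RGB", (512, 512))  # stand-in for a preprocessed mammogram
prompts = ["a mammogram showing a suspicious mass", "a normal mammogram"]

inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Image-text similarity scores turned into pseudo-probabilities over the prompts.
probs = outputs.logits_per_image.softmax(dim=-1)
predicted_label = prompts[int(probs.argmax())]
```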
Multimodal foundation models further enhance CADx capabilities by integrating diverse data sources, including imaging, patient history, genetic risk factors, and radiological metadata. Models like MedSAM, trained on over one million image–mask pairs across medical domains, achieve robust segmentation performance on unseen cancers and modalities, minimizing overfitting and improving adaptability [87]. These models provide a holistic clinical context, leading to more accurate diagnoses.
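Inference with segmentation foundation models of this kind is typically prompt-driven. The sketch below uses the public segment-anything interface, on which MedSAM builds, with a bounding-box prompt around a suspected mass; the checkpoint path, box coordinates, and choice of the ViT-B variant are assumptions, and MedSAM's own checkpoint and preprocessing are not reproduced here.

```python
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

# Load a SAM backbone from a local checkpoint (path is a placeholder assumption).
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")
predictor = SamPredictor(sam)

# The predictor expects an HxWx3 uint8 image; a mammogram would be
# intensity-windowed and replicated to three channels beforehand.
image = np.zeros((1024, 1024, 3), dtype=np.uint8)
predictor.set_image(image)

box = np.array([300, 400, 520, 610])  # illustrative (x0, y0, x1, y1) around a suspected mass
masks, scores, _ = predictor.predict(box=box, multimask_output=False)
lesion_mask = masks[0]  # boolean mask of the prompted region
```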
AI integration significantly optimizes clinical workflows and reduces diagnostic errors. A 2024 Radiology study found that AI reduced false positives by 20.5% (1.63% vs. 2.39%) compared to radiologist-only readings, enhancing specificity [88]. Additionally, AI support reduced reading time for normal mammograms by 43%, as reported in another 2024 Nature Medicine study, allowing radiologists to focus on complex cases [85]. A hybrid AI–radiologist model, evaluated in a 2024 Radiology study, reduced radiologist workload by 38.1% for 41,469 mammography exams (median patient age 59) while maintaining comparable cancer detection rates (6.7% vs. 6.6%) and recall rates (23.9% vs. 23.6%) [89]. In this model, AI (Transpara, ScreenPoint Medical) independently assessed cases with high certainty, while radiologists double-read uncertain cases, achieving significant efficiency gains without compromising performance [89].
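The certainty-based triage workflow described above can be summarized schematically: the AI autonomously resolves exams it scores with high confidence in either direction and routes the remainder to radiologist double reading. The thresholds in the sketch below are purely illustrative and are not those of Transpara or any other commercial system.

```python
def triage_exam(ai_suspicion: float, low_thr: float = 0.05, high_thr: float = 0.95) -> str:
    """Route an exam based on an AI suspicion score in [0, 1] (thresholds are illustrative)."""
    if ai_suspicion <= low_thr:
        return "AI decision: no recall"
    if ai_suspicion >= high_thr:
        return "AI decision: recall for assessment"
    return "uncertain: radiologist double reading"

# Example batch of AI scores for three screening exams.
decisions = [triage_exam(score) for score in (0.01, 0.42, 0.98)]
```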
Interpretability is improved through VLMs, which generate human-readable outputs, such as diagnostic summaries or sentence-level rationales linked to radiology reports [90]. These capabilities build clinician trust and support automated report drafting, potentially streamlining double-reading workflows. Federated learning further enables collaborative model training across institutions without compromising patient privacy, producing robust, adaptable models via fine-tuning [91].
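At its core, the federated setup exchanges locally trained model parameters rather than images. The sketch below shows FedAvg-style weighted averaging of two sites' model weights as a schematic of that idea; real deployments add secure aggregation, repeated communication rounds, and often differential privacy, all omitted here.

```python
import torch
import torch.nn as nn

def federated_average(state_dicts, sample_counts):
    """Average parameters across sites, weighted by local sample counts (FedAvg-style)."""
    total = float(sum(sample_counts))
    averaged = {}
    for key in state_dicts[0]:
        averaged[key] = sum(
            sd[key].float() * (n / total) for sd, n in zip(state_dicts, sample_counts)
        )
    return averaged

# Two hypothetical sites fine-tune copies of the same classifier head locally.
site_a, site_b = nn.Linear(512, 2), nn.Linear(512, 2)
global_state = federated_average([site_a.state_dict(), site_b.state_dict()], [1200, 800])
site_a.load_state_dict(global_state)  # the aggregated model is redistributed to each site
```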
In conclusion, AI-driven CADx systems, powered by VLMs and multimodal models, are transforming breast cancer detection. With demonstrated improvements in cancer detection rate (+17.6%), false positives (−20.5%), reading time for normal exams (−43%), and radiologist workload (−38.1%), these systems offer enhanced accuracy and efficiency and can strengthen clinical trust. As regulatory frameworks evolve, AI will likely shift from specialized tools to comprehensive, adaptable diagnostic platforms.

5. Limitations of the Study

This review has several limitations. First, many breast cancer datasets, primarily held by private clinics and hospitals, were inaccessible; despite applications and direct contact with dataset owners, only a small number of datasets were obtained. Second, the chosen search strategy, including its inclusion and exclusion criteria, may have introduced further constraints. Expanding the literature search to additional platforms and databases could have enriched the study, and incorporating existing review papers might have provided deeper insight into current trends in breast cancer research.

6. Conclusions

Breast cancer continues to be one of the most common and lethal diseases affecting women globally, with early identification and precise diagnosis being essential for improving outcomes. Mammography, the primary imaging technique for breast cancer screening, has advanced significantly due to developments in deep learning and artificial intelligence. Over the past decade, numerous deep learning-based methods have been developed for essential tasks, including lesion detection, image segmentation, and malignancy classification. These techniques have shown significant potential in improving diagnostic accuracy, alleviating radiologist workload, and facilitating consistent decision-making.
However, our review highlights that despite promising results, significant challenges persist. These include low contrast in mammographic images, class imbalance in datasets, limited generalizability across clinical environments, and the black-box nature of many DL models. In particular, segmentation accuracy is hindered by coarse annotations and dense tissue overlap, while classification models often struggle with integrating contextual clinical information and maintaining interpretability.
Recent developments in multimodal learning and the introduction of vision–language models (VLMs) represent a promising new direction. These models offer the ability to unify image and textual data, reduce dependency on extensive labeled datasets, and provide explainable outputs that align closely with clinical reasoning. Moreover, advances in foundation models and federated learning pave the way for scalable, privacy-preserving, and institution-agnostic CADx systems.
In summary, while deep learning has already transformed mammography-based diagnosis, the field stands on the cusp of another major leap—toward truly integrated, explainable, and generalizable AI systems. Future research should focus on developing robust multimodal architectures, improving dataset diversity and quality, and ensuring that CADx models are clinically validated, interpretable, and ethically deployable in real-world settings.

Author Contributions

Conceptualization, B.A. and T.Z.; methodology, T.Z.; software, D.R.; validation, T.Z., A.I. and D.R.; formal analysis, T.Z.; investigation, A.I. and D.R.; resources, B.A.; data curation, A.I.; writing—original draft preparation, T.Z. and A.I.; writing—review and editing, B.A. and T.Z.; visualization, T.Z. and A.I.; supervision, B.A.; project administration, B.A. and T.Z.; funding acquisition, B.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Committee of Science of the Ministry of Science and Higher Education of the Republic of Kazakhstan, grant number BR24993145.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Dataset available on open access: 1. KAU-BCMD: https://www.kaggle.com/datasets/orvile/kau-bcmd-mamography-dataset (accessed on 10 January 2025); 2. RSNA: https://www.kaggle.com/competitions/rsna-breast-cancer-detection (accessed on 10 January 2025); 3. CMMD: https://www.kaggle.com/datasets/tommyngx/cmmd2022 (accessed on 10 January 2025). Dataset available on request: 1. VinDr-Mammo: https://vindr.ai/datasets/mammo (accessed on 10 January 2025); 2. NLBS: https://www.frdr-dfdr.ca/repo/dataset/cb5ddb98-ccdf-455c-886c-c9750a8c34c2 (accessed on 20 January 2025); 3. CSAW-CC: https://gts.ai/dataset-download/csaw-cc-mammography/ (accessed on 25 January 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AI: Artificial Intelligence
AUC: Area Under the ROC Curve
CADx: Computer-Aided Diagnosis
CC: Craniocaudal
CE: Conformité Européenne (EU regulatory mark)
CNN: Convolutional Neural Network
DETR: DEtection TRansformer
DICOM: Digital Imaging and Communications in Medicine
DL: Deep Learning
DTL: Deep Transfer Learning
FDA: U.S. Food and Drug Administration
FFDM: Full-Field Digital Mammography
FPPI: False Positives Per Image
GAN: Generative Adversarial Network
IoU: Intersection over Union
LBP: Local Binary Patterns
mAP: Mean Average Precision
mIoU: Mean Intersection over Union
MLO: Mediolateral Oblique
PACS: Picture Archiving and Communication System
RIS: Radiology Information System
ROI: Region of Interest
SVM: Support Vector Machine
TPR: True Positive Rate
ViT: Vision Transformer
VLM: Vision–Language Model

References

  1. World Health Organization. Breast Cancer 2021. Available online: https://www.iarc.who.int/featured-news/breast-cancer-awareness-month-2021/ (accessed on 17 March 2025).
  2. Myers, E.R.; Moorman, P.; Gierisch, J.M.; Havrilesky, L.J.; Grimm, L.J.; Ghate, S.; Davidson, B.; Mongtomery, R.C.; Crowley, M.J.; McCrory, D.C.; et al. Benefits and harms of breast cancer screening: A systematic review. JAMA 2015, 314, 1615–1634. [Google Scholar] [CrossRef]
  3. Independent UK Panel on Breast Cancer Screening. The benefits and harms of breast cancer screening: An independent review. Lancet 2012, 380, 1778–1786. [Google Scholar] [CrossRef]
  4. American Cancer Society. Types of Breast Cancer. 2025. Available online: https://www.cancer.org/cancer/types/breast-cancer/about/types-of-breast-cancer.html (accessed on 28 August 2025).
  5. American College of Radiology (ACR). ACR BI-RADS® Atlas: Breast Imaging Reporting and Data System; American College of Radiology: Reston, VA, USA, 2013. [Google Scholar]
  6. Lowry, K.P.; Coley, R.Y.; Miglioretti, D.L.; Kerlikowske, K.; Henderson, L.M.; Onega, T.; Sprague, B.L.; Lee, J.M.; Herschorn, S.; Tosteson, A.N.A.; et al. Screening Performance of Digital Breast Tomosynthesis vs Digital Mammography in Community Practice by Patient Age, Screening Round, and Breast Density. JAMA Netw. Open 2020, 3, e2011792. [Google Scholar] [CrossRef]
  7. Freeman, K.; Geppert, J.; Stinton, C.; Todkill, D.; Johnson, S.; Clarke, A.; Taylor-Phillips, S. Use of artificial intelligence for image analysis in breast cancer screening programmes: Systematic review of test accuracy. BMJ 2021, 374, n1872. [Google Scholar] [CrossRef]
  8. Yoon, J.H.; Strand, F.; Baltzer, P.A.; Conant, E.F.; Gilbert, F.J.; Lehman, C.D.; Morris, E.A.; Mullen, L.A.; Nishikawa, R.M.; Sharma, N.; et al. Standalone AI for Breast Cancer Detection at Screening Digital Mammography and Digital Breast Tomosynthesis: A Systematic Review and Meta-Analysis. Radiology 2023, 307, e222639. [Google Scholar] [CrossRef]
  9. Lei, Y.M.; Yin, M.; Yu, M.H.; Yu, J.; Zeng, S.E.; Lv, W.Z.; Li, J.; Ye, H.R.; Cui, X.W.; Dietrich, C.F. Artificial intelligence in medical imaging of the breast. Front. Oncol. 2021, 11, 600557. [Google Scholar] [CrossRef]
  10. Al-Karawi, D.; Al-Zaidi, S.; Helael, K.A.; Obeidat, N.; Mouhsen, A.M.; Ajam, T.; Alshalabi, B.A.; Salman, M.; Ahmed, M.H. A review of artificial intelligence in breast imaging. Tomography 2024, 10, 705–726. [Google Scholar] [CrossRef]
  11. Branco, P.E.S.C.; Franco, A.H.S.; Oliveira, A.P.d.; Carneiro, I.M.C.; Carvalho, L.M.C.d.; Souza, J.I.N.d.; Leandro, D.R.; Cândido, E.B. Artificial intelligence in mammography: A systematic review of the external validation. Rev. Bras. Ginecol. Obs. 2024, 46, e-rbgo71. [Google Scholar] [CrossRef]
  12. Díaz, O.; Rodríguez-Ruíz, A.; Sechopoulos, I. Artificial Intelligence for breast cancer detection: Technology, challenges, and prospects. Eur. J. Radiol. 2024, 175, 111457. [Google Scholar] [CrossRef]
  13. Schopf, C.M.; Ramwala, O.A.; Lowry, K.P.; Hofvind, S.; Marinovich, M.L.; Houssami, N.; Elmore, J.G.; Dontchos, B.N.; Lee, J.M.; Lee, C.I. Artificial Intelligence-Driven Mammography-Based Future Breast Cancer Risk Prediction: A Systematic Review. J. Am. Coll. Radiol. 2024, 21, 319–328. [Google Scholar] [CrossRef]
  14. Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ 2021, 372, n71. [Google Scholar] [CrossRef]
  15. McKinney, S.M.; Sieniek, M.; Godbole, V.; Godwin, J.; Antropova, N.; Ashrafian, H.; Back, T.; Chesus, M.; Corrado, G.S.; Darzi, A.; et al. International evaluation of an AI system for breast cancer screening. Nature 2020, 577, 89–94. [Google Scholar] [CrossRef]
  16. Yan, Y.; Conze, P.H.; Lamard, M.; Quellec, G.; Cochener, B.; Coatrieux, G. Towards improved breast mass detection using dual-view mammogram matching. Med. Image Anal. 2021, 71, 102083. [Google Scholar] [CrossRef]
  17. Cao, H.; Pu, S.; Tan, W.; Tong, J. Breast mass detection in digital mammography based on anchor-free architecture. Comput. Methods Programs Biomed. 2021, 205, 106033. [Google Scholar] [CrossRef]
  18. Zhang, L.; Li, Y.; Chen, H.; Wu, W.; Chen, K.; Wang, S. Anchor-free YOLOv3 for mass detection in mammogram. Expert Syst. Appl. 2022, 191, 116273. [Google Scholar] [CrossRef]
  19. Aly, G.H.; Marey, M.; El-Sayed, S.A.; Tolba, M.F. YOLO-Based Breast Masses Detection and Classification in Full-Field Digital Mammograms. Comput. Methods Programs Biomed. 2021, 200, 105823. [Google Scholar] [CrossRef]
  20. Chen, X.; Zhang, K.; Abdoli, N.; Gilley, P.W.; Wang, X.; Liu, H.; Zheng, B.; Qiu, Y. Transformers improve breast cancer diagnosis from unregistered multi-view mammograms. Diagnostics 2022, 12, 1549. [Google Scholar] [CrossRef]
  21. Park, S.; Lee, K.H.; Ko, B.; Kim, N. Unsupervised anomaly detection with generative adversarial networks in mammography. Sci. Rep. 2023, 13, 2925. [Google Scholar] [CrossRef]
  22. Agarwal, R.; Díaz, O.; Yap, M.H.; Lladó, X.; Martí, R. Deep learning for mass detection in Full Field Digital Mammograms. Comput. Biol. Med. 2020, 121, 103774. [Google Scholar] [CrossRef]
  23. Manalı, D.; Demirel, H.; Eleyan, A. Deep Learning Based Breast Cancer Detection Using Decision Fusion. Computers 2024, 13, 294. [Google Scholar] [CrossRef]
  24. Baccouche, A.; Garcia-Zapirain, B.; Zheng, Y.; Elmaghraby, A.S. Early Detection and Classification of Abnormality in Prior Mammograms Using Image-to-Image Translation and YOLO Techniques. Comput. Methods Programs Biomed. 2022, 221, 106884. [Google Scholar] [CrossRef]
  25. Baccouche, A.; Garcia-Zapirain, B.; Castillo-Olea, C.; Elmaghraby, A.S. Breast Lesions Detection and Classification via YOLO-Based Fusion Models. Comput. Mater. Contin. 2021, 69, 1407–1427. [Google Scholar] [CrossRef]
  26. Betancourt Tarifa, A.S.; Marrocco, C.; Molinara, M.; Tortorella, F.; Bria, A. Transformer-based mass detection in digital mammograms. J. Ambient Intell. Humaniz. Comput. 2023, 14, 2723–2737. [Google Scholar] [CrossRef]
  27. Ghantasala, G.P.; Unhelkar, B.; Chakrabarti, P.; Vidyullatha, P.; Pyla, M. Improving Breast Cancer Diagnosis through Multi-Class Segmentation using Attention UNet Model. In Proceedings of the 2024 Second International Conference on Advances in Information Technology (ICAIT), Chikkamagaluru, India, 24–27 July 2024; Volume 1, pp. 1–7. [Google Scholar]
  28. Identifying Women With Dense Breasts at High Risk for Interval Cancer. Ann. Intern. Med. 2015, 162, 673–681. [CrossRef] [PubMed]
  29. Reeves, R.A.; Kaufman, T. Mammography. In StatPearls [Internet], updated 2023 July 24 ed.; StatPearls Publishing: Treasure Island, FL, USA, 2023; Available online: https://www.ncbi.nlm.nih.gov/books/NBK559310/ (accessed on 4 July 2025).
  30. Müjdat Tiryaki, V. Mass segmentation and classification from film mammograms using cascaded deep transfer learning. Biomed. Signal Process. Control 2023, 84, 104819. [Google Scholar] [CrossRef]
  31. Aliniya, P.; Nicolescu, M.; Nicolescu, M.; Bebis, G. Improved Loss Function for Mass Segmentation in Mammography Images Using Density and Mass Size. J. Imaging 2024, 10, 20. [Google Scholar] [CrossRef]
  32. Fu, X.; Cao, H.; Hu, H.; Lian, B.; Wang, Y.; Huang, Q.; Wu, Y. Attention-Based Active Learning Framework for Segmentation of Breast Cancer in Mammograms. Appl. Sci. 2023, 13, 852. [Google Scholar] [CrossRef]
  33. Hithesh, M.; Puttanna, V.K. From Pixels to Prognosis: Exploring from UNet to Segment Anything in Mammogram Image Processing for Tumor Segmentation. In Proceedings of the 2023 4th International Conference on Intelligent Technologies (CONIT), Bangalore, India, 21–23 June 2024; pp. 1–7. [Google Scholar]
  34. Ahmed, S.T.; Barua, S.; Fahim-Ul-Islam, M.; Chakrabarty, A. CoAtNet-Lite: Advancing Mammogram Mass Detection Through Lightweight CNN-Transformer Fusion with Attention Mapping. In Proceedings of the 2024 6th International Conference on Electrical Engineering and Information & Communication Technology (ICEEICT), Dhaka, Bangladesh, 2–4 May 2024; pp. 143–148. [Google Scholar]
  35. Demil, S.; Bouzar-Benlabiod, L.; Paillet, G. Cost Efficient Mammogram Segmentation and Classification with NeuroMem® Chip for Breast Cancer Detection. In Proceedings of the 2023 IEEE 24th International Conference on Information Reuse and Integration for Data Science (IRI), Bellevue, WA, USA, 4–6 August 2023; pp. 273–278. [Google Scholar]
  36. Ali, M.; Hu, H.; Muhammad, T.; Qureshi, M.A.; Mahmood, T. Deep Learning and Shape-Driven Combined Approach for Breast Cancer Tumor Segmentation. In Proceedings of the 2025 6th International Conference on Advancements in Computational Sciences (ICACS), Lahore, Pakistan, 18–19 February 2025; pp. 1–6. [Google Scholar]
  37. Masood, A.; Naseem, U.; Kim, J. Multi-Level swin transformer enabled automatic segmentation and classification of breast metastases. In Proceedings of the 2023 45th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), Sydney, Australia, 24–27 July 2023; pp. 1–4. [Google Scholar]
  38. Farrag, A.; Gad, G.; Fadlullah, Z.M.; Fouda, M.M. Mammogram tumor segmentation with preserved local resolution: An explainable AI system. In Proceedings of the GLOBECOM 2023–2023 IEEE Global Communications Conference, Kuala Lumpur, Malaysia, 4–8 December 2023; pp. 314–319. [Google Scholar]
  39. Nour, A.; Boufama, B. Hybrid Deep Learning and Active Contour Approach for Enhanced Breast Lesion Segmentation and Classification in Mammograms. Intell.-Based Med. 2025, 11, 100224. [Google Scholar] [CrossRef]
  40. Bentaher, N.; Kabbadj, Y.; Salah, M.B. Enhancing breast masses detection and segmentation: A novel u-net-based approach. In Proceedings of the 2023 10th International Conference on Wireless Networks and Mobile Communications (WINCOM), Istanbul, Turkey, 26–28 October 2023; pp. 1–6. [Google Scholar]
  41. M’Rabet, S.; Fnaiech, A.; Sahli, H. Heightened breast cancer segmentation in mammogram images. In Proceedings of the 2024 International Conference on Control, Automation and Diagnosis (ICCAD), Paris, France, 15–17 May 2024; pp. 1–6. [Google Scholar]
  42. Patil, B.; Vishwanath, P.; Priyanka, K.; Husseyn, M.; Parthiban, K. Convolutional Neural Network-Regularized Extreme Learning Machine with Hyperbolic Secant for Breast Cancer Segmentation and Classification. In Proceedings of the 2025 3rd International Conference on Integrated Circuits and Communication Systems (ICICACS), Raichur, India, 21–22 February 2025; pp. 1–6. [Google Scholar]
  43. Elkorany, A.S.; Elsharkawy, Z.F. Efficient breast cancer mammograms diagnosis using three deep neural networks and term variance. Sci. Rep. 2023, 13, 2663. [Google Scholar] [CrossRef]
  44. Ayana, B.Y.; Kumar, A.; Kim, J.; Kim, S.W. Vision-transformer-based transfer learning for mammogram classification. Diagnostics 2023, 13, 178. [Google Scholar] [CrossRef]
  45. Yamazaki, A.; Ishida, T. Two-view mammogram synthesis from single-view data using generative adversarial networks. Appl. Sci. 2022, 12, 12206. [Google Scholar] [CrossRef]
  46. Lamprou, C.; Katsikari, K.; Rahmani, N.; Hadjileontiadis, L.J.; Seghier, M.; Alshehhi, A. StethoNet: Robust Breast Cancer Mammography Classification Framework. IEEE Access 2024, 12, 144890–144903. [Google Scholar] [CrossRef]
  47. Wu, N.; Phang, J.; Park, J.; Shen, Y.; Huang, Z.; Zorin, M.; Jastrzębski, S.; Févry, T.; Katsnelson, J.; Kim, E.; et al. Deep neural networks improve radiologists’ performance in breast cancer screening. IEEE Trans. Med. Imaging 2020, 39, 1184–1194. [Google Scholar] [CrossRef]
  48. Ahmed, S.; Elazab, N.; El-Gayar, M.M.; Elmogy, M.; Fouda, Y.M. Multi-Scale Vision Transformer with Optimized Feature Fusion for Mammographic Breast Cancer Classification. Diagnostics 2025, 15, 1361. [Google Scholar] [CrossRef]
  49. Manigrasso, F.; Milazzo, R.; Russo, A.S.; Lamberti, F.; Strand, F.; Pagnani, A.; Morra, L. Mammography classification with multi-view deep learning techniques: Investigating graph and transformer-based architectures. Med. Image Anal. 2025, 99, 103320. [Google Scholar] [CrossRef]
  50. Hussain, S.; Teevno, M.A.; Naseem, U.; Avalos, D.B.A.; Cardona-Huerta, S.; Tamez-Peña, J.G. Multiview Multimodal Feature Fusion for Breast Cancer Classification Using Deep Learning. IEEE Access 2025, 13, 9265–9275. [Google Scholar] [CrossRef]
  51. Su, Y.; Liu, Q.; Xie, W.; Hu, P. YOLO-LOGO: A transformer-based YOLO segmentation model for breast mass detection and segmentation in digital mammograms. Comput. Methods Programs Biomed. 2022, 221, 106903. [Google Scholar] [CrossRef] [PubMed]
  52. Kamran, S.A.; Hossain, K.F.; Tavakkoli, A.; Bebis, G.; Baker, S. Swin-sftnet: Spatial feature expansion and aggregation using swin transformer for whole breast micro-mass segmentation. In Proceedings of the 2023 IEEE 20th International Symposium on Biomedical Imaging (ISBI), Cartagena, Colombia, 18–21 April 2023; pp. 1–5. [Google Scholar]
  53. Carriero, A.; Groenhoff, L.; Vologina, E.; Basile, P.; Albera, M. Deep Learning in Breast Cancer Imaging: State of the Art and Recent Advancements in Early 2024. Diagnostics 2024, 14, 848. [Google Scholar] [CrossRef]
  54. Prinzi, F.; Insalaco, M.; Orlando, A.; Gaglio, S.; Vitabile, S. A yolo-based model for breast cancer detection in mammograms. Cogn. Comput. 2024, 16, 107–120. [Google Scholar] [CrossRef]
  55. Guevara Lopez, M.A.; Posada, N.; Moura, D.; Pollán, R.; Franco-Valiente, J.; Ortega, C.; Del Solar, M.; Díaz-Herrero, G.; Ramos, I.; Loureiro, J.; et al. BCDR: A Breast Cancer Digital Repository. In Proceedings of the 15th International Conference on Experimental Mechanics, Porto, Portugal, 22–27 July 2012; pp. 1065–1066. [Google Scholar]
  56. Pattanaik, R.K.; Mishra, S.; Siddique, M.; Gopikrishna, T.; Satapathy, S. Breast Cancer Classification from Mammogram Images Using Extreme Learning Machine-Based DenseNet121 Model. J. Sens. 2022, 2022, 2731364. [Google Scholar] [CrossRef]
  57. Wang, Y.; Wang, Z.; Feng, Y.; Zhang, L. WDCCNet: Weighted double-classifier constraint neural network for mammographic image classification. IEEE Trans. Med. Imaging 2021, 41, 559–570. [Google Scholar] [CrossRef]
  58. Petrini, D.G.; Shimizu, C.; Roela, R.A.; Valente, G.V.; Folgueira, M.A.A.K.; Kim, H.Y. Breast cancer diagnosis in two-view mammography using end-to-end trained efficientnet-based convolutional network. IEEE Access 2022, 10, 77723–77731. [Google Scholar] [CrossRef]
  59. Ayana, G.; Park, J.; Choe, S.w. Patchless multi-stage transfer learning for improved mammographic breast mass classification. Cancers 2022, 14, 1280. [Google Scholar] [CrossRef]
  60. Dada, E.G.; Oyewola, D.O.; Misra, S. Computer-aided diagnosis of breast cancer from mammogram images using deep learning algorithms. J. Electr. Syst. Inf. Technol. 2024, 11, 38. [Google Scholar] [CrossRef]
  61. Lopez, E.; Grassucci, E.; Valleriani, M.; Comminiello, D. Multi-view hypercomplex learning for breast cancer screening. arXiv 2022, arXiv:2204.05798. [Google Scholar] [CrossRef]
  62. Dehghan Rouzi, M.; Moshiri, B.; Khoshnevisan, M.; Akhaee, M.A.; Jaryani, F.; Salehi Nasab, S.; Lee, M. Breast cancer detection with an ensemble of deep learning networks using a consensus-adaptive weighting method. J. Imaging 2023, 9, 247. [Google Scholar] [CrossRef]
  63. Khan, S.K.; Kanamarlapudi, A.; Singh, A.R. RM-DenseNet: An Enhanced DenseNet Framework with Residual Model for Breast Cancer Classification Using Mammographic Images. In Proceedings of the 2024 2nd International Conference on Advancement in Computation & Computer Technologies (InCACCT), Gharuan, India, 2–3 May 2024; pp. 711–715. [Google Scholar]
  64. Shen, T.; Wang, J.; Gou, C.; Wang, F. Hierarchical Fused Model with Deep Learning and Type-2 Fuzzy Learning for Breast Cancer Diagnosis. IEEE Trans. Fuzzy Syst. 2020, 28, 3204–3218. [Google Scholar] [CrossRef]
  65. Leung, C.; Nguyen, H. A Novel Deep Learning Approach for Breast Cancer Detection on Screening Mammography. In Proceedings of the 2023 IEEE 23rd International Conference on Bioinformatics and Bioengineering, BIBE 2023, Dayton, OH, USA, 4–6 December 2023; pp. 277–284. [Google Scholar] [CrossRef]
  66. Pi, J.; Qi, Y.; Lou, M.; Li, X.; Wang, Y.; Xu, C.; Ma, Y. FS-UNet: Mass segmentation in mammograms using an encoder-decoder architecture with feature strengthening. Comput. Biol. Med. 2021, 137, 104800. [Google Scholar] [CrossRef]
  67. Huynh, H.N.; Tran, A.T.; Tran, T.N. Region-of-Interest Optimization for Deep-Learning-Based Breast Cancer Detection in Mammograms. Appl. Sci. 2023, 13, 6894. [Google Scholar] [CrossRef]
  68. Mohammed, A.D.; Ekmekci, D. Breast Cancer Diagnosis Using YOLO-Based Multiscale Parallel CNN and Flattened Threshold Swish. Appl. Sci. 2024, 14, 2680. [Google Scholar] [CrossRef]
  69. Rahman, M.M.; Jahangir, M.Z.B.; Rahman, A.; Akter, M.; Nasim, M.A.A.; Gupta, K.D.; George, R. Breast Cancer Detection and Localizing the Mass Area Using Deep Learning. Big Data Cogn. Comput. 2024, 8, 80. [Google Scholar] [CrossRef]
  70. Jiang, J.; Peng, J.; Hu, C.; Jian, W.; Wang, X.; Liu, W. Breast cancer detection and classification in mammogram using a three-stage deep learning framework based on PAA algorithm. Artif. Intell. Med. 2022, 134, 102419. [Google Scholar] [CrossRef] [PubMed]
  71. Bhatti, H.M.A.; Li, J.; Siddeeq, S.; Rehman, A.; Manzoor, A. Multi-detection and Segmentation of Breast Lesions Based on Mask RCNN-FPN. In Proceedings of the 2020 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2020, Seoul, Republic of Korea, 16–19 December 2020; pp. 2698–2704. [Google Scholar] [CrossRef]
  72. Suckling, J.; Parker, J.; Dance, D.; Astley, S.; Hutt, I.; Boggis, C.; Ricketts, I.; Stamatakis, E.; Cerneaz, N.; Kok, S.; et al. The Mammographic Image Analysis Society Digital Mammogram Database; University of Essex: Colchester, UK, 1994. [Google Scholar] [CrossRef]
  73. Heath, M.; Bowyer, K.; Kopans, D.; Kegelmeyer, P.; Moore, R.; Chang, K.; Munishkumaran, S. Current Status of the Digital Database for Screening Mammography. In Digital Mammography: Nijmegen, 1998; Karssemeijer, N., Thijssen, M., Hendriks, J., van Erning, L., Eds.; Springer: Dordrecht, The Netherlands, 1998; pp. 457–460. [Google Scholar] [CrossRef]
  74. Lee, R.S.; Gimenez, F.; Hoogi, A.; Miyake, K.K.; Gorovoy, M.; Rubin, D.L. A curated mammography data set for use in computer-aided detection and diagnosis research (CBIS-DDSM). Sci. Data 2017, 4, 1–9. [Google Scholar] [CrossRef]
  75. Moreira, I.C.; Amaral, I.; Domingues, I.; Cardoso, A.; Cardoso, M.J.; Cardoso, J.S. INbreast: Toward a Full-field Digital Mammographic Database. Acad. Radiol. 2012, 19, 236–248. [Google Scholar] [CrossRef]
  76. Halling-Brown, M.D.; Warren, L.M.; Ward, D.; Lewis, E.; Mackenzie, A.; Wallis, M.G.; Wilkinson, L.S.; Given-Wilson, R.M.; McAvinchey, R.; Young, K.C. OPTIMAM Mammography Image Database: A Large-Scale Resource of Mammography Images and Clinical Data. Radiol. Artif. Intell. 2021, 3, e200103. [Google Scholar] [CrossRef] [PubMed]
  77. Nguyen, H.T.; Nguyen, H.Q.; Pham, H.H.; Lam, K.; Le, L.T.; Dao, M.; Vu, V. VinDr-Mammo: A large-scale benchmark dataset for computer-aided diagnosis in full-field digital mammography. Sci. Data 2023, 10, 277. [Google Scholar] [CrossRef]
  78. Cui, C.; Li, L.; Cai, H.; Fan, Z.; Zhang, L.; Dan, T.; Li, J.; Wang, J. The Chinese Mammography Database (CMMD): An Online Mammography Database with Biopsy Confirmed Types for Machine Diagnosis of Breast. The Cancer Imaging Archive. 2021. Available online: https://doi.org/10.7937/tcia.eqde-4b16 (accessed on 4 July 2025).
  79. Oza, P.; Oza, U.; Oza, R.; Sharma, P.; Patel, S.; Kumar, P.; Gohel, B. Digital mammography dataset for breast cancer diagnosis research (DMID) with breast mass segmentation analysis. Biomed. Eng. Lett. 2024, 14, 317–330. [Google Scholar] [CrossRef] [PubMed]
  80. Dembrower, K.; Wåhlin, E.; Liu, Y.; Olsson, M.; Eklund, M.; Lång, K.; Tsakok, A.; Strand, F. Effect of artificial intelligence-based triaging of breast cancer screening mammograms on cancer detection and radiologist workload: A retrospective simulation study. Lancet Digit. Health 2020, 2, e468–e474. [Google Scholar] [CrossRef]
  81. Dembrower, K.; Crippa, A.; Colón, E.; Eklund, M.; Strand, F.; Consortium, S.T. Artificial intelligence for breast cancer detection in screening mammography in Sweden: A prospective, population-based, paired-reader, non-inferiority study. Lancet Digit. Health 2023, 5, e703–e711. [Google Scholar] [CrossRef]
  82. van Nijnatten, T.J.A.; Payne, N.R.; Hickman, S.E.; Ashrafian, H.; Gilbert, F.J. Overview of trials on artificial intelligence algorithms in breast cancer screening—A roadmap for international evaluation and implementation. J. Clin. Med. 2025, 167, 111087. [Google Scholar] [CrossRef] [PubMed]
  83. Pedemonte, S.; Tsue, T.; Mombourquette, B.; Vu, Y.N.T.; Matthews, T.; Hoil, R.M.; Shah, M.; Ghare, N.; Zingman-Daniels, N.; Holley, S.; et al. A deep learning algorithm for reducing false positives in screening mammography. arXiv 2022, arXiv:2204.06671. [Google Scholar] [CrossRef]
  84. Kyono, T.; Gilbert, F.J.; van der Schaar, M. MAMMO: A Deep Learning Solution for Facilitating Radiologist-Machine Collaboration in Breast Cancer Diagnosis. arXiv 2018, arXiv:1811.02661. [Google Scholar] [CrossRef]
  85. Eisemann, N.; Bunk, S.; Mukama, T.; Baltus, H.; Elsner, S.A.; Gomille, T.; Hecht, G.; Heywang-Köbrunner, S.; Rathmann, R.; Siegmann-Luz, K.; et al. Nationwide real-world implementation of AI for cancer detection in population-based mammography screening. Nat. Med. 2025, 31, 917–924. [Google Scholar] [CrossRef]
  86. Yu, H.; Yi, S.; Niu, K.; Zhuo, M.; Li, B. UMIT: Unifying Medical Imaging Tasks via Vision-Language Models. arXiv 2025, arXiv:2503.15892. [Google Scholar] [CrossRef]
  87. Ma, J.; He, Y.; Li, F.; Han, L.; You, C.; Wang, B. Segment Anything in Medical Images. Nat. Commun. 2024, 15, 654. [Google Scholar] [CrossRef] [PubMed]
  88. Lauritzen, A.D.; Lillholm, M.; Lynge, E.; Nielsen, M.; Karssemeijer, N.; Vejborg, I. Early Indicators of the Impact of Using AI in Mammography Screening for Breast Cancer. Radiology 2024, 311, e232479. [Google Scholar] [CrossRef]
  89. Verboom, S.D.; Kroes, J.; Pires, S.; Broeders, M.J.; Sechopoulos, I. Hybrid radiologist/AI mammography screening with certainty-based triage: A simulation study. Radiology 2024, 316, e242594. [Google Scholar] [CrossRef]
  90. Ji, J.; Hou, Y.; Chen, X.; Pan, Y.; Xiang, Y. Vision-Language Model for Generating Textual Descriptions From Clinical Images: Model Development and Validation Study. JMIR Form. Res. 2024, 8, e32690. [Google Scholar] [CrossRef]
  91. Rieke, N.; Hancox, J.; Li, W.; Milletari, F.; Roth, H.R.; Albarqouni, S.; Bakas, S.; Galtier, M.N.; Landman, B.A.; Maier-Hein, K.; et al. The future of digital health with federated learning. NPJ Digit. Med. 2020, 3, 119. [Google Scholar] [CrossRef] [PubMed]
Figure 1. PRISMA flow diagram. * indicates different database names.
Figure 2. Search strategy by keywords.
Figure 3. Distribution of the selected papers by publisher.
Figure 4. Distribution of the retrieved papers by year.
Figure 5. Examples from public mammography datasets, where the green box indicates a mass and the yellow box indicates a calcification. (a) CBIS–DDSM MLO view; (b) INbreast MLO view [25].
Figure 6. YOLO-based pipeline for breast cancer detection in mammograms. The workflow integrates CBIS–DDSM, INbreast, and proprietary datasets, applies pre-processing and augmentation, trains YOLO models with transfer learning, and outputs both bounding box predictions and heatmaps [54].
Figure 7. Example of mammography images and corresponding region of interest (ROI) masks of a mass. On the right of each pair are the mammography views (left—craniocaudal (CC), right—mediolateral oblique (MLO)), and on the left are the ROI masks. The ROI is represented as a binary image: black indicates the background, while white highlights the suspicious region.
Figure 8. Segmentation pipeline of the DeepLabv3+ architecture using a ResNet backbone and an Atrous Spatial Pyramid Pooling (ASPP) module [38].
Figure 9. Classification pipeline using transfer learning [56].
Table 1. Research questions for breast cancer diagnosis review.
RQ | Research Question
RQ1 | What are the current deep learning techniques used in mammographic breast cancer diagnosis?
RQ2 | What are the types of problems solved?
RQ3 | Which methods/models are used, and how do they evolve?
RQ4 | What are the recent trends for detection, segmentation, and classification tasks?
RQ5 | What are the existing limitations and challenges?
RQ6 | Which datasets are available for mammography images?
RQ7 | What are the existing multimodal learning approaches?
Table 2. Summary of task types in mammography research.
Task Type | Number of Papers
Detection | 19
Segmentation | 18
Classification | 14
Table 3. Inclusion and exclusion criteria for study selection.
Inclusion Criteria | Exclusion Criteria
Studies written in English only | Studies written in other languages
Published between 1 January 2020 and 1 June 2025 | Published before 1 January 2020
Focused on mammography images for breast cancer diagnosis and detection | Focused on mammography images for breast cancer prognosis, meta-analysis, and clinical decision-making
Used deep learning or machine learning methods | No deep learning or machine learning used
Included studies focusing only on mammography | Modality-specific restrictions
Published in peer-reviewed sources (IEEE, Scopus, etc.) | Preprints, retracted papers, review papers, dissertations
Table 4. Outcomes of interest and their definitions.
Outcome Domain | Definition and Metrics Collected
Lesion Detection Performance | Sensitivity; specificity; precision; false positives per image (FPPI); AUC.
Segmentation Accuracy | Dice Similarity Coefficient (DSC); Intersection over Union (IoU); pixel-level precision and recall.
Classification Accuracy | Accuracy; F1-score; AUC; confusion matrix-derived metrics (TP, TN, FP, FN).
Table 5. Summary of risk of bias and applicability concerns based on QUADAS-2.
Domain | Low Risk (%) | Unclear Risk (%) | High Risk (%)
Dataset Representativeness | 18 (38.30%) | 23 (48.94%) | 6 (12.77%)
Index Test (DL model) | 35 (74.47%) | 9 (19.15%) | 3 (6.38%)
Reference Standard | 30 (63.83%) | 12 (25.53%) | 5 (10.64%)
Flow and Timing | 41 (87.23%) | 5 (10.64%) | 1 (2.13%)
Applicability Concerns | Low (55.32%) | Moderate (31.91%) | High (12.77%)
Table 6. Risk of bias assessment for key cited studies across QUADAS-2 domains adapted for AI-based mammography studies.
Study | Dataset Representativeness | Index Test | Reference Standard | Flow and Timing | Overall Risk
[15] | Low | Low | Low | Low | Low
[16] | Low | Unclear | Low | Low | Moderate
[17] | Low | Low | Unclear | Low | Moderate
[18] | Low | Low | Unclear | Low | Moderate
[19] | Low | Unclear | Low | Low | Moderate
[20] | Unclear | Low | Low | Low | Moderate
[21] | Low | Low | Low | Low | Low
[22] | Unclear | Low | Low | Low | Moderate
[23] | Low | Low | Unclear | Low | Moderate
[24] | Low | Low | Low | Low | Low
[25] | Low | Unclear | Unclear | Low | High
[26] | Low | Low | Low | Low | Low
[27] | Low | Low | Low | Low | Low
[28] | Unclear | Low | Unclear | Low | High
[29] | Low | Unclear | Low | Low | Moderate
[30] | Low | Unclear | Low | Low | Moderate
[31] | Low | Low | Unclear | Low | Moderate
[32] | Low | Low | Low | Low | Low
[33] | Low | Low | Low | Low | Low
[34] | Low | Unclear | Low | Low | Moderate
[35] | Low | Low | Low | Low | Low
[36] | Low | Low | Low | Low | Low
[37] | Unclear | Low | Low | Low | Moderate
[38] | Low | Low | Unclear | Low | Moderate
[39] | Low | Unclear | Low | Low | Moderate
[40] | Low | Low | Unclear | Low | Moderate
[41] | Low | Unclear | Low | Low | Moderate
[42] | Low | Low | Low | Low | Low
[43] | Low | Low | Low | Low | Low
[44] | Low | Low | Unclear | Low | Moderate
[45] | Low | Low | Low | Low | Low
[46] | Unclear | Low | Low | Low | Moderate
[47] | Low | Low | Low | Low | Low
[48] | Low | Low | Low | Low | Low
[49] | Low | Low | Unclear | Low | Moderate
[50] | Low | Unclear | Low | Low | Moderate
Table 7. Summary of deep learning models for breast lesion detection in mammography.
Approach | Advantage | Disadvantage | Task | Papers
Ensemble Faster R-CNN + CNN | Higher accuracy than radiologists; reduces errors; works across countries | Needs large diverse data; high computational cost; interpretability limited | Detection | McKinney et al. [15]
YOLOv3-based fusion model | Accurate, fast, multi-class detection | Lower calcification sensitivity; needs tuning; may miss small lesions | Detection | Baccouche et al. [25]
Faster R-CNN (InceptionV2) | High sensitivity, low false positives, robust across datasets | Needs large annotation; slower than YOLO; misses subtle lesions | Detection | Agarwal et al. [22]
YOLOv3 with anchor tuning | Fast, high sensitivity, accurate for mass sizes | Lower accuracy for tiny calcifications; needs tuning/augmentation | Detection | Aly et al. [19]
Dual-view Siamese network (YOLOv3 + Siamese CNN) | Improves detection by using both views; fewer false positives | Needs paired images; more complex; matching masses is harder | Detection | Yan et al. [16]
YOLO + image-to-image translation (Pix2Pix, CycleGAN) | High accuracy for masses; enables early prediction; multi-lesion types | Lower accuracy for prior calcifications; needs paired data; complex | Detection | Baccouche et al. [24]
Anchor-free BMassDNet | High recall, low false positives; handles various mass sizes | Needs mask labels; can’t detect all lesion types; longer training | Detection | Cao et al. [17]
Anchor-free YOLOv3 (GIoU, focal loss) | Fewer false anchors; better for small masses | Higher false positives than 2-stage; complex loss tuning | Detection | Zhang et al. [18]
Decision fusion (CNN, ResNet50 + SVM, LBP + SVM) | Very high accuracy; robust; avoids overfitting | Computationally heavier; needs careful tuning | Detection | Manalı et al. [23]
YOLO-LOGO (YOLOv5 + Transformer) | Fast, detects and segments masses, high TPR | Lower precision for small masses; complex; not end-to-end | Detection/Segmentation | Su et al. [51]
Transformer-based CNN (TransM) | High accuracy; models global/local features | Needs more data/memory; complex | Detection | Chen et al. [20]
Swin Transformer mass detector | Higher sensitivity; outperforms CNNs; improved with fusion | High compute needs; longer training; harder to tune | Detection | Betancourt et al. [26]
Swin-SFTNet (Swin Transformer U-Net) | Best for micro-mass segmentation; robust to small/irregular shapes | Higher complexity; more demanding; less public validation | Detection/Segmentation | Kamran et al. [52]
StyleGAN2-based anomaly detector | No annotation needed; detects anomalies; high sensitivity | Lower specificity; less accurate than supervised; CC view only | Detection | Park et al. [21]
Table 8. Summary of deep learning models for lesion segmentation in mammography.
Approach | Advantage | Disadvantage | Task | Papers
U-Net family (U-Net, U-Net++, Attention U-Net, ResNet-U-Net) | Strong spatial localization; easily extensible (attention, residual, multi-scale features); good for medical images | May struggle with dense tissue; sensitive to class imbalance; performance varies with architecture depth | Segmentation | Tiryaki et al. [30], Hithesh et al. [33], Ghantasala et al. [27], Bentaher et al. [40], Nour and Boufama [39], Aliniya et al. [31]
SegNet family (SegNet, Enhanced SegNet) | Efficient inference; fast for clinical deployment; lower computational requirements | Lower accuracy on complex shapes and dense tissue compared to transformer models | Segmentation | M’Rabet [41], Hithesh et al. [33]
Transformer-based models (Swin Transformer, CoAtNet) | Excellent global context modeling; high accuracy and generalizability; powerful multi-scale feature fusion | High computational cost; complex training and architecture | Segmentation | Masood et al. [37], Ahmed et al. [34]
DeepLab family (DeepLabv3+) | Large receptive field; multi-scale context with fine-detail preservation; flexible dilation rates | Higher complexity; requires careful tuning for dilation rates; slower inference | Segmentation | Farrag et al. [38]
Hybrid CNN + traditional (e.g., U-Net + Active Contour Model) | Precise boundary refinement; robust on irregular, low-contrast masses | More complex and slower inference; integration of components adds overhead | Segmentation | Nour and Boufama [39]
Segment Anything Model (SAM) | Strong generic segmentation pretraining; flexible for multiple domains | Underperforms without domain-specific fine-tuning; low accuracy for mammograms out of the box | Segmentation | Hithesh et al. [33]
Table 9. Summary of deep learning models for breast cancer classification in mammography.
Approach | Advantage | Disadvantage | Task | Papers
CNN-based models (single view or ROI-based CNNs) | Strong local feature learning; well-established; good for small regions or patches | Limited global context; high preprocessing needs; struggles with scale/rotation variance | Classification | Elkorany et al. [43], Wu et al. [47]
Transformer-based models (ViT, Swin, PVT) | Excellent global context modeling; better at handling full images; holistic attention | High computational cost; complex to train; requires large datasets | Classification | Ayana et al. [44], Manigrasso et al. [49], Ahmed et al. [48]
Ensemble CNN frameworks | Combines strengths of multiple CNNs; better generalization; robust to variations in input | Increased computational complexity; feature fusion design is non-trivial | Classification | StethoNet [46], Elkorany et al. [43]
Two-stream/multi-view CNN or transformer models | Leverages complementary views (CC, MLO, left-right); mirrors radiologist workflow | More complex architecture; higher memory and compute requirements | Classification | Wu et al. [47], Manigrasso et al. [49], Ahmed et al. [48]
GAN-augmented classifiers | Generates missing views (e.g., CC from MLO); enhances data completeness; aids low-resource settings | Synthetic views may lack fine details; challenging to train and validate | Classification | Yamazaki and Ishida [45]
Multimodal models (image + metadata fusion) | Combines imaging and clinical data; improves diagnostic accuracy; more holistic view | More complex training pipeline; requires comprehensive data collection | Classification | Hussain et al. [50]
Table 10. Summary of the additional literature on breast cancer detection, segmentation, and classification.
Author, Year | Target Variable | Architecture | Pre-Processing | Dataset | Output | Result
Wang et al. (2021) [57] | Breast Cancer Classification | DenseNet-121 | Augmentation | INbreast | Benign/Malignant | Acc.: 96.2%
Petrini et al. (2022) [58] | Breast Cancer Diagnosis | EfficientNet | Transfer Learning | CBIS-DDSM | Normal, Benign, Malignant | Acc.: 85.13%, AUC: 93%
Ayana et al. (2022) [59] | Mass Classification | MSTL | Transfer Learning | DDSM | Benign/Malignant | AUC: 100%
Dada et al. (2024) [60] | Breast Cancer Detection | EfficientNet | Resize | MaMaTT2 | Benign, Malignant, Normal | Acc.: 98.29%
Lopez et al. (2022) [61] | Breast Cancer Classification | PHResNet | Pre-Training | CBIS-DDSM | Multi-Class | AUC: 84%
Dehghan et al. (2023) [62] | Breast Cancer Detection | Ensemble CNN | Transfer Learning | INbreast, DDSM | Benign/Malignant | F2: 95.48%
Khan et al. (2024) [63] | Breast Cancer Classification | RM-DenseNet | Not Specified | Not Specified | Benign/Malignant | Acc.: 96.50%
Shen et al. (2020) [64] | Segmentation, Classification | ResU-segNet | Augmentation, Features | INbreast, Private | Tumor, Malignancy | DSC: 90%, AUC: 98%
Leung et al. (2023) [65] | Segmentation | U-Net | Augmentation | CBIS-DDSM | Tumor Regions | Dice: 64.59%
Pi et al. (2021) [66] | Segmentation | FS-UNet | Not Specified | CBIS-DDSM | Tumor Regions | Dice: 84.19%
Huynh et al. (2023) [67] | ROI Optimization | YOLOX | Windowing | Multiple datasets | Binary | AUC: 98%
Mohammed et al. (2024) [68] | Detection | YOLO, CNN | Augmentation | CBIS-DDSM | Tumor Regions | mAP: 91.15%
Rahman et al. (2024) [69] | Detection | YOLO + U-Net | Augmentation | MIAS | Tumor Regions | AUC: 98.6%
Jiang et al. (2022) [70] | Detection | EfficientNet-B3 | Post-Processing | CBIS-DDSM, MIAS | Tumor Regions | AUC: 96%
Bhatti et al. (2020) [71] | Detection, Segmentation | Mask R-CNN-FPN | Augmentation | DDSM, INbreast | Multi-Class | mAP: 84%
Table 11. An overview of publicly available mammography datasets (with sample size, modality, annotations, access/licensing, and split notes).
Year | Name | Typical Tasks | Sample Size (Patients/Exams/Images) | Modality/Formats | Annotations | Access/Licensing | Train/Test Split Notes
1994 | Mammographic Image Analysis Society (MIAS) [72] | Tumor detection, classification | 161 patients/—/322 images | Digitized film (PGM; 1024×1024 from 200 µm resampled) | Radiologist truth-markings; elliptical lesion outlines; benign/malignant; tissue type | Public for research; MIAS licence (research-only); no DUA | No official split
1997 | Digital Database for Screening Mammography (DDSM) [73] | Tumor detection, classification | —/2620 exams/∼10,480 images | Digitized film (LJPEG; often converted to TIFF/DICOM) | ROI contours; BI–RADS descriptors; lesion type; pathologically verified labels | Public download (legacy tooling); no DUA (license unspecified) | No official split; CBIS–DDSM provides standardized splits
2011 | INbreast [75] | Detection, segmentation, classification | 115 patients/—/410 images | FFDM (DICOM) | Precise lesion contours (XML); BI–RADS density & assessment | Freely available to researchers; no explicit license, no DUA stated | No official split
2017 | CBIS–DDSM [74] | Detection, segmentation, classification | 1566 participants/2620 studies/10,239 images | Digitized film curated to DICOM | Updated ROI segmentations & boxes; curated labels | Public via TCIA; CC BY 3.0; no DUA | Predefined train/test splits (mass & calcifications)
2020 | OPTIMAM (OMI–DB) [76] | Detection, classification, risk modeling | 173,319 women/—/>2.5M images | FFDM (DICOM) + rich clinical metadata | Extensive clinical labels; lesion–level annotations available in subsets | Restricted: apply to Data Access Committee; DUA required; fees may apply | No public split; varies per approved study protocol
2020 | CSAW (cohort & public subsets) [80] | Detection, segmentation, risk | Full cohort: multi–million images; CSAW–CC subset: 8723 participants (873 cancer/7850 controls); CSAW–S segmentation: 172 patients | FFDM (DICOM; PNG masks in subsets) | Pixel–level tumor annotations (CSAW–S/CC); masking labels (CSAW–M) | CSAW–CC: Restricted on request; CSAW–S: Restricted with CC BY–NC–ND 4.0 terms | Subsets documented; no single official split across CSAW ecosystem
2021 | VinDr–Mammo [77] | Detection, classification | —/5000 exams/20,000 images | FFDM (DICOM) | Breast–level BI–RADS assessment; lesion bounding boxes; double read with arbitration | Restricted on PhysioNet (credentialed access; DUA); citation required | Predefined split: 4000 train/1000 test
2022 | DMID [79] | Detection, segmentation, classification | —/—/510 images | Digital mammography (DICOM & TIFF) | Mass segmentation masks; benign/malignant; BI–RADS density; abnormality type | Public for research/education (publisher’s terms); no DUA | No official split
2021 | CMMD [78] | Detection, classification, subtype prediction | 1775 patients/—/5202 images | Mammography (collected TIFF; released as DICOM) + Clinical data (XLSX) | Biopsy–confirmed benign/malignant; molecular subtypes for subset | Public via TCIA; CC BY 4.0; no DUA | No official split
Table 12. Major challenges in breast tumor detection in mammography.
Challenge | Corresponding Articles
Breast Tissue Density Obscuring Masses | [15,16,17,18,19]
Generalization Across Populations and Devices | [15,20,21]
Integration of Multi-View and Temporal Data | [15,16,20]
Anchor Design and Localization Precision | [17,18,19]
Limited Annotated Datasets and Class Imbalance | [21,22,23,24]
Real-Time Detection Constraints and Computational Cost | [17,18,19,25]
Interpretability and Clinical Acceptance | [15,21,26]
Table 13. Major challenges in mammography segmentation from the recent literature.
Challenge | Corresponding Articles
Breast Tissue Density and Lesion Visibility | [27,28,29,30,31]
Class Imbalance and Multi-label Complexity | [27,32,33]
Computational Complexity and Model Efficiency | [34,35,36,37]
Interpretability and Clinical Trust | [34,38,39]
Small Dataset Sizes and Generalization Limits | [30,32,37]
Weak Boundary Detection and Irregular Lesion Shapes | [31,36,39,40]
Model Adaptability and Real-Time Deployment | [35,36,41,42]
Table 14. Major challenges in breast tumor classification in mammography.
Challenge | Corresponding Articles
Overfitting on Small Datasets and Limited External Validation | [43,44,45]
Poor Generalization Across Devices and Populations | [44,46,47]
Loss of Global Context in CNN-Based Models | [44,47,48]
View Integration and Multi-Image Fusion Complexity | [45,47,49]
Lack of Interpretability in Deep Learning Pipelines | [43,46,50]
Incomplete View Availability in Clinical Data | [45]
Integration of Clinical Metadata with Imaging Data | [46,48,49,50]
