Article

Next-Generation Diagnostics: The Impact of Synthetic Data Generation on the Detection of Breast Cancer from Ultrasound Imaging

1 School of Computing, Gachon University, 1342 Seongnam-daero, Sujeong-gu, Seongnam-si 13120, Gyeonggi-do, Republic of Korea
2 Department of Computer Engineering, Vistula University, Stokłosy 3, 02-787 Warszawa, Poland
* Authors to whom correspondence should be addressed.
Mathematics 2024, 12(18), 2808; https://doi.org/10.3390/math12182808
Submission received: 16 August 2024 / Revised: 3 September 2024 / Accepted: 6 September 2024 / Published: 11 September 2024
(This article belongs to the Special Issue Mathematical Methods in Machine Learning and Data Science)

Abstract

Breast cancer is one of the most lethal and widespread diseases affecting women worldwide. As a result, it is necessary to diagnose breast cancer accurately and efficiently utilizing the most cost-effective and widely used methods. In this research, we demonstrate that synthetically created high-quality ultrasound data outperform conventional augmentation strategies for efficiently diagnosing breast cancer using deep learning. We trained a deep-learning model using the EfficientNet-B7 architecture on a large dataset of 3186 ultrasound images acquired from multiple publicly available sources, as well as 10,000 synthetic images generated with a generative adversarial network (StyleGAN3). The model was trained using five-fold cross-validation and evaluated using four metrics: accuracy, recall, precision, and the F1 score. The results showed that integrating synthetically produced data into the training set increased the classification performance from 88.72% to 92.01% in terms of the F1 score, demonstrating the power of generative models to expand and improve the quality of training datasets in medical-imaging applications. Training the model on the larger set that included synthetic images thus improved its performance by more than 3% over the genuine dataset with common augmentation. Various data augmentation procedures were also investigated to improve the training set’s diversity and representativeness. This research emphasizes the relevance of modern artificial intelligence and machine-learning technologies in medical imaging by providing an effective strategy for categorizing ultrasound images, which may lead to increased diagnostic accuracy and optimal treatment options. The proposed techniques are highly promising and have strong potential for future clinical application in the diagnosis of breast cancer.

1. Introduction

Ultrasound diagnostics is vital for the early identification and diagnosis of many disorders, including cancers. Ultrasound, in particular, is utilized to analyze formations in soft tissues such as the breast, allowing for the determination of whether they are malignant or benign. The accuracy of such a diagnosis has a direct influence on the patient’s subsequent treatment options and prognosis. However, despite advancements in ultrasound technology, interpreting ultrasound images remains difficult due to their high variability and the subjectivity of professional assessments. For example, investigations have demonstrated that radiologists can differ greatly in diagnostic accuracy, with discrepancies reported in as many as 30% of cases [1].
Breast cancer is a serious health issue worldwide, which has a significant impact on people’s personal health and public health [2]. According to the World Health Organization (WHO), an estimated 2.3 million women were diagnosed with breast cancer in 2020, with more than 684,000 dying from the illness [3,4]. Accounting for 11.7% of newly diagnosed cases and 6.9% of cancer-related fatalities worldwide, breast cancer has a high incidence rate, which increases its global burden [5]. Early and precise identification is critical for effective treatment, with a five-year survival rate of over 90% for localized breast cancer compared to only 15% for metastatic stages [6].
Breast cancer screening can greatly increase survival rates and is essential for early detection. A variety of approaches are used in screening, such as non-invasive imaging modalities and physical examinations by medical professionals. The imaging modalities that are most frequently used include histology, ultrasound, CT scan, MRI, X-ray, and mammography [7]. Every method has unique benefits and drawbacks that are suited to certain clinical situations. When changes are seen on a screening mammogram, for example, diagnostic mammography is usually used to provide comprehensive images of the breast to help with diagnosis. Mammography has been proven to lower breast cancer mortality in women aged 50–69 by around 20% [8]. Breast ultrasonography is useful for differentiating between solid and fluid-filled lumps; solid lumps may signify benign fibrous growths or malignancy, while fluid-filled lumps are often benign. Compared to mammography, breast MRI provides more information, which makes it useful for identifying microscopic tumors or subtle changes in the breast that would not be apparent on a mammogram. For instance, breast MRI has proved to be 30% more sensitive than mammography in identifying breast cancer in its early stages [8]. If further assessment is required, a breast biopsy might be carried out: a small amount of tissue or a few cells are taken from the suspicious area and examined to support a final diagnosis and treatment planning. All things considered, a mix of screening techniques and imaging modalities enables thorough assessment and early identification of breast cancer, eventually leading to better patient outcomes [9,10].
A significant challenge in medical imaging, especially when associated with breast cancer, is the lack of annotated data. Training reliable diagnostic models requires high-quality, labeled datasets, yet obtaining such data may be time- and resource-consuming. As an illustration, annotated medical-imaging datasets are frequently restricted to small, specialized cohorts, which can lead to models that are not very generalizable to a wide range of populations. The lack of data makes it difficult to create advanced diagnostic devices and restricts the effectiveness of machine-learning models, especially in uncommon or under-represented instances.
This problem may be solved by synthetic data generation, especially when using advanced techniques like generative adversarial networks (GANs) [11,12,13]. GANs may enhance existing datasets by producing high-quality synthetic images that closely resemble real-world data. This adds more training examples to the dataset, which enhances the versatility and effectiveness of the model. For instance, it has been demonstrated in a number of applications that using synthetic data may increase the model training efficiency and accuracy by up to 15% [5,8]. The use of synthetic data can assist in overcoming the constraints imposed by small datasets, improve machine-learning models’ ability to generalize, and eventually, increase the efficiency and accuracy of diagnosis. A potent strategy for overcoming the problems associated with data scarcity is the use of synthetic data in medical imaging, which helps to advance the creation of more sophisticated diagnostic instruments.
The ability to automate the process of interpreting ultrasound images has evolved in recent years due to advancements in artificial intelligence and machine-learning techniques. This can greatly improve the diagnosis accuracy and efficiency. The use of deep learning, particularly with generative adversarial networks and EfficientNet designs, brings up new possibilities for classifying benign and malignant lesions in ultrasound images. Global health is seriously threatened by the rising incidence and mortality rates of breast cancer, which highlights the critical need for innovative research and creative solutions. Breast cancer significantly impairs women’s emotional health and quality of life, in addition to their physical health, on a global scale.
Because breast cancer can take many different forms and progress at different rates, it is complex and requires an integrated strategy to enhance early detection, precise diagnosis, and successful treatment plans. The urgent need to address the rising incidence of breast cancer is what drives this study. By delving more deeply into the specifics of breast anatomy, the ways that cancer progresses, and the capabilities of diagnostic methods, this work makes the following key contributions:
  • Synthetic data integration: We successfully integrate StyleGAN3-generated synthetic ultrasound images with actual data to improve the outcome of the proposed model for breast cancer detection.
  • Improved metrics: Employing synthetic data considerably improves classification performance, raising the metric values by around 3–4%.
  • Statistical validation: Our results are validated by thorough statistical analyses, including t-tests and Wilcoxon signed-rank tests, which reveal significant enhancements (p-values < 0.0001).
  • Detailed assessment: We investigate our model’s performance using a variety of metrics (accuracy, precision, recall, and F1 score) for experiments on both real and synthetic datasets.
  • Advancement in medical imaging: Our technique demonstrates that GANs could address data scarcity and increase diagnosis accuracy in medical-imaging applications.
  • Scalability and efficiency: By utilizing EfficientNet-B7, our model is scalable and has low computing needs, making it appropriate for a range of medical-imaging applications.

2. Literature Review

Breast cancer detection is a significant area of medical research, and several machine-learning and deep-learning approaches have been used to increase the diagnosis accuracy. We conducted a literature evaluation to identify research gaps in breast cancer detection. This review highlights current breakthroughs in this sector, concentrating on diverse approaches and their performance indicators.
Sadad et al. (2018) [14] employed mammography scans to apply decision trees to the MIAS and DDSM datasets, achieving a remarkable 98.20% accuracy. Decision trees are well known for their simplicity and interpretability, making them an efficient diagnostic tool. The study’s drawbacks include a limited ability to capture complex connections in large datasets. Mughal et al. (2018) [15] and Kavitha et al. (2022) [16] employed backpropagation neural networks on the MIAS and DDSM datasets, as well as mammography modalities, and obtained accuracy of 99.00% and 98.50%, respectively, confirming neural networks’ usefulness in breast cancer detection. The problem is that the model is prone to overfitting due to its high complexity.
Vijayarajeswari et al. (2019) [17] and Kaur et al. (2019) [18] explored the utilization of SVM on the MIAS dataset and reached 94.00% accuracy. SVM is a powerful classifier that excels in high-dimensional spaces and is frequently applied in medical image analysis, especially breast cancer imaging. The limitation of this study is that it requires careful selection of kernel functions and parameters for best performance. Haris et al. (2024) [19] also employed SVM, using the CBIS-DDSM mammography dataset, and attained 98.8% accuracy. The authors exhibited excellent performance in medical image processing in this study by using structured data for breast cancer diagnosis tasks. A constraint of this study is its sensitivity to the kernel function and parameter choices, which requires thorough optimization for the best results.
Valvano et al. (2019) [20] and Mahesh et al. (2024) [21] used CNNs to achieve accuracies of 98.20% and 95.2%, respectively, using datasets such as CBIS-DDSM and Mini-DDSM, as well as breast histopathology images. CNNs are widely renowned for their capacity to automatically extract characteristics from images, making them ideal for medical-imaging applications for breast cancer detection. The limitation is that training requires vast volumes of annotated data and is computationally costly. Cai et al. (2020) [22] achieved 93.7% accuracy with a DCNN on the Local and INbreast datasets, proving the deep-learning model’s capacity to gather comprehensive visual information for accurate diagnosis. However, a restriction may be vanishing or exploding gradients in very deep networks. Vaka et al. (2020) [23] employed dense neural networks to assess a private hospital’s histopathology dataset, with an accuracy of 97.21%. This approach exhibits dense networks’ versatility when processing histopathology slides. The model may struggle with high-dimensional data and may necessitate significant hyperparameter tuning, indicating a drawback of that work. Ur Rehman et al. (2021) [24] employed FC-DSCNN on the DDSM and PINUM datasets and achieved 90.00% accuracy. This approach improves the diagnostic performance by combining deep and supervised learning. The model may be vulnerable to overfitting owing to its fully connected layers and vast parameter space.
Ragab et al. (2022) [25] applied CSO-MLP to the BUSI dataset and achieved an accuracy of 97.09%. CSO-MLP is a tailored method that improves the neural network performance of cancer detection applications. The potential sensitivity to hyperparameter selections and dataset features, which requires strong validation across multiple datasets to ensure generalizability, appears to be the drawback of this work. Sheeba et al. (2023) [26] used the histopathology datasets Bisque and BreakHis to diagnose breast cancer using TCL-RAM, with a 97.00% accuracy rate. TCL-RAM improves the performance in histopathology image analysis by applying transfer learning to pretrained models and adapting them to new datasets. The limitation of this study is that variations in tissue preprocessing and imaging procedures may restrict the transferability of previously learned features. Yan et al. (2023) [27] used an ensemble classifier on the DDSM and MIAS datasets and achieved 93.26% accuracy. Ensemble approaches use many learning algorithms to increase the overall performance and resilience. The model appears to have limitations due to its increased complexity and significant processing cost. Asadi and Memon (2023) [28] achieved 96.80% accuracy using ResNet50 on the INbreast, BCDR, and WDBC datasets. Advanced architectures, such as ResNet50, are recognized for their depth and capacity to tackle challenging image categorization problems. The model may necessitate substantial computational resources and expertise for deployment and training.
Huynh et al. (2023) [29] used EfficientNet and ConvNeXt designs on six mammography datasets, reaching an accuracy of 92.00%. These advanced CNN designs aim to strike a compromise between model complexity and computational economy, displaying strong performance over a wide range of datasets. The variability in dataset attributes and imaging techniques between institutions may impact the model generalizability, which is the model’s limitation. Bouzar-Benlabiod et al. (2023) [30] achieved 86.71% accuracy on the CBIS-DDSM dataset by combining SE-ResNet-101 with CBR. This method highlights the significance of customized deep-learning models designed for specific datasets. The applied model may not generalize well to different datasets or modalities.
Oyelade et al. (2024) [31] achieved 97.7% accuracy with TwinCNN on the MIAS and BreakHis datasets. TwinCNN models are helpful tools for diagnosing breast cancer since they are designed to handle a variety of imaging modalities. This investigation’s constraints include the need for careful synchronization across multiple networks due to the increased complexity of model creation and training. Kadadevarmath and Reddy (2024) [32] proposed the DualNet-DL mammography model, which achieved 94.29% accuracy using the CBIS-DDSM and MIAS datasets. This model architecture combines two networks to improve the feature extraction and classification in breast cancer detection tasks. The drawback of this study is the potential complexity of model training and the reduced interpretability caused by the dual network integration. AlSalman et al. (2024) [33] proposed the federated+DCNN approach for mammography-based breast cancer diagnosis across a number of datasets, including VINDR-MAMMO, CMMD, and INBREAST, with an impressive accuracy of 98.90%. This method uses federated learning to train deep convolutional neural networks (DCNNs) jointly across several decentralized data sources while maintaining data privacy. The limitation of this study is the potential variability in data quality and distribution between participating institutions, which may impair the model generalizability.

3. Materials and Methods

Our proposed methodology demonstrates the impact of synthetic data generation using the StyleGAN3 model over the conventional augmentation technique. Figure 1 presents a block diagram depicting the complete methodology utilized for the detection of breast cancer from the ultrasound image datasets. In the first stage, we acquired breast ultrasound images from four different sources, covering two pathology types, benign and malignant. Since the images were collected from various sources, preprocessing techniques such as cropping and resizing to a uniform size were applied. The data were then distributed into training and test sets in the ratio of 80:20, after which conventional data augmentation and StyleGAN3 were applied to the training dataset, whereas the test dataset was reserved for validating the model. The EfficientNet-B7 deep-learning model was utilized for training the datasets using a five-fold cross-validation technique, and finally, the model was verified on the test dataset. The complete details about each step are presented in the following sections.

3.1. Data Collection

The foundation of this research was a dataset comprising 3186 ultrasound images of breast lesions, collected from four different publicly available sources: breast lesions ultrasound (BrEaST), BUSI, Thammasat, and HMSS (Figure 2). The images were categorized into two groups, malignant and benign, according to the clinical annotations provided. To achieve the most accurate and generalizable results, a comprehensive dataset was compiled from these various sources, ensuring a broad spectrum of characteristics and features in the ultrasound images, which is necessary for training and testing deep-learning models.
  • BrEaST-Lesions USG—offers a collection of ultrasound images of breast lesions, equipped with detailed annotations, including clinical data and histological analysis results. We have utilized 252 images from this dataset, including 154 benign types and 98 malignant types [34].
  • Dataset BUSI with GT—includes breast ultrasound images with precise demarcation of tumor boundaries, providing valuable data for training recognition and classification models. From this dataset, we have utilized 454 benign cancers and 211 malignant cancers, giving a total of 665 images [35].
  • Thammasat University Dataset—contains images gathered by researchers at Thammasat University, including images with various artifacts, contributing to model training in conditions close to real clinical practice. From this open-source dataset, we have utilized 2006 breast ultrasound images, including 846 benign and 1160 malignant lesions [9].
  • HMSS Ultrasound Cases—provides an extensive collection of ultrasound diagnostic cases, including images of breast formations, enriching the dataset with a variety of cases. From this dataset, we have used 263 images, in which 120 are benign and the remaining 143 are malignant [36].
Figure 2. The distribution of breast ultrasound images by pathology types across the datasets.

3.2. Data Preprocessing

To create a standardized training set, data from all the sources underwent thorough processing and standardization. A key step in the preparation was the standardization of all the image sizes and the cropping operation to remove edges containing the ultrasound device interface from the images. This process helped eliminate irrelevant diagnostic elements and focus the model’s attention on the medical content of the images. We have developed a preprocessing approach that incorporates cropping and scaling images to increase the detection accuracy of breast cancer. Standardizing pictures from different sources requires this method to ensure uniformity in resolution and to focus on pertinent sections of breast tissue [37,38].
We utilized image-cropping techniques in order to isolate the breast tissue, which is vital to the diagnosis of cancer. Usually, areas of clinical relevance, such as tumors or certain breast regions, are highlighted by annotations or masks that define the region of interest (ROI). The bounding box coordinates $(x_{\min}, x_{\max}, y_{\min}, y_{\max})$ of the ROI are found during the cropping process using either automatic segmentation methods or predetermined criteria [39]. By doing this, it is ensured that only relevant and potentially abnormal regions of the breast image are retained for further investigation.
We have then used the technique of resizing the cropped image to the specified target dimensions (512 × 512 pixels) after the ROI has been created and retrieved. In order to maintain image quality and details during transformation, bilinear interpolation has been used for resizing. We have employed an approach that is intended to standardize and resize clinical images to a standard resolution of 512 × 512 pixels, uniformly across all the utilized breast cancer datasets ($D_{\mathrm{BrEaST}}$, $D_{\mathrm{BUSI}}$, $D_{\mathrm{Thammasat}}$, $D_{\mathrm{HMSS}}$). In order to ensure consistency in the image dimensions across the various datasets for accurate analysis and dependable training of models in breast cancer detection, standardization is critical in the medical-imaging field [40].
We started by defining our standard target dimensions for resizing as $W_{\mathrm{target}} = 512$ and $H_{\mathrm{target}} = 512$ pixels. In accordance with the best standards in medical imaging, these dimensions were chosen to strike the right balance between computing efficiency and the preservation of image information. Each dataset $D$, including images from $D_{\mathrm{BrEaST}}$, $D_{\mathrm{BUSI}}$, $D_{\mathrm{Thammasat}}$, and $D_{\mathrm{HMSS}}$, is iterated through by the algorithm, which processes each image separately. For every image $I_i$ in the dataset $D$, we used bilinear interpolation. Because it preserves the image quality when scaling by smoothly interpolating the pixel values depending on nearby pixels, this approach is ideal for medical imaging [41]. To start resizing the cropped images, scaling factors are first calculated, as given by Equation (1).
$$\mathrm{scale}_w = \frac{W_{\mathrm{in}}}{W_{\mathrm{target}}}, \qquad \mathrm{scale}_h = \frac{H_{\mathrm{in}}}{H_{\mathrm{target}}} \tag{1}$$
where $W_{\mathrm{in}}$ and $H_{\mathrm{in}}$ are the width and height of the original image $I_i$. We used bilinear interpolation to standardize the images to a uniform size. The coordinates $(x_{\mathrm{in}}, y_{\mathrm{in}})$ in the original image $I_i$ corresponding to each pixel in the resized image $I_i'$ have been calculated using Equation (2).
$$x_{\mathrm{in}} = \mathrm{scale}_w \times x_{\mathrm{out}}, \qquad y_{\mathrm{in}} = \mathrm{scale}_h \times y_{\mathrm{out}} \tag{2}$$
Here, $(x_{\mathrm{out}}, y_{\mathrm{out}})$ denotes the coordinates in the resized image $I_i'$, ranging from $0$ to $W_{\mathrm{target}} - 1$ for $x_{\mathrm{out}}$ and from $0$ to $H_{\mathrm{target}} - 1$ for $y_{\mathrm{out}}$. The variables $(x_{\mathrm{in}}, y_{\mathrm{in}})$ represent the corresponding coordinates in the original image $I_i$. In order to carry out bilinear interpolation with accuracy, the method uses Equation (3) to find the closest integer coordinates $(x_1, y_1)$ and their subsequent coordinates $(x_2, y_2)$ in the original image.
$$x_1 = \lfloor x_{\mathrm{in}} \rfloor, \quad y_1 = \lfloor y_{\mathrm{in}} \rfloor, \quad x_2 = \min(x_1 + 1,\, W_{\mathrm{in}} - 1), \quad y_2 = \min(y_1 + 1,\, H_{\mathrm{in}} - 1) \tag{3}$$
In bilinear interpolation, the final pixel value in the scaled image is computed using the fractional weights alpha and beta. They are determined by taking into account the target pixel’s relative position across the original image frame. The horizontal interpolation weight between the two nearest pixels in a row is represented by α. In a given column, the weight given to the vertical interpolation between the two adjacent pixels is represented by β. The alpha and beta values are obtained using Equation (4) [42].
$$\alpha = x_{\mathrm{in}} - x_1, \qquad \beta = y_{\mathrm{in}} - y_1 \tag{4}$$
To compute the pixel value $I_{\mathrm{out}}(x_{\mathrm{out}}, y_{\mathrm{out}})$ in the resized image $I_i'$, bilinear interpolation combines the values of the four closest pixels in the original image $I_i$. This process is explained by Equation (5):
$$I_{\mathrm{out}}(x_{\mathrm{out}}, y_{\mathrm{out}}) = (1-\alpha)(1-\beta)\, I_{\mathrm{in}}(x_1, y_1) + \alpha(1-\beta)\, I_{\mathrm{in}}(x_2, y_1) + (1-\alpha)\beta\, I_{\mathrm{in}}(x_1, y_2) + \alpha\beta\, I_{\mathrm{in}}(x_2, y_2) \tag{5}$$
After processing all the images from $D_{\mathrm{BrEaST}}$, $D_{\mathrm{BUSI}}$, $D_{\mathrm{Thammasat}}$, and $D_{\mathrm{HMSS}}$, the standardized images are saved in a separate dataset. This method guarantees that all the images are uniformly scaled to 512 × 512 pixels using bilinear interpolation. Algorithm 1 presents the complete preprocessing procedure used in our proposed technique in terms of pseudocode. Standardizing image dimensions is crucial in medical imaging because it enables uniform analysis and model training across different datasets. We improved the quality of the standardized images by using bilinear interpolation, which has detail-preserving features. This, in turn, provided precise and dependable medical image analysis, thereby enhancing the breast cancer diagnosis accuracy. These standardization and data preparation procedures were critically important for creating a high-quality and representative training set that unites a wide spectrum of ultrasound images of breast lesions. This approach ensures the high generalization ability of the developed models and increases the accuracy and efficiency of the ultrasound image classification.
Algorithm 1. Pseudocode for proposed preprocessing.
Input: datasets D ∈ {D_BrEaST, D_BUSI, D_Thammasat, D_HMSS}
Output: D_standardized (standardized, uniformly resized images)

# Step 1: Initialization
W_target = 512
H_target = 512
D_standardized = []

# Step 2: Process each dataset
for each dataset D in {D_BrEaST, D_BUSI, D_Thammasat, D_HMSS}:
    for each image I_i in D:
        # Step 2a: crop the image based on its ROI
        x_min = calculate_x_min(I_i);  x_max = calculate_x_max(I_i)
        y_min = calculate_y_min(I_i);  y_max = calculate_y_max(I_i)
        I_cropped = I_i[y_min : y_max, x_min : x_max]

        # Step 2b: resize the cropped image to uniform dimensions
        W_in = width(I_cropped);  H_in = height(I_cropped)
        scale_w = W_in / W_target        # scaling factors, Equation (1)
        scale_h = H_in / H_target
        I_i' = empty_image(W_target, H_target)

        # Bilinear interpolation, Equations (2)-(5)
        for y_out from 0 to H_target - 1:
            for x_out from 0 to W_target - 1:
                x_in = scale_w * x_out
                y_in = scale_h * y_out
                x_1 = floor(x_in);  y_1 = floor(y_in)
                x_2 = min(x_1 + 1, W_in - 1);  y_2 = min(y_1 + 1, H_in - 1)
                alpha = x_in - x_1;  beta = y_in - y_1
                I_i'(x_out, y_out) = (1 - alpha) * (1 - beta) * I_cropped(x_1, y_1)
                                   + alpha * (1 - beta) * I_cropped(x_2, y_1)
                                   + (1 - alpha) * beta * I_cropped(x_1, y_2)
                                   + alpha * beta * I_cropped(x_2, y_2)

        # Step 2c: store the standardized image
        D_standardized.append(I_i')

return D_standardized
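For concreteness, the following Python sketch mirrors Algorithm 1 using OpenCV, whose cv2.resize with INTER_LINEAR interpolation performs the bilinear resampling of Equations (1)–(5); the roi argument is a placeholder, since how the ROI is obtained (annotations, masks, or automatic segmentation) is dataset-specific.

import cv2
import numpy as np

TARGET_W, TARGET_H = 512, 512  # W_target and H_target from Algorithm 1

def standardize_image(image: np.ndarray, roi: tuple) -> np.ndarray:
    """Crop an ultrasound frame to its ROI and resize it to 512 x 512.

    roi is (x_min, y_min, x_max, y_max); obtaining it is dataset-specific.
    """
    x_min, y_min, x_max, y_max = roi
    cropped = image[y_min:y_max, x_min:x_max]                 # Step 2a: crop to the ROI
    # Step 2b: bilinear interpolation (Equations (1)-(5)) via cv2.INTER_LINEAR
    return cv2.resize(cropped, (TARGET_W, TARGET_H), interpolation=cv2.INTER_LINEAR)

# Example: standardized = [standardize_image(img, roi) for img, roi in dataset]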

3.3. Data Augmentation

We carefully addressed the possible class imbalance created by the use of synthetic data. We used class balancing strategies in our study, such as incorporating six additional data augmentation techniques, to reduce any potential distortions. Furthermore, to make sure there were an equal number of classes within the synthetic data, we gathered multiple additional datasets (from four different sources), including some that became readily accessible in 2024. This method enabled us to use the most recent and diverse data, considerably improving the quality and efficacy of the model training.
Data augmentation is a collection of techniques employed to generate additional data samples from an existing dataset [40]. The primary objective is to enrich the diversity and size of the dataset, thereby enabling ML/DL models to better learn from the available data. These techniques involve applying various transformations to the original images, such as random cropping, horizontal or vertical flipping, adding noise, blurring, random shifting, scaling, etc. During our training process, in addition to standardizing and cropping images, we integrated several data augmentation methods to augment the dataset and enhance the robustness of our models.
Before applying data augmentation, we partitioned the entire classification dataset into training and test datasets. The total dataset collected from all the sources comprised 3186 images for the classification of breast cancer into two categories: malignant and benign. Initially, the dataset was split into 80% (training data) and 20% (test data), resulting in 2548 images for training, with the remaining 638 images reserved for testing the models. Subsequently, we applied the following augmentation techniques exclusively to the training set (2548 images), while the test images were kept in reserve and not subjected to any processing techniques. Conventional augmentations such as horizontal flipping, vertical flipping, random shifts, scaling and rotations, perspective changes, and Gaussian noise addition were applied to the real training images. These augmentation strategies were carefully selected to improve the model training and enhance the models’ ability to handle the diverse distortions often present in clinical settings. To provide a concise overview of the augmentation techniques employed, they have been organized in Table 1, while visual representations of the applied augmentation techniques can be found in Figure 3.
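A minimal sketch of such an augmentation pipeline, written with the Albumentations library, is shown below; the probabilities and limits are illustrative assumptions rather than the exact settings listed in Table 1.

import albumentations as A

# Illustrative pipeline covering the six conventional augmentations described above
train_augmentations = A.Compose([
    A.HorizontalFlip(p=0.5),                                   # horizontal flipping
    A.VerticalFlip(p=0.5),                                     # vertical flipping
    A.ShiftScaleRotate(shift_limit=0.1, scale_limit=0.1,
                       rotate_limit=15, p=0.5),                # random shifts, scaling, rotations
    A.Perspective(scale=(0.05, 0.1), p=0.3),                   # perspective changes
    A.GaussNoise(var_limit=(10.0, 50.0), p=0.3),               # Gaussian noise addition
])

# augmented = train_augmentations(image=image)["image"]  # applied to training images only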

3.4. StyleGAN3

The StyleGAN3 architecture (Figure 4) is an advanced generative model that creates images of excellent quality by combining a number of carefully constructed layers and functions to improve the image fidelity, eliminate aliasing, and ensure the translation equivariance [43]. The architecture is made up of a mapping network, a synthesis network, layer operations, noise injection, and ToRGB layers, all of which are mathematically designed to help generate realistic images [44].
First, a latent vector $z \sim \mathcal{N}(0, I)$, drawn from a Gaussian distribution, is transformed via the mapping network. The mapping network transforms $z$ into an intermediate latent space $w$ using an 8-layer multi-layer perceptron (MLP), as given by Equation (6).
$$w = f(z) = \mathrm{MLP}(z) \tag{6}$$
Next, using sinusoidal mappings, Fourier features are employed to encode high-frequency details (Equation (7)).
$$\phi(z) = [\sin(\omega z), \cos(\omega z)] \tag{7}$$
where ω is a learnt frequency parameter. These modified features are passed into the synthesis network, which employs the features to create images by processing them through layers with increasing resolution [43,44].
The feature maps are then gradually passed through several layers via the synthesis network, each of which is impacted by the intermediate latent vector w, to gradually produce images. The Fourier features are first converted into baseline feature maps by the network’s Conv 1 × 1 layer, which also acts as a starting point for further feature extraction. The synthesis network is made up of numerous layers (L0 through L13), each intended to handle varying levels of picture detail. The coarse layers (L0 to L6) record the image’s main structure, including the fundamental forms and layouts. The network refines mid-level elements, such as complex frameworks like facial features and textures, as it moves through the middle layers (L7 to L11). Lastly, at the greatest possible quality of the image, the fine layers (L12 to L13) concentrate on catching minute features such as hair strands and skin textures [43]. Style modulation (Mod) is included into every layer, as expressed by Equation (8).
$$y = s_i \cdot x, \qquad s_i = \mathrm{Mod}(w) \tag{8}$$
This is followed by the demodulation process to regulate the feature magnitudes, as given in Equation (9).
$$\hat{x} = \frac{y}{\sqrt{\mathbb{E}[y^2] + \epsilon}} \tag{9}$$
After that, convolution blocks (Conv 3 × 3 or 1 × 1) are used to process the modulated and demodulated features. These blocks are essential for feature extraction and refinement at every resolution stage [44]. Moreover, the network makes use of exact upsampling and downsampling procedures, as given by Equation (10).
$$\mathrm{Downsample}(x) = x \downarrow s, \qquad \mathrm{Upsample}(x) = x \uparrow s \tag{10}$$
where the scaling factor is represented by $s$. The StyleGAN3 design is significantly impacted by noise inputs, which introduce stochastic variation at every layer. Noise is added as expressed by Equation (11):
$$x_{\mathrm{out}} = x + b_i \cdot N_i \tag{11}$$
where the learned per-layer weight is denoted by $b_i$ and the noise vector by $N_i \sim \mathcal{N}(0, I)$. The final feature maps are converted to RGB using the ToRGB layers, as given by Equation (12).
$$\mathrm{RGB} = \mathrm{Conv}_{1 \times 1}(\mathrm{Mod}(w)) \tag{12}$$
An exponential moving average (EMA) is utilized for smoother updates in order to stabilize the training, as given by Equation (13).
$$\theta_{\mathrm{EMA}} = \alpha\, \theta_{\mathrm{EMA}} + (1 - \alpha)\, \theta_{\mathrm{current}} \tag{13}$$
StyleGAN3 can produce high-fidelity, alias-free images with realistic features and a consistent structure by combining these procedures. However, while we acknowledge the potential of other image-generating techniques, such as diffusion models, we believe that GANs—specifically, StyleGAN3—may be more advantageous for enhancing medical datasets, particularly those employed in diagnostics. One major factor is the high complexity and computing needs associated with diffusion models. Compared to GANs, these models usually need more processing power and take longer to train, which makes them less useful in resource-constrained settings such as hospitals. Diffusion models also frequently offer less control over the generation process, which may result in inaccurate or distorted diagnostic characteristics. GANs such as StyleGAN3, by contrast, provide more control over the generated material, helping ensure that the images meet the standards that are essential for medical applications.

3.5. Data Generation Using StyleGAN3

To enhance the quality and diversity of the training dataset, generative adversarial network technology (StyleGAN3) [44] was employed, configured to create ultrasound images of breast lesions. A crucial aspect of the generation process was the production of images with a resolution of 512 × 512 pixels. This size was chosen to preserve the high image detail and quality, which is critically important for ensuring the accuracy of the subsequent classification. The model was trained separately for each class of lesion, which allowed for the generation of category-specific images while maintaining their high resolution. The training process of the StyleGAN3 model continued until kimg = 1000 was reached, which provided a sufficient degree of training for generating high-quality 512 × 512 images. This resolution was strictly maintained throughout the entire generation process to ensure that all the generated images would have consistent sizes and the quality required for accurate and efficient classification.
Here, kimg = 1000 means that the training process continues until the model has seen 1,000,000 real images in total. The quality of the generated images was assessed using the Fréchet inception distance (FID) metric, which quantitatively measures how close the distribution of the generated images is to that of the original dataset. The FID reached 22.9 for malignant lesions and 35.4 for benign lesions, indicating the high quality and realism of the images generated at a resolution of 512 × 512.
Following the completion of training of the StyleGAN3 model and achieving FID values of 22.9 for malignant lesions and 35.4 for benign lesions, a detailed visualization of the results was conducted to assess the quality of generation. For each class, 25 generated images were selected and visualized in a tile format, allowing for a clear evaluation of the diversity and realism of the synthetic images (Figure 5). This visualization serves as further confirmation of the effectiveness of using generative adversarial networks to enhance training datasets in medical diagnostics.
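As an illustration of how category-specific synthetic images can be sampled from a trained generator, the following sketch follows the usage pattern of the official StyleGAN3 codebase; the checkpoint filename is a placeholder, and the snippet assumes the stylegan3 repository code is on the Python path so that the pickled generator can be loaded.

import pickle
import torch

# Load a trained generator (one model per lesion class); the path is illustrative
with open("stylegan3-malignant-512.pkl", "rb") as f:
    G = pickle.load(f)["G_ema"].cuda()          # exponential moving average of the generator

z = torch.randn([16, G.z_dim]).cuda()           # latent vectors z ~ N(0, I)
c = None                                        # no class conditioning: one generator per class
images = G(z, c)                                # NCHW tensor of 512 x 512 images, roughly in [-1, 1]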

Synthetic Image Generation

Our approach leveraged the StyleGAN3 architecture to synthesize realistic ultrasound images from random noise vectors. The generator within the GAN model transforms random noise vectors $z \sim \mathcal{N}(0, I)$, sampled from a latent space, into high-resolution ultrasound images, represented as $x$. The core of the generator consists of stacked convolution blocks [45], which progressively enhance the resolution of the generated images. These blocks enable the generator to capture fine-grained details and larger-scale features simultaneously, as expressed by Equation (14).
$$x = G(z) \tag{14}$$
The discriminator component of StyleGAN3 distinguishes between real $x_{\mathrm{real}}$ and generated $x_{\mathrm{fake}}$ ultrasound images. It achieves this through a series of convolutional blocks, as represented by Equation (15).
$$P_{\mathrm{real}} = D(x_{\mathrm{real}}), \qquad P_{\mathrm{fake}} = D(x_{\mathrm{fake}}) \tag{15}$$
The discriminator’s outputs, $P_{\mathrm{real}}$ and $P_{\mathrm{fake}}$, score whether the input samples are real or fake, respectively, and are mapped to probabilities as described by Equation (16).
$$\hat{y}_{\mathrm{real}} = \sigma(P_{\mathrm{real}}), \qquad \hat{y}_{\mathrm{fake}} = \sigma(P_{\mathrm{fake}}) \tag{16}$$
By training the generator and discriminator adversarially, StyleGAN3 learns to generate ultrasound images that closely resemble real ones, offering a valuable tool for various applications in medical imaging and research.
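The sketch below illustrates this adversarial training scheme with a generic non-saturating logistic GAN loss in PyTorch; G and D are stand-ins for the StyleGAN3 generator and discriminator, and StyleGAN3-specific components (R1 regularization, style mixing, generator EMA) are omitted for brevity.

import torch
import torch.nn.functional as F

def adversarial_step(G, D, real_images, opt_G, opt_D, latent_dim=512):
    # --- Discriminator step: separate real from generated images ---
    z = torch.randn(real_images.size(0), latent_dim, device=real_images.device)  # z ~ N(0, I)
    fake_images = G(z).detach()                                  # x_fake = G(z), Equation (14)
    p_real, p_fake = D(real_images), D(fake_images)              # Equation (15)
    # Logistic loss on sigma(P_real), sigma(P_fake) as in Equation (16)
    loss_D = F.softplus(-p_real).mean() + F.softplus(p_fake).mean()
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # --- Generator step: produce images the discriminator scores as real ---
    z = torch.randn(real_images.size(0), latent_dim, device=real_images.device)
    loss_G = F.softplus(-D(G(z))).mean()                         # non-saturating generator loss
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
    return loss_D.item(), loss_G.item()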

3.6. EfficientNet-B7

Selection of the right classifier is very important for an accurate and efficient detection process. Several deep-learning classifiers are available for the classification of medical images, such as CNNs, LSTMs, AttentionNet, transformers, EfficientNets, and many more [46,47]. CNNs are the most widely used models for medical image classification tasks. To improve their accuracy, these models are scaled up; for example, ResNet-18 can be scaled up to ResNet-200 [48]. There are only three dimensions available to scale up a CNN model: width, depth, and image resolution. The CNN operation can be defined by Equation (17).
$$\mathcal{N} = \bigodot_{i=1,\ldots,s} \mathcal{F}_i^{L_i}\!\left(X_{\langle H_i, W_i, C_i \rangle}\right), \qquad \mathcal{N} = \mathcal{F}_k \odot \cdots \odot \mathcal{F}_1(X_1) = \bigodot_{j=1,\ldots,k} \mathcal{F}_j(X_1),$$
$$Y_{ij}^{l} = \sum_{a=1}^{A^{l-1}} \sum_{b=1}^{B^{l-1}} F_i^{l}\, X_{a+(i-1)s^{l},\; b+(j-1)s^{l}}^{l-1} + b_i^{l} \tag{17}$$
Scaling up the dimensions of the models in width, depth, and resolution increases the accuracy, but for the bigger model, the accuracy gain diminishes. To overcome this, compound scaling is introduced. In this case, the model is scaled up in an efficient way in all three dimensions simultaneously, which will extract more complex features and fine details without increasing the computational cost [49,50]. The compound scaling is defined by Equation (18).
$$\mathrm{depth}:\ d = \alpha^{\varphi}, \qquad \mathrm{width}:\ w = \beta^{\varphi}, \qquad \mathrm{resolution}:\ r = \gamma^{\varphi}$$
$$\text{s.t. } \alpha \cdot \beta^{2} \cdot \gamma^{2} \approx 2, \qquad \alpha \ge 1,\ \beta \ge 1,\ \gamma \ge 1 \tag{18}$$
Here, $\alpha$, $\beta$, and $\gamma$ are the scaling coefficients, which can be determined through a grid search, and $\varphi$ is a nonnegative, user-specified scaling step. The FLOPS of a CNN model are directly proportional to $d$, $w^{2}$, and $r^{2}$, so doubling the depth doubles the FLOPS, whereas doubling the width or resolution increases the FLOPS four-fold. In EfficientNet, if we scale the model using the above equation, the FLOPS are scaled up by $(\alpha \cdot \beta^{2} \cdot \gamma^{2})^{\varphi}$. So, our main objective is to optimize the FLOPS as well as the accuracy, as given by Equation (19) [50].
$$\mathrm{Accuracy} \times \left[\frac{\mathrm{FLOPS}}{T}\right]^{w} \tag{19}$$
where Accuracy is the model’s performance on the target task, FLOPS is the model’s computational cost, measured by floating-point operations, T is a target FLOPS budget, if specified, and w is a hyperparameter set to −0.07 (a negative value emphasizes accuracy). The objective function aims to maximize the product of the accuracy and a term that penalizes high FLOPS. The negative weight w of −0.07 emphasizes the importance of achieving high accuracy while still keeping the FLOPS under control.
The EfficientNet models come in eight variants, from EfficientNet-B0 to EfficientNet-B7. In this work, we have utilized the EfficientNet-B7 model, which is best suited to our application (the breast cancer detection task). Figure 6 illustrates an architectural view of EfficientNet-B7, along with the layer-wise distribution. This model includes the mobile inverted residual block (MBConv), which is the building block of the EfficientNet family. The MBConv block utilizes a depthwise separable convolution (Depthwise Conv) block, along with a squeeze-and-excitation (SE) block. The depthwise convolution block concentrates on extracting characteristics unique to each color channel in the image while collecting spatial information within each channel; it ignores the connections among these channels, though. Mathematically, this operation is represented in Equation (20).
$$Y_d(i, j, k) = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} X(i+m,\, j+n,\, k)\, K_d(m, n, k) \tag{20}$$
where $X$ represents the input image, $K_d$ represents the depthwise filter, $M$ and $N$ are the filter dimensions, and $Y_d(i, j, k)$ represents the output feature map at position $(i, j)$ for channel $k$.
The output of the depthwise convolution is combined with the $1 \times 1$ pointwise convolution, as defined by Equation (21).
$$Y_p(i, j, k) = \sum_{c=0}^{C-1} W_p(i, j, k, c)\, Y_d(i, j, c) \tag{21}$$
where the number of channels is denoted by $C$, the $1 \times 1$ convolution filter is represented by $W_p$, and the output following the pointwise convolution is given by $Y_p(i, j, k)$. Squeeze-and-excitation (SE) blocks are incorporated into the architecture to enhance the feature extraction by recalibrating the channel-wise feature response, as given in Equation (22).
$$S(x) = \sigma\!\left(W_2\, \delta\!\left(W_1 \cdot \frac{1}{HW} \sum_{i=1}^{H} \sum_{j=1}^{W} x(i, j, :)\right)\right) \tag{22}$$
where $H$ and $W$ are the height and width of the feature maps, and $x(i, j, :)$ represents the feature map values at position $(i, j)$ across all the channels. $\delta$ is the ReLU activation function, which introduces non-linearity, $W_1$ and $W_2$ are the weights of the two fully connected (FC) layers, and $\sigma$ is the sigmoid activation function. These blocks may suppress less informative channels and highlight more informative ones. By concentrating on the most pertinent information found in the data, this procedure improves the feature extraction. An architectural view of the blocks used in the EfficientNet-B7 model, (a) the SE block, (b) MBConv1 (3 × 3), (c) MBConv6 (3 × 3), and (d) MBConv6 (5 × 5), is shown in Figure 7.
EfficientNet-B7’s architecture feeds the input image to the Conv 3 × 3 block, a fundamental conventional convolution block that extracts the crucial features from the input image. Following the first Conv 3 × 3 block, the structure comprises numerous blocks (blocks 1–7) that use MBConv blocks of variable numbers. Block 1 has three MBConv1 blocks, where each MBConv1 block is represented by Equation (23).
$$Y_{\mathrm{MBConv1}}(i, j, k) = \sum_{c=0}^{C-1} \sum_{m=0}^{2} \sum_{n=0}^{2} X(i+m,\, j+n,\, c)\, K_d(m, n, c)\, W_p(i, j, k, c) \tag{23}$$
Here, $K_d$ denotes the depthwise convolution (DWConv) kernel and $W_p$ denotes the pointwise convolution (PWConv) weights.
Blocks 2 and 3 employ seven MBConv6 (3 × 3) and MBConv6 (5 × 5) blocks, respectively. This design enables the model to capture features at various sizes within each channel, as formalized in Equation (24).
$$Y_{\mathrm{MBConv6}(3 \times 3)}(i, j, k) = \sum_{c=0}^{C-1} \sum_{m=0}^{2} \sum_{n=0}^{2} X(i+m,\, j+n,\, c)\, K_d(m, n, c)\, W_p(i, j, k, c)$$
$$Y_{\mathrm{MBConv6}(5 \times 5)}(i, j, k) = \sum_{c=0}^{C-1} \sum_{m=0}^{4} \sum_{n=0}^{4} X(i+m,\, j+n,\, c)\, K_d(m, n, c)\, W_p(i, j, k, c) \tag{24}$$
Similarly, 10 MBConv6 (3 × 3) and 10 MBConv6 (5 × 5) blocks are utilized in blocks 4 and 5, respectively. Block 6 employs 13 MBConv6 (5 × 5) blocks, followed by 4 MBConv6 (3 × 3) blocks in block 7, as the feature maps proceed through the network. The model is able to extract more complicated characteristics for precise breast cancer classification because of this hierarchical structure with increasing complexity in the MBConv blocks.
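As a minimal sketch of how such a classifier can be instantiated, the snippet below adapts torchvision's EfficientNet-B7 by replacing its final linear layer with a two-class (benign vs. malignant) head; the use of ImageNet-pretrained weights is an assumption, since the initialization is not specified here.

import torch
import torch.nn as nn
from torchvision import models

# EfficientNet-B7 backbone with a two-class head
model = models.efficientnet_b7(weights=models.EfficientNet_B7_Weights.IMAGENET1K_V1)
in_features = model.classifier[1].in_features          # final linear layer of the classifier head
model.classifier[1] = nn.Linear(in_features, 2)        # benign / malignant

images = torch.randn(4, 3, 512, 512)                   # a batch of 512 x 512 inputs
logits = model(images)                                  # shape: (4, 2)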

3.7. Evaluation Metrics

In our analysis, we employed some standard evaluation metrics, as defined by Equations (25)–(28).
$$\mathrm{Accuracy\ (Acc)} = \frac{TP + TN}{TP + TN + FP + FN} \tag{25}$$
$$\mathrm{Precision\ (Pre)} = \frac{TP}{TP + FP} \tag{26}$$
$$\mathrm{Recall\ (Rec)} = \frac{TP}{TP + FN} \tag{27}$$
$$\mathrm{F1\text{-}score\ (F1)} = \frac{2 \cdot \mathrm{Pre} \cdot \mathrm{Rec}}{\mathrm{Pre} + \mathrm{Rec}} \tag{28}$$
The foundation for analyzing classification models lies in the confusion matrix. This matrix methodically presents the quantity of accurate classifications (true positives—TPs, true negatives—TNs) and incorrect classifications (false positives—FPs, false negatives—FNs) for a given task. The confusion matrix holds specific significance in circumstances where missing positive cases carries considerable risk [47]. A high recall value ensures the model minimizes the risk of missing serious instances, even if it means accepting a higher rate of false positives. This trade-off is crucial, particularly when the cost of missing a positive case outweighs the cost of falsely identifying a negative one [51,52].
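For reference, these metrics can be computed directly from the true and predicted labels, for example with scikit-learn; the label vectors below are illustrative placeholders, not results from our experiments.

from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, confusion_matrix)

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # placeholder labels: 1 = malignant, 0 = benign
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # placeholder model predictions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(accuracy_score(y_true, y_pred))    # (TP + TN) / (TP + TN + FP + FN), Equation (25)
print(precision_score(y_true, y_pred))   # TP / (TP + FP), Equation (26)
print(recall_score(y_true, y_pred))      # TP / (TP + FN), Equation (27)
print(f1_score(y_true, y_pred))          # 2 * Pre * Rec / (Pre + Rec), Equation (28)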

4. Results

We performed two separate experiments for the detection of breast cancer utilizing ultrasound datasets from four sources. The first experiment was conducted on the original (real) dataset with simple augmentation, and the second on the real dataset combined with synthetic data generated using StyleGAN3; both utilized the EfficientNet-B7 model. For the classification of ultrasound images into malignant and benign lesions, we selected the EfficientNet-B7 architecture based on preliminary experiments, where this architecture demonstrated the best performance on our dataset. This choice was driven by the fact that the EfficientNet-B7 model offers an optimal balance between the number of parameters and computational efficiency, which is especially important when working with high-resolution medical images. During the experiments, it was found that models with fewer parameters, such as EfficientNet-B5, showed lower accuracy, indicating insufficient model complexity for successful training on this dataset. The initial dataset included 3186 ultrasound images collected from various public sources. From this pool, 20% (638 images) were set aside for testing using a fixed seed to ensure the consistency and reproducibility of the results. The remaining data were split into five folds for cross-validation, ensuring that each image would be used in the training and validation sets in different iterations of the cross-validation.
Both experiments were carried out on common hardware and software setups to examine the performance of both methodologies. The hardware used in the experiments was a computer system with 16 GB RAM, a 500 GB SSD hard drive, and 12 GB GPU support for smooth and quick training of the used model. The graphics processing unit (GPU) is especially important for sophisticated data processing and accelerating deep-learning training. In the software part, we used the Python 3.10 programming environment and the PyTorch Python library to create the DL model. We also utilized the same hyperparameters for both experiments: 50 epochs, Adam optimizer, 0.0001 learning rate, five-fold cross-validation training technique, and input image size 512 × 512.
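A rough sketch of this data-splitting and cross-validation setup is given below; the seed value, the use of stratified folds, and the placeholder label vector are illustrative assumptions rather than details taken from our implementation.

from sklearn.model_selection import StratifiedKFold, train_test_split

SEED, EPOCHS, LR, IMG_SIZE = 42, 50, 1e-4, 512     # seed is illustrative; other settings as reported

labels = [0, 1] * 1593                 # placeholder for the 3186 image-level labels (0 = benign, 1 = malignant)
indices = list(range(len(labels)))

# Fixed-seed 80/20 train/test split (638 test images held out)
train_idx, test_idx = train_test_split(indices, test_size=0.20,
                                       stratify=labels, random_state=SEED)

# Five-fold cross-validation on the remaining 80%
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=SEED)
train_labels = [labels[i] for i in train_idx]
for fold, (tr, va) in enumerate(skf.split(train_idx, train_labels)):
    # build EfficientNet-B7 as in Section 3.6, optimizer = Adam(lr=LR),
    # train for EPOCHS epochs on `tr`, validate on `va`
    print(f"fold {fold}: {len(tr)} train / {len(va)} validation images")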

4.1. Experimentation on Real Dataset

In the first experiment, we utilized the original 3186 ultrasound images, among which 80% (2548 images) were used for training, along with their augmented versions. In this experiment, we used the real images with some augmentation applied, such as horizontal flipping, vertical flipping, random shifts, scaling and rotations, perspective changes, and addition of Gaussian noise. We utilized the common DL model EfficientNet-B7 for both experiments, trained for 50 epochs with an input size of 512 × 512, as the images are of high quality. The five-fold cross-validation strategy was utilized for training the model, and for each fold, we plotted the training history in terms of the training and validation accuracy and loss, respectively. Figure 8, Figure 9, Figure 10, Figure 11 and Figure 12 illustrate the training history curves for the first experiment on the real dataset. These curves provide the model training summary in terms of the loss and accuracy during five-fold cross-validation, where each fold utilizes a separate set of training and validation data.
The test result using the EfficientNet-B7 model in terms of the confusion matrix on the test dataset (638 US images) is presented in Figure 13. From the confusion matrix of the first experiment, we can observe that the model correctly identifies 291 malignant and 275 benign cases and misclassifies 30 benign and 42 malignant cases. The performance of each fold in terms of the evaluation metrics, accuracy, F1 score, precision, and recall, is presented in Table 2. The accuracy values across all five folds show reliable performance, with fold 0 achieving an accuracy of 0.8558 and the highest accuracy of 0.8746 achieved in fold 4. The overall aggregated accuracy for this experiment is 88.71%, which shows the good reliability of the model for correctly classifying breast cancer from ultrasound images.
The precision score for this experiment varies from 0.8560 in fold 0 to 0.8748 in fold 4, and the aggregated precision value is 88.79%. This precision score shows that the model has a good capability for identifying positive cases (malignant cancer) in the detection of breast cancer. The recall values vary from 0.8542 in fold 2 to 0.8746 in fold 4. The aggregated recall value is 88.71%, which shows the good ability of the model to correctly classify the actual cancerous images. In terms of the F1 score, the harmonic mean of the precision and recall varies from 0.8543 (fold 2) to 0.8745 (fold 4). The overall aggregated score is 88.72% in the first experiment using the proposed model, which shows a good ability to minimize false negatives and identify true positives, along with good predictions.
Looking at the overall performance of the model in the first experiment on the real dataset, firstly, it shows consistent performance with slight variations across all the evaluation metrics. This proves that the model generalizes well across all five subsets of the data, which is crucial for real-time application. Secondly, fold 4 provides the highest values for each metric: an accuracy of 0.8746, an F1 score of 0.8745, a precision of 0.8748, and a recall of 0.8746, which shows that the model performs optimally on this subset, providing strong classification ability and consistency across all metrics. Similarly, fold 2 shows the lowest performance, with an accuracy of 0.8543, an F1 score of 0.8543, a precision of 0.8569, and a recall of 0.8542. These differences, although minor, indicate areas for potential model improvement, such as handling variations in data that are less represented in other folds. The overall aggregated scores across all the metrics are 88.71% accuracy, 88.72% F1 score, 88.79% precision, and 88.71% recall, which show that the model is reliable and effective in the detection of breast cancer from ultrasound images.

4.2. Experimentation on Combined Dataset (Real and Synthetic)

The second experiment was performed on the combined dataset, which included the real data and GAN-generated ultrasound images. The same model architecture, hyperparameters and all the other environments were utilized while working with the combined dataset. The main difference was in the expansion of the training set by including 10,000 generated images (5000 malignant and 5000 benign), created using the generative adversarial network StyleGAN3. These generated images, with a resolution of 512 × 512 pixels, were added to 2548 original images (after setting aside the test sample), creating a combined dataset of 12,548 images for training. Combining these augmentations with the diversity of features presented in the generated data enabled the creation of a powerful and comprehensive training set. The application of augmentations significantly expanded the model’s capabilities in image classification, increasing its accuracy and resistance to overfitting. In Figure 14, the distribution of each pathology type, malignant and benign, after the GAN augmentation is shown.
Training on the combined dataset was also conducted using five-fold cross-validation and similar augmentation methods as for the original dataset. Additionally, this approach allowed for the exploration of the impact of generated data on the model’s generalization ability and its accuracy. Special attention was paid to the analysis of the quality metrics, including the F1 score, to assess the effectiveness of the models trained on the expanded dataset. We have also illustrated the training history curves for the second experiment in Figure 15, Figure 16, Figure 17, Figure 18 and Figure 19 on the Real+GAN dataset.
The second experiment also employed the EfficientNet-B7 model on the Real+GAN augmented dataset, using the same test set of 638 ultrasound images. Figure 20 depicts the result in terms of the confusion matrix, which shows the model’s enhanced performance for breast cancer diagnosis. It can be observed from the confusion matrix that the model correctly classifies 302 (true positive) malignant cases and 285 (true negative) benign cases. The number of false positives, or benign cases misclassified as malignant, has decreased to 20. This reduction represents a significant improvement in the model’s accuracy, demonstrating its improved ability to prevent false alarms in benign situations.
Similarly, the number of false negatives, or malignant cases misclassified as benign, was reduced to 31, demonstrating the model’s better recall in recognizing true malignant instances. In comparison to the confusion matrix from the first experiment, the second experiment’s matrix shows a considerable increase in all of the crucial areas: greater true positive and true negative counts, as well as lower false positive and negative counts. This suggests that the GAN-augmented dataset successfully improved the model’s capacity to generalize and reliably detect breast cancer in ultrasound images.
The second experiment, which used the EfficientNet-B7 model on a combined dataset improved with GAN-augmented images of excellent quality, showed a considerable improvement in performance across all the metrics. The performance of the second experiment in terms of all the evaluation metrics on the combined dataset is presented in Table 3. The accuracy metrics’ values across the folds range from 0.8715 (fold 4) to 0.9044 (fold 1), where the aggregated accuracy is 92.01%. The introduction of GAN-augmented images obviously contributed to this improvement by providing the model with more high-quality data from which to train, hence improving its generalization capabilities. The precision score varies between 0.8748 (Fold 3) to 0.9061 (Fold 1), with an average precision of 92.07%, indicating that the model can properly detect the majority of malignant instances. High recall is vital for detecting most cancer cases and reducing the chance of missed diagnosis. Recall scores ranging from 0.8715 (Fold 4) to 0.9044 (Fold 1), with an aggregated recall of 92.01%, show that the model can accurately identify the majority of malignant cases. High recall is important for detecting the majority of cancer cases and reducing the possibility of missed diagnosis. The F1 scores, which balance precision and recall, were consistently high across all the folds, ranging from 0.8715 to 0.9044, with an average of 0.9201. The uniformity of the F1 score across various folds shows that the model performed well in both recognizing real positive events and limiting false positives. These metrics demonstrate that the GAN-augmented dataset improved the EfficientNet-B7 model’s ability to distinguish between benign and malignant instances, increasing the model’s sensitivity and reliability.
These metrics reveal that the GAN-augmented dataset enabled the EfficientNet-B7 model to better distinguish between benign and malignant cases, enhancing both the reliability and the sensitivity of the model. Figure 21, Figure 22, Figure 23, Figure 24 and Figure 25 compare the two experiments fold by fold, together with the aggregated results, for accuracy, precision, recall, and the F1 score. In the second experiment on the combined (Real+GAN) dataset, all the metric values improve, with the exception of fold 4, where the scores of the first experiment on the real dataset are marginally higher. The aggregated accuracy, precision, recall, and F1 score increase by 3.3, 3.28, 3.3, and 3.29 percentage points, respectively, for the second experiment on the combined dataset. This finding underlines the overall improvement obtained by incorporating GAN-generated data into the training set.

4.3. Statistical Analysis

In addition to evaluating the experiments on the Real and Real+GAN datasets with multiple metrics, we performed statistical analyses to assess the significance of the observed differences. Specifically, we used the t-test and the Wilcoxon signed-rank test to determine whether the metric differences between the two experimental scenarios are statistically significant. This section summarizes the analysis and its findings.

4.3.1. T-Test Analysis

We conducted a t-test to evaluate whether there is a significant difference between the metrics obtained on the Real dataset and those obtained on the Real+GAN dataset. The t-test is commonly used to determine whether the outcomes of two groups differ statistically from one another. Table 4 summarizes the t-test results and the corresponding hypothesis for each metric. The t-test produces t-values substantially larger than zero (approximately 6.3 to 6.4) for all the metrics, and the associated p-values (<0.0001) lie far below the standard significance level of 0.05. As a result, we reject the null hypothesis (H0) for each metric, indicating a statistically significant difference between the experiments performed on the Real dataset and the Real+GAN dataset.
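For reference, such a test can be run with SciPy as shown below; the two arrays are hypothetical paired scores (for example, repeated evaluation runs) rather than the actual samples underlying Table 4, so the printed values are illustrative only. If the samples were treated as independent rather than paired, stats.ttest_ind could be used instead.

```python
from scipy import stats

# Hypothetical paired accuracy scores for the two training regimes.
real_scores = [0.856, 0.862, 0.854, 0.867, 0.875, 0.858, 0.861]
gan_scores  = [0.893, 0.904, 0.895, 0.901, 0.897, 0.892, 0.899]

t_stat, p_value = stats.ttest_rel(gan_scores, real_scores)  # paired t-test
print(f"t = {t_stat:.2f}, p = {p_value:.5f}")               # p < 0.05 -> reject H0
```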

4.3.2. Wilcoxon Signed-Rank Test Analysis

We also used the Wilcoxon signed-rank test to assess the significance of the differences between matched samples from the two experiments on the Real and Real+GAN datasets. The Wilcoxon test is a non-parametric approach that does not assume a normal distribution of the data, making it appropriate when the assumptions of the t-test may not hold. Table 5 summarizes the Wilcoxon signed-rank test results with the corresponding hypothesis for each metric. Every metric yields a Wilcoxon statistic of 0 with a p-value below 0.0001, so the null hypothesis (H0) is rejected and the statistical significance of the observed differences is confirmed.
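The same paired scores can be passed to SciPy’s Wilcoxon signed-rank implementation; again, the arrays below are illustrative placeholders rather than the samples behind Table 5.

```python
from scipy import stats

real_scores = [0.856, 0.862, 0.854, 0.867, 0.875, 0.858, 0.861]
gan_scores  = [0.893, 0.904, 0.895, 0.901, 0.897, 0.892, 0.899]

w_stat, p_value = stats.wilcoxon(gan_scores, real_scores)
print(f"W = {w_stat}, p = {p_value:.5f}")
# W = 0 whenever every paired difference has the same sign,
# i.e., the Real+GAN score is higher in every pair.
```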
The findings of the t-test and the Wilcoxon signed-rank test consistently demonstrate notable differences between the two experiments on the Real and Real+GAN datasets for every metric evaluated. The very small p-values obtained from both tests confirm the statistical reliability of the observed gains in the metrics after the inclusion of the GAN-generated data, making it highly unlikely that these improvements are due to random variation.

5. Discussion

In this section, we compare our proposed technique with state-of-the-art techniques for the detection of breast cancer. For this purpose, we consider several studies on breast cancer detection published between 2018 and 2024 and compare them according to several criteria: the techniques utilized, the datasets used, the imaging modalities, and the reported metrics (%). From Table 6, we can observe that the authors have utilized several modalities, such as mammography imaging (MMI), histopathology (H&E), and ultrasound imaging (US), drawn from various datasets. The most commonly utilized modality for the detection of breast cancer is mammography, and the most commonly utilized datasets are the DDSM variants (Mini-DDSM or CBIS-DDSM) [14,15,16,21,24,27,30,32,53], which indicates the wide acceptance of the DDSM datasets.
A wide range of techniques has been used, including machine-learning models (SVM, BPNN, DT, MLP, etc.) and deep-learning models (CNN, DCNN, ResNet, EfficientNet, etc.). The highest accuracy (98.90%) is achieved by the authors of [33], who combined federated learning with a DCNN model on multiple datasets (VINDR-MAMMO, CMMD, and INBREAST) using the mammography imaging modality. Although they achieved the highest accuracy among all the listed studies, they mainly reported the accuracy metric, which on its own may not be sufficient to establish superiority. The study in [29] used as many as six datasets with EfficientNet and ConvNeXt models on the MMI modality and achieved 92.00% accuracy, precision, and recall and a 97.00% F1 score; however, these scores were not obtained on a combined dataset, as the experiments were performed on each dataset individually. Some authors [21,29,33] have utilized more than two datasets, but they likewise performed their experiments on the individual datasets.
Multiple modalities are utilized in only two studies, [21,31], which achieved 95.2% and 97.7% accuracy, respectively. However, the authors of [21] provided only the accuracy metric, whereas the authors of [31] achieved 97.7% accuracy with the H&E modality but only 91.3% and 68.4% with the MMI and fused modalities, respectively.
In our study, we utilized four different datasets from four different sources and then generated 10,000 high-quality images using StyleGAN3, an approach that few researchers have adopted. We also used ultrasound images, which can be acquired quickly, non-invasively, and almost anywhere in the world. For the detection of breast cancer, we employed the EfficientNet-B7 deep-learning model, which has fewer parameters than many deep-learning models and therefore a lower computational complexity. We evaluated the model with four metrics (accuracy, precision, recall, and F1 score), all consistently above 92%, which demonstrates its effectiveness. Furthermore, five-fold cross-validation with high-quality images improves the generalizability of the results from a real-life perspective. Overall, our proposed technique demonstrates the advantage of synthetically generated high-quality GAN images over conventional augmentation.
Overall, the performance comparison for the second experiment on the combined dataset clearly shows the benefits of using high-quality GAN-augmented images. The significant improvements in accuracy, F1 score, precision, and recall demonstrate the model’s improved ability to correctly categorize breast cancer images. This enhanced performance is particularly evident in the lower numbers of false positives and false negatives, demonstrating the potential of GAN augmentation to improve diagnostic models for breast cancer detection. Comparing the two approaches, the inclusion of the generated data in the training set improved the generalization ability of the model and increased the classification accuracy. This demonstrates the value of generative adversarial networks for generating additional training data, especially when the original medical images are limited. An important aspect is not only the quantitative enlargement of the dataset but also the quality of the generated images, confirmed by the FID metric, indicating a high degree of realism and closeness to the original images.
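As an illustration of how such a quality check can be implemented, the sketch below scores synthetic images against real ones with the Fréchet inception distance (FID) via the torchmetrics library, which is our choice of tooling and not necessarily the implementation used here; the two random uint8 tensors merely stand in for batches of real ultrasound images and StyleGAN3 samples.

```python
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

fid = FrechetInceptionDistance(feature=2048)

# Stand-ins for (N, 3, 512, 512) uint8 batches of real and synthetic images.
real_batch = torch.randint(0, 256, (64, 3, 512, 512), dtype=torch.uint8)
synthetic_batch = torch.randint(0, 256, (64, 3, 512, 512), dtype=torch.uint8)

fid.update(real_batch, real=True)
fid.update(synthetic_batch, real=False)
print(f"FID: {fid.compute().item():.2f}")  # lower values indicate closer distributions
```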
Although our research shows that incorporating synthetically created data into the training set has several benefits, it is important to be aware of the possible drawbacks. One issue is the risk of overfitting, especially if the model begins to rely too heavily on patterns in the synthetic data that may not accurately represent real cases. Another drawback is that artifacts or minor distortions that are not present in the original data may be introduced into the synthetic images. Such artifacts may inadvertently influence the model’s learning process and impair its generalizability. To reduce these concerns, we applied rigorous quality assurance throughout the generation of the synthetic images and used the FID metric to assess their realism and similarity to the original dataset.
Even with these precautions, the use of synthetic data in clinical applications must be approached with caution. To fully evaluate the robustness of our technique, further validation across several imaging modalities and testing on real-time clinical datasets are required. Furthermore, the detection of breast cancer using the proposed methodology has not yet been verified on real-world clinical data, which indicates another area for further research. Future work will address these limitations and ensure that the proposed approach is safe and effective for clinical use.

6. Conclusions

In this work, we performed the detection of breast cancer and demonstrated the impact of synthetically generated GAN images over the conventional augmentation technique. We utilized the ultrasound imaging modality from four publicly available datasets, BrEaST, BUSI, Thammasat, and HMSS, originating from four different sources. A total of 3186 ultrasound images were used, comprising benign (1574) and malignant (1612) pathology types. To demonstrate the impact of synthetically generated images, we used the StyleGAN3 model to generate 10,000 high-quality (512 × 512) images. We then performed two separate experiments with the EfficientNet-B7 deep-learning model: one on the real dataset with conventional augmentation and one on the combined (Real+GAN) dataset. The model was trained using five-fold cross-validation and evaluated using the accuracy, precision, recall, and F1 score metrics. The proposed methodology showed that the EfficientNet-B7 model on the combined dataset achieves values above 92% for all the metrics, and that training on the larger Real+GAN dataset increases the model’s performance by more than 3 percentage points.
In future work, we plan to utilize multimodal datasets to assess the generalizability of the proposed technique across different data types, and to test it on real-time clinical data for practical application.

Author Contributions

Conceptualization, H.M.R. and S.D.; methodology, H.M.R.; software, S.D.; validation, J.Y., H.M.R. and S.D.; resources, S.D.; writing—original draft preparation, H.M.R.; writing—review and editing, J.Y., H.M.R. and S.D.; visualization, S.D. and H.M.R.; supervision, J.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The datasets used in this study are publicly available and can be freely downloaded from the following links: BrEaST-Lesions_USG—https://www.cancerimagingarchive.net/collection/breast-lesions-usg/, accessed on 18 March 2024. Dataset_BUSI_with_GT’—https://scholar.cu.edu.eg/?q=afahmy/pages/dataset, accessed on 18 March 2024. Thammasat—http://www.onlinemedicalimages.com/index.php/en/81-site-info/73-introduction, accessed on 18 March 2024. HMSS—https://www.ultrasoundcases.info/, accessed on 18 March 2024.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Alksas, A.; Shehata, M.; Saleh, G.A.; Shaffie, A.; Soliman, A.; Ghazal, M.; Khelifi, A.; Khalifeh, H.A.; Razek, A.A.; Giridharan, G.A.; et al. A Novel Computer-Aided Diagnostic System for Accurate Detection and Grading of Liver Tumors. Sci. Rep. 2021, 11, 13148. [Google Scholar] [CrossRef]
  2. Sahu, A.; Das, P.K.; Meher, S. An Efficient Deep Learning Scheme to Detect Breast Cancer Using Mammogram and Ultrasound Breast Images. Biomed. Signal Process. Control 2024, 87, 105377. [Google Scholar] [CrossRef]
  3. Wilkinson, L.; Gathani, T. Understanding Breast Cancer as a Global Health Concern. Br. J. Radiol. 2022, 95. [Google Scholar] [CrossRef] [PubMed]
  4. Arnold, M.; Morgan, E.; Rumgay, H.; Mafra, A.; Singh, D.; Laversanne, M.; Vignat, J.; Gralow, J.R.; Cardoso, F.; Siesling, S.; et al. Current and Future Burden of Breast Cancer: Global Statistics for 2020 and 2040. Breast 2022, 66, 15–23. [Google Scholar] [CrossRef] [PubMed]
  5. Xu, Y.; Gong, M.; Wang, Y.; Yang, Y.; Liu, S.; Zeng, Q. Global trends and forecasts of breast cancer incidence and deaths. Sci. Data 2023, 10, 334. [Google Scholar] [CrossRef]
  6. World Health Organization Breast Cancer. Available online: https://www.who.int/news-room/fact-sheets/detail/breast-cancer (accessed on 21 April 2024).
  7. Rai, H.M. Cancer Detection and Segmentation Using Machine Learning and Deep Learning Techniques: A Review. Multimed. Tools Appl. 2023. [Google Scholar] [CrossRef]
  8. Christiansen, S.R.; Autier, P.; Støvring, H. Change in Effectiveness of Mammography Screening with Decreasing Breast Cancer Mortality: A Population-Based Study. Eur. J. Public Health 2022, 32, 630–635. [Google Scholar] [CrossRef]
  9. Rodtook, A.; Kirimasthong, K.; Lohitvisate, W.; Makhanov, S.S. Automatic Initialization of Active Contours and Level Set Method in Ultrasound Images of Breast Abnormalities. Pattern Recognit. 2018, 79, 172–182. [Google Scholar] [CrossRef]
  10. Rai, H.M.; Yoo, J. A Comprehensive Analysis of Recent Advancements in Cancer Detection Using Machine Learning and Deep Learning Models for Improved Diagnostics. J. Cancer Res. Clin. Oncol. 2023, 149, 14365–14408. [Google Scholar] [CrossRef]
  11. Rai, H.M.; Yoo, J.; Dashkevych, S. GAN-SkipNet: A Solution for Data Imbalance in Cardiac Arrhythmia Detection Using Electrocardiogram Signals from a Benchmark Dataset. Mathematics 2024, 12, 2693. [Google Scholar] [CrossRef]
  12. Abdallah, Y.M.Y.; Alqahtani, T. Research in Medical Imaging Using Image Processing Techniques. In Medical Imaging—Principles and Applications; IntechOpen: London, UK, 2019. [Google Scholar]
  13. Nie, D.; Trullo, R.; Lian, J.; Wang, L.; Petitjean, C.; Ruan, S.; Wang, Q.; Shen, D. Medical Image Synthesis with Deep Convolutional Adversarial Networks. IEEE Trans. Biomed. Eng. 2018, 65, 2720–2730. [Google Scholar] [CrossRef] [PubMed]
  14. Sadad, T.; Munir, A.; Saba, T.; Hussain, A. Fuzzy C-Means and Region Growing Based Classification of Tumor from Mammograms Using Hybrid Texture Feature. J. Comput. Sci. 2018, 29, 34–45. [Google Scholar] [CrossRef]
  15. Mughal, B.; Sharif, M.; Muhammad, N.; Saba, T. A Novel Classification Scheme to Decline the Mortality Rate among Women Due to Breast Tumor. Microsc. Res. Tech. 2018, 81, 171–180. [Google Scholar] [CrossRef]
  16. Kavitha, T.; Mathai, P.P.; Karthikeyan, C.; Ashok, M.; Kohar, R.; Avanija, J.; Neelakandan, S. Deep Learning Based Capsule Neural Network Model for Breast Cancer Diagnosis Using Mammogram Images. Interdiscip. Sci. 2022, 14, 113–129. [Google Scholar] [CrossRef]
  17. Vijayarajeswari, R.; Parthasarathy, P.; Vivekanandan, S.; Basha, A.A. Classification of Mammogram for Early Detection of Breast Cancer Using SVM Classifier and Hough Transform. Measurement 2019, 146, 800–805. [Google Scholar] [CrossRef]
  18. Kaur, P.; Singh, G.; Kaur, P. Intellectual Detection and Validation of Automated Mammogram Breast Cancer Images by Multi-Class SVM Using Deep Learning Classification. Inform. Med. Unlocked 2019, 16, 100151. [Google Scholar] [CrossRef]
  19. Haris, U.; Kabeer, V.; Afsal, K. Breast Cancer Segmentation Using Hybrid HHO-CS SVM Optimization Techniques. Multimed. Tools Appl. 2024, 83, 69145–69167. [Google Scholar] [CrossRef]
  20. Valvano, G.; Santini, G.; Martini, N.; Ripoli, A.; Iacconi, C.; Chiappino, D.; Della Latta, D. Convolutional Neural Networks for the Segmentation of Microcalcification in Mammography Imaging. J. Healthc. Eng. 2019, 2019, 9360941. [Google Scholar] [CrossRef]
  21. Mahesh, T.R.; Thakur, A.; Gupta, M.; Sinha, D.K.; Mishra, K.K.; Venkatesan, V.K.; Guluwadi, S. Transformative Breast Cancer Diagnosis Using CNNs with Optimized ReduceLROnPlateau and Early Stopping Enhancements. Int. J. Comput. Intell. Syst. 2024, 17, 14. [Google Scholar] [CrossRef]
  22. Cai, G.; Guo, Y.; Chen, W.; Zeng, H.; Zhou, Y.; Lu, Y. Computer-Aided Detection and Diagnosis of Microcalcification Clusters on Full Field Digital Mammograms Based on Deep Learning Method Using Neutrosophic Boosting. Multimed. Tools Appl. 2020, 79, 17147–17167. [Google Scholar] [CrossRef]
  23. Vaka, A.R.; Soni, B.; Reddy, S. Breast Cancer Detection by Leveraging Machine Learning. ICT Express 2020, 6, 320–324. [Google Scholar] [CrossRef]
  24. Ur Rehman, K.; Li, J.; Pei, Y.; Yasin, A.; Ali, S.; Mahmood, T. Computer Vision-Based Microcalcification Detection in Digital Mammograms Using Fully Connected Depthwise Separable Convolutional Neural Network. Sensors 2021, 21, 4854. [Google Scholar] [CrossRef] [PubMed]
  25. Ragab, M.; Albukhari, A.; Alyami, J.; Mansour, R.F. Ensemble Deep-Learning-Enabled Clinical Decision Support System for Breast Cancer Diagnosis and Classification on Ultrasound Images. Biology 2022, 11, 439. [Google Scholar] [CrossRef] [PubMed]
  26. Sheeba, A.; Santhosh Kumar, P.; Ramamoorthy, M.; Sasikala, S. Microscopic Image Analysis in Breast Cancer Detection Using Ensemble Deep Learning Architectures Integrated with Web of Things. Biomed. Signal Process. Control 2023, 79, 104048. [Google Scholar] [CrossRef]
  27. Yan, F.; Huang, H.; Pedrycz, W.; Hirota, K. Automated Breast Cancer Detection in Mammography Using Ensemble Classifier and Feature Weighting Algorithms. Expert Syst. Appl. 2023, 227, 120282. [Google Scholar] [CrossRef]
  28. Asadi, B.; Memon, Q. Efficient Breast Cancer Detection via Cascade Deep Learning Network. Int. J. Intell. Netw. 2023, 4, 46–52. [Google Scholar] [CrossRef]
  29. Huynh, H.N.; Tran, A.T.; Tran, T.N. Region-of-Interest Optimization for Deep-Learning-Based Breast Cancer Detection in Mammograms. Appl. Sci. 2023, 13, 6894. [Google Scholar] [CrossRef]
  30. Bouzar-benlabiod, L.; Harrar, K.; Yamoun, L.; Yacine, M. A Novel Breast Cancer Detection Architecture Based on a CNN-CBR System for Mammogram Classification. Comput. Biol. Med. 2023, 163, 107133. [Google Scholar] [CrossRef]
  31. Oyelade, O.N.; Irunokhai, E.A.; Wang, H. A Twin Convolutional Neural Network with Hybrid Binary Optimizer for Multimodal Breast Cancer Digital Image Classification. Sci. Rep. 2024, 14, 692. [Google Scholar] [CrossRef]
  32. Kadadevarmath, J.; Reddy, A.P. Improved Watershed Segmentation and DualNet Deep Learning Classifiers for Breast Cancer Classification. SN Comput. Sci. 2024, 5, 458. [Google Scholar] [CrossRef]
  33. AlSalman, H.; Al-Rakhami, M.S.; Alfakih, T.; Hassan, M.M. Federated Learning Approach for Breast Cancer Detection Based on DCNN. IEEE Access 2024, 12, 40114–40138. [Google Scholar] [CrossRef]
  34. Pawłowska, A.; Ćwierz-Pieńkowska, A.; Domalik, A.; Jaguś, D.; Kasprzak, P.; Matkowski, R.; Fura, Ł.; Nowicki, A.; Żołek, N. Curated Benchmark Dataset for Ultrasound Based Breast Lesion Analysis. Sci. Data 2024, 11, 148. [Google Scholar] [CrossRef] [PubMed]
  35. Al-Dhabyani, W.; Gomaa, M.; Khaled, H.; Fahmy, A. Dataset of Breast Ultrasound Images. Data Brief 2020, 28, 104863. [Google Scholar] [CrossRef] [PubMed]
  36. HMSS Dataset. Available online: https://www.ultrasoundcases.info/ (accessed on 5 September 2023).
  37. Jha, K.; Pasbola, M.; Rai, H.M.; Amanzholova, S. Utilizing Smartwatches and Deep Learning Models for Enhanced Avalanche Victim Identification, Localization, and Efficient Recovery Strategies: An In-Depth Study. In Proceedings of the 5th International Conference on Information Management & Machine Intelligence, Jaipur, India, 23–25 November 2023; ACM: New York, NY, USA, 2023; pp. 1–5. [Google Scholar]
  38. Moqurrab, S.A.; Rai, H.M.; Yoo, J. HRIDM: Hybrid Residual/Inception-Based Deeper Model for Arrhythmia Detection from Large Sets of 12-Lead ECG Recordings. Algorithms 2024, 17, 364. [Google Scholar] [CrossRef]
  39. Vo, D.M.; Nguyen, N.Q.; Lee, S.W. Classification of Breast Cancer Histology Images Using Incremental Boosting Convolution Networks. Inf. Sci. (N.Y.) 2019, 482, 123–138. [Google Scholar] [CrossRef]
  40. Nasir, M.U.; Ghazal, T.M.; Khan, M.A.; Zubair, M.; Rahman, A.U.; Ahmed, R.; Hamadi, H.A.; Yeun, C.Y. Breast Cancer Prediction Empowered with Fine-Tuning. Comput. Intell. Neurosci. 2022, 2022, 5918686. [Google Scholar] [CrossRef]
  41. Arooj, S.; Atta-ur-Rahman; Zubair, M.; Khan, M.F.; Alissa, K.; Khan, M.A.; Mosavi, A. Breast Cancer Detection and Classification Empowered With Transfer Learning. Front. Public Health 2022, 10, 1–18. [Google Scholar] [CrossRef]
  42. Arooj, S.; Khan, M.F.; Shahzad, T.; Khan, M.A.; Nasir, M.U.; Zubair, M.; Ouahada, K. Data Fusion Architecture Empowered with Deep Learning for Breast Cancer Classification. Comput. Mater. Contin. 2023, 77, 2813–2831. [Google Scholar] [CrossRef]
  43. Melnik, A.; Miasayedzenkau, M.; Makarovets, D.; Pirshtuk, D.; Akbulut, E.; Holzmann, D.; Renusch, T.; Reichert, G.; Ritter, H. Face Generation and Editing with StyleGAN: A Survey. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 46, 3557–3576. [Google Scholar] [CrossRef]
  44. Alibani, M.; Acito, N.; Corsini, G. Multispectral Satellite Image Generation Using StyleGAN3. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 4379–4391. [Google Scholar] [CrossRef]
  45. Ogundokun, R.O.; Li, A.; Babatunde, R.S.; Umezuruike, C.; Sadiku, P.O.; Abdulahi, A.R.T.; Babatunde, A.N. Enhancing Skin Cancer Detection and Classification in Dermoscopic Images through Concatenated MobileNetV2 and Xception Models. Bioengineering 2023, 10, 979. [Google Scholar] [CrossRef] [PubMed]
  46. Goyal, Y.; Raj; Rai, H.M.; Aggarwal, M.; Saxena, K.; Amanzholova, S. Revolutionizing Skin Cancer Detection: A Comprehensive Review of Deep Learning Methods. In Proceedings of the 5th International Conference on Information Management & Machine Intelligence, Jaipur, India, 23–25 November 2023; ACM: New York, NY, USA, 2023; pp. 1–6. [Google Scholar]
  47. Rai, H.M.; Yoo, J.; Dashkevych, S. Two-Headed UNetEfficientNets for Parallel Execution of Segmentation and Classification of Brain Tumors: Incorporating Postprocessing Techniques with Connected Component Labelling. J. Cancer Res. Clin. Oncol. 2024, 150, 220. [Google Scholar] [CrossRef] [PubMed]
  48. Khan, M.B.S.; Atta-Ur-Rahman; Nawaz, M.S.; Ahmed, R.; Khan, M.A.; Mosavi, A. Intelligent Breast Cancer Diagnostic System Empowered by Deep Extreme Gradient Descent Optimization. Math. Biosci. Eng. 2022, 19, 7978–8002. [Google Scholar] [CrossRef] [PubMed]
  49. Kumar, V.; Prabha, C.; Sharma, P.; Mittal, N.; Askar, S.S.; Abouhawwash, M. Unified Deep Learning Models for Enhanced Lung Cancer Prediction with ResNet-50–101 and EfficientNet-B3 Using DICOM Images. BMC Med. Imaging 2024, 24, 63. [Google Scholar] [CrossRef]
  50. Tan, M.; Le, Q.V. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In Proceedings of the 36th International Conference on Machine Learning, ICML, Long Beach, CA, USA, 9–15 June 2019; pp. 10691–10700. [Google Scholar]
  51. Ye, X.; Huang, Y.; Lu, Q. Automatic Multichannel Electrocardiogram Record Classification Using XGBoost Fusion Model. Front. Physiol. 2022, 13, 840011. [Google Scholar] [CrossRef]
  52. Baumgartner, M.; Veeranki, S.P.K.; Hayn, D.; Schreier, G. Introduction and Comparison of Novel Decentral Learning Schemes with Multiple Data Pools for Privacy-Preserving ECG Classification. J. Healthc. Inform. Res. 2023, 7, 291–312. [Google Scholar] [CrossRef]
  53. Hekal, A.A.; Elnakib, A.; Moustafa, H.E.D. Automated Early Breast Cancer Detection and Classification System. Signal Image Video Process 2021, 15, 1497–1505. [Google Scholar] [CrossRef]
Figure 1. Block diagram of the proposed methodology for the detection of breast cancer.
Figure 3. The visualization of various augmentation techniques applied to the ultrasound images.
Figure 4. The architecture of StyleGAN3 [43].
Figure 5. Random samples of synthetically generated images using StyleGAN3.
Figure 6. The layer-wise architectural illustration of EfficientNet-B7.
Figure 7. EfficientNet-B7 building blocks: architectural view of the (a) SE block, (b) MBConv1, (c) MBConv6 (3 × 3), and (d) MBConv6 (5 × 5) [47].
Figure 8. Fold 1 training and validation performance (loss and accuracy) on a real dataset.
Figure 9. Fold 2 training and validation performance (loss and accuracy) on a real dataset.
Figure 10. Fold 3 training and validation performance (loss and accuracy) on a real dataset.
Figure 11. Fold 4 training and validation performance (loss and accuracy) on a real dataset.
Figure 12. Fold 5 training and validation performance (loss and accuracy) on a real dataset.
Figure 13. Confusion matrix for breast cancer detection in the first experiment (real dataset).
Figure 14. Distribution of benign and malignant images across datasets post GAN augmentation.
Figure 15. Fold 1 training and validation performance (loss and accuracy) on a combined dataset.
Figure 16. Fold 2 training and validation performance (loss and accuracy) on a combined dataset.
Figure 17. Fold 3 training and validation performance (loss and accuracy) on a combined dataset.
Figure 18. Fold 4 training and validation performance (loss and accuracy) on a combined dataset.
Figure 19. Fold 5 training and validation performance (loss and accuracy) on a combined dataset.
Figure 20. Confusion matrix for breast cancer detection in the second experiment (Real+GAN dataset).
Figure 21. Accuracy metric comparison of both experiments across each fold.
Figure 22. Precision metric comparison of both experiments across each fold.
Figure 23. Recall metric comparison of both experiments across each fold.
Figure 24. F1-score metric comparison of both experiments across each fold.
Figure 25. Aggregated values comparison of both experiments across each fold.
Table 1. Various augmentation techniques utilized on the ultrasound images.

Augmentation Technique | Description | Probability
Horizontal Flipping | Images are flipped horizontally | 0.5
Vertical Flipping | Images are flipped vertically | 0.5
Random Shifts, Scaling, and Rotations | Random shifts, scaling, and rotations applied to images | 0.5
Perspective Changes | Perspective changes applied to images | 0.5
Addition of Gaussian Noise | Gaussian noise added to images | 0.7
Cutout Effect | Randomly masks out sections of the image | 0.5
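One possible way to express the augmentation policy of Table 1 in code is shown below, assuming the Albumentations library; only the probabilities are taken from the table, while the transform names and magnitude limits are illustrative choices.

```python
import albumentations as A

train_transform = A.Compose([
    A.HorizontalFlip(p=0.5),                                  # horizontal flipping
    A.VerticalFlip(p=0.5),                                    # vertical flipping
    A.ShiftScaleRotate(shift_limit=0.0625, scale_limit=0.1,
                       rotate_limit=15, p=0.5),               # random shifts, scaling, rotations
    A.Perspective(scale=(0.05, 0.1), p=0.5),                  # perspective changes
    A.GaussNoise(p=0.7),                                      # addition of Gaussian noise
    A.CoarseDropout(max_holes=8, max_height=32,
                    max_width=32, p=0.5),                     # cutout effect
])

# Usage: augmented = train_transform(image=ultrasound_image)["image"]
```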
Table 2. Performance metrics of breast cancer detection on the test dataset of the first experiment.

Metric | Fold 0 | Fold 1 | Fold 2 | Fold 3 | Fold 4 | Aggregated
Accuracy | 0.8558 | 0.8621 | 0.8543 | 0.8668 | 0.8746 | 0.8871
F1 Score | 0.8557 | 0.8621 | 0.8543 | 0.8667 | 0.8745 | 0.8872
Precision | 0.856 | 0.8621 | 0.8569 | 0.8718 | 0.8748 | 0.8879
Recall | 0.8558 | 0.8621 | 0.8542 | 0.8668 | 0.8746 | 0.8871
Table 3. Performance metrics of breast cancer detection on the test dataset in the second experiment.

Metric | Fold 0 | Fold 1 | Fold 2 | Fold 3 | Fold 4 | Aggregated
Accuracy | 0.8934 | 0.9044 | 0.895 | 0.8746 | 0.8715 | 0.9201
F1 Score | 0.8934 | 0.9044 | 0.895 | 0.8746 | 0.8715 | 0.9201
Precision | 0.8935 | 0.9061 | 0.8952 | 0.8748 | 0.8748 | 0.9207
Recall | 0.8934 | 0.9044 | 0.895 | 0.8746 | 0.8715 | 0.9201
Table 4. T-test results and corresponding hypothesis for the metrics.

Metric | Real+GAN Dataset | Real Dataset | t-Value | p-Value | Hypothesis Result
Accuracy | 92.01 | 88.71 | 6.4 | <0.0001 | Reject H0; Significant
F1 Score | 92.01 | 88.72 | 6.38 | <0.0001 | Reject H0; Significant
Precision | 92.07 | 88.79 | 6.36 | <0.0001 | Reject H0; Significant
Recall | 92.01 | 88.71 | 6.4 | <0.0001 | Reject H0; Significant
Table 5. Wilcoxon signed-rank test results with the corresponding hypothesis for the metrics.

Metric | Real Dataset | Real+GAN Dataset | Absolute Difference | Rank | Wilcoxon Statistic | p-Value | Hypothesis Result
Accuracy | 88.71 | 92.01 | 3.3 | 3.5 | 0 | <0.0001 | Reject H0; Significant
F1 Score | 88.72 | 92.01 | 3.29 | 2 | 0 | <0.0001 | Reject H0; Significant
Precision | 88.79 | 92.07 | 3.28 | 1 | 0 | <0.0001 | Reject H0; Significant
Recall | 88.71 | 92.01 | 3.3 | 3.5 | 0 | <0.0001 | Reject H0; Significant
Table 6. State-of-the-art comparison of breast cancer detection techniques on imaging datasets 1.

Author/Year | Literature | Technique | Dataset | Modality | Metrics (%)
(Sadad et al., 2018) | [14] | Decision Tree | MIAS, DDSM | MMI | Acc: 98.20; Sen: 100.00; Spe: 97.00
(Mughal et al., 2018) | [15] | BPNN | MIAS, DDSM | MMI | Acc: 98.50; Sen: 100; Spe: 95.0
(Vijayarajeswari et al., 2019) | [17] | SVM | MIAS | MMI | Acc: 94.00
(Kaur et al., 2019) | [18] | SVM, LDA | MIAS | MMI | Acc: 94.00
(Cai et al., 2020) | [22] | DCNN | Local, INbreast | MMI | 93.7
(Vaka et al., 2020) | [23] | DNNS | Private Hospital | H&E | Acc: 97.21; Pre: 97.90; Rec: 97.01
(Ur Rehman et al., 2021) | [24] | FC-DSCNN | DDSM, PINUM | MMI | Acc: 90.00; Sen: 99.00; Spe: 82.00; F1: 85.00; Pre: 89.00; Rec: 82.00
(Hekal et al., 2021) | [53] | SVM | CBIS-DDSM | MMI | Acc: 94.00
(Ragab et al., 2022) | [25] | CSO-MLP | BUSI | US | Acc: 97.09; Sen: 95.54; Spe: 97.65; Pre: 94.76
(Kavitha et al., 2022) | [16] | BPNN | Mini-MIAS, DDSM | MMI | Acc: 98.50; Sen: 98.46; Spe: 99.08; F1: 98.91
(Sheeba et al., 2023) | [26] | TCL-RAM | Bisque, Break His | H&E | Acc: 97.00; Sen: 93.00; Spe: 94.00
(Yan et al., 2023) | [27] | Ensemble Classifier | DDSM, MIAS | MMI | Acc: 93.26; Pre: 90.40; Rec: 82.31; F1: 89.89; Spe: 93.20
(Bouzar-Benlabiod et al., 2023) | [30] | SE-ResNet-101 + CBR | CBIS-DDSM | MMI | Acc: 86.71; IoU: 64.00; F1: 75.00; Rec: 76.00; Pre: 81.00
(Huynh et al., 2023) | [29] | EfficientNet, ConvNeXt | Six Datasets | MMI | Acc: 92.00; Pre: 92.00; Rec: 92.00; F1: 97.00
(Kadadevarmath & Reddy, 2024) | [32] | DualNet-DL model | CBIS-DDSM, MIAS | MMI | Acc: 94.29; Pre: 98.32; Sen: 94.74; Spe: 94.74; F1: 95.79
(AlSalman et al., 2024) | [33] | Federated + DCNN | VINDR-MAMMO, CMMD, INBREAST | MMI | Acc: 98.90
(Oyelade et al., 2024) | [31] | TwinCNN | MIAS and BreakHis | H&E, MMI | Acc: 97.7 (H&E); Acc: 91.3 (MMI); Acc: 68.4 (Fused)
(Mahesh et al., 2024) | [21] | CNN | CBIS-DDSM, Mini-DDSM, Breast Histopathology Images | MMI, H&E | Acc: 95.2
Proposed | - | EfficientNet-B7 | BrEaST, BUSI, Thammasat, HMSS | Ultrasound | Acc: 92.01; Pre: 92.07; Rec: 92.01; F1: 92.01

1 Abbreviations: BPNN, backpropagation neural network; BUSI, breast ultrasound image dataset; CSO-MLP, cat swarm optimization with the multilayer perceptron; DCNN, deep convolutional neural network; DDSM, digital database for screening mammography; DNNS, deep neural network with support value; FC-DSCNN, fully connected depthwise-separable convolutional neural network; H&E, hematoxylin and eosin (histopathology); LDA, linear discriminant analysis; MIAS, Mammographic Image Analysis Society; MMI, mammography imaging; SVM, support vector machine; TCL-RAM, transfer learning integrated with regional attention mechanism; US, ultrasound.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
