1. Introduction
Breast cancer is one of the most common cancers among women and a primary contributor to cancer-related deaths worldwide. Early diagnosis of breast cancer can improve patients' quality of life and increase their survival rate, while also reducing the mortality rate of affected patients [1]. Ultrasonography is commonly employed in breast cancer diagnosis owing to its convenience, painless operation, and efficient real-time performance [2]. However, ultrasonic instruments are highly sensitive to the tissue environment inside the human body, which produces a large amount of speckle noise that interferes with doctors' diagnoses [3]. At present, ultrasound is the preferred modality for diagnosing breast cancer on the basis of medical expertise; specifically, it is used to classify and mark breast lesions. A typical ultrasound examination proceeds as follows: the doctor moves the ultrasound probe to find an angle that shows the lesion clearly on the screen, then holds the probe fixed with one hand for a long period while using the other hand to measure and mark the lesion on the screen [4,5]. In such a procedure, automatic tracking of the region of interest (the lesion) and automatic classification (malignant or benign) are in high demand for breast lesion detection in ultrasound images (USIs).
Computer-Aided Diagnosis (CAD) systems are widely employed to classify and detect tumors in breast USIs, and such systems are strongly recommended to radiotherapists for recognizing breast tumors and assessing disease prognosis. In the literature, statistical methods [6] have mainly been used to analyze extracted features such as posterior acoustic attenuation, lesion shape, margin, and homogeneity. However, recognizing lesion margins and shapes in USIs is complex [7]. In addition, Machine Learning (ML) methods have been widely used to analyze and classify tumors on the basis of handcrafted texture and morphological features [8]. Feature extraction, however, still relies largely on medical expertise. The difficulty of designing handcrafted features motivated the development of Deep Learning (DL) algorithms, which learn features automatically from data and are particularly effective at extracting nonlinear features. DL models are therefore promising candidates for classifying USIs, where patterns cannot easily be hand-engineered [9]. Several studies using the DL approach leverage pretrained Convolutional Neural Networks (CNNs) to categorize tumors in breast USIs [10].
In the current study, an Ensemble Deep-Learning-Enabled Clinical Decision Support System for Breast Cancer Diagnosis and Classification (EDLCDS-BCDC) technique was developed using USIs. The proposed EDLCDS-BCDC technique uses a Chaotic Krill Herd Algorithm (CKHA) with Kapur's Entropy (KE) for image segmentation. Moreover, an ensemble of three deep learning models, namely VGG-16, VGG-19, and SqueezeNet, is used for feature extraction. Finally, Cat Swarm Optimization (CSO) with a Multilayer Perceptron (MLP) model is used to classify whether or not breast cancer is present in the images. Extensive experiments were conducted on a benchmark database, and the results of the EDLCDS-BCDC technique were examined under several performance measures.
  2. Related Works
Badawy et al. [11] proposed a system combining Deep Learning (DL) and Fuzzy Logic (FL) for the automated Semantic Segmentation (SS) of tumors in Breast Ultrasound (BUS) images. The system comprises two stages, namely FL-based preprocessing and CNN-based SS, and a total of eight common CNN-based SS methods were employed in this work. Almajalid et al. [12] designed a DL-based segmentation framework for BUS images built on the U-net architecture. U-net is a CNN architecture that was developed for the segmentation of biomedical images with limited training data. Yousef Kalaf et al. [13] presented an architecture for breast cancer classification that adds an attention mechanism to an adapted VGG16 framework. The attention module distinguishes between the features of the background and those of the targeted lesions in ultrasound images. In addition, an ensemble loss function was presented that combines the logarithm of the hyperbolic cosine (log-cosh) loss and binary cross-entropy in order to better capture the discrepancy between the labels and the lesion classification.
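As an illustration of the loss ensemble mentioned above, the two terms can be sketched in NumPy for binary labels and predicted probabilities. The combination in [13] is not specified here, so the equally weighted average and the `weight` parameter are hypothetical choices for this sketch.

```python
import numpy as np

def log_cosh_loss(y_true, y_prob):
    """Mean logarithm of the hyperbolic cosine of the prediction error."""
    err = np.asarray(y_prob, float) - np.asarray(y_true, float)
    return float(np.mean(np.log(np.cosh(err))))

def bce_loss(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    y = np.asarray(y_true, float)
    p = np.clip(np.asarray(y_prob, float), eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

def ensemble_loss(y_true, y_prob, weight=0.5):
    """Hypothetical weighted combination of the log-cosh and BCE losses."""
    return weight * log_cosh_loss(y_true, y_prob) + (1.0 - weight) * bce_loss(y_true, y_prob)
```

The log-cosh term behaves like a squared error for small errors and like an absolute error for large ones, which complements the probabilistic penalty of the cross-entropy term.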
Cao et al. [14] conducted a systematic evaluation of the efficiency of several current advanced object classification and detection approaches for breast lesion CAD. They then evaluated different DL frameworks and conducted a comprehensive study on a newly collected dataset. Tanaka et al. [15] designed a CAD scheme to classify benign and malignant tumors in ultrasonography using CNNs. In this study, an ensemble network was created by integrating two CNN architectures (VGG19 and ResNet152). The networks were fine-tuned on balanced training data obtained by data augmentation, a common method to synthetically generate new samples from the original ones. A mass-level classification method then enabled the CNN to classify each mass using all of its views.
Qi et al. [16] developed an automatic breast cancer diagnosis system to increase diagnostic accuracy. The scheme, which can be installed on smartphones, takes a picture of the ultrasound report as input and performs diagnoses on all the images. The method comprises three subsystems. The first subsystem reduces the noise in the captured images and reconstructs high-quality images; it is designed on the basis of a stacked Denoising Autoencoder (DAE) and a Generative Adversarial Network (GAN). The second subsystem classifies each image as malignant or non-malignant, using a Deep Convolutional Neural Network (DCNN) to extract high-level features from the image. Finally, the third subsystem detects anomalies in the system performance, which further reduces the False-Negative Rate (FNR).
  3. The Proposed Model
The current study developed a novel EDLCDS-BCDC technique to identify the presence of breast cancer using USIs. In this technique, the USIs are first pre-processed in two stages, namely noise elimination and contrast enhancement. Subsequently, CKHA-KE-based image segmentation and ensemble DL-based feature extraction are performed. Finally, the CSO-MLP model is used to classify whether or not breast cancer is present in the images.
Figure 1 illustrates the overall process of the EDLCDS-BCDC technique.
  3.1. Pre-Processing
In this first stage, the USIs are pre-processed by removing noise with the Wiener Filtering (WF) technique. Noise removal is an image pre-processing approach in which the features of an image corrupted by noise are enhanced. The adaptive filter is a particular case in which the denoising process depends entirely on the noise content locally present in the image. Let the corrupted image be denoted by $g(x, y)$, the noise variance over the whole image by $\sigma_\eta^2$, the local mean over a pixel window by $m_L$, and the local variance within the window by $\sigma_L^2$. The image can then be denoised as follows [17]:

$$\hat{f}(x, y) = g(x, y) - \frac{\sigma_\eta^2}{\sigma_L^2}\left[g(x, y) - m_L\right]$$

If the noise variance across the image is zero, $\sigma_\eta^2 = 0$, the filter returns the input unchanged, $\hat{f}(x, y) = g(x, y)$. If the global noise variance is small and the local variance is much greater than it, $\sigma_\eta^2 \ll \sigma_L^2$, the ratio $\sigma_\eta^2 / \sigma_L^2$ is close to zero and $\hat{f}(x, y) \approx g(x, y)$. A high local variance indicates the presence of an edge in the image window, so edges are preserved. Finally, when the local and global variances are equal, $\sigma_L^2 = \sigma_\eta^2$, the formula reduces to $\hat{f}(x, y) = m_L$, i.e., the average intensity of a homogeneous region.
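The adaptive filtering rule above can be sketched in NumPy as follows. The odd square window size, the option of estimating $\sigma_\eta^2$ as the mean of the local variances (the default behaviour documented for SciPy's `scipy.signal.wiener`), and capping the ratio at 1 so that the output never moves past the local mean are assumptions of this sketch rather than details given in the text.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def local_mean_var(img, size=3):
    """Local mean m_L and variance sigma_L^2 over an odd size x size window (reflect-padded)."""
    pad = size // 2
    windows = sliding_window_view(np.pad(img, pad, mode="reflect"), (size, size))
    return windows.mean(axis=(-2, -1)), windows.var(axis=(-2, -1))

def adaptive_wiener(g, noise_var=None, size=3):
    """f_hat = g - (sigma_eta^2 / sigma_L^2) * (g - m_L), with the ratio capped at 1."""
    g = np.asarray(g, dtype=float)
    m_L, var_L = local_mean_var(g, size)
    if noise_var is None:
        noise_var = var_L.mean()  # assumption: estimate the global noise variance from the image
    ratio = np.where(var_L > 0, noise_var / np.maximum(var_L, 1e-12), 1.0)
    ratio = np.minimum(ratio, 1.0)  # sigma_L^2 <= sigma_eta^2 -> output the local mean m_L
    return g - ratio * (g - m_L)
```

With `noise_var=0` the image is returned unchanged, and in flat regions the output equals the local mean, matching the three cases discussed above.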
Furthermore, the contrast is improved with the help of the Contrast Limited Adaptive Histogram Equalization (CLAHE) technique [18]. CLAHE is an extension of adaptive histogram equalization in which the contrast amplification is limited so as to minimize noise amplification. In CLAHE, the contrast amplification in the neighborhood of a given pixel value is given by the slope of the transformation function, and this slope is limited by clipping the histogram at a predefined value. CLAHE operates on small regions of the image, called 'tiles', instead of the whole image, and neighboring tiles are combined using bilinear interpolation to eliminate artificial boundaries. In this way, the contrast level of the image is increased.
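The contrast-limiting step that distinguishes CLAHE from plain adaptive histogram equalization can be sketched for a single 8-bit tile as follows. This is a simplified sketch: the clipped excess is redistributed in a single pass, and the tiling and bilinear interpolation between neighboring tiles are omitted; OpenCV's `cv2.createCLAHE` provides a complete implementation.

```python
import numpy as np

def clipped_lut(tile, clip_limit=0.01, n_bins=256):
    """Gray-level mapping for one uint8 tile built from a clipped (contrast-limited) histogram."""
    hist = np.bincount(tile.ravel(), minlength=n_bins).astype(float)
    limit = max(clip_limit * tile.size, 1.0)          # maximum count allowed per bin
    excess = np.maximum(hist - limit, 0.0).sum()       # counts cut off above the limit
    hist = np.minimum(hist, limit) + excess / n_bins   # redistribute them uniformly
    cdf = np.cumsum(hist) / hist.sum()
    return np.round(cdf * (n_bins - 1)).astype(np.uint8)

def equalize_tile(tile, clip_limit=0.01):
    """Apply the contrast-limited mapping to a uint8 tile."""
    tile = np.asarray(tile, dtype=np.uint8)
    return clipped_lut(tile, clip_limit)[tile]
```

A low clip limit stretches a narrow intensity range only moderately, whereas a clip limit of 1.0 (no clipping) reduces to ordinary histogram equalization of the tile.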
  3.2. CKHA-KE Based Image Segmentation
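Next, the infected lesion areas are segmented with the help of the CKHA-KE technique. The KE technique is applied to determine the optimal threshold value $t$. In general, $t$ takes values between 1 and 255 (for 8-bit images) and splits an image into the classes $C_0 = \{0, \ldots, t-1\}$ and $C_1 = \{t, \ldots, L-1\}$ so as to maximize the following function [19]:

$$f(t) = H_0 + H_1 \tag{1}$$

$$H_0 = -\sum_{i=0}^{t-1} \frac{p_i}{\omega_0} \ln \frac{p_i}{\omega_0}, \qquad \omega_0 = \sum_{i=0}^{t-1} p_i$$

$$H_1 = -\sum_{i=t}^{L-1} \frac{p_i}{\omega_1} \ln \frac{p_i}{\omega_1}, \qquad \omega_1 = \sum_{i=t}^{L-1} p_i, \qquad p_i = \frac{h_i}{N}$$

Here, $h_i$ represents the number of pixels with gray value $i$, and $N$ denotes the total number of pixels in the image. Equation (1) is easily extended to find multiple threshold values that separate the image into homogeneous regions. Consider a gray image with intensity values within $[0, L-1]$ ($L = 256$ for 8-bit images); the algorithm then searches for the $k$ optimal threshold values $t_1 < t_2 < \cdots < t_k$ that subdivide the image into the classes $C_0, C_1, \ldots, C_k$ so as to maximize the following function:

$$f(t_1, \ldots, t_k) = \sum_{j=0}^{k} H_j, \qquad H_j = -\sum_{i=t_j}^{t_{j+1}-1} \frac{p_i}{\omega_j} \ln \frac{p_i}{\omega_j}, \qquad \omega_j = \sum_{i=t_j}^{t_{j+1}-1} p_i$$

with $t_0 = 0$ and $t_{k+1} = L$.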
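Equation (1) and its multi-threshold form can be evaluated directly from a gray-level histogram, as sketched below. The exhaustive search is included only as a reference baseline for small histograms; in the proposed technique, CKHA performs this search instead. Treating an empty class as contributing zero entropy is an assumption of the sketch.

```python
import numpy as np
from itertools import combinations

def kapur_objective(hist, thresholds):
    """Sum of class entropies H_0 + ... + H_k for thresholds t_1 < ... < t_k."""
    p = np.asarray(hist, float) / np.sum(hist)        # p_i = h_i / N
    edges = [0, *sorted(thresholds), len(p)]           # t_0 = 0 and t_{k+1} = L
    total = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        w = p[lo:hi].sum()                             # class probability omega_j
        if w <= 0:
            continue                                   # empty class: zero entropy (assumption)
        q = p[lo:hi][p[lo:hi] > 0] / w
        total -= float(np.sum(q * np.log(q)))
    return total

def exhaustive_kapur(hist, k=1):
    """Brute-force reference: best k thresholds in 1..L-1 (CKHA replaces this search)."""
    candidates = combinations(range(1, len(hist)), k)
    return list(max(candidates, key=lambda t: kapur_objective(hist, t)))
```

For a 256-bin histogram the single-threshold search is cheap, but the number of combinations grows rapidly with $k$, which is what motivates a meta-heuristic search.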
CKHA is then used to search for the optimal KE threshold values.
KHA [20] is a meta-heuristic optimization method, inspired by the herding behavior of krill swarms, that is used to solve optimization problems. In KHA, the position of each krill individual is mainly affected by three activities, namely:
- i. Motion induced by other krill individuals;
- ii. Foraging activity;
- iii. Physical diffusion.
In KHA, the following Lagrangian model is used in the $n$-dimensional search space, as given in Equation (10):

$$\frac{dX_i}{dt} = N_i + F_i + D_i \tag{10}$$

where $N_i$ is the motion induced by other krill individuals, $F_i$ is the foraging motion, and $D_i$ is the random physical diffusion of the $i$-th krill individual.
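For the first motion, the direction $\alpha_i$ is determined by the target, local, and repulsive effects. It is formulated as follows:

$$N_i^{\text{new}} = N^{\max}\,\alpha_i + \omega_n N_i^{\text{old}}, \qquad \alpha_i = \alpha_i^{\text{local}} + \alpha_i^{\text{target}}$$

where $N^{\max}$, $\omega_n$, and $N_i^{\text{old}}$ denote the maximum induced speed, the inertia weight, and the last induced motion, respectively.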
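The second motion is computed from two components, namely the food location and the krill's previous experience. For the $i$-th krill, it is formulated as follows:

$$F_i = V_f\,\beta_i + \omega_f F_i^{\text{old}}, \qquad \beta_i = \beta_i^{\text{food}} + \beta_i^{\text{best}}$$

where $V_f$ refers to the foraging speed, $\omega_f$ defines the inertia weight, and $F_i^{\text{old}}$ represents the last foraging motion.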
The third motion is essentially a random process. It is calculated from the maximum diffusion speed and a random directional vector:

$$D_i = D^{\max}\,\delta$$

where $D^{\max}$ denotes the maximum diffusion speed and $\delta$ indicates the random directional vector, whose entries are random numbers in $[-1, 1]$. Finally, the position of the krill from $t$ to $t + \Delta t$ is expressed as follows:

$$X_i(t + \Delta t) = X_i(t) + \Delta t\,\frac{dX_i}{dt}$$
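The motion model above (Equation (10) with the induced, foraging, and diffusion terms) can be sketched as a small NumPy minimizer. This is a simplified krill herd, not the full KHA: the induced direction $\alpha_i$ keeps only the target (best-krill) effect, the foraging direction $\beta_i$ keeps only the food-centre effect, and the coefficient schedules, the inverse-fitness food centre (which assumes a non-negative objective), and all default parameter values are assumptions for illustration.

```python
import numpy as np

def krill_herd(f, lb, ub, n_krill=20, iters=100, n_max=0.01, v_f=0.02,
               d_max=0.005, w_n=0.5, w_f=0.5, c_t=0.5, seed=0):
    """Simplified krill herd search that minimises f over the box [lb, ub]."""
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    X = rng.uniform(lb, ub, size=(n_krill, lb.size))   # krill positions X_i
    N = np.zeros_like(X)                               # induced motion N_i
    F = np.zeros_like(X)                               # foraging motion F_i
    dt = c_t * np.sum(ub - lb)                         # time step Delta t
    best_x, best_f, eps = X[0].copy(), np.inf, 1e-12

    def attraction(X, K, target, k_target, span):
        # Normalised fitness difference times the unit vector pointing at the target
        diff = target - X
        unit = diff / (np.linalg.norm(diff, axis=1, keepdims=True) + eps)
        k_hat = np.clip((K - k_target) / span, -1.0, 1.0)
        return k_hat[:, None] * unit

    for it in range(iters):
        K = np.array([f(x) for x in X])
        if K.min() < best_f:
            best_f, best_x = K.min(), X[K.argmin()].copy()
        span = K.max() - best_f + eps
        frac = it / iters
        # alpha_i: target effect only (local-neighbour effect omitted in this sketch)
        alpha = 2.0 * (rng.random() + frac) * attraction(X, K, best_x, best_f, span)
        # beta_i: food centre as the inverse-fitness-weighted centroid (assumes f >= 0)
        w = 1.0 / (K + eps)
        x_food = (w[:, None] * X).sum(axis=0) / w.sum()
        beta = 2.0 * (1.0 - frac) * attraction(X, K, x_food, f(x_food), span)
        N = n_max * alpha + w_n * N                    # N_i^new = N^max alpha_i + w_n N_i^old
        F = v_f * beta + w_f * F                       # F_i = V_f beta_i + w_f F_i^old
        D = d_max * rng.uniform(-1.0, 1.0, size=X.shape)  # D_i = D^max delta
        X = np.clip(X + dt * (N + F + D), lb, ub)      # X_i(t + dt) = X_i(t) + dt dX_i/dt
    return best_x, float(best_f)
```

For threshold selection, `f` could be the negative Kapur objective evaluated at rounded thresholds, shifted to be non-negative; that wiring is not shown here.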
The CKHA technique is derived by incorporating chaotic concepts into KHA. In this work, a 1-D chaotic map was incorporated into the CKHA design.
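The text does not name the 1-D chaotic map, so the sketch below assumes the logistic map $x_{k+1} = 4x_k(1 - x_k)$, a common choice. Its values can stand in for the uniform random numbers used inside KHA; the example replaces the random factor of a target-effect coefficient of the form $2(\text{rand} + I/I_{\max})$, which is a hypothetical illustration.

```python
import numpy as np

def logistic_map(x0=0.7, n=10, r=4.0):
    """Chaotic sequence x_{k+1} = r * x_k * (1 - x_k); r = 4 gives fully chaotic behaviour."""
    # With r = 4, these starting points collapse onto fixed points instead of behaving chaotically
    if not 0.0 < x0 < 1.0 or x0 in (0.25, 0.5, 0.75):
        raise ValueError("x0 must lie in (0, 1) and avoid 0.25, 0.5 and 0.75")
    seq = np.empty(n)
    x = x0
    for k in range(n):
        x = r * x * (1.0 - x)
        seq[k] = x
    return seq

# Hypothetical use: chaotic values instead of rng.random() in C_best = 2 * (rand + I / I_max)
iters = 5
chaos = logistic_map(0.7, iters)
c_best = [2.0 * (chaos[i] + i / iters) for i in range(iters)]
```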
  3.3. Ensemble Feature Extraction
During the feature extraction process, an ensemble of three DL models, namely VGG-16, VGG-19, and SqueezeNet, is used. Let $\mathbf{f}_{\text{VGG16}}$, $\mathbf{f}_{\text{VGG19}}$, and $\mathbf{f}_{\text{SqN}}$ denote the three feature vectors derived from these models. The extracted features are then merged into a single vector:

$$\mathbf{F} = \left[\mathbf{f}_{\text{VGG16}},\ \mathbf{f}_{\text{VGG19}},\ \mathbf{f}_{\text{SqN}}\right]$$

where $\mathbf{F}$ represents the fused feature vector. Entropy is applied to the fused vector to select the optimal features according to their scores; the feature selection (FS) method is described mathematically in Equations (16)–(19). The entropy score is used to select 1186 score-based features out of 7835 features. In Equations (20) and (21), the quantities involved are the number of selected features, the total number of features, and the feature probability. The finally selected features are passed to the classifier to differentiate between normal and breast cancer images.
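Feature fusion and entropy-based selection can be sketched as below. The exact scoring of Equations (16)–(21) is not reproduced in this text, so the sketch assumes that each feature is scored by the Shannon entropy of its histogram over the samples and that the top-scoring features are kept; the function names and the histogram bin count are hypothetical.

```python
import numpy as np

def fuse_features(*blocks):
    """Concatenate per-model feature matrices (n_samples x d_m) into one fused matrix."""
    return np.concatenate(blocks, axis=1)

def entropy_scores(features, n_bins=10):
    """Shannon entropy (bits) of each feature's histogram over the samples."""
    scores = np.empty(features.shape[1])
    for j, column in enumerate(features.T):
        counts, _ = np.histogram(column, bins=n_bins)
        p = counts[counts > 0] / counts.sum()
        scores[j] = -np.sum(p * np.log2(p))
    return scores

def select_by_entropy(features, k, n_bins=10):
    """Keep the k highest-entropy features; returns the reduced matrix and column indices."""
    idx = np.sort(np.argsort(entropy_scores(features, n_bins))[::-1][:k])
    return features[:, idx], idx
```

In the study's setting, `k` would be 1186 and the fused matrix would have 7835 columns; a constant (uninformative) feature scores zero and is discarded first.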
  3.3.1. VGG-16 and VGG-19
Simonyan and Zisserman presented VGG, a CNN architecture, in 2014. VGG secured first place in the localization task and second place in the classification task of the 2014 ImageNet Large Scale Visual Recognition Challenge (ILSVRC). The architecture improves on AlexNet by replacing its large kernel-sized filters (11 × 11 in the first convolutional layer and 5 × 5 in the second) with multiple small 3 × 3 kernel-sized filters in consecutive convolutional layers, with 2 × 2 max-pooling layers between the convolutional blocks. It ends with three fully connected (FC) layers and a softmax/sigmoid activation function for the output. The best-known VGG models are VGG16 and VGG19, which comprise 16 and 19 weight layers, respectively. The major difference between the two models is that VGG19 has an additional convolutional layer in each of three convolutional blocks.
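The layer counts above can be checked against the standard VGG configurations (configurations D and E in Simonyan and Zisserman's paper), listed below as numbers of 3 × 3 convolution filters per block. The sketch also shows why stacking small filters pays off: $n$ stride-1 3 × 3 layers cover a $(2n+1) \times (2n+1)$ receptive field with $9nC^2$ weights, instead of $(2n+1)^2 C^2$ for one large filter ($C$ input and output channels, biases ignored).

```python
VGG_BLOCKS = {
    # 3x3 conv filters per layer, grouped by block (2x2 max-pooling after each block)
    "VGG16": [[64, 64], [128, 128], [256] * 3, [512] * 3, [512] * 3],
    "VGG19": [[64, 64], [128, 128], [256] * 4, [512] * 4, [512] * 4],
}
FC_UNITS = [4096, 4096, 1000]  # three fully connected layers

def weight_layers(name):
    """Convolutional layers plus fully connected layers."""
    return sum(len(block) for block in VGG_BLOCKS[name]) + len(FC_UNITS)

def stacked_3x3(n, channels):
    """Receptive field, weights of n stacked 3x3 layers, and weights of one equivalent large filter."""
    k = 2 * n + 1
    return k, 9 * n * channels ** 2, k * k * channels ** 2
```

For example, `stacked_3x3(3, 1)` returns `(7, 27, 49)`: three 3 × 3 layers see a 7 × 7 region with roughly 45% fewer weights than a single 7 × 7 filter.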
  3.3.2. SqueezeNet
SqueezeNet is an 18-layer DNN that is mainly used in image processing and computer vision applications. The main goals of its developers were to build a small neural network with fewer parameters that can be transferred easily over a computer network (requiring less bandwidth) and that fits easily into computer memory (requiring less memory). The first version of this framework was implemented on top of the Caffe DL framework [21], and it was soon made available in many publicly available DL frameworks. SqueezeNet was first introduced in comparison with AlexNet: the two are distinct DNN architectures, but they achieve comparable accuracy on the ImageNet image dataset.
Figure 2 demonstrates the structure of SqueezeNet.
The primary objective of SqueezeNet is to achieve high accuracy with fewer parameters. To accomplish this, three strategies are used. First, 3 × 3 filters are replaced by 1 × 1 filters, which have fewer parameters. Second, the number of input channels to the 3 × 3 filters is reduced. Finally, downsampling is performed late in the network so that the convolutional layers have large activation maps. SqueezeNet builds on the concept of the Inception module [22] to design a Fire module consisting of a squeeze layer and an expand layer. The Fire module comprises a squeeze convolution layer (which has only 1 × 1 filters) that feeds into an expand layer with a mix of 1 × 1 and 3 × 3 convolutional filters.
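The parameter savings of the Fire module can be illustrated by counting convolution weights (biases ignored). The layer sizes below (96 input channels, 16 squeeze filters, and 64 + 64 expand filters) follow the first Fire module of the original SqueezeNet design and are used purely as an example.

```python
def conv_weights(c_in, c_out, k):
    """Weights of a k x k convolution from c_in to c_out channels (no bias)."""
    return k * k * c_in * c_out

def fire_weights(c_in, s1, e1, e3):
    """Fire module: 1x1 squeeze layer feeding parallel 1x1 and 3x3 expand layers."""
    squeeze = conv_weights(c_in, s1, 1)
    expand = conv_weights(s1, e1, 1) + conv_weights(s1, e3, 3)
    return squeeze + expand

fire = fire_weights(96, 16, 64, 64)          # output has e1 + e3 = 128 channels
plain = conv_weights(96, 128, 3)             # plain 3x3 layer with the same output width
print(fire, plain, round(plain / fire, 1))   # 11776 110592 9.4
```

Both strategies contribute: most expand filters are 1 × 1, and the 3 × 3 filters see only the 16 squeezed channels instead of all 96 inputs.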
  3.4. Optimal MLP Classifier
Finally, the generated feature vectors are passed to the MLP classifier to assign proper class labels. The perceptron is a simple ANN architecture based on a slightly different artificial neuron called the Linear Threshold Unit (LTU) or Threshold Logic Unit (TLU). The inputs and output of the unit are numbers, and each input connection is associated with a weight. The TLU computes the weighted sum of its inputs as follows:

$$z = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n = \mathbf{x}^{\top}\mathbf{w}$$
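Then, a step function is applied to that sum, and the result is taken as the output:

$$h_{\mathbf{w}}(\mathbf{x}) = \operatorname{step}(z)$$

The Heaviside step function is commonly used for this purpose:

$$\operatorname{heaviside}(z) = \begin{cases} 0, & z < 0 \\ 1, & z \ge 0 \end{cases}$$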
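A TLU and a single-layer perceptron can be sketched in a few lines of NumPy. The bias term `b` is an addition in this sketch (it is equivalent to a constant input $x_0 = 1$ with weight $w_0$), and the logical AND/OR weights in the example are chosen by hand for illustration.

```python
import numpy as np

def heaviside(z):
    """Heaviside step: 0 for z < 0 and 1 for z >= 0."""
    return np.where(z >= 0, 1, 0)

def tlu(X, W, b=0.0):
    """Weighted sum z = x^T w (+ bias) followed by the step function.

    X: (n_samples, n_inputs); W: (n_inputs,) for one TLU or (n_inputs, n_units) for a layer.
    """
    return heaviside(X @ W + b)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
and_out = tlu(X, np.array([1.0, 1.0]), -1.5)                          # single TLU: logical AND
layer_out = tlu(X, np.array([[1.0, 1.0], [1.0, 1.0]]), np.array([-1.5, -0.5]))  # AND and OR units
```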
A perceptron is simply composed of a single layer of TLUs, with each TLU connected to all the inputs. When all the neurons in a layer are connected to every neuron in the previous layer, the layer is called a fully connected or dense layer. Several perceptrons can be stacked, and the resulting ANN is called an MLP. An MLP is composed of one input layer, one or more layers of TLUs called hidden layers, and a final layer of TLUs called the output layer. MLPs are trained with the backpropagation (BP) approach, which computes the gradients automatically.

To optimally adjust the weight values of the MLP model, the CSO algorithm is applied. The CSO algorithm is inspired by two behaviors of cats, modeled as the Seeking Mode (SM) and the Tracking Mode (TM). In CSO, each cat has a position in a D-dimensional space, a velocity for each dimension, a fitness value that indicates how well the cat fits the fitness function, and a flag that indicates whether it is in SM or TM. The final solution is the best position found by any cat, which is retained until the algorithm terminates [23].
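To let CSO tune the MLP, each cat's position can be taken to be the flattened vector of all MLP weights and biases, scored by a loss on the training data. The sketch below assumes one hidden layer with tanh activations, a softmax output, and the mean cross-entropy as the fitness to minimize; these choices and the helper names are illustrative, not taken from the study.

```python
import numpy as np

def n_params(n_in, n_hidden, n_out):
    """Length of the flattened weight vector (one hidden layer)."""
    return n_in * n_hidden + n_hidden + n_hidden * n_out + n_out

def unpack(theta, n_in, n_hidden, n_out):
    """Split a flat vector into W1, b1, W2, b2."""
    sizes = [n_in * n_hidden, n_hidden, n_hidden * n_out, n_out]
    w1, b1, w2, b2 = np.split(np.asarray(theta, float), np.cumsum(sizes)[:-1])
    return w1.reshape(n_in, n_hidden), b1, w2.reshape(n_hidden, n_out), b2

def mlp_forward(theta, X, n_hidden, n_out):
    """Hidden tanh layer followed by a linear output layer (class logits)."""
    W1, b1, W2, b2 = unpack(theta, X.shape[1], n_hidden, n_out)
    return np.tanh(X @ W1 + b1) @ W2 + b2

def fitness(theta, X, y, n_hidden, n_out):
    """Mean softmax cross-entropy of the MLP encoded by theta (lower is better)."""
    logits = mlp_forward(theta, X, n_hidden, n_out)
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(y)), y].mean())
```

With all weights at zero every class gets equal probability, so the fitness equals $\ln(\text{number of classes})$; CSO then searches for weight vectors with lower fitness.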
SM models the behavior of cats while they are resting but alert. It involves four major parameters: the Seeking Memory Pool (SMP), the Seeking Range of the selected Dimension (SRD), the Counts of Dimension to Change (CDC), and Self-Position Considering (SPC). The SM procedure is as follows:
Step 1: Create $j$ copies of the current position of $cat_k$, where $j = \text{SMP}$. If SPC is true, let $j = \text{SMP} - 1$ and retain the current position as one of the candidates.
Step 2: For each copy, according to CDC, randomly add or subtract SRD percent of the current values and replace the old values.
Step 3: Calculate the Fitness Value (FS) of every candidate point.
Step 4: If the FS values are not all equal, calculate the selection probability of each candidate point; otherwise, set the selection probability of every candidate point to 1.
Step 5: When the fitness values of all candidates are identical, the probability of choosing each one is 1; otherwise, the selection probability $P_i$ is determined as follows:

$$P_i = \frac{\lvert FS_i - FS_b \rvert}{FS_{\max} - FS_{\min}}$$

where $FS_i$ indicates the fitness value of the $i$-th candidate, $FS_{\max}$ represents the maximum fitness value, $FS_{\min}$ denotes the minimum fitness value, $FS_b = FS_{\max}$ for minimization problems, and $FS_b = FS_{\min}$ for maximization problems. The cat then moves to a candidate point picked at random according to these probabilities.
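The seeking-mode steps can be sketched for a single cat as follows. Interpreting "SRD percent" as multiplying each selected dimension by $(1 \pm \text{SRD})$, rounding $\text{CDC} \times D$ to choose how many dimensions change, and normalizing the $P_i$ values so that they can be used as sampling weights are assumptions of this sketch; the default parameter values are illustrative.

```python
import numpy as np

def seeking_mode(cat, fitness, rng, smp=5, srd=0.2, cdc=0.8, spc=True, minimize=True):
    """One seeking-mode move of a single cat (Steps 1-5)."""
    cat = np.asarray(cat, dtype=float)
    j = smp - 1 if spc else smp                      # Step 1: number of copies
    copies = np.tile(cat, (j, 1))
    n_change = max(1, int(round(cdc * cat.size)))   # dimensions to change (from CDC)
    for row in copies:                               # Step 2: mutate each copy by +/- SRD
        dims = rng.choice(cat.size, size=n_change, replace=False)
        signs = rng.choice([-1.0, 1.0], size=n_change)
        row[dims] *= 1.0 + signs * srd
    candidates = np.vstack([cat, copies]) if spc else copies
    fs = np.array([fitness(c) for c in candidates])  # Step 3: fitness of every candidate
    if np.all(fs == fs[0]):                          # Step 4: identical fitness -> P_i = 1
        probs = np.ones(len(fs))
    else:                                            # Step 5: P_i = |FS_i - FS_b| / (FS_max - FS_min)
        fs_b = fs.max() if minimize else fs.min()
        probs = np.abs(fs - fs_b) / (fs.max() - fs.min())
    # Normalise so the P_i can be used as sampling weights, then move the cat
    pick = rng.choice(len(candidates), p=probs / probs.sum())
    return candidates[pick].copy()
```

For a minimization problem the worst candidate receives $P_i = 0$, so the cat never moves to it.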
TM is the second mode of the CSO algorithm, in which the cats trace their food and targets. The procedure is as follows:
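Step 1: Update the velocity of every dimension according to Equation (25):

$$v_{k,d} = v_{k,d} + r_1 c_1 \left(x_{\text{best},d} - x_{k,d}\right) \tag{25}$$

Step 2: Check whether the velocity lies within the maximum velocity range. If the new velocity exceeds the range, it is set equal to the limit:

$$v_{k,d} = \begin{cases} v_{\max}, & v_{k,d} > v_{\max} \\ -v_{\max}, & v_{k,d} < -v_{\max} \\ v_{k,d}, & \text{otherwise} \end{cases}$$

Step 3: Update the position of $cat_k$ according to Equation (26):

$$x_{k,d} = x_{k,d} + v_{k,d} \tag{26}$$

Here, $x_{\text{best},d}$ denotes the position of the cat with the best fitness, $x_{k,d}$ denotes the position of $cat_k$, $r_1$ is a random number in $[0, 1]$, and $c_1$ denotes the acceleration coefficient that scales the velocity of the cat as it moves through the solution space.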
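The tracking-mode update can be sketched in the same style. In the standard CSO formulation, a mixture ratio (MR) decides in each iteration which cats run seeking mode and which run tracking mode; that outer loop is omitted here, and the default values of `c1` and `v_max` are illustrative assumptions.

```python
import numpy as np

def tracking_mode(x, v, x_best, rng, c1=2.0, v_max=1.0):
    """One tracking-mode move: Equation (25), velocity clamping, then Equation (26)."""
    r1 = rng.random(np.shape(x))                     # r_1 ~ U[0, 1) per dimension
    v_new = v + r1 * c1 * (x_best - x)               # Equation (25)
    v_new = np.clip(v_new, -v_max, v_max)            # Step 2: keep velocity within the limit
    return x + v_new, v_new                          # Equation (26)
```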
  4. Performance Validation
The proposed model was implemented on a PC with the following configuration: Intel i5 (8th generation), 16 GB RAM, MSI L370 Apro, and NVIDIA 1050 Ti 4 GB. The researchers used Python 3.6.5 along with pandas, sklearn, Keras, Matplotlib, TensorFlow, OpenCV, Pillow, seaborn, and pycm. The EDLCDS-BCDC technique was experimentally evaluated on the benchmark Breast Ultrasound Dataset [24], which comprises 133 images classified as normal, 437 images classified as benign, and 210 images classified as malignant. The dataset holds 780 images sized in the range of
 pixels. 
Figure 3 shows the input images along with their ground truth images. The first, third, and fifth rows show the original ultrasound images, and the corresponding ground truth images are given in the second, fourth, and sixth rows, respectively. Furthermore, Figure 4 shows the histograms of the input images in the first, third, and fifth rows of Figure 3.
Figure 5 illustrates sample visualization results of the proposed model during the pre-processing stage, showing the denoised and contrast-enhanced images obtained for each input image. The image quality is clearly improved by this pre-processing stage.
 Table 1 exhibits the overall breast cancer classification analysis results accomplished using the EDLCDS-BCDC technique under several epochs and different measures such as 
, 
, 
 and 
. The table values imply that the proposed EDLCDS-BCDC technique accomplished the maximum breast cancer classification results in all the aspects considered for the study.
Table 2 shows the overall breast cancer classification outcomes achieved by the proposed EDLCDS-BCDC technique under several epochs. The results show that the EDLCDS-BCDC technique delivers strong classification performance at every epoch count. For instance, with 250 epochs, the EDLCDS-BCDC technique attained
, 
, 
 and 
 values of 96.01%, 97.95%, 95.39%, and 97.52%, respectively. Similarly, with 750 epochs, the presented EDLCDS-BCDC technique obtained 
, 
, 
 and 
 values of 95.35%, 97.38%, 93.93%, and 96.75%, respectively. Moreover, with 1500 epochs, the proposed EDLCDS-BCDC technique attained 
, 
, 
 and 
 values of 97.15%, 97.35%, 94.74%, and 96.92%, respectively.
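The metric symbols in the passage above did not survive extraction, so which measures are reported cannot be recovered here. As an illustration only, the standard confusion-matrix definitions of accuracy, precision, recall (sensitivity), specificity, and F-score for a binary task can be sketched as follows; the function name is hypothetical, and a class with no predicted or actual members would produce a division by zero.

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Confusion-matrix based metrics for a binary task (1 = breast cancer present)."""
    t = np.asarray(y_true, dtype=bool)
    p = np.asarray(y_pred, dtype=bool)
    tp, tn = np.sum(t & p), np.sum(~t & ~p)
    fp, fn = np.sum(~t & p), np.sum(t & ~p)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),          # also called sensitivity
        "specificity": tn / (tn + fp),
        "f_score": 2 * tp / (2 * tp + fp + fn),
    }
```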
The accuracy analysis results of the EDLCDS-BCDC technique on the test data are illustrated in Figure 6. The results show that the proposed EDLCDS-BCDC system achieved higher validation accuracy than training accuracy. Further, the accuracy values saturate as the number of epochs increases.
The loss analysis results of the proposed EDLCDS-BCDC technique on the test data are portrayed in Figure 7. The results reveal that the EDLCDS-BCDC approach achieved lower validation loss than training loss. The loss values also saturate as the number of epochs increases.
Figure 8 illustrates the ROC curves obtained by the EDLCDS-BCDC technique under distinct epochs. The proposed EDLCDS-BCDC technique achieved high ROC values of 99.4027 under 250 epochs, 99.7071 under 500 epochs, 98.7158 under 750 epochs, 99.4562 under 1000 epochs, 98.4676 under 1250 epochs, and 98.8527 under 1500 epochs.
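An ROC curve is usually summarized by the area under it (AUC), which can be computed from classifier scores without tracing the curve, using its interpretation as the probability that a randomly chosen positive case is scored higher than a randomly chosen negative case (ties count one half). The sketch below returns the AUC in $[0, 1]$; the scale of the values reported above is not stated in the text, and the function name is hypothetical.

```python
import numpy as np

def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]               # every positive-negative pair
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (pos.size * neg.size)
```

The pairwise form needs memory proportional to the product of the class sizes, which is fine for a dataset of a few hundred test images.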
 Figure 9 contains the comparative analysis results, in terms of 
, 
, and 
, for the proposed EDLCDS-BCDC technique as well as other recent approaches [25]. The results indicate that the VGG19 and Densenet161 models obtained the lowest values of
, 
, and 
.
 In addition, the VGG11, Resnet101, and Densenet161 models produced slightly increased , , and  values. The VGG16 model accomplished reasonably good , , and  values of 84.42%, 96.21%, and 94.69%, respectively. However, the proposed EDLCDS-BCDC technique surpassed the available methods with the highest , , and  values of 84.95%, 90.20%, and 87.90%, respectively.
Figure 10 highlights the comparative analysis results, in terms of 
, accomplished by EDLCDS-BCDC and recent approaches [25]. The results indicate that both the VGG19 and Densenet161 models obtained low
. In addition, the VGG11, Resnet101, and Densenet161 models produced slightly increased 
 values. Moreover, the VGG16 model accomplished a reasonable 
 of 92.46%. However, the proposed EDLCDS-BCDC technique surpassed all other available methods with the highest 
 of 97.09%.
 The above-discussed results establish that the proposed EDLCDS-BCDC technique is a promising candidate for the recognition of breast lesions using USIs.