*3.4. Stage 3: Mass Segmentation, Feature Extraction, and Classification*

#### 3.4.1. Mass Segmentation and Evaluation

Following Stage 2, the system's final performance was evaluated on its mass segmentation and classification. To fully separate the mass from its surrounding tissue, we applied deep-learning-based semantic segmentation once the mass had been localized using the bounding box obtained from the previous stage; when more than one detection occurred, the detection with the highest confidence score (CS) was selected. A well-segmented mass is important because it defines the area from which features are extracted when classifying the mass as benign or malignant in the later stages. Therefore, segmentation performance was evaluated using the Jaccard index, *J*, equivalent to the Intersection over Union (IoU) score, calculated based on Equation (10):

$$J(A, B), \text{ or Intersection over Union, IoU} = \mid A \cap B \mid / \mid A \cup B \mid \tag{10}$$

where A is the sample data being tested and B is the ground-truth sample. A higher *J* (IoU) score indicates greater similarity between the two sets. Segmentation accuracy was measured from the testing performance on the different input image settings based on Equation (11), utilizing the true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN).

$$\text{Accuracy, Acc} = (\text{TP} + \text{TN}) / (\text{TP} + \text{TN} + \text{FP} + \text{FN}) \tag{11}$$
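As an illustration, Equations (10) and (11) can be sketched on binary segmentation masks using NumPy; the toy 4 × 4 masks below are hypothetical, not taken from the study's data:

```python
import numpy as np

def jaccard_iou(pred: np.ndarray, truth: np.ndarray) -> float:
    """Jaccard index / IoU between two binary masks (Equation (10))."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    return intersection / union if union else 1.0

def pixel_accuracy(pred: np.ndarray, truth: np.ndarray) -> float:
    """(TP + TN) / (TP + TN + FP + FN) over all pixels (Equation (11))."""
    return float(np.mean(pred.astype(bool) == truth.astype(bool)))

# Toy masks: the predicted mass partially overlaps the ground truth.
pred  = np.array([[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]])
truth = np.array([[0, 0, 0, 0], [0, 1, 1, 1], [0, 1, 1, 1], [0, 0, 0, 0]])

print(round(jaccard_iou(pred, truth), 3))    # 4/6 ≈ 0.667
print(round(pixel_accuracy(pred, truth), 3)) # 14/16 = 0.875
```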

#### 3.4.2. Feature Extraction

In the final stage, the segmented mass was classified as benign or malignant using handcrafted features and a well-known machine learning technique. In this study, textural features were chosen as the main feature contributors. Features were extracted from the segmented mass based on three radiomics handcrafted feature types commonly used for mammography: a textural feature (the Gray-Level Co-occurrence Matrix (GLCM)), a geometrical feature (mass circularity), and a first-order statistic (mean intensity).

#### Feature Extraction: Gray-Level Co-Occurrence Matrix (GLCM)

The GLCM highlights specific properties of the spatial distribution of gray levels in a textured image. The proposed SbBDEM procedure was applied in the earlier stage to increase the textural refinement of the dense and mass regions. Since neither the benign nor the malignant segmented region changes in illuminance when exposed to light, textural analysis is essential for extracting features that differentiate neighboring pixels [68]. The features were calculated based on Equations (2)–(5), as previously discussed in Section 3.2.5.

#### Feature Extraction: Circularity and Mean Intensity

A malignant breast mass typically has uneven edges and is likely to expand more quickly, giving it a projecting look in a mammogram. In contrast, a benign mass has geometric boundaries that are more clearly defined, smooth, and consistently formed [26]. These are among the features radiologists consider when making visual clinical mammogram evaluations. As a result, one of the descriptors used in previous studies [25,26] is the mass's circularity, determined using Equation (12) from the segmented region's area and perimeter.

$$\text{Circularity} = \frac{4(Area)(\pi)}{Perimeter^2} \tag{12}$$

Additionally, the mass's mean intensity was included as a supplementary characteristic based on the notion that, since malignant mass cells are more densely packed than benign ones, a malignant mass may appear to have a greater overall image intensity. This feature was calculated based on Equation (6).
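A minimal sketch of Equation (12) and the mean intensity, computed with scikit-image region properties on a hypothetical synthetic mask (the disk radius and intensity values are illustrative only):

```python
import numpy as np
from skimage.draw import disk
from skimage.measure import label, regionprops

# Synthetic segmented mass: a filled disk in a 128x128 image.
mask = np.zeros((128, 128), dtype=np.uint8)
rr, cc = disk((64, 64), 30)
mask[rr, cc] = 1
intensity = np.where(mask, 180, 40).astype(np.uint8)  # mass brighter than background

region = regionprops(label(mask), intensity_image=intensity)[0]
circularity = 4 * np.pi * region.area / region.perimeter ** 2  # Equation (12)
print(round(circularity, 3))   # close to 1 for a near-circular mass
print(region.mean_intensity)   # mean intensity inside the mask
```

For a perfect circle, circularity equals 1; rasterization and the perimeter estimate can push it slightly above or below that value.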

#### 3.4.3. Mass Classification and Evaluation

All the features were trained, with and without a feature selection or reduction method, using a supervised weighted *k*-nearest neighbor (*k*-NN) algorithm [69,70]. To determine a proper *k* for the training images, we ran the *k*-NN algorithm with different values of *k* and chose the one that minimized errors while preserving the system's ability to make accurate predictions on new testing data. For an unbiased test of the features' performance, 5-fold cross-validation was applied during training, with the final number of neighbors set to *k* = 10, using the Euclidean distance measure and inverse distance weighting for the multivariate interpolation of the data points.
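The setup above can be sketched with scikit-learn; the synthetic feature matrix below is a stand-in for the extracted radiomics features, not the study's data:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the extracted features (benign vs. malignant labels).
X, y = make_classification(n_samples=200, n_features=6, n_informative=4,
                           random_state=42)

# Weighted k-NN as described: k = 10, Euclidean distance,
# inverse-distance weighting of the neighbors.
knn = KNeighborsClassifier(n_neighbors=10, metric="euclidean",
                           weights="distance")

# 5-fold cross-validation for an unbiased accuracy estimate.
scores = cross_val_score(knn, X, y, cv=5, scoring="accuracy")
print(scores.mean())
```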

The mass abnormality classification's performance was evaluated using the testing accuracy, as in Equation (11), and the area under the Receiver Operating Characteristic (ROC) curve. The ROC curve is a standard measure of the separability of a binary classification, plotting sensitivity (TP rate) against 1 − specificity (FP rate); a larger area under the ROC curve indicates a model that better segregates the classes.
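Both evaluation metrics can be sketched with scikit-learn; the labels and classifier scores below are hypothetical:

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

# Hypothetical test labels (0 = benign, 1 = malignant) and classifier scores.
y_true  = np.array([0, 0, 0, 1, 1, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.65, 0.9, 0.2, 0.3])
y_pred  = (y_score >= 0.5).astype(int)

print(accuracy_score(y_true, y_pred))  # Equation (11) at the 0.5 threshold: 0.875
print(roc_auc_score(y_true, y_score))  # area under the TPR-vs-FPR curve: 0.875
```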

#### **4. Results and Discussion**

In this section, the results are discussed following the stages of the experimental procedure explained in the previous section. The proposed SbBDEM technique in the pre-processing stage is compared, based on the performance of the subsequent mass detection stage, against the original images, adaptive histogram equalization (HE/AHE), and contrast-limited adaptive histogram equalization (CLAHE) on all mammogram images.

### *4.1. Image Quality and Textural Elements*

The performance of the proposed image enhancement in the pre-processing stage before mass detection was investigated based on the differently trained image inputs for the models. Figure 5 shows an example mammogram and its respective histogram for the (A) original, (B) HE/AHE, (C) CLAHE, and (D) proposed SbBDEM techniques. The histogram of the original image in Figure 5A has a shape similar to that of the proposed SbBDEM in Figure 5D; however, the SbBDEM pixel distribution has expanded and shifted toward the left side of the histogram. This suggests that the proposed SbBDEM retains a pixel distribution as similar as possible to the original image, but the decrease in intensity increases the number of pixels belonging to the non-dense region. More pixels with values <0.5 are produced, darkening the non-dense area and leaving the dense and mass areas lighter, providing a better edge difference for the network to learn.

Meanwhile, Table 1 shows the average scores for the mean-square error (MSE), the Blind/Referenceless Image Spatial Quality Evaluator (BRISQUE), image intensity, and GLCM statistical features, comparing the proposed SbBDEM against the other enhancement techniques for all mammogram images. The BRISQUE score improves from 43.5799 in the original image to 42.3841, the lowest among the techniques, suggesting that the proposed SbBDEM produces an image of acceptable quality with better perceptual fidelity. Additionally, the average correlation feature for the proposed SbBDEM is the lowest at 0.9752. Since correlation measures how correlated a pixel is to its neighbor over the whole image, neighboring pixels within the proposed SbBDEM image correlate the least with each other. This supports the larger edge difference between pixels within the image and hence better textural perception. Meanwhile, the energy property represents the estimated pixel energy values that make up the image [71,72]. The energy features combine to create an image weight model, a collection of weights reflecting the importance of the image pixels from the perspective of perception. The higher energy of the proposed SbBDEM image suggests that pixels carrying more weight are expected to be represented during network training. Finally, the contrast and homogeneity properties show no clear relation to the proposed SbBDEM technique, as neither yields the lowest or the highest scores that would form varying spatial pattern arrangements.

**Figure 5.** Sample images and histogram plots from columns of (**A**) Original, (**B**) HE/AHE, (**C**) CLAHE and (**D**) SbBDEM image enhancement techniques for comparison.

**Table 1.** Average Quality Tests and GLCM features on INbreast Images (N = 112) using Enhancement Techniques.


For breast mass analysis, the result from the CLAHE-enhanced image, the enhancement technique used in most past studies [10,18,24,45,51,68], is compared to the proposed SbBDEM method. Figure 6 illustrates sample mass detection results for both non-dense (Rows 1 and 2) and dense (Rows 3 and 4) images, with the confidence score (CS) indicated in the yellow boxes obtained from the mass detection stage in this study. The original images in the first column, Figure 6A–E, with the ground truth labeled in red boxes, are followed by the respective CLAHE-enhanced (second column) and proposed SbBDEM (third column) images.

Visual evaluation of the images demonstrates increased and interpolated contrast stretching observed on the CLAHE-enhanced image in Figure 6F–J. Meanwhile, the proposed SbBDEM images produced darker overall contrast, as seen in Figure 6K–O, especially on the non-dense fatty tissue region, while preserving the mass and dense region intensity from the original image. Maintaining the pixel information of the mass is essential in feature extraction and convolution of the YOLOv3 algorithm, as this will also preserve the edge of the mass during enhancement.

**Figure 6.** Result of Mass Detection for comparison. Rows 1 and 2: non-dense breasts. Rows 3 and 4: dense breasts. Row 5: Example of image with True Positive mass (**TP-M**) and False Positive mass (**FP-M**) detections. Yellow boxes indicate bounding boxes with a confidence score for mass detection. (**A**–**E**): Original images. (**F**–**J**): CLAHE-enhanced images. (**K**–**O**): proposed SbBDEM images.

In addition, Row 5 of Figure 6E,J,O demonstrates examples of True-Positive mass (TP-M) and False-Positive mass (FP-M) detections during the mass detection stage. A further pixel analysis based on edge detection, emulating the network's convolutional process, was extracted using an 8-by-8 grid window on the edge of the expected mass: FP-M corresponding to Figure 7A,B, and TP-M in Figure 7C,D.

**Figure 7.** (**A**) FP detected mass edge on CLAHE image. (**B**) Corresponding location of TN mass location based on (**A**) on the proposed SbBDEM image result. (**C**) TP detected mass edge on the CLAHE image. (**D**) Corresponding TP location of detected mass location based on (**C**) on the proposed SbBDEM image result. The analysis is made from Figure 6E,J,O, where Δ is the pixel edge difference. The lighter region above the red lines indicates the mass region.

The mass edge analysis is based on the difference of the maximum pixel values, Δ, between the region above the red line (the ground-truth mass) and the region below it (the background), following a convolution filtering process using the kernel K = [1 1 0; 1 0 −1; 0 −1 −1] and maximum pooling (max pooling) downsampling. This revealed that the FP-M detected in Figure 7A on the CLAHE image has a higher probability of being detected based on its pixel region difference, Δ = 35, compared to Δ = 23 at the same pixel location on the proposed SbBDEM image in Figure 7B, as per the ground truth in Figure 6E. Additionally, TP-M was detected on both the CLAHE image and the proposed SbBDEM image. However, even though the proposed SbBDEM image is visually darker, the TP-M detected in Figure 7D for the proposed SbBDEM has a far higher mass edge detection difference, at Δ = 14, than its counterpart in Figure 7C using CLAHE enhancement, at Δ = 1. This indicates that the new intensity values replacing the original pixels during the proposed SbBDEM process lower FP detection on non-mass locations, as high-level spatial image features such as edges and coarse textures are extracted at the earliest learnable layer during YOLOv3 learning. At the same time, the probability of detecting a TP mass on the proposed SbBDEM image increases.
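The convolution-and-pooling edge analysis can be sketched as follows, assuming the kernel reads K = [1 1 0; 1 0 −1; 0 −1 −1]; the 8 × 8 patch and its intensity values are synthetic, not taken from Figure 7:

```python
import numpy as np
from scipy.ndimage import convolve

# Assumed 3x3 edge kernel from the text: K = [1 1 0; 1 0 -1; 0 -1 -1].
K = np.array([[1, 1, 0], [1, 0, -1], [0, -1, -1]], dtype=float)

def max_pool(x: np.ndarray, size: int = 2) -> np.ndarray:
    """Non-overlapping max-pooling downsampling."""
    h, w = (x.shape[0] // size) * size, (x.shape[1] // size) * size
    return x[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

# Toy 8x8 patch straddling a mass edge: brighter mass above, darker tissue below.
patch = np.vstack([np.full((4, 8), 180.0), np.full((4, 8), 150.0)])

# Edge response magnitude, pooled, then Δ as the max-min pixel difference.
response = max_pool(np.abs(convolve(patch, K)))
delta = int(response.max() - response.min())
print(delta)
```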

The mass detection performance of the overall image enhancement was assessed in the next stage. It is illustrated by the Recall-Precision Curves (RPC) in Figure 8 for models trained with the original, HE/AHE, CLAHE, and proposed SbBDEM images. High recall and high precision are both represented as a large area under the RPC, where high precision correlates with a low false-positive rate and high recall with a low false-negative rate. Note that the proposed SbBDEM enhancement technique produced the highest mean Average Precision (mAP), as the area under the RPC, of 0.8125, followed by the CLAHE images with mAP = 0.7496. In contrast, the HE/AHE images degraded the performance relative to the original images, with mAP at 0.5430 compared to 0.6842 for the original images. This result shows that refining the texture of the mass from the original pixels, which widens the difference between the mass and its neighboring non-dense or dense background, is important to preserve its edge without diminishing the mass itself. The result also justifies that enhancing the images based on breast density before extracting training features is essential to increase the final overall detection performance.
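The recall-precision curve and its area can be sketched with scikit-learn; the detection confidences and their TP/FP labels below are hypothetical:

```python
import numpy as np
from sklearn.metrics import average_precision_score, precision_recall_curve

# Hypothetical detection confidences and whether each matched a ground-truth mass.
y_true  = np.array([1, 1, 0, 1, 0, 1, 0, 0])
y_score = np.array([0.95, 0.9, 0.8, 0.7, 0.6, 0.55, 0.3, 0.2])

precision, recall, thresholds = precision_recall_curve(y_true, y_score)
ap = average_precision_score(y_true, y_score)  # area under the recall-precision curve
print(round(ap, 4))
```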

**Figure 8.** Graph of Recall-Precision Curves and mean Average Precision (mAP) from Mass Detection using modified YOLOv3 on different enhancement techniques.

Figure 9 presents a bar chart comparing mass detection performance between dense and non-dense breasts for the different image enhancement techniques. On average, the model detects masses best when using the proposed SbBDEM images, followed by CLAHE and the original images, while HE/AHE performs worse than the original images. The lower HE/AHE performance agrees with previous research [25], where HE/AHE may be beneficial when applied to RGB-to-HSV images in terms of gamma correction. It is therefore somewhat unsuitable for a grey-level image such as a mammogram, as it can only raise the contrast of the background noise while simultaneously reducing the amount of usable signal.

**Figure 9.** Graph of Detection Rate and Confidence Score (CS) Accuracy based on Breast Density Level for Mass Detection using modified YOLOv3 on different enhancement techniques.

As for CLAHE, although it improves the mass detection rate by ±3%, the overall CS shows slightly lower performance than on the original image. Unlike the other techniques, CLAHE operates on tiles rather than the whole image; the tiles are enhanced individually, resulting in locally stretched contrast on homogeneous areas, with limiting applied to avoid amplifying any noise present in the image [68]. This might introduce FP cases on unrelated dense regions that, once enhanced, exhibit a feature pattern similar to a mass. Meanwhile, an improvement of 10% over the original image in detection rate and a slight improvement of 2% in CS accuracy are observed when the proposed SbBDEM technique is applied for mass detection. This supports the explanation that its higher performance stems from its ability to retain the mass and denser regions as they are while reducing the pixel values of the non-dense background. In return, a prominent spatial feature defining a mass, such as its edge, is enhanced and contributes to the feature maps extracted in the YOLO layers, resulting in a better detection rate and CS accuracy.

On average, the detection rate improved to 92.61% using the proposed SbBDEM technique, followed by CLAHE, the original images, and HE/AHE at 85.65%, 82.61%, and 73.91%, respectively. By standardizing all test images to only those detected under all enhancement techniques, the CS accuracy, which measures the bounding box accuracy, is highest on average when the model is trained using the proposed SbBDEM, at 98.41%. Nevertheless, CLAHE-enhanced images have lower CS accuracy than the original images, which may be caused by additional FP detections, where overlapping bounding boxes may contribute a wider range of overlapping intersections on the same image, lowering the CS accuracy score.

On the other hand, non-dense breasts exhibit better performance than dense breasts on all enhancement techniques for both detection rate and CS, as supported by previous studies [10,18,25]. The highest CS accuracy using the proposed SbBDEM method on non-dense breasts is 98.07%, a boost of 1.62% over the original image, with an increase of 9.79% in CS for dense breasts. Even though the detection rate for dense breasts is slightly lower, at 93.33%, than for non-dense breasts, at 95.33%, the CS accuracy is slightly better for dense breasts, at 99.12%, than for non-dense breasts, at 98.07%. Additionally, note that the dense breast detection rate shows the best improvement, with an increase of 8.66% over the original image. The analysis of mass detection on the denser background proves that the proposed SbBDEM technique can improve overlapped mass detection.

#### *4.2. Analysis of Modified YOLOv3 Performance*

In this study, a modified convolutional neural network (CNN) for YOLOv3 was developed to evaluate the input images. The modification improves the network's ability to receive the spatial features enhanced by the proposed SbBDEM technique and is used to detect the mass's location in the mammograms. Table 2 presents the mAP performance for mass detection on the original and the other enhanced image input settings, with and without the YOLOv3 modification, for comparison.

**Table 2.** mAP performance for mass detection before and after YOLOv3 modification using different image enhancement techniques.


The results display a pattern of increasing detection performance on the modified YOLOv3 model for all image input settings except the HE/AHE-enhanced input. The highest mAP of 81.25% is observed using the proposed SbBDEM on the modified model, an increase of 17.25% compared to using the original images on the unmodified YOLOv3 model. The modification is crafted to exploit the spatial features retained in the proposed SbBDEM training images, whose textural features were improved as shown previously in Table 1. This textural refinement is further exploited as an essential higher-level spatial feature during training by adding the features from an earlier YOLOv3 layer to the second detection head, which in the original design is specifically used to detect smaller objects [40]. Moreover, the extra, larger anchor box assigned to these features gives them extra weight and encapsulates the detected mass region through overlapping bounding boxes tiled across the image, further improving detection and resulting in better intersection over union (IoU) placement given the multiple sizes of the masses in the input images [49].

#### *4.3. Performance of Mass Segmentation and Classification*

After localizing the position of the mass in the image, the mass region is segmented to ease feature extraction for classification. Table 3 compares segmentation results obtained with the proposed SbBDEM against the original, HE/AHE, and CLAHE enhancement techniques. A slight improvement in segmentation accuracy is observed when using the proposed SbBDEM technique, achieving a mean accuracy of 0.9437 compared with 0.9431 for the original image. Since the mass is well contained within the bounding box, less overlap between the mass and the dense background needs to be resolved using the proposed SbBDEM technique. The proposed SbBDEM technique also produces the highest accuracy along with the highest IoU for both the mass class and its background.


**Table 3.** Result of semantic segmentation for mass using different image input settings.

Meanwhile, handcrafted features from the segmented mass region were employed for benign vs. malignant classification, with and without the principal component analysis (PCA) feature reduction method. A comparison was also made using chi-square-based feature selection, removing features with a chi-square score of less than 1.0 as correlated features during training. The results show that the highest testing accuracy for benign vs. malignant mass classification, 96.0%, is achieved with a training time of 0.670 s.
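Both feature reduction and selection steps can be sketched with scikit-learn; the feature matrix is synthetic, and note that chi-square scores require non-negative inputs, hence the min-max scaling (the 1.0 cutoff follows the text):

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import chi2
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the handcrafted feature matrix and class labels.
X, y = make_classification(n_samples=200, n_features=6, n_informative=4,
                           random_state=0)

# PCA feature reduction: keep components explaining 95% of the variance.
X_pca = PCA(n_components=0.95).fit_transform(X)

# Chi-square feature selection: drop features scoring below the 1.0 threshold.
X_pos = MinMaxScaler().fit_transform(X)   # chi2 needs non-negative features
scores = chi2(X_pos, y)[0]
X_chi = X[:, scores >= 1.0]

print(X_pca.shape, X_chi.shape)
```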

Additionally, mass detection results from past studies with similar methods are listed in Table 4, with and without breast density considered before or after the performance analysis, as well as the computational cost of each algorithm's deployment. The main objective of this study is to validate object detection performance using the simplest CNN backbone, SqueezeNet, for a modified YOLOv3 with differently enhanced input images, specifically to improve the detection of masses in dense breast mammograms. Similar works address mass detection while disregarding the probable class imbalance in training caused by the larger number of non-dense images, which could hinder Computer-Aided Diagnosis (CAD) establishment in clinical settings. In contrast, our study specifically brings breast density into the learned parameters of the training images to adapt to the class imbalance and enhance each image before it is used for mass detection, consequently yielding a good mass abnormality classifier. Nonetheless, few studies have reported metrics comparing performance between non-dense and dense images before and after implementing their proposed methods, making a suitable state-of-the-art analysis difficult.


**Table 4.** Comparison of CAD for mammogram mass detection previous works.

<sup>1</sup> Benign, <sup>2</sup> Malignant, Acc = Accuracy, ROC = Area under ROC curve, fps = frame per second.

Although a direct comparison between these works is essentially impossible, both the detection accuracy rate and the testing time indicate that we achieved better overall performance, which plays a significant role in showing that the proposed SbBDEM technique indeed increases density-based performance. Our method outperformed the works of [24,25] in terms of accuracy for non-dense and dense images, although their works use different datasets, so the comparison is not entirely fair. To the best of our knowledge, no study has used the INbreast dataset specifically with metrics for density-based mass detection. Meanwhile, the work of [18] achieves 99.91% accuracy for benign and malignant classification, compared to 96% for our method across breast densities. However, since that study's augmentation process expands the original 112 INbreast images to almost 7000, their approach may produce unreliable results if applied to a newer dataset. In contrast, the work of [45] exceeded our detection results on the same dataset; nevertheless, it required more testing time than our approach, which employs a simpler training architecture. Additionally, since most of the studies listed applied CLAHE in their pre-processing stage, and given that our enhancement improves the detection model's mAP by 13.33% compared to CLAHE, as discussed in the results section, applying our pre-processing beforehand would also be expected to improve these studies' detection stages. Indeed, the low-accuracy limitations could be overcome by applying a more complex algorithm with more sophisticated hardware for training, which is expected to further improve the currently proposed SbBDEM technique for mass detection.

#### **5. Conclusions**

This work presents an image enhancement method based on the breast density level for the Computer-Aided Diagnosis (CAD) stages of mammogram image analysis. Based on the results, the proposed SbBDEM technique increases the performance of all stages of mass detection, segmentation, and classification for mammogram images. An improvement is observed when the proposed SbBDEM method is compared to the original images and the most widely used enhancement techniques, i.e., contrast-limited adaptive histogram equalization (CLAHE) and histogram equalization (HE). The adjustment of the lower limit cap acts as a threshold value separating the dense and mass regions from the non-dense region. This helps refine the textural information as a feature representing both regions and, through textural feature extraction in the classification stage, boosts the accuracy to 96% for the 5-fold cross-validated benign vs. malignant classification experiment. The results also show an improvement in mass detection from mean Average Precision (mAP) = 0.6401 to mAP = 0.8125, with mass detection accuracies of 95.33% and 93.33% for non-dense and dense breasts, respectively. We achieved an increase in confidence score (CS) accuracy to 98.41%, as opposed to 91.84% for the original image, and a slight improvement of 0.03% in mass segmentation using the proposed SbBDEM technique.

Meanwhile, in its original documentation, You Only Look Once v3 (YOLOv3) specializes in detecting smaller objects through the implementation of the second detection head. We further utilize this by modifying the second detection head to receive the textural features already enhanced in the pre-processing stage through our proposed SbBDEM technique, adding these features to a deeper learning layer that contains more semantic information of the same image to improve feature discrimination.

Our proposed method is limited by the unavailability of standardized image quality metrics that can determine the best image for all training images based on textural elements, while also considering the thousands of images needed for deep learning. While a high-quality image may be good in terms of measured accuracy, the same is not necessarily true of its textural aspect. Although statistical information for textural analysis is available, more suitable metrics relating image quality and texture could be investigated. Additionally, with a running GPU capacity of only 6 GB, the study is limited by the unavailability of a more sophisticated computing facility to employ later YOLO versions, such as 4, 5, 6, and 7, without reducing the mini-batch size and thus affecting performance. However, the implementation of YOLOv3 in this study is sufficient to demonstrate the effectiveness of density-based enhancement on the dataset before training and was chosen for its simplicity, running on only 5 MB, or 1.2 million learnable parameters. Future studies could explore other breast mammogram datasets with validation from a trained radiologist to enable CAD implementation in the medical field. Finally, the results obtained are comparable to the state-of-the-art performance of the other methods discussed and can serve as a base model for future work employing a more complex model on other datasets.

**Author Contributions:** Conceptualization, N.F.R. and I.S.I.; Formal analysis, N.F.R., N.K.A.K. and M.K.O.; Funding acquisition, N.F.R. and Z.H.C.S.; Investigation, N.F.R., I.S.I. and S.N.S.; Methodology, N.F.R., I.S.I., S.N.S. and M.K.O.; Project administration, I.S.I. and Z.H.C.S.; Resources, Z.H.C.S.; Software, I.S.I. and S.N.S.; Supervision, I.S.I.; Validation, N.K.A.K.; Visualization, N.F.R.; Writing original draft, N.F.R.; Writing—review and editing, N.F.R. and I.S.I. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Fundamental Research Grant Scheme (FRGS) under the Ministry of Higher Education Malaysia, grant number FRGS/1/2022/SKK06/UITM/02/3.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The updated INbreast dataset was acquired through personal communication via email with the original author.

**Acknowledgments:** The authors would like to acknowledge the Breast Research Group, INESC Porto, Portugal, for providing the updated INbreast dataset and its accompanying ground truth for this study. The authors would also like to express their gratitude to members of the Advanced Control System and Computing Research Group (ACSCRG), Advanced Rehabilitation Engineering in Diagnostic and Monitoring Research Group (AREDiM), Integrative Pharmacogenomics Institute (iPROMISE), and Centre for Electrical Engineering Studies, Universiti Teknologi MARA (UiTM), Cawangan Pulau Pinang for their assistance during the duration of this study. Finally, the authors are grateful to UiTM, Cawangan Pulau Pinang, for their immense administrative support.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
