Article

Mask Branch Network: Weakly Supervised Branch Network with a Template Mask for Classifying Masses in 3D Automated Breast Ultrasound

1 Monitor Corporation, Seoul 06628, Korea
2 Department of Radiology, Seoul National University Bundang Hospital, Seongnam 13620, Korea
3 Korea Institute of Science and Technology, Seoul 02792, Korea
* Author to whom correspondence should be addressed.
Appl. Sci. 2022, 12(13), 6332; https://doi.org/10.3390/app12136332
Submission received: 21 April 2022 / Revised: 10 June 2022 / Accepted: 11 June 2022 / Published: 22 June 2022
(This article belongs to the Special Issue New Frontiers in Medical Image Processing)

Abstract

Automated breast ultrasound (ABUS) is being rapidly adopted for screening and diagnosing breast cancer. Breast masses, including cancers shown in ABUS scans, often appear as irregular hypoechoic areas that are hard to distinguish from background shadings. We propose a novel branch network architecture that incorporates segmentation information of masses in the training process. The branch network is integrated into a neural network, providing a spatial attention effect. The branch network boosts the performance of existing classifiers, helping them learn meaningful features around the target breast mass. For the segmentation information, we leverage existing radiology reports without additional labeling effort. The reports, which are generated during the medical image reading process, include the characteristics of breast masses, such as shape and orientation, so a template mask can be created in a rule-based manner. Experimental results show that the proposed branch network with a template mask significantly improves the performance of existing classifiers. We also provide a qualitative interpretation of the proposed method by visualizing the attention effect on target objects.

1. Introduction

Breast cancer is the second leading cause of cancer death in women worldwide [1,2]. Many studies have revealed that breast screening can discover cancer in its early stages, reducing mortality [3,4,5,6,7]. Mammography has been the most widely used examination for breast screening, but it often misses cancer hidden in dense breast tissue that contains more fibrous or glandular tissue than fat [8]. Hand-held ultrasound (HHUS) imaging is a popular alternative for imaging dense breast tissue, but it suffers from low reproducibility depending on the operator. Recently, automated breast ultrasound (ABUS) has been introduced and is receiving favorable reviews [9]. While HHUS produces partial two-dimensional breast scans that need to be examined simultaneously by an operator on site, ABUS produces whole three-dimensional (3-D) breast scans with a dedicated probe that can be examined asynchronously later. This has led to the active development of computer-aided diagnosis (CAD) systems for ABUS, which can assist interpreting physicians by reducing their workload and enhancing cancer detection performance [10,11,12].
The main task of CAD systems is to detect breast masses that can possibly grow into cancer. Breast masses on ABUS scans are usually shown as hypoechoic areas that can also appear due to a variety of causes, such as fat, shadow, and anechoic tissue; thus, many CAD systems erroneously detect these areas as suspicious breast masses, resulting in false positives. Even radiology professionals have difficulty distinguishing hypoechoic areas of breast masses from those of other causes and have been found to detect a number of false positives on breast ultrasound scans [13,14]. Several studies [15,16,17,18] have proposed image classifiers based on conventional image processing techniques but have not been reported to significantly reduce false positives.
To develop competitive classifiers for ABUS, recent studies have employed convolutional neural networks (CNNs), which have shown tremendous success in image classification tasks. Chiang et al. [19] designed a two-stage breast mass detector for ABUS scans with a modified CNN architecture: a fast sliding window first searches for volumes of interest, which are then fed into a simple three-dimensional extension of a 2-D CNN to compute probabilities of tumor candidates. Moon et al. [20] incorporated focal loss and ensemble learning into their 3-D CNN model to resolve a data imbalance issue. Although these methods have been shown to sensitively detect breast masses, they yield an average of more than four false positives per scan, which means room for improvement still remains. Simple modifications of existing CNNs may not be sufficient to analyze the three-dimensional context of breast masses on ABUS scans.
Training a CNN requires many sample images that preferably contain clear features of target objects that can be distinguished from their backgrounds. Compared to ordinary objects (e.g., cats, boats) shown in common training samples, breast masses shown in ABUS scans may not present clear boundaries. As seen in the first row of Figure 1, the breast masses (indicated by the arrows) often appear as irregularly shaded blobs, and each blob itself is hardly distinguishable from its background shadings. Moreover, the inherent low image resolution and high noise level of ABUS scans make it more difficult to train CNNs for indistinct masses.
Considering these characteristics, we empirically found that training with a segmentation mask on a mass helps improve the classification performance, probably because the low-level features on the blobs are activated on the masses rather than on the backgrounds. The second row of Figure 1 shows class activation maps (CAMs) for mass classification using the existing DenseNet [21]. The activated network weights are widely distributed across the background blobs and are not concentrated on the masses. In contrast, our modified classifier with segmentation masks intensively activates the weights on the mass blobs, as illustrated in the third row. To integrate the mask information, we attached an additional network (called the mask branch network) to the middle of a classification network. Since our branch network is independently applicable, the proposed method can be modified from any existing state-of-the-art CNN model.
A practical issue of the proposed method lies in the difficulty of generating segmentation masks. Aside from the high cost of expert annotation in a three-dimensional space, the exact boundary of a mass is often unclear even to medical experts. We instead propose to employ a template mask with a predefined shape (e.g., circle, oval) and minimal parameters such as diameter and direction. This information is typically recorded in radiology reports; thus, additional labeling is rarely required.
The contributions of the proposed method can be summarized as follows:
  • Mask Branch Network: We propose a branch network incorporating mask information that can be attached to the middle of existing networks yielding intermediate features of input scans.
  • Template Mask: We use the characteristics of breast masses recorded in radiology reports to generate a simplified segmentation mask, requiring no additional segmentation labor.
  • Performance evaluation: Extensive experiments show that the proposed method improves the model’s performance. In particular, we provide extensive test results with different architectures of the branch network as well as with variations of the template mask in the process of obtaining an optimal model.

2. Materials and Methods

The overall architecture of the proposed method is illustrated in Figure 2. Our network architecture consists of two parallel networks: a main network and a mask branch network. The main network works as a backbone CNN classifier; any high-performance CNN classifier can serve as the main network, for which we employ a 3-D image classifier based on the DenseNet [21] architecture. In the middle of the main network, the mask branch network bifurcates to compute the mask loss by comparing its output with the template mask generated from a radiology report. The main loss and the mask loss from each branch are integrated into the total loss to be optimized in the training process.

2.1. Main Network

The main network implements a binary classifier determining the existence of suspicious breast lesions. It is composed of four dense blocks with transition layers based on DenseNet-BC-121 [21]. A transition layer is placed between two consecutive dense blocks and performs downsampling through batch normalization, a 1 × 1 × 1 convolution, and average pooling. A global average pooling (GAP) layer is connected at the end of the last dense block, followed sequentially by a fully connected (FC) layer and a softmax activation layer. The main loss is defined as the cross-entropy loss over the input samples:
$L_{main} = -\frac{1}{N}\sum_{i}^{N}\left[(1-y_i)\log(1-p_i)+y_i\log p_i\right],$
where $i$ is the index of an input sample, $N$ is the number of samples, $y_i \in \{0,1\}$ is the ground-truth label, and $p_i$ is the probability from the main network estimating whether the input sample contains a mass.
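As a reference, the snippet below is a minimal NumPy sketch of this main loss; the function name and the use of NumPy (rather than the authors' TensorFlow implementation) are our own assumptions.

```python
# Minimal NumPy sketch of the main loss above (binary cross-entropy over N samples).
import numpy as np

def main_loss(y, p, eps=1e-7):
    """y : ground-truth labels in {0, 1}, shape (N,)
       p : estimated mass probabilities from the main network, shape (N,)"""
    p = np.clip(p, eps, 1.0 - eps)  # avoid log(0)
    return -np.mean((1 - y) * np.log(1 - p) + y * np.log(p))

# Example: one mass sample predicted at 0.9 and one non-mass sample at 0.2.
print(main_loss(np.array([1.0, 0.0]), np.array([0.9, 0.2])))
```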

2.2. Mask Branch Network

The concept of a branch architecture [22,23] is to train multiple tasks jointly, taking advantage of the interaction between them. The mask branch aims to integrate spatial information into the main network, which helps the main network extract meaningful features from the area around the mass.
The mask branch is designed to start at the 1 × 1 × 1 convolutional layer of the second transition layer of the main branch and to generate a branch output, which is a voxel-level probability map that estimates the presence of target lesions. We may assume that this branch output simulates a segmentation mask.
We denote by $F \in \mathbb{R}^{C \times H \times W \times D}$ a midlayer CNN feature extracted from the second transition layer, where $C$, $H$, $W$ and $D$ are the number of channels, height, width and depth of the feature map, respectively.
The branch output B can be formulated as:
$B = \mathrm{softmax}(f(\sigma(F), w)),$
where $f(\cdot,\cdot)$ denotes a 3-D convolution, $\sigma$ represents the activation function, $B \in \mathbb{R}^{H \times W \times D \times 2}$ is the generated branch output map, and $w \in \mathbb{R}^{2 \times C \times 1 \times 1 \times 1}$ are the convolutional parameters. As a result, the branch output is a voxel-level binary probability map over the target lesion and the background.
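To illustrate this computation, a minimal NumPy sketch of the branch output follows; the choice of ReLU for $\sigma$, the tensor layouts, and the names are assumptions for demonstration only, not the authors' implementation.

```python
# Sketch of the branch output above: a 1x1x1 convolution over the midlayer feature F,
# followed by a softmax over the two output channels (lesion vs. background).
import numpy as np

def branch_output(F, w):
    """F: midlayer feature, shape (C, H, W, D)
       w: 1x1x1 conv weights, shape (2, C)
       returns B: voxelwise probability map, shape (H, W, D, 2)"""
    F = np.maximum(F, 0.0)                        # sigma: assumed ReLU activation
    logits = np.einsum('kc,chwd->hwdk', w, F)     # 1x1x1 conv = per-voxel linear map
    logits -= logits.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=-1, keepdims=True)      # softmax over the 2 classes

C, H, W, D = 64, 8, 6, 8
B = branch_output(np.random.randn(C, H, W, D), 0.01 * np.random.randn(2, C))
print(B.shape, B.sum(axis=-1).round(3).min())     # (8, 6, 8, 2); probabilities sum to 1 per voxel
```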
Comparing the branch output with the mask, which contains the spatial information of the target lesion, yields the mask loss, a pixelwise cross-entropy loss through which the model learns pixelwise class probabilities. The mask loss is defined as:
$L_{mask} = -\frac{1}{NM}\sum_{j}^{N}\sum_{i}^{M}\left[(1-z_{i,j})\log(1-q_{i,j})+z_{i,j}\log q_{i,j}\right],$
where $j$ is the index of a sample in the input batch, $i$ is the index of a voxel in the $H \times W \times D$ space, $M = HWD$ is the total number of voxels in the branch output, $z_{i,j} \in \{0,1\}$ is a pixel value of the mask $T \in \mathbb{R}^{H \times W \times D}$, which is proposed in Section 2.4, and $q_{i,j}$ is the probability from the mask branch.
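A corresponding NumPy sketch of this voxelwise loss is given below; the array layouts and names are again our own assumptions.

```python
# Minimal NumPy sketch of the mask loss above (voxelwise binary cross-entropy,
# averaged over the N samples and the M = H*W*D voxels).
import numpy as np

def mask_loss(Z, Q, eps=1e-7):
    """Z : template masks with values in {0, 1}, shape (N, H, W, D)
       Q : lesion-channel probabilities from the branch output, shape (N, H, W, D)"""
    Q = np.clip(Q, eps, 1.0 - eps)
    ce = (1 - Z) * np.log(1 - Q) + Z * np.log(Q)
    return -ce.mean()

Z = np.zeros((2, 8, 6, 8)); Z[:, 3:5, 2:4, 3:5] = 1.0   # toy template masks
Q = np.full_like(Z, 0.1); Q[Z == 1] = 0.8               # toy branch probabilities
print(mask_loss(Z, Q))
```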
The total loss of the whole model, subject to optimization, is a combination of main loss and mask loss. To combine these losses efficiently, the uncertainty loss weight method [24] commonly used for multitask loss is adapted for combining our losses and expressed as follows:
$L_{total} = \frac{1}{\sigma_1^2}L_{main} + \frac{1}{\sigma_2^2}L_{mask} + \log\sigma_1 + \log\sigma_2,$
where $\sigma_1$ and $\sigma_2$ are trainable variables that adaptively learn the relative weights of $L_{main}$ and $L_{mask}$ and regulate the balance between the losses. During training, the scales of $\sigma_1$ and $\sigma_2$ regulate the contributions of $L_{main}$ and $L_{mask}$, while the $\log$ terms prevent $\sigma_1$ and $\sigma_2$ from becoming too large.
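The following sketch illustrates the combined loss under this weighting scheme; parameterizing the trainable weights as log-sigmas is our own choice for numerical stability and is not taken from the paper.

```python
# Sketch of the uncertainty-weighted total loss above, following Kendall et al. [24].
import numpy as np

def total_loss(L_main, L_mask, log_sigma1, log_sigma2):
    s1, s2 = np.exp(log_sigma1), np.exp(log_sigma2)
    return L_main / s1 ** 2 + L_mask / s2 ** 2 + log_sigma1 + log_sigma2

# log_sigma1 and log_sigma2 would be trainable scalars optimized jointly with the
# network weights: a large sigma down-weights its loss term but is penalized by
# the corresponding log term, keeping either weight from growing unbounded.
print(total_loss(0.35, 0.12, 0.0, 0.0))
```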
Note that the mask branch network is used only for the training process to focus on the region of interest and not for the testing process. Thus, the proposed method does not increase computational complexity in the inference phase.

2.3. Variation of Network Designs

We designed four variants of the mask branch network, MBN-V1 to MBN-V4, as shown in Figure 3. We explored the architecture of the mask branch network along several factors: the number of branches, the starting positions of the branches, and how to combine the branch outputs in the case of multiple branches. Although the principle of operation is the same as that explained above for MBN-V1, the most suitable structure of the mask branch network can differ for each backbone network and task. We design single or multiple subnetworks parallel to the main network. MBN-V1 and MBN-V2 have a single branch, generating only one output map.
MBN-V1 is attached to the second transition layer, extracting feature map $F_1$ and producing $B_1$ (Figure 3a). The mask loss $L_{mask}$ is used to reduce the difference between $B_1$ and $T_1$. MBN-V2 is integrated into the backbone at the third transition layer, as shown in Figure 3b. In contrast to MBN-V1 and MBN-V2, the other versions of the mask branch network consist of both of the branches used for MBN-V1 and MBN-V2. In MBN-V3, the two branches are kept separate, producing $B_1$ and $B_2$, which have different shapes, as shown in Figure 3c. The branch outputs are matched to $T_1$ and $T_2$, yielding two corresponding mask losses; these losses are combined with the uncertainty loss weight method, and the branches are optimized by their respective losses. Conversely, MBN-V4 integrates the output maps into a single array by a summation operation (Figure 3d): after $B_1$ is downsampled to the same shape as $B_2$ through 3-D average pooling, $B_1$ and $B_2$ are integrated by elementwise summation.
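As a small sketch of the MBN-V4 fusion step, the snippet below downsamples the higher-resolution branch output by 3-D average pooling and adds it elementwise to the lower-resolution one; the shapes and pooling window are illustrative assumptions.

```python
# Illustrative NumPy sketch of the MBN-V4 fusion: average-pool B1 to B2's
# resolution, then sum elementwise.
import numpy as np

def avg_pool3d(x, k=2):
    """Average pooling with window/stride k over the spatial axes of an (H, W, D, C) array."""
    H, W, D, C = x.shape
    x = x[:H - H % k, :W - W % k, :D - D % k]
    return x.reshape(H // k, k, W // k, k, D // k, k, C).mean(axis=(1, 3, 5))

B1 = np.random.rand(12, 8, 12, 2)  # branch output from the shallower attachment point
B2 = np.random.rand(6, 4, 6, 2)    # branch output from the deeper attachment point
fused = avg_pool3d(B1, 2) + B2     # elementwise summation after pooling
print(fused.shape)                 # (6, 4, 6, 2)
```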
We show the evaluation results of each case implemented in the experiments section.

2.4. Template Mask

The shading blobs in the background of an ABUS image are often indistinguishable from the shading of the mass. For the segmentation mask, a pixelwise segmentation map fully annotated by clinical experts would be ideal; however, manually tracing the boundaries of a three-dimensional mass requires considerable time and effort from the expert. To address this annotation problem, we propose to utilize information that already exists in the radiology report instead of manually labeling the ground truth.

2.4.1. Criteria on Radiology Reports

The location, size and category of suspicious lesions are usually recorded in radiology reports during diagnosis. The location and size are recorded as center coordinates and diameters annotated along the three axes of the lesion. For categories, the Breast Imaging Reporting and Data System (BI-RADS) [25] is the most widely used standard for determining the malignancy of breast lesions. As factors for BI-RADS, several visual characteristics are recorded in radiology reports during the reading of breast images: shape, orientation, margins, posterior acoustic features, etc. [25]. Among these properties, we employ the orientation feature, which indicates whether the long axis of the lesion is parallel to the breast skin, because it reflects the approximate shape of the lesion with two categories: parallel or not parallel. By utilizing these properties, we can automatically create a simplified label mask, named the template mask, which contains spatial information and guides the classifier on where to focus. Due to the limited information available, this simple mask cannot be regarded as an exact segmentation label map obtained by a full annotation process.

2.4.2. How to Create a Template Mask

Similar to a general segmentation map, the template mask is a binary label map that consists of positive and negative areas. The positive area is created by a rule utilizing the position, diameter, and orientation of the breast lesion. The shape of the template mask is one of two types: sphere or ellipsoid. If the lesion has a ‘parallel’ orientation property, the diameter perpendicular to the skin is halved, changing the shape of the positive area from a sphere to an ellipsoid. With the center coordinate $(x_t, y_t, z_t)$, diameter $d_t$ and orientation of the lesion, we can define a template mask $T(x,y,z)$ as a simplified segmentation label:
$T(x,y,z) = \begin{cases} 1 & \text{if } \frac{(x-x_t)^2}{d_t^2} + \frac{(y-y_t)^2}{(\alpha d_t)^2} + \frac{(z-z_t)^2}{d_t^2} \le 1, \\ 0 & \text{otherwise,} \end{cases}$
where $\alpha = 0.5$ if the orientation is parallel; otherwise, $\alpha = 1$.
An example of a template mask is shown in Figure 4 for a sample breast cancer. Figure 4a–c show the transverse, coronal, and sagittal views through the center of the breast lesion. With lesion diameters of 10 mm, 18 mm, and 15 mm along each axis, the average diameter is 14.3 mm. The centroid of the template mask is located at the same position as the center of the lesion. Since the lesion has a ‘parallel’ orientation, the positive area shrinks in the skin-perpendicular direction, yielding an ellipsoid with one semidiameter of 7.15 mm and two of 14.3 mm, as shown in Figure 4d–f.
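For illustration, a small sketch of this rule-based template-mask generation follows; the voxel grid size, the assumption of 1 mm isotropic voxels, and the function names are hypothetical, and only the ellipsoid rule itself follows the description above.

```python
# Illustrative sketch of template-mask generation from the rule above.
import numpy as np

def template_mask(shape, center, diameter, parallel):
    """Binary sphere/ellipsoid mask.
    shape    : (X, Y, Z) size of the voxel grid
    center   : (x_t, y_t, z_t) lesion center, in voxels
    diameter : average lesion diameter d_t, in voxels
    parallel : True if the reported orientation is 'parallel' to the skin
    """
    alpha = 0.5 if parallel else 1.0  # halve the skin-perpendicular semidiameter
    x, y, z = np.ogrid[:shape[0], :shape[1], :shape[2]]
    xt, yt, zt = center
    r = ((x - xt) ** 2 / diameter ** 2
         + (y - yt) ** 2 / (alpha * diameter) ** 2
         + (z - zt) ** 2 / diameter ** 2)
    return (r <= 1.0).astype(np.uint8)

# Example loosely following Figure 4 (average diameter 14.3 mm, parallel lesion),
# assuming, hypothetically, 1 mm isotropic voxels on a 48-voxel cube.
T = template_mask((48, 48, 48), (24, 24, 24), 14.3, parallel=True)
print(T.shape, int(T.sum()), "positive voxels")
```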

3. Results

To the best of our knowledge, no public dataset is available for ABUS research; thus, we built our own dataset in a tertiary hospital. ABUS images used in this study were acquired from ABUS systems (Invenia ABUS, Automated Breast Ultrasound System; GE Healthcare, Sunnyvale, CA, USA). For each breast, three volumes were obtained: the central volume, the lateral volume, and the medial volume. The institutional review board approved this study and waived informed consent, considering the retrospective study design and the use of anonymized patient data.

3.1. Implementation Details

A total of 363 patients who underwent ABUS from May 2017 to October 2019 were included. The ABUS images of 286 patients presented 434 mass lesions categorized as C2 or higher by BI-RADS, while the images of the other 77 patients showed no mass lesions. We randomly divided the mass lesions into 304 lesions for the training set, 50 lesions for the validation set, and 80 lesions for the test set. The dataset also includes 3907 nonmass lesions that were randomly cropped from the 77 normal patients’ images and randomly divided into 3777 lesions for the training set and 50 and 80 lesions for the validation and test sets, respectively.
All center coordinates and diameters of the masses were obtained from radiology reports annotated by experienced radiologists. We rescaled the original volume images to voxel sizes of 0.3 mm to 2.0 mm depending on the mass size, and then cropped the network input volumes at the center coordinate of the mass with a size of 48 × 32 × 48 voxels. Additionally, the three-dimensional patches used for training were rotated three times, with rotation angles of 90°, 180°, and 270°, for augmentation.
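A minimal sketch of this rotation augmentation is shown below; the choice of rotation plane is our own assumption.

```python
# Small sketch of the 90/180/270-degree rotation augmentation of a cropped
# 48 x 32 x 48 patch; rotating in the plane of the first and third axes is assumed.
import numpy as np

def augment_rotations(patch):
    """Return the original patch plus three rotated copies (90, 180, 270 degrees)."""
    return [np.rot90(patch, k=k, axes=(0, 2)) for k in range(4)]

patches = augment_rotations(np.zeros((48, 32, 48), dtype=np.float32))
print([p.shape for p in patches])  # all (48, 32, 48) since the rotated axes have equal size
```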
The weights of the convolutional layers were initialized with the Xavier uniform initializer [26]. We applied batch normalization (BN) [27] immediately after each convolution layer, with ReLU activation following BN. The models were trained with a batch size of 48, split into two mini-batches of equal size, and the losses were optimized with the RMSprop optimizer [28]. All networks in this study were implemented with the TensorFlow 1.12 library and trained on an Nvidia 2080 Ti GPU on an Ubuntu 18.04 system.

3.2. Evaluation Metrics

The classification performance was evaluated by measuring the sensitivity ($Se$), specificity ($Sp$), and area under the curve ($AUC$) of the receiver operating characteristic ($ROC$). Sensitivity and specificity are the percentages of positive and negative samples, respectively, that are correctly identified. The metrics are defined as:
$Sensitivity\ (Se) = \frac{TP}{TP+FN}$
$Specificity\ (Sp) = \frac{TN}{TN+FP}$
where $TP$, $TN$, $FP$ and $FN$ are the numbers of true positives, true negatives, false positives and false negatives, respectively. The AUC was calculated using ROC analysis on the test set.
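For reference, a small sketch of how these metrics can be computed is given below; the use of scikit-learn for the AUC is our own choice, not necessarily the authors' implementation.

```python
# Minimal sketch of the evaluation metrics.
import numpy as np
from sklearn.metrics import roc_auc_score

def sensitivity_specificity(y_true, y_pred):
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    return tp / (tp + fn), tn / (tn + fp)

y_true = np.array([1, 1, 0, 0, 1])              # toy ground-truth labels
y_prob = np.array([0.9, 0.4, 0.2, 0.6, 0.8])    # toy predicted probabilities
se, sp = sensitivity_specificity(y_true, (y_prob >= 0.5).astype(int))
print(se, sp, roc_auc_score(y_true, y_prob))
```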

3.3. Quantitative Results

3.3.1. Variation of MBN Architectures

To evaluate the effect of the proposed mask branch network (MBN), we tested the main network as a baseline and compared the results from the main network with MBN. We employed two networks as the main network: DenseNet-BC [21] and ResNetV2-101 [29]. As shown in Table 1, the proposed networks with MBN outperform the baselines in every performance measure in both kinds of main networks. While all versions of our method achieve higher performance overall compared with the baseline, the version of best performance is different depending on the baseline. In the case of DenseNet, MBN-V2 achieved the best performance for all metrics. In the lower part of Table 1, the result of our method with ResNet is shown for performance comparison.
Similar to cases with DenseNet, all of the proposed methods outperform the plain ResNet model. Unlike experiments based on DenseNet, the mask branch network achieved the best performance by using MBN-V2 for sensitivity and MBN-V3 for specificity and AUC. Figure 5 compares ROC curves for the mass classifier with the mask branch network.

3.3.2. Comparison of Loss Weighting Strategies

We compared different methods for weighting the mask loss. Various methods have been used to efficiently combine multiple losses in multitask learning (MTL) [30]. The most straightforward method is uniform weighting: the losses are simply added together to produce a single scalar loss value. Dynamic optimization techniques are also important in MTL for optimizing a set of possibly conflicting losses or gradients, because conflicting-gradient problems can degrade model performance. Kendall et al. [24] proposed an uncertainty-based weighting approach and applied it to a CNN. Ref. [31] adapted the regularization term in Kendall’s uncertainty-based method. A dynamic weight average (DWA) was proposed by [32]. We applied these methods to the proposed model with MBN-V1 under exactly the same experimental setting and compared their performance. The results, shown in Table 2, indicate that uncertainty weighting and revised uncertainty weighting achieve considerably higher performance than the other settings, with little difference between the two. Considering this result, we applied uncertainty weighting to our method for the final result. The left panel of Figure 6 shows ROC curves for the different loss weighting methods.

3.3.3. Characteristics Suitable for Template Mask

To evaluate the effect of the mass features available in radiology reports, we compared the performance of template masks utilizing various feature sets. Table 3 shows the cases using different features of the breast lesions and their results. We applied these cases to DenseNet with MBN-V1 and uncertainty weighting. Even utilizing the location only (with a fixed sphere of 20 mm diameter) improves all performance metrics compared to the baseline (DenseNet without MBN; no mask). The size (diameter) also enhanced the metrics, and the combination of all features yielded the best performance, as seen in Table 3. In addition, the right graph of Figure 6 compares ROC curves for the various template masks.
Figure 5. ROC curves of the mask branch network with DenseNet (Left) and ResNet (Right).
Figure 6. ROC curves of loss weight methods and template masks. (Left): Models using various ways to weigh multiple losses; (Right): Cases utilizing different characteristics for breast lesions.

3.4. Qualitative Results

Figure 7 qualitatively visualizes the spatial attention effect of the proposed mask branch through Grad-CAM (class activation map) visualizations. The first row shows sample mass images with red rectangles indicating the masses. The CAM of DenseNet (second row) does not appropriately activate the areas where the masses are present. In contrast, the proposed method (third row) shows relatively clear activation on the target areas; thus, the main network is expected to learn to exploit and aggregate features in the target area.

3.5. Cancer Classification

We also evaluated the performance on the cancer classification task, which determines whether a lesion is malignant, as shown in Table 4. Due to the high similarity between benign and malignant lesions, cancer classification is a much more challenging task than mass classification. As seen in the first row of Table 4, without the mask branch network, the network hardly learns, with metrics around 0.5. The performance of all cases with DenseNet is improved compared with the plain network. MBN-V2 appears to be the best-performing version in cancer classification, similar to mass classification, in terms of sensitivity and AUC, although the highest specificity is obtained with MBN-V4.

4. Discussion

The best-performing architecture of the mask branch network depends on the task: mass classification or cancer classification. The mass classifier tended to perform better when the mask branch network was attached at a deeper position, whereas the cancer classifier did not. This discrepancy depends on how similar the objectives of the main and branch networks are. The mask branch network has a function similar to that of the mass classifier, namely detecting the existence of lesions, so the mass classifier improved more as more features were shared by attaching the mask branch network at a deeper position. The cancer classifier, however, has to learn the malignancy of lesions rather than their existence; therefore, the detailed features in the later layers of the main network for cancer classification can be disturbed by the mask branch network. Because of this difference in function, the attachment position for the cancer classifier should be chosen so as to reduce the features shared with the mask branch.
A few kinds of information in radiology reports can be utilized for creating template masks. In the experiment section, we implemented and evaluated various types of template masks with orientation, position, and diameter. All cases with template masks showed better performance than the case without template masks.
In addition to the input images, applying the proposed method requires detailed information on the breast lesions, which entails additional labeling labor or cost. Hence, it is important to minimize this additional information to increase efficiency. The intended objective of the template mask is to cover the area where the breast mass exists; therefore, no further information is needed as long as the mask is large enough to cover the target area. Thus, a template mask built from minimal information, such as position and diameter, performed sufficiently well.

5. Conclusions

This study introduced a novel branch network incorporating spatial information to improve the performance of a CNN for classifying masses on ABUS images. We utilized the characteristics of breast masses recorded in radiology reports to generate a simplified mask, the template mask, which requires no additional segmentation labor. The proposed MBN can be attached to existing networks and has been shown to boost their performance. We tested variations of the branch network and found the optimal architecture of the MBN. The proposed method was effective even for cancer classification, which is a more complicated task.

Author Contributions

Conceptualization, D.K. and K.-J.L.; methodology, D.K.; software, D.K.; validation, D.K. and M.J.; data curation, D.K. and M.J.; writing—original draft preparation, D.K.; writing—review and editing, H.P. and K.-J.L.; visualization, D.K.; supervision, H.P. and K.-J.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Ferlay, J.; Colombet, M.; Soerjomataram, I.; Parkin, D.M.; Piñeros, M.; Znaor, A.; Bray, F. Cancer statistics for the year 2020: An overview. Int. J. Cancer 2021, 149, 778–789.
  2. Siegel, R.L.; Miller, K.D.; Fuchs, H.E.; Jemal, A. Cancer statistics, 2021. CA Cancer J. Clin. 2021, 71, 7–33.
  3. Miller, A.B.; Wall, C.; Baines, C.J.; Sun, P.; To, T.; Narod, S.A. Twenty five year follow-up for breast cancer incidence and mortality of the Canadian National Breast Screening Study: Randomised screening trial. BMJ 2014, 348, g366.
  4. Saslow, D.; Boetes, C.; Burke, W.; Harms, S.; Leach, M.O.; Lehman, C.D.; Morris, E.; Pisano, E.; Schnall, M.; Sener, S.; et al. American Cancer Society guidelines for breast screening with MRI as an adjunct to mammography. CA Cancer J. Clin. 2007, 57, 75–89.
  5. Austoker, J.; Beral, V.; Berrington, A.; Blanks, R. Screening for breast cancer in England: Past and future: Advisory Committee on Breast Cancer Screening. J. Med. Screen. 2006, 13, 59–61.
  6. Seely, J.; Alhassan, T. Screening for breast cancer in 2018—What should we be doing today? Curr. Oncol. 2018, 25, 115–124.
  7. Monticciolo, D.L.; Newell, M.S.; Moy, L.; Niell, B.; Monsees, B.; Sickles, E.A. Breast cancer screening in women at higher-than-average risk: Recommendations from the ACR. J. Am. Coll. Radiol. 2018, 15, 408–414.
  8. Van Goethem, M.; Schelfout, K.; Dijckmans, L.; Van Der Auwera, J.; Weyler, J.; Verslegers, I.; Biltjes, I.; De Schepper, A. MR mammography in the pre-operative staging of breast cancer in patients with dense breast tissue: Comparison with mammography and ultrasound. Eur. Radiol. 2004, 14, 809–816.
  9. Vourtsis, A.; Kachulis, A. The performance of 3D ABUS versus HHUS in the visualisation and BI-RADS characterisation of breast lesions in a large cohort of 1,886 women. Eur. Radiol. 2018, 28, 592–601.
  10. Yang, S.; Gao, X.; Liu, L.; Shu, R.; Yan, J.; Zhang, G.; Xiao, Y.; Ju, Y.; Zhao, N.; Song, H. Performance and reading time of automated breast US with or without computer-aided detection. Radiology 2019, 292, 540–549.
  11. Jiang, Y.; Inciardi, M.F.; Edwards, A.V.; Papaioannou, J. Interpretation time using a concurrent-read computer-aided detection system for automated breast ultrasound in breast cancer screening of women with dense breast tissue. Am. J. Roentgenol. 2018, 211, 452–461.
  12. Van Zelst, J.; Tan, T.; Clauser, P.; Domingo, A.; Dorrius, M.D.; Drieling, D.; Golatta, M.; Gras, F.; de Jong, M.; Pijnappel, R.; et al. Dedicated computer-aided detection software for automated 3D breast ultrasound; an efficient tool for the radiologist in supplemental screening of women with dense breasts. Eur. Radiol. 2018, 28, 2996–3006.
  13. Crystal, P.; Strano, S.D.; Shcharynski, S.; Koretz, M.J. Using sonography to screen women with mammographically dense breasts. Am. J. Roentgenol. 2003, 181, 177–182.
  14. Kelly, K.M.; Dean, J.; Comulada, W.S.; Lee, S.J. Breast cancer detection using automated whole breast ultrasound and mammography in radiographically dense breasts. Eur. Radiol. 2010, 20, 734–742.
  15. Shi, X.; Cheng, H.; Hu, L. Mass detection and classification in breast ultrasound images using fuzzy SVM. In Proceedings of the 9th Joint International Conference on Information Sciences (JCIS-06), Kaohsiung, Taiwan, 8–11 October 2006; Atlantis Press: Amsterdam, The Netherlands, 2006.
  16. Murali, S.; Dinesh, M. Classification of Mass in Breast Ultrasound Images Using Image Processing Techniques. Int. J. Comput. Appl. 2012, 42, 29–36.
  17. Lo, C.M.; Chen, R.T.; Chang, Y.C.; Yang, Y.W.; Hung, M.J.; Huang, C.S.; Chang, R.F. Multi-dimensional tumor detection in automated whole breast ultrasound using topographic watershed. IEEE Trans. Med. Imaging 2014, 33, 1503–1511.
  18. Ye, C.; Vaidya, V.; Zhao, F. Improved mass detection in 3D automated breast ultrasound using region based features and multi-view information. In Proceedings of the 2014 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Chicago, IL, USA, 26–30 August 2014; pp. 2865–2868.
  19. Chiang, T.C.; Huang, Y.S.; Chen, R.T.; Huang, C.S.; Chang, R.F. Tumor detection in automated breast ultrasound using 3-D CNN and prioritized candidate aggregation. IEEE Trans. Med. Imaging 2018, 38, 240–249.
  20. Moon, W.K.; Huang, Y.S.; Hsu, C.H.; Chien, T.Y.C.; Chang, J.M.; Lee, S.H.; Huang, C.S.; Chang, R.F. Computer-aided tumor detection in automated breast ultrasound using a 3-D convolutional neural network. Comput. Methods Programs Biomed. 2020, 190, 105360.
  21. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708.
  22. Teerapittayanon, S.; McDanel, B.; Kung, H.T. Branchynet: Fast inference via early exiting from deep neural networks. In Proceedings of the 2016 23rd International Conference on Pattern Recognition (ICPR), Cancun, Mexico, 4–8 December 2016; pp. 2464–2469.
  23. Fukui, H.; Hirakawa, T.; Yamashita, T.; Fujiyoshi, H. Attention branch network: Learning of attention mechanism for visual explanation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 10705–10714.
  24. Kendall, A.; Gal, Y.; Cipolla, R. Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7482–7491.
  25. Orel, S.G.; Kay, N.; Reynolds, C.; Sullivan, D.C. BI-RADS categorization as a predictor of malignancy. Radiology 1999, 211, 845–850.
  26. Glorot, X.; Bengio, Y. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, Sardinia, Italy, 13–15 May 2010; pp. 249–256.
  27. Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv 2015, arXiv:1502.03167.
  28. McMahan, B.; Streeter, M. Delay-tolerant algorithms for asynchronous distributed online learning. In Proceedings of the 27th International Conference on Neural Information Processing Systems, Montreal, QC, USA, 8–13 December 2014; pp. 1–9.
  29. He, K.; Zhang, X.; Ren, S.; Sun, J. Identity mappings in deep residual networks. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2016; pp. 630–645.
  30. Gong, T.; Lee, T.; Stephenson, C.; Renduchintala, V.; Padhy, S.; Ndirango, A.; Keskin, G.; Elibol, O.H. A comparison of loss weighting strategies for multi task learning in deep neural networks. IEEE Access 2019, 7, 141627–141632.
  31. Liebel, L.; Körner, M. Auxiliary tasks in multi-task learning. arXiv 2018, arXiv:1805.06334.
  32. Liu, S.; Johns, E.; Davison, A.J. End-to-end multi-task learning with attention. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 1871–1880.
Figure 1. Class activation maps (CAMs) for sample mass images. The first row shows slice images of the breast mass in different views: transverse (left), coronal (middle), and sagittal (right). The second row shows the CAM results from the existing DenseNet classifier. The third row shows the CAM results from the proposed method. Activations with high weights are shown in red.
Figure 2. The architecture of our proposed network: the mask branch network. The mask branch network is integrated into the main network, helping the main network focus on where the lesions exist. The mask branch network learns spatial information from the template mask generated from characteristics of the lesion.
Figure 3. Architectures of the proposed mask branch network with DenseNet. (a,b) MBN-V1 and MBN-V2 with a single subnet, where each branch is attached to conv2 and conv3, respectively. (c,d) MBN-V3 and MBN-V4 with multiple branch subnets containing the branches of both MBN-V1 and MBN-V2. In MBN-V3 (c), the losses are calculated with each branch output separately. MBN-V4 combines the branch outputs by summation and calculates the loss once, as shown in (d).
Figure 4. Example of a template mask. (a–c) Slice images of the transverse, coronal, and sagittal views of a breast cancer on ABUS, including the diameters along each axis. (d–f) Template mask matched to (a–c). Using the average diameter of the lesion, the mask label has two equal semidiameters of 14.3 mm and one semidiameter of 7.15 mm, which is half the length of the others due to the parallel orientation property.
Figure 7. Class activation maps (CAMs) to show the effect of the mask branch network (MBN). (Top row) Breast mass samples on ABUS images marked by red boxes. (Middle row) CAMs of DenseNet. (Bottom row) CAMs of DenseNet with MBN.
Table 1. Performance comparison with the mask branch network (MBN) on DenseNet-BC and ResNetV2-101 on the mass classifier.

                     Sensitivity   Specificity   AUC
DenseNet             56.25%        86.25%        0.8553
DenseNet + MBN-V1    75.00%        92.50%        0.9252
DenseNet + MBN-V2    87.75%        93.75%        0.9491
DenseNet + MBN-V3    82.50%        92.50%        0.9472
DenseNet + MBN-V4    77.50%        92.50%        0.9483
ResNet               65.00%        70.00%        0.7675
ResNet + MBN-V1      68.75%        77.50%        0.8270
ResNet + MBN-V2      71.25%        86.25%        0.8795
ResNet + MBN-V3      70.00%        91.25%        0.9261
ResNet + MBN-V4      67.50%        90.00%        0.8797
Table 2. Experiments on ways to weigh the main loss and the mask loss. Metrics ranked in 1st place are in bold.

                                Se        Sp        AUC
Equal Weighting                 65.00%    82.50%    0.8720
Uncertainty Weighting           75.00%    92.50%    0.9252
Revised Uncertainty Weighting   71.25%    93.75%    0.9369
Dynamic Weight Average          70.00%    77.50%    0.8416
Table 3. Comparison of the sets of characteristics utilized for creating the mask label: position, diameter, orientation, posterior echo pattern. Metrics ranked in 1st place are in bold.

Template Label         Se       Sp       AUC
No mask                56.3%    86.3%    0.855
Loc.                   67.5%    90.0%    0.908
Loc. + Size            72.5%    90.0%    0.921
Loc. + Size + Orien.   75.0%    92.5%    0.925
Table 4. Cancer classification results with the mask loss on DenseNet.

                     Se        Sp        AUC
DenseNet             43.75%    56.25%    0.5187
DenseNet + MBN-V1    56.25%    60.41%    0.6402
DenseNet + MBN-V2    63.95%    61.57%    0.6802
DenseNet + MBN-V3    62.50%    64.58%    0.6050
DenseNet + MBN-V4    52.63%    68.42%    0.6638
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
