Article

Incorporation of Structural Similarity Index and Regularization Term into Neighbor2Neighbor Unsupervised Learning Model for Efficient Ultrasound Image Data Denoising

1 School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
2 School of Software Engineering, Chengdu University of Information Technology, Chengdu 610225, China
3 Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Chongqing 400714, China
4 Automatic Software Generation & Intelligence Service Key Laboratory of Sichuan Province, Chengdu 610225, China
5 School of Intelligent Science and Engineering, Chengdu Neusoft University, Chengdu 611844, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(17), 7988; https://doi.org/10.3390/app14177988
Submission received: 31 July 2024 / Revised: 1 September 2024 / Accepted: 5 September 2024 / Published: 6 September 2024
(This article belongs to the Section Computing and Artificial Intelligence)

Abstract

Medical ultrasound imaging is extensively employed for diagnostic purposes. However, image quality remains a major obstacle to achieving greater accuracy. Conventional supervised deep learning denoising methods rely on matched noise-free and noisy image pairs, which are highly challenging to obtain in practical ultrasound applications. Moreover, because they assume pixel-wise independent noise, existing unsupervised denoising methods such as Neighbor2Neighbor are unable to efficiently address the correlated noise in ultrasound images; in addition, their random neighborhood downsampling frequently results in pixel loss. Hence, this study proposes an improved Neighbor2Neighbor algorithm, which reconstructs ultrasound images with a revised downsampling approach and incorporates a structural similarity index and a regularization term, thereby enhancing its ability to suppress both independent and correlated noise. Extensive experiments on an ultrasound image dataset demonstrate that the proposed algorithm outperforms state-of-the-art baseline algorithms in peak signal-to-noise ratio (PSNR), mean structural similarity index measure (MSSIM), feature similarity index measure (FSIM), and edge preservation index (EPI).

1. Introduction

Ultrasound imaging technology is widely used in the medical field because of its non-invasiveness, low cost, high efficiency, convenience, and real-time performance. It has become one of the core tools for diagnosing various diseases in hospitals, especially for monitoring fetal development in pregnant women and diagnosing abdominal organ diseases. However, due to the special mechanism of ultrasound imaging, speckle noise is frequently generated during the imaging process, which degrades the quality of ultrasound images, manifesting mainly as poor image contrast and inconspicuous features characterizing the structural properties of tissues. These problems not only affect the clarity of the image but may also affect the accuracy of the diagnosis. Therefore, improving the quality of medical ultrasound images is of vital significance for providing auxiliary support in disease diagnosis.
In medical ultrasound imaging, ultrasound pulses in the 3–30 MHz range are delivered to the patient through a sensor [1]. These pulses are reflected at the interfaces of different tissues, captured by the sensors, and converted into electrical signals to generate ultrasound images. However, speckle noise frequently appears during this conversion; it is random interference resulting from scattering by fine particles [2]. This noise appears in the image as granular patterns of bright and dark regions caused by random phase changes and destructive interference of the ultrasound pulses. The existence of speckle noise reduces the detail resolution of the ultrasound image and thus affects the overall quality of the image and the accuracy of the diagnosis. For modeling the echo envelope, the Rayleigh distribution is commonly used to describe the echo envelope signal in ultrasound imaging [3]. In ultrasound image processing, common methods used to reduce speckle noise include the multi-angle plane composite technique and post-image filtering. Composite techniques combine content of different frequencies or images from different spatial views [4,5]. Post-image filtering has been widely used in B-mode imaging [6]. These methods assume that speckle is multiplicative noise that can be filtered out, thereby suppressing speckle noise and ultimately improving image quality. Among them, nonlinear filters have the characteristic of retaining edges while smoothing uniform areas [7].
In recent years, with the development of deep learning, many researchers have used various convolutional neural networks for image processing tasks, and ultrasound image denoising has become one of the hotspots explored. Refs. [8,9] achieved image denoising by adding residual learning to CNNs, forming flexible image denoising solutions. These deep learning methods work extremely well for Gaussian denoising. However, they are based on supervised learning and require large numbers of noisy–clean image pairs for training. Such fully supervised learning methods have a limited denoising effect in practical ultrasound applications. In contrast, unsupervised learning methods that do not rely on noise-free images are more applicable for speckle denoising in ultrasound images.
Lehtinen et al. [10] proposed the Noise2Noise (N2N) deep denoiser, which trains on multiple noisy observations of the same scene and learns the mapping between two zero-mean noisy images. When the number of samples is small, the network learns the transformation between two zero-mean noise patterns; when the number of samples is large enough, since the noise is unpredictable, minimizing the loss function drives N2N toward the expectation of all possible outputs, which is the clean signal itself. However, N2N requires two noisy observations of the same scene, a condition that is also difficult to satisfy in realistic scenarios. Subsequently, the self-supervised denoising models Noise2Void (N2V) [11] and Noise2Self (N2S) [12], proposed by Krull et al. and Batson et al., can train the network with only one noisy observation per scene. The N2V model uses a blind-spot training strategy, in which the model excludes or masks the pixel at the center of its receptive field during training. This design is based on the assumption that the noise has zero mean and is independent at the pixel level. By analyzing the local image context (excluding the aforementioned blind spot), the model can predict the true intensity of a pixel. With this strategy, N2V builds a probabilistic model linking noisy observations to pixel-level true signals. In this way, the N2V model handles various noise levels and can effectively remove spatially varying noise. N2S utilizes a blind-spot network architecture and a self-learning approach to predict a pixel's value by examining its surrounding pixels. Again, by disregarding the "blind spot", the network implicitly learns the statistical properties of the noise present in the image, which allows the model to distinguish the noise in each pixel from the underlying signal. This self-supervised approach allows the network to estimate the denoiser from noisy data, making it effective at mitigating random noise. In later research, probabilistic Noise2Void (PN2V) [13] and the dilated blind-spot network [14] introduced explicit noise modeling and probabilistic inference, as well as masked convolution and stacked dilated convolutional layers, for better performance and faster training. However, the noise model is difficult to specify, especially in realistic ultrasound imaging scenarios. Huang et al. [15] proposed a self-supervised framework called Neighbor2Neighbor, which trains a denoiser using only a single noisy observation of each scene. The framework mainly consists of two parts. First, a pair of noisy sub-images is generated by a random neighbor sub-sampler. Then, these sub-sampled image pairs are utilized for self-supervised training, while a regularization loss is introduced to cope with the non-zero ground-truth difference between the pairs of sub-sampled noisy images; the overall loss thus consists of a reconstruction part and a regularization part. Building upon the Neighbor2Neighbor framework, Song et al. [16] integrated high-resolution anatomical MR images as auxiliary information, proposing a self-supervised method named neighbor-to-neighbor (NB2NB) that denoises PET images using a single noisy input. By employing a U-Net network architecture across three different resolution levels, this method can more delicately capture and restore the details in PET images, thereby significantly enhancing the denoising effect.
This multi-resolution strategy not only optimizes noise suppression but also ensures the quantitative accuracy of the images, making them more suitable for specific types of medical imaging data.
Due to the assumption of pixel-wise independent noise, current unsupervised denoising methods such as Neighbor2Neighbor (Ne2Ne) cannot deal with the correlated noise in ultrasound images. Meanwhile, the random neighborhood downsampler adopted by the model causes pixel loss.
To enhance the generalization ability and detail recovery capacity of the denoising model, our work proposes a new self-supervised ultrasound image denoising framework that addresses the limitations of the Ne2Ne algorithm. The main contributions of this paper are as follows:
(1) Based on the Ne2Ne algorithm, a term is added to constrain the structural similarity between neighborhood images, to suppress independent and correlated noise more effectively.
(2) Meanwhile, to enhance the quality of the denoising effect of the model, the downsampling strategy is differentially reorganized and optimized to ensure that the sub-image details of the subsamples are preserved. It is especially effective for the problem of pixel loss caused by random neighborhood downsamplers.
(3) Extensive experiments show that the improved Ne2Ne algorithm outperforms state-of-the-art algorithms in terms of effectiveness and accuracy on ultrasound datasets.
The organization of this paper is as follows: Section 2 presents a literature survey. Section 3 describes the proposed methodologies, including the downsampling strategy, the structural similarity loss, and the improved Neighbor2Neighbor model. Section 4 introduces the evaluation metrics, Section 5 details the experiment settings, and Section 6 reports the experimental results and analysis. The conclusions are presented in Section 7.

2. Literature Survey

2.1. Speckle Noise Model

In ultrasound imaging, the scatterer density, the spatial distribution, and the characteristics of the ultrasound imaging system affect the speckle noise model. When the scatterer density is high (more than 10 scatterers per resolution cell), as is common for blood cells, for example, the amplitude of the backscattered signal usually follows a Rayleigh distribution. Conversely, if the scatterer density is low, the amplitude of the backscattered signal follows a K distribution; the Rayleigh distribution can be viewed as a particular case of the K distribution [17]. However, in clinical ultrasound imaging systems, nonlinear signal processing techniques (such as logarithmic compression and low-pass filtering) are introduced in the display equipment to process the echo envelope signal. This nonlinear compression has a great influence on the statistical characteristics of the echo envelope signal. For example, if the echo envelope signal originally follows a Rayleigh distribution, then after logarithmic compression it exhibits a Fisher–Tippett (F-T) distribution. Noise with an F-T distribution can be viewed as white Gaussian noise contaminated by outliers, and this assumption degrades the performance of the filter [18,19]. Usually, the echo envelope signal follows the K distribution, but unfortunately its density function becomes complex after logarithmic compression [17]. Loupas et al. [20,21] pointed out in their studies that speckle noise in ultrasound images may be related to the signal after logarithmic compression, and proposed an explicit simulation model for log-compressed ultrasound images:
$\mu(x) = v(x) + v^{\gamma}(x)\,\theta(x)$   (1)
In this model, v(x) is the raw image without noise contamination, μ(x) is the actually observed ultrasound image, θ(x) is zero-mean Gaussian noise with standard deviation δ, and γ is a constant depending on the ultrasound equipment and imaging process; γ = 0.5 is generally taken. In this paper, Equation (1) is used as the speckle noise model.
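For illustration, the following NumPy sketch shows how a noisy training image might be synthesized from a clean image according to Equation (1); the function name and the particular choice of δ are illustrative assumptions rather than the authors' released code.

```python
import numpy as np

def add_speckle_noise(clean, delta=0.5, gamma=0.5, rng=None):
    """Synthesize speckle noise following Eq. (1): mu(x) = v(x) + v(x)^gamma * theta(x).

    clean : 2-D array with pixel values scaled to [0, 1] (the noise-free image v).
    delta : standard deviation of the zero-mean Gaussian field theta.
    gamma : device-dependent exponent; 0.5 is the value adopted in the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    theta = rng.normal(loc=0.0, scale=delta, size=clean.shape)   # zero-mean Gaussian noise
    noisy = clean + np.power(np.clip(clean, 0.0, None), gamma) * theta
    return np.clip(noisy, 0.0, 1.0)                              # keep values in display range

# Example: corrupt a clean 256x256 ultrasound image at delta = 0.5
# noisy = add_speckle_noise(clean_image, delta=0.5)
```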

2.2. The Neighbor2Neighbor Model

The core idea of Noise2Noise is that, for an unobserved clean scene x and two independently observed noisy images y and z, a denoising network trained with (y, z) pairs is equivalent to one trained with (y, x) pairs, provided the noise is zero-mean. Noise2Noise minimizes the following loss with respect to the network parameters θ:
$\arg\min_{\theta}\ \mathbb{E}_{x,y,z}\,\|f_{\theta}(y) - z\|_2^2$
where the neural network function fθ(y) is parameterized by θ.
Neighbor2Neighbor is an extension of Noise2Noise that constructs similar noisy images from a single noisy image by designing a sampler. Moreover, to mitigate the over-smoothing caused by the different sampling positions of the similar noisy images, a regularization term is introduced.
Neighbor2Neighbor mainly considers the following two aspects: two independent noisy images of similar scenes, and a single noisy image for a single scene.
Assumption 1:
There is a pure image x, and its corresponding noisy image is y; i.e.,
$\mathbb{E}_{y|x}(y) = x$
When a very small image difference ε ≠ 0 is introduced, x + ε is the pure image corresponding to another noisy image z; i.e.,
$\mathbb{E}_{z|x}(z) = x + \varepsilon$
Suppose the variance of z is δz²; then
$\mathbb{E}_{x,y}\|f_{\theta}(y) - x\|_2^2 = \mathbb{E}_{x,y,z}\|f_{\theta}(y) - z\|_2^2 - \delta_z^2 + 2\varepsilon\,\mathbb{E}_{x,y}\big(f_{\theta}(y) - x\big)$
When ε → 0, the (y, z) pair can be treated as in Noise2Noise; finding y and z that satisfy this "similar but not identical" condition makes it possible to train the denoising network.
Assumption 2:
For a single noisy image, Neighbor2Neighbor uses a sampler to create two "similar but not identical" images. The sub-images obtained by sampling adjacent pixels of the raw image satisfy the constraint that the differences between them are tiny, while the corresponding pure images are not identical (with ε → 0). For a noisy image y, Neighbor2Neighbor constructs a pair of nearest-neighbor samplers g1(*) and g2(*) and samples two sub-images g1(y) and g2(y). We directly construct training pairs with these two sub-images, and the denoising network is trained in a Noise2Noise manner; then
$\arg\min_{\theta}\ \mathbb{E}_{x,y}\,\|f_{\theta}(g_1(y)) - g_2(y)\|_2^2$
Neighbor2Neighbor calls this approach pseudo Noise2Noise. Since g1(y) and g2(y) are sampled at different locations, the resulting denoising model is not optimal and tends to over-smooth. Therefore, Neighbor2Neighbor corrects this by adding a regularization term to the loss: a constraint is added to pseudo Noise2Noise, the constrained optimization problem is transformed into a regularized one, and the following loss function is finally optimized:
$L = L_{rec} + \gamma L_{reg} = \|f_{\theta}(g_1(y)) - g_2(y)\|_2^2 + \gamma\,\big\|f_{\theta}(g_1(y)) - g_2(y) - \big(g_1(f_{\theta}(y)) - g_2(f_{\theta}(y))\big)\big\|_2^2$
where Lrec is the reconstruction term based on the network output and the noisy target. Lreg is the regularization term and γ is the hyperparameter that controls the strength of the regularization term.
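For concreteness, a minimal PyTorch sketch of this loss is given below; f_theta, g1, and g2 are placeholders for the denoising network and the two neighbor sub-samplers, and the stop-gradient on fθ(y) follows the Neighbor2Neighbor training recipe. This is an illustrative sketch, not the authors' released code.

```python
import torch

def neighbor2neighbor_loss(f_theta, y, g1, g2, gamma=2.0):
    """Loss above: ||f(g1(y)) - g2(y)||^2 + gamma * ||f(g1(y)) - g2(y) - (g1(f(y)) - g2(f(y)))||^2.

    f_theta : denoising network, g1/g2 : neighbor sub-samplers (callables), y : noisy batch.
    """
    out = f_theta(g1(y))                       # denoised sub-image
    target = g2(y)                             # neighboring noisy sub-image used as target
    with torch.no_grad():                      # full-image output only feeds the correction term
        denoised_full = f_theta(y)
    diff_out = out - target
    diff_full = g1(denoised_full) - g2(denoised_full)
    l_rec = torch.mean(diff_out ** 2)                      # reconstruction term
    l_reg = torch.mean((diff_out - diff_full) ** 2)        # regularization term
    return l_rec + gamma * l_reg
```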

3. Related Methodologies

3.1. Downsampling Strategy

From a single noisy image y, the sub-images g1(y), g2(y), g3(y), g4(y) are constructed by sampler G, which are detailed below.
The pixels of the input image y of size Nch × H × W are reorganized into a lower-resolution image of size 4Nch × H/2 × W/2. The image is divided into multiple 2 × 2 cells, and its pixels are reorganized into different channels of the output image according to the following equation:
$G(c, m, n) = I\!\left(\left\lfloor \tfrac{c}{4} \right\rfloor,\ 2m + (c \bmod 2),\ 2n + \left\lfloor \tfrac{c}{2} \right\rfloor\right)$
where 0 ≤ c < 4Nch, 0 ≤ m < H/2, 0 ≤ n < W/2; since this paper targets gray-scale ultrasound images, the number of channels Nch is set to 1.
A schematic diagram of the sampling strategy of sampler G is shown in Figure 1.
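The reorganization described by the equation above amounts to splitting each 2 × 2 cell into four half-resolution sub-images. A minimal PyTorch sketch, assuming a grayscale batch of shape (N, 1, H, W) with even H and W and the sub-image ordering of Figure 1, is shown below; the function name is illustrative.

```python
import torch

def sampler_G(y):
    """Split a noisy batch y of shape (N, 1, H, W) into four half-resolution sub-images.

    Each 2x2 cell of y contributes exactly one pixel to every sub-image, so no pixel is lost.
    Ordering follows Figure 1: g1 top-left, g2 top-right, g3 bottom-left, g4 bottom-right.
    """
    g1 = y[:, :, 0::2, 0::2]   # top-left pixel of each 2x2 cell
    g2 = y[:, :, 0::2, 1::2]   # top-right
    g3 = y[:, :, 1::2, 0::2]   # bottom-left
    g4 = y[:, :, 1::2, 1::2]   # bottom-right
    return g1, g2, g3, g4

# Example
# y = torch.rand(4, 1, 256, 256)
# g1, g2, g3, g4 = sampler_G(y)   # each has shape (4, 1, 128, 128)
```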

3.2. The Structure Similarity Index Measure Loss Function

In order to measure the perceptual distance between two images with the structural similarity index measure (SSIM) [22], we construct a structural similarity loss function Lssim between the downsampled sub-images g1(y), g2(y), g3(y), g4(y). The SSIM index combines the luminance l(x,y), contrast c(x,y), and structure s(x,y) information of images x and y to measure their perceptual distance. The similarity function for x and y is
$S(x, y) = f\big(l(x,y),\ c(x,y),\ s(x,y)\big)$
the luminance similarity l(x,y) between image x and y is given by
$l(x, y) = \dfrac{2\mu_x \mu_y + \varepsilon_1}{\mu_x^2 + \mu_y^2 + \varepsilon_1}$
where ε1 is a small constant that avoids a zero denominator, with ε1 = (K1L)², K1 = 0.01 used as a hyperparameter, and L the dynamic range of the pixel values of the target image y. Let μx and μy denote the luminance values corresponding to the largest pixel counts in the histograms of images x and y, respectively. Figure 2 shows the histogram of an image, where the abscissa represents the pixel values and the ordinate represents the number of pixels.
Commonly, the content information of ultrasound images is concentrated in the intensity range corresponding to the regions with the most pixels in the histogram, and these regions often contain important diagnostic information. Therefore, the luminance term used in this paper takes the pixel value corresponding to the histogram peak as the image luminance value in Lssim, instead of the mean luminance used in the original Lssim. This improvement is important for raising diagnostic accuracy and image quality, especially when addressing complex images with uneven luminance distributions. Assuming the value of every pixel is xi, the expression is as follows:
$\mu_x = \underset{x_i}{\arg\max}\ num(x_i)$
where num(xi) denotes the number of pixels with value xi.
The contrast similarity c(x,y) between images x and y is given by:
$c(x, y) = \dfrac{2\delta_x \delta_y + \varepsilon_2}{\delta_x^2 + \delta_y^2 + \varepsilon_2}$
where ε2 is a small constant to avoid a zero denominator, with ε2 = (K2L)² and K2 = 0.03 as a hyperparameter. Let δx and δy denote the standard deviations of the pixel values of images x and y, respectively.
The structural similarity s(x,y) function between image x and y is denoted by:
$s(x, y) = \dfrac{\delta_{xy} + \varepsilon_3}{\delta_x \delta_y + \varepsilon_3}$
where ε3 is a small constant to avoid a zero denominator, and δxy denotes the covariance of the pixel values of images x and y. Setting ε3 = ε2/2, the numerator of c(x,y) and the denominator of s(x,y) cancel out, and the Lssim loss function is finally obtained as:
$L_{ssim} = 1 - SSIM(x, y) = 1 - \dfrac{(2\mu_x \mu_y + \varepsilon_1)(2\delta_{xy} + \varepsilon_2)}{(\mu_x^2 + \mu_y^2 + \varepsilon_1)(\delta_x^2 + \delta_y^2 + \varepsilon_2)}$
where δx² and δy² are the variances of x and y, δxy is the covariance of x and y, and ε1 and ε2 are small constants avoiding zero denominators, with ε1 = (K1L)², ε2 = (K2L)², and the two hyperparameters K1 = 0.01 and K2 = 0.03. L is the dynamic range of the pixel values of the target image y. In this research, x and y correspond to g1(y), g3(y) and g2(y), g4(y), respectively.
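A possible PyTorch sketch of this loss is given below. It computes the statistics globally over each sub-image and evaluates the histogram-mode luminance on detached tensors (the mode itself is not differentiable); these implementation choices, along with the function names, are assumptions made for illustration rather than the authors' code.

```python
import torch

def histogram_mode(img, bins=256):
    """Pixel value with the largest histogram count (the mode-based luminance of Section 3.2).

    Computed on a detached copy, since argmax over histogram bins carries no gradient.
    Assumes intensities scaled to [0, 1]; returns the centre of the most populated bin.
    """
    flat = img.detach().flatten()
    hist = torch.histc(flat, bins=bins, min=0.0, max=1.0)
    peak = torch.argmax(hist)
    return (peak.float() + 0.5) / bins

def ssim_loss(x, y, k1=0.01, k2=0.03, data_range=1.0):
    """L_ssim = 1 - SSIM(x, y), using the histogram-mode luminance instead of the mean."""
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    mu_x, mu_y = histogram_mode(x), histogram_mode(y)
    var_x = x.var(unbiased=False)                          # delta_x^2
    var_y = y.var(unbiased=False)                          # delta_y^2
    cov_xy = ((x - x.mean()) * (y - y.mean())).mean()      # delta_xy
    ssim = ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
    return 1.0 - ssim
```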

3.3. Unsupervised Learning Model Based on Improved Neighbor2Neighbor

Using the sampler described in Section 3.1, training image pairs are extracted from a single noisy image and used as input to the self-supervised training algorithm in this section. Given the sub-sampled images (g1(y), g2(y), g3(y), g4(y)) derived from a noisy image y, we train the denoising network using the structural similarity loss proposed in Section 3.2 together with the reconstruction and regularization losses:
$L = L_{rec} + \gamma L_{reg} + \eta L_{ssim}$   (14)
where γ is a hyperparameter that regulates the intensity of the regularization term and η is a hyperparameter that regulates the strength of the structural similarity loss. To keep the ablation experiments comparable, γ = 2 and γ = 1 are used for the synthetic and real experiments, respectively, as in the Neighbor2Neighbor model. We stop the gradients of g1(y), g2(y), g3(y), and g4(y) during training and gradually raise η to the appointed value.
Lrec is the built loss function, where Lrec is defined as:
$L_{rec} = \|f_{\theta}(g_1(y)) - g_3(y)\|_2^2 + \|f_{\theta}(g_2(y)) - g_4(y)\|_2^2$
The regularized loss function Lreg is defined as:
$L_{reg} = \big\|f_{\theta}(g_1(y)) - g_3(y) - \big(g_1(f_{\theta}(y)) - g_3(f_{\theta}(y))\big)\big\|_2^2 + \big\|f_{\theta}(g_2(y)) - g_4(y) - \big(g_2(f_{\theta}(y)) - g_4(f_{\theta}(y))\big)\big\|_2^2$
where fθ is the denoising network; the training framework is described in Algorithm 1. The specific improved model of this paper is shown in Figure 3.
Algorithm 1. Training based on the improved Neighbor2Neighbor
Input: A set of noisy images Y, denoising network fθ, hyperparameters γ and η
Operation
 1. while not converged do
 2.   Sample a noisy image y ∈ Y;
 3.   Generate the sampler G = (g1, g2, g3, g4);
 4.   Derive the sub-sampled images (g1(y), g2(y), g3(y), g4(y)), where g1(y), g2(y) are the network inputs and g3(y), g4(y) are the network targets;
 5.   Calculate Lssim;
 6.   For the network inputs g1(y), g2(y), derive the denoised images fθ(g1(y)), fθ(g2(y));
 7.   Calculate Lrec;
 8.   For the original noisy image y, derive the denoised image fθ(y) with no gradients;
 9.   Use the same sub-sampler G to derive the images (g1(fθ(y)), g2(fθ(y)), g3(fθ(y)), g4(fθ(y)));
 10.  Calculate Lreg;
 11.  Update the denoising network fθ by minimizing the objective Lrec + γ·Lreg + η·Lssim;
 12. end
Output: denoised images
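A condensed sketch of one training iteration following Algorithm 1 is shown below, assuming the sampler_G and ssim_loss helpers from the earlier sketches (or equivalents) and a U-Net-style denoiser fθ. Since Algorithm 1 lists the SSIM term before the network outputs are computed, the exact arguments of that term are not fully specified; this sketch applies it between the denoised sub-images and their neighbor targets, which is an assumption, as is every name used here.

```python
import torch

def train_step(f_theta, optimizer, y, gamma=2.0, eta=1.0):
    """One iteration of Algorithm 1 on a batch of noisy images y of shape (N, 1, H, W)."""
    g1, g2, g3, g4 = sampler_G(y)                      # steps 3-4: sub-sample the noisy image

    out1, out2 = f_theta(g1), f_theta(g2)              # step 6: denoise the network inputs

    # step 5: structural similarity constraint (argument pairing is our assumption)
    l_ssim = ssim_loss(out1, g3) + ssim_loss(out2, g4)

    # step 7: reconstruction term of Section 3.3
    l_rec = torch.mean((out1 - g3) ** 2) + torch.mean((out2 - g4) ** 2)

    with torch.no_grad():                              # step 8: denoise the full image, no gradients
        fy = f_theta(y)
    h1, h2, h3, h4 = sampler_G(fy)                     # step 9: sub-sample the denoised image

    # step 10: regularization term (pairing follows the reconstruction term)
    l_reg = torch.mean(((out1 - g3) - (h1 - h3)) ** 2) + \
            torch.mean(((out2 - g4) - (h2 - h4)) ** 2)

    loss = l_rec + gamma * l_reg + eta * l_ssim        # step 11: total objective of Eq. (14)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```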

4. Evaluation Metrics

4.1. Peak Signal-to-Noise Ratio

Peak signal-to-noise ratio (PSNR) assesses image quality based on the errors between corresponding pixels.
$MSE = \dfrac{1}{H \times W}\sum_{i=1}^{H}\sum_{j=1}^{W}\big(X(i,j) - Y(i,j)\big)^2$
where MSE denotes the mean square error between images X and Y, X and Y represent the noisy and denoised images, respectively, and H and W are the image dimensions. PSNR is then defined as:
$PSNR = 20\log_{10}\!\left(\dfrac{255}{\sqrt{MSE(X, Y)}}\right)$

4.2. Mean Structural Similarity

Mean structural similarity quality measurement (MSSIM) [23] evaluates image quality based on information structure degradation. The MSSIM evaluation metric is more stable than SSIM, especially for images with different resolutions.
$MSSIM = \dfrac{1}{M}\sum_{i=1}^{M}\dfrac{(2\mu_{x_i}\mu_{y_i} + C_1)(2\sigma_{x_i y_i} + C_2)}{(\mu_{x_i}^2 + \mu_{y_i}^2 + C_1)(\sigma_{x_i}^2 + \sigma_{y_i}^2 + C_2)}$
where μ and σ denote the mean and standard deviation of the intensity within the i-th local window, M is the number of local windows, and C1 and C2 are constants that prevent division by zero. MSSIM values lie in the range [0, 1].

4.3. Feature Similarity Index Measure

The feature similarity index measure (FSIM) [24] is mainly used for quality assessment based on feature similarity. FSIM employs phase congruency (PC) and gradient magnitude (GM) features, which complement each other. Because PC is relatively invariant to image changes, it is used to extract the stable features of the image, whereas GM is mainly used to extract features where the image changes.
It is assumed that the phase congruency of the raw image f1 and the denoised image f2 can be represented by PC1 and PC2, while the gradient features are represented by G1 and G2. The similarity between these two images is computed as:
$S_{PC} = \dfrac{2\,PC_1\,PC_2 + T_1}{PC_1^2 + PC_2^2 + T_1}$
where T1 is a positive constant that can improve the stability of SPC.
Similarly, the similarity of G1 and G2 can be computed:
$S_G = \dfrac{2\,G_1 G_2 + T_2}{G_1^2 + G_2^2 + T_2}$
where T2 is a positive constant depending on the dynamic range of the gradient magnitude values.
The similarity SL of f1 and f2 is calculated by SPC and SG.
$S_L(x) = \big[S_{PC}(x)\big]^{\alpha}\,\big[S_G(x)\big]^{\beta}$
The relative importance of the PC and GM features is adjusted by the parameters α and β. In our research, for convenience, we set α = β = 1. The value of FSIM lies in the range [0, 1].

4.4. Edge Preservation Index

The edge preservation index (EPI) [25] measures the capacity of the denoised image to preserve edge details.
$EPI = \dfrac{\sum_i \sum_j \big|Y(i+1, j) - Y(i, j)\big|}{\sum_i \sum_j \big|X(i+1, j) - X(i, j)\big|}$
where X and Y denote the noisy and denoised image, while i and j denote the coordinates in the vertical and horizontal directions in the image.
The larger the value of these evaluation metrics, the better the denoising result.
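For reference, the sketch below shows one way these metrics might be computed for 8-bit grayscale images, using scikit-image for PSNR and (M)SSIM and a direct implementation of the EPI formula above; FSIM is omitted because it requires a phase congruency implementation. The function names and the use of a clean reference image (as in the synthetic experiments) are assumptions for illustration.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def edge_preservation_index(noisy, denoised):
    """EPI: ratio of summed vertical gradient magnitudes of the denoised and noisy images."""
    grad_den = np.abs(np.diff(denoised.astype(np.float64), axis=0)).sum()
    grad_noisy = np.abs(np.diff(noisy.astype(np.float64), axis=0)).sum()
    return grad_den / grad_noisy

def evaluate(reference, denoised, noisy):
    """Compute PSNR and mean SSIM against a clean reference, plus EPI against the noisy input."""
    return {
        "PSNR": peak_signal_noise_ratio(reference, denoised, data_range=255),
        "MSSIM": structural_similarity(reference, denoised, data_range=255),
        "EPI": edge_preservation_index(noisy, denoised),
    }
```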

5. Experiment Settings

5.1. Baselines

In this experiment, the performance of the algorithm is assessed quantitatively and qualitatively. By running the algorithms on the ultrasound image dataset, the metric values are obtained to verify the benefits of the algorithm in speckle suppression and contrast enhancement. We compare the proposed algorithm based on the improved Neighbor2Neighbor with the baseline methods Noise2Noise [10] and Neighbor2Neighbor [15], the traditional denoiser BM3D (block-matching and 3D filtering) [26], and Noise2Void (N2V) [11]. To ensure the fairness and comparability of the experiment, all parameters of the baseline algorithms are set to the best values reported in their original papers.
The implementation of the denoising algorithm presented in this paper requires the configuration of specific parameters. This section details the training of our proposed model using the PyTorch framework, with the experimental parameters listed in Table 1. To investigate the denoising efficacy on both synthetic and real images, distinct initial learning rates are assigned for these conditions: 0.0003 for synthetic image denoising and 0.0001 for real image denoising. The training spans 100 epochs, with the learning rate halved every 20 epochs to facilitate training and improve the optimization outcome, while a batch size of 4 is maintained. The Adam optimizer is used throughout the training phase, capitalizing on its adaptive qualities to refine model performance. Moreover, the intensity of the regularization term, modulated by the hyperparameter γ, is set to 2 for the synthetic experiments and 1 for the real-world experiments, in pursuit of optimal denoising efficacy across the two settings.

5.2. Experimental Environment and Dataset

The experiments are carried out with Python 3.9 and PyTorch 1.7.1. The computing platform uses an Intel(R) Xeon(R) Gold 6142 CPU @ 2.60 GHz and an NVIDIA RTX A5000 GPU and runs Windows 11.
Due to the lack of public datasets for ultrasound image denoising tasks, the dataset used for the synthetic experiments in this paper consists of ultrasound images with little speckle noise and ideal imaging quality collected by a Philips ultrasound imaging system in a hospital ultrasound imaging department. The training dataset includes 1300 clean ultrasound images with a resolution of 256 × 256. We synthesize simulated medical ultrasound images according to the speckle noise model of Equation (1). For real medical ultrasound images, we use real liver ultrasound images collected by a portable ultrasound diagnostic instrument, comprising 1300 real ultrasound images with an original resolution of 256 × 504. Before training, to ensure the consistency of the model input and reduce the computational burden, the training data are center-cropped to 256 × 256 pixels. Figure 4 shows some real medical ultrasound images.
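As a concrete illustration of this preparation step, the sketch below center-crops an image to 256 × 256 and synthesizes its noisy counterpart with the speckle model of Equation (1); it reuses the hypothetical add_speckle_noise helper from the earlier sketch, and all names here are illustrative assumptions.

```python
import numpy as np

def center_crop(img, size=256):
    """Center-crop a 2-D grayscale image (e.g., a 256x504 liver scan) to size x size."""
    h, w = img.shape
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]

def make_synthetic_pair(clean, delta):
    """Build a (clean, noisy) training pair for the synthetic experiments."""
    clean = center_crop(clean.astype(np.float32) / 255.0)   # scale 8-bit pixels to [0, 1]
    noisy = add_speckle_noise(clean, delta=delta)            # speckle model of Eq. (1)
    return clean, noisy
```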

6. Experimental Results and Analysis

6.1. Ablation Experiment

In this study, we analyze the impact of structural similarity loss function and downsampling strategy on model performance by ablation experiments. Specifically, we compare the denoising results after the improved modules are successively added to the network model with the original Neighbor2Neighbor model, thus evaluating the contribution of these improvements to the denoising performance of the model.

6.1.1. The η Hyperparameter in Lssim

The hyperparameter η in Equation (14) controls the strength of the structural similarity loss function. To fully evaluate the influence of this parameter, we set different η values in the ablation experiments and assess the results of the proposed method on the test dataset. The experimental results, presented in Table 2 and Table 3, show the performance of the model for different values of η in terms of the PSNR and MSSIM metrics. It should be emphasized that this ablation experiment was conducted under the speckle noise model (Equation (1)), with the standard deviation δ of the speckle noise set to a series of values (0.1, 0.2, 0.3, 0.4, 0.5 and 0.6) in order to comprehensively evaluate the denoising effect of the model under different noise levels.
The findings of this study reveal that the value of the hyperparameter η significantly influences the model's performance on the PSNR and MSSIM metrics. Adjusting η within an appropriate range can improve the denoising efficacy of the model; nonetheless, values of η that are either too high or too low may diminish the denoising effect. The best PSNR is attained when the standard deviation δ is 0.5 and η is set to 1, while at η = 1.5 the model achieves the highest mean structural similarity index. However, as the noise level increases, the model's denoising capability degrades. This study highlights the importance of considering different noise levels when designing and optimizing denoising models. The incorporation of the SSIM loss, with its hyperparameter η properly tuned, yields discernible enhancements in denoising performance; these results underscore the substantial contribution of image structural similarity to improved ultrasound image denoising.

6.1.2. Results of Ultrasound Image Denoising with Different Loss Functions

In this part of the study, we focus on evaluating the influence of different loss functions on model performance. Specifically, we train the model either without the structural similarity loss or with the structural similarity loss Lssim. To evaluate these loss functions fairly, we test them on an ultrasound test image dataset under simulated speckle noise with standard deviation δ = 0.5. The test results are displayed in Table 4, providing a comparison of the mean PSNR and MSSIM values of the models trained with the different loss functions. Specifically, the model incorporating the SSIM loss function, Lssim, shows an improvement of 0.96 dB in PSNR and 0.03 in MSSIM compared to the model without the structural similarity loss.
In addition, Figure 5 displays the denoising results of the different loss functions on the ultrasound test image dataset, from which it can be clearly observed that, compared with the traditional structural similarity loss function, the improved Lssim loss proposed in this study yields better PSNR and MSSIM performance in the trained model. This result highlights the effectiveness of the modified loss function in improving the denoising metrics of the model.

6.2. Experiment on Simulated Noisy Images

In this experiment, the public breast ultrasound image dataset [27] is used; the data comprise breast ultrasound images of 600 female patients aged 25 to 75. The dataset consists of 780 images with a size of 500 × 500 pixels. For the convenience of comparison, the test data are cropped to grayscale images of size 256 × 256, and speckle noise is simulated on them. To quantitatively evaluate all the compared algorithms, three speckle noise levels δ = {0.1, 0.3, 0.5} are tested. The PSNR and MSSIM indicators are used for evaluation, and the specific outcomes are displayed in Table 5.
Table 5 displays the PSNR and MSSIM results for the different noise levels. The improved Neighbor2Neighbor raises the PSNR relative to Neighbor2Neighbor at every noise level, so its denoising performance exceeds that of Neighbor2Neighbor. Its mean structural similarity is slightly lower than that of Neighbor2Neighbor at the lowest noise level, but as the noise level rises, the advantage of the proposed algorithm in structural similarity becomes prominent, notably under severe noise.

6.3. Real Ultrasound Image Experiment

In our research, real ultrasound images collected by a Healson U20 portable ultrasound diagnostic device are used. Liver B-mode ultrasound images of 256 × 256 and 472 × 256 pixels are tested, respectively. The dataset is objectively evaluated by three metrics: MSSIM, FSIM, and EPI. The data analysis in Table 6 reveals that the refined Neighbor2Neighbor approach introduced in this research surpasses the alternative algorithms in preserving structural similarity and edge details, as indicated by the MSSIM, FSIM, and EPI. The proposed method achieves a mean increase of 0.01 in MSSIM, 0.02 in FSIM, and 0.06 in EPI compared to the original Neighbor2Neighbor technique. Against the BM3D algorithm, the improvements are even more significant, with average gains of 0.10 in MSSIM, 0.15 in FSIM, and 0.51 in EPI. These results underscore the method's efficacy in noise reduction while maintaining the structural integrity and original edge details of the images.
Additionally, Table 6 presents the processing times for denoising a single image with the different methods. Our approach demonstrates a significantly higher implementation efficiency than the traditional algorithm. Although the proposed method exhibits a slightly higher runtime than the Noise2Noise, Noise2Void, and Neighbor2Neighbor speckle reduction methods, it requires an average of only 7.76 milliseconds to process a speckle-noise image, which benefits from the end-to-end network architecture as well as GPU and multi-threading capabilities.

7. Conclusions

Our research proposes a novel ultrasound image denoising algorithm based on an improved Neighbor2Neighbor unsupervised learning model. The proposed method is compared with several classical denoising techniques, such as BM3D, Noise2Noise, and Noise2Void. The experimental results show that the improved Neighbor2Neighbor unsupervised learning model can effectively reduce the influence of speckle noise while maintaining the edge and detail information in the image, thus effectively improving image quality. In addition, by adopting a new downsampling strategy, generating subsampled paired images as training images, and employing a self-supervised training mechanism with a structural similarity loss, the proposed method also addresses the issue that the original model cannot effectively suppress correlated or structured noise. This improvement enhances the ability of the model in terms of detail retention. However, this research is trained and tested on a specific dataset, so applying these results to other datasets may have certain limitations. For future work, we look forward to acquiring images from different devices and patient populations from collaborating clinical centers, thereby assessing the robustness and generalization of the algorithm in different environments. Moreover, we will also introduce evolutionary computation to optimize the hyperparameters of the network, which reduces the number of epochs and achieves faster convergence of the model, thereby improving the performance of the model.

Author Contributions

Conceptualization, P.W. and J.G.; methodology, M.S.; software, L.W.; validation, J.G., L.W. and X.S.; formal analysis, P.W.; investigation, P.W. and L.W.; resources, J.G.; data curation, J.G.; writing—original draft preparation, P.W.; writing—review and editing, P.W. and L.W.; visualization, P.W. and L.W.; supervision, J.G., X.S. and M.S.; project administration, M.S.; funding acquisition, P.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Sichuan University Students’ Innovation and Entrepreneurship Training Program (S202210621090), Chengdu University of Information Technology Key Project of Education Reform (JYJG2022090/JYJG2023212).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the correspondence author on request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Baselice, F.; Ferraioli, G.; Ambrosanio, M.; Pascazio, V.; Schirinzi, G. Enhanced Wiener filter for ultrasound image restoration. Comput. Methods Programs Biomed. 2018, 153, 71–81. [Google Scholar] [CrossRef] [PubMed]
  2. Islam, S.; Norouzian, M.; Turner, J.A. Influence of tessellation morphology on ultrasonic scattering. J. Acoust. Soc. Am. 2022, 152, 1951–1961. [Google Scholar] [CrossRef] [PubMed]
  3. Azar, A.A.; Rivaz, H.; Boctor, E.M. Speckle Detection in Echocardiographic Images. In Echocardiography-New Techniques; IntechOpen: London, UK, 2012. [Google Scholar]
  4. May, N.; Phoulady, A.; Choi, H.; Tavousi, P.; Shahbazmohamadi, S. Single image composite tomography utilizing large scale femtosecond laser cross-sectioning and scanning electron microscopy. Microsc. Microanal. 2022, 28 (Suppl. S1), 876–878. [Google Scholar] [CrossRef]
  5. Song, L.; Li, Y.; Dong, G.; Lambo, R.; Qin, W.; Wang, Y.; Zhang, G.; Liu, J.; Xie, Y. Artificial intelligence-based bone-enhanced magnetic resonance image—A computed tomography/magnetic resonance image composite image modality in nasopharyngeal carcinoma radiotherapy. Quant. Imaging Med. Surg. 2021, 11, 4709–4720. [Google Scholar] [CrossRef] [PubMed]
  6. Jamthikar, A.D.; Gupta, D.; Puvvula, A.; Amer, M.J.; Narendra, N.K.; Luca, S.; Sophie, M.; John, R.L.; Gyan, P.; Martin, M.; et al. Cardiovascular risk assessment in patients with rheumatoid arthritis using carotid ultrasound B-mode imaging. Rheumatol. Int. 2020, 40, 1921–1939. [Google Scholar] [CrossRef] [PubMed]
  7. Prabusankarlal, K.M.; Manavalan, R.; Sivaranjani, R. An optimized non-local means filter using automated clustering based preclassification through gap statistics for speckle reduction in breast ultrasound images. Appl. Comput. Inform. 2018, 14, 48–54. [Google Scholar] [CrossRef]
  8. Zhang, K.; Zuo, W.; Chen, Y.; Meng, D.; Zhang, L. Beyond a gaussian denoiser: Residual learning of deep CNN for image denoising. IEEE Trans. Image Process. 2017, 26, 3142–3155. [Google Scholar] [CrossRef] [PubMed]
  9. Zhang, K.; Zuo, W.; Zhang, L. FFDNet: Toward a fast and flexible solution for CNN-based image denoising. IEEE Trans. Image Process. 2018, 27, 4608–4622. [Google Scholar] [CrossRef] [PubMed]
  10. Lehtinen, J.; Munkberg, J.; Hasselgren, J.; Laine, S.; Karras, T.; Aittala, M.; Aila, T. Noise2Noise: Learning image restoration without clean data. arXiv 2018, arXiv:1803.04189. [Google Scholar]
  11. Krull, A.; Buchholz, T.O.; Jug, F. Noise2void-learning denoising from single noisy images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 2129–2137. [Google Scholar]
  12. Batson, J.; Royer, L. Noise2self: Blind denoising by self-supervision. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 15–20 June 2019; pp. 524–533. [Google Scholar]
  13. Krull, A.; Vičar, T.; Prakash, M.; Lalit, M.; Jug, F. Probabilistic noise2void: Unsupervised content-aware denoising. Front. Comput. Sci. 2020, 2, 5. [Google Scholar] [CrossRef]
  14. Wu, X.; Liu, M.; Cao, Y.; Ren, D.; Zuo, W. Unpaired Learning of Deep Image Denoising. In Proceedings of the Computer Vision—ECCV 2020, Glasgow, UK, 23–28 August 2020; LNCS 12349. pp. 352–368. [Google Scholar]
  15. Huang, T.; Li, S.; Jia, X.; Lu, H.; Liu, J. Neighbor2neighbor: Self-supervised denoising from single noisy images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 14781–14790. [Google Scholar]
  16. Song, T.A.; Yang, F.; Dutta, J. Self-supervised PET image denoising using a neighbor-to-neighbor network. J. Nucl. Med. 2023, 64 (Suppl. S1), 1380. [Google Scholar]
  17. Dutt, V.; Greenleaf, J.F. Adaptive speckle reduction filter for log-compressed B-scan images. IEEE Trans. Med. Imaging 1996, 15, 802–813. [Google Scholar] [CrossRef] [PubMed]
  18. Michailovich, O.V.; Tannenbaum, A. Despeckling of medical ultrasound images. IEEE Trans. Ultrason. Ferroelectr. Freq. Control 2006, 53, 64–78. [Google Scholar] [CrossRef] [PubMed]
  19. Yu, Y.; Acton, S.T. Speckle reducing anisotropic diffusion. IEEE Trans. Image Process. 2002, 11, 1260–1270. [Google Scholar] [PubMed]
  20. Loupas, T. Digital Image Processing for Noise Reduction in Medical Ultrasonics. Ph.D. Thesis, University of Edinburgh, Edinburgh, UK, 1988. [Google Scholar]
  21. Loupas, T.; McDicken, W.N.; Allan, P.L. An adaptive weighted median filter for speckle suppression in medical ultrasonic images. IEEE Trans. Circuits Syst. 1989, 36, 129–135. [Google Scholar] [CrossRef]
  22. Bhatt, R.; Naik, N.; Subramanian, V.K. SSIM compliant modeling framework with denoising and deblurring applications. IEEE Trans. Image Process. 2021, 30, 2611–2626. [Google Scholar] [CrossRef]
  23. Yamaya, H.; Mimura, Y.; Yamazaki, H.; Yanagida, H. 2Pa5-1 Image Quality Assessment of 3D Ultrasound Images Based on SSIM. Proc. Symp. Ultrason. Electron. 2021, 42, 2Pa5-1. [Google Scholar]
  24. Zhang, L.; Zhang, L.; Mou, X.; Zhang, D. FSIM: A feature similarity index for image quality assessment. IEEE Trans. Image Process. 2011, 20, 2378–2386. [Google Scholar] [CrossRef] [PubMed]
  25. Sun, Y.; Xin, Z.; Huang, X.; Wang, Z.; Xuan, J. Overview of SAR Image Denoising Based on Transform Domain. In 3D Imaging Technologies—Multi-Dimensional Signal Processing and Deep Learning; Smart Innovation, Systems and Technologies; Springer: Singapore, 2021; p. 234. [Google Scholar]
  26. Dabov, K.; Foi, A.; Katkovnik, V.; Egiazarian, K. Image denoising by sparse 3-D transform-domain collaborative filtering. IEEE Trans. Image Process. 2007, 16, 2080–2095. [Google Scholar] [CrossRef] [PubMed]
  27. Al-Dhabyani, W.; Gomaa, M.; Khaled, H.; Fahmy, A. Dataset of breast ultrasound images. Data Brief 2020, 28, 104863. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Illustration of the strategy for generating sub-images by the downsampler G = (g1, g2, g3, g4), where k = 2; in every 2 × 2 cell, the pixels in the top left, top right, bottom left, and bottom right of the pixel matrix are filled with red, blue, orange, and green, respectively. The pixels filled with red, blue, orange, and green are the pixels of the subsampled images g1(y), g2(y), g3(y), and g4(y), respectively. The subsampled paired images (g1(y), g2(y), g3(y), g4(y)) are shown as red, blue, orange, and green patches on the right.
Figure 2. Histogram of an ultrasound image; the abscissa represents the pixel values and the ordinate represents the number of pixels.
Figure 3. Unsupervised Learning Model based on Improved Neighbor2Neighbor.
Figure 4. Some images of the real ultrasound dataset.
Figure 5. Results of ultrasound image denoising with different loss functions.
Table 1. Training parameters of the neural network model.

Parameters | Parameter Values
Batch Size | 4
Iteration Method (Optimizer) | Adam
Number of Iterations (Epochs) | 100
Initial Learning Rate | 0.0003 (Synthetic Experiments) / 0.0001 (Real Experiments)
Regularization Intensity Control Parameter γ | 2 (Synthetic Experiments) / 1 (Real Experiments)
Table 2. Ablation experiment of hyperparameter η (PSNR/dB).

Denoising Algorithms | δ = 0.1 | δ = 0.2 | δ = 0.3 | δ = 0.4 | δ = 0.5 | δ = 0.6
No Lssim | 30.3681 | 30.7160 | 31.4335 | 30.3293 | 31.7896 | 30.9377
Lssim η = 0.5 | 31.1547 | 31.8176 | 32.0143 | 32.4097 | 33.4576 | 31.0501
Lssim η = 1 | 35.6007 | 35.2461 | 33.5341 | 32.9331 | 36.2193 | 31.2005
Lssim η = 1.5 | 30.8513 | 31.8456 | 32.4519 | 31.0646 | 33.0053 | 30.7907
Table 3. Ablation experiment of hyperparameter η (MSSIM).

Denoising Algorithms | δ = 0.1 | δ = 0.2 | δ = 0.3 | δ = 0.4 | δ = 0.5 | δ = 0.6
No Lssim | 0.6112 | 0.6530 | 0.6685 | 0.6701 | 0.6775 | 0.6736
Lssim η = 0.5 | 0.6201 | 0.6419 | 0.6789 | 0.6797 | 0.6810 | 0.5801
Lssim η = 1 | 0.6306 | 0.6600 | 0.6427 | 0.6588 | 0.6526 | 0.6728
Lssim η = 1.5 | 0.6452 | 0.6516 | 0.6584 | 0.6655 | 0.6861 | 0.6673
Table 4. Average quantitative results of different loss functions on the test dataset.

Model | PSNR/dB | MSSIM
Neighbor2Neighbor | 28.6445 | 0.6177
Neighbor2Neighbor + Lssim | 29.6093 | 0.6525
Table 5. Performance comparison of different noise level denoising methods (PSNR, MSSIM).

Denoising Algorithms | δ = 0.1 (PSNR/dB, MSSIM) | δ = 0.3 (PSNR/dB, MSSIM) | δ = 0.5 (PSNR/dB, MSSIM)
Noisy | 33.0975, 0.7554 | 29.3456, 0.6845 | 28.8437, 0.6371
Neighbor2Neighbor | 35.0696, 0.8589 | 31.4945, 0.7887 | 28.9461, 0.7564
Neighbor2Neighbor + Lssim | 44.4122, 0.8521 | 40.7540, 0.8284 | 34.9397, 0.8136
Table 6. Performance comparison of different denoising methods (MSSIM, FSIM, EPI).

Image1:
Denoising Algorithms | MSSIM | FSIM | EPI | Time (ms)
BM3D | 0.5520 | 0.7259 | 0.2518 | 620
Noise2Noise | 0.4859 | 0.7217 | 0.3392 | 5.24
Noise2Void | 0.5646 | 0.7434 | 0.3441 | 4.32
Neighbor2Neighbor | 0.6086 | 0.8109 | 0.5494 | 6.78
Neighbor2Neighbor + Lssim | 0.6102 | 0.8244 | 0.5929 | 7.67

Image2:
Denoising Algorithms | MSSIM | FSIM | EPI | Time (ms)
BM3D | 0.5211 | 0.7060 | 0.2224 | 635
Noise2Noise | 0.5133 | 0.6500 | 0.3418 | 5.33
Noise2Void | 0.5364 | 0.7257 | 0.3334 | 4.56
Neighbor2Neighbor | 0.5793 | 0.8056 | 0.5271 | 6.98
Neighbor2Neighbor + Lssim | 0.5951 | 0.8139 | 0.5609 | 7.84
