Article

Incorporation of Structural Similarity Index and Regularization Term into Neighbor2Neighbor Unsupervised Learning Model for Efficient Ultrasound Image Data Denoising

1 School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
2 School of Software Engineering, Chengdu University of Information Technology, Chengdu 610225, China
3 Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Chongqing 400714, China
4 Automatic Software Generation & Intelligence Service Key Laboratory of Sichuan Province, Chengdu 610225, China
5 School of Intelligent Science and Engineering, Chengdu Neusoft University, Chengdu 611844, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(17), 7988; https://doi.org/10.3390/app14177988
Submission received: 31 July 2024 / Revised: 1 September 2024 / Accepted: 5 September 2024 / Published: 6 September 2024
(This article belongs to the Section Computing and Artificial Intelligence)

Abstract

Medical ultrasound imaging is extensively employed for diagnostic purposes. However, image quality remains a major obstacle to achieving greater accuracy. Conventional supervised deep learning denoising methods rely on matched noise-free and noisy image pairs, which are highly challenging to obtain in practical ultrasound applications. Moreover, because they assume pixel-wise independent noise, existing unsupervised denoising methods such as Neighbor2Neighbor are unable to efficiently address the correlated noise in ultrasound images; in addition, their random neighborhood downsampling frequently results in pixel loss. Hence, this study proposes an improved Neighbor2Neighbor algorithm, which reconstructs ultrasound images with a revised downsampling approach and incorporates a structural similarity index and a regularization term, thereby enhancing its ability to suppress both independent and correlated noise. Extensive experiments on an ultrasound image dataset demonstrate that the proposed algorithm outperforms state-of-the-art baseline algorithms in peak signal-to-noise ratio (PSNR), mean structural similarity index measure (MSSIM), feature similarity index measure (FSIM), and edge preservation index (EPI).

1. Introduction

Ultrasound imaging technology is widely used in the medical field because of its non-invasiveness, low cost, high efficiency, convenience, and real-time performance. It has become one of the core tools for diagnosing various diseases in hospitals, especially for monitoring fetal development in pregnant women and diagnosing abdominal organ diseases. However, due to the special mechanism of ultrasound imaging, speckle noise is frequently generated during the imaging process, which degrades the quality of ultrasound images, manifesting mainly as poor image contrast and inconspicuous features characterizing the structural properties of tissues. These problems not only affect the clarity of the image but may also affect the accuracy of the diagnosis. Therefore, improving the quality of medical ultrasound images is of vital significance for providing auxiliary support in disease diagnosis.
In medical ultrasound imaging, ultrasound pulses in the 3–30 MHz range are delivered to the patient through a sensor [1]. These pulses are reflected at the interfaces of different tissues, captured by the sensors, and converted into electrical signals to generate ultrasound images. However, speckle noise frequently appears during this conversion; it is random interference resulting from scattering by fine particles [2]. This noise appears in the image as granular patterns of bright and dark regions caused by random phase changes and destructive interference of the ultrasound pulses. The existence of speckle noise reduces the detail resolution of the ultrasound image and thus affects the overall quality of the image and the accuracy of the diagnosis. For modeling the echo envelope, the Rayleigh distribution is commonly used to describe the echo envelope signal in ultrasound imaging [3]. In ultrasound image processing, common methods used to reduce speckle noise include the multi-angle plane composite technique and post-image filtering. Composite techniques combine content of different frequencies or images from different spatial views [4,5]. Post-image filtering has been widely used in B-mode imaging [6]. These methods assume that speckle is multiplicative noise that can be filtered out, thereby suppressing speckle noise and ultimately improving image quality. Among them, nonlinear filters have the characteristic of retaining edges while smoothing uniform areas [7].
In recent years, with the development of deep learning, many researchers have used various convolutional neural networks for image processing tasks, and ultrasound image denoising has become one of the hotspots explored. Refs. [8,9] achieved image denoising by adding residual learning to CNNs, forming flexible image denoising solutions. These deep learning methods work extremely well for Gaussian denoising. However, they are based on supervised learning and require large numbers of noisy–clean image pairs for training. Such fully supervised learning methods have a limited denoising effect in practical ultrasound applications. In contrast, unsupervised learning methods that do not rely on noise-free images are more applicable for speckle denoising in ultrasound images.
Lehtinen et al. [10] proposed the Noise2Noise (N2N) deep denoiser, which trains on multiple noisy observations of the same scene and learns the mapping between two zero-mean noisy images. When the number of samples is small, the network learns the transformation between two zero-mean noise patterns; when the number of samples is large enough, since the noise is unpredictable, minimizing the loss function drives N2N toward the expectation of all possible outputs, which is the clean signal itself. However, N2N requires two noisy observations of the same scene, a condition that is also difficult to satisfy in realistic scenarios. Subsequently, the self-supervised denoising models Noise2Void (N2V) [11] and Noise2Self (N2S) [12], proposed by Krull et al. and Batson et al., can train the network with only one noisy observation per scene. The N2V model uses a blind-spot training strategy, in which the model excludes or masks the pixel at the center of its receptive field during training. This design is based on the assumption that the noise has zero mean and is independent at the pixel level. By analyzing the local image context (excluding the aforementioned blind spot), the model can predict the true intensity of a pixel. With this strategy, N2V builds a probabilistic model linking noisy observations to pixel-level true signals. In this way, the N2V model handles various noise levels and can effectively remove spatially varying noise. N2S utilizes a blind-spot network architecture and a self-learning approach to predict a pixel's value by examining its surrounding pixels. Again, by disregarding the "blind spot", the network implicitly learns the statistical properties of the noise present in the image, which allows the model to distinguish the noise in each pixel from the underlying signal. This self-supervised approach allows the network to estimate the denoiser from noisy data, making it effective at mitigating random noise. In later research, probabilistic Noise2Void (PN2V) [13] and the dilated blind-spot network [14] introduced explicit noise modeling and probabilistic inference, as well as masked convolution and stacked dilated convolutional layers, for better performance and faster training. However, the noise model is difficult to specify, especially in realistic ultrasound imaging scenarios. Huang et al. [15] proposed a self-supervised framework called Neighbor2Neighbor, which trains a denoiser using only a single noisy observation of each scene. The framework mainly consists of two parts. First, a pair of noisy sub-images is generated by a random neighbor sub-sampler. Then, these sub-sampled image pairs are utilized for self-supervised training, while a regularization loss is introduced to cope with the non-zero ground-truth difference between the pairs of sub-sampled noisy images; the overall loss thus consists of a reconstruction part and a regularization part. Building upon the Neighbor2Neighbor framework, Song et al. [16] integrated high-resolution anatomical MR images as auxiliary information, proposing a self-supervised method named neighbor-to-neighbor (NB2NB) that denoises PET images using a single noisy input. By employing a U-Net network architecture across three different resolution levels, this method can more delicately capture and restore the details in PET images, thereby significantly enhancing the denoising effect.
This multi-resolution strategy not only optimizes noise suppression but also ensures the quantitative accuracy of the images, making them more suitable for specific types of medical imaging data.
Due to the assumption of pixel-wise independent noise, current unsupervised denoising methods such as Neighbor2Neighbor (Ne2Ne) cannot deal with the correlated noise in ultrasound images. Meanwhile, the random neighborhood downsampler adopted by the model causes pixel loss.
To enhance the generalization ability and detail recovery capacity of the denoising model, our work proposes a new self-supervised ultrasound image denoising framework that addresses the limitations of the Ne2Ne algorithm. The main contributions of this paper are as follows:
(1) Based on the Ne2Ne algorithm, a term is added to constrain the structural similarity between neighborhood images, to suppress independent and correlated noise more effectively.
(2) Meanwhile, to enhance the quality of the denoising effect of the model, the downsampling strategy is differentially reorganized and optimized to ensure that the sub-image details of the subsamples are preserved. It is especially effective for the problem of pixel loss caused by random neighborhood downsamplers.
(3) Extensive experiments show that the improved Ne2Ne algorithm outperforms state-of-the-art algorithms in terms of effectiveness and accuracy on ultrasound datasets.
The organization of this paper is as follows: Section 2 presents a literature survey. Section 3 describes the proposed methodologies, including the downsampling strategy, the structural similarity loss, and the improved Neighbor2Neighbor model. Section 4 introduces the evaluation metrics, Section 5 details the experiment settings, and Section 6 reports the experimental results and analysis. The conclusions are presented in Section 7.

2. Literature Survey

2.1. Speckle Noise Model

In ultrasound imaging, the scatterer density, the spatial distribution, and the characteristics of the ultrasound imaging system affect the speckle noise model. When the scatterer density is high (more than 10 scatterers per resolution cell), as is common for blood cells, for example, the amplitude of the backscattered signal usually follows a Rayleigh distribution. Conversely, if the scatterer density is low, the amplitude of the backscattered signal follows a K distribution; the Rayleigh distribution can be viewed as a particular case of the K distribution [17]. However, in clinical ultrasound imaging systems, nonlinear signal processing techniques (such as logarithmic compression and low-pass filtering) are introduced in the display equipment to process the echo envelope signal. This nonlinear compression has a great influence on the statistical characteristics of the echo envelope signal. For example, if the echo envelope signal originally follows a Rayleigh distribution, then after logarithmic compression it exhibits a Fisher–Tippett (F-T) distribution. Noise with an F-T distribution can be viewed as white Gaussian noise contaminated by outliers, and this assumption degrades the performance of the filter [18,19]. Usually, the echo envelope signal follows the K distribution, but unfortunately its density function becomes complex after logarithmic compression [17]. Loupas et al. [20,21] pointed out in their studies that speckle noise in ultrasound images may be related to the signal after logarithmic compression, and proposed an explicit simulation model for log-compressed ultrasound images:
$\mu(x) = v(x) + v^{\gamma}(x)\,\theta(x)$   (1)
In this model, v(x) is the raw image without noise contamination, μ(x) is the actually observed ultrasound image, θ(x) is zero-mean Gaussian noise with standard deviation δ, and γ is a constant depending on the ultrasound equipment and imaging process; γ = 0.5 is generally taken. In this paper, Equation (1) is used as the speckle noise model.
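For illustration, the following NumPy sketch shows how a noisy training image might be synthesized from a clean image according to Equation (1); the function name and the particular choice of δ are illustrative assumptions rather than the authors' released code.

```python
import numpy as np

def add_speckle_noise(clean, delta=0.5, gamma=0.5, rng=None):
    """Synthesize speckle noise following Eq. (1): mu(x) = v(x) + v(x)^gamma * theta(x).

    clean : 2-D array with pixel values scaled to [0, 1] (the noise-free image v).
    delta : standard deviation of the zero-mean Gaussian field theta.
    gamma : device-dependent exponent; 0.5 is the value adopted in the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    theta = rng.normal(loc=0.0, scale=delta, size=clean.shape)   # zero-mean Gaussian noise
    noisy = clean + np.power(np.clip(clean, 0.0, None), gamma) * theta
    return np.clip(noisy, 0.0, 1.0)                              # keep values in display range

# Example: corrupt a clean 256x256 ultrasound image at delta = 0.5
# noisy = add_speckle_noise(clean_image, delta=0.5)
```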

2.2. The Neighbor2Neighbor Model

The core idea of Noise2Noise is that, for an unobserved clean scene x and two independently observed noisy images y and z, a denoising network trained with (y, z) pairs is equivalent to one trained with (y, x) pairs, provided the noise is zero-mean. Noise2Noise minimizes the following loss with respect to the network parameters θ:
$\arg\min_{\theta}\ \mathbb{E}_{x,y,z}\,\|f_{\theta}(y) - z\|_2^2$
where the neural network function fθ(y) is parameterized by θ.
Neighbor2Neighbor is an extension of Noise2Noise that constructs similar noisy images from a single noisy image by designing a sampler. Moreover, to mitigate the over-smoothing caused by the different sampling positions of the similar noisy images, a regularization term is introduced.
Neighbor2Neighbor mainly considers the following two aspects: two independent noisy images of similar scenes, and a single noisy image for a single scene.
Assumption 1:
There is a pure image x, and its corresponding noisy image is y; i.e.,
$\mathbb{E}_{y|x}(y) = x$
When a very small image difference ε ≠ 0 is introduced, x + ε is the pure image corresponding to another noisy image z; i.e.,
$\mathbb{E}_{z|x}(z) = x + \varepsilon$
Suppose the variance of z is δz²; then
$\mathbb{E}_{x,y}\|f_{\theta}(y) - x\|_2^2 = \mathbb{E}_{x,y,z}\|f_{\theta}(y) - z\|_2^2 - \delta_z^2 + 2\varepsilon\,\mathbb{E}_{x,y}\big(f_{\theta}(y) - x\big)$
When ε → 0, the (y, z) pair can be treated as in Noise2Noise; finding y and z that satisfy this "similar but not identical" condition makes it possible to train the denoising network.
Assumption 2:
For a single noisy image, Neighbor2Neighbor uses a sampler to create two "similar but not identical" images. The sub-images obtained by sampling adjacent pixels of the raw image satisfy the constraint that the differences between them are tiny, while the corresponding pure images are not identical (with ε → 0). For a noisy image y, Neighbor2Neighbor constructs a pair of nearest-neighbor samplers g1(*) and g2(*) and samples two sub-images g1(y) and g2(y). We directly construct training pairs with these two sub-images, and the denoising network is trained in a Noise2Noise manner; then
$\arg\min_{\theta}\ \mathbb{E}_{x,y}\,\|f_{\theta}(g_1(y)) - g_2(y)\|_2^2$
Neighbor2Neighbor calls this approach pseudo Noise2Noise. Since g1(y) and g2(y) are sampled at different locations, the resulting denoising model is not optimal and tends to over-smooth. Therefore, Neighbor2Neighbor corrects this by adding a regularization term to the loss: a constraint is added to pseudo Noise2Noise, the constrained optimization problem is transformed into a regularized one, and the following loss function is finally optimized:
$L = L_{rec} + \gamma L_{reg} = \|f_{\theta}(g_1(y)) - g_2(y)\|_2^2 + \gamma\,\big\|f_{\theta}(g_1(y)) - g_2(y) - \big(g_1(f_{\theta}(y)) - g_2(f_{\theta}(y))\big)\big\|_2^2$
where Lrec is the reconstruction term based on the network output and the noisy target. Lreg is the regularization term and γ is the hyperparameter that controls the strength of the regularization term.
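For concreteness, a minimal PyTorch sketch of this loss is given below; f_theta, g1, and g2 are placeholders for the denoising network and the two neighbor sub-samplers, and the stop-gradient on fθ(y) follows the Neighbor2Neighbor training recipe. This is an illustrative sketch, not the authors' released code.

```python
import torch

def neighbor2neighbor_loss(f_theta, y, g1, g2, gamma=2.0):
    """Loss above: ||f(g1(y)) - g2(y)||^2 + gamma * ||f(g1(y)) - g2(y) - (g1(f(y)) - g2(f(y)))||^2.

    f_theta : denoising network, g1/g2 : neighbor sub-samplers (callables), y : noisy batch.
    """
    out = f_theta(g1(y))                       # denoised sub-image
    target = g2(y)                             # neighboring noisy sub-image used as target
    with torch.no_grad():                      # full-image output only feeds the correction term
        denoised_full = f_theta(y)
    diff_out = out - target
    diff_full = g1(denoised_full) - g2(denoised_full)
    l_rec = torch.mean(diff_out ** 2)                      # reconstruction term
    l_reg = torch.mean((diff_out - diff_full) ** 2)        # regularization term
    return l_rec + gamma * l_reg
```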

3. Related Methodologies

3.1. Downsampling Strategy

From a single noisy image y, the sub-images g1(y), g2(y), g3(y), g4(y) are constructed by sampler G, which are detailed below.
The pixels of the input image y of size Nch × H × W are reorganized into a lower-resolution image of size 4Nch × H/2 × W/2. The image is divided into multiple 2 × 2 cells, and its pixels are reorganized into different channels of the output image according to the following equation:
$G(c, m, n) = I\!\left(\left\lfloor \tfrac{c}{4} \right\rfloor,\ 2m + (c \bmod 2),\ 2n + \left\lfloor \tfrac{c}{2} \right\rfloor\right)$
where 0 ≤ c < 4Nch, 0 ≤ m < H/2, 0 ≤ n < W/2; since this paper targets gray-scale ultrasound images, the number of channels Nch is set to 1.
A schematic diagram of the sampling strategy of sampler G is shown in Figure 1.
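The reorganization described by the equation above amounts to splitting each 2 × 2 cell into four half-resolution sub-images. A minimal PyTorch sketch, assuming a grayscale batch of shape (N, 1, H, W) with even H and W and the sub-image ordering of Figure 1, is shown below; the function name is illustrative.

```python
import torch

def sampler_G(y):
    """Split a noisy batch y of shape (N, 1, H, W) into four half-resolution sub-images.

    Each 2x2 cell of y contributes exactly one pixel to every sub-image, so no pixel is lost.
    Ordering follows Figure 1: g1 top-left, g2 top-right, g3 bottom-left, g4 bottom-right.
    """
    g1 = y[:, :, 0::2, 0::2]   # top-left pixel of each 2x2 cell
    g2 = y[:, :, 0::2, 1::2]   # top-right
    g3 = y[:, :, 1::2, 0::2]   # bottom-left
    g4 = y[:, :, 1::2, 1::2]   # bottom-right
    return g1, g2, g3, g4

# Example
# y = torch.rand(4, 1, 256, 256)
# g1, g2, g3, g4 = sampler_G(y)   # each has shape (4, 1, 128, 128)
```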

3.2. The Structure Similarity Index Measure Loss Function

In order to measure the perceptual distance between two images with the structural similarity index measure (SSIM) [22], we construct a structural similarity loss function Lssim between the downsampled sub-images g1(y), g2(y), g3(y), g4(y). The SSIM index combines the luminance l(x,y), contrast c(x,y), and structure s(x,y) information of images x and y to measure their perceptual distance. The similarity function for x and y is
$S(x, y) = f\big(l(x,y),\ c(x,y),\ s(x,y)\big)$
the luminance similarity l(x,y) between image x and y is given by
$l(x, y) = \dfrac{2\mu_x \mu_y + \varepsilon_1}{\mu_x^2 + \mu_y^2 + \varepsilon_1}$
where ε1 is a small constant that avoids a zero denominator, with ε1 = (K1L)², K1 = 0.01 used as a hyperparameter, and L the dynamic range of the pixel values of the target image y. Let μx and μy denote the luminance values corresponding to the largest pixel counts in the histograms of images x and y, respectively. Figure 2 shows the histogram of an image, where the abscissa represents the pixel values and the ordinate represents the number of pixels.
Commonly, the content information of ultrasound images is concentrated in the intensity range corresponding to the regions with the most pixels in the histogram, and these regions often contain important diagnostic information. Therefore, the luminance term used in this paper takes the pixel value corresponding to the histogram peak as the image luminance value in Lssim, instead of the mean luminance used in the original Lssim. This improvement is important for raising diagnostic accuracy and image quality, especially when addressing complex images with uneven luminance distributions. Assuming the value of every pixel is xi, the expression is as follows:
$\mu_x = \underset{x_i}{\arg\max}\ num(x_i)$
where num(xi) denotes the number of pixels with value xi.
The contrast similarity c(x,y) between images x and y is given by:
$c(x, y) = \dfrac{2\delta_x \delta_y + \varepsilon_2}{\delta_x^2 + \delta_y^2 + \varepsilon_2}$
where ε2 is a small constant to avoid a zero denominator, with ε2 = (K2L)² and K2 = 0.03 as a hyperparameter. Let δx and δy denote the standard deviations of the pixel values of images x and y, respectively.
The structural similarity s(x,y) function between image x and y is denoted by:
$s(x, y) = \dfrac{\delta_{xy} + \varepsilon_3}{\delta_x \delta_y + \varepsilon_3}$
where ε3 is a small constant to avoid a zero denominator, and δxy denotes the covariance of the pixel values of images x and y. Setting ε3 = ε2/2, the numerator of c(x,y) and the denominator of s(x,y) cancel out, and the Lssim loss function is finally obtained as:
$L_{ssim} = 1 - SSIM(x, y) = 1 - \dfrac{(2\mu_x \mu_y + \varepsilon_1)(2\delta_{xy} + \varepsilon_2)}{(\mu_x^2 + \mu_y^2 + \varepsilon_1)(\delta_x^2 + \delta_y^2 + \varepsilon_2)}$
where δx² and δy² are the variances of x and y, δxy is the covariance of x and y, and ε1 and ε2 are small constants avoiding zero denominators, with ε1 = (K1L)², ε2 = (K2L)², and the two hyperparameters K1 = 0.01 and K2 = 0.03. L is the dynamic range of the pixel values of the target image y. In this research, x and y correspond to g1(y), g3(y) and g2(y), g4(y), respectively.
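A possible PyTorch sketch of this loss is given below. It computes the statistics globally over each sub-image and evaluates the histogram-mode luminance on detached tensors (the mode itself is not differentiable); these implementation choices, along with the function names, are assumptions made for illustration rather than the authors' code.

```python
import torch

def histogram_mode(img, bins=256):
    """Pixel value with the largest histogram count (the mode-based luminance of Section 3.2).

    Computed on a detached copy, since argmax over histogram bins carries no gradient.
    Assumes intensities scaled to [0, 1]; returns the centre of the most populated bin.
    """
    flat = img.detach().flatten()
    hist = torch.histc(flat, bins=bins, min=0.0, max=1.0)
    peak = torch.argmax(hist)
    return (peak.float() + 0.5) / bins

def ssim_loss(x, y, k1=0.01, k2=0.03, data_range=1.0):
    """L_ssim = 1 - SSIM(x, y), using the histogram-mode luminance instead of the mean."""
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    mu_x, mu_y = histogram_mode(x), histogram_mode(y)
    var_x = x.var(unbiased=False)                          # delta_x^2
    var_y = y.var(unbiased=False)                          # delta_y^2
    cov_xy = ((x - x.mean()) * (y - y.mean())).mean()      # delta_xy
    ssim = ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
    return 1.0 - ssim
```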

3.3. Unsupervised Learning Model Based on Improved Neighbor2Neighbor

Using the sampler described in Section 3.1, training image pairs are extracted from a single noisy image and used as input to the self-supervised training algorithm in this section. Given the sub-sampled images (g1(y), g2(y), g3(y), g4(y)) derived from a noisy image y, we train the denoising network using the structural similarity loss proposed in Section 3.2 together with the reconstruction and regularization losses:
$L = L_{rec} + \gamma L_{reg} + \eta L_{ssim}$   (14)
where γ is a hyperparameter that regulates the intensity of the regularization term and η is a hyperparameter that regulates the strength of the structural similarity loss. To keep the ablation experiments comparable, γ = 2 and γ = 1 are used for the synthetic and real experiments, respectively, as in the Neighbor2Neighbor model. We stop the gradients of g1(y), g2(y), g3(y), and g4(y) during training and gradually raise η to the appointed value.
Lrec is the built loss function, where Lrec is defined as:
$L_{rec} = \|f_{\theta}(g_1(y)) - g_3(y)\|_2^2 + \|f_{\theta}(g_2(y)) - g_4(y)\|_2^2$
The regularized loss function Lreg is defined as:
$L_{reg} = \big\|f_{\theta}(g_1(y)) - g_3(y) - \big(g_1(f_{\theta}(y)) - g_3(f_{\theta}(y))\big)\big\|_2^2 + \big\|f_{\theta}(g_2(y)) - g_4(y) - \big(g_2(f_{\theta}(y)) - g_4(f_{\theta}(y))\big)\big\|_2^2$
where fθ is the denoising network; the training framework is described in Algorithm 1. The specific improved model of this paper is shown in Figure 3.
Algorithm 1. Training based on the improved Neighbor2Neighbor
Input: A set of noisy images Y, denoising network fθ, hyperparameters γ and η
Operation
 1. while not converged do
 2.   Sample a noisy image y ∈ Y;
 3.   Generate the sampler G = (g1, g2, g3, g4);
 4.   Derive the sub-sampled images (g1(y), g2(y), g3(y), g4(y)), where g1(y), g2(y) are the network inputs and g3(y), g4(y) are the network targets;
 5.   Calculate Lssim;
 6.   For the network inputs g1(y), g2(y), derive the denoised images fθ(g1(y)), fθ(g2(y));
 7.   Calculate Lrec;
 8.   For the original noisy image y, derive the denoised image fθ(y) with no gradients;
 9.   Use the same sub-sampler G to derive the images (g1(fθ(y)), g2(fθ(y)), g3(fθ(y)), g4(fθ(y)));
 10.  Calculate Lreg;
 11.  Update the denoising network fθ by minimizing the objective Lrec + γ·Lreg + η·Lssim;
 12. end
Output: denoised images
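A condensed sketch of one training iteration following Algorithm 1 is shown below, assuming the sampler_G and ssim_loss helpers from the earlier sketches (or equivalents) and a U-Net-style denoiser fθ. Since Algorithm 1 lists the SSIM term before the network outputs are computed, the exact arguments of that term are not fully specified; this sketch applies it between the denoised sub-images and their neighbor targets, which is an assumption, as is every name used here.

```python
import torch

def train_step(f_theta, optimizer, y, gamma=2.0, eta=1.0):
    """One iteration of Algorithm 1 on a batch of noisy images y of shape (N, 1, H, W)."""
    g1, g2, g3, g4 = sampler_G(y)                      # steps 3-4: sub-sample the noisy image

    out1, out2 = f_theta(g1), f_theta(g2)              # step 6: denoise the network inputs

    # step 5: structural similarity constraint (argument pairing is our assumption)
    l_ssim = ssim_loss(out1, g3) + ssim_loss(out2, g4)

    # step 7: reconstruction term of Section 3.3
    l_rec = torch.mean((out1 - g3) ** 2) + torch.mean((out2 - g4) ** 2)

    with torch.no_grad():                              # step 8: denoise the full image, no gradients
        fy = f_theta(y)
    h1, h2, h3, h4 = sampler_G(fy)                     # step 9: sub-sample the denoised image

    # step 10: regularization term (pairing follows the reconstruction term)
    l_reg = torch.mean(((out1 - g3) - (h1 - h3)) ** 2) + \
            torch.mean(((out2 - g4) - (h2 - h4)) ** 2)

    loss = l_rec + gamma * l_reg + eta * l_ssim        # step 11: total objective of Eq. (14)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```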

4. Evaluation Metrics

4.1. Peak Signal-to-Noise Ratio

Peak signal-to-noise ratio (PSNR) assesses image quality based on the errors between corresponding pixels.
$MSE = \dfrac{1}{H \times W}\sum_{i=1}^{H}\sum_{j=1}^{W}\big(X(i,j) - Y(i,j)\big)^2$
where MSE denotes the mean square error between images X and Y, X and Y represent the noisy and denoised images, respectively, and H and W are the image dimensions. PSNR is then defined as:
$PSNR = 20\log_{10}\!\left(\dfrac{255}{\sqrt{MSE(X, Y)}}\right)$

4.2. Mean Structural Similarity

Mean structural similarity quality measurement (MSSIM) [23] evaluates image quality based on information structure degradation. The MSSIM evaluation metric is more stable than SSIM, especially for images with different resolutions.
$MSSIM = \dfrac{1}{M}\sum_{i=1}^{M}\dfrac{(2\mu_{x_i}\mu_{y_i} + C_1)(2\sigma_{x_i y_i} + C_2)}{(\mu_{x_i}^2 + \mu_{y_i}^2 + C_1)(\sigma_{x_i}^2 + \sigma_{y_i}^2 + C_2)}$
where μ and σ denote the mean and standard deviation of the intensity within the i-th local window, M is the number of local windows, and C1 and C2 are constants that prevent division by zero. MSSIM values lie in the range [0, 1].

4.3. Feature Similarity Index Measure

The feature similarity index measure (FSIM) [24] is mainly used for quality assessment based on feature similarity. FSIM employs phase congruency (PC) and gradient magnitude (GM) features, which complement each other. Because PC is relatively invariant to image changes, it is used to extract the stable features of the image, whereas GM is mainly used to extract features where the image changes.
It is assumed that the phase congruency of the raw image f1 and the denoised image f2 can be represented by PC1 and PC2, while the gradient features are represented by G1 and G2. The similarity between these two images is computed as:
$S_{PC} = \dfrac{2\,PC_1\,PC_2 + T_1}{PC_1^2 + PC_2^2 + T_1}$
where T1 is a positive constant that can improve the stability of SPC.
Similarly, the similarity of G1 and G2 can be computed:
$S_G = \dfrac{2\,G_1 G_2 + T_2}{G_1^2 + G_2^2 + T_2}$
where T2 is a positive constant depending on the dynamic range of the gradient magnitude values.
The similarity SL of f1 and f2 is calculated by SPC and SG.
$S_L(x) = \big[S_{PC}(x)\big]^{\alpha}\,\big[S_G(x)\big]^{\beta}$
The relative importance of the PC and GM features is adjusted by the parameters α and β. In our research, for convenience, we set α = β = 1. The value of FSIM lies in the range [0, 1].

4.4. Edge Preservation Index

The edge preservation index (EPI) [25] measures the capacity of the denoised image to preserve edge details.
$EPI = \dfrac{\sum_i \sum_j \big|Y(i+1, j) - Y(i, j)\big|}{\sum_i \sum_j \big|X(i+1, j) - X(i, j)\big|}$
where X and Y denote the noisy and denoised image, while i and j denote the coordinates in the vertical and horizontal directions in the image.
The larger the value of these evaluation metrics, the better the denoising result.
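For reference, the sketch below shows one way these metrics might be computed for 8-bit grayscale images, using scikit-image for PSNR and (M)SSIM and a direct implementation of the EPI formula above; FSIM is omitted because it requires a phase congruency implementation. The function names and the use of a clean reference image (as in the synthetic experiments) are assumptions for illustration.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def edge_preservation_index(noisy, denoised):
    """EPI: ratio of summed vertical gradient magnitudes of the denoised and noisy images."""
    grad_den = np.abs(np.diff(denoised.astype(np.float64), axis=0)).sum()
    grad_noisy = np.abs(np.diff(noisy.astype(np.float64), axis=0)).sum()
    return grad_den / grad_noisy

def evaluate(reference, denoised, noisy):
    """Compute PSNR and mean SSIM against a clean reference, plus EPI against the noisy input."""
    return {
        "PSNR": peak_signal_noise_ratio(reference, denoised, data_range=255),
        "MSSIM": structural_similarity(reference, denoised, data_range=255),
        "EPI": edge_preservation_index(noisy, denoised),
    }
```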

5. Experiment Settings

5.1. Baselines

In this experiment, the performance of the algorithm is assessed quantitatively and qualitatively. By running the algorithms on the ultrasound image dataset, the metric values are obtained to verify the benefits of the algorithm in speckle suppression and contrast enhancement. We compare the proposed algorithm based on the improved Neighbor2Neighbor with the baseline methods Noise2Noise [10] and Neighbor2Neighbor [15], the traditional denoiser BM3D (block-matching and 3D filtering) [26], and Noise2Void (N2V) [11]. To ensure the fairness and comparability of the experiment, all parameters of the baseline algorithms are set to the best values reported in their original papers.
The implementation of the denoising algorithm presented in this paper requires the configuration of specific parameters. This section details the training of our proposed model using the PyTorch framework, with the experimental parameters listed in Table 1. To investigate the denoising efficacy on both synthetic and real images, distinct initial learning rates are assigned for these conditions: 0.0003 for synthetic image denoising and 0.0001 for real image denoising. The training spans 100 epochs, with the learning rate halved every 20 epochs to facilitate training and improve the optimization outcome, while a batch size of 4 is maintained. The Adam optimizer is used throughout the training phase, capitalizing on its adaptive qualities to refine model performance. Moreover, the intensity of the regularization term, modulated by the hyperparameter γ, is set to 2 for the synthetic experiments and 1 for the real-world experiments, in pursuit of optimal denoising efficacy across the two settings.

5.2. Experimental Environment and Dataset

The experiments are carried out with Python 3.9 and PyTorch 1.7.1. The computing platform uses an Intel(R) Xeon(R) Gold 6142 CPU @ 2.60 GHz and an NVIDIA RTX A5000 GPU and runs Windows 11.
Due to the lack of public datasets for ultrasound image denoising tasks, the dataset used for the synthetic experiments in this paper consists of ultrasound images with little speckle noise and ideal imaging quality collected by a Philips ultrasound imaging system in a hospital ultrasound imaging department. The training dataset includes 1300 clean ultrasound images with a resolution of 256 × 256. We synthesize simulated medical ultrasound images according to the speckle noise model of Equation (1). For real medical ultrasound images, we use real liver ultrasound images collected by a portable ultrasound diagnostic instrument, comprising 1300 real ultrasound images with an original resolution of 256 × 504. Before training, to ensure the consistency of the model input and reduce the computational burden, the training data are center-cropped to 256 × 256 pixels. Figure 4 shows some real medical ultrasound images.
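As a concrete illustration of this preparation step, the sketch below center-crops an image to 256 × 256 and synthesizes its noisy counterpart with the speckle model of Equation (1); it reuses the hypothetical add_speckle_noise helper from the earlier sketch, and all names here are illustrative assumptions.

```python
import numpy as np

def center_crop(img, size=256):
    """Center-crop a 2-D grayscale image (e.g., a 256x504 liver scan) to size x size."""
    h, w = img.shape
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]

def make_synthetic_pair(clean, delta):
    """Build a (clean, noisy) training pair for the synthetic experiments."""
    clean = center_crop(clean.astype(np.float32) / 255.0)   # scale 8-bit pixels to [0, 1]
    noisy = add_speckle_noise(clean, delta=delta)            # speckle model of Eq. (1)
    return clean, noisy
```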

6. Experimental Results and Analysis

6.1. Ablation Experiment

In this study, we analyze the impact of structural similarity loss function and downsampling strategy on model performance by ablation experiments. Specifically, we compare the denoising results after the improved modules are successively added to the network model with the original Neighbor2Neighbor model, thus evaluating the contribution of these improvements to the denoising performance of the model.

6.1.1. The η Hyperparameter in Lssim

The hyperparameter η in Equation (14) controls the strength of the structural similarity loss function. To fully evaluate the influence of this parameter, we set different η values in the ablation experiments and assess the results of the proposed method on the test dataset. The experimental results, presented in Table 2 and Table 3, show the performance of the model for different values of η in terms of the PSNR and MSSIM metrics. It should be emphasized that this ablation experiment was conducted under the speckle noise model (Equation (1)), with the standard deviation δ of the speckle noise set to a series of values (0.1, 0.2, 0.3, 0.4, 0.5 and 0.6) in order to comprehensively evaluate the denoising effect of the model under different noise levels.
The findings of this study reveal that the value of the hyperparameter η significantly influences the model's performance on the PSNR and MSSIM metrics. Adjusting η within an appropriate range can improve the denoising efficacy of the model; nonetheless, values of η that are either too high or too low may diminish the denoising effect. The best PSNR is attained when the standard deviation δ is 0.5 and η is set to 1, while at η = 1.5 the model achieves the highest mean structural similarity index. However, as the noise level increases, the model's denoising capability degrades. This study highlights the importance of considering different noise levels when designing and optimizing denoising models. The incorporation of the SSIM loss, with its hyperparameter η properly tuned, yields discernible enhancements in denoising performance; these results underscore the substantial contribution of image structural similarity to improved ultrasound image denoising.

6.1.2. Results of Ultrasound Image Denoising with Different Loss Functions

In this part of the study, we focus on evaluating the influence of different loss functions on model performance. Specifically, we train the model either without the structural similarity loss or with the structural similarity loss Lssim. To evaluate these loss functions fairly, we test them on an ultrasound test image dataset under simulated speckle noise with standard deviation δ = 0.5. The test results are displayed in Table 4, providing a comparison of the mean PSNR and MSSIM values of the models trained with the different loss functions. Specifically, the model incorporating the SSIM loss function, Lssim, shows an improvement of 0.96 dB in PSNR and 0.03 in MSSIM compared to the model without the structural similarity loss.
In addition, Figure 5 displays the denoising results of the different loss functions on the ultrasound test image dataset, from which it can be clearly observed that, compared with the traditional structural similarity loss function, the improved Lssim loss proposed in this study yields better PSNR and MSSIM performance in the trained model. This result highlights the effectiveness of the modified loss function in improving the denoising metrics of the model.

6.2. Experiment on Simulated Noisy Images

In this experiment, the public breast ultrasound image dataset [27] is used; the data comprise breast ultrasound images of 600 female patients aged 25 to 75. The dataset consists of 780 images with a size of 500 × 500 pixels. For the convenience of comparison, the test data are cropped to grayscale images of size 256 × 256, and speckle noise is simulated on them. To quantitatively evaluate all the compared algorithms, three speckle noise levels δ = {0.1, 0.3, 0.5} are tested. The PSNR and MSSIM indicators are used for evaluation, and the specific outcomes are displayed in Table 5.
Table 5 displays the PSNR and MSSIM results for the different noise levels. The improved Neighbor2Neighbor raises the PSNR relative to Neighbor2Neighbor at every noise level, so its denoising performance exceeds that of Neighbor2Neighbor. Its mean structural similarity is slightly lower than that of Neighbor2Neighbor at the lowest noise level, but as the noise level rises, the advantage of the proposed algorithm in structural similarity becomes prominent, notably under severe noise.

6.3. Real Ultrasound Image Experiment

In our research, real ultrasound images collected by a Healson U20 portable ultrasound diagnostic device are used. Liver B-mode ultrasound images of 256 × 256 and 472 × 256 pixels are tested, respectively. The dataset is objectively evaluated by three metrics: MSSIM, FSIM, and EPI. The data analysis in Table 6 reveals that the refined Neighbor2Neighbor approach introduced in this research surpasses the alternative algorithms in preserving structural similarity and edge details, as indicated by the MSSIM, FSIM, and EPI. The proposed method achieves a mean increase of 0.01 in MSSIM, 0.02 in FSIM, and 0.06 in EPI compared to the original Neighbor2Neighbor technique. Against the BM3D algorithm, the improvements are even more significant, with average gains of 0.10 in MSSIM, 0.15 in FSIM, and 0.51 in EPI. These results underscore the method's efficacy in noise reduction while maintaining the structural integrity and original edge details of the images.
Additionally, Table 6 presents the processing times for denoising a single image with the different methods. Our approach demonstrates a significantly higher implementation efficiency than the traditional algorithm. Although the proposed method exhibits a slightly higher runtime than the Noise2Noise, Noise2Void, and Neighbor2Neighbor speckle reduction methods, it requires an average of only 7.76 milliseconds to process a speckle-noise image, which benefits from the end-to-end network architecture as well as GPU and multi-threading capabilities.

7. Conclusions

Our research proposes a novel ultrasound image denoising algorithm based on an improved Neighbor2Neighbor unsupervised learning model. The proposed method is compared with several classical denoising techniques, such as BM3D, Noise2Noise, and Noise2Void. The experimental results show that the improved Neighbor2Neighbor unsupervised learning model can effectively reduce the influence of speckle noise while maintaining the edge and detail information in the image, thus effectively improving image quality. In addition, by adopting a new downsampling strategy, generating subsampled paired images as training images, and employing a self-supervised training mechanism with a structural similarity loss, the proposed method also addresses the issue that the original model cannot effectively suppress correlated or structured noise. This improvement enhances the ability of the model in terms of detail retention. However, this research is trained and tested on a specific dataset, so applying these results to other datasets may have certain limitations. For future work, we look forward to acquiring images from different devices and patient populations from collaborating clinical centers, thereby assessing the robustness and generalization of the algorithm in different environments. Moreover, we will also introduce evolutionary computation to optimize the hyperparameters of the network, which reduces the number of epochs and achieves faster convergence of the model, thereby improving the performance of the model.

Author Contributions

Conceptualization, P.W. and J.G.; methodology, M.S.; software, L.W.; validation, J.G., L.W. and X.S.; formal analysis, P.W.; investigation, P.W. and L.W.; resources, J.G.; data curation, J.G.; writing—original draft preparation, P.W.; writing—review and editing, P.W. and L.W.; visualization, P.W. and L.W.; supervision, J.G., X.S. and M.S.; project administration, M.S.; funding acquisition, P.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Sichuan University Students’ Innovation and Entrepreneurship Training Program (S202210621090), Chengdu University of Information Technology Key Project of Education Reform (JYJG2022090/JYJG2023212).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the correspondence author on request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Baselice, F.; Ferraioli, G.; Ambrosanio, M.; Pascazio, V.; Schirinzi, G. Enhanced Wiener filter for ultrasound image restoration. Comput. Methods Programs Biomed. 2018, 153, 71–81. [Google Scholar] [CrossRef] [PubMed]
  2. Islam, S.; Norouzian, M.; Turner, J.A. Influence of tessellation morphology on ultrasonic scattering. J. Acoust. Soc. Am. 2022, 152, 1951–1961. [Google Scholar] [CrossRef] [PubMed]
  3. Azar, A.A.; Rivaz, H.; Boctor, E.M. Speckle Detection in Echocardiographic Images. In Echocardiography-New Techniques; IntechOpen: London, UK, 2012. [Google Scholar]
  4. May, N.; Phoulady, A.; Choi, H.; Tavousi, P.; Shahbazmohamadi, S. Single image composite tomography utilizing large scale femtosecond laser cross-sectioning and scanning electron microscopy. Microsc. Microanal. 2022, 28 (Suppl. S1), 876–878. [Google Scholar] [CrossRef]
  5. Song, L.; Li, Y.; Dong, G.; Lambo, R.; Qin, W.; Wang, Y.; Zhang, G.; Liu, J.; Xie, Y. Artificial intelligence-based bone-enhanced magnetic resonance image—A computed tomography/magnetic resonance image composite image modality in nasopharyngeal carcinoma radiotherapy. Quant. Imaging Med. Surg. 2021, 11, 4709–4720. [Google Scholar] [CrossRef] [PubMed]
  6. Jamthikar, A.D.; Gupta, D.; Puvvula, A.; Amer, M.J.; Narendra, N.K.; Luca, S.; Sophie, M.; John, R.L.; Gyan, P.; Martin, M.; et al. Cardiovascular risk assessment in patients with rheumatoid arthritis using carotid ultrasound B-mode imaging. Rheumatol. Int. 2020, 40, 1921–1939. [Google Scholar] [CrossRef] [PubMed]
  7. Prabusankarlal, K.M.; Manavalan, R.; Sivaranjani, R. An optimized non-local means filter using automated clustering based preclassification through gap statistics for speckle reduction in breast ultrasound images. Appl. Comput. Inform. 2018, 14, 48–54. [Google Scholar] [CrossRef]
  8. Zhang, K.; Zuo, W.; Chen, Y.; Meng, D.; Zhang, L. Beyond a gaussian denoiser: Residual learning of deep CNN for image denoising. IEEE Trans. Image Process. 2017, 26, 3142–3155. [Google Scholar] [CrossRef] [PubMed]
  9. Zhang, K.; Zuo, W.; Zhang, L. FFDNet: Toward a fast and flexible solution for CNN-based image denoising. IEEE Trans. Image Process. 2018, 27, 4608–4622. [Google Scholar] [CrossRef] [PubMed]
  10. Lehtinen, J.; Munkberg, J.; Hasselgren, J.; Laine, S.; Karras, T.; Aittala, M.; Aila, T. Noise2Noise: Learning image restoration without clean data. arXiv 2018, arXiv:1803.04189. [Google Scholar]
  11. Krull, A.; Buchholz, T.O.; Jug, F. Noise2void-learning denoising from single noisy images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 2129–2137. [Google Scholar]
  12. Batson, J.; Royer, L. Noise2self: Blind denoising by self-supervision. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 15–20 June 2019; pp. 524–533. [Google Scholar]
  13. Krull, A.; Vičar, T.; Prakash, M.; Lalit, M.; Jug, F. Probabilistic noise2void: Unsupervised content-aware denoising. Front. Comput. Sci. 2020, 2, 5. [Google Scholar] [CrossRef]
  14. Wu, X.; Liu, M.; Cao, Y.; Ren, D.; Zuo, W. Unpaired Learning of Deep Image Denoising. In Proceedings of the Computer Vision—ECCV 2020, Glasgow, UK, 23–28 August 2020; LNCS 12349. pp. 352–368. [Google Scholar]
  15. Huang, T.; Li, S.; Jia, X.; Lu, H.; Liu, J. Neighbor2neighbor: Self-supervised denoising from single noisy images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 14781–14790. [Google Scholar]
  16. Song, T.A.; Yang, F.; Dutta, J. Self-supervised PET image denoising using a neighbor-to-neighbor network. J. Nucl. Med. 2023, 64 (Suppl. S1), 1380. [Google Scholar]
  17. Dutt, V.; Greenleaf, J.F. Adaptive speckle reduction filter for log-compressed B-scan images. IEEE Trans. Med. Imaging 1996, 15, 802–813. [Google Scholar] [CrossRef] [PubMed]
  18. Michailovich, O.V.; Tannenbaum, A. Despeckling of medical ultrasound images. IEEE Trans. Ultrason. Ferroelectr. Freq. Control 2006, 53, 64–78. [Google Scholar] [CrossRef] [PubMed]
  19. Yu, Y.; Acton, S.T. Speckle reducing anisotropic diffusion. IEEE Trans. Image Process. 2002, 11, 1260–1270. [Google Scholar] [PubMed]
  20. Loupas, T. Digital Image Processing for Noise Reduction in Medical Ultrasonics. Ph.D. Thesis, University of Edinburgh, Edinburgh, UK, 1988. [Google Scholar]
  21. Loupas, T.; McDicken, W.N.; Allan, P.L. An adaptive weighted median filter for speckle suppression in medical ultrasonic images. IEEE Trans. Circuits Syst. 1989, 36, 129–135. [Google Scholar] [CrossRef]
  22. Bhatt, R.; Naik, N.; Subramanian, V.K. SSIM compliant modeling framework with denoising and deblurring applications. IEEE Trans. Image Process. 2021, 30, 2611–2626. [Google Scholar] [CrossRef]
  23. Yamaya, H.; Mimura, Y.; Yamazaki, H.; Yanagida, H. 2Pa5-1 Image Quality Assessment of 3D Ultrasound Images Based on SSIM. Proc. Symp. Ultrason. Electron. 2021, 42, 2Pa5-1. [Google Scholar]
  24. Zhang, L.; Zhang, L.; Mou, X.; Zhang, D. FSIM: A feature similarity index for image quality assessment. IEEE Trans. Image Process. 2011, 20, 2378–2386. [Google Scholar] [CrossRef] [PubMed]
  25. Sun, Y.; Xin, Z.; Huang, X.; Wang, Z.; Xuan, J. Overview of SAR Image Denoising Based on Transform Domain. In 3D Imaging Technologies—Multi-Dimensional Signal Processing and Deep Learning; Smart Innovation, Systems and Technologies; Springer: Singapore, 2021; p. 234. [Google Scholar]
  26. Dabov, K.; Foi, A.; Katkovnik, V.; Egiazarian, K. Image denoising by sparse 3-D transform-domain collaborative filtering. IEEE Trans. Image Process. 2007, 16, 2080–2095. [Google Scholar] [CrossRef] [PubMed]
  27. Al-Dhabyani, W.; Gomaa, M.; Khaled, H.; Fahmy, A. Dataset of breast ultrasound images. Data Brief 2020, 28, 104863. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Illustration of the strategy for generating sub-images by the downsampler G = (g1, g2, g3, g4), where k = 2; in every 2 × 2 cell, the pixels in the top left, top right, bottom left, and bottom right of the pixel matrix are filled with red, blue, orange, and green, respectively. The pixels filled with red, blue, orange, and green are the pixels of the subsampled images g1(y), g2(y), g3(y), and g4(y), respectively. The subsampled paired images (g1(y), g2(y), g3(y), g4(y)) are shown as red, blue, orange, and green patches on the right.
Figure 2. Histogram of an ultrasound image; the abscissa represents the pixel values and the ordinate represents the number of pixels.
Figure 3. Unsupervised Learning Model based on Improved Neighbor2Neighbor.
Figure 4. Some images of the real ultrasound dataset.
Figure 5. Results of ultrasound image denoising with different loss functions.
Table 1. Training parameters of the neural network model.

Parameters | Parameter Values
Batch Size | 4
Iteration Method (Optimizer) | Adam
Number of Iterations (Epochs) | 100
Initial Learning Rate | 0.0003 (Synthetic Experiments) / 0.0001 (Real Experiments)
Regularization Intensity Control Parameter γ | 2 (Synthetic Experiments) / 1 (Real Experiments)
Table 2. Ablation experiment of hyperparameter η (PSNR/dB).

Denoising Algorithms | δ = 0.1 | δ = 0.2 | δ = 0.3 | δ = 0.4 | δ = 0.5 | δ = 0.6
No Lssim | 30.3681 | 30.7160 | 31.4335 | 30.3293 | 31.7896 | 30.9377
Lssim η = 0.5 | 31.1547 | 31.8176 | 32.0143 | 32.4097 | 33.4576 | 31.0501
Lssim η = 1 | 35.6007 | 35.2461 | 33.5341 | 32.9331 | 36.2193 | 31.2005
Lssim η = 1.5 | 30.8513 | 31.8456 | 32.4519 | 31.0646 | 33.0053 | 30.7907
Table 3. Ablation experiment of hyperparameter η (MSSIM).

Denoising Algorithms | δ = 0.1 | δ = 0.2 | δ = 0.3 | δ = 0.4 | δ = 0.5 | δ = 0.6
No Lssim | 0.6112 | 0.6530 | 0.6685 | 0.6701 | 0.6775 | 0.6736
Lssim η = 0.5 | 0.6201 | 0.6419 | 0.6789 | 0.6797 | 0.6810 | 0.5801
Lssim η = 1 | 0.6306 | 0.6600 | 0.6427 | 0.6588 | 0.6526 | 0.6728
Lssim η = 1.5 | 0.6452 | 0.6516 | 0.6584 | 0.6655 | 0.6861 | 0.6673
Table 4. Average quantitative results of different loss functions on the test dataset.

Model | PSNR/dB | MSSIM
Neighbor2Neighbor | 28.6445 | 0.6177
Neighbor2Neighbor + Lssim | 29.6093 | 0.6525
Table 5. Performance comparison of different noise level denoising methods (PSNR, MSSIM).

Denoising Algorithms | δ = 0.1 (PSNR/dB, MSSIM) | δ = 0.3 (PSNR/dB, MSSIM) | δ = 0.5 (PSNR/dB, MSSIM)
Noisy | 33.0975, 0.7554 | 29.3456, 0.6845 | 28.8437, 0.6371
Neighbor2Neighbor | 35.0696, 0.8589 | 31.4945, 0.7887 | 28.9461, 0.7564
Neighbor2Neighbor + Lssim | 44.4122, 0.8521 | 40.7540, 0.8284 | 34.9397, 0.8136
Table 6. Performance comparison of different denoising methods (MSSIM, FSIM, EPI).

Image1:
Denoising Algorithms | MSSIM | FSIM | EPI | Time (ms)
BM3D | 0.5520 | 0.7259 | 0.2518 | 620
Noise2Noise | 0.4859 | 0.7217 | 0.3392 | 5.24
Noise2Void | 0.5646 | 0.7434 | 0.3441 | 4.32
Neighbor2Neighbor | 0.6086 | 0.8109 | 0.5494 | 6.78
Neighbor2Neighbor + Lssim | 0.6102 | 0.8244 | 0.5929 | 7.67

Image2:
Denoising Algorithms | MSSIM | FSIM | EPI | Time (ms)
BM3D | 0.5211 | 0.7060 | 0.2224 | 635
Noise2Noise | 0.5133 | 0.6500 | 0.3418 | 5.33
Noise2Void | 0.5364 | 0.7257 | 0.3334 | 4.56
Neighbor2Neighbor | 0.5793 | 0.8056 | 0.5271 | 6.98
Neighbor2Neighbor + Lssim | 0.5951 | 0.8139 | 0.5609 | 7.84
