Article

Innovative Noise Extraction and Denoising in Low-Dose CT Using a Supervised Deep Learning Framework

1 School of Electromechanical Engineering, Harbin Institute of Technology, Harbin 150001, China
2 School of Computer Science, Harbin Institute of Technology, Harbin 150001, China
* Author to whom correspondence should be addressed.
Electronics 2024, 13(16), 3184; https://doi.org/10.3390/electronics13163184
Submission received: 28 June 2024 / Revised: 1 August 2024 / Accepted: 6 August 2024 / Published: 12 August 2024
(This article belongs to the Special Issue Advanced Internet of Things Solutions and Technologies)

Abstract
Low-dose computed tomography (LDCT) imaging is a critical tool in medical diagnostics due to its reduced radiation exposure. However, this reduction often results in increased noise levels, compromising image quality and diagnostic accuracy. Despite advancements in denoising techniques, a robust method that effectively balances noise reduction and detail preservation remains a significant need. Current denoising algorithms frequently fail to maintain the necessary balance between suppressing noise and preserving crucial diagnostic details. Addressing this gap, our study focuses on developing a deep learning-based denoising algorithm that enhances LDCT image quality without losing essential diagnostic information. Here we present a novel supervised learning-based LDCT denoising algorithm that employs innovative noise extraction and denoising techniques. Our method significantly enhances LDCT image quality by incorporating multiple attention mechanisms within a U-Net-like architecture. Our approach includes a noise extraction network designed to capture diverse noise patterns precisely. This network is integrated into a comprehensive denoising system consisting of a generator network, a discriminator network, and a feature extraction AutoEncoder network. The generator network removes noise and produces high-quality CT images, while the discriminator network differentiates real images from denoised ones, improving the realism of the outputs. The AutoEncoder network ensures the preservation of image details and diagnostic integrity. Our method improves the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) by 7.777 and 0.128 compared to LDCT, by 0.483 and 0.064 compared to residual encoder–decoder convolutional neural network (RED-CNN), by 4.101 and 0.017 compared to Wasserstein generative adversarial network–visual geometry group (WGAN-VGG), and by 3.895 and 0.011 compared to Wasserstein generative adversarial network–autoencoder (WGAN-AE). This demonstrates that our method has a significant advantage in enhancing the signal-to-noise ratio of images. Extensive experiments on multiple standard datasets demonstrate our method’s superior performance in noise suppression and image quality enhancement compared to existing techniques. Our findings significantly impact medical imaging, particularly improving LDCT scan diagnostic accuracy. The enhanced image clarity and detail preservation offered by our method open new avenues for clinical applications and research. This improvement in LDCT image quality promises substantial contributions to clinical diagnostics, disease detection, and treatment planning, ensuring high-quality diagnostic outcomes while minimizing patient radiation exposure.

1. Introduction

Computed tomography (CT) is a widely utilized clinical imaging modality that provides non-invasive, multi-directional images of various parts of the human body without tissue overlap, using X-rays, γ-rays, and other radiation sources. CT scanning offers rapid image acquisition and clear imaging, making it valuable for diagnosing a wide range of diseases. However, the use of X-rays (or similar radiation) in the image acquisition process necessitates consideration of the ionizing radiation’s impact on the human body during CT scanning [1]. Prolonged exposure to X-ray radiation beyond certain thresholds can disrupt cell metabolism or even cause diseases such as cancer. Therefore, repeated CT scans for patients requiring regular evaluations can significantly elevate cancer risk. Additionally, studies indicate that children absorb considerably higher radiation doses than adults [2]. With the growing prevalence of CT usage, awareness of radiation risks is increasing across society. Consequently, CT examinations have become a primary source of medically induced radiation in numerous developed countries.
To mitigate potential radiation hazards associated with CT examinations, Naidich et al. [3] introduced the concept of LDCT in the 1990s. LDCT gained significant attention for its efficacy in lung cancer screening, with research focusing on its ability to detect early-stage lung cancer and improve post-detection survival rates. Some studies found that LDCT’s screening positivity rate was three times that of X-ray chest radiography, and its lung cancer detection capability four times higher, underscoring LDCT’s effectiveness in lung cancer detection. The prevalent approach to LDCT acquisition involves reducing the tube current and the number of projections, as tube current (mA) and radiation dose exhibit a linear relationship [4]; reducing tube current therefore effectively decreases the radiation dose. However, decreasing tube current increases image noise, which is inversely proportional to the square root of the tube current, thereby compromising image quality.
Despite LDCT’s reduction in radiation dose, noise remains a significant concern for clinicians during diagnosis. Reduced tube current decreases the number of photons available for imaging, leading to speckle noise and directional streak artefacts in reconstructed CT images that impair diagnostic accuracy [5,6,7]. Restoring LDCT image quality is a burgeoning research area: current algorithms strive to strike a balance between noise reduction and preservation of essential information. While significant progress has been made, achieving this balance remains an important yet challenging endeavor, and various methods have been explored. Deep learning’s advancements in traditional image processing have also spurred LDCT denoising research. In this paper, we propose a deep learning algorithm for LDCT denoising that demonstrates superior noise reduction and detail preservation in experimental results. The primary contributions of this study are as follows:
(1) This paper introduces a noise extraction module that integrates multiple attention mechanisms, including spatial attention, channel attention, and scale attention. These diverse attention mechanisms enhance the network’s ability to effectively extract and suppress noise by focusing on different dimensions and scales of the image. Spatial attention captures critical regions and structural information, channel attention amplifies significant feature channels, and scale attention processes details at various resolutions. This comprehensive approach allows the network to handle various noise scenarios more effectively compared to traditional denoising networks.
(2) Recognizing the limitations of traditional VGG networks in extracting CT image features, this paper innovatively uses an Autoencoder model to retrain on the specific project dataset. The encoder part of the Autoencoder is then employed for perceptual loss calculation. This method enables the model to measure the differences between generated and real images in a high-dimensional feature space more accurately. As a result, the generated CT images not only visually resemble the real images but also retain crucial structural and textural details, leading to superior denoising quality.
(3) Based on the prior information of noise distribution, we also constructed a denoising system including a generator network, a discriminator network, and a feature extraction AutoEncoder network. The generator network is responsible for removing noise and generating high-quality CT images, while the discriminator network helps distinguish between real images and denoised images to improve the generation effect. The feature-extracted AutoEncoder network further enhances the preservation of image details and the integrity of diagnostic information.
The rest of this paper is organized as follows. Section 2 reviews related work, covering projection domain data processing methods, iterative reconstruction methods, and post-processing methods. Section 3 describes the materials and methods used in this study, including the establishment of the LDCT noise reduction model, the LDCT image noise extraction network, the perceptual loss calculation module, the design of the noise reduction network, its generator and discriminator components, and the training process along with the loss function. Section 4 presents the experimental data and evaluation indicators, details the experiments conducted on the noise extraction network module and the noise reduction network, and analyzes the experimental results. Section 5 discusses the implications of our findings, compares them with previous works, and highlights the limitations of our approach. Finally, Section 6 concludes the paper with a summary of our contributions and suggests future research directions.

2. Related Works

The current methods for the LDCT denoising task fall broadly into three categories: (1) projection domain data processing methods; (2) iterative reconstruction methods; and (3) post-processing methods.

2.1. Projection Domain Data Processing Methods

The projection domain data processing method is a pre-processing approach: the projection data are filtered to remove the noise in them, bringing them close to normal-dose projection data, and a CT image is then reconstructed from the filtered projections. The projection data here refer to the data received by the detector, and the filtered back-projection (FBP) algorithm [8] is commonly used for reconstruction. This type of algorithm can make good use of the statistical properties of the noise in the projection domain, and the algorithms commonly used at present can be divided into two kinds: nonlinear filtering and statistical iterative denoising.
(1) Nonlinear filtering algorithms. Wang et al. [9] first improved the original anisotropic diffusion filter and the NLGC scheme by combining them with the noise characteristics of LDCT sinograms for statistically based noise reduction, correcting the data used in the diffusion process with statistical methods. In the improved version, the estimation of the noise level parameters is spatially adaptive and determined by the variance of the pixels during diffusion. The improved anisotropic diffusion filter and the NLGC method were validated by computer simulations and phantom experiments simulating LDCT studies. Ehman et al. [10] found that noise reduction achieved by bilateral filtering in the projection space can improve the conspicuity of liver lesions while reducing the radiation dose.
(2) Statistical iterative algorithms. Hsieh et al. [11] proposed a Radon-space adaptive filtering method based on the local statistical properties of CT projections. The noise characteristics of the pre-processed projection samples are first modelled; the filter is then designed and its parameters are dynamically adjusted to fit the local noise characteristics. Owing to the adaptive nature of the filter, a proper balance between streak artefact suppression and spatial resolution preservation is achieved. The results show that adaptive filtering can effectively reduce or eliminate artefacts caused by quantum noise in CT while keeping the effect on spatial resolution low. Li et al. [12] proposed a penalized likelihood method for sinogram smoothing that yields a set of nonlinear equations, which can be solved by an iterative conditional mode (ICM) algorithm in reasonable computation time. Compared with other approaches, the projection domain method is a pre-reconstruction pre-processing algorithm, but it relies on projection data, which are generally difficult to access, so this type of algorithm is difficult to generalize to clinical practice.

2.2. Iterative Reconstruction Methods

The iterative reconstruction method solves an ill-posed inverse problem by optimizing a regular objective function, which is constructed based on the noise characteristics of the projection data and the a priori structural information of the reconstructed image, according to the statistical properties of the projection data. The general statistical iterative models are based on three basic models: maximum likelihood estimation, least squares estimation, and maximum posterior probability estimation.
Different a priori penalty terms are constructed based on the characteristics of the reconstructed image, and adding them to the statistical iterative model can effectively improve the quality of the reconstructed image. In the past decade, many kinds of a priori penalty terms have been proposed. Chen et al. [13] exploited the spatio-temporal correlation of sparse dynamic CT image sequences in dynamic CT imaging and applied a newly proposed compressed sensing (CS) reconstruction method to reconstruct the target image sequences. A priori images reconstructed from concatenated sets of interwoven dynamic datasets are used to constrain the CS reconstruction at individual time frames; this method is called prior image constrained compressed sensing (PICCS). The effectiveness of the PICCS algorithm was verified by in vivo animal experiments, which showed that PICCS was able to accurately reconstruct dynamic CT images. Wu et al. [14] proposed a feature-constrained compressed sensing (FCCS) image reconstruction algorithm, which utilizes a priori knowledge extracted from a clinical database to improve image quality. The database consists of instances that are similar but not necessarily identical to the target image. The features of the training images are extracted using robust principal component analysis and the target image is sparsified; these features form a low-dimensional linear space with constraints on the distance between the image and the space. Sidky et al. [15] proposed an iterative image reconstruction algorithm for cone-beam scanners based on recent results in compressed sensing. Iterative reconstruction methods construct the objective function from the noise characteristics of the projection data and the a priori structural information of the reconstructed image, so the reconstructed CT images are of better quality. However, the computational complexity of such methods is high, making them time-consuming; moreover, the reconstruction algorithms of the CT machines currently used in most hospitals are based on the FBP algorithm, so iterative reconstruction also has great limitations in clinical practice.

2.3. Post-Processing Methods

Post-processing algorithms improve the quality of LDCT images by constructing models based on the characteristics of the LDCT images themselves. The main advantages of this approach are that it does not depend on projection data, it can be flexibly applied to different scanning systems, and it is fast [16]. Whereas natural-image denoising aims to remove as much noise as possible while maintaining structural detail, LDCT post-processing aims to eliminate noise and artefacts while maintaining the structural detail of tissue, so that the processed LDCT image approaches a conventional-dose CT image and supports the physician’s diagnosis.
Borsdorf et al. [17] proposed a new wavelet-transform-based, structure-preserving CT image denoising method that can be used in conjunction with different reconstruction methods. The method is based on the assumption that the data can be decomposed into informative signal and temporally uncorrelated noise. Quantitative and qualitative evaluations on phantom and real clinical data showed that a high noise reduction rate of about 40% can be achieved without reducing the image resolution. Ramirez Giraldo et al. [18] investigated two image-space nonlinear noise reduction filters: the bilateral filter (BF) and the non-local means (NLM) algorithm. The authors note that care must be taken when choosing the NLM parameters in order to minimize artefacts that may compromise diagnostic value. Chen et al. [19] proposed a large-scale neighborhood weighted intensity averaging (WIA-LN) method for LDCT image improvement; the processed pixel intensities are derived from selective weighted intensity averaging of pixels belonging to different organs or attenuating tissues in a large-scale neighborhood. Noise and artefacts in LDCT images are effectively suppressed without significant loss of anatomical features. Chen et al. [20] proposed a two-step processing scheme, called the “artefact suppression large-scale non-local method”, for suppressing noise and artefacts in chest LDCT images; specific scale and orientation properties are utilized to distinguish noise and artefacts from image structure, and a parallel implementation increased the overall processing speed by more than 100 times. Wang et al. [21] proposed a new fractional-order differential model based on a weighted combination of a fractional-order PM model and a fractional-order TV model, retaining the advantages of the PM, TV, and fractional-order differential models; local intensity variance is incorporated into the weighting and diffusion coefficients to preserve edges and details. In addition, the rapid rise of deep learning has made post-processing methods even more active in recent years. Owing to deep learning’s rapid progress on image classification, image segmentation, and image denoising tasks, more and more teams are focusing on its performance on medical image denoising, and the results so far are promising. In mid-2019, the U.S. Food and Drug Administration (FDA) approved two CT vendors to use deep learning techniques in their CT image reconstruction [22,23,24].
In summary, the current methods have many shortcomings, as summarized in Table 1. Projection domain data processing methods can operate directly on CT projection data, but they are very sensitive to noise and artefacts; despite high computational efficiency, they offer limited improvement in image quality and struggle to handle the high noise levels of LDCT images effectively. Iterative reconstruction methods such as MAP-NN (Maximum A Posteriori Neural Network) significantly enhance denoising by incorporating prior image knowledge, resulting in improved image quality, but they typically involve complex models and high computational costs, making them unsuitable for real-time applications; they also rely heavily on prior knowledge and may generalize poorly across datasets. Post-processing methods operate directly on reconstructed CT images and can significantly improve image quality and detail; however, they are sensitive to high noise levels, and some GAN-based methods may have unstable training processes and require substantial computational resources. Despite their effectiveness, further optimization is needed for high-noise scenarios.

3. Materials and Methods

3.1. Establishment of LDCT Noise Reduction Model

In order to model noise reduction for LDCT images, it is first necessary to clarify the relationship between LDCT images and standard dose CT images. Let $x \in \mathbb{R}^{m \times n}$ represent an LDCT image, $y \in \mathbb{R}^{m \times n}$ the corresponding standard dose CT image, and $z \in \mathbb{R}^{m \times n}$ the noise contained in the LDCT image, where $m$ and $n$ represent the width and height of the CT image, respectively. The relationship can then be expressed as the following equation:
$y = x - z \quad (1)$
$z$ includes quantum noise, electrical noise, scattering noise, system noise, etc. This means that the standard dose CT image $y$ can be regarded as the LDCT image $x$ minus a noise term $z$. The goal of the noise reduction task is therefore to remove the noise term as far as possible by processing $x$ and to restore an image close to $y$. Formally, the noise reduction process computes a function $f$ that satisfies Equation (2):
$\arg\min_f \left\| f(x) - y \right\|_2^2 \quad (2)$
where $f(x)$ represents the CT image reconstructed from the LDCT input. The objective can be optimized using the mean square error (MSE), the L1 loss, the structural similarity index (SSIM), perceptual loss (PL), adversarial loss (AL), total variation loss (TVL), and so on.
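As a minimal illustration of Equation (2), the sketch below trains a placeholder network $f$ to map LDCT patches toward their NDCT counterparts with an MSE objective; the `Denoiser` class here is a toy stand-in for illustration, not the generator described later.

```python
import torch
import torch.nn as nn

class Denoiser(nn.Module):
    """Toy stand-in for the denoising function f in Equation (2)."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 1, 3, padding=1),
        )

    def forward(self, x):
        return self.body(x)

f = Denoiser()
x = torch.randn(4, 1, 256, 256)  # batch of LDCT patches
y = torch.randn(4, 1, 256, 256)  # paired standard dose (NDCT) patches
loss = nn.functional.mse_loss(f(x), y)  # || f(x) - y ||_2^2 of Equation (2)
loss.backward()
```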

3.2. LDCT Image Noise Extraction Network

The U-Net structure is a classical network in the field of medical image segmentation; it employs an encoder–decoder design and skip connections to extract feature maps at multiple scales for efficient and accurate image processing. U-Net was initially designed for medical image segmentation, especially cellular image segmentation, and consists of symmetric encoder and decoder sections joined by skip connections. The encoder not only reduces the resolution of the input image, aggregating global semantic information through downsampling, but also preserves image details by passing high-resolution features directly to the decoder. This dual role of feature extraction and delivery enables the model both to understand the overall semantics and to segment details accurately in complex tasks. The low-resolution information obtained through repeated downsampling provides rich contextual semantics that help the network understand the overall structure and category information of the image, while the high-resolution information passed directly from the encoder to the decoder preserves details and edge features, ensuring accuracy during segmentation.
In order to further improve the performance of U-Net in the LDCT image noise reduction task, we introduced various attention mechanisms and modules into the network; Figure 1 shows the overall structure of our proposed noise extraction network. The network aims to accurately separate the noise distribution from the input LDCT image and output the corresponding noise image. The structure combines multi-layer convolution, residual connections, multi-scale feature fusion, and multiple attention mechanisms (including spatial attention and channel attention), and introduces a scale adjustment module in the output stage to achieve fine noise extraction.
Like U-Net, the network keeps the encoding and decoding parts symmetric, ensuring that features can be fully extracted and recovered through layer-by-layer downsampling and upsampling. The input to the network is a 256 × 256 LDCT image, which first passes through a 1 × 1 convolutional layer (Conv 64 × 1 × 1), increasing the number of channels from 1 to 64 while the spatial size remains constant. During encoding, the network progressively extracts features through a series of convolutional operations while varying the number of channels of the feature map: from 64 to 128, then to 256, and finally to 512 (each with a 1 × 1 convolutional kernel). The feature maps output by each convolutional module are fused with subsequent feature maps through addition (blue “+” symbols), similar to U-Net’s skip connections, so that important details are preserved during encoding. The decoding path gradually restores the feature map from the deepest level (512 channels) to the original scale (64 channels) through a series of convolutional operations; each upsampling step adjusts the number of channels and fuses the previous layer’s feature map with the corresponding decoding layer’s features (the addition operations in the figure), analogous to U-Net’s deconvolution and splicing operations. At each stage of the encoder and decoder, feature maps are also combined by concatenation, similar to the lateral connections in U-Net; these connections allow the decoder to exploit the rich encoder features and maintain high-resolution information. In the decoding phase, the network introduces a spatial attention mechanism to highlight important spatial locations in the feature map (labelled “SA” in purple), a channel attention mechanism to enhance important channels (labelled “CA” in orange), and a scale attention mechanism that combines multi-scale features and weights them spatially (labelled “SCALE-A” in yellow). These attention mechanisms allow the network to focus more accurately on important features and locations during noise extraction. After all convolutional and attentional processing, the final feature map passes through a Sigmoid activation function, which outputs a 256 × 256 noise distribution map.
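The paper names the three attention types but does not give their internal layouts, so the sketch below uses common formulations as stand-ins: a squeeze-and-excitation-style channel attention, a CBAM-style spatial attention, and a softmax-weighted fusion for scale attention. These are assumptions, not the authors’ exact modules.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):  # "CA" in Figure 1 (SE-style assumption)
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.fc(x)  # reweight feature channels

class SpatialAttention(nn.Module):  # "SA" in Figure 1 (CBAM-style assumption)
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)   # per-pixel channel average
        mx, _ = x.max(dim=1, keepdim=True)  # per-pixel channel maximum
        w = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * w  # highlight important spatial locations

class ScaleAttention(nn.Module):  # "SCALE-A": weights multi-scale branches
    def __init__(self, channels, n_scales=3):
        super().__init__()
        self.score = nn.Conv2d(channels * n_scales, n_scales, 1)

    def forward(self, feats):  # feats: list of same-sized feature maps
        w = torch.softmax(self.score(torch.cat(feats, dim=1)), dim=1)
        return sum(w[:, i:i + 1] * f for i, f in enumerate(feats))

feat = torch.randn(2, 64, 128, 128)
feat = SpatialAttention()(ChannelAttention(64)(feat))  # chained attention
```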

3.3. Perceptual Loss Calculation Module

Following the idea of the WGAN-VGG algorithm, a perceptual loss computation module needs to be added to the network so that the images it generates better match human vision. However, the pre-trained VGG19 [35] is a model trained on the natural-image dataset ImageNet, so it may only extract features appropriate to natural images; retraining VGG19 on the medical image domain is also difficult, because VGG19 is a classification network and no categorical dataset of medical images is available. We therefore consider another network, the Autoencoder model [36,37]. It extracts feature representations of data very well and has a wide range of applications in image reconstruction, clustering, and machine translation. Moreover, the model is very simple to train: ideally, the input image can be recreated from the input itself after passing through the Autoencoder. This makes it very suitable for feature extraction from medical images, and it can replace the VGG module of WGAN-VGG in computing the perceptual loss. The perceptual loss using the Autoencoder is given in Equation (3): the MSE between the LDCT and NDCT images is computed between the feature maps produced by the pre-trained encoder of the Autoencoder.
$L_{\text{perc}} = \mathbb{E}_{I_{LD}, I_{ND}} \left[ \dfrac{ \left\| \phi(I_{LD}) - \phi(I_{ND}) \right\|_F^2 }{ CHW } \right] \quad (3)$
In Equation (3), $I_{LD}$ refers to the LDCT image, $I_{ND}$ to the NDCT image, $\phi(\cdot)$ to the function fitted by the encoding part of the Autoencoder, and $C$, $H$, and $W$ to the number of channels, height, and width, respectively, of the last layer of the encoding part.
In this project, we use the Autoencoder shown in Figure 2. The input is a (1, 256, 256) image, followed by two convolutional layers with 64 kernels each and a max pooling layer that halves the width and height of the features. Next come two convolutional layers with 128 kernels each, again followed by a max pooling layer with the same purpose, and then four convolutional layers with 256 kernels each; this constitutes the encoder. The perceptual loss is computed as the distance between the final encoder features. The decoder is symmetric to the encoder, except that transposed convolutions correspond to the encoder’s pooling layers. During pre-training, images are fed into the network and the MSE between the output and the input itself is used as the training loss.
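A minimal sketch of the encoder in Figure 2 and the perceptual loss of Equation (3) follows. Kernel sizes are not stated in the text, so 3 × 3 convolutions with padding 1 are assumed, and the symmetric decoder used only for pre-training is omitted.

```python
import torch
import torch.nn as nn

def conv_block(n_convs, c_in, c_out):
    layers = []
    for i in range(n_convs):
        layers += [nn.Conv2d(c_in if i == 0 else c_out, c_out, 3, padding=1),
                   nn.ReLU()]
    return layers

# Encoder of Figure 2: 2xConv(64) -> pool -> 2xConv(128) -> pool -> 4xConv(256)
encoder = nn.Sequential(
    *conv_block(2, 1, 64), nn.MaxPool2d(2),    # -> (64, 128, 128)
    *conv_block(2, 64, 128), nn.MaxPool2d(2),  # -> (128, 64, 64)
    *conv_block(4, 128, 256),                  # -> (256, 64, 64) final features
)

def perceptual_loss(phi, i_ld, i_nd):
    """Equation (3): squared feature distance averaged over C*H*W
    (mse_loss with mean reduction also averages over the batch)."""
    return nn.functional.mse_loss(phi(i_ld), phi(i_nd))

loss = perceptual_loss(encoder, torch.randn(2, 1, 256, 256),
                       torch.randn(2, 1, 256, 256))
```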

3.4. Noise Reduction Network Design

The overall architecture of the proposed denoising network is shown in Figure 3, which mainly includes the generator network, the discriminator network, and the AutoEncoder network for feature extraction. The generator network is responsible for generating high-quality images from LDCT images, while the discriminator network is used to judge the authenticity of the generated images. In the entire network, short connections and deconvolution layers are introduced to accelerate the training process and retain more image details. The specific structures of the generator network and discriminator are shown in Figure 4 and Figure 5, which use symmetrically arranged convolutional layers and deconvolution layers to explore the ability of the autoencoder to handle noise samples.
In the network, the Autoencoder network is used as a feature extractor, mainly used to calculate perceptual loss. By extracting features through the Autoencoder network, the similarity between the generated image and the real image in the high-level feature space can be more accurately measured, thereby improving the quality of the generated image. The discriminator network is used to distinguish between the generated image and the real image. Through adversarial training with the generator network, the generator can generate more realistic images. In general, the network successfully improved the quality and detail retention of LDCT images through adversarial training of the generator and the discriminator, as well as the encoder–decoder structure with short connections and deconvolution layers.
Building on the proposed network, we adopt WGAN-GP (Wasserstein Generative Adversarial Network with Gradient Penalty) as the adversarial framework to further improve the quality of the generated images and the stability of training. The training goal becomes minimizing the Wasserstein distance between the CT images produced by the generator and real CT images, which better measures the difference between the generated and real images and thereby improves the realism of the outputs. As introduced in Section 1, a gradient penalty term is used to make training more stable, and, as described above, a perceptual loss is added so that the generated images better match human visual perception.
Following the idea of the WGAN-VGG algorithm, this project also uses the WGAN-GP framework because it makes the training process more stable. As can be seen in Figure 3, the overall network can be divided into four main parts: the noise extraction module, the perceptual loss calculation module, the generator module, and the discriminator module. The generator contains nine convolutional layers; the first eight layers each use 64 convolutional kernels of size 3 × 3, and the last layer has a single output channel in order to produce a CT image. The discriminator network contains seven convolutional layers followed by three fully connected layers that output the discriminative result; the numbers of neurons in the three fully connected layers are 2048, 1024, and 1, respectively.
The final optimization objective is shown in Equation (4), where $x$ represents the input to the generator, $y$ represents the NDCT image, and $\hat{x}$ represents a sample drawn from the distribution between the true distribution and the generated distribution. The overall training process is shown in Algorithm 1.
$\min_G \max_D L_{WGAN} = -\mathbb{E}_y [D(y)] + \mathbb{E}_x [D(G(x))] + \lambda \, \mathbb{E}_{\hat{x}} \left[ \left( \left\| \nabla_{\hat{x}} D(\hat{x}) \right\|_2 - 1 \right)^2 \right] + L_{\text{perc}} \quad (4)$
Algorithm 1 Training process of the noise reduction algorithm
Require: hyperparameters $epoch = 300$, $\lambda_1 = 0.1$, $\lambda_2 = 1$, $d_{iter} = 4$, patch size $256 \times 256$; initialized generator parameters $W_G$; initialized discriminator parameters $W_D$; pre-trained Autoencoder parameters; pre-trained noise extraction module $Noise_{extractor}$
Ensure: trained final noise reduction network
 1: for $num_{epoch} = 0, \ldots, epoch$ do
 2:    Sample an LDCT/NDCT pair of the given patch size from the training set; denote the LDCT patch as $x$ and the NDCT patch as $y$
 3:    for $t = 1, \ldots, d_{iter}$ do
 4:        $mask \leftarrow Noise_{extractor}(x)$
 5:        $x \leftarrow x + mask \cdot \gamma$
 6:        $\hat{x} = \epsilon y + (1 - \epsilon) \cdot G(x)$
 7:        $L_D = D(G(x)) - D(y) + \lambda_2 \left( \left\| \nabla_{\hat{x}} D(\hat{x}) \right\|_2 - 1 \right)^2$
 8:    end for
 9:    Update network $D$ parameters
10:    $L_G = -D(G(x)) + \lambda_1 L_{\text{perc}}(G(x), y)$
11:    Update network $G$ parameters
12: end for

3.5. Noise Reduction Network’s Generator

The generator network uses an encoder–decoder structure consisting of 8 layers: 4 convolutional layers and 4 deconvolutional layers. Each convolutional layer is connected to the corresponding deconvolutional layer by a short connection to ensure that more feature information is retained during deconvolution. These short connections not only accelerate the training process, but also effectively preserve detailed image information and improve reconstruction quality. The design of the generator network is shown in Figure 4: the encoder part extracts features layer by layer, while the decoder part recovers the image layer by layer. Each encoder layer performs 3D convolution, batch normalization, and LeakyReLU operations, and each deconvolution layer performs the same operations in turn, except for the last layer, which performs only convolution and LeakyReLU.
The convolutional kernels of the generator network are all of size 3 × 3 × 3, and the numbers of filters are arranged in the order 32, 64, 128, 256, 512, 256, 128, 64, 32, and 1. This design ensures the flexibility and effectiveness of the network in dealing with features at different levels. In the encoder part, the convolutional layers gradually increase the number of filters so as to extract higher-level feature information layer by layer; in the decoder part, the deconvolutional layers gradually decrease the number of filters to recover detailed image information layer by layer. Batch normalization and LeakyReLU activation follow each convolution and deconvolution operation to stabilize the training process and enhance the nonlinear representation capacity of the network.
By connecting the corresponding convolutional and deconvolutional layers through short connections, the generator network can effectively retain feature information during the deconvolution process to avoid information loss. In addition, the batch normalization in the network further improves the training stability and convergence speed, and the LeakyReLU activation function avoids the neuron “death” problem and enhances the robustness of the network. Through this design, the generator network can retain more details when processing LDCT images and improve the reconstruction quality of the images. In conclusion, the RED-WGAN generator network can effectively improve the quality and diagnostic value of LDCT images through reasonable structural design and optimization strategies.
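The sketch below captures the generator design described above: symmetric convolution/deconvolution stacks joined by additive short connections, each followed by batch normalization and LeakyReLU except the output layer. The text specifies 3 × 3 × 3 kernels; for single 256 × 256 slices this sketch uses 2D convolutions, and the reduced filter schedule is a simplifying assumption.

```python
import torch
import torch.nn as nn

def enc(c_in, c_out):  # convolution + batch norm + LeakyReLU
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1),
                         nn.BatchNorm2d(c_out), nn.LeakyReLU(0.2))

def dec(c_in, c_out):  # deconvolution + batch norm + LeakyReLU
    return nn.Sequential(nn.ConvTranspose2d(c_in, c_out, 3, padding=1),
                         nn.BatchNorm2d(c_out), nn.LeakyReLU(0.2))

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        chans = [32, 64, 128, 256]  # assumed reduced filter schedule
        self.encs = nn.ModuleList([enc(1, 32)] +
                                  [enc(a, b) for a, b in zip(chans, chans[1:])])
        self.decs = nn.ModuleList([dec(b, a) for a, b in zip(chans, chans[1:])][::-1])
        self.out = nn.Sequential(nn.Conv2d(32, 1, 3, padding=1), nn.LeakyReLU(0.2))

    def forward(self, x):
        skips = []
        for e in self.encs:
            x = e(x)
            skips.append(x)
        skips.pop()                 # deepest feature feeds the decoder directly
        for d in self.decs:
            x = d(x) + skips.pop()  # short connection: add encoder features
        return self.out(x)

g = Generator()
print(g(torch.randn(1, 1, 256, 256)).shape)  # torch.Size([1, 1, 256, 256])
```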

3.6. Noise Reduction Network’s Discriminator

The structure of the discriminator network is shown in Figure 5 and consists of four convolutional layers and one fully connected layer. Its main function is to judge whether the input image is a real image or a generated image. In the network design, the convolution kernel size of all convolutional layers is set to 3 × 3 × 3 to ensure the accuracy of feature extraction. Specifically, the first convolutional layer contains 32 filters, the second convolutional layer contains 64 filters, the third convolutional layer contains 128 filters, and the fourth convolutional layer contains 256 filters. By increasing the number of filters layer by layer, the discriminator is able to gradually extract different levels of features of the image and enhance the discriminative ability.
In the discriminator network, each convolutional layer is followed by an activation function and a pooling layer, whose purpose is to extract and compress the feature map so as to improve the compactness of the feature representation and the discriminative efficiency. The convolutional layers extract the spatial features of the image through 3D convolution, while the pooling layers reduce the size of the feature map through downsampling to avoid overfitting and improve computational efficiency. After the convolutional processing, the feature map enters the fully connected layer, which integrates the previously extracted features and outputs a discriminative result. The output value indicates the probability that the input image is a real image rather than a generated one, and the discriminator’s ability is gradually improved through adversarial training with the generator network.
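A compact sketch of the discriminator described above follows: four convolutional layers with 32/64/128/256 filters, each followed by an activation and a pooling layer, then a fully connected head producing a single score. 2D convolutions and LeakyReLU are assumptions here (the text states 3 × 3 × 3 kernels and does not name the activation), and as a WGAN critic the head outputs a raw score without a sigmoid.

```python
import torch
import torch.nn as nn

def block(c_in, c_out):  # convolution + activation + pooling
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1),
                         nn.LeakyReLU(0.2), nn.MaxPool2d(2))

discriminator = nn.Sequential(
    block(1, 32), block(32, 64), block(64, 128), block(128, 256),
    nn.Flatten(),                 # 256 x 16 x 16 features for a 256 x 256 input
    nn.Linear(256 * 16 * 16, 1),  # WGAN critic: raw realness score, no sigmoid
)

score = discriminator(torch.randn(1, 1, 256, 256))
```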

3.7. The Training Process and the Loss Function

Since the noise extraction module and the perceptual loss calculation module have already been trained, the training here concerns a generator G and a discriminator D. The generator network receives the noise-augmented input $x$ (Step 5 of Algorithm 1) and generates a fake NDCT image corresponding to the input. This “fake” NDCT image and the real NDCT image are then fed into the encoder part of the Autoencoder and the perceptual loss $L_{\text{perc}}$ is calculated, as shown in Equation (3). The other important losses are the generator loss $L_G$ and the discriminator loss $L_D$, shown in Equations (5) and (6), respectively. $\lambda_1$ and $\lambda_2$ are set to 0.1 and 1, respectively, in this experiment. Here, $\hat{x}$ refers to a sample drawn from the distribution between the true distribution and the generated distribution, which can be calculated by Equation (7), where $\epsilon$ is a random number between 0 and 1.
$L_G = -D(G(x)) + \lambda_1 L_{\text{perc}} \quad (5)$
$L_D = D(G(x)) - D(y) + \lambda_2 \left( \left\| \nabla_{\hat{x}} D(\hat{x}) \right\|_2 - 1 \right)^2 \quad (6)$
$\hat{x} = \epsilon y + (1 - \epsilon) \cdot G(x) \quad (7)$
In addition, it should be noted that in WGAN-GP training the discriminator is trained and updated $d_{iter}$ times before each generator update; $d_{iter} = 4$ is chosen in this experiment.
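To make the training procedure concrete, here is a sketch of one training iteration implementing Equations (5)–(7) and Algorithm 1 in PyTorch. `G`, `D`, `encoder` (the pre-trained Autoencoder encoder), and `noise_extractor` are assumed to be defined elsewhere; `gamma` is the noise scaling factor of Algorithm 1, whose value is not stated, and the noise augmentation is applied once here for simplicity rather than inside the inner loop.

```python
import torch
import torch.nn.functional as F

lambda1, lambda2, d_iter = 0.1, 1.0, 4

def gradient_penalty(D, y, g_x):
    eps = torch.rand(y.size(0), 1, 1, 1, device=y.device)
    x_hat = (eps * y + (1 - eps) * g_x).requires_grad_(True)  # Equation (7)
    grad = torch.autograd.grad(D(x_hat).sum(), x_hat, create_graph=True)[0]
    return ((grad.flatten(1).norm(2, dim=1) - 1) ** 2).mean()

def train_step(G, D, encoder, noise_extractor, x, y, opt_G, opt_D, gamma=1.0):
    x = x + noise_extractor(x) * gamma      # noise-augmented input (Alg. 1, Step 5)
    for _ in range(d_iter):                 # discriminator is updated d_iter times
        g_x = G(x).detach()
        loss_D = (D(g_x).mean() - D(y).mean()
                  + lambda2 * gradient_penalty(D, y, g_x))  # Equation (6)
        opt_D.zero_grad(); loss_D.backward(); opt_D.step()
    g_x = G(x)
    perc = F.mse_loss(encoder(g_x), encoder(y))             # Equation (3)
    loss_G = -D(g_x).mean() + lambda1 * perc                # Equation (5)
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
```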

4. Results

4.1. Experimental Data

The AAPM (American Association of Physicists in Medicine) dataset is a standard dataset specifically used for LDCT image reconstruction and denoising. The dataset provides researchers and engineers with a wealth of real-world LDCT scan data for developing and validating various algorithms and techniques.
The AAPM dataset is mainly derived from the data of the 2016 Low-Dose CT Grand Challenge of the American Association of Physicists in Medicine. These data are provided by several top hospitals and research institutions and contain CT scan images of the human body, covering different parts such as the chest and abdomen. The samples in the AAPM-Mayo dataset are shown in Figure 6, where (a) is an LDCT abdominal image and (b) is an NDCT abdominal image.
The CT images in the AAPM dataset are stored in the DICOM (Digital Imaging and Communications in Medicine) format, a standard format for medical imaging that contains the image itself as well as a large amount of scan-related metadata (such as scan parameters and patient information). A typical slice size is 512 × 512 pixels, and the slice thickness (i.e., the distance between every two slices) may vary from 0.5 mm to 5 mm, depending on the scan settings and the target area. The AAPM-Mayo dataset contains chest CT images from 10 patients; each patient’s scan includes both standard dose and quarter dose images. Each patient contributes multiple CT image slices, and although the exact number varies from patient to patient, each scan typically produces several hundred slices; the numbers of standard dose and quarter dose slices are the same. In total, the AAPM-Mayo Low-Dose CT dataset contains approximately 6000 images.
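For readers unfamiliar with the format, the sketch below reads a folder of DICOM slices with pydicom and converts them to Hounsfield units using the standard rescale tags. The directory layout (`patient01/quarter_dose`, `patient01/full_dose`) is a hypothetical placeholder, not the dataset’s actual structure.

```python
from pathlib import Path
import numpy as np
import pydicom

def load_series(folder):
    """Load a folder of DICOM slices as a (num_slices, 512, 512) HU volume."""
    slices = [pydicom.dcmread(f) for f in sorted(Path(folder).glob("*.dcm"))]
    slices.sort(key=lambda s: float(s.ImagePositionPatient[2]))  # stack along z
    vol = np.stack([s.pixel_array.astype(np.float32) for s in slices])
    # Convert raw stored values to Hounsfield units via the DICOM rescale tags.
    return vol * float(slices[0].RescaleSlope) + float(slices[0].RescaleIntercept)

ldct = load_series("patient01/quarter_dose")  # hypothetical paths
ndct = load_series("patient01/full_dose")
```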

4.2. Evaluation Indicators

The evaluation indicators include PSNR, SSIM, and the Fréchet inception distance (FID). PSNR measures the difference between two images, for example between a compressed image and the original to evaluate compression quality, or between a restored image and the ground truth to evaluate a restoration algorithm. PSNR is defined in Equation (8), where $MAX_I$ is the maximum possible pixel value of the image and MSE is the mean squared error between the original and reconstructed images, defined in Equation (9); $I$ and $K$ are the original and reconstructed images, respectively, and $m$ and $n$ are the image dimensions.
$\text{PSNR} = 10 \log_{10} \left( \dfrac{MAX_I^2}{\text{MSE}} \right) \quad (8)$
$\text{MSE} = \dfrac{1}{mn} \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \left[ I(i,j) - K(i,j) \right]^2 \quad (9)$
SSIM is based on the assumption that the human eye extracts structural information from an image, and it agrees with human visual perception better than traditional metrics. It is a perceptual metric that quantifies the image quality degradation caused by processing such as data compression or transmission loss, considering changes in structural information, luminance, and contrast. SSIM is defined in Equation (10), where $x$ and $y$ are the two image patches being compared, $\mu_x$ and $\mu_y$ are their mean intensities, $\sigma_x^2$ and $\sigma_y^2$ are their variances, and $\sigma_{xy}$ is the covariance of $x$ and $y$; $C_1$ and $C_2$ are small constants that stabilize the division when the denominators are weak.
$\text{SSIM}(x, y) = \dfrac{ \left( 2 \mu_x \mu_y + C_1 \right) \left( 2 \sigma_{xy} + C_2 \right) }{ \left( \mu_x^2 + \mu_y^2 + C_1 \right) \left( \sigma_x^2 + \sigma_y^2 + C_2 \right) } \quad (10)$
The FID indicator, the Fréchet inception distance, is a metric used to evaluate generative models. It calculates the distance between two image distributions in the feature space of the Inception network, quantifying the similarity between generated and real images and thereby helping to judge the quality of the model. FID is defined in Equation (11), where $\mu_r$ and $\mu_g$ are the mean feature vectors of the real and generated images, respectively, $\Sigma_r$ and $\Sigma_g$ are the corresponding covariance matrices, and Tr denotes the trace of a matrix.
$\text{FID} = \left\| \mu_r - \mu_g \right\|^2 + \text{Tr} \left( \Sigma_r + \Sigma_g - 2 \left( \Sigma_r \Sigma_g \right)^{1/2} \right) \quad (11)$
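The sketch below implements the three metrics: PSNR follows Equations (8) and (9) directly, SSIM uses scikit-image’s implementation of Equation (10), and FID (Equation (11)) is shown on precomputed feature arrays, which in practice come from a pretrained Inception network.

```python
import numpy as np
from scipy.linalg import sqrtm
from skimage.metrics import structural_similarity

def psnr(i, k, max_i=1.0):
    mse = np.mean((i - k) ** 2)             # Equation (9)
    return 10 * np.log10(max_i ** 2 / mse)  # Equation (8)

def fid(feats_real, feats_gen):
    """Equation (11) on (N, D) feature arrays from an Inception network."""
    mu_r, mu_g = feats_real.mean(0), feats_gen.mean(0)
    sig_r = np.cov(feats_real, rowvar=False)
    sig_g = np.cov(feats_gen, rowvar=False)
    covmean = sqrtm(sig_r @ sig_g).real     # matrix square root of the product
    return np.sum((mu_r - mu_g) ** 2) + np.trace(sig_r + sig_g - 2 * covmean)

rng = np.random.default_rng(0)
a, b = rng.random((64, 64)), rng.random((64, 64))
print(psnr(a, b), structural_similarity(a, b, data_range=1.0))  # Eqs. (8), (10)
print(fid(rng.random((100, 16)), rng.random((100, 16))))        # Eq. (11)
```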

4.3. Noise Extraction Network Module Experiment

As shown in Figure 7, the input of the noise extraction network module is an LDCT image, and the output is the corresponding noise distribution map. To show the noise distribution more intuitively, we binarized the noise features and generated a clear noise distribution map. The CT images fed to the network are pre-processed low-dose scan data; significant noise is introduced during imaging, and the network’s purpose is to extract and suppress this noise. The output label is the binarized noise of the LDCT image; representing the noise as a binary map lets the network learn and identify the distribution characteristics of the noise more directly.
To evaluate the independent contribution of each attention module, we removed each module separately and recorded the change in model performance, comparing the complete model, which includes all attention modules, with models lacking a single module. In these experiments, we tested different combinations of the three types of attention modules; comparing their performance clarifies the contribution of each module to the denoising performance. Six additional networks were designed, each representing a different combination of attention modules, with the full model denoted MANet: SA (spatial attention only), CA (channel attention only), LA (scale attention only), SACA (spatial + channel attention), SALA (spatial + scale attention), CALA (channel + scale attention), and MANet (all attention). A configuration sketch is given below.
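A minimal sketch of how these ablation variants can be instantiated: the same backbone with each attention mechanism toggled by a flag. `NoiseExtractor` is a stub standing in for the Figure 1 network; the real constructor is not shown in the paper.

```python
class NoiseExtractor:  # stub standing in for the noise extraction network
    def __init__(self, spatial=False, channel=False, scale=False):
        self.spatial, self.channel, self.scale = spatial, channel, scale

variants = {
    "SA":    dict(spatial=True),
    "CA":    dict(channel=True),
    "LA":    dict(scale=True),
    "SACA":  dict(spatial=True, channel=True),
    "SALA":  dict(spatial=True, scale=True),
    "CALA":  dict(channel=True, scale=True),
    "MANet": dict(spatial=True, channel=True, scale=True),
}
models = {name: NoiseExtractor(**flags) for name, flags in variants.items()}
```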
These six networks were retrained using the same experimental configuration as before (including the same hyperparameter settings and loss functions). To observe the performance of these networks during training, we recorded the change in their training loss values over time. Figure 8 shows the loss reduction graphs during the training process for different combinations of attention modules. As can be seen from Figure 8, all networks showed a significant reduction in loss during the early stages of training (approximately the first 15 epochs). However, around the 20th epoch, the loss values for most networks began to stabilize and it became difficult for them to decrease further. This indicates that the networks had approached their optimal performance under the current configuration by this time.
As seen in Figure 8, the MANet model, which includes all three attention modules (spatial, channel, and scale attention), demonstrates the best performance in terms of both training loss and L1 loss on the test set. The SA, CA, and LA models, each utilizing only one type of attention module, showed some improvement in denoising performance. However, their performance did not match that of networks combining two or three attention modules. Among them, the SA module performed the best, indicating that spatial attention has a significant advantage in handling multi-scale noise. The SACA, SALA, and CALA combinations outperformed the single attention modules, demonstrating that the synergy of multiple attention mechanisms can effectively enhance denoising performance. The SACA combination performed nearly the best among all combinations, validating the advantage of combining spatial and channel attention for LDCT image denoising. The full MANet model, integrating spatial, channel, and scale attention modules, achieved the lowest training loss and test loss values. This indicates that a comprehensive consideration of multiple attention mechanisms can more thoroughly address and suppress noise in LDCT images.
Figure 9 shows the prediction results of the networks with different attention mechanism combinations on the test image. Each column in the figure corresponds to one network model or attention combination, and the red rectangle with enlarged details shows each model’s detection of lung nodules. The MANet model proposed in this paper performs best in noise control and detail restoration and is closest to the true label. The networks with other attention combinations also show clear improvements, though each has its own emphasis: the SA+CA and CA+LA combinations are relatively balanced between detail restoration and noise control, while the SA+LA combination performs well on details but slightly worse on noise control. The attention mechanisms used alone (SA, CA, LA) are less effective in detail restoration and noise control than the combinations, but each has its own advantages.

4.4. Noise Reduction Network Experiment: Results and Analysis

The AAPM-Mayo dataset contains chest CT image data of 10 patients; each patient’s scan data include standard dose and quarter dose images. Each patient has about 600 images, and the total dataset contains 6182 images. The data of seven patients are used to train the model, the data of two patients are used for testing, and the data of one patient are used to validate the model. Our comparison methods include RED-CNN, WGAN-VGG, and WGAN-Autoencoder (WGAN-AE), and the test indicators include PSNR, SSIM, and FID.
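A minimal sketch of this patient-level split (7 train / 2 test / 1 validation); the patient identifiers are hypothetical placeholders.

```python
patients = [f"patient{i:02d}" for i in range(1, 11)]  # 10 patients in the dataset
train_ids, test_ids, val_ids = patients[:7], patients[7:9], patients[9:]
```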
As can be seen from Table 2, the proposed WGAN-ATT-AE network achieves the highest PSNR and SSIM values, while WGAN-VGG achieves the best FID. The latter is expected: FID is itself computed with a network pre-trained on a natural image dataset, so the WGAN-VGG network, which uses the VGG network, naturally obtains the smallest FID. Taking the three indicators together, however, the proposed WGAN-ATT-AE achieves better PSNR and SSIM with an FID close to that of WGAN-VGG, and can therefore be considered superior to WGAN-VGG in overall noise reduction. These results are probably due to the noise extraction module’s better fitting of LDCT noise and the pre-trained Autoencoder module’s better extraction of CT image features.
To further verify the effectiveness of the attention modules in the proposed noise extraction network, ablation experiments were conducted on the three attention modules using the six additional networks described in Section 4.3 (SA, CA, LA, SACA, SALA, and CALA, covering all combinations of the spatial, channel, and scale attention modules). Retraining these six networks with the same experimental configuration as before yields the training loss curves shown in Figure 8.
Figure 10 and Figure 11 illustrate the experimental results of our study on noise reduction in LDCT images using different methods. Figure 10 presents a side-by-side comparison of abdominal CT scans processed with the different noise reduction methods: (a) LDCT, (b) RED-CNN, (c) WGAN-VGG, (d) WGAN-AE, (e) WGAN-ATT-AE, and (f) NDCT. The WGAN-ATT-AE method shows superior noise reduction compared to the other methods, closely approaching the quality of the normal dose CT, as is evident in the clearer visualization of anatomical structures. WGAN-ATT-AE maintains a high level of detail, similar to NDCT, while effectively reducing noise, unlike LDCT, which retains significant noise. Compared to RED-CNN and WGAN-VGG, WGAN-ATT-AE and WGAN-AE provide a better balance between noise reduction and detail preservation, and WGAN-ATT-AE offers the most visually coherent and detailed image among the tested methods.
The proposed WGAN-ATT-AE method demonstrates superior performance in both noise reduction and detail preservation, as seen in the overall comparison and the enlarged view. This visual evidence suggests that WGAN-ATT-AE is highly effective for enhancing the quality of LDCT images, potentially improving diagnostic accuracy while maintaining the advantages of low radiation exposure. These visual comparisons highlight the significant improvements achieved by our proposed method in terms of noise reduction and image quality enhancement, making it a promising tool for clinical applications in LDCT imaging.
Figure 11 shows an enlarged view of a specific region from the same set of abdominal CT scans used in Figure 10, focusing on the details provided by each method. The region in the LDCT image shows a significant amount of grainy noise, which obscures finer details and makes it difficult to interpret anatomical structures accurately. While RED-CNN removes some noise, it still leaves considerable granularity, and some details are smoothed out. The WGAN-VGG method reduces noise effectively but introduces slight blurring, which could affect the clarity of fine structures. The WGAN-AE method provides a clearer image than the previous methods, with better detail retention and reduced noise. Our proposed method shows the best performance, significantly reducing noise while preserving fine details and textures, closely matching the quality of the NDCT. As the ground truth, the NDCT image presents the highest quality, with minimal noise and clear anatomical details.
In both qualitative and quantitative analyses, we compared the proposed methods (WGAN-AE, WGAN-ATT-AE) and the baseline method (WGAN-VGG) on locally enlarged images. The results show that the proposed methods preserve important anatomical structures and detailed information while reducing noise, whereas the baseline method exhibits a greater degree of information loss. The proposed method maintains detail and structural integrity better in locally enlarged images and presents a clearer, more accurate magnified view than the baseline, further validating its superiority in preserving image quality and highlighting its potential value in clinical diagnosis.
In conclusion, the interpretation of the local magnification results in CT images in the comparative experiments provides a comprehensive evaluation of the performance of the proposed method in image enhancement tasks and highlights its superiority in maintaining image details and structural integrity. These findings not only help to verify the effectiveness of the proposed method, but also provide an important reference for its application in clinical practice and provide useful inspiration for future research and technology development.

5. Discussion

This study presents a novel supervised learning-based algorithm for LDCT denoising, distinguished by its innovative noise extraction techniques and multiple attention mechanisms. The primary finding is that our method significantly enhances LDCT image quality by effectively reducing noise while preserving critical diagnostic details. This advancement is pivotal for the field of medical imaging, as it enhances the diagnostic accuracy and safety of LDCT scans, potentially leading to improved clinical outcomes.
Our results align with and extend previous research in LDCT denoising. Traditional denoising methods, such as filtering and model-based approaches, often face challenges in balancing noise reduction with detail preservation. In contrast, our approach integrates spatial, channel, and scale attention mechanisms within a U-Net-like architecture, demonstrating superior performance in capturing diverse noise patterns and maintaining image integrity. Compared to existing deep learning-based methods, our algorithm shows improved noise suppression and image quality enhancement, as evidenced by extensive experiments on multiple standard datasets.
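To make the architectural description concrete, the sketch below gives minimal PyTorch implementations of channel and spatial attention blocks of the kind our network combines; the module names, reduction ratio, and kernel size are illustrative assumptions, not our exact configuration.

```python
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # global context per channel
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.fc(self.pool(x))  # reweight feature channels


class SpatialAttention(nn.Module):
    """Spatial attention computed from pooled channel statistics."""

    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)      # per-pixel mean over channels
        mx, _ = x.max(dim=1, keepdim=True)     # per-pixel max over channels
        mask = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * mask                        # reweight spatial positions
```

Blocks like these are typically inserted after convolutional stages of the encoder and decoder paths so that the network learns where, and in which feature channels, noise dominates.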
Although we retrained the autoencoder network to adapt to LDCT image characteristics, its feature extraction capabilities are still constrained by the training data. The autoencoder may not fully capture all relevant image features under different noise levels or image modalities. The perceptual loss, used to improve the visual quality of generated images, relies on the accurate representation of the input image by the autoencoder. Under high noise levels or non-standard conditions, the perceptual loss may not accurately reflect image quality, potentially affecting the results.
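The coupling between the autoencoder and the perceptual loss described above can be summarized in a short sketch, assuming the frozen encoder half of the retrained autoencoder is exposed as a callable `encoder`; the mean-squared error over feature maps is one common formulation, shown here for illustration rather than as our verbatim implementation.

```python
import torch
import torch.nn.functional as F


def perceptual_loss(encoder, denoised, target):
    """MSE between encoder feature maps of denoised and reference images.

    `encoder` stands in for the frozen encoder half of the retrained
    autoencoder; its depth and feature shapes are assumptions here.
    """
    with torch.no_grad():
        feat_target = encoder(target)      # NDCT reference features
    feat_denoised = encoder(denoised)      # generator-output features
    return F.mse_loss(feat_denoised, feat_target)
```

Because this loss is only as informative as the encoder's features, the limitation noted above follows directly: feature maps extracted from inputs far outside the training distribution may no longer correlate with perceived image quality.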
To build on our findings, future research should aim to validate our method on larger and more diverse clinical datasets to ensure robustness and generalizability. Additionally, exploring the integration of other image enhancement techniques and adapting our model to various clinical applications could further improve its performance and utility. By addressing these limitations, we can enhance the clinical diagnostic process, improve disease detection, and optimize treatment planning, ultimately ensuring high-quality diagnostic outcomes while minimizing patient radiation exposure.

6. Conclusions

This study presents a novel approach to noise reduction in LDCT images by integrating a retrained autoencoder network and perceptual loss computation within the framework of a Wasserstein Generative Adversarial Network with Gradient Penalty (WGAN-GP). Our findings demonstrate that this method significantly enhances image quality while effectively mitigating the noise associated with low radiation doses. Throughout this research, we addressed several key challenges in medical imaging. First, by employing autoencoder-based feature extraction, we ensured that our model could capture and preserve essential image details, even under noisy conditions. Second, the integration of GAN-based noise reduction techniques allowed us to leverage the powerful generative capabilities of WGAN-GP to produce high-fidelity images that closely approximate NDCT scans. This combination has shown notable improvements in PSNR, SSIM, and FID, highlighting its robustness and effectiveness.
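For readers less familiar with WGAN-GP, the gradient penalty that stabilizes critic training can be sketched as follows; `critic`, the interpolation scheme, and the conventional weight lambda_gp = 10 are standard choices shown for illustration, not a verbatim excerpt of our training code.

```python
import torch


def gradient_penalty(critic, real, fake, lambda_gp: float = 10.0):
    """Standard WGAN-GP penalty on gradients at interpolated samples."""
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
    scores = critic(interp)
    grads = torch.autograd.grad(
        outputs=scores, inputs=interp,
        grad_outputs=torch.ones_like(scores),
        create_graph=True, retain_graph=True,
    )[0]
    grad_norm = grads.view(grads.size(0), -1).norm(2, dim=1)
    return lambda_gp * ((grad_norm - 1) ** 2).mean()
```

The penalty encourages the critic to be approximately 1-Lipschitz, which keeps the Wasserstein distance estimate well behaved and, in our setting, yields stable adversarial training of the denoising generator.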

Author Contributions

W.Z. devised the project, the main conceptual ideas and proof outline. F.J. worked out almost all of the technical details, and performed the numerical calculations for the suggested experiment. C.Y. worked out the bound for quantum mechanics, with help from A.S., who verified the numerical results by an independent implementation. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the National Natural Science Foundation of China (No. 62076080) and the Natural Science Foundation of Chongqing (No. CSTB2022NSCQ-MSX0922).

Data Availability Statement

The data presented in this study are available at https://github.com/sufangbing/desktop-tutorial (accessed on 30 October 2023).

Acknowledgments

We express our sincere gratitude to all individuals and institutions whose contributions made this research possible. We extend our appreciation to Rui Liang for their invaluable guidance and support throughout the project.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
CT	Computed Tomography
LDCT	Low-Dose Computed Tomography
NDCT	Normal-Dose Computed Tomography
FBP	Filtered Back-Projection
ICM	Iterated Conditional Modes
CS	Compressed Sensing
PICCS	Prior Image Constrained Compressed Sensing
FCCS	Feature-Constrained Compressed Sensing
BF	Bilateral Filter
NLM	Non-Local Means
WIA-LN	Weighted Intensity Averaging over Large-scale Neighborhoods
FDA	Food and Drug Administration

References

  1. Chen, H.; Zhang, Y.; Zhang, W.; Liao, P.; Li, K.; Zhou, J.; Wang, G. Low-dose CT via convolutional neural network. Biomed. Opt. Express 2017, 8, 679–694. [Google Scholar] [CrossRef]
  2. Naidich, D.P.; Marshall, C.H.; Gribbin, C.; Arams, R.S.; McCauley, D.I. Low-dose CT of the lungs: Preliminary observations. Radiology 1990, 175, 729–731. [Google Scholar] [CrossRef]
  3. Moen, T.R.; Chen, B.; Holmes, D.R., III; Duan, X.; Yu, Z.; Yu, L.; Leng, S.; Fletcher, J.G.; McCollough, C.H. Low-dose CT image and projection dataset. Med. Phys. 2021, 48, 902–911. [Google Scholar] [CrossRef]
  4. Wolterink, J.M.; Leiner, T.; Viergever, M.A.; Išgum, I. Generative adversarial networks for noise reduction in low-dose CT. IEEE Trans. Med. Imaging 2017, 36, 2536–2545. [Google Scholar] [CrossRef]
  5. Chen, H.; Zhang, Y.; Kalra, M.K.; Lin, F.; Chen, Y.; Liao, P.; Zhou, J.; Wang, G. Low-dose CT with a residual encoder-decoder convolutional neural network. IEEE Trans. Med. Imaging 2017, 36, 2524–2535. [Google Scholar] [CrossRef]
  6. Yang, W.; Zhang, H.; Yang, J.; Wu, J.; Yin, X.; Chen, Y.; Shu, H.; Luo, L.; Coatrieux, G.; Gui, Z.; et al. Improving low-dose CT image using residual convolutional network. IEEE Access 2017, 5, 24698–24705. [Google Scholar] [CrossRef]
  7. Liu, J.; Hu, Y.; Yang, J.; Chen, Y.; Shu, H.; Luo, L.; Feng, Q.; Gui, Z.; Coatrieux, G. 3D feature constrained reconstruction for low-dose CT imaging. IEEE Trans. Circuits Syst. Video Technol. 2016, 28, 1232–1247. [Google Scholar] [CrossRef]
  8. Yang, L.; Li, Z.; Ge, R.; Zhao, J.; Si, H.; Zhang, D. Low-dose CT denoising via sinogram inner-structure transformer. IEEE Trans. Med. Imaging 2022, 42, 910–921. [Google Scholar] [CrossRef] [PubMed]
  9. Fan, F.; Shan, H.; Kalra, M.K.; Singh, R.; Qian, G.; Getzin, M.; Teng, Y.; Hahn, J.; Wang, G. Quadratic autoencoder (Q-AE) for low-dose CT denoising. IEEE Trans. Med. Imaging 2019, 39, 2035–2050. [Google Scholar] [CrossRef]
  10. Gholizadeh-Ansari, M.; Alirezaie, J.; Babyn, P. Deep learning for low-dose CT denoising using perceptual loss and edge detection layer. J. Digit. Imaging 2020, 33, 504–515. [Google Scholar] [CrossRef] [PubMed]
  11. You, C.; Yang, Q.; Shan, H.; Gjesteby, L.; Li, G.; Ju, S.; Zhang, Z.; Zhao, Z.; Zhang, Y.; Cong, W.; et al. Structurally-sensitive multi-scale deep neural network for low-dose CT denoising. IEEE Access 2018, 6, 41839–41855. [Google Scholar] [CrossRef] [PubMed]
  12. Zhao, T.; McNitt-Gray, M.; Ruan, D. A convolutional neural network for ultra-low-dose CT denoising and emphysema screening. Med. Phys. 2019, 46, 3941–3950. [Google Scholar] [CrossRef] [PubMed]
  13. Ma, Y.; Wei, B.; Feng, P.; He, P.; Guo, X.; Wang, G. Low-dose CT image denoising using a generative adversarial network with a hybrid loss function for noise learning. IEEE Access 2020, 8, 67519–67529. [Google Scholar] [CrossRef]
  14. Nishio, M.; Nagashima, C.; Hirabayashi, S.; Ohnishi, A.; Sasaki, K.; Sagawa, T.; Hamada, M.; Yamashita, T. Convolutional auto-encoder for image denoising of ultra-low-dose CT. Heliyon 2017, 3, 332–339. [Google Scholar] [CrossRef] [PubMed]
  15. Heinrich, M.P.; Stille, M.; Buzug, T.M. Residual U-net convolutional neural network architecture for low-dose CT denoising. Curr. Dir. Biomed. Eng. 2018, 4, 297–300. [Google Scholar] [CrossRef]
  16. Zhang, Y.; Yi, B.; Wu, C.; Feng, Y. Low-dose CT image denoising method based on convolutional neural network. Acta Opt. Sin. 2018, 38, 123–129. [Google Scholar]
  17. Trung, N.T.; Trinh, D.H.; Trung, N.L.; Luong, M. Low-dose CT image denoising using deep convolutional neural networks with extended receptive fields. Signal Image Video Process 2022, 16, 1963–1971. [Google Scholar] [CrossRef]
  18. Gu, J.; Ye, J.C. AdaIN-based tunable CycleGAN for efficient unsupervised low-dose CT denoising. IEEE Trans. Comput. Imaging 2021, 7, 73–85. [Google Scholar] [CrossRef]
  19. Zhang, X.; Han, Z.; Shangguan, H.; Han, X.; Cui, X.; Wang, A. Artifact and detail attention generative adversarial networks for low-dose CT denoising. IEEE Trans. Comput. Imaging 2021, 40, 3901–3918. [Google Scholar] [CrossRef]
  20. Zhao, T.; Hoffman, J.; McNitt-Gray, M.; Ruan, D. Ultra-low-dose CT image denoising using modified BM3D scheme tailored to data statistics. Med. Phys. 2019, 46, 190–198. [Google Scholar] [CrossRef]
  21. Zhang, J.; Shangguan, Z.; Gong, W.; Cheng, Y. A novel denoising method for low-dose CT images based on transformer and CNN. Comput. Biol. Med. 2023, 163, 5673–5682. [Google Scholar] [CrossRef] [PubMed]
  22. Sagheer, S.V.M.; George, S.N. Denoising of low-dose CT images via low-rank tensor modeling and total variation regularization. Artif. Intell. Med. 2019, 94, 1–17. [Google Scholar] [CrossRef]
  23. Yi, X.; Babyn, P. Sharpness-aware low-dose CT denoising using conditional generative adversarial network. J. Digit. Imaging 2018, 31, 655–669. [Google Scholar] [CrossRef] [PubMed]
  24. Li, H.; Yang, X.; Yang, S.; Wang, D.; Jeon, G. Transformer with double enhancement for low-dose CT denoising. IEEE J. Biomed. Health Informatics 2022, 27, 4660–4671. [Google Scholar] [CrossRef] [PubMed]
  25. Yuan, Q.; Peng, Z.; Chen, Z.; Guo, Y.; Yang, B.; Zeng, X. Edge-preserving median filter and weighted coding with sparse nonlocal regularization for low-dose ct image denoising algorithm. J. Healthc. Eng. 2021, 1, 695–712. [Google Scholar] [CrossRef] [PubMed]
  26. Wagner, F.; Thies, M.; Denzinger, F.; Gu, M.; Patwari, M.; Ploner, S.; Maul, N.; Pfaff, L.; Huang, Y.; Maier, A. Trainable joint bilateral filters for enhanced prediction stability in low-dose CT. Sci. Rep. 2022, 12, 306–313. [Google Scholar] [CrossRef]
  27. Ma, J.; Huang, J.; Feng, Q.; Zhang, H.; Lu, H.; Liang, Z.; Chen, W. Low-dose computed tomography image restoration using previous normal-dose scan. Med. Phys. 2011, 38, 5713–5731. [Google Scholar] [CrossRef]
  28. Nguyen, L.K.; Wong, D.D.; Fatovich, D.M.; Yeung, J.M.; Persaud, J.; Wood, C.J.; de Vos, D.; Mendelson, R.M. Low-dose computed tomography versus plain abdominal radiography in the investigation of an acute abdomen. ANZ J. Surg. 2012, 82, 36–41. [Google Scholar] [CrossRef] [PubMed]
  29. Tian, Z.; Jia, X.; Yuan, K.; Pan, T.; Jiang, S.B. Low-dose CT reconstruction via edge-preserving total variation regularization. Phys. Med. Biol. 2011, 56, 49–59. [Google Scholar] [CrossRef]
  30. Liu, J.; Zhang, T.; Kang, Y.; Wang, Y.; Zhang, Y.; Hu, D.; Chen, Y. Deep residual constrained reconstruction via learned convolutional sparse coding for low-dose CT imaging. Biomed. Signal Process. Control 2023, 85, 104–116. [Google Scholar] [CrossRef]
  31. Chen, Y.; Shi, L.; Feng, Q.; Yang, J.; Shu, H.; Luo, L.; Coatrieux, J.L.; Chen, W. Artifact suppressed dictionary learning for low-dose CT image processing. IEEE Trans. Med Imaging 2014, 33, 2271–2292. [Google Scholar] [CrossRef] [PubMed]
  32. Geraldo, R.J.; Cura, L.M.V.; Cruvinel, P.E.; Mascarenhas, N.D. Low dose CT filtering in the image domain using MAP algorithms. IEEE Trans. Med. Imaging 2016, 1, 56–67. [Google Scholar]
  33. Chaudhari, A.; Chaudhary, P.; Cheeran, A.N.; Aswani, Y. Improving signal to noise ratio of low-dose CT image using wavelet transform. Int. J. Eng. Comput. Sci. 2012, 4, 779–787. [Google Scholar]
  34. Lu, S.; Yang, B.; Xiao, Y.; Liu, S.; Liu, M.; Yin, L.; Zheng, W. Iterative reconstruction of low-dose CT based on differential sparse. Biomed. Signal Process. Control 2023, 79, 104–116. [Google Scholar] [CrossRef]
  35. Yang, Q.; Yan, P.; Zhang, Y.; Yu, H.; Shi, Y.; Mou, X.; Kalra, M.K.; Zhang, Y.; Sun, L.; Wang, G. Low-dose CT image denoising using a generative adversarial network with Wasserstein distance and perceptual loss. IEEE Trans. Med. Imaging 2018, 37, 1348–1357. [Google Scholar] [CrossRef]
  36. Ming, J.; Yi, B.; Zhang, Y.; Li, H. Low-dose CT image denoising using classification densely connected residual network. KSII Trans. Internet Inf. Syst. 2020, 14, 2480–2496. [Google Scholar]
  37. Liu, Y.; Zhang, Y. Low-dose CT restoration via stacked sparse denoising autoencoders. Neurocomputing 2018, 284, 80–89. [Google Scholar] [CrossRef]
Figure 1. Overall network structure of noise extraction.
Figure 2. Autoencoder network structure diagram.
Figure 3. Schematic diagram of the overall structure of our proposed denoising network.
Figure 4. Generator diagram.
Figure 5. Discriminator diagram.
Figure 6. Samples from the AAPM-Mayo dataset, where (a) is an LDCT abdominal image and (b) is an NDCT abdominal image.
Figure 7. Input and label images for the noise extraction network.
Figure 8. Training loss curve.
Figure 9. Predicted results of networks with various attention mechanism combinations on test images.
Figure 10. Comparison of LDCT image denoising results using different methods.
Figure 11. Comparison of local magnification of low-dose CT image denoising results using different methods.
Table 1. Summary of LDCT image denoising research.

Methods | Representative Studies | Limitations
Projection domain | Median filtering [25], bilateral filtering [26], EM algorithm [27] | Sensitive to noise
Iterative reconstruction | TV reconstruction [28], sparse coding [29], dictionary training [30] | High cost
Post-processing method | Gaussian filter [31], wavelet transform [32], Fourier transform [33], RED-CNN [34], WGAN-VGG [4] | Need large dataset
Table 2. Results of different network models on the test set.

Model | PSNR | SSIM | FID
LDCT | 24.854 | 0.730 | 45.093
RED-CNN | 32.148 | 0.794 | 37.851
WGAN-VGG | 28.530 | 0.841 | 25.875
WGAN-AE | 28.736 | 0.847 | 27.748
WGAN-ATT-AE | 32.631 | 0.858 | 26.082