Article

Reconstruction of Optical Coherence Tomography Images from Wavelength Space Using Deep Learning

1 Faculty of Information Technology and Communication Sciences, Tampere University, 33720 Tampere, Finland
2 Institute of Optical Materials and Technologies, Bulgarian Academy of Sciences, 1113 Sofia, Bulgaria
* Author to whom correspondence should be addressed.
Sensors 2025, 25(1), 93; https://doi.org/10.3390/s25010093
Submission received: 11 November 2024 / Revised: 19 December 2024 / Accepted: 24 December 2024 / Published: 27 December 2024
(This article belongs to the Section Sensing and Imaging)

Abstract

Conventional Fourier domain Optical Coherence Tomography (FD-OCT) systems depend on resampling into the wavenumber (k) domain to extract the depth profile. This either necessitates additional hardware resources or amplifies the existing computational complexity. Moreover, OCT images also suffer from speckle noise due to the systems' reliance on low-coherence interferometry. We propose a streamlined and computationally efficient approach based on Deep Learning (DL) that reconstructs speckle-reduced OCT images directly from the wavelength (λ) domain. For reconstruction, two encoder–decoder styled networks, namely the Spatial Domain Convolution Neural Network (SD-CNN) and the Fourier Domain CNN (FD-CNN), are used sequentially. The SD-CNN exploits the highly degraded images obtained by Fourier transforming the λ-domain fringes to reconstruct the deteriorated morphological structures while suppressing unwanted noise. The FD-CNN leverages this output to further enhance the image quality via optimization in the Fourier domain (FD). We quantitatively and visually demonstrate the efficacy of the method in obtaining high-quality OCT images. Furthermore, we illustrate the reduction in computational complexity achieved by harnessing the power of DL models. We believe that this work lays the framework for further innovations in the realm of OCT image reconstruction.

1. Introduction

Recent advancements in machine vision, expedited by Deep Learning (DL) methods, have revolutionized the fields of image enhancement, reconstruction, classification, and feature extraction, with numerous applications in biomedical or biological imaging [1] and healthcare [2]. The benchmark results have motivated researchers to use DL models like Convolutional Neural Networks (CNNs) in several imaging modalities, such as Magnetic Resonance Imaging (MRI), Computed Tomography (CT), and Optical Coherence Tomography (OCT), to reconstruct not only visually pleasing high-quality images but also medically significant and elucidative scans with reduced computational complexity. OCT is one such three-dimensional (3D) imaging technique based on interference phenomena. It is a high-speed, non-invasive, and highly sensitive imaging modality. Nonetheless, certain inherent characteristics in the fundamental operation of OCT impose constraints on its imaging capabilities, including resolution, depth penetration, signal-to-noise ratio (SNR), and time complexity. Among the Fourier Domain (FD) types of OCT systems, Swept Source (SS) OCT is an advanced technology based on a sweeping laser source that enables real-time, 3D biomedical image acquisition at very high speeds [3,4,5]. The imaging performance is directly influenced by factors such as the sweep rate and range and the instantaneous linewidth of the laser source. A common drawback of most FD-OCT systems, whether swept source or spectral domain, is that the acquired OCT interference fringes must be linear in the wavenumber domain before they are subjected to the inverse Fourier transformation.
The widely used approach to address the problem of calibration and linearization in the wavenumber domain is to perform resampling by employing interpolation techniques [6]. Some methods resample using linear [7] or cubic spline [8] interpolation, or the Kaiser–Bessel window function [9]. Another way is to perform reconstruction using techniques like the Non-uniform Discrete Fourier Transform (DFT) [10], which can work with non-uniformly spaced data. Such techniques can introduce noise and artifacts and add computation time that limits the imaging rate. Additionally, these methods require a remapping function estimated via a separate calibration procedure and could potentially necessitate extra hardware.
Calibration and k-mapping are challenging issues for contemporary high-speed systems like “MHz-OCT” or “multi-MHz-OCT” systems. These systems are categorized by their A-scan rates, and wavelength-swept lasers are one of their key enabling components [11]. Several types of swept lasers are used, namely short cavity lasers [12], stretched pulse lasers [13], MEMS-VCSELs [14], and Fourier domain mode locked lasers (FDMLs) [15], to name a few. For all these ultrahigh A-scan rates, both resampling and calibration remain vital data-processing prerequisites. Recently, diverse alternatives have been employed using hardware-based techniques in place of the aforementioned resampling and calibration approaches. To sample the OCT fringes linearly with respect to the wavenumber domain, the OCT system was realized with an optical k-clock [16]. However, this approach exhibited reliability issues when operating beyond 1.3 giga-samples per second (GSPS) to reach multi-MHz A-scan rates, and the system was vulnerable to clocking glitches and inaccuracies in the sampling process. A more recent work [17] performs calibration and resampling between consecutive sweeps by utilizing dual-channel acquisition of the OCT signal. This approach effectively overcomes the intrinsic limitations of optical clocking, but at the same time it increases the complexity. Given that MEMS-based systems can encounter fluctuations in wavelength versus time over several sweeps, calibration is conducted for each A-scan, thus significantly increasing the computational burden. Akinetic lasers have been used by H. Lee et al. [18] to linearize the wavenumber domain, but their high cost makes them unsuitable for commercialization. The progress in imaging rates has been steered primarily by innovations in the fundamental OCT hardware. To improve performance further, these benchmark OCT systems could be advanced on the software side by exploiting deep learning techniques.
In addition to the wavenumber linearization issue, the image quality in OCT reconstruction is also degraded by the inevitable speckle caused by the coherent nature of the light source. The speckle-related artifacts can be suppressed by averaging successive B-scans when the morphology of the sample remains the same across these scans [19]. However, such iterations can be computationally complex and may blur the morphology variations when the averaging is excessive. Moreover, the dynamic nature of live samples, where movements occur, makes these methods ineffective and may introduce degradations in the images.
The hardware-based resampling and calibration methods demand high-performance supplementary components, whereas software-based approaches can be established on the already built-in system with minor alterations. Thus, instead of addressing the constraints of the aforementioned hardware methods, the OCT image reconstruction can be performed using DL techniques. DL-based reconstruction methods have gained significant attention in various imaging modalities like MRI [20,21,22], CT [23,24], ultrasonography [25,26], and many others, compared to methods based on hand-crafted feature extractors. As far as OCT is concerned, several attempts have been made at reconstructing high-quality B-scans from wavenumber domain data, e.g., reconstruction from an under-sampled wavenumber domain spectrum [27,28,29].
In this work, we propose a DL framework to reconstruct high-quality OCT images directly from the wavelength domain. We take into consideration the Fourier relation between the power spectral density of the interference signal and the autocorrelation over optical path length differences (Wiener–Khinchin theorem). This physical prior is used to guide the data-driven CNN-based network in the reconstruction of OCT B-scans. To our knowledge, the wavelength domain-based image-reconstruction problem using a dual DL framework in the spatial and Fourier domains has not been addressed so far. Utilizing the Fourier space information can further enhance the current reconstruction quality, with reduced time complexity compared to obtaining the images from commercial systems [30].
The main highlights of this work are as follows:
  • We propose a DL framework with two networks implemented sequentially to reconstruct images. One network, called the FD-CNN, optimizes the λ-domain interference spectrum (or non-linear wavenumber domain) in Fourier space. The other network, called the SD-CNN, optimizes in the spatial domain. This dual-optimization strategy helps the DL framework extract the non-linear relationships in two different domains, leading to more robust information extraction.
  • Each DL network used is an encoder–decoder architecture based on UNET [31]. Unlike the original UNET, the model in this work also incorporates residual connections and attention for enhanced performance. The FD-CNN uses a frequency-loss function [32] to account for the missing linearity in the wavenumber domain. The combination of the two optimization models improves performance by guiding the dual-domain data-driven network. The experimental results show that this architecture is streamlined and capable of generating OCT images efficiently.
  • We also embed a layer containing the wavenumbers corresponding to each axial pixel of every OCT A-scan into the input matrix to further guide the training of the SD-CNN. The ground truth is the average of 7 consecutive B-scans obtained from the commercial OCT system [30], which suppresses speckle noise.
  • We assess the computational time complexity and the image-enhancement characteristics of the proposed model in terms of morphological details, contours, edges (high-frequency content), and suppression of unwanted speckle noise. For a fair comparison, the proposed reconstruction method is evaluated against the processing approach of the commercial Optores GmbH OCT system [30].
The remainder of this paper is organized as follows: Section 2 “Materials and Methods” focuses on the problem formulation followed by the proposed method and materials used, Section 3 presents the performed experiments, results, comparisons and analysis, and Section 4 provides a discussion on the proposed work.

2. Materials and Methods

2.1. Problem Formulation

The light reflectivity information in the SS-OCT is detected using a Michelson interferometer where the light source sweeps the spectrum linearly in time [33]. Here, the intensity $I_D$ of the acquired interferometric pattern can be expressed mathematically as:

$I_D = I_D(k(t)) = I_1 + I_2 + I_3 + N_G(t)$,  (1)

where

$I_1 = \frac{S(k(t))\,\rho}{4}\left(r_r + r_{S_1} + r_{S_2} + \cdots\right)$,

$I_2 = \frac{S(k(t))\,\rho}{2}\left[\sum_{i=1}\sqrt{r_r\, r_{S_i}}\,\cos\!\big(k(t)\,(d_r - d_{S_i})\big)\right]$,

$I_3 = \frac{S(k(t))\,\rho}{2}\left[\sum_{i \neq j}\sqrt{r_{S_j}\, r_{S_i}}\,\cos\!\big(k(t)\,(d_{S_j} - d_{S_i})\big)\right]$.
In Equation (1), $S(k(t))$ represents the spectral shape of the light source, which is wavenumber (k)-dependent, $\rho$ is the responsivity of the detector, and $r_r$ and $r_{S_i}$ (or $r_{S_j}$) represent the reflectivity from the reference mirror (at depth $d_r$) and from the $i$th (or $j$th) sample particle at depth $d_{S_i}$ (or $d_{S_j}$), respectively. $N_G$ is the additive white Gaussian noise term created by various sources. Furthermore, the term $I_1$ is the DC component and is removed using the biasing technique. The term $I_2$ dominates over $I_3$, as the reflectivity of the reference is higher than the reflectivity of the object particles. Then only the term $I_2$ is kept in Equation (1), which contains the reflectivity information for particles at varying depths within the sample. This leads to further simplification:

$I_D = \frac{S(k(t))\,\rho}{2}\sum_{i=1}\sqrt{r_r\, r_{S_i}}\,\cos\!\big(k(t)\,(d_r - d_{S_i})\big) + N_G(t)$.  (2)
The reflectivity distribution $r_{S_i}$ of the sample particles can be extracted by performing the IDFT (Inverse Discrete Fourier Transform) in the wavenumber space (k) as:

$i(z) = \mathcal{F}^{-1}\{I_D\}$,  (3)

$i(z) = \frac{1}{2}\sum_{i=1}\sqrt{r_r\, r_{S_i}}\,\Big(\alpha(d_r - d_{S_i}) + \alpha\big(-(d_r - d_{S_i})\big)\Big)$.  (4)

Here, $\mathcal{F}^{-1}$ is the symbolic representation of the IDFT in Equation (3), and $\alpha(d)$ in Equation (4) comes from the IDFT of $S(k)$. As the IDFT is mathematically based on the principle of uniformly spaced data points (wavenumbers), it is critical to sample the interferometric signal uniformly in the wavenumber domain with identical step size. This contrasts with the acquired spectrum, which is uniformly distributed in the wavelength domain (spanning the time interval $-\Delta t/2$ to $\Delta t/2$) and has a linear relationship with time expressed as:

$\lambda = \beta t + \lambda_0$,  (5)
where $\beta$ represents the sweeping speed of the source, $\lambda$ represents the wavelength that varies from $\lambda_{min}$ to $\lambda_{max}$, and $\lambda_0$ is the central wavelength in Equation (5). Now, if we use $\lambda = 2\pi/k$, we can write Equation (6) as:

$t = \frac{1}{\beta}\left(\lambda - \lambda_0\right) = \frac{2\pi}{\beta}\left(\frac{1}{k} - \frac{1}{k_0}\right)$.  (6)
Upon expansion using a power series, we obtain:

$t = \frac{2\pi}{\beta}\left[-\frac{1}{k_0}\left(\frac{k}{k_0}-1\right) + \frac{1}{k_0}\left(\frac{k}{k_0}-1\right)^{2} - \frac{1}{k_0}\left(\frac{k}{k_0}-1\right)^{3} + \cdots\right]$,  (7)

where $k$ and $k_0$ are the wavenumbers corresponding to $\lambda$ and $\lambda_0$, respectively. Moreover, for recent advanced light sources like the FDML [15], even the relationship between wavelength and time is sinusoidal and can be written as:

$\lambda(t) = \lambda_0 + \frac{\Delta\lambda}{2}\sin\!\big(2\pi f_t t\big)$,  (8)

where $\Delta\lambda$ is the bandwidth of the light source and $f_t$ is the tuning frequency of the Fabry–Perot (FP) filter in the FDML source.
The crucial feature of an OCT system is the depth-encoding spectrum, which must undergo the IDFT (Inverse Discrete Fourier Transform) to retrieve the depth information, as obtained in Equation (4). It is evident from Equations (7) and (8) that the spectrum is a non-linear function of the wavenumbers and that the non-linear terms in the expansion cannot be neglected; hence, the spectrum needs to go through calibration and k-linearization processes to extract the B-scans [34].
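To make the non-uniform sampling concrete, the short sketch below (illustrative only, not the authors' code) builds a sweep that is uniform in wavelength and shows that the resulting wavenumber grid is not uniform; the central wavelength, bandwidth, and number of samples are assumed values chosen to resemble the source used later in this work.

```python
import numpy as np

lambda_0 = 1310e-9        # central wavelength (m), assumed example value
bandwidth = 100e-9        # sweep bandwidth (m), assumed example value
N = 2304                  # samples per A-scan, matching the acquired raw spectrum

lam = np.linspace(lambda_0 - bandwidth / 2, lambda_0 + bandwidth / 2, N)  # uniform in wavelength
k = 2 * np.pi / lam                                                       # non-uniform in wavenumber

dk = np.abs(np.diff(k))
print(f"k-step varies from {dk.min():.4e} to {dk.max():.4e} rad/m")
# The IDFT implicitly assumes a uniform k-grid such as:
k_uniform = np.linspace(k.min(), k.max(), N)
```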
Another commonly encountered problem in OCT scans is speckle noise. Speckle is an unavoidable artifact due to the coherent nature of the light source used in OCT systems. It results directly from unwanted interference of light scattered from different points within the sample volume. It can severely degrade image quality, hinder quantitative analysis, and cause loss of morphological details by reducing contrast. To reconstruct good-quality OCT images, it is therefore important to reduce the speckle noise adequately without losing structural details.
In the proposed framework, we demonstrate CNN-based neural networks governed by the physical law that relates the Fourier and spatial domains. The FD-OCT systems are based on the fundamental Wiener–Khinchin theorem, which states that the power spectral density $P(k) = |I_D|^2$ of the measured signal and the auto-correlation function $\Gamma(z)$ are related by the Fourier transform as follows:

$\Gamma(z) = \int P(k)\, \exp(i 2\pi k z)\, dk$.  (9)
In Equation (9), $z$ represents the optical path length difference. This relation is expressed keeping in consideration the symmetrical information mirrored at the positive and negative frequencies of the spectrum [35]. In a practical scenario, the obtained spectrum spans the positive frequencies. This relationship is exploited by the OCT system to record data as spectrally resolved interference signals (in the Fourier domain), which are then subjected to the IDFT to extract the depth profile in the spatial domain. This physics prior motivates us to design the framework for the reconstruction of OCT scans by incorporating two networks: one in the spatial domain (SD) and the other in the Fourier domain (FD).

The schematic of the proposed framework is illustrated in Figure 1, depicting the workflow of the two optimizations performed in the spatial and Fourier domains. It is a DL-driven framework with sequential optimization of the neural networks SD-CNN and FD-CNN to reconstruct high-quality OCT images directly from the wavelength domain. The input to the SD-CNN is the spatial domain images obtained by Fourier transforming the raw interferometric data. This neural network optimizes in the spatial domain to perform high-quality image reconstruction. These images (the output of the SD-CNN) are further Fourier transformed and optimized in the Fourier domain using the FD-CNN to enhance the reconstruction quality; this is elaborated on in the sub-sections below.

A diagram of the DL model used is shown in Figure 2a. Both networks implement this DL model, which uses UNET as the main backbone architecture. This modified UNET consists of four blocks on the encoder side, one block at the bottleneck, followed by four blocks on the decoder side. To boost the performance, we also use an attention gating network and residual and skip connections, as shown in Figure 2a. Attention gating helps the network focus on the essential features required for reconstruction, whereas residual connections help to address the problem of vanishing gradients in deeper networks. The encoder block is composed of Batch Normalization (BN), the PReLU activation function, and convolution layers, stacked as shown in Figure 2a. In addition, each block has a residual connection where the input of that block is concatenated with the output of the final layer of the block. The decoder block consists of Upsampling followed by ConvTranspose, BN, and PReLU layers, marked with a green triangle. The block output serves as one of the inputs to the attention network. A detailed description of the attention block can be found in [36]. The skip connections from the encoder towards the decoder help to transfer spatial information to the decoder, but they also forward redundant low-level features, which can be suppressed using the attention block. This block suppresses activations from regions with redundant information.

To train the framework for speckle reduction, the ground truth images are obtained by averaging 7 consecutive B-scans to suppress the speckle noise. This is possible due to the similarity between consecutive scans. In Figure 2b we show the images obtained from the OCT system as 1 B-scan and the results of averaging 5, 7, and 9 B-scans for the vein, lemon, and cherry samples. The 1 B-scan images clearly show high speckle noise for all three samples.
Further, for the different samples we can compare and visualize the following: (i) for the vein, the improvement in the appearance of the structure (marked with a red arrow); (ii) the reduction of the background noise for the lemon in the 5- and 7-B-scan averages (marked with a red box); and (iii) the progressive over-smoothing in the cherry sample as averaging includes more B-scans. Taking into account the reduction of speckle noise, the over-smoothing, and the lateral resolution to avoid excessive blurring, as demonstrated before in several state-of-the-art DL methods for image enhancement or denoising [37,38], we use 7 averaged B-scans as the ground truth in this work. The detailed training of the two networks, along with the associated data processing, is described in the sub-sections below.
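As a simple illustration of how such a ground truth can be formed, the following sketch averages a window of consecutive B-scans; the array shapes and the function name are assumptions for illustration rather than the exact pipeline of the commercial system.

```python
import numpy as np

def average_bscans(volume: np.ndarray, center: int, num: int = 7) -> np.ndarray:
    """volume: (n_bscans, depth, width). Returns the mean of `num` consecutive
    B-scans centred on index `center`, used here as the speckle-reduced ground truth."""
    half = num // 2
    start = max(center - half, 0)
    stop = min(center + half + 1, volume.shape[0])
    return volume[start:stop].mean(axis=0)

# Usage: ground_truth = average_bscans(oct_volume, center=300, num=7)
```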

2.2. Spatial Domain CNN

The modified encoder–decoder model used in the SD-CNN follows the architecture described above and illustrated in Figure 2a, with some modifications. The convolution layers in this network are 2D, as the network works in the spatial domain. The input data for the SD-CNN are prepared by processing the wavelength domain raw spectrum. This pre-processing involves background subtraction to remove the fixed-pattern noise, followed by spectral shaping using a Hann window. The SD-CNN then takes these pre-processed, unevenly spaced fringes, transformed into low-quality images via the IDFT, as its input. These images suffer from degraded resolution due to severe blurring arising from the non-linearity of the data in the wavenumber domain. Next, we obtain the ground truth images ($y_i$) for the SD-CNN using the commercial OCT system [30]. The ground truth images in the spatial domain are obtained by averaging 7 consecutive B-scans, as mentioned above, to remove the speckle noise. Furthermore, we supplement the network with knowledge of the wavenumber range as an additional input layer while training the SD-CNN, as shown in Figure 1 (K-space grid) and elaborated in Figure 2c. As discussed in Section 2.1, $\lambda$ spans the wavelength range $(\lambda_{min}, \lambda_{max})$, which corresponds to pixels in depth (axially). This range is sampled into $N$ (total) sampling points, with $s$ denoting the sampling index. We calculate the non-uniformly sampled wavenumber domain points as follows:
$\lambda = \lambda_{min} + \frac{s}{N-1}\left(\lambda_{max} - \lambda_{min}\right)$,  (10)

$k = \frac{2\pi}{\lambda} = \frac{2\pi}{\lambda_{min} + s\left(\lambda_{max} - \lambda_{min}\right)/(N-1)}$.  (11)
These non-uniform wavenumbers are crucial as they are the Fourier-transform pairs of the pixel position in depth (axially) for each A-scan. Hence, to inform the network about the non-linearity of the acquired input, we add a secondary layer to the original input (1152 × 256), using the k values estimated from Equation (11) as a column of the matrix shown in Figure 2c. As the same wavenumber values correspond to each A-scan, this column vector is repeated across the raw input. This k-layer, interleaved in parallel with the input spectrum, provides more physical context to the network, and the non-linear wavenumbers help the data-driven CNN adjust its weights and learn accordingly. The inputs are further divided into four 2D patches of size 288 × 256 to enhance the learning and convergence of the model.
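A possible implementation of the k-space grid of Equations (10) and (11) and its stacking with the degraded input is sketched below; the wavelength limits and the exact channel layout are assumptions, as the text only specifies that the k column is repeated across the 1152 × 256 input and that the result is split into four 288 × 256 patches.

```python
import numpy as np

def build_k_layer(n_depth: int, n_ascans: int,
                  lam_min: float = 1260e-9, lam_max: float = 1360e-9) -> np.ndarray:
    """Non-uniform wavenumbers of Eq. (11), tiled so each A-scan sees the same column."""
    s = np.arange(n_depth)                                        # sampling index of Eq. (10)
    k = 2 * np.pi / (lam_min + s * (lam_max - lam_min) / (n_depth - 1))
    return np.tile(k[:, None], (1, n_ascans))

def make_sdcnn_input(degraded_bscan: np.ndarray) -> np.ndarray:
    """degraded_bscan: (1152, 256) image from the IDFT of the lambda-domain fringes.
    Returns a 2-channel array (2, 1152, 256): image plus wavenumber layer."""
    k_layer = build_k_layer(*degraded_bscan.shape)
    return np.stack([degraded_bscan, k_layer], axis=0)

def split_into_patches(x: np.ndarray, patch_h: int = 288):
    """Split (C, 1152, 256) into four (C, 288, 256) patches along the depth axis."""
    return [x[:, i:i + patch_h, :] for i in range(0, x.shape[1], patch_h)]
```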

2.3. Fourier Domain CNN

The FD-CNN is based on the architecture described above and shown in Figure 2a, with minor modifications. As the A-scans have 1D dependencies, the FD-CNN uses only 1D convolution kernels for feature extraction. The input data for the FD-CNN are prepared by processing the output of the SD-CNN ($x_i$) in the Fourier domain. This pre-processing involves the DFT, from which we extract the amplitude and phase. The ground truth ($y_i'$) for the FD-CNN is the amplitude obtained by performing the DFT on the ground truth images (7 averaged B-scans) originally used by the SD-CNN and described in Section 2.2. This ground truth spectrum is nearly linear in the wavenumber domain, as the commercial system [30] incorporates k-linearization when producing the final OCT images. The FD-CNN takes these Fourier-domain amplitude values and, during training, minimizes the loss so as to generate the evenly spaced wavenumber domain spectrum. To predict the final results, the FD-CNN output $x_i'$, which is the optimized amplitude, is subjected to the IDFT, while the phase information is taken from the Fourier-transformed output of the SD-CNN.
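The hand-off between the two networks can be summarized by the following sketch, in which `fd_cnn` is a placeholder for the trained Fourier-domain model; the axis along which the DFT is taken is an assumption.

```python
import numpy as np

def reconstruct_bscan(sd_output: np.ndarray, fd_cnn) -> np.ndarray:
    """sd_output: (depth, n_ascans) B-scan predicted by the SD-CNN."""
    spectrum = np.fft.fft(sd_output, axis=0)       # per-A-scan (1D) DFT along depth
    amplitude, phase = np.abs(spectrum), np.angle(spectrum)
    amplitude_opt = fd_cnn(amplitude)              # FD-CNN refines the amplitude only
    refined = amplitude_opt * np.exp(1j * phase)   # reuse the phase of the SD-CNN output
    return np.real(np.fft.ifft(refined, axis=0))   # final spatial-domain B-scan
```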

2.4. Loss Function

In the proposed framework, we use two networks with different loss functions. For the SD-CNN, we use the mean absolute error (L1) to minimize the loss between the pixels of the low-resolution (LR) image, generated by applying the IDFT to the spectrum linear in the wavelength domain, and the high-resolution (HR) ground truth (7-averaged B-scans) obtained from the OCT system. The L1 loss is calculated as:
$L_{L1} = \frac{1}{IJ}\sum_{i=1}^{I}\sum_{j=1}^{J}\big| HR(i,j) - LR(i,j) \big|$,  (12)
where $i$ and $j$ are the spatial indices in Equation (12). The FD-CNN uses the Focal Frequency Loss (FFL) [32] to minimize the loss between the input λ-domain spectrum $F_\lambda(u,v)$ and the ground truth wavenumber domain spectrum $F_k(u,v)$, where $u$ and $v$ are the indices of the frequency coefficients. $F_\lambda(u,v)$ and $F_k(u,v)$ are obtained by applying the DFT to the output of the SD-CNN and to the ground truth images, respectively. The FFL can be expressed as follows:
$FFL = \frac{1}{MN}\sum_{u=0}^{M-1}\sum_{v=0}^{N-1} w(u,v)\,\big| F_k(u,v) - F_\lambda(u,v) \big|^{2}$,  (13)

$w(u,v) = \big| F_k(u,v) - F_\lambda(u,v) \big|^{\alpha}$.  (14)
Here, in Equation (13), $M$ and $N$ correspond to the spectrum size, and in Equation (14), $w(u,v)$ represents the weight matrix, where the scaling factor $\alpha$ is set to 1 in this work. As the B-scan reconstruction from the raw data involves only the amplitude of the spectrum, we minimize the FFL error only for the amplitude, denoted $FFL_{amp}$.
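A minimal PyTorch-style sketch of the amplitude-only focal frequency loss of Equations (13) and (14), with α = 1 as stated; the tensor shapes, the detach on the weight matrix, and the normalization for numerical stability are implementation assumptions rather than details taken from this paper or from [32].

```python
import torch

def ffl_amp(pred_amp: torch.Tensor, gt_amp: torch.Tensor, alpha: float = 1.0) -> torch.Tensor:
    """Amplitude-only focal frequency loss; pred_amp and gt_amp hold |F_lambda| and |F_k|."""
    diff = gt_amp - pred_amp
    weight = diff.abs().pow(alpha).detach()        # w(u,v) = |F_k - F_lambda|^alpha, no gradient
    weight = weight / (weight.max() + 1e-8)        # normalization for stability (assumption)
    return (weight * diff.abs().pow(2)).mean()     # (1/MN) sum of w * |F_k - F_lambda|^2

# Example usage on two spectra obtained with torch.fft.fft along the depth axis:
# loss = ffl_amp(torch.abs(torch.fft.fft(x_pred, dim=-2)),
#                torch.abs(torch.fft.fft(x_gt, dim=-2)))
```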

3. Experiments

3.1. Imaging and Dataset Processing

Optical Coherence Tomography imaging was performed on an MHz SS-OCT benchtop system from Optores GmbH, Munich [30]. In the reported experiments, the SS-OCT uses a Fourier domain mode-locked (FDML) laser with a central wavelength of 1310 nm and a bandwidth of 100 nm. The axial resolution is 15 µm (in air), the lateral resolution is 39.5 µm, the sweeping speed is 1.6 MHz, and the lateral field of view is 10 mm × 10 mm. Seven different samples (or objects), namely vein, finger, lemon, tooth, cherry, flounder egg, and seed (pea), were used as OCT volume datasets. Approval was granted by the Ethical Committee of the Bulgarian Academy of Sciences (permission 1-44/6 November 2021). The study was performed in accordance with the tenets of the Declaration of Helsinki of 1975, revised in 2013. Informed consent was obtained from the human subjects involved in the study. For all volumes, each B-scan had 1024 A-scans and each A-scan had 2304 points in depth for the acquired λ-space raw spectrum. For final processing, the raw data and the corresponding B-scans (images) were cropped to size 2304 × 256 (the raw data before IDFT) due to memory limitations. After the IDFT, as mentioned in the pre-processing (Section 2), only half of the signal (mirror symmetry) is considered, resulting in size 1152 × 256.
The custom-designed MHz-OCT Processing v1.2.0.7 software allows the user to extract data from the SS-OCT system at different stages of processing. The system captures the interferometric signal corresponding to each point in the 3D dataset using the swept-source FDML laser (1310 nm). Sweeping across wavelength allows the depth information to be encoded in the spectral domain. This interferometric signal is detected using a dual-balanced photoreceiver and processed to obtain the depth-resolved profile of the sample. The interference signals are digitized over a fast PCIe (Peripheral Component Interconnect Express) interface and further linearized in the wavenumber domain. Once linear in the k-domain, they are Fourier transformed to obtain the reflectivity profile. To account for the wavelength-dependent shifts in the phase, dispersion compensation is performed. The 3D data are acquired via two galvanometer scanning mirrors that scan in the x and y directions over a predefined range. The depth-resolved profiles are used to generate the so-called B-scans (x-z), referred to as the “OCT Output” in this work. For further details on the Optores OCT system, see ref. [30].
In the context of this study, we utilize the following: (i) the raw data, i.e., the raw interference spectra in λ-space, (ii) the OCT B-scans, and (iii) averaged OCT B-scans. The OCT data are standardized using the mean and the standard deviation for the DL framework. Adam is used as the optimizer with a learning rate of $10^{-4}$ for both networks. The SD-CNN converges at 200 epochs, while the FD-CNN, with its further fine-tuning, requires around 400 epochs.
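For concreteness, a small sketch of this training configuration is given below; the tiny placeholder networks only stand in for the SD-CNN and FD-CNN described in Section 2, and the standardization values are placeholders.

```python
import torch
import torch.nn as nn

def standardise(x: torch.Tensor, mean: float, std: float) -> torch.Tensor:
    """Standardize OCT data with the dataset mean and standard deviation."""
    return (x - mean) / (std + 1e-8)

# Placeholder networks standing in for the SD-CNN (2D, image + k-layer input)
# and the FD-CNN (1D, Fourier-amplitude input) described in Section 2.
sd_cnn = nn.Sequential(nn.Conv2d(2, 16, 3, padding=1), nn.PReLU(),
                       nn.Conv2d(16, 1, 3, padding=1))
fd_cnn = nn.Sequential(nn.Conv1d(1, 16, 3, padding=1), nn.PReLU(),
                       nn.Conv1d(16, 1, 3, padding=1))

opt_sd = torch.optim.Adam(sd_cnn.parameters(), lr=1e-4)   # learning rate 1e-4, as stated
opt_fd = torch.optim.Adam(fd_cnn.parameters(), lr=1e-4)
```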

3.2. Generalizability

In order to have a robust and generalizable framework, we utilize a dataset with different acquisition parameters for field of view, calibration pattern, and material properties.
The field of view in the lateral (x-y) dimensions for various samples are as follows: lemon (4 mm × 4 mm), vein (8 mm × 8 mm), cherry (8 mm × 8 mm), tooth (4 mm × 4 mm), finger (8 mm × 8 mm), seed (pea) (10 mm × 10 mm), and flounder egg (3 mm × 3 mm).
The k-linearization fringes for all 7 volumes are shown in Figure 3, which demonstrates that each volume has a different calibration pattern, acquired at the start of the data-acquisition process by the OCT system [30].
Refractive indices are important in OCT imaging, since the technique relies on light propagation through the sample to create volumetric scans. In this study, samples with differing refractive indices were used, approximately as follows: human vein tissue (1.3–1.4), lemon and cherry (1.47), human finger skin (1.42), human tooth (2.6–3.1), seed (pea) (1.5–1.7), and flounder egg (1.3–1.4). This generates varying spectrum patterns for samples with different material properties, helping to train and test a more versatile model.

3.3. Training, Testing, and Validations

The implementation was performed on an AMD Ryzen 7 CPU with 64 GB of random-access memory and an Nvidia RTX 3090 GPU. The DL framework comprises the SD-CNN and the FD-CNN, with the SD-CNN receiving low-quality B-scans as input. These B-scans are derived from the raw λ-domain fringes following the pre-processing steps described in Section 2.2. The SD-CNN output, after Fourier transformation, is supplied to the FD-CNN, which optimizes in the Fourier domain. The FD-CNN output is subjected to the IDFT to obtain the final B-scans, as described in Section 2.3. Regarding the training of these networks, the SD-CNN is trained independently first, and then the FD-CNN is trained by freezing the weights of the SD-CNN, with its inputs and processing as described in Section 2.2 and Section 2.3. Similarly, in the inference stage, the SD-CNN first takes a degraded input image, obtained from the non-linearly sampled wavenumber domain data, along with the k-space grid, in the same fashion as shown in Figure 1 for the training; then the DFT is performed on the output of the SD-CNN and the amplitude is fed into the FD-CNN to predict the fringes, which are then subjected to the IDFT to obtain the resultant OCT images. The single DL framework uses the two models sequentially to infer the output.
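The second training stage could look roughly as follows; `sd_cnn`, `fd_cnn`, the data loader, the optimizer, and the `ffl_amp` loss are placeholders for the components introduced in Sections 2.2, 2.3 and 2.4, and the tensor layout is an assumption.

```python
import torch

def train_stage2(sd_cnn, fd_cnn, train_loader, opt_fd, ffl_amp, epochs=400):
    """Freeze the SD-CNN (stage 1) and train the FD-CNN (stage 2) on Fourier amplitudes."""
    for p in sd_cnn.parameters():
        p.requires_grad = False                   # freeze stage-1 weights
    sd_cnn.eval()
    for _ in range(epochs):
        for x_degraded, y_gt in train_loader:     # degraded input (+ k-layer), 7-averaged ground truth
            with torch.no_grad():
                x_sd = sd_cnn(x_degraded)         # spatial-domain reconstruction
            # DFT along the depth axis; reshaping the 2D amplitude map into the
            # per-A-scan 1D sequences expected by the FD-CNN is omitted here.
            amp_in = torch.fft.fft(x_sd, dim=-2).abs()
            amp_gt = torch.fft.fft(y_gt, dim=-2).abs()
            loss = ffl_amp(fd_cnn(amp_in), amp_gt)
            opt_fd.zero_grad()
            loss.backward()
            opt_fd.step()
```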
From the 7 volumes described above, 5 volumes, namely lemon, vein, cherry, tooth, and finger, containing 3000 B-scans in total (600 in each volume), were used for training, validation, and testing. These 3000 B-scans were partitioned randomly into training, validation, and testing groups of 70%, 20%, and 10% for the proposed DL framework. Each of the other two volumes, namely seed (pea) and flounder egg, contains 200 B-scans (size 1152 × 256). These B-scans were reserved to test the generalization capability of the proposed deep learning method, as described in the cross-validation study (Section 3.4.2). Thus, no information from these two volumes was used in the training or validation procedure.
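A minimal sketch of such a random 70/20/10 split over the 3000 B-scans is shown below; the random seed is an assumption added for reproducibility.

```python
import numpy as np

rng = np.random.default_rng(seed=0)           # the seed is an assumption, added for reproducibility
indices = rng.permutation(3000)               # 3000 B-scans from the five training volumes
n_train, n_val = int(0.7 * 3000), int(0.2 * 3000)
train_idx = indices[:n_train]
val_idx = indices[n_train:n_train + n_val]
test_idx = indices[n_train + n_val:]          # remaining 10% held out for testing
```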

3.4. Results

As discussed in Section 2, the SD-CNN and FD-CNN were trained sequentially, using two different loss functions, namely the L1 loss and $FFL_{amp}$. After their individual training, they were tested to infer the reconstructions, also in a sequential manner and in the same order as they were trained.

3.4.1. Performance Evaluation of the Entire Framework

We use different quantitative metrics, namely MSE (Mean Square Error), PSNR (Peak Signal-to-Noise Ratio), SSIM (Structural Similarity Index Measure), and CNR (Contrast-to-Noise Ratio), to evaluate the reconstruction results. CNR measures the contrast between a foreground region containing signal and a background region affected by noise; it is calculated over the $i$th foreground region using the means $\mu_i$ (foreground) and $\mu_b$ (background) and the standard deviations $\sigma_i$ (foreground) and $\sigma_b$ (background) as follows:
$CNR_i = 10\log_{10}\!\left(\frac{|\mu_i - \mu_b|}{\sqrt{\sigma_i^{2} + \sigma_b^{2}}}\right)$,  (15)
In addition, we also use another metric, $\beta_s$, which measures the degree of smoothness in images. Using the notation $\mu$ (mean), $I$ (2D image with $x$ and $y$ as pixel indices), out (output), and in (input), $\beta_s$ can be calculated as:
$\beta_s = \frac{\Gamma\!\left(I_{out} - \mu_{out},\, I_{in} - \mu_{in}\right)}{\sqrt{\Gamma\!\left(I_{out} - \mu_{out},\, I_{out} - \mu_{out}\right)\cdot \Gamma\!\left(I_{in} - \mu_{in},\, I_{in} - \mu_{in}\right)}}$,  (16)

where $\Gamma(I_1, I_2) = \sum_{x,y} I_1(x,y)\, I_2(x,y)$.
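The two metrics of Equations (15) and (16) can be computed as in the sketch below, where the foreground and background region masks are placeholders for the boxes marked in Figure 4.

```python
import numpy as np

def cnr(img: np.ndarray, fg_mask: np.ndarray, bg_mask: np.ndarray) -> float:
    """Contrast-to-noise ratio of one foreground region against the background (Eq. 15)."""
    mu_i, mu_b = img[fg_mask].mean(), img[bg_mask].mean()
    sig_i, sig_b = img[fg_mask].std(), img[bg_mask].std()
    return 10 * np.log10(abs(mu_i - mu_b) / np.sqrt(sig_i ** 2 + sig_b ** 2))

def beta_s(out_img: np.ndarray, in_img: np.ndarray) -> float:
    """Smoothness / structure-preservation measure of Eq. (16)."""
    def gamma(a, b):
        return float(np.sum(a * b))
    a = out_img - out_img.mean()
    b = in_img - in_img.mean()
    return gamma(a, b) / np.sqrt(gamma(a, a) * gamma(b, b))
```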
We discuss in this section the performance of the proposed model and compare the results between the degraded input, the OCT output (generated by the Optores OCT system [30]), and the high-quality ground truth. The evaluation is conducted on the randomly selected test dataset, which was not exposed to the models during their training stage. In Figure 4, we evaluate the performance of the framework. The images obtained from the raw data, linear in λ, are fed to the SD-CNN, followed by a Fourier transformation of the SD-CNN output, whose amplitude is optimized using the FD-CNN. In Figure 4 we compare the ground truth (A), the OCT output (B), the degraded raw data input to the SD-CNN (C), and the final output (D) of the proposed framework; the figure shows images from each of the 5 samples used in this work. Compared with the degraded inputs, the final outputs are well-reconstructed images retaining structures similar to the ground truth images. The proposed framework effectively produces enhanced quality compared to the commercial OCT output (B), which is obtained without any averaging. In Figure 4, the vein, the tooth, and the finger, which are characterized by larger uniform regions, show clearly reduced speckle noise, especially when compared to the images from the OCT output.
The quantitative evaluation of the performance is conducted using the metrics SSIM, PSNR, $\beta_s$, and CNR to compare the inference of the proposed framework with the degraded input and the OCT output, as shown in Table 1 (with the best results highlighted). The test dataset is used to calculate the PSNR, SSIM, and $\beta_s$ parameters using the ground truth images as the reference. The overall averaged PSNR of the proposed framework's output is approximately 2 dB higher than that of the OCT output and approximately 13 dB higher than that of the input (the degraded input to the SD-CNN). It is to be noted that we obtain a fairly high PSNR, especially for the lemon and cherry datasets, which comprise highly structured images. In contrast, the tooth reconstruction gains about 12 dB over the degraded input, although it remains slightly below the 22.11 dB of the output generated by the OCT system.
Using SSIM with the ground truth as a reference, a higher degree of similarity can be seen for the final output. Although the SSIM values improve, they might not directly reflect image quality if used independently in OCT assessments [39]. The granular speckle noise may be misinterpreted as structure, resulting in inaccurate SSIM scores. The $\beta_s$ parameter (Equation (16)) helps to determine how well the enhanced image preserves structural features while reducing speckle noise in the OCT images. There is an overall improvement from 0.87 to 0.93 between the OCT system output and the proposed DL framework's output, indicating a significant reduction of speckle noise in the reconstructed images.
We also examined the CNR using Equation (15), with i = 3 foreground regions marked by the orange boxes in Figure 4 and the yellow box as the background region. The findings indicate that the proposed model depicts the crucial features of an image with higher contrast relative to noise, irrespective of the sample type.
In an attempt to monitor the variations at boundaries (or edges), we analyze the pixel column at the central position of the red rectangular region in Figure 4 by plotting the intensity variations in Figure 5 for all five types of sample images (input, ground truth, OCT output, and final output). Here, the x-axis is the relative pixel position, where the first row of the rectangle is referred to as 1 and does not indicate the exact pixel position in the original image. The ground truth is marked in red, and our output (in maroon) closely follows it, whereas the OCT output is afflicted by numerous spikes due to the presence of speckle even in the smooth regions. The input, as expected, follows a divergent trajectory due to degradations caused by blur (a consequence of the non-linear wavenumber spectrum) and noise.
The performances of both models, SD-CNN and FD-CNN, are compared in Figure 6, where we show the inferred samples from the test dataset along with their respective ground truths. The five rows contain images from each volume, namely vein, finger, lemon, tooth, and cherry (rows 1–5, respectively). Each image contains two regions marked with yellow and red boxes that are further magnified (~5×) and shown as adjacent images for better visualization. The examples show that the SD-CNN reconstructs well when the image contains more uniform areas, whereas the FD-CNN boosts the overall framework by enhancing the structural details and features. In particular, for the lemon and the cherry, the morphological structures are clearly manifested. The output of the combined SD-CNN and FD-CNN models suppresses the speckle noise and significantly improves the structures and boundaries. The homogeneity of the output images obtained using the two models is highly desirable in OCT applications, as it greatly enhances the image quality. This superior generalization capability stems from the use of the physics prior, which optimizes in both the Fourier and spatial domains.

3.4.2. Cross-Validation and Ablation Studies

We perform cross-validation studies (Table 2) on two different volumes, namely (i) seed (pea) and (ii) flounder egg, each containing 200 images, to assess the results on completely unseen volumes that were not shown to the framework during training or validation. These samples have different texture and material properties compared to the volumes used to train or validate the model. We tabulate the average SSIM and PSNR scores obtained on the two volumes. In addition, the results can be visualized in Figure 7, where we compare the reconstruction performed by the model, the OCT output from the system, and the ground truth images. The cross-validation performance of the proposed DL framework shows robust reliability and generalization capability of the model on completely new data.
We conduct an ablation study to assess how each of the FD-CNN and SD-CNN networks contributes to the overall performance. The study is presented in Table 3 using PSNR and SSIM (average and standard deviation values). The SD-CNN, when trained and inferred individually, provides an image quality with average PSNR and SSIM of 20.81 dB and 0.42, respectively. In contrast, when only the FD-CNN is trained and inferred in the same individual manner, the model performs poorly, with an average PSNR of 10.97 dB and SSIM of 0.03. CNNs perform better on images in the spatial domain than in the Fourier domain alone, owing to their inherent architecture, receptive field, and hierarchical feature extraction. When we evaluate the combined approach (the proposed framework), we obtain improved results for both PSNR and SSIM, with average values of 22.30 dB and 0.46 and standard deviations of 2.51 dB and 0.08, respectively.

3.4.3. Time Complexity

In Table 4, we provide the time complexity of several processing steps, such as calibration, resampling, and FFT for one B-scan, and of the averaging over 7 consecutive B-scans done to remove speckle noise. The implementation was performed on an AMD Ryzen 7 CPU with 64 GB of random-access memory and an Nvidia RTX 3090 GPU, as mentioned in Section 3.3. It is to be noted that these are only a few of the important processing steps performed to obtain a single B-scan from raw data. Commercial systems like [30] perform several other operations to handle different types of artifacts, such as noise, the DC term, dispersion, etc. It can be seen that for 1 B-scan, resampling takes a substantial amount of time, i.e., 0.17 s. Moreover, a single B-scan is strongly affected by speckle in practical scenarios, and methods such as averaging (of adjacent scans, or registration-based methods applied over several volumes) are required to reduce it; these require multiple B-scans. Hence, producing the multiple scans needed for speckle reduction takes much more time than generating an individual, speckle-afflicted B-scan.
We compare the time complexities for reconstructing a volume of 600 images (each of size 1152 × 1024) using the proposed model versus the commercial OCT system [30]. The latter was employed for raw data acquisition and to generate ground truth images with speckle reduction, achieved by averaging 7 B-scans. The proposed model requires 142.5 s of computation time for this volume, whereas the commercial system takes 792 s. Individual implementations comprising only the FD-CNN or only the SD-CNN take 68.55 s and 79.723 s, respectively. The OCT system [30] processes (including averaging) multiple B-scans to reduce the speckle noise and generate smooth B-scans, which increases the computational complexity. In contrast, the proposed method does not depend on multiple B-scan processing for generating speckle-reduced B-scans, which makes it considerably faster. The faster processing allows handling large volumes of OCT data with a more optimized workflow in medical and laboratory settings, and enables integration of OCT systems with advanced technological setups for timely analyses.

4. Discussion

The OCT image-reconstruction problem for data points that are non-linear in the wavenumber domain has been an active research topic for decades. The widespread applications of this technology in several biomedical domains demand faster image-processing techniques along with high-fidelity images. FD-OCT systems, which acquire spectra linear in the wavelength domain, must undergo calibration and k-mapping prior to the IDFT used to extract the depth profile for OCT images. In this work, we propose a DL-framework-based approach that offers a paradigm shift by reconstructing OCT B-scans directly from the acquired raw wavelength domain spectra. The two neural networks incorporated in this framework effectively model the respective representations in the spatial and Fourier domains. Together, they achieve OCT reconstructions that retain morphological details with reduced speckle noise. Furthermore, the demonstrated computational efficiency is critical for time-sensitive applications that require fast image reconstruction. Thus, the proposed work not only focuses on the reconstruction of high-fidelity images but also significantly reduces the computational complexity. Such improvements, besides catering to present-day OCT applications, open room for innovative, faster-processed, speckle-reduced images that are crucial for integration with advanced technologies, portability, cost-effectiveness, and efficient use of resources. One limitation of this work is the comparison with only one OCT system, by Optores [30]. The hardware and software complexity associated with different commercial OCT systems, and the unavailability of procedural details, make it difficult to reproduce the exact pipeline from raw data acquisition to final image reconstruction. This constrained setup precluded comparisons with other independent systems involving different stages of the workflow. In the future, we aim to extend the presented work to different OCT systems and settings.

Author Contributions

Conceptualization, M.V. and E.S. (Erdem Sahin); methodology, M.V. and E.S. (Erdem Sahin), V.M. and E.S. (Elena Stoykova); software, M.V.; validation, M.V., V.M. and E.S. (Elena Stoykova); formal analysis, M.V. and E.S. (Erdem Sahin), V.M. and E.S. (Elena Stoykova); investigation, M.V. and V.M.; resources, V.M. and E.S. (Elena Stoykova); data curation, M.V., E.S. (Erdem Sahin), V.M. and E.S. (Elena Stoykova); writing—original draft preparation, M.V.; writing—review and editing, E.S. (Erdem Sahin), V.M. and E.S. (Elena Stoykova); visualization, M.V., E.S. (Erdem Sahin) and E.S. (Elena Stoykova); supervision, E.S. (Erdem Sahin), V.M. and E.S. (Elena Stoykova); project administration, V.M. and E.S. (Elena Stoykova); funding acquisition, V.M. and E.S (Elena Stoykova). All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska Curie grant agreement No 956770.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki, and approved by the Institutional Review Board: Ethical Committee of the Bulgarian Academy of Sciences (permission 1-44/6 November 2021).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

Dataset available on request from the authors.

Acknowledgments

Maryam Viqar would like to thank the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 956770 for the funding. Violeta Madjarova and Elena Stoykova would like to thank the European Regional Development Fund within the Operational Programme “Science and Education for Smart Growth 2014–2020” under the Project CoE “National center of Mechatronics and Clean Technologies” BG05M2OP001-1.001-0008. Erdem Sahin would like to acknowledge the support of the Academy of Finland (project no. 336357, PROFI 6—TAU Imaging Research Platform). We would like to acknowledge Dessislava Pashkouleva, Institute of Mechanics, Bulgarian Academy of Sciences, Sofia, Bulgaria, for her efforts in preparing the samples.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Suzuki, K. Overview of deep learning in medical imaging. Radiol. Phys. Technol. 2017, 10, 257–273. [Google Scholar] [CrossRef] [PubMed]
  2. Bakator, M.; Radosav, D. Deep learning and medical diagnosis: A review of literature. Multimodal Technol. Interact. 2018, 2, 47. [Google Scholar] [CrossRef]
  3. Klein, T.; Wieser, W.; Eigenwillig, C.M.; Biedermann, B.R.; Huber, R. Megahertz OCT for ultrawide-field retinal imaging with a 1050 nm Fourier domain mode-locked laser. Opt. Express 2011, 19, 3044–3062. [Google Scholar] [CrossRef] [PubMed]
  4. Wieser, W.; Draxinger, W.; Klein, T.; Karpf, S.; Pfeiffer, T.; Huber, R. High definition live 3D-OCT in vivo: Design and evaluation of a 4D OCT engine with 1 GVoxel/s. Biomed. Opt. Express 2014, 5, 2963–2977. [Google Scholar] [CrossRef] [PubMed]
  5. Pfeiffer, T.; Petermann, M.; Draxinger, W.; Jirauschek, C.; Huber, R. Ultra low noise Fourier domain mode locked laser for high quality megahertz optical coherence tomography. Biomed. Opt. Express 2018, 9, 4130–4148. [Google Scholar] [CrossRef] [PubMed]
  6. Dorrer, C.; Belabas, N.; Likforman, J.P.; Joffre, M. Spectral resolution and sampling issues in Fourier-transform spectral interferometry. JOSA B 2000, 17, 1795–1802. [Google Scholar] [CrossRef]
  7. Szkulmowski, M.; Wojtkowski, M.; Bajraszewski, T.; Gorczyńska, I.; Targowski, P.; Wasilewski, W.; Kowalczyk, A.; Radzewicz, C. Quality improvement for high resolution in vivo images by spectral domain optical coherence tomography with supercontinuum source. Opt. Commun. 2005, 246, 569–578. [Google Scholar] [CrossRef]
  8. Chen, Y.; Zhao, H.; Wang, Z. Investigation on spectral-domain optical coherence tomography using a tungsten halogen lamp as light source. Opt. Rev. 2009, 16, 26–29. [Google Scholar] [CrossRef]
  9. Hillmann, D.; Huttmann, G.; Koch, P. Using nonequispaced fast Fourier transformation to process optical coherence tomography signals. In Proceedings of the European Conferences on Biomedical Optics, Munich, Germany, 14–18 June 2009; SPIE 7372 on Optical Coherence Tomography and Coherence Techniques IV. OPTICA Publishing Group: Munich, Germany, 2009; p. 73720R1-6. [Google Scholar] [CrossRef]
  10. Wu, T.; Ding, Z.; Wang, K.; Wang, C. Swept source optical coherence tomography based on non-uniform discrete Fourier transform. Chin. Opt. Lett. 2009, 7, 941–944. [Google Scholar] [CrossRef]
  11. Klein, T.; Huber, R. High-speed OCT light sources and systems. Biomed. Opt. Express 2017, 8, 828–859. [Google Scholar] [CrossRef]
  12. Braaf, B.; Vermeer, K.A.; Sicam, V.A.D.; van Zeeburg, E.; van Meurs, J.C.; de Boer, J.F. Phase-stabilized optical frequency domain imaging at 1-µm for the measurement of blood flow in the human choroid. Opt. Express 2011, 19, 20886–20903. [Google Scholar] [CrossRef] [PubMed]
  13. Xu, J.; Wei, X.; Yu, L.; Zhang, C.; Xu, J.; Wong, K.K.Y.; Tsia, K.K. High-performance multi-megahertz optical coherence tomography based on amplified optical time-stretch. Biomed. Opt. Express 2015, 6, 1340–1350. [Google Scholar] [CrossRef]
  14. Jayaraman, V.; Cole, G.D.; Robertson, M.; Uddin, A.; Cable, A. High-sweep-rate 1310 nm MEMS-VCSEL with 150 nm continuous tuning range. Electron. Lett. 2012, 48, 867–869. [Google Scholar] [CrossRef] [PubMed]
  15. Huber, R. Fourier domain mode locking (FDML): A new laser operating regime and applications for biomedical imaging, profilometry, ranging and sensing. In Advanced Solid-State Photonics; Optics InfoBase Conference Papers (OSA, 2009), 14, MA1; Optica Publishing Group: Washington, DC, USA, 2009. [Google Scholar] [CrossRef]
  16. Liang, K.; Wang, Z.; Ahsen, O.O.; Lee, H.C.; Potsaid, B.M.; Jayaraman, V.; Cable, A.; Mashimo, H.; Li, X.; Fujimoto, J.G. Cycloid scanning for wide field optical coherence tomography endomicroscopy and angiography in vivo. Optica 2018, 5, 36–43. [Google Scholar] [CrossRef] [PubMed]
  17. Zhang, J.; Nguyen, T.; Potsaid, B.; Jayaraman, V.; Burgner, C.; Chen, S.; Li, J.; Liang, K.; Cable, A.; Traverso, G.; et al. Multi-MHz MEMS-VCSEL swept-source optical coherence tomography for endoscopic structural and angiographic imaging with miniaturized brushless motor probes. Biomed. Opt. Express 2021, 12, 2384–2403. [Google Scholar] [CrossRef] [PubMed]
  18. Lee, H.D.; Kim, G.H.; Shin, J.G.; Lee, B.; Kim, C.S.; Eom, T.J. Akinetic swept-source optical coherence tomography based on a pulse-modulated active mode locking fiber laser for human retinal imaging. Sci. Rep. 2018, 8, 17660. [Google Scholar] [CrossRef]
  19. Alonso-Caneiro, D.; Read, S.A.; Collins, M.J. Speckle reduction in optical coherence tomography imaging by affine-motion image registration. J. Biomed. Opt. 2011, 16, 116027. [Google Scholar] [CrossRef]
  20. Dar, S.U.; Yurt, M.; Shahdloo, M.; Ildız, M.E.; Tınaz, B.; Çukur, T. Prior-guided image reconstruction for accelerated multi-contrast MRI via generative adversarial networks. IEEE J. Sel. Top. Signal Process. 2020, 14, 1072–1087. [Google Scholar] [CrossRef]
  21. Korkmaz, Y.; Dar, S.U.; Yurt, M.; Özbey, M.; Cukur, T. Unsupervised MRI reconstruction via zero-shot learned adversarial transformers. IEEE Trans. Med. Imaging 2022, 41, 1747–1763. [Google Scholar] [CrossRef] [PubMed]
  22. Qin, C.; Schlemper, J.; Caballero, J.; Price, A.N.; Hajnal, J.V.; Rueckert, D. Convolutional recurrent neural networks for dynamic MR image reconstruction. IEEE Trans. Med. Imaging 2018, 38, 280–290. [Google Scholar] [CrossRef]
  23. Xie, S.; Zheng, X.; Chen, Y.; Xie, L.; Liu, J.; Zhang, Y.; Yan, J.; Zhu, H.; Hu, Y. Artifact removal using improved GoogLeNet for sparse-view CT reconstruction. Sci. Rep. 2018, 8, 6700. [Google Scholar] [CrossRef] [PubMed]
  24. Chen, H.; Zhang, Y.; Chen, Y.; Zhang, J.; Zhang, W.; Sun, H.; Lv, Y.; Liao, P.; Zhou, J.; Wang, G. LEARN: Learned experts’ assessment-based reconstruction network for sparse-data CT. IEEE Trans. Med. Imaging 2018, 37, 1333–1347. [Google Scholar] [CrossRef]
  25. Wang, R.; Fang, Z.; Gu, J.; Guo, Y.; Zhou, S.; Wang, Y.; Chang, C.; Yu, J. High-resolution image reconstruction for portable ultrasound imaging devices. EURASIP J. Adv. Signal Process. 2019, 2019, 56. [Google Scholar] [CrossRef]
  26. Jarosik, P.; Byra, M.; Lewandowski, M. Waveflow-towards integration of ultrasound processing with deep learning. In Proceedings of the 2018 IEEE International Ultrasonics Symposium (IUS) 2018, Kobe, Japan, 22–25 October 2018; pp. 1–3. [Google Scholar] [CrossRef]
  27. Li, X.; Dong, Z.; Liu, H.; Kang-Mieler, J.J.; Ling, Y.; Gan, Y. Frequency-aware optical coherence tomography image super-resolution via conditional generative adversarial neural network. Biomed. Opt. Express 2023, 14, 5148–5161. [Google Scholar] [CrossRef]
  28. Ling, Y.; Dong, Z.; Li, X.; Gan, Y.; Su, Y. Deep learning empowered highly compressive SS-OCT via learnable spectral–spatial sub-sampling. Opt. Lett. 2023, 48, 1910–1913. [Google Scholar] [CrossRef] [PubMed]
  29. Zhang, Y.; Liu, T.; Singh, M.; Çetintaş, E.; Luo, Y.; Rivenson, Y.; Larin, K.V.; Ozcan, A. Neural network-based image reconstruction in swept-source optical coherence tomography using undersampled spectral data. Light Sci. Appl. 2021, 10, 155. [Google Scholar] [CrossRef] [PubMed]
  30. Wieser, W.; Biedermann, B.R.; Klein, T.; Eigenwillig, C.M.; Huber, R. Multi-megahertz OCT: High quality 3D imaging at 20 million A-scans and 4.5 GVoxels per second. Opt. Express 2010, 18, 14685–14704. [Google Scholar] [CrossRef]
  31. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; Proceedings, Part III 18; Springer International Publishing: Berlin/Heidelberg, Germany, 2021; pp. 234–241. [Google Scholar] [CrossRef]
  32. Jiang, L.; Dai, B.; Wu, W.; Loy, C.C. Focal frequency loss for image reconstruction and synthesis. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 13919–13929. [Google Scholar] [CrossRef]
  33. Zavareh, A.T.; Hoyos, S. Kalman-based real-time functional decomposition for the spectral calibration in swept source optical coherence tomography. IEEE Trans. Biomed. Circuits Syst. 2019, 14, 257–273. [Google Scholar] [CrossRef]
  34. Eigenwillig, C.M.; Biedermann, B.R.; Palte, G.; Huber, R. K-space linear Fourier domain mode locked laser and applications for optical coherence tomography. Opt. Express 2008, 16, 8916–8937. [Google Scholar] [CrossRef] [PubMed]
  35. Azimi, E.; Liu, B.; Brezinski, M.E. Real-time and high-performance calibration method for high-speed swept-source optical coherence tomography. J. Biomed. Opt. 2010, 15, 016005. [Google Scholar] [CrossRef] [PubMed]
  36. Vaswani, A.; Shazeer, N.M.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
  37. Ma, Y.; Chen, X.; Zhu, W.; Cheng, X.; Xiang, D.; Shi, F. Speckle noise reduction in optical coherence tomography images based on edge-sensitive cGAN. Biomed. Opt. Express 2018, 9, 5129–5146. [Google Scholar] [CrossRef]
  38. Ni, G.; Chen, Y.; Wu, R.; Wang, X.; Zeng, M.; Liu, Y. Sm-Net OCT: A deep-learning-based speckle-modulating optical coherence tomography. Opt. Express 2021, 29, 25511–25523. [Google Scholar] [CrossRef] [PubMed]
  39. Liang, K.; Liu, X.; Chen, S.; Xie, J.; Qing Lee, W.; Liu, L.; Kuan Lee, H. Resolution enhancement and realistic speckle recovery with generative adversarial modeling of micro-optical coherence tomography. Biomed. Opt. Express 2020, 11, 7236–7252. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Schematics of the proposed framework containing the Spatial Domain–Convolution Neural Network (SD-CNN) and the Fourier Domain–Convolution Neural Network (FD-CNN); $x_i$ and $x_i'$ are the outputs and $y_i$ and $y_i'$ are the ground truths of the SD-CNN and FD-CNN, respectively.
Figure 2. (a) DL network used as the Fourier Domain–Convolution Neural Network and Spatial Domain–Convolution Neural Network; (b) 1 B-scan and image averaging using 5, 7, 9 B-scans for vein, lemon, and cherry; (c) Wavenumber layer interleaved with the input of Spatial Domain–Convolution Neural Network.
Figure 3. Comparison of k-linearization fringes for different volumes used: vein, finger, lemon, tooth, cherry, flounder-egg, seed (pea).
Figure 4. Comparison of B-scans from five different volumes. (A) Ground truth (seven B-scans of OCT system output averaged), (B) OCT system output, (C) OCT system raw data input, (D) output of the proposed framework. The reconstruction (D) shows high similarity to the desired ground truth.
Figure 5. Comparison of line plots showing the variation in intensity (A.U.) along the central column of the red rectangle marked in Figure 4; the plots correspond to the (a) vein, (b) finger, (c) lemon, (d) tooth, and (e) cherry samples, with the ground truth, input, OCT output, and proposed framework output shown as different lines in each plot.
Figure 6. Comparison between the ground truth, the output of the Spatial Domain–Convolution Neural Network and Fourier Domain–Convolution Neural Network for (a) vein, (b) finger, (c) lemon, (d) tooth, and (e) cherry. For each image, the magnified regions are shown for better comparison. The results of the combined SD-CNN+FD-CNN show enhanced performance when compared to output of only SD-CNN, demonstrating the better reconstruction capability of high-frequency details using FD-CNN.
Figure 7. Comparison of cross-validation results on (a,c,e) flounder egg and (b,d) seed (pea) samples for ground truth, OCT output, and reconstructions obtained by the proposed model. The cross-validation results show robustness and generalization capability of the proposed model on a completely unseen volume.
Table 1. Comparison of PSNR, SSIM, CNR, and β_s scores for the input (degraded input to SD-CNN), the OCT Output (generated by the Optores OCT system), and the reconstructed OCT images from the proposed framework.
| Method | Dataset | PSNR | SSIM | CNR | β_s |
| Input | Overall | 8.94 | 0.08 | - | 0.71 |
| | Vein | 12.76 | 0.14 | 4.44 | 0.68 |
| | Finger | 7.92 | 0.06 | 4.62 | 0.75 |
| | Lemon | 7.14 | 0.05 | 5.69 | 0.64 |
| | Tooth | 10.11 | 0.12 | 4.31 | 0.77 |
| | Cherry | 8.79 | 0.07 | 5.04 | 0.73 |
| OCT Output | Overall | 19.95 | 0.35 | - | 0.87 |
| | Vein | 21.20 | 0.22 | 5.21 | 0.88 |
| | Finger | 19.93 | 0.45 | 3.04 | 0.92 |
| | Lemon | 20.40 | 0.26 | 6.73 | 0.78 |
| | Tooth | 22.11 | 0.56 | 0.75 | 0.93 |
| | Cherry | 17.55 | 0.30 | 3.96 | 0.85 |
| Proposed | Overall | 22.30 | 0.46 | - | 0.93 |
| | Vein | 21.98 | 0.43 | 8.71 | 0.93 |
| | Finger | 21.75 | 0.42 | 4.86 | 0.94 |
| | Lemon | 25.74 | 0.54 | 7.98 | 0.91 |
| | Tooth | 21.61 | 0.32 | 6.48 | 0.92 |
| | Cherry | 21.67 | 0.62 | 4.86 | 0.95 |
Table 2. Cross-validation results.
| Volume | PSNR | SSIM |
| Flounder egg | 22.09 | 0.45 |
| Seed (pea) | 21.53 | 0.42 |
Table 3. Ablation study to compare the effects of the Fourier Domain–Convolution Neural Network (FD-CNN) and the Spatial Domain–Convolution Neural Network (SD-CNN) on reconstruction quality.
| SD-CNN | FD-CNN | SD-CNN + FD-CNN | PSNR Avg | PSNR Std | SSIM Avg | SSIM Std |
| ✓ | - | - | 20.81 | 2.64 | 0.42 | 0.11 |
| - | ✓ | - | 10.97 | 0.80 | 0.03 | 0.01 |
| - | - | ✓ | 22.30 | 2.51 | 0.46 | 0.08 |
Table 4. Time comparison.
Stepwise Time Complexity (s)
| Operation | Time (s) |
| Calibration | 0.008 |
| Resampling in K-domain | 0.17 |
| FFT | 0.05 |
| Averaging (speckle reduction) | 0.07 |

Overall Time Complexity—Volume (s)
| OCT system [30] | Fourier Domain-CNN | Spatial Domain-CNN | Fourier Domain-CNN + Spatial Domain-CNN |
| 792 | 68.55 | 79.723 | 142.5 |

Share and Cite

MDPI and ACS Style

Viqar, M.; Sahin, E.; Stoykova, E.; Madjarova, V. Reconstruction of Optical Coherence Tomography Images from Wavelength Space Using Deep Learning. Sensors 2025, 25, 93. https://doi.org/10.3390/s25010093
