Article

Stepwise Corrected Attention Registration Network for Preoperative and Follow-Up Magnetic Resonance Imaging of Glioma Patients

1 School of Biomedical Engineering, Air Force Medical University, No. 169 Changle West Road, Xi'an 710032, China
2 Shaanxi Provincial Key Laboratory of Bioelectromagnetic Detection and Intelligent Perception, No. 169 Changle West Road, Xi'an 710032, China
3 Department of Neurosurgery, Beijing Tiantan Hospital, Capital Medical University, No. 119 Area A, Nansihuanxi Road, Beijing 100070, China
* Authors to whom correspondence should be addressed.
† These authors contributed equally to this work.
Bioengineering 2024, 11(9), 951; https://doi.org/10.3390/bioengineering11090951
Submission received: 19 July 2024 / Revised: 3 September 2024 / Accepted: 11 September 2024 / Published: 23 September 2024
(This article belongs to the Section Biosignal Processing)

Abstract: The registration of preoperative and follow-up brain MRI, which is crucial in illustrating patients' responses to treatments and providing guidance for postoperative therapy, presents significant challenges. These challenges stem from the considerable deformation of brain tissue and the areas of non-correspondence caused by surgical intervention and postoperative changes. We propose a stepwise corrected attention registration network grounded in convolutional neural networks (CNNs). This methodology takes preoperative and follow-up MRI scans as fixed and moving images, respectively, and employs a multi-level registration strategy that establishes a precise and holistic correspondence between images, from coarse to fine. Furthermore, our model introduces a corrected attention module into the multi-level registration network that generates an attention map at the local level from the deformation fields of the upper-level registration network and from the pathological areas of preoperative images segmented by a mature BraTS algorithm, serving to strengthen the registration accuracy in non-correspondence areas. A comparison between our scheme and the leading approach from MICCAI's BraTS-Reg challenge indicates a 7.5% improvement in the target registration error (TRE) metric and improved visualization of non-correspondence areas. These results demonstrate that our stepwise corrected attention registration network not only enhances registration accuracy but also achieves a more logical representation of non-correspondence areas, contributing significantly to the optimization of the registration of preoperative and follow-up brain MRI.

1. Introduction

Deformable medical image registration is a critical and fundamental task in various clinical applications, such as preoperative surgery planning, image-guided intervention, the monitoring of patients’ responses to treatments, and postoperative therapy [1,2,3,4,5,6,7,8]. It aims to establish a spatial correspondence between medical images from different times, different patients, or different devices by searching for and computing dense and nonlinear deformation fields. The accurate registration of preoperative and follow-up images is pivotal in the whole medical management process and especially in the individual-based treatment of glioma patients. These images serve as critical guides for the assessment of the impact of treatments such as surgical resection and for planning further therapeutic strategies. However, the registration process is fraught with challenges due to the temporally large deformation of brain tissue [9] and the large topological changes caused by disease progression in the brain [10].
Current registration methods struggle to handle registration between paired MRI scans with high variability and complexity in deformation. In particular, brain tissue responds to surgery in increasingly complex ways, producing large and intricate deformations. Moreover, the presence of resection cavities and postoperative changes in brain tissue can lead to significant discrepancies between corresponding anatomic structures, thereby complicating the task of aligning images accurately [11].
Many traditional methods have been proposed in the field of medical image registration over the past few decades, providing solutions to align mismatching medical data and thereby creating possibilities for the longitudinal analysis of disease progression, the evaluation of treatment effects, and other diagnostic procedures. These classical registration approaches, such as elastic matching, the demons approach, the B-spline method, and the SyN method, have achieved remarkable performance in various registration tasks [12,13,14,15]. The fundamental principle of these methods is to iteratively minimize a cost function measuring the discrepancy between image pairs, which is usually a high-dimensional mathematical optimization problem. However, while these methods have good registration performance, they also lead to high complexity and high computational costs.
Recent advancements in deep learning have sparked developments in registration methods, demonstrating notable performance enhancements over traditional algorithmic approaches. Supervised methods were the first to be introduced for the task of image registration [16], where a ground-truth deformation field is necessary for training. However, acquiring ground-truth deformation fields for medical images poses a significant challenge since they require expert annotations, which are labor-intensive, costly, and sometimes impractical. Consequently, many supervised methods resort to using synthetically generated deformations via traditional registration methods as labels for training [17,18]. While this approach provides a means to train models, it limits their registration performance to the accuracy of the traditional methods that are used to generate the labels.
Unsupervised learning approaches, which do not require ground-truth deformation fields for training, instead leverage the inherent features of images to deduce the optimal transformation [19,20,21]. These approaches utilize loss functions that capture the similarity between moving and fixed images, such as mutual information or normalized cross-correlation values, combined with regularization terms that ensure smoothness constraints on the deformation fields. Unsupervised registration methods have the advantage of being more adaptive across different datasets, as they are not confined to the specific deformations within a single training set of a single organ. They can discover plausible deformations, making them especially useful for medical images characterized by diverse and complex pathologies [22]. Moving forward, the integration of unsupervised deep learning strategies is poised to significantly impact the field of medical image registration, offering a powerful tool for the analysis and interpretation of medical images.
Multi-resolution registration approaches are regarded as some of the most sophisticated and effective methods for medical image registration, particularly due to their ability to reconcile the inherent trade-offs between the global and local registration accuracy. By employing multi-resolution image pyramids, these methods harness a systematic coarse-to-fine strategy [10,23,24], where lower-resolution images provide a macroscopic perspective for global alignment, while successive, higher-resolution levels incrementally refine the registration to capture more localized deformations. In practical application, these approaches have been shown to more efficiently achieve the searching and estimation of deformation fields, primarily because the multi-stage process reduces the dimensionality of the problem space at each level, allowing for more rapid convergence. Nevertheless, despite the clear strengths of multi-resolution strategies, they are not without limitations. A notable concern arises with the downsampling methods used to construct the image pyramids, which can lead to a loss of texture and fine details—attributes of the image that are crucial for accurate registration. While wavelet transformations have been employed to address this issue, offering an alternative that preserves more image information through multi-frequency decomposition [25], they do not always work. In particular, wavelet transformations cannot entirely compensate for challenges such as non-correspondence areas within the image pairs that do not have a direct counterpart due to surgical intervention or pathological changes, which are pervasive in preoperative and follow-up brain scans. Moreover, temporal changes in brain tissue texture and structure further compound the difficulty as no method can fabricate information that has been completely lost.
In this work, we propose a stepwise corrected attention method for registration between preoperative and follow-up MRI scans of glioma patients. Our approach takes advantage of a multi-level registration strategy that performs registration from coarse images to fine images, allowing our model to capture complex and large deformations with high efficiency and accuracy. A corrected attention module is also introduced into the stepwise registration network so that the model continuously focuses its attention on learning the deformation of large-deformation areas, which are highly likely to be non-correspondence areas. Our model has been validated on a public dataset and compared with state-of-the-art deformable medical image registration approaches. The main contributions of this work are summarized as follows:
  • Our method is designed as a bidirectional registration framework to address the problem of non-correspondence voxels, avoiding irrational registration results in areas with missing correspondence and balancing the information input into the model from the paired registered images.
  • Our model embraces three levels of stepwise registration, capturing an accurate deformation field at the initial level and refining it at each subsequent level based on the output of the upper-level registration network. Thus, it is capable of generating an accurate and reasonable deformation field.
  • To further eliminate the adverse effects of possible registration biases, our model incorporates a corrected attention module that enhances the model's focus on areas with significant deformation and integrates the clinical prior that the pathological area of the preoperative image should have no corresponding relationship in the follow-up image.

2. Methods

2.1. Problem Statement

Traditional deformable image registration (DIR) aims to find a spatial correspondence between two or more images that capture the complex anatomical transformations of biological tissue. In particular, it satisfies this requirement by registering moving images $I_m \in \mathbb{R}^{L \times W \times D}$ to fixed images $I_f \in \mathbb{R}^{L \times W \times D}$. The calculation process can be expressed via the following optimization problem:
$$\phi^* = \arg\min_{\phi} \, \mathcal{L}_{sim}(I_f, I_m \circ \phi) + \lambda R(\phi)$$
where $\phi \in \mathbb{R}^{L \times W \times D \times 3}$ represents the deformation field that reveals the mapping relationship of voxels between the moving image $I_m$ and the fixed image $I_f$. The term $I_m \circ \phi$ represents the warped image $I_w$ obtained by applying $\phi$ to $I_m$. The loss function $\mathcal{L}_{sim}$ measures the difference between $I_f$ and $I_w$, and the regularization term $R(\phi)$, weighted by a hyperparameter $\lambda$, constrains the optimization. However, preoperative and follow-up images of glioma patients often contain voxels lacking corresponding relationships. In particular, as shown in Figure 1, there are no voxels in the follow-up image corresponding to the tumor tissue of the preoperative image due to surgical resection, and there are no voxels in the preoperative image corresponding to areas where cerebrospinal fluid is present in the follow-up image. Thus, traditional registration methods may lose efficacy, given their assumption of consistent representation across the images being registered. Therefore, we propose our stepwise corrected attention registration network.
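For readers implementing the warp $I_m \circ \phi$, the following PyTorch sketch shows one standard realization using grid_sample; the function name and the voxel-unit displacement convention are our illustrative choices, not the authors' code.

```python
import torch
import torch.nn.functional as F

def warp(moving, flow):
    """Warp a 3D volume (N, C, D, H, W) by a dense displacement field
    flow (N, 3, D, H, W) given in voxel units, i.e. compute I_m o phi."""
    _, _, d, h, w = moving.shape
    zz, yy, xx = torch.meshgrid(
        torch.arange(d), torch.arange(h), torch.arange(w), indexing="ij")
    # grid_sample expects (x, y, z) ordering in the last dimension.
    grid = torch.stack((xx, yy, zz), dim=0).float().to(moving.device)
    coords = grid.unsqueeze(0) + flow
    # Normalize voxel coordinates to [-1, 1] as grid_sample requires.
    for i, size in enumerate((w, h, d)):
        coords[:, i] = 2.0 * coords[:, i] / (size - 1) - 1.0
    coords = coords.permute(0, 2, 3, 4, 1)  # (N, D, H, W, 3)
    return F.grid_sample(moving, coords, align_corners=True)
```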

2.2. Bidirectional Registration Framework

To address the problems mentioned in the above section, our approach utilizes a bidirectional registration framework designed for the unique challenges associated with the registration of preoperative and follow-up medical images, as shown in Figure 2. Our bidirectional framework treats each image as both a fixed image and a moving image. It does not presume the existence of completely equivalent anatomical structures or features in pairs of registered images; instead, it acknowledges that some reciprocal correspondences are non-existent. Inspired by the work of Mok and Chung [26], we utilize a forward–backward consistency constraint to calculate areas of non-correspondence. Specifically, the forward–backward (inverse consistency) error $\delta$ from preoperative to follow-up is defined as
$$\delta(I_f) = \left\| \phi_{one}(I_m) + \phi_{other}\left( I_m + \phi_{one}(I_m) \right) \right\|_2$$
where $\phi_{one}$ and $\phi_{other}$ represent the deformation fields in the forward and backward directions, respectively. We estimate the areas of non-correspondence by checking the consistency of the bidirectional deformation fields. The threshold $\tau$ is defined as follows:
$$\tau = \frac{1}{N_{I_m}} \sum_{I_m} \delta(I_m) + Const$$
where $N_{I_m}$ represents the total number of voxels in the moving image and $Const$ is a constant. We first calculate the average error over the moving image and set $Const$ to 0.015, following Mok and Chung [26], to perform thresholding-based segmentation and obtain a logical non-correspondence area $Mask = \left\{ voxel \in I_f \mid \delta(I_f) > \tau \right\}$. For any voxel, a significant violation of inverse consistency indicates non-correspondence between the two images.
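Under our reading of the two equations above, the consistency check can be sketched as follows; phi_one and phi_other follow the text's naming, the composition is realized with the warp helper from the previous sketch, and the tensor shapes are assumptions.

```python
def non_correspondence_mask(phi_one, phi_other, const=0.015):
    """Estimate non-correspondence voxels from bidirectional displacement
    fields phi_one, phi_other of shape (N, 3, D, H, W). A voxel whose
    forward-backward error exceeds the mean error plus `const` is flagged."""
    # phi_other evaluated at the forward-displaced positions, i.e. the
    # composition phi_other(x + phi_one(x)), obtained by warping the field.
    phi_other_fwd = warp(phi_other, phi_one)
    delta = (phi_one + phi_other_fwd).norm(dim=1, keepdim=True)  # error map
    tau = delta.mean(dim=(2, 3, 4), keepdim=True) + const        # threshold
    return (delta > tau).float()                                 # Mask
```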

2.3. Stepwise Registration Network

The stepwise registration network, which deploys successive convolutional layers configured to capture deformations with high precision at each resolution scale, is key to our registration framework. As shown in Figure 3, our approach employs significantly larger convolution kernels in the initial level of the multi-level registration network. These expansive kernels enable the registration net to apprehend broad deformations across the image volume, providing the possibility to capture the overall spatial transformations and assimilate wide-ranging spatial information, which is crucial for the model to achieve the accurate initial comprehension of the paired registered images.
As the registration process advances into subsequent levels, the size of the convolution kernels is progressively reduced so that the network's focus gradually shifts to finer details, placing a growing emphasis on the relationships between adjacent voxels. By narrowing its scope, the network refines the deformation field, focusing on subtler discrepancies and aligning local structures. Each level of registration, except the initial level, receives the upper level's deformation field, the warped image acquired via spatial transformation, and the attention map output from the corrected attention module, so that our network can refine the deformation field step by step, as sketched below.
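The sketch below illustrates this coarse-to-fine pass under several assumptions: one registration CNN per level (passed in coarse-to-fine order), residual refinement of the upsampled field, and a generic corrected-attention callable. None of these names come from the authors' code.

```python
import torch.nn.functional as F

def stepwise_register(fixed_pyr, moving_pyr, nets, corrected_attention):
    """fixed_pyr / moving_pyr: image pyramids ordered coarse to fine;
    nets: per-level registration CNNs with shrinking kernel sizes."""
    phi = None
    for level, net in enumerate(nets):
        f, m = fixed_pyr[level], moving_pyr[level]
        if phi is None:
            phi = net(f, m)  # initial level: large kernels, global alignment
        else:
            # Upsample the coarser field to this resolution; displacement
            # magnitudes are doubled along with the grid spacing.
            phi = 2.0 * F.interpolate(phi, scale_factor=2, mode="trilinear",
                                      align_corners=True)
            warped = warp(m, phi)                   # spatial transformation
            attn = corrected_attention(phi, level)  # attention map (Sec. 2.4)
            phi = phi + net(f, warped, attn)        # stepwise refinement
    return phi
```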

2.4. Corrected Attention Module

To overcome the negative influence of non-correspondence voxels on registration, our model merges the meticulously designed stepwise attention with a principal network that introduces attention mechanisms to direct the network’s focus towards areas of substantial deformation. The structure of our model embodies the progression of the deformation fields evolving from coarse to fine. Each level of the registration network involves the generation of an attention map derived from the deformation field output from the upper level’s registration net. This allows the multi-level registration network to incorporate continuous attention to guide the registration of areas with large deformation.
A multi-level registration network forms multi-resolution image pyramids using different downsampling schemes, which inevitably leads to weak textures or spatial aliasing, so the deformation field output from the preceding level of the registration net may be biased [25]. In clinical practice, the missing correspondence in the preoperative image should predominantly reside within the pathological area. More specifically, when the follow-up image is designated as the fixed image, the areas of non-correspondence in the preoperative image should, to the greatest extent possible, coincide with the pathological area of the preoperative image. To this end, we employ DMFNet [27] with its provided pre-trained parameters to demarcate the pathological area within the preoperative image, thereby obtaining a segmentation mask $M_p$ for the pathological area of the preoperative image. Similarly, when the preoperative image is used as the fixed image, the deformed non-correspondence area of the follow-up image should not fall outside the pathological area of the preoperative image. Thus, our stepwise attention, after being corrected, can be expressed as follows:
$$A_{cur} = CorrectedAttention\left( Upsample\left( \phi_{upper}, (L_{cur}, W_{cur}, D_{cur}) \right) \right) \odot M_p^{cur}$$
where the current level's attention $A_{cur}$ is derived from the upper level's deformation field $\phi_{upper}$, which is upsampled to the current level's size $L_{cur} \times W_{cur} \times D_{cur}$. The architecture of the corrected attention module is shown in Figure 4. In the following step, $A_{cur}$ is corrected by the current level's preoperative pathological area $M_p^{cur}$.
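A minimal PyTorch rendering of this module is given below; the choice of a single 3 × 3 × 3 convolution with a sigmoid gate, and the element-wise multiplication by the pathology mask, are our assumptions about the unspecified internals.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CorrectedAttention(nn.Module):
    """Sketch: upsample the upper level's field, map it to a one-channel
    attention map, and correct it with the preoperative pathology mask."""
    def __init__(self, in_channels=3):
        super().__init__()
        self.conv = nn.Conv3d(in_channels, 1, kernel_size=3, padding=1)

    def forward(self, phi_upper, pathology_mask, out_size):
        # Upsample(phi_upper, (L_cur, W_cur, D_cur))
        phi = F.interpolate(phi_upper, size=out_size, mode="trilinear",
                            align_corners=True)
        attn = torch.sigmoid(self.conv(phi))  # A_cur
        return attn * pathology_mask          # corrected by M_p^cur
```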

2.5. Loss Function

The loss function of the proposed network is defined as follows:
$$\mathcal{L} = \mathcal{L}_{similarity}(I_f, I_w, Mask) + \lambda \mathcal{L}_{regularization}(\phi)$$
where $\mathcal{L}_{similarity}(I_f, I_w, Mask)$ evaluates the similarity loss of registration quality between the fixed image $I_f$ and the warped image $I_w$, excluding the non-correspondence area calculated by the forward–backward consistency constraint. This term aims to minimize the dissimilarity of corresponding areas between the two images. Specifically, we use normalized cross-correlation (NCC) as the similarity metric, given its effectiveness for image registration tasks [21]. The loss is calculated as follows:
$$\mathcal{L}_{similarity}(I_f, I_w, Mask) = \mathrm{NCC}(I_f, I_w) \times (1 - Mask)$$
$\mathcal{L}_{regularization}(\phi)$ is the regularization loss that ensures the smoothness of the deformation field $\phi$. This is typically implemented as a penalty on the gradients of the deformation field. The regularization loss can be expressed as follows:
$$\mathcal{L}_{regularization}(\phi) = \sum_{x} \left\| \nabla \phi(x) \right\|^2$$
The parameter λ controls the balance between the similarity and regularization terms, ensuring that both the registration accuracy and the smoothness of the deformation field are appropriately weighted during model training.
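The following sketch assembles the full loss; the windowed (local) NCC is a common VoxelMorph-style implementation [21], and the window size of 9 and the negation of the NCC term inside the loss are our assumptions.

```python
import torch
import torch.nn.functional as F

def masked_ncc_loss(fixed, warped, mask, win=9):
    """Local NCC between I_f and I_w, zeroed inside the non-correspondence
    Mask; all inputs are (N, 1, D, H, W)."""
    k = torch.ones(1, 1, win, win, win, device=fixed.device)
    pad, n = win // 2, win ** 3
    s = lambda x: F.conv3d(x, k, padding=pad)  # local box sums
    f, w = fixed, warped
    cross = s(f * w) - s(f) * s(w) / n
    f_var = s(f * f) - s(f) ** 2 / n
    w_var = s(w * w) - s(w) ** 2 / n
    ncc = cross * cross / (f_var * w_var + 1e-5)  # squared local NCC
    return -(ncc * (1.0 - mask)).mean()           # exclude masked voxels

def smoothness_loss(phi):
    """Penalty on spatial gradients of the deformation field."""
    dz = phi[:, :, 1:] - phi[:, :, :-1]
    dy = phi[:, :, :, 1:] - phi[:, :, :, :-1]
    dx = phi[:, :, :, :, 1:] - phi[:, :, :, :, :-1]
    return (dz ** 2).mean() + (dy ** 2).mean() + (dx ** 2).mean()

def total_loss(fixed, warped, mask, phi, lam=0.1):
    return masked_ncc_loss(fixed, warped, mask) + lam * smoothness_loss(phi)
```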

3. Experiments and Results

3.1. Dataset

The experimental validation of our registration model was conducted on the Brain Tumor Sequence Registration (BraTS-Reg) public dataset, which is intended to establish a benchmark environment for deformable registration algorithms [28]. This multi-institutional dataset comprises 140 pairs of multi-modal magnetic resonance imaging (MRI) scans, all from cases diagnosed with glioma and clinically scanned using a multi-parametric MRI acquisition protocol.
Furthermore, all images in the dataset were pre-processed via skull-stripping to extract the brain tissue and resampled to a standardized size of 240 × 240 × 155 voxels at an isotropic 1 mm spatial resolution. This uniformity in spatial dimensions ensures consistency across the dataset, facilitating the direct applicability and comparability of algorithms without introducing bias from pre-processing steps.

3.2. Experimental Details

The experiment was conducted using Python 3.9 as the programming language and PyTorch as the deep learning framework. In addition to PyTorch, we utilized several other Python libraries: NumPy for numerical operations, scikit-learn for five-fold cross-validation, and NiBabel for neuroimaging data processing. These libraries played a crucial role in facilitating the data handling and model implementation processes.
We resized all images to 160 × 160 × 80 to facilitate multi-level registration within the network, which is essential for optimizing the data for our specific model architecture and processing pipeline. We employed an NVIDIA RTX 4080 GPU for model training, running 1000 epochs with a batch size of 1, which was determined based on the available GPU memory. The parameter λ was set to 0.1. During model training, we used the Adam optimizer with an initial learning rate of 0.0001.
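For concreteness, the configuration above can be sketched as follows; StepwiseCorrectedAttentionNet and pairs are hypothetical placeholders for the network class and the list of image-pair paths, and the min-max normalization is an illustrative choice.

```python
import nibabel as nib
import torch
import torch.nn.functional as F

def load_volume(path, size=(160, 160, 80)):
    """Load a NIfTI scan with NiBabel and resize it to the network input."""
    vol = nib.load(path).get_fdata()
    vol = torch.from_numpy(vol).float()[None, None]  # (1, 1, D, H, W)
    vol = F.interpolate(vol, size=size, mode="trilinear", align_corners=True)
    return (vol - vol.min()) / (vol.max() - vol.min() + 1e-8)

model = StepwiseCorrectedAttentionNet().cuda()  # hypothetical model class
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

for epoch in range(1000):
    for pre_path, follow_path in pairs:  # one image pair per step (batch 1)
        fixed = load_volume(pre_path).cuda()
        moving = load_volume(follow_path).cuda()
        phi = model(fixed, moving)                            # forward field
        mask = non_correspondence_mask(phi, model(moving, fixed))
        loss = total_loss(fixed, warp(moving, phi), mask, phi, lam=0.1)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```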

3.3. Evaluation Metrics

We utilized the target registration error (TRE) to evaluate our registration model. The BraTS-Reg dataset provides approximately 10 pairs of expertly annotated landmarks per patient in both preoperative and follow-up MR scans. These landmarks, labeled to reflect the invariant anatomy despite the presence of a pathology, provide a ground-truth correspondence that is indispensable for the quantitative computation of the TRE. The TRE is a widely used performance metric for landmark-based registration tasks; it measures the average Euclidean distance between the landmarks in the fixed image and the corresponding landmarks in the warped image. The TRE can be expressed as
$$\mathrm{TRE} = \frac{1}{num} \sum_{i=1}^{num} \sqrt{ (x_f - x_w)^2 + (y_f - y_w)^2 + (z_f - z_w)^2 }$$
where $(x_f, y_f, z_f)$ are the coordinates of the landmarks in the fixed image and $(x_w, y_w, z_w)$ are the coordinates of the corresponding landmarks in the warped image. The term $num$ represents the total number of landmarks.
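Given landmark arrays, the TRE reduces to a few lines of NumPy; the (num, 3) millimeter-coordinate layout below is an assumption about how the annotations are stored.

```python
import numpy as np

def target_registration_error(fixed_pts, warped_pts):
    """Mean Euclidean distance between corresponding landmarks;
    both arrays have shape (num, 3), in mm."""
    return float(np.mean(np.linalg.norm(fixed_pts - warped_pts, axis=1)))

# Tiny check: each landmark pair is 1 mm apart, so TRE = 1.0.
f = np.array([[10.0, 20.0, 30.0], [40.0, 50.0, 60.0]])
w = np.array([[11.0, 20.0, 30.0], [40.0, 51.0, 60.0]])
print(target_registration_error(f, w))  # 1.0
```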

3.4. Comparative Experiment

3.4.1. Experiment Design

To evaluate the performance of our proposed method, comparisons with various state-of-the-art deformable registration algorithms were performed, including SyN, a widely used registration method implemented in the ANTs package [15]; VoxelMorph, a deep-learning-based single-level registration method [21]; and DIRAC, a deep-learning-based multi-level registration method that ranked first in a previous BraTS-Reg challenge [26] and was set as the baseline. For SyN registration, we used NCC with a sampling radius of 3 and multi-resolution optimization with three scales and 1000, 200, and 50 iterations. For VoxelMorph registration, we used the official implementation to build the model, chose NCC as the similarity loss and the smoothness of the deformation field as the regularization loss, set the learning rate to 0.0001, and trained the model with the Adam optimizer for 1000 epochs. For DIRAC registration, we built the model with the same loss function as the official implementation and used the same optimization method with 1300 epochs. We divided the dataset into training, validation, and test sets at a ratio of 8:1:1 for all of the tested registration methods.
To further verify the generalization capability of our proposed method, we conducted a five-fold cross-validation experiment, comparing our method with the baseline. Firstly, the entire dataset was randomly divided into five equal folds. Each fold contained an equal number of cases, ensuring that the distribution of data was consistent across all folds. For each of the five-fold cross-validation tasks in this experiment, four of the folds were used as the training set, and the remaining fold was used as the validation set. This process was repeated five times, each time with a different fold serving as the validation set. This method ensured that each case in the dataset was used for both training and validation, providing a robust assessment of the model’s performance. After each fold’s training, the model’s performance was evaluated on the validation set, and the evaluation metrics were calculated for the TRE. The results from the five validation sets were then averaged to produce a final performance metric, offering a comprehensive assessment of the model’s generalization capability across different subsets of the dataset.
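The five-fold protocol maps directly onto scikit-learn's KFold, as sketched below; train_and_evaluate is a placeholder for a full training run that returns the validation TRE.

```python
import numpy as np
from sklearn.model_selection import KFold

cases = np.arange(140)  # indices of the 140 image pairs
kf = KFold(n_splits=5, shuffle=True, random_state=0)

fold_tres = []
for fold, (train_idx, val_idx) in enumerate(kf.split(cases)):
    # Train on four folds, validate on the held-out fold.
    tre = train_and_evaluate(cases[train_idx], cases[val_idx])  # placeholder
    fold_tres.append(tre)

print("mean TRE over folds:", float(np.mean(fold_tres)))
```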

3.4.2. Results and Analysis

Figure 5 shows examples of the registration results obtained with the different methods; the results visualized in the red box show that our method and DIRAC obtained registration results more similar to the fixed image. Table 1 reports the quantitative TRE results for the test dataset, highlighting the best value in each case in bold and listing the mean and standard deviation (SD) for each method to offer a comprehensive assessment of registration performance. Our model achieved the best registration accuracy among all compared methods, whereas SyN and VoxelMorph exhibited significantly higher TRE values because they could not handle the considerably large deformations caused by areas of non-correspondence.
As seen in Table 1, our proposed method consistently outperformed the other methods across most cases, particularly those with significant deformation (original TRE > 3 mm), where it achieved the best performance among all of the methods. In cases with smaller deformations (original TRE < 3 mm), our method performed comparably to DIRAC, showing the robustness of our approach under varying conditions of deformation. The mean TRE across all cases for our method is 1.85 mm, with a standard deviation of 0.83 mm, indicating that our approach not only achieves the lowest mean TRE but also maintains consistent performance across different cases; this consistency is crucial for practical applications where reliable and accurate registration is essential. Overall, our method improves performance by 7.5% compared to DIRAC, highlighting its effectiveness in the registration of preoperative and follow-up images of glioma patients.
In summary, both the quantitative results and the visual results clearly demonstrate the superiority of our method in terms of both accuracy and consistency, making it a robust choice for image registration tasks, particularly in challenging cases involving large deformations.
As shown in Table 2, our method achieved better registration performance compared with the baseline. Figure 6 illustrates a comparison between the visual registration results achieved with DIRAC and those achieved with our proposed method. The areas in red represent non-correspondence areas calculated by both approaches, demonstrating that our model achieved more rational and interpretable visual results in terms of the non-correspondence areas. Additionally, we present the attention maps, which demonstrate that our corrected attention module effectively enhanced the registration network’s attention to areas with significant structural differences and pathological features. A comparative analysis was then conducted between the baseline and our proposed method, focusing on both the mean performance across folds and the variability to assess consistency and robustness. The averaged results from the five folds provide an overall evaluation of how well our method generalizes to unseen data compared to the baseline. Any significant differences in performance metrics between the methods were analyzed to highlight the strengths and weaknesses of each approach.

3.5. Ablation Experiments

In order to further demonstrate the effectiveness of the stepwise registration network and corrected attention module in the proposed model, we conducted ablation experiments. The results are shown in Table 3, and we can see that both the stepwise registration network and corrected attention module achieve effective improvements in performance. Additionally, we also performed ablation experiments based on the number of epochs and batch size. As shown in Figure 7, the results revealed that the best registration performance was achieved at 1000 epochs, while the batch size had a minimal impact on the results. Consequently, we selected a batch size of 1 for our model based on our GPU’s memory capacity.

4. Discussion

The registration of preoperative and follow-up MRI scans of glioma patients is challenging due to the existence of bidirectional non-correspondence areas and deformation areas large in volume. Our experimental validation demonstrated that our proposed method is able to (1) capture deformations accurately and refine the deformation field gradually, (2) introduce prior clinical knowledge into network training, and (3) perform better in registration tasks and achieve more rational visual results for non-correspondence areas.
Utilizing a bidirectional registration network is essential for capturing the complex transformations between preoperative and follow-up images, which both contain voxels lacking corresponding relationships. This approach mitigates the inherent limitations of traditional unidirectional registration by enabling an integrative perspective that accounts for deformations and non-correspondences in both imaging directions. Bidirectionality not only provides richer information for the refinement of the registration performance but also facilitates a more balanced and accurate assessment of the transformation consistency. Consequently, this strategy significantly improves the robustness of registration, which is required for precise clinical analyses and treatment planning.
By using the stepwise registration procedure, our model can generate accurate deformation fields at the initial level of the multi-level registration network and provide a precise deformation field for the subsequent levels of the multi-level registration network. This stepwise registration strategy provides a pivotal advantage over conventional multi-resolution registration methods in that it mitigates the risk of initial errors becoming magnified across the following levels of the registration. While traditional methods might compound the initial registration bias through successive resolution levels, our stepwise registration network avoids or reduces these biases initially and consequently prevents them from escalating as the registration network deepens. Our network delicately balances the capture of global and local deformation features, which results in more reliable and precise registration.
In order to reduce the adverse effects of non-correspondence areas during multi-level registration, attention must be paid to these areas. Our method applies an attention map when generating a higher-resolution deformation field so that large-deformation areas can be registered more accurately. Introducing the prior knowledge of surgical resection in the pathological area constrains the model when calculating the missing-correspondence area, which can improve the treatment of missing-correspondence areas in clinical practice.
Our method improves the registration performance between preoperative and follow-up MRI scans of glioma patients. However, some limitations exist in this study. Firstly, the reliance on a sparse set of landmarks, most of which were situated at a considerable distance from the tumor, hindered our ability to comprehensively assess the registration accuracy, particularly for areas close to the non-correspondence areas [29]. Furthermore, to the best of our knowledge, no established metric currently exists for evaluating the visual outcomes of medical image registration tasks involving missing correspondence. This is due to the lack of a ground truth for deformation fields or masks delineating non-correspondence areas. In response to these limitations, our future work will focus on enhancing the robustness and reliability of evaluations of medical image registration outcomes by generating synthetic data endowed with gold-standard deformation fields.

5. Conclusions

In this work, we have proposed a novel stepwise registration network complemented by a corrected attention module that is capable of learning large deformations with bidirectional non-correspondence areas, and it is completely unsupervised. This end-to-end framework is developed with the objective of mitigating or obviating the registration inaccuracy prompted by non-correspondence voxels, thereby enhancing the registration accuracy in normal brain tissue. By accurately capturing deformations and pinpointing areas of non-correspondence at the initial level of the registration network, our method effectively circumvents the biases in traditional multi-scale registration strategies. Additionally, our proposed corrected attention module is capable of refining the registration of non-correspondence areas and their surrounding areas. The experimental results demonstrate that our method leads to improved performance in both the TRE metric and the visualization of non-correspondence areas. In the future, we will focus on the generation of synthetic data annotated with gold-standard deformation field labels and the reasonable evaluation of non-correspondence areas. Through these advancements, we aspire to increase the utility and applicability of our technique in the realm of clinical practice.

Author Contributions

Conceptualization, Y.F. and Y.Z.; methodology, Y.F. and Y.Z.; software, Y.F.; validation, Y.F., Y.Z. and D.H.; formal analysis, D.H., Y.W. and Y.L.; investigation, Y.Z. and J.W.; resources, J.W. and T.L.; writing—original draft preparation, Y.F.; writing—review and editing, Y.F. and Y.L.; visualization, Y.F.; supervision, Y.W. and Y.L.; project administration, Y.W. and Y.L.; funding acquisition, Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China under grant nos. 8247205, 82072786, and 62403473.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

This study’s data are publicly available.

Conflicts of Interest

The authors declare no competing interests.

Abbreviations

The following abbreviations are used in this manuscript:
MRI: magnetic resonance imaging;
CNN: convolutional neural network;
BraTS: Brain Tumor Segmentation;
MICCAI: Medical Image Computing and Computer-Assisted Intervention Society;
BraTS-Reg: Brain Tumor Sequence Registration;
TRE: target registration error;
DIR: deformable image registration;
NCC: normalized cross-correlation;
SD: standard deviation.

References

1. Yang, X.; Ghafourian, P.; Sharma, P.; Salman, K.; Martin, D.; Fei, B. Nonrigid registration and classification of the kidneys in 3D dynamic contrast enhanced (DCE) MR images. In Proceedings of the Medical Imaging 2012: Image Processing, San Diego, CA, USA, 4–9 February 2012; Haynor, D.R., Ourselin, S., Eds.; International Society for Optics and Photonics, SPIE: Bellingham, WA, USA, 2012; Volume 8314, p. 83140B.
2. Anzidei, M.; Argiro, R.; Porfiri, A.; Boni, F.; Anile, M.; Zaccagna, F.; Vitolo, D.; Saba, L.; Napoli, A.; Leonardi, A.; et al. Preliminary clinical experience with a dedicated interventional robotic system for CT-guided biopsies of lung lesions: A comparison with the conventional manual technique. Eur. Radiol. 2015, 25, 1310–1316.
3. Tekchandani, H.; Verma, S.; Londhe, N.D.; Jain, R.R.; Tiwari, A. Computer aided diagnosis system for cervical lymph nodes in CT images using deep learning. Biomed. Signal Process. Control 2022, 71, 103158.
4. Krilavicius, T.; Zliobaite, I.; Simonavicius, H.; Jarusevicius, L. Predicting respiratory motion for real-time tumour tracking in radiotherapy. arXiv 2015, arXiv:1508.00749.
5. Andersen, E.S.; Østergaard Noe, K.; Sørensen, T.S.; Nielsen, S.K.; Fokdal, L.; Paludan, M.; Lindegaard, J.C.; Tanderup, K. Simple DVH parameter addition as compared to deformable registration for bladder dose accumulation in cervix cancer brachytherapy. Radiother. Oncol. 2013, 107, 52–57.
6. Han, R.; Jones, C.; Lee, J.; Wu, P.; Vagdargi, P.; Uneri, A.; Helm, P.; Luciano, M.; Anderson, W.; Siewerdsen, J. Deformable MR-CT image registration using an unsupervised, dual-channel network for neurosurgical guidance. Med. Image Anal. 2022, 75, 102292.
7. Brock, K.K.; Mutic, S.; McNutt, T.R.; Li, H.; Kessler, M.L. Use of image registration and fusion algorithms and techniques in radiotherapy: Report of the AAPM Radiation Therapy Committee Task Group No. 132. Med. Phys. 2017, 44, e43–e76.
8. Velec, M.; Moseley, J.L.; Eccles, C.L.; Craig, T.; Sharpe, M.B.; Dawson, L.A.; Brock, K.K. Effect of Breathing Motion on Radiotherapy Dose Accumulation in the Abdomen Using Deformable Registration. Int. J. Radiat. Oncol. Biol. Phys. 2011, 80, 265–272.
9. Huang, Y.; Ahmad, S.; Fan, J.; Shen, D.; Yap, P.T. Difficulty-aware hierarchical convolutional neural networks for deformable registration of brain MR images. Med. Image Anal. 2021, 67, 101817.
10. Mok, T.C.W.; Chung, A.C.S. Large Deformation Diffeomorphic Image Registration with Laplacian Pyramid Networks. In Proceedings of the Medical Image Computing and Computer Assisted Intervention—MICCAI 2020, Lima, Peru, 4–8 October 2020; Martel, A.L., Abolmaesumi, P., Stoyanov, D., Mateus, D., Zuluaga, M.A., Zhou, S.K., Racoceanu, D., Joskowicz, L., Eds.; Springer: Cham, Switzerland, 2020; pp. 211–221.
11. Ginat, D.T.; Schaefer, P.W.; Moisi, M.D. Imaging the Intraoperative and Postoperative Brain. In Atlas of Postsurgical Neuroradiology: Imaging of the Brain, Spine, Head, and Neck; Ginat, D.T., Westesson, P.L.A., Eds.; Springer International Publishing: Cham, Switzerland, 2017; pp. 183–258.
12. Bajcsy, R.; Kovačič, S. Multiresolution elastic matching. Comput. Vis. Graph. Image Process. 1989, 46, 1–21.
13. Thirion, J.P. Image matching as a diffusion process: An analogy with Maxwell's demons. Med. Image Anal. 1998, 2, 243–260.
14. Rueckert, D.; Sonoda, L.; Hayes, C.; Hill, D.; Leach, M.; Hawkes, D. Nonrigid registration using free-form deformations: Application to breast MR images. IEEE Trans. Med. Imaging 1999, 18, 712–721.
15. Avants, B.; Epstein, C.; Grossman, M.; Gee, J. Symmetric diffeomorphic image registration with cross-correlation: Evaluating automated labeling of elderly and neurodegenerative brain. Med. Image Anal. 2008, 12, 26–41.
16. Han, X.; Yang, X.; Aylward, S.; Kwitt, R.; Niethammer, M. Efficient registration of pathological images: A joint PCA/image-reconstruction approach. In Proceedings of the 2017 IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017), Melbourne, VIC, Australia, 18–21 April 2017; pp. 10–14.
17. Sokooti, H.; de Vos, B.; Berendsen, F.; Lelieveldt, B.P.F.; Išgum, I.; Staring, M. Nonrigid Image Registration Using Multi-scale 3D Convolutional Neural Networks. In Proceedings of the Medical Image Computing and Computer Assisted Intervention—MICCAI 2017, Quebec City, QC, Canada, 11–13 September 2017; Descoteaux, M., Maier-Hein, L., Franz, A., Jannin, P., Collins, D.L., Duchesne, S., Eds.; Springer: Cham, Switzerland, 2017; pp. 232–239.
18. Eppenhof, K.A.J.; Pluim, J.P.W. Pulmonary CT Registration Through Supervised Learning With Convolutional Neural Networks. IEEE Trans. Med. Imaging 2019, 38, 1097–1105.
19. de Vos, B.D.; Berendsen, F.F.; Viergever, M.A.; Sokooti, H.; Staring, M.; Išgum, I. A deep learning framework for unsupervised affine and deformable image registration. Med. Image Anal. 2019, 52, 128–143.
20. Kim, B.; Kim, D.H.; Park, S.H.; Kim, J.; Lee, J.G.; Ye, J.C. CycleMorph: Cycle consistent unsupervised deformable image registration. Med. Image Anal. 2021, 71, 102036.
21. Balakrishnan, G.; Zhao, A.; Sabuncu, M.R.; Guttag, J.; Dalca, A.V. VoxelMorph: A Learning Framework for Deformable Medical Image Registration. IEEE Trans. Med. Imaging 2019, 38, 1788–1800.
22. Dalca, A.V.; Balakrishnan, G.; Guttag, J.; Sabuncu, M.R. Unsupervised Learning for Fast Probabilistic Diffeomorphic Registration. In Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2018; pp. 729–738.
23. Fan, J.; Cao, X.; Yap, P.T.; Shen, D. BIRNet: Brain image registration using dual-supervised fully convolutional networks. Med. Image Anal. 2019, 54, 193–206.
24. Hering, A.; van Ginneken, B.; Heldmann, S. mlVIRNET: Multilevel Variational Image Registration Network. In Proceedings of the Medical Image Computing and Computer Assisted Intervention—MICCAI 2019, Shenzhen, China, 13–17 October 2019; Shen, D., Liu, T., Peters, T.M., Staib, L.H., Essert, C., Zhou, S., Yap, P.T., Khan, A., Eds.; Springer: Cham, Switzerland, 2019; pp. 257–265.
25. Che, T.; Wang, X.; Zhao, K.; Zhao, Y.; Zeng, D.; Li, Q.; Zheng, Y.; Yang, N.; Wang, J.; Li, S. AMNet: Adaptive multi-level network for deformable registration of 3D brain MR images. Med. Image Anal. 2023, 85, 102740.
26. Mok, T.C.W.; Chung, A.C.S. Unsupervised Deformable Image Registration with Absent Correspondences in Pre-operative and Post-Recurrence Brain Tumor MRI Scans. arXiv 2022, arXiv:2206.03900.
27. Chen, C.; Liu, X.; Ding, M.; Zheng, J.; Li, J. 3D Dilated Multi-Fiber Network for Real-time Brain Tumor Segmentation in MRI. In Proceedings of the International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), Shenzhen, China, 13–17 October 2019; Shen, D., Liu, T., Peters, T.M., Staib, L.H., Essert, C., Zhou, S., Yap, P.T., Khan, A., Eds.; Springer: Cham, Switzerland, 2019; pp. 257–265.
28. Baheti, B.; Chakrabarty, S.; Akbari, H.; Bilello, M.; Wiestler, B.; Schwarting, J.; Calabrese, E.; Rudie, J.; Abidi, S.; Mousa, M.; et al. The Brain Tumor Sequence Registration (BraTS-Reg) Challenge: Establishing Correspondence Between Pre-Operative and Follow-up MRI Scans of Diffuse Glioma Patients. arXiv 2024, arXiv:2112.06979.
29. Waldmannstetter, D.; Wiestler, B.; Schwarting, J.; Ezhov, I.; Metz, M.; Bakas, S.; Baheti, B.; Chakrabarty, S.; Kirschke, J.S.; Heckemann, R.A.; et al. Framing image registration as a landmark detection problem for better representation of clinical relevance. arXiv 2023, arXiv:2308.01318.
Figure 1. Different corresponding relationships in the brain MRI scans of glioma patients before and after surgery: red points have a corresponding relationship in both directions, the green point has no corresponding relationship from preoperative to follow-up, and the yellow point has no corresponding relationship from follow-up to preoperative.
Figure 2. The bidirectional registration framework, wherein preoperative and follow-up images (denoted by the red and green blocks) are input as fixed and moving images, respectively. The non-correspondence areas can be located by calculating the forward–backward consistency constraint on the bidirectional deformation field.
Figure 3. Overview of the proposed stepwise registration network, where $I_{f3}$, $I_{f2}$, $I_{f1}$, $I_{m1}$ represent the different resolutions in the interpolation of the registered images; $\phi_3$, $\phi_2$, $\phi_1$ represent the deformation fields output from each level of the registration network; and $W_3$, $W_2$, $W_1$ represent the warped images of each level via spatial transformation. The different levels of the registration network have similar network architectures but different convolutional processes, so that our network can capture deformation features globally and then refine them locally.
Figure 4. The designed corrected attention module, where the size of the input data changes with the transformation of the registration network, denoted as $L \times W \times D$ and $\frac{L}{2} \times \frac{W}{2} \times \frac{D}{2}$, respectively.
Figure 5. Registration results between preoperative and follow-up MRI scans with SyN, VoxelMorph, DIRAC, and our method. The registration results in the red box show that our method and DIRAC demonstrate better registration performance.
Figure 6. Visual results for the non-correspondence areas. Areas colored in red represent non-correspondence areas. In this paper, we compare our model with DIRAC, a method that ranked first in a previous BraTS-Reg challenge.
Figure 7. Quantitative evaluation of the impact of different numbers of epochs and batch sizes on registration performance. These results demonstrate that the optimal performance was achieved at 1000 epochs, while the batch size had a minimal impact on the results.
Table 1. Quantitative evaluation of the target registration error (TRE) in millimeters for different registration methods. This table presents the results across various cases, with the best-performing method in each row highlighted in bold. The mean and standard deviation of the TRE values are also shown for each method in order to provide an overall assessment of performance.

| Origin TRE | Case | Origin | SyN | VoxelMorph | DIRAC | Ours |
|---|---|---|---|---|---|---|
| <3 mm | 1 | 2.61 | 5.95 | 4.00 | 1.58 | **1.57** |
| | 2 | 1.61 | 14.77 | 2.56 | **0.79** | 1.03 |
| | 3 | 2.82 | 6.99 | 3.91 | **1.26** | 1.28 |
| | 4 | 1.62 | 1.70 | 2.45 | 1.15 | **0.74** |
| | 5 | 1.47 | 1.61 | 2.25 | 1.43 | **1.40** |
| | 6 | 2.54 | 5.63 | 4.19 | **1.59** | 1.76 |
| >3 mm | 7 | 3.51 | 8.41 | 5.37 | 1.93 | **1.35** |
| | 8 | 4.17 | 4.45 | 6.44 | 3.21 | **2.79** |
| | 9 | 3.77 | 6.60 | 9.21 | 2.89 | **2.83** |
| | 10 | 4.06 | 8.88 | 10.73 | 1.31 | **1.24** |
| | 11 | 6.56 | 14.83 | 10.68 | 2.44 | **2.21** |
| | 12 | 12.69 | 20.91 | 20.97 | 3.84 | **3.79** |
| | 13 | 19.65 | 42.18 | 26.81 | 2.51 | **1.93** |
| | 14 | 5.12 | 15.41 | 8.45 | **2.05** | **2.05** |
| Mean ± SD | | 5.16 ± 5.06 | 11.31 ± 10.05 | 8.43 ± 7.26 | 2.00 ± 0.88 | **1.85 ± 0.83** |
Table 2. Quantitative evaluation of TRE (in mm) in the five-fold cross-validation experiment.

| Fold | DIRAC | Ours |
|---|---|---|
| 1 | 2.79 | 2.73 |
| 2 | 2.50 | 2.51 |
| 3 | 2.39 | 2.36 |
| 4 | 2.95 | 2.87 |
| 5 | 2.47 | 2.20 |
| Average | 2.62 | 2.53 |
Table 3. Quantitative evaluation of TRE (in mm) in the ablation experiments. These results demonstrate the effectiveness of both the stepwise registration network and the corrected attention module, with both contributing to improved performance.

| Stepwise Registration Network | Corrected Attention Module | TRE Result (mm) |
|---|---|---|
| ✓ | | 1.95 |
| | ✓ | 1.92 |
| ✓ | ✓ | 1.85 |