1. Introduction
With the advancement of manufacturing industries and artificial intelligence technology, industrial vision technology continues to innovate and expand its applications, maintaining computer vision [1] as a prominent research domain. A critical challenge lies in extracting effective information from images for various applications, including autonomous navigation [2,3,4,5,6], autonomous driving [7,8,9], and robotics [10,11]. Image information extraction primarily encompasses two aspects: feature enhancement and feature matching. Feature enhancement algorithms focus on improving image brightness and contrast while reducing noise to achieve enhanced features. These algorithms include deep learning-based methods [12], retinex-based approaches [13], image inversion techniques [14], dehazing algorithms [15], and histogram equalization (HE) methods [16]. Feature matching algorithms aim to extract feature points from image pairs, establish visual descriptors, and complete feature matching based on optimal path determination. While traditional feature matching algorithms include SIFT [17], ORB [18], and SURF [19], more recent developments, such as the SuperGlue algorithm proposed by Sarlin P. E. et al. [20] in 2020, incorporate deep learning networks. As a fundamental task in machine vision, feature matching quality directly influences the accuracy of applications in autonomous navigation, self-driving vehicles, and robotics. Consequently, recent years have witnessed an increasing research focus on enhancing the accuracy of feature matching algorithms.
Over the years, researchers have invested considerable effort in enhancing the efficiency and accuracy of feature matching algorithms. In 2019, Yu Chen [21] developed a mobile platform visual matching method that integrated the FAST and BRIEF algorithms while incorporating an evaluation function. This approach improved the accuracy of conventional ORB feature matching and effectively addressed motion blur issues, though challenges persisted in matching features under extremely low-light conditions. Wang Beiyi [22] introduced an innovative image matching algorithm in 2021 that combined SURF with the Fast Library for Approximate Nearest Neighbors (FLANN). This algorithm employed Fast-Hessian for feature point detection and combined SURF and FLANN with RANSAC to eliminate outliers, thereby enhancing both the accuracy and real-time performance of feature matching. However, its effectiveness was primarily demonstrated under normal lighting conditions. A significant breakthrough came in 2020 when Paul-Edouard Sarlin [20] introduced the SuperGlue matching algorithm, which achieved remarkable matching accuracy in both indoor and outdoor scenarios. The algorithm comprises two primary modules: an attentional graph neural network and an optimal matching layer. It utilizes SuperPoint for feature point extraction and descriptor establishment, combining these with the attentional graph neural network for iterative feature point aggregation. This process enhances feature point quality before processing through the optimal matching layer, which computes the score matrix and employs the classical Sinkhorn algorithm for image matching. While SuperGlue demonstrated superior performance across various conditions, including moderate low-light scenarios, it still faced limitations in extremely dark environments. Building upon these developments, Jiaming Sun [23] proposed the LoFTR algorithm in 2021, inspired by SuperGlue's attention mechanism. LoFTR employs self-attention and cross-attention within a transformer framework to generate feature descriptors for matching image pairs, establishing initial pixel-level dense matches at a coarse level before refining the assignments at finer scales.
In 2022, Qing Wang [24] advanced the field further with MatchFormer, a novel hierarchical feature extraction and matching transformer. By incorporating attention mechanisms throughout a hierarchical encoder and combining self-attention and cross-attention on multi-scale features, MatchFormer achieved enhanced matching robustness. In the same year, Jiayuan Li [25] introduced the LNIFT algorithm for spatial-domain multimodal feature matching. This approach first employs normalized filters for feature point detection and description, then detects oriented FAST and ORB keypoints on the normalized images, improving keypoint distribution through adaptive non-maximum suppression (ANMS). Furthermore, LNIFT utilizes descriptors such as histograms of oriented gradients (HOG) to characterize keypoints on the normalized images, resulting in improved matching accuracy. In 2023, Lindenberger P. et al. [26] proposed a lightweight and fast matching algorithm called LightGlue. The algorithm features a lightweight network head and improves matching speed by dynamically reducing the number of layers according to the difficulty of the image pair and by early elimination of unmatchable points.
In 2024, Yongxian Zhang et al. [27] proposed a multimodal remote sensing image matching method (MICM) that utilizes learned features and attention mechanisms to address linear radiation errors and geometric distortion. Also in 2024, Bai Zhu et al. [28] proposed a ground-to-space robust matching algorithm (VDFT) based on viewpoint-invariant deformable feature transformation. The algorithm introduces a ResNet-like block that fuses Deformable Convolutional Networks (DCNs) and Depth-wise Separable Convolutions (DSCs), constructing a learnable, deformable feature network to address large viewpoint spans. The authors designed an enhanced joint detection and description strategy that improves feature point localization accuracy and representation capability through multi-level feature concatenation. Finally, the matching model is built using self-attention and cross-attention mechanisms to aggregate feature point information and enhance matching precision. While the algorithm addresses matching with large viewpoint spans, it was developed primarily on normal-light, normal-resolution imagery. The method does not address registration for low-light and heavily blurred images, which limits its applicability in scenarios where high-quality reference images cannot be obtained; it is also constrained by lighting conditions, making high-precision nighttime registration challenging. In the same year, Jiang H. et al. [29] proposed a highly generalizable matching algorithm called OmniGlue, which enhances model generalization by introducing foundation model guidance and keypoint-position attention guidance.
However, in practical applications, the low-light image pairs we obtain have extremely low brightness. To meet the all-day, all-weather requirements of visual SLAM [30], nighttime navigation and positioning become necessary, which in turn requires high-quality feature extraction and matching for nighttime image pairs. Due to the low brightness, low contrast, and high noise of the acquired images, traditional feature extraction operators extract few features with low accuracy, and these problems persist when such operators are integrated into existing feature matching algorithms, resulting in few matched feature points and low precision. Feature matching algorithms have undergone long-term development, and both traditional methods such as SIFT, ORB, SURF, and FAST and advanced matching algorithms such as SuperGlue, MatchFormer, LightGlue, LoFTR, and OmniGlue achieve good matching results on normal-light image pairs. However, they still do not solve image matching under extreme conditions such as low-light and nighttime environments, and cannot meet practical production requirements for matching in ultra-dark settings such as nighttime autonomous navigation, tunnels, and mines.
To address the difficulty of feature point extraction in low-light images, in 2021, Risheng Liu et al. [31] proposed a novel method based on Retinex-inspired Unrolling with Architecture Search (RUAS), designed to construct a lightweight yet effective enhancement network for low-light image enhancement in real-world scenarios. In 2022, Wenhui Wu et al. [32] proposed a deep unfolding network based on retinex theory, transforming the optimization problem into a learnable network that decomposes low-light images into reflectance and illumination layers. By formulating the decomposition as a regularized model with implicit priors, they designed three learning modules responsible for data-driven initialization, efficient unfolding optimization, and user-specified illumination enhancement. In 2023, Jiang Hai et al. [33] proposed R2RNet, a retinex-based network for enhancing low-light images that maps images from real low-light conditions to real normal-light conditions. In 2024, Ji Wu et al. [34] proposed a neural-network-based low-light image enhancement method that achieves feature enhancement of dark images, improving their brightness and texture information.
To address the challenges of feature point extraction and matching in extremely low-light conditions, we propose a novel feature matching methodology that integrates multi-scale retinex with color restoration (MSRCR) and SuperGlue. First, the ultra-dark image pairs are input into the MSRCR algorithm for feature enhancement. MSRCR is a retinex-based enhancement algorithm that divides an image into illumination and reflection components, treating the reflection component as the true surface information of objects; Gaussian filtering removes the illumination component, and color restoration then recovers the image's true information. The enhanced images are subsequently input into the SuperGlue feature matching algorithm. This approach significantly increases both the number of extracted feature points and the accuracy of the matched feature points, improving the efficiency of feature extraction operators and overall matching quality while substantially reducing rotation and translation errors. Our methodology follows a systematic procedure: first, the SuperGlue feature matching algorithm is applied to the original low-light image pairs; these images then undergo feature enhancement with MSRCR, followed by a second application of SuperGlue on the enhanced images. The effectiveness of the proposed method is validated through a comprehensive precision evaluation comparing the number of feature points and the rotational/translational errors before and after enhancement. Comparative experimental results demonstrate the superior performance of the proposed approach.
This article is organized as follows: Section 2 presents a comprehensive analysis of the retinex-based feature enhancement algorithm and examines the SuperGlue feature matching algorithm, culminating in the proposal of our novel integrated approach that combines MSRCR and SuperGlue. Section 3 details the comparative experimental evaluation of the proposed matching method, with particular emphasis on its accuracy in low-light image matching scenarios. Section 4 provides an in-depth discussion of the method's effectiveness and implications. Finally, Section 5 presents conclusions drawn from the experimental findings and their analysis.
2. Methodology
Contemporary advancements in image processing algorithms have progressed along parallel but distinct paths: enhancement algorithms have primarily focused on restoring authentic image information, improving brightness, and enhancing contrast, while matching algorithms have concentrated on developing novel techniques to improve accuracy and robustness. However, there has been limited exploration of the synergistic potential between these two domains. To address the persistent challenges of low accuracy and mismatching in low-light image scenarios, we propose a novel integrated approach that combines MSRCR and SuperGlue, bridging the gap between enhancement and matching methodologies.
This paper presents a comparative analysis of matching performance before and after enhancement, evaluating the accuracy improvements achieved through the integration of the MSRCR and SuperGlue feature matching methods, as illustrated in Figure 1. The methodology comprises the following sequential steps:
- Step (1):
The original low-light image pairs are directly processed through the SuperGlue feature matching algorithm to establish baseline matching results.
- Step (2):
These same image pairs undergo feature enhancement using the MSRCR algorithm, which restores intrinsic image information while improving brightness and contrast characteristics.
- Step (3):
The enhanced image pairs are then processed through the SuperGlue matching algorithm to obtain post-enhancement matching results.
- Step (4):
A comprehensive comparative analysis is conducted between the original and post-enhancement matching results to evaluate the accuracy and robustness of the integrated MSRCR-SuperGlue approach.
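A minimal sketch of this four-step pipeline is given below. It assumes the module layout of the authors' public SuperGluePretrainedNetwork repository (`models.matching.Matching`); the `msrcr_enhance` routine is the one sketched in Section 2.1, and the file names and configuration values are illustrative rather than the paper's actual settings.

```python
# Sketch of the four-step comparison (assumed interfaces, illustrative paths).
import cv2
import torch
from models.matching import Matching  # assumed: SuperGluePretrainedNetwork layout

device = 'cuda' if torch.cuda.is_available() else 'cpu'
matcher = Matching({'superpoint': {}, 'superglue': {'weights': 'indoor'}}).eval().to(device)

def to_tensor(gray):
    # HxW uint8 image -> 1x1xHxW float tensor in [0, 1].
    return torch.from_numpy(gray / 255.).float()[None, None].to(device)

def match_pair(gray0, gray1):
    # Run SuperPoint + SuperGlue and count matched keypoints (-1 = unmatched).
    with torch.no_grad():
        pred = matcher({'image0': to_tensor(gray0), 'image1': to_tensor(gray1)})
    matches = pred['matches0'][0].cpu().numpy()
    return int((matches > -1).sum())

# Step (1): baseline matching on the raw ultra-dark pair.
g0 = cv2.imread('left_dark.png', cv2.IMREAD_GRAYSCALE)   # hypothetical file
g1 = cv2.imread('right_dark.png', cv2.IMREAD_GRAYSCALE)  # hypothetical file
n_before = match_pair(g0, g1)

# Steps (2)+(3): MSRCR enhancement (see Section 2.1), then matching again.
e0 = cv2.cvtColor(msrcr_enhance(cv2.imread('left_dark.png')), cv2.COLOR_BGR2GRAY)
e1 = cv2.cvtColor(msrcr_enhance(cv2.imread('right_dark.png')), cv2.COLOR_BGR2GRAY)
n_after = match_pair(e0, e1)

# Step (4): compare the number of correspondences before and after enhancement.
print(n_before, n_after)
```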
2.1. Feature Enhancement Theory Based on Retinex
The retinex theory establishes a fundamental framework for image decomposition into two distinct components: illumination and reflectance. The reflectance component, which characterizes the inherent reflectivity of objects, constitutes the high-frequency elements of the image and maintains stability regardless of external lighting conditions. Conversely, the illumination component encompasses the low-frequency elements and is significantly influenced by ambient lighting conditions. By strategically suppressing or eliminating the illumination component, it becomes possible to recover the intrinsic image information. Image enhancement is achieved through the adjustment of brightness and contrast parameters to establish optimal white balance. Through extensive research efforts globally, scholars have continuously refined enhancement theories based on retinex principles. This ongoing optimization process has led to the development of three principal retinex image enhancement algorithms, with their evolutionary progression illustrated in Figure 2.
The single-scale retinex (SSR) algorithm implements image decomposition by separating illumination and reflection components. The core mechanism involves suppressing or eliminating low-frequency components so that the reflection image effectively captures the image information. This process is followed by white balance adjustment to enhance the image's salient features. To achieve this objective, a Gaussian function [35] is employed as the filtering mechanism for the input image. This function serves as an effective normalized low-pass filter, specifically designed to filter out the illumination component. The operational principle is structured as follows:
$$S(x,y) = L(x,y) \cdot R(x,y) \quad (1)$$

In the equation, $S(x,y)$ represents the original image information, $L(x,y)$ represents the illumination component, and $R(x,y)$ represents the reflection component.
To facilitate the separation of illumination and reflectance components while optimizing computational efficiency, logarithmic transformation is applied to both sides of the equation, resulting in the following:

$$\log S(x,y) = \log L(x,y) + \log R(x,y) \quad (2)$$
Introducing Gaussian filtering to compute the illumination component $L(x,y)$, the filtering is computed as follows:

$$L(x,y) = S(x,y) * G(x,y), \qquad G(x,y) = \frac{1}{2\pi\sigma^{2}} \exp\left(-\frac{x^{2}+y^{2}}{2\sigma^{2}}\right) \quad (3)$$

where $*$ denotes convolution and $\sigma$ is the scale of the Gaussian filter.
The illumination component of the original image is finally obtained through Gaussian filtering. The reflection image is derived by removing the filtered illumination image from the original image, and white balancing is then applied to the resulting reflection image to enhance low-light images. The computational formula is as follows:

$$\log R(x,y) = \log S(x,y) - \log\left[S(x,y) * G(x,y)\right] \quad (4)$$
The SSR feature enhancement process follows a systematic sequence: initially, logarithmic transformation is applied to the original image function to separate the illumination and reflectance components. The illumination component is then extracted through Gaussian blurring, utilizing a Gaussian low-pass filter to isolate low-frequency components. The reflectance component is subsequently obtained by subtracting the illumination component from the original image function, ultimately facilitating feature enhancement in low-light conditions.
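The SSR procedure above can be captured in a few lines. The following sketch, assuming an OpenCV/NumPy environment, implements Equations (1)-(4), with a simple min-max stretch standing in for the white balance step; the default value of $\sigma$ is illustrative.

```python
# Minimal single-scale retinex (SSR) sketch: log reflectance = log image
# minus log of its Gaussian-blurred illumination estimate (Equation (4)).
import cv2
import numpy as np

def ssr(img: np.ndarray, sigma: float = 80.0) -> np.ndarray:
    img = img.astype(np.float64) + 1.0           # offset to avoid log(0)
    blur = cv2.GaussianBlur(img, (0, 0), sigma)  # illumination L = S * G
    r = np.log(img) - np.log(blur)               # log R = log S - log(S * G)
    # Simple linear stretch standing in for the white balance step.
    r = (r - r.min()) / (r.max() - r.min() + 1e-12)
    return (r * 255).astype(np.uint8)
```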
According to Equation (4), it is evident that SSR has only one adjustable parameter, the Gaussian scale $\sigma$. From the Gaussian curve, it is known that when $\sigma$ is large, the Gaussian image changes smoothly, which achieves better color recovery but may lose a significant amount of edge information. Conversely, when $\sigma$ is small, the Gaussian image is steeper, preserving edge information better but risking color information loss and distortion. Therefore, when using the SSR algorithm to enhance low-light images, it is necessary to continuously adjust $\sigma$ according to environmental changes and specific requirements, which significantly increases the workload and reduces efficiency. Thus, an algorithm is needed that balances smooth image detail with color fidelity.
To address the issues with the SSR algorithm, Jobson [36] proposed the multi-scale retinex (MSR) algorithm in 1996, building upon SSR's foundations. This algorithm adopts a weighted summation approach, separating the image information into three channels (R, G, B). It enhances each channel independently and then merges the enhanced images to produce the final low-light feature-enhanced image.
The MSR algorithm can be viewed as a weighted average of multiple SSR algorithms, assigning specific weights to each filtering channel. Ultimately, it achieves feature enhancement through weighted fusion. Its principle is as follows:

$$R_{MSR}(x,y) = \sum_{k=1}^{N} w_k \left\{ \log S(x,y) - \log\left[S(x,y) * G_k(x,y)\right] \right\} \quad (5)$$

In the equation, $N$ represents the number of scales and $w_k$ denotes the weight assigned to each scale. $G_k(x,y)$ represents the Gaussian filtering function corresponding to the $k$-th channel, expressed as follows:

$$G_k(x,y) = \frac{1}{2\pi\sigma_k^{2}} \exp\left(-\frac{x^{2}+y^{2}}{2\sigma_k^{2}}\right) \quad (6)$$

In the equation, $\sigma_k$ represents the standard deviation of the Gaussian at the $k$-th scale, also known as the scale parameter. The values of each Gaussian function channel can be adjusted as needed to meet the experimental requirements. Typically, the R, G, and B channels are chosen for this purpose.
In the MSR algorithm, the original image undergoes three separate SSR enhancement operations in the R, G, and B channels. Different Gaussian blur parameters are chosen as needed for SSR enhancement in each channel. Subsequently, the outputs from the three channels are weighted and fused to obtain the low-light enhanced image.
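Continuing the SSR sketch above (and reusing its imports), a minimal MSR sketch applies Equation (5) with equal weights $w_k = 1/N$ to each channel; the three scale values are common defaults in MSR implementations, not the paper's settings.

```python
# Multi-scale retinex (MSR) sketch: a weighted sum of SSR terms at several
# Gaussian scales (Equations (5)-(6)), applied per channel for color images.
def msr(img: np.ndarray, sigmas=(15.0, 80.0, 250.0)) -> np.ndarray:
    img = img.astype(np.float64) + 1.0
    out = np.zeros_like(img)
    for sigma in sigmas:  # equal weights w_k = 1/N at every scale
        blur = cv2.GaussianBlur(img, (0, 0), sigma)
        out += (np.log(img) - np.log(blur)) / len(sigmas)
    return out  # log-domain MSR result, prior to color restoration
```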
While the MSR algorithm enhances low-light images by balancing detail and color information, improving brightness and contrast to a certain extent, it can disrupt the coupling between the original image’s R, G, and B channels. After MSR feature enhancement, the values of the R, G, and B channels in the original image no longer match those in the enhanced image, resulting in color distortion in the MSR-enhanced image.
Jobson [36] introduced the multi-scale retinex with color restoration (MSRCR) algorithm to address the color distortion issues that may arise from the multi-scale retinex (MSR) method. The principle behind MSRCR involves the introduction of a color restoration factor:

$$R_{MSRCR_i}(x,y) = C_i(x,y) \, R_{MSR_i}(x,y) \quad (7)$$

In the equation, $C_i(x,y)$ denotes the color restoration factor for the $i$-th channel, expressed as follows:

$$C_i(x,y) = \beta \left\{ \log\left[\alpha S_i(x,y)\right] - \log\left[\sum_{j=1}^{3} S_j(x,y)\right] \right\} \quad (8)$$

In the equation, $\alpha$ and $\beta$ represent the gain factors.
In the MSRCR algorithm, first, a Gaussian function is applied for low-pass filtering to remove illumination components. Then, multi-scale retinex (MSR) enhancement is performed to obtain the MSR image. Finally, the color restoration factor is multiplied element-wise with the MSR-enhanced image to recover color information, addressing color distortion issues in the MSR algorithm.
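A corresponding MSRCR sketch builds on the `msr` function above by applying the color restoration factor of Equation (8), followed by gain, offset, and quantile truncation. All numeric defaults here are illustrative values in the spirit of common MSRCR implementations, not the parameters used in this paper's experiments.

```python
# MSRCR sketch (Equations (7)-(8)): multiply the log-domain MSR output by a
# per-channel color restoration factor, then apply gain/offset and clipping.
# G, b, alpha, beta, and the truncation thresholds are illustrative values.
def msrcr_enhance(bgr: np.ndarray, G=5.0, b=25.0, alpha=125.0, beta=46.0,
                  low_clip=0.01, high_clip=0.99) -> np.ndarray:
    img = bgr.astype(np.float64) + 1.0
    msr_out = msr(bgr)  # log-domain MSR result from the sketch above
    # Color restoration factor C_i = beta * {log(alpha * S_i) - log(sum_j S_j)}.
    crf = beta * (np.log(alpha * img) - np.log(img.sum(axis=2, keepdims=True)))
    out = G * (msr_out * crf + b)
    # Truncate the extreme quantiles, then stretch to the displayable range.
    lo, hi = np.quantile(out, [low_clip, high_clip])
    out = np.clip((out - lo) / (hi - lo + 1e-12), 0, 1)
    return (out * 255).astype(np.uint8)
```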
2.2. SuperGlue Feature Matching Algorithm
SuperGlue is a feature matching network proposed by Sarlin [20] in 2020. It takes as input the keypoints and descriptors of two images, which are provided by feature extraction algorithms and deep learning networks. The SuperGlue matching network consists of an attentional graph neural network and an optimal matching layer, integrating the feature point positions and visual descriptors extracted by the SuperPoint feature extraction network. The attentional graph neural network encodes keypoints and descriptors into a feature matching vector using a keypoint encoder, which is iteratively aggregated via attention mechanisms to enhance the feature matching vector. The enhanced feature matching vector serves as input to the optimal matching layer, where a score matrix is computed and the optimal feature assignment matrix is iteratively solved using the Sinkhorn algorithm, as shown in Figure 3. Within the attentional graph neural network, feature aggregation is performed using encoders, followed by information aggregation facilitated by attention mechanisms.
The attentional graph neural network [20] integrates attention mechanisms and convolutional neural networks, and incorporates SuperPoint for feature point extraction.
The aggregation of feature points using a multi-layer perceptron (MLP) combines both the positions and descriptors of the feature points, allowing the subsequent attention mechanisms to fully consider both appearance and spatial similarity. The formula is

$$^{(0)}x_i = d_i + \mathrm{MLP}_{\mathrm{enc}}(p_i) \quad (9)$$

where $d_i$ is the visual descriptor and $p_i$ the position of keypoint $i$.
The addition of descriptors and encoded feature points serves as the input to the attention graph neural network.
The attention graph neural network is based on attention mechanisms and principles of human recognition. It iteratively aggregates features within and between graphs using self-attention and cross-attention. Self-attention is employed for intra-graph aggregation, consolidating neighboring and similar feature points within the graph. Cross-attention is utilized to aggregate inter-graph information, identifying relevant points for matching across graphs.
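The sketch below, in PyTorch, illustrates the two ingredients just described: a keypoint encoder implementing $^{(0)}x_i = d_i + \mathrm{MLP}_{\mathrm{enc}}(p_i)$, and a single attentional message-passing step whose source features come either from the same image (self-attention) or from the other image (cross-attention). Layer sizes and head counts are illustrative, not SuperGlue's exact configuration.

```python
# Sketch of SuperGlue's front end: keypoint encoding plus one attention layer.
import torch
import torch.nn as nn

D = 256  # descriptor dimension used by SuperPoint

class KeypointEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(),
                                 nn.Linear(64, D))

    def forward(self, desc, kpts, scores):
        # desc: (N, D) descriptors; kpts: (N, 2) positions; scores: (N,).
        p = torch.cat([kpts, scores[:, None]], dim=-1)
        return desc + self.mlp(p)  # x_i = d_i + MLP(p_i)

attn = nn.MultiheadAttention(embed_dim=D, num_heads=4, batch_first=True)

def message_pass(x_query, x_source):
    # Inputs shaped (1, N, D). Self-attention: x_source is the same image's
    # features; cross-attention: x_source is the other image's features.
    msg, _ = attn(x_query, x_source, x_source)
    return x_query + msg  # residual update of the feature matching vectors
```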
The optimal matching layer [20] is the second major module of the SuperGlue matching network. It takes the feature descriptors obtained from the attentional GNN as input, computes a score matrix, solves the resulting assignment problem, and assigns 0s and 1s to achieve feature matching. The network calculates a score matrix $\mathbf{S} \in \mathbb{R}^{M \times N}$ to represent potential matches. However, creating separate representations for all $M \times N$ potential matches is computationally intensive. Therefore, the authors compute the scores as the inner product of the feature vectors $f_i^A$ and $f_j^B$ aggregated by the GNN. The formula is as follows:

$$S_{i,j} = \left\langle f_i^A, f_j^B \right\rangle, \quad \forall (i,j) \in A \times B \quad (10)$$
To address occlusion and visibility issues, SuperGlue augments the assignment with a dustbin that collects unmatched and mismatched points. Finally, the resulting optimal transport problem is solved with the Sinkhorn algorithm. The Sinkhorn [37] distance is based on the Earth Mover's Distance (EMD).
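The following simplified sketch illustrates the optimal matching layer: scores are inner products of the aggregated features (Equation (10)), a dustbin row and column absorb unmatched points, and log-domain Sinkhorn iterations alternately normalize rows and columns. The fixed dustbin score, iteration count, and full row/column normalization are simplifications of SuperGlue's learned dustbin and marginal constraints.

```python
# Simplified optimal matching layer: score matrix + dustbin + Sinkhorn.
import torch

def sinkhorn_match(fA: torch.Tensor, fB: torch.Tensor,
                   dustbin: float = 1.0, iters: int = 100) -> torch.Tensor:
    # fA: (M, D), fB: (N, D) feature vectors aggregated by the GNN.
    S = fA @ fB.T                           # S_ij = <f_i^A, f_j^B>
    M, N = S.shape
    # Augment with a dustbin row and column for unmatched points.
    Z = torch.full((M + 1, N + 1), dustbin)
    Z[:M, :N] = S
    log_P = Z.clone()
    for _ in range(iters):                  # alternating log-domain scaling
        log_P = log_P - torch.logsumexp(log_P, dim=1, keepdim=True)
        log_P = log_P - torch.logsumexp(log_P, dim=0, keepdim=True)
    return log_P.exp()[:M, :N]              # soft assignment matrix
```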
The SuperPoint network [38] is a network designed for feature point extraction and descriptor generation, consisting of two main components: an encoder and a decoder. The architecture of the SuperPoint network is shown in Figure 4.
The encoder is a simple VGG-style model that takes the original image $I \in \mathbb{R}^{H \times W}$ as input, passes it through convolutional layers, pooling layers, and nonlinear activation functions, and outputs a reduced feature map of size $H_c \times W_c$, where $H_c = H/8$ and $W_c = W/8$. The encoder aims to achieve dimensionality reduction through spatial pooling.
The keypoint decoder detects interest points and outputs the probability that each pixel is a keypoint. After dimensionality reduction by the encoder, restoring the result to full resolution by naive upsampling would increase computational cost. A keypoint detection head is therefore designed that computes a tensor of size $H_c \times W_c \times 65$, in which the 65 channels correspond to an $8 \times 8$ grid of pixels plus one "no keypoint" dustbin channel; the dustbin channel is discarded during the softmax process, and the remaining channels are reshaped to the full-resolution $H \times W$ probability map.
The descriptor head computes a tensor of size $H_c \times W_c \times D$ and outputs a dense descriptor map. Using a model similar to UCN, it first produces semi-dense descriptors; these undergo bicubic interpolation followed by L2 normalization, resulting in a dense map of L2-normalized fixed-length descriptors.
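A small sketch of the keypoint decoder's post-processing is given below, assuming the raw $H_c \times W_c \times 65$ logits are available: a channel-wise softmax, removal of the dustbin channel, and reshaping of the remaining 64 channels into the full-resolution probability map.

```python
# Keypoint decoder post-processing sketch: softmax over 65 channels, drop the
# dustbin, and reassemble the 8x8 cells into an (8*Hc, 8*Wc) probability map.
import numpy as np

def decode_heatmap(logits: np.ndarray) -> np.ndarray:
    # logits: (Hc, Wc, 65) raw decoder output.
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    prob = e / e.sum(axis=-1, keepdims=True)   # softmax over the 65 channels
    prob = prob[..., :-1]                      # discard the dustbin channel
    hc, wc, _ = prob.shape
    prob = prob.reshape(hc, wc, 8, 8)          # each cell covers an 8x8 patch
    return prob.transpose(0, 2, 1, 3).reshape(hc * 8, wc * 8)
```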
3. Results
3.1. Data Source
The experimental dataset selected for this paper had to provide both camera intrinsic and extrinsic parameters and ultra-dark lighting conditions. However, existing datasets rarely include camera intrinsics and extrinsics; matching datasets are mostly based on well-exposed, high-resolution images that do not meet the ultra-dark requirement, and enhancement datasets rarely form matching pairs or supply camera parameters. Therefore, this paper selected the ScanNet indoor dataset, which includes camera intrinsic and extrinsic parameters but still does not meet the ultra-dark condition. To meet the experimental requirements, the image exposure was manually reduced to obtain darker images; the resulting experimental dataset is illustrated in Figure 5.
Analysis of the dataset demonstrates several inherent challenges in low-light images. Specifically, these images exhibit blurred texture information and indistinct features, accompanied by minimal grayscale variation in the vicinity of feature points. These characteristics significantly compromise feature recognition capabilities and impede effective feature point extraction. To overcome these limitations, this study proposes an initial preprocessing step focusing on feature enhancement of low-light images, with particular emphasis on optimizing brightness and contrast parameters.
3.2. Image Enhancement Experiment
To explore the impact of different parameter configurations of the MSRCR algorithm on image matching results, this paper presents comparative experiments verifying the influence of different MSRCR parameters on matching results. The experiments showed that varying the MSRCR parameters has a negligible effect on matching results. Therefore, this paper selected the best-performing parameter configuration for the experiments. The tuned parameters comprise the Gaussian filtering parameter $\sigma$, the scale parameters, the gain parameter $G$, the brightness offset $b$, the detail enhancement coefficient, the smoothing control coefficient, and the lower and upper truncation thresholds. The original low-light images were used as input to the MSRCR algorithm for feature enhancement, improving image brightness and contrast and enhancing the texture information of the images, as illustrated in Figure 6.
Qualitative analysis of the enhanced images demonstrates that the MSRCR preprocessing significantly improved both photometric and geometric properties of the scenes. Specifically, the enhancement resulted in not only optimized brightness and contrast but also distinctly visible texture features, effectively restoring the intrinsic feature information within the images. This enhancement facilitates more robust keypoint extraction in subsequent processing stages.
3.3. Experiment of Matching Before and After Enhancement
The enhanced and original low-light images were separately input into the SuperGlue algorithm for feature matching; the matching results are shown in Figure 7.
Comparative analysis between pre- and post-enhancement feature matching results is presented, with the left and right images illustrating the matching outcomes before and after MSRCR enhancement, respectively. Quantitative evaluation across six experimental groups demonstrates that MSRCR enhancement yields substantial improvements in multiple performance metrics. Specifically, the enhanced images exhibit an increased number of detected feature points, reduced rotational and translational errors, and significantly improved matching performance, characterized by both higher participation rates and increased accuracy of feature point correspondences.
Using the x-axis to represent SuperGlue and MSRCR+SuperGlue, and the y-axis to represent the number of correctly matched feature points and the correct matching rate, the density distribution plots are drawn as shown in Figure 8 and Figure 9.
As shown in Figure 8 and Figure 9, the proposed method significantly improves both the number of correctly matched feature points and the matching accuracy, addressing the low match counts and low precision that existing feature matching algorithms such as SuperGlue, MatchFormer, LightGlue, LoFTR, and OmniGlue encounter on extremely challenging ultra-low-light image pairs.
3.4. Accuracy Assessment
This study investigates the efficacy of integrating MSRCR enhancement with the SuperGlue matching framework. The performance evaluation of the proposed methodology encompasses three critical metrics: (1) the comparative analysis of feature point extraction density pre- and post-enhancement, (2) the quantification of rotational and translational error metrics, and (3) the assessment of matching performance in terms of both quantity and precision of correspondences.
The MSRCR algorithm effectively enhances the photometric properties of low-light images through simultaneous optimization of brightness and contrast parameters. This enhancement significantly improves the visibility of intrinsic image details, thereby facilitating more robust feature point extraction.
According to Table 1, the six experimental groups show that after MSRCR feature enhancement there is a significant increase in grayscale variation around feature points, making feature point extraction easier. The number of extracted feature points increases to more than 1.5 times the number extracted from the original low-light images.
As shown in Figure 10, after applying MSRCR feature enhancement to the original low-light images, the number of feature points extracted from both the left and right images increased significantly.
The experimental methodology incorporates both image pairs and their corresponding ground truth data, which specify the relative translational and rotational transformations between stereo images. The algorithmic computation of rotation and translation parameters is evaluated against these ground truth values, with the respective discrepancies quantified as ΔR and Δt.
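A sketch of how ΔR and Δt can be computed with OpenCV is shown below: the essential matrix is estimated from the matched keypoints, the relative pose is recovered, and both components are compared against the ground truth. The angular convention for the translation error (angle between scale-free translation directions) is one common choice and is assumed here rather than taken from the paper.

```python
# Relative pose error sketch: recover (R, t) from matches, compare to ground
# truth. Delta-R is the residual rotation angle; Delta-t is the angle between
# the unit translation directions (translation scale is unobservable).
import cv2
import numpy as np

def pose_errors(pts0, pts1, K, R_gt, t_gt):
    # pts0, pts1: (N, 2) matched pixel coordinates; K: 3x3 camera intrinsics.
    E, mask = cv2.findEssentialMat(pts0, pts1, K, method=cv2.RANSAC)
    _, R, t, _ = cv2.recoverPose(E, pts0, pts1, K, mask=mask)
    # Rotation error: angle of the residual rotation R @ R_gt^T.
    cos_r = np.clip((np.trace(R @ R_gt.T) - 1) / 2, -1, 1)
    dR = np.degrees(np.arccos(cos_r))
    # Translation error: angle between unit translation directions.
    cos_t = np.clip(abs(t.ravel() @ t_gt /
                        (np.linalg.norm(t) * np.linalg.norm(t_gt))), 0, 1)
    dt = np.degrees(np.arccos(cos_t))
    return dR, dt
```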
According to the data analysis in Table 2, the relative pose error decreases after enhancement with MSRCR. In experiments 5 and 6, although the rotation and translation errors cannot be calculated from the original image pairs, after feature enhancement the rotation and translation errors of the matched image pairs can still be determined.
To calculate the matching score, let $CM$ denote the number of correctly matched feature points and $TM$ the total number of feature points involved in matching. The matching score is then

$$M_{\mathrm{score}} = \frac{CM}{TM} \quad (11)$$
Analysis of Table 3 and Figure 11 demonstrates that MSRCR feature enhancement yields substantial improvements in both the absolute number of matches and the correct correspondence rates across the dataset's low-light images. The enhancement methodology exhibits particularly notable efficacy in cases of severe degradation, as evidenced by experimental groups 5 and 6, where initial matching attempts yielded no correspondences but post-enhancement processing produced satisfactory matching results. For visualization purposes in the precision analysis, the matching score for the fifth group was set to zero in the corresponding line graph.
Quantitative analysis of Figure 12 demonstrates that MSRCR feature enhancement significantly improves matching precision metrics. The proposed methodology exhibits robust performance even under challenging conditions where conventional matching approaches fail, particularly in cases of severely degraded low-light images characterized by poor focus quality. While traditional methods prove ineffective for these degraded images, the proposed MSRCR-enhanced matching framework successfully establishes valid feature correspondences.
We conducted experiments comparing our proposed method with existing advanced matching algorithms LightGlue, LoFTR, and OmniGlue, performing six sets of experiments using the same low-light dataset. In the original OmniGlue algorithm, the threshold was 0.02. To maintain consistency, we set the thresholds for all algorithms to 0.2, which is consistent with the threshold used in our method, and performed statistical analysis on the number of correctly matched feature points, total matching quantity, and matching accuracy.
From Table 4 and the matching accuracy density distribution plot in Figure 13, it can be observed that although existing advanced matching algorithms such as LightGlue, LoFTR, and OmniGlue demonstrate strong advantages on well-illuminated, high-resolution images, with satisfactory matching quantities and precision, they struggle significantly with noisy ultra-low-light images. In such challenging conditions, these algorithms fail to meet industrial production requirements, exhibiting low matching quantities and poor matching accuracy. In contrast, the proposed method combining MSRCR and SuperGlue shows significant improvements in both matching quantity and accuracy after feature enhancement, clearly outperforming the existing algorithms.
4. Discussion
This paper addresses issues such as slow grayscale variation around image feature points in low-light and extremely low-light environments, unclear local features, difficulty of feature point extraction, and low matching accuracy. A novel matching method is proposed that combines the retinex-based MSRCR feature enhancement algorithm with the SuperGlue feature matching algorithm to achieve feature matching under extremely low-light conditions. Building upon the SuperGlue algorithm proposed by Sarlin P. E. et al. in 2020 [20], this paper introduces a feature enhancement module that enhances features prior to matching, thereby addressing the issues of unclear local features and difficult feature point extraction in low-light images. The enhanced low-light images serve as input to the SuperGlue feature matching algorithm, effectively resolving the problems of insufficient feature points and low matching accuracy in low-light image matching.

The proposed method consists of two modules: feature enhancement and feature matching. In the feature enhancement module, the MSRCR algorithm based on retinex theory is employed, which decomposes the original image into illumination and reflection components, mimicking human visual perception. By filtering out the illumination component with Gaussian filtering and performing white balancing on the remaining reflection image, the brightness and contrast of the image are enhanced, restoring the true information of objects and facilitating feature point detection and extraction. In the feature matching module, the SuperGlue algorithm is implemented, which consists of an attentional graph neural network and an optimal matching layer. Throughout the matching process, SuperPoint is used to extract feature points and descriptors, which are encoded into a feature matching vector by an encoder. The attentional graph neural network enhances this feature matching vector iteratively using self-attention and cross-attention mechanisms. Subsequently, the optimal matching layer computes matching scores and iteratively applies the Sinkhorn algorithm to determine the optimal feature assignment matrix. By integrating the feature enhancement and feature matching modules, this paper enhances local features by restoring the true surface information of objects through the enhancement algorithm, while the SuperGlue feature matching algorithm improves matching accuracy and increases the number of matched feature points, thereby addressing feature matching in extremely low-light environments.
Firstly, six pairs of low-light matching images were subjected to feature enhancement experiments. Figure 11 shows the local detail before and after enhancement for three of these low-light images; the image on the left is the enhanced low-light image, and the image on the right is the original low-light image. The experimental results indicate the following:
In the original extremely low-light images, texture features exhibit significant attenuation, with predominant dark tonality impeding both object identification and feature point extraction. Following MSRCR enhancement, the algorithm effectively eliminates illumination component interference while preserving crucial reflection information. This enhancement process significantly amplifies detail features and introduces distinct grayscale gradients around feature points, facilitating clear object recognition and surface texture delineation, thus substantially improving the robustness of feature point extraction.
As shown in Figure 14, after enhancement with MSRCR, the brightness and contrast of the images are improved. Gaussian filtering was used to remove the illumination components from the original low-light images, followed by white balance adjustment to enhance brightness and contrast, making the surface texture features of objects clearly visible.
To validate the effectiveness of the proposed method, six sets of comparative experiments were conducted, with green indicators denoting correct matches and red indicators representing incorrect matches. As illustrated in Figure 15, the analysis compared rotational and translational differences between the photographic pairs using both ground truth values and computed matches. Prior to feature enhancement, the system exhibited a rotation error of 40.5° and a translation error of 62.8. Post-enhancement analysis revealed significantly improved performance, with the rotation error reduced to 1.0° and the translation error to 1.5, approximately 1/40 of the original matching errors. The enhancement process demonstrated substantial improvement in matching accuracy, with the computed rotational and translational components closely aligning with the ground truth values. Quantitative analysis of matching performance revealed that pre-enhancement results yielded 97 correct matches out of 159 total correspondences, corresponding to a matching score of 0.61. Following enhancement, the system achieved 241 correct matches out of 246 total correspondences, a matching score of 0.98, representing an improvement of 37 percentage points in matching accuracy. The results demonstrate that feature enhancement not only increased the absolute number of feature point correspondences but also substantially improved the precision of these matches.
As demonstrated in Figure 16, when processing low-light images characterized by poor quality and blurriness, traditional feature matching algorithms, while capable of extracting feature points, generate low-quality correspondences that result in unsuccessful feature matching. This failure prevents the computation of relative rotation and translation parameters between the left and right images, making error calculation impossible. The proposed methodology employs MSRCR feature enhancement to amplify texture information and improve image brightness and contrast before feeding the enhanced images into the SuperGlue matching algorithm. Experimental results reveal a rotation error of 36.0°, a translation error of 40.0, and 36 correct matches out of 42 total correspondences, achieving a matching score of 0.86. Although the matching accuracy is constrained by the inherent low quality and blurriness of the image pairs, the proposed algorithm successfully establishes feature correspondences in scenarios where conventional matching algorithms fail completely.
However, despite the proposed method’s capability to enhance feature point correspondence quantity and matching accuracy, significant challenges persist when processing low-quality, blurry low-light images. While the algorithm successfully establishes matches in such conditions, the inherent image blurriness continues to limit both the quantity and accuracy of feature correspondences. The challenge of improving matching quality for low-quality low-light images remains a substantial research concern. Furthermore, the current implementation maintains MSRCR and SuperGlue as separate algorithmic components rather than integrating them into a unified enhanced matching framework. Future research should focus on developing a cohesive algorithm that seamlessly integrates these complementary approaches into a single enhanced matching solution.