Author Contributions
Conceptualization, L.S. and Y.W.; methodology, Y.W.; software, Y.W.; validation, L.S., Y.W. and W.Q.; formal analysis, W.Q.; investigation, L.S.; data curation, Y.W.; writing—original draft preparation, Y.W.; writing—review and editing, L.S.; visualization, Y.W.; supervision, W.Q. All authors have read and agreed to the published version of the manuscript.
Figure 1.
VO consists of three components: keypoint feature detection, correspondence establishment, and pose estimation.
Figure 2.
OFPoint pipeline. OFPoint takes a grayscale image as input and outputs keypoint locations and confidences. The multi-scale feature fusion module consists of several multi-scale feature fusion blocks. Each pixel in the output feature map represents 64 pixels in the input grayscale image.
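The correspondence between output cells and input pixels stated in this caption can be illustrated with a minimal sketch. It assumes that the 64 input pixels form a square 8 × 8 block, which the caption does not state explicitly; the function names `feature_map_shape` and `cell_to_pixel_block` are hypothetical and are not taken from the OFPoint implementation.

```python
CELL = 8  # assumption: 64 input pixels per output cell, arranged as an 8 x 8 block


def feature_map_shape(height, width, cell=CELL):
    """Spatial size (rows, cols) of the output map for a height x width input."""
    return height // cell, width // cell


def cell_to_pixel_block(row, col, cell=CELL):
    """Inclusive pixel bounds (top, left, bottom, right) covered by one output cell."""
    top, left = row * cell, col * cell
    return top, left, top + cell - 1, left + cell - 1


if __name__ == "__main__":
    print(feature_map_shape(480, 640))
    print(cell_to_pixel_block(1, 2))
```

Under this assumption, a keypoint predicted in output cell (row, col) can be located only within the corresponding pixel block unless the network also predicts a sub-cell position.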
Figure 3.
The multi-scale feature fusion module consists of several multi-scale feature fusion blocks, which are composed of multi-scale convolutions, channel attention, and residual connections.
Figure 4.
Constructing image pairs and training OFPoint through random homography, including rotation, scaling, and perspective transformation.
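A random homography of the kind named in this caption can be sketched by composing a rotation, a uniform scaling, and a small perspective term. The sampling ranges (`max_angle_deg`, `scale_range`, `max_perspective`) and all function names below are illustrative assumptions; the caption does not give the values used to train OFPoint.

```python
import math
import random


def matmul3(a, b):
    """Product of two 3 x 3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def random_homography(rng, max_angle_deg=30.0, scale_range=(0.8, 1.2),
                      max_perspective=1e-4):
    """Sample H = P @ R @ S; all ranges are hypothetical defaults."""
    angle = math.radians(rng.uniform(-max_angle_deg, max_angle_deg))
    scale = rng.uniform(*scale_range)
    px = rng.uniform(-max_perspective, max_perspective)
    py = rng.uniform(-max_perspective, max_perspective)
    c, s = math.cos(angle), math.sin(angle)
    rotation = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    scaling = [[scale, 0.0, 0.0], [0.0, scale, 0.0], [0.0, 0.0, 1.0]]
    perspective = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [px, py, 1.0]]
    return matmul3(perspective, matmul3(rotation, scaling))


def warp_point(h, x, y):
    """Apply homography h to pixel (x, y) using homogeneous coordinates."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


if __name__ == "__main__":
    h = random_homography(random.Random(0))
    print(warp_point(h, 100.0, 50.0))
```

Warping an image and its keypoints with the same matrix yields an image pair with known correspondences, which is what makes this form of self-supervision possible. The perspective range is kept small here so that the homogeneous divisor stays positive for typical pixel coordinates.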
Figure 5.
We obtain the maximum discriminative keypoint distance loss through Softmax and utilize the temperature parameter to smooth the distance differences between keypoints.
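The smoothing role of the temperature parameter can be illustrated with a generic temperature-scaled Softmax. This is only a sketch of the standard operation, not the loss defined in the paper; the function name and the example distances are assumptions made for illustration.

```python
import math


def softmax_with_temperature(values, temperature=1.0):
    """Softmax of values / temperature, computed in a numerically stable way."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [v / temperature for v in values]
    peak = max(scaled)  # subtract the maximum before exponentiating
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    distances = [0.5, 1.0, 3.0]  # hypothetical keypoint distances
    print(softmax_with_temperature(distances, temperature=0.5))
    print(softmax_with_temperature(distances, temperature=5.0))
```

A low temperature concentrates the weights on the largest value, while a high temperature flattens them toward a uniform distribution, so differences between the inputs contribute more evenly.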
Figure 6.
We obtain the maximum discriminative image grayscale loss through Softmax and utilize the temperature parameter to smooth the similarity differences between image grayscale values.
Figure 7.
Keypoints detected by (a) OFPoint, (b) SiLK, (c) SuperPoint, (d) SIFT, (e) ORB, and (f) GFTT. The keypoint distributions in (a–c) are uniform, allowing more details in the image to be detected.
Figure 8.
The green bounding box represents the homography estimation. Optical flow tracking based on OFPoint can achieve a homography estimation comparable to that of descriptor matching.
Figure 9.
Four different parameter configurations were employed to evaluate the effectiveness of OFPoint in detecting keypoints and capturing details in the scene. The red boxes in the image highlight the details detected by OFPoint on different objects.
Figure 10.
A total of nine VO configurations, using various keypoint detectors and correspondence establishment methods, are evaluated: (1) ORB with BruteMatch, (2) ORB with optical flow, (3) GFTT with optical flow, (4) SIFT with Flannmatch, (5) SuperPoint with Flannmatch, (6) SuperPoint with optical flow, (7) SiLK with Flannmatch, (8) SiLK with optical flow, and (9) our OFPoint with optical flow. The red curve represents the ground truth, while the blue curve represents the estimated values in two dimensions.
Figure 11.
The MDE and RDE in three dimensions (x, y, z) are shown. The red curve, labeled Ours, represents the result obtained from OFPoint using optical flow.
Figure 12.
The OFPoint-based VO performs real-time tracking to calculate the pose of the mobile phone in the world coordinate system.
Figure 13.
AR effects from the mobile phone camera: (a,b) car indoors from different perspectives; (c,d) plant outdoors from different perspectives.
Table 1.
Detection metrics for detectors on HPatches.
Detectors | Size | Rep ↑ | LE (Pixel) ↓ |
---|---|---|---|
GFTT | | 0.463 | 0.880 |
GFTT | | 0.425 | 0.894 |
ORB | | 0.538 | 1.132 |
ORB | | 0.523 | 1.221 |
SIFT | | 0.507 | 0.888 |
SIFT | | 0.502 | 1.024 |
SP | | 0.639 | 1.050 |
SP | | 0.611 | 1.141 |
DISK | | 0.634 | 0.892 |
DISK | | 0.587 | 1.031 |
ALIKE | | 0.627 | 0.955 |
ALIKE | | 0.579 | 1.125 |
SiLK | | 0.676 | 0.899 |
SiLK | | 0.652 | 1.091 |
Ours | | 0.659 | 0.993 |
Ours | | 0.632 | 1.076 |
Table 2.
Correspondence metrics for detectors on HPatches.
Detectors | Size | Descriptor Matching MCA ↑ | Descriptor Matching MNCC ↑ | Optical Flow Tracking MCA ↑ | Optical Flow Tracking MNCC ↑ |
---|---|---|---|---|---|
GFTT | | 0.317 | 0.504 | 0.459 | 0.537 |
GFTT | | 0.303 | 0.506 | 0.402 | 0.482 |
ORB | | 0.314 | 0.288 | 0.468 | 0.554 |
ORB | | 0.309 | 0.301 | 0.426 | 0.493 |
SIFT | | 0.489 | 0.418 | 0.452 | 0.486 |
SIFT | | 0.457 | 0.427 | 0.420 | 0.425 |
SP | | 0.689 | 0.485 | 0.464 | 0.533 |
SP | | 0.656 | 0.502 | 0.421 | 0.467 |
DISK | | 0.692 | 0.567 | 0.476 | 0.445 |
DISK | | 0.643 | 0.572 | 0.413 | 0.397 |
ALIKE | | 0.642 | 0.605 | 0.479 | 0.450 |
ALIKE | | 0.612 | 0.619 | 0.421 | 0.419 |
SiLK | | 0.653 | 0.556 | 0.477 | 0.551 |
SiLK | | 0.629 | 0.572 | 0.443 | 0.496 |
Ours | | 0.639 | 0.659 | 0.517 | 0.584 |
Ours | | 0.617 | 0.678 | 0.462 | 0.523 |
Table 3.
Homography estimation metrics for detectors on HPatches.
Detectors | Size | Descriptor Matching HEA ↑ | | | Descriptor Matching HEAUC ↑ | | | Optical Flow Tracking HEA ↑ | | | Optical Flow Tracking HEAUC ↑ | | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
GFTT | | 0.095 | 0.272 | 0.353 | 0.040 | 0.145 | 0.213 | 0.252 | 0.395 | 0.434 | 0.117 | 0.268 | 0.328 |
GFTT | | 0.151 | 0.382 | 0.450 | 0.071 | 0.217 | 0.297 | 0.202 | 0.386 | 0.445 | 0.089 | 0.239 | 0.311 |
ORB | | 0.116 | 0.419 | 0.531 | 0.039 | 0.204 | 0.316 | 0.145 | 0.362 | 0.421 | 0.058 | 0.206 | 0.282 |
ORB | | 0.229 | 0.522 | 0.620 | 0.102 | 0.310 | 0.418 | 0.193 | 0.388 | 0.471 | 0.079 | 0.239 | 0.318 |
SIFT | | 0.589 | 0.776 | 0.815 | 0.293 | 0.577 | 0.665 | 0.284 | 0.483 | 0.534 | 0.129 | 0.320 | 0.396 |
SIFT | | 0.529 | 0.722 | 0.775 | 0.294 | 0.537 | 0.623 | 0.293 | 0.456 | 0.515 | 0.131 | 0.319 | 0.391 |
SP | | 0.527 | 0.856 | 0.910 | 0.219 | 0.576 | 0.701 | 0.310 | 0.501 | 0.553 | 0.145 | 0.335 | 0.415 |
SP | | 0.489 | 0.822 | 0.883 | 0.229 | 0.543 | 0.668 | 0.284 | 0.480 | 0.524 | 0.141 | 0.319 | 0.393 |
DISK | | 0.502 | 0.829 | 0.898 | 0.240 | 0.562 | 0.685 | 0.281 | 0.462 | 0.515 | 0.124 | 0.304 | 0.380 |
DISK | | 0.403 | 0.779 | 0.872 | 0.184 | 0.491 | 0.631 | 0.258 | 0.443 | 0.498 | 0.127 | 0.286 | 0.367 |
ALIKE | | 0.503 | 0.841 | 0.891 | 0.242 | 0.571 | 0.691 | 0.315 | 0.500 | 0.572 | 0.149 | 0.337 | 0.417 |
ALIKE | | 0.456 | 0.755 | 0.817 | 0.231 | 0.512 | 0.624 | 0.300 | 0.475 | 0.508 | 0.135 | 0.322 | 0.391 |
SiLK | | 0.612 | 0.855 | 0.928 | 0.346 | 0.573 | 0.692 | 0.309 | 0.498 | 0.570 | 0.136 | 0.345 | 0.391 |
SiLK | | 0.525 | 0.766 | 0.879 | 0.272 | 0.564 | 0.674 | 0.271 | 0.483 | 0.521 | 0.144 | 0.296 | 0.388 |
Ours | | 0.586 | 0.842 | 0.905 | 0.286 | 0.581 | 0.689 | 0.322 | 0.506 | 0.583 | 0.139 | 0.356 | 0.412 |
Ours | | 0.514 | 0.776 | 0.854 | 0.275 | 0.555 | 0.657 | 0.294 | 0.485 | 0.542 | 0.148 | 0.338 | 0.407 |
Table 4.
The influence of various configurations of OFPoint on the metrics.
Models | Backbone Encoder | Multi-Scale | Novel Loss | Detection: Rep ↑ | Detection: LE (Pixel) ↓ | Correspondence: MCA ↑ | Correspondence: MNCC ↑ | HE: HEA ↑ | HE: HEAUC ↑ |
---|---|---|---|---|---|---|---|---|---|
Baseline | | | | 0.632 | 1.538 | 0.481 | 0.514 | 0.431 | 0.297 |
Model 1 | ◯ | | | 2.8% | 20.6% | 4.7% | 5.6% | 10.8% | 12.3% |
Model 2 | | ◯ | | 3.1% | 16.6% | 2.6% | 0.7% | 4.3% | 4.0% |
Model 3 | | | ◯ | 0.5% | 4.2% | 5.1% | 10.4% | 8.2% | 9.7% |
OFPoint | ◯ | ◯ | ◯ | 0.659 | 0.993 | 0.517 | 0.584 | 0.506 | 0.356 |
Table 5.
Model sizes for different parameter configurations.
Model | Blocks | Channels | Params (M) ↓ |
---|---|---|---|
Model A | 2 | 128 | 1.278 |
Model B | 4 | 128 | 1.921 |
Model C | 2 | 256 | 2.320 |
Model D | 4 | 256 | 3.712 |
SuperPoint | - | - | 1.301 |
SiLK | - | - | 0.942 |
Table 6.
Runtime metrics for detectors on HPatches.
Detectors | Size | FPS ↑ | GFLOPs ↓ | Params (M) ↓ |
---|---|---|---|---|
GFTT | | 2108 | - | - |
GFTT | | 524 | - | - |
ORB | | 593 | - | - |
ORB | | 200 | - | - |
SIFT | | 163 | - | - |
SIFT | | 31 | - | - |
SP | | 167 | 24.812 | 1.301 |
SP | | 67 | 210.900 | 1.301 |
DISK | | 88 | 24.758 | 1.092 |
DISK | | 26 | 99.030 | 1.092 |
ALIKE | | 249 | 2.131 | 0.329 |
ALIKE | | 90 | 7.991 | 0.329 |
SiLK | | 76 | 65.221 | 0.942 |
SiLK | | 22 | 275.132 | 0.942 |
Ours | | 254 | 6.896 | 1.278 |
Ours | | 86 | 27.584 | 1.278 |
Table 7.
Application of VO on KITTI seq 00.
VO | MDE (m) ↓ | RDE (m) ↓ | FPS ↑ |
---|---|---|---|
ORB with BruteMatch | 294.173 | 1.470 | 25 |
ORB with optical flow | 183.075 | 0.840 | 22 |
GFTT with optical flow | 72.050 | 0.335 | 31 |
SIFT with Flannmatch | 16.593 | 0.124 | 9 |
SP with Flannmatch | 18.321 | 0.079 | 14 |
SP with optical flow | 13.994 | 0.103 | 17 |
SiLK with Flannmatch | 13.182 | 0.091 | 4 |
SiLK with optical flow | 6.343 | 0.127 | 7 |
Ours (OFPoint with optical flow) | 6.486 | 0.065 | 18 |