Search Results (197)

Search Parameters:
Keywords = scale invariant feature transform (SIFT)

25 pages, 12760 KB  
Article
Intelligent Face Recognition: Comprehensive Feature Extraction Methods for Holistic Face Analysis and Modalities
by Thoalfeqar G. Jarullah, Ahmad Saeed Mohammad, Musab T. S. Al-Kaltakchi and Jabir Alshehabi Al-Ani
Signals 2025, 6(3), 49; https://doi.org/10.3390/signals6030049 - 19 Sep 2025
Viewed by 428
Abstract
Face recognition technology utilizes unique facial features to analyze and compare individuals for identification and verification purposes. This technology is crucial for several reasons, such as improving security and authentication, effectively verifying identities, providing personalized user experiences, and automating various operations, including attendance monitoring, access management, and law enforcement activities. In this paper, comprehensive evaluations are conducted using different face detection and modality segmentation methods, feature extraction methods, and classifiers to improve system performance. As for face detection, four methods are proposed: OpenCV’s Haar Cascade classifier, Dlib’s HOG + SVM frontal face detector, Dlib’s CNN face detector, and Mediapipe’s face detector. Additionally, two types of feature extraction techniques are proposed: hand-crafted features (traditional global and local features) and deep learning features. Three global features were extracted: Scale-Invariant Feature Transform (SIFT), Speeded-Up Robust Features (SURF), and Global Image Structure (GIST). Likewise, the following local feature methods are utilized: Local Binary Pattern (LBP), Weber Local Descriptor (WLD), and Histogram of Oriented Gradients (HOG). On the other hand, the deep learning-based features fall into two categories: convolutional neural networks (CNNs), including VGG16, VGG19, and VGG-Face, and Siamese neural networks (SNNs), which generate face embeddings. For classification, three methods are employed: Support Vector Machine (SVM), a one-class SVM variant, and Multilayer Perceptron (MLP). The system is evaluated on three datasets: in-house, Labelled Faces in the Wild (LFW), and the Pins dataset (sourced from Pinterest), providing comprehensive benchmark comparisons for facial recognition research. Among the ten proposed feature extraction methods, the best accuracy on the in-house database for the facial recognition task was 99.8%, achieved by the VGG16 model combined with the SVM classifier. Full article
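
As a rough illustration of how one handcrafted branch can be wired to a classifier, the sketch below mean-pools OpenCV SIFT descriptors into a fixed-length vector and trains an SVM; the pooling step, kernel, and hyperparameters are illustrative assumptions, not the authors' implementation (which also covers SURF, GIST, LBP, WLD, HOG, and CNN features).

```python
import cv2
import numpy as np
from sklearn.svm import SVC

def pooled_sift(gray):
    """Return one 128-D vector: the mean of all SIFT keypoint descriptors in the image."""
    sift = cv2.SIFT_create()
    _, desc = sift.detectAndCompute(gray, None)
    if desc is None:                       # no keypoints found
        return np.zeros(128, np.float32)
    return desc.mean(axis=0)

def train_face_classifier(face_images, labels):
    """Fit a multi-class SVM on pooled SIFT features of cropped grayscale face images."""
    X = np.stack([pooled_sift(img) for img in face_images])
    clf = SVC(kernel="rbf", C=10.0, gamma="scale")
    clf.fit(X, labels)
    return clf
```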

16 pages, 7958 KB  
Article
Development and Evaluation of a Keypoint-Based Video Stabilization Pipeline for Oral Capillaroscopy
by Vito Gentile, Vincenzo Taormina, Luana Conte, Giorgio De Nunzio, Giuseppe Raso and Donato Cascio
Sensors 2025, 25(18), 5738; https://doi.org/10.3390/s25185738 - 15 Sep 2025
Viewed by 316
Abstract
Capillaroscopy imaging is a non-invasive technique used to examine the microcirculation of the oral mucosa. However, the acquired video sequences are often affected by motion noise and shaking, which can compromise diagnostic accuracy and hinder the development of automated systems for capillary identification and segmentation. To address these challenges, we implemented a comprehensive video stabilization model, structured as a multi-phase pipeline and visually represented through a flow-chart. The proposed method integrates keypoint extraction, optical flow estimation, and affine transformation-based frame alignment to enhance video stability. Within this framework, we evaluated the performance of three keypoint extraction algorithms—Scale-Invariant Feature Transform (SIFT), Oriented FAST and Rotated BRIEF (ORB) and Good Features to Track (GFTT)—on a curated dataset of oral capillaroscopy videos. To simulate real-world acquisition conditions, synthetic tremors were introduced via Gaussian affine transformations. Experimental results demonstrate that all three algorithms yield comparable stabilization performance, with GFTT offering slightly higher structural fidelity and ORB excelling in computational efficiency. These findings validate the effectiveness of the proposed model and highlight its potential for improving the quality and reliability of oral videocapillaroscopy imaging. Experimental evaluation showed that the proposed pipeline achieved an average SSIM of 0.789 and reduced jitter to 25.8, compared to the perturbed input sequences. In addition, path smoothness and RMS errors (translation and rotation) consistently indicated improved stabilization across all tested feature extractors. Compared to previous stabilization approaches in nailfold capillaroscopy, our method achieved comparable or superior structural fidelity while maintaining computational efficiency. Full article
(This article belongs to the Special Issue Biomedical Signals, Images and Healthcare Data Analysis: 2nd Edition)
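
The core stabilization loop described above can be sketched as follows: GFTT keypoints, Lucas-Kanade optical flow, and a RANSAC-fitted partial affine warp that maps each frame back onto its predecessor. Trajectory smoothing, synthetic tremor injection, and the SSIM/jitter evaluation are omitted, and the parameter values are illustrative only.

```python
import cv2
import numpy as np

def stabilize_pairwise(frames):
    """Warp each frame onto its predecessor's geometry using GFTT + LK optical flow + partial affine."""
    out = [frames[0]]
    prev_gray = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY)
    h, w = prev_gray.shape
    for frame in frames[1:]:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=300, qualityLevel=0.01, minDistance=7)
        nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
        ok = status.ravel() == 1
        # estimate the motion of the current frame and warp it back onto the previous frame
        M, _ = cv2.estimateAffinePartial2D(nxt[ok], pts[ok], method=cv2.RANSAC)
        out.append(cv2.warpAffine(frame, M, (w, h)))
        prev_gray = gray
    return out
```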

17 pages, 3935 KB  
Article
Markerless Force Estimation via SuperPoint-SIFT Fusion and Finite Element Analysis: A Sensorless Solution for Deformable Object Manipulation
by Qingqing Xu, Ruoyang Lai and Junqing Yin
Biomimetics 2025, 10(9), 600; https://doi.org/10.3390/biomimetics10090600 - 8 Sep 2025
Viewed by 409
Abstract
Contact-force perception is a critical component of safe robotic grasping. With the rapid advances in embodied intelligence technology, humanoid robots have enhanced their multimodal perception capabilities. Conventional force sensors face limitations, such as complex spatial arrangements, installation challenges at multiple nodes, and potential interference with robotic flexibility. Consequently, these conventional sensors are unsuitable for biomimetic robot requirements in object perception, natural interaction, and agile movement. Therefore, this study proposes a sensorless external force detection method that integrates SuperPoint-Scale Invariant Feature Transform (SIFT) feature extraction with finite element analysis to address force perception challenges. A visual analysis method based on the SuperPoint-SIFT feature fusion algorithm was implemented to reconstruct a three-dimensional displacement field of the target object. Subsequently, the displacement field was mapped to the contact force distribution using finite element modeling. Experimental results demonstrate a mean force estimation error of 7.60% (isotropic) and 8.15% (anisotropic), with RMSE < 8%, validated by flexible pressure sensors. To enhance the model’s reliability, a dual-channel video comparison framework was developed. By analyzing the consistency of the deformation patterns and mechanical responses between the actual compression and finite element simulation video keyframes, the proposed approach provides a novel solution for real-time force perception in robotic interactions. The proposed solution is suitable for applications such as precision assembly and medical robotics, where sensorless force feedback is crucial. Full article
(This article belongs to the Special Issue Bio-Inspired Intelligent Robot)
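
A minimal sketch of the displacement-field step, using plain SIFT matching (SuperPoint needs a learned model and is omitted) and linear interpolation of the sparse keypoint displacements; the finite element mapping from displacement to contact force is not shown, and the 2-D field here simplifies the paper's 3-D reconstruction.

```python
import cv2
import numpy as np
from scipy.interpolate import griddata

def displacement_field(ref_gray, deformed_gray, grid=(64, 64)):
    """Match SIFT keypoints between reference and deformed views, then interpolate a 2-D displacement field."""
    sift = cv2.SIFT_create()
    k1, d1 = sift.detectAndCompute(ref_gray, None)
    k2, d2 = sift.detectAndCompute(deformed_gray, None)
    matches = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True).match(d1, d2)
    src = np.float32([k1[m.queryIdx].pt for m in matches])
    dst = np.float32([k2[m.trainIdx].pt for m in matches])
    disp = dst - src                                    # sparse displacement at matched keypoints
    h, w = ref_gray.shape
    gx, gy = np.meshgrid(np.linspace(0, w - 1, grid[1]), np.linspace(0, h - 1, grid[0]))
    ux = griddata(src, disp[:, 0], (gx, gy), method="linear")
    uy = griddata(src, disp[:, 1], (gx, gy), method="linear")
    return np.dstack([ux, uy])                          # NaN where no nearby matches support interpolation
```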

21 pages, 3448 KB  
Article
A Welding Defect Detection Model Based on Hybrid-Enhanced Multi-Granularity Spatiotemporal Representation Learning
by Chenbo Shi, Shaojia Yan, Lei Wang, Changsheng Zhu, Yue Yu, Xiangteng Zang, Aiping Liu, Chun Zhang and Xiaobing Feng
Sensors 2025, 25(15), 4656; https://doi.org/10.3390/s25154656 - 27 Jul 2025
Viewed by 786
Abstract
Real-time quality monitoring using molten pool images is a critical focus in researching high-quality, intelligent automated welding. To address interference problems in molten pool images under complex welding scenarios (e.g., reflected laser spots from spatter misclassified as porosity defects) and the limited interpretability of deep learning models, this paper proposes a multi-granularity spatiotemporal representation learning algorithm based on the hybrid enhancement of handcrafted and deep learning features. A MobileNetV2 backbone network integrated with a Temporal Shift Module (TSM) is designed to progressively capture the short-term dynamic features of the molten pool and integrate temporal information across both low-level and high-level features. A multi-granularity attention-based feature aggregation module is developed to select key interference-free frames using cross-frame attention, generate multi-granularity features via grouped pooling, and apply the Convolutional Block Attention Module (CBAM) at each granularity level. Finally, these multi-granularity spatiotemporal features are adaptively fused. Meanwhile, an independent branch utilizes the Histogram of Oriented Gradient (HOG) and Scale-Invariant Feature Transform (SIFT) features to extract long-term spatial structural information from historical edge images, enhancing the model’s interpretability. The proposed method achieves an accuracy of 99.187% on a self-constructed dataset. Additionally, it attains a real-time inference speed of 20.983 ms per sample on a hardware platform equipped with an Intel i9-12900H CPU and an RTX 3060 GPU, thus effectively balancing accuracy, speed, and interpretability. Full article
(This article belongs to the Topic Applied Computing and Machine Intelligence (ACMI))
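
The interpretable handcrafted branch could be sketched as below: HOG plus mean-pooled SIFT descriptors computed on an edge image. The deep spatiotemporal branch (TSM-MobileNetV2, CBAM, attention aggregation) is beyond a short sketch; the cell sizes and pooling choice are assumptions.

```python
import cv2
import numpy as np
from skimage.feature import hog

def handcrafted_edge_descriptor(edge_img):
    """HOG + mean-pooled SIFT descriptors from a single molten-pool edge image (uint8, grayscale)."""
    hog_vec = hog(edge_img, orientations=9, pixels_per_cell=(16, 16), cells_per_block=(2, 2))
    sift = cv2.SIFT_create()
    _, desc = sift.detectAndCompute(edge_img, None)
    sift_vec = desc.mean(axis=0) if desc is not None else np.zeros(128, np.float32)
    return np.concatenate([hog_vec, sift_vec])
```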

26 pages, 92114 KB  
Article
Multi-Modal Remote Sensing Image Registration Method Combining Scale-Invariant Feature Transform with Co-Occurrence Filter and Histogram of Oriented Gradients Features
by Yi Yang, Shuo Liu, Haitao Zhang, Dacheng Li and Ling Ma
Remote Sens. 2025, 17(13), 2246; https://doi.org/10.3390/rs17132246 - 30 Jun 2025
Viewed by 907
Abstract
Multi-modal remote sensing images often exhibit complex and nonlinear radiation differences, which significantly hinder the performance of traditional feature-based image registration methods such as Scale-Invariant Feature Transform (SIFT). In contrast, structural features, such as edges and contours, remain relatively consistent across modalities. To address this challenge, we propose a novel multi-modal image registration method, Cof-SIFT, which integrates a co-occurrence filter with SIFT. By replacing the traditional Gaussian filter with a co-occurrence filter, Cof-SIFT effectively suppresses texture variations while preserving structural information, thereby enhancing robustness to cross-modal differences. To further improve image registration accuracy, we introduce an extended approach, Cof-SIFT_HOG, which extracts Histogram of Oriented Gradients (HOG) features from the image gradient magnitude map of corresponding points and refines their positions based on HOG similarity. This refinement yields more precise alignment between the reference image and the image to be registered. We evaluated Cof-SIFT and Cof-SIFT_HOG on a diverse set of multi-modal remote sensing image pairs. The experimental results demonstrate that both methods outperform existing approaches, including SIFT, COFSM, SAR-SIFT, PSO-SIFT, and OS-SIFT, in terms of robustness and registration accuracy. Notably, Cof-SIFT_HOG achieves the highest overall performance, confirming the effectiveness of the proposed structure-preserving and corresponding point location refinement strategies in cross-modal registration tasks. Full article
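
The HOG-based position refinement of Cof-SIFT_HOG might look roughly like this: shift a coarse correspondence within a small window of the target gradient-magnitude map to maximize HOG similarity. The cosine similarity, the ±4 px search range, and the patch size are assumptions; the co-occurrence filter itself is not reproduced here.

```python
import numpy as np
from skimage.feature import hog

def refine_point_by_hog(ref_mag, tgt_mag, p_ref, p_tgt, patch=32, search=4):
    """Refine a target point so its gradient-magnitude patch is most HOG-similar to the reference patch.
    Assumes both points lie at least patch/2 + search pixels from the image borders."""
    half = patch // 2

    def patch_hog(img, x, y):
        roi = img[y - half:y + half, x - half:x + half]
        v = hog(roi, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
        return v / (np.linalg.norm(v) + 1e-8)

    h_ref = patch_hog(ref_mag, *p_ref)
    best, best_sim = p_tgt, -1.0
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x, y = p_tgt[0] + dx, p_tgt[1] + dy
            sim = float(np.dot(h_ref, patch_hog(tgt_mag, x, y)))   # cosine similarity of HOG vectors
            if sim > best_sim:
                best, best_sim = (x, y), sim
    return best
```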

27 pages, 86462 KB  
Article
SAR Image Registration Based on SAR-SIFT and Template Matching
by Shichong Liu, Xiaobo Deng, Chun Liu and Yongchao Cheng
Remote Sens. 2025, 17(13), 2216; https://doi.org/10.3390/rs17132216 - 27 Jun 2025
Viewed by 580
Abstract
Accurate image registration is essential for synthetic aperture radar (SAR) applications such as change detection, image fusion, and deformation monitoring. However, SAR image registration faces challenges including speckle noise, low-texture regions, and the geometric transformation caused by topographic relief due to side-looking radar imaging. To address these issues, this paper proposes a novel two-stage registration method, consisting of pre-registration and fine registration. In the pre-registration stage, the scale-invariant feature transform for the synthetic aperture radar (SAR-SIFT) algorithm is integrated into an iterative optimization framework to eliminate large-scale geometric discrepancies, ensuring a coarse but reliable initial alignment. In the fine registration stage, a novel similarity measure is introduced by combining frequency-domain phase congruency and spatial-domain gradient features, which enhances the robustness and accuracy of template matching, especially in edge-rich regions. For the topographic relief in the SAR images, an adaptive local stretching transformation strategy is proposed to correct the undulating areas. Experiments on five pairs of SAR images containing flat and undulating regions show that the proposed method achieves initial alignment errors below 10 pixels and final registration errors below 1 pixel. Compared with other methods, our approach obtains more correct matching pairs (up to 100+ per image pair), higher registration precision, and improved robustness under complex terrains. These results validate the accuracy and effectiveness of the proposed registration framework. Full article
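
A bare-bones stand-in for the fine-registration step: after coarse SAR-SIFT alignment, each correspondence is refined by template matching in a local search window. The paper's similarity measure fuses phase congruency and gradient features; plain normalized cross-correlation (cv2.matchTemplate) is used here purely for illustration, assuming 8-bit images and points away from the borders.

```python
import cv2

def refine_correspondence(ref, sensed, x, y, tmpl=64, search=32):
    """Return the residual (dx, dy) offset of the sensed image around the coarsely aligned point (x, y)."""
    t = tmpl // 2
    s = t + search
    template = ref[y - t:y + t, x - t:x + t]           # reference patch
    window = sensed[y - s:y + s, x - s:x + s]          # larger search window in the sensed image
    score = cv2.matchTemplate(window, template, cv2.TM_CCOEFF_NORMED)
    _, _, _, (mx, my) = cv2.minMaxLoc(score)
    return mx - search, my - search
```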

19 pages, 8306 KB  
Article
Plant Sam Gaussian Reconstruction (PSGR): A High-Precision and Accelerated Strategy for Plant 3D Reconstruction
by Jinlong Chen, Yingjie Jiao, Fuqiang Jin, Xingguo Qin, Yi Ning, Minghao Yang and Yongsong Zhan
Electronics 2025, 14(11), 2291; https://doi.org/10.3390/electronics14112291 - 4 Jun 2025
Viewed by 935
Abstract
Plant 3D reconstruction plays a critical role in precision agriculture and plant growth monitoring, yet it faces challenges such as complex background interference, difficulties in capturing intricate plant structures, and slow reconstruction speed. In this study, we propose PlantSamGaussianReconstruction (PSGR), a novel method that integrates Grounding SAM with 3D Gaussian Splatting (3DGS) techniques. PSGR employs Grounding DINO and SAM for accurate plant–background segmentation, utilizes algorithms such as Scale-Invariant Feature Transform (SIFT) for camera pose estimation and sparse point cloud generation, and leverages 3DGS for plant reconstruction. Furthermore, a 3D–2D projection-guided optimization strategy is introduced to enhance segmentation precision. Experimental results on various multi-view plant image datasets demonstrate that PSGR effectively removes background noise under diverse environments, accurately captures plant details, and achieves peak signal-to-noise ratio (PSNR) values exceeding 30 in most scenarios, outperforming the original 3DGS approach. Moreover, PSGR reduces training time by up to 26.9%, significantly improving reconstruction efficiency. These results suggest that PSGR is an efficient, scalable, and high-precision solution for plant modeling. Full article
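
The SIFT-based pose estimation and sparse point cloud step could be sketched for a single image pair as follows (Grounding SAM segmentation and 3D Gaussian Splatting are outside the scope of a short sketch); the camera intrinsics K and the matcher settings are assumptions.

```python
import cv2
import numpy as np

def relative_pose_and_points(gray1, gray2, K):
    """SIFT matching -> essential matrix -> relative pose -> triangulated sparse 3-D points."""
    sift = cv2.SIFT_create()
    k1, d1 = sift.detectAndCompute(gray1, None)
    k2, d2 = sift.detectAndCompute(gray2, None)
    matches = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True).match(d1, d2)
    p1 = np.float32([k1[m.queryIdx].pt for m in matches])
    p2 = np.float32([k2[m.trainIdx].pt for m in matches])
    E, inl = cv2.findEssentialMat(p1, p2, K, method=cv2.RANSAC, prob=0.999, threshold=1.0)
    _, R, t, inl = cv2.recoverPose(E, p1, p2, K, mask=inl)
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t])
    good = inl.ravel() > 0
    pts4d = cv2.triangulatePoints(P1, P2, p1[good].T, p2[good].T)
    return R, t, (pts4d[:3] / pts4d[3]).T               # sparse points in the first camera frame
```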

16 pages, 9488 KB  
Article
A Multitask Network for the Diagnosis of Autoimmune Gastritis
by Yuqi Cao, Yining Zhao, Xinao Jin, Jiayuan Zhang, Gangzhi Zhang, Pingjie Huang, Guangxin Zhang and Yuehua Han
J. Imaging 2025, 11(5), 154; https://doi.org/10.3390/jimaging11050154 - 15 May 2025
Viewed by 923
Abstract
Autoimmune gastritis (AIG) has a strong correlation with gastric neuroendocrine tumors (NETs) and gastric cancer, making its timely and accurate diagnosis crucial for tumor prevention. The endoscopic manifestations of AIG differ from those of gastritis caused by Helicobacter pylori (H. pylori) infection in terms of the affected gastric anatomical regions and the pathological characteristics observed in biopsy samples. Therefore, when diagnosing AIG based on endoscopic images, it is essential not only to distinguish between normal and atrophic gastric mucosa but also to accurately identify the anatomical region in which the atrophic mucosa is located. In this study, we propose a patient-based multitask gastroscopy image classification network that analyzes all images obtained during the endoscopic procedure. First, we employ the Scale-Invariant Feature Transform (SIFT) algorithm for image registration, generating an image similarity matrix. Next, we use a hierarchical clustering algorithm to group images based on this matrix. Finally, we apply the RepLKNet model, which utilizes large-kernel convolution, to each image group to perform two tasks: anatomical region classification and lesion recognition. Our method achieves an accuracy of 93.4 ± 0.5% (95% CI) and a precision of 92.6 ± 0.4% (95% CI) in the anatomical region classification task, which categorizes images into the fundus, body, and antrum. Additionally, it attains an accuracy of 90.2 ± 1.0% (95% CI) and a precision of 90.5 ± 0.8% (95% CI) in the lesion recognition task, which identifies the presence of gastric mucosal atrophic lesions in gastroscopy images. These results demonstrate that the proposed multitask patient-based gastroscopy image analysis method holds significant practical value for advancing computer-aided diagnosis systems for atrophic gastritis and enhancing the diagnostic accuracy and efficiency of AIG. Full article
(This article belongs to the Section Medical Imaging)
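
A compact sketch of the grouping stage: a pairwise similarity matrix built from SIFT match counts, followed by average-linkage hierarchical clustering. The match-count similarity and the cluster count are assumptions standing in for the paper's registration-based similarity matrix; the RepLKNet classification stages are not shown.

```python
import cv2
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def group_gastroscopy_images(gray_images, n_groups=3):
    """Group a patient's gastroscopy frames by pairwise SIFT-match similarity."""
    sift = cv2.SIFT_create()
    descs = [sift.detectAndCompute(img, None)[1] for img in gray_images]
    bf = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
    n = len(gray_images)
    sim = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            if descs[i] is None or descs[j] is None:
                continue
            matches = bf.match(descs[i], descs[j])
            sim[i, j] = sim[j, i] = len(matches) / max(1, min(len(descs[i]), len(descs[j])))
    dist = squareform(1.0 - sim, checks=False)           # condensed distance: 1 - similarity
    labels = fcluster(linkage(dist, method="average"), t=n_groups, criterion="maxclust")
    return labels
```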

33 pages, 36897 KB  
Article
Making Images Speak: Human-Inspired Image Description Generation
by Chifaa Sebbane, Ikram Belhajem and Mohammed Rziza
Information 2025, 16(5), 356; https://doi.org/10.3390/info16050356 - 28 Apr 2025
Cited by 2 | Viewed by 680
Abstract
Despite significant advances in deep learning-based image captioning, many state-of-the-art models still struggle to balance visual grounding (i.e., accurate object and scene descriptions) with linguistic coherence (i.e., grammatical fluency and appropriate use of non-visual tokens such as articles and prepositions). To address these limitations, we propose a hybrid image captioning framework that integrates handcrafted and deep visual features. Specifically, we combine local descriptors—Scale-Invariant Feature Transform (SIFT) and Bag of Features (BoF)—with high-level semantic features extracted using ResNet50. This dual representation captures both fine-grained spatial details and contextual semantics. The decoder employs Bahdanau attention refined with an Attention-on-Attention (AoA) mechanism to optimize visual-textual alignment, while GloVe embeddings and a GRU-based sequence model ensure fluent language generation. The proposed system is trained on 200,000 image-caption pairs from the MS COCO train2014 dataset and evaluated on 50,000 held-out MS COCO pairs plus the Flickr8K benchmark. Our model achieves a CIDEr score of 128.3 and a SPICE score of 29.24, reflecting clear improvements over baselines in both semantic precision—particularly for spatial relationships—and grammatical fluency. These results validate that combining classical computer vision techniques with modern attention mechanisms yields more interpretable and linguistically precise captions, addressing key limitations in neural caption generation. Full article
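
The SIFT + Bag-of-Features branch can be sketched as below: a k-means visual vocabulary over SIFT descriptors and a normalized word histogram per image. The vocabulary size is an assumption, and the ResNet50 features, AoA attention, and GRU decoder are not covered.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(train_grays, k=256):
    """Cluster SIFT descriptors from a training set into k visual words."""
    sift = cv2.SIFT_create()
    descs = [sift.detectAndCompute(im, None)[1] for im in train_grays]
    all_desc = np.vstack([d for d in descs if d is not None])
    return KMeans(n_clusters=k, n_init=4, random_state=0).fit(all_desc)

def bof_histogram(gray, vocab):
    """Normalized bag-of-features histogram of one image over the learned vocabulary."""
    sift = cv2.SIFT_create()
    _, desc = sift.detectAndCompute(gray, None)
    k = vocab.n_clusters
    if desc is None:
        return np.zeros(k, np.float32)
    words = vocab.predict(desc)
    hist = np.bincount(words, minlength=k).astype(np.float32)
    return hist / (hist.sum() + 1e-8)
```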

22 pages, 121478 KB  
Article
Ground-Moving Target Relocation for a Lightweight Unmanned Aerial Vehicle-Borne Radar System Based on Doppler Beam Sharpening Image Registration
by Wencheng Liu, Zhen Chen, Zhiyu Jiang, Yanlei Li, Yunlong Liu, Xiangxi Bu and Xingdong Liang
Electronics 2025, 14(9), 1760; https://doi.org/10.3390/electronics14091760 - 25 Apr 2025
Viewed by 524
Abstract
With the rapid development of lightweight unmanned aerial vehicles (UAVs), the combination of UAVs and ground-moving target indication (GMTI) radar systems has received great interest. However, because of size, weight, and power (SWaP) limitations, the UAV may not be able to equip a highly accurate inertial navigation system (INS), which leads to reduced accuracy in the moving target relocation. To solve this issue, we propose using an image registration algorithm, which matches a Doppler beam sharpening (DBS) image of detected moving targets to a synthetic aperture radar (SAR) image containing coordinate information. However, when using conventional SAR image registration algorithms such as the SAR scale-invariant feature transform (SIFT) algorithm, additional difficulties arise. To overcome these difficulties, we developed a new image-matching algorithm, which first estimates the errors of the UAV platform to compensate for geometric distortions in the DBS image. In addition, to showcase the relocation improvement achieved with the new algorithm, we compared it with the affine transformation and second-order polynomial algorithms. The findings of simulated and real-world experiments demonstrate that our proposed image transformation method offers better moving target relocation results under low-accuracy INS conditions. Full article
(This article belongs to the Special Issue New Challenges in Remote Sensing Image Processing)
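
For reference, the second-order polynomial mapping the authors compare against can be fitted to matched control points by least squares as in the sketch below (the proposed error-compensation algorithm itself is not reproduced); at least six well-spread correspondences are assumed.

```python
import numpy as np

def _design(pts):
    """Quadratic design matrix [1, x, y, xy, x^2, y^2] for an (N, 2) array of points."""
    x, y = pts[:, 0], pts[:, 1]
    return np.stack([np.ones_like(x), x, y, x * y, x ** 2, y ** 2], axis=1)

def fit_poly2(src, dst):
    """Least-squares fit of a second-order polynomial mapping src -> dst (both (N, 2), N >= 6)."""
    A = _design(src)
    cx, *_ = np.linalg.lstsq(A, dst[:, 0], rcond=None)
    cy, *_ = np.linalg.lstsq(A, dst[:, 1], rcond=None)
    return cx, cy

def apply_poly2(pts, cx, cy):
    """Apply the fitted quadratic mapping to new points."""
    A = _design(pts)
    return np.stack([A @ cx, A @ cy], axis=1)
```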

23 pages, 1297 KB  
Article
Multi-Granularity and Multi-Modal Feature Fusion for Indoor Positioning
by Lijuan Ye, Yi Wang, Shenglei Pei, Yu Wang, Hong Zhao and Shi Dong
Symmetry 2025, 17(4), 597; https://doi.org/10.3390/sym17040597 - 15 Apr 2025
Viewed by 654
Abstract
Despite the widespread adoption of indoor positioning technology, the existing solutions still face significant challenges. On one hand, Wi-Fi-based positioning struggles to balance accuracy and efficiency in complex indoor environments and architectural layouts formed by pre-existing access points (APs). On the other hand, vision-based methods, while offering high-precision potential, are hindered by prohibitive costs associated with binocular camera systems required for depth image acquisition, limiting their large-scale deployment. Additionally, channel state information (CSI), containing multi-subcarrier data, maintains amplitude symmetry in ideal free-space conditions but becomes susceptible to periodic positioning errors in real environments due to multipath interference. Meanwhile, image-based positioning often suffers from spatial ambiguity in texture-repeated areas. To address these challenges, we propose a novel hybrid indoor positioning method that integrates multi-granularity and multi-modal features. By fusing CSI data with visual information, the system leverages spatial consistency constraints from images to mitigate CSI error fluctuations while utilizing CSI’s global stability to correct local ambiguities in image-based positioning. In the initial coarse-grained positioning phase, a neural network model is trained using image data to roughly localize indoor scenes. This model adeptly captures the geometric relationships within images, providing a foundation for more precise localization in subsequent stages. In the fine-grained positioning stage, CSI features from Wi-Fi signals and Scale-Invariant Feature Transform (SIFT) features from image data are fused, creating a rich feature fusion fingerprint library that enables high-precision positioning. The experimental results show that our proposed method synergistically combines the strengths of Wi-Fi fingerprints and visual positioning, resulting in a substantial enhancement in positioning accuracy. Specifically, our approach achieves an accuracy of 0.4 m for 45% of positioning points and 0.8 m for 67% of points. Overall, this approach charts a promising path forward for advancing indoor positioning technology. Full article
(This article belongs to the Section Mathematics)
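
A minimal, purely illustrative sketch of the fine-grained lookup: a weighted nearest-neighbour search over a fused CSI + image-feature fingerprint library. The fingerprint layout, the distance weighting, and k are assumptions; the coarse CNN scene-localization stage and the SIFT feature construction are not shown.

```python
import numpy as np

def locate(query_csi, query_img, db_csi, db_img, db_coords, w_csi=0.5, k=3):
    """Return the mean coordinate of the k reference fingerprints closest to the fused query."""
    d_csi = np.linalg.norm(db_csi - query_csi, axis=1)
    d_img = np.linalg.norm(db_img - query_img, axis=1)
    # normalise the two distance scales before fusing them with weight w_csi
    d = w_csi * d_csi / (d_csi.max() + 1e-9) + (1.0 - w_csi) * d_img / (d_img.max() + 1e-9)
    nearest = np.argsort(d)[:k]
    return db_coords[nearest].mean(axis=0)
```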

30 pages, 33973 KB  
Article
Research on Rapid and Accurate 3D Reconstruction Algorithms Based on Multi-View Images
by Lihong Yang, Hang Ge, Zhiqiang Yang, Jia He, Lei Gong, Wanjun Wang, Yao Li, Liguo Wang and Zhili Chen
Appl. Sci. 2025, 15(8), 4088; https://doi.org/10.3390/app15084088 - 8 Apr 2025
Viewed by 1479
Abstract
Three-dimensional reconstruction entails the development of mathematical models of three-dimensional objects that are suitable for computational representation and processing. This technique constructs realistic 3D models of images and has significant practical applications across various fields. This study proposes a rapid and precise multi-view 3D reconstruction method to address the challenges of low reconstruction efficiency and inadequate, poor-quality point cloud generation in incremental structure-from-motion (SFM) algorithms in multi-view geometry. The methodology involves capturing a series of overlapping images of a campus. We employed the Scale-Invariant Feature Transform (SIFT) algorithm to extract feature points from each image, applied the KD-Tree algorithm for inter-image matching, and enhanced autonomous threshold adjustment by utilizing the Random Sample Consensus (RANSAC) algorithm to eliminate mismatches, thereby improving feature matching accuracy and increasing the number of matched point pairs. Additionally, we developed a feature-matching strategy based on similarity, which optimizes the pairwise matching process within the incremental structure-from-motion algorithm. This approach decreased the number of matches and enhanced both algorithmic efficiency and model reconstruction accuracy. For dense reconstruction, we utilized the patch-based multi-view stereo (PMVS) algorithm, which is based on facets. The results indicate that our proposed method achieves a higher number of reconstructed feature points and significantly enhances algorithmic efficiency by approximately ten times compared to the original incremental reconstruction algorithm. Consequently, the generated point cloud data are more detailed, and the textures are clearer, demonstrating that our method is an effective solution for three-dimensional reconstruction. Full article
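
The feature matching and mismatch elimination steps map naturally onto OpenCV, as sketched below: SIFT keypoints, FLANN KD-tree matching with Lowe's ratio test, and RANSAC filtering via the fundamental matrix. The ratio threshold and RANSAC parameters are assumptions; the similarity-based pair selection and PMVS densification are not included.

```python
import cv2
import numpy as np

FLANN_INDEX_KDTREE = 1

def match_with_ransac(gray1, gray2, ratio=0.7):
    """Return RANSAC-filtered corresponding points between two grayscale images."""
    sift = cv2.SIFT_create()
    k1, d1 = sift.detectAndCompute(gray1, None)
    k2, d2 = sift.detectAndCompute(gray2, None)
    flann = cv2.FlannBasedMatcher(dict(algorithm=FLANN_INDEX_KDTREE, trees=5), dict(checks=64))
    good = [m for m, n in flann.knnMatch(d1, d2, k=2) if m.distance < ratio * n.distance]
    p1 = np.float32([k1[m.queryIdx].pt for m in good])
    p2 = np.float32([k2[m.trainIdx].pt for m in good])
    F, mask = cv2.findFundamentalMat(p1, p2, cv2.FM_RANSAC, 1.0, 0.999)
    keep = mask.ravel().astype(bool)
    return p1[keep], p2[keep]
```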

20 pages, 4789 KB  
Communication
Fast Registration Algorithm for Laser Point Cloud Based on 3D-SIFT Features
by Lihong Yang, Shunqin Xu, Zhiqiang Yang, Jia He, Lei Gong, Wanjun Wang, Yao Li, Liguo Wang and Zhili Chen
Sensors 2025, 25(3), 628; https://doi.org/10.3390/s25030628 - 22 Jan 2025
Cited by 4 | Viewed by 1580
Abstract
In response to the issues of slow convergence and the tendency to fall into local optima in traditional iterative closest point (ICP) point cloud registration algorithms, this study presents a fast registration algorithm for laser point clouds based on 3D scale-invariant feature transform (3D-SIFT) feature extraction. First, feature points are preliminarily extracted using a normal vector threshold; then, more high-quality feature points are extracted using the 3D-SIFT algorithm, effectively reducing the number of points involved in registration. Based on the extracted feature points, a coarse registration of the point cloud is performed using the fast point feature histogram (FPFH) descriptor combined with the sample consensus initial alignment (SAC-IA) algorithm, followed by fine registration using the point-to-plane ICP algorithm with a symmetric target function. The experimental results show that this algorithm significantly improved the registration efficiency. Compared with the traditional SAC-IA + ICP algorithm, the registration accuracy of this algorithm increased by 29.55% in experiments on a public dataset, and the registration time was reduced by 81.01%. In experiments on actual collected data, the registration accuracy increased by 41.72%, and the registration time was reduced by 67.65%. The algorithm presented in this paper maintains high registration accuracy while greatly reducing the registration time. Full article
(This article belongs to the Section Sensing and Imaging)
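
Using Open3D (assumed version 0.13 or later), the coarse-to-fine structure of such a pipeline might be sketched as below, with voxel downsampling standing in for the paper's normal-threshold plus 3D-SIFT keypoint reduction; FPFH + RANSAC plays the SAC-IA role and point-to-plane ICP does the fine registration. The voxel size and search radii are assumptions.

```python
import open3d as o3d

def coarse_to_fine_registration(source, target, voxel=0.05):
    """FPFH + RANSAC coarse alignment followed by point-to-plane ICP refinement."""
    def preprocess(pcd):
        down = pcd.voxel_down_sample(voxel)                       # stand-in for keypoint reduction
        down.estimate_normals(o3d.geometry.KDTreeSearchParamHybrid(radius=voxel * 2, max_nn=30))
        fpfh = o3d.pipelines.registration.compute_fpfh_feature(
            down, o3d.geometry.KDTreeSearchParamHybrid(radius=voxel * 5, max_nn=100))
        return down, fpfh

    src, src_fpfh = preprocess(source)
    tgt, tgt_fpfh = preprocess(target)
    coarse = o3d.pipelines.registration.registration_ransac_based_on_feature_matching(
        src, tgt, src_fpfh, tgt_fpfh, True, voxel * 1.5)          # mutual_filter=True
    fine = o3d.pipelines.registration.registration_icp(
        src, tgt, voxel * 0.8, coarse.transformation,
        o3d.pipelines.registration.TransformationEstimationPointToPlane())
    return fine.transformation
```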

18 pages, 17735 KB  
Article
Toward Efficient Edge Detection: A Novel Optimization Method Based on Integral Image Technology and Canny Edge Detection
by Yanqin Li and Dehai Zhang
Processes 2025, 13(2), 293; https://doi.org/10.3390/pr13020293 - 21 Jan 2025
Cited by 5 | Viewed by 1304
Abstract
The traditional SIFT (Scale Invariant Feature Transform) registration algorithm is highly regarded in the field of image processing due to its scale invariance, rotation invariance, and robustness to noise. However, it faces challenges such as a large number of feature points, high computational demand, and poor real-time performance when dealing with large-scale images. A novel optimization method based on integral image technology and Canny edge detection is presented in this paper, aiming to maintain the core advantages of the SIFT algorithm while reducing the complexity involved in image registration computations, enhancing the efficiency of the algorithm for real-time image processing, and better adapting to the needs of large-scale image handling. Firstly, Gaussian separation techniques were used to simplify Gaussian filtering, followed by the application of integral image techniques to accelerate the construction of the entire pyramid. Additionally, during the feature point detection phase, an innovative feature point filtering strategy was introduced by combining Canny edge detection with dilation operations alongside the traditional SIFT approach, aiming to reduce the number of feature points and thereby lessen the computational load. The method proposed in this paper takes 0.0134 s for Image type a, 0.0504 s for Image type b, and 0.0212 s for Image type c. In contrast, the traditional method takes 0.1452 s for Image type a, 0.5276 s for Image type b, and 0.2717 s for Image type c, resulting in reductions of 0.1318 s, 0.4772 s, and 0.2505 s, respectively. A series of comparative experiments showed that the time taken to construct the Gaussian pyramid using our proposed method was consistently lower than that required by the traditional method, indicating greater efficiency and stability regardless of image size or type. Full article
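
A rough sketch of the keypoint filtering idea: restrict SIFT detection to a dilated Canny edge mask, with cv2.integral shown only to illustrate the summed-area table the paper uses to accelerate pyramid construction (the accelerated pyramid itself is not reproduced). The thresholds and kernel size are assumptions.

```python
import cv2
import numpy as np

def canny_filtered_sift(gray):
    """Detect SIFT keypoints only inside a dilated Canny edge mask; also return the integral image."""
    sat = cv2.integral(gray)                                 # (h+1, w+1) summed-area table
    edges = cv2.Canny(gray, 100, 200)
    mask = cv2.dilate(edges, np.ones((5, 5), np.uint8), iterations=1)
    sift = cv2.SIFT_create()
    kps, desc = sift.detectAndCompute(gray, mask)            # mask restricts detection to edge regions
    return kps, desc, sat
```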

20 pages, 7090 KB  
Article
An Infrared and Visible Image Alignment Method Based on Gradient Distribution Properties and Scale-Invariant Features in Electric Power Scenes
by Lin Zhu, Yuxing Mao, Chunxu Chen and Lanjia Ning
J. Imaging 2025, 11(1), 23; https://doi.org/10.3390/jimaging11010023 - 13 Jan 2025
Viewed by 1333
Abstract
In grid intelligent inspection systems, automatic registration of infrared and visible light images in power scenes is a crucial research technology. Since there are obvious differences in key attributes between visible and infrared images, direct alignment often fails to achieve the expected results. To overcome the high difficulty of aligning infrared and visible light images, an image alignment method is proposed in this paper. First, we use the Sobel operator to extract the edge information of the image pair. Second, the feature points in the edges are recognised by a curvature scale space (CSS) corner detector. Third, the Histogram of Oriented Gradients (HOG) is extracted as the gradient distribution characteristics of the feature points, which are normalised with the Scale Invariant Feature Transform (SIFT) algorithm to form feature descriptors. Finally, initial matching and accurate matching are achieved by an improved fast approximate nearest-neighbour matching method and adaptive thresholding, respectively. Experiments show that this method can robustly match the feature points of image pairs under rotation, scale, and viewpoint differences, and achieves excellent matching results. Full article
(This article belongs to the Topic Computer Vision and Image Processing, 2nd Edition)
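
A loose sketch of the descriptor construction: Sobel edge magnitude, corner detection on the edge map (cv2.goodFeaturesToTrack standing in for the CSS corner detector), and an L2-normalised HOG descriptor per corner patch. The patch size, corner count, and the subsequent matching parameters are assumptions.

```python
import cv2
import numpy as np
from skimage.feature import hog

def edge_corner_descriptors(gray, max_corners=300, patch=32):
    """Return corner locations on the Sobel edge map and a normalised HOG descriptor per corner."""
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
    edges = cv2.convertScaleAbs(cv2.magnitude(gx, gy))
    corners = cv2.goodFeaturesToTrack(edges, max_corners, 0.01, 10)
    half = patch // 2
    pts, descs = [], []
    for x, y in corners.reshape(-1, 2).astype(int):
        if half <= x < gray.shape[1] - half and half <= y < gray.shape[0] - half:
            roi = edges[y - half:y + half, x - half:x + half]
            d = hog(roi, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
            descs.append(d / (np.linalg.norm(d) + 1e-8))     # SIFT-style L2 normalisation
            pts.append((x, y))
    return np.array(pts), np.array(descs)
```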