Search Results (90)

Search Parameters:
Keywords = bag-of-visual-words

34 pages, 5774 KB  
Article
Approach to Semantic Visual SLAM for Bionic Robots Based on Loop Closure Detection with Combinatorial Graph Entropy in Complex Dynamic Scenes
by Dazheng Wang and Jingwen Luo
Biomimetics 2025, 10(7), 446; https://doi.org/10.3390/biomimetics10070446 - 6 Jul 2025
Viewed by 612
Abstract
In complex dynamic environments, the performance of SLAM systems on bionic robots is susceptible to interference from dynamic objects or structural changes in the environment. To address this problem, we propose a semantic visual SLAM (vSLAM) algorithm based on loop closure detection with combinatorial graph entropy. First, based on the dynamic feature detection results of YOLOv8-seg, feature points at the edges of dynamic objects are finely classified by computing the mean absolute deviation (MAD) of the pixel depths. Then, a high-quality keyframe selection strategy is constructed by combining the semantic information, the average coordinates of the semantic objects, and the degree of variation in dense feature-point regions. Subsequently, unweighted and weighted graphs of the keyframes are constructed according to the distribution of feature points, characterization points, and semantic information, and a high-performance loop closure detection method based on combinatorial graph entropy is then developed. The experimental results show that our loop closure detection approach exhibits higher precision and recall in real scenes than the bag-of-words (BoW) model. Compared with ORB-SLAM2, the absolute trajectory accuracy in high-dynamic sequences improved by an average of 97.01%, while the number of extracted keyframes decreased by an average of 61.20%. Full article
(This article belongs to the Special Issue Artificial Intelligence for Autonomous Robots: 3rd Edition)
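
To make the MAD-based edge test concrete, here is a minimal Python sketch of the idea under stated assumptions: `depth` is a metric depth map, `mask` is a YOLOv8-seg dynamic-object mask, and the window size and threshold are illustrative placeholders, not the authors' values.

```python
import numpy as np

def is_dynamic_edge_point(depth, u, v, mask, win=5, thresh=0.15):
    """Flag a feature point near a dynamic-object edge via depth dispersion.

    depth: HxW depth map in meters; mask: HxW boolean dynamic-object mask
    (e.g., from YOLOv8-seg). A large mean absolute deviation (MAD) of depth
    in the window suggests the point straddles a foreground/background edge.
    """
    h, w = depth.shape
    u0, u1 = max(0, u - win), min(w, u + win + 1)
    v0, v1 = max(0, v - win), min(h, v + win + 1)
    patch = depth[v0:v1, u0:u1]
    valid = patch[patch > 0]                      # ignore missing depth readings
    if valid.size == 0:
        return True                               # no depth at all: unreliable point
    mad = np.mean(np.abs(valid - np.mean(valid)))
    touches_dynamic = bool(mask[v0:v1, u0:u1].any())
    return touches_dynamic and mad > thresh
```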

24 pages, 41430 KB  
Article
An Optimal Viewpoint-Guided Visual Indexing Method for UAV Autonomous Localization
by Zhiyang Ye, Yukun Zheng, Zheng Ji and Wei Liu
Remote Sens. 2025, 17(13), 2194; https://doi.org/10.3390/rs17132194 - 25 Jun 2025
Viewed by 945
Abstract
The autonomous positioning of drone-based remote sensing plays an important role in navigation in urban environments. Due to GNSS (Global Navigation Satellite System) signal occlusion, obtaining precise drone locations remains a challenging issue. Inspired by vision-based positioning methods, we propose an autonomous positioning method based on multi-view reference images rendered from the scene’s 3D geometric mesh and apply a bag-of-words (BoW) image retrieval pipeline to achieve efficient and scalable positioning, without utilizing deep learning-based retrieval or 3D point cloud registration. To minimize the number of reference images, scene coverage quantification and optimization are employed to generate the optimal viewpoints. The proposed method jointly exploits a visual bag-of-words tree to accelerate reference image retrieval and improve retrieval accuracy, and the Perspective-n-Point (PnP) algorithm is utilized to obtain the drone’s pose. Experiments conducted in real-world urban scenarios show that positioning errors are decreased, with accuracy ranging from sub-meter to 5 m and an average latency of 0.7–1.3 s; this indicates that our method significantly improves accuracy and latency, offering robust, real-time performance over extensive areas without relying on GNSS or dense point clouds. Full article
(This article belongs to the Section Engineering Remote Sensing)
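
The retrieve-then-PnP pipeline can be sketched compactly. The following Python sketch assumes `reference_views` (rendered images with known geometry), a `query_img`, 2D–3D correspondences `pts2d`/`pts3d`, and intrinsics `K` already exist; a flat k-means vocabulary stands in for the paper's BoW tree.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

orb = cv2.ORB_create(1000)

def bow_histogram(img, vocab):
    """L2-normalized visual-word histogram of one image."""
    _, des = orb.detectAndCompute(img, None)
    words = vocab.predict(des.astype(np.float32))
    h = np.bincount(words, minlength=vocab.n_clusters).astype(np.float32)
    return h / (np.linalg.norm(h) + 1e-9)

# Offline: flat k-means vocabulary over descriptors of all rendered views.
all_des = np.vstack([orb.detectAndCompute(v, None)[1] for v in reference_views])
vocab = KMeans(n_clusters=256, n_init=4).fit(all_des.astype(np.float32))
ref_hists = np.stack([bow_histogram(v, vocab) for v in reference_views])

# Online: retrieve the most similar reference view, then solve PnP for pose.
q = bow_histogram(query_img, vocab)
best = int(np.argmax(ref_hists @ q))              # cosine similarity on unit vectors
# pts3d/pts2d: 2D-3D matches against the retrieved view's known mesh geometry
ok, rvec, tvec, inliers = cv2.solvePnPRansac(pts3d, pts2d, K, None)
```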

40 pages, 3224 KB  
Article
A Comparative Study of Image Processing and Machine Learning Methods for Classification of Rail Welding Defects
by Mohale Emmanuel Molefe, Jules Raymond Tapamo and Siboniso Sithembiso Vilakazi
J. Sens. Actuator Netw. 2025, 14(3), 58; https://doi.org/10.3390/jsan14030058 - 29 May 2025
Viewed by 2349
Abstract
Defects formed during the thermite welding of two rail sections require the welded joints to be inspected for quality, and the most widely used non-destructive inspection method is radiography testing. However, the conventional defect investigation process based on the obtained radiography images is costly, lengthy, and subjective, as it is conducted manually by trained experts. Additionally, it has been shown that most rail breaks occur due to a crack initiated from a weld joint defect that was either misclassified or undetected. To improve the condition monitoring of rails, the railway industry requires an automated system capable of detecting and classifying defects automatically. Therefore, this work proposes a method based on image processing and machine learning techniques for the automated investigation of defects. Histogram equalization methods are first applied to improve image quality. Then, the weld joint is extracted from the image background using the Chan–Vese Active Contour Model. A comparative investigation is carried out between Deep Convolutional Neural Networks, Local Binary Pattern extractors, and Bag of Visual Words methods (with the Speeded-Up Robust Features extractor) for extracting features from weld joint images. Features extracted by the local feature extractors are classified using Support Vector Machines, K-Nearest Neighbor, and Naive Bayes classifiers. The highest classification accuracy of 95% is achieved by the Deep Convolutional Neural Network model. A Graphical User Interface is provided for the onsite investigation of defects. Full article
(This article belongs to the Special Issue AI-Assisted Machine-Environment Interaction)
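
As a minimal illustration of one of the compared branches, the sketch below pairs a uniform-LBP texture histogram with an SVM; it assumes `X_imgs` (cropped grayscale weld-joint images) and labels `y` exist, and is not the authors' full pipeline.

```python
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def lbp_histogram(gray, P=8, R=1):
    """Uniform LBP histogram as a texture descriptor for a weld-joint image."""
    codes = local_binary_pattern(gray, P, R, method="uniform")
    hist, _ = np.histogram(codes, bins=P + 2, range=(0, P + 2), density=True)
    return hist

# X_imgs: list of cropped grayscale weld-joint images, y: defect labels.
X = np.array([lbp_histogram(im) for im in X_imgs])
print(cross_val_score(SVC(kernel="rbf"), X, y, cv=5).mean())
```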

14 pages, 1656 KB  
Article
A Hybrid Learning Framework for Enhancing Bridge Damage Prediction
by Amal Abdulbaqi Maryoosh, Saeid Pashazadeh and Pedram Salehpour
Appl. Syst. Innov. 2025, 8(3), 61; https://doi.org/10.3390/asi8030061 - 30 Apr 2025
Cited by 1 | Viewed by 876
Abstract
Bridges are crucial structures for transportation networks, and their structural integrity is paramount. Deterioration and damage to bridges can lead to significant economic losses, traffic disruptions, and, in severe cases, loss of life. Traditional methods of bridge damage detection, often relying on visual inspections, can be challenging or impossible in critical areas such as roofing, corners, and heights. Therefore, there is a pressing need for automated and accurate techniques for bridge damage detection. This study proposes a novel method for bridge crack detection that leverages a hybrid supervised and unsupervised learning strategy. The proposed approach combines the pixel-based local binary pattern (LBP) feature method with mid-level bag-of-visual-words (BoVW) features for feature extraction, followed by the Apriori algorithm for dimensionality reduction and optimal feature selection. The selected features are then used to train the MobileNet model. The proposed model demonstrates exceptional performance, achieving accuracy rates ranging from 98.27% to 100%, with error rates between 0% and 1.73%, across multiple bridge damage datasets. This study contributes a reliable hybrid learning framework for minimizing error rates in bridge damage detection, showcasing the potential of combining LBP–BoVW features with MobileNet for image-based classification tasks. Full article
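
The LBP + BoVW fusion step can be illustrated as follows; this hedged sketch assumes a fitted k-means codebook `vocab` and an OpenCV `orb` extractor, and it omits the Apriori selection and MobileNet stages.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def fused_feature(gray, vocab, orb, P=8, R=1):
    """Concatenate a pixel-level LBP histogram with a mid-level BoVW histogram.

    vocab: fitted k-means codebook over ORB descriptors (e.g., sklearn KMeans).
    The fused vector is what a later feature-selection stage would consume.
    """
    codes = local_binary_pattern(gray, P, R, method="uniform")
    lbp, _ = np.histogram(codes, bins=P + 2, range=(0, P + 2), density=True)
    _, des = orb.detectAndCompute(gray, None)
    words = vocab.predict(des.astype(np.float32))
    bovw = np.bincount(words, minlength=vocab.n_clusters).astype(np.float64)
    bovw /= bovw.sum() + 1e-9
    return np.concatenate([lbp, bovw])
```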

20 pages, 3071 KB  
Article
A Keyframe Extraction Method for Assembly Line Operation Videos Based on Optical Flow Estimation and ORB Features
by Xiaoyu Gao, Hua Xiang, Tongxi Wang, Wei Zhan, Mengxue Xie, Lingxuan Zhang and Muyu Lin
Sensors 2025, 25(9), 2677; https://doi.org/10.3390/s25092677 - 23 Apr 2025
Viewed by 1238
Abstract
In modern manufacturing, cameras are widely used to record the full workflow of assembly line workers, enabling video-based operational analysis and management. However, these recordings are often excessively long, leading to high storage demands and inefficient processing. Existing keyframe extraction methods typically apply uniform strategies across all frames, which are ineffective in detecting subtle movements. To address this, we propose a keyframe extraction method tailored for assembly line videos, combining optical flow estimation with ORB-based visual features. Our approach adapts extraction strategies to actions with different motion amplitudes. Each video frame is first encoded into a feature vector using the ORB algorithm and a bag-of-visual-words model. Optical flow is then calculated using the DIS algorithm, allowing frames to be categorized by motion intensity. Adjacent frames within the same category are grouped, and the appropriate number of clusters, k, is determined based on the group’s characteristics. Keyframes are finally selected via k-means++ clustering within each group. The experimental results show that our method achieves a recall rate of 85.2%, with over 90% recall for actions involving minimal movement. Moreover, the method processes an average of 274 frames per second. These results highlight the method’s effectiveness in identifying subtle actions, reducing redundant content, and delivering high accuracy with efficient performance. Full article
(This article belongs to the Section Sensing and Imaging)
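
Two building blocks of the method, DIS-based motion scoring and k-means++ keyframe picking, can be sketched as below; the preset, cluster count, and representative-frame rule are illustrative assumptions, not the authors' exact choices.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

dis = cv2.DISOpticalFlow_create(cv2.DISOPTICAL_FLOW_PRESET_FAST)

def motion_magnitude(prev_gray, gray):
    """Mean optical-flow magnitude between consecutive frames (DIS algorithm)."""
    flow = dis.calc(prev_gray, gray, None)
    return float(np.linalg.norm(flow, axis=2).mean())

def keyframes_from_group(bovw_vectors, k):
    """Pick one representative frame per cluster via k-means++ seeding."""
    km = KMeans(n_clusters=k, init="k-means++", n_init=10).fit(bovw_vectors)
    keyframes = []
    for c in range(k):
        idx = np.where(km.labels_ == c)[0]
        d = np.linalg.norm(bovw_vectors[idx] - km.cluster_centers_[c], axis=1)
        keyframes.append(int(idx[np.argmin(d)]))  # frame closest to the centroid
    return sorted(keyframes)
```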

20 pages, 3018 KB  
Article
Global Semantic Localization from Abstract Ellipse-Ellipsoid Model and Object-Level Instance Topology
by Heng Wu, Yanjie Liu, Chao Wang and Yanlong Wei
Remote Sens. 2024, 16(22), 4187; https://doi.org/10.3390/rs16224187 - 10 Nov 2024
Viewed by 1354
Abstract
Robust and highly accurate localization using a camera is a challenging task when appearance varies significantly. In indoor environments, changes in illumination and object occlusion can have a significant impact on visual localization. In this paper, we propose a visual localization method based on an ellipse-ellipsoid model, combined with object-level instance topology and alignment. First, we develop a CNN-based (Convolutional Neural Network) ellipse prediction network, DEllipse-Net, which integrates depth information with RGB data to estimate the projection of ellipsoids onto images. Second, we model environments using 3D (Three-dimensional) ellipsoids, instance topology, and ellipsoid descriptors. Finally, the detected ellipses are aligned with the ellipsoids in the environment through semantic object association, and 6-DoF (Degree of Freedom) pose estimation is performed using the ellipse-ellipsoid model. In the bounding box noise experiment, DEllipse-Net demonstrates higher robustness than other methods, achieving the highest prediction accuracy for 11 out of 23 objects in ellipse prediction. In the localization test with 15 pixels of noise, we achieve an ATE (Absolute Translation Error) of 0.077 m and an ARE (Absolute Rotation Error) of 2.70° in the fr2_desk sequence. Additionally, DEllipse-Net is lightweight and highly portable, with a model size of only 18.6 MB, and a single model can handle all objects. In the object-level instance topology and alignment experiment, our topology and alignment methods significantly enhance the global localization accuracy of the ellipse-ellipsoid model. In experiments involving lighting changes and occlusions, our method achieves more robust global localization than the classical bag-of-words-based localization method and other ellipse-ellipsoid localization methods. Full article
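
The geometric core of ellipse-ellipsoid methods is the projection of a dual quadric to a dual conic, C* = P Q* Pᵀ; a minimal numpy sketch of that relation (not the paper's DEllipse-Net):

```python
import numpy as np

def project_ellipsoid(Q_star, P):
    """Project a 4x4 dual ellipsoid Q* to a 3x3 dual conic: C* = P Q* P^T.

    P = K [R | t] is the 3x4 camera matrix; the dual conic encodes the
    image ellipse of the ellipsoid under the pinhole model.
    """
    C_star = P @ Q_star @ P.T
    return C_star / C_star[2, 2]                  # fix the projective scale

def ellipse_center(C_star_normalized):
    """Ellipse center (pixels) read off the normalized dual conic."""
    return C_star_normalized[:2, 2]
```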

16 pages, 2663 KB  
Article
Bag of Feature-Based Ensemble Subspace KNN Classifier in Muscle Ultrasound Diagnosis of Diabetic Peripheral Neuropathy
by Kadhim K. Al-Barazanchi, Ali H. Al-Timemy and Zahid M. Kadhim
Math. Comput. Appl. 2024, 29(5), 95; https://doi.org/10.3390/mca29050095 - 20 Oct 2024
Cited by 1 | Viewed by 1430
Abstract
Muscle ultrasound quantification is a valuable complementary diagnostic tool for diabetic peripheral neuropathy (DPN), enhancing physicians’ diagnostic capabilities. Quantitative assessment is generally regarded as more reliable and sensitive than visual evaluation, which often necessitates specialized expertise. This work develops a computer-aided diagnostic (CAD) system based on muscle ultrasound that integrates the bag of features (BOF) and an ensemble subspace k-nearest neighbor (KNN) algorithm for DPN detection. The BOF creates a histogram of visual word occurrences to represent the muscle ultrasound images, and an ensemble classifier is trained through cross-validation, determining optimal parameters to improve classification accuracy for the ensemble diagnosis system. The dataset includes ultrasound images of six muscles from 53 subjects, consisting of 27 control and 26 patient cases. An empirical analysis was conducted for each binary classifier based on muscle type to select the best vocabulary tree properties or K values for the BOF. The results indicate that ensemble subspace KNN classification based on the bag of features achieved an accuracy of 97.23%. Such CAD systems can effectively diagnose muscle pathology, thereby addressing the limitations of visual evaluation and identifying muscle-related issues in individuals with diabetes. This research underscores muscle ultrasound as a promising diagnostic tool to aid physicians in making accurate diagnoses, streamlining workflow, and uncovering muscle-related complications in DPN patients. Full article
(This article belongs to the Section Engineering)
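
The ensemble subspace KNN stage maps naturally onto scikit-learn's random-subspace configuration of BaggingClassifier (the `estimator` keyword assumes scikit-learn ≥ 1.2); `X`/`y` and all hyperparameters below are illustrative assumptions, not the authors' tuned values.

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

# X: BOF histograms (visual-word occurrence counts), y: control/patient labels.
# Random-subspace ensemble: each KNN learner sees a random subset of features.
subspace_knn = BaggingClassifier(
    estimator=KNeighborsClassifier(n_neighbors=5),
    n_estimators=30,
    max_features=0.5,        # each learner trains on half of the visual words
    bootstrap=False,         # keep all samples; vary only the feature subspace
    random_state=0,
)
print(cross_val_score(subspace_knn, X, y, cv=5).mean())
```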

21 pages, 7746 KB  
Article
Multi-Robot Collaborative Mapping with Integrated Point-Line Features for Visual SLAM
by Yu Xia, Xiao Wu, Tao Ma, Liucun Zhu, Jingdi Cheng and Junwu Zhu
Sensors 2024, 24(17), 5743; https://doi.org/10.3390/s24175743 - 4 Sep 2024
Cited by 2 | Viewed by 2630
Abstract
Simultaneous Localization and Mapping (SLAM) enables mobile robots to autonomously perform localization and mapping tasks in unknown environments. Despite the significant progress achieved by visual SLAM systems in ideal conditions, relying solely on a single robot and point features for mapping in large-scale indoor environments with weak-texture structures can limit mapping efficiency and accuracy. Therefore, this paper proposes a multi-robot collaborative mapping method based on point-line fusion, designed for localization and mapping in indoor environments with weak-texture structures. The feature-extraction algorithm, which combines point and line features, supplements the existing point feature-extraction method by introducing a line feature-extraction step. This integration ensures the accuracy of visual odometry estimation in scenes with pronounced weak-texture features. For relatively large indoor scenes, a scene-recognition-based map-fusion method is proposed to enhance mapping efficiency. This method relies on a visual bag of words to determine overlapping areas in the scene, and a photogrammetry-based keyframe-extraction method is proposed to improve the algorithm’s robustness. By combining the Perspective-3-Point (P3P) and Bundle Adjustment (BA) algorithms, the relative pose transformations of the multi-robot system in overlapping scenes are resolved, and map fusion is performed based on these relative poses. We evaluated our algorithm on public datasets and a mobile robot platform. The experimental results demonstrate that the proposed algorithm exhibits higher robustness and mapping accuracy, and it is particularly effective for mapping in scenarios with weak texture and structure, as well as for small-scale map fusion. Full article
(This article belongs to the Section Navigation and Positioning)
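
A minimal sketch of the point-plus-line extraction step, assuming OpenCV's LSD detector is available in the build (it is absent from some 4.x releases; the contrib FastLineDetector is an alternative):

```python
import cv2

def extract_point_line_features(gray):
    """Extract ORB point features plus LSD line segments for weak-texture scenes.

    LSD availability varies by OpenCV build; cv2.ximgproc.createFastLineDetector
    (opencv-contrib) is a drop-in alternative where LSD is missing.
    """
    orb = cv2.ORB_create(1500)
    kps, des = orb.detectAndCompute(gray, None)
    lsd = cv2.createLineSegmentDetector()
    lines = lsd.detect(gray)[0]        # Nx1x4 arrays of (x1, y1, x2, y2)
    return kps, des, lines
```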

24 pages, 1413 KB  
Article
Loop Detection Method Based on Neural Radiance Field BoW Model for Visual Inertial Navigation of UAVs
by Xiaoyue Zhang, Yue Cui, Yanchao Ren, Guodong Duan and Huanrui Zhang
Remote Sens. 2024, 16(16), 3038; https://doi.org/10.3390/rs16163038 - 19 Aug 2024
Viewed by 1455
Abstract
The loop closure detection (LCD) methods in Unmanned Aerial Vehicle (UAV) Visual Inertial Navigation Systems (VINS) are often affected by insufficient image texture information and limited observational perspectives, resulting in constrained UAV positioning accuracy and a reduced capability to perform complex tasks. This study proposes a Bag-of-Words (BoW) LCD method based on Neural Radiance Fields (NeRF), which estimates camera poses from existing images and achieves rapid scene reconstruction through NeRF. A method is designed to select virtual viewpoints and render images along the flight trajectory using a specific sampling approach, expanding the limited observational angles, mitigating the impact of image blur and insufficient texture information at specific viewpoints, and enlarging the set of loop closure candidate frames to improve the accuracy and success rate of LCD. Additionally, a BoW vector construction method that incorporates the importance of similar visual words, together with an adapted virtual-image filtering and comprehensive scoring method, is designed to determine loop closures. Applied to VINS-Mono and ORB-SLAM3 and compared with the advanced BoW-model LCDs of the two systems, the NeRF-based BoW LCD method detects more than 48% additional accurate loop closures, while the mean navigation positioning error of the system is reduced by over 46%, validating the effectiveness and superiority of the proposed method and demonstrating its significance for improving the navigation accuracy of VINS. Full article
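
A standard TF-IDF-weighted BoW scoring step, which the paper's importance-weighted construction refines, can be sketched as follows; inputs are assumed precomputed word counts and document frequencies.

```python
import numpy as np

def tfidf_bow(word_counts, doc_freq, n_images):
    """TF-IDF weighted BoW vector; rarer visual words carry more weight."""
    tf = word_counts / (word_counts.sum() + 1e-9)
    idf = np.log(n_images / (1.0 + doc_freq))
    v = tf * idf
    return v / (np.linalg.norm(v) + 1e-9)

def loop_scores(query_vec, candidate_vecs):
    """Cosine similarity against all (real and rendered) candidate keyframes."""
    return candidate_vecs @ query_vec
```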

16 pages, 7155 KB  
Article
Overlapping Image-Set Determination Method Based on Hybrid BoVW-NoM Approach for UAV Image Localization
by Juyeon Lee and Kanghyeok Choi
Appl. Sci. 2024, 14(13), 5839; https://doi.org/10.3390/app14135839 - 4 Jul 2024
Cited by 2 | Viewed by 1413
Abstract
With the increasing use of unmanned aerial vehicles (UAVs) in various fields, achieving the precise localization of UAV images is crucial for enhancing their utility. Photogrammetry-based techniques, particularly bundle adjustment, serve as foundational methods for accurately determining the spatial coordinates of UAV images. The effectiveness of bundle adjustment is significantly influenced by the selection of input data, particularly the composition of overlapping image sets, which affects both the accuracy of spatial coordinate determination and the computational efficiency of UAV image localization. A strategic approach to this selection is therefore crucial for optimizing the performance of bundle adjustment in UAV image processing. In this context, we propose an efficient methodology for determining overlapping image sets. The proposed method selects overlapping images based on image similarity, leveraging the complementary strengths of the bag-of-visual-words (BoVW) and number-of-matches (NoM) techniques: a BoVW model is used for fast candidate selection, and the number of matches provides an additional similarity assessment for the final overlapping image-set determination, yielding both high accuracy and high speed. We compared the performance of the proposed methodology with conventional NoM-based and BoVW-based methods for overlapping image-set determination. In the comparative evaluation, the proposed method demonstrated an average precision of 96%, comparable to that of the NoM-based approach, while surpassing the 62% precision achieved by the BoVW-based methods. Moreover, its processing time was approximately 0.11 times that of the NoM-based methods, demonstrating relatively high efficiency. Furthermore, in the bundle adjustment results using the determined image sets, the proposed method, along with the NoM-based methods, showed reprojection errors of less than 1 pixel, indicating relatively high accuracy and contributing to improved image position estimation. Full article
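
The two-stage shortlist-then-verify logic can be sketched as below, assuming L2-normalized BoVW histograms and ORB descriptors for the query and reference images; `top_k` and `min_matches` are invented placeholders.

```python
import cv2
import numpy as np

bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

def overlapping_set(q_hist, q_des, ref_hists, ref_des, top_k=10, min_matches=60):
    """Two-stage overlap test: BoVW shortlist, then number-of-matches check."""
    sims = ref_hists @ q_hist                     # cheap cosine-similarity shortlist
    candidates = np.argsort(sims)[::-1][:top_k]
    overlaps = []
    for i in candidates:
        n = len(bf.match(q_des, ref_des[i]))      # exact match count (slow part)
        if n >= min_matches:
            overlaps.append(int(i))
    return overlaps
```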

21 pages, 36012 KB  
Article
DFD-SLAM: Visual SLAM with Deep Features in Dynamic Environment
by Wei Qian, Jiansheng Peng and Hongyu Zhang
Appl. Sci. 2024, 14(11), 4949; https://doi.org/10.3390/app14114949 - 6 Jun 2024
Cited by 7 | Viewed by 3598
Abstract
Visual SLAM technology is one of the important technologies for mobile robots. Existing feature-based visual SLAM techniques suffer from tracking and loop closure performance degradation in complex environments. We propose the DFD-SLAM system to ensure outstanding accuracy and robustness across diverse environments. Initially, building on the ORB-SLAM3 system, we replace the original feature extraction component with the HFNet network and introduce a frame rotation estimation method. This method determines the rotation angles between consecutive frames to select superior local descriptors. Furthermore, we utilize CNN-extracted global descriptors to replace the bag-of-words approach. Subsequently, we develop a precise removal strategy, combining semantic information from YOLOv8 to accurately eliminate dynamic feature points. In the TUM-VI dataset, DFD-SLAM shows an improvement over ORB-SLAM3 of 29.24% in the corridor sequences, 40.07% in the magistrale sequences, 28.75% in the room sequences, and 35.26% in the slides sequences. In the TUM-RGBD dataset, DFD-SLAM demonstrates a 91.57% improvement over ORB-SLAM3 in highly dynamic scenarios. This demonstrates the effectiveness of our approach. Full article
(This article belongs to the Special Issue Intelligent Control and Robotics II)
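
Replacing BoW lookup with CNN global descriptors reduces loop-candidate retrieval to a cosine-similarity nearest-neighbor search; a minimal sketch, with `min_sim` and `top_k` as illustrative thresholds:

```python
import numpy as np

def retrieve_loop_candidates(query_desc, keyframe_descs, top_k=5, min_sim=0.7):
    """Loop-candidate retrieval with L2-normalized CNN global descriptors.

    keyframe_descs: (N, D) matrix of per-keyframe global descriptors
    (e.g., from a network such as HFNet); returns (index, similarity) pairs.
    """
    q = query_desc / (np.linalg.norm(query_desc) + 1e-9)
    K = keyframe_descs / (np.linalg.norm(keyframe_descs, axis=1, keepdims=True) + 1e-9)
    sims = K @ q
    order = np.argsort(sims)[::-1][:top_k]
    return [(int(i), float(sims[i])) for i in order if sims[i] >= min_sim]
```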

13 pages, 73945 KB  
Article
Route Positioning System for Campus Shuttle Bus Service Using a Single Camera
by Jhonghyun An
Electronics 2024, 13(11), 2004; https://doi.org/10.3390/electronics13112004 - 21 May 2024
Cited by 2 | Viewed by 1556
Abstract
A route positioning system is a technology that identifies the current route while driving from one stop to the next, commonly found in public transportation systems such as shuttle buses that follow fixed routes. It is especially useful for smaller-scale services, such as shuttle buses, where expensive location-tracking technology and sensors might not be feasible. Particularly in urban areas with tall buildings or in mountainous regions with dense tree cover, relying solely on GPS can lead to large errors. Therefore, this paper proposes a cost-effective solution that uses a single camera sensor to accurately determine the location of small-scale transportation services on fixed routes. To this end, a single-stage detection network quickly identifies objects, which are then tracked with a simple algorithm. The detected features are compiled into a “codebook” using the bag-of-visual-words technique. During actual trips, this pre-built codebook is compared with the landmarks that the camera sees, and this comparison determines the route currently being traveled. To test the effectiveness of this approach, the route of a shuttle bus on the Gachon University campus was used, which resembles both a downtown area with tall buildings and a wooded mountainous area. The results showed that the shuttle bus’s route was recognized with an accuracy of 0.60. Areas with distinct features were recognized with an accuracy of 0.99, while stops with simple, nondescript structures were recognized with an accuracy of 0.29. Additionally, applying the SORT algorithm to enhance performance slightly improved the accuracy from 0.60 to 0.61. This demonstrates that the proposed method can effectively perform location recognition using only cameras in small shuttle buses. Full article
(This article belongs to the Special Issue Computer Vision Applications for Autonomous Vehicles)
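
Route identification against a pre-built codebook can be sketched as a histogram-matching step; here `segment_hists` is an assumed offline table of per-segment visual-word histograms, not the paper's exact data structure.

```python
import numpy as np

def identify_route_segment(frame_words, segment_hists, n_words):
    """Match the landmarks seen so far against per-segment visual-word histograms.

    frame_words: visual-word ids detected since the last stop;
    segment_hists: (S, n_words) L2-normalized histograms built offline per segment.
    """
    h = np.bincount(frame_words, minlength=n_words).astype(np.float32)
    h /= np.linalg.norm(h) + 1e-9
    return int(np.argmax(segment_hists @ h))      # most similar route segment
```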

24 pages, 1193 KB  
Article
A Systematic Evaluation of Feature Encoding Techniques for Gait Analysis Using Multimodal Sensory Data
by Rimsha Fatima, Muhammad Hassan Khan, Muhammad Adeel Nisar, Rafał Doniec, Muhammad Shahid Farid and Marcin Grzegorzek
Sensors 2024, 24(1), 75; https://doi.org/10.3390/s24010075 - 22 Dec 2023
Cited by 11 | Viewed by 2668
Abstract
This paper addresses the problem of feature encoding for gait analysis using multimodal time series sensory data. In recent years, the dramatic increase in the use of numerous sensors, e.g., inertial measurement units (IMUs), in our daily wearable devices has spurred the research community to collect kinematic and kinetic data for gait analysis. The most crucial step in gait analysis is finding an appropriate set of features from continuous time series data to accurately represent human locomotion. This paper presents a systematic assessment of numerous feature extraction techniques. In particular, three different feature encoding techniques are presented to encode multimodal time series sensory data. In the first technique, we utilize eighteen different handcrafted features that are extracted directly from the raw sensory data. The second technique follows the Bag-of-Visual-Words model: the raw sensory data are encoded using a pre-computed codebook and a locality-constrained linear coding (LLC)-based feature encoding technique. We evaluated two different machine learning algorithms to assess the effectiveness of the proposed features in encoding the raw sensory data. In the third feature encoding technique, we propose two end-to-end deep learning models to automatically extract the features from raw sensory data. A thorough experimental evaluation is conducted on four large sensory datasets and their outcomes are compared. A comparison of the recognition results with current state-of-the-art methods demonstrates the computational efficiency and high efficacy of the proposed feature encoding method. The robustness of the proposed feature encoding technique is also evaluated on the recognition of human daily activities. Additionally, this paper presents a new dataset consisting of the gait patterns of 42 individuals, gathered using IMU sensors. Full article
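
Locality-constrained linear coding has a short closed-form approximation (Wang et al., CVPR 2010) that the sketch below implements for a single descriptor; `k` and `beta` are illustrative values.

```python
import numpy as np

def llc_encode(x, codebook, k=5, beta=1e-4):
    """Locality-constrained linear coding for one descriptor.

    x: (d,) descriptor; codebook: (M, d). Returns a sparse (M,) code whose
    nonzeros lie on the k nearest codewords and sum to one.
    """
    d2 = np.sum((codebook - x) ** 2, axis=1)
    nn = np.argsort(d2)[:k]                       # k nearest codewords
    B = codebook[nn] - x                          # shift to the local frame
    C = B @ B.T                                   # local covariance
    C += beta * np.trace(C) * np.eye(k)           # regularize for stability
    w = np.linalg.solve(C, np.ones(k))
    w /= w.sum()                                  # enforce sum-to-one constraint
    code = np.zeros(len(codebook))
    code[nn] = w
    return code
```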

11 pages, 2425 KB  
Article
RETRACTED: Research on Texture Feature Recognition of Regional Architecture Based on Visual Saliency Model
by Jing Liu, Yuxuan Song, Lingxiang Guo and Mengting Hu
Electronics 2023, 12(22), 4581; https://doi.org/10.3390/electronics12224581 - 9 Nov 2023
Cited by 1 | Viewed by 1405 | Retraction
Abstract
Architecture is representative of a city and is a spatial carrier of urban culture. Identifying the architectural features of a city can assist urban transformation and promote urban development. Using visual saliency models in regional architectural texture recognition can effectively enhance its performance. In this paper, the improved visual saliency model first enhances the texture images of regional buildings through histogram enhancement, and then applies a visual saliency algorithm to extract the salient texture features of regional buildings. Next, the visual saliency image is segmented using the maximum between-class variance (Otsu) thresholding method to achieve accurate target recognition. Finally, feature-factor iteration in the Bag of Visual Words model and support vector machine classification were used to complete the recognition of regional architectural texture features. Experimental verification shows that the constructed recognition method based on the visual saliency model can effectively enhance the recognition image. The method performs well in boundary contour separation and visual saliency, with an average recognition rate of 0.814 for texture features across different building scenes, indicating high stability. Full article
(This article belongs to the Section Computer Science & Engineering)
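
The maximum between-class variance segmentation step corresponds to Otsu thresholding of the saliency map; a minimal OpenCV sketch of that step alone:

```python
import cv2

def segment_salient_regions(saliency):
    """Binarize a saliency map with Otsu's maximum between-class variance method."""
    s = cv2.normalize(saliency, None, 0, 255, cv2.NORM_MINMAX).astype("uint8")
    _, mask = cv2.threshold(s, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return mask
```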

19 pages, 3219 KB  
Article
A New Method for Classifying Scenes for Simultaneous Localization and Mapping Using the Boundary Object Function Descriptor on RGB-D Points
by Victor Lomas-Barrie, Mario Suarez-Espinoza, Gerardo Hernandez-Chavez and Antonio Neme
Sensors 2023, 23(21), 8836; https://doi.org/10.3390/s23218836 - 30 Oct 2023
Cited by 1 | Viewed by 1661
Abstract
Scene classification in autonomous navigation is a highly complex task due to variations, such as lighting conditions and dynamic objects, in the inspected scenes; it is also a challenge for small form-factor computers to run modern, highly demanding algorithms. In this contribution, we introduce a novel method for classifying scenes in simultaneous localization and mapping (SLAM) using the boundary object function (BOF) descriptor on RGB-D points. Our method aims to reduce complexity at almost no performance cost. All the BOF-based descriptors from each object in a scene are combined to define the scene class. Instead of traditional image classification methods such as ORB or SIFT, we use the BOF descriptor to classify scenes. Through an RGB-D camera, we capture points and project them onto layers that are perpendicular to the camera plane. From each plane, we extract the boundaries of objects such as furniture, ceilings, walls, or doors. The extracted features compose a bag of visual words that is classified by a support vector machine. The proposed method achieves almost the same accuracy in scene classification as a SIFT-based algorithm while being 2.38× faster. The experimental results demonstrate the effectiveness of the proposed method in terms of accuracy and robustness on the 7-Scenes and SUN RGB-D datasets. Full article
(This article belongs to the Section Sensing and Imaging)
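
The BOF descriptor itself reduces to a centroid-to-boundary distance signature; a hedged sketch on a binary object mask, with the sample count as an illustrative choice:

```python
import cv2
import numpy as np

def bof_descriptor(mask, n_samples=180):
    """Boundary object function: centroid-to-contour distance signature.

    Samples the object's outer contour, measures the distance from the
    centroid to each boundary point, and normalizes by the maximum so the
    signature is scale-invariant.
    """
    cnts, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    c = max(cnts, key=cv2.contourArea).squeeze(1)       # (N, 2) boundary points
    m = cv2.moments(c)
    cx, cy = m["m10"] / m["m00"], m["m01"] / m["m00"]
    idx = np.linspace(0, len(c) - 1, n_samples).astype(int)
    d = np.hypot(c[idx, 0] - cx, c[idx, 1] - cy)
    return d / d.max()                                   # scale-normalized BOF
```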
