Recent Advances in Algorithms for Computer Vision Applications

A special issue of Algorithms (ISSN 1999-4893). This special issue belongs to the section "Combinatorial Optimization, Graph, and Network Algorithms".

Deadline for manuscript submissions: closed (30 June 2024) | Viewed by 20914

Special Issue Editor

Department of Computer Information Systems, State University of New York at Buffalo State, Buffalo, NY 14222, USA
Interests: computer vision; image processing; pattern recognition; machine learning

Special Issue Information

Dear Colleagues,

Multi-source visual information fusion and quality improvement can enhance the ability to perceive the real world. Image fusion focuses on fusing multi-source images from multiple sensors into a synthesized image that provides a more comprehensive and reliable description of the scene. Quality improvement techniques can be used to address the challenges of low-quality image analysis.

Many brain-inspired solutions have been proposed to accomplish these two tasks, and artificial neural networks, one of the most popular techniques, have been widely used in image fusion and quality improvement. As this is an active research field, many interesting issues remain to be explored, such as deep few-shot learning, unsupervised learning, the application of embodied neural systems, and industrial applications.

Potential topics of interest for this Special Issue include (but are not limited to) the following areas:

  • Image acquisition;
  • Image quality analysis;
  • Image filtering, restoration and enhancement;
  • Image segmentation;
  • Biomedical image processing;
  • Color image processing;
  • Multispectral image processing;
  • Hardware and architectures for image processing;
  • Image databases;
  • Image retrieval and indexing;
  • Image compression;
  • Low-level and high-level image description;
  • Mathematical methods in image processing, analysis and representation;
  • Artificial intelligence tools in image analysis;
  • Pattern recognition algorithms applied for images;
  • Practical applications of image processing, analysis and recognition algorithms in medicine, surveillance, biometrics, document analysis, multimedia, intelligent transportation systems, stereo vision, remote sensing, computer vision, robotics and other fields.

Dr. Guanqiu Qi
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website, then using the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Algorithms is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (13 papers)


Research

16 pages, 8528 KiB  
Article
Augmented Dataset for Vision-Based Analysis of Railroad Ballast via Multi-Dimensional Data Synthesis
by Kelin Ding, Jiayi Luo, Haohang Huang, John M. Hart, Issam I. A. Qamhia and Erol Tutumluer
Algorithms 2024, 17(8), 367; https://doi.org/10.3390/a17080367 - 21 Aug 2024
Viewed by 356
Abstract
Ballast serves a vital structural function in supporting railroad tracks under continuous loading. The degradation of ballast can result in issues such as inadequate drainage, lateral instability, excessive settlement, and potential service disruptions, necessitating efficient evaluation methods to ensure safe and reliable railroad operations. The incorporation of computer vision techniques into ballast inspection processes has proven effective in enhancing accuracy and robustness. Given the data-driven nature of deep learning approaches, the efficacy of these models is intrinsically linked to the quality of the training datasets, thereby emphasizing the need for a comprehensive and meticulously annotated ballast aggregate dataset. This paper presents the development of a multi-dimensional ballast aggregate dataset, constructed using empirical data collected from field and laboratory environments, supplemented with synthetic data generated by a proprietary ballast particle generator. The dataset comprises both two-dimensional (2D) data, consisting of ballast images annotated with 2D masks for particle localization, and three-dimensional (3D) data, including heightmaps, point clouds, and 3D annotations for particle localization. The data collection process encompassed various environmental lighting conditions and degradation states, ensuring extensive coverage and diversity within the training dataset. A previously developed 2D ballast particle segmentation model was trained on this augmented dataset, demonstrating high accuracy in field ballast inspections. This comprehensive database will be utilized in subsequent research to advance 3D ballast particle segmentation and shape completion, thereby facilitating enhanced inspection protocols and the development of effective ballast maintenance methodologies.
(This article belongs to the Special Issue Recent Advances in Algorithms for Computer Vision Applications)

15 pages, 7315 KiB  
Article
Computer Vision Algorithms on a Raspberry Pi 4 for Automated Depalletizing
by Danilo Greco, Majid Fasihiany, Ali Varasteh Ranjbar, Francesco Masulli, Stefano Rovetta and Alberto Cabri
Algorithms 2024, 17(8), 363; https://doi.org/10.3390/a17080363 - 18 Aug 2024
Viewed by 535
Abstract
The primary objective of a depalletizing system is to automate the process of detecting and locating specific variable-shaped objects on a pallet, allowing a robotic system to accurately unstack them. Although many solutions exist for the problem in industrial and manufacturing settings, the application to small-scale scenarios such as retail vending machines and small warehouses has not received much attention so far. This paper presents a comparative analysis of four different computer vision algorithms for the depalletizing task, implemented on a Raspberry Pi 4, a very popular single-board computer with low computational power suitable for the IoT and edge computing. The algorithms evaluated include the following: pattern matching, the scale-invariant feature transform (SIFT), Oriented FAST and Rotated BRIEF (ORB), and the Haar cascade classifier. Each technique is described and its implementation is outlined. Their evaluation is performed on the task of box detection and localization in the test images to assess their suitability in a depalletizing system. The performance of the algorithms is given in terms of accuracy, robustness to variability, computational speed, detection sensitivity, and resource consumption. The results reveal the strengths and limitations of each algorithm, providing valuable insights for selecting the most appropriate technique based on the specific requirements of a depalletizing system.
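The pattern-matching baseline among these four algorithms can be illustrated with a minimal zero-normalized cross-correlation (ZNCC) template search. This is a generic sketch under the assumption of greyscale inputs, not the authors' implementation; all names are illustrative:

```python
import numpy as np

def match_template(image, template):
    """Slide `template` over `image` and return the top-left corner of the
    best match under zero-normalized cross-correlation (ZNCC)."""
    ih, iw = image.shape
    th, tw = template.shape
    t = template - template.mean()
    t_norm = np.sqrt((t ** 2).sum())
    best_score, best_pos = -np.inf, (0, 0)
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            patch = image[y:y + th, x:x + tw]
            p = patch - patch.mean()
            denom = np.sqrt((p ** 2).sum()) * t_norm
            # ZNCC is invariant to brightness offset and contrast scaling
            score = (p * t).sum() / denom if denom > 0 else 0.0
            if score > best_score:
                best_score, best_pos = score, (y, x)
    return best_pos, best_score
```

A production version would use an FFT-based or integral-image formulation rather than this quadruple loop, which is only meant to make the scoring explicit.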

20 pages, 29617 KiB  
Article
Real-Time Tracking and Detection of Cervical Cancer Precursor Cells: Leveraging SIFT Descriptors in Mobile Video Sequences for Enhanced Early Diagnosis
by Jesus Eduardo Alcaraz-Chavez, Adriana del Carmen Téllez-Anguiano, Juan Carlos Olivares-Rojas and Ricardo Martínez-Parrales
Algorithms 2024, 17(7), 309; https://doi.org/10.3390/a17070309 - 12 Jul 2024
Viewed by 536
Abstract
Cervical cancer ranks among the leading causes of mortality in women worldwide, underscoring the critical need for early detection to ensure patient survival. While the Pap smear test is widely used, its effectiveness is hampered by the inherent subjectivity of cytological analysis, impacting its sensitivity and specificity. This study introduces an innovative methodology for detecting and tracking precursor cervical cancer cells using SIFT descriptors in video sequences captured with mobile devices. More than one hundred digital images were analyzed from Papanicolaou smears provided by the State Public Health Laboratory of Michoacán, Mexico, along with over 1800 unique examples of cervical cancer precursor cells. SIFT descriptors enabled real-time correspondence of precursor cells, yielding results demonstrating 98.34% accuracy, 98.3% precision, 98.2% recovery rate, and an F-measure of 98.05%. These methods were meticulously optimized for real-time analysis, showcasing significant potential to enhance the accuracy and efficiency of the Pap smear test in early cervical cancer detection.
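The descriptor-correspondence step underlying this kind of SIFT-based tracking is typically a nearest-neighbour search filtered by Lowe's ratio test. A minimal sketch over precomputed descriptor arrays (illustrative only; the paper's actual matching pipeline is not reproduced here):

```python
import numpy as np

def ratio_test_matches(desc_a, desc_b, ratio=0.75):
    """Match each descriptor in desc_a to its nearest neighbour in desc_b,
    keeping only matches whose nearest/second-nearest distance ratio is
    below `ratio` (Lowe's criterion). desc_b needs at least two rows.
    Returns (index_a, index_b) pairs."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        order = np.argsort(dists)
        nearest, second = dists[order[0]], dists[order[1]]
        # ambiguous matches (nearest almost as far as second-nearest) are dropped
        if second == 0 or nearest / second < ratio:
            matches.append((i, int(order[0])))
    return matches
```

For real-time use on mobile video one would replace the brute-force distance computation with an approximate nearest-neighbour index, but the acceptance criterion stays the same.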

12 pages, 2334 KiB  
Article
CentralBark Image Dataset and Tree Species Classification Using Deep Learning
by Charles Warner, Fanyou Wu, Rado Gazo, Bedrich Benes, Nicole Kong and Songlin Fei
Algorithms 2024, 17(5), 179; https://doi.org/10.3390/a17050179 - 27 Apr 2024
Viewed by 1525
Abstract
The task of tree species classification through deep learning has been challenging for the forestry community, and the lack of standardized datasets has hindered further progress. Our work presents a solution in the form of a large bark image dataset called CentralBark, which enhances deep learning-based tree species classification. Additionally, we have laid out an efficient and repeatable data collection protocol to assist future works in an organized manner. The dataset contains images of 25 central hardwood and Appalachian region tree species, with over 19,000 images of varying diameters, light, and moisture conditions. We tested 25 species: American basswood, American beech, American elm, American sycamore, bitternut hickory, black cherry, black locust, black oak, black walnut, eastern cottonwood, hackberry, honey locust, northern red oak, Ohio buckeye, Osage-orange, pignut hickory, sassafras, shagbark hickory, silver maple, slippery elm, sugar maple, sweetgum, white ash, white oak, and yellow poplar. Our experiment involved testing three different models to assess the feasibility of species classification using unaltered and uncropped images during the species-classification training process. We achieved an overall accuracy of 83.21% using the EfficientNet-b3 model, which was the best of the three models (EfficientNet-b3, ResNet-50, and MobileNet-V3-small), and an average accuracy of 80.23%.

17 pages, 1872 KiB  
Article
A Multi-Stage Method for Logo Detection in Scanned Official Documents Based on Image Processing
by María Guijarro, Juan Bayon, Daniel Martín-Carabias and Joaquín Recas
Algorithms 2024, 17(4), 170; https://doi.org/10.3390/a17040170 - 22 Apr 2024
Viewed by 1044
Abstract
A logotype is a rectangular region defined by a set of characteristics, which come from the pixel information and region shape, that differ from those of the text. In this paper, a new method for automatic logo detection is proposed and tested using the public Tobacco800 database. Our method outputs a set of regions from an official document with a high probability of containing a logo, using a new approach based on a variation of the feature rectangles method available in the literature. Candidate regions were computed using the longest increasing run algorithm over the document blank lines' indices. Those regions were further refined by using a feature-rectangle-expansion method with forward checking, where the rectangle expansion can occur in parallel in each region. Finally, a C4.5 decision tree was trained and tested against a set of 1291 official documents to evaluate its performance. The strategic combination of the three previous steps offers a precision and recall for logo detection of 98.9% and 89.9%, respectively, and is also resistant to noise and low-quality documents. The method is also able to reduce the processing area of the document while maintaining a low percentage of false negatives.
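One plausible reading of the "longest increasing run over the blank lines' indices" step — finding the tallest band of consecutive blank rows, which separates candidate regions — can be sketched as follows. This is a generic interpretation for illustration, not the authors' code:

```python
def longest_consecutive_run(indices):
    """Return (start, length) of the longest run of consecutive integers in
    `indices` (e.g. blank-line row numbers sorted top to bottom), i.e. the
    tallest blank band in the document."""
    best_start, best_len = None, 0
    run_start, run_len = None, 0
    prev = None
    for idx in indices:
        if prev is not None and idx == prev + 1:
            run_len += 1          # extend the current run of consecutive rows
        else:
            run_start, run_len = idx, 1  # start a new run
        if run_len > best_len:
            best_start, best_len = run_start, run_len
        prev = idx
    return best_start, best_len
```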

13 pages, 4268 KiB  
Article
Effect of the Light Environment on Image-Based SPAD Value Prediction of Radish Leaves
by Yuto Kamiwaki and Shinji Fukuda
Algorithms 2024, 17(1), 16; https://doi.org/10.3390/a17010016 - 29 Dec 2023
Viewed by 1556
Abstract
This study aims to clarify the influence of photographic environments under different light sources on image-based SPAD value prediction. The input variables for the SPAD value prediction using Random Forests, XGBoost, and LightGBM were RGB values, HSL values, HSV values, light color temperature (LCT), and illuminance (ILL). Model performance was assessed using Pearson's correlation coefficient (COR), Nash–Sutcliffe efficiency (NSE), and root mean squared error (RMSE). In particular, SPAD value prediction with Random Forests resulted in high accuracy in a stable light environment; CORRGB+ILL+LCT and CORHSL+ILL+LCT were 0.929 and 0.922, respectively. Image-based SPAD value prediction was effective under halogen light with a similar color temperature at dusk; CORRGB+ILL and CORHSL+ILL were 0.895 and 0.876, respectively. The HSL value under LED could be used to predict the SPAD value with high accuracy in all performance measures. The results supported the applicability of SPAD value prediction using Random Forests under a wide range of lighting conditions, such as dusk, by training a model based on data collected under different illuminance conditions in various light sources. Further studies are required to examine this method under outdoor conditions in spatiotemporally dynamic light environments.
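The NSE and RMSE performance measures named above have standard definitions and can be computed directly from observed and predicted values; a generic sketch (not the study's code):

```python
import numpy as np

def rmse(obs, sim):
    """Root mean squared error between observed and simulated values."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(np.sqrt(np.mean((obs - sim) ** 2)))

def nse(obs, sim):
    """Nash–Sutcliffe efficiency: 1 is a perfect fit; 0 means the model is
    no better than always predicting the observed mean."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(1.0 - np.sum((obs - sim) ** 2)
                 / np.sum((obs - obs.mean()) ** 2))
```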

15 pages, 8312 KiB  
Article
A Lightweight Graph Neural Network Algorithm for Action Recognition Based on Self-Distillation
by Miao Feng and Jean Meunier
Algorithms 2023, 16(12), 552; https://doi.org/10.3390/a16120552 - 1 Dec 2023
Cited by 1 | Viewed by 1718
Abstract
Recognizing human actions can help in numerous ways, such as health monitoring, intelligent surveillance, virtual reality and human–computer interaction. A quick and accurate detection algorithm is required for daily real-time detection. This paper first proposes to generate a lightweight graph neural network by self-distillation for human action recognition tasks. The lightweight graph neural network was evaluated on the NTU-RGB+D dataset. The results demonstrate that, with competitive accuracy, the heavyweight graph neural network can be compressed by up to 80%. Furthermore, the learned representations have denser clusters, estimated by the Davies–Bouldin index, the Dunn index and silhouette coefficients. The ideal input data and algorithm capacity are also discussed.
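The Davies–Bouldin index used to assess cluster density can be computed directly from points and their cluster labels. A compact sketch of the standard definition (illustrative, not the paper's code):

```python
import numpy as np

def davies_bouldin(points, labels):
    """Davies–Bouldin index: lower values mean denser, better-separated
    clusters. `points` is (n, d); `labels` assigns each point a cluster id."""
    points, labels = np.asarray(points, float), np.asarray(labels)
    ids = np.unique(labels)
    centroids = np.array([points[labels == k].mean(axis=0) for k in ids])
    # within-cluster scatter: mean distance of members to their centroid
    scatter = np.array([np.mean(np.linalg.norm(points[labels == k] - c, axis=1))
                        for k, c in zip(ids, centroids)])
    worst = []
    for i in range(len(ids)):
        # for each cluster, take its worst similarity ratio to any other cluster
        ratios = [(scatter[i] + scatter[j])
                  / np.linalg.norm(centroids[i] - centroids[j])
                  for j in range(len(ids)) if j != i]
        worst.append(max(ratios))
    return float(np.mean(worst))
```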

13 pages, 8724 KiB  
Article
Cloud Detection and Tracking Based on Object Detection with Convolutional Neural Networks
by Jose Antonio Carballo, Javier Bonilla, Jesús Fernández-Reche, Bijan Nouri, Antonio Avila-Marin, Yann Fabel and Diego-César Alarcón-Padilla
Algorithms 2023, 16(10), 487; https://doi.org/10.3390/a16100487 - 19 Oct 2023
Cited by 3 | Viewed by 1931
Abstract
Because solar renewable technologies need to know the availability of solar resources in advance, this paper presents a new methodology based on computer vision and an object detection technique that uses convolutional neural networks (the EfficientDet-D2 model) to detect clouds in image series. This methodology also calculates the speed and direction of cloud motion, which allows the prediction of transients in the available solar radiation due to clouds. The convolutional neural network model retraining and validation process finished successfully, giving accurate cloud detection results in the test. Also, during the test, the estimation of the remaining time until a transient due to a cloud was accurate, mainly owing to the precise cloud detection and the accuracy of the remaining-time algorithm.
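Once clouds are detected frame by frame, speed and direction can be derived from the displacement of detection centroids between frames. A minimal sketch assuming a known image scale (the `metres_per_pixel` factor is a placeholder, not a value from the paper):

```python
import math

def cloud_motion(c0, c1, dt, metres_per_pixel=1.0):
    """Estimate speed (m/s) and direction (degrees, 0 = +x axis,
    counter-clockwise) from two detected cloud centroids `c0`, `c1`
    taken `dt` seconds apart."""
    dx = (c1[0] - c0[0]) * metres_per_pixel
    dy = (c1[1] - c0[1]) * metres_per_pixel
    speed = math.hypot(dx, dy) / dt
    direction = math.degrees(math.atan2(dy, dx)) % 360.0
    return speed, direction
```

Given the speed vector and the distance from the cloud's leading edge to the solar-field line of sight, the remaining time until a shading transient is simply distance over speed.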

15 pages, 11652 KiB  
Article
Ascertaining the Ideality of Photometric Stereo Datasets under Unknown Lighting
by Elisa Crabu, Federica Pes, Giuseppe Rodriguez and Giuseppa Tanda
Algorithms 2023, 16(8), 375; https://doi.org/10.3390/a16080375 - 5 Aug 2023
Cited by 1 | Viewed by 1224
Abstract
The standard photometric stereo model makes several assumptions that are rarely verified in experimental datasets. In particular, the observed object should behave as a Lambertian reflector, and the light sources should be positioned at an infinite distance from it, along a known direction. Even when Lambert's law is approximately fulfilled, an accurate assessment of the relative position between the light source and the target is often unavailable in real situations. The Hayakawa procedure is a computational method for estimating such information directly from data images. It occasionally breaks down when some of the available images excessively deviate from ideality. This is generally due to observing a non-Lambertian surface, or illuminating it from a close distance, or both. Indeed, in narrow shooting scenarios, typical, e.g., of archaeological excavation sites, it is impossible to position a flashlight at a sufficient distance from the observed surface. It is then necessary to understand if a given dataset is reliable and which images should be selected to better reconstruct the target. In this paper, we propose some algorithms to perform this task and explore their effectiveness.
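Under the Lambertian assumption the model is I_k = albedo · (l_k · n), so with at least three known light directions the normal at a pixel follows from least squares. A per-pixel sketch of this textbook formulation (for context only; it is not the Hayakawa procedure, which estimates the lighting itself):

```python
import numpy as np

def lambertian_normal(light_dirs, intensities):
    """Recover the surface normal and albedo at one pixel by least squares,
    assuming a Lambertian reflector: I_k = albedo * dot(l_k, n).
    `light_dirs` is an (m, 3) array of unit light directions, m >= 3."""
    # solve L g = I for g = albedo * n, then split magnitude and direction
    g, *_ = np.linalg.lstsq(np.asarray(light_dirs, float),
                            np.asarray(intensities, float), rcond=None)
    albedo = np.linalg.norm(g)
    return g / albedo, albedo
```

When the lighting is unknown, procedures like Hayakawa's instead factorize the whole intensity matrix (all pixels under all lights) to estimate directions and normals jointly, which is exactly where far-from-ideal images cause breakdowns.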

12 pages, 2788 KiB  
Article
Vessel Velocity Estimation and Docking Analysis: A Computer Vision Approach
by João V. R. de Andrade, Bruno J. T. Fernandes, André R. L. C. Izídio, Nilson M. da Silva Filho and Francisco Cruz
Algorithms 2023, 16(7), 326; https://doi.org/10.3390/a16070326 - 30 Jun 2023
Cited by 1 | Viewed by 1652
Abstract
The opportunities for leveraging technology to enhance the efficiency of vessel port activities are vast. Applying video analytics to model and optimize certain processes offers a remarkable way to improve overall operations. Within the realm of vessel port activities, two crucial processes are vessel approximation and the docking process. This work specifically focuses on developing a vessel velocity estimation model and a docking mooring analytical system using a computer vision approach. The study introduces algorithms for speed estimation and mooring bitt detection, leveraging techniques such as the Structural Similarity Index (SSIM) for precise image comparison. The obtained results highlight the effectiveness of the proposed algorithms, demonstrating satisfactory speed estimation capabilities and successful identification of tied cables on the mooring bitts. These advancements pave the way for enhanced safety and efficiency in vessel docking procedures. However, further research and improvements are necessary to address challenges related to occlusions and illumination variations and explore additional techniques to enhance the models' performance and applicability in real-world scenarios.
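The SSIM comparison mentioned above follows Wang et al.'s formula; a single-window (global) variant can be sketched as below. This is illustrative only — practical implementations, possibly including the authors', usually average SSIM over local windows:

```python
import numpy as np

def ssim(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """Global structural similarity index between two equally sized
    greyscale images (single-window variant of the SSIM formula)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2  # stabilizers
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return float(((2 * mx * my + c1) * (2 * cov + c2))
                 / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2)))
```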

21 pages, 6194 KiB  
Article
Fusion of CCTV Video and Spatial Information for Automated Crowd Congestion Monitoring in Public Urban Spaces
by Vivian W. H. Wong and Kincho H. Law
Algorithms 2023, 16(3), 154; https://doi.org/10.3390/a16030154 - 10 Mar 2023
Cited by 2 | Viewed by 2693
Abstract
Crowd congestion is one of the main causes of modern public safety issues such as stampedes. Conventional crowd congestion monitoring using closed-circuit television (CCTV) video surveillance relies on manual observation, which is tedious and often error-prone in public urban spaces where crowds are dense and occlusions are prominent. With the aim of managing crowded spaces safely, this study proposes a framework that combines spatial and temporal information to automatically map the trajectories of individual occupants, as well as to assist in real-time congestion monitoring and prediction. Through exploiting both features from CCTV footage and spatial information of the public space, the framework fuses raw CCTV video and floor plan information to create visual aids for crowd monitoring, as well as a sequence of crowd mobility graphs (CMGraphs) to store spatiotemporal features. This framework uses deep learning-based computer vision models, geometric transformations, and Kalman filter-based tracking algorithms to automate the retrieval of crowd congestion data, specifically the spatiotemporal distribution of individuals and the overall crowd flow. The resulting collective crowd movement data are then stored in the CMGraphs, which are designed to facilitate congestion forecasting at key exit/entry regions. We demonstrate our framework on two video datasets: one public, from a train station, and the other recorded at a stadium following a crowded football game. Using both qualitative and quantitative insights from the experiments, we demonstrate that the suggested framework can help urban planners and infrastructure operators manage congestion hazards.
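The Kalman-filter-based tracking step can be illustrated with a constant-velocity filter over noisy 1-D positions. This is a deliberately simplified sketch: the framework tracks 2-D image/floor-plan positions, and the noise parameters below are arbitrary placeholders:

```python
import numpy as np

def kalman_track(measurements, dt=1.0, q=1e-3, r=0.25):
    """Constant-velocity Kalman filter over noisy 1-D position
    measurements; returns the filtered positions. `q` and `r` are the
    process and measurement noise variances (illustrative values)."""
    F = np.array([[1.0, dt], [0.0, 1.0]])    # state transition: x += v*dt
    H = np.array([[1.0, 0.0]])               # we observe position only
    Q = q * np.eye(2)
    R = np.array([[r]])
    x = np.array([[measurements[0]], [0.0]]) # state: [position, velocity]
    P = np.eye(2)                            # state covariance
    out = []
    for z in measurements:
        # predict
        x = F @ x
        P = F @ P @ F.T + Q
        # update with the new measurement
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)       # Kalman gain
        x = x + K @ (np.array([[z]]) - H @ x)
        P = (np.eye(2) - K @ H) @ P
        out.append(float(x[0, 0]))
    return out
```

In a multi-target setting each detected person gets one such filter, with detections assigned to filters by a data-association step before each update.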

31 pages, 29405 KiB  
Article
Assessing the Mass Transfer Coefficient in Jet Bioreactors with Classical Computer Vision Methods and Neural Networks Algorithms
by Irina Nizovtseva, Vladimir Palmin, Ivan Simkin, Ilya Starodumov, Pavel Mikushin, Alexander Nozik, Timur Hamitov, Sergey Ivanov, Sergey Vikharev, Alexei Zinovev, Vladislav Svitich, Matvey Mogilev, Margarita Nikishina, Simon Kraev, Stanislav Yurchenko, Timofey Mityashin, Dmitrii Chernushkin, Anna Kalyuzhnaya and Felix Blyakhman
Algorithms 2023, 16(3), 125; https://doi.org/10.3390/a16030125 - 21 Feb 2023
Cited by 3 | Viewed by 2257
Abstract
Development of energy-efficient and high-performance bioreactors requires progress in methods for assessing the key parameters of the biosynthesis process. With a wide variety of approaches and methods for determining the phase contact area in gas–liquid flows, the question of obtaining an accurate quantitative estimate remains open. Particularly challenging are the issues of obtaining information about mass transfer coefficients instantly, as well as developing predictive capabilities for effective flow control in continuous fermentation on both the laboratory and industrial scales. Motivated by the opportunity to apply classical and non-classical computer vision methods to high-precision video records of bubble flows obtained during experiments in the bioreactor vessel, we obtained the results presented in the paper. Characteristics of the bioreactor's bubble flow were first estimated by classical computer vision (CCV) methods, including an elliptic regression approach for selecting and clustering single-bubble boundaries, image transformation through a set of filters, and an algorithm for separating overlapping bubbles. Applying the developed method to the entire video makes it possible to obtain parameter distributions and set dropout thresholds in order to obtain better estimates through averaging. The developed CCV methodology was also tested and verified on a manually collected and labeled dataset. A deep neural network (NN) approach was then applied, for instance to the segmentation task, and demonstrated advantages in segmentation resolution, whereas the classical approach tends to be faster. Thus, the advantages and disadvantages of both the CCV and NN approaches are discussed based on the evaluation of the number of bubbles and their areas. An approach to estimating the mass transfer coefficient based on the obtained results is also presented.
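The elliptic regression step for single-bubble boundaries can be illustrated with a direct least-squares conic fit to boundary pixels. This is a generic formulation (it does not enforce the ellipse constraint and is not the authors' algorithm):

```python
import numpy as np

def fit_ellipse(xs, ys):
    """Least-squares fit of the conic a*x^2 + b*x*y + c*y^2 + d*x + e*y = 1
    to boundary points; returns (a, b, c, d, e)."""
    xs, ys = np.asarray(xs, float), np.asarray(ys, float)
    # each boundary point contributes one linear equation in (a, b, c, d, e)
    A = np.column_stack([xs ** 2, xs * ys, ys ** 2, xs, ys])
    coeffs, *_ = np.linalg.lstsq(A, np.ones_like(xs), rcond=None)
    return coeffs
```

From the fitted conic one can recover centre, axes, and hence the projected bubble area, which feeds the phase-contact-area estimate.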

17 pages, 495 KiB  
Article
Audiovisual Biometric Network with Deep Feature Fusion for Identification and Text Prompted Verification
by Juan Carlos Atenco, Juan Carlos Moreno and Juan Manuel Ramirez
Algorithms 2023, 16(2), 66; https://doi.org/10.3390/a16020066 - 19 Jan 2023
Cited by 2 | Viewed by 2178
Abstract
In this work we present a bimodal multitask network for audiovisual biometric recognition. The proposed network performs the fusion of features extracted from face and speech data through a weighted sum to jointly optimize the contribution of each modality, aiming for the identification of a client. The extracted speech features are simultaneously used in a speech recognition task with random digit sequences. Text-prompted verification is performed by fusing the scores obtained from the matching of bimodal embeddings with the Word Error Rate (WER) metric calculated from the accuracy of the transcriptions. The score fusion outputs a value that can be compared with a threshold to accept or reject the identity of a client. Training and evaluation were carried out using our proprietary database BIOMEX-DB and the VidTIMIT audiovisual database. Our network achieved an accuracy of 100% and an Equal Error Rate (EER) of 0.44% for identification and verification, respectively, in the best case. To the best of our knowledge, this is the first system that combines the mutually related tasks previously described for biometric recognition.
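The WER metric used in the score fusion is the word-level Levenshtein distance normalized by the reference length; a minimal sketch (illustrative only):

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length,
    computed with Levenshtein distance over words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[-1][-1] / len(ref)
```

For text-prompted verification over digit sequences, a low WER on the prompted digits supports accepting the claimed identity, and the value can be combined with the embedding-matching score before thresholding.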
