Recent Advances in Algorithms for Computer Vision Applications

A special issue of Algorithms (ISSN 1999-4893). This special issue belongs to the section "Combinatorial Optimization, Graph, and Network Algorithms".

Deadline for manuscript submissions: closed (30 June 2024) | Viewed by 20914

Special Issue Editor

Department of Computer Information Systems, State University of New York at Buffalo State, Buffalo, NY 14222, USA
Interests: computer vision; image processing; pattern recognition; machine learning

Special Issue Information

Dear Colleagues,

Multi-source visual information fusion and quality improvement can enhance the ability to perceive the real world. Image fusion focuses on fusing multi-source images from multiple sensors into a synthesized image that provides a more comprehensive and reliable description of the scene. Quality improvement techniques can be used to address the challenges of low-quality image analysis.

Many brain-inspired solutions have been proposed to accomplish these two tasks, and artificial neural networks, one of the most popular techniques, have been widely used in image fusion and quality improvement. As this is an active research field, many interesting issues remain to be explored, such as deep few-shot learning, unsupervised learning, the application of embodied neural systems, and industrial applications.

Potential topics of interest for this Special Issue include (but are not limited to) the following areas:

  • Image acquisition;
  • Image quality analysis;
  • Image filtering, restoration and enhancement;
  • Image segmentation;
  • Biomedical image processing;
  • Color image processing;
  • Multispectral image processing;
  • Hardware and architectures for image processing;
  • Image databases;
  • Image retrieval and indexing;
  • Image compression;
  • Low-level and high-level image description;
  • Mathematical methods in image processing, analysis and representation;
  • Artificial intelligence tools in image analysis;
  • Pattern recognition algorithms applied for images;
  • Practical applications of image processing, analysis and recognition algorithms in medicine, surveillance, biometrics, document analysis, multimedia, intelligent transportation systems, stereo vision, remote sensing, computer vision, robotics and other fields.

Dr. Guanqiu Qi
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website, then using the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Algorithms is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (13 papers)


Research

16 pages, 8528 KiB  
Article
Augmented Dataset for Vision-Based Analysis of Railroad Ballast via Multi-Dimensional Data Synthesis
by Kelin Ding, Jiayi Luo, Haohang Huang, John M. Hart, Issam I. A. Qamhia and Erol Tutumluer
Algorithms 2024, 17(8), 367; https://doi.org/10.3390/a17080367 - 21 Aug 2024
Viewed by 356
Abstract
Ballast serves a vital structural function in supporting railroad tracks under continuous loading. The degradation of ballast can result in issues such as inadequate drainage, lateral instability, excessive settlement, and potential service disruptions, necessitating efficient evaluation methods to ensure safe and reliable railroad operations. The incorporation of computer vision techniques into ballast inspection processes has proven effective in enhancing accuracy and robustness. Given the data-driven nature of deep learning approaches, the efficacy of these models is intrinsically linked to the quality of the training datasets, thereby emphasizing the need for a comprehensive and meticulously annotated ballast aggregate dataset. This paper presents the development of a multi-dimensional ballast aggregate dataset, constructed using empirical data collected from field and laboratory environments, supplemented with synthetic data generated by a proprietary ballast particle generator. The dataset comprises both two-dimensional (2D) data, consisting of ballast images annotated with 2D masks for particle localization, and three-dimensional (3D) data, including heightmaps, point clouds, and 3D annotations for particle localization. The data collection process encompassed various environmental lighting conditions and degradation states, ensuring extensive coverage and diversity within the training dataset. A previously developed 2D ballast particle segmentation model was trained on this augmented dataset, demonstrating high accuracy in field ballast inspections. This comprehensive database will be utilized in subsequent research to advance 3D ballast particle segmentation and shape completion, thereby facilitating enhanced inspection protocols and the development of effective ballast maintenance methodologies.
(This article belongs to the Special Issue Recent Advances in Algorithms for Computer Vision Applications)

15 pages, 7315 KiB  
Article
Computer Vision Algorithms on a Raspberry Pi 4 for Automated Depalletizing
by Danilo Greco, Majid Fasihiany, Ali Varasteh Ranjbar, Francesco Masulli, Stefano Rovetta and Alberto Cabri
Algorithms 2024, 17(8), 363; https://doi.org/10.3390/a17080363 - 18 Aug 2024
Viewed by 535
Abstract
The primary objective of a depalletizing system is to automate the process of detecting and locating specific variable-shaped objects on a pallet, allowing a robotic system to accurately unstack them. Although many solutions exist for the problem in industrial and manufacturing settings, the application to small-scale scenarios such as retail vending machines and small warehouses has not received much attention so far. This paper presents a comparative analysis of four different computer vision algorithms for the depalletizing task, implemented on a Raspberry Pi 4, a very popular single-board computer with low computational power suitable for the IoT and edge computing. The algorithms evaluated include the following: pattern matching, the scale-invariant feature transform (SIFT), Oriented FAST and Rotated BRIEF (ORB), and the Haar cascade classifier. Each technique is described and its implementation is outlined. Their evaluation is performed on the task of box detection and localization in the test images to assess their suitability in a depalletizing system. The performance of the algorithms is given in terms of accuracy, robustness to variability, computational speed, detection sensitivity, and resource consumption. The results reveal the strengths and limitations of each algorithm, providing valuable insights for selecting the most appropriate technique based on the specific requirements of a depalletizing system.
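The pattern-matching baseline among these four algorithms can be illustrated with a minimal zero-normalized cross-correlation (ZNCC) template search. This is a generic sketch under the assumption of greyscale inputs, not the authors' implementation; all names are illustrative:

```python
import numpy as np

def match_template(image, template):
    """Slide `template` over `image` and return the top-left corner of the
    best match under zero-normalized cross-correlation (ZNCC)."""
    ih, iw = image.shape
    th, tw = template.shape
    t = template - template.mean()
    t_norm = np.sqrt((t ** 2).sum())
    best_score, best_pos = -np.inf, (0, 0)
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            patch = image[y:y + th, x:x + tw]
            p = patch - patch.mean()
            denom = np.sqrt((p ** 2).sum()) * t_norm
            # ZNCC is invariant to brightness offset and contrast scaling
            score = (p * t).sum() / denom if denom > 0 else 0.0
            if score > best_score:
                best_score, best_pos = score, (y, x)
    return best_pos, best_score
```

A production version would use an FFT-based or integral-image formulation rather than this quadruple loop, which is only meant to make the scoring explicit.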

20 pages, 29617 KiB  
Article
Real-Time Tracking and Detection of Cervical Cancer Precursor Cells: Leveraging SIFT Descriptors in Mobile Video Sequences for Enhanced Early Diagnosis
by Jesus Eduardo Alcaraz-Chavez, Adriana del Carmen Téllez-Anguiano, Juan Carlos Olivares-Rojas and Ricardo Martínez-Parrales
Algorithms 2024, 17(7), 309; https://doi.org/10.3390/a17070309 - 12 Jul 2024
Viewed by 536
Abstract
Cervical cancer ranks among the leading causes of mortality in women worldwide, underscoring the critical need for early detection to ensure patient survival. While the Pap smear test is widely used, its effectiveness is hampered by the inherent subjectivity of cytological analysis, impacting its sensitivity and specificity. This study introduces an innovative methodology for detecting and tracking precursor cervical cancer cells using SIFT descriptors in video sequences captured with mobile devices. More than one hundred digital images were analyzed from Papanicolaou smears provided by the State Public Health Laboratory of Michoacán, Mexico, along with over 1800 unique examples of cervical cancer precursor cells. SIFT descriptors enabled real-time correspondence of precursor cells, yielding results demonstrating 98.34% accuracy, 98.3% precision, 98.2% recovery rate, and an F-measure of 98.05%. These methods were meticulously optimized for real-time analysis, showcasing significant potential to enhance the accuracy and efficiency of the Pap smear test in early cervical cancer detection.
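The descriptor-correspondence step underlying this kind of SIFT-based tracking is typically a nearest-neighbour search filtered by Lowe's ratio test. A minimal sketch over precomputed descriptor arrays (illustrative only; the paper's actual matching pipeline is not reproduced here):

```python
import numpy as np

def ratio_test_matches(desc_a, desc_b, ratio=0.75):
    """Match each descriptor in desc_a to its nearest neighbour in desc_b,
    keeping only matches whose nearest/second-nearest distance ratio is
    below `ratio` (Lowe's criterion). desc_b needs at least two rows.
    Returns (index_a, index_b) pairs."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        order = np.argsort(dists)
        nearest, second = dists[order[0]], dists[order[1]]
        # ambiguous matches (nearest almost as far as second-nearest) are dropped
        if second == 0 or nearest / second < ratio:
            matches.append((i, int(order[0])))
    return matches
```

For real-time use on mobile video one would replace the brute-force distance computation with an approximate nearest-neighbour index, but the acceptance criterion stays the same.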

12 pages, 2334 KiB  
Article
CentralBark Image Dataset and Tree Species Classification Using Deep Learning
by Charles Warner, Fanyou Wu, Rado Gazo, Bedrich Benes, Nicole Kong and Songlin Fei
Algorithms 2024, 17(5), 179; https://doi.org/10.3390/a17050179 - 27 Apr 2024
Viewed by 1525
Abstract
The task of tree species classification through deep learning has been challenging for the forestry community, and the lack of standardized datasets has hindered further progress. Our work presents a solution in the form of a large bark image dataset called CentralBark, which enhances deep learning-based tree species classification. Additionally, we have laid out an efficient and repeatable data collection protocol to assist future works in an organized manner. The dataset contains images of 25 central hardwood and Appalachian region tree species, with over 19,000 images of varying diameters, light, and moisture conditions. We tested 25 species: American basswood, American beech, American elm, American sycamore, bitternut hickory, black cherry, black locust, black oak, black walnut, eastern cottonwood, hackberry, honey locust, northern red oak, Ohio buckeye, Osage-orange, pignut hickory, sassafras, shagbark hickory, silver maple, slippery elm, sugar maple, sweetgum, white ash, white oak, and yellow poplar. Our experiment involved testing three different models to assess the feasibility of species classification using unaltered and uncropped images during the species-classification training process. We achieved an overall accuracy of 83.21% using the EfficientNet-b3 model, which was the best of the three models (EfficientNet-b3, ResNet-50, and MobileNet-V3-small), and an average accuracy of 80.23%.

17 pages, 1872 KiB  
Article
A Multi-Stage Method for Logo Detection in Scanned Official Documents Based on Image Processing
by María Guijarro, Juan Bayon, Daniel Martín-Carabias and Joaquín Recas
Algorithms 2024, 17(4), 170; https://doi.org/10.3390/a17040170 - 22 Apr 2024
Viewed by 1044
Abstract
A logotype is a rectangular region defined by a set of characteristics, which come from the pixel information and region shape, that differ from those of the text. In this paper, a new method for automatic logo detection is proposed and tested using the public Tobacco800 database. Our method outputs a set of regions from an official document with a high probability of containing a logo, using a new approach based on a variation of the feature rectangles method available in the literature. Candidate regions were computed using the longest increasing run algorithm over the document blank lines' indices. Those regions were further refined by using a feature-rectangle-expansion method with forward checking, where the rectangle expansion can occur in parallel in each region. Finally, a C4.5 decision tree was trained and tested against a set of 1291 official documents to evaluate its performance. The strategic combination of the three previous steps offers a precision and recall for logo detection of 98.9% and 89.9%, respectively, and is also resistant to noise and low-quality documents. The method is also able to reduce the processing area of the document while maintaining a low percentage of false negatives.
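One plausible reading of the "longest increasing run over the blank lines' indices" step — finding the tallest band of consecutive blank rows, which separates candidate regions — can be sketched as follows. This is a generic interpretation for illustration, not the authors' code:

```python
def longest_consecutive_run(indices):
    """Return (start, length) of the longest run of consecutive integers in
    `indices` (e.g. blank-line row numbers sorted top to bottom), i.e. the
    tallest blank band in the document."""
    best_start, best_len = None, 0
    run_start, run_len = None, 0
    prev = None
    for idx in indices:
        if prev is not None and idx == prev + 1:
            run_len += 1          # extend the current run of consecutive rows
        else:
            run_start, run_len = idx, 1  # start a new run
        if run_len > best_len:
            best_start, best_len = run_start, run_len
        prev = idx
    return best_start, best_len
```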

13 pages, 4268 KiB  
Article
Effect of the Light Environment on Image-Based SPAD Value Prediction of Radish Leaves
by Yuto Kamiwaki and Shinji Fukuda
Algorithms 2024, 17(1), 16; https://doi.org/10.3390/a17010016 - 29 Dec 2023
Viewed by 1556
Abstract
This study aims to clarify the influence of photographic environments under different light sources on image-based SPAD value prediction. The input variables for the SPAD value prediction using Random Forests, XGBoost, and LightGBM were RGB values, HSL values, HSV values, light color temperature (LCT), and illuminance (ILL). Model performance was assessed using Pearson's correlation coefficient (COR), Nash–Sutcliffe efficiency (NSE), and root mean squared error (RMSE). In particular, SPAD value prediction with Random Forests resulted in high accuracy in a stable light environment; CORRGB+ILL+LCT and CORHSL+ILL+LCT were 0.929 and 0.922, respectively. Image-based SPAD value prediction was effective under halogen light with a similar color temperature at dusk; CORRGB+ILL and CORHSL+ILL were 0.895 and 0.876, respectively. The HSL value under LED could be used to predict the SPAD value with high accuracy in all performance measures. The results supported the applicability of SPAD value prediction using Random Forests under a wide range of lighting conditions, such as dusk, by training a model based on data collected under different illuminance conditions in various light sources. Further studies are required to examine this method under outdoor conditions in spatiotemporally dynamic light environments.
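The NSE and RMSE performance measures named above have standard definitions and can be computed directly from observed and predicted values; a generic sketch (not the study's code):

```python
import numpy as np

def rmse(obs, sim):
    """Root mean squared error between observed and simulated values."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(np.sqrt(np.mean((obs - sim) ** 2)))

def nse(obs, sim):
    """Nash–Sutcliffe efficiency: 1 is a perfect fit; 0 means the model is
    no better than always predicting the observed mean."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(1.0 - np.sum((obs - sim) ** 2)
                 / np.sum((obs - obs.mean()) ** 2))
```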

15 pages, 8312 KiB  
Article
A Lightweight Graph Neural Network Algorithm for Action Recognition Based on Self-Distillation
by Miao Feng and Jean Meunier
Algorithms 2023, 16(12), 552; https://doi.org/10.3390/a16120552 - 1 Dec 2023
Cited by 1 | Viewed by 1718
Abstract
Recognizing human actions can help in numerous ways, such as health monitoring, intelligent surveillance, virtual reality and human–computer interaction. A quick and accurate detection algorithm is required for daily real-time detection. This paper first proposes to generate a lightweight graph neural network by self-distillation for human action recognition tasks. The lightweight graph neural network was evaluated on the NTU-RGB+D dataset. The results demonstrate that, with competitive accuracy, the heavyweight graph neural network can be compressed by up to 80%. Furthermore, the learned representations have denser clusters, estimated by the Davies–Bouldin index, the Dunn index and silhouette coefficients. The ideal input data and algorithm capacity are also discussed.
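The Davies–Bouldin index used to assess cluster density can be computed directly from points and their cluster labels. A compact sketch of the standard definition (illustrative, not the paper's code):

```python
import numpy as np

def davies_bouldin(points, labels):
    """Davies–Bouldin index: lower values mean denser, better-separated
    clusters. `points` is (n, d); `labels` assigns each point a cluster id."""
    points, labels = np.asarray(points, float), np.asarray(labels)
    ids = np.unique(labels)
    centroids = np.array([points[labels == k].mean(axis=0) for k in ids])
    # within-cluster scatter: mean distance of members to their centroid
    scatter = np.array([np.mean(np.linalg.norm(points[labels == k] - c, axis=1))
                        for k, c in zip(ids, centroids)])
    worst = []
    for i in range(len(ids)):
        # for each cluster, take its worst similarity ratio to any other cluster
        ratios = [(scatter[i] + scatter[j])
                  / np.linalg.norm(centroids[i] - centroids[j])
                  for j in range(len(ids)) if j != i]
        worst.append(max(ratios))
    return float(np.mean(worst))
```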

13 pages, 8724 KiB  
Article
Cloud Detection and Tracking Based on Object Detection with Convolutional Neural Networks
by Jose Antonio Carballo, Javier Bonilla, Jesús Fernández-Reche, Bijan Nouri, Antonio Avila-Marin, Yann Fabel and Diego-César Alarcón-Padilla
Algorithms 2023, 16(10), 487; https://doi.org/10.3390/a16100487 - 19 Oct 2023
Cited by 3 | Viewed by 1931
Abstract
Because solar renewable technologies need to know the availability of solar resources in advance, this paper presents a new methodology based on computer vision and an object detection technique that uses convolutional neural networks (the EfficientDet-D2 model) to detect clouds in image series. This methodology also calculates the speed and direction of cloud motion, which allows the prediction of transients in the available solar radiation due to clouds. The convolutional neural network model retraining and validation process finished successfully, giving accurate cloud detection results in the test. Also, during the test, the estimation of the remaining time until a transient due to a cloud was accurate, mainly owing to the precise cloud detection and the accuracy of the remaining-time algorithm.
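Once clouds are detected frame by frame, speed and direction can be derived from the displacement of detection centroids between frames. A minimal sketch assuming a known image scale (the `metres_per_pixel` factor is a placeholder, not a value from the paper):

```python
import math

def cloud_motion(c0, c1, dt, metres_per_pixel=1.0):
    """Estimate speed (m/s) and direction (degrees, 0 = +x axis,
    counter-clockwise) from two detected cloud centroids `c0`, `c1`
    taken `dt` seconds apart."""
    dx = (c1[0] - c0[0]) * metres_per_pixel
    dy = (c1[1] - c0[1]) * metres_per_pixel
    speed = math.hypot(dx, dy) / dt
    direction = math.degrees(math.atan2(dy, dx)) % 360.0
    return speed, direction
```

Given the speed vector and the distance from the cloud's leading edge to the solar-field line of sight, the remaining time until a shading transient is simply distance over speed.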

15 pages, 11652 KiB  
Article
Ascertaining the Ideality of Photometric Stereo Datasets under Unknown Lighting
by Elisa Crabu, Federica Pes, Giuseppe Rodriguez and Giuseppa Tanda
Algorithms 2023, 16(8), 375; https://doi.org/10.3390/a16080375 - 5 Aug 2023
Cited by 1 | Viewed by 1224
Abstract
The standard photometric stereo model makes several assumptions that are rarely verified in experimental datasets. In particular, the observed object should behave as a Lambertian reflector, and the light sources should be positioned at an infinite distance from it, along a known direction. Even when Lambert's law is approximately fulfilled, an accurate assessment of the relative position between the light source and the target is often unavailable in real situations. The Hayakawa procedure is a computational method for estimating such information directly from data images. It occasionally breaks down when some of the available images excessively deviate from ideality. This is generally due to observing a non-Lambertian surface, or illuminating it from a close distance, or both. Indeed, in narrow shooting scenarios, typical, e.g., of archaeological excavation sites, it is impossible to position a flashlight at a sufficient distance from the observed surface. It is then necessary to understand if a given dataset is reliable and which images should be selected to better reconstruct the target. In this paper, we propose some algorithms to perform this task and explore their effectiveness.
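Under the Lambertian assumption the model is I_k = albedo · (l_k · n), so with at least three known light directions the normal at a pixel follows from least squares. A per-pixel sketch of this textbook formulation (for context only; it is not the Hayakawa procedure, which estimates the lighting itself):

```python
import numpy as np

def lambertian_normal(light_dirs, intensities):
    """Recover the surface normal and albedo at one pixel by least squares,
    assuming a Lambertian reflector: I_k = albedo * dot(l_k, n).
    `light_dirs` is an (m, 3) array of unit light directions, m >= 3."""
    # solve L g = I for g = albedo * n, then split magnitude and direction
    g, *_ = np.linalg.lstsq(np.asarray(light_dirs, float),
                            np.asarray(intensities, float), rcond=None)
    albedo = np.linalg.norm(g)
    return g / albedo, albedo
```

When the lighting is unknown, procedures like Hayakawa's instead factorize the whole intensity matrix (all pixels under all lights) to estimate directions and normals jointly, which is exactly where far-from-ideal images cause breakdowns.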

12 pages, 2788 KiB  
Article
Vessel Velocity Estimation and Docking Analysis: A Computer Vision Approach
by João V. R. de Andrade, Bruno J. T. Fernandes, André R. L. C. Izídio, Nilson M. da Silva Filho and Francisco Cruz
Algorithms 2023, 16(7), 326; https://doi.org/10.3390/a16070326 - 30 Jun 2023
Cited by 1 | Viewed by 1652
Abstract
The opportunities for leveraging technology to enhance the efficiency of vessel port activities are vast. Applying video analytics to model and optimize certain processes offers a remarkable way to improve overall operations. Within the realm of vessel port activities, two crucial processes are vessel approximation and the docking process. This work specifically focuses on developing a vessel velocity estimation model and a docking mooring analytical system using a computer vision approach. The study introduces algorithms for speed estimation and mooring bitt detection, leveraging techniques such as the Structural Similarity Index (SSIM) for precise image comparison. The obtained results highlight the effectiveness of the proposed algorithms, demonstrating satisfactory speed estimation capabilities and successful identification of tied cables on the mooring bitts. These advancements pave the way for enhanced safety and efficiency in vessel docking procedures. However, further research and improvements are necessary to address challenges related to occlusions and illumination variations and explore additional techniques to enhance the models' performance and applicability in real-world scenarios.
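The SSIM comparison mentioned above follows Wang et al.'s formula; a single-window (global) variant can be sketched as below. This is illustrative only — practical implementations, possibly including the authors', usually average SSIM over local windows:

```python
import numpy as np

def ssim(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """Global structural similarity index between two equally sized
    greyscale images (single-window variant of the SSIM formula)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2  # stabilizers
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return float(((2 * mx * my + c1) * (2 * cov + c2))
                 / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2)))
```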

21 pages, 6194 KiB  
Article
Fusion of CCTV Video and Spatial Information for Automated Crowd Congestion Monitoring in Public Urban Spaces
by Vivian W. H. Wong and Kincho H. Law
Algorithms 2023, 16(3), 154; https://doi.org/10.3390/a16030154 - 10 Mar 2023
Cited by 2 | Viewed by 2693
Abstract
Crowd congestion is one of the main causes of modern public safety issues such as stampedes. Conventional crowd congestion monitoring using closed-circuit television (CCTV) video surveillance relies on manual observation, which is tedious and often error-prone in public urban spaces where crowds are dense and occlusions are prominent. With the aim of managing crowded spaces safely, this study proposes a framework that combines spatial and temporal information to automatically map the trajectories of individual occupants, as well as to assist in real-time congestion monitoring and prediction. Through exploiting both features from CCTV footage and spatial information of the public space, the framework fuses raw CCTV video and floor plan information to create visual aids for crowd monitoring, as well as a sequence of crowd mobility graphs (CMGraphs) to store spatiotemporal features. This framework uses deep learning-based computer vision models, geometric transformations, and Kalman filter-based tracking algorithms to automate the retrieval of crowd congestion data, specifically the spatiotemporal distribution of individuals and the overall crowd flow. The resulting collective crowd movement data are then stored in the CMGraphs, which are designed to facilitate congestion forecasting at key exit/entry regions. We demonstrate our framework on two video datasets: one public, from a train station, and the other recorded at a stadium following a crowded football game. Using both qualitative and quantitative insights from the experiments, we demonstrate that the suggested framework can help urban planners and infrastructure operators manage congestion hazards.
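The Kalman-filter-based tracking step can be illustrated with a constant-velocity filter over noisy 1-D positions. This is a deliberately simplified sketch: the framework tracks 2-D image/floor-plan positions, and the noise parameters below are arbitrary placeholders:

```python
import numpy as np

def kalman_track(measurements, dt=1.0, q=1e-3, r=0.25):
    """Constant-velocity Kalman filter over noisy 1-D position
    measurements; returns the filtered positions. `q` and `r` are the
    process and measurement noise variances (illustrative values)."""
    F = np.array([[1.0, dt], [0.0, 1.0]])    # state transition: x += v*dt
    H = np.array([[1.0, 0.0]])               # we observe position only
    Q = q * np.eye(2)
    R = np.array([[r]])
    x = np.array([[measurements[0]], [0.0]]) # state: [position, velocity]
    P = np.eye(2)                            # state covariance
    out = []
    for z in measurements:
        # predict
        x = F @ x
        P = F @ P @ F.T + Q
        # update with the new measurement
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)       # Kalman gain
        x = x + K @ (np.array([[z]]) - H @ x)
        P = (np.eye(2) - K @ H) @ P
        out.append(float(x[0, 0]))
    return out
```

In a multi-target setting each detected person gets one such filter, with detections assigned to filters by a data-association step before each update.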

31 pages, 29405 KiB  
Article
Assessing the Mass Transfer Coefficient in Jet Bioreactors with Classical Computer Vision Methods and Neural Networks Algorithms
by Irina Nizovtseva, Vladimir Palmin, Ivan Simkin, Ilya Starodumov, Pavel Mikushin, Alexander Nozik, Timur Hamitov, Sergey Ivanov, Sergey Vikharev, Alexei Zinovev, Vladislav Svitich, Matvey Mogilev, Margarita Nikishina, Simon Kraev, Stanislav Yurchenko, Timofey Mityashin, Dmitrii Chernushkin, Anna Kalyuzhnaya and Felix Blyakhman
Algorithms 2023, 16(3), 125; https://doi.org/10.3390/a16030125 - 21 Feb 2023
Cited by 3 | Viewed by 2257
Abstract
Development of energy-efficient and high-performance bioreactors requires progress in methods for assessing the key parameters of the biosynthesis process. With a wide variety of approaches and methods for determining the phase contact area in gas–liquid flows, the question of obtaining an accurate quantitative estimate remains open. Particularly challenging are the issues of obtaining information about mass transfer coefficients instantly, as well as developing predictive capabilities for effective flow control in continuous fermentation on both the laboratory and industrial scales. Motivated by the opportunity to apply classical and non-classical computer vision methods to high-precision video records of bubble flows obtained during experiments in the bioreactor vessel, we obtained the results presented in the paper. Characteristics of the bioreactor's bubble flow were first estimated by classical computer vision (CCV) methods, including an elliptic regression approach for selecting and clustering single-bubble boundaries, image transformation through a set of filters, and an algorithm for separating overlapping bubbles. Applying the developed method to the entire video makes it possible to obtain parameter distributions and set dropout thresholds in order to obtain better estimates through averaging. The developed CCV methodology was also tested and verified on a manually collected and labeled dataset. A deep neural network (NN) approach was then applied, for instance to the segmentation task, and demonstrated advantages in segmentation resolution, whereas the classical approach tends to be faster. Thus, the advantages and disadvantages of both the CCV and NN approaches are discussed based on the evaluation of the number of bubbles and their areas. An approach to estimating the mass transfer coefficient based on the obtained results is also presented.
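The elliptic regression step for single-bubble boundaries can be illustrated with a direct least-squares conic fit to boundary pixels. This is a generic formulation (it does not enforce the ellipse constraint and is not the authors' algorithm):

```python
import numpy as np

def fit_ellipse(xs, ys):
    """Least-squares fit of the conic a*x^2 + b*x*y + c*y^2 + d*x + e*y = 1
    to boundary points; returns (a, b, c, d, e)."""
    xs, ys = np.asarray(xs, float), np.asarray(ys, float)
    # each boundary point contributes one linear equation in (a, b, c, d, e)
    A = np.column_stack([xs ** 2, xs * ys, ys ** 2, xs, ys])
    coeffs, *_ = np.linalg.lstsq(A, np.ones_like(xs), rcond=None)
    return coeffs
```

From the fitted conic one can recover centre, axes, and hence the projected bubble area, which feeds the phase-contact-area estimate.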

17 pages, 495 KiB  
Article
Audiovisual Biometric Network with Deep Feature Fusion for Identification and Text Prompted Verification
by Juan Carlos Atenco, Juan Carlos Moreno and Juan Manuel Ramirez
Algorithms 2023, 16(2), 66; https://doi.org/10.3390/a16020066 - 19 Jan 2023
Cited by 2 | Viewed by 2178
Abstract
In this work we present a bimodal multitask network for audiovisual biometric recognition. The proposed network performs the fusion of features extracted from face and speech data through a weighted sum to jointly optimize the contribution of each modality, aiming for the identification of a client. The extracted speech features are simultaneously used in a speech recognition task with random digit sequences. Text-prompted verification is performed by fusing the scores obtained from the matching of bimodal embeddings with the Word Error Rate (WER) metric calculated from the accuracy of the transcriptions. The score fusion outputs a value that can be compared with a threshold to accept or reject the identity of a client. Training and evaluation were carried out using our proprietary database BIOMEX-DB and the VidTIMIT audiovisual database. Our network achieved an accuracy of 100% and an Equal Error Rate (EER) of 0.44% for identification and verification, respectively, in the best case. To the best of our knowledge, this is the first system that combines the mutually related tasks previously described for biometric recognition.
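The WER metric used in the score fusion is the word-level Levenshtein distance normalized by the reference length; a minimal sketch (illustrative only):

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length,
    computed with Levenshtein distance over words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[-1][-1] / len(ref)
```

For text-prompted verification over digit sequences, a low WER on the prompted digits supports accepting the claimed identity, and the value can be combined with the embedding-matching score before thresholding.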
