Applications of Artificial Intelligence in Computer Vision

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Artificial Intelligence".

Deadline for manuscript submissions: 15 September 2024 | Viewed by 6125

Special Issue Editor


Prof. Dr. Jenhui Chen
Guest Editor
Department of Computer Science and Information Engineering, Chang Gung University, Taoyuan City 33302, Taiwan
Interests: computer vision; multimedia; medical image; knowledge graph; Internet of Things

Special Issue Information

Dear Colleagues,

We are pleased to invite you to submit your research contributions to the upcoming Special Issue on "Applications of Artificial Intelligence in Computer Vision". This Special Issue aims to bring together leading researchers and practitioners from academia and industry to discuss the latest advances, findings, and practical applications of AI in the field of computer vision.

Scope and Topics

The broad and interdisciplinary nature of artificial intelligence (AI) in computer vision makes it an engaging and impactful area for research. This Special Issue invites high-quality, original, and previously unpublished research papers, reviews, and case studies that contribute to this growing field. Topics of interest for submission include, but are not limited to, the following:

  1. Advanced machine learning and deep learning techniques for image and video analyses;
  2. Object detection, recognition, and tracking;
  3. Scene understanding and semantic segmentation;
  4. Three-dimensional vision and depth estimation;
  5. Face and gesture recognition;
  6. AI in medical image analyses;
  7. Augmented reality (AR) and virtual reality (VR) in computer vision;
  8. AI-driven surveillance and security systems;
  9. Real-time computer vision applications;
  10. Explainable AI in computer vision.

Technical Program Committee Member:
Name: Dr. Ashu Abdul
Email: [email protected]
Affiliation: Department of Computer Science and Engineering, SRM University-AP, Neerukonda, Mangalagiri (Mandal), Guntur Dist., Andhra Pradesh-522240, India
Research Interests: artificial intelligence; chatbots; computer vision; medical image processing; natural language generation; recommendation systems

Prof. Dr. Jenhui Chen
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • computer vision
  • image processing
  • medical image
  • object detection
  • scene understanding

Published Papers (7 papers)


Research

20 pages, 5517 KiB  
Article
Dual-Level Viewpoint-Learning for Cross-Domain Vehicle Re-Identification
by Ruihua Zhou, Qi Wang, Lei Cao, Jianqiang Xu, Xiaogang Zhu, Xin Xiong, Huiqi Zhang and Yuling Zhong
Electronics 2024, 13(10), 1823; https://doi.org/10.3390/electronics13101823 - 8 May 2024
Viewed by 311
Abstract
Vehicle viewpoint annotations are ambiguous because they rely on subjective human judgment, which prevents cross-domain vehicle re-identification methods from learning viewpoint-invariant features during source-domain pre-training and leads to cross-view misalignment in downstream target-domain tasks. To address these challenges, this paper presents a dual-level viewpoint-learning framework that consists of an angle-invariance pre-training method and a meta-orientation adaptation learning strategy. A dual-level viewpoint-annotation proposal is first designed to concretely redefine the vehicle viewpoint at two levels (i.e., angle level and orientation level). The angle-invariance pre-training method, consisting of a part-level pyramidal network and an angle bias metric loss, is then proposed to preserve identity similarities and differences across views. Under the supervision of the angle bias metric loss, the part-level pyramidal network, serving as the backbone, learns the subtle differences between vehicles observed from different angle-level viewpoints. Finally, the meta-orientation adaptation learning strategy is designed to extend the generalization ability of the re-identification model to unseen orientation-level viewpoints; it performs meta-orientation training and meta-orientation testing according to the orientation-level viewpoints in the target domain. Extensive experiments on public vehicle re-identification datasets demonstrate that the proposed method, exploiting the redefined dual-level viewpoint information, significantly outperforms state-of-the-art methods in alleviating viewpoint variations.
(This article belongs to the Special Issue Applications of Artificial Intelligence in Computer Vision)
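The angle bias metric loss itself is not spelled out in the abstract above. The following PyTorch sketch only illustrates one plausible form of viewpoint-aware metric learning, in which the triplet margin grows with the angular gap between views; the function name, margin scheme, and hyperparameters are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: a viewpoint-aware margin for metric learning.
import torch
import torch.nn.functional as F

def angle_aware_triplet_loss(anchor, positive, negative,
                             angle_ap, angle_an, base_margin=0.3, bias=0.1):
    """Triplet loss whose margin grows with the angular gap between views.

    anchor/positive/negative: (B, D) embeddings
    angle_ap/angle_an: (B,) absolute viewpoint differences in degrees
    """
    d_ap = F.pairwise_distance(anchor, positive)
    d_an = F.pairwise_distance(anchor, negative)
    # Hypothetical bias term: demand a larger margin when the positive pair
    # spans a wider angle gap than the negative pair, pushing the network
    # toward viewpoint-invariant identity features.
    margin = base_margin + bias * (angle_ap - angle_an).clamp(min=0) / 180.0
    return F.relu(d_ap - d_an + margin).mean()

# Usage with random tensors
emb = lambda: torch.randn(8, 256)
loss = angle_aware_triplet_loss(emb(), emb(), emb(),
                                torch.rand(8) * 180, torch.rand(8) * 180)
```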

24 pages, 5326 KiB  
Article
Edge-Enhanced Dual-Stream Perception Network for Monocular Depth Estimation
by Zihang Liu and Quande Wang
Electronics 2024, 13(9), 1652; https://doi.org/10.3390/electronics13091652 - 25 Apr 2024
Viewed by 352
Abstract
Estimating depth from a single RGB image has a wide range of applications, such as robot navigation and autonomous driving. Currently, convolutional neural networks based on the encoder–decoder architecture are the most popular methods for estimating depth maps. However, convolutional operators are limited in modeling long-range dependencies, which often leads to inaccurate depth predictions at object edges. To address these issues, a new edge-enhanced dual-stream monocular depth estimation method is introduced in this paper. ResNet and Swin Transformer are combined to better extract global and local features, which benefits the estimation of the depth map. To better integrate the information from the two encoder branches and the shallow branch of the decoder, we designed a lightweight decoder based on a multi-head Cross-Attention Module. Furthermore, to improve the boundary clarity of objects in the depth map, a loss function with an additional penalty on depth estimation errors at object edges is presented. Results on three datasets, NYU Depth V2, KITTI, and SUN RGB-D, show that the proposed method achieves better performance for monocular depth estimation and generalizes well to various scenarios and real-world images.
(This article belongs to the Special Issue Applications of Artificial Intelligence in Computer Vision)
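As a rough illustration of the edge-penalty idea described above (not the authors' loss), the following PyTorch sketch weights an L1 depth error by a Sobel edge map computed on the ground-truth depth; the choice of edge detector and the weighting scheme are assumptions.

```python
import torch
import torch.nn.functional as F

def edge_weighted_depth_loss(pred, target, edge_weight=2.0):
    """L1 depth loss with an extra penalty near object edges.

    pred, target: (B, 1, H, W) depth maps. Edges are approximated with a
    Sobel filter on the ground-truth depth (a hypothetical choice).
    """
    sobel_x = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                           device=target.device).view(1, 1, 3, 3)
    sobel_y = sobel_x.transpose(2, 3)
    gx = F.conv2d(target, sobel_x, padding=1)
    gy = F.conv2d(target, sobel_y, padding=1)
    edge_mag = torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)
    # Weight each pixel by 1 plus a scaled, per-image-normalized edge magnitude.
    weights = 1.0 + edge_weight * edge_mag / (edge_mag.amax(dim=(2, 3), keepdim=True) + 1e-6)
    return (weights * (pred - target).abs()).mean()
```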

17 pages, 2448 KiB  
Article
A Lightweight Context-Aware Feature Transformer Network for Human Pose Estimation
by Yanli Ma, Qingxuan Shi and Fan Zhang
Electronics 2024, 13(4), 716; https://doi.org/10.3390/electronics13040716 - 9 Feb 2024
Viewed by 737
Abstract
We propose a Context-aware Feature Transformer Network (CaFTNet), a novel network for human pose estimation. To address the limited modeling of global dependencies in convolutional neural networks, we design the Transformerneck to strengthen the expressive power of features. The Transformerneck directly replaces the 3×3 convolution in the bottleneck of HRNet with a Contextual Transformer (CoT) block while reducing the complexity of the network. Specifically, the CoT block first produces keys with static contextual information through a 3×3 convolution. Then, relying on the queries and the contextualized keys, dynamic contexts are generated through two consecutive 1×1 convolutions. The static and dynamic contexts are finally fused as the output. Additionally, for multi-scale networks, in order to further refine the features of the fused output, we propose an Attention Feature Aggregation Module (AFAM). Technically, given an intermediate input, the AFAM successively infers attention maps along the channel and spatial dimensions, and an adaptive refinement module (ARM) is exploited to activate the obtained attention maps. Finally, the input undergoes adaptive feature refinement through multiplication with the activated attention maps. Through these procedures, our lightweight network provides powerful cues for keypoint detection. Experiments are performed on the COCO and MPII datasets. The model achieves 76.2 AP on the COCO val2017 dataset. Compared with other methods that use a CNN as the backbone, CaFTNet reduces the number of parameters by 72.9%. On the MPII dataset, our method uses only 60.7% of the parameters while achieving results similar to those of other methods with a CNN backbone.
(This article belongs to the Special Issue Applications of Artificial Intelligence in Computer Vision)
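For readers unfamiliar with Contextual Transformer blocks, the simplified PyTorch sketch below mirrors the description above: static keys from a 3×3 convolution, dynamic context from two stacked 1×1 convolutions over the concatenated query and keys, and fusion of the two. It omits details of the original CoT design (grouped convolutions, softmax-normalized local aggregation), so it should be read as an illustration rather than the CaFTNet implementation.

```python
import torch
import torch.nn as nn

class SimplifiedCoT(nn.Module):
    """Simplified Contextual Transformer (CoT)-style block (illustrative)."""
    def __init__(self, dim):
        super().__init__()
        self.key_embed = nn.Sequential(                   # static contextual keys
            nn.Conv2d(dim, dim, 3, padding=1, bias=False),
            nn.BatchNorm2d(dim), nn.ReLU(inplace=True))
        self.value_embed = nn.Conv2d(dim, dim, 1, bias=False)
        self.attn = nn.Sequential(                        # two stacked 1x1 convs
            nn.Conv2d(2 * dim, dim, 1, bias=False),
            nn.BatchNorm2d(dim), nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim, 1))

    def forward(self, x):
        k_static = self.key_embed(x)                  # static context from 3x3 conv
        v = self.value_embed(x)
        a = torch.sigmoid(self.attn(torch.cat([x, k_static], dim=1)))
        k_dynamic = a * v                             # dynamic context
        return k_static + k_dynamic                   # fuse static + dynamic

# x = torch.randn(1, 64, 56, 56); y = SimplifiedCoT(64)(x)
```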

15 pages, 2842 KiB  
Article
Face Recognition Bias Assessment through Quality Estimation Models
by Luis Lopez Paya, Pedro Cordoba, Angela Sanchez Perez, Javier Barrachina, Manuel Benavent-Lledo, David Mulero-Pérez and Jose Garcia-Rodriguez
Electronics 2023, 12(22), 4649; https://doi.org/10.3390/electronics12224649 - 15 Nov 2023
Viewed by 1006
Abstract
Recent advances in facial recognition technology have achieved outstanding performance, but unconstrained face recognition remains an ongoing issue. Facial-image-quality-evaluation algorithms assess the quality of the input samples, providing crucial information about the accuracy of recognition decisions, which can lead to improved results in challenging scenarios. In recent years, significant progress has been made in assessing the quality of facial images: the computation of quality scores has become highly precise and closely correlated with model results. In this paper, we reviewed and analyzed the existing biases of cutting-edge quality-estimation techniques for face recognition. Our experiments focused on the MagFace, FaceQNet, and SER-FIQ quality estimators, evaluated on the CelebA reference dataset. A study of bias in the face-recognition models was conducted by analyzing the quality scores presented in each article, which allowed an examination of existing biases within both the quality estimators and the face-recognition models.
(This article belongs to the Special Issue Applications of Artificial Intelligence in Computer Vision)
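A minimal sketch of the kind of bias analysis described above: given quality scores from any estimator (e.g., MagFace-style scores) and a group label (e.g., a CelebA attribute), compare per-group mean scores. The function and the synthetic data are illustrative assumptions, not the paper's experimental protocol.

```python
import numpy as np

def quality_gap_by_attribute(scores, attribute):
    """Compare quality-score distributions across an attribute's groups.

    scores: (N,) quality scores from any estimator.
    attribute: (N,) group labels (e.g., a binary CelebA attribute).
    Returns per-group mean scores and the largest pairwise gap, a simple
    proxy for bias in the quality estimator.
    """
    groups = {}
    for g in np.unique(attribute):
        groups[g] = float(scores[attribute == g].mean())
    gap = max(groups.values()) - min(groups.values())
    return groups, gap

# Toy example with synthetic scores and a binary attribute
rng = np.random.default_rng(0)
scores = rng.normal(0.7, 0.1, size=1000)
attr = rng.integers(0, 2, size=1000)
print(quality_gap_by_attribute(scores, attr))
```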

14 pages, 3041 KiB  
Article
MammalClub: An Annotated Wild Mammal Dataset for Species Recognition, Individual Identification, and Behavior Recognition
by Wenbo Lu, Yaqin Zhao, Jin Wang, Zhaoxiang Zheng, Liqi Feng and Jiaxi Tang
Electronics 2023, 12(21), 4506; https://doi.org/10.3390/electronics12214506 - 2 Nov 2023
Cited by 1 | Viewed by 1057
Abstract
Mammals play an important role in conserving species diversity and maintaining ecological balance, so research on mammal species composition, individual identification, and behavioral analysis is of great significance for optimizing the ecological environment. Owing to their strong feature-extraction capabilities, deep learning networks have gradually been applied to wildlife monitoring. However, training a network requires a large number of animal image samples. Although a few wildlife datasets contain many mammals, most mammal images in these datasets are not annotated, and selecting mammalian images from such vast and comprehensive datasets remains time-consuming. As a result, there is currently a lack of specialized datasets of wild mammal images. To address these limitations, this article presents a mammal image dataset (named MammalClub) that contains three sub-datasets: a species recognition sub-dataset, an individual identification sub-dataset, and a behavior recognition sub-dataset. This study labeled the bounding boxes of the images used for species recognition and the coordinates of the mammals' skeletal joints for behavior recognition, and captured images of each individual from different points of view for individual identification. This study also explored novel intelligent animal recognition models and compared and analyzed them against mainstream models in order to test the dataset.
(This article belongs to the Special Issue Applications of Artificial Intelligence in Computer Vision)
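The MammalClub annotation format is not documented in the abstract; the sketch below only shows how a hypothetical COCO-style annotation file for the species recognition sub-dataset could be loaded, with all field names assumed for illustration.

```python
import json

def load_species_annotations(path):
    """Load a hypothetical COCO-style annotation file for species recognition.

    The actual MammalClub file layout is not documented here; the field
    names below are assumptions for illustration only.
    """
    with open(path) as f:
        data = json.load(f)
    id_to_file = {img["id"]: img["file_name"] for img in data["images"]}
    samples = []
    for ann in data["annotations"]:
        samples.append({
            "file_name": id_to_file[ann["image_id"]],
            "bbox": ann["bbox"],              # [x, y, w, h]
            "species_id": ann["category_id"],
        })
    return samples
```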

18 pages, 1533 KiB  
Article
EDKSANet: An Efficient Dual-Kernel Split Attention Neural Network for the Classification of Tibetan Medicinal Materials
by Jindong Qi, Bianba Wangdui, Jun Jiang, Jie Yang and Yanxia Zhou
Electronics 2023, 12(20), 4330; https://doi.org/10.3390/electronics12204330 - 19 Oct 2023
Viewed by 821
Abstract
Tibetan medicine has received wide acclaim for its unique diagnosis and treatment methods. The identification of Tibetan medicinal materials, which are a vital component of Tibetan medicine, is a key research area in this field. However, traditional deep learning-based visual neural networks face significant challenges in efficiently and accurately identifying Tibetan medicinal materials due to their large number, complex morphology, and the scarcity of public visual datasets. To address this issue, we constructed a computer vision dataset of 300 Tibetan medicinal materials and proposed a lightweight and efficient cross-dimensional attention mechanism, the Dual-Kernel Split Attention (DKSA) module, which can adaptively share kernel parameters across both the spatial and channel dimensions. Based on the DKSA module, we achieve an efficient unification of convolution and self-attention under the CNN architecture and develop a new lightweight backbone architecture, EDKSANet, which provides enhanced performance for various computer vision tasks. Compared with RedNet, EDKSANet improves top-1 accuracy by 1.2% on the ImageNet dataset and yields gains of +1.5 box AP for object detection and +1.3 mask AP for instance segmentation on the MS-COCO dataset. Moreover, EDKSANet achieves excellent classification performance on the Tibetan medicinal materials dataset, with an accuracy of up to 96.85%.
(This article belongs to the Special Issue Applications of Artificial Intelligence in Computer Vision)
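The internals of the DKSA module are only summarized above. As a loose illustration of dual-kernel split attention in the spirit of selective-kernel designs (not the paper's module), the PyTorch sketch below fuses a 3×3 and a 5×5 depthwise branch with channel-wise softmax weights.

```python
import torch
import torch.nn as nn

class DualKernelSplitAttention(nn.Module):
    """Illustrative dual-kernel split attention; NOT the paper's DKSA module,
    whose exact parameter-sharing scheme across spatial and channel
    dimensions is not reproduced here.
    """
    def __init__(self, dim, reduction=4):
        super().__init__()
        self.branch3 = nn.Conv2d(dim, dim, 3, padding=1, groups=dim, bias=False)
        self.branch5 = nn.Conv2d(dim, dim, 5, padding=2, groups=dim, bias=False)
        hidden = max(dim // reduction, 8)
        self.fc = nn.Sequential(                       # channel-attention weights
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(dim, hidden, 1), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, 2 * dim, 1))

    def forward(self, x):
        u3, u5 = self.branch3(x), self.branch5(x)       # two kernel branches
        weights = self.fc(u3 + u5)                      # (B, 2*dim, 1, 1)
        b, c = x.shape[0], x.shape[1]
        weights = torch.softmax(weights.view(b, 2, c, 1, 1), dim=1)
        return weights[:, 0] * u3 + weights[:, 1] * u5  # split-attention fusion

# x = torch.randn(1, 64, 32, 32); y = DualKernelSplitAttention(64)(x)
```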

17 pages, 11089 KiB  
Article
Efficient Object Placement via FTOPNet
by Guosheng Ye, Jianming Wang and Zizhong Yang
Electronics 2023, 12(19), 4106; https://doi.org/10.3390/electronics12194106 - 30 Sep 2023
Viewed by 830
Abstract
Image composition involves the placement of foreground objects at an appropriate scale within a background image to create a visually realistic composite image. However, manual operations for this task are time-consuming and labor-intensive. In this study, we propose an efficient method for foreground object placement, comprising a background feature extraction module (BFEM) designed for background images and a foreground–background cross-attention feature fusion module (FBCAFFM). The BFEM is capable of extracting precise and comprehensive information from the background image. The fused features enable the network to learn additional information related to foreground–background matching, aiding in the prediction of foreground object placement and size. Our experiments are conducted using the publicly available object placement assessment (OPA) dataset. Both quantitative and visual results demonstrate that FTOPNet effectively performs the foreground object placement task and offers a practical solution for image composition tasks.
(This article belongs to the Special Issue Applications of Artificial Intelligence in Computer Vision)
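As a toy illustration of foreground–background cross-attention for placement prediction (not the FTOPNet architecture), the sketch below lets foreground tokens attend to background tokens and regresses a normalized position and scale; the dimensions and the three-value output parameterization are assumptions.

```python
import torch
import torch.nn as nn

class PlacementHead(nn.Module):
    """Toy foreground-background cross-attention head regressing an (x, y)
    placement and a scale in [0, 1]; a loose illustration of the idea above.
    """
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(inplace=True),
                                  nn.Linear(dim, 3), nn.Sigmoid())

    def forward(self, fg_tokens, bg_tokens):
        # Foreground tokens query the background feature map.
        fused, _ = self.cross_attn(fg_tokens, bg_tokens, bg_tokens)
        return self.head(fused.mean(dim=1))   # (B, 3): x, y, scale

# fg = torch.randn(2, 16, 256); bg = torch.randn(2, 196, 256)
# placement = PlacementHead()(fg, bg)
```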
