New Insights into Computer Vision and Graphics

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: 30 April 2025 | Viewed by 7248

Special Issue Editor


Dr. Yuanyuan Liu
Guest Editor
Department of Information Engineering, China University of Geosciences, Wuhan 430075, China
Interests: computer vision; deep learning; image and video understanding

Special Issue Information

Dear Colleagues,

Application trends, device technologies, and the blurring of boundaries between disciplines are propelling information technology forward. This poses new challenges for the study of visual computing-based interactive graphics processing technology. This Special Issue therefore intends to present new ideas and experimental findings in the field of computer vision and graphics, from design, services, and theory to applications.

Computer vision and graphics focus on the computational processing and applications of visual data. Areas relevant to computer vision and graphics include, but are not limited to, robotics, medical imaging, security and surveillance, gaming and entertainment, education and training, art and design, and environmental monitoring. Topics of interest include high-speed processing techniques and real-time performance, the development and refinement of deep learning techniques for computer vision and graphics applications, and explainable AI techniques that improve the transparency and interpretability of AI models.

This Special Issue will publish high-quality, original research papers in overlapping fields, including the following:

  • Image processing/analysis;
  • Computer vision theory and application;
  • Video and audio encoding;
  • Motion detection and tracking;
  • Reconstruction and representation;
  • Facial and hand gesture recognition;
  • Rendering techniques;
  • Matching, inference, and recognition;
  • Geometric modeling;
  • 3D vision;
  • Graph-based learning and applications.

Dr. Yuanyuan Liu
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website. Once registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • image processing/analysis
  • computer vision theory and application
  • video and audio encoding
  • motion detection and tracking
  • reconstruction and representation
  • facial and hand gesture recognition
  • rendering techniques
  • matching, inference, and recognition
  • geometric modeling
  • 3D vision
  • graph-based learning and applications

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (5 papers)


Research

21 pages, 5326 KiB  
Article
6-DoF Pose Estimation from Single RGB Image and CAD Model Retrieval Using Feature Similarity Measurement
by Sieun Park, Won-Je Jeong, Mayura Manawadu and Soon-Yong Park
Appl. Sci. 2025, 15(3), 1501; https://doi.org/10.3390/app15031501 - 1 Feb 2025
Viewed by 613
Abstract
This study presents six degrees of freedom (6-DoF) pose estimation of an object from a single RGB image and retrieval of the matching CAD model by measuring the similarity between the RGB image and CAD rendering images. The 6-DoF pose estimation of an RGB object is one of the key techniques in 3D computer vision. However, in addition to 6-DoF pose estimation, retrieval and alignment of the matching CAD model with the RGB object must be performed for various industrial applications such as eXtended Reality (XR), Augmented Reality (AR), and robotic pick-and-place. This paper addresses the 6-DoF pose estimation and CAD model retrieval problems simultaneously and quantitatively analyzes how much the 6-DoF pose estimation affects CAD model retrieval performance. The study consists of two main steps. The first step is 6-DoF pose estimation based on the PoseContrast network. We enhance the structure of PoseContrast by adding variance uncertainty weight and feature attention modules. The second step is retrieval of the matching CAD model by an image similarity measurement between the CAD rendering and the RGB object. In our experiments, we used 2000 RGB images collected from the Google and Bing search engines and 100 CAD models from ShapeNetCore. The Pascal3D+ dataset is used to train the pose estimation network, and DELF features are used for the similarity measurement. Comprehensive ablation studies of the proposed network quantify its performance with respect to the baseline model. Experimental results show that pose estimation performance has a positive correlation with CAD retrieval performance.
(This article belongs to the Special Issue New Insights into Computer Vision and Graphics)
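To make the retrieval step concrete, the following is a minimal illustrative sketch (not the authors' implementation) of selecting the best-matching CAD model by comparing a feature descriptor of the RGB object with descriptors of CAD renderings; render_cad and extract_features are hypothetical helpers standing in for the rendering pipeline and a DELF-style feature extractor.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two flattened feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def retrieve_best_cad(rgb_features, cad_models, estimated_pose, render_cad, extract_features):
    """Return the CAD model whose rendering at the estimated 6-DoF pose is most
    similar to the query RGB object's features (illustrative only)."""
    best_model, best_score = None, -np.inf
    for model in cad_models:
        rendering = render_cad(model, estimated_pose)   # hypothetical renderer
        cad_features = extract_features(rendering)      # hypothetical DELF-style descriptor
        score = cosine_similarity(np.ravel(rgb_features), np.ravel(cad_features))
        if score > best_score:
            best_model, best_score = model, score
    return best_model, best_score
```

In this setting, a more accurate pose estimate yields renderings that better match the query view, which is consistent with the reported correlation between pose accuracy and retrieval performance.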

21 pages, 7424 KiB  
Article
Neural Network Ensemble to Detect Dicentric Chromosomes in Metaphase Images
by Ignacio Atencia-Jiménez, Adayabalam S. Balajee, Miguel J. Ruiz-Gómez, Francisco Sendra-Portero, Alegría Montoro and Miguel A. Molina-Cabello
Appl. Sci. 2024, 14(22), 10440; https://doi.org/10.3390/app142210440 - 13 Nov 2024
Viewed by 1197
Abstract
The Dicentric Chromosome Assay (DCA) is widely used in biological dosimetry, where the number of dicentric chromosomes induced by ionizing radiation (IR) exposure is quantified to estimate the absorbed radiation dose an individual has received. Dicentric chromosome scoring is a laborious and time-consuming process that is performed manually in most cytogenetic biodosimetry laboratories. Further, dicentric chromosome scoring constitutes a bottleneck when several hundred samples need to be analyzed for dose estimation in the aftermath of large-scale radiological/nuclear incidents. Recently, much interest has focused on automating dicentric chromosome scoring using Artificial Intelligence (AI) tools to reduce analysis time and improve the accuracy of dicentric chromosome detection. Our study aims to detect dicentric chromosomes in metaphase plate images using an ensemble of artificial neural network detectors suitable for datasets with a small number of samples (in this work, only 50 images). In our approach, the input image is first processed by several operators, each producing a transformed image. Each transformed image is then transferred to a specific detector trained on a training set processed by the same operator that transformed the image. The detectors then provide their predictions about the detected chromosomes. Finally, all predictions are combined using a consensus function. Regarding the operators used, images were binarized separately using Otsu and spline techniques, while morphological opening and closing filters of different sizes were used to eliminate noise, isolate specific components, and enhance the structures of interest (chromosomes) within the image. Consensus-based decisions are typically more precise than those made by individual networks, as the consensus method can rectify certain misclassifications provided that most of the individual networks are correct. The results indicate that our methodology worked satisfactorily in detecting the majority of chromosomes, with remarkable classification performance even with the low number of training samples utilized. AI-based dicentric chromosome detection will be beneficial for rapid triage by improving the detection of dicentric chromosomes and thereby the accuracy of dose prediction.
(This article belongs to the Special Issue New Insights into Computer Vision and Graphics)
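The operator/detector ensemble with a consensus function can be sketched as below. This is a simplified illustration, not the authors' code: it assumes each operator is a callable image transform (e.g., Otsu binarization, morphological opening/closing), each detector returns candidate chromosome centroids as (x, y) points, and the consensus is a plain majority vote over spatially matching candidates.

```python
import numpy as np

def ensemble_detect(image, operators, detectors, tol=10.0, min_votes=None):
    """Run each operator/detector pair on the metaphase image and keep only
    chromosome candidates supported by a consensus of detectors.

    operators : list of callables, each producing a transformed image.
    detectors : list of callables; detectors[i] was trained on images
                transformed by operators[i] and returns (x, y) centroids.
    """
    if min_votes is None:
        min_votes = len(detectors) // 2 + 1  # simple majority

    all_preds = [np.asarray(det(op(image)), dtype=float)
                 for op, det in zip(operators, detectors)]

    candidates = [p for p in all_preds if len(p)] or [np.empty((0, 2))]
    consensus = []
    for cand in np.concatenate(candidates):
        # count how many detectors produced a candidate within `tol` pixels
        votes = sum(
            1 for preds in all_preds
            if len(preds) and np.min(np.linalg.norm(preds - cand, axis=1)) < tol
        )
        already_kept = any(
            np.linalg.norm(np.asarray(kept) - cand) < tol for kept in consensus
        )
        if votes >= min_votes and not already_kept:
            consensus.append(tuple(cand))
    return consensus
```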

18 pages, 1493 KiB  
Article
Hypergraph Position Attention Convolution Networks for 3D Point Cloud Segmentation
by Yanpeng Rong, Liping Nong, Zichen Liang, Zhuocheng Huang, Jie Peng and Yiping Huang
Appl. Sci. 2024, 14(8), 3526; https://doi.org/10.3390/app14083526 - 22 Apr 2024
Viewed by 1769
Abstract
Point cloud segmentation, as the basis for 3D scene understanding and analysis, has made significant progress in recent years. Graph-based modeling and learning methods have played an important role in point cloud segmentation. However, due to the inherent complexity of point cloud data, it is difficult to capture higher-order and complex features of 3D data using graph learning methods. In addition, how to quickly and efficiently extract important features from point clouds also poses a great challenge to current research. To address these challenges, we propose a new framework, called hypergraph position attention convolution networks (HGPAT), for point cloud segmentation. Firstly, we use a hypergraph to model the higher-order relationships among points in the point cloud. Secondly, in order to effectively learn the feature information of point cloud data, a hyperedge position attention convolution module is proposed, which utilizes the hyperedge–hyperedge propagation pattern to extract and aggregate the more important features. Finally, we design a ResNet-like module to reduce the computational complexity of the network and improve its efficiency. We have conducted point cloud segmentation experiments on the ShapeNet Part and S3DIS datasets, and the experimental results demonstrate the effectiveness of the proposed method compared with state-of-the-art ones.
(This article belongs to the Special Issue New Insights into Computer Vision and Graphics)
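As a rough illustration of the hypergraph modeling step, the sketch below builds one hyperedge per point from its k nearest neighbours and aggregates vertex features per hyperedge by mean pooling. The actual HGPAT module uses position attention and hyperedge–hyperedge propagation, which this simplification omits; it is only meant to show how a point cloud can be lifted to hyperedges.

```python
import numpy as np

def build_knn_hyperedges(points: np.ndarray, k: int = 16) -> np.ndarray:
    """Build one hyperedge per point: the point plus its k nearest neighbours.
    Returns an (N, k+1) index array; row i lists the members of hyperedge i.
    Brute-force distances, adequate for small clouds in a sketch."""
    d2 = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    return np.argsort(d2, axis=1)[:, : k + 1]

def hyperedge_aggregate(features: np.ndarray, hyperedges: np.ndarray) -> np.ndarray:
    """Aggregate vertex features (N, C) into hyperedge features (N, C) by mean
    pooling, a stand-in for the attention-weighted aggregation in HGPAT."""
    return features[hyperedges].mean(axis=1)
```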

13 pages, 12039 KiB  
Article
Camera Path Generation for Triangular Mesh Using Toroidal Patches
by Jinyoung Choi, Kangmin Kim, Seongil Kim, Minseok Kim, Taekgwan Nam and Youngjin Park
Appl. Sci. 2024, 14(2), 490; https://doi.org/10.3390/app14020490 - 5 Jan 2024
Viewed by 1291
Abstract
Triangular mesh data structures are fundamental in computer graphics, serving as the foundation for many 3D models. To effectively utilize these 3D models across diverse industries, it is important to understand a model’s overall shape and geometric features thoroughly. In this work, we introduce a novel method for generating camera paths that emphasize a model’s local geometric characteristics. The method uses a toroidal patch-based spatial data structure, approximating the mesh’s faces within a predetermined tolerance ϵ and encapsulating their geometric intricacies. This facilitates the determination of the camera position and gaze path, ensuring the mesh’s key characteristics are captured. During path construction, we create a bounding cylinder for the mesh, project the mesh’s faces and associated toroidal patches onto the cylinder’s lateral surface, and sequentially select the grid cells of the cylinder containing the highest number of toroidal patches as we traverse the lateral surface. The centers of the selected grid cells are used as control points for a periodic B-spline curve, which serves as our foundational path. After initial curve generation, we derive the camera position and gaze paths from the curve by applying scaling factors to ensure a uniform camera amplitude. We applied our method to ten triangular mesh models, demonstrating its effectiveness and adaptability across various mesh configurations.
(This article belongs to the Special Issue New Insights into Computer Vision and Graphics)
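A schematic version of the grid-selection step is sketched below. It assumes patch centres are given as 3D points with the cylinder axis along z, bins them over the cylinder's lateral surface, picks the densest cell per angular column, and fits a periodic cubic B-spline through the cell centres using SciPy's splprep/splev. This is an illustrative approximation, not the authors' pipeline.

```python
import numpy as np
from scipy.interpolate import splprep, splev

def cylinder_grid_control_points(patch_points, radius, n_theta=36, n_z=12):
    """Bin toroidal-patch centres over a bounding cylinder's lateral surface
    (angle x height grid) and, for each angular column, take the centre of the
    cell containing the most patches as a control point."""
    pts = np.asarray(patch_points, dtype=float)
    theta = np.arctan2(pts[:, 1], pts[:, 0])                # angle around the z axis
    z = pts[:, 2]
    t_bins = np.linspace(-np.pi, np.pi, n_theta + 1)
    z_bins = np.linspace(z.min(), z.max(), n_z + 1)
    counts, _, _ = np.histogram2d(theta, z, bins=[t_bins, z_bins])

    controls = []
    for i in range(n_theta):                                # traverse the lateral surface
        j = int(np.argmax(counts[i]))                       # densest cell in this column
        tc = 0.5 * (t_bins[i] + t_bins[i + 1])
        zc = 0.5 * (z_bins[j] + z_bins[j + 1])
        controls.append((radius * np.cos(tc), radius * np.sin(tc), zc))
    return np.asarray(controls)

def periodic_path(controls, n_samples=200):
    """Fit a periodic cubic B-spline through the control points and sample it."""
    x, y, z = controls.T
    tck, _ = splprep([x, y, z], s=0, per=1)
    u = np.linspace(0, 1, n_samples)
    return np.stack(splev(u, tck), axis=1)
```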

28 pages, 4448 KiB  
Article
ED2IF2-Net: Learning Disentangled Deformed Implicit Fields and Enhanced Displacement Fields from Single Images Using Pyramid Vision Transformer
by Xiaoqiang Zhu, Xinsheng Yao, Junjie Zhang, Mengyao Zhu, Lihua You, Xiaosong Yang, Jianjun Zhang, He Zhao and Dan Zeng
Appl. Sci. 2023, 13(13), 7577; https://doi.org/10.3390/app13137577 - 27 Jun 2023
Cited by 1 | Viewed by 1471
Abstract
Substantial research has emerged on single-view 3D reconstruction, and the majority of state-of-the-art implicit methods employ CNNs as the backbone network. Transformers, on the other hand, have shown remarkable performance in many vision tasks; however, it is still unknown whether transformers are suitable for single-view implicit 3D reconstruction. In this paper, we propose the first end-to-end single-view 3D reconstruction network based on the Pyramid Vision Transformer (PVT), called ED2IF2-Net, which disentangles the reconstruction of an implicit field into the reconstruction of topological structures and the recovery of surface details to achieve high-fidelity shape reconstruction. ED2IF2-Net uses a Pyramid Vision Transformer encoder to extract multi-scale hierarchical local features and a global vector from the input single image, which are fed into three separate decoders. A coarse shape decoder reconstructs a coarse implicit field based on the global vector, a deformation decoder iteratively refines the coarse implicit field using the pixel-aligned local features to obtain a deformed implicit field through multiple implicit field deformation blocks (IFDBs), and a surface detail decoder predicts an enhanced displacement field using the local features with hybrid attention modules (HAMs). The final output is a fusion of the deformed implicit field and the enhanced displacement field, with four loss terms applied to reconstruct the coarse implicit field, structure details through a novel deformation loss, the overall shape after fusion, and surface details via a Laplacian loss. Quantitative results on the ShapeNet dataset validate the performance of ED2IF2-Net. Notably, ED2IF2-Net-L is the top-performing variant, with mean IoU, CD, EMD, ECD-3D, and ECD-2D scores of 61.1, 7.26, 2.51, 6.08, and 1.84, respectively. Extensive experimental evaluations consistently demonstrate the state-of-the-art capability of ED2IF2-Net in reconstructing topological structures and recovering surface details, all while maintaining competitive inference time.
(This article belongs to the Special Issue New Insights into Computer Vision and Graphics)
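The decoder/fusion structure described in the abstract can be summarized with a toy PyTorch skeleton. The real network uses a Pyramid Vision Transformer encoder, pixel-aligned local features, IFDBs, and HAMs, all of which are collapsed here into simple placeholder layers, so this is only a structural sketch under those assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class SingleViewImplicitNet(nn.Module):
    """Schematic skeleton: an encoder yields a global vector, three decoders
    produce a coarse implicit field, a deformation refinement, and a
    displacement field, and the outputs are fused."""

    def __init__(self, feat_dim=256):
        super().__init__()
        # placeholder encoder; the paper uses a Pyramid Vision Transformer
        self.encoder = nn.Sequential(nn.Conv2d(3, feat_dim, 3, 2, 1), nn.ReLU(),
                                     nn.AdaptiveAvgPool2d(1))
        self.coarse_decoder = nn.Linear(feat_dim + 3, 1)       # global vector + query point
        self.deform_decoder = nn.Linear(feat_dim + 3 + 1, 1)   # refines the coarse value
        self.detail_decoder = nn.Linear(feat_dim + 3, 1)       # displacement field

    def forward(self, image, query_points):
        # image: (B, 3, H, W); query_points: (B, N, 3)
        g = self.encoder(image).flatten(1)                     # (B, feat_dim) global vector
        g = g.unsqueeze(1).expand(-1, query_points.shape[1], -1)
        x = torch.cat([g, query_points], dim=-1)
        coarse = self.coarse_decoder(x)
        deformed = coarse + self.deform_decoder(torch.cat([x, coarse], dim=-1))
        displacement = self.detail_decoder(x)
        return deformed + displacement                         # fused implicit value
```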
