Deep Learning in Multimedia and Computer Vision

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Electronic Multimedia".

Deadline for manuscript submissions: 15 August 2024 | Viewed by 3570

Special Issue Editors


Guest Editor
School of Computer Science and Engineering, Hefei University of Technology, Hefei 230009, China
Interests: deep learning; computer vision; image/video restoration, enhancement, and compression

Guest Editor
School of Artificial Intelligence, Anhui University, Hefei 230039, China
Interests: computer vision; neural architecture search; neural network compression

Special Issue Information

Dear Colleagues,

Image processing and computer vision have been important areas of research for several decades, and have numerous applications in various fields including healthcare, security, and robotics. In recent years, the development of deep learning techniques has revolutionized these fields, enabling researchers to achieve unprecedented levels of accuracy and performance in a wide range of tasks such as image recognition, segmentation, restoration, and object detection.

The success of deep learning in image processing and computer vision can be attributed to its ability to automatically learn complex feature representations from large datasets. This has allowed researchers to develop more accurate and robust models for a variety of applications. Moreover, the availability of large-scale datasets and powerful computational resources has facilitated the training of deep learning models, making it possible to tackle even more challenging problems.

Given the importance and potential of deep learning in image processing and computer vision, there is a need to bring together the latest research in this area to showcase the state-of-the-art techniques and identify future directions of research.

The objective of this Special Issue is to bring together cutting-edge research on the use of deep learning techniques in image processing and computer vision. The scope of this Special Issue includes, but is not limited to, the following topics:

  • Deep learning for image recognition and classification;
  • Deep learning for object detection and tracking;
  • Deep learning for image segmentation and feature extraction;
  • Deep learning for image restoration and enhancement;
  • Deep learning for 3D vision and reconstruction;
  • Deep learning for medical image analysis;
  • Deep learning for video processing and analysis.

Dr. Feng Li
Dr. Ke Xu
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and written in good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • deep learning
  • computer vision
  • convolutional neural network
  • learned image and video processing
  • multimedia analysis and applications

Published Papers (4 papers)


Research

18 pages, 7701 KiB  
Article
Detection-Free Object Tracking for Multiple Occluded Targets in Plenoptic Video
by Yunjeong Yong, Jiwoo Kang and Heeseok Oh
Electronics 2024, 13(3), 590; https://doi.org/10.3390/electronics13030590 - 31 Jan 2024
Viewed by 539
Abstract
Multiple object tracking (MOT) is a fundamental task in vision, but MOT techniques for plenoptic video are scarce. Most high-performing 2D MOT algorithms rely on detection-based methods, which have the disadvantage of operating only on specific object classes. To enable tracking of arbitrary desired objects, this paper introduces a detection-free tracking method for MOT in plenoptic videos. The proposed method departs from traditional detection-based tracking and addresses the challenge of tracking targets under occlusion. The paper presents specialized algorithms that exploit the multifocal information of plenoptic video, including focal range restriction and dynamic focal range adjustment schemes that make tracking of occluded objects robust. To improve spatial searching capability, an anchor ensemble and a dynamic spatial search region adjustment algorithm are also proposed. Additionally, to reduce the computation time of MOT, a motion-adaptive time scheduling technique is proposed, which improves computation speed while guaranteeing a certain level of accuracy. Experimental results show a significant improvement in tracking performance, with a 77% success rate based on intersection over union for occluded targets in plenoptic videos, marking a substantial advancement in plenoptic object tracking.
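The 77% figure above is a success rate computed from the intersection over union (IoU) between predicted and ground-truth boxes. For readers unfamiliar with that metric, below is a minimal sketch of how such a success rate is typically computed; the box format and the 0.5 threshold are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def success_rate(pred_boxes, gt_boxes, threshold=0.5):
    """Fraction of frames whose predicted box overlaps ground truth by at least `threshold` IoU."""
    hits = [iou(p, g) >= threshold for p, g in zip(pred_boxes, gt_boxes)]
    return float(np.mean(hits))
```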

16 pages, 2260 KiB  
Article
High Frequency Component Enhancement Network for Image Manipulation Detection
by Wenyan Pan, Wentao Ma, Xiaoqian Wu and Wei Liu
Electronics 2024, 13(2), 447; https://doi.org/10.3390/electronics13020447 - 21 Jan 2024
Viewed by 834
Abstract
With the support of deep neural networks, existing image manipulation detection (IMD) methods can effectively detect manipulated regions within a suspicious image. In general, manipulation operations (e.g., splicing, copy-move, and removal) tend to leave manipulation artifacts in the high-frequency domain of the image, which provides rich clues for locating manipulated regions. Inspired by this phenomenon, this paper proposes a High-Frequency Component Enhancement Network (HFCE-Net) for image manipulation detection, which aims to fully exploit the manipulation artifacts left in the high-frequency domain to improve localization performance in IMD tasks. Specifically, HFCE-Net consists of two parallel branches: the main stream and the high-frequency auxiliary branch (HFAB). The HFAB is introduced to fully explore high-frequency artifacts within manipulated images. To achieve this goal, the low-frequency component of the HFAB input image is filtered out by a Sobel filter, and the HFAB is supervised with the edge information of the manipulated regions. The main stream branch takes the RGB image as input and aggregates the features learned from the HFAB in a hierarchical manner through the proposed multi-layer fusion (MLF). Extensive experiments on widely used benchmarks demonstrate that HFCE-Net has a strong ability to capture high-frequency information within manipulated images. Moreover, HFCE-Net achieves competitive performance (57.3%, 90.9%, and 73.8% F1 on the CASIA, NIST, and Coverage datasets, respectively), an improvement of 1.9%, 9.0%, and 1.5% over existing methods.
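As context for the Sobel-based preprocessing described in the abstract, the sketch below shows one common way to suppress low-frequency image content and keep the high-frequency (edge) component. It is an illustrative sketch only; the exact filtering and normalization used by HFCE-Net may differ.

```python
import numpy as np
from scipy import ndimage

def sobel_high_frequency(image: np.ndarray) -> np.ndarray:
    """Approximate the high-frequency component of a grayscale image
    as the Sobel gradient magnitude (low frequencies are suppressed)."""
    img = image.astype(np.float32)
    gx = ndimage.sobel(img, axis=1)   # horizontal gradient
    gy = ndimage.sobel(img, axis=0)   # vertical gradient
    magnitude = np.hypot(gx, gy)
    # Normalize to [0, 1] so the map can serve as an auxiliary-branch input.
    return magnitude / (magnitude.max() + 1e-8)
```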

18 pages, 839 KiB  
Article
Image–Text Cross-Modal Retrieval with Instance Contrastive Embedding
by Ruigeng Zeng, Wentao Ma, Xiaoqian Wu, Wei Liu and Jie Liu
Electronics 2024, 13(2), 300; https://doi.org/10.3390/electronics13020300 - 9 Jan 2024
Viewed by 949
Abstract
Image–text cross-modal retrieval aims to bridge the semantic gap between different modalities, allowing images to be searched from textual descriptions or vice versa. Existing efforts in this field concentrate on coarse-grained feature representation and then use a pairwise ranking loss to pull image–text positive pairs closer and push negative ones apart. However, applying a pairwise ranking loss directly to coarse-grained representations is unreliable, as it disregards fine-grained information, making it difficult to narrow the semantic gap between image and text. To this end, we propose an Instance Contrastive Embedding (IConE) method for image–text cross-modal retrieval. Specifically, we first transfer a multi-modal pre-training model to the cross-modal retrieval task to leverage the interactive information between image and text, thereby enhancing the model's representational capability. Then, to comprehensively account for intra- and inter-modality feature distributions, we design a novel two-stage training strategy that combines an instance loss and a contrastive loss, dedicated to extracting fine-grained representations within instances and bridging the semantic gap between modalities. Extensive experiments on two public benchmark datasets, Flickr30k and MS-COCO, demonstrate that IConE outperforms several state-of-the-art (SoTA) baseline methods.
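The contrastive objective mentioned in the abstract is, in spirit, a symmetric cross-entropy loss over an image–text similarity matrix. The PyTorch sketch below illustrates that idea under that assumption; the embedding encoders, the instance-loss stage, and the temperature value are placeholders, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def image_text_contrastive_loss(img_emb: torch.Tensor,
                                txt_emb: torch.Tensor,
                                temperature: float = 0.07) -> torch.Tensor:
    """Symmetric contrastive loss: matched image-text pairs on the diagonal
    are pulled together, all other pairs in the batch are pushed apart."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature          # (B, B) similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_i2t = F.cross_entropy(logits, targets)           # image -> text direction
    loss_t2i = F.cross_entropy(logits.t(), targets)       # text -> image direction
    return 0.5 * (loss_i2t + loss_t2i)
```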

22 pages, 2291 KiB  
Article
An Enhanced Detection Method of PCB Defect Based on D-DenseNet (PCBDD-DDNet)
by Haiyan Kang and Yujie Yang
Electronics 2023, 12(23), 4737; https://doi.org/10.3390/electronics12234737 - 22 Nov 2023
Viewed by 780
Abstract
Printed circuit boards (PCBs), as integral components of electronic products, play a crucial role in modern industrial production. However, owing to the precision and complexity of PCBs, existing PCB defect detection methods suffer from issues such as low detection accuracy and limited usability. To address these problems, a PCB defect detection method based on D-DenseNet (PCBDD-DDNet) is proposed. This method capitalizes on the advantages of two deep learning networks, CDBN (Convolutional Deep Belief Networks) and DenseNet (Densely Connected Convolutional Networks), to construct the D-DenseNet (a combination of CDBN and DenseNet). Within this network, CDBN focuses on extracting low-level features, while DenseNet is responsible for high-level feature extraction; the outputs of both networks are integrated by weighted averaging. Additionally, D-DenseNet employs a multi-scale module to extract features at different levels, incorporating filters of sizes 3 × 3, 5 × 5, and 7 × 7 along the three paths of the CDBN network, the multi-scale feature extraction network, and the DenseNet network, effectively capturing information at various scales. To prevent overfitting and enhance network performance, the Adafactor optimizer and L2 regularization are introduced. Finally, an online hard example mining (OHEM) mechanism is incorporated to improve the network's handling of challenging samples and enhance the accuracy of PCB defect detection. The effectiveness of the PCBDD-DDNet method is demonstrated through experiments on publicly available PCB datasets, where it achieves a mAP (mean Average Precision) of 93.24%, higher than other classical networks, affirming the method's efficacy in PCB defect detection.
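To make the multi-scale module and the weighted averaging of branch outputs concrete, here is a hedged PyTorch sketch. Channel sizes, the fusion weight alpha, and the overall wiring are assumptions for illustration only, not the exact D-DenseNet configuration from the paper.

```python
import torch
import torch.nn as nn

class MultiScaleBlock(nn.Module):
    """Extract features at several receptive fields (3x3, 5x5, 7x7) and concatenate them."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, kernel_size=k, padding=k // 2)
            for k in (3, 5, 7)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Same spatial size on every path, so the outputs can be concatenated channel-wise.
        return torch.cat([branch(x) for branch in self.branches], dim=1)

def weighted_average(low_level: torch.Tensor,
                     high_level: torch.Tensor,
                     alpha: float = 0.5) -> torch.Tensor:
    """Fuse the outputs of two branches (e.g., CDBN-style low-level and
    DenseNet-style high-level features) by a simple weighted average."""
    return alpha * low_level + (1.0 - alpha) * high_level
```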
