Advanced Convolutional Neural Network (CNN) Technology in Object Detection and Data Processing

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: 20 December 2024 | Viewed by 7110

Special Issue Editor


Guest Editor
Inria (Institut National de Recherche en Informatique et en Automatique), Le Chesnay, France
Interests: attention mechanism; reinforcement learning; generative learning; causal inference

Special Issue Information

Dear Colleagues,

Convolutional neural networks (CNNs) and related deep neural networks have seen great success in machine learning and computer vision. Advanced CNNs, such as Fast R-CNN and Faster R-CNN, have achieved breakthrough performance in object detection. More recently, transformer models have been widely applied to classification, object detection, and multimodal machine learning tasks.

To further boost research on and the application of advanced deep neural networks in computer vision, this Special Issue aims to collect advanced deep neural networks and algorithms in the field of computer vision and related areas. We encourage the submission of research papers on, but not restricted to, object detection, image segmentation, and classification.

Dr. Shiyang Yan
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss francs). Submitted papers should be well formatted and written in good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • convolutional neural network
  • computer vision
  • object detection
  • image segmentation
  • image classification

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (6 papers)


Research

17 pages, 4745 KiB  
Article
Implementing YOLO Convolutional Neural Network for Seed Size Detection
by Jakub Pawłowski, Marcin Kołodziej and Andrzej Majkowski
Appl. Sci. 2024, 14(14), 6294; https://doi.org/10.3390/app14146294 - 19 Jul 2024
Viewed by 425
Abstract
The article presents research on the application of image processing techniques and convolutional neural networks (CNNs) for the detection and measurement of seed sizes, specifically focusing on coffee and white bean seeds. The primary objective of the study is to evaluate the potential of using CNNs to develop tools that automate seed recognition and measurement in images. A database was created, containing photographs of coffee and white bean seeds with precise annotations of their location and type. Image processing techniques and You Only Look Once v8 (YOLOv8) models were employed to analyze the seeds' position, size, and type. A detailed comparison of the effectiveness and performance of the applied methods was conducted. The experiments demonstrated that the best-trained CNN model achieved a segmentation accuracy of 90.1% IoU, with an average seed size error of 0.58 mm. The conclusions indicate significant potential for using image processing techniques and CNN models to automate seed analysis, which could increase the efficiency and accuracy of these processes.
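The segmentation accuracy reported above is intersection-over-union (IoU). As a point of reference, here is a minimal NumPy sketch of IoU between two binary masks; the toy masks are illustrative and not taken from the paper's data.

```python
import numpy as np

def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection-over-Union between two binary segmentation masks."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(inter) / float(union) if union else 1.0

# Toy 4x4 masks: 3 positive pixels each, 2 overlapping
# -> intersection 2, union 4, IoU = 0.5
a = np.zeros((4, 4)); a[1, 0:3] = 1
b = np.zeros((4, 4)); b[1, 1:4] = 1
print(mask_iou(a, b))  # -> 0.5
```

A reported "90.1% IoU" thus means the predicted seed masks overlap the annotated masks by that fraction of their union, averaged over the test set.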

15 pages, 2625 KiB  
Article
Segmentation of Liver Tumors by Monai and PyTorch in CT Images with Deep Learning Techniques
by Sabir Muhammad and Jing Zhang
Appl. Sci. 2024, 14(12), 5144; https://doi.org/10.3390/app14125144 - 13 Jun 2024
Viewed by 819
Abstract
Image segmentation and identification are crucial to modern medical image processing techniques. This research provides a novel and effective method for identifying and segmenting liver tumors in public CT images. Our approach leverages the hybrid ResUNet model, a combination of the ResNet and UNet models, developed with the MONAI and PyTorch frameworks. The ResNet deep dense network architecture is implemented on public CT scans using the MSD Task03 Liver dataset. The novelty of our method lies in several key aspects. First, we introduce innovative enhancements to the ResUNet architecture, optimizing its performance specifically for liver tumor segmentation tasks. Additionally, by harnessing the capabilities of MONAI, we streamline the implementation process, eliminating the need for manual script writing and enabling faster, more efficient model development and optimization. Preparing images for analysis by a deep neural network involves several steps: data augmentation, Hounsfield unit windowing, and image normalization. ResUNet network performance is measured using the Dice coefficient (DC). This approach, which utilizes residual connections, has proven more reliable than other existing techniques, achieving DC values of 0.98 for detecting liver tumors and 0.87 for segmentation. Both qualitative and quantitative evaluations show promising results regarding model precision and accuracy. This research could increase the precision and accuracy of liver tumor detection and segmentation, which could help in the early diagnosis and treatment of liver cancer and ultimately improve patient prognosis.
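The Dice coefficient (DC) used above is a standard overlap metric for medical segmentation. A minimal NumPy sketch, with an illustrative toy example rather than the paper's data:

```python
import numpy as np

def dice_coefficient(pred, gt, eps=1e-7):
    """Dice coefficient DC = 2*|A ∩ B| / (|A| + |B|) for binary masks.
    eps avoids division by zero when both masks are empty."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    inter = np.logical_and(pred, gt).sum()
    return (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)

# Toy 1-D masks: 3 positive pixels each, 2 overlapping -> DC = 4/6 ≈ 0.667
p = np.array([1, 1, 1, 0, 0])
g = np.array([0, 1, 1, 1, 0])
print(round(dice_coefficient(p, g), 3))  # -> 0.667
```

DC lies in [0, 1], so values such as 0.98 and 0.87 indicate very high overlap between predicted and ground-truth tumor regions; DC relates to IoU by DC = 2·IoU / (1 + IoU).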

18 pages, 3884 KiB  
Article
A Method for Underwater Biological Detection Based on Improved YOLOXs
by Heng Wang, Pu Zhang, Mengnan You and Xinyuan You
Appl. Sci. 2024, 14(8), 3196; https://doi.org/10.3390/app14083196 - 10 Apr 2024
Viewed by 813
Abstract
This article proposes a lightweight underwater biological target detection network based on improvements to YOLOXs, addressing the challenges of complex and dynamic underwater environments, limited memory in underwater devices, and constrained computational capabilities. First, in the backbone network, GhostConv and GhostBottleneck are introduced to replace standard convolutions and the Bottleneck1 structure in CSPBottleneck_1, significantly reducing the model's parameter count and computational load and facilitating the construction of a lightweight network. Next, in the feature fusion network, a Contextual Transformer block replaces the 3 × 3 convolution in CSPBottleneck_2. This enhances self-attention learning by leveraging the rich context between input keys, improving the model's representational capacity. Finally, the localization loss function Focal_EIoU Loss replaces IoU Loss, enhancing the model's robustness and generalization ability and leading to faster, more accurate convergence during training. Our experimental results demonstrate that, compared to the YOLOXs model, the proposed YOLOXs-GCE achieves a 1.1% improvement in mAP while reducing parameters by 24.47%, computational load by 26.39%, and model size by 23.87%. This effectively enhances the detection performance of the model, making it suitable for complex and dynamic underwater environments, as well as underwater devices with limited memory. The model meets the requirements of underwater target detection tasks.
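To see why swapping standard convolutions for Ghost modules shrinks the parameter count, a back-of-the-envelope sketch helps. The channel sizes, the split ratio s=2, and the 3 × 3 cheap depthwise kernel below are illustrative assumptions (following the common GhostNet defaults), not figures from this paper.

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Parameters of a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k

def ghost_params(c_in: int, c_out: int, k: int, s: int = 2, d: int = 3) -> int:
    """GhostConv: a primary conv produces c_out // s 'intrinsic' maps,
    then cheap d x d depthwise ops generate the remaining maps."""
    intrinsic = c_out // s
    primary = c_in * intrinsic * k * k   # ordinary convolution
    cheap = (c_out - intrinsic) * d * d  # depthwise: one filter per map
    return primary + cheap

std = conv_params(128, 256, 3)     # 294,912 parameters
ghost = ghost_params(128, 256, 3)  # 147,456 + 1,152 = 148,608 parameters
print(f"reduction: {1 - ghost / std:.1%}")  # -> reduction: 49.6%
```

With s=2 the Ghost module roughly halves the parameters of the layer it replaces, which is consistent in spirit with the ~24% whole-model reduction reported above (only some layers are replaced).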

27 pages, 28358 KiB  
Article
Fast Coherent Video Style Transfer via Flow Errors Reduction
by Li Wang, Xiaosong Yang and Jianjun Zhang
Appl. Sci. 2024, 14(6), 2630; https://doi.org/10.3390/app14062630 - 21 Mar 2024
Viewed by 716
Abstract
For video style transfer, naively applying still-image techniques to process a video frame by frame independently often causes flickering artefacts. Some works adopt optical flow in the design of a temporal constraint loss to secure temporal consistency. However, these works still suffer from incoherence (including ghosting artefacts) where large motions or occlusions occur, as optical flow fails to detect the boundaries of objects accurately. To address this problem, we propose a novel framework which consists of the following two stages: (1) creating new initialization images from the proposed mask techniques, which significantly reduce the flow errors; (2) processing these initialized images iteratively with the proposed losses to obtain stylized videos that are free of artefacts, which also increases the speed from over 3 min per frame to less than 2 s per frame for gradient-based optimization methods. Specifically, we propose a multi-scale mask fusion scheme to reduce untraceable flow errors and obtain an incremental mask to reduce ghosting artefacts. In addition, a multi-frame mask fusion scheme is designed to reduce traceable flow errors. In our proposed losses, the Sharpness Losses deal with potential image blurriness artefacts over long-range frames, and the Coherent Losses restrict the temporal consistency at both the multi-frame RGB level and the feature level. Overall, our approach produces stable video stylization outputs even in large-motion or occlusion scenarios. The experiments demonstrate that the proposed method outperforms state-of-the-art video style transfer methods qualitatively and quantitatively on the MPI Sintel dataset.
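The flow-based temporal constraint mentioned above is commonly implemented as a masked error between the current stylized frame and the previous stylized frame warped forward by optical flow, with untraceable pixels masked out. The following NumPy sketch shows that generic pattern with toy values; it is not the paper's specific loss formulation.

```python
import numpy as np

def temporal_loss(stylized_t, warped_prev, mask):
    """Masked MSE between the stylized frame at time t and the previous
    stylized frame warped to t by optical flow. mask = 0 marks pixels the
    flow cannot trace (occlusions, flow errors), so they are ignored."""
    diff = (stylized_t - warped_prev) ** 2
    valid = mask.sum()
    return float((mask * diff).sum() / valid) if valid else 0.0

# Toy 2x2 frames; the bottom-right pixel is occluded and masked out.
cur = np.array([[1.0, 2.0], [3.0, 4.0]])
prev_warped = np.zeros((2, 2))
mask = np.array([[1.0, 1.0], [1.0, 0.0]])
print(temporal_loss(cur, prev_warped, mask))  # -> (1 + 4 + 9) / 3 ≈ 4.667
```

The paper's contribution is precisely in how that mask is constructed (multi-scale and multi-frame mask fusion) so that fewer flow errors leak into the loss.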

23 pages, 7009 KiB  
Article
PGDS-YOLOv8s: An Improved YOLOv8s Model for Object Detection in Fisheye Images
by Degang Yang, Jie Zhou, Tingting Song, Xin Zhang and Yingze Song
Appl. Sci. 2024, 14(1), 44; https://doi.org/10.3390/app14010044 - 20 Dec 2023
Cited by 2 | Viewed by 2399
Abstract
Recently, object detection has become a research hotspot in computer vision; most detectors process regular images with small viewing angles. To obtain a field of view without blind spots, fisheye cameras, which have a wide viewing angle, have come into use, including on unmanned aerial vehicles. However, distorted and discontinuous objects appear in the captured fisheye images due to the unique viewing angle of fisheye cameras, which poses a significant challenge to many existing object detectors. To solve this problem, this paper proposes the PGDS-YOLOv8s model for detecting distorted and discontinuous objects in fisheye images. First, two novel downsampling modules are proposed. The Max Pooling and Ghost's Downsampling (MPGD) module effectively extracts the essential feature information of distorted and discontinuous objects, and the Average Pooling and Ghost's Downsampling (APGD) module acquires rich global features and reduces the feature loss of such objects. In addition, the proposed C2fs module uses Squeeze-and-Excitation (SE) blocks to model the interdependence of the channels, acquiring richer gradient flow information about the features and a better understanding of the contextual information in fisheye images. Subsequently, an SE block is added after the Spatial Pyramid Pooling Fast (SPPF) module, improving the model's ability to capture features of distorted, discontinuous objects. Moreover, the UAV-360 dataset is created for object detection in fisheye images. Finally, experiments show that the proposed PGDS-YOLOv8s model improves mAP@0.5 by 19.8% and mAP@0.5:0.95 by 27.5% on the VOC-360 dataset compared to the original YOLOv8s model. In addition, the improved model achieves 89.0% mAP@0.5 and 60.5% mAP@0.5:0.95 on the UAV-360 dataset. Furthermore, on the MS-COCO 2017 dataset, the PGDS-YOLOv8s model improves AP by 1.4%, AP50 by 1.7%, and AP75 by 1.2% compared with the original YOLOv8s model.
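The Squeeze-and-Excitation (SE) block used throughout this model is a standard channel-attention unit: squeeze via global average pooling, excitation via a two-layer bottleneck ending in a sigmoid, then channel-wise rescaling. A minimal NumPy sketch with random illustrative weights (not the paper's trained parameters):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block(x, w1, w2):
    """Squeeze-and-Excitation on a feature map x of shape (C, H, W):
    squeeze = global average pool per channel; excitation = two FC
    layers (ReLU then sigmoid); scale = channel-wise reweighting."""
    squeeze = x.mean(axis=(1, 2))            # (C,) channel descriptors
    hidden = np.maximum(w1 @ squeeze, 0.0)   # bottleneck FC + ReLU, (C//r,)
    scale = sigmoid(w2 @ hidden)             # expand FC + sigmoid, (C,)
    return x * scale[:, None, None]          # reweight each channel

rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2                      # r = reduction ratio
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C))
w2 = rng.standard_normal((C, C // r))
y = se_block(x, w1, w2)
print(y.shape)  # -> (8, 4, 4)
```

Because the sigmoid keeps every scale in (0, 1), the block can only attenuate channels relative to one another, letting the network emphasize channels that matter for distorted objects while leaving the spatial resolution untouched.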

13 pages, 2895 KiB  
Article
Improved Lightweight Multi-Target Recognition Model for Live Streaming Scenes
by Zongwei Li, Kai Qiao, Jianing Chen, Zhenyu Li and Yanhui Zhang
Appl. Sci. 2023, 13(18), 10170; https://doi.org/10.3390/app131810170 - 10 Sep 2023
Viewed by 1050
Abstract
Nowadays, the commercial potential of live e-commerce is being continuously explored, and machine vision algorithms are gradually attracting the attention of marketers and researchers. During live streaming, the visuals can be effectively captured by algorithms, thereby providing additional data support. Considering the diversity of live streaming devices, this paper proposes an extremely lightweight, high-precision model to meet different requirements in live streaming scenarios. Building upon YOLOv5s, we incorporate the MobileNetV3 module and the CA attention mechanism to optimize the model. Furthermore, we construct a multi-object dataset specific to live streaming scenarios, including anchor facial expressions and commodities. A series of experiments demonstrated that our model realized a 0.4% improvement in accuracy compared to the original model, while reducing its weight to 10.52%.
