Machine Learning for Object Detection and Scene Description in Images and Videos

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: 20 March 2025 | Viewed by 1576

Special Issue Editors


Guest Editor
Institute of Control and Industrial Electronics, Warsaw University of Technology, Ul. Koszykowa 75, 00-662 Warszawa, Poland
Interests: computer vision; machine learning; deep learning; image processing
Department of Electronic Engineering, Yeungnam University, Gyeongsan 35841, Republic of Korea
Interests: image processing; computer vision; signal, image, and video processing

Special Issue Information

Dear Colleagues,

Object detection and scene description are fundamental to advancing computer vision as a tool for automatically understanding the human environment. Recognizing and interpreting objects and scenes is critical for machines to understand the world and interact with it meaningfully. This understanding forms the basis for more complex tasks such as image and video analysis, autonomous navigation, and interactive systems. These technologies have applications across many industries, including healthcare, robotics, automotive, and security. Object detection and scene description also make the interaction between humans and computers more intuitive. In the context of big data, these methods enable the analysis and interpretation of visual data, which constitutes the majority of the data generated today. The complexity of real-world scenes and the variety of objects pose ongoing challenges, making this an active and exciting area of research. Improving the accuracy, speed, and robustness of object detection and scene description models remains crucial, driving innovation in machine learning algorithms and computational strategies. This Special Issue aims to present recent advances in object detection, semantic and instance segmentation, image captioning, visual question answering, scene modeling, object tracking, video summarizing, action recognition, and all other related fields of machine learning.

Dr. Marcin Iwanowski
Dr. Sungho Kim
Prof. Dr. Zhaoqing Pan
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • machine learning
  • scene description
  • object detection
  • image segmentation
  • semantic segmentation
  • image captioning
  • video summarizing
  • robot vision
  • action recognition

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (2 papers)


Research

13 pages, 13678 KiB  
Article
Improving CNN Fish Detection and Classification with Tracking
by Boubker Zouin, Jihad Zahir, Florian Baletaud, Laurent Vigliola and Sébastien Villon
Appl. Sci. 2024, 14(22), 10122; https://doi.org/10.3390/app142210122 - 5 Nov 2024
Viewed by 579
Abstract
The regular and consistent monitoring of marine ecosystems and fish communities is becoming more and more crucial due to increasing human pressures. To this end, underwater camera technology has become a major tool for collecting large amounts of marine data. As the size of the data collected outgrew the ability to process it, new means of automatic processing have been explored. Convolutional neural networks (CNNs) have been the most popular method for automatic underwater video analysis for the last few years. However, such algorithms are image-based and do not exploit the potential of video data. In this paper, we propose a method of coupling video tracking and CNN image analysis to perform a robust and accurate fish classification on deep sea videos and improve automatic classification accuracy. Our method fused CNNs and tracking methods, allowing us to detect 12% more individuals compared to CNN alone.
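The abstract does not say how the tracker and the CNN outputs are combined. As an illustration only, one common way to exploit video data is to average the per-frame class probabilities of all detections that a tracker assigns to the same individual. The sketch below assumes that approach; the function name, the input layout, and the species labels are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def fuse_track_predictions(frame_predictions):
    """Average per-frame class probabilities over each track.

    frame_predictions: iterable of (track_id, {class_name: probability})
    pairs, one per detection per frame (hypothetical input layout).
    Returns {track_id: (best_class, mean_probability_of_best_class)}.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for track_id, probs in frame_predictions:
        counts[track_id] += 1
        for cls, p in probs.items():
            sums[track_id][cls] += p

    fused = {}
    for track_id, class_sums in sums.items():
        # The class with the largest summed probability also has the
        # largest mean, because every class is divided by the same count.
        best = max(class_sums, key=class_sums.get)
        fused[track_id] = (best, class_sums[best] / counts[track_id])
    return fused


# Hypothetical detections: track 1 is misclassified in its second frame.
predictions = [
    (1, {"grouper": 0.6, "snapper": 0.4}),
    (1, {"grouper": 0.3, "snapper": 0.7}),
    (1, {"grouper": 0.8, "snapper": 0.2}),
    (2, {"grouper": 0.1, "snapper": 0.9}),
]
result = fuse_track_predictions(predictions)
print(result)
```

In this toy input the second frame of track 1 favours the wrong class on its own, but the track-level average still selects "grouper", which is the kind of robustness gain that temporal information can offer over single-image classification.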

16 pages, 6525 KiB  
Article
Recurrent and Concurrent Prediction of Longitudinal Progression of Stargardt Atrophy and Geographic Atrophy towards Comparative Performance on Optical Coherence Tomography as on Fundus Autofluorescence
by Zubin Mishra, Ziyuan Chris Wang, Emily Xu, Sophia Xu, Iyad Majid, SriniVas R. Sadda and Zhihong Jewel Hu
Appl. Sci. 2024, 14(17), 7773; https://doi.org/10.3390/app14177773 - 3 Sep 2024
Cited by 1 | Viewed by 590
Abstract
Stargardt atrophy and geographic atrophy (GA) represent pivotal endpoints in FDA-approved clinical trials. Predicting atrophy progression is crucial for evaluating drug efficacy. Fundus autofluorescence (FAF), the standard 2D imaging modality in these trials, has limitations in patient comfort. In contrast, spectral-domain optical coherence tomography (SD-OCT), a 3D imaging modality, is more patient friendly but suffers from lower image quality. This study has two primary objectives: (1) develop an efficient predictive modeling for the generation of future FAF images and prediction of future Stargardt atrophic (as well as GA) regions and (2) develop an efficient predictive modeling with advanced 3D OCT features at ellipsoid zone (EZ) for the comparative performance in the generation of future enface EZ maps and prediction of future Stargardt atrophic regions on OCT as on FAF. To achieve these goals, we propose two deep neural networks (termed ReConNet and ReConNet-Ensemble) with recurrent learning units (long short-term memory, LSTM) integrating with a convolutional neural network (CNN) encoder–decoder architecture and concurrent learning units integrated by ensemble/multiple recurrent learning channels. The ReConNet, which incorporates LSTM connections with CNN, is developed for the first goal on longitudinal FAF. The ReConNet-Ensemble, which incorporates multiple recurrent learning channels based on enhanced EZ enface maps to capture higher-order inherent OCT EZ features, is developed for the second goal on longitudinal OCT. Using FAF images at months 0, 6, and 12 to predict atrophy at month 18, the ReConNet achieved mean (±standard deviation, SD) and median Dice coefficients of 0.895 (±0.086) and 0.922 for Stargardt atrophy and 0.864 (±0.113) and 0.893 for GA. Using SD-OCT images at months 0 and 6 to predict atrophy at month 12, the ReConNet-Ensemble achieved mean and median Dice coefficients of 0.882 (±0.101) and 0.906 for Stargardt atrophy. The prediction performance on OCT images is comparably good to that on FAF. These results underscore the potential of SD-OCT for efficient and practical assessment of atrophy progression in clinical trials and retina clinics, complementing or surpassing the widely used FAF imaging technique.
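For readers unfamiliar with the metric reported above, the Dice coefficient of a predicted mask A and a reference mask B is 2|A ∩ B| / (|A| + |B|), ranging from 0 (no overlap) to 1 (identical masks). The following is a minimal sketch of that standard definition and of summarising scores by mean, SD, and median; the masks and scores are made-up illustrative values, the handling of two empty masks is an assumed convention, and the abstract does not state whether the authors used the sample or the population SD.

```python
from statistics import mean, median, stdev


def dice_coefficient(pred, truth):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks (nested lists)."""
    pred_flat = [bool(v) for row in pred for v in row]
    truth_flat = [bool(v) for row in truth for v in row]
    if len(pred_flat) != len(truth_flat):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(p and t for p, t in zip(pred_flat, truth_flat))
    total = sum(pred_flat) + sum(truth_flat)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy 2x3 masks (illustrative only): 2 overlapping pixels, 3 + 3 in total.
predicted = [[1, 1, 0],
             [0, 1, 0]]
reference = [[1, 0, 0],
             [0, 1, 1]]
dice = dice_coefficient(predicted, reference)  # 2 * 2 / 6

# Hypothetical per-eye scores, summarised the way the abstract reports them.
scores = [0.90, 0.80, 0.95]
summary = (mean(scores), stdev(scores), median(scores))  # sample SD assumed
print(round(dice, 3), summary)
```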