Machine Learning for Object Detection and Scene Description in Images and Videos

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: 20 March 2026 | Viewed by 3608

Special Issue Editors

Dr. Marcin Iwanowski

Guest Editor

Institute of Control and Industrial Electronics, Warsaw University of Technology, Ul. Koszykowa 75, 00-662 Warszawa, Poland
Interests: computer vision; machine learning; deep learning; image processing

Dr. Sungho Kim

Guest Editor

Department of Electronic Engineering, Yeungnam University, Gyeongsan 35841, Republic of Korea
Interests: image processing; computer vision; signal, image and video processing

Prof. Dr. Zhaoqing Pan

Guest Editor

School of Electrical and Information Engineering, Tianjin University, Tianjin, China
Interests: multimedia processing; video coding/compression

Special Issue Information

Dear Colleagues,

Object detection and scene description are fundamental to advancing computer vision as a tool for automatically understanding the human environment. Recognizing and interpreting objects and scenes is critical for machines to understand and interact with the world meaningfully. This understanding forms the basis for more complex tasks such as image and video analysis, autonomous navigation, and interactive systems. These technologies have applications across many industries, including healthcare, robotics, automotive, and security. Object detection and scene description also make the interaction between humans and computers more intuitive. In big data, these methods enable the analysis and interpretation of visual data, which constitutes the majority of the data generated today. The complexity of real-world scenes and the variety of objects present ongoing challenges, making this an active and exciting area of research. Improving the accuracy, speed, and robustness of object detection and scene description models remains crucial, driving innovation in machine learning algorithms and computational strategies. This Special Issue aims to present recent advances in object detection, semantic and instance segmentation, image captioning, visual question answering, scene modeling, object tracking, video summarizing, action recognition, and all other related fields of machine learning.

Dr. Marcin Iwanowski
Dr. Sungho Kim
Prof. Dr. Zhaoqing Pan
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

machine learning
scene description
object detection
image segmentation
semantic segmentation
image captioning
video summarizing
robot vision
action recognition

Benefits of Publishing in a Special Issue

Ease of navigation: Grouping papers by topic helps scholars navigate broad-scope journals more efficiently.
Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information is available in MDPI's Special Issue policies.

Published Papers (3 papers)

Research

15 pages, 5494 KiB

Open Access Article

Classification of OCT Images of the Human Eye Using Mobile Devices

by Agnieszka Stankiewicz, Tomasz Marciniak, Nina Budna, Róża Chwałek and Marcin Dziedzic

Appl. Sci. 2025, 15(6), 2937; https://doi.org/10.3390/app15062937 - 8 Mar 2025

Viewed by 540

Abstract

The aim of this study was to develop a mobile application for Android devices dedicated to the classification of pathological changes in human eye optical coherence tomography (OCT) B-scans. The classification process is conducted using convolutional neural networks (CNNs). Six models were trained during the study: a simple convolutional neural network with three convolutional layers, VGG16, InceptionV3, Xception, Joint Attention Network + MobileNetV2 and OpticNet-71. All of these models were converted to TensorFlow Lite format to implement them into a mobile application. For this purpose, three models with the best parameters were chosen, taking accuracy, precision, recall, F1-score and confusion matrix into consideration. The Android application designed for the classification of OCT images was developed using the Kotlin programming language within the Android Studio integrated development environment. With the application, classification can be performed on an image chosen from the user’s files or an image acquired using the photo-taking function. The results of the classification are displayed for three neural networks, along with the respective classification times for each neural network and the associated image undergoing the classification task. The mobile application has been tested using various smartphones. The testing phase included an evaluation of image classification times and score accuracy, considering factors such as image acquisition method, i.e., camera or gallery. Full article

(This article belongs to the Special Issue Machine Learning for Object Detection and Scene Description in Images and Videos)

13 pages, 13678 KiB

Open Access Article

Improving CNN Fish Detection and Classification with Tracking

by Boubker Zouin, Jihad Zahir, Florian Baletaud, Laurent Vigliola and Sébastien Villon

Appl. Sci. 2024, 14(22), 10122; https://doi.org/10.3390/app142210122 - 5 Nov 2024

Viewed by 1385

Abstract

The regular and consistent monitoring of marine ecosystems and fish communities is becoming more and more crucial due to increasing human pressures. To this end, underwater camera technology has become a major tool to collect an important amount of marine data. As the size of the data collected outgrew the ability to process it, new means of automatic processing have been explored. Convolutional neural networks (CNNs) have been the most popular method for automatic underwater video analysis for the last few years. However, such algorithms are rather image-based and do not exploit the potential of video data. In this paper, we propose a method of coupling video tracking and CNN image analysis to perform a robust and accurate fish classification on deep sea videos and improve automatic classification accuracy. Our method fused CNNs and tracking methods, allowing us to detect 12% more individuals compared to CNN alone. Full article

(This article belongs to the Special Issue Machine Learning for Object Detection and Scene Description in Images and Videos)

16 pages, 6525 KiB

Open Access Article

Recurrent and Concurrent Prediction of Longitudinal Progression of Stargardt Atrophy and Geographic Atrophy towards Comparative Performance on Optical Coherence Tomography as on Fundus Autofluorescence

by Zubin Mishra, Ziyuan Chris Wang, Emily Xu, Sophia Xu, Iyad Majid, SriniVas R. Sadda and Zhihong Jewel Hu

Appl. Sci. 2024, 14(17), 7773; https://doi.org/10.3390/app14177773 - 3 Sep 2024

Cited by 1 | Viewed by 1055

Abstract

Stargardt atrophy and geographic atrophy (GA) represent pivotal endpoints in FDA-approved clinical trials. Predicting atrophy progression is crucial for evaluating drug efficacy. Fundus autofluorescence (FAF), the standard 2D imaging modality in these trials, has limitations in patient comfort. In contrast, spectral-domain optical coherence tomography (SD-OCT), a 3D imaging modality, is more patient friendly but suffers from lower image quality. This study has two primary objectives: (1) develop an efficient predictive modeling for the generation of future FAF images and prediction of future Stargardt atrophic (as well as GA) regions and (2) develop an efficient predictive modeling with advanced 3D OCT features at ellipsoid zone (EZ) for the comparative performance in the generation of future enface EZ maps and prediction of future Stargardt atrophic regions on OCT as on FAF. To achieve these goals, we propose two deep neural networks (termed ReConNet and ReConNet-Ensemble) with recurrent learning units (long short-term memory, LSTM) integrating with a convolutional neural network (CNN) encoder–decoder architecture and concurrent learning units integrated by ensemble/multiple recurrent learning channels. The ReConNet, which incorporates LSTM connections with CNN, is developed for the first goal on longitudinal FAF. The ReConNet-Ensemble, which incorporates multiple recurrent learning channels based on enhanced EZ enface maps to capture higher-order inherent OCT EZ features, is developed for the second goal on longitudinal OCT. Using FAF images at months 0, 6, and 12 to predict atrophy at month 18, the ReConNet achieved mean (±standard deviation, SD) and median Dice coefficients of 0.895 (±0.086) and 0.922 for Stargardt atrophy and 0.864 (±0.113) and 0.893 for GA. Using SD-OCT images at months 0 and 6 to predict atrophy at month 12, the ReConNet-Ensemble achieved mean and median Dice coefficients of 0.882 (±0.101) and 0.906 for Stargardt atrophy. The prediction performance on OCT images is comparably good to that on FAF. These results underscore the potential of SD-OCT for efficient and practical assessment of atrophy progression in clinical trials and retina clinics, complementing or surpassing the widely used FAF imaging technique. Full article

(This article belongs to the Special Issue Machine Learning for Object Detection and Scene Description in Images and Videos)
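
For readers unfamiliar with the Dice coefficient reported in the abstract above, the following is a minimal sketch of the standard definition, 2|A ∩ B| / (|A| + |B|), for two binary masks. It is not the authors' evaluation code: the function name `dice_coefficient`, the flat 0/1 list representation, and the convention of returning 1.0 for two empty masks are assumptions made for illustration.

```python
def dice_coefficient(pred, target):
    """Dice overlap between two binary masks given as flat 0/1 sequences.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")

    # Pixels marked as atrophy in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total number of positive pixels across both masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)

    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    predicted = [0, 1, 1, 1, 0, 0]  # toy predicted mask
    reference = [0, 0, 1, 1, 1, 0]  # toy reference mask
    print(round(dice_coefficient(predicted, reference), 3))  # 0.667
```

A value of 1.0 means the predicted and reference regions coincide exactly, and 0.0 means they do not overlap at all.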
