
Deep Image Semantic Segmentation and Recognition

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (31 December 2021) | Viewed by 41005

Special Issue Editors


Guest Editor
Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia
Interests: computer vision

Guest Editor
Faculty of Computer and Information Science, University of Ljubljana, 1000 Ljubljana, Slovenia
Interests: biometrics; computer vision

Guest Editor
Department of Telecommunications, Brno University of Technology, 616 00 Brno, Czech Republic
Interests: big data; deep learning; computer vision

Guest Editor
CITIC research center, University of A Coruña, A Coruña, Spain
Interests: robotics; cognitive robotics; evolutionary robotics; educational robotics; computer vision

Special Issue Information

Dear Colleagues,

Recent advances in hardware and deep neural network architectures, together with the availability of large image databases, have spurred many new research directions in computer vision: detection, segmentation, semantics extraction, and recognition. The motivation for these research efforts stems from practical applications ranging from autonomous driving to robotics in agriculture, and from medical image analysis and biometrics to geosensing, as well as many other application areas that will benefit from significant improvements in the performance of segmentation and recognition algorithms based on deep neural networks.

The aim of this Special Issue is to gather state-of-the-art research that provides practitioners with a broad overview of suitable deep neural network architectures and application areas, together with objective performance metrics. We welcome well-structured manuscripts that clearly illustrate their background and novelty. We also encourage authors to make their source code, databases, models, and architectures publicly available, and to submit multimedia with each manuscript, as this significantly increases the visibility and citation of publications.

Prof. Dr. Aleš Jaklič,
Prof. Dr. Peter Peer,
Prof. Dr. Radim Burget,
Prof. Dr. Fran Bellas
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • computer vision
  • deep learning
  • detection
  • segmentation
  • recognition
  • reconstruction
  • grouping
  • semantics
  • verification
  • identification

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (9 papers)


Research

18 pages, 2624 KiB  
Article
AB-ResUNet+: Improving Multiple Cardiovascular Structure Segmentation from Computed Tomography Angiography Images
by Marija Habijan, Irena Galić, Krešimir Romić and Hrvoje Leventić
Appl. Sci. 2022, 12(6), 3024; https://doi.org/10.3390/app12063024 - 16 Mar 2022
Cited by 8 | Viewed by 4048
Abstract
Accurate segmentation of cardiovascular structures plays an important role in many clinical applications. Recently, fully convolutional networks (FCNs), led by the UNet architecture, have significantly improved the accuracy and speed of semantic segmentation, greatly benefiting medical segmentation and analysis tasks. The UNet architecture makes heavy use of contextual information. However, useful channel features are not fully exploited. In this work, we present an improved UNet architecture that exploits residual learning, squeeze-and-excitation operations, Atrous Spatial Pyramid Pooling (ASPP), and the attention mechanism for accurate and effective segmentation of complex cardiovascular structures, and we name it AB-ResUNet+. The channel attention block is inserted into the skip connection to optimize the coding ability of each layer. The ASPP block is located at the bottom of the network and acts as a bridge between the encoder and decoder. This increases the field of view of the filters and allows them to include a wider context. The proposed AB-ResUNet+ is evaluated on eleven datasets of different cardiovascular structures, including coronary sinus (CS), descending aorta (DA), inferior vena cava (IVC), left atrial appendage (LAA), left atrial wall (LAW), papillary muscle (PM), posterior mitral leaflet (PML), proximal ascending aorta (PAA), pulmonary aorta (PA), right ventricular wall (RVW), and superior vena cava (SVC). Our experimental evaluations show that the proposed AB-ResUNet+ significantly outperforms the UNet, ResUNet, and ResUNet++ architectures by achieving higher values in terms of the Dice coefficient and mIoU.
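The listing itself contains no code; purely as a hypothetical illustration, the following minimal PyTorch sketch shows a squeeze-and-excitation channel attention block of the kind the abstract describes inserting into skip connections (the class name, reduction ratio, and tensor sizes are assumptions, not the authors' implementation):

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention (illustrative sketch)."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)      # squeeze: global spatial average
        self.fc = nn.Sequential(                 # excitation: learn per-channel weights
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                             # re-weight skip features channel-wise

# Re-weight a hypothetical 64-channel skip-connection feature map
skip = torch.randn(2, 64, 128, 128)
print(ChannelAttention(64)(skip).shape)          # torch.Size([2, 64, 128, 128])
```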

21 pages, 1307 KiB  
Article
Deeply-Supervised 3D Convolutional Neural Networks for Automated Ovary and Follicle Detection from Ultrasound Volumes
by Božidar Potočnik and Martin Šavc
Appl. Sci. 2022, 12(3), 1246; https://doi.org/10.3390/app12031246 - 25 Jan 2022
Cited by 5 | Viewed by 2174
Abstract
Automated detection of ovarian follicles in ultrasound images is much appreciated when its effectiveness is comparable with that of the experts’ annotations. Today’s best methods estimate follicles notably worse than the experts. This paper describes the development of two-stage deeply-supervised 3D Convolutional Neural Networks (CNNs) based on the established U-Net. Either the entire U-Net or specific parts of the U-Net decoder were replicated in order to integrate prior knowledge into the detection. The methods were trained end-to-end for follicle detection, while transfer learning was employed for ovary detection. The USOVA3D database of annotated ultrasound volumes, with its verification protocol, was used to verify the effectiveness. In follicle detection, the proposed methods estimate follicles up to 2.9% more accurately than the compared methods. With our two-stage CNNs trained by transfer learning, the effectiveness of ovary detection surpasses up-to-date automated detection methods by about 7.6%. The obtained results demonstrate that our methods estimate follicles only slightly worse than the experts, while ovaries are detected almost as accurately as by the experts. Statistical analysis of 50 repetitions of CNN model training proved that the training is stable and that the effectiveness improvements are not merely due to random initialisation. Our deeply-supervised 3D CNNs can easily be adapted to other problem domains.
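No code accompanies the abstract; the following PyTorch sketch illustrates the general idea of deep supervision in a 3D decoder, assuming auxiliary 1×1×1 segmentation heads at each stage (the channel counts and loss choice are illustrative, not the authors' design):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def deeply_supervised_loss(decoder_feats, target, heads):
    """Sum an auxiliary segmentation loss over decoder stages: each stage's
    1x1x1 head predicts the mask, which is upsampled to full resolution."""
    loss = 0.0
    for feat, head in zip(decoder_feats, heads):
        logits = F.interpolate(head(feat), size=target.shape[2:],
                               mode="trilinear", align_corners=False)
        loss = loss + F.binary_cross_entropy_with_logits(logits, target)
    return loss

# Two decoder stages of a hypothetical 3D U-Net (channel counts are assumptions)
feats = [torch.randn(1, 32, 16, 16, 16), torch.randn(1, 16, 32, 32, 32)]
heads = nn.ModuleList([nn.Conv3d(32, 1, 1), nn.Conv3d(16, 1, 1)])
target = torch.rand(1, 1, 64, 64, 64).round()    # synthetic binary mask
print(deeply_supervised_loss(feats, target, heads))
```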

14 pages, 5034 KiB  
Article
An Advanced Spectral–Spatial Classification Framework for Hyperspectral Imagery Based on DeepLab v3+
by Yifan Si, Dawei Gong, Yang Guo, Xinhua Zhu, Qiangsheng Huang, Julian Evans, Sailing He and Yaoran Sun
Appl. Sci. 2021, 11(12), 5703; https://doi.org/10.3390/app11125703 - 19 Jun 2021
Cited by 13 | Viewed by 2605
Abstract
The DeepLab v3+ neural network shows excellent performance in semantic segmentation. In this paper, we propose a segmentation framework based on the DeepLab v3+ neural network and apply it to the problem of hyperspectral imagery classification (HSIC). The dimensionality reduction of the hyperspectral image is performed using principal component analysis (PCA). DeepLab v3+ is used to extract spatial features, which are then fused with spectral features. A support vector machine (SVM) classifier is used for fitting and classification. Experimental results show that the proposed framework outperforms most traditional machine learning algorithms and deep learning algorithms in hyperspectral imagery classification tasks.
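A minimal sketch of the described pipeline (PCA for spectral reduction, spatial features fused with spectral ones, SVM for classification), using synthetic data and a random-feature stand-in for DeepLab v3+; all dimensions are assumptions:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC

H, W, B = 64, 64, 200                          # hypothetical cube: 200 spectral bands
cube = np.random.rand(H, W, B)                 # synthetic hyperspectral data
labels = np.random.randint(0, 5, size=H * W)   # synthetic per-pixel class labels

# 1) PCA reduces the spectral dimension before the segmentation network
pixels = cube.reshape(-1, B)
reduced = PCA(n_components=3).fit_transform(pixels).reshape(H, W, 3)

# 2) The paper extracts spatial features from `reduced` with DeepLab v3+;
#    a random stand-in keeps this sketch runnable.
spatial_feats = np.random.rand(H, W, 16).reshape(-1, 16)

# 3) Fuse spatial features with the original spectral features per pixel
fused = np.concatenate([spatial_feats, pixels], axis=1)

# 4) An SVM performs the final per-pixel classification
svm = SVC(kernel="rbf").fit(fused[:500], labels[:500])
print(svm.score(fused[500:1000], labels[500:1000]))
```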

18 pages, 7145 KiB  
Article
Supervised Learning Based Peripheral Vision System for Immersive Visual Experiences for Extended Display
by Muhammad Ayaz Shirazi, Riaz Uddin and Min-Young Kim
Appl. Sci. 2021, 11(11), 4726; https://doi.org/10.3390/app11114726 - 21 May 2021
Cited by 1 | Viewed by 2579
Abstract
Video display content can be extended to the walls of the living room around the TV using projection. Providing appropriate projection content is a hard problem for a computer, and we solve it with a deep neural network. We propose a peripheral vision system that provides immersive visual experiences to the user by extending the video content using deep learning and projecting that content around the TV screen. A user could manually create appropriate content for the existing TV screen, but doing so is too expensive. The PCE (pixel context encoder) network takes the center of the video frame as input and the outside area as output in order to extend the content using supervised learning. The proposed system is expected to pave a new road for the home appliance industry, transforming the living room into a new immersive experience platform.
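A small NumPy sketch of the input/output split the abstract describes for the PCE network, where the frame center is the input and the surrounding area is the supervision target (the frame size and margin below are assumed):

```python
import numpy as np

def center_periphery_split(frame: np.ndarray, margin: int):
    """Split a frame into a center crop (network input) and the surrounding
    periphery (supervision target), per the abstract's description of PCE."""
    h, w, _ = frame.shape
    center = frame[margin:h - margin, margin:w - margin].copy()
    periphery = frame.copy()
    periphery[margin:h - margin, margin:w - margin] = 0   # keep only the outer band
    return center, periphery

frame = np.random.rand(480, 854, 3).astype(np.float32)    # one synthetic video frame
center, periphery = center_periphery_split(frame, margin=96)
print(center.shape, periphery.shape)                      # (288, 662, 3) (480, 854, 3)
```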

24 pages, 9133 KiB  
Article
How to Correctly Detect Face-Masks for COVID-19 from Visual Information?
by Borut Batagelj, Peter Peer, Vitomir Štruc and Simon Dobrišek
Appl. Sci. 2021, 11(5), 2070; https://doi.org/10.3390/app11052070 - 26 Feb 2021
Cited by 78 | Viewed by 10882
Abstract
The new Coronavirus disease (COVID-19) has seriously affected the world. By the end of November 2020, the global number of new coronavirus cases had already exceeded 60 million and the number of deaths 1,410,378, according to information from the World Health Organization (WHO). To limit the spread of the disease, mandatory face-mask rules are now becoming common in public settings around the world. Additionally, many public service providers require customers to wear face-masks in accordance with predefined rules (e.g., covering both mouth and nose) when using public services. These developments inspired research into automatic (computer-vision-based) techniques for face-mask detection that can help monitor public behavior and contribute towards constraining the COVID-19 pandemic. Although existing research in this area has resulted in efficient techniques for face-mask detection, these usually operate under the assumption that modern face detectors provide perfect detection performance (even for masked faces) and that the main goal of the techniques is to detect the presence of face-masks only. In this study, we revisit these common assumptions and explore the following research questions: (i) How well do existing face detectors perform with masked-face images? (ii) Is it possible to detect a proper (regulation-compliant) placement of facial masks? and (iii) How useful are existing face-mask detection techniques for monitoring applications during the COVID-19 pandemic? To answer these and related questions, we conduct a comprehensive experimental evaluation of several recent face detectors for their performance with masked-face images. Furthermore, we investigate the usefulness of multiple off-the-shelf deep-learning models for recognizing correct face-mask placement. Finally, we design a complete pipeline for recognizing whether face-masks are worn correctly or not and compare the performance of the pipeline with standard face-mask detection models from the literature. To facilitate the study, we compile a large dataset of facial images from the publicly available MAFA and Wider Face datasets and annotate it with compliant and non-compliant labels. The annotated dataset, called the Face-Mask-Label Dataset (FMLD), is made publicly available to the research community.
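A hypothetical sketch of such a two-stage pipeline (detect faces, then classify mask placement on each face crop); the placeholder detector and classifier below are illustrative stubs, not models from the paper:

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple
import numpy as np

@dataclass
class FaceBox:
    x: int
    y: int
    w: int
    h: int

def mask_compliance_pipeline(
    image: np.ndarray,
    detect_faces: Callable[[np.ndarray], List[FaceBox]],
    classify_placement: Callable[[np.ndarray], str],
) -> List[Tuple[FaceBox, str]]:
    """Stage 1 finds faces (masked or not); stage 2 labels each face crop
    as 'compliant' or 'non-compliant' mask placement."""
    results = []
    for box in detect_faces(image):
        crop = image[box.y:box.y + box.h, box.x:box.x + box.w]
        results.append((box, classify_placement(crop)))
    return results

# Illustrative stubs standing in for trained models
fake_detector = lambda img: [FaceBox(10, 10, 64, 64)]
fake_classifier = lambda crop: "compliant" if crop.mean() > 0.5 else "non-compliant"
print(mask_compliance_pipeline(np.random.rand(128, 128, 3), fake_detector, fake_classifier))
```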

15 pages, 5709 KiB  
Article
Semantic 3D Mapping from Deep Image Segmentation
by Francisco Martín, Fernando González, José Miguel Guerrero, Manuel Fernández and Jonatan Ginés
Appl. Sci. 2021, 11(4), 1953; https://doi.org/10.3390/app11041953 - 23 Feb 2021
Cited by 4 | Viewed by 3255
Abstract
The perception and identification of visual stimuli from the environment is a fundamental capacity of autonomous mobile robots. Current deep learning techniques make it possible to identify and segment objects of interest in an image. This paper presents a novel algorithm to segment the object’s space from a deep segmentation of an image taken by a 3D camera. The proposed approach solves the boundary pixel problem that appears when a direct mapping from segmented pixels to their correspondence in the point cloud is used. We validate our approach by comparing it against baseline approaches using real images taken by a 3D camera, showing that our method outperforms them in terms of accuracy and reliability. As an application of the proposed algorithm, we present a semantic mapping approach for a mobile robot’s indoor environments.
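To make the boundary pixel problem concrete, here is a NumPy/SciPy sketch that labels the points of an organized point cloud from a 2D segmentation mask; eroding the mask is one simple way to discard unreliable boundary pixels and is an assumption of this sketch, not the paper's algorithm:

```python
import numpy as np
from scipy.ndimage import binary_erosion

def label_points(points: np.ndarray, mask: np.ndarray, erode_iters: int = 2):
    """Select the 3D points of an organized point cloud that belong to a
    segmented object, eroding the mask to drop boundary pixels whose depth
    often falls on the background."""
    inner = binary_erosion(mask, iterations=erode_iters)
    return points[inner]                       # (N, 3) points kept for the object

points = np.random.rand(120, 160, 3)           # one 3D point per pixel (synthetic)
mask = np.zeros((120, 160), dtype=bool)
mask[40:80, 60:100] = True                     # hypothetical segmented object
print(label_points(points, mask).shape)
```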

20 pages, 20267 KiB  
Article
Dual-Window Superpixel Data Augmentation for Hyperspectral Image Classification
by Álvaro Acción, Francisco Argüello and Dora B. Heras
Appl. Sci. 2020, 10(24), 8833; https://doi.org/10.3390/app10248833 - 10 Dec 2020
Cited by 25 | Viewed by 3126
Abstract
Deep learning (DL) has been shown to obtain superior results for classification tasks in the field of remote sensing hyperspectral imaging. Superpixel-based techniques can be applied to DL, significantly decreasing training and prediction times, but the results are usually far from satisfactory due to overfitting. Data augmentation techniques alleviate the problem by synthetically generating new samples from an existing dataset in order to improve the generalization capabilities of the classification model. In this paper, we propose a novel data augmentation framework in the context of superpixel-based DL called dual-window superpixel (DWS). With DWS, data augmentation is performed over patches centered on the superpixels obtained by the application of simple linear iterative clustering (SLIC) superpixel segmentation. DWS is based on dividing the input patches extracted from the superpixels into two regions and independently applying transformations over them. As a result, four different data augmentation techniques are proposed that can be applied to a superpixel-based CNN classification scheme. An extensive comparison in terms of classification accuracy with other data augmentation techniques from the literature using two datasets is also shown. One of the datasets consists of small hyperspectral scenes commonly found in the literature. The other consists of large multispectral vegetation scenes of river basins. The experimental results show that the proposed approach increases the overall classification accuracy for the selected datasets. In particular, two of the data augmentation techniques introduced, namely dual-flip and dual-rotate, obtained the best results.
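A loose Python sketch of the dual-window idea using scikit-image's SLIC: extract a patch centred on a superpixel and transform its inner window and outer region independently. The specific transforms and sizes below are illustrative, not the paper's definitions:

```python
import numpy as np
from skimage.segmentation import slic

def dual_flip(patch: np.ndarray, inner: int) -> np.ndarray:
    """Transform the inner window and the outer region of a patch independently
    (illustrative stand-in for the paper's dual-window transforms)."""
    p = patch.shape[0]
    lo, hi = (p - inner) // 2, (p + inner) // 2
    out = patch[:, ::-1].copy()                        # outer region: horizontal flip
    out[lo:hi, lo:hi] = patch[lo:hi, lo:hi][::-1, :]   # inner window: vertical flip
    return out

img = np.random.rand(128, 128, 3)                      # synthetic stand-in for an HSI cube
segments = slic(img, n_segments=50, start_label=0)
cy, cx = np.argwhere(segments == 0).mean(axis=0).astype(int)
y0 = int(np.clip(cy - 16, 0, img.shape[0] - 32))       # 32x32 patch centred on superpixel 0
x0 = int(np.clip(cx - 16, 0, img.shape[1] - 32))
patch = img[y0:y0 + 32, x0:x0 + 32]
print(dual_flip(patch, inner=16).shape)                # (32, 32, 3)
```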

27 pages, 8373 KiB  
Article
Multi-Frame Labeled Faces Database: Towards Face Super-Resolution from Realistic Video Sequences
by Martin Rajnoha, Anzhelika Mezina and Radim Burget
Appl. Sci. 2020, 10(20), 7213; https://doi.org/10.3390/app10207213 - 16 Oct 2020
Cited by 6 | Viewed by 3199
Abstract
Forensically trained facial reviewers are still considered one of the most accurate approaches for person identification from video records. The human brain can utilize information not just from a single image but also from a sequence of images (i.e., videos), and even in the case of low-quality records or a long distance from the camera, it can accurately identify a given person. Unfortunately, in many cases a single still image is needed. An example of such a case is a police search that is about to be announced in newspapers. This paper introduces a face database obtained from a real environment, comprising 17,426 sequences of images. The dataset includes persons of various races and ages as well as different environments, lighting conditions, and camera device types. This paper also introduces a new multi-frame face super-resolution method and compares it with state-of-the-art single-frame and multi-frame super-resolution methods. We show that the proposed method increases the quality of face images, even in cases of low-resolution, low-quality input images, and provides better results than single-frame approaches that are still considered the best in this area. The quality of face images was evaluated using several objective mathematical methods, as well as subjectively by several volunteers. The source code and the dataset have been released, and the experiment is fully reproducible.
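As a hypothetical illustration of the multi-frame idea, the sketch below stacks already-aligned low-resolution face crops along the channel axis so a super-resolution network could draw on the whole sequence rather than a single still; this naive fusion strategy is an assumption, not the paper's method:

```python
import numpy as np

def stack_aligned_frames(frames):
    """Fuse a sequence of already-aligned low-resolution face crops by stacking
    them along the channel axis, giving a super-resolution CNN access to the
    whole sequence instead of a single still (naive fusion sketch)."""
    return np.concatenate(frames, axis=-1)     # H x W x (3 * n_frames)

# Five synthetic low-resolution face crops from one video sequence
frames = [np.random.rand(24, 24, 3).astype(np.float32) for _ in range(5)]
print(stack_aligned_frames(frames).shape)      # (24, 24, 15)
```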

12 pages, 1067 KiB  
Article
LSUN-Stanford Car Dataset: Enhancing Large-Scale Car Image Datasets Using Deep Learning for Usage in GAN Training
by Tin Kramberger and Božidar Potočnik
Appl. Sci. 2020, 10(14), 4913; https://doi.org/10.3390/app10144913 - 17 Jul 2020
Cited by 30 | Viewed by 6461
Abstract
Currently, there is no publicly available adequate dataset that could be used for training Generative Adversarial Networks (GANs) on car images. All available car datasets differ in noise, pose, and zoom levels. Thus, the objective of this work was to create an improved car image dataset better suited for GAN training. To improve the performance of the GAN, we coupled the LSUN and Stanford car datasets. The new merged dataset was then pruned in order to adjust zoom levels and reduce the noise of the images. This process resulted in fewer images that could be used for training, though of increased quality. The pruned dataset was evaluated by training StyleGAN with its original settings. Pruning the combined LSUN and Stanford datasets resulted in 2,067,710 images of cars with less noise and more consistent zoom levels. Training StyleGAN on the LSUN-Stanford car dataset proved superior to training with just the LSUN dataset by 3.7%, using the Fréchet Inception Distance (FID) as a metric. The results indicate that the proposed LSUN-Stanford car dataset is more consistent and better suited for training GAN neural networks than other currently available large car datasets.
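A minimal sketch of the kind of zoom-based pruning rule the abstract describes, keeping an image only when a detected car occupies a reasonable fraction of the frame; the thresholds are illustrative assumptions, not the paper's values:

```python
def keep_image(bbox_area: float, image_area: float,
               min_frac: float = 0.2, max_frac: float = 0.9) -> bool:
    """Zoom-based pruning rule sketch: keep an image only if the detected car
    occupies a reasonable fraction of the frame (thresholds are assumptions)."""
    frac = bbox_area / image_area
    return min_frac <= frac <= max_frac

# A hypothetical detection: a 300x200 car box in a 512x384 image
print(keep_image(bbox_area=300 * 200, image_area=512 * 384))  # True
```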
