J. Imaging, Volume 8, Issue 8 (August 2022) – 21 articles

Cover Story: For smart mobility on sidewalks, there is a lack of datasets with annotations for semantic segmentation. In this paper, we present a new dataset composed of short sequences of street-scene images taken in a 3D virtual environment (CARLA) from viewpoints located on sidewalks. The dataset includes an additional subset consisting of pairs of images taken at the same locations in the 3D maps but from viewpoints laterally spaced relative to the direction of movement: one located on the road and the other on the sidewalk. We also present the training and testing of a convolutional neural network on our dataset.
  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • You may sign up for e-mail alerts to receive table of contents of newly released issues.
  • PDF is the official format for papers published in both HTML and PDF forms. To view the papers in PDF format, click on the "PDF Full-text" link and use the free Adobe Reader to open them.
15 pages, 419 KiB  
Article
Full-Reference Image Quality Assessment Based on an Optimal Linear Combination of Quality Measures Selected by Simulated Annealing
by Domonkos Varga
J. Imaging 2022, 8(8), 224; https://doi.org/10.3390/jimaging8080224 - 21 Aug 2022
Cited by 7 | Viewed by 2179
Abstract
Digital images can be distorted or contaminated by noise at various steps of image acquisition, transmission, and storage. Thus, research into algorithms that can evaluate the perceptual quality of digital images consistently with human quality judgement is an active topic in the literature. In this study, an image quality assessment (IQA) method is introduced that predicts the perceptual quality of a digital image by optimally combining several IQA metrics. To be more specific, an optimization problem is defined first using the weighted sum of a few IQA metrics. Subsequently, the optimal values of the weights are determined by minimizing the root mean square error between the predicted and ground-truth scores using the simulated annealing algorithm. The resulting optimization-based IQA metrics were assessed and compared to other state-of-the-art methods on four large, widely used benchmark IQA databases. The numerical results empirically corroborate that the proposed approach is able to surpass other competing IQA methods. Full article
(This article belongs to the Section Image and Video Processing)
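As a rough illustration of the optimization step described in the abstract, the sketch below combines a handful of IQA metric scores into a weighted sum and tunes the weights with a plain simulated-annealing loop that minimizes the RMSE against ground-truth (e.g., MOS) scores. The data shapes, perturbation size, and annealing schedule are placeholders, not the paper's actual configuration.

```python
import numpy as np

def rmse(weights, metric_scores, mos):
    """Root mean square error between the weighted metric sum and ground truth."""
    pred = metric_scores @ weights
    return np.sqrt(np.mean((pred - mos) ** 2))

def simulated_annealing(metric_scores, mos, n_iter=5000, t0=1.0, cooling=0.999, seed=0):
    """Tune linear-combination weights by simulated annealing (illustrative schedule)."""
    rng = np.random.default_rng(seed)
    n_metrics = metric_scores.shape[1]
    w = rng.uniform(0, 1, n_metrics)                 # initial weights
    best_w, best_e = w.copy(), rmse(w, metric_scores, mos)
    e, t = best_e, t0
    for _ in range(n_iter):
        cand = w + rng.normal(0, 0.05, n_metrics)    # random perturbation of the weights
        e_cand = rmse(cand, metric_scores, mos)
        # accept better solutions, or worse ones with a temperature-dependent probability
        if e_cand < e or rng.random() < np.exp((e - e_cand) / t):
            w, e = cand, e_cand
            if e < best_e:
                best_w, best_e = w.copy(), e
        t *= cooling                                 # geometric cooling
    return best_w, best_e

# toy data: 200 images scored by 5 hypothetical IQA metrics against synthetic MOS values
scores = np.random.rand(200, 5)
mos = scores @ np.array([0.4, 0.1, 0.3, 0.1, 0.1]) + 0.02 * np.random.randn(200)
weights, err = simulated_annealing(scores, mos)
print(weights, err)
```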

13 pages, 2839 KiB  
Article
Image Quality Comparison between Digital Breast Tomosynthesis Images and 2D Mammographic Images Using the CDMAM Test Object
by Ioannis A. Tsalafoutas, Angeliki C. Epistatou and Konstantinos K. Delibasis
J. Imaging 2022, 8(8), 223; https://doi.org/10.3390/jimaging8080223 - 21 Aug 2022
Viewed by 1607
Abstract
To evaluate the image quality (IQ) of synthesized two-dimensional (s2D) and tomographic layer (TL) mammographic images in comparison to the 2D digital mammographic images produced with a new digital breast tomosynthesis (DBT) system. Methods: The CDMAM test object was used for IQ evaluation of actual 2D images, s2D and TL images, acquired using all available acquisition modes. Evaluation was performed automatically using the commercial software that accompanied CDMAM. Results: The IQ scores of the TLs with the in-focus CDMAM were comparable, although usually inferior to those of 2D images acquired with the same acquisition mode, and better than the respective s2D images. The IQ results of TLs satisfied the EUREF limits applicable to 2D images, whereas for s2D images this was not the case. The use of high-dose mode (H-mode), instead of normal-dose mode (N-mode), increased the image quality of both TL and s2D images, especially when the standard mode (ST) was used. Although the high-resolution (HR) mode produced TL images of similar or better image quality compared to ST mode, HR s2D images were clearly inferior to ST s2D images. Conclusions: s2D images present inferior image quality compared to 2D and TL images. The HR mode produces TL images and s2D images with half the pixel size and requires a 25% increase in average glandular dose (AGD). Despite that, IQ evaluation results with CDMAM are in favor of HR resolution mode only for TL images and mainly for smaller-sized details. Full article

15 pages, 32829 KiB  
Article
Unsupervised Domain Adaptation for Vertebrae Detection and Identification in 3D CT Volumes Using a Domain Sanity Loss
by Pascal Sager, Sebastian Salzmann, Felice Burn and Thilo Stadelmann
J. Imaging 2022, 8(8), 222; https://doi.org/10.3390/jimaging8080222 - 19 Aug 2022
Cited by 3 | Viewed by 1782
Abstract
A variety of medical computer vision applications analyze 2D slices of computed tomography (CT) scans, whereas axial slices from the body trunk region are usually identified based on their relative position to the spine. A limitation of such systems is that either the correct slices must be extracted manually or labels of the vertebrae are required for each CT scan to develop an automated extraction system. In this paper, we propose an unsupervised domain adaptation (UDA) approach for vertebrae detection and identification based on a novel Domain Sanity Loss (DSL) function. With UDA, the model’s knowledge learned on a publicly available (source) data set can be transferred to the target domain without using target labels, where the target domain is defined by the specific setup (CT modality, study protocols, applied pre- and post-processing) at the point of use (e.g., a specific clinic with its specific CT study protocols). With our approach, a model is trained on the source and target data sets in parallel. The model optimizes a supervised loss for labeled samples from the source domain and the DSL loss function based on domain-specific “sanity checks” for samples from the unlabeled target domain. Without using labels from the target domain, we are able to identify vertebra centroids with an accuracy of 72.8%. By adding only ten target labels during training, the accuracy increases to 89.2%, which is on par with the current state of the art for fully supervised learning, while using about 20 times fewer labels. Thus, our model can be used to extract 2D slices from 3D CT scans on arbitrary data sets fully automatically without requiring an extensive labeling effort, contributing to the clinical adoption of medical imaging by hospitals. Full article
(This article belongs to the Special Issue Advances in Deep Neural Networks for Visual Pattern Recognition)
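To make the idea of combining a supervised source-domain loss with label-free "sanity checks" on the target domain more concrete, here is a minimal PyTorch-style sketch. The particular checks used here (predicted vertebra centroids should be ordered and roughly evenly spaced along the cranio-caudal axis) are only an illustration of what a domain sanity term might look like; they are not the exact DSL formulation from the paper.

```python
import torch

def supervised_loss(pred_centroids, gt_centroids):
    """Standard regression loss on labeled source-domain samples."""
    return torch.nn.functional.mse_loss(pred_centroids, gt_centroids)

def domain_sanity_loss(pred_centroids):
    """Label-free penalty for an unlabeled target-domain scan (illustrative checks only).

    pred_centroids: (N, 3) predicted vertebra centroids of one scan, ordered head-to-foot.
    """
    z = pred_centroids[:, 2]
    gaps = z[1:] - z[:-1]
    order_penalty = torch.relu(-gaps).mean()   # centroids should be monotonic along the spine axis
    spacing_penalty = gaps.var()               # neighbouring spacings should not vary wildly
    return order_penalty + 0.1 * spacing_penalty

def training_step(model, source_batch, target_scan, lam=1.0):
    """One parallel step over a labeled source batch and an unlabeled target scan."""
    src_x, src_y = source_batch
    return supervised_loss(model(src_x), src_y) + lam * domain_sanity_loss(model(target_scan))

# illustrative values: 5 predicted centroids for one unlabeled target scan
pred = torch.tensor([[0.0, 0.0, 10.0], [0.0, 0.0, 32.0], [0.0, 0.0, 55.0],
                     [0.0, 0.0, 51.0], [0.0, 0.0, 100.0]])
print(domain_sanity_loss(pred))   # the out-of-order 4th centroid is penalized
```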

13 pages, 3872 KiB  
Article
matRadiomics: A Novel and Complete Radiomics Framework, from Image Visualization to Predictive Model
by Giovanni Pasini, Fabiano Bini, Giorgio Russo, Albert Comelli, Franco Marinozzi and Alessandro Stefano
J. Imaging 2022, 8(8), 221; https://doi.org/10.3390/jimaging8080221 - 18 Aug 2022
Cited by 25 | Viewed by 3743
Abstract
Radiomics aims to support clinical decisions through its workflow, which is divided into: (i) target identification and segmentation, (ii) feature extraction, (iii) feature selection, and (iv) model fitting. Many radiomics tools have been developed to fulfill the steps mentioned above. However, to date, users must switch between different software packages to complete the radiomics workflow. To address this issue, we developed a new free and user-friendly radiomics framework, namely matRadiomics, which allows the user: (i) to import and inspect biomedical images, (ii) to identify and segment the target, (iii) to extract the features, (iv) to reduce and select them, and (v) to build a predictive model using machine learning algorithms. As a result, biomedical images can be visualized and segmented and, through the integration of Pyradiomics into matRadiomics, radiomic features can be extracted. These features can be selected using a hybrid descriptive–inferential method and, consequently, used to train three different classifiers: linear discriminant analysis, k-nearest neighbors, and support vector machines. Model validation is performed using k-fold cross-validation and k-fold stratified cross-validation. Finally, the performance metrics of each model are shown in the graphical interface of matRadiomics. In this study, we discuss the workflow, architecture, application, and future development of matRadiomics, and demonstrate its working principles in a real case study, with the aim of establishing a reference standard for the whole radiomics analysis, from image visualization up to predictive model implementation. Full article
(This article belongs to the Special Issue Radiomics and Texture Analysis in Medical Imaging)
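The last steps of the workflow described above (feature selection, the three classifiers, and k-fold validation) map directly onto standard scikit-learn components. The sketch below is a generic approximation with made-up feature data, not the matRadiomics implementation itself, whose hybrid descriptive–inferential selection is more elaborate than the simple univariate filter shown here.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import StratifiedKFold, cross_val_score

# toy stand-in for a radiomic feature matrix (e.g., as extracted with Pyradiomics)
X = np.random.rand(60, 120)            # 60 lesions x 120 features
y = np.random.randint(0, 2, 60)        # binary outcome labels

classifiers = {
    "LDA": LinearDiscriminantAnalysis(),
    "kNN": KNeighborsClassifier(n_neighbors=5),
    "SVM": SVC(kernel="linear"),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for name, clf in classifiers.items():
    # scale, keep the 10 most informative features, then fit the classifier
    pipe = make_pipeline(StandardScaler(), SelectKBest(f_classif, k=10), clf)
    scores = cross_val_score(pipe, X, y, cv=cv, scoring="roc_auc")
    print(f"{name}: AUC = {scores.mean():.3f} +/- {scores.std():.3f}")
```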

10 pages, 649 KiB  
Article
Effect of Gray Value Discretization and Image Filtration on Texture Features of the Pancreas Derived from Magnetic Resonance Imaging at 3T
by Bassam M. Abunahel, Beau Pontre and Maxim S. Petrov
J. Imaging 2022, 8(8), 220; https://doi.org/10.3390/jimaging8080220 - 18 Aug 2022
Cited by 1 | Viewed by 1568
Abstract
Radiomics of pancreas magnetic resonance (MR) images is well positioned to play an important role in the management of diseases characterized by diffuse involvement of the pancreas. The effect of image pre-processing configurations on these images has been sparsely investigated. Fifteen individuals with definite chronic pancreatitis (an exemplar diffuse disease of the pancreas) and 15 healthy individuals were included in this age- and sex-matched case-control study. MR images of the pancreas were acquired using a single 3T scanner. A total of 93 first-order and second-order texture features of the pancreas were compared between the study groups, by subjecting MR images of the pancreas to 7 image pre-processing configurations related to gray level discretization and image filtration. The studied parameters of intensity discretization did not vary in terms of their effect on the number of significant first-order texture features. The number of statistically significant first-order texture features varied after filtering (7 with the use of the logarithm filter and 3 with the use of the Laplacian of Gaussian filter with 5 mm σ). Intensity discretization generally affected the number of significant second-order texture features more markedly than filtering. The use of a fixed bin number of 16 yielded 42 significant second-order texture features, a fixed bin number of 128 yielded 38, a fixed bin width of 6 yielded 24, and a fixed bin width of 42 yielded 26. The specific parameters of filtration and intensity discretization had differing effects on the radiomics signature of the pancreas. Relative discretization with a fixed bin number of 16 and use of the logarithm filter hold promise as pre-processing configurations of choice in future radiomics studies in diffuse diseases of the pancreas. Full article
(This article belongs to the Special Issue Radiomics and Texture Analysis in Medical Imaging)
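For readers unfamiliar with the two discretization strategies compared above, the snippet below shows how a fixed bin number and a fixed bin width would typically be applied to the voxel intensities inside a pancreas mask. The bin settings mirror those listed in the abstract, while the intensity array itself is synthetic and the exact formulas follow the common (IBSI-style) definitions rather than the study's specific software.

```python
import numpy as np

def discretize_fixed_bin_number(intensities, n_bins):
    """Relative discretization: rescale the ROI intensity range into a fixed number of gray levels."""
    lo, hi = intensities.min(), intensities.max()
    levels = np.floor(n_bins * (intensities - lo) / (hi - lo + 1e-12)) + 1
    return np.clip(levels, 1, n_bins).astype(int)

def discretize_fixed_bin_width(intensities, bin_width):
    """Absolute discretization: bin edges are multiples of a fixed intensity width."""
    shift = int(np.floor(intensities.min() / bin_width))
    return np.floor(intensities / bin_width).astype(int) - shift + 1

roi = np.random.normal(300, 60, size=5000)           # synthetic MR intensities inside the mask
print(discretize_fixed_bin_number(roi, 16).max())    # fixed bin number of 16
print(discretize_fixed_bin_number(roi, 128).max())   # fixed bin number of 128
print(discretize_fixed_bin_width(roi, 6).max())      # fixed bin width of 6
print(discretize_fixed_bin_width(roi, 42).max())     # fixed bin width of 42
```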

12 pages, 4822 KiB  
Article
Multi-Camera Multi-Person Tracking and Re-Identification in an Operating Room
by Haowen Hu, Ryo Hachiuma, Hideo Saito, Yoshifumi Takatsume and Hiroki Kajita
J. Imaging 2022, 8(8), 219; https://doi.org/10.3390/jimaging8080219 - 17 Aug 2022
Cited by 3 | Viewed by 3523
Abstract
Multi-camera multi-person (MCMP) tracking and re-identification (ReID) are essential tasks in safety, pedestrian analysis, and related applications; however, most research focuses on outdoor scenarios, whereas occlusions and misidentification are much harder to handle in a crowded room with obstacles. Moreover, it is challenging to complete the two tasks in one framework. We present a trajectory-based method that integrates the tracking and ReID tasks. First, the poses of all surgical members captured by each camera are detected frame-by-frame; then, the detected poses are exploited to track the trajectories of all members for each camera; finally, these trajectories from different cameras are clustered to re-identify the members in the operating room across all cameras. Compared to other MCMP tracking and ReID methods, the proposed one mainly exploits trajectories, using texture features, which are less distinguishable in the operating room scenario, only as auxiliary cues. We also integrate temporal information during ReID, which is more reliable than the state-of-the-art framework in which ReID is conducted frame-by-frame. In addition, our framework requires no training before deployment in new scenarios. We also created an annotated MCMP dataset with actual operating room videos. Our experiments prove the effectiveness of the proposed trajectory-based ReID algorithm. The proposed framework achieves 85.44% accuracy in the ReID task, outperforming the state-of-the-art framework on our operating room dataset. Full article
(This article belongs to the Special Issue Advances in Human Action Recognition Using Deep Learning)
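Re-identifying people by clustering per-camera trajectories can be prototyped with a pairwise trajectory distance and agglomerative clustering, roughly as below. The distance function, the assumption that trajectories are synchronized and expressed in a common ground-plane coordinate frame, and the known cluster count are simplifications, not the paper's actual algorithm.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering  # metric="precomputed" needs scikit-learn >= 1.2

def trajectory_distance(a, b):
    """Mean pointwise distance between two synchronized trajectories of shape (T, D)."""
    return float(np.linalg.norm(a - b, axis=1).mean())

def reidentify(trajectories, n_people):
    """Cluster trajectories from all cameras; trajectories in one cluster share one identity."""
    n = len(trajectories)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = trajectory_distance(trajectories[i], trajectories[j])
    model = AgglomerativeClustering(n_clusters=n_people, metric="precomputed", linkage="average")
    return model.fit_predict(dist)

# toy example: two people seen by three cameras each (6 trajectories, 50 frames, 2D positions)
rng = np.random.default_rng(0)
person_a = rng.random((50, 2))
person_b = rng.random((50, 2)) + 5.0
trajs = [person_a + 0.01 * rng.standard_normal((50, 2)) for _ in range(3)] \
      + [person_b + 0.01 * rng.standard_normal((50, 2)) for _ in range(3)]
print(reidentify(trajs, n_people=2))   # e.g., [0 0 0 1 1 1]
```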

20 pages, 5696 KiB  
Article
Drone Model Classification Using Convolutional Neural Network Trained on Synthetic Data
by Mariusz Wisniewski, Zeeshan A. Rana and Ivan Petrunin
J. Imaging 2022, 8(8), 218; https://doi.org/10.3390/jimaging8080218 - 12 Aug 2022
Cited by 14 | Viewed by 4408
Abstract
We present a convolutional neural network (CNN) that identifies drone models in real-life videos. The neural network is trained on synthetic images and tested on a real-life dataset of drone videos. To create the training and validation datasets, we show a method of generating synthetic drone images. Domain randomization is used to vary simulation parameters such as model textures, background images, and orientation. Three common drone models are classified: DJI Phantom, DJI Mavic, and DJI Inspire. To test the performance of the neural network model, Anti-UAV, a real-life dataset of flying drones, is used. The proposed method reduces the time cost associated with manually labelling drones, and we show that it transfers to real-life videos. The CNN achieves an overall accuracy of 92.4%, a precision of 88.8%, a recall of 88.6%, and an F1 score of 88.7% when tested on the real-life dataset. Full article
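Domain randomization as used above boils down to sampling rendering parameters independently for every synthetic image. A minimal sketch of such a sampler is shown below; the parameter ranges, asset names, and the idea of passing the configuration to an external renderer are placeholders rather than the authors' pipeline.

```python
import random

DRONE_CLASSES = ["DJI Phantom", "DJI Mavic", "DJI Inspire"]

def sample_render_config(background_images, texture_variants):
    """Draw one randomized configuration for a synthetic training image."""
    return {
        "model": random.choice(DRONE_CLASSES),
        "texture": random.choice(texture_variants),       # randomized model texture
        "background": random.choice(background_images),   # random background photo
        "yaw_deg": random.uniform(0, 360),                 # random orientation
        "pitch_deg": random.uniform(-30, 30),
        "distance_m": random.uniform(5, 50),               # apparent scale in the frame
    }

# each configuration would be handed to the renderer to produce one labeled image
config = sample_render_config(["sky_01.jpg", "field_02.jpg"], ["matte", "glossy"])
print(config)
```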

29 pages, 21933 KiB  
Article
Compact Hybrid Multi-Color Space Descriptor Using Clustering-Based Feature Selection for Texture Classification
by Mohamed Alimoussa, Alice Porebski, Nicolas Vandenbroucke, Sanaa El Fkihi and Rachid Oulad Haj Thami
J. Imaging 2022, 8(8), 217; https://doi.org/10.3390/jimaging8080217 - 8 Aug 2022
Cited by 4 | Viewed by 1820
Abstract
Color texture classification aims to recognize patterns by the analysis of their colors and their textures. This process requires using descriptors to represent and discriminate the different texture classes. In most traditional approaches, these descriptors are used with a predefined setting of their parameters and computed from images coded in a chosen color space. The prior choice of a color space, a descriptor and its setting suited to a given application is a crucial but difficult problem that strongly impacts the classification results. To overcome this problem, this paper proposes a color texture representation that simultaneously takes into account the properties of several settings from different descriptors computed from images coded in multiple color spaces. Since the number of color texture features generated from this representation is high, a dimensionality reduction scheme by clustering-based sequential feature selection is applied to provide a compact hybrid multi-color space (CHMCS) descriptor. The experimental results carried out on five benchmark color texture databases with five color spaces and manifold settings of two texture descriptors show that combining different configurations always improves the accuracy compared to a predetermined configuration. On average, the CHMCS representation achieves 94.16% accuracy and outperforms deep learning networks and handcrafted color texture descriptors by over 5%, especially when the dataset is small. Full article
(This article belongs to the Special Issue Color Texture Classification)

13 pages, 21914 KiB  
Article
A Dataset for Temporal Semantic Segmentation Dedicated to Smart Mobility of Wheelchairs on Sidewalks
by Benoit Decoux, Redouane Khemmar, Nicolas Ragot, Arthur Venon, Marcos Grassi-Pampuch, Antoine Mauri, Louis Lecrosnier and Vishnu Pradeep
J. Imaging 2022, 8(8), 216; https://doi.org/10.3390/jimaging8080216 - 7 Aug 2022
Cited by 2 | Viewed by 2158
Abstract
In smart mobility, the semantic segmentation of images is an important task for a good understanding of the environment. In recent years, many studies have been made on this subject, in the field of Autonomous Vehicles on roads. Some image datasets are available for learning semantic segmentation models, leading to very good performance. However, for other types of autonomous mobile systems like Electric Wheelchairs (EW) on sidewalks, there is no specific dataset. Our contribution presented in this article is twofold: (1) the proposal of a new dataset of short sequences of exterior images of street scenes taken from viewpoints located on sidewalks, in a 3D virtual environment (CARLA); (2) a convolutional neural network (CNN) adapted for temporal processing and including additional techniques to improve its accuracy. Our dataset includes a smaller subset, made of image pairs taken from the same places in the maps of the virtual environment, but from different viewpoints: one located on the road and the other located on the sidewalk. This additional set is aimed at showing the importance of the viewpoint in the result of semantic segmentation. Full article
(This article belongs to the Special Issue Computer Vision and Scene Understanding for Autonomous Driving)
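Capturing laterally spaced viewpoint pairs in CARLA can be approximated with the standard CARLA Python API by spawning two static semantic-segmentation cameras that differ only in their lateral offset. The coordinates, resolution, and output paths below are arbitrary examples assuming a running CARLA server, not the dataset's actual acquisition script.

```python
import carla

client = carla.Client("localhost", 2000)   # assumes a CARLA simulator is running locally
world = client.get_world()

bp = world.get_blueprint_library().find("sensor.camera.semantic_segmentation")
bp.set_attribute("image_size_x", "1024")
bp.set_attribute("image_size_y", "512")

# two viewpoints at the same map location, laterally offset: road vs. sidewalk
road_tf = carla.Transform(carla.Location(x=100.0, y=0.0, z=1.4), carla.Rotation(yaw=0.0))
sidewalk_tf = carla.Transform(carla.Location(x=100.0, y=3.0, z=1.4), carla.Rotation(yaw=0.0))

for name, tf in [("road", road_tf), ("sidewalk", sidewalk_tf)]:
    cam = world.spawn_actor(bp, tf)
    # save ground-truth segmentation frames rendered with the CityScapes palette
    cam.listen(lambda image, n=name: image.save_to_disk(
        f"out/{n}/{image.frame:06d}.png", carla.ColorConverter.CityScapesPalette))
# let the simulation run (tick) to stream frames from both cameras
```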

18 pages, 3939 KiB  
Article
Proposals Generation for Weakly Supervised Object Detection in Artwork Images
by Federico Milani, Nicolò Oreste Pinciroli Vago and Piero Fraternali
J. Imaging 2022, 8(8), 215; https://doi.org/10.3390/jimaging8080215 - 6 Aug 2022
Cited by 3 | Viewed by 1881
Abstract
Object Detection requires many precise annotations, which are available for natural images but not for many non-natural data sets such as artworks data sets. A solution is using Weakly Supervised Object Detection (WSOD) techniques that learn accurate object localization from image-level labels. Studies have demonstrated that state-of-the-art end-to-end architectures may not be suitable for domains in which images or classes differ significantly from those used to pre-train networks. This paper presents a novel two-stage Weakly Supervised Object Detection approach for obtaining accurate bounding boxes on non-natural data sets. The proposed method exploits existing classification knowledge to generate pseudo-ground truth bounding boxes from Class Activation Maps (CAMs). The automatically generated annotations are used to train a robust Faster R-CNN object detector. Quantitative and qualitative analysis shows that bounding boxes generated from CAMs can compensate for the lack of manually annotated ground truth (GT) and that an object detector, trained with such pseudo-GT, surpasses end-to-end WSOD state-of-the-art methods on ArtDL 2.0 (≈41.5% mAP) and IconArt (≈17% mAP), two artworks data sets. The proposed solution is a step towards the computer-aided study of non-natural images and opens the way to more advanced tasks, e.g., automatic artwork image captioning for digital archive applications. Full article
(This article belongs to the Special Issue Unsupervised Deep Learning and Its Applications in Imaging Processing)
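The first stage described above (turning class activation maps into pseudo-ground-truth boxes) is commonly implemented by thresholding the CAM and taking the bounding boxes of the resulting connected components. The sketch below illustrates that generic recipe; the threshold and minimum-area values are arbitrary, and the paper's proposal-generation logic is more involved.

```python
import numpy as np
from scipy import ndimage

def cam_to_boxes(cam, rel_threshold=0.5, min_area=25):
    """Convert a class activation map (H, W) into [x0, y0, x1, y1] pseudo-GT boxes."""
    mask = cam >= rel_threshold * cam.max()          # keep strongly activated pixels
    labeled, n = ndimage.label(mask)                 # connected components of the mask
    boxes = []
    for region in range(1, n + 1):
        ys, xs = np.where(labeled == region)
        if ys.size < min_area:                       # drop tiny spurious blobs
            continue
        boxes.append([int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())])
    return boxes

# toy CAM with one hot region; a real CAM would come from the trained classifier
cam = np.zeros((64, 64))
cam[20:40, 10:30] = 1.0
print(cam_to_boxes(cam))   # -> [[10, 20, 29, 39]]
```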

22 pages, 1913 KiB  
Article
Targeted Data Augmentation and Hierarchical Classification with Deep Learning for Fish Species Identification in Underwater Images
by Abdelouahid Ben Tamou, Abdesslam Benzinou and Kamal Nasreddine
J. Imaging 2022, 8(8), 214; https://doi.org/10.3390/jimaging8080214 - 1 Aug 2022
Cited by 5 | Viewed by 2461
Abstract
In this paper, we address fish species identification in underwater video for marine monitoring applications such as the study of marine biodiversity. Video is the least disruptive monitoring method for fish but requires efficient techniques of image processing and analysis to overcome challenging underwater environments. We propose two Deep Convolutional Neural Network (CNN) approaches for fish species classification in unconstrained underwater environment. In the first approach, we use a traditional transfer learning framework and we investigate a new technique based on training/validation loss curves for targeted data augmentation. In the second approach, we propose a hierarchical CNN classification to classify fish first into family levels and then into species categories. To demonstrate the effectiveness of the proposed approaches, experiments are carried out on two benchmark datasets for automatic fish identification in unconstrained underwater environment. The proposed approaches yield accuracies of 99.86% and 81.53% on the Fish Recognition Ground-Truth dataset and LifeClef 2015 Fish dataset, respectively. Full article
(This article belongs to the Section Computer Vision and Pattern Recognition)

24 pages, 5510 KiB  
Article
HEROHE Challenge: Predicting HER2 Status in Breast Cancer from Hematoxylin–Eosin Whole-Slide Imaging
by Eduardo Conde-Sousa, João Vale, Ming Feng, Kele Xu, Yin Wang, Vincenzo Della Mea, David La Barbera, Ehsan Montahaei, Mahdieh Baghshah, Andreas Turzynski, Jacob Gildenblat, Eldad Klaiman, Yiyu Hong, Guilherme Aresta, Teresa Araújo, Paulo Aguiar, Catarina Eloy and Antonio Polónia
J. Imaging 2022, 8(8), 213; https://doi.org/10.3390/jimaging8080213 - 31 Jul 2022
Cited by 11 | Viewed by 4912
Abstract
Breast cancer is the most common malignancy in women worldwide, and is responsible for more than half a million deaths each year. The appropriate therapy depends on the evaluation of the expression of various biomarkers, such as the human epidermal growth factor receptor 2 (HER2) transmembrane protein, through specialized techniques, such as immunohistochemistry or in situ hybridization. In this work, we present the HER2 on hematoxylin and eosin (HEROHE) challenge, a parallel event of the 16th European Congress on Digital Pathology, which aimed to predict the HER2 status in breast cancer based only on hematoxylin–eosin-stained tissue samples, thus avoiding specialized techniques. The challenge provided a large annotated dataset of 509 whole-slide images, specifically collected for the challenge. Models for predicting HER2 status were presented by 21 teams worldwide. The best-performing models are presented by detailing the network architectures and key parameters. Methods are compared, and approaches, core methodologies, and software choices are contrasted. Different evaluation metrics are discussed, as well as the performance of the presented models for each of these metrics. Potential differences in ranking that would result from different choices of evaluation metrics highlight the need for careful consideration at the time of their selection, as the results show that some metrics may misrepresent the true potential of a model to solve the problem for which it was developed. The HEROHE dataset remains publicly available to promote advances in the field of computational pathology. Full article
(This article belongs to the Section Medical Imaging)

16 pages, 959 KiB  
Opinion
When We Study the Ability to Attend, What Exactly Are We Trying to Understand?
by John K. Tsotsos
J. Imaging 2022, 8(8), 212; https://doi.org/10.3390/jimaging8080212 - 31 Jul 2022
Cited by 3 | Viewed by 2059
Abstract
When we study the human ability to attend, what exactly do we seek to understand? It is not clear what the answer might be to this question. There is still so much to know, while acknowledging the tremendous progress of past decades of research. It is as if each new study adds a tile to the mosaic that, when viewed from a distance, we hope will reveal the big picture of attention. However, there is no map as to how each tile might be placed nor any guide as to what the overall picture might be. It is like digging up bits of mosaic tile at an ancient archeological site with no key as to where to look and then not only having to decide which picture it belongs to but also where exactly in that puzzle it should be placed. I argue that, although the unearthing of puzzle pieces is very important, so is their placement, but this seems much less emphasized. We have mostly unearthed a treasure trove of puzzle pieces but they are all waiting for cleaning and reassembly. It is an activity that is scientifically far riskier, but with great risk comes a greater reward. Here, I will look into two areas of broad agreement, specifically regarding visual attention, and dig deeper into their more nuanced meanings, in the hope of sketching a starting point for the guide to the attention mosaic. The goal is to situate visual attention as a purely computational problem and not as a data explanation task; it may become easier to place the puzzle pieces once you understand why they exist in the first place. Full article
(This article belongs to the Special Issue Human Attention and Visual Cognition)

11 pages, 258 KiB  
Article
Regression Analysis between the Different Breast Dose Quantities Reported in Digital Mammography and Patient Age, Breast Thickness, and Acquisition Parameters
by Salam Dhou, Entesar Dalah, Reda AlGhafeer, Aisha Hamidu and Abdulmunhem Obaideen
J. Imaging 2022, 8(8), 211; https://doi.org/10.3390/jimaging8080211 - 31 Jul 2022
Cited by 3 | Viewed by 1576
Abstract
Breast cancer is the leading cause of cancer death among women worldwide. Screening mammography is considered the primary imaging modality for the early detection of breast cancer. The radiation dose from mammography increases the patients’ risk of radiation-induced cancer. The mean glandular dose (MGD), or the average glandular dose (AGD), provides an estimate of the absorbed dose of radiation by the glandular tissues of a breast. In this paper, MGD is estimated for the craniocaudal (CC) and mediolateral–oblique (MLO) views using entrance skin dose (ESD), X-ray spectrum information, patient age, breast glandularity, and breast thickness. Moreover, a regression analysis is performed to evaluate the impact of mammography acquisition parameters, age, and breast thickness on the estimated MGD and other machine-produced dose quantities, namely, ESD and organ dose (OD). Furthermore, a correlation study is conducted to evaluate the correlation between the ESD and OD, and the estimated MGD per image view. This retrospective study was applied to a dataset of 2035 mammograms corresponding to a cohort of 486 subjects with an age range of 28–86 years who underwent screening mammography examinations. Linear regression metrics were calculated to evaluate the strength of the correlations. The mean (and range) MGD for the CC view was 0.832 (0.110–3.491) mGy and for the MLO view was 0.995 (0.256–2.949) mGy. All the mammography dose quantities strongly correlated with tube exposure (mAs): ESD (R2 = 0.938 for the CC view and R2 = 0.945 for the MLO view), OD (R2 = 0.969 for the CC view and R2 = 0.983 for the MLO view), and MGD (R2 = 0.980 for the CC view and R2 = 0.972 for the MLO view). Breast thickness showed a better correlation with all the mammography dose quantities than patient age, which showed a poor correlation. Moreover, a strong correlation was found between the calculated MGD and both the ESD (R2 = 0.929 for the CC view and R2 = 0.914 for the MLO view) and OD (R2 = 0.971 for the CC view and R2 = 0.972 for the MLO view). Furthermore, it was found that the MLO scan views yield a slightly higher dose compared to CC scan views. It was also found that the glandular absorbed dose is more dependent on glandularity than size. Despite being more reflective of the dose absorbed by the glandular tissue than OD and ESD, MGD is considered labor-intensive and time-consuming to estimate. Full article
(This article belongs to the Section Medical Imaging)
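The regression analysis summarized above amounts to fitting simple linear models between each dose quantity and the acquisition or patient variables and reporting R². A generic version with synthetic values is sketched below; the variable names, ranges, and coefficients are illustrative only, not the study's data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
mas = rng.uniform(20, 120, 500)                       # tube exposure (mAs) per view
mgd = 0.012 * mas + rng.normal(0, 0.08, 500)          # synthetic mean glandular dose (mGy)

# least-squares fit of MGD against mAs, as done per view (CC / MLO) in the study
fit = stats.linregress(mas, mgd)
print(f"MGD = {fit.slope:.4f} * mAs + {fit.intercept:.4f},  R^2 = {fit.rvalue**2:.3f}")
```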
21 pages, 4675 KiB  
Article
High-Temporal-Resolution Object Detection and Tracking Using Images and Events
by Zaid El Shair and Samir A. Rawashdeh
J. Imaging 2022, 8(8), 210; https://doi.org/10.3390/jimaging8080210 - 27 Jul 2022
Cited by 5 | Viewed by 3924
Abstract
Event-based vision is an emerging field of computer vision that offers unique properties, such as asynchronous visual output, high temporal resolutions, and dependence on brightness changes, to generate data. These properties can enable robust high-temporal-resolution object detection and tracking when combined with frame-based vision. In this paper, we present a hybrid, high-temporal-resolution object detection and tracking approach that combines learned and classical methods using synchronized images and event data. Off-the-shelf frame-based object detectors are used for initial object detection and classification. Then, event masks, generated per detection, are used to enable inter-frame tracking at varying temporal resolutions using the event data. Detections are associated across time using a simple, low-cost association metric. Moreover, we collect and label a traffic dataset using the hybrid sensor DAVIS 240c. This dataset is utilized for quantitative evaluation using state-of-the-art detection and tracking metrics. We provide ground truth bounding boxes and object IDs for each vehicle annotation. Further, we generate high-temporal-resolution ground truth data to analyze tracking performance at different temporal rates. Our approach shows promising results, with minimal performance deterioration at higher temporal resolutions (48–384 Hz) when compared with the baseline frame-based performance at 24 Hz. Full article
(This article belongs to the Special Issue Computer Vision and Deep Learning: Trends and Applications)

12 pages, 6496 KiB  
Article
Indoor Scene Recognition via Object Detection and TF-IDF
by Edvard Heikel and Leonardo Espinosa-Leal
J. Imaging 2022, 8(8), 209; https://doi.org/10.3390/jimaging8080209 - 26 Jul 2022
Cited by 8 | Viewed by 3574
Abstract
Indoor scene recognition and semantic information can be helpful for social robots. Recently, in the field of indoor scene recognition, researchers have incorporated object-level information and shown improved performances. This paper demonstrates that scene recognition can be performed solely using object-level information in line with these advances. A state-of-the-art object detection model was trained to detect objects typically found in indoor environments and then used to detect objects in scene data. These predicted objects were then used as features to predict room categories. This paper successfully combines approaches conventionally used in computer vision and natural language processing (YOLO and TF-IDF, respectively). These approaches could be further helpful in the field of embodied research and dynamic scene classification, which we elaborate on. Full article
(This article belongs to the Special Issue Computer Vision and Deep Learning: Trends and Applications)
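The object-to-scene idea above, treating the list of detected objects in an image like words in a document, can be prototyped with scikit-learn's TF-IDF vectorizer and any standard classifier. The detections and room labels below are made up; in the paper they come from a YOLO detector trained on indoor object classes.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# each "document" is the space-joined list of objects detected in one image
train_docs = [
    "bed lamp wardrobe pillow",        # bedroom
    "oven sink fridge counter",        # kitchen
    "sofa tv coffee_table lamp",       # living room
    "toilet sink mirror towel",        # bathroom
]
train_rooms = ["bedroom", "kitchen", "living_room", "bathroom"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(train_docs, train_rooms)

# objects detected in a new image by the object detector
print(clf.predict(["fridge oven counter kettle"]))   # -> ['kitchen']
```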

9 pages, 695 KiB  
Article
Time Is Money: Considerations for Measuring the Radiological Reading Time
by Raphael Sexauer and Caroline Bestler
J. Imaging 2022, 8(8), 208; https://doi.org/10.3390/jimaging8080208 - 24 Jul 2022
Viewed by 1477
Abstract
Timestamps in the Radiology Information System (RIS) are a readily available and valuable source of information with increasing significance, among others, due to the current focus on the clinical impact of artificial intelligence applications. We aimed to evaluate timestamp-based radiological dictation time, introduce timestamp modeling techniques, and compare those with prospectively measured reporting times. Dictation time was calculated from RIS timestamps between 05/2010 and 01/2021 at our institution (n = 108,310). We minimized contextual outliers by simulating the raw data by iteration (1000, vector size (µ/sd/λ) = 100/loop), assuming normally distributed reporting times. In addition, 329 reporting times were prospectively measured by two radiologists (1 and 4 years of experience). Altogether, 106,127 of 108,310 exams were included after simulation, with a mean dictation time of 16.62 min. Mean dictation time was 16.05 min for head CT (44,743/45,596), 15.84 min for chest CT (32,797/33,381), 17.92 min for abdominal CT (n = 22,805/23,483), 10.96 min for CT foot (n = 937/958), 9.14 min for lumbar spine (881/892), 8.83 min for shoulder (409/436), 8.83 min for CT wrist (1201/1322), and 39.20 min for a polytrauma patient (2127/2242), without a significant difference to the prospective reporting times. In conclusion, timestamp analysis is useful to measure current reporting practice, whereas body region and radiological experience are confounders. This could aid in cost–benefit assessments of workflow changes (e.g., AI implementation). Full article
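Deriving a dictation time from RIS timestamps is essentially a difference of two event columns followed by outlier handling and per-region aggregation; a bare-bones pandas version is sketched below. The column names and example rows are stand-ins, and the study's iterative simulation of the raw data is only hinted at in the comments.

```python
import pandas as pd

# hypothetical RIS export: one row per exam with the relevant workflow timestamps
ris = pd.DataFrame({
    "exam_id": [1, 2, 3, 4],
    "body_region": ["head CT", "head CT", "chest CT", "abdominal CT"],
    "dictation_start": pd.to_datetime([
        "2021-01-05 08:02", "2021-01-05 09:40", "2021-01-05 10:15", "2021-01-05 11:00"]),
    "report_finalized": pd.to_datetime([
        "2021-01-05 08:18", "2021-01-05 09:55", "2021-01-05 10:31", "2021-01-05 11:19"]),
})

# dictation time in minutes; the study additionally simulates the raw data iteratively
# to suppress contextual outliers (interruptions, forgotten open reports, etc.)
ris["dictation_min"] = (ris["report_finalized"] - ris["dictation_start"]).dt.total_seconds() / 60
print(ris.groupby("body_region")["dictation_min"].mean())
```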

27 pages, 4503 KiB  
Review
A Comprehensive Review on Temporal-Action Proposal Generation
by Sorn Sooksatra and Sitapa Watcharapinchai
J. Imaging 2022, 8(8), 207; https://doi.org/10.3390/jimaging8080207 - 23 Jul 2022
Cited by 1 | Viewed by 1922
Abstract
Temporal-action proposal generation (TAPG) is a well-known pre-processing step for temporal-action localization and mainly affects localization performance on untrimmed videos. In recent years, there has been growing interest in proposal generation. Researchers have recently focused on anchor- and boundary-based methods for generating action proposals. The main purpose of this paper is to provide a comprehensive review of temporal-action proposal generation, covering network architectures and empirical results. The pre-processing step for input data is also discussed for network construction. The content of this paper was obtained from the research literature related to temporal-action proposal generation from 2012 to 2022 for performance evaluation and comparison. From several well-known databases, we used specific keywords to select 71 related studies according to their contributions and evaluation criteria. The contributions and methodologies are summarized and analyzed in tabular form for each category. The results from state-of-the-art research were further analyzed to show their limitations and challenges for action proposal generation. TAPG performance in average recall ranges from 60% to 78% on two TAPG benchmarks. In addition, several future potential research directions in this field are suggested based on the current limitations of the related studies. Full article
(This article belongs to the Special Issue Advances in Human Action Recognition Using Deep Learning)

13 pages, 3467 KiB  
Article
Diffraction Enhanced Imaging Analysis with Pseudo-Voigt Fit Function
by Deepak Mani, Andreas Kupsch, Bernd R. Müller and Giovanni Bruno
J. Imaging 2022, 8(8), 206; https://doi.org/10.3390/jimaging8080206 - 23 Jul 2022
Cited by 6 | Viewed by 1881
Abstract
Diffraction enhanced imaging (DEI) is an advanced digital radiographic imaging technique employing the refraction of X-rays to contrast internal interfaces. This study aims to qualitatively and quantitatively evaluate images acquired using this technique and to assess how different fitting functions to the typical rocking curves (RCs) influence the quality of the images. RCs are obtained for every image pixel. This allows the separate determination of the absorption and the refraction properties of the material in a position-sensitive manner. Comparison of various types of fitting functions reveals that the Pseudo-Voigt (PsdV) function is best suited to fit typical RCs. A robust algorithm was developed in the Python programming language, which reliably extracts the physically meaningful information from each pixel of the image. We demonstrate the potential of the algorithm with two specimens: a silicone gel specimen that has well-defined interfaces, and an additively manufactured polycarbonate specimen. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
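A pseudo-Voigt profile is a linear mix of a Gaussian and a Lorentzian with shared position and width, and fitting it to a per-pixel rocking curve is straightforward with scipy once the model is defined. The sketch below shows that generic fit on a synthetic rocking curve; it is not the authors' implementation, which adds the robustness handling needed for noisy pixels.

```python
import numpy as np
from scipy.optimize import curve_fit

def pseudo_voigt(theta, amp, center, width, eta, offset):
    """Pseudo-Voigt: eta * Lorentzian + (1 - eta) * Gaussian, plus a constant offset (width = FWHM)."""
    gauss = np.exp(-4 * np.log(2) * (theta - center) ** 2 / width ** 2)
    lorentz = 1.0 / (1.0 + 4 * (theta - center) ** 2 / width ** 2)
    return offset + amp * (eta * lorentz + (1 - eta) * gauss)

# synthetic rocking curve for one detector pixel (angle in arcsec)
theta = np.linspace(-10, 10, 81)
true = pseudo_voigt(theta, amp=1.0, center=0.6, width=3.0, eta=0.4, offset=0.05)
noisy = true + np.random.normal(0, 0.01, theta.size)

p0 = [noisy.max(), theta[np.argmax(noisy)], 2.0, 0.5, noisy.min()]
popt, _ = curve_fit(pseudo_voigt, theta, noisy, p0=p0,
                    bounds=([0, -10, 0.1, 0, -1], [10, 10, 20, 1, 1]))
amp, center, width, eta, offset = popt
# the refraction signal is encoded in the peak shift (center), absorption in the amplitude
print(f"center = {center:.3f} arcsec, FWHM = {width:.3f}, eta = {eta:.2f}")
```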

40 pages, 5188 KiB  
Review
Brain Tumor Diagnosis Using Machine Learning, Convolutional Neural Networks, Capsule Neural Networks and Vision Transformers, Applied to MRI: A Survey
by Andronicus A. Akinyelu, Fulvio Zaccagna, James T. Grist, Mauro Castelli and Leonardo Rundo
J. Imaging 2022, 8(8), 205; https://doi.org/10.3390/jimaging8080205 - 22 Jul 2022
Cited by 39 | Viewed by 7227
Abstract
Management of brain tumors is based on clinical and radiological information, with the presumed grade dictating treatment. Hence, a non-invasive assessment of tumor grade is of paramount importance to choose the best treatment plan. Convolutional Neural Networks (CNNs) are among the effective Deep Learning (DL)-based techniques that have been used for brain tumor diagnosis. However, they are unable to handle input modifications effectively. Capsule neural networks (CapsNets) are a novel type of machine learning (ML) architecture that was recently developed to address the drawbacks of CNNs. CapsNets are resistant to rotations and affine transformations, which is beneficial when processing medical imaging datasets. Moreover, Vision Transformer (ViT)-based solutions have been very recently proposed to address the issue of long-range dependency in CNNs. This survey provides a comprehensive overview of brain tumor classification and segmentation techniques, with a focus on ML-based, CNN-based, CapsNet-based, and ViT-based techniques. The survey highlights the fundamental contributions of recent studies and the performance of state-of-the-art techniques. Moreover, we present an in-depth discussion of crucial issues and open challenges. We also identify some key limitations and promising future research directions. We envisage that this survey shall serve as a good springboard for further study. Full article
(This article belongs to the Special Issue Radiomics and Texture Analysis in Medical Imaging)

19 pages, 1505 KiB  
Article
Lung Volume Calculation in Preclinical MicroCT: A Fast Geometrical Approach
by Juan Antonio Camara, Anna Pujol, Juan Jose Jimenez, Jaime Donate, Marina Ferrer and Greetje Vande Velde
J. Imaging 2022, 8(8), 204; https://doi.org/10.3390/jimaging8080204 - 22 Jul 2022
Viewed by 2134
Abstract
In this study, we present a time-efficient protocol for thoracic volume calculation as a proxy for total lung volume. We hypothesize that lung volume can be calculated indirectly from this thoracic volume. We compared the measured thoracic volume with manually segmented and automatically thresholded lung volumes, with manual segmentation as the gold standard. A linear regression formula was obtained and used for calculating the theoretical lung volume. This volume was compared with the gold-standard volumes. In healthy animals, the thoracic volume was 887.45 mm3, the manually delineated lung volume 554.33 mm3, and the thresholded aerated lung volume 495.38 mm3 on average. The theoretical lung volume was 554.30 mm3. Finally, the protocol was applied to three animal models of lung pathology (lung metastasis, transgenic primary lung tumor, and fungal infection). In confirmed pathologic animals, thoracic volumes were 893.20, 860.12, and 1027.28 mm3; manually delineated lung volumes were 640.58, 503.91, and 882.42 mm3; and thresholded lung volumes were 315.92, 408.72, and 236 mm3, respectively. The theoretical lung volumes were 635.28, 524.30, and 863.10 mm3. No significant differences were observed between volumes. This confirmed the potential use of this protocol for lung volume calculation in pathologic models. Full article
(This article belongs to the Topic Medical Image Analysis)
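The indirect estimate described above rests on a simple linear relation between thoracic and lung volume, calibrated once on manually segmented scans and then applied to new thoracic measurements. A generic sketch of that calibration and prediction step is shown below; the example volumes are synthetic, not the study's data or its published regression coefficients.

```python
import numpy as np

# calibration set: thoracic volumes and manually segmented lung volumes (mm^3), synthetic
thoracic = np.array([850.0, 870.0, 900.0, 930.0, 960.0])
manual_lung = np.array([530.0, 545.0, 565.0, 585.0, 600.0])

# fit the linear regression lung = a * thoracic + b (least squares)
a, b = np.polyfit(thoracic, manual_lung, deg=1)

def theoretical_lung_volume(thoracic_volume_mm3):
    """Predict lung volume from a quick thoracic-volume measurement."""
    return a * thoracic_volume_mm3 + b

print(theoretical_lung_volume(887.45))   # estimate for a new animal
```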
