Journal Description

Journal of Imaging

Journal of Imaging is an international, multi/interdisciplinary, peer-reviewed, open access journal of imaging techniques published online monthly by MDPI.

Open Access: free for readers, with article processing charges (APC) paid by authors or their institutions.
High Visibility: indexed within Scopus, ESCI (Web of Science), PubMed, PMC, dblp, Inspec, Ei Compendex, and other databases.
Journal Rank: CiteScore - Q2 (Computer Graphics and Computer-Aided Design)
Rapid Publication: manuscripts are peer-reviewed and a first decision is provided to authors approximately 21.7 days after submission; acceptance to publication is undertaken in 3.8 days (median values for papers published in this journal in the second half of 2023).
Recognition of Reviewers: reviewers who provide timely, thorough peer-review reports receive vouchers entitling them to a discount on the APC of their next publication in any MDPI journal, in appreciation of the work done.

Impact Factor: 3.2 (2022); 5-Year Impact Factor: 3.2 (2022)

ISSN: 2313-433X

Latest Articles

18 pages, 1563 KiB

Open Access Article

Fast Linde–Buzo–Gray (FLBG) Algorithm for Image Compression through Rescaling Using Bilinear Interpolation

by Muhammmad Bilal, Zahid Ullah, Omer Mujahid and Tama Fouzder

J. Imaging 2024, 10(5), 124; https://doi.org/10.3390/jimaging10050124 - 20 May 2024

Abstract

Vector quantization (VQ) is a block coding method that is famous for its high compression ratio and simple encoder and decoder implementation. Linde–Buzo–Gray (LBG) is a renowned technique for VQ that uses a clustering-based approach for finding the optimum codebook. Numerous algorithms, such as Particle Swarm Optimization (PSO), the Cuckoo search algorithm (CS), bat algorithm, and firefly algorithm (FA), are used for codebook design. These algorithms are primarily focused on improving the image quality in terms of the PSNR and SSIM but use exhaustive searching to find the optimum codebook, which causes the computational time to be very high. Our algorithm enhances LBG by reducing the computational complexity: a match function lowers the total number of comparisons between the codebook and the training vectors. The input image is taken as a training vector at the encoder side, which is initialized with the random selection of the vectors from the input image. Rescaling using bilinear interpolation through the nearest neighborhood method is performed to reduce the comparison of the codebook with the training vector. The compressed image is first downsized by the encoder, which is then upscaled at the decoder side during decompression. Based on the results, it is demonstrated that the proposed method reduces the computational complexity by 50.2% compared to LBG and above 97% compared to the other LBG-based algorithms. Moreover, a 20% reduction in the memory size is also obtained, with no significant loss in the image quality compared to the LBG algorithm.

(This article belongs to the Special Issue Image Processing and Computer Vision: Algorithms and Applications)
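
For readers unfamiliar with the baseline, the following is a minimal sketch of the classic LBG (generalized Lloyd) codebook iteration that the abstract refers to. It is not the authors' FLBG method: there is no match function and no rescaling step, and the function names (`lbg`, `encode`) and parameters are assumptions made for illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_codeword(vector, codebook):
    """Index of the codeword closest to `vector` (exhaustive search)."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(vector, codebook[i]))


def lbg(training_vectors, codebook_size, iterations=20, seed=0):
    """Classic LBG: random initialization, then alternate assignment and centroid update."""
    rng = random.Random(seed)
    # Initialize the codebook with randomly selected training vectors.
    codebook = [list(v) for v in rng.sample(training_vectors, codebook_size)]
    for _ in range(iterations):
        # Assignment step: group each training vector with its nearest codeword.
        clusters = [[] for _ in codebook]
        for v in training_vectors:
            clusters[nearest_codeword(v, codebook)].append(v)
        # Update step: move each codeword to the centroid of its cluster.
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous codeword
                dim = len(members[0])
                codebook[i] = [sum(m[d] for m in members) / len(members)
                               for d in range(dim)]
    return codebook


def encode(vectors, codebook):
    """Replace each vector by the index of its nearest codeword."""
    return [nearest_codeword(v, codebook) for v in vectors]
```

The exhaustive search in `nearest_codeword` is the step whose comparison count the abstract says the proposed method reduces.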

11 pages, 2191 KiB

Open Access Communication

Head Gesture Recognition Combining Activity Detection and Dynamic Time Warping

by Huaizhou Li and Haiyan Hu

J. Imaging 2024, 10(5), 123; https://doi.org/10.3390/jimaging10050123 - 19 May 2024

Abstract

The recognition of head movements plays an important role in human–computer interface domains. The data collected with image sensors or inertial measurement unit (IMU) sensors are often used for identifying these types of actions. Compared with image processing methods, a recognition system using an IMU sensor has obvious advantages in terms of complexity, processing speed, and cost. In this paper, an IMU sensor is used to collect head movement data on the legs of glasses, and a new approach for recognizing head movements is proposed by combining activity detection and dynamic time warping (DTW). The activity detection of the time series of head movements is essentially based on the different characteristics exhibited by actions and noises. The DTW method estimates the warp path distances between the time series of the actions and the templates by warping along the time axis. Then, the types of head movements are determined by the minimum of these distances. The results show that 100% accuracy was achieved in the task of classifying six types of head movements. This method provides a new option for head gesture recognition in current human–computer interfaces.

(This article belongs to the Section Computer Vision and Pattern Recognition)
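
The template-matching step described in the abstract can be sketched with the textbook DTW recurrence. This is an illustration under assumptions, not the authors' implementation: the series are one-dimensional, the local cost is the absolute difference, activity detection is omitted, and the gesture labels in the usage note are hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # a advances alone
                                     cost[i][j - 1],      # b advances alone
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def classify(series, templates):
    """Return the label of the template with the smallest DTW distance."""
    return min(templates, key=lambda label: dtw_distance(series, templates[label]))
```

With hypothetical templates such as `{"nod": [0, 1, 0, -1, 0], "shake": [0, -1, 0, 1, 0]}`, a slower nod like `[0, 0, 1, 1, 0, -1, 0]` aligns to the "nod" template at zero cost, because DTW absorbs differences in timing.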

11 pages, 1233 KiB

Open Access Article

Imaging-Based Deep Learning for Predicting Desmoid Tumor Progression

by Rabih Fares, Lilian D. Atlan, Ido Druckmann, Shai Factor, Yair Gortzak, Ortal Segal, Moran Artzi and Amir Sternheim

J. Imaging 2024, 10(5), 122; https://doi.org/10.3390/jimaging10050122 - 17 May 2024

Abstract

Desmoid tumors (DTs) are non-metastasizing and locally aggressive soft-tissue mesenchymal neoplasms. Those that become enlarged often become locally invasive and cause significant morbidity. DTs have a varied pattern of clinical presentation, with up to 50–60% not growing after diagnosis and 20–30% shrinking or even disappearing after initial progression. Enlarging tumors are considered unstable and progressive. The management of symptomatic and enlarging DTs is challenging, and primarily consists of chemotherapy. Despite wide surgical resection, DTs carry a rate of local recurrence as high as 50%. There is a consensus that contrast-enhanced magnetic resonance imaging (MRI) or, alternatively, computerized tomography (CT) is the preferred modality for monitoring DTs. Each uses Response Evaluation Criteria in Solid Tumors version 1.1 (RECIST 1.1), which measures the largest diameter on axial, sagittal, or coronal series. This approach, however, reportedly lacks accuracy in detecting response to therapy and fails to detect tumor progression, thus calling for more sophisticated methods. The objective of this study was to detect unique features identified by deep learning that correlate with the future clinical course of the disease. Between 2006 and 2019, 51 patients (mean age 41.22 ± 15.5 years) who had a tissue diagnosis of DT were included in this retrospective single-center study. Each had undergone at least three MRI examinations (including a pretreatment baseline study), and each was followed by orthopedic oncology specialists for a median of 38.83 months (IQR 44.38). Tumor segmentations were performed on a T2 fat-suppressed treatment-naive MRI sequence, after which the segmented lesion was extracted to a three-dimensional file together with its DICOM file and run through deep learning software. The results of the algorithm were then compared to clinical data collected from the patients’ medical files. 
There were 28 males (13 stable) and 23 females (15 stable) whose ages ranged from 19.07 to 83.33 years. The model was able to independently predict clinical progression as measured from the baseline MRI with an overall accuracy of 93% (93 ± 0.04) and ROC of 0.89 ± 0.08. Artificial intelligence may contribute to risk stratification and clinical decision-making in patients with DT by predicting which patients are likely to progress.

(This article belongs to the Special Issue Clinical and Pathological Imaging in the Era of Artificial Intelligence: New Insights and Perspectives)

15 pages, 2997 KiB

Open Access Article

Overcoming Dimensionality Constraints: A Gershgorin Circle Theorem-Based Feature Extraction for Weighted Laplacian Matrices in Computer Vision Applications

by Sahaj Anilbhai Patel and Abidin Yildirim

J. Imaging 2024, 10(5), 121; https://doi.org/10.3390/jimaging10050121 - 15 May 2024

Abstract

In graph theory, the weighted Laplacian matrix is the most utilized technique to interpret the local and global properties of a complex graph structure within computer vision applications. However, with increasing graph nodes, the Laplacian matrix’s dimensionality also increases accordingly. Therefore, there is always the “curse of dimensionality”. In response to this challenge, this paper introduces a new approach to reducing the dimensionality of the weighted Laplacian matrix by utilizing the Gershgorin circle theorem by transforming the weighted Laplacian matrix into a strictly diagonal domain and then estimating rough eigenvalue inclusion of a matrix. The estimated inclusions are represented as reduced features, termed GC features. The proposed Gershgorin circle feature extraction (GCFE) method was evaluated using three publicly accessible computer vision datasets, varying image patch sizes, and three different graph types. The GCFE method was compared with eight distinct studies. The GCFE demonstrated a notable positive Z-score compared to other feature extraction methods such as I-PCA, kernel PCA, and spectral embedding. Specifically, it achieved an average Z-score of 6.953 with the 2D grid graph type and 4.473 with the pairwise graph type, particularly on the E_Balanced dataset. Furthermore, it was observed that while the accuracy of most major feature extraction methods declined with smaller image patch sizes, the GCFE maintained consistent accuracy across all tested image patch sizes. When the GCFE method was applied to the E_MNSIT dataset using the K-NN graph type, the GCFE method confirmed its consistent accuracy performance, evidenced by a low standard deviation (SD) of 0.305. This performance was notably lower compared to other methods like Isomap, which had an SD of 1.665, and LLE, which had an SD of 1.325. The GCFE outperformed most feature extraction methods in terms of classification accuracy and computational efficiency.
The GCFE method also requires fewer training parameters for deep-learning models than the traditional weighted Laplacian method, establishing its potential for more effective and efficient feature extraction in computer vision tasks.

(This article belongs to the Section Computer Vision and Pattern Recognition)
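
As background for the abstract above, the Gershgorin circle theorem states that every eigenvalue of a square matrix lies in at least one disc centered at a diagonal entry, with radius equal to the sum of the absolute off-diagonal entries in that row. The sketch below computes these discs for a small weighted graph Laplacian. It illustrates the theorem only and is not the authors' GCFE pipeline; the helper names and the example weights are assumptions.

```python
def weighted_laplacian(weights):
    """Laplacian L = D - W of a graph given its symmetric weight matrix W."""
    n = len(weights)
    laplacian = []
    for i in range(n):
        degree = sum(weights[i][j] for j in range(n) if j != i)
        laplacian.append([degree if i == j else -weights[i][j] for j in range(n)])
    return laplacian


def gershgorin_discs(matrix):
    """One (center, radius) pair per row of a square matrix."""
    discs = []
    for i, row in enumerate(matrix):
        center = row[i]
        radius = sum(abs(v) for j, v in enumerate(row) if j != i)
        discs.append((center, radius))
    return discs


def eigenvalue_bounds(matrix):
    """Interval containing all eigenvalues of a real symmetric matrix."""
    discs = gershgorin_discs(matrix)
    return (min(c - r for c, r in discs), max(c + r for c, r in discs))
```

For a three-node path graph with edge weights 2 and 1, `weighted_laplacian([[0, 2, 0], [2, 0, 1], [0, 1, 0]])` gives discs whose union bounds the spectrum without any eigendecomposition, which is the property that makes such inclusions cheap to use as features.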

36 pages, 7878 KiB

Open Access Review

Advances in Real-Time 3D Reconstruction for Medical Endoscopy

by Alexander Richter, Till Steinmann, Jean-Claude Rosenthal and Stefan J. Rupitsch

J. Imaging 2024, 10(5), 120; https://doi.org/10.3390/jimaging10050120 - 14 May 2024

Abstract

This contribution is intended to provide researchers with a comprehensive overview of the current state-of-the-art concerning real-time 3D reconstruction methods suitable for medical endoscopy. Over the past decade, there have been various technological advancements in computational power and an increased research effort in many computer vision fields such as autonomous driving, robotics, and unmanned aerial vehicles. Some of these advancements can also be adapted to the field of medical endoscopy while coping with challenges such as featureless surfaces, varying lighting conditions, and deformable structures. To provide a comprehensive overview, a logical division of monocular, binocular, trinocular, and multiocular methods is performed and also active and passive methods are distinguished. Within these categories, we consider both flexible and non-flexible endoscopes to cover the state-of-the-art as fully as possible. The relevant error metrics to compare the publications presented here are discussed, and the choice of when to choose a GPU rather than an FPGA for camera-based 3D reconstruction is debated. We elaborate on the good practice of using datasets and provide a direct comparison of the presented work. It is important to note that in addition to medical publications, publications evaluated on the KITTI and Middlebury datasets are also considered to include related methods that may be suited for medical 3D reconstruction.

(This article belongs to the Special Issue Advances in Biomedical Image Processing and Artificial Intelligence for Computer-Aided Diagnosis in Medicine)

22 pages, 604 KiB

Open Access Systematic Review

The Accuracy of Three-Dimensional Soft Tissue Simulation in Orthognathic Surgery—A Systematic Review

by Anna Olejnik, Laurence Verstraete, Tomas-Marijn Croonenborghs, Constantinus Politis and Gwen R. J. Swennen

J. Imaging 2024, 10(5), 119; https://doi.org/10.3390/jimaging10050119 - 14 May 2024

Abstract

Three-dimensional soft tissue simulation has become a popular tool in the process of virtual orthognathic surgery planning and patient–surgeon communication. To apply 3D soft tissue simulation software in routine clinical practice, both qualitative and quantitative validation of its accuracy are required. The objective of this study was to systematically review the literature on the accuracy of 3D soft tissue simulation in orthognathic surgery. The Web of Science, PubMed, Cochrane, and Embase databases were consulted for the literature search. The systematic review (SR) was conducted according to the PRISMA statement, and 40 articles fulfilled the inclusion and exclusion criteria. The Quadas-2 tool was used for the risk of bias assessment for selected studies. A mean error varying from 0.27 mm to 2.9 mm for 3D soft tissue simulations for the whole face was reported. In the studies evaluating 3D soft tissue simulation accuracy after a Le Fort I osteotomy only, the upper lip and paranasal regions were reported to have the largest error, while after an isolated bilateral sagittal split osteotomy, the largest error was reported for the lower lip and chin regions. In the studies evaluating simulation after bimaxillary osteotomy with or without genioplasty, the highest inaccuracy was reported at the level of the lips, predominantly the lower lip, chin, and, sometimes, the paranasal regions. Due to the variability in the study designs and analysis methods, a direct comparison was not possible. Therefore, based on the results of this SR, guidelines to systematize the workflow for evaluating the accuracy of 3D soft tissue simulations in orthognathic surgery in future studies are proposed.

(This article belongs to the Special Issue Advances in Biomedical Image Processing and Artificial Intelligence for Computer-Aided Diagnosis in Medicine)

17 pages, 6612 KiB

Open Access Article

Semi-Supervised Medical Image Segmentation Based on Deep Consistent Collaborative Learning

by Xin Zhao and Wenqi Wang

J. Imaging 2024, 10(5), 118; https://doi.org/10.3390/jimaging10050118 - 14 May 2024

Abstract

In the realm of medical image analysis, the cost associated with acquiring accurately labeled data is prohibitively high. To address the issue of label scarcity, semi-supervised learning methods are employed, utilizing unlabeled data alongside a limited set of labeled data. This paper presents a novel semi-supervised medical segmentation framework, DCCLNet (deep consistency collaborative learning UNet), grounded in deep consistent co-learning. The framework synergistically integrates consistency learning from feature and input perturbations, coupled with collaborative training between CNN (convolutional neural networks) and ViT (vision transformer), to capitalize on the learning advantages offered by these two distinct paradigms. Feature perturbation involves the application of auxiliary decoders with varied feature disturbances to the main CNN backbone, enhancing the robustness of the CNN backbone through consistency constraints generated by the auxiliary and main decoders. Input perturbation employs an MT (mean teacher) architecture wherein the main network serves as the student model guided by a teacher model subjected to input perturbations. Collaborative training aims to improve the accuracy of the main networks by encouraging mutual learning between the CNN and ViT. Experiments conducted on the publicly available ACDC (automated cardiac diagnosis challenge) and Prostate datasets yielded Dice coefficients of 0.890 and 0.812, respectively. Additionally, comprehensive ablation studies were performed to demonstrate the effectiveness of each methodological contribution in this study.

(This article belongs to the Special Issue Deep Learning in Biomedical Image Segmentation and Classification: Advancements, Challenges and Applications)
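
The Dice coefficient reported above is the standard overlap measure 2|A ∩ B| / (|A| + |B|) between a predicted mask A and a reference mask B. A minimal sketch follows, assuming binary masks stored as nested lists; the function name and the convention of returning 1.0 for two empty masks are choices made here for illustration.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice overlap between two binary masks given as nested lists of 0/1."""
    a = [bool(v) for row in mask_a for v in row]
    b = [bool(v) for row in mask_b for v in row]
    if len(a) != len(b):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(a) + sum(b)
    # Convention assumed here: two empty masks agree perfectly.
    return 1.0 if total == 0 else 2.0 * intersection / total
```

A score of 0.890 therefore means that twice the overlapping area amounts to 89% of the combined area of the prediction and the reference.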

13 pages, 5592 KiB

Open Access Article

Bayesian Networks in the Management of Hospital Admissions: A Comparison between Explainable AI and Black Box AI during the Pandemic

by Giovanna Nicora, Michele Catalano, Chandra Bortolotto, Marina Francesca Achilli, Gaia Messana, Antonio Lo Tito, Alessio Consonni, Sara Cutti, Federico Comotto, Giulia Maria Stella, Angelo Corsico, Stefano Perlini, Riccardo Bellazzi, Raffaele Bruno and Lorenzo Preda

J. Imaging 2024, 10(5), 117; https://doi.org/10.3390/jimaging10050117 - 10 May 2024

Abstract

Artificial Intelligence (AI) and Machine Learning (ML) approaches that could learn from large data sources have been identified as useful tools to support clinicians in their decisional process; AI and ML implementations have had a rapid acceleration during the recent COVID-19 pandemic. However, many ML classifiers are “black box” to the final user, since their underlying reasoning process is often obscure. Additionally, the performance of such models suffers from poor generalization ability in the presence of dataset shifts. Here, we present a comparison between an explainable-by-design (“white box”) model (Bayesian Network (BN)) versus a black box model (Random Forest), both studied with the aim of supporting clinicians of Policlinico San Matteo University Hospital in Pavia (Italy) during the triage of COVID-19 patients. Our aim is to evaluate whether the BN predictive performances are comparable with those of a widely used but less explainable ML model such as Random Forest and to test the generalization ability of the ML models across different waves of the pandemic.

(This article belongs to the Special Issue Advances and Challenges in Multimodal Machine Learning)

29 pages, 101406 KiB

Open Access Article

When Two Eyes Don’t Suffice—Learning Difficult Hyperfluorescence Segmentations in Retinal Fundus Autofluorescence Images via Ensemble Learning

by Monty Santarossa, Tebbo Tassilo Beyer, Amelie Bernadette Antonia Scharf, Ayse Tatli, Claus von der Burchard, Jakob Nazarenus, Johann Baptist Roider and Reinhard Koch

J. Imaging 2024, 10(5), 116; https://doi.org/10.3390/jimaging10050116 - 9 May 2024

Abstract

Hyperfluorescence (HF) and reduced autofluorescence (RA) are important biomarkers in fundus autofluorescence images (FAF) for the assessment of health of the retinal pigment epithelium (RPE), an important indicator of disease progression in geographic atrophy (GA) or central serous chorioretinopathy (CSCR). Autofluorescence images have been annotated by human raters, but distinguishing biomarkers (whether signals are increased or decreased) from the normal background proves challenging, with borders being particularly open to interpretation. Consequently, significant variations emerge among different graders, and even within the same grader during repeated annotations. Tests on in-house FAF data show that even highly skilled medical experts, despite previously discussing and settling on precise annotation guidelines, reach a pair-wise agreement measured in a Dice score of no more than 63–80% for HF segmentations and only 14–52% for RA. The data further show that the agreement of our primary annotation expert with herself is a 72% Dice score for HF and 51% for RA. Given these numbers, the task of automated HF and RA segmentation cannot simply be refined to the improvement in a segmentation score. Instead, we propose the use of a segmentation ensemble. Learning from images with a single annotation, the ensemble reaches expert-like performance with an agreement of a 64–81% Dice score for HF and 21–41% for RA with all our experts. In addition, utilizing the mean predictions of the ensemble networks and their variance, we devise ternary segmentations where FAF image areas are labeled either as confident background, confident HF, or potential HF, ensuring that predictions are reliable where they are confident (97% Precision), while detecting all instances of HF (99% Recall) annotated by all experts.

(This article belongs to the Special Issue Medical Image Classification and Segmentation: Progress and Challenges)
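
The abstract does not give the rule used to turn the ensemble mean and variance into three labels, so the following is only one plausible sketch of the idea. The thresholds (`low`, `high`, `max_variance`), the label strings, and the decision rule itself are assumptions, not values or logic taken from the paper.

```python
from statistics import mean, pvariance


def ternary_labels(predictions, low=0.2, high=0.8, max_variance=0.05):
    """Label each pixel from per-model HF probabilities.

    `predictions` holds one flat list of probabilities per ensemble member.
    All thresholds are hypothetical illustration values.
    """
    labels = []
    for pixel_probs in zip(*predictions):
        avg = mean(pixel_probs)
        var = pvariance(pixel_probs)
        if var <= max_variance and avg >= high:
            labels.append("confident_hf")          # members agree: HF
        elif var <= max_variance and avg < low:
            labels.append("confident_background")  # members agree: no HF
        else:
            labels.append("potential_hf")          # members disagree or are unsure
    return labels
```

Routing uncertain pixels to a third class is what lets the confident classes stay precise while the union of confident and potential HF keeps recall high.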

13 pages, 2998 KiB

Open Access Technical Note

Image Quality Assessment Tool for Conventional and Dynamic Magnetic Resonance Imaging Acquisitions

by Katerina Nikiforaki, Ioannis Karatzanis, Aikaterini Dovrou, Maciej Bobowicz, Katarzyna Gwozdziewicz, Oliver Díaz, Manolis Tsiknakis, Dimitrios I. Fotiadis, Karim Lekadir and Kostas Marias

J. Imaging 2024, 10(5), 115; https://doi.org/10.3390/jimaging10050115 - 9 May 2024

Abstract

Image quality assessment of magnetic resonance imaging (MRI) data is an important factor not only for conventional diagnosis and protocol optimization but also for fairness, trustworthiness, and robustness of artificial intelligence (AI) applications, especially on large heterogeneous datasets. Information on image quality in multi-centric studies is important to complement the contribution profile from each data node along with quantity information, especially when large variability is expected, and certain acceptance criteria apply. The main goal of this work was to present a tool enabling users to assess image quality based on both subjective criteria as well as objective image quality metrics used to support the decision on image quality based on evidence. The evaluation can be performed on both conventional and dynamic MRI acquisition protocols, while the latter is also checked longitudinally across dynamic series. The assessment provides an overall image quality score and information on the types of artifacts and degrading factors as well as a number of objective metrics for automated evaluation across series (BRISQUE score, Total Variation, PSNR, SSIM, FSIM, MS-SSIM). Moreover, the user can define specific regions of interest (ROIs) to calculate the regional signal-to-noise ratio (SNR) and contrast-to-noise ratio (CNR), thus individualizing the quality output to specific use cases, such as tissue-specific contrast or regional noise quantification.

(This article belongs to the Special Issue Clinical and Pathological Imaging in the Era of Artificial Intelligence: New Insights and Perspectives)
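
Several definitions of regional SNR and CNR are in use, and the abstract does not say which one the tool applies. The sketch below uses one common convention, stated here as an assumption: SNR is the mean of a signal ROI divided by the standard deviation of a noise ROI, and CNR is the absolute difference of two ROI means divided by the same noise standard deviation. ROIs are hypothetical half-open rectangles `(row_start, row_end, col_start, col_end)`.

```python
from statistics import mean, pstdev


def roi_values(image, roi):
    """Pixel values inside a rectangular ROI of a nested-list image."""
    row_start, row_end, col_start, col_end = roi
    return [image[r][c]
            for r in range(row_start, row_end)
            for c in range(col_start, col_end)]


def regional_snr(image, signal_roi, noise_roi):
    """Mean signal divided by the noise standard deviation."""
    noise_sd = pstdev(roi_values(image, noise_roi))
    if noise_sd == 0:
        raise ValueError("noise ROI has zero standard deviation")
    return mean(roi_values(image, signal_roi)) / noise_sd


def regional_cnr(image, roi_a, roi_b, noise_roi):
    """Absolute difference of two ROI means divided by the noise standard deviation."""
    noise_sd = pstdev(roi_values(image, noise_roi))
    if noise_sd == 0:
        raise ValueError("noise ROI has zero standard deviation")
    return abs(mean(roi_values(image, roi_a)) - mean(roi_values(image, roi_b))) / noise_sd
```

Letting the user choose the ROIs is what ties the numbers to a use case: placing `roi_a` and `roi_b` on two tissues gives a tissue-specific contrast measure rather than a whole-image score.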

15 pages, 14216 KiB

Open Access Article

A New Dataset and Comparative Study for Aphid Cluster Detection and Segmentation in Sorghum Fields

by Raiyan Rahman, Christopher Indris, Goetz Bramesfeld, Tianxiao Zhang, Kaidong Li, Xiangyu Chen, Ivan Grijalva, Brian McCornack, Daniel Flippo, Ajay Sharda and Guanghui Wang

J. Imaging 2024, 10(5), 114; https://doi.org/10.3390/jimaging10050114 - 8 May 2024

Abstract

Aphid infestations are one of the primary causes of extensive damage to wheat and sorghum fields and are one of the most common vectors for plant viruses, resulting in significant agricultural yield losses. To address this problem, farmers often resort to the inefficient use of harmful chemical pesticides that have negative health and environmental impacts. As a result, a large amount of pesticide is wasted on areas without significant pest infestation. This brings to attention the urgent need for an intelligent autonomous system that can locate and spray sufficiently large infestations selectively within the complex crop canopies. We have developed a large multi-scale dataset for aphid cluster detection and segmentation, collected from actual sorghum fields and meticulously annotated to include clusters of aphids. Our dataset comprises a total of 54,742 image patches, showcasing a variety of viewpoints, diverse lighting conditions, and multiple scales, highlighting its effectiveness for real-world applications. In this study, we trained and evaluated four real-time semantic segmentation models and three object detection models specifically for aphid cluster segmentation and detection. Considering the balance between accuracy and efficiency, Fast-SCNN delivered the most effective segmentation results, achieving 80.46% mean precision, 81.21% mean recall, and 91.66 frames per second (FPS). For object detection, RT-DETR exhibited the best overall performance with a 61.63% mean average precision (mAP), 92.6% mean recall, and 72.55 FPS on an NVIDIA V100 GPU. Our experiments further indicate that aphid cluster segmentation is more suitable for assessing aphid infestations than using detection models.

(This article belongs to the Section Computer Vision and Pattern Recognition)

18 pages, 17544 KiB

Open Access Review

Review of Image Quality Assessment Methods for Compressed Images

by Sonain Jamil

J. Imaging 2024, 10(5), 113; https://doi.org/10.3390/jimaging10050113 - 8 May 2024

Abstract

The compression of images for efficient storage and transmission is crucial in handling large data volumes. Lossy image compression reduces storage needs but introduces perceptible distortions affected by content, compression levels, and display environments. Each compression method generates specific visual anomalies like blocking, blurring, or color shifts. Standardizing efficient lossy compression necessitates evaluating perceptual quality. Objective measurements offer speed and cost efficiency, while subjective assessments, despite their cost and time implications, remain the gold standard. This paper delves into essential research queries to achieve visually lossless images. The paper describes the influence of compression on image quality, appropriate objective image quality metrics (IQMs), and the effectiveness of subjective assessment methods. It also provides an overview of the existing literature, surveys, and subjective and objective image quality assessment (IQA) methods. Our aim is to offer insights, identify challenges in existing methodologies, and assist researchers in selecting the most effective assessment approach for their needs.

(This article belongs to the Section Image and Video Processing)

25 pages, 10696 KiB

Open Access Article

Day-to-Night Street View Image Generation for 24-Hour Urban Scene Auditing Using Generative AI

by Zhiyi Liu, Tingting Li, Tianyi Ren, Da Chen, Wenjing Li and Waishan Qiu

J. Imaging 2024, 10(5), 112; https://doi.org/10.3390/jimaging10050112 - 7 May 2024

Abstract

A smarter city should be a safer city. Nighttime safety in metropolitan areas has long been a global concern, particularly for large cities with diverse demographics and intricate urban forms, whose citizens are often threatened by higher street-level crime rates. However, due to the lack of night-time urban appearance data, prior studies based on street view imagery (SVI) rarely addressed the perceived night-time safety issue, which can generate important implications for crime prevention. This study hypothesizes that night-time SVI can be effectively generated from widely existing daytime SVIs using generative AI (GenAI). To test the hypothesis, this study first collects pairwise day-and-night SVIs across four cities with divergent urban landscapes to construct a comprehensive day-and-night SVI dataset. It then trains and validates a day-to-night (D2N) model with fine-tuned brightness adjustment, effectively transforming daytime SVIs to nighttime ones for distinct urban forms tailored for urban scene perception studies. Our findings indicate that: (1) the performance of D2N transformation varies significantly by urban-scape variations related to urban density; (2) the proportions of building and sky views are important determinants of transformation accuracy; (3) among prevailing models, CycleGAN maintains the consistency of D2N scene conversion but requires abundant data, Pix2Pix achieves considerable accuracy when pairwise day-and-night SVIs are available and is sensitive to data quality, and StableDiffusion yields high-quality images with expensive training costs. Therefore, CycleGAN is most effective in balancing the accuracy, data requirement, and cost. This study contributes to urban scene studies by constructing a first-of-its-kind D2N dataset consisting of pairwise day-and-night SVIs across various urban forms. The D2N generator will provide a cornerstone for future urban studies that heavily utilize SVIs to audit urban environments.

(This article belongs to the Special Issue Visual Localization—Volume II)

17 pages, 1003 KiB

Open Access Article

Autoencoder-Based Unsupervised Surface Defect Detection Using Two-Stage Training

by Tesfaye Getachew Shiferaw and Li Yao

J. Imaging 2024, 10(5), 111; https://doi.org/10.3390/jimaging10050111 - 5 May 2024

Abstract

Accurately detecting defects while reconstructing a high-quality normal background in surface defect detection using unsupervised methods remains a significant challenge. This study proposes an unsupervised method that effectively addresses this challenge by achieving both accurate defect detection and a high-quality normal background reconstruction without noise. We propose an adaptive weighted structural similarity (AW-SSIM) loss for focused feature learning. AW-SSIM improves structural similarity (SSIM) loss by assigning different weights to its sub-functions of luminance, contrast, and structure based on their relative importance for a specific training sample. Moreover, it dynamically adjusts the Gaussian window’s standard deviation (σ) during loss calculation to balance noise reduction and detail preservation. An artificial defect generation algorithm (ADGA) is proposed to generate artificial defects closely resembling real ones. We use a two-stage training strategy. In the first stage, the model trains only on normal samples using AW-SSIM loss, allowing it to learn robust representations of normal features. In the second stage of training, the weights obtained from the first stage are used to train the model on both normal and artificially defective training samples. Additionally, the second stage employs a combined learned Perceptual Image Patch Similarity (LPIPS) and AW-SSIM loss. The combined loss helps the model achieve high-quality normal background reconstruction while maintaining accurate defect detection. Extensive experimental results demonstrate that our proposed method achieves state-of-the-art defect detection accuracy. The proposed method achieved an average area under the receiver operating characteristic curve (AuROC) of 97.69% on six samples from the MVTec anomaly detection dataset.

(This article belongs to the Section Computer Vision and Pattern Recognition)
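
To make the three SSIM sub-functions concrete, the sketch below computes luminance, contrast, and structure for two flattened patches and combines them with exponent weights, following the standard generalized SSIM form. It is not the authors' AW-SSIM: the weights here are fixed arguments rather than adapted per sample, statistics are global rather than Gaussian-windowed, and the structure term is clamped at zero (a choice made here so that fractional exponents stay real-valued). The constants assume 8-bit pixel values.

```python
from statistics import mean

# Standard SSIM stabilizing constants for a dynamic range of 255.
C1 = (0.01 * 255) ** 2
C2 = (0.03 * 255) ** 2
C3 = C2 / 2


def ssim_components(x, y):
    """Luminance, contrast, and structure terms for two flat patches."""
    mu_x, mu_y = mean(x), mean(y)
    var_x = mean([(v - mu_x) ** 2 for v in x])
    var_y = mean([(v - mu_y) ** 2 for v in y])
    cov = mean([(a - mu_x) * (b - mu_y) for a, b in zip(x, y)])
    sd_x, sd_y = var_x ** 0.5, var_y ** 0.5
    luminance = (2 * mu_x * mu_y + C1) / (mu_x ** 2 + mu_y ** 2 + C1)
    contrast = (2 * sd_x * sd_y + C2) / (var_x + var_y + C2)
    structure = (cov + C3) / (sd_x * sd_y + C3)
    return luminance, contrast, structure


def weighted_ssim_loss(x, y, alpha=1.0, beta=1.0, gamma=1.0):
    """1 - l^alpha * c^beta * s^gamma, with s clamped to be non-negative."""
    luminance, contrast, structure = ssim_components(x, y)
    ssim = (luminance ** alpha) * (contrast ** beta) * (max(structure, 0.0) ** gamma)
    return 1.0 - ssim
```

Raising one exponent makes the loss more sensitive to that term; for example, a larger `alpha` penalizes brightness shifts more strongly while leaving purely structural differences weighted as before.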

18 pages, 5383 KiB

Open Access Article

Reliable Out-of-Distribution Recognition of Synthetic Images

by Anatol Maier and Christian Riess

J. Imaging 2024, 10(5), 110; https://doi.org/10.3390/jimaging10050110 - 1 May 2024

Abstract

Generative adversarial networks (GANs) and diffusion models (DMs) have revolutionized the creation of synthetically generated but realistic-looking images. Distinguishing such generated images from real camera captures is one of the key tasks in current multimedia forensics research. One particular challenge is the generalization to unseen generators or post-processing. This can be viewed as an issue of handling out-of-distribution inputs. Forensic detectors can be hardened by the extensive augmentation of the training data or specifically tailored networks. Nevertheless, such precautions only manage but do not remove the risk of prediction failures on inputs that look reasonable to an analyst but in fact are out of the training distribution of the network. With this work, we aim to close this gap with a Bayesian Neural Network (BNN) that provides an additional uncertainty measure to warn an analyst of difficult decisions. More specifically, the BNN learns the task at hand and also detects potential confusion between post-processing and image generator artifacts. Our experiments show that the BNN achieves on-par performance with the state-of-the-art detectors while producing more reliable predictions on out-of-distribution examples.

(This article belongs to the Special Issue Robust Deep Learning Techniques for Multimedia Forensics and Security)

25 pages, 1448 KiB

Open AccessArticle

Skin Tone Estimation under Diverse Lighting Conditions

by Success K. Mbatha, Marthinus J. Booysen and Rensu P. Theart

J. Imaging 2024, 10(5), 109; https://doi.org/10.3390/jimaging10050109 - 30 Apr 2024

Abstract

Knowledge of a person’s level of skin pigmentation, or so-called “skin tone”, has proven to be an important building block in improving the performance and fairness of various applications that rely on computer vision. These include medical diagnosis of skin conditions, cosmetic and skincare support, and face recognition, especially for darker skin tones. However, the perception of skin tone, whether by the human eye or by an optoelectronic sensor, uses the reflection of light from the skin. The source of this light, or illumination, affects the skin tone that is perceived. This study aims to refine and assess a convolutional neural network-based skin tone estimation model that provides consistent accuracy across different skin tones under various lighting conditions. The 10-point Monk Skin Tone Scale was used to represent the skin tone spectrum. A dataset of 21,375 images was captured from volunteers across the pigmentation spectrum. Experimental results show that a regression model outperforms other models, with an estimated-to-target distance of 0.5. Using a threshold estimated-to-target skin tone distance of 2 for all lights results in average accuracy values of 85.45% and 97.16%. With the Monk Skin Tone Scale segmented into three groups, the lighter exhibits strong accuracy, the middle displays lower accuracy, and the dark falls between the two. The overall skin tone estimation achieves average error distances in the LAB space of 16.40 ± 20.62. Full article

(This article belongs to the Section Image and Video Processing)

21 pages, 12872 KiB

Open AccessArticle

Optimizing Vision Transformers for Histopathology: Pretraining and Normalization in Breast Cancer Classification

by Giulia Lucrezia Baroni, Laura Rasotto, Kevin Roitero, Angelica Tulisso, Carla Di Loreto and Vincenzo Della Mea

J. Imaging 2024, 10(5), 108; https://doi.org/10.3390/jimaging10050108 - 30 Apr 2024

Abstract

This paper introduces a self-attention Vision Transformer model specifically developed for classifying breast cancer in histology images. We examine various training strategies and configurations, including pretraining, dimension resizing, data augmentation and color normalization strategies, patch overlap, and patch size configurations, in order to evaluate their impact on the effectiveness of the histology image classification. Additionally, we provide evidence for the increase in effectiveness gathered through geometric and color data augmentation techniques. We primarily utilize the BACH dataset to train and validate our methods and models, but we also test them on two additional datasets, BRACS and AIDPATH, to verify their generalization capabilities. Our model, developed from a transformer pretrained on ImageNet, achieves an accuracy rate of 0.91 on the BACH dataset, 0.74 on the BRACS dataset, and 0.92 on the AIDPATH dataset. Using a model based on the prostate small and prostate medium HistoEncoder models, we achieve accuracy rates of 0.89 and 0.86, respectively. Our results suggest that pretraining on large-scale general datasets like ImageNet is advantageous. We also show the potential benefits of using domain-specific pretraining datasets, such as extensive histopathological image collections as in HistoEncoder, though not yet with clear advantages. Full article

(This article belongs to the Special Issue Advances in Biomedical Image Processing and Artificial Intelligence for Computer-Aided Diagnosis in Medicine)

17 pages, 3251 KiB

Open AccessArticle

Artificial Intelligence, Intrapartum Ultrasound and Dystocic Delivery: AIDA (Artificial Intelligence Dystocia Algorithm), a Promising Helping Decision Support System

by Antonio Malvasi, Lorenzo E. Malgieri, Ettore Cicinelli, Antonella Vimercati, Antonio D’Amato, Miriam Dellino, Giuseppe Trojano, Tommaso Difonzo, Renata Beck and Andrea Tinelli

J. Imaging 2024, 10(5), 107; https://doi.org/10.3390/jimaging10050107 - 29 Apr 2024

Abstract

The position of the fetal head during engagement and progression in the birth canal is the primary cause of dystocic labor and arrest of progression, often due to malposition and malrotation. The authors performed an investigation on pregnant women in labor, who all underwent vaginal digital examination by obstetricians and midwives as well as intrapartum ultrasonography to collect four “geometric parameters”, measured in all the women. All parameters were measured using artificial intelligence and machine learning algorithms, called AIDA (artificial intelligence dystocia algorithm), which incorporates a human-in-the-loop approach, that is, to use AI (artificial intelligence) algorithms that prioritize the physician’s decision and explainable artificial intelligence (XAI). The AIDA was structured into five classes. After a number of “geometric parameters” were collected, the data obtained from the AIDA analysis were entered into a red, yellow, or green zone, linked to the analysis of the progress of labor. Using the AIDA analysis, we were able to identify five reference classes for patients in labor, each of which had a certain sort of birth outcome. A 100% cesarean birth prediction was made in two of these five classes. The use of artificial intelligence, through the evaluation of certain obstetric parameters in specific decision-making algorithms, allows physicians to systematically understand how the results of the algorithms can be explained. This approach can be useful in evaluating the progress of labor and predicting the labor outcome: whether delivery will be spontaneous, whether operative VD (vaginal delivery) should be attempted, or whether ICD (intrapartum cesarean delivery) is preferable or necessary. Full article

(This article belongs to the Special Issue Clinical and Pathological Imaging in the Era of Artificial Intelligence: New Insights and Perspectives)

14 pages, 5855 KiB

Open AccessArticle

Rock Slope Stability Analysis Using Terrestrial Photogrammetry and Virtual Reality on Ignimbritic Deposits

by Tania Peralta, Melanie Menoscal, Gianella Bravo, Victoria Rosado, Valeria Vaca, Diego Capa, Maurizio Mulas and Luis Jordá-Bordehore

J. Imaging 2024, 10(5), 106; https://doi.org/10.3390/jimaging10050106 - 28 Apr 2024

Abstract

Puerto de Cajas serves as a vital high-altitude passage in Ecuador, connecting the coastal region to the city of Cuenca. The stability of this rocky massif is carefully managed through the assessment of blocks and discontinuities, ensuring safe travel. This study presents a novel approach, employing rapid and cost-effective methods to evaluate an unexplored area within the protected expanse of Cajas. Using terrestrial photogrammetry and strategically positioned geomechanical stations along the slopes, we generated a detailed point cloud capturing elusive terrain features. We have used terrestrial photogrammetry for digitalization of the slope. Validation of the collected data was achieved by comparing directional data from Cloud Compare software with manual readings using a digital compass integrated in a phone at control points. The analysis encompasses three slopes, employing the SMR, Q-slope, and kinematic methodologies. Results from the SMR system closely align with kinematic analysis, indicating satisfactory slope quality. Nonetheless, continued vigilance in stability control remains imperative for ensuring road safety and preserving the site’s integrity. Moreover, this research lays the groundwork for the creation of a publicly accessible 3D repository, enhancing visualization capabilities through Google Virtual Reality. This initiative not only aids in replicating the findings but also facilitates access to an augmented reality environment, thereby fostering collaborative research endeavors. Full article

(This article belongs to the Special Issue Exploring Challenges and Innovations in 3D Point Cloud Processing)

25 pages, 9712 KiB

Open AccessArticle

Comparative Analysis of Color Space and Channel, Detector, and Descriptor for Feature-Based Image Registration

by Wenan Yuan, Sai Raghavendra Prasad Poosa and Rutger Francisco Dirks

J. Imaging 2024, 10(5), 105; https://doi.org/10.3390/jimaging10050105 - 28 Apr 2024

Abstract

The current study aimed to quantify the value of color spaces and channels as a potential superior replacement for standard grayscale images, as well as the relative performance of open-source detectors and descriptors for general feature-based image registration purposes, based on a large benchmark dataset. The public dataset UDIS-D, with 1106 diverse image pairs, was selected. In total, 21 color spaces or channels including RGB, XYZ, Y′CrCb, HLS, L*a*b* and their corresponding channels in addition to grayscale, nine feature detectors including AKAZE, BRISK, CSE, FAST, HL, KAZE, ORB, SIFT, and TBMR, and 11 feature descriptors including AKAZE, BB, BRIEF, BRISK, DAISY, FREAK, KAZE, LATCH, ORB, SIFT, and VGG were evaluated according to reprojection error (RE), root mean square error (RMSE), structural similarity index measure (SSIM), registration failure rate, and feature number, based on 1,950,984 image registrations. No meaningful benefits from color space or channel were observed, although XYZ, RGB color space and L* color channel were able to outperform grayscale by a very minor margin. Per the dataset, the best-performing color space or channel, detector, and descriptor were XYZ/RGB, SIFT/FAST, and AKAZE. The most robust color space or channel, detector, and descriptor were L*a*b*, TBMR, and VGG. The color channel, detector, and descriptor with the most initial detector features and final homography features were Z/L*, FAST, and KAZE. In terms of the best overall unfailing combinations, XYZ/RGB+SIFT/FAST+VGG/SIFT seemed to provide the highest image registration quality, while Z+FAST+VGG provided the most image features. Full article

(This article belongs to the Special Issue Image Processing and Computer Vision: Algorithms and Applications)

News

17 May 2024
Tu Youyou Award—Open for Nominations

16 May 2024
MDPI Romania Author Training Academic Events in April

9 May 2024
Meet Us at the 34th European Symposium on Computer Aided Process Engineering and the 15th International Symposium on Process Systems Engineering (ESCAPE34-PSE24), 2–6 June 2024, Florence, Italy

Topics

Topic in Algorithms, Diagnostics, Entropy, Information, J. Imaging

Application of Machine Learning in Molecular Imaging
Topic Editors: Allegra Conti, Nicola Toschi, Marianna Inglese, Andrea Duggento, Matthew Grech-Sollars, Serena Monti, Giancarlo Sportelli, Pietro Carra
Deadline: 31 May 2024

Topic in Applied Sciences, Computation, Entropy, J. Imaging

Color Image Processing: Models and Methods (CIP: MM)
Topic Editors: Giuliana Ramella, Isabella Torcicollo
Deadline: 30 July 2024

Topic in Applied Sciences, Sensors, J. Imaging, MAKE

Applications in Image Analysis and Pattern Recognition
Topic Editors: Bin Fan, Wenqi Ren
Deadline: 31 August 2024

Topic in Applied Sciences, Electronics, J. Imaging, MAKE, Remote Sensing

Computational Intelligence in Remote Sensing: 2nd Edition
Topic Editors: Yue Wu, Kai Qin, Maoguo Gong, Qiguang Miao
Deadline: 31 December 2024