J. Imaging, Volume 10, Issue 12 (December 2024) – 37 articles

  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • You may sign up for e-mail alerts to receive the table of contents of newly released issues.
  • PDF is the official format for papers published in both HTML and PDF forms. To view a paper in PDF format, click on its "PDF Full-text" link and open it with the free Adobe Reader.
26 pages, 4760 KiB  
Article
Explainable AI-Based Skin Cancer Detection Using CNN, Particle Swarm Optimization and Machine Learning
by Syed Adil Hussain Shah, Syed Taimoor Hussain Shah, Roa’a Khaled, Andrea Buccoliero, Syed Baqir Hussain Shah, Angelo Di Terlizzi, Giacomo Di Benedetto and Marco Agostino Deriu
J. Imaging 2024, 10(12), 332; https://doi.org/10.3390/jimaging10120332 - 22 Dec 2024
Abstract
Skin cancer is among the most prevalent cancers globally, emphasizing the need for early detection and accurate diagnosis to improve outcomes. Traditional diagnostic methods, based on visual examination, are subjective, time-intensive, and require specialized expertise. Current artificial intelligence (AI) approaches for skin cancer detection face challenges such as computational inefficiency, lack of interpretability, and reliance on standalone CNN architectures. To address these limitations, this study proposes a comprehensive pipeline combining transfer learning, feature selection, and machine-learning algorithms to improve detection accuracy. Multiple pretrained CNN models were evaluated, with Xception emerging as the optimal choice for its balance of computational efficiency and performance. An ablation study further validated the effectiveness of freezing task-specific layers within the Xception architecture. Feature dimensionality was optimized using Particle Swarm Optimization, reducing dimensions from 1024 to 508, significantly enhancing computational efficiency. Machine-learning classifiers, including Subspace KNN and Medium Gaussian SVM, further improved classification accuracy. Evaluated on the ISIC 2018 and HAM10000 datasets, the proposed pipeline achieved impressive accuracies of 98.5% and 86.1%, respectively. Moreover, Explainable-AI (XAI) techniques, such as Grad-CAM, LIME, and Occlusion Sensitivity, enhanced interpretability. This approach provides a robust, efficient, and interpretable solution for automated skin cancer diagnosis in clinical applications. Full article
(This article belongs to the Special Issue Deep Learning in Image Analysis: Progress and Challenges)
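The PSO feature-selection step described above lends itself to a compact sketch. The following is a minimal binary particle swarm over feature masks, with random data standing in for the 1024-dimensional Xception features and a KNN cross-validation score as the fitness; swarm size, iterations, and coefficients are illustrative, not the authors' settings.

```python
# Minimal binary-PSO feature-selection sketch. Random data stands in for the
# paper's 1024-D Xception features; all hyperparameters are illustrative.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1024))      # stand-in CNN features
y = rng.integers(0, 2, size=200)      # stand-in labels

def fitness(mask):
    if not mask.any():
        return 0.0
    clf = KNeighborsClassifier(n_neighbors=5)
    return cross_val_score(clf, X[:, mask], y, cv=3).mean()

n_particles, n_iters, dim = 10, 5, X.shape[1]
pos = rng.random((n_particles, dim))  # continuous positions, thresholded to masks
vel = np.zeros_like(pos)
pbest, pbest_fit = pos.copy(), np.full(n_particles, -np.inf)
gbest, gbest_fit = pos[0].copy(), -np.inf

for _ in range(n_iters):
    for i in range(n_particles):
        f = fitness(pos[i] > 0.5)
        if f > pbest_fit[i]:
            pbest_fit[i], pbest[i] = f, pos[i].copy()
        if f > gbest_fit:
            gbest_fit, gbest = f, pos[i].copy()
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, 0.0, 1.0)

print(f"kept {int((gbest > 0.5).sum())}/{dim} features, CV accuracy {gbest_fit:.3f}")
```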
13 pages, 6518 KiB  
Article
Towards Robust Supervised Pectoral Muscle Segmentation in Mammography Images
by Parvaneh Aliniya, Mircea Nicolescu, Monica Nicolescu and George Bebis
J. Imaging 2024, 10(12), 331; https://doi.org/10.3390/jimaging10120331 - 22 Dec 2024
Abstract
Mammography images are the most commonly used tool for breast cancer screening. The presence of pectoral muscle in images for the mediolateral oblique view makes designing a robust automated breast cancer detection system more challenging. Most of the current methods for removing the pectoral muscle are based on traditional machine learning approaches. This is partly due to the lack of segmentation masks of pectoral muscle in available datasets. In this paper, we provide the segmentation masks of the pectoral muscle for the INbreast, MIAS, and CBIS-DDSM datasets, which will enable the development of supervised methods and the utilization of deep learning. Training deep learning-based models using segmentation masks will also be a powerful tool for removing pectoral muscle for unseen data. To test the validity of this idea, we trained AU-Net separately on the INbreast and CBIS-DDSM for the segmentation of the pectoral muscle. We used cross-dataset testing to evaluate the performance of the models on an unseen dataset. In addition, the models were tested on all of the images in the MIAS dataset. The experimental results show that cross-dataset testing achieves a comparable performance to the same-dataset experiments. Full article
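A natural companion to the cross-dataset protocol above is an overlap metric between predicted and ground-truth masks. The snippet below computes the standard Dice coefficient on synthetic stand-in masks; the metric choice is an assumption for illustration, not necessarily the paper's exact evaluation code.

```python
# Hedged sketch of a Dice evaluation on synthetic stand-in masks.
import numpy as np

def dice(pred, gt):
    """Dice similarity between two binary masks."""
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum() + 1e-8)

gt = np.zeros((256, 256), dtype=bool); gt[:80, :80] = True
pred = np.zeros_like(gt); pred[:80, 5:85] = True   # slightly shifted prediction
print(round(dice(pred, gt), 3))                    # overlap in [0, 1]
```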
18 pages, 36094 KiB  
Article
Arbitrary Optics for Gaussian Splatting Using Space Warping
by Jakob Nazarenus, Simin Kou, Fang-Lue Zhang and Reinhard Koch
J. Imaging 2024, 10(12), 330; https://doi.org/10.3390/jimaging10120330 - 22 Dec 2024
Abstract
Due to recent advances in 3D reconstruction from RGB images, it is now possible to create photorealistic representations of real-world scenes that only require minutes to be reconstructed and can be rendered in real time. In particular, 3D Gaussian splatting shows promising results, outperforming preceding reconstruction methods while simultaneously reducing the overall computational requirements. The main success of 3D Gaussian splatting relies on the efficient use of a differentiable rasterizer to render the Gaussian scene representation. One major drawback of this method is its underlying pinhole camera model. In this paper, we propose an extension of the existing method that removes this constraint and enables scene reconstructions using arbitrary camera optics such as highly distorting fisheye lenses. Our method achieves this by applying a differentiable warping function to the Gaussian scene representation. Additionally, we reduce overfitting in outdoor scenes by utilizing a learnable skybox, reducing the presence of floating artifacts within the reconstructed scene. Based on synthetic and real-world image datasets, we show that our method is capable of creating an accurate scene reconstruction from highly distorted images and rendering photorealistic images from such reconstructions. Full article
(This article belongs to the Special Issue Geometry Reconstruction from Images (2nd Edition))
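The warping idea can be illustrated with a differentiable projection. Below is one possible equidistant fisheye mapping written with autograd-friendly PyTorch ops, so gradients flow through the warp as the method requires; it is a sketch under that lens-model assumption, not the authors' exact warping function.

```python
# Hedged sketch of a differentiable equidistant-fisheye projection in PyTorch;
# the paper's exact warping function is not reproduced here.
import torch

def fisheye_project(xyz, fx, fy, cx, cy):
    """Map camera-frame 3D points (N, 3) to equidistant-fisheye pixels (N, 2)."""
    r = torch.linalg.norm(xyz[:, :2], dim=1).clamp_min(1e-9)
    theta = torch.atan2(r, xyz[:, 2])      # angle from the optical axis
    scale = theta / r                      # equidistant model: radius grows with theta
    u = fx * xyz[:, 0] * scale + cx
    v = fy * xyz[:, 1] * scale + cy
    return torch.stack([u, v], dim=1)

pts = torch.randn(8, 3, requires_grad=True)
uv = fisheye_project(pts, 300.0, 300.0, 512.0, 512.0)
uv.sum().backward()                        # gradients flow through the warp
```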
23 pages, 3194 KiB  
Article
The Use of Hybrid CNN-RNN Deep Learning Models to Discriminate Tumor Tissue in Dynamic Breast Thermography
by Andrés Munguía-Siu, Irene Vergara and Juan Horacio Espinoza-Rodríguez
J. Imaging 2024, 10(12), 329; https://doi.org/10.3390/jimaging10120329 - 21 Dec 2024
Abstract
Breast cancer is one of the leading causes of death for women worldwide, and early detection can help reduce the death rate. Infrared thermography has gained popularity as a non-invasive and rapid method for detecting this pathology and can be further enhanced by applying neural networks to extract spatial and even temporal data derived from breast thermographic images if they are acquired sequentially. In this study, we evaluated hybrid convolutional-recurrent neural network (CNN-RNN) models based on five state-of-the-art pre-trained CNN architectures coupled with three RNNs to discern tumor abnormalities in dynamic breast thermographic images. The hybrid architecture that achieved the best performance for detecting breast cancer was VGG16-LSTM, which showed accuracy (ACC), sensitivity (SENS), and specificity (SPEC) of 95.72%, 92.76%, and 98.68%, respectively, with a CPU runtime of 3.9 s. However, the hybrid architecture that showed the fastest CPU runtime was AlexNet-RNN with 0.61 s, although with lower performance (ACC: 80.59%, SENS: 68.52%, SPEC: 92.76%), but still superior to AlexNet (ACC: 69.41%, SENS: 52.63%, SPEC: 86.18%) with 0.44 s. Our findings show that hybrid CNN-RNN models outperform stand-alone CNN models, indicating that temporal data recovery from dynamic breast thermographs is possible without significantly compromising classifier runtime. Full article
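The hybrid design reads as a per-frame CNN encoder feeding a recurrent head. The sketch below wires a VGG16 feature extractor into an LSTM in PyTorch, assuming sequences of 224×224 RGB frames; layer sizes are illustrative, and the pretrained weights used in the study are omitted for brevity.

```python
# Hedged sketch of a VGG16-LSTM hybrid; layer sizes are illustrative.
import torch
import torch.nn as nn
from torchvision.models import vgg16

class CnnRnn(nn.Module):
    def __init__(self, hidden=128, n_classes=2):
        super().__init__()
        self.cnn = vgg16(weights=None).features  # load pretrained weights in practice
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.lstm = nn.LSTM(512, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                        # x: (batch, time, 3, 224, 224)
        b, t = x.shape[:2]
        feats = self.pool(self.cnn(x.flatten(0, 1))).flatten(1)  # (b*t, 512)
        out, _ = self.lstm(feats.view(b, t, -1))
        return self.head(out[:, -1])             # classify from the last time step

logits = CnnRnn()(torch.randn(2, 5, 3, 224, 224))
print(logits.shape)                              # torch.Size([2, 2])
```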
15 pages, 11038 KiB  
Article
X-Ray Image-Based Real-Time COVID-19 Diagnosis Using Deep Neural Networks (CXR-DNNs)
by Ali Yousuf Khan, Miguel-Angel Luque-Nieto, Muhammad Imran Saleem and Enrique Nava-Baro
J. Imaging 2024, 10(12), 328; https://doi.org/10.3390/jimaging10120328 - 19 Dec 2024
Abstract
On 11 March 2020, the outbreak of COVID-19, a coronavirus illness, was declared a global pandemic. Since then, nearly seven million people have died and over 765 million confirmed cases of COVID-19 have been reported. The goal of this study is to develop a diagnostic tool for detecting COVID-19 infections more efficiently. Currently, the most widely used method is Reverse Transcription Polymerase Chain Reaction (RT-PCR), a clinical technique for infection identification. However, RT-PCR is expensive, has limited sensitivity, and requires specialized medical expertise. One of the major challenges in the rapid diagnosis of COVID-19 is the need for reliable imaging, particularly X-ray imaging. This work takes advantage of artificial intelligence (AI) techniques to enhance diagnostic accuracy by automating the detection of COVID-19 infections from chest X-ray (CXR) images. We obtained and analyzed CXR images from the Kaggle public database (4035 images in total), including cases of COVID-19, viral pneumonia, pulmonary opacity, and healthy controls. By integrating advanced techniques with transfer learning from pre-trained convolutional neural networks (CNNs), specifically InceptionV3, ResNet50, and Xception, we achieved an accuracy of 95%, significantly higher than the 85.5% achieved with ResNet50 alone. Additionally, our proposed method, CXR-DNNs, can accurately distinguish between three different types of chest X-ray images for the first time. This computer-assisted diagnostic tool has the potential to significantly enhance the speed and accuracy of COVID-19 diagnoses. Full article
(This article belongs to the Section Medical Imaging)
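One simple reading of the multi-backbone integration is prediction-level soft voting. The sketch below averages class probabilities from two torchvision backbones over the abstract's four categories; Xception is not in torchvision and would come from a library such as timm, and the averaging rule itself is an assumption rather than the paper's documented fusion scheme.

```python
# Hedged sketch: soft voting over two torchvision backbones (Xception would
# come from e.g. timm); four classes mirror the abstract's categories.
import torch
import torch.nn as nn
from torchvision.models import resnet50, inception_v3

n_classes = 4
m1 = resnet50(weights=None)                 # pretrained weights in practice
m1.fc = nn.Linear(m1.fc.in_features, n_classes)
m2 = inception_v3(weights=None)
m2.fc = nn.Linear(m2.fc.in_features, n_classes)

x = torch.randn(2, 3, 299, 299)             # InceptionV3 expects 299x299 inputs
m1.eval(); m2.eval()
with torch.no_grad():
    probs = (torch.softmax(m1(x), 1) + torch.softmax(m2(x), 1)) / 2
pred = probs.argmax(1)                      # fused class per image
```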
19 pages, 2563 KiB  
Article
Optimization of Cocoa Pods Maturity Classification Using Stacking and Voting with Ensemble Learning Methods in RGB and LAB Spaces
by Kacoutchy Jean Ayikpa, Abou Bakary Ballo, Diarra Mamadou and Pierre Gouton
J. Imaging 2024, 10(12), 327; https://doi.org/10.3390/jimaging10120327 - 18 Dec 2024
Abstract
Determining the maturity of cocoa pods early is not just about guaranteeing harvest quality and optimizing yield. It is also about efficient resource management. Rapid identification of the stage of maturity helps avoid losses linked to a premature or late harvest, improving productivity. Early determination of cocoa pod maturity ensures both the quality and quantity of the harvest, as immature or overripe pods cannot produce premium cocoa beans. Our innovative research harnesses artificial intelligence and computer vision technologies to revolutionize the cocoa industry, offering precise and advanced tools for accurately assessing cocoa pod maturity. Providing an objective and rapid assessment enables farmers to make informed decisions about the optimal time to harvest, helping to maximize the yield of their plantations. Furthermore, by automating this process, these technologies reduce the margins for human error and improve the management of agricultural resources. With this in mind, our study proposes to exploit a computer vision method based on the GLCM (gray level co-occurrence matrix) algorithm to extract the characteristics of images in the RGB (red, green, blue) and LAB (luminance, axis between red and green, axis between yellow and blue) color spaces. This approach allows for in-depth image analysis, which is essential for capturing the nuances of cocoa pod maturity. Next, we apply classification algorithms to identify the best performers. These algorithms are then combined via stacking and voting techniques, allowing our model to be optimized by taking advantage of the strengths of each method, thus guaranteeing more robust and precise results. The results demonstrated that the combination of algorithms produced superior performance, especially in the LAB color space, where voting scored 98.49% and stacking 98.71%. In comparison, in the RGB color space, voting scored 96.59% and stacking 97.06%. These results surpass those generally reported in the literature, showing the increased effectiveness of combined approaches in improving the accuracy of classification models. This highlights the importance of exploring ensemble techniques to maximize performance in complex contexts such as cocoa pod maturity classification. Full article
(This article belongs to the Special Issue Imaging Applications in Agriculture)
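The pipeline above maps directly onto standard tooling: GLCM texture descriptors feeding voting and stacking ensembles. In this hedged sketch, random grayscale patches stand in for cocoa pod images, the per-channel RGB/LAB handling is omitted, and the base estimators are illustrative choices rather than the study's selected classifiers.

```python
# Hedged sketch: GLCM texture features feeding voting/stacking ensembles.
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from sklearn.ensemble import RandomForestClassifier, StackingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

rng = np.random.default_rng(1)

def glcm_features(img):
    glcm = graycomatrix(img, distances=[1], angles=[0, np.pi / 2], levels=256)
    props = ["contrast", "homogeneity", "energy", "correlation"]
    return np.hstack([graycoprops(glcm, p).ravel() for p in props])

X = np.array([glcm_features(rng.integers(0, 256, (32, 32), dtype=np.uint8))
              for _ in range(60)])
y = rng.integers(0, 3, 60)                  # three maturity stages, illustrative

estimators = [("rf", RandomForestClassifier()), ("svm", SVC(probability=True))]
voting = VotingClassifier(estimators, voting="soft").fit(X, y)
stacking = StackingClassifier(estimators, final_estimator=LogisticRegression()).fit(X, y)
```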
38 pages, 3841 KiB  
Review
Computer Vision-Based Gait Recognition on the Edge: A Survey on Feature Representations, Models, and Architectures
by Edwin Salcedo
J. Imaging 2024, 10(12), 326; https://doi.org/10.3390/jimaging10120326 - 18 Dec 2024
Abstract
Computer vision-based gait recognition (CVGR) is a technology that has gained considerable attention in recent years due to its non-invasive, unobtrusive, and difficult-to-conceal nature. Beyond its applications in biometrics, CVGR holds significant potential for healthcare and human–computer interaction. Current CVGR systems often transmit collected data to a cloud server for machine learning-based gait pattern recognition. While effective, this cloud-centric approach can result in increased system response times. Alternatively, the emerging paradigm of edge computing, which involves moving computational processes to local devices, offers the potential to reduce latency, enable real-time surveillance, and eliminate reliance on internet connectivity. Furthermore, recent advancements in low-cost, compact microcomputers capable of handling complex inference tasks (e.g., Jetson Nano Orin, Jetson Xavier NX, and Khadas VIM4) have created exciting opportunities for deploying CVGR systems at the edge. This paper reports the state of the art in gait data acquisition modalities, feature representations, models, and architectures for CVGR systems suitable for edge computing. Additionally, this paper addresses the general limitations and highlights new avenues for future research in the promising intersection of CVGR and edge computing. Full article
(This article belongs to the Special Issue Image Processing and Computer Vision: Algorithms and Applications)
14 pages, 1855 KiB  
Article
Point-Cloud Instance Segmentation for Spinning Laser Sensors
by Alvaro Casado-Coscolla, Carlos Sanchez-Belenguer, Erik Wolfart and Vitor Sequeira
J. Imaging 2024, 10(12), 325; https://doi.org/10.3390/jimaging10120325 - 17 Dec 2024
Abstract
In this paper, we face the point-cloud segmentation problem for spinning laser sensors from a deep-learning (DL) perspective. Since the sensors natively provide their measurements in a 2D grid, we directly use state-of-the-art models designed for visual information for the segmentation task and then exploit the range information to ensure 3D accuracy. This allows us to effectively address the main challenges of applying DL techniques to point clouds, i.e., lack of structure and increased dimensionality. To the best of our knowledge, this is the first work that faces the 3D segmentation problem from a 2D perspective without explicitly re-projecting 3D point clouds. Moreover, our approach exploits multiple channels available in modern sensors, i.e., range, reflectivity, and ambient illumination. We also introduce a novel data-mining pipeline that enables the annotation of 3D scans without human intervention. Together with this paper, we present a new public dataset with all the data collected for training and evaluating our approach, where point clouds preserve their native sensor structure and where every single measurement contains range, reflectivity, and ambient information, together with its associated 3D point. As experimental results show, our approach achieves state-of-the-art results both in terms of performance and inference time. Additionally, we provide a novel ablation test that analyses the individual and combined contributions of the different channels provided by modern laser sensors. Full article
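The core trick is that a spinning sensor's native (rows × columns) grid already behaves like an image. The sketch below stacks range, reflectivity, and ambient channels for a 2D model and back-projects one pixel to 3D from assumed beam angles; grid size and calibration values are illustrative stand-ins.

```python
# Hedged sketch: sensor-native channels stacked as an image, plus a
# single-pixel back-projection; sizes and angles are illustrative.
import numpy as np

H, W = 64, 1024
rng = np.random.default_rng(2)
rmap = rng.uniform(1.0, 50.0, (H, W))       # range in meters
refl = rng.uniform(0.0, 1.0, (H, W))        # reflectivity
ambient = rng.uniform(0.0, 1.0, (H, W))     # ambient illumination
image = np.stack([rmap, refl, ambient])     # (3, H, W) input for a 2D model

# back-project one pixel to 3D via its (assumed) beam azimuth/elevation
az = np.linspace(-np.pi, np.pi, W)
el = np.linspace(np.deg2rad(-16), np.deg2rad(16), H)
i, j = 32, 500
r = rmap[i, j]
point = np.array([r * np.cos(el[i]) * np.cos(az[j]),
                  r * np.cos(el[i]) * np.sin(az[j]),
                  r * np.sin(el[i])])
```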
12 pages, 2922 KiB  
Article
Exploiting 2D Neural Network Frameworks for 3D Segmentation Through Depth Map Analytics of Harvested Wild Blueberries (Vaccinium angustifolium Ait.)
by Connor C. Mullins, Travis J. Esau, Qamar U. Zaman, Ahmad A. Al-Mallahi and Aitazaz A. Farooque
J. Imaging 2024, 10(12), 324; https://doi.org/10.3390/jimaging10120324 - 15 Dec 2024
Abstract
This study introduced a novel approach to 3D image segmentation utilizing a neural network framework applied to 2D depth map imagery, with Z axis values visualized through color gradation. This research involved comprehensive data collection from mechanically harvested wild blueberries to populate 3D and red–green–blue (RGB) images of filled totes through time-of-flight and RGB cameras, respectively. Advanced neural network models from the YOLOv8 and Detectron2 frameworks were assessed for their segmentation capabilities. Notably, the YOLOv8 models, particularly YOLOv8n-seg, demonstrated superior processing efficiency, with an average time of 18.10 ms, significantly faster than the Detectron2 models, which exceeded 57 ms, while maintaining high performance with a mean intersection over union (IoU) of 0.944 and a Matthews correlation coefficient (MCC) of 0.957. A qualitative comparison of segmentation masks indicated that the YOLO models produced smoother and more accurate object boundaries, whereas Detectron2 showed jagged edges and under-segmentation. Statistical analyses, including ANOVA and Tukey’s HSD test (α = 0.05), confirmed the superior segmentation performance of models on depth maps over RGB images (p < 0.001). This study concludes by recommending the YOLOv8n-seg model for real-time 3D segmentation in precision agriculture, providing insights that can enhance volume estimation, yield prediction, and resource management practices. Full article
(This article belongs to the Section Image and Video Processing)
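The depth-to-color encoding step can be sketched in a few lines: normalize Z values and push them through a color gradient so a 2D network sees a 3-channel image. The colormap below is an illustrative choice, not necessarily the one used in the study.

```python
# Hedged sketch: depth normalized and mapped through a colormap (illustrative
# choice) to a 3-channel image for an off-the-shelf 2D segmentation network.
import numpy as np
import matplotlib.cm as cm

depth = np.random.default_rng(3).uniform(0.4, 1.2, (480, 640))   # meters
norm = (depth - depth.min()) / (depth.max() - depth.min() + 1e-8)
rgb = (cm.viridis(norm)[..., :3] * 255).astype(np.uint8)         # (H, W, 3)
```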
13 pages, 3641 KiB  
Review
Current Role of CT Pulmonary Angiography in Pulmonary Embolism: A State-of-the-Art Review
by Ignacio Diaz-Lorenzo, Alberto Alonso-Burgos, Alfonsa Friera Reyes, Ruben Eduardo Pacios Blanco, Maria del Carmen de Benavides Bernaldo de Quiros and Guillermo Gallardo Madueño
J. Imaging 2024, 10(12), 323; https://doi.org/10.3390/jimaging10120323 - 15 Dec 2024
Abstract
The purpose of this study is to conduct a literature review on the current role of computed tomography pulmonary angiography (CTPA) in the diagnosis and prognosis of pulmonary embolism (PE). It addresses key topics such as the quantification of the thrombotic burden, its role as a predictor of mortality, new diagnostic techniques that are available, the possibility of analyzing the thrombus composition to differentiate its evolutionary stage, and the applicability of artificial intelligence (AI) in PE through CTPA. The only finding from CTPA that has been validated as a prognostic factor so far is the right ventricle/left ventricle (RV/LV) diameter ratio being >1, which is associated with a 2.5-fold higher risk of all-cause mortality or adverse events, and a 5-fold higher risk of PE-related mortality. The increasing use of techniques such as dual-energy computed tomography allows for the more accurate diagnosis of perfusion defects, which may go undetected in conventional computed tomography, identifying up to 92% of these defects compared to 78% being detected by CTPA. Additionally, it is essential to explore the latest advances in the application of AI to CTPA, which are currently expanding and have demonstrated a 23% improvement in the detection of subsegmental emboli compared to manual interpretation. With deep image analysis, up to a 95% accuracy has been achieved in predicting PE severity based on the thrombus volume and perfusion deficits. These advancements over the past 10 years significantly contribute to early intervention strategies and, therefore, to the improvement of morbidity and mortality outcomes for these patients. Full article
(This article belongs to the Special Issue Tools and Techniques for Improving Radiological Imaging Applications)
22 pages, 838 KiB  
Article
MediScan: A Framework of U-Health and Prognostic AI Assessment on Medical Imaging
by Sibtain Syed, Rehan Ahmed, Arshad Iqbal, Naveed Ahmad and Mohammed Ali Alshara
J. Imaging 2024, 10(12), 322; https://doi.org/10.3390/jimaging10120322 - 13 Dec 2024
Abstract
With technological advancements, remarkable progress has been made in the convergence of health sciences and Artificial Intelligence (AI). Modern health systems are proposed to ease patient diagnostics. However, the challenge is to provide AI-based precautions to patients and doctors for more accurate risk assessment. The proposed healthcare system aims to integrate patients, doctors, laboratories, pharmacies, and administrative personnel use cases and their primary functions onto a single platform. The proposed framework can also process microscopic images, CT scans, X-rays, and MRI to classify malignancy and give doctors a set of AI precautions for patient risk assessment. The proposed framework incorporates various DCNN models for identifying different forms of tumors and fractures in the human body, i.e., brain, bones, lungs, kidneys, and skin, and generates precautions with the help of a fine-tuned Large Language Model (LLM), i.e., Generative Pretrained Transformer 4 (GPT-4). With enough training data, DCNNs can learn highly representative, data-driven, hierarchical image features. The GPT-4 model is selected for generating precautions due to its explanation, reasoning, memory, and accuracy on prior medical assessments and research studies. Classification models are evaluated by a classification report (i.e., Recall, Precision, F1 Score, Support, Accuracy, and Macro and Weighted Average) and a confusion matrix, and have shown robust performance compared to conventional schemes. Full article
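The evaluation protocol named in the abstract corresponds directly to scikit-learn utilities, sketched here on stand-in labels.

```python
# Hedged sketch of the named evaluation tools on illustrative labels.
from sklearn.metrics import classification_report, confusion_matrix

y_true = [0, 1, 2, 1, 0, 2, 2, 1]   # illustrative ground-truth classes
y_pred = [0, 1, 1, 1, 0, 2, 2, 0]   # illustrative predictions
print(classification_report(y_true, y_pred))  # precision, recall, F1, support
print(confusion_matrix(y_true, y_pred))
```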
16 pages, 1289 KiB  
Article
DAT: Deep Learning-Based Acceleration-Aware Trajectory Forecasting
by Ali Asghar Sharifi, Ali Zoljodi and Masoud Daneshtalab
J. Imaging 2024, 10(12), 321; https://doi.org/10.3390/jimaging10120321 - 13 Dec 2024
Abstract
As the demand for autonomous driving (AD) systems has increased, the enhancement of their safety has become critically important. A fundamental capability of AD systems is object detection and trajectory forecasting of vehicles and pedestrians around the ego-vehicle, which is essential for preventing potential collisions. This study introduces the Deep learning-based Acceleration-aware Trajectory forecasting (DAT) model, a deep learning-based approach for object detection and trajectory forecasting, utilizing raw sensor measurements. DAT is an end-to-end model that processes sequential sensor data to detect objects and forecasts their future trajectories at each time step. The core innovation of DAT lies in its novel forecasting module, which leverages acceleration data to enhance trajectory forecasting, leading to the consideration of a variety of agent motion models. We propose a robust and innovative method for estimating ground-truth acceleration for objects, along with an object detector that predicts acceleration attributes for each detected object and a novel method for trajectory forecasting. DAT is trained and evaluated on the NuScenes dataset, demonstrating its empirical effectiveness through extensive experiments. The results indicate that DAT significantly surpasses state-of-the-art methods, particularly in enhancing forecasting accuracy for objects exhibiting both linear and nonlinear motion patterns, achieving up to a 2× improvement. This advancement highlights the critical role of incorporating acceleration data into predictive models, representing a substantial step forward in the development of safer autonomous driving systems. Full article
(This article belongs to the Topic Computer Vision and Image Processing, 2nd Edition)
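The role of acceleration can be made concrete with a finite-difference estimate over a position track, as sketched below; the paper's ground-truth estimator is described as more robust than this naive version.

```python
# Hedged sketch: naive finite-difference acceleration from a position track.
import numpy as np

t = np.linspace(0.0, 4.0, 9)                    # 0.5 s steps, illustrative
pos = np.stack([0.5 * 1.2 * t**2, 2.0 * t], 1)  # (T, 2) x/y positions
dt = t[1] - t[0]
vel = np.gradient(pos, dt, axis=0)              # central differences
acc = np.gradient(vel, dt, axis=0)
print(acc.mean(axis=0))                         # approximately [1.2, 0.0] m/s^2
```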
16 pages, 5125 KiB  
Article
Multi-Level Feature Fusion in CNN-Based Human Action Recognition: A Case Study on EfficientNet-B7
by Pitiwat Lueangwitchajaroen, Sitapa Watcharapinchai, Worawit Tepsan and Sorn Sooksatra
J. Imaging 2024, 10(12), 320; https://doi.org/10.3390/jimaging10120320 - 12 Dec 2024
Abstract
Accurate human action recognition is becoming increasingly important across various fields, including healthcare and self-driving cars. A simple approach to enhance model performance is incorporating additional data modalities, such as depth frames, point clouds, and skeleton information. While previous studies have predominantly used late fusion techniques to combine these modalities, our research introduces a multi-level fusion approach that combines information at the early, intermediate, and late stages. Furthermore, recognizing the challenges of collecting multiple data types in real-world applications, our approach exploits multimodal techniques while relying solely on RGB frames as the single data source. In our work, we used RGB frames from the NTU RGB+D dataset as the sole data source. From these frames, we extracted 2D skeleton coordinates and optical flow frames using pre-trained models. We evaluated our multi-level fusion approach with EfficientNet-B7 as a case study, and our methods demonstrated significant improvement, achieving 91.5% accuracy on the NTU RGB+D 60 dataset compared to single-modality and single-view models. Despite their simplicity, our methods are also comparable to other state-of-the-art approaches. Full article
(This article belongs to the Section Computer Vision and Pattern Recognition)
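The distinction between fusion stages can be sketched with two stand-in modalities (RGB and optical flow): early fusion concatenates raw inputs, late fusion averages per-branch logits, and intermediate fusion would merge hidden features in between. Shapes and layer sizes below are illustrative, not the EfficientNet-B7 configuration.

```python
# Hedged sketch of fusion stages with stand-in modalities; shapes illustrative.
import torch
import torch.nn as nn

rgb = torch.randn(2, 3, 64, 64)
flow = torch.randn(2, 3, 64, 64)

# early fusion: concatenate raw inputs channel-wise before any processing
early = nn.Conv2d(6, 8, 3)(torch.cat([rgb, flow], dim=1))

# late fusion: independent branches per modality, average their logits
def branch():
    return nn.Sequential(nn.Conv2d(3, 8, 3), nn.AdaptiveAvgPool2d(1),
                         nn.Flatten(), nn.Linear(8, 60))

late_logits = (branch()(rgb) + branch()(flow)) / 2   # (2, 60) class scores
```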
14 pages, 2304 KiB  
Article
Improved Generalizability in Medical Computer Vision: Hyperbolic Deep Learning in Multi-Modality Neuroimaging
by Cyrus Ayubcha, Sulaiman Sajed, Chady Omara, Anna B. Veldman, Shashi B. Singh, Yashas Ullas Lokesha, Alex Liu, Mohammad Ali Aziz-Sultan, Timothy R. Smith and Andrew Beam
J. Imaging 2024, 10(12), 319; https://doi.org/10.3390/jimaging10120319 - 12 Dec 2024
Abstract
Deep learning has shown significant value in automating radiological diagnostics but can be limited by a lack of generalizability to external datasets. Leveraging the geometric principles of non-Euclidean space, certain geometric deep learning approaches may offer an alternative means of improving model generalizability. This study investigates the potential advantages of hyperbolic convolutional neural networks (HCNNs) over traditional convolutional neural networks (CNNs) in neuroimaging tasks. We conducted a comparative analysis of HCNNs and CNNs across various medical imaging modalities and diseases, with a focus on a compiled multi-modality neuroimaging dataset. The models were assessed for their performance parity, robustness to adversarial attacks, semantic organization of embedding spaces, and generalizability. Zero-shot evaluations were also performed with ischemic stroke non-contrast CT images. HCNNs matched CNNs’ performance in less complex settings and demonstrated superior semantic organization and robustness to adversarial attacks. While HCNNs equaled CNNs in out-of-sample datasets identifying Alzheimer’s disease, in zero-shot evaluations, HCNNs outperformed CNNs and radiologists. HCNNs deliver enhanced robustness and organization in neuroimaging data. This likely underlies why, while HCNNs perform similarly to CNNs with respect to in-sample tasks, they confer improved generalizability. Nevertheless, HCNNs encounter efficiency and performance challenges with larger, complex datasets. These limitations underline the need for further optimization of HCNN architectures. HCNNs present promising improvements in generalizability and resilience for medical imaging applications, particularly in neuroimaging. Despite facing challenges with larger datasets, HCNNs enhance performance under adversarial conditions and offer better semantic organization, suggesting valuable potential in generalizable deep learning models in medical imaging and neuroimaging diagnostics. Full article
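The hyperbolic machinery behind HCNNs rests on a few closed-form operations on the Poincaré ball. The sketch below implements Möbius addition and the exponential map at the origin (curvature c = 1), the kind of primitives hyperbolic layers are built from; it is a generic illustration, not the authors' architecture.

```python
# Hedged sketch of Poincare-ball primitives (curvature c = 1).
import numpy as np

def mobius_add(x, y, c=1.0):
    xy, x2, y2 = np.dot(x, y), np.dot(x, x), np.dot(y, y)
    num = (1 + 2 * c * xy + c * y2) * x + (1 - c * x2) * y
    return num / (1 + 2 * c * xy + c**2 * x2 * y2)

def expmap0(v, c=1.0):
    n = np.linalg.norm(v)
    return np.tanh(np.sqrt(c) * n) * v / (np.sqrt(c) * n + 1e-15)

p = expmap0(np.array([0.3, -0.2]))
print(np.linalg.norm(mobius_add(p, p)))   # < 1: result stays inside the ball
```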
22 pages, 15973 KiB  
Article
Three-Dimensional Bone-Image Synthesis with Generative Adversarial Networks
by Christoph Angermann, Johannes Bereiter-Payr, Kerstin Stock, Gerald Degenhart and Markus Haltmeier
J. Imaging 2024, 10(12), 318; https://doi.org/10.3390/jimaging10120318 - 11 Dec 2024
Abstract
Medical image processing has been highlighted as an area where deep-learning-based models have the greatest potential. However, in the medical field, in particular, problems of data availability and privacy are hampering research progress and, thus, rapid implementation in clinical routine. The generation of synthetic data not only ensures privacy but also allows the drawing of new patients with specific characteristics, enabling the development of data-driven models on a much larger scale. This work demonstrates that three-dimensional generative adversarial networks (GANs) can be efficiently trained to generate high-resolution medical volumes with finely detailed voxel-based architectures. In addition, GAN inversion is successfully implemented for the three-dimensional setting and used for extensive research on model interpretability and applications such as image morphing, attribute editing, and style mixing. The results are comprehensively validated on a database of three-dimensional HR-pQCT instances representing the bone micro-architecture of the distal radius. Full article
(This article belongs to the Special Issue Advances in Medical Imaging and Machine Learning)
22 pages, 3640 KiB  
Article
Evaluation of Color Difference Models for Wide Color Gamut and High Dynamic Range
by Olga Basova, Sergey Gladilin, Vladislav Kokhan, Mikhalina Kharkevich, Anastasia Sarycheva, Ivan Konovalenko, Mikhail Chobanu and Ilya Nikolaev
J. Imaging 2024, 10(12), 317; https://doi.org/10.3390/jimaging10120317 - 10 Dec 2024
Abstract
Color difference models (CDMs) are essential for accurate color reproduction in image processing. While CDMs aim to reflect perceived color differences (CDs) from psychophysical data, they remain largely untested in wide color gamut (WCG) and high dynamic range (HDR) contexts, which are underrepresented in current datasets. This gap highlights the need to validate CDMs across WCG and HDR. Moreover, the non-geodesic structure of perceptual color space necessitates datasets covering CDs of various magnitudes, while most existing datasets emphasize only small and threshold CDs. To address this, we collected a new dataset encompassing a broad range of CDs in WCG and HDR contexts and developed a novel CDM fitted to these data. Benchmarking various CDMs using STRESS and significant error fractions on both new and established datasets reveals that CAM16-UCS with power correction is the most versatile model, delivering strong average performance across WCG colors up to 1611 cd/m². However, even the best CDM fails to achieve the desired accuracy limits and yields significant errors. CAM16-UCS, though promising, requires further refinement, particularly in its power correction component to better capture the non-geodesic structure of perceptual color space. Full article
(This article belongs to the Special Issue Color in Image Processing and Computer Vision)
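The STRESS index used for benchmarking has a short closed form; one common formulation is sketched below, where dE are model-predicted color differences, dV the corresponding visual differences, and F an optimal scaling factor. Smaller values indicate better agreement.

```python
# Hedged sketch of the STRESS index in one common formulation.
import numpy as np

def stress(dE, dV):
    F = np.sum(dE * dV) / np.sum(dE**2)    # optimal scaling factor
    return 100.0 * np.sqrt(np.sum((F * dE - dV)**2) / np.sum(dV**2))

rng = np.random.default_rng(4)
dV = rng.uniform(0.5, 5.0, 100)
dE = 1.1 * dV + rng.normal(0, 0.2, 100)    # a well-correlated model
print(stress(dE, dV))                      # low STRESS = good agreement
```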
11 pages, 1525 KiB  
Article
Toward Closing the Loop in Image-to-Image Conversion in Radiotherapy: A Quality Control Tool to Predict Synthetic Computed Tomography Hounsfield Unit Accuracy
by Paolo Zaffino, Ciro Benito Raggio, Adrian Thummerer, Gabriel Guterres Marmitt, Johannes Albertus Langendijk, Anna Procopio, Carlo Cosentino, Joao Seco, Antje Christin Knopf, Stefan Both and Maria Francesca Spadea
J. Imaging 2024, 10(12), 316; https://doi.org/10.3390/jimaging10120316 - 10 Dec 2024
Abstract
In recent years, synthetic Computed Tomography (CT) images generated from Magnetic Resonance (MR) or Cone Beam Computed Tomography (CBCT) acquisitions have been shown to be comparable to real CT images in terms of dose computation for radiotherapy simulation. However, until now, there has been no independent strategy to assess the quality of each synthetic image in the absence of ground truth. In this work, we propose a Deep Learning (DL)-based framework to predict the accuracy of synthetic CT in terms of Mean Absolute Error (MAE) without the need for a ground truth (GT). The proposed algorithm generates a volumetric map as an output, informing clinicians of the predicted MAE slice-by-slice. A cascading multi-model architecture was used to deal with the complexity of the MAE prediction task. The workflow was trained and tested on two cohorts of head and neck cancer patients with different imaging modalities: 27 MR scans and 33 CBCT. The algorithm evaluation revealed an accurate HU prediction (a median absolute prediction deviation equal to 4 HU for CBCT-based synthetic CTs and 6 HU for MR-based synthetic CTs), with discrepancies that do not affect the clinical decisions made on the basis of the proposed estimation. The workflow exhibited no systematic error in MAE prediction. This work represents a proof of concept about the feasibility of synthetic CT evaluation in daily clinical practice, and it paves the way for future patient-specific quality assessment strategies. Full article
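The target quantity is easy to state: the slice-by-slice MAE (in HU) between a synthetic and a ground-truth CT, which the proposed model learns to predict when no ground truth exists. The sketch below computes it on stand-in volumes.

```python
# Hedged sketch of the target quantity on stand-in volumes (HU).
import numpy as np

rng = np.random.default_rng(5)
gt = rng.normal(0, 300, (40, 128, 128))        # stand-in ground-truth CT
synth = gt + rng.normal(0, 20, gt.shape)       # stand-in synthetic CT
mae_per_slice = np.abs(synth - gt).mean(axis=(1, 2))
print(mae_per_slice[:3])                       # roughly 16 HU per slice here
```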
12 pages, 599 KiB  
Article
PAS or Not PAS? The Sonographic Assessment of Placenta Accreta Spectrum Disorders and the Clinical Validation of a New Diagnostic and Prognostic Scoring System
by Antonella Vimercati, Arianna Galante, Margherita Fanelli, Francesca Cirignaco, Amerigo Vitagliano, Pierpaolo Nicolì, Andrea Tinelli, Antonio Malvasi, Miriam Dellino, Gianluca Raffaello Damiani, Barbara Crescenza, Giorgio Maria Baldini, Ettore Cicinelli and Marco Cerbone
J. Imaging 2024, 10(12), 315; https://doi.org/10.3390/jimaging10120315 - 10 Dec 2024
Abstract
This study aimed to evaluate our center’s experience in diagnosing and managing placenta accreta spectrum (PAS) in a high-risk population, focusing on prenatal ultrasound features associated with PAS severity and maternal outcomes. We conducted a retrospective analysis of 102 high-risk patients with confirmed placenta previa who delivered at our center between 2018 and 2023. Patients underwent transabdominal and transvaginal ultrasound scans, assessing typical sonographic features. Binary and multivariate logistic regression analyses were performed to identify sonographic markers predictive of PAS and relative complications. Key ultrasound features—retroplacental myometrial thinning (<1 mm), vascular lacunae, and retroplacental vascularization—were significantly associated with PAS and a higher risk of surgical complications. An exceedingly rare sign, the “riddled cervix” sign, was observed in only three patients with extensive cervical or parametrial involvement. Those patients had the worst surgical outcomes. This study highlights the utility of specific ultrasound features in stratifying PAS risk and guiding clinical and surgical management in high-risk pregnancies. The findings support integrating these markers into prenatal diagnostic protocols to improve patient outcomes and inform surgical planning. Full article
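The statistical step maps onto a standard binary logistic regression relating sonographic markers to PAS; the sketch below uses synthetic binary predictors and reads off odds ratios, purely as an illustration of the analysis type.

```python
# Hedged sketch of binary logistic regression on synthetic sonographic markers.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(6)
# columns: myometrial thinning < 1 mm, vascular lacunae, retroplacental vascularization
X = rng.integers(0, 2, (102, 3)).astype(float)
y = (X.sum(axis=1) + rng.normal(0, 0.8, 102) > 1.5).astype(int)

model = LogisticRegression().fit(X, y)
print(np.exp(model.coef_))                 # odds ratio per marker
```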
19 pages, 9164 KiB  
Article
A Regularization Method for Landslide Thickness Estimation
by Lisa Borgatti, Davide Donati, Liwei Hu, Germana Landi and Fabiana Zama
J. Imaging 2024, 10(12), 314; https://doi.org/10.3390/jimaging10120314 - 10 Dec 2024
Abstract
Accurate estimation of landslide depth is essential for practical hazard assessment and risk mitigation. This work addresses the problem of determining landslide depth from satellite-derived elevation data. Using the principle of mass conservation, this problem can be formulated as a linear inverse problem. To solve the inverse problem, we present a regularization approach that computes approximate solutions and regularization parameters using the Balancing Principle. Synthetic data were carefully designed and generated to evaluate the method under controlled conditions, allowing for precise validation of its performance. Through comprehensive testing with this synthetic dataset, we demonstrate the method’s robustness across varying noise levels. When applied to real-world data from the Fels landslide in Alaska, the proposed method proved its practical value in reconstructing landslide thickness patterns. These reconstructions showed good agreement with existing geological interpretations, validating the method’s effectiveness in real-world scenarios. Full article
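The inverse problem can be sketched as regularized least squares for a linear model A h = b, where h is the thickness field and b the observed elevation change. The version below fixes the regularization parameter by hand; the paper instead selects it with the Balancing Principle, which is not reproduced here.

```python
# Hedged sketch: Tikhonov-regularized least squares with a fixed parameter.
import numpy as np

rng = np.random.default_rng(7)
A = rng.normal(size=(80, 50))                  # stand-in forward operator
h_true = np.maximum(0.0, rng.normal(2.0, 1.0, 50))
b = A @ h_true + rng.normal(0, 0.1, 80)        # noisy elevation observations

lam = 0.1                                      # illustrative value
h_reg = np.linalg.solve(A.T @ A + lam * np.eye(50), A.T @ b)
print(np.linalg.norm(h_reg - h_true) / np.linalg.norm(h_true))
```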
15 pages, 627 KiB  
Review
Real-Time Emotion Recognition for Improving the Teaching–Learning Process: A Scoping Review
by Cèlia Llurba and Ramon Palau
J. Imaging 2024, 10(12), 313; https://doi.org/10.3390/jimaging10120313 - 9 Dec 2024
Abstract
Emotion recognition (ER) is gaining popularity in various fields, including education. The benefits of ER in the classroom for educational purposes, such as improving students’ academic performance, are gradually becoming known. Thus, real-time ER is proving to be a valuable tool for teachers as well as for students. However, its feasibility in educational settings requires further exploration. This review surveys learning experiences based on real-time ER with students to explore its potential for learning and for improving academic achievement. The purpose is to present evidence of good implementation and suggestions for successful application. The content analysis finds that most of the practices lead to significant improvements in terms of educational purposes. Nevertheless, the analysis identifies problems that might block the implementation of these practices in the classroom and in education; among the obstacles identified are the lack of student privacy and students’ support needs. We conclude that artificial intelligence (AI) and ER are potential tools for addressing the needs of ordinary classrooms, although reliable automatic recognition in real time remains a challenge for researchers, given the high variability of the input data. Full article
(This article belongs to the Special Issue Deep Learning in Image Analysis: Progress and Challenges)
7 pages, 1102 KiB  
Communication
Quantitative MRI Assessment of Post-Surgical Spinal Cord Injury Through Radiomic Analysis
by Azadeh Sharafi, Andrew P. Klein and Kevin M. Koch
J. Imaging 2024, 10(12), 312; https://doi.org/10.3390/jimaging10120312 - 8 Dec 2024
Abstract
This study investigates radiomic efficacy in post-surgical traumatic spinal cord injury (SCI), overcoming MRI limitations from metal artifacts to enhance diagnosis, severity assessment, and lesion characterization for prognosis and therapy guidance. Traumatic SCI causes severe neurological deficits. While MRI allows qualitative injury evaluation, standard imaging alone has limitations for precise SCI diagnosis, severity stratification, and pathology characterization, which are needed to guide prognosis and therapy. Radiomics enables quantitative tissue phenotyping by extracting a high-dimensional set of descriptive texture features from medical images. However, the efficacy of postoperative radiomic quantification in the presence of metal-induced MRI artifacts from spinal instrumentation has yet to be fully explored. A total of 50 healthy controls and 12 SCI patients post-stabilization surgery underwent 3D multi-spectral MRI. Automated spinal cord segmentation was followed by radiomic feature extraction. Supervised machine learning categorized SCI versus controls, injury severity, and lesion location relative to instrumentation. Radiomics differentiated SCI patients (Matthews correlation coefficient (MCC): 0.97; accuracy (ACC): 1.0), categorized injury severity (MCC: 0.95; ACC: 0.98), and localized lesions (MCC: 0.85; ACC: 0.90). Combined T1 and T2 features outperformed individual modalities across tasks, with gradient boosting models showing the highest efficacy. The radiomic framework achieved excellent performance, differentiating SCI from controls and accurately categorizing injury severity. The ability to reliably quantify SCI severity and localization could potentially inform diagnosis, prognosis, and guide therapy. Further research is warranted to validate radiomic SCI biomarkers and explore clinical integration. Full article
(This article belongs to the Section Medical Imaging)
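The headline metric, the Matthews correlation coefficient, is available directly in scikit-learn; the toy call below shows the computation on stand-in SCI-vs-control labels (an MCC of 1.0 would mean perfect agreement).

```python
# Hedged sketch of the reported metric on illustrative labels.
from sklearn.metrics import matthews_corrcoef

y_true = [1, 1, 1, 0, 0, 0, 0, 0]          # stand-in SCI (1) vs control (0)
y_pred = [1, 1, 0, 0, 0, 0, 0, 0]
print(matthews_corrcoef(y_true, y_pred))   # about 0.75 here; 1.0 is perfect
```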
59 pages, 3270 KiB  
Review
State-of-the-Art Deep Learning Methods for Microscopic Image Segmentation: Applications to Cells, Nuclei, and Tissues
by Fatma Krikid, Hugo Rositi and Antoine Vacavant
J. Imaging 2024, 10(12), 311; https://doi.org/10.3390/jimaging10120311 - 6 Dec 2024
Abstract
Microscopic image segmentation (MIS) is a fundamental task in medical imaging and biological research, essential for precise analysis of cellular structures and tissues. Despite its importance, the segmentation process encounters significant challenges, including variability in imaging conditions, complex biological structures, and artefacts (e.g., noise), which can compromise the accuracy of traditional methods. The emergence of deep learning (DL) has catalyzed substantial advancements in addressing these issues. This systematic literature review (SLR) provides a comprehensive overview of state-of-the-art DL methods developed over the past six years for the segmentation of microscopic images. We critically analyze key contributions, emphasizing how these methods specifically tackle challenges in cell, nucleus, and tissue segmentation. Additionally, we evaluate the datasets and performance metrics employed in these studies. By synthesizing current advancements and identifying gaps in existing approaches, this review not only highlights the transformative potential of DL in enhancing diagnostic accuracy and research efficiency but also suggests directions for future research. The findings of this study have significant implications for improving methodologies in medical and biological applications, ultimately fostering better patient outcomes and advancing scientific understanding. Full article
17 pages, 10713 KiB  
Article
UV Hyperspectral Imaging with Xenon and Deuterium Light Sources: Integrating PCA and Neural Networks for Analysis of Different Raw Cotton Types
by Mohammad Al Ktash, Mona Knoblich, Max Eberle, Frank Wackenhut and Marc Brecht
J. Imaging 2024, 10(12), 310; https://doi.org/10.3390/jimaging10120310 - 5 Dec 2024
Abstract
Ultraviolet (UV) hyperspectral imaging shows significant promise for the classification and quality assessment of raw cotton, a key material in the textile industry. This study evaluates the efficacy of UV hyperspectral imaging (225–408 nm) using two different light sources: xenon arc (XBO) and deuterium lamps, in comparison to NIR hyperspectral imaging. The aim is to determine which light source provides better differentiation between cotton types in UV hyperspectral imaging, as each interacts differently with the materials, potentially affecting imaging quality and classification accuracy. Principal component analysis (PCA) and Quadratic Discriminant Analysis (QDA) were employed to differentiate between various cotton types and hemp plants. PCA for the XBO illumination revealed that the first three principal components (PCs) accounted for 94.8% of the total variance: PC1 (78.4%) and PC2 (11.6%) clustered the samples into four main groups—hemp (HP), recycled cotton (RcC), and organic cotton (OC) from the other cotton samples—while PC3 (6%) further separated RcC. When using the deuterium light source, the first three PCs explained 89.4% of the variance, effectively distinguishing sample types such as HP, RcC, and OC from the remaining samples, with PC3 clearly separating RcC. When combining the PCA scores with QDA, the classification accuracy reached 76.1% for the XBO light source and 85.1% for the deuterium light source. Furthermore, a deep learning technique, a fully connected neural network, was applied for classification. The classification accuracy for the XBO and deuterium light sources reached 83.6% and 90.1%, respectively. The results highlight the ability of this method to differentiate conventional and organic cotton, as well as hemp, and to identify distinct types of recycled cotton, suggesting varying recycling processes and possible common origins with raw cotton. These findings underscore the potential of UV hyperspectral imaging, coupled with chemometric models, as a powerful tool for enhancing cotton classification accuracy in the textile industry. Full article
(This article belongs to the Section Color, Multi-spectral, and Hyperspectral Imaging)
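The chemometric pipeline, PCA scores feeding QDA, can be sketched in a few scikit-learn calls; random spectra stand in for UV hyperspectral pixels, and the band and class counts are illustrative.

```python
# Hedged sketch: PCA scores feeding QDA on stand-in spectra.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

rng = np.random.default_rng(8)
X = rng.normal(size=(300, 160))                # stand-in spectra (160 bands)
y = rng.integers(0, 4, 300)                    # cotton/hemp classes, illustrative

scores = PCA(n_components=3).fit_transform(X)  # PC1-PC3, as in the abstract
qda = QuadraticDiscriminantAnalysis().fit(scores, y)
print(qda.score(scores, y))
```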
17 pages, 3796 KiB  
Article
FastQAFPN-YOLOv8s-Based Method for Rapid and Lightweight Detection of Walnut Unseparated Material
by Junqiu Li, Jiayi Wang, Dexiao Kong, Qinghui Zhang and Zhenping Qiang
J. Imaging 2024, 10(12), 309; https://doi.org/10.3390/jimaging10120309 - 2 Dec 2024
Abstract
Walnuts possess significant nutritional and economic value. Fast and accurate sorting of shells and kernels will enhance the efficiency of automated production. Therefore, we propose a FastQAFPN-YOLOv8s object detection network to achieve rapid and precise detection of unsorted materials. The method uses lightweight Pconv (Partial Convolution) operators to build the FasterNextBlock structure, which serves as the backbone feature extractor for the Fasternet feature extraction network. The ECIoU loss function, combining EIoU (Efficient-IoU) and CIoU (Complete-IoU), speeds up the adjustment of the prediction frame and the network regression. In the Neck section of the network, the QAFPN feature fusion extraction network is proposed to replace the PAN-FPN (Path Aggregation Network—Feature Pyramid Network) in YOLOv8s with a Rep-PAN structure based on the QARepNext reparameterization framework for feature fusion extraction to strike a balance between network performance and inference speed. To validate the method, we built a three-axis mobile sorting device and created a dataset of 3000 images of walnuts after shell removal for experiments. The results show that the improved network contains 6,071,008 parameters, a training time of 2.49 h, a model size of 12.3 MB, an mAP (Mean Average Precision) of 94.5%, and a frame rate of 52.1 FPS. Compared with the original model, the number of parameters decreased by 45.5%, with training time reduced by 32.7%, the model size shrunk by 45.3%, and frame rate improved by 40.8%. However, some accuracy is sacrificed due to the lightweight design, resulting in a 1.2% decrease in mAP. The network reduces the model size by 59.7 MB and 23.9 MB compared to YOLOv7 and YOLOv6, respectively, and improves the frame rate by 15.67 FPS and 22.55 FPS, respectively. The average confidence and mAP show minimal changes compared to YOLOv7 and improved by 4.2% and 2.4% compared to YOLOv6, respectively. The FastQAFPN-YOLOv8s detection method effectively reduces model size while maintaining recognition accuracy. Full article
(This article belongs to the Section Computer Vision and Pattern Recognition)
12 pages, 1486 KiB  
Article
Elucidating Early Radiation-Induced Cardiotoxicity Markers in Preclinical Genetic Models Through Advanced Machine Learning and Cardiac MRI
by Dayeong An and El-Sayed Ibrahim
J. Imaging 2024, 10(12), 308; https://doi.org/10.3390/jimaging10120308 - 1 Dec 2024
Abstract
Radiation therapy (RT) is widely used to treat thoracic cancers but carries a risk of radiation-induced heart disease (RIHD). This study aimed to detect early markers of RIHD using machine learning (ML) techniques and cardiac MRI in a rat model. SS.BN3 consomic rats, which have a more subtle RIHD phenotype compared to Dahl salt-sensitive (SS) rats, were treated with localized cardiac RT or sham at 10 weeks of age. Cardiac MRI was performed 8 and 10 weeks post-treatment to assess global and regional cardiac function. ML algorithms were applied to differentiate sham-treated and irradiated rats based on early changes in myocardial function. Despite normal global left ventricular ejection fraction in both groups, strain analysis showed significant reductions in the anteroseptal and anterolateral segments of irradiated rats. Gradient boosting achieved an F1 score of 0.94 and an ROC value of 0.95, while random forest showed an accuracy of 88%. These findings suggest that ML, combined with cardiac MRI, can effectively detect early preclinical changes in RIHD, particularly alterations in regional myocardial contractility, highlighting the potential of these techniques for early detection and monitoring of radiation-induced cardiac dysfunction. Full article
(This article belongs to the Special Issue Progress and Challenges in Biomedical Image Analysis)
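For readers unfamiliar with the classification step, a minimal sketch of the kind of pipeline the abstract describes follows, with synthetic stand-in data; in the real study, the features would be MRI-derived regional strain measurements rather than random numbers, and the feature layout here is purely hypothetical.

```python
# Minimal sketch with synthetic stand-in features; in the study, X would hold
# MRI-derived regional strain measurements and y the sham/irradiated labels.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score, roc_auc_score, accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))            # hypothetical strain features per rat
y = rng.integers(0, 2, size=60)         # 1 = irradiated, 0 = sham

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

gb = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
rf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

print("GB F1:", f1_score(y_te, gb.predict(X_te)))
print("GB ROC-AUC:", roc_auc_score(y_te, gb.predict_proba(X_te)[:, 1]))
print("RF accuracy:", accuracy_score(y_te, rf.predict(X_te)))
```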
17 pages, 648 KiB  
Article
Temporal Gap-Aware Attention Model for Temporal Action Proposal Generation
by Sorn Sooksatra and Sitapa Watcharapinchai
J. Imaging 2024, 10(12), 307; https://doi.org/10.3390/jimaging10120307 - 29 Nov 2024
Abstract
Temporal action proposal generation is a method for extracting temporal action instances or proposals from untrimmed videos. Existing methods often struggle to segment contiguous action proposals, which are groups of action boundaries separated by small temporal gaps. To address this limitation, we propose [...] Read more.
Temporal action proposal generation is a method for extracting temporal action instances or proposals from untrimmed videos. Existing methods often struggle to segment contiguous action proposals, which are groups of action boundaries separated by small temporal gaps. To address this limitation, we propose incorporating an attention mechanism to weigh the importance of each proposal within a contiguous group. This mechanism leverages the gap displacement between proposals to calculate attention scores, enabling more accurate localization of action boundaries. We evaluate our method against a state-of-the-art boundary-based baseline on the ActivityNet v1.3 and THUMOS 2014 datasets. The experimental results demonstrate that our approach significantly improves the performance of short-duration and contiguous action proposals, achieving an average recall of 78.22%. Full article
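A minimal sketch of the gap-weighting idea, under the assumption that attention scores decay with the temporal gap between a proposal's end and its neighbours' starts; the authors' actual model is a learned attention module, whereas this toy uses a fixed softmax over negative gaps.

```python
# Toy gap-aware attention: proposals separated by small temporal gaps receive
# larger attention weights via a softmax over negative gap displacements.
import torch

def gap_attention(starts, ends, scores, tau=1.0):
    """starts/ends: (N,) proposal boundaries in seconds; scores: (N,) proposal scores."""
    # gaps[i, j]: gap from proposal i's end to proposal j's start (0 if overlapping)
    gaps = (starts.unsqueeze(0) - ends.unsqueeze(1)).clamp(min=0)  # (N, N)
    attn = torch.softmax(-gaps / tau, dim=-1)                      # small gap -> high weight
    return attn @ scores                                           # gap-aware reweighted scores

starts = torch.tensor([0.0, 5.2, 5.9, 12.0])
ends   = torch.tensor([5.0, 5.8, 11.5, 15.0])
scores = torch.tensor([0.9, 0.4, 0.7, 0.8])
print(gap_attention(starts, ends, scores))
```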
39 pages, 3120 KiB  
Article
A Comparative Review of the SWEET Simulator: Theoretical Verification Against Other Simulators
by Amine Ben-Daoued, Frédéric Bernardin and Pierre Duthon
J. Imaging 2024, 10(12), 306; https://doi.org/10.3390/jimaging10120306 - 27 Nov 2024
Abstract
Accurate luminance-based image generation is critical in physically based simulations, as even minor inaccuracies in radiative transfer calculations can introduce noise or artifacts, adversely affecting image quality. The radiative transfer simulator, SWEET, uses a backward Monte Carlo approach, and its performance is analyzed [...] Read more.
Accurate luminance-based image generation is critical in physically based simulations, as even minor inaccuracies in radiative transfer calculations can introduce noise or artifacts, adversely affecting image quality. The radiative transfer simulator, SWEET, uses a backward Monte Carlo approach, and its performance is analyzed alongside other simulators to assess how Monte Carlo-induced biases vary with parameters like optical thickness and medium anisotropy. This work details the advancements made to SWEET since the previous publication, with a specific focus on a more comprehensive comparison with other simulators such as Mitsuba. The core objective is to evaluate the precision of SWEET by comparing radiometric quantities like luminance, which serves as a method for validating the simulator. This analysis is particularly important in contexts such as automotive camera imaging, where accurate scene representation is crucial to reducing noise and ensuring the reliability of image-based systems in autonomous driving. By focusing on detailed radiometric comparisons, this study underscores SWEET’s ability to minimize noise, thus providing high-quality imaging for advanced applications. Full article
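To make the backward Monte Carlo idea concrete, here is a deliberately tiny toy estimator: paths are traced from the sensor into a homogeneous slab, free paths are sampled from an exponential distribution, and each scattering event multiplies the path weight by the single-scattering albedo. SWEET's real implementation handles phase functions, scene geometry, and spectral effects that this sketch omits entirely.

```python
# Toy backward Monte Carlo estimator for a homogeneous 1D slab lit from behind.
# Paths start at the sensor; free paths are exponential in optical depth, and
# each scattering event multiplies the weight by the single-scattering albedo.
import random

def backward_mc_luminance(optical_thickness, albedo, source_radiance, n_paths=100_000):
    total = 0.0
    for _ in range(n_paths):
        depth, weight = 0.0, 1.0
        while weight > 1e-4:                       # terminate negligible paths
            depth += random.expovariate(1.0)       # sample next interaction (optical depths)
            if depth >= optical_thickness:         # path escapes toward the source
                total += weight * source_radiance
                break
            weight *= albedo                       # scatter; absorb the (1 - albedo) fraction
    return total / n_paths

# Higher optical thickness -> more attenuation; higher albedo -> less absorption.
print(backward_mc_luminance(optical_thickness=2.0, albedo=0.9, source_radiance=1.0))
```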
16 pages, 20362 KiB  
Article
IngredSAM: Open-World Food Ingredient Segmentation via a Single Image Prompt
by Leyi Chen, Bowen Wang and Jiaxin Zhang
J. Imaging 2024, 10(12), 305; https://doi.org/10.3390/jimaging10120305 - 26 Nov 2024
Abstract
Food semantic segmentation is of great significance in the field of computer vision and artificial intelligence, especially in the application of food image analysis. Due to the complexity and variety of food, it is difficult to effectively handle this task using supervised methods. [...] Read more.
Food semantic segmentation is of great significance in the field of computer vision and artificial intelligence, especially in the application of food image analysis. Due to the complexity and variety of food, it is difficult to effectively handle this task using supervised methods. Thus, we introduce IngredSAM, a novel approach for open-world food ingredient semantic segmentation, extending the capabilities of the Segment Anything Model (SAM). Utilizing visual foundation models (VFMs) and prompt engineering, IngredSAM leverages discriminative and matchable semantic features between a single clean image prompt of specific ingredients and open-world images to guide the generation of accurate segmentation masks in real-world scenarios. This method addresses the challenges of traditional supervised models in dealing with the diverse appearances and class imbalances of food ingredients. Our framework demonstrates significant advancements in the segmentation of food ingredients without any training process, achieving 2.85% and 6.01% better performance than previous state-of-the-art methods on the FoodSeg103 and UECFoodPix datasets, respectively. IngredSAM exemplifies a successful application of one-shot, open-world segmentation, paving the way for downstream applications such as enhancements in nutritional analysis and consumer dietary trend monitoring. Full article
(This article belongs to the Section AI in Imaging)
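A hedged sketch of the one-shot matching step the abstract outlines: patch features of the single ingredient prompt image are compared to patch features of an open-world image by cosine similarity, and the best-matching locations become point prompts for SAM. Feature extraction and SAM itself are abstracted away here, and the function name and shapes are illustrative only.

```python
# Hypothetical one-shot matching step: cosine similarity between VFM patch
# features of the prompt image (masked to the ingredient) and a target image.
import torch
import torch.nn.functional as F

def match_prompt_points(prompt_feats, target_feats, ingredient_mask, top_k=3):
    """prompt_feats: (P, C) patch features of the prompt image;
    target_feats: (T, C) patch features of the open-world image;
    ingredient_mask: (P,) bool, True on the ingredient's patches."""
    fg = F.normalize(prompt_feats[ingredient_mask], dim=-1)  # ingredient descriptors
    tg = F.normalize(target_feats, dim=-1)
    sim = tg @ fg.T                         # (T, n_fg) cosine similarities
    best = sim.max(dim=-1).values           # best ingredient match per target patch
    return best.topk(top_k).indices         # patch indices -> SAM point prompts

# Dummy usage with random features standing in for a real ViT's patch tokens.
pf, tf = torch.randn(196, 64), torch.randn(196, 64)
mask = torch.zeros(196, dtype=torch.bool); mask[40:60] = True
print(match_prompt_points(pf, tf, mask))
```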
19 pages, 785 KiB  
Article
Transformer Dil-DenseUNet: An Advanced Architecture for Stroke Segmentation
by Nesrine Jazzar, Besma Mabrouk and Ali Douik
J. Imaging 2024, 10(12), 304; https://doi.org/10.3390/jimaging10120304 - 25 Nov 2024
Abstract
We propose a novel architecture, Transformer Dil-DenseUNet, designed to address the challenges of accurately segmenting stroke lesions in MRI images. Precise segmentation is essential for diagnosing and treating stroke patients, as it provides critical spatial insights into the affected brain regions and the [...] Read more.
We propose a novel architecture, Transformer Dil-DenseUNet, designed to address the challenges of accurately segmenting stroke lesions in MRI images. Precise segmentation is essential for diagnosing and treating stroke patients, as it provides critical spatial insights into the affected brain regions and the extent of damage. Traditional manual segmentation is labor-intensive and error-prone, highlighting the need for automated solutions. Our Transformer Dil-DenseUNet combines DenseNet, dilated convolutions, and Transformer blocks, each contributing unique strengths to enhance segmentation accuracy. The DenseNet component captures fine-grained details and global features by leveraging dense connections, improving both precision and feature reuse. The dilated convolutional blocks, placed before each DenseNet module, expand the receptive field, capturing broader contextual information essential for accurate segmentation. Additionally, the Transformer blocks within our architecture address CNN limitations in capturing long-range dependencies by modeling complex spatial relationships through multi-head self-attention mechanisms. We assess our model’s performance on the Ischemic Stroke Lesion Segmentation Challenge 2015 (SISS 2015) and ISLES 2022 datasets. In the testing phase, the model achieves a Dice coefficient of 0.80 ± 0.30 on SISS 2015 and 0.81 ± 0.33 on ISLES 2022, surpassing the current state-of-the-art results on these datasets. Full article
(This article belongs to the Special Issue Advances in Medical Imaging and Machine Learning)
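The dilated convolutional blocks mentioned above can be illustrated with a short PyTorch module: stacking 3x3 convolutions with increasing dilation rates widens the receptive field without enlarging the kernels. This is a generic sketch of the technique, not the block design used in Transformer Dil-DenseUNet.

```python
# Generic dilated-convolution block sketch: increasing dilation rates grow the
# receptive field (3x3 kernels span 3, then 7, then 15 pixels) at fixed cost.
import torch
import torch.nn as nn

class DilatedBlock(nn.Module):
    def __init__(self, channels, dilations=(1, 2, 4)):
        super().__init__()
        self.layers = nn.Sequential(*[
            nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=d, dilation=d),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        ])

    def forward(self, x):
        return self.layers(x) + x  # residual path preserves fine detail

print(DilatedBlock(32)(torch.randn(1, 32, 64, 64)).shape)  # -> (1, 32, 64, 64)
```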
21 pages, 11261 KiB  
Article
Enhanced YOLOv8 Ship Detection Empower Unmanned Surface Vehicles for Advanced Maritime Surveillance
by Abdelilah Haijoub, Anas Hatim, Antonio Guerrero-Gonzalez, Mounir Arioua and Khalid Chougdali
J. Imaging 2024, 10(12), 303; https://doi.org/10.3390/jimaging10120303 - 24 Nov 2024
Abstract
The evolution of maritime surveillance is significantly marked by the incorporation of Artificial Intelligence and machine learning into Unmanned Surface Vehicles (USVs). This paper presents an AI approach for detecting and tracking unmanned surface vehicles, specifically leveraging an enhanced version of YOLOv8, fine-tuned [...] Read more.
The evolution of maritime surveillance is significantly marked by the incorporation of Artificial Intelligence and machine learning into Unmanned Surface Vehicles (USVs). This paper presents an AI approach for detecting and tracking unmanned surface vehicles, specifically leveraging an enhanced version of YOLOv8, fine-tuned for maritime surveillance needs. Deployed on the NVIDIA Jetson TX2 platform, the system features an innovative architecture and perception module optimized for real-time operation and energy efficiency. It demonstrates superior detection accuracy with a mean Average Precision (mAP) of 0.99 and an operational speed of 17.99 FPS, while keeping energy consumption at just 5.61 joules. This remarkable balance between accuracy, processing speed, and energy efficiency underscores the potential of the system to significantly advance maritime safety, security, and environmental monitoring. Full article
(This article belongs to the Section Visualization and Computer Graphics)
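For context on how figures like 17.99 FPS and an energy budget in joules are typically obtained, the sketch below integrates power over per-frame latency; the `read_power_w` callback is a placeholder (on a Jetson TX2 one would poll the onboard INA3221 power monitor), and the dummy numbers are illustrative, not the paper's measurements.

```python
# Sketch of a throughput/energy measurement loop. read_power_w is a placeholder
# callback; on a Jetson TX2 it would poll the onboard INA3221 power monitor.
import time

def benchmark(infer, frames, read_power_w):
    t0 = last = time.perf_counter()
    energy_j = 0.0
    for frame in frames:
        infer(frame)
        now = time.perf_counter()
        energy_j += read_power_w() * (now - last)  # integrate P * dt per frame
        last = now
    fps = len(frames) / (time.perf_counter() - t0)
    return fps, energy_j

# Dummy usage: a 50 ms fake model at a constant, assumed 7.5 W draw.
fps, joules = benchmark(lambda f: time.sleep(0.05), range(20), lambda: 7.5)
print(f"{fps:.1f} FPS, {joules:.2f} J total")
```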