Journal of Imaging doi: 10.3390/jimaging10030071
Authors: Sneha Paul Zachary Patterson Nizar Bouguila
The application of large field-of-view (FoV) cameras equipped with fish-eye lenses brings notable advantages to various real-world computer vision applications, including autonomous driving. While deep learning has proven successful in conventional computer vision applications using regular perspective images, its potential in fish-eye camera contexts remains largely unexplored due to limited datasets for fully supervised learning. Semi-supervised learning offers a potential solution to this challenge. In this study, we explore and benchmark two popular semi-supervised methods from the perspective image domain for fish-eye image segmentation. We further introduce FishSegSSL, a novel fish-eye image segmentation framework featuring three semi-supervised components: pseudo-label filtering, dynamic confidence thresholding, and robust strong augmentation. Evaluation on the WoodScape dataset, collected from vehicle-mounted fish-eye cameras, demonstrates that our proposed method enhances the model’s performance by up to 10.49% over fully supervised methods using the same amount of labeled data. Our method also improves on existing image segmentation methods by 2.34%. To the best of our knowledge, this is the first work on semi-supervised semantic segmentation of fish-eye images. Additionally, we conduct a comprehensive ablation study and sensitivity analysis to demonstrate the efficacy of each proposed component.
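The pseudo-label filtering and dynamic confidence thresholding components described above can be sketched as follows. This is a hedged illustration only: the function and parameter names (`base_tau`, `ramp`) are invented for this sketch and do not claim to match the paper's exact schedule.

```python
import numpy as np

def filter_pseudo_labels(probs, base_tau=0.9, ramp=1.0):
    """Sketch of confidence-based pseudo-label filtering with a dynamic
    threshold: pixels are kept only when the teacher's maximum class
    probability exceeds base_tau * ramp, where 'ramp' can be raised as
    training progresses. probs: (H, W, C) per-pixel softmax outputs."""
    conf = probs.max(axis=-1)        # per-pixel confidence
    labels = probs.argmax(axis=-1)   # hard pseudo-labels
    mask = conf >= base_tau * ramp   # pixels whose pseudo-labels are trusted
    return labels, mask
```

A student model would then be trained only on the pixels selected by `mask`, typically under strong augmentation of the input.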
Journal of Imaging doi: 10.3390/jimaging10030070
Authors: Mingyang Zhang Kristof Van Beeck Toon Goedemé
While Siamese object tracking has witnessed significant advancements, its hard real-time behaviour on embedded devices remains inadequately addressed. In many application cases, an embedded implementation should not only have a minimal execution latency, but this latency should ideally also have zero variance, i.e., be predictable. This study aims to address this issue by meticulously analysing real-time predictability across different components of a deep-learning-based video object tracking system. Our detailed experiments not only indicate the superiority of Field-Programmable Gate Array (FPGA) implementations in terms of hard real-time behaviour but also unveil important time-predictability bottlenecks. We introduce dedicated hardware accelerators for key processes, focusing on depth-wise cross-correlation and padding operations, utilizing high-level synthesis (HLS). Implemented on a KV260 board, our enhanced tracker achieves not only a 6.6-fold speed-up in mean execution time but also significant improvements in hard real-time predictability, yielding 11 times less latency variation than our baseline. A subsequent analysis of power consumption reveals our approach’s contribution to enhanced power efficiency. These advancements underscore the crucial role of hardware acceleration in realizing time-predictable object tracking on embedded systems, setting new standards for future hardware–software co-design endeavours in this domain.
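The depth-wise cross-correlation targeted for hardware acceleration above can be written as a plain reference computation. This is a numpy sketch of the operation itself, not the authors' HLS implementation:

```python
import numpy as np

def depthwise_xcorr(search, kernel):
    """Depth-wise cross-correlation as used in Siamese trackers: each
    channel of the template 'kernel' (C, kh, kw) is correlated only with
    the matching channel of the search feature map (C, H, W), producing
    a per-channel response map (C, H-kh+1, W-kw+1)."""
    C, H, W = search.shape
    _, kh, kw = kernel.shape
    oh, ow = H - kh + 1, W - kw + 1
    out = np.empty((C, oh, ow))
    for c in range(C):
        for i in range(oh):
            for j in range(ow):
                out[c, i, j] = np.sum(search[c, i:i+kh, j:j+kw] * kernel[c])
    return out
```

The triple loop makes the data access pattern explicit, which is exactly the structure a hardware accelerator can exploit with fixed, predictable latency.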
Journal of Imaging doi: 10.3390/jimaging10030069
Authors: Mikolaj Czerkawski Priti Upadhyay Christopher Davison Robert Atkinson Craig Michie Ivan Andonovic Malcolm Macdonald Javier Cardona Christos Tachtatzis
There are several image inverse tasks, such as inpainting or super-resolution, which can be solved using deep internal learning, a paradigm that involves employing deep neural networks to find a solution by learning from the sample itself rather than a dataset. For example, Deep Image Prior is a technique based on fitting a convolutional neural network to output the known parts of the image (such as non-inpainted regions or a low-resolution version of the image). However, this approach is not well suited to samples composed of multiple modalities. In some domains, such as satellite image processing, accommodating multi-modal representations could be beneficial or even essential. In this work, the Multi-Modal Convolutional Parameterisation Network (MCPN) is proposed, where a convolutional neural network approximates shared information between multiple modes by combining a core shared network with modality-specific head networks. The results demonstrate that this approach can significantly outperform the single-mode adoption of a convolutional parameterisation network on guided image inverse problems of inpainting and super-resolution.
Journal of Imaging doi: 10.3390/jimaging10030068
Authors: Shintaro Ito Kanta Miura Koichi Ito Takafumi Aoki
In this paper, we propose a method to refine the depth maps obtained by Multi-View Stereo (MVS) through iterative optimization of the Neural Radiance Field (NeRF). MVS accurately estimates the depths on object surfaces, and NeRF accurately estimates the depths at object boundaries. The key ideas of the proposed method are to combine MVS and NeRF to utilize the advantages of both in depth map estimation and to use NeRF for depth map refinement. We also introduce a Huber loss into the NeRF optimization to improve the accuracy of the depth map refinement, where the Huber loss reduces the estimation error in the radiance fields by placing constraints on errors larger than a threshold. Through a set of experiments using the Redwood-3dscan dataset and the DTU dataset, which are public datasets consisting of multi-view images, we demonstrate the effectiveness of the proposed method compared to conventional methods: COLMAP, NeRF, and DS-NeRF.
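The Huber loss introduced above has a standard closed form: quadratic for small residuals, linear beyond the threshold. A minimal sketch (the threshold name `delta` is ours; the paper's weighting within the NeRF objective is not reproduced here):

```python
import numpy as np

def huber_loss(pred, target, delta=1.0):
    """Huber loss: quadratic for residuals below the threshold 'delta',
    linear above it, so large errors in the radiance-field estimate are
    constrained rather than dominating the objective as an L2 loss would."""
    r = np.abs(pred - target)
    return np.where(r <= delta, 0.5 * r ** 2, delta * (r - 0.5 * delta))
```

The linear branch is what "places constraints on errors larger than a threshold": the gradient magnitude saturates at `delta` instead of growing with the residual.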
Journal of Imaging doi: 10.3390/jimaging10030067
Authors: San Chain Tun Tsubasa Onizuka Pyke Tin Masaru Aikawa Ikuo Kobayashi Thi Thi Zin
This study innovates livestock health management, utilizing a top-view depth camera for accurate cow lameness detection, classification, and precise segmentation through integration with a 3D depth camera and deep learning, distinguishing it from 2D systems. It underscores the importance of early lameness detection in cattle and focuses on extracting depth data from the cow’s body, with a specific emphasis on the back region’s maximum value. Precise cow detection and tracking are achieved through the Detectron2 framework and Intersection Over Union (IOU) techniques. Across a three-day testing period, with observations conducted twice daily with varying cow populations (ranging from 56 to 64 cows per day), the study consistently achieves an impressive average detection accuracy of 99.94%. Tracking accuracy remains at 99.92% over the same observation period. Subsequently, the research extracts the cow’s depth region using binary mask images derived from detection results and original depth images. Feature extraction generates a feature vector based on maximum height measurements from the cow’s backbone area. This feature vector is utilized for classification, evaluating three classifiers: Random Forest (RF), K-Nearest Neighbor (KNN), and Decision Tree (DT). The study highlights the potential of top-view depth video cameras for accurate cow lameness detection and classification, with significant implications for livestock health management.
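The Intersection Over Union score used to track cows across frames is a standard box-overlap measure. A minimal sketch of the IOU step only, not the full Detectron2 pipeline:

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2),
    the overlap score used to associate detections of the same cow
    between consecutive frames."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)
```

In an IOU tracker, each new detection is matched to the existing track with the highest IOU above a chosen threshold.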
Journal of Imaging doi: 10.3390/jimaging10030066
Authors: Lisbeth Lyhne Kim Christian Houlind Johnny Christensen Radu L. Vijdea Meinhard R. Hansen Malene Roland V. Pedersen Helle Precht
This study aimed to test the accuracy of a magnetic resonance imaging (MRI)-based method to detect and characterise deep venous thrombosis (DVT) in the ilio-femoro-caval veins. Patients with verified DVT in the lower extremities with extension of the thrombi to the iliac veins, who were suitable for catheter-based venous thrombolysis, were included in this study. Before the intervention, magnetic resonance venography (MRV) was performed, and the ilio-femoro-caval veins were independently evaluated for normal appearance, stenosis, and occlusion by two single-blinded observers. The same procedure was used to evaluate digital subtraction phlebography (DSP), considered to be the gold standard, which made it possible to compare the results. A total of 123 patients were included for MRV and DSP, resulting in 246 image sets to be analysed. In total, 496 segments were analysed for occlusion, stenosis, or normal appearance. In MRV, the highest sensitivity was found when distinguishing occlusion from either normal appearance or stenosis (0.98), while the lowest was found between stenosis and normal appearance (0.84). Specificity varied from 0.59 (stenosis vs. occlusion) to 0.94 (occlusion vs. normal appearance). The Kappa statistic was calculated as a measure of inter-observer agreement. The kappa value for MRV was 0.91 and for DSP, 0.80. In conclusion, MRV represents a sensitive method to analyse DVT in the pelvic veins, with advantages such as the absence of radiation and contrast agent and the possibility to investigate the anatomical relationships in the area.
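The inter-observer agreement figures above (kappa 0.91 for MRV, 0.80 for DSP) use Cohen's kappa, which corrects observed agreement for agreement expected by chance. A minimal sketch of the statistic:

```python
def cohens_kappa(a, b, labels):
    """Cohen's kappa for two observers rating the same segments
    (e.g. 'normal', 'stenosis', 'occlusion'): the observed agreement
    p_o corrected for the chance agreement p_e implied by each
    observer's marginal label frequencies."""
    n = len(a)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    p_chance = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    return (p_observed - p_chance) / (1 - p_chance)
```

Values near 1 indicate agreement well beyond chance; 0 means agreement no better than chance.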
Journal of Imaging doi: 10.3390/jimaging10030065
Authors: Florian Côme Fizaine Patrick Bard Michel Paindavoine Cécile Robin Edouard Bouyé Raphaël Lefèvre Annie Vinter
Text line segmentation is a necessary preliminary step before most text transcription algorithms are applied. The leading deep learning networks used in this context (ARU-Net, dhSegment, and Doc-UFCN) are based on the U-Net architecture. They are efficient, but fall under the same concept, requiring a post-processing step to perform instance (e.g., text line) segmentation. In the present work, we test the advantages of Mask-RCNN, which is designed to perform instance segmentation directly. This work is the first to directly compare Mask-RCNN- and U-Net-based networks on text segmentation of historical documents, showing the superiority of the former over the latter. Three studies were conducted, one comparing these networks on different historical databases, another comparing Mask-RCNN with Doc-UFCN on a private historical database, and a third comparing the handwritten text recognition (HTR) performance of the tested networks. The results showed that Mask-RCNN outperformed ARU-Net, dhSegment, and Doc-UFCN using relevant line segmentation metrics, that performance evaluation should not focus on the raw masks generated by the networks, that a light mask processing is an efficient and simple solution to improve evaluation, and that Mask-RCNN leads to better HTR performance.
Journal of Imaging doi: 10.3390/jimaging10030064
Authors: Anudari Khishigdelger Ahmed Salem Hyun-Soo Kang
Chest X-ray (CXR) imaging plays a pivotal role in diagnosing various pulmonary diseases, which account for a significant portion of the global mortality rate, as recognized by the World Health Organization (WHO). Medical practitioners routinely depend on CXR images to identify anomalies and make critical clinical decisions. Dramatic improvements in super-resolution (SR) have been achieved by applying deep learning techniques. However, SR is difficult to apply when low-resolution inputs and features contain abundant low-frequency information, as is the case in X-ray image super-resolution. In this paper, we introduce an advanced deep learning-based SR approach that incorporates the innovative residual-in-residual (RIR) structure to augment the diagnostic potential of CXR imaging. Specifically, we propose a lightweight network consisting of residual groups built from residual blocks, in which multiple skip connections facilitate the efficient bypassing of abundant low-frequency information. This approach allows the main network to concentrate on learning high-frequency information. In addition, we adopt dense feature fusion within residual groups and design highly parallel residual blocks for better feature extraction. Our proposed methods exhibit superior performance compared to existing state-of-the-art (SOTA) SR methods, delivering enhanced accuracy and notable visual improvements, as evidenced by our results.
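The residual-in-residual idea above — short skips inside blocks and a long skip over each group, so low-frequency content bypasses the learned layers — can be sketched with toy matmul blocks. This is an illustrative structure only, not the paper's network:

```python
import numpy as np

def residual_block(x, w):
    """One residual block: the identity skip lets low-frequency content
    bypass the learned transform (here a toy matmul + ReLU), freeing the
    weights to model high-frequency detail."""
    return x + np.maximum(x @ w, 0.0)

def residual_group(x, weights):
    """Residual group with a long skip over several blocks, mirroring the
    residual-in-residual (RIR) structure: short and long skip connections
    both carry low-frequency information around the learned layers."""
    y = x
    for w in weights:
        y = residual_block(y, w)
    return x + y   # long skip connection over the whole group
```

With all-zero weights the learned branch contributes nothing and the input passes through the skips unchanged, which is exactly the "bypass" behaviour the design relies on.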
Journal of Imaging doi: 10.3390/jimaging10030063
Authors: Reagan E. Mandiya Hervé M. Kongo Selain K. Kasereka Kyamakya Kyandoghere Petro Mushidi Tshakwanda Nathanaël M. Kasoro
Rapid and precise identification of Coronavirus Disease 2019 (COVID-19) is pivotal for effective patient care, comprehending the pandemic’s trajectory, and enhancing long-term patient survival rates. Despite numerous recent endeavors in medical imaging, many convolutional neural network-based models grapple with the expressiveness problem and overfitting, and the training process of these models is always resource-intensive. This paper presents an innovative approach employing Xception, augmented with cutting-edge transfer learning techniques to forecast COVID-19 from X-ray thorax images. Our experimental findings demonstrate that the proposed model surpasses the predictive accuracy of established models in the domain, including Xception, VGG-16, and ResNet. This research marks a significant stride toward enhancing COVID-19 detection through a sophisticated and high-performing imaging model.
Journal of Imaging doi: 10.3390/jimaging10030062
Authors: Gang Hu Conner Saeli
Deep edge detection is challenging, even with existing methods like HED (holistically-nested edge detection). These methods combine multiple feature side outputs (SOs) to create the final edge map, but they neglect diverse edge importance within one output. This creates a problem: to include desired edges, unwanted noise must also be accepted. As a result, the output often has increased noise or thick edges, ignoring important boundaries. To address this, we propose a new approach called the normalized Hadamard-product (NHP) operation-based deep network for edge detection. By multiplying the side outputs from the backbone network, the Hadamard-product operation encourages agreement among features across different scales while suppressing weak signals on which they disagree. This method produces additional Mutually Agreed Salient Edge (MASE) maps to enrich the hierarchical level of side outputs without adding complexity. Our experiments demonstrate that the NHP operation significantly improves performance, e.g., an ODS score reaching 0.818 on BSDS500, outperforming human performance (0.803) and achieving state-of-the-art results in deep edge detection.
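The Hadamard-product idea above can be sketched in a few lines. This is an illustrative version: element-wise multiplication keeps edges the scales agree on, and the C-th root shown here is one simple way to renormalize the product back into [0, 1] — the paper's exact normalization may differ:

```python
import numpy as np

def nhp(side_outputs):
    """Normalized Hadamard product of side-output edge maps (values in
    [0, 1]): multiplying the maps element-wise keeps only responses that
    all scales agree on and drives disagreed weak signals toward zero;
    the len-th root (a geometric mean) restores the [0, 1] range."""
    prod = np.ones_like(side_outputs[0])
    for s in side_outputs:
        prod = prod * s
    return prod ** (1.0 / len(side_outputs))
```

Note how a single zero response at any scale suppresses the pixel entirely, which is the agreement-enforcing behaviour the abstract describes.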
Journal of Imaging doi: 10.3390/jimaging10030061
Authors: Shubham Rana Salvatore Gerbino Mariano Crimaldi Valerio Cirillo Petronia Carillo Fabrizio Sarghini Albino Maggio
This article is focused on the comprehensive evaluation of approaches to scale-invariant feature transform (SIFT)- and random sample consensus (RANSAC)-based multispectral (MS) image registration. In this paper, the idea is to extensively evaluate three such SIFT- and RANSAC-based registration approaches over a heterogenous mix containing Triticum aestivum crop and Raphanus raphanistrum weed. The first method is based on the application of a homography matrix, derived during the registration of MS images, on spatial coordinates of individual annotations to achieve spatial realignment. The second method is based on the registration of binary masks derived from the ground truth of individual spectral channels. The third method is based on the registration of only the masked pixels of interest across the respective spectral channels. It was found that the MS image registration technique based on the registration of binary masks derived from the manually segmented images exhibited the highest accuracy, followed by the technique involving registration of masked pixels, and lastly, registration based on the spatial realignment of annotations. Among automatically segmented images, the technique based on the registration of automatically predicted mask instances exhibited higher accuracy than the technique based on the registration of masked pixels. In the ground truth images, the annotations performed through the near-infrared channel were found to have a higher accuracy, followed by the green, blue, and red spectral channels. Among the automatically segmented images, the blue channel was observed to exhibit a higher accuracy, followed by the green, near-infrared, and red channels.
At the individual instance level, the registration based on binary masks depicted the highest accuracy in the green channel, followed by the method based on the registration of masked pixels in the red channel, and lastly, the method based on the spatial realignment of annotations in the green channel. The instance detection of wild radish with YOLOv8l-seg was observed at a mAP@0.5 of 92.11% and a segmentation accuracy of 98% towards segmenting its binary mask instances.
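The first registration approach described above maps annotation coordinates through the homography estimated between spectral channels. A minimal sketch of that spatial-realignment step (the SIFT/RANSAC estimation of H itself is not reproduced here):

```python
import numpy as np

def apply_homography(H, pts):
    """Map annotation coordinates through a 3x3 homography H, the spatial
    realignment step applied to individual annotations once H has been
    estimated between two spectral channels. pts: (N, 2) (x, y) array."""
    homog = np.hstack([pts, np.ones((pts.shape[0], 1))])  # homogeneous coords
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]                 # de-homogenize
```

In practice H would come from matched SIFT keypoints filtered with RANSAC (e.g. OpenCV's `cv2.findHomography`); the function above only applies a given H to points.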
Journal of Imaging doi: 10.3390/jimaging10030060
Authors: Robert Zboray Wolf Schweitzer Lars Ebert Martin Wolf Sabino Guglielmini Stefan Haemmerle Stephan Weiss Bruno Koller
The rate of parental consent for fetal and perinatal autopsy is decreasing, whereas parents are more likely to agree to virtual autopsy by non-invasive imaging methods. Fetal and perinatal virtual autopsy needs high resolution and good soft-tissue contrast for investigation of the cause of death and underlying trauma or pathology in fetuses and stillborn infants. This is offered by micro-computed tomography (CT), as opposed to the limited resolution provided by clinical CT scanners, making micro-CT one of the most promising tools for non-invasive perinatal post-mortem imaging. We developed and optimized a micro-CT scanner with a dual-energy imaging option. It is dedicated to post-mortem CT angiography and virtual autopsy of fetuses and stillborn infants in that the chamber can be cooled down to around 5 °C; this increases tissue rigidity and slows decomposition of the native specimen. This, together with the dedicated gantry-based architecture, attempts to reduce potential motion artifacts. The developed methodology is based on prior endovascular injection of a BaSO4-based contrast agent. We explain the design choices and considerations for this scanner prototype. We detail the optimization of the dual-energy and virtual mono-energetic imaging option, which is based on minimizing noise propagation and maximizing the contrast-to-noise ratio for vascular features. We demonstrate the scanner capabilities with proof-of-concept experiments on phantoms and stillborn piglets.
Journal of Imaging doi: 10.3390/jimaging10030059
Authors: Yanhua Huang Zhendong Wu Juan Chen Hui Xiang
Personal privacy protection has been extensively investigated. The privacy protection of face recognition applications combines face privacy protection with face recognition. Traditional face privacy-protection methods encrypt or perturb facial images for protection. However, the original facial images or parameters need to be restored during recognition. In this paper, it is found that faces can still be recognized correctly when only some of the high-order and local feature information from faces is retained, while the rest of the information is fuzzed. Based on this, a privacy-preserving face recognition method combining random convolution and self-learning batch normalization is proposed. This method generates a privacy-preserving scrambled facial image whose degree of fuzzing is close to that of encryption. The server directly recognizes the scrambled facial image, and the recognition accuracy is equivalent to that of the normal facial image. The method simultaneously ensures the revocability and irreversibility of the face privacy protection. In this experiment, the proposed method is tested on the LFW, CelebA, and self-collected face datasets. On the three datasets, the proposed method outperforms the existing face privacy-preserving recognition methods in terms of face visual information elimination and recognition accuracy. The recognition accuracy is >99%, and the visual information elimination is close to an encryption effect.
Journal of Imaging doi: 10.3390/jimaging10030058
Authors: Se-On Kim Yoon-Chul Kim
Centerline tracking is useful in performing segmental analysis of vessel tortuosity in angiography data. However, a highly tortuous artery can produce multiple centerlines due to over-segmentation of the artery, resulting in inaccurate path-finding results when using the shortest path-finding algorithm. In this study, the internal carotid arteries (ICAs) from three-dimensional (3D) time-of-flight magnetic resonance angiography (TOF MRA) data were used to demonstrate the effectiveness of a new path-finding method. The method is based on a series of depth-first searches (DFSs) with randomly different orders of neighborhood searches and produces an appropriate path connecting the two endpoints in the ICAs. It was compared with three existing methods: (a) DFS with a sequential order of neighborhood search, (b) the Dijkstra algorithm, and (c) the A* algorithm. The path-finding accuracy was evaluated by counting the number of successful paths. The proposed method resulted in an accuracy of 95.8%, outperforming the three existing methods. In conclusion, the proposed method has been shown to be more suitable as a path-finding procedure than the existing methods, particularly in cases where there is more than one centerline resulting from over-segmentation of a highly tortuous artery.
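The core idea above — a depth-first search whose neighbourhood order is randomized, so repeated runs explore alternative branches of an over-segmented centerline — can be sketched on a generic graph. The adjacency-dict representation is an assumption for this sketch; the paper works on voxel neighbourhoods:

```python
import random

def random_dfs_path(adj, start, goal, seed=None):
    """Depth-first search with a randomly shuffled neighbour order: each
    run with a different seed may traverse a different branch, which is
    how a series of such searches can find a path that the fixed
    sequential order misses. adj: dict mapping a node to its neighbours."""
    rng = random.Random(seed)
    stack = [(start, [start])]
    visited = {start}
    while stack:
        node, path = stack.pop()
        if node == goal:
            return path
        neighbours = list(adj.get(node, []))
        rng.shuffle(neighbours)           # randomized neighbourhood order
        for nxt in neighbours:
            if nxt not in visited:
                visited.add(nxt)
                stack.append((nxt, path + [nxt]))
    return None                           # endpoints are not connected
```

Running this several times with different seeds and keeping any path that connects the two endpoints mirrors the "series of DFSs" strategy described in the abstract.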
Journal of Imaging doi: 10.3390/jimaging10030057
Authors: Zhongliang Guo Ognjen Arandjelović David Reid Yaxiong Lei Jochen Büttner
Jochen Büttner was not included as an author in the original publication [...]
Journal of Imaging doi: 10.3390/jimaging10030056
Authors: Qiwen Lu Shengbo Chen Xiaoke Zhu
Language bias stands as a noteworthy concern in visual question answering (VQA), wherein models tend to rely on spurious correlations between questions and answers for prediction. This prevents the models from effectively generalizing, leading to a decrease in performance. In order to address this bias, we propose a novel modality fusion collaborative de-biasing algorithm (CoD). In our approach, bias is considered as the model’s neglect of information from a particular modality during prediction. We employ a collaborative training approach to facilitate mutual modeling between different modalities, achieving efficient feature fusion and enabling the model to fully leverage multimodal knowledge for prediction. Our experiments on various datasets, including VQA-CP v2, VQA v2, and VQA-VS, using different validation strategies, demonstrate the effectiveness of our approach. Notably, employing a basic baseline model resulted in an accuracy of 60.14% on VQA-CP v2.
Journal of Imaging doi: 10.3390/jimaging10030055
Authors: Majid Ansari-Asl Markus Barbieri Gaël Obein Jon Yngve Hardeberg
The application of materials with changing visual properties with lighting and observation directions has found broad utility across diverse industries, from architecture and fashion to automotive and film production. The expanding array of applications and appearance reproduction requirements emphasizes the critical role of material appearance measurement and surface characterization. Such measurements offer twofold benefits in soft proofing and product quality control, reducing errors and material waste while providing objective quality assessment. Some image-based setups have been proposed to capture the appearance of material surfaces with spatial variations in visual properties in terms of Spatially Varying Bidirectional Reflectance Distribution Functions (SVBRDF) and Bidirectional Texture Functions (BTF). However, comprehensive exploration of optical design concerning spectral channels and per-pixel incident-reflection direction calculations, along with measurement validation, remains an unexplored domain within these systems. Therefore, we developed a novel advanced multispectral image-based device designed to measure SVBRDF and BTF, addressing these gaps in the existing literature. Central to this device is a novel rotation table as sample holder and passive multispectral imaging. In this paper, we present our compact multispectral image-based appearance measurement device, detailing its design, assembly, and optical considerations. Preliminary measurements showcase the device’s potential in capturing angular and spectral data, promising valuable insights into material appearance properties.
Journal of Imaging doi: 10.3390/jimaging10030054
Authors: Yuko Harada Kyosuke Shimada Satoshi John Harada Tomomi Sato Yukino Kubota Miyoko Yamashita
The mortality rate of cancer patients has been decreasing; however, patients often suffer from cardiac disorders due to chemotherapy or other cancer therapies (e.g., cancer-therapy-related cardiovascular toxicity (CVR-CVT)). Therefore, the field of cardio-oncology has drawn more attention in recent years. The first European Society of Cardiology (ESC) guidelines on cardio-oncology were established last year. Echocardiography is the gold standard for the diagnosis of CVR-CVT, but many breast cancer patients are unable to undergo echocardiography due to their surgery wounds or anatomical reasons. We performed a study to evaluate the usefulness of myocardial scintigraphy using Iodine-123 β-methyl-p-iodophenyl-pentadecanoic acid (123I-BMIPP) in comparison with echocardiography and published the results in the Journal of Imaging last year. This is the secondary analysis following our previous study. A total of 114 breast cancer patients who received chemotherapy within 3 years underwent echocardiography, as well as Thallium (201Tl) and 123I-BMIPP myocardial perfusion and metabolism scintigraphy. The ratio of isotope uptake reduction was scored by Heart Risk View-S software (Nihon Medi-Physics). The scores were then compared with the echocardiography parameters. All the patients’ charts and data from January 2022 to November 2023 were reviewed for the secondary analysis. Echocardiogram parameters were obtained from 99 patients (87% of total patients). No correlations were found between the echocardiography parameters and Heart Risk View-S scores of 201Tl myocardial perfusion scintigraphy, nor those of the BMIPP myocardial metabolism scintigraphy. In total, 8 patients out of 114 (7.0%) died within 22 months, while 3 patients out of 26 CVR-CVT patients (11.5%) died within 22 months. Evaluation by echocardiography was sometimes difficult to perform on breast cancer patients.
However, other imaging modalities, including myocardial scintigraphy, cannot serve as alternatives to echocardiography. Cardiac scintigraphy detects circulation disorder or metabolism disorder in the myocardium; therefore, it should be able to reveal myocardial damage to some extent. The mortality rate of breast cancer patients was higher with CVR-CVT. A new modality to detect CVR-CVT besides echocardiography can possibly be anticipated for patients who cannot undergo echocardiography.
Journal of Imaging doi: 10.3390/jimaging10030053
Authors: Giuseppe Bonifazi Paolo Barontini Riccardo Gasbarrone Davide Gattabria Silvia Serranti
In this manuscript, a method that utilizes classical image analysis techniques to assess particle aggregation and segregation, with the primary goal of validating particle size distribution determined by conventional methods, is presented. This approach can represent a supplementary tool in quality control systems for powder production processes in industries such as manufacturing and pharmaceuticals. The methodology involves the acquisition of high-resolution images, followed by their fractal and textural analysis. Fractal analysis plays a crucial role by quantitatively measuring the complexity and self-similarity of particle structures. This approach allows for the numerical evaluation of aggregation and segregation phenomena, providing valuable insights into the underlying mechanisms at play. Textural analysis contributes to the characterization of patterns and spatial correlations observed in particle images. The examination of textural features offers an additional understanding of particle arrangement and organization. Consequently, it aids in validating the accuracy of particle size distribution measurements. To this end, by incorporating fractal and textural analysis, a methodology that enhances the reliability and accuracy of particle size distribution validation is obtained. It enables the identification of irregularities, anomalies, and subtle variations in particle arrangements that might not be detected by traditional measurement techniques alone.
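A standard way to quantify the complexity and self-similarity of particle structures, as the abstract describes, is the box-counting fractal dimension. This sketch shows the generic measure only, not the authors' exact pipeline:

```python
import numpy as np

def box_counting_dimension(mask, sizes=(1, 2, 4, 8)):
    """Box-counting estimate of the fractal dimension of a binary particle
    mask: count the boxes N(s) of side s that contain any foreground
    pixel, then fit the slope of log N(s) against log(1/s)."""
    counts = []
    n = mask.shape[0]
    for s in sizes:
        occupied = 0
        for i in range(0, n, s):
            for j in range(0, n, s):
                if mask[i:i + s, j:j + s].any():
                    occupied += 1
        counts.append(occupied)
    slope, _ = np.polyfit(np.log(1.0 / np.asarray(sizes)), np.log(counts), 1)
    return slope
```

A fully filled 2D region gives a dimension of 2, while ragged, aggregated particle boundaries yield non-integer values between 1 and 2, which is what makes the measure sensitive to aggregation.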
Journal of Imaging doi: 10.3390/jimaging10030052
Authors: Jiahao Xia Gavin Gong Jiawei Liu Zhigang Zhu Hao Tang
In this paper, a Segment Anything Model (SAM)-based pedestrian infrastructure segmentation workflow is designed and optimized, which is capable of efficiently processing multi-sourced geospatial data, including LiDAR data and satellite imagery data. We used an expanded definition of pedestrian infrastructure inventory, which goes beyond the traditional transportation elements to include street furniture objects that are important for accessibility but are often omitted from the traditional definition. Our contributions lie in producing the necessary knowledge to answer the following three questions. First, how can mobile LiDAR technology be leveraged to produce a comprehensive pedestrian-accessible infrastructure inventory? Second, which data representation can facilitate zero-shot segmentation of infrastructure objects with SAM? Third, how well does the SAM-based method perform on segmenting pedestrian infrastructure objects? Our proposed method is designed to efficiently create pedestrian-accessible infrastructure inventory through the zero-shot segmentation of multi-sourced geospatial datasets. By addressing these three research questions, we show how the multi-mode data should be prepared, what data representation works best for what asset features, and how SAM performs on these data representations. Our findings indicate that street-view images generated from mobile LiDAR point-cloud data, when paired with satellite imagery data, can work efficiently with SAM to create a scalable pedestrian infrastructure inventory approach with immediate benefits to GIS professionals, city managers, transportation owners, and walkers, especially those with travel-limiting disabilities, such as individuals who are blind, have low vision, or experience mobility disabilities.
Journal of Imaging doi: 10.3390/jimaging10030051
Authors: Tirui Wu Ciaran Eising Martin Glavin Edward Jones
Image decolorization is an image pre-processing step which is widely used in image analysis, computer vision, and printing applications. The most commonly used methods give each color channel (e.g., the R component in RGB format, or the Y component of an image in CIE-XYZ format) a constant weight without considering image content. This approach is simple and fast, but it may cause significant information loss when images contain many isoluminant colors. In this paper, we propose a new method which is not only efficient, but also can preserve a higher level of image contrast and detail than the traditional methods. It uses the cumulative distribution function (CDF) of each color channel to compute a weight for each pixel in each color channel. Then, these weights are used to combine the three color channels (red, green, and blue) to obtain the final grayscale value. The algorithm works in RGB color space directly without any color conversion. In order to evaluate the proposed algorithm objectively, two new metrics are also developed. Experimental results show that the proposed algorithm can run as efficiently as the traditional methods and obtain the best overall performance across four different metrics.
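The CDF-based weighting described above can be sketched as follows. This is an illustration of the idea only — per-pixel weights derived from each channel's cumulative distribution, combined directly in RGB space — and does not claim to match the paper's exact weighting:

```python
import numpy as np

def cdf_weights(channel, bins=256):
    """Per-pixel weight for one 8-bit colour channel derived from its
    cumulative distribution function, so the weight depends on image
    content rather than being a single constant per channel."""
    hist, _ = np.histogram(channel, bins=bins, range=(0, 256))
    cdf = np.cumsum(hist) / channel.size
    return cdf[np.clip(channel.astype(int), 0, bins - 1)]

def decolorize(r, g, b):
    """Combine the R, G, B channels with content-dependent CDF weights,
    working directly in RGB space with no colour conversion."""
    wr, wg, wb = cdf_weights(r), cdf_weights(g), cdf_weights(b)
    total = wr + wg + wb + 1e-12
    return (wr * r + wg * g + wb * b) / total
```

Because the weights vary per pixel, two isoluminant colours that a constant-weight formula would map to the same gray value can receive different weights and remain distinguishable.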
Journal of Imaging doi: 10.3390/jimaging10020050
Authors: Alessandro Benfenati Pasquale Cascarano
The Plug-and-Play framework has demonstrated that a denoiser can implicitly serve as the image prior for model-based methods for solving various inverse problems such as image restoration tasks. This characteristic enables the integration of the flexibility of model-based methods with the effectiveness of learning-based denoisers. However, the regularization strength induced by denoisers in the traditional Plug-and-Play framework lacks a physical interpretation, necessitating demanding parameter tuning. This paper addresses this issue by introducing the Constrained Plug-and-Play (CPnP) method, which reformulates the traditional PnP as a constrained optimization problem. In this formulation, the regularization parameter directly corresponds to the amount of noise in the measurements. The solution to the constrained problem is obtained through the design of an efficient method based on the Alternating Direction Method of Multipliers (ADMM). Our experiments demonstrate that CPnP outperforms competing methods in terms of stability and robustness while also achieving competitive performance for image quality.
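The constrained formulation above — regularize via a plug-in denoiser subject to a data-fidelity ball whose radius matches the measurement noise — can be sketched with a toy ADMM loop. This is illustrative only (forward operator taken as the identity); the paper's operator splitting and parameters may differ:

```python
import numpy as np

def project_ball(x, y, eps):
    """Euclidean projection onto {x : ||x - y||_2 <= eps}; eps plays the
    role of the measurement noise level in the constrained formulation."""
    d = x - y
    norm = np.linalg.norm(d)
    return x if norm <= eps else y + d * (eps / norm)

def cpnp_admm(y, denoiser, eps, n_iter=20):
    """Toy ADMM loop in the spirit of Constrained Plug-and-Play for a
    denoising problem: alternate the data-constraint projection, the
    plug-in denoiser (the implicit prior), and the dual update."""
    x = y.copy(); z = y.copy(); u = np.zeros_like(y)
    for _ in range(n_iter):
        x = project_ball(z - u, y, eps)   # enforce ||x - y|| <= eps
        z = denoiser(x + u)               # implicit prior via the denoiser
        u = u + x - z                     # dual ascent
    return x
```

The appeal of the constrained form is visible in the signature: `eps` is the noise level itself, rather than an abstract regularization weight that must be tuned.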
]]>Journal of Imaging doi: 10.3390/jimaging10020049
Authors: Midori Tanaka Tsubasa Ando Takahiko Horiuchi
Depending on various design conditions, including optics and circuit design, the image-forming characteristics of the modulation transfer function (MTF), which affect the spatial resolution of a digital image, may vary among image channels within or between imaging devices. In this study, we propose a method for automatically converting the MTF to a target MTF, focusing on adjusting the MTF characteristics that affect the signals of different image channels within and between different imaging devices. The experimental results of MTF conversion using the proposed method for multiple image channels with different MTF characteristics indicated that the proposed method could produce sharper images by moving the source MTF of each channel closer to a target MTF with a higher MTF value. This study is expected to contribute to technological advancements in various imaging devices as follows: (1) Even if the imaging characteristics of the hardware are unknown, the MTF can be converted to the target MTF using the image after it is captured. (2) As any MTF can be converted into a target, image simulation for conversion to a different MTF is possible. (3) It is possible to generate high-definition images, thereby meeting the requirements of various industrial and research fields in which high-definition images are required.
]]>Journal of Imaging doi: 10.3390/jimaging10020048
Authors: Maria-Eugenia Sánchez-Morales José-Trinidad Guillen-Bonilla Héctor Guillen-Bonilla Alex Guillen-Bonilla Jorge Aguilar-Santiago Maricela Jiménez-Rodríguez
This paper proposes the transformation S→C→, where S is a digital gray-level image and C→ is a vector expressed through the textural space. The proposed transformation is termed Vectorial Image Representation on the Texture Space (VIR-TS), given that the digital image S is represented by the textural vector C→. This vector C→ contains all of the local texture characteristics in the image of interest, and the texture unit T→ has a vectorial character, since it is defined through the resolution of a homogeneous equation system. For the application of this transformation, a new classifier for multiple classes is proposed in the texture space, where the vector C→ is employed as a characteristic vector. To verify its efficiency, it was experimentally deployed for the recognition of digital images of tree barks, achieving effective performance. In these experiments, the parametric value λ employed to solve the homogeneous equation system does not affect the results of the image classification. The VIR-TS transform possesses potential applications in specific tasks, such as locating missing persons, and the analysis and classification of diagnostic and medical images.
]]>Journal of Imaging doi: 10.3390/jimaging10020047
Authors: Dunia Pineda Medina Ileana Miranda Cabrera Rolisbel Alfonso de la Cruz Lizandra Guerra Arzuaga Sandra Cuello Portal Monica Bianchini
Artificial intelligence techniques are now widely used in various agricultural applications, including the detection of devastating diseases such as late blight (Phytophthora infestans) and early blight (Alternaria solani) affecting potato (Solanum tuberosum L.) crops. In this paper, we present a mobile application for detecting potato crop diseases based on deep neural networks. The images were taken from the PlantVillage dataset, with a batch of 1000 images for each of the three identified classes (healthy, early blight-diseased, late blight-diseased). An exploratory analysis of the architectures used for early and late blight diagnosis in potatoes was performed, achieving an accuracy of 98.7% with MobileNetv2. Based on the results obtained, an offline mobile application was developed, supported on devices with Android 4.1 or later, also featuring an information section on the 27 diseases affecting potato crops and a gallery of symptoms. For future work, segmentation techniques will be used to highlight the damaged region in the potato leaf by evaluating its extent and possibly identifying different types of diseases affecting the same plant.
]]>Journal of Imaging doi: 10.3390/jimaging10020046
Authors: Dennis Siegel Christian Kraetzer Stefan Seidlitz Jana Dittmann
In recent discussions in the European Parliament, the need for regulations for so-called high-risk artificial intelligence (AI) systems was identified; these regulations are currently codified in the upcoming EU Artificial Intelligence Act (AIA), which has been approved by the European Parliament. The AIA is the first document of its kind to be turned into European law. This initiative focuses on turning AI systems into decision support systems (human-in-the-loop and human-in-command), where the human operator remains in control of the system. While this supposedly solves accountability issues, it introduces, on the one hand, the necessary human–computer interaction as a potential new source of errors; on the other hand, it is potentially a very effective approach for decision interpretation and verification. This paper discusses the necessary requirements for high-risk AI systems once the AIA comes into force. Particular attention is paid to the opportunities and limitations that result from the decision support system and increasing the explainability of the system. This is illustrated using the example of the media forensic task of DeepFake detection.
]]>Journal of Imaging doi: 10.3390/jimaging10020045
Authors: Soumick Chatterjee Fatima Saad Chompunuch Sarasaen Suhita Ghosh Valerie Krug Rupali Khatun Rahul Mishra Nirja Desai Petia Radeva Georg Rose Sebastian Stober Oliver Speck Andreas Nürnberger
The outbreak of COVID-19 has shocked the entire world with its fairly rapid spread, and has challenged different sectors. One of the most effective ways to limit its spread is the early and accurate diagnosis of infected patients. Medical imaging, such as X-ray and computed tomography (CT), combined with the potential of artificial intelligence (AI), plays an essential role in supporting medical personnel in the diagnosis process. Thus, in this article, five different deep learning models (ResNet18, ResNet34, InceptionV3, InceptionResNetV2, and DenseNet161) and their ensemble, using majority voting, have been used to classify COVID-19, pneumonia, and healthy subjects using chest X-ray images. Multilabel classification was performed to predict multiple pathologies for each patient, if present. The interpretability of each of the networks was thoroughly studied using local interpretability methods—occlusion, saliency, input X gradient, guided backpropagation, integrated gradients, and DeepLIFT—and using a global technique—neuron activation profiles. The mean micro F1 score of the models for COVID-19 classification ranged from 0.66 to 0.875, and was 0.89 for the ensemble of the network models. The qualitative results showed that the ResNets were the most interpretable models. This research demonstrates the importance of using interpretability methods to compare different models before making a decision regarding the best performing model.
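The ensemble step, majority voting over the models' multilabel predictions, can be illustrated with a generic sketch. The function name and the strict-majority rule are our assumptions, not the paper's code; with the paper's five models, a strict majority means at least three votes:

```python
import numpy as np

def majority_vote(predictions):
    """Combine multilabel predictions from several models by majority vote.

    `predictions` is a list of binary arrays, one per model, each of shape
    (n_samples, n_labels). A label is assigned when a strict majority of
    the models predict it.
    """
    stack = np.stack(predictions)       # (n_models, n_samples, n_labels)
    votes = stack.sum(axis=0)           # votes per sample and label
    return (votes > stack.shape[0] / 2).astype(int)
```

With an odd number of models (five in the paper), ties cannot occur, which is one practical reason to ensemble an odd count.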
]]>Journal of Imaging doi: 10.3390/jimaging10020044
Authors: Chin-Chen Chang Ping-Hao Peng
Neural style transfer is an algorithm that transfers the style of one image to another image, converting the style of the second image while preserving its content. In this paper, we propose a style transfer approach for sand painting generation based on convolutional neural networks. The proposed approach aims to improve sand painting generation via neural style transfer, which can address the problem of blurred objects. Furthermore, it can reduce background noise caused by neural style transfer. First, we segment the main objects from the content image. Next, we perform close–open filtering operations on the content image to obtain smooth images. Then, we apply Sobel edge detection to process the images and obtain edge maps. Based on these edge maps and the input style image, we perform neural style transfer to generate sand painting images. Finally, we integrate the generated images to obtain the final stylized sand painting image. The results show that the proposed approach yields good visual effects for sand paintings. Moreover, the proposed approach achieves better visual effects for sand painting than the previous method.
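The Sobel edge-detection step in the pipeline above can be sketched in dependency-free NumPy. In practice a library routine (e.g., OpenCV's Sobel operator) would be used; the function below is a hypothetical minimal version showing what the edge maps contain:

```python
import numpy as np

# Standard 3x3 Sobel kernels: horizontal- and vertical-gradient responses.
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def sobel_edge_map(gray):
    """Gradient-magnitude edge map of a 2-D grayscale image
    (borders handled by edge replication)."""
    gray = np.asarray(gray, dtype=float)
    h, w = gray.shape
    padded = np.pad(gray, 1, mode="edge")
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for i in range(3):                       # correlate with both kernels
        for j in range(3):
            patch = padded[i:i + h, j:j + w]
            gx += SOBEL_X[i, j] * patch
            gy += SOBEL_Y[i, j] * patch
    return np.hypot(gx, gy)                  # gradient magnitude
```

A vertical intensity step produces a strong response along that column and zero response in flat regions, which is exactly the structure the style-transfer stage uses to keep object boundaries sharp.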
]]>Journal of Imaging doi: 10.3390/jimaging10020043
Authors: Francisco J. Ávila Juan M. Bueno
The optical quality of an image depends on both the optical properties of the imaging system and the physical properties of the medium the light passes through while travelling from the object to the image plane. The computation of the point spread function (PSF) associated with the optical system is often used to assess the image quality. In a non-ideal optical system, the PSF is affected by aberrations that distort the final image. Moreover, in the presence of turbid media, scattering phenomena spread the light at wide angular distributions that contribute to reduced contrast and sharpness. If the mathematical degradation operator affecting the recorded image is known, the image can be restored through deconvolution methods. In some scenarios, no (or only partial) information on the PSF is available. In those cases, blind deconvolution approaches arise as useful solutions for image restoration. In this work, a new blind deconvolution method is proposed to restore images using spherical aberration (SA) and scatter-based kernel filters. The procedure was evaluated on different microscopy images. The results show the capability of the algorithm to detect both degradation coefficients (i.e., SA and scattering) and to restore images without information on the real PSF.
]]>Journal of Imaging doi: 10.3390/jimaging10020042
Authors: Adrian-Alin Barglazan Remus Brad Constantin Constantinescu
In recent years, significant advancements in the field of machine learning have influenced the domain of image restoration. While these technological advancements present prospects for improving the quality of images, they also present difficulties, particularly the proliferation of manipulated or counterfeit multimedia information on the internet. The objective of this paper is to provide a comprehensive review of existing inpainting algorithms and forgery detection methods, with a specific emphasis on techniques that are designed for the purpose of removing objects from digital images. In this study, we examine various techniques encompassing conventional texture synthesis methods as well as those based on neural networks. Furthermore, we present the artifacts frequently introduced by the inpainting procedure and assess the state of the art in detecting such modifications. Lastly, we look at the available datasets and how the methods compare with each other. Having covered all of the above, this study provides a comprehensive perspective on the abilities and constraints of detecting object removal via the inpainting procedure in images.
]]>Journal of Imaging doi: 10.3390/jimaging10020041
Authors: Kani Djoulde Boukar Ousman Abboubakar Hamadjam Laurent Bitjoka Clergé Tchiegang
The purpose of this work is to classify pepper seeds using color filter array (CFA) images. This study focused specifically on Penja pepper, which is found in the Littoral Region of Cameroon and is a type of Piper nigrum. India and Brazil are the largest producers of this variety of pepper, although the production of Penja pepper is not as significant in terms of quantity compared to other major producers. However, it is still highly sought after and one of the most expensive types of pepper on the market. It can be difficult for humans to distinguish between different types of peppers based solely on the appearance of their seeds. To address this challenge, we collected 5618 samples of white and black Penja pepper and other varieties for classification using image processing and a supervised machine learning method. We extracted 18 attributes from the images and used them to train four different models. The most successful model was the support vector machine (SVM), which achieved an accuracy of 0.87, a precision of 0.874, a recall of 0.873, and an F1-score of 0.874.
]]>Journal of Imaging doi: 10.3390/jimaging10020040
Authors: Niklas Dormagen Max Klein Andreas S. Schmitz Markus H. Thoma Mike Schwarz
Detecting micron-sized particles is an essential task for the analysis of complex plasmas because a large part of the analysis is based on the initially detected positions of the particles. Accordingly, high accuracy in particle detection is desirable. Previous studies have shown that machine learning algorithms have made great progress and outperformed classical approaches. This work presents an approach for tracking micron-sized particles in a dense cloud of particles in a dusty plasma at Plasmakristall-Experiment 4 using a U-Net. The U-Net is a convolutional network architecture for the fast and precise segmentation of images that was developed at the Computer Science Department of the University of Freiburg. The U-Net architecture, with its intricate design and skip connections, has been a powerhouse in achieving precise object delineation. However, as experiments are to be conducted in resource-constrained environments, such as parabolic flights, preferably with real-time applications, there is growing interest in exploring less complex U-Net architectures that balance efficiency and effectiveness. We compare the full-size neural network, three optimized neural networks, the well-known StarDist, and trackpy in terms of accuracy in artificial data analysis. Finally, we determine which of the compact U-Net architectures provides the best balance between efficiency and effectiveness. We also apply the full-size neural network and the most effective compact network to the data of the PK-4 experiment. The experimental data were generated under laboratory conditions.
]]>Journal of Imaging doi: 10.3390/jimaging10020039
Authors: Dimitri B. A. Mantovani Milena S. Pitombeira Phelipi N. Schuck Adriel S. de Araújo Carlos Alberto Buchpiguel Daniele de Paula Faria Ana Maria M. da Silva
This study aims to evaluate non-invasive PET quantification methods for (R)-[11C]PK11195 uptake measurement in multiple sclerosis (MS) patients and healthy controls (HC) in comparison with arterial input function (AIF) using dynamic (R)-[11C]PK11195 PET and magnetic resonance images. The total volume of distribution (VT) and distribution volume ratio (DVR) were measured in the gray matter, white matter, caudate nucleus, putamen, pallidum, thalamus, cerebellum, and brainstem using AIF, the image-derived input function (IDIF) from the carotid arteries, and pseudo-reference regions from supervised clustering analysis (SVCA). Uptake differences between MS and HC groups were tested using statistical tests adjusted for age and sex, and correlations between the results from the different quantification methods were also analyzed. Significant DVR differences were observed in the gray matter, white matter, putamen, pallidum, thalamus, and brainstem of MS patients when compared to the HC group. Also, strong correlations were found in DVR values between non-invasive methods and AIF (0.928 for IDIF and 0.975 for SVCA, p < 0.0001). On the other hand, (R)-[11C]PK11195 uptake could not be differentiated between MS patients and HC using VT values, and a weak correlation (0.356, p < 0.0001) was found between the VT values obtained with AIF and IDIF. Our study shows that the best alternative for AIF is using SVCA for reference region modeling, in addition to a cautious and appropriate methodology.
]]>Journal of Imaging doi: 10.3390/jimaging10020038
Authors: Jan Stepanek Juan M. Farina Ahmed K. Mahmoud Chieh-Ju Chao Said Alsidawi Chadi Ayoub Timothy Barry Milagros Pereyra Isabel G. Scalia Mohammed Tiseer Abbas Rachel E. Wraith Lisa S. Brown Michael S. Radavich Pamela J. Curtisi Patricia C. Hartzendorf Elizabeth M. Lasota Kyley N. Umetsu Jill M. Peterson Kristin E. Karlson Karen Breznak David F. Fortuin Steven J. Lester Reza Arsanjani
Exposure to high altitude results in hypobaric hypoxia, leading to physiological changes in the cardiovascular system that may result in limiting symptoms, including dyspnea, fatigue, and exercise intolerance. However, it is still unclear why some patients are more susceptible to high-altitude symptoms than others. Hypoxic simulation testing (HST) simulates changes in physiology that occur at a specific altitude by asking the patients to breathe a mixture of gases with decreased oxygen content. This study aimed to determine whether the use of transthoracic echocardiography (TTE) during HST can detect the rise in right-sided pressures and the impact of hypoxia on right ventricle (RV) hemodynamics and right-to-left shunts, thus revealing the underlying causes of high-altitude signs and symptoms. A retrospective study was performed including consecutive patients with unexplained dyspnea at high altitude. HSTs were performed by administering reduced FiO2 to simulate altitude levels specific to patients’ history. Echocardiography images were obtained at baseline and during hypoxia. The study included 27 patients with a mean age of 65 years; 14 patients (51.9%) were female. RV systolic pressure increased at peak hypoxia, while RV systolic function declined, as shown by a significant decrease in the tricuspid annular plane systolic excursion (TAPSE), the maximum velocity achieved by the lateral tricuspid annulus during systole (S’ wave), and the RV free wall longitudinal strain. Additionally, a right-to-left shunt was present in 19 (70.4%) patients, as identified by bubble contrast injections. Among these, the severity of the shunt increased at peak hypoxia in eight cases (42.1%), and the shunt was only evident during hypoxia in seven patients (36.8%).
In conclusion, the use of TTE during HST provides valuable information by revealing the presence of symptomatic, sustained shunts and confirming the decline in RV hemodynamics, thus potentially explaining dyspnea at high altitude. Further studies are needed to establish the optimal clinical role of this physiologic method.
]]>Journal of Imaging doi: 10.3390/jimaging10020037
Authors: Lianne Feenstra Stefan D. van der Stel Marcos Da Silva Guimaraes Behdad Dashtbozorg Theo J. M. Ruers
The validation of newly developed optical tissue-sensing techniques for tumor detection during cancer surgery requires an accurate correlation with the histological results. Additionally, such an accurate correlation facilitates precise data labeling for developing high-performance machine learning tissue-classification models. In this paper, a newly developed Point Projection Mapping system will be introduced, which allows non-destructive tracking of the measurement locations on tissue specimens. Additionally, a framework for accurate registration, validation, and labeling with the histopathology results is proposed and validated on a case study. The proposed framework provides a more-robust and accurate method for the tracking and validation of optical tissue-sensing techniques, which saves time and resources compared to the available conventional techniques.
]]>Journal of Imaging doi: 10.3390/jimaging10020036
Authors: Ichiro Kuriki Kazuki Sato Satoshi Shioiri
Head-mounted displays (HMDs) are becoming more and more popular as devices for displaying a virtual reality space, but how real are they? The present study attempted to quantitatively evaluate the degree of reality achieved with HMDs by using a perceptual phenomenon as a measure. Lightness constancy is an ability that is present in human visual perception, in which the perceived reflectance (i.e., the lightness) of objects appears to stay constant across illuminant changes. Studies on color/lightness constancy in humans have shown that the degree of constancy is high, in general, when real objects are used as stimuli. We asked participants to make lightness matches between two virtual environments with different illuminant intensities, as presented in an HMD. The participants’ matches showed a high degree of lightness constancy in the HMD; our results marked no less than 74.2% (84.8% at the maximum) in terms of the constancy index, whereas the average score on the computer screen was around 65%. The effect of head tracking was confirmed by disabling that function: the constancy index dropped significantly, whereas constancy remained equally effective when the virtual environment was generated by replayed motions. HMDs yield a realistic environment, with the extension of the visual scene being accompanied by head motions.
]]>Journal of Imaging doi: 10.3390/jimaging10020035
Authors: Vlad-Octavian Bolocan Mihaela Secareanu Elena Sava Cosmin Medar Loredana Sabina Cornelia Manolescu Alexandru-Ștefan Cătălin Rașcu Maria Glencora Costache George Daniel Radavoi Robert-Andrei Dobran Viorel Jinga
In the original publication [...]
]]>Journal of Imaging doi: 10.3390/jimaging10020034
Authors: Dimitris Kaimaris
In the context of producing a digital surface model (DSM) and an orthophotomosaic of a study area, a modern Unmanned Aerial System (UAS) allows us to reduce the time required both for primary data collection in the field and for data processing in the office. It features sophisticated sensors and systems, is easy to use and its products come with excellent horizontal and vertical accuracy. In this study, the UAS WingtraOne GEN II with RGB sensor (42 Mpixel), multispectral (MS) sensor (1.2 Mpixel) and built-in multi-frequency PPK GNSS antenna (for the high accuracy calculation of the coordinates of the centers of the received images) is used. The first objective is to test and compare the accuracy of the DSMs and orthophotomosaics generated from the UAS RGB sensor images when image processing is performed using only the PPK system measurements (without Ground Control Points (GCPs)), or when processing is performed using only GCPs. For this purpose, 20 GCPs and 20 Check Points (CPs) were measured in the field. The results show that the horizontal accuracy of orthophotomosaics is similar in both processing cases. The vertical accuracy is better in the case of image processing using only the GCPs, although this finding may not generalize, as the survey was conducted at only one location. The second objective is to perform image fusion using the images of the above two UAS sensors and to control the spectral information transferred from the MS to the fused images. The study was carried out at three archaeological sites (Northern Greece). The combined study of the correlation matrix and the ERGAS index value at each location reveals that the process of improving the spatial resolution of MS orthophotomosaics leads to suitable fused images for classification, and therefore image fusion can be performed by utilizing the images from the two sensors.
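The ERGAS index used above to assess the fused images has a standard closed form: 100 times the resolution ratio times the root mean square of the per-band relative RMSE. A small sketch follows; the function name is ours, and the exact ratio convention should be checked against the paper:

```python
import numpy as np

def ergas(reference, fused, ratio):
    """ERGAS (Erreur Relative Globale Adimensionnelle de Synthèse).

    `reference` and `fused` are (bands, H, W) arrays; `ratio` is the
    high-to-low spatial-resolution ratio (e.g., GSD_high / GSD_low).
    Lower values indicate better spectral fidelity; 0 is a perfect match.
    """
    reference = np.asarray(reference, dtype=float)
    fused = np.asarray(fused, dtype=float)
    rmse = np.sqrt(((reference - fused) ** 2).mean(axis=(1, 2)))  # per band
    means = reference.mean(axis=(1, 2))                           # per band
    return 100.0 * ratio * np.sqrt(np.mean((rmse / means) ** 2))
```

Dividing each band's RMSE by its mean makes the index dimensionless, so bands with very different radiometric ranges contribute comparably.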
]]>Journal of Imaging doi: 10.3390/jimaging10020033
Authors: Philipp Schippers Gundula Rösch Rebecca Sohn Matthias Holzapfel Marius Junker Anna E. Rapp Zsuzsa Jenei-Lanzl Philipp Drees Frank Zaucke Andrea Meurer
Collaborative manual image analysis by multiple experts in different locations is an essential workflow in biomedical science. However, sharing the images and writing down results by hand or merging results from separate spreadsheets can be error-prone. Moreover, blinding and anonymization are essential to address subjectivity and bias. Here, we propose a new workflow for collaborative image analysis using a lightweight online tool named Tyche. The new workflow allows experts to access images via temporarily valid URLs and analyze them blind in a random order inside a web browser with the means to store the results in the same window. The results are then immediately computed and visible to the project master. The new workflow could be used for multi-center studies, inter- and intraobserver studies, and score validations.
]]>Journal of Imaging doi: 10.3390/jimaging10020032
Authors: Tibor Sloboda Lukáš Hudec Matej Halinkovič Wanda Benesova
Histological staining is the primary method for confirming cancer diagnoses, but certain types, such as p63 staining, can be expensive and potentially damaging to tissues. In our research, we innovate by generating p63-stained images from H&E-stained slides for metaplastic breast cancer. This is a crucial development, considering the high costs and tissue risks associated with direct p63 staining. Our approach employs an advanced CycleGAN architecture, xAI-CycleGAN, enhanced with context-based loss to maintain structural integrity. The inclusion of convolutional attention in our model distinguishes between structural and color details more effectively, thus significantly enhancing the visual quality of the results. This approach shows a marked improvement over the base xAI-CycleGAN and standard CycleGAN models, offering the benefits of a more compact network and faster training even with the inclusion of attention.
]]>Journal of Imaging doi: 10.3390/jimaging10020031
Authors: Chijioke Emeka Nwokeji Akbar Sheikh-Akbari Anatoliy Gorbenko Iosif Mporas
The successful investigation and prosecution of significant crimes, including child pornography, insurance fraud, movie piracy, traffic monitoring, and scientific fraud, hinge largely on the availability of solid evidence to establish the case beyond any reasonable doubt. When dealing with digital images/videos as evidence in such investigations, there is a critical need to conclusively prove the source camera/device of the questioned image. Extensive research has been conducted in the past decade to address this requirement, resulting in various methods categorized into brand, model, or individual image source camera identification techniques. This paper presents a survey of all those existing methods found in the literature. It thoroughly examines the efficacy of these existing techniques for identifying the source camera of images, utilizing both intrinsic hardware artifacts such as sensor pattern noise and lens optical distortion, and software artifacts like color filter array and auto white balancing. The investigation aims to discern the strengths and weaknesses of these techniques. The paper provides publicly available benchmark image datasets and assessment criteria used to measure the performance of those different methods, facilitating a comprehensive comparison of existing approaches. In conclusion, the paper outlines directions for future research in the field of source camera identification.
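One of the surveyed hardware-artifact approaches, matching a sensor-pattern-noise residual against a camera fingerprint by normalized correlation, can be sketched as below. The box-filter denoiser is a deliberate simplification (real PRNU pipelines use wavelet-based denoising), and all names are hypothetical:

```python
import numpy as np

def noise_residual(img, k=3):
    """Crude noise residual: image minus a k x k box-filtered version.
    Stands in for the wavelet denoiser of real PRNU extraction."""
    img = np.asarray(img, dtype=float)
    pad = k // 2
    p = np.pad(img, pad, mode="edge")
    smooth = np.zeros_like(img)
    for i in range(k):
        for j in range(k):
            smooth += p[i:i + img.shape[0], j:j + img.shape[1]]
    return img - smooth / (k * k)

def ncc(a, b):
    """Normalized cross-correlation between two residuals (in [-1, 1])."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
```

In a full pipeline, a camera fingerprint is estimated by averaging residuals over many images from the same device, and a questioned image is attributed to the camera whose fingerprint yields the highest correlation.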
]]>Journal of Imaging doi: 10.3390/jimaging10020030
Authors: Khadija Aguerchi Younes Jabrane Maryam Habba Amir Hajjam El Hassani
Breast cancer is considered one of the most common types of cancer among females in the world, with a high mortality rate. Medical imaging is still one of the most reliable tools to detect breast cancer. Unfortunately, manual image detection takes much time. This paper proposes a new deep learning method based on Convolutional Neural Networks (CNNs), which are widely used for image classification. However, determining accurate hyperparameters and architectures is still a challenging task. In this work, a highly accurate CNN model to detect breast cancer by mammography was developed. The proposed method is based on the Particle Swarm Optimization (PSO) algorithm, used to search for suitable hyperparameters and the architecture for the CNN model. The CNN model using PSO achieved success rates of 98.23% and 97.98% on the DDSM and MIAS datasets, respectively. The experimental results proved that the proposed CNN model gave the best accuracy values in comparison with other studies in the field. As a result, CNN models for mammography classification can now be created automatically. The proposed method can be considered a powerful technique for breast cancer prediction.
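The PSO search loop itself is standard and can be sketched as below. In the paper, the objective would train and validate a CNN for each particle position; here it is abstracted as an arbitrary function to minimize, and all names and parameter values are illustrative:

```python
import numpy as np

def pso(objective, bounds, n_particles=20, iters=50,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal particle swarm optimizer over a box-constrained space.

    `bounds` is a list of (low, high) pairs, one per dimension (in
    hyperparameter search, e.g., learning rate, layer widths, etc.).
    Returns the best position found and its objective value.
    """
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, dtype=float).T
    dim = len(bounds)
    pos = rng.uniform(lo, hi, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    pbest = pos.copy()                                   # personal bests
    pbest_val = np.array([objective(p) for p in pos])
    g_val = pbest_val.min()                              # global best
    g = pbest[pbest_val.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (g - pos)
        pos = np.clip(pos + vel, lo, hi)
        vals = np.array([objective(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        if pbest_val.min() < g_val:
            g_val = pbest_val.min()
            g = pbest[pbest_val.argmin()].copy()
    return g, g_val
```

Because each objective evaluation is a full CNN training run in this setting, the swarm size and iteration count dominate the overall search cost.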
]]>Journal of Imaging doi: 10.3390/jimaging10020029
Authors: Daniel Meneveaux Gianmarco Cherchi
This special issue on geometry reconstruction from images has received much attention from the community, with 10 published papers [...]
]]>Journal of Imaging doi: 10.3390/jimaging10010028
Authors: Shiva Moghtaderi Omid Yaghoobian Khan A. Wahid Kiven Erique Lukong
Endoscopies are helpful for examining internal organs, including the gastrointestinal tract. The endoscope device consists of a flexible tube to which a camera and light source are attached. The diagnostic process heavily depends on the quality of the endoscopic images. That is why the visual quality of endoscopic images has a significant effect on patient care, medical decision-making, and the efficiency of endoscopic treatments. In this study, we propose an endoscopic image enhancement technique based on image fusion. Our method aims to improve the visual quality of endoscopic images by first generating multiple sub-images from the single input image, which are complementary to one another in terms of local and global contrast. Then, each sub-image is subjected to a novel wavelet transform and guided filter-based decomposition technique. To generate the final improved image, appropriate fusion rules are utilized at the end. A set of upper gastrointestinal tract endoscopic images was used in our experiments to confirm the efficacy of our strategy. Both qualitative and quantitative analyses show that the proposed framework performs better than some of the state-of-the-art algorithms.
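A generic fusion rule of the kind mentioned above, averaging the low-frequency base layers and keeping the larger-magnitude detail coefficient per pixel, can be sketched as follows. This shows only the principle, not the paper's exact wavelet/guided-filter rules:

```python
import numpy as np

def fuse_layers(base_a, base_b, detail_a, detail_b):
    """Fuse two decomposed images: average the base (low-frequency)
    layers and keep, per pixel, the detail coefficient with the larger
    absolute value, then recombine."""
    base = 0.5 * (base_a + base_b)
    detail = np.where(np.abs(detail_a) >= np.abs(detail_b),
                      detail_a, detail_b)
    return base + detail
```

The max-absolute rule on detail layers preserves the strongest local contrast from either sub-image, which is why complementary-contrast decompositions fuse into a sharper result than either input alone.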
]]>Journal of Imaging doi: 10.3390/jimaging10010027
Authors: Joanna Gawel Zbigniew Rogulski
The aim of this article is to review the single photon emission computed tomography (SPECT) segmentation methods used in patient-specific dosimetry of 177Lu molecular therapy. Notably, 177Lu-labelled radiopharmaceuticals are currently used in molecular therapy of metastatic neuroendocrine tumours (ligands for somatostatin receptors) and metastatic prostate adenocarcinomas (PSMA ligands). The proper segmentation of the organs at risk and tumours in targeted radionuclide therapy is an important part of the optimisation process of internal patient dosimetry in this kind of therapy. Because this is the first step in dosimetry assessments, on which further dose calculations are based, it is important to know the level of uncertainty that is associated with this part of the analysis. However, the robust quantification of SPECT images, which would ensure accurate dosimetry assessments, is very hard to achieve due to the intrinsic features of this device. In this article, papers on this topic were collected and reviewed to weigh up the advantages and disadvantages of the segmentation methods used in clinical practice. Degrading factors of SPECT images were also studied to assess their impact on the quantification of 177Lu therapy images. Our review of the recent literature gives an insight into this important topic. However, based on the PubMed and IEEE databases, only a few papers investigating segmentation methods in 177Lu molecular therapy were found. Although segmentation is an important step in internal dose calculations, this subject has been relatively lightly investigated for SPECT systems. This is mostly due to the inner features of SPECT. What is more, even when studies are conducted, they usually utilise the diagnostic radionuclide 99mTc and not a therapeutic one like 177Lu, which could be of concern regarding SPECT camera performance and its overall outcome on dosimetry.
]]>Journal of Imaging doi: 10.3390/jimaging10010026
Authors: Stacy A. Doore David Istrati Chenchang Xu Yixuan Qiu Anais Sarrazin Nicholas A. Giudice
The lack of accessible information conveyed by descriptions of art images presents significant barriers for people with blindness and low vision (BLV) to engage with visual artwork. Most museums are not able to easily provide accessible image descriptions for BLV visitors to build a mental representation of artwork due to the vastness of collections, limitations of curator training, and current measures for what constitutes effective automated captions. This paper reports on the results of two studies investigating the types of information that should be included to provide high-quality accessible artwork descriptions based on input from BLV description evaluators. We report on: (1) a qualitative study asking BLV participants for their preferences for layered description characteristics; and (2) an evaluation of several current models for image captioning as applied to an artwork image dataset. We then provide recommendations for researchers working on accessible image captioning and museum engagement applications through a focus on spatial information access strategies.
]]>Journal of Imaging doi: 10.3390/jimaging10010025
Authors: Olga Cherepkova Seyed Ali Amirshahi Marius Pedersen
This paper investigates personalized image quality assessment with a focus on studying individual contrast preferences for natural images. To achieve this objective, we conducted an in-lab experiment with 22 observers who assessed 499 natural images and collected their contrast level preferences. We used a three-alternative forced choice comparison approach coupled with a modified adaptive staircase algorithm to dynamically adjust the contrast for each new triplet. Through cluster analysis, we clustered observers into three groups based on their preferred contrast ranges: low contrast, natural contrast, and high contrast. This finding demonstrates the existence of individual variations in contrast preferences among observers. To facilitate further research in the field of personalized image quality assessment, we have created a database containing 10,978 original contrast level values preferred by observers, which is publicly available online.
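The abstract does not specify the modified staircase rules, so the following is only a generic sketch of how an adaptive staircase adjusts the presented contrast level trial by trial (the 1-up/1-down rule and step size are assumptions, not the authors' algorithm):

```python
# Illustrative sketch of a basic adaptive staircase for contrast adjustment.
# The study used a modified algorithm driven by 3AFC triplets; the exact
# update rules are not given in the abstract, so this 1-up/1-down version
# with a fixed step is a hypothetical stand-in.
def staircase(responses, start=0.5, step=0.1, lo=0.0, hi=1.0):
    """responses: iterable of booleans, True = observer preferred more contrast.
    Returns the contrast level presented after each trial."""
    level, levels = start, []
    for preferred_higher in responses:
        level += step if preferred_higher else -step
        level = min(hi, max(lo, level))   # clamp to the displayable range
        levels.append(round(level, 10))
    return levels

trace = staircase([True, True, False, True])
```

In a real experiment the step size would typically shrink after reversals so the procedure converges on the observer's preferred contrast.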
]]>Journal of Imaging doi: 10.3390/jimaging10010024
Authors: Zejin Zhang Tao Wang Jian Wang Yao Sun
Camouflaged objects are not visually distinct from their surroundings, making the difference between foreground and background easy to overlook and placing higher demands on detection systems. In this paper, we present a new framework for Camouflaged Object Detection (COD) named FSANet, which consists mainly of three operations: spatial detail mining (SDM), cross-scale feature combination (CFC), and hierarchical feature aggregation decoder (HFAD). The framework simulates the three-stage detection process of the human visual mechanism when observing a camouflaged scene. Specifically, we extract five feature layers using the backbone and divide them into two parts with the second layer as the boundary. The SDM module simulates the human cursory inspection of the camouflaged objects to gather spatial details (such as edge, texture, etc.) and fuses the features to create a cursory impression. The CFC module is used to observe high-level features from various viewing angles and extracts the same features by thoroughly filtering features of various levels. We also design side-join multiplication in the CFC module to avoid detail distortion and use feature element-wise multiplication to filter out noise. Finally, we construct an HFAD module to deeply mine effective features from these two stages, direct the fusion of low-level features using high-level semantic knowledge, and improve the camouflage map using hierarchical cascade technology. Compared with nineteen deep-learning-based methods in terms of seven widely used metrics, our proposed framework has clear advantages on four public COD datasets, demonstrating the effectiveness and superiority of our model.
]]>Journal of Imaging doi: 10.3390/jimaging10010023
Authors: Reece Walsh Islam Osman Omar Abdelaziz Mohamed S. Shehata
Few-shot learning aims to identify unseen classes with limited labelled data. Recent few-shot learning techniques have shown success in generalizing to unseen classes; however, their performance has also been shown to degrade in out-of-domain settings. Previous work has also demonstrated an increasing reliance on supervised finetuning in an offline or online capacity. This paper proposes a novel, fully self-supervised few-shot learning technique (FSS) that utilizes a vision transformer and masked autoencoder. The proposed technique can generalize to out-of-domain classes by finetuning the model in a fully self-supervised manner for each episode. We evaluate the proposed technique using three datasets (all out-of-domain). Our results show that FSS has an accuracy gain of 1.05%, 0.12%, and 1.28% on the ISIC, EuroSat, and BCCD datasets, respectively, without the use of supervised training.
]]>Journal of Imaging doi: 10.3390/jimaging10010022
Authors: Purnomo Sidi Priambodo Toto Aminoto Basari Basari
Human body tissue disease diagnosis will become more accurate if transmittance images, such as X-ray images, are separated according to each constituent tissue. This research proposes a new image decomposition technique based on the matrix inverse method for biological tissue images. The fundamental idea of this research is based on the fact that when k different monochromatic lights penetrate a biological tissue, they will experience different attenuation coefficients. Furthermore, the same happens when monochromatic light penetrates k different biological tissues, as they will also experience different attenuation coefficients. The various attenuation coefficients are arranged into a unique k×k-dimensional square matrix. k-many images taken by k-many different monochromatic lights are then merged into an image vector entity; further, a matrix inverse operation is performed on the merged image, producing k-many tissue thickness images of the constituent tissues. This research demonstrates that the proposed method effectively decomposes images of biological objects into separate images, each showing the thickness distributions of different constituent tissues. In the future, this proposed new technique is expected to contribute to supporting medical imaging analysis.
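The per-pixel linear system described above can be sketched as follows, assuming Beer–Lambert attenuation (function names and the synthetic coefficient matrix are illustrative, not from the paper's implementation):

```python
import numpy as np

# Illustrative sketch (not the authors' code): Beer-Lambert attenuation gives,
# for each pixel, log(I0/I_i) = sum_j mu[i, j] * t[j], i.e. b = A @ t,
# where A is the k x k matrix of attenuation coefficients (light i, tissue j)
# and t holds the unknown tissue thicknesses. Inverting A recovers t per pixel.
def decompose(log_attenuation, mu):
    """log_attenuation: (k, H, W) stack of log(I0/I) images.
    mu: (k, k) attenuation-coefficient matrix.
    Returns (k, H, W) thickness images, one per constituent tissue."""
    k, h, w = log_attenuation.shape
    b = log_attenuation.reshape(k, -1)   # merge the k images into a per-pixel vector
    t = np.linalg.inv(mu) @ b            # matrix-inverse operation on the merged image
    return t.reshape(k, h, w)

# Synthetic check with known thicknesses and a made-up coefficient matrix
mu = np.array([[0.5, 0.2], [0.1, 0.4]])
t_true = np.random.rand(2, 4, 4)
b = np.einsum('ij,jhw->ihw', mu, t_true)   # forward model: simulated log-attenuation
t_est = decompose(b, mu)
```

With noise-free data the recovered thickness maps match the ground truth exactly; in practice the conditioning of the coefficient matrix limits how well the inversion tolerates measurement noise.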
]]>Journal of Imaging doi: 10.3390/jimaging10010021
Authors: Xuehai Zhang Wenbo Zhou Kunlin Liu Hao Tang Zhenyu Zhang Weiming Zhang Nenghai Yu
Face swapping is an intriguing and intricate task in the field of computer vision. Currently, most mainstream face swapping methods employ face recognition models to extract identity features and inject them into the generation process. Nonetheless, such methods often struggle to effectively transfer identity information, which leads to generated results failing to achieve a high identity similarity to the source face. Furthermore, if we can accurately disentangle identity information, we can achieve controllable face swapping, thereby providing more choices to users. In pursuit of this goal, we propose a new face swapping framework (ControlFace) based on the disentanglement of identity information. We disentangle the structure and texture of the source face, encoding and characterizing them in the form of feature embeddings separately. According to the semantic level of each feature representation, we inject them into the corresponding feature mapper and fuse them adequately in the latent space of StyleGAN. Owing to such disentanglement of structure and texture, we are able to controllably transfer parts of the identity features. Extensive experiments and comparisons with state-of-the-art face swapping methods demonstrate the superiority of our face swapping framework in terms of transferring identity information, producing high-quality face images, and controllable face swapping.
]]>Journal of Imaging doi: 10.3390/jimaging10010020
Authors: Parvaneh Aliniya Mircea Nicolescu Monica Nicolescu George Bebis
Mass segmentation is one of the fundamental tasks used when identifying breast cancer due to the comprehensive information it provides, including the location, size, and border of the masses. Despite significant improvement in the performance of the task, certain properties of the data, such as pixel class imbalance and the diverse appearance and sizes of masses, remain challenging. Recently, there has been a surge in articles proposing to address pixel class imbalance through the formulation of the loss function. While demonstrating an enhancement in performance, they mostly fail to address the problem comprehensively. In this paper, we propose a new perspective on the calculation of the loss that enables the binary segmentation loss to incorporate the sample-level information and region-level losses in a hybrid loss setting. We propose two variations of the loss to include mass size and density in the loss calculation. Also, we introduce a single loss variant using the idea of utilizing mass size and density to enhance focal loss. We tested the proposed method on benchmark datasets: CBIS-DDSM and INbreast. Our approach outperformed the baseline and state-of-the-art methods on both datasets.
]]>Journal of Imaging doi: 10.3390/jimaging10010019
Authors: Kacoutchy Jean Ayikpa Pierre Gouton Diarra Mamadou Abou Bakary Ballo
The quality of cocoa beans is crucial in influencing the taste, aroma, and texture of chocolate and consumer satisfaction. High-quality cocoa beans are valued on the international market, benefiting Ivorian producers. Our study uses advanced techniques to evaluate and classify cocoa beans by analyzing spectral measurements, integrating machine learning algorithms, and optimizing parameters through genetic algorithms. The results highlight the critical importance of parameter optimization for optimal performance. Logistic regression, support vector machines (SVM), and random forest algorithms demonstrate consistent performance. XGBoost shows improvements in the second generation, followed by a slight decrease in the fifth. On the other hand, the performance of AdaBoost is not satisfactory in generations two and five. The results are presented on three levels: first, using all parameters reveals that logistic regression obtains the best performance with a precision of 83.78%. Then, the results of the parameters selected in the second generation still show logistic regression with the best precision of 84.71%. Finally, the results of the parameters chosen in the fifth generation place random forest in the lead with a score of 74.12%.
]]>Journal of Imaging doi: 10.3390/jimaging10010018
Authors: Wissam AlKendi Franck Gechter Laurent Heyberger Christophe Guyeux
Handwritten Text Recognition (HTR) is essential for digitizing historical documents in different kinds of archives. In this study, we introduce a hybrid form archive written in French: the Belfort civil registers of births. The digitization of these historical documents is challenging due to their unique characteristics such as writing style variations, overlapped characters and words, and marginal annotations. The objective of this survey paper is to summarize research on handwritten text documents and provide research directions toward effectively transcribing this French dataset. To achieve this goal, we present a brief survey of several modern and historical HTR offline systems for different international languages, and the top state-of-the-art contributions reported for the French language specifically. The survey classifies the HTR systems based on techniques employed, datasets used, publication years, and the level of recognition. Furthermore, an analysis of the systems’ accuracies is presented, highlighting the best-performing approach. We also showcase the performance of some commercial HTR systems. In addition, this paper presents a summary of the HTR datasets that are publicly available, especially those identified as benchmark datasets in the International Conference on Document Analysis and Recognition (ICDAR) and the International Conference on Frontiers in Handwriting Recognition (ICFHR) competitions. This paper, therefore, presents updated state-of-the-art research in HTR and highlights new directions in the research field.
]]>Journal of Imaging doi: 10.3390/jimaging10010017
Authors: Sean S. Healy Carl N. Stephan
When an unidentified skeleton is discovered, a video superimposition (VS) of the skull and a facial photograph may be undertaken to assist identification. In the first instance, the method is fundamentally a photographic one, requiring the overlay of two 2D photographic images at transparency for comparison. Presently, mathematical and anatomical techniques used to compare skull/face anatomy dominate superimposition discussions; however, little attention has been paid to the equally fundamental photographic prerequisites that underpin these methods. This predisposes the method to error, as the optical parameters of the two comparison photographs are (presently) rarely matched prior to, or for, comparison. In this paper, we: (1) review the basic but critical photographic prerequisites that apply to VS; (2) propose a replacement for the current anatomy-centric searches for the correct ‘skull pose’ with a photographic-centric camera vantage point search; and (3) demarcate superimposition as a clear two-stage phased procedure that depends first on photographic parameter matching, as a prerequisite to undertaking any anatomical comparison(s).
]]>Journal of Imaging doi: 10.3390/jimaging10010016
Authors: Leon Eversberg Jens Lambrecht
Generating synthetic data is a promising solution to the challenge of limited training data for industrial deep learning applications. However, training on synthetic data and testing on real-world data creates a sim-to-real domain gap. Research has shown that the combination of synthetic and real images leads to better results than those that are generated using only one source of data. In this work, the generation of synthetic training images via physics-based rendering is combined with deep active learning for an industrial object detection task to iteratively improve model performance over time. Our experimental results show that synthetic images improve model performance, especially at the beginning of the model’s life cycle with limited training data. Furthermore, our implemented hybrid query strategy selects diverse and informative new training images in each active learning cycle, which outperforms random sampling. In conclusion, this work presents a workflow to train and iteratively improve object detection models with a small number of real-world images, leading to data-efficient and cost-effective computer vision models.
]]>Journal of Imaging doi: 10.3390/jimaging10010015
Authors: Spiros Papadopoulos Georgia Koukiou Vassilis Anastassopoulos
Existing signatures for various kinds of land cover in different spectral bands, i.e., optical, thermal infrared, and PolSAR, make it possible to infer the land cover type from a single decision in each of the spectral bands. By fusing these decisions, the reliability of the decision regarding each pixel can be radically improved, taking into consideration the correlation of the individual decisions for the specific pixel as well as additional information transferred from the pixel’s neighborhood. Different remotely sensed data contribute their own information regarding the characteristics of the materials lying in each separate pixel. Hyperspectral and multispectral images give analytic information regarding the reflectance of each pixel in a very detailed manner. Thermal infrared images give valuable information regarding the temperature of the surface covered by each pixel, which is very important for recording thermal locations in urban regions. Finally, SAR data provide structural and electrical characteristics of each pixel. Combining information from some of these sources further improves the capability for reliable categorization of each pixel. The necessary mathematical background regarding pixel-based classification and decision fusion methods is analytically presented.
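As a toy illustration of per-pixel decision fusion, the sketch below combines the class decisions of three spectral sources by majority vote; the paper's fusion additionally models decision correlation and neighbourhood information, which this minimal version omits, and all names here are assumptions:

```python
import numpy as np

# Each spectral source (optical, thermal infrared, PolSAR) yields a per-pixel
# integer class map; the fused label is the pixel-wise majority vote.
def majority_fuse(decisions):
    """decisions: (S, H, W) stack of integer class maps, one per source."""
    s, h, w = decisions.shape
    flat = decisions.reshape(s, -1)
    fused = np.array([np.bincount(flat[:, p]).argmax() for p in range(h * w)])
    return fused.reshape(h, w)

# Hypothetical 2x2 decision maps from three sources
optical = np.array([[0, 1], [2, 2]])
thermal = np.array([[0, 1], [1, 2]])
polsar  = np.array([[1, 1], [2, 0]])
fused = majority_fuse(np.stack([optical, thermal, polsar]))
```

Note that `np.bincount(...).argmax()` breaks ties in favour of the smaller class label; a correlation-aware fusion rule, as discussed in the paper, would weight the sources instead of counting them equally.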
]]>Journal of Imaging doi: 10.3390/jimaging10010014
Authors: Tamás Molnár Géza Király
Forest damage has become more frequent in Hungary in the last decades, and remote sensing offers a powerful tool for monitoring it rapidly and cost-effectively. A combined approach was developed that utilises high-resolution ESA Sentinel-2 satellite imagery, Google Earth Engine cloud computing, and field-based forest inventory data. Maps and charts were derived from vegetation indices (NDVI and Z∙NDVI) of satellite images to detect forest disturbances in the Hungarian study site for the period of 2017–2020. The NDVI maps were classified to reveal forest disturbances, and the cloud-based method successfully showed drought and frost damage in the oak-dominated Nagyerdő forest of Debrecen. Differences in the reactions to damage between tree species were visible on the index maps; therefore, a random forest machine learning classifier was applied to show the spatial distribution of dominant species. An accuracy assessment was accomplished with confusion matrices that compared classified index maps to field-surveyed data, demonstrating 99.1% producer, 71% user, and 71% total accuracies for forest damage and 81.9% for tree species. Based on the results of this study and the scalability of Google Earth Engine, the presented method has the potential to be extended to monitor all of Hungary in a faster, more accurate way using systematically collected field data, the latest satellite imagery, and artificial intelligence.
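The NDVI used above is the standard normalized difference of the near-infrared and red reflectances, NDVI = (NIR − Red) / (NIR + Red); a minimal sketch, where the disturbance threshold is a made-up value (the abstract does not state the classification thresholds):

```python
import numpy as np

# Standard NDVI computation; for Sentinel-2 the NIR and red reflectances
# come from bands B8 and B4, respectively.
def ndvi(nir, red, eps=1e-9):
    nir = nir.astype(float)
    red = red.astype(float)
    return (nir - red) / (nir + red + eps)   # eps guards against 0/0

# Toy reflectance patches and a purely hypothetical disturbance threshold
nir = np.array([[0.80, 0.30], [0.60, 0.10]])
red = np.array([[0.10, 0.25], [0.20, 0.09]])
index_map = ndvi(nir, red)
disturbed = index_map < 0.2   # assumed threshold, for illustration only
```

Healthy vegetation reflects strongly in the NIR band, so low NDVI over a forest pixel is a plausible disturbance signal, which is the basis of the classified index maps described above.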
]]>Journal of Imaging doi: 10.3390/jimaging10010013
Authors: Shion Ando Ping Yeap Loh
Ultrasound imaging has been used to investigate compression of the median nerve in carpal tunnel syndrome patients. Ultrasound imaging and the extraction of median nerve parameters from ultrasound images are crucial and are usually performed manually by experts. The manual annotation of ultrasound images relies on experience, and intra- and interrater reliability may vary among studies. In this study, two types of convolutional neural networks (CNNs), U-Net and SegNet, were used to extract the median nerve morphology. To the best of our knowledge, the application of these methods to ultrasound imaging of the median nerve has not yet been investigated. Spearman’s correlation and Bland–Altman analyses were performed to investigate the correlation and agreement between manual annotation and CNN estimation, namely, the cross-sectional area, circumference, and diameter of the median nerve. The results showed that the intersection over union (IoU) of U-Net (0.717) was greater than that of SegNet (0.625). A few images in SegNet had an IoU below 0.6, decreasing the average IoU. In both models, the IoU decreased when the median nerve was elongated longitudinally with a blurred outline. The Bland–Altman analysis revealed that, in general, both the U-Net- and SegNet-estimated measurements showed 95% limits of agreement with manual annotation. These results show that these CNN models are promising tools for median nerve ultrasound imaging analysis.
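The IoU reported above is the standard overlap ratio between the predicted and manually annotated binary masks; a minimal sketch (the mask values are toy data, not from the study):

```python
import numpy as np

# Intersection over union (Jaccard index) between two binary segmentation
# masks, as used to compare CNN output against manual annotation.
def iou(pred, target):
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return inter / union if union else 1.0   # both empty counts as perfect

pred = np.array([[1, 1, 0], [0, 1, 0]])
target = np.array([[1, 0, 0], [0, 1, 1]])
score = iou(pred, target)   # 2 overlapping pixels out of 4 in the union
```

An IoU of 0.6, the cut-off mentioned above for the weaker SegNet cases, means the overlap covers 60% of the combined area of the two masks.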
]]>Journal of Imaging doi: 10.3390/jimaging10010012
Authors: Keval Thaker Sumanth Chennupati Nathir Rawashdeh Samir A. Rawashdeh
Despite significant strides in achieving vehicle autonomy, robust perception under low-light conditions remains a persistent challenge. In this study, we investigate the potential of multispectral imaging, leveraging deep learning models to enhance object detection performance in the context of nighttime driving. Features encoded from the red, green, and blue (RGB) visual spectrum and thermal infrared images are combined to implement a multispectral object detection model. This has proven to be more effective than using visual channels only, as thermal images provide complementary information when discriminating objects in low-illumination conditions. However, there is a lack of studies on effectively fusing these two modalities for optimal object detection performance. In this work, we present a framework based on the Faster R-CNN architecture with a feature pyramid network. Moreover, we design various fusion approaches using concatenation and addition operators at varying stages of the network to analyze their impact on object detection performance. Our experimental results on the KAIST and FLIR datasets show that our framework outperforms the baseline experiments of the unimodal input source and the existing multispectral object detectors.
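The two fusion operators compared in the study can be illustrated on toy feature maps (the channel counts and shapes below are made up; inside the actual network these would be intermediate Faster R-CNN feature tensors):

```python
import numpy as np

# Toy (channels, H, W) feature maps from the two branches of a
# hypothetical two-stream detector.
rgb_feat = np.ones((8, 4, 4))          # RGB branch features
thermal_feat = 2 * np.ones((8, 4, 4))  # thermal branch features

# Concatenation fusion: channel counts add up, so the next layer must
# accept twice as many input channels.
concat_fused = np.concatenate([rgb_feat, thermal_feat], axis=0)

# Addition fusion: channel count is preserved, but both branches must
# produce feature maps of identical shape.
add_fused = rgb_feat + thermal_feat
```

This shape difference is the practical trade-off between the two operators: concatenation keeps the modalities separable for later layers at the cost of wider convolutions, while addition merges them immediately with no parameter overhead.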
]]>Journal of Imaging doi: 10.3390/jimaging10010011
Authors: Vladimir O. Alekseychuk Andreas Kupsch David Plotzki Carsten Bellon Giovanni Bruno
This study reports a strategy to use sophisticated, realistic X-ray Computed Tomography (CT) simulations to reduce Missing Wedge (MW) and Region-of-Interest (RoI) artifacts in FBP (Filtered Back-Projection) reconstructions. A 3D model of the object is used to simulate the projections that include the missing information inside the MW and outside the RoI. Such information augments the experimental projections, thereby drastically improving the reconstruction results. An X-ray CT dataset of a selected object is modified to mimic various degrees of RoI and MW problems. The results are evaluated in comparison to a standard FBP reconstruction of the complete dataset. In all cases, the reconstruction quality is significantly improved. Small inclusions present in the scanned object are better localized and quantified. The proposed method has the potential to improve the results of any CT reconstruction algorithm.
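Conceptually, the augmentation step keeps the experimental projections wherever they exist and substitutes simulated projections inside the missing wedge or outside the RoI; a minimal sketch with made-up shapes and names (the paper's simulation pipeline is far more sophisticated than this mask-based merge):

```python
import numpy as np

# Merge measured and simulated sinograms before FBP: measured values are
# kept where experimental data exist, and simulated values from the 3D
# model fill the missing-wedge / out-of-RoI regions.
def augment_projections(measured, simulated, measured_mask):
    """measured, simulated: (angles, detectors) sinograms.
    measured_mask: boolean array, True where experimental data exist."""
    return np.where(measured_mask, measured, simulated)

measured = np.full((4, 3), 2.0)     # toy experimental sinogram
simulated = np.full((4, 3), 1.0)    # toy simulated sinogram
mask = np.zeros((4, 3), dtype=bool)
mask[:3] = True                     # last angular row lies in the missing wedge
sino = augment_projections(measured, simulated, mask)
```

The completed sinogram can then be passed to any reconstruction algorithm, which is why the authors note the approach is not tied to FBP.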
]]>Journal of Imaging doi: 10.3390/jimaging10010010
Authors: Donatela Šarić Aditya Suneel Sole
The appearance of a surface depends on four main appearance attributes, namely color, gloss, texture, and translucency. Gloss is an important attribute that people use to understand surface appearance, right after color. In the past decades, extensive research has been conducted in the field of gloss and gloss perception, aiming to understand the complex nature of gloss appearance. This paper reviews the research conducted on gloss and gloss perception, discusses the results, and outlines potential future research. Our primary focus in this review is on research in the field of gloss and the setup of associated psychophysical experiments. Due to the industrial and application-oriented nature of this review, particular attention is paid to the gloss of dielectric materials, a critical aspect in various industries. This review not only summarizes the existing research but also highlights potential avenues for future research in the pursuit of a more comprehensive understanding of gloss perception.
]]>Journal of Imaging doi: 10.3390/jimaging10010009
Authors: Rodolfo Reda Dario Di Nardo Alessio Zanza Valentina Bellanova Rosemary Abbagnale Francesco Pagnoni Maurilio D’Angelo Ajinkya M. Pawar Massimo Galli Luca Testarelli
(1) Knowing the anatomy in advance, in particular the arrangement of the endodontic system, is crucial for successful treatment and for avoiding complications during endodontic therapy. The aim was to find a correlation between a minimally invasive endodontic access, less stressful on Ni-Ti rotary instruments but still allowing correct vision and identification of anatomical reference points, and a simplified typology based on the shape of the pulp chamber in coronal three-dimensional exam views. (2) Based on the inclusion criteria, 104 maxillary molars (52 maxillary first molars and 52 maxillary second molars) were included in the study after 26 Cone Beam Computed Tomography (CBCT) acquisitions (from 15 males and 11 females). Linear measurements were then taken with the CBCT-dedicated software for subsequent analysis. (3) The results of the present study show data similar to those already published on this topic. Pawar and Singh’s simplified classification seems to offer a schematic way of classification that includes almost all of the cases that have been analyzed. (4) A diagnostic examination with a wide Field of View (FOV) and low radiation dose is capable of providing a great deal of clinical information for endodontic treatment. Nevertheless, the endodontic anatomy of the upper second molar represents a major challenge for the clinician due to its complexity both in canal shape and in ramification.
]]>Journal of Imaging doi: 10.3390/jimaging10010008
Authors: Gigi Tăbăcaru Simona Moldovanu Elena Răducan Marian Barbu
Ensemble learning is a process that belongs to the artificial intelligence (AI) field. It helps to choose a robust machine learning (ML) model, usually used for data classification. AI has a strong connection with image processing and feature classification, and it can also be successfully applied to analyzing fundus eye images. Diabetic retinopathy (DR) is a disease that can cause vision loss and blindness and that, from an imaging point of view, can be revealed when screening the eyes. Image processing tools can analyze and extract the features from fundus eye images, and these are combined with ML classifiers that can classify them among different disease classes. The outcomes integrated into automated diagnostic systems can be a real success for physicians and patients. In this study, in the image processing stage, contrast manipulation with the gamma correction parameter was applied because DR affects the blood vessels and the structure of the eyes becomes disorderly. Therefore, the analysis of the texture with two types of entropies was necessary. Shannon and fuzzy entropies and contrast manipulation led to ten original features used in the classification process. The machine learning library PyCaret performs complex tasks, and the empirical process shows that, of the fifteen classifiers, the gradient boosting classifier (GBC) provides the best results. Indeed, the proposed model can classify the DR degrees as normal or severe, achieving an accuracy of 0.929, an F1 score of 0.902, and an area under the curve (AUC) of 0.941. The selected model was validated with a bootstrap statistical technique. The novelty of the study consists of the extraction of features from preprocessed fundus eye images, their classification, and the manipulation of the contrast in a controlled way.
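Two of the feature-extraction steps mentioned above can be sketched as follows; the fuzzy-entropy variant and the exact preprocessing are not specified in the abstract, so the gamma value and histogram settings here are assumptions for illustration:

```python
import numpy as np

# Gamma correction of a normalized fundus image: pixel values in [0, 1]
# are raised to the power gamma, compressing or stretching the contrast.
def gamma_correct(img, gamma):
    return np.power(img, gamma)

# Shannon entropy of the image's grey-level histogram, one of the two
# entropy-based texture descriptors used in the study.
def shannon_entropy(img, bins=256):
    hist, _ = np.histogram(img, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]                      # ignore empty bins: 0*log(0) := 0
    return float(-(p * np.log2(p)).sum())

# Toy image: a smooth gradient standing in for a preprocessed fundus image
img = np.linspace(0.0, 1.0, 64).reshape(8, 8)
feat = shannon_entropy(gamma_correct(img, 1.5))
```

Computing the entropy at several gamma settings yields one scalar feature per setting, which is consistent with the ten-feature description above, though the actual feature list is the authors' own.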
]]>Journal of Imaging doi: 10.3390/jimaging10010007
Authors: Théo Barrios Stéphanie Prévost Céline Loscos
In the last decade, many neural network algorithms have been proposed to solve depth reconstruction. Our focus is on reconstruction from images captured by multi-camera arrays, which are grids of vertically and horizontally aligned cameras that are uniformly spaced. Training these networks using supervised learning requires data with ground truth. Existing datasets simulate specific configurations. For example, they represent a fixed-size camera array or a fixed space between cameras. When the distance between cameras is small, the array is said to have a short baseline. Light-field cameras, with a baseline of less than a centimeter, are for instance in this category. On the contrary, an array with large space between cameras is said to have a wide baseline. In this paper, we present a purely virtual data generator to create large training datasets: this generator can adapt to any camera array configuration. Parameters are, for instance, the size (number of cameras) and the distance between two cameras. The generator creates virtual scenes by randomly selecting objects and textures and following user-defined parameters like the disparity range or image parameters (resolution, color space). Generated data are used only for the learning phase. They are unrealistic but can present concrete challenges for disparity reconstruction, such as thin elements, and textures are randomly assigned to objects to avoid color bias. Our experiments focus on the wide-baseline configuration, which requires more data. We validate the generator by testing the generated datasets with known deep-learning approaches as well as depth reconstruction algorithms. The validation experiments have proven successful.
]]>Journal of Imaging doi: 10.3390/jimaging10010006
Authors: Ana Nunes Pedro Serranho Pedro Guimarães João Ferreira Miguel Castelo-Branco Rui Bernardes
Background: Retinal texture has gained momentum as a source of biomarkers of neurodegeneration, as texture analysis of the neuroretina is sensitive to subtle differences in the central nervous system. Sex differences in retina structure, as detected by layer thickness measurements from optical coherence tomography (OCT) data, have been discussed in the literature. However, the effect of sex on retinal interocular differences in healthy adults has been overlooked and remains largely unreported. Methods: We computed mean value fundus images for the neuroretina layers as imaged by OCT of healthy individuals. Texture metrics were obtained from these images to assess whether women and men have the same retina texture characteristics in both eyes. Texture features were tested for group mean differences between the right and left eye. Results: Corrected texture differences exist only in the female group. Conclusions: This work illustrates that the differences between the right and left eyes manifest differently in females and males. This further supports the need for tight control and minute analysis in studies where interocular asymmetry may be used as a disease biomarker, and highlights the potential of texture analysis applied to OCT imaging to spot differences in the retina.
]]>Journal of Imaging doi: 10.3390/jimaging10010005
Authors: Jing Zhang Rémi Synave Samuel Delepoulle Rémi Cozot
The composition of an image is a critical element chosen by the author to construct an image that conveys a narrative and related emotions. Other key elements include framing, lighting, and colors. Assessing classical and simple composition rules in an image, such as the well-known “rule of thirds”, has proven effective in evaluating the aesthetic quality of an image. It is widely acknowledged that composition is emphasized by the presence of leading lines. While these leading lines may not be explicitly visible in the image, they connect key points within the image and can also serve as boundaries between different areas of the image. For instance, the boundary between the sky and the ground can be considered a leading line in the image. Making the image’s composition explicit through a set of leading lines is valuable when analyzing an image or assisting in photography. To the best of our knowledge, no computational method has been proposed to trace image leading lines. We conducted user studies to assess the agreement among image experts when asked to draw leading lines on images. Based on these studies, which demonstrate that experts concur in identifying leading lines, this paper introduces a fully automatic computational method for recovering the leading lines that underlie the image’s composition. Our method consists of two steps: firstly, based on feature detection, potential weighted leading lines are established; secondly, these weighted leading lines are grouped to generate the leading lines of the image. We evaluate our method through both subjective and objective studies, and we propose an objective metric to compare two sets of leading lines.
]]>Journal of Imaging doi: 10.3390/jimaging10010004
Authors: Hannes Mareen Louis De Neve Peter Lambert Glenn Van Wallendael
Image manipulation is easier than ever, often facilitated using accessible AI-based tools. This poses significant risks when used to disseminate disinformation, false evidence, or fraud, which highlights the need for image forgery detection and localization methods to combat this issue. While some recent detection methods demonstrate good performance, there is still a significant gap to be closed to consistently and accurately detect image manipulations in the wild. This paper aims to enhance forgery detection and localization by combining existing detection methods that complement each other. First, we analyze these methods’ complementarity, with an objective measurement of complementariness, and calculation of a target performance value using a theoretical oracle fusion. Then, we propose a novel fusion method that combines the existing methods’ outputs. The proposed fusion method is trained using a Generative Adversarial Network architecture. Our experiments demonstrate improved detection and localization performance on a variety of datasets. Although our fusion method is hindered by a lack of generalization, this is a common problem in supervised learning, and hence a motivation for future work. In conclusion, this work deepens our understanding of forgery detection methods’ complementariness and how to harmonize them. As such, we contribute to better protection against image manipulations and the battle against disinformation.
]]>Journal of Imaging doi: 10.3390/jimaging10010003
Authors: Jing Ng David Arness Ashlee Gronowski Zhonglin Qu Chng Wei Lau Daniel Catchpoole Quang Vinh Nguyen
Biomedical datasets are usually large and complex, containing biological information about a disease. Computational analytics and the interactive visualisation of such data are essential decision-making tools for disease diagnosis and treatment. Oncology data models were observed in a virtual reality environment to analyse gene expression and clinical data from a cohort of cancer patients. The technology enables a new way to view information from the outside in (exocentric view) and the inside out (egocentric view), which is otherwise not possible on ordinary displays. This paper presents a usability study on the exocentric and egocentric views of biomedical data visualisation in virtual reality and their impact on human behaviour and perception. Our study revealed that the performance time was faster in the exocentric view than in the egocentric view. The exocentric view also received higher ease-of-use scores than the egocentric view. However, the influence of usability on time performance was only evident in the egocentric view. The findings of this study could be used to guide future development and refinement of visualisation tools in virtual reality.
]]>Journal of Imaging doi: 10.3390/jimaging10010002
Authors: Guzel Khayretdinova Dominique Apprato Christian Gout
In this paper, we propose a new model for image segmentation under geometric constraints. We define the geometric constraints and we give a minimization problem leading to a variational equation. This new model based on a minimal surface makes it possible to consider many different applications from image segmentation to data approximation.
]]>Journal of Imaging doi: 10.3390/jimaging10010001
Authors: Giansalvo Gusinu Claudia Frau Giuseppe A. Trunfio Paolo Solla Leonardo Antonio Sechi
Currently, Parkinson’s Disease (PD) is diagnosed by expert clinicians, primarily based on symptoms. Neuroimaging exams represent an important tool to confirm the clinical diagnosis. Among them, Brain Parenchyma Sonography (BPS) is used to evaluate the hyperechogenicity of the Substantia Nigra (SN), found in more than 90% of PD patients. In this article, we exploit a new dataset of BPS images to investigate an automatic segmentation approach for the SN that can increase the accuracy of the exam and its practicability in clinical routine. This study achieves state-of-the-art performance in SN segmentation of BPS images. Indeed, it is found that the modified U-Net network scores a Dice coefficient of 0.859 ± 0.037. The results presented in this study demonstrate the feasibility and usefulness of SN automatic segmentation in BPS medical images, to the point that this study can be considered the first stage of the development of an end-to-end CAD (Computer-Aided Detection) system. Furthermore, the used dataset, which will be further enriched in the future, has proven to be very effective in supporting the training of CNNs and may pave the way for future studies in the field of CAD applied to PD.
]]>Journal of Imaging doi: 10.3390/jimaging9120283
Authors: Rossana Buongiorno Giulio Del Corso Danila Germanese Leonardo Colligiani Lorenzo Python Chiara Romei Sara Colantonio
Imaging plays a key role in the clinical management of Coronavirus disease 2019 (COVID-19) as the imaging findings reflect the pathological process in the lungs. The visual analysis of High-Resolution Computed Tomography of the chest allows for the differentiation of parenchymal abnormalities of COVID-19, which must be detected and quantified in order to obtain an accurate disease stratification and prognosis. However, visual assessment and quantification represent a time-consuming task for radiologists. In this regard, tools for semi-automatic segmentation, such as those based on Convolutional Neural Networks, can facilitate the detection of pathological lesions by delineating their contour. In this work, we compared four state-of-the-art Convolutional Neural Networks based on the encoder–decoder paradigm for the binary segmentation of COVID-19 infections after training and testing them on 90 HRCT volumetric scans of patients diagnosed with COVID-19 collected from the database of the Pisa University Hospital. More precisely, we started from a basic model, the well-known UNet, then we added an attention mechanism to obtain an Attention-UNet, and finally we employed a recurrence paradigm to create a Recurrent–Residual UNet (R2-UNet). In the latter case, we also added attention gates to the decoding path of an R2-UNet, thus designing an R2-Attention UNet so as to make the feature representation and accumulation more effective. We compared them to gain understanding of both the cognitive mechanism that can lead a neural model to the best performance for this task and the good compromise between the amount of data, time, and computational resources required. We set up a five-fold cross-validation and assessed the strengths and limitations of these models by evaluating the performances in terms of Dice score, Precision, and Recall defined both on 2D images and on the entire 3D volume.
From the results of the analysis, it can be concluded that Attention-UNet outperforms the other models by achieving the best performance of 81.93%, in terms of 2D Dice score, on the test set. Additionally, we conducted statistical analysis to assess the performance differences among the models. Our findings suggest that integrating the recurrence mechanism within the UNet architecture leads to a decline in the model’s effectiveness for our particular application.
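For readers unfamiliar with the reported metrics, the 2D Dice score, Precision, and Recall can be sketched for a pair of binary masks as follows (an illustrative sketch of the standard definitions, not the paper's implementation):

```python
import numpy as np

def dice_score(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice similarity coefficient between two binary segmentation masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return float((2.0 * intersection + eps) / (pred.sum() + target.sum() + eps))

def precision_recall(pred: np.ndarray, target: np.ndarray) -> tuple:
    """Pixel-wise Precision and Recall for a predicted binary mask."""
    pred, target = pred.astype(bool), target.astype(bool)
    tp = np.logical_and(pred, target).sum()
    precision = tp / max(int(pred.sum()), 1)   # TP / (TP + FP)
    recall = tp / max(int(target.sum()), 1)    # TP / (TP + FN)
    return float(precision), float(recall)
```

The 3D variants are obtained by applying the same formulas to entire stacked volumes instead of single slices.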
]]>Journal of Imaging doi: 10.3390/jimaging9120282
Authors: Mostafa Daneshgar Rahbar Seyed Ziae Mousavi Mojab
This study introduces EUGNet, an enhanced U-Net that incorporates GridMask image augmentation, a pixel-manipulation technique, to address U-Net’s limitations. EUGNet features a deep contextual encoder, residual connections, class-balancing loss, adaptive feature fusion, a GridMask augmentation module, efficient implementation, and multi-modal fusion. These innovations enhance segmentation accuracy and robustness, making it well-suited for medical image analysis. The GridMask algorithm is detailed, demonstrating its distinct approach to pixel elimination, which enhances model adaptability to occlusions and local features. A comprehensive dataset of robotic surgical scenarios and instruments is used for evaluation, showcasing the framework’s robustness. Specifically, there are improvements of 1.6 percentage points in balanced accuracy for the foreground, 1.7 points in intersection over union (IoU), and 1.7 points in mean Dice similarity coefficient (DSC). Inference speed, a critical factor in real-time applications, also improved markedly: inference time decreased from 0.163 milliseconds for the U-Net without GridMask to 0.097 milliseconds for the U-Net with GridMask.
]]>Journal of Imaging doi: 10.3390/jimaging9120281
Authors: Sahin Coskun Gokce Nur Yilmaz Federica Battisti Musaed Alhussein Saiful Islam
A three-dimensional (3D) video is a special video representation with an artificial stereoscopic vision effect that increases the depth perception of the viewers. The quality of a 3D video is generally measured based on its similarity to the stereoscopic vision obtained with the human vision system (HVS), typically through high-cost and time-consuming subjective tests, which remain necessary because no objective video Quality of Experience (QoE) evaluation method models the HVS. In this paper, we propose a hybrid 3D-video QoE evaluation method based on spatial resolution associated with depth cues (i.e., motion information, blurriness, retinal-image size, and convergence). The proposed method successfully models the HVS by considering the 3D video parameters that directly affect depth perception, which is the most important element of stereoscopic vision. Experimental results show that the measurement of the 3D-video QoE by the proposed hybrid method outperforms the widely used existing methods. It is also found that the proposed method has a high correlation with the HVS. Consequently, the results suggest that the proposed hybrid method can be conveniently utilized for 3D-video QoE evaluation, especially in real-time applications.
]]>Journal of Imaging doi: 10.3390/jimaging9120280
Authors: Vlad-Octavian Bolocan Mihaela Secareanu Elena Sava Cosmin Medar Loredana Sabina Cornelia Manolescu Alexandru-Ștefan Cătălin Rașcu Maria Glencora Costache George Daniel Radavoi Robert-Andrei Dobran Viorel Jinga
(1) Background: Computed tomography (CT) imaging challenges in diagnosing renal cell carcinoma (RCC) include distinguishing malignant from benign tissues and determining the likely subtype. The goal is to show the algorithm’s ability to improve renal cell carcinoma identification and treatment, improving patient outcomes. (2) Methods: This study uses the European Deep-Health toolkit’s Convolutional Neural Network with the ECVL (European Computer Vision Library) and the EDDL (European Distributed Deep Learning Library). Image segmentation utilized the U-net architecture, and classification used resnet101. The model’s clinical efficiency was assessed using kidney and tumor segmentation Dice scores and the quality of renal cell carcinoma categorization. (3) Results: The raw dataset contains 457 healthy right kidneys, 456 healthy left kidneys, 76 pathological right kidneys, and 84 pathological left kidneys. Preparing raw data for analysis was crucial to algorithm implementation. Kidney segmentation performance was 0.84, and the mean Dice score for tumor segmentation was 0.675 for the suggested model. Renal cell carcinoma classification accuracy was 0.885. (4) Conclusion and key findings: The present study focused on analyzing data from both healthy patients and diseased renal patients, with a particular emphasis on data processing. The method achieved a kidney segmentation accuracy of 0.84 and a mean Dice score of 0.675 for tumor segmentation. The system performed well in classifying renal cell carcinoma, achieving an accuracy of 0.885, results that indicate that the technique has the potential to improve the diagnosis of kidney pathology.
]]>Journal of Imaging doi: 10.3390/jimaging9120279
Authors: Suhong Yoo Namhoon Kim
This study presents a methodology for the coarse alignment of light detection and ranging (LiDAR) point clouds, which involves estimating the position and orientation of each station using the pinhole camera model and a position/orientation estimation algorithm. Ground control points are obtained using LiDAR camera images and the point clouds are obtained from the reference station. The estimated position and orientation vectors are used for point cloud registration. To evaluate the accuracy of the results, the positions of the LiDAR and the target were measured using a total station, and a comparison was carried out with the results of semi-automatic registration. The proposed methodology yielded an estimated mean LiDAR position error of 0.072 m, which was similar to the semi-automatic registration value of 0.070 m. When the point clouds of each station were registered using the estimated values, the mean registration accuracy was 0.124 m, while the semi-automatic registration accuracy was 0.072 m. The high accuracy of semi-automatic registration is due to its capability for performing both coarse alignment and refined registration. The comparison between the point cloud with refined alignment using the proposed methodology and the point-to-point distance analysis revealed that the average distance was measured at 0.0117 m. Moreover, 99% of the points exhibited distances within the range of 0.0696 m.
]]>Journal of Imaging doi: 10.3390/jimaging9120278
Authors: Paolo Rota Miguel Angel Guevara Lopez Francesco Setti
In the rapidly evolving field of industrial machine learning, this Special Issue on Industrial Machine Learning Applications aims to shed light on the innovative strides made toward more intelligent, more efficient, and adaptive industrial processes [...]
]]>Journal of Imaging doi: 10.3390/jimaging9120277
Authors: Jiajun Zhang Georgina Cosma Sarah Bugby Jason Watkins
Image retrieval is the process of searching and retrieving images from a datastore based on their visual content and features. Recently, much attention has been directed towards the retrieval of irregular patterns within industrial or healthcare images by extracting features from the images, such as deep features, colour-based features, shape-based features, and local features. This has applications across a spectrum of industries, including fault inspection, disease diagnosis, and maintenance prediction. This paper proposes an image retrieval framework to search for images containing similar irregular patterns by extracting a set of morphological features (DefChars) from images. The datasets employed in this paper contain wind turbine blade images with defects, chest computerised tomography scans with COVID-19 infections, heatsink images with defects, and lake ice images. The proposed framework was evaluated with different feature extraction methods (DefChars, resized raw image, local binary pattern, and scale-invariant feature transforms) and distance metrics to determine the most efficient parameters in terms of retrieval performance across datasets. The retrieval results show that the proposed framework using the DefChars and the Manhattan distance metric achieves a mean average precision of 80% and a low standard deviation of ±0.09 across classes of irregular patterns, outperforming alternative feature–metric combinations across all datasets. Our proposed image retrieval framework performed better (by 8.71%) than Super Global, a state-of-the-art deep-learning-based image retrieval approach, across all datasets.
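The core retrieval step, ranking database images by Manhattan (L1) distance between their feature vectors and the query's, can be sketched as follows (array shapes and function names are illustrative assumptions, not the DefChars implementation):

```python
import numpy as np

def retrieve_top_k(query_feat: np.ndarray, db_feats: np.ndarray, top_k: int = 5):
    """Rank database feature vectors by Manhattan (L1) distance to the query.

    query_feat: shape (d,); db_feats: shape (n, d).
    Returns the indices of the top_k closest images and their distances.
    """
    dists = np.abs(db_feats - query_feat).sum(axis=1)  # L1 distance per database image
    order = np.argsort(dists)[:top_k]                  # indices of the closest matches
    return order, dists[order]
```

Swapping the distance metric (e.g., Euclidean) only changes the `dists` line, which is what makes the feature-metric comparison in the paper straightforward to run.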
]]>Journal of Imaging doi: 10.3390/jimaging9120276
Authors: Rina Buoy Masakazu Iwamura Sovila Srun Koichi Kise
Attention-based encoder–decoder scene text recognition (STR) architectures have proven effective in recognizing text in the real world, thanks to their ability to learn an internal language model. Nevertheless, the cross-attention operation that is used to align visual and linguistic features during decoding is computationally expensive, especially in low-resource environments. To address this bottleneck, we propose a cross-attention-free STR framework that still learns a language model. The framework we propose is ViTSTR-Transducer, which draws inspiration from ViTSTR, a vision transformer (ViT)-based method designed for STR, and the recurrent neural network transducer (RNN-T), initially introduced for speech recognition. The experimental results show that our ViTSTR-Transducer models outperform the baseline attention-based models in terms of the required decoding floating point operations (FLOPs) and latency while achieving a comparable level of recognition accuracy. Compared with the baseline context-free ViTSTR models, our proposed models achieve superior recognition accuracy. Furthermore, compared with the recent state-of-the-art (SOTA) methods, our proposed models deliver competitive results.
]]>Journal of Imaging doi: 10.3390/jimaging9120275
Authors: Amal El Kaid Karim Baïna
Three-dimensional human pose estimation has made significant advancements through the integration of deep learning techniques. This survey provides a comprehensive review of recent 3D human pose estimation methods, with a focus on monocular images, videos, and multi-view cameras. Our approach stands out through a systematic literature review methodology, ensuring an up-to-date and meticulous overview. Unlike many existing surveys that categorize approaches based on learning paradigms, our survey offers a fresh perspective, delving deeper into the subject. For image-based approaches, we not only follow existing categorizations but also introduce and compare significant 2D models. Additionally, we provide a comparative analysis of these methods, enhancing the understanding of image-based pose estimation techniques. In the realm of video-based approaches, we categorize them based on the types of models used to capture inter-frame information. Furthermore, in the context of multi-person pose estimation, our survey uniquely differentiates between approaches focusing on relative poses and those addressing absolute poses. Our survey aims to serve as a pivotal resource for researchers, highlighting state-of-the-art deep learning strategies and identifying promising directions for future exploration in 3D human pose estimation.
]]>Journal of Imaging doi: 10.3390/jimaging9120274
Authors: Maryam Zamanian Giorgio Treglia Iraj Abedi
Due to the importance of the correct and timely diagnosis of bone metastases in advanced breast cancer (BrC), we performed a meta-analysis evaluating the diagnostic accuracy of [18F]FDG or Na[18F]F PET(/CT) and PET(/MRI) versus [99mTc]Tc-diphosphonate bone scintigraphy (BS). The PubMed, Embase, Scopus, and Scholar electronic databases were searched. The results of the selected studies were analyzed using pooled sensitivity and specificity, the diagnostic odds ratio (DOR), positive and negative likelihood ratios (LR+ and LR−), and summary receiver-operating characteristic (SROC) curves. Eleven studies including 753 BrC patients were included in the meta-analysis. The patient-based pooled values of sensitivity, specificity, and area under the SROC curve (AUC) for BS (with 95% confidence intervals) were 90% (86–93), 91% (87–94), and 0.93, respectively. These indices for [18F]FDG PET(/CT) were 92% (88–95), 99% (96–100), and 0.99, respectively, and for Na[18F]F PET(/CT) were 96% (90–99), 81% (72–88), and 0.99, respectively. BS has good diagnostic performance in detecting BrC bone metastases. However, due to the higher and more balanced sensitivity and specificity of [18F]FDG PET(/CT) compared to BS and Na[18F]F PET(/CT), and its advantage in evaluating extra-skeletal lesions, [18F]FDG PET(/CT) should be the preferred multimodal imaging method for evaluating bone metastases of BrC, if available.
]]>Journal of Imaging doi: 10.3390/jimaging9120273
Authors: Yanxia Zheng Xiyuan Qian
Go is a game that can be won or lost based on the number of intersections surrounded by black or white pieces. The traditional counting method is manual, which is time-consuming and error-prone. In addition, the generalization of current Go-image-recognition methods is poor, and their accuracy needs to be further improved. To solve these problems, a Go-game image-recognition method based on an improved pix2pix was proposed. Firstly, a channel-coordinate mixed-attention (CCMA) mechanism was designed by effectively combining channel attention and coordinate attention, enabling the model to learn the target feature information. Secondly, in order to obtain long-distance contextual information, a deep dilated-convolution (DDC) module was proposed, which densely links dilated convolutions with different dilation rates. The experimental results showed that, compared with other existing Go-image-recognition methods, such as DenseNet, VGG-16, and Yolo v5, the proposed method could effectively improve the generalization ability and accuracy of a Go-image-recognition model, and the average accuracy rate was over 99.99%.
]]>Journal of Imaging doi: 10.3390/jimaging9120272
Authors: Sewon Lim Hayun Nam Hyemin Shin Sein Jeong Kyuseok Kim Youngjin Lee
In this study, we aimed to address the issue of noise amplification after scatter correction when using a virtual grid in breast X-ray images. To achieve this, we suggested an algorithm for estimating the noise level and developed a noise reduction algorithm based on generative adversarial networks (GANs). Breast X-ray images with synthetic scatter were collected using Sizgraphy equipment, and scatter correction was performed using dedicated software. After scatter correction, we determined the level of noise using noise-level function plots and trained a GAN using 42 noise combinations. Subsequently, we obtained the resulting images and quantitatively evaluated their quality by measuring the contrast-to-noise ratio (CNR), coefficient of variance (COV), and normalized noise–power spectrum (NNPS). The evaluation revealed an improvement in the CNR by approximately 2.80%, an enhancement in the COV by 12.50%, and an overall improvement in the NNPS across all frequency ranges. In conclusion, the application of our GAN-based noise reduction algorithm effectively reduced noise and yielded improved-quality breast X-ray images.
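The CNR and COV figures used in this kind of evaluation can be computed from signal and background regions of interest roughly as follows (one common set of definitions; the paper's exact formulas and ROI choices may differ):

```python
import numpy as np

def cnr(signal_roi: np.ndarray, background_roi: np.ndarray) -> float:
    """Contrast-to-noise ratio: contrast between ROIs over background noise."""
    return float(abs(signal_roi.mean() - background_roi.mean()) / background_roi.std())

def cov(roi: np.ndarray) -> float:
    """Coefficient of variation: relative dispersion within a single ROI."""
    return float(roi.std() / roi.mean())
```

A noise reduction method that works should raise the CNR (less noise in the denominator) and lower the COV of a uniform region.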
]]>Journal of Imaging doi: 10.3390/jimaging9120271
Authors: Johan Jönemo Anders Eklund
Brain age prediction from 3D MRI volumes using deep learning has recently become a popular research topic, as brain age has been shown to be an important biomarker. Training deep networks can be very computationally demanding for large datasets like the U.K. Biobank (currently 29,035 subjects). In our previous work, it was demonstrated that using a few 2D projections (mean and standard deviation along three axes) instead of each full 3D volume leads to much faster training at the cost of a reduction in prediction accuracy. Here, we investigated if another set of 2D projections, based on higher-order statistical central moments and eigenslices, leads to a higher accuracy. Our results show that higher-order moments do not lead to a higher accuracy, but that eigenslices provide a small improvement. We also show that an ensemble of such models provides further improvement.
]]>Journal of Imaging doi: 10.3390/jimaging9120270
Authors: Alessandro Wollek Sardi Hyska Bastian Sabel Michael Ingrisch Tobias Lasser
Public chest X-ray (CXR) data sets are commonly compressed to a lower bit depth to reduce their size, potentially hiding subtle diagnostic features. In contrast, radiologists apply a windowing operation to the uncompressed image to enhance such subtle features. While it has been shown that windowing improves classification performance on computed tomography (CT) images, the impact of such an operation on CXR classification performance remains unclear. In this study, we show that windowing strongly improves the CXR classification performance of machine learning models and propose WindowNet, a model that learns multiple optimal window settings. Our model achieved an average AUC score of 0.812 compared with the 0.759 score of a commonly used architecture without windowing capabilities on the MIMIC data set.
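The windowing operation referred to above, which maps a (center, width) intensity range to the displayable range, can be sketched as follows (a generic radiological windowing sketch, not WindowNet's learned multi-window variant):

```python
import numpy as np

def apply_window(img: np.ndarray, center: float, width: float) -> np.ndarray:
    """Clip intensities to [center - width/2, center + width/2] and rescale to [0, 1]."""
    lo, hi = center - width / 2.0, center + width / 2.0
    return (np.clip(img, lo, hi) - lo) / (hi - lo)
```

Intensities outside the window saturate to 0 or 1, which is exactly what concentrates display contrast on the diagnostically relevant range.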
]]>Journal of Imaging doi: 10.3390/jimaging9120269
Authors: Ammara Ammara Ghulam Abbas Francesco V. Pepe Muhammad Afzaal Muhammad Qamar Abdul Ghuffar
Nanoslits have various applications, including localized surface plasmon resonance (LSPR)-based nanodevices, optical biosensors, superfocusing, high-efficiency refractive index sensors, and chip-based protein detection. In this study, the effect of substrates on the optical properties of gold nanoslits placed in free space is discussed; for this purpose, BK7 glass and Al2O3 are used as substrates, and the wavelength of the incident light is set to 650 nm. The optical properties, power flow, and electric field enhancement of gold nanoslits are investigated using the finite element method (FEM) in COMSOL Multiphysics software. The effect of the polarization of an incident electromagnetic wave as it propagates from a gold nanoslit is also analyzed. As a special case, the effect of the glass and alumina substrates on the magnetic field, power flow, and electric field enhancement is discussed. The goal of this research is to investigate the phenomena of power flow and electric field enhancement. The study of power flow in gold nanoslits provides valuable insights into the behavior of light at the nanoscale and offers opportunities for developing novel applications in the fields of nanophotonics and plasmonics. The results of this study show the significance of gold nanoslits as optical nanosensors.
]]>Journal of Imaging doi: 10.3390/jimaging9120268
Authors: Rodrigo Dalvit Carvalho da Silva Ramin Soltanzadeh Chase R. Figley
Coronary artery disease is one of the leading causes of death worldwide, and medical imaging methods such as coronary artery computed tomography are vitally important in its detection. More recently, various computational approaches have been proposed to automatically extract important coronary artery features (e.g., vessel centerlines, cross-sectional areas along vessel branches, etc.) that may ultimately be able to assist with more accurate and timely diagnoses. The current study therefore validated and benchmarked a recently developed automated 3D centerline extraction method for coronary artery centerline tracking using synthetically segmented coronary artery models based on the widely used Rotterdam Coronary Artery Algorithm Evaluation Framework (RCAAEF) training dataset. Based on standard accuracy metrics and the ground truth centerlines of all 32 coronary vessel branches in the RCAAEF training dataset, this 3D divide-and-conquer Voronoi diagram method performed exceptionally well, achieving an average overlap accuracy (OV) of 99.97%, overlap until first error (OF) of 100%, overlap of the clinically relevant portion of the vessel (OT) of 99.98%, and an average error distance inside the vessels (AI) of only 0.13 mm. Accuracy was also found to be exceptionally high for all four coronary artery sub-types, with average OV values of 99.99% for right coronary arteries, 100% for left anterior descending arteries, 99.96% for left circumflex arteries, and 100% for large side-branch vessels. These results validate that the proposed method can be employed to quickly, accurately, and automatically extract 3D centerlines from segmented coronary arteries, and indicate that it is likely worthy of further exploration given the importance of this topic.
]]>Journal of Imaging doi: 10.3390/jimaging9120267
Authors: Evangelia Siomou Dimitrios K. Filippiadis Efstathios P. Efstathopoulos Ioannis Antonakos George S. Panayiotakis
This study establishes typical Diagnostic Reference Levels (DRL) values and assesses patient doses in computed tomography (CT)-guided biopsy procedures. The Effective Dose (ED), Entrance Skin Dose (ESD), and Size-Specific Dose Estimate (SSDE) were calculated using the relevant literature-derived conversion factors. A retrospective analysis of 226 CT-guided biopsies across five categories (Iliac bone, liver, lung, mediastinum, and para-aortic lymph nodes) was conducted. Typical DRL values were computed as median distributions, following guidelines from the International Commission on Radiological Protection (ICRP) Publication 135. DRLs for helical mode CT acquisitions were set at 9.7 mGy for Iliac bone, 8.9 mGy for liver, 8.8 mGy for lung, 7.9 mGy for mediastinal mass, and 9 mGy for para-aortic lymph nodes biopsies. In contrast, DRLs for biopsy acquisitions were 7.3 mGy, 7.7 mGy, 5.6 mGy, 5.6 mGy, and 7.4 mGy, respectively. Median SSDE values varied from 7.6 mGy to 10 mGy for biopsy acquisitions and from 11.3 mGy to 12.6 mGy for helical scans. Median ED values ranged from 1.6 mSv to 5.7 mSv for biopsy scans and from 3.9 mSv to 9.3 mSv for helical scans. The study highlights the significance of using DRLs for optimizing CT-guided biopsy procedures, revealing notable variations in radiation exposure between helical scans covering entire anatomical regions and localized biopsy acquisitions.
]]>Journal of Imaging doi: 10.3390/jimaging9120266
Authors: Luca Zedda Andrea Loddo Cecilia Di Ruberto
Malaria is a potentially fatal infectious disease caused by the Plasmodium parasite. The mortality rate can be significantly reduced if the condition is diagnosed and treated early. However, in many underdeveloped countries, the detection of malaria parasites from blood smears is still performed manually by experienced hematologists. This process is time-consuming and error-prone. In recent years, deep-learning-based object-detection methods have shown promising results in automating this task, which is critical to ensure diagnosis and treatment in the shortest possible time. In this paper, we propose a novel Transformer- and attention-based object-detection architecture designed to detect malaria parasites with high efficiency and precision, focusing on detecting several parasite sizes. The proposed method was tested on two public datasets, namely MP-IDB and IML. The evaluation results demonstrated a mean average precision exceeding 83.6% on distinct Plasmodium species within MP-IDB and reaching nearly 60% on IML. These findings underscore the effectiveness of our proposed architecture in automating malaria parasite detection, offering a potential breakthrough in expediting diagnosis and treatment processes.
]]>Journal of Imaging doi: 10.3390/jimaging9120265
Authors: Renato R. Maaliw
The advancement of medical prognoses hinges on the delivery of timely and reliable assessments. Conventional methods of assessment and diagnosis, often reliant on human expertise, lead to inconsistencies due to professionals’ subjectivity, knowledge, and experience. To address these problems head-on, we harnessed the power of artificial intelligence to introduce a transformative solution. We leveraged convolutional neural networks to engineer our SCOLIONET architecture, which can accurately identify Cobb angle measurements. Empirical testing on our pipeline demonstrated a mean segmentation accuracy of 97.50% (Sorensen–Dice coefficient) and 96.30% (Intersection over Union), indicating the model’s proficiency in outlining vertebrae. The level of quantification accuracy was attributed to the state-of-the-art design of the atrous spatial pyramid pooling, which better segments images. We also compared physicians’ manual evaluations against our machine-driven measurements to further validate our approach’s practicality and reliability. The results were remarkable, with a p-value (t-test) of 0.1713 and an average acceptable deviation of 2.86 degrees, suggesting no significant difference between the two methods. Our work holds the promise of enabling medical practitioners to conduct scoliosis examinations swiftly and consistently, improving and advancing the quality of patient care.
]]>Journal of Imaging doi: 10.3390/jimaging9120264
Authors: Thawatchai Prabsattroo Kanokpat Wachirasirikul Prasit Tansangworn Puengjai Punikhom Waraporn Sudchai
Computed tomography examinations can deliver high radiation doses to patients, especially CT scans of the brain. This study aimed to optimize the radiation dose and image quality of adult brain CT protocols. Images were acquired using a Catphan 700 phantom. Radiation doses were recorded as CTDIvol and the dose length product (DLP). CT brain protocols were optimized by varying parameters such as kVp, mAs, the signal-to-noise ratio (SNR) level, and Clearview iterative reconstruction (IR). The image quality was also evaluated using AutoQA Plus v.1.8.7.0 software. CT number accuracy and linearity had a robust positive correlation with the linear attenuation coefficient (µ) and showed more inaccurate CT numbers when using 80 kVp. The modulation transfer function (MTF) showed a higher value in the 100 and 120 kVp protocols (p < 0.001), while high-contrast spatial resolution showed a higher value in the 80 and 100 kVp protocols (p < 0.001). Low-contrast detectability and the contrast-to-noise ratio (CNR) tended to increase when using high mAs, a high SNR level, and the Clearview IR protocol. Noise decreased when using a high radiation dose and a high percentage of Clearview IR. CTDIvol and DLP increased with increasing kVp, mAs, and SNR levels, while an increasing percentage of Clearview did not affect the radiation dose. Optimized protocols, balancing radiation dose and image quality, should be evaluated to preserve diagnostic capability. The recommended parameter settings include a kVp set between 100 and 120 kVp, mAs ranging from 200 to 300 mAs, an SNR level within the range of 0.7–1.0, and an iterative reconstruction value of 30% Clearview to 60% or higher.
]]>Journal of Imaging doi: 10.3390/jimaging9120263
Authors: Ahmad Ihsan Khairul Muttaqin Rahmatul Fajri Mursyidah Mursyidah Islam Md Rizwanul Fattah
In this paper, we introduce a new and advanced multi-feature selection method for bacterial classification that uses the salp swarm algorithm (SSA). We improve the SSA’s performance by using opposition-based learning (OBL) and a local search algorithm (LSA). The proposed method has three main stages, which automate the categorization of bacteria based on their unique characteristics. The method uses a multi-feature selection approach augmented by an enhanced version of the SSA. The enhancements include using OBL to increase population diversity during the search process and LSA to address local optimization problems. The improved salp swarm algorithm (ISSA) is designed to optimize multi-feature selection by increasing the number of selected features and improving classification accuracy. We compare the ISSA’s performance to that of several other algorithms on ten different test datasets. The results show that the ISSA outperforms the other algorithms in terms of classification accuracy on three datasets with 19 features, achieving an accuracy of 73.75%. Additionally, the ISSA excels at determining the optimal number of features and producing a better fit value, with a classification error rate of 0.249. Therefore, the ISSA method is expected to make a significant contribution to solving feature selection problems in bacterial analysis.
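The opposition-based learning step used to diversify the salp population can be sketched in a few lines; this is a generic illustration of OBL on a continuous search space, not the authors' ISSA implementation:

```python
import numpy as np

def opposition(population, lower, upper):
    """Opposition-based learning: mirror each candidate solution across
    the centre of the search interval, x_opp = lower + upper - x."""
    return lower + upper - population

# Toy feature-selection setting: 5 candidates, 4 features in [0, 1].
rng = np.random.default_rng(0)
lower, upper = 0.0, 1.0
pop = rng.random((5, 4))
opp = opposition(pop, lower, upper)

# Evaluating both the population and its opposition, then keeping the
# fitter half, doubles search diversity without extra random sampling.
combined = np.vstack([pop, opp])
print(combined.shape)  # (10, 4)
```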
]]>Journal of Imaging doi: 10.3390/jimaging9120262
Authors: María Villa-Monedero Manuel Gil-Martín Daniel Sáez-Trigueros Andrzej Pomirski Rubén San-Segundo
Several sign language datasets are available in the literature. Most of them are designed for sign language recognition and translation. This paper presents a new sign language dataset for automatic motion generation. This dataset includes phonemes for each sign (specified in HamNoSys, a transcription system developed at the University of Hamburg, Hamburg, Germany) and the corresponding motion information. The motion information includes sign videos and the sequence of extracted landmarks associated with relevant points of the skeleton (including face, arms, hands, and fingers). The dataset includes signs from three different subjects in three different positions, performing 754 signs including the entire alphabet, numbers from 0 to 100, numbers for hour specification, months, and weekdays, and the most frequent signs used in Spanish Sign Language (LSE). In total, there are 6786 videos and their corresponding phonemes (HamNoSys annotations). From each video, a sequence of landmarks was extracted using MediaPipe. The dataset allows training an automatic system for motion generation from sign language phonemes. This paper also presents preliminary results in motion generation from sign phonemes obtaining a Dynamic Time Warping distance per frame of 0.37.
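The per-frame Dynamic Time Warping distance used to evaluate motion generation can be sketched as follows; the frame-to-frame cost and the per-frame normalisation here are generic assumptions, not necessarily the exact configuration used by the authors:

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic time warping between two landmark sequences
    (frames x features), using Euclidean frame-to-frame cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Dividing by the reference length gives a per-frame figure that is
# comparable across signs of different durations.
ref = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
gen = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.1]])
print(dtw_distance(ref, gen) / len(ref))
```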
]]>Journal of Imaging doi: 10.3390/jimaging9120261
Authors: Dimitris Kalatzis Ellas Spyratou Maria Karnachoriti Maria Anthi Kouri Ioannis Stathopoulos Nikolaos Danias Nikolaos Arkadopoulos Spyros Orfanoudakis Ioannis Seimenis Athanassios G. Kontos Efstathios P. Efstathopoulos
Raman spectroscopy (RS) techniques are attracting attention in the medical field as a promising tool for real-time biochemical analyses. The integration of artificial intelligence (AI) algorithms with RS has greatly enhanced its ability to accurately classify spectral data in vivo. This combination has opened up new possibilities for precise and efficient analysis in medical applications. In this study, Raman spectra were collected from healthy and cancerous specimens of 22 patients who underwent open colorectal surgery. Using these spectral data, we investigate an optimal preprocessing pipeline for statistical analysis with AI techniques. This exploration entails proposing preprocessing methods and algorithms to enhance classification outcomes. The research encompasses a thorough ablation study comparing machine learning and deep learning algorithms to advance the clinical applicability of RS. The results indicate substantial accuracy improvements from techniques such as baseline correction, L2 normalization, filtering, and PCA, yielding an overall accuracy enhancement of 15.8%. In comparing various algorithms, machine learning models such as XGBoost and Random Forest demonstrate effectiveness in classifying both normal and abnormal tissues. Similarly, deep learning models such as 1D-Resnet, and particularly the 1D-CNN model, exhibit superior performance in classifying abnormal cases. This research contributes valuable insights into the integration of AI in medical diagnostics and expands the potential of RS methods for achieving accurate malignancy classification.
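A spectral preprocessing chain of the kind studied here can be sketched on synthetic data; the crude minimum-subtraction baseline step below is a stand-in assumption (real pipelines typically use polynomial or asymmetric-least-squares baseline correction), and only the L2 normalisation and PCA steps mirror the techniques named above:

```python
import numpy as np

def preprocess(spectra, n_components=3):
    """Sketch of a spectral preprocessing chain: crude baseline removal,
    L2 normalisation, then PCA via SVD on the centred data."""
    x = spectra - spectra.min(axis=1, keepdims=True)  # crude baseline removal
    x = x / np.linalg.norm(x, axis=1, keepdims=True)  # L2 normalisation
    x = x - x.mean(axis=0)                            # centre for PCA
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return x @ vt[:n_components].T                    # PCA scores

rng = np.random.default_rng(1)
spectra = rng.random((22, 500))  # 22 spectra x 500 wavenumber bins
scores = preprocess(spectra)
print(scores.shape)  # (22, 3)
```

The low-dimensional PCA scores would then feed a downstream classifier such as XGBoost or a 1D-CNN.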
]]>Journal of Imaging doi: 10.3390/jimaging9120260
Authors: Dara Molloy Brian Deegan Darragh Mullins Enda Ward Jonathan Horgan Ciaran Eising Patrick Denny Edward Jones Martin Glavin
In advanced driver assistance systems (ADAS) or autonomous vehicle research, acquiring semantic information about the surrounding environment generally relies heavily on camera-based object detection. Image signal processors (ISPs) in cameras are generally tuned for human perception. In most cases, ISP parameters are selected subjectively and the resulting image differs depending on the individual who tuned it. While the installation of cameras on cars started as a means of providing a view of the vehicle’s environment to the driver, cameras are increasingly becoming part of safety-critical object detection systems for ADAS. Deep learning-based object detection has become prominent, but the performance impact of varying the ISP parameters is unknown. In this study, we analyze the performance of 14 popular object detection models in the context of changes in the ISP parameters. We consider eight ISP blocks: demosaicing, gamma, denoising, edge enhancement, local tone mapping, saturation, contrast, and hue angle. We investigate two raw datasets, PASCALRAW and a custom raw dataset collected from an ADAS perspective. We found that deviating from the default ISP parameters degrades object detection performance and that the models differ in their sensitivity to varying ISP parameters. Finally, we propose a novel methodology that increases object detection model robustness via ISP-variation data augmentation.
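One axis of the ISP-variation data augmentation idea can be sketched with the gamma block; the gamma range sampled below is a hypothetical choice for illustration, and the paper's augmentation covers the other ISP blocks (saturation, denoising, tone mapping, etc.) as well:

```python
import numpy as np

def gamma_augment(image, gamma):
    """Re-apply a different gamma curve to a [0, 1] image, mimicking
    an ISP tuned differently from the default."""
    return np.clip(image, 0.0, 1.0) ** gamma

rng = np.random.default_rng(0)
img = rng.random((4, 4, 3))  # toy RGB image in [0, 1]
# Sample a gamma per training image instead of fixing the default curve,
# so the detector sees a spread of plausible ISP tunings.
augmented = gamma_augment(img, gamma=rng.uniform(0.7, 1.4))
print(augmented.shape)
```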
]]>Journal of Imaging doi: 10.3390/jimaging9120259
Authors: Kaushik Roy Christian Simon Peyman Moghadam Mehrtash Harandi
Lifelong learning portrays learning gradually in nonstationary environments and emulates the process of human learning, which is efficient, robust, and able to learn new concepts incrementally from sequential experience. To equip neural networks with such a capability, one needs to overcome the problem of catastrophic forgetting, the phenomenon of forgetting past knowledge while learning new concepts. In this work, we propose a novel knowledge distillation algorithm that makes use of contrastive learning to help a neural network preserve its past knowledge while learning from a series of tasks. Our generalized contrastive distillation strategy tackles catastrophic forgetting of old knowledge, minimizes semantic drift by maintaining a similar embedding space, and ensures compactness in the feature distribution to accommodate novel tasks in the current model. Our comprehensive study shows that our method achieves improved performance in challenging class-incremental, task-incremental, and domain-incremental supervised learning scenarios.
]]>Journal of Imaging doi: 10.3390/jimaging9120258
Authors: Xingshuo Peng Keyuan Wang Zelin Zhang Nan Geng Zhiyi Zhang
The phenotyping of plant growth enriches our understanding of intricate genetic characteristics, paving the way for advancements in modern breeding and precision agriculture. Within the domain of phenotyping, segmenting 3D point clouds of plant organs is the basis for extracting plant phenotypic parameters. In this study, we introduce a novel method for point-cloud downsampling that adeptly mitigates the challenges posed by sample imbalance. We then design a deep learning framework founded on the principles of SqueezeNet for the segmentation of plant point clouds. In addition, we use time series as input variables, which effectively improves the segmentation accuracy of the network. Building on the semantic segmentation, the MeanShift algorithm is employed to perform instance segmentation on the point-cloud data of crops. In semantic segmentation, the average Precision, Recall, F1-score, and IoU for maize reached 99.35%, 99.26%, 99.30%, and 98.61%, while those for tomato reached 97.98%, 97.92%, 97.95%, and 95.98%. In instance segmentation, the accuracy for maize and tomato reached 98.45% and 96.12%, respectively. This research holds the potential to advance the fields of plant phenotypic extraction, ideotype selection, and precision agriculture.
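The MeanShift step that turns semantic labels into instances can be illustrated with a minimal implementation on toy 3D points; real pipelines would typically use an optimised library implementation and tune the bandwidth to organ size, so treat this purely as a sketch of the mechanism:

```python
import numpy as np

def mean_shift(points, bandwidth=1.0, n_iter=30):
    """Minimal MeanShift: shift each point to the mean of its neighbours
    within `bandwidth` until the modes stabilise, then group points that
    share a mode into one instance."""
    modes = points.copy()
    for _ in range(n_iter):
        for i, p in enumerate(modes):
            mask = np.linalg.norm(points - p, axis=1) < bandwidth
            modes[i] = points[mask].mean(axis=0)
    # Points whose modes coincide (within a tolerance) form one instance.
    labels = np.full(len(points), -1)
    centres = []
    for i, m in enumerate(modes):
        for k, c in enumerate(centres):
            if np.linalg.norm(m - c) < bandwidth / 2:
                labels[i] = k
                break
        else:
            centres.append(m)
            labels[i] = len(centres) - 1
    return labels

# Two well-separated toy "organ" clusters of 3D points.
pts = np.vstack([np.random.default_rng(2).normal(0, 0.1, (20, 3)),
                 np.random.default_rng(3).normal(5, 0.1, (20, 3))])
labels = mean_shift(pts, bandwidth=1.0)
print(len(set(labels.tolist())))  # 2 instances
```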
]]>Journal of Imaging doi: 10.3390/jimaging9120257
Authors: Thomas Gerhard Wolf Samuel Basmaci Sven Schumann Andrea Lisa Waber
The aim of this study was to examine the root canal morphology of mandibular second premolars (Mn2P) of a mixed Swiss-German population by means of micro-computed tomography (micro-CT). The root canal configurations (RCCs) of 102 Mn2Ps were investigated using a micro-CT unit (µCT 40; SCANCO Medical AG, Brüttisellen, Switzerland) with 3D imaging software (VGStudio Max 2.2; Volume Graphics GmbH, Heidelberg, Germany) and described with a four-digit system code indicating the main root canal from the coronal to the apical third and the number of main foramina. A total of 12 different RCCs were detected. 1-1-1/1 (54.9%) was the most frequently observed RCC, followed by 1-1-1/2 (14.7%), 1-1-2/2 (10.8%), 1-2-2/2 (4.9%), 1-1-3/3 (3.9%), 1-1-1/3 (2.9%), and 2-1-1/1 (2.9%), with 1-1-2/3, 1-2-1/2, 2-1-2/2, 1-1-2/5, and 1-1-1/4 each observed less frequently (1.0%). No accessory foramina were present in 35.3% of teeth, one in 35.3%, two in 21.6%, three and four in 2.9% each, and five in 2.0%. Accessory root canals were present in the apical third of 55.9% of Mn2Ps and in the middle third of 8.8%. Connecting canals were observed less frequently, in 6.9% in the apical third and 2.9% in the middle third, with no accessory or connecting canals in the coronal third. Every tenth tooth showed three or more main foramina. Almost two thirds of the sample showed accessory root canals, predominantly in the apical third. The mainly single-rooted sample of Mn2Ps showed less frequent morphological diversification than Mn1Ps.
]]>Journal of Imaging doi: 10.3390/jimaging9120256
Authors: Roser Viñals Jean-Philippe Thiran
Ultrafast ultrasound imaging, characterized by high frame rates, generates low-quality images. Convolutional neural networks (CNNs) have demonstrated great potential to enhance image quality without compromising the frame rate. However, CNNs have been mostly trained on simulated or phantom images, leading to suboptimal performance on in vivo images. In this study, we present a method to enhance the quality of single plane wave (PW) acquisitions using a CNN trained on in vivo images. Our contribution is twofold. Firstly, we introduce a training loss function that accounts for the high dynamic range of the radio frequency data and uses the Kullback–Leibler divergence to preserve the probability distributions of the echogenicity values. Secondly, we conduct an extensive performance analysis on a large new in vivo dataset of 20,000 images, comparing the predicted images to the target images resulting from the coherent compounding of 87 PWs. Applying a volunteer-based dataset split, the peak signal-to-noise ratio and structural similarity index measure increase, respectively, from 16.466 ± 0.801 dB and 0.105 ± 0.060, calculated between the single PW and target images, to 20.292 ± 0.307 dB and 0.272 ± 0.040, between predicted and target images. Our results demonstrate significant improvements in image quality, effectively reducing artifacts.
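The distribution-preserving idea behind the loss can be sketched as a histogram-based Kullback–Leibler term; the binning and the histogram formulation below are illustrative assumptions, and the paper's actual loss additionally accounts for the high dynamic range of the RF data:

```python
import numpy as np

def kl_echogenicity_loss(pred, target, bins=32, eps=1e-8):
    """Histogram the echogenicity values of predicted and target images
    over a shared range and penalise their KL divergence KL(p || q)."""
    lo = min(pred.min(), target.min())
    hi = max(pred.max(), target.max())
    p, _ = np.histogram(target, bins=bins, range=(lo, hi), density=True)
    q, _ = np.histogram(pred, bins=bins, range=(lo, hi), density=True)
    p, q = p + eps, q + eps           # avoid log(0) on empty bins
    p, q = p / p.sum(), q / q.sum()   # renormalise to probabilities
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(4)
target = rng.normal(0.0, 1.0, (64, 64))  # toy echogenicity maps
pred = rng.normal(0.0, 2.0, (64, 64))
print(kl_echogenicity_loss(target, target))  # 0.0 for identical images
```

A differentiable variant (e.g. with soft histogram binning) would be needed to use such a term directly in CNN training.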
]]>Journal of Imaging doi: 10.3390/jimaging9120255
Authors: Julian Meißner Michael Kisiel Nagarajan M. Thoppey Michael M. Morlock Sebastian Bannwarth
Three-dimensional body scanners are attracting increasing interest in various application areas. To evaluate their accuracy, their 3D point clouds must be compared to a reference system using a reference object. Since different scanning systems use different coordinate systems, an alignment is required for their evaluation. However, this process can result in translational and rotational misalignment. To understand the effects of alignment errors on the accuracy of measured circumferences of the human lower body, such misalignment is simulated in this paper and the resulting characteristic error patterns are analyzed. The results show that the total error consists of two components, a translational and a tilt component. Linear correlations were found between the translational error (R2 = 0.90–0.97) and the change in circumferences, as well as between the tilt error (R2 = 0.55–0.78) and the change in the body’s mean outline. Finally, through systematic analysis of the error patterns, recommendations were derived and applied to 3D body scans of human subjects, resulting in error reductions of 67% and 84%.
]]>